Analysing Mathematical Word Problem Solving with Secondary Education CLIL Students: A Pilot Study

The purpose of this study is to investigate to what extent the use of L2 in math tests influences bilingual education learners' process of word problem solving in a compulsory secondary education school with Content and Language Integrated Learning (CLIL). The reading comprehension level of the students was analysed using a standards-based assessment, and the word problems were taken from released Programme for International Student Assessment (PISA) tests and selected according to the students' level of reading comprehension and mathematical competence. Learners also had to answer a questionnaire, which was used to analyse whether contextual factors were affecting mathematical performance in L2. To this end, the questionnaire included questions related to the bilingual history of the students and their perception of solving word problems in English. Data were analysed through one-way or two-way ANOVA tests to find out which factors were relevant. Results show that solving word problems is not only affected by the use of L2 but also depends on mathematical difficulty, irrespective of the students' level of language proficiency. The findings, hence, imply that the interaction between linguistic difficulty and mathematical complexity is at the centre of the issues affecting word problem solving.


Introduction
The interrelation between language and content matter in mathematics is highly significant: "Language-dependent knowledge representations are particularly evident in the domain of mathematics learning" (Grabner, Saalbach, & Eckstein, 2012, p. 147). This is because, on the one hand, the language of mathematics is particularly problematic (Morgan, 2007) due to its unique register (Halliday, 1978) and, on the other hand, mathematical processing demands a great amount of procedural knowledge that is essentially linked with language (Hiebert & Lefevert, 1987).
Particularly regarding the topic of this study, Swetz (2012) points out that word problems are frequently used to teach mathematics, having served as a primary means of instruction for thousands of years. Also, according to Bernardo (2002), word problems have always been an important part of mathematics education. Their linguistic component is fundamental because problems are embedded within a text and difficulties are related to terminology: "formal mathematical language is characterized by lack of redundancy and refers to the standard use of terminology (mathematical register)" (Novotná, Hadj-Moussová, & Hofmannová, 2005, p. 3). Providing an adequate assessment of mathematical performance in an immersion context of bilingual education is therefore key, as a test presented in the L2 may give a misleading impression of the students' academic abilities. Comprehension of content through L2 has been investigated with primary education learners in bilingual settings, including CLIL, particularly with respect to its significance when assessing mathematics skills (Abedi & Lord, 2001; Cummins, Kintsch, Reusser, & Weimer, 1988; Jiménez-Jiménez, 2015; Kempert, Saalbach, & Hardy, 2011; Moschkovich, 2005, 2007, 2015; Ouazizi, 2016; Smit & Van Eerde, 2011). Besides, investigating the effect of language proficiency in programmes where mathematics is taught in English (as the L2) at other levels of education is increasingly frequent (Adanur, Yagiz, & Isik, 2004). The context in which this investigation takes place, the bilingual education programme in the Spanish region of Andalusia, interweaves language and content as proposed in CLIL. Although previous research indicates that there could be an impact on mathematical performance when word problems are presented in L2, this particular area has not yet been studied in secondary education in the Andalusian context.
After more than a decade of bilingual education in the region, it is important to know whether or not L2 proficiency influences the assessment of mathematical content. Specifically, the main objective is to weigh the importance of L2 in word problem solving and to identify the other variables at work in the students' performance in mathematics. In the following sections, the theoretical foundations of this study are described, namely the interaction between language and content (in L1 and L2), word problem solving processes, the impact of reading comprehension proficiency level, and socio-economic factors and their relation to academic achievement. The analysis of the data is intended to yield useful organizational and pedagogical conclusions regarding the assessment of students' performance in mathematics.

Interaction between language and content in word problems
One of the most difficult challenges that a bilingual education teacher faces is balancing the linguistic and the cognitive demands of students' tasks: "the goal of professional development for content area teachers should be to frame their planning in terms of the question: What is the language my students need to succeed in this task?" (Hansen-Thomas, Langman, & Farias, 2018, p. 211). As Cummins (2000) states, language and content will be acquired most successfully when students are challenged cognitively and provided with the contextual and linguistic scaffolding required for successful task completion. Mathematics, like every other academic discipline, has a complex and specialized language that is different from everyday conversation (Sigley & Wilkinson, 2015). Therefore, proficiency in language may affect mathematical achievement at both the instructional and the testing levels (Kempert, Saalbach, & Hardy, 2011). The CLIL Matrix (Coyle, Hood, & Marsh, 2010) explains the relationship between language and content, as shown in Figure 1. Ideally, the focus should be on quadrant 2, where language does not impede learning. Nevertheless, moving periodically to quadrant 3 will lead to progression in language learning without reducing the cognitive challenge for the learner. Quadrant 4 aims at fostering the linguistic potential of the students, whilst the work included in quadrant 1 could help build initial confidence in learners (Coyle et al., 2010). Word problems could be included in quadrant 3, and this should be a goal for the linguistic performance of our learners. However, the linguistic demands may be too high for a particular group of students; in that case, adaptations can be made in order to assess their mathematical skills, thus moving the task to quadrant 2. Conversely, if tasks remain in quadrant 2, there is a risk that the students' linguistic proficiency will not be properly developed.
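To make the quadrant numbering used in this paper explicit, the sketch below encodes the CLIL Matrix as a lookup from a task's cognitive and linguistic demand to a quadrant. The assignment follows the descriptions above (quadrant 2: high cognitive, low linguistic demand; quadrant 3: high on both; quadrant 4: low cognitive, high linguistic demand; quadrant 1, by elimination, low on both). The two-level scale and the names `Demand` and `clil_quadrant` are illustrative assumptions, not part of Coyle et al.'s (2010) framework.

```python
from enum import Enum


class Demand(Enum):
    """Two-level demand scale (an illustrative simplification)."""
    LOW = "low"
    HIGH = "high"


# (cognitive demand, linguistic demand) -> quadrant, as described in the text
CLIL_QUADRANTS = {
    (Demand.LOW, Demand.LOW): 1,    # builds initial confidence
    (Demand.HIGH, Demand.LOW): 2,   # language does not impede learning
    (Demand.HIGH, Demand.HIGH): 3,  # language progresses while cognitive challenge is kept
    (Demand.LOW, Demand.HIGH): 4,   # fosters linguistic potential
}


def clil_quadrant(cognitive: Demand, linguistic: Demand) -> int:
    """Return the CLIL Matrix quadrant for a task's cognitive and linguistic demand."""
    return CLIL_QUADRANTS[(cognitive, linguistic)]


# A demanding word problem in the L2 falls in quadrant 3; linguistic adaptation
# (e.g., a glossary) aims to move it towards quadrant 2.
print(clil_quadrant(Demand.HIGH, Demand.HIGH))  # 3
print(clil_quadrant(Demand.HIGH, Demand.LOW))   # 2
```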
According to Abedi (2002), the linguistic complexity of test items unrelated to the content being assessed may be at least partly responsible for the performance gap between non-native speakers and native speakers in an immersion context. Problematic features appear at three levels: syntactic (involving complex sentences, multiple subordinate clauses, nested constructions, long noun phrases, and passive voice), lexical (concerning unfamiliar words, unfamiliar phrases, and unfamiliar connotations of words with multiple meanings), and background (focusing on knowledge in word problems, sentence and paragraph level, and word phrases) (Moschkovich, 2015).
The link between language and mathematics is especially evident in the case of word problems. It is known that children's performance on arithmetic word problems is a reliable predictor of their subsequent mathematical competence (Kempert, Saalbach, & Hardy, 2011).
In fact, the mathematics register requires a highly technical, precise and densely structured language (Sigley & Wilkinson, 2015), and solving arithmetic problems is a cognitive task that relies on language processing (Van Rinsveld, Brunner, Landerl, Schiltz, & Ugen, 2015). Thus, the learning of mathematics is more strongly related to language processes than previously assumed (Kempert, Saalbach, & Hardy, 2011; Surmont et al., 2016). According to Bialystok (2001), because language and mathematics share critical features, such as abstract mental representation, conventional notations, and interpretive function, mathematics is a domain where cognitive effects on bilinguals are likely to occur.
Language proficiency affects mathematical achievement at the levels of instruction and testing (Kempert, Saalbach, & Hardy, 2011).
Hence, both oral and written language are central to mathematics teaching and learning, which rely on the discourse structures of description, sequencing, procedural iteration, and justification (Sigley & Wilkinson, 2015). Furthermore, several studies state that mathematical reasoning is consistent with the threshold hypothesis (Cummins, 2000). Although students may not be in full command of the mathematics register until they understand the underlying mathematics (Sigley & Wilkinson, 2015), presenting mathematics problems with simplified linguistic instructions (Abedi & Lord, 2001) seems to help them overcome the linguistic complexity of those problems (Van Rinsveld et al., 2016). Following Bialystok (2001), a generous interpretation of the studies is that, if language proficiency is at least adequate for understanding the problem, bilingualism has no effect on mathematical problem solving.

Language proficiency and mathematical complexity
The impact of a high linguistic proficiency on mathematics has been generally reported as positive: "bilingual pupils have an advantage in mathematics when they are highly competent in both languages, compared to their monolingual peers" (Surmont et al., 2016, p. 322). From a different perspective, Abedi and Lord (2001) also show that there is a real interaction between language and mathematics achievement.
Their research compared original math test items with linguistically simplified versions in which vocabulary and linguistic structures not related to mathematics had been modified. They claim that this interplay has to be considered in mathematics assessment research and practice, and that language adjustments must be made carefully. To this end, Lorenzo (2008) establishes three strategies that teachers in bilingual education contexts use: simplifying the text (which could make it almost meaningless), elaborating it (an attempt to reduce complexity), or re-discursifying it (producing a student-centred text that keeps the ideas coherent and retains enough complexity to challenge the students' learning processes).
Whereas the mathematics register has specific grammatical patterns, including dense noun phrases, subordinators, nominalizations, logical connectors and verbs employed in arguments, justifications and constructions of mathematical ideas (Sigley & Wilkinson, 2015), it has been suggested that simplifying the linguistic structure of word problems presented in L2 increases learners' mathematical performance (Abedi & Lord, 2001). Other comprehension problems at the surface level (e.g., an unknown verb or noun) may place an extra load on memory resources, which are then not available for arithmetic calculations (Kempert, Saalbach, & Hardy, 2011).
Given these difficulties, solving mathematics word problems requires a cognitive modelling process in which students identify and extract the relevant pieces of information embedded in the problem context while suppressing any misleading or irrelevant linguistic or numerical information (Kempert, Saalbach, & Hardy, 2011).
According to a prominent model of arithmetic word problem solving, the whole process takes place in three steps: forming a situation model by structuring the problem's relevant features, extracting a mathematical problem model (translating from the linguistic code to mathematical relations), and, finally, performing the calculations and interpreting and validating the results (Kempert, Saalbach, & Hardy, 2011).
As shown above, variations in the linguistic aspect of word problems affect problem representation and problem-solving performance.
Focusing on the structure of word problems when the text is presented leads to significant improvements in problem-solving accuracy (Bernardo, 2002). Bernardo found that Filipino-English bilingual students solved word problems better when they were written in their first language (Filipino), because they understood the text. He also found that rewording the problem texts resulted in smaller gains in accuracy for the problems worded in English than for those worded in Filipino. In other words, bilingual education students who have to solve word problems in a foreign language may have more difficulty understanding them. Kempert, Saalbach, and Hardy (2011) have also found that proficiency in the language of testing had the strongest influence on the students' mathematical achievement in both L1 and L2. At the same time, bilingual education students partly compensated for their language deficits by displaying an enhanced ability in attentional control.
According to Abedi and Lord (2001), the following features can be revised in order to adapt the linguistic register of mathematical word problems to the linguistic level of students:
However, as Lorenzo (2008) points out, linguistic adaptations must be made with care, as undesired outcomes may appear in simplification or elaboration processes. Kempert, Saalbach, and Hardy (2011) state that proficiency in the instructional language and arithmetic skills can strongly predict students' ability to solve mathematical word problems. In order to classify students by linguistic proficiency, reading comprehension could be used as a suitable instrument (Schleicher, Zimmer, Evans, & Clements, 2009). For this purpose, a standards-based assessment becomes a fundamental tool for establishing whether there are differences among students, while a questionnaire can be administered to determine participants' language histories and to obtain an accurate picture of their degree of bilingualism (Jiménez-Jiménez, 2015).

Method
This study can be classified as ex post facto because the independent variables (e.g., the language in which word problems are presented or the reading comprehension level according to the Common European Framework of Reference for Languages, CEFR) are not manipulated. Part of the study is descriptive, based on the context questionnaire, which includes information that must also be considered as an empirical variable. Research question 1 is answered by studying the effect of one independent variable, so it can be considered prospective with one simple independent variable. Research question 2 analyses the effect of two independent variables on the dependent variable, so it can be considered prospective with more than one independent variable (factorial).

Objectives
The main purpose of this study is to investigate whether using the foreign language influences the word problem solving process when assessing mathematical content. Following the aims of this paper, two research questions have been formulated:
1. Does the L2 level of students influence mathematical word problem solving?
2. How does mathematical complexity interact with language in the word problem solving process?

Context and participants
The study was conducted at a high school in the city of Córdoba (Andalusia, Spain) where bilingual education follows a CLIL approach. Two groups of bilingual education learners, with a total of 53 students, took part in the study. Originally, the two groups were composed of 27 and 28 students, respectively, for a total of 55. Eventually, two students were excluded from the investigation because they had failed several subjects, including Mathematics and English as a Foreign Language. The mean age of the students on the day the PISA test was administered was 15.55 years.
As for the general context of the bilingual education programme in secondary education in Andalusia, high schools with bilingual education must teach at least two non-language subjects in a foreign language, covering a minimum of 30% of the weekly schedule (9 out of 30 hours). At least 50% of the content in those subjects must be taught in the L2. When assessing content sub-

Data collection procedure
The research was carried out during the academic year of 2016-2017.
A data-collection calendar was first established so that each type of paper could be administered within a single day. Another criterion was to keep the time between administrations short, in order to avoid changes in the students' English or mathematics proficiency or in their perception of using English as a medium for learning mathematics. The students, parents and school management were informed about the study prior to data collection, and the students signed their consent to participate. The students were also informed that they could voluntarily withdraw from the study and that they would not receive any rewards (financial or of any other type) for participating. The three instruments used to collect the data were a reading comprehension level test, a questionnaire, and two versions of the same word problem test.

Reading comprehension test
Participants took a standards-based assessment B1 test from the book A brief explanation of the questionnaire was given during the first 10 minutes of the class. Students were told about the importance of the questions, that gathering individual perceptions and experiences was relevant, that there were no right or wrong answers, and that the utmost sincerity was needed. They had 50 minutes to complete it.

PISA word problem test
The mathematical word problem test was built from selected PISA-released mathematics items (INEE, 2013; OECD, 2006, 2013). Questions were available in both English and Spanish and were balanced in terms of general difficulty. This level of difficulty had previously been evaluated based on the percentage of correct answers to the word problems in the OECD and in Spain (INEE, 2013). Three word problems were pre-selected from each content area: arithmetic and algebra, geometry and functions, and graphics. To fit the length of the test to the time available, one word problem from each content area was discarded, leaving a total of 6 word problems and 11 questions.
To avoid possible bias caused by the difficulty of the language or by the order in which the word problems were presented, the questions were arranged in two booklets (paper A and paper B), each with six word problems, three in Spanish and three in English. The two booklets contained the same word problems, but each problem appeared in a different language in each booklet. All word problems in English included a glossary adapted to the B1 reading comprehension level of most of the students, since the main aim was to evaluate mathematical performance. Questions about vocabulary were allowed (although very few were asked), but questions about mathematical content or processes were not.
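The counterbalanced design of the two booklets can be sketched as follows: both papers contain the same six word problems, and each problem's language is swapped between paper A and paper B, so every problem is answered in English by the students who receive one booklet and in Spanish by those who receive the other. The source does not state which problems appeared in English in paper A; the split used below and the function name `make_booklets` are hypothetical.

```python
WORD_PROBLEMS = range(1, 7)  # six word problems, numbered 1-6
SWAP = {"English": "Spanish", "Spanish": "English"}


def make_booklets(english_in_paper_a):
    """Build paper A and paper B: same word problems, each problem's language swapped."""
    paper_a = {wp: "English" if wp in english_in_paper_a else "Spanish" for wp in WORD_PROBLEMS}
    paper_b = {wp: SWAP[lang] for wp, lang in paper_a.items()}
    # Each booklet must contain three problems in English and three in Spanish
    for booklet in (paper_a, paper_b):
        if list(booklet.values()).count("English") != 3:
            raise ValueError("each booklet needs three word problems in each language")
    return paper_a, paper_b


# Hypothetical split: the source does not list which problems were in English in paper A
paper_a, paper_b = make_booklets(english_in_paper_a={1, 4, 5})
for wp in WORD_PROBLEMS:
    print(f"Word problem {wp}: paper A = {paper_a[wp]}, paper B = {paper_b[wp]}")
```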

Results
In this section, the research questions are addressed using the statistical techniques described above.
All data were analysed using IBM SPSS Statistics 24.
Research question 1: Does the L2 level of students influence solving mathematical word problems?
To answer the first research question, a one-way ANOVA was performed with language as the independent variable and the total score per language as the dependent variable. Each student therefore has two scores: one for the word problems written in English and the other for the word problems written in Spanish. As shown in Figure 2, the descriptive data indicate that the mean for the word problems written in English (M = 5.000; SD = 3.039) is lower than the mean for the word problems in Spanish (M = 6.440; SD = 2.737).
The ANOVA test was conducted to assess whether this difference was significant. Homogeneity of variances can be assumed, with a p-value = 0.234 for the Levene test. The test revealed that the difference between the scores obtained in solving the word problems in English and in Spanish was statistically significant (F = 6.569 > F1,105,0.05 = 3.932). The p-value = 0.012 confirmed that the scores in English were significantly lower than the scores in Spanish at the 0.05 level of significance.
This finding indicates that the language of the test affects the assessment of mathematical proficiency, which has to be taken into consideration when evaluating word problem solving processes in bilingual education contexts.
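As a cross-check of the reported statistics, the sketch below recomputes a one-way ANOVA directly from the group means and standard deviations reported for research question 1, assuming 53 scores per language (one per student). The helper `one_way_anova_from_summary` is hypothetical, not an SPSS or SciPy routine; only `scipy.stats.f.sf` is taken from SciPy. Under that assumption the F statistic comes out at about 6.57, matching the reported F = 6.569, and the within-groups degrees of freedom are 2 × 53 − 2 = 104. The critical value for F(1, 104) differs from the one for F(1, 105) by only about 0.001, so the conclusion is unaffected.

```python
from scipy import stats


def one_way_anova_from_summary(means, sds, ns):
    """One-way ANOVA F test computed from group means, standard deviations and sizes."""
    k = len(means)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Between-groups sum of squares: spread of group means around the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-groups sum of squares: pooled spread of scores around each group mean
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = float(stats.f.sf(f_stat, df_between, df_within))  # upper-tail probability
    return f_stat, df_between, df_within, p_value


# Reported descriptives: English M = 5.000, SD = 3.039; Spanish M = 6.440, SD = 2.737.
# Assumption: 53 scores per language (one per student).
f_stat, df1, df2, p = one_way_anova_from_summary(
    means=[5.000, 6.440], sds=[3.039, 2.737], ns=[53, 53]
)
print(f"F({df1}, {df2}) = {f_stat:.3f}, p = {p:.3f}")
```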

UNIVERSIDAD DE LA SABANA DEPARTMENT OF FOREIGN LANGUAGES AND CULTURES
Two one-way ANOVA tests were also conducted with the CEFR reading comprehension level as the independent variable and the total score for the word problems written in English as the dependent variable. As seen in Figures 3 and 4, the descriptive data show that the mean score for the word problems written in English was similar for each CEFR level (Table 2).
Research question 2: How does mathematical complexity interact with language in the word problem solving process?

Interaction
It must be noted that the discussion of mathematical complexity should include the cognitive language processes that allow students to learn what was previously unknown. For this reason, emphasising the interaction between content and language by maximising communication becomes a decisive factor. The first analysis performed was a two-way ANOVA with language and mathematical difficulty as factors and the score obtained in each case as the dependent variable.
Questions were classified into high, medium and low mathematical difficulty according to the percentage of students obtaining the maximum score and the average score. The word problems to be included were then selected. The aim was to classify all the questions in each word problem according to cognitive difficulty, so that the maximum number of word problems per language could be grouped at a particular level (see Table 2). Word problem 6 was discarded because it had one low-difficulty and one medium-difficulty question, whereas word problem 1 was not selected because the other low-difficulty word problems (2 and 3) were written in a different language. Hence, the final selection included word problems 2 and 3 for low mathematical difficulty and word problems 4 and 5 for high mathematical difficulty.
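The classification and selection steps above can be sketched as follows. The source states only that difficulty was based on the percentage of students obtaining the maximum score and on the average score; the cut-off values (0.6 and 0.3) and the rule requiring both criteria to agree are hypothetical. The selection rule mirrors the text: a word problem is kept for a difficulty level only if all of its questions share that level, which is why a word problem with one low-difficulty and one medium-difficulty question (such as word problem 6) is excluded.

```python
def classify_question(pct_full_marks, mean_ratio, low_cut=0.6, high_cut=0.3):
    """Classify a question's mathematical difficulty (hypothetical cut-offs).

    pct_full_marks: share of students who obtained the maximum score (0-1).
    mean_ratio: average score divided by the maximum possible score (0-1).
    """
    if pct_full_marks >= low_cut and mean_ratio >= low_cut:
        return "low"      # most students solve it fully
    if pct_full_marks <= high_cut and mean_ratio <= high_cut:
        return "high"     # few students solve it fully
    return "medium"


def word_problem_level(question_levels):
    """Return the level shared by all questions of a word problem, or None if mixed."""
    levels = set(question_levels)
    return levels.pop() if len(levels) == 1 else None


# A word problem with one low- and one medium-difficulty question is excluded
print(word_problem_level([classify_question(0.72, 0.80), classify_question(0.45, 0.55)]))
```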
The descriptive data (see Figure 5) show that the score was higher in Spanish than in English, although the difference between the two languages seems to be greater for the word problems with the highest mathematical difficulty. which was confirmed by the p-value = 0.004. The test also indicates that the interaction of the two factors, difficulty and language, was not statistically significant (F = 0.441 < F1,105,0.05 = 3.932), which was confirmed by the p-value = 0.508. Analysis of the partial eta squared showed that the difference was mostly explained by mathematical difficulty, as language accounted for 7.9% of the differences and the interaction for 0.4%.
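Partial eta squared can be recovered from an F statistic and its degrees of freedom, since η²p = SS_effect / (SS_effect + SS_error) = F·df_effect / (F·df_effect + df_error). The sketch below applies this identity to the interaction term reported above (F = 0.441, df = 1, 105) and reproduces the 0.4% figure. Because each partial eta squared uses its own denominator, the values for different effects are not shares of a single total and need not add up to 100%. The function name is illustrative.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta squared, SS_effect / (SS_effect + SS_error), recovered from F and its df."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# Interaction term reported for the two-way ANOVA: F = 0.441, df = (1, 105)
interaction = partial_eta_squared(0.441, 1, 105)
print(f"partial eta squared (interaction) = {interaction:.3f}")
```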

Separate analysis
Given that mathematical difficulty was the main factor affecting the differences, separate one-way ANOVAs were performed for the high and low difficulty levels, with language as the independent variable and the score as the dependent variable. Again, the descriptive data (see Figure 6) show that the difference was greater for word problems with high mathematical difficulty than for problems with low mathematical difficulty. The ANOVA test for high-difficulty word problems shows that homogeneity of variances could be assumed, with a p-value = 0.127 for the Levene test. The ANOVA indicated that the difference between the scores obtained in solving high-difficulty word problems in English and Spanish was statistically significant (F = 7.494 > F1,52,0.05 = 4.027). The p-value = 0.009 confirmed that the scores in English were significantly lower than those in Spanish at the 0.05 significance level.
On the other hand, the ANOVA test for low-difficulty word problems indicated that homogeneity of variances could not be assumed, with a p-value = 0.012 for the Levene test. The ANOVA showed that the difference between the scores obtained in solving low-difficulty word problems in English and Spanish was not statistically significant (F = 2.338 < F1,52,0.05 = 4.027). The p-value = 0.132 confirmed that the scores in English and Spanish did not differ significantly at the 0.05 significance level.
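When Levene's test rejects homogeneity of variances, as it does for the low-difficulty problems (p = 0.012), a common alternative to the classic ANOVA is Welch's test, which does not assume equal variances; with two groups, Welch's ANOVA is equivalent to Welch's t-test. The sketch below, which is not the analysis reported here, runs SciPy's Levene test with `center="mean"` (the classic Levene statistic; SciPy's default `center="median"` is the Brown-Forsythe variant) and then selects Student's or Welch's t-test accordingly. The helper name and the scores are hypothetical.

```python
from scipy import stats


def compare_languages(scores_en, scores_es, alpha=0.05):
    """Levene's test (mean-centred), then Student's or Welch's t-test accordingly."""
    levene = stats.levene(scores_en, scores_es, center="mean")
    equal_var = bool(levene.pvalue >= alpha)  # keep homogeneity unless Levene rejects it
    test = stats.ttest_ind(scores_en, scores_es, equal_var=equal_var)
    return float(levene.pvalue), equal_var, float(test.pvalue)


# Hypothetical scores, for illustration only (not the study's data)
english = [0, 1, 1, 2, 2, 2, 3, 3, 4, 4]
spanish = [2, 2, 2, 3, 3, 3, 3, 3, 4, 4]
levene_p, equal_var, p_value = compare_languages(english, spanish)
print(f"Levene p = {levene_p:.3f}, equal variances assumed = {equal_var}, p = {p_value:.3f}")
```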
Although there was no significant interaction of language and mathematical difficulty, the separate analysis revealed that, when the mathematical difficulty of the word problems was low, there was no significant difference in the scores. However, when the difficulty was high, the difference was significant. This must be taken into consideration when assessing word problem solving processes in bilingual education contexts.
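The critical value and p-values quoted for the separate analyses can be reproduced from the F distribution. The sketch below uses SciPy's `stats.f.ppf` (critical value) and `stats.f.sf` (upper-tail p-value) with the reported degrees of freedom (1, 52); the helper name is illustrative. It gives a critical value of about 4.03 and p-values close to the reported 0.009 (high difficulty) and 0.132 (low difficulty).

```python
from scipy import stats


def f_test_summary(f_stat, df1, df2, alpha=0.05):
    """Critical F value, upper-tail p-value and decision for an observed F statistic."""
    f_crit = float(stats.f.ppf(1 - alpha, df1, df2))  # value exceeded with probability alpha
    p_value = float(stats.f.sf(f_stat, df1, df2))     # P(F >= f_stat) under the null
    return f_crit, p_value, f_stat > f_crit


# Observed F statistics reported for the separate one-way ANOVAs, df = (1, 52)
results = {
    label: f_test_summary(f_obs, 1, 52)
    for label, f_obs in [("high difficulty", 7.494), ("low difficulty", 2.338)]
}
for label, (f_crit, p, significant) in results.items():
    print(f"{label}: F_crit = {f_crit:.3f}, p = {p:.3f}, significant = {significant}")
```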
The CEFR reading comprehension level did not seem to affect word problem solving, although this could be due to the fact that only five students were classified at level A2. According to this finding, the mathematical register of the word problems may not necessarily need to be adapted to the linguistic level of the students.

Discussion
Several main ideas can be drawn from the analysis of the data. Two of them have clear pedagogical implications in the assessment of mathematical performance regarding word problem solving processes. The first one is that language affects evaluation, as the word problems written in English have significantly lower scores than those in Spanish. This is consistent with the students' perception that solving word problems in English is more difficult than doing the same in Spanish.

It also concurs with other findings that language proficiency affects mathematical achievement in testing (Bernardo, 1999, 2002; Kempert, Saalbach, & Hardy, 2011), but not with other studies stating that, if language proficiency is adequate for understanding the problem, bilingualism has no effect on mathematical problem solving (Bialystok, 2001; Surmont et al., 2016). The reason for this discrepancy appears to lie in the research context, in which a significant number of students have a less than ideal command of English; thus, the influence of language on word problem solving is greater than in contexts where the language is fully comprehensible. The second idea is that this difference appears in word problems with high mathematical difficulty, but not when the difficulty is low. It can be concluded that one way to support the assessment of mathematical proficiency through word problem solving in bilingual education contexts is to choose word problems with low mathematical difficulty to be written in the L2 (quadrant 4 in the CLIL matrix) and to complement them with other word problems written in the students' L1. Using word problems in both languages in the same paper could be recommended at the beginning of bilingual education, accepting responses in the learners' L1 as well, until the combined linguistic and cognitive demands no longer discourage students (Lyster, 2007).
Regarding the interaction between language and word problem solving, students also reported that solving word problems takes more time in L2 than in L1 and that unknown non-mathematical vocabulary is problematic. Studies state that simplified linguistic instructions help students overcome the linguistic complexity of word problems (Abedi & Lord, 2001; Van Rinsveld et al., 2016), and that familiar vocabulary frees memory resources that would otherwise be consumed by surface-level comprehension difficulties and thus become unavailable for arithmetic calculations (Kempert, Saalbach, & Hardy, 2011). Finally, no contextual factors, such as family income, parental education, and parental occupation (as described in Caro & Cortés, 2012; Ensminger, Forrest, Riley, Kang, Green, Starfield, & Ryan, 2000; Hattie, 2009; Rutkowski & Rutkowski, 2013; Sirin, 2005; White, 1982; Willms & Tramonte, 2015), were found to influence students' performance when solving the word problems written in English. This could be due to the fact that the group is quite homogeneous. Another reason that may explain the limited influence of contextual factors is that this influence decreases as students grow up. Thus, in line with Jäppinen (2005), the home environment and the socio-economic status of parents are clearly influential during the first stages of education, but their influence declines over time as it is overridden and homogenised by other variables (Lancaster, 2018).
One implication of this study concerns the levels of assessment and the pedagogical and curricular aspects associated with the assessment procedure. Any evaluation test contains questions with different degrees of complexity, but according to the findings for research questions 1 and 2, judgements of the mathematical skills of bilingual education learners should be based on tests in which the word problems presented in L2 are chosen among those with low mathematical difficulty. The more complex word problems, written in L1, can then be used to complete the estimate of the students' mathematical competence level. If a mathematics teacher wants to translate questions from a test in L1 into L2, the following factors must be considered in order to reduce any mismatch in mathematical performance due to linguistic difficulties: a) choosing the easiest word problems to be translated into L2; b) eliminating some questions to provide extra time; and c) including a glossary (also suggested by Dalrymple, Karagiannakis, & Papadopoulos, 2012, and by Tavares, 2015). However, it could be argued that some of these suggestions may be unfair to peers in monolingual education (a) or that they may not actually support the use of non-mathematical vocabulary (c). In any case, the data analysed suggest, in line with the findings of Ouazizi (2016) and Surmont et al. (2016), that content and language benefit each other, as they help activate underlying brain mechanisms and foster metalinguistic awareness: "increased metalinguistic awareness could lead to a better understanding of and insight into the abstract language of math" (Surmont et al., 2016, p. 329).

Conclusion and Limitations
The data were analysed using ANOVA tests, which are mostly used to compare three or more groups. An independent-samples t-test, which is commonly used to compare two groups, might have been a better option for answering research question 1.
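With only two groups, the choice between the two tests does not change the result: the one-way ANOVA F statistic equals the square of the pooled-variance (Student) independent-samples t statistic, and the two-sided p-values coincide. The sketch below checks this with SciPy's `f_oneway` and `ttest_ind` on a small hypothetical data set (not the study's data).

```python
from scipy import stats

# Hypothetical scores for two groups (not the study's data)
group_a = [3, 5, 6, 4, 7, 5, 6]
group_b = [6, 7, 8, 6, 9, 7, 8]

anova = stats.f_oneway(group_a, group_b)   # one-way ANOVA with two groups
ttest = stats.ttest_ind(group_a, group_b)  # Student's t-test (pooled variance)

# With two groups, F equals t squared and the two-sided p-values coincide
print(f"F = {anova.statistic:.4f}, t^2 = {ttest.statistic ** 2:.4f}")
print(f"p (ANOVA) = {anova.pvalue:.4f}, p (t-test) = {ttest.pvalue:.4f}")
```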
It should be noted that the research took place in only one high school in the city centre of an average-sized city in Andalusia, which limits the external validity of the study and makes it difficult to extrapolate the results to other contexts. Nevertheless, the participants could be representative of high schools with similar contextual and socio-economic characteristics. In any case, further research is needed to confirm that the conclusions hold more generally, including schools in different situations (e.g., urban, suburban or rural) and contexts. Another factor that could affect external validity is the homogeneity of the group in terms of CEFR reading comprehension level.
Hence, further studies with students at different reading comprehension levels should be carried out, both to confirm whether solving word problems in English is truly affected by reading performance and to verify whether reading performance combined with mathematical difficulty can be used to select word problems appropriately, particularly in tests whose main objective is to assess mathematical performance. Finally, homogeneity also appears in the contextual factors. For this reason, a more in-depth investigation of context and mathematical performance in English as an L2 is required.