Language and Content Outcomes of CLIL and EMI: A Systematic Review

Around the world, language teachers are shifting to content-based instruction (CBI) as a way to teach English, most commonly in the form of Content and Language Integrated Learning (CLIL) or English-Medium Instruction (EMI). With the spread of CBI around the world, it is important to understand how this shift in teaching has affected student outcomes. Using a systematic literature review approach, this study examines current literature on the effect of CBI on language and content outcomes. Twenty-five articles met the inclusion criteria for this study and were examined. The results show mixed findings on the effectiveness of CBI for student outcomes, with the majority of studies showing either positive or neutral effects for CBI when compared with non-CBI classrooms. However, the study also reveals multiple methodological issues that make it difficult to draw strong conclusions about CBI. In addition, while CLIL in Spain has received a lot of research attention, other countries remain understudied. Therefore, this study concludes with a call for future research on CBI outcomes that examines a variety of countries and accounts for the methodological flaws identified.


Introduction
Over the last few decades, the way English is being taught has undergone a massive change internationally, shifting from teaching English as a foreign language to using English as a medium of instruction (Dearden, 2015). This new form of language teaching is known as Content-Based Instruction (CBI), an umbrella term that describes classrooms where "students are taught academic content in a language they are still learning" (Lightbown, 2014, p. 3). CBI may be practiced in various forms, with the most common being Content and Language Integrated Learning (CLIL) or English-Medium Instruction (EMI) (Brinton & Snow, 2017).
CBI has seen rapid growth around the world that has generally outpaced research (Eurydice, 2006; Neghina, 2017). Given the massive change in teaching practice, it is important to assess whether the goals of content and language outcomes have been met. To this end, our study uses a systematic search of the literature to review what we know about CBI student outcomes. Specifically, this paper examines how CBI teaching practices compare with traditional language teaching.

Theoretical Framework
Two theories inform our interpretation of the CBI literature: the input hypothesis and cognitive load theory. The input hypothesis (Krashen, 1985) suggests that language acquisition can only occur if the input does not exceed one level above the learner's acquired language level (i+1). Krashen (1985) argues that such input should be authentic and comprehensible, meaning that it should be neither too easy nor too difficult for the learners. Though Krashen's input hypothesis has been challenged for being vague, overambitious, and even dangerous for its emphasis on simplified speech (Liu, 2015), the basis of the hypothesis may be helpful in understanding CBI. Zhao and Dixon (2017) extend Krashen's work to CBI situations by suggesting that the "i+1" should not only refer to language but also to content, meaning that content should likewise not exceed students' content knowledge plus one level. When viewed in this way, effective CBI may be predicated on the premise that language must not only be noticed but must also, in terms of content, be comprehensible.
Cognitive load theory (Sweller, 1988) describes how cognitive resources can be overloaded during learning tasks when learners find their focus split between disparate sources of information related to a learning goal. CBI has the potential to take students' attention away from a learning goal and cause cognitive overload (Piesche, Jonkmann, Fiege, & Keßler, 2016). When content and language are introduced simultaneously, students could find their focus split between trying to understand content and trying to comprehend language. Thus, cognitive load theory implies that, to be successful in CBI, students need both the prerequisite content background knowledge and sufficient language ability so that their cognitive resources can be focused on the learning objective of the class.

Method
For this paper, we reviewed empirical studies on students' learning outcomes in CBI classrooms published in peer-reviewed journals between 2008 and 2018, as research more than ten years old may be too outdated to represent current CBI programs. Five academic databases were searched: Educational Resources Information Center (ERIC), Linguistics and Language Behavior Abstracts (LLBA), Scopus, PsycINFO, and Web of Science. Abstracts were searched using the following: "English Medium Instruction" OR "EMI" OR "Content and Language Integrated Learning" OR "CLIL" OR "Content based instruction" OR "CBI" OR "Content Based Language Teaching" OR "CBLT" AND "Teaching" NOT "French." We found 645 references, of which 488 remained after duplicate removal.
Inclusion and exclusion criteria were used to further narrow the literature. A study was included if (1) the course instructional language was English and the majority of the population's L1 was not English (i.e., EFL setting); (2) it was designated/entitled/described as teaching content through English; and (3) it directly measured students' learning outcomes in CBI and non-CBI settings. We excluded book chapters, systematic reviews, meta-analyses, and commentaries, as well as articles addressing English for academic purposes (EAP) or English for specific purposes (ESP). After the inclusion and exclusion criteria were applied, 25 of the 488 articles remained and were included in the study.
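The search string reported above leaves operator precedence implicit. As a rough illustration only, the following Python sketch assumes the intended grouping is (any of the eight CBI terms) AND "Teaching" NOT "French" and applies it to abstract text with case-insensitive whole-word matching. The function names, the hyphen normalization, and the sample abstracts are hypothetical; the databases searched apply their own tokenization and field rules, so this is not a reproduction of the actual search.

```python
import re

# The eight CBI-related terms from the reported search string.
CBI_TERMS = [
    "English Medium Instruction", "EMI",
    "Content and Language Integrated Learning", "CLIL",
    "Content based instruction", "CBI",
    "Content Based Language Teaching", "CBLT",
]

def contains(term: str, text: str) -> bool:
    """Case-insensitive whole-word (or whole-phrase) match."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def matches_search(abstract: str) -> bool:
    """Assumed grouping: (any CBI term) AND "Teaching" NOT "French"."""
    # Assumption: treat hyphens as spaces so "content-based" matches "content based".
    text = abstract.replace("-", " ")
    has_cbi_term = any(contains(term, text) for term in CBI_TERMS)
    return has_cbi_term and contains("Teaching", text) and not contains("French", text)

# Screening flow using the counts reported in the Method section.
identified, after_dedup, included = 645, 488, 25
duplicates_removed = identified - after_dedup   # 157
excluded_at_screening = after_dedup - included  # 463

if __name__ == "__main__":
    samples = [
        "CLIL teaching in Spanish primary schools",
        "CLIL teaching in French immersion programs",
        "Teaching academic writing",
    ]
    for abstract in samples:
        print(matches_search(abstract), "-", abstract)
    print(duplicates_removed, excluded_at_screening)
```

Under this assumed grouping, an abstract must mention at least one CBI term and "teaching" and must not mention "French". Whole-word matching is used so that short acronyms such as "EMI" are not matched inside longer words such as "academic".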

Results
The twenty-five articles included in the study can be found in Table 1.
Twenty-two of the studies were reported as CLIL programs, balancing language and content objectives, whereas three were reported as EMI, typically with a sole focus on content objectives. The included studies come from only two continents, Europe (N=23) and Asia (N=2). Of the European studies, most were from Spain (N=17), along with single studies from Austria, Belgium, Cyprus, the Czech Republic, and Germany. There was one additional study from Europe that examined Germany, Italy, and the Netherlands together. In Asia, one study came from Hong Kong and another from Taiwan. Seven studies were conducted in primary schools, twelve studies were in secondary schools, and an additional two studies had participants from both primary and secondary schools. Additionally, four studies were conducted in tertiary contexts.
Nineteen studies examined language outcomes and six examined content outcomes. The following sections detail the findings.

Language outcomes
CBI has been proposed as a replacement for traditional language teaching by many of its supporters, so naturally much research has been conducted to examine whether CBI can produce better language outcomes.

Overall language proficiency
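The findings from five studies looking at general proficiency levels show that CBI has had mixed results for language proficiency outcomes. Moore (2010) randomly selected 61 schools out of a pool of 403 and administered an experimental language diagnostic test. The diagnostic test measured all four language skills (reading, writing, listening, and speaking) and found that CLIL students far outperformed non-CLIL students in all areas, with overall averages of 62 percent and 38 percent, respectively. Coral, Lleixà, and Ventura (2018) also found that primary CLIL students performed better on a state language test measuring reading and listening comprehension, although only slightly. In secondary schools, Goris, Denessen, and Verhoeven (2013) reported that CLIL students in three European countries outperformed their non-CLIL counterparts on a variety of measures. One study at the tertiary level, conducted by Yang (2015), also reported positive language outcomes, although not to the same degree. Yang (2015) studied the language proficiency of tertiary students enrolled in an English Tourism degree program. Using scores from a national English proficiency test, the study found that students enrolled in the program scored higher than their non-EMI classmates on receptive skills but not higher than the national average, and no difference was found for productive skills.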

Receptive language skills
In an attempt to find more specific answers on CBI's effect on student language outcomes, researchers have conducted studies specifically examining receptive language skills. Much of the research on receptive outcomes of CBI has focused on receptive vocabulary. Six studies reported mixed findings on CBI's effectiveness for developing receptive vocabulary, with three finding significant differences in favor of CBI (Canga-Alonso, 2015a, 2015b; Xanthou, 2011) and three finding no significant differences between CBI and non-CBI groups (Agustín-Llach, 2017; Arribas, 2016; Gierlinger & Wagner, 2016). The three studies showing positive effects of CBI on receptive vocabulary all took place with primary school students, whereas those showing no difference studied both secondary (N=2) and primary (N=1) schools.

Productive language skills
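CBI research on productive skills is much more varied than research on receptive skills in terms of research focus. From a broad view of productive skills, six studies focused on writing (Agustín-Llach, 2016, 2017; Basterrechea & del Pilar García Mayo, 2014; Gené-Gil, Juan-Garau, & Salazar-Noguera, 2015; Manzano-Vázquez, 2014; Maxwell-Reid, 2010) and four focused on speaking (Rallo-Fabra & Juan-Garau, 2011; Lazaro-Ibarrola, 2012; Mesquida & Juan-Garau, 2013; Moore, 2011). Current research provides little evidence that CBI students enjoy an advantage when it comes to writing. Maxwell-Reid (2010) provides some evidence of the advantages of CBI for writing, finding that the writing of CLIL students in the study seemed to display more of the characteristics of English writing, whereas non-CLIL students tended to display characteristics more associated with Spanish writing. However, Basterrechea and del Pilar García Mayo (2014) and Gené-Gil et al. (2015) found very few significant differences in writing between secondary CLIL and non-CLIL students. Basterrechea and del Pilar García Mayo (2014) found no significant difference in the ability of CLIL secondary students to produce the third-person -s, and Gené-Gil et al. (2015), measuring complexity, accuracy, and fluency of writing, found the only significant difference to be in terms of lexical variety, in favor of the non-CLIL group. Like Gené-Gil et al. (2015), other researchers have also explored the lexical side of writing. Two studies conducted by Agustín-Llach (2016, 2017) showed similar lexical production in the writing of CLIL and non-CLIL primary students. In terms of lexical errors, Manzano-Vázquez (2014) found no differences in the frequency of lexical errors between CLIL and non-CLIL secondary students after accounting for the outliers in the non-CLIL group.
Research on CBI outcomes for speaking is as varied as that on writing, focusing on pronunciation (Rallo-Fabra & Juan-Garau, 2011), negotiation strategies for comprehensibility (Mesquida & Juan-Garau, 2013), turn-taking (Moore, 2011), and morphosyntactic development (Lazaro-Ibarrola, 2012). Pronunciation and negotiation strategies are skills that may help second language users increase their comprehensibility in communication. Research suggests CBI students may have an advantage in both of these skills. Rallo-Fabra and Juan-Garau (2011) found that CLIL secondary students produced more intelligible and less accented speech than those in a traditional classroom. While Mesquida and Juan-Garau (2013) found no significant differences between CLIL and non-CLIL secondary students in terms of the number of negotiation strategies used, the authors reported that CLIL students used a wider variety of strategies, possibly meaning these students could negotiate meaning in different manners and situations. Collaborative turn-taking is another technique that may lead to improved communication. Moore (2011) found that, while mainstream foreign language students took more turns overall, the CLIL students took more collaborative turns that facilitated mutual interaction and linguistic/affective support for their conversation partner. Finally, Lazaro-Ibarrola (2012) suggests that CLIL learners show faster growth in morphosyntactic development, leading toward more advanced communication.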

Content outcomes
In many CBI classrooms, the purpose is twofold: both "acquiring subject knowledge and competences as well as skills and competences in the foreign language" (Georgiou, 2012, p. 495). In other words, not only students' acquisition of English skills but also their level of understanding of the subject matter should be a critical criterion for evaluating the effectiveness of this teaching approach. The following studies examine the academic performance of students across various academic disciplines such as math and science in primary or secondary schools as well as tertiary-level courses such as business, accounting, finance, and history.

UNIVERSIDAD DE LA SABANA DEPARTMENT OF FOREIGN LANGUAGES AND CULTURES

Mathematics outcomes
Two studies examined the effect of mathematics CLIL instruction at the secondary level (Binterová, Petrášková, & Komínková, 2014; Ouazizi, 2016). In both studies, the students who received CLIL instruction performed better on the mathematics test compared to those who received instruction in their mother tongue. Binterová et al. (2014) assessed students' word problem solving skills and used two mathematics didactic tests (one in English and the other one in the students' mother tongue). Since word problems in mathematics require comprehension of the language, the authors concluded that the CLIL method is more effective in improving students' language skills as well as problem solving skills in math. Ouazizi (2016) used a mathematical test of quadratic equations to measure students' mathematics knowledge. Both the CLIL and non-CLIL groups achieved relatively high scores on the test, and the CLIL group scored slightly higher. However, the CLIL group had more instruction time than the non-CLIL group, which makes conclusions difficult.

Science outcomes
Two studies (Fung & Yip, 2014; Piesche et al., 2016) have examined science learning, with both studies finding negative effects for CBI on students' acquisition of science concepts. In Piesche et al. (2016), sixth-grade students in a CLIL science class obtained lower scores than their counterparts in the non-CLIL class on both an immediate post-test and a follow-up test. The CLIL students also did not show strong long-term retention of the science content, contrary to the prominent hypothesis that a bilingual environment would foster long-term memory of the content (Piesche et al., 2016). However, the study participants were new to CLIL and more time may have been needed for them to get used to the CLIL environment.
Similarly, content instruction in L2 seems to be less effective for low-achievers. For example, Fung and Yip (2014) explored the effect of a tenth-grade physics intervention in EMI and CMI (Chinese, L1, as the medium of instruction) classrooms. Among the three physics ability groups, the low-ability EMI students attained a lower level of achievement compared to the low-ability CMI students, while high-ability EMI students performed better than their high-achieving peers in CMI classrooms. Therefore, it seems that various learner factors such as their L2 proficiency, prior experience in CBI, and their prior knowledge of the content area could have a significant influence on the effectiveness of CBI on content learning.

Tertiary-level content outcomes
At the tertiary level, we found two CBI studies (Dafouz, Camacho, & Urquia, 2014; Hernandez-Nanclares & Jimenez-Munoz, 2017) that examined the academic outcomes of students. Both studies found slightly better performance in the CBI group, but not to a statistically significant degree. For example, Hernandez-Nanclares and Jimenez-Munoz (2017) examined the effect of CBI instruction in a World Economy and World Economy History course. The results of the written examination showed that the CLIL group performed slightly better than the non-CLIL group, although the average grade on the final exam was similar.
Similarly, Dafouz et al. (2014) examined the effect of EMI on Spanish undergraduate students' academic performance in accounting, finance, and history, and compared their performance across these three disciplinary subjects through coursework and final grades. Both groups (EMI and non-EMI) obtained very similar results in the three subjects regardless of the language of instruction. Moreover, the students in the history course obtained slightly higher results than those in the other two subjects in spite of the higher verbal demands. However, since the instructors for these courses were different, the scoring of each instructor could vary, which raises the issue of homogeneity of evaluation criteria. In addition, although the authors claimed that these three groups of students were comparable, their English proficiency was not measured. Due to these methodological issues, it is difficult to conclude that CBI offers advantages in tertiary education.

Discussion
The mixed results found in the literature make it difficult to arrive at any conclusions about the effectiveness of CBI. While there have been positive findings, there are many issues that call the results of the studies into question. Bruton (2011) outlines many of these issues. For one, the CBI programs in most of the research reviewed were elective programs that may naturally attract students with higher aptitudes or motivation toward foreign language learning. Second, in many of the studies reviewed, CBI courses were reported as having extra instruction time. Given this, it is entirely plausible that the gains are simply a result of more instruction. Additionally, the research instruments used, particularly in the receptive skill studies, were designed to measure general everyday language proficiency, or basic interpersonal communicative skills (BICS) (Cummins, 1984). However, much of the learning that occurs in a CBI classroom targets cognitive academic language proficiency (CALP). With instruments only measuring BICS, the CALP language gains were likely not accounted for.
Putting aside these limitations, most of the studies exploring language outcomes have found CBI programs to do as well as or better than non-CBI programs. Krashen's (1985) input hypothesis can possibly explain why this may be the case, even in the face of seemingly more difficult content. The input hypothesis requires two conditions for language acquisition to occur: (1) a lot of language input, and (2) input that does not exceed one level higher than the learner's current language level. CBI potentially meets both of these conditions, possibly better than traditional language teaching. For the first condition, CBI provides a lot of natural language input in the classroom, such as opportunities for input through teacher lectures, student conversation, content-related videos, textbooks, and other sources. This flood of language through a variety of resources provides a rich language environment full of input opportunities.
However, input alone does not guarantee language acquisition; the second condition of comprehensible input must be met, an area in which CBI once again may have an advantage. As one might expect in a content classroom, CBI classrooms may be rich in content media such as videos, pictures, diagrams, and other visual representations that help make language in the CBI classroom comprehensible. Additionally, content learned in CBI classrooms is often content students have already learned, or at least have the prerequisite knowledge for, in their L1. When students connect their L1 content knowledge to input they receive in a CBI classroom, the connection can help make the language of the content area comprehensible. Finally, the teaching of content provides opportunities to encounter similar language repeatedly throughout units, therefore providing certain language input multiple times. For example, in a science classroom, vocabulary such as "experiment" or "mass" will likely be used across units, allowing for multiple encounters. This is in contrast to a traditional language classroom where the scope is broader and similar language may not be used across units. Though the limitations of the studies restrict the determinations that can be made about the advantages or disadvantages of CBI for language outcomes, when viewing the conditions of CBI classrooms through the lens of the input hypothesis, CBI seems promising for encouraging language acquisition.
Similar to language outcomes, studies which examined students' understanding of the content have shown contradictory results depending on various research contexts, such as type of subject matter, participants' English proficiency, and measurement of the students' content comprehension. Overall, CBI seems to have positive effects on students' content comprehension, but mostly not to a statistically significant degree.
Caution is needed when interpreting results, though, because of several methodological issues. For example, some studies (e.g., Dafouz et al., 2014; Ouazizi, 2016) did not conduct a pre-test to ensure that the two groups had comparable content knowledge prior to the intervention. Moreover, in some studies (e.g., Dafouz et al., 2014), the homogeneity of evaluation criteria was not met when comparing the content outcomes of CBI and non-CBI courses. There was also a lack of consistency in the treatment because CBI instruction varied widely across schools. These issues make it difficult to support claims of effectiveness of CBI on students' content learning.
Cognitive load theory (Sweller, 1988) may help explain the findings on CBI content outcomes. In some cognitively challenging disciplines, such as physics, CBI seems to have a negative effect on students' acquisition of content, particularly for students with little background knowledge or limited English proficiency (Fung & Yip, 2014; Piesche et al., 2016). This suggests that learning content in an L2 could demand a high amount of working memory capacity, making it difficult for students to acquire the content knowledge due to the need to simultaneously process new content and the L2. On the other hand, higher ability students performed better in CBI classrooms than in non-CBI classrooms, possibly due to a more reasonable cognitive load (Fung & Yip, 2014). Thus, it seems that CBI might be more suitable for higher level students while L1 instruction may be more beneficial for low-achieving students. However, we caution that policy should not be made on this basis. That is, lower-level L2 students should not, at this point, be denied access to CBI courses based on some partially supported research evidence.
The effect of CBI on various levels of students is still questionable because of contradictory findings. In Hernandez-Nanclares and Jimenez-Munoz's (2017) study, none of the students in the CBI classroom were able to reach the highest score while some of the non-CBI students did. Therefore, they concluded that L1 instruction is more effective for advanced level students at the tertiary level. On the other hand, fewer students in the CBI classroom failed the course compared to the non-CBI classroom, which could be explained by the positive effect of CBI instruction on students' motivation and attitude toward the class. Due to these conflicting findings, more studies are needed to examine the effect of CBI on different learners, considering their language proficiency, background knowledge in content, as well as their motivation and attitudes toward the class.

Conclusion
This literature review explored current literature on CBI student outcomes, revealing mixed findings for both language and content outcomes.
Generally, these studies report that CBI either exceeds non-CBI courses or there is no significant difference. However, caution should be exercised when interpreting these results, given the methodological issues identified above.

Table 1. Overview of included articles
Source: Own elaboration.