Research Ideas and Outcomes : PhD Thesis
Corresponding author: Mikel W Cole (mikel.w.cole@gmail.com)
Received: 28 Aug 2018 | Published: 29 Aug 2018
© 2018 Mikel Cole
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Cole MW (2018) Effectiveness of peer-mediated learning for English language learners: A meta-analysis. Research Ideas and Outcomes 4: e29375. https://doi.org/10.3897/rio.4.e29375
This manuscript reports the findings from a series of inter-related meta-analyses of the effectiveness of peer-mediated learning for English language learners (ELLs). Peer-mediated learning is a broad term that, as operationalized in this study, includes cooperative learning, collaborative learning, and peer tutoring. Drawing from research on interaction in second language acquisition, as well as from work informed by Vygotskian perspectives on socially-mediated second language learning, these meta-analyses synthesize the results of experimental and quasi-experimental studies.
Included studies were conducted with language learners between the ages of 3 and 18 in order to facilitate comparisons to US students in K-12 educational settings. All participants were identified as ELLs, though learners in both English as a Second Language (ESL) and English as a Foreign Language (EFL) settings were included. Similarly, learners from a variety of language backgrounds were included in order to facilitate generalizations to the linguistic diversity present in US schools and abroad. Main effects analyses indicate that peer-mediated learning is effective at improving a number of outcome types, including language, academic, and social outcomes. Funnel plots and Egger's regression analyses were conducted to examine the possibility of publication bias, which appears unlikely in most analyses. Moderator analyses were also conducted, where sample sizes were sufficient, to examine which measured variables could explain heterogeneity in effect sizes between studies.
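The publication-bias check mentioned above can be sketched in code. The following is a minimal illustration of Egger's regression test (regressing standardized effects on precision and testing whether the intercept differs from zero); the effect sizes and variances in the example are hypothetical, and this sketch is not the analysis script used in the study.

```python
import numpy as np
from scipy import stats

def eggers_test(effects, variances):
    """Egger's regression test for funnel-plot asymmetry: regress
    standardized effects (g / SE) on precision (1 / SE); an intercept
    that differs from zero suggests small-study / publication bias."""
    se = np.sqrt(np.asarray(variances, dtype=float))
    z = np.asarray(effects, dtype=float) / se   # standardized effects
    precision = 1.0 / se
    res = stats.linregress(precision, z)
    # Two-sided t-test on the regression intercept
    t = res.intercept / res.intercept_stderr
    df = len(z) - 2
    p = 2 * stats.t.sf(abs(t), df)
    return res.intercept, p

# Hypothetical per-study effect sizes and sampling variances
g = [0.42, 0.31, 0.55, 0.18, 0.47, 0.29]
v = [0.040, 0.025, 0.090, 0.012, 0.060, 0.020]
intercept, p = eggers_test(g, v)
```

A small p-value here would flag funnel-plot asymmetry; a funnel plot itself is simply a scatter of effect sizes against their standard errors.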
This dissertation presents the results of a meta-analysis of the effectiveness of peer-mediated learning for English language learners (ELLs).
Currently, more than eleven million students in K-12 schools in the United States speak a language other than English at home, meaning that linguistically-diverse students now comprise more than 20% of the total school age population (
Not only is the population of ELLs rapidly growing and dispersing throughout US schools, ELLs are a remarkably heterogeneous group of students (
School-level Silence: Sociopolitical Context and Program Models
ELLs are a linguistically diverse group of students, collectively speaking more than 400 languages (
This historical legacy of silence persists in contemporary examples of lost opportunities to learn and instances of the ongoing denial of students’ access to their own language and literacy practices (
Empirical evidence indicates that context influences student learning, and both the sociopolitical environment and the model of education provided to students contribute to ELLs’ academic success (
Perhaps the most widely-researched aspect of linguistic capital present in the effectiveness literature for ELLs is language of instruction (
Teacher-level Silence: Pedagogy, Preparation, and Dispositions
Current schooling practices continue to manifest messages of silence for linguistically-diverse students and teachers often reinforce these messages, creating classroom atmospheres like the following example where the teacher invokes a traditional “Initiate-Respond-Evaluate” discourse pattern that effectively stifles students: “I was struck by the silence when I entered the classroom. The teacher, positioned at the front of the traditionally organized room, began to speak. ‘Where’s the adjective in this sentence?’”(
Unfortunately, most teachers of ELLs remain largely unprepared to provide the specialized learning this growing and heterogeneous group of students requires (
Even in classrooms where talking and rich discussion are the norm, English learners are often silenced during class discussions because of inequitable distributions of power between students and teachers (
Student-level Silence: Positioning, Identity, and Resistance
ELLs are also positioned towards silence by distributions of power at the student level, distributions at once informed by sociopolitical factors in the local context and driven by the reorganization of social strata and identity formation at the student level (
Consequently, learners’ identities and motivations affect academic success in dynamic and complex ways; sometimes peer influences and individual aspirations drive learners to pursue school success, and sometimes peer networks and individual responses to power inequities lead learners to resist schooling (
In conclusion, it is worth reiterating the primary focus of the proposed study—to investigate the effectiveness of peer-mediated learning for improving language, academic, and social outcomes for ELLs. This framing of “the problem” is intended to show the multi-faceted ways that issues of power and inequity interact with learning for ELLs. However, it is not intended to advance a claim that interactive learning methods will solve all of the inequities that ELLs face. Cooperative learning alone is no panacea. Rather, it is the thesis of this statement of the problem that questions of educational effectiveness for ELLs demand attention to the ways that power and inequity interact with learning.
Specifically, the meta-analysis reported in this dissertation seeks to answer the following two primary research questions. More specific questions and hypotheses are presented in Chapter 3, following the literature review in Chapter 2 that presents the case for examining specific variables of interest.
The results of the proposed meta-analysis are intended to contribute to a growing literature on the effectiveness of specific instructional approaches for the fastest growing group of students in US schools, which contributes to an on-going discussion of equitable, high-quality instruction for ELLs. The results of the meta-analysis will offer a concise synthesis of multiple evaluation studies; specifically, standardized mean effect size estimates for language, academic, and attitudinal outcomes will provide systematic evidence of the effectiveness of peer-mediated instruction in key sets of learning outcomes for ELLs. Additionally, meta-analysis enables a systematic analysis of moderating factors that are important to consider when interpreting current and future evidence and when considering instructional decisions that might arise during implementation of peer-mediated learning in actual classroom contexts. As discussed in the Methods section, inclusion of studies conducted within the US and in other countries enables results to be broadly generalizable while allowing for analysis of the contribution of context as a moderator of effectiveness (i.e., are results produced in English-as-a-Foreign-Language and English-as-a-Second-Language settings significantly different?).
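The standardized mean effect size estimates described above can be sketched as follows. This is a generic illustration of Hedges' g (a standardized mean difference with a small-sample correction), with hypothetical group statistics; it is not presented as the dissertation's own analysis code.

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference between treatment and control groups,
    with Hedges' small-sample correction, plus its sampling variance."""
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sp
    # Small-sample correction factor J
    j = 1 - 3 / (4 * (n_t + n_c) - 9)
    g = j * d
    # Approximate sampling variance of g
    var_g = j**2 * ((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return g, var_g

# Hypothetical example: treatment M=105, SD=10, n=30 vs. control M=100, SD=10, n=30
g, var_g = hedges_g(105, 100, 10, 10, 30, 30)   # g is roughly 0.49
```

Each included study contributes one or more such estimates, and the variances supply the weights used in the pooled and moderator analyses.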
As indicated, the purpose of this meta-analysis is to synthesize the empirical literature on the effectiveness of peer-mediated learning for English language learners in K-12 settings; specifically, the meta-analysis computes main effects and identifies important mediators of effectiveness using experimental and quasi-experimental studies. Thus, the most relevant literature to review consists of previous meta-analyses and quantitative syntheses of peer-mediation; however, important qualitative studies, especially highly-cited reviews and syntheses are included to ensure that relevant theoretical, instructional, and empirical variables are not overlooked by focusing exclusively on experimental designs in the literature review.
What is Peer-mediated Learning?
In this paper, “peer-mediated learning” refers to an instructional approach that emphasizes student-student peer interaction, and it is intended to provide a contrast to teacher-centered or individualistic approaches to learning. In practice, peer-mediated learning includes a variety of approaches, each with supporting literatures that are typically distinct from one another. Specifically, this meta-analysis synthesizes three distinct varieties of peer-mediated learning: cooperative, collaborative, and peer tutoring, a distinction employed in previous syntheses (e.g.,
The use of peer-mediated as a term to include multiple varieties of instruction not only emphasizes the similarities amongst these methods, it also reflects an underlying bias in this paper. The author currently sees a sociocognitive reading of Vygotskian theory as a conceptual common ground between traditional second language acquisition models of L2 learner interaction and sociocultural models of L2 learner interaction, and Vygotskian perspectives on learning and cognitive development would describe all three approaches (i.e., cooperative, collaborative, and peer tutoring) as peer-mediated learning (see for example,
Thus, the treatment of several varieties of peer-mediated learning as similar does not imply that they are identical; rather, the intention is to focus on what they have in common, especially when compared to teacher-driven or individualistic approaches. However, for the sake of clarity and to maintain an awareness of how the varieties do differ in meaningful ways, each of the three focal varieties of peer-mediated learning is briefly reviewed separately below.
Cooperative learning
Cooperative learning represents what
A common defining characteristic of cooperative learning approaches is the degree of structure (
The description of Jigsaw above highlights another important component that defines cooperative methods—interdependence. The concept of interdependence is closely tied to group goals, and is intended to measure the extent to which individual members rely on each other for success. Several versions of cooperative learning suggest that students are motivated to participate in cooperative tasks because the group shares a common goal; however, researchers argue that commonly shared group goals are insufficient alone (e.g.,
Collaborative Learning
A number of reviews treat cooperative and collaborative methods as if they are similar, if not identical, methods (e.g.,
Peer Tutoring
While cooperative and collaborative methods dominate the field of peer-mediated learning approaches, it is important to recognize that there is considerable diversity of approaches within the field. Inclusion of peer tutoring approaches is intended to illustrate this diversity, while acknowledging that other peer-mediated approaches exist. Peer tutoring approaches also vary widely (see
How Does Peer-mediation Promote Learning?
Thus, according to his model, group interdependence is a necessary component of enhanced learning through cooperation. Group interdependence is mediated by a number of motivational factors that contribute to several specific components of peer-mediated learning, including: elaborated explanations, peer modeling, and peer assessment and correction. It seems clear from the literature base of individual studies from which Slavin draws that not all of the individual components in the third box need be present for peer-mediated learning to be effective; rather, group interdependence fosters motivation which enables some of the individual components to occur. Slavin even acknowledges that limited evidence suggests that group interdependence need not always be present, but he argues that it is easiest to make cooperative methods effective when interdependence is present (
In addition to including aspects of power and equity,
Empirical Evidence for Peer-mediated Learning
Both quantitative and qualitative evidence support the claim that peer-mediated learning is effective at promoting numerous kinds of outcomes; while the qualitative syntheses, with some exceptions like Slavin’s narrative, “best-evidence” reviews (
Qualitative Evidence
In another best evidence synthesis of qualitative and quantitative studies,
Like many of the quantitative syntheses discussed below,
Finally, another synthesis of cooperative learning explores the literature on the effectiveness of cooperative methods with Asian students in preschool to college settings (
Quantitative Evidence
Unlike the theoretically-oriented syntheses presented above, the following quantitative reports offer more methodologically-focused syntheses that compare various models of cooperative learning to one another (
More recently,
Interestingly,
One approach to determining the important components of an intervention is to systematically examine the contribution of important variables over the course of many separate replications (i.e., a meta-analysis); nonetheless, a more fine-grained approach is to design a study that explicitly tests various components individually and/or in multiple combinations, and
Finally, two meta-analyses examine the impact of peer-mediated methods for particular groups of students.
In conclusion, considerable qualitative and quantitative research supports the assertion that peer-mediated methods of instruction are more effective at promoting multiple kinds of outcomes than individualistic or competitive approaches. Despite decades of consistently positive research, a number of variables of instructional structure (e.g., size of group and composition of groups) and social interaction, as well as important learner (e.g., age) and methodological (e.g., design and measurement) variables, remain important foci of current and future research. In particular, few syntheses of the effectiveness of peer-mediation for particular kinds of students exist, and none of the syntheses discussed so far even mention specific issues involving linguistically diverse students. Thus, questions of whether, why, and under what conditions peer-mediation is effective for English language learners are the focus of the remainder of this literature review.
While much of the research regarding the effectiveness of cooperative learning reviewed so far is relevant for English language learners, it is important to keep in mind that English language learners are a distinct group of learners who, by definition, must master both academic and language objectives. Thus, when considering ELLs, it is essential to consider whether peer-mediated methods are effective for both academic and language outcomes, and as noted, language outcomes are largely ignored in the studies already reviewed. Moreover, in order to identify the relevant instructional and theoretical foci for L2 research, it is essential to understand whether peer-mediated learning engages linguistic mechanisms that are conceptually distinct from the more psychological and sociological mechanisms just discussed.
Academic Rationale for Peer-mediated Learning with ELLs
Several recent syntheses of effective instruction for English language learners suggest that cooperative and collaborative models of instruction could be effective for promoting language, literacy, and content-area learning for ELLs (
For example, the National Literacy Panel on Language-minority Youth and Children (
Two other high-profile reviews (
Investigating effective instructional approaches for ELLs in elementary and middle grades,
No synthesis of the effectiveness of peer-mediated methods at improving academic outcomes for ELLs was identified in the review of extant literature for this meta-analysis, which is a strong warrant for the pursuit of this particular study. Consequently, only high-visibility, individual studies exist to document the academic rationale for using peer-mediated methods with ELLs. What Works Clearinghouse (WWC) reports results for only the most methodologically-rigorous studies, and taken as a whole, the inclusion criteria and analyses make the WWC site something like a quantitative synthesis of research; granted, WWC does not employ meta-analysis or any other formally-synthetic method to make claims across the included studies, so the actual reports are not truly syntheses. For ELLs, What Works Clearinghouse reports separately for the following outcomes: reading/writing, mathematics, and English language development. Of the studies included for reading and writing, only three use peer-mediated methods extensively, and all three demonstrate effectiveness at promoting literacy outcomes for ELLs. Two of the peer-mediated literacy interventions are complex models of which peer-mediated learning is one of multiple components (i.e., Success for All and Bilingual Cooperative Integrated Reading and Composition), and only one of the interventions focuses exclusively on the effectiveness of peer-mediation (Peer-assisted Learning Strategies, or PALS). WWC does not report any interventions for ELLs with math/science outcomes that meet its standards for inclusion, and language outcomes are discussed in the following section that presents the linguistic rationale for using peer-mediation with ELLs.
A closer look at the full reports of the three included interventions with literacy outcomes reveals that a number of important instructional variables differ across these interventions. For example, the most effective of the three interventions is BCIRC, and the WWC report is based almost entirely on
Like BCIRC, Peer-assisted Learning Strategies (PALS) was evaluated for use in upper elementary ELL classrooms, and like BCIRC, only one evaluation study of the intervention meets WWC standards (
Linguistic Rationale for Peer-mediated Learning with ELLs
While no formal synthesis of the effectiveness of peer-mediation at promoting academic outcomes for ELLs exists, several theoretical, qualitative, and quantitative syntheses of the effectiveness of peer-mediated learning at promoting language outcomes for ELLs do exist. Thus, there is a considerably stronger rationale for using peer-mediation to promote language learning for ELLs than for promoting academic outcomes, and this is a key point because it is precisely English language proficiency that defines this group of students. Peer-mediated learning therefore offers promise not only as an effective approach for promoting the academic success of ELLs; it may also be an important tool for removing the fundamental barrier to equal access to the mainstream school curriculum that the term ELL is intended to identify: English language proficiency.
Whereas cooperation is high-structure and collaboration is low-structure in her scheme, she finds that interaction studies vary widely on this variable. Importantly, Oxford identifies a number of additional variables that influence the effectiveness of interactive approaches, including learner variables (i.e., willingness to communicate and learning styles) and grouping dynamics (i.e., group cultures and physical arrangement of the classroom).
In a narrative review of both qualitative and quantitative empirical research,
In particular, the findings reported in
Two recent meta-analyses of the effectiveness of interaction at promoting L2 learning outcomes offer additional warrant for using peer-mediated learning methods with ELLs; and in addition to providing overall estimates of the effectiveness of peer-mediated L2 learning, they provide considerable insight into important factors that mediate effectiveness. The first of the two meta-analyses (
These syntheses provide compelling evidence that peer-mediated methods are effective at promoting a wide variety of language outcomes for second language learners, though many issues raised in the L1 research remain largely unanswered in the L2 literature. For instance, ELLs are a highly heterogeneous population (i.e., language background, prior schooling, SES, race/ethnicity, age of arrival, and length of residence), but there is little research that discusses with which ELLs peer-mediated methods might be most effective, though both
Type of task matters in both the theoretical and empirical L1 and L2 literatures reviewed so far, but neither the qualitative nor the quantitative literatures offer much feedback about which kinds of tasks are best for which types of language or academic outcomes for ELLs. Importantly,
Peer mediated methods have consistently proven effective at promoting academic, social, and language outcomes with a wide variety of first- and second-language students in a wide variety of contexts, lending support to
Summary of key variables from literature review.
The table contrasts findings from the L1 and L2 research literatures on whether each of the following variables matters for the effectiveness of peer-mediated learning: the particular peer-mediated method; degree of structure (high vs. low); interdependence; content area; age of students; ethnicity of students; language proficiency (i.e., L1 or L2) of students; culturally-relevant instruction; SES of students; size of group; equality of power among students; duration of intervention; setting (i.e., segregated or integrated, ESL or EFL, lab or classroom, urban or rural); journal quality; and sample size. For each variable, the table lists studies finding that it matters and studies finding that it does not, separately for the L1 and L2 research.
First, researchers disagree about the importance of the particular method, whether cooperative, collaborative, peer tutoring, or some set of specific approaches (e.g., Jigsaw, Learning Together, STAD, TGT). The clearest distinction appears to be between L1 researchers that generally agree the method matters (though which method is ultimately superior remains debatable) and L2 researchers that typically do not report differences between specific methods. To be fair, this largely reflects the nascent state of L2 research, and many of the studies listed in Table
While considerable debate exists within and across L1 and L2 literatures about which peer-mediated method is most effective, there is strong consensus that more structured approaches produce bigger gains than less-structured approaches. Despite this strong consensus, theoretical (
Notably, several variables of equity mentioned in the Statement of the Problem in Chapter 1 appear to be missing, or at least largely ignored, in the above list, including: adequate facilities, context of reception, preparation of teachers to work with ELLs, attitudes and beliefs of teachers towards ELLs, relations of power between teachers and ELLs, and length of residence of ELLs. To the extent possible, these variables will also be coded when reviewing studies for inclusion in this meta-analysis. However, the absence of these variables from the extant literature probably supports the assertion that the field of peer-mediated learning studies for ELLs remains largely driven by psychological theory and that sociological perspectives remain underrepresented (e.g.,
Chapter 1 presented the two fundamental research questions driving this meta-analysis; however, as indicated in the literature review in Chapter 2, there are a number of substantive theoretical, instructional, and methodological variables of potential interest. Consequently, formal hypotheses regarding the key variables of interest are presented below.
a. Hypothesis 1a: Test of HA: Interventions testing the effectiveness of peer-mediated forms of learning against teacher-centered or individualistic control groups report language outcome effect sizes that are significantly larger.
b. Hypothesis 1b: Test of HA: Interventions testing the effectiveness of peer-mediated forms of learning against teacher-centered or individualistic control groups report academic outcome effect sizes that are significantly larger.
c. Hypothesis 1c: Test of H0: Interventions testing the effectiveness of peer-mediated forms of learning against teacher-centered or individualistic control groups report attitudinal outcome effect sizes that are not significantly different.
a. Hypothesis 2a: Test of H0: Interventions testing the effectiveness of cooperative, collaborative, and peer tutoring approaches report effect sizes that are not significantly different.
b. Hypothesis 2b: Test of H0: Interventions testing the effectiveness of peer-mediated approaches in English-as-a-Second Language (ESL) and English-as-a-Foreign Language (EFL) settings report effect sizes that are not significantly different.
c. Hypothesis 2c: Test of H0: Interventions testing the effectiveness of peer-mediated approaches in elementary, middle school, and high school settings report effect sizes that are not significantly different.
d. Hypothesis 2d: Test of H0: Interventions testing the effectiveness of peer-mediated approaches in laboratory and classroom settings report effect sizes that are not significantly different.
e. Hypothesis 2e: Test of H0: Interventions testing the effectiveness of peer-mediated approaches as part of complex interventions and those testing just peer-mediation report effect sizes that are not significantly different.
f. Hypothesis 2f: Test of H0: Interventions testing the effectiveness of peer-mediated approaches with students from different language backgrounds report effect sizes that are not significantly different.
g. Hypothesis 2g: Test of H0: Interventions testing the effectiveness of peer-mediated approaches with students from high- and low-SES backgrounds report effect sizes that are not significantly different.
h. Hypothesis 2h: Test of HA: High-quality studies report effect sizes that are significantly larger than low-quality studies.
i. Hypothesis 2i: Test of HA: Studies of longer duration report effect sizes that are significantly larger than short-duration studies.
a. Hypothesis 3a: Test of HA: Studies conducted in settings where ELLs are segregated from their English-speaking peers will report significantly lower effect sizes than studies conducted in settings where ELLs are integrated with non-ELLs.
b. Hypothesis 3b: Test of HA: Studies conducted in settings that authors describe as having adequate facilities will report significantly higher effect sizes than studies conducted in settings that authors describe as inadequate.
c. Hypothesis 3c: Test of HA: Studies conducted with ELL-certified teachers will report significantly higher effect sizes than studies in which teachers do not possess specialized certifications to work with ELLs.
d. Hypothesis 3d: Test of HA: Studies testing interventions described by the authors as at least partially culturally-relevant will report larger effect sizes than studies that do not make culturally-relevant claims.
e. Hypothesis 3e: Test of HA: Years of teaching experience will be positively correlated with effect sizes.
f. Hypothesis 3f: Test of HA: Studies reporting interventions that utilize students' native language during instruction will report larger effect sizes than studies using only students' second language (i.e., English) for instruction.
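The main-effects and moderator tests framed by the hypotheses above rest on pooling effect sizes across studies. A minimal DerSimonian-Laird random-effects pooling can be sketched as follows; the inputs in the example are hypothetical, and this sketch illustrates the general technique rather than the dissertation's actual analysis code.

```python
import numpy as np

def random_effects(g, v):
    """DerSimonian-Laird random-effects pooled effect size.
    g: per-study effect sizes (e.g., Hedges' g); v: their sampling variances."""
    g = np.asarray(g, dtype=float)
    v = np.asarray(v, dtype=float)
    w = 1.0 / v                                   # fixed-effect weights
    mean_fe = np.sum(w * g) / np.sum(w)           # fixed-effect mean
    q = np.sum(w * (g - mean_fe) ** 2)            # Cochran's Q (heterogeneity)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(g) - 1)) / c)       # between-study variance
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    pooled = np.sum(w_re * g) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return pooled, se, tau2

# Hypothetical effect sizes and variances from six studies
pooled, se, tau2 = random_effects([0.42, 0.31, 0.55, 0.18, 0.47, 0.29],
                                  [0.040, 0.025, 0.090, 0.012, 0.060, 0.020])
```

A moderator (subgroup) analysis in this framework pools each subgroup separately and compares the subgroup means, for example with a between-groups Q statistic, which is how the H0 hypotheses above can be tested.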
A number of researchers argue that not enough experimental evaluations of intervention effectiveness exist in the ELL literature (e.g.
Types of Studies
Both experimental and quasi-experimental studies were included in the review. Studies using non-random assignment must have included pre-test data or must have statistically controlled for pre-test differences (e.g., ANCOVA). Similarly, studies that tested more than one treatment against a control group were included as long as one treatment could readily be identified as the focal treatment. If a study did not include a control group, it was excluded.
Although 20 years is a common standard for study inclusion, studies that are older than 20 years were included if they met the other criteria because scarcity of research suggests that older studies may be necessary to provide sufficient power for the detection of effects and moderator analyses.
Finally, for practical purposes studies must have been published in English, though the research may have occurred in any country with participants of any nationality. In addition, the target language must have been English in order to facilitate direct comparisons to ELLs in US schools; however, participants may have represented any language background, and instruction could have occurred in any language, as well.
Types of Participants and Interventions
Studies must have tested the effects of peer-learning involving students between the ages of 3 and 18, again in order to facilitate comparisons to US students in K-12 educational settings. For example, in studies of peer tutoring, both students for whom outcomes are measured and students who act as tutors must have been within this age range to preserve the focus on “peer” interactions. Also, participants must have included students identified as English language learners (though methods of identification and definitions of ELL varied), and results must have been reported exclusively for ELLs or disaggregated by ELL status. In addition, the inclusion of studies conducted internationally necessitated the inclusion of students learning English as a Foreign Language (EFL) alongside students in the United States learning English as a Second Language (ESL). The difference in settings (e.g., immersion in an English-dominant environment for ESL students) makes the process of language acquisition very different, but for the purposes of this synthesis, both types of learners were subsumed under the ELL category.
Interventions may have utilized a number of instructional activities, but peer-peer interaction must have been a focal aspect of the intervention. Furthermore, comparison groups must not have received instruction for which peer-mediated learning was widely-used, and studies that only provided a cooperative intervention were coded separately from those that involved more complex interventions in which peer-mediated methods were just one component (e.g., Success for All). Studies for which peer-peer interaction could not be identified as a focal feature of the intervention were excluded, as were studies for which comparison groups used large amounts of peer assistance.
Types of Outcomes and Instruments
Cooperative learning has been used to improve almost every conceivable academic achievement outcome, but it has also been widely used to improve a number of behavioral and social outcomes. Therefore, nearly any outcome was coded, though some outcomes were not assessed frequently enough to allow inferential statistical analyses. To facilitate coding and analysis, outcomes were divided into five conceptually-distinct categories; and while variety existed within categories (e.g., math and social studies within academic outcomes), it was presumed that enough similarity existed to facilitate comparative analyses. These categories are: oral language, written language, other academic, attitudinal, and social. Oral language outcomes focused on speaking and listening, while written language outcomes consisted primarily of reading and writing. Other academic outcomes included content-area outcomes from subjects like science, social studies, and mathematics. Attitudinal outcomes were psychological in nature and consisted almost entirely of measures of motivation, and social outcomes were behavioral measures such as interactions with native speakers. In some cases, measures were broad-band, complex instruments that included aspects of several of these categories. For instance, the Revised Woodcock-Johnson Test of Achievement is a widely-used instrument that explicitly measures oral language, reading fluency and comprehension, and academic achievement. Where specific subtest scores were reported, they were coded separately into one of the above categories. In other cases, only composite scores were reported; sometimes descriptions of the measure favored one category over another, but in still other cases the measures were simply too inclusive to reliably assign to a single category.
In these cases, in order to maintain inter-rater reliability and to provide a systematic coding approach that could be replicated later, written language was chosen as the default outcome category for complex outcomes that measured more than one category.
Similarly, a number of instruments were used to assess effectiveness, including norm-referenced tests, researcher and teacher-created measures, and psychological and sociological instruments. These characteristics were coded to enable both inferential moderator and descriptive analyses, and they followed the same construct-driven division of results just discussed.
Multiple databases were searched using consistent combinations of keywords, though the specific format varied according to individual database conventions (e.g., AND used between terms for the PsycINFO search). Several databases were combined into simultaneous searches. For instance, the ProQuest search included the following individually-selected databases: Dissertations at Vanderbilt University and Dissertation Abstracts International, Ethnic NewsWatch, and several subsets of the Research Library collection--core, education, humanities, international, multicultural, psychology, and social sciences. Similarly, the PsycINFO search included the following manually-selected databases: ERIC, IBSS, CSA Linguistics, Language, and Behavior, PsycARTICLES, PsycINFO, and Sociological Abstracts.
Furthermore, potentially-relevant studies were identified through cross-citation, using the bibliographies of previous syntheses and identified studies. All studies were identified through the following process: titles and abstracts were first skimmed to identify potentially-relevant studies; if a study appeared to be a possible candidate, the full study was retrieved to the extent possible. If the study was not immediately available, Interlibrary Loan requests and librarian searches were pursued. If these did not succeed, attempts were made to contact the author of the study. Studies not retrieved at that point were deemed unavailable.
“Near-miss” studies were excluded at this point if closer examination revealed that they violated inclusion criteria or if an effect size could not be extracted from the information provided. As above, attempts were made to retrieve necessary information from the authors, though in many cases data were no longer available or the authors could not be reached. The “near miss” studies are included in the references section, but no further analyses were conducted with these studies.
The researcher functioned as the primary coder, and all of the studies were coded by the researcher. Reliability of inclusion and exclusion criteria, as well as coding of key substantive and methodological variables, was assessed by comparing the primary coding with the coding of two independent coders. The additional coders were doctoral students with training in experimental and statistical methods in the ExpERT program at Vanderbilt University. After some discussion of the inclusion and exclusion criteria and practice with an example, the other coders made inclusion/exclusion decisions for a sub-sample of 30 abstracts.
As already discussed, previous syntheses suggest that high-quality experimental studies are scarce in this field. Consequently, it seems appropriate to cast a wide net, a long-standing approach to social science syntheses (e.g.
As is often the case in meta-analysis, some studies reported data on several outcomes, and occasionally multiple measures of the same construct were provided by individual studies. For instance, a study may have measured outcomes of reading comprehension, reading fluency, and attitudes toward reading. Furthermore, both researcher-specific and state-mandated measures of reading comprehension were sometimes reported. For all such cases of multiple measures, the following general approach was used. First, every measure was coded in order to provide simple descriptive summaries of the kinds and frequencies of outcomes reported in the literature. Then, as part of the coding, outcomes were categorized into one of the five primary constructs outlined above. Finally, for situations in which multiple outcomes and/or measures were provided for any given construct in a single study (e.g. two different academic outcomes), a focal measurement was identified. In general, the most reliable instrument was coded as the focal instrument, though in cases where reliability information was not provided, the most widely-used measure was chosen. If neither of these criteria could be employed, the first measure discussed was chosen as a default. Although many meta-analyses average effects across measures, individual measures were utilized in this review because the measures varied considerably within constructs (e.g. math, reading and science within academic) and because coding of individual measures preserves the possibility of additional analyses at a later time. In any case, only one measurement for each of the five main constructs was identified as a focal instrument, allowing analyses within constructs that did not violate assumptions of independence.
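The focal-measure selection rule just described (most reliable instrument, else most widely-used, else first discussed) can be expressed as a short decision procedure. The following Python sketch is purely illustrative; the function and field names are hypothetical and not part of the original coding apparatus:

```python
def choose_focal_measure(measures):
    """Pick one focal measure per construct, following the rule:
    highest reported reliability, else the most widely-used measure,
    else the first measure discussed in the study."""
    # Prefer the measure with the highest reported reliability.
    with_reliability = [m for m in measures if m.get("reliability") is not None]
    if with_reliability:
        return max(with_reliability, key=lambda m: m["reliability"])
    # Fall back to a measure flagged as widely used.
    widely_used = [m for m in measures if m.get("widely_used")]
    if widely_used:
        return widely_used[0]
    # Default: the first measure discussed.
    return measures[0]
```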
A number of study and outcome characteristics were coded in order to enable analyses of the primary research questions as well as a number of potentially-relevant moderator analyses. A brief summary of the variables coded is provided here. Essentially, the variables included: study descriptors like design and quality, participant descriptors like age and language background, treatment descriptors like duration and frequency, and a variety of outcome descriptors. Key outcome descriptors included primary data like means and standard deviations as well as secondary calculations like effect sizes. While effect size statistics are discussed in more detail elsewhere, as much relevant information as necessary for effect size calculations was identified and coded, in keeping with guidelines provided by
Moderating variables are those that may affect overall effect size estimates, leading to different effect size estimates for different values of the moderator. A number of study, treatment and participant variables were analyzed as moderators in CMA analysis and as correlates in SPSS. Separate analyses were conducted for each of these variables, and the results for these moderator analyses are presented separately for each moderator of interest. A potential limitation of multiple moderator analysis is that it does not account for covariation amongst moderators, and meta-regression is an alternative analysis that allows examination of the independent contributions of each variable to variance in the effect sizes. To the extent possible, meta-regression analyses of key moderators were conducted to determine the unique contribution made to the variance of outcomes by methodological and substantive moderators. At minimum, single-variable regressions of potentially influential variables were run to test their viability as moderator variables, even if multivariate regression was untenable because of small sample size. Exploratory analyses of substantively important variables also included correlational analysis and descriptive statistics.
Finally, coding reliability was assessed through measurement of inter-rater reliability. Following exclusion/inclusion reliability assessment, the researcher met with the additional coders to discuss and practice using the coding manual on three examples. Following this initial training, the coders coded five studies independently. The researcher then met again with the coders to discuss the initial coding and to practice together again on two additional examples. Following the second training session, the two additional coders coded 10 more studies independently. Thus, the coders independently coded 15 studies each, with a total subsample of 25 studies included for the assessment of reliability. The studies were drawn evenly from published and unpublished studies. Cohen’s Kappa was calculated for categorical variables, while Pearson’s r was calculated for continuous variables. For variables with reliability coefficients low enough to be close to chance agreement, variable constructs were reexamined and disagreements were examined case by case to reach consensus.
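For categorical variables, the Cohen's Kappa computation just described can be sketched in a few lines of Python. This is illustrative only (the actual reliability analyses were run in statistical software, and Pearson's r was used for continuous variables); the function name is an assumption:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders on one categorical
    variable. Assumes expected chance agreement is below 1."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed proportion of exact agreements.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement from each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)
```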
The effect size statistic (ES) calculated was the Standardized Mean Difference (ES_SM), which is appropriate for group contrasts made across a variety of dependent measures (
\(\overline{ES}_{SM} = \dfrac{\overline{X}_{G1} - \overline{X}_{G2}}{s_{pooled}}, \qquad s_{pooled} = \sqrt{\dfrac{s_{1}^{2} (n_{1}-1) + s_{2}^{2} (n_{2}-1) }{n_{1} + n_{2} - 2}}\).
Thus, the mean effect size is calculated by dividing the difference between the treatment group mean (X̄G1) and the control group mean (X̄G2) by the pooled standard deviation (s_pooled). The second formula shows that the pooled standard deviation is the square root of the sum of the weighted variance of the treatment group (s1²[n1-1]) and the weighted variance of the control group (s2²[n2-1]), divided by the pooled degrees of freedom (n1 + n2 - 2). In these formulas, s² is the observed variance and n is the sample size.
The ES_SM is known to be upwardly biased for small samples, so the Hedges' g transformation is traditionally used to correct for this bias
\(G = D \left(1-\dfrac{3}{4(n_{1} +n_{2}) - 9}\right)\).
where Cohen's d = ES_SM, the biased effect size estimate, is weighted by a correction for small-sample bias. This adjusted effect size, ES'_SM, has its own SE and inverse variance weight formulas, as illustrated in
\(se = \sqrt{\dfrac{n_{1} + n_{2}}{n_{1}n_{2}} + \dfrac{\overline{ES}_{SM}^{2}}{2(n_{1} + n_{2})}}, \qquad w = \dfrac{1}{se^{2}}\)
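Under the definitions above, the effect size, the small-sample correction, the standard error, and the fixed-effects weight can be sketched in Python. The helper names are illustrative; the actual computations were performed in CMA:

```python
import math

def cohens_d(m1, m2, s1, s2, n1, n2):
    """Standardized mean difference: group mean difference over pooled SD."""
    s_pooled = math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

def hedges_g(d, n1, n2):
    """Correct Cohen's d for small-sample (upward) bias."""
    return d * (1 - 3 / (4 * (n1 + n2) - 9))

def se_smd(es, n1, n2):
    """Standard error of the standardized mean difference."""
    return math.sqrt((n1 + n2) / (n1 * n2) + es**2 / (2 * (n1 + n2)))

def fixed_effect_weight(se):
    """Fixed-effects inverse variance weight, w = 1 / se^2."""
    return 1 / se**2
```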
However, the illustrated weight formula is appropriate only for fixed effects models, which assume invariant effect sizes across studies. These assumptions are untenable given the broad constructs included in this meta-analysis; consequently, a random effects model was utilized, and the formulas for this model include an additional variance component in the denominator of the weight formula:
\(w_{i} = \dfrac{1}{se_{i}^{2} + \hat{\nu}_{\theta}}\)
In addition to the sampling error represented by the term \(se_{i}^{2}\), the random effects weight includes a term for heterogeneous effect sizes, \(\hat{\nu}_{\theta}\). This additional term is a constant variance component applied to every study's weight, and can be computed as a method of moments estimate using the Q statistic, which is a measure of the heterogeneity of effect sizes within the sample. The formula for \(\hat{\nu}_{\theta}\) is:
\(\hat{\nu}_{\theta} = \dfrac{Q - (k - 1)}{\sum w_{i} - \dfrac{\sum w_{i}^{2}}{\sum w_{i}}}\)
In this formula, Q is the heterogeneity statistic provided in standard CMA output, k is the number of effect sizes included in the analysis, and w is the fixed-effects weight calculated as before.
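A minimal sketch of the method of moments estimate and the resulting random effects weights, using the quantities just defined (Q, k, and the fixed-effects weights); the I² statistic, which derives directly from Q, is included as well. Function names are illustrative, not CMA's:

```python
def dl_tau_squared(q, fixed_weights):
    """Method-of-moments (DerSimonian-Laird) estimate of the between-study
    variance component v_theta, truncated at zero."""
    k = len(fixed_weights)
    sw = sum(fixed_weights)
    c = sw - sum(w * w for w in fixed_weights) / sw
    return max(0.0, (q - (k - 1)) / c)

def random_effect_weights(standard_errors, tau2):
    """Random effects weights: w_i = 1 / (se_i^2 + v_theta)."""
    return [1 / (se**2 + tau2) for se in standard_errors]

def i_squared(q, df):
    """Percentage of observed variance reflecting between-study
    heterogeneity rather than sampling error."""
    return max(0.0, (q - df) / q) * 100
```

With the oral language values reported later (Q = 37.213 on 12 degrees of freedom), `i_squared` reproduces the reported I² of roughly 67.75.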
As indicated, heterogeneity was assessed using the Q statistic, which describes the degree to which effect sizes vary beyond what would be expected from sampling error alone. I2 is another useful measure of heterogeneity; it indicates the proportion of the observed variance that reflects true between-study differences rather than sampling error (
Additionally, outliers can be particularly problematic: extreme observations distort both effect size estimates, by shifting the means of the distributions, and calculations of variance. Furthermore, as meta-analysis is primarily a survey methodology interested in synthesizing studies and providing descriptions of typical effects, atypical results are not overly informative. Consequently, Tukey's guidelines were employed to identify outliers (above the 75th percentile + 3*IQR or below the 25th percentile - 3*IQR). Results beyond these fences were Winsorized to the cut-off points.
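The Tukey fences and the Winsorizing step can be sketched as follows. This is an illustrative Python version (the diagnostic analyses were conducted in SPSS), and the exact quartile interpolation method is an assumption:

```python
import statistics

def winsorize_tukey(effect_sizes, k=3.0):
    """Cap effect sizes at the Tukey fences:
    25th percentile - k*IQR and 75th percentile + k*IQR."""
    q1, _, q3 = statistics.quantiles(effect_sizes, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    # Values inside the fences pass through unchanged.
    return [min(max(es, lo), hi) for es in effect_sizes]
```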
Another source of potential error involves designs that utilize cluster randomization in which intact groups are assigned en masse to conditions, and unless corrected, the standard errors upon which the inverse variance weights are based would be incorrect (
Similarly, in several studies, pre-test data were available, but the original researchers did not use them in their post-test analyses; that is, pre-test differences were left unadjusted in the final analyses. In these situations, post hoc adjustments were made by this researcher to control for pre-test differences. Simply, pre-test means were subtracted from post-test means for both the treatment and the control groups, and these differences were used as the mean gain scores from which effect sizes were computed.
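The gain-score adjustment amounts to differencing pre- and post-test means in each group before standardizing. A sketch, under the assumption that the difference in gains is standardized by a pooled standard deviation supplied by the caller:

```python
def gain_score_effect_size(pre_t, post_t, pre_c, post_c, sd_pooled):
    """Effect size from mean gains when pre-test differences were left
    unadjusted in the original report: subtract pre-test from post-test
    means in each group, then standardize the difference in gains."""
    gain_treatment = post_t - pre_t
    gain_control = post_c - pre_c
    return (gain_treatment - gain_control) / sd_pooled
```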
Finally, a number of alternate computations were occasionally necessary. For instance, some studies did not provide the data needed for direct ES estimates, and a number of formulations exist for converting other commonly-reported statistics into ES_SM. These other data include means and standard deviations, t-tests and degrees of freedom, and p values and sample sizes; effect sizes were calculated from these alternative data as necessary.
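As one example of such a conversion, an independent-samples t statistic can be transformed into a standardized mean difference using only the two group sizes. The function name is illustrative:

```python
import math

def d_from_t(t, n1, n2):
    """Recover the standardized mean difference from an independent-samples
    t statistic and the two group sizes: d = t * sqrt((n1+n2)/(n1*n2))."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))
```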
General statistical analyses were computed using CMA and SPSS software; in particular, overall effect size analyses, some publication bias, and moderator analyses were computed with CMA, and diagnostic and descriptive analyses were conducted with SPSS.
Chapter Four presents the data obtained from descriptive, main effects, and moderator analyses, and Chapter Five will consider the extent to which the data answers the formal research questions detailed in Chapter Three. First, descriptive information is provided for the included sample of studies. Then, descriptive statistics, main effects analyses, and moderator analyses are provided for each of the outcome categories. Because each outcome category contains independent samples of effect sizes and because outcomes are assumed to be more conceptually similar within categories than between them, Chapter Four is organized primarily by outcome type to maintain statistical and conceptual clarity.
Initial keyword searches returned 17,613 results, of which 148 were unique and potentially relevant. Additionally, extant meta-analyses and syntheses (e.g.,
Included sample of studies.
Lead Author | Year | Publication Type | Country | Construct | Design | Grade Level
Alhaidari | 2006 | Dissertation | Saudi Arabia | Cooperative | Quasi-Experiment | Elementary
Alharbi | 2008 | Dissertation | Saudi Arabia | Cooperative | Experiment | High School
Almaguer | 2005 | Journal | USA | Peer Tutoring | Quasi-Experiment | Elementary
August | 1987 | Journal | USA | Peer Tutoring | Quasi-Experiment | Elementary
Banse | 2000 | Dissertation | Burkina Faso | Collaborative | Quasi-Experiment | High School
Bejarano | 1987 | Journal | Israel | Cooperative | Quasi-Experiment | Middle School
Brandt | 1995 | Dissertation | USA | Cooperative | Quasi-Experiment | High School
Bustos | 2004 | Dissertation | USA | Cooperative | Experiment | Elementary
Calderon | 1997 | Technical Report | USA | Cooperative | Quasi-Experiment | Elementary
Calhoun | 2007 | Journal | USA | Cooperative | Quasi-Experiment | Elementary
Chen | 2011 | Journal | USA | Cooperative | Quasi-Experiment | High School
Cross | 1995 | Technical Report | USA | Collaborative | Quasi-Experiment | High School
Dockrell | 2010 | Journal | England | Collaborative | Quasi-Experiment | Pre-K
Ghaith | 2003 | Journal | Lebanon | Cooperative | Quasi-Experiment | High School
Ghaith | 1998 | Journal | Lebanon | Cooperative | Quasi-Experiment | Middle School
Hitchcock | 2011 | Technical Report | USA | Cooperative | Quasi-Experiment | Elementary
Hsu | 2006 | Dissertation | Taiwan | Collaborative | Quasi-Experiment | High School
Johnson | 1983 | Journal | USA | Peer Tutoring | Experiment | Elementary
Jung | 1999 | Dissertation | South Korea | Peer Tutoring | Quasi-Experiment | Elementary
Khan | 2011 | Journal | Pakistan | Cooperative | Experiment | High School
Kwon | 2006 | Dissertation | South Korea | Collaborative | Quasi-Experiment | High School
Lin | 2011 | Journal | Taiwan | Cooperative | Quasi-Experiment | Middle School
Liu | 2010 | Journal | Taiwan | Collaborative | Quasi-Experiment | Middle School
Lopez | 2010 | Journal | USA | Collaborative | Quasi-Experiment | Elementary
Mack | 1981 | Dissertation | USA | Collaborative | Quasi-Experiment | Elementary
Martinez | 1990 | Dissertation | USA | Cooperative | Quasi-Experiment | Elementary
Prater | 1993 | Journal | USA | Cooperative | Experiment | Elementary
Sachs | 2003 | Journal | Hong Kong | Cooperative | Experiment | High School
Saenz | 2002 | Dissertation | USA | Peer Tutoring | Quasi-Experiment | Elementary
Satar | 2008 | Journal | Turkey | Collaborative | Experiment | High School
Slavin | 1998 | Technical Report | USA | Cooperative | Quasi-Experiment | Elementary
Suh | 2010 | Journal | South Korea | Collaborative | Quasi-Experiment | Elementary
Thurston | 2009 | Journal | Catalonia | Peer Tutoring | Quasi-Experiment | Elementary
Tong | 2008 | Journal | USA | Collaborative | Quasi-Experiment | Elementary
Uludag | 2010 | Dissertation | Jordan | Collaborative | Quasi-Experiment | Middle/High School
Vaughn | 2009 | Journal | USA | Peer Tutoring | Quasi-Experiment | Middle School
The 37 included studies reported relevant data on 44 independent samples (i.e., several reports described multiple experiments or included independent samples) and contained a total of 132 outcomes. As indicated in the full coding manual (in the Excel spreadsheet that accompanies this dissertation Suppl. material
Summary of Key Variables in Included Sample
Year (n=43) | Pre1980-1989 = 4 | 1990-1999 = 10 | 2000-2012 = 29
Publication Type (n=43) | Dissertation = 15 | Journal = 22 | Technical Report = 6
Country (n=43) | USA = 22 | Other = 21
Setting (n=43) | ESL = 23 | EFL = 20
Design (n=43) | Experimental = 8 | Quasi-experimental = 35
Quality (n=43) | High = 26 | Medium = 13 | Low = 4
Dosage (Total Contacts) (n=43) | 0-30 = 17 | 31-90 = 13 | 91+ = 13
Construct (n=43) | Cooperative = 17 | Collaborative = 16 | Peer Tutoring = 10
Component (n=43) | Yes = 19 | No = 24
Adequate Facilities (n=23) | Yes = 2 | No = 3 | Unknown = 18
Segregated (n=23) | Yes = 9 | No = 14
Culturally Relevant (n=23) | Yes = 5 | No = 18
Language of Instruction (n=43) | L1 only = 2 | Bilingual = 14 | L2 only = 14 | Unknown = 13
In School (n=43) | Yes = 43 | No = 0
Teacher Certification (n=43) | ELL Certified = 12 | Not ELL Certified = 2 | Unknown = 29
Teacher Experience (n=43) | 0-5 years = 3 | 6-10 years = 4 | 11+ years = 4 | Unknown = 32
Teacher Ethnicity (n=43) | Same as Students' = 7 | Different than Students' = 1 | Unknown = 35
Grade Level (n=43) | Elementary = 22 | Middle = 8 | High = 13
Student Ethnicity (n=43) | Spanish = 20 | Asian = 8 | Other = 15
Student SES (n=43) | Low = 21 | High = 3 | Mixed = 1 | Unknown = 18
Student Length of Residence (n=23) | 0-2 years = 1 | 2+ = 0 | Unknown = 22
Key outcome variables.
Construct (Total Outcomes = 62) | Number of Independent Outcomes by Construct | Number of Participants in Treatment Groups | Number of Participants in Control Groups
Oral Language | 14 | 843 | 787
Written Language | 30 | 919 | 863
Other Academic | 6 | 220 | 451
Attitudinal | 10 | 397 | 394
Social | 0 | 0 | 0
As indicated in Table
Table
Summary of Included Studies and Main Effects
A random effects model of the un-corrected and un-Winsorized data provided a mean effect size estimate of .587 (SE=.141, p<.001) for the thirteen oral language outcomes; however, after adjustments for outliers, pre-test differences, and cluster randomization, the mean effect size estimate and its variance both decreased slightly (.578, SE=.136, p<.001), suggesting that the larger-than-average outliers and the effects of cluster randomization had very little impact on the original estimates. The adjusted distribution is illustrated by the forest plot in Fig.
Throughout the paper, random effects models are the default, primarily because the assumptions of the fixed model are generally untenable. Empirically, homogeneity analysis of the fixed model illustrates the considerable heterogeneity that exists within the observed sample, offering some empirical justification for the use of a random effects model. The Q statistic (37.213, df=12, p<.001) indicates that the observed effect sizes vary more than would be expected by sampling error alone, and the I2 statistic (67.753) indicates that approximately 68% of the observed variance in effect sizes exists between studies. Together, this suggests that moderator analyses might provide insight into what factors influence the effectiveness of peer-mediated learning for ELLs.
Publication Bias for Oral Language Outcomes
The possibility of publication bias remains a persistent concern in meta-analysis, and the following analysis examines empirical evidence for the presence of publication bias in this sample and the extent to which it might distort the estimates. Lipsey and Wilson (1993, as cited in
A recoding of the type of publication variable into a dummy-coded variable with 1=published and 0=unpublished, indicated that 84.6% of the included sample had been published, while the other 15.4% were dissertations. The mean effect size for published studies (.377, SE=.067) is surprisingly much smaller than the mean effect size for unpublished studies (1.159, SE=.330). The difference between the mean effect sizes of -.782 provides a crude estimate of the upper bounds of potential publication bias. Of course, this simple difference does not adequately account for small sample bias nor does it employ inverse variance weights; consequently, appropriately meta-analytic tests of publication bias must also be utilized.
A look at a funnel plot with effect sizes plotted against standard errors is one meta-analytically-appropriate method of visually examining the distribution for the presence of publication bias. In this case, the standard error serves as a proxy for sample size, and because smaller samples are much more likely to lack the statistical power required to attain statistical significance, we look at the small-sample studies to detect publication bias. If there is no such bias, we expect small studies with negative and null results to be as frequent as small studies with positive results. The following funnel plot in Fig.
A computational alternative to visual inspection of the distribution is Egger’s regression intercept, as discussed in
Because we assume that publication bias will be positive, that is, in the direction of significantly positive effects and because it provides a more conservative estimate of significance, the p value of the single-tailed test at α=.05 is typically reported. The null hypothesis tests whether the ratio of the ES/se is > 0. While some debate exists about whether the single-tailed or two-tailed test is more appropriate, we see in Fig.
In conclusion, these varied analyses provide very little evidence for the possibility that publication bias is likely for the distribution of studies reporting oral language outcomes. Furthermore, the potential bias induced is small enough that if a sufficient number of small sample studies with null or negative results were included to make the distribution more symmetrical, the mean effect size estimate would hardly change. As indicated, very few studies in the sample have null or negative effect size estimates; as such, it remains distinctly possible that the literature search failed to uncover those studies that for one reason or another simply were not published because they failed to yield significantly positive results.
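For reference, a simple unweighted form of the Egger's regression computation discussed above can be sketched as follows. CMA computes this internally; this illustrative version regresses the standardized effect (ES/se) on precision (1/se) by ordinary least squares, with the intercept serving as the indicator of small-study bias:

```python
def egger_intercept(effect_sizes, standard_errors):
    """Egger's test (simple unweighted form): regress ES/se on 1/se.
    An intercept far from zero suggests small-study (publication) bias."""
    y = [es / se for es, se in zip(effect_sizes, standard_errors)]
    x = [1 / se for se in standard_errors]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx  # the regression intercept
```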
Moderator Analyses for Oral Language Outcomes
The distribution of oral language effect sizes was heterogeneous, as indicated by the Q and I2 statistics; consequently, we might expect post hoc examination of moderator variables to uncover some statistically-significant moderator variables. However, the sample is modest (n=13) and underpowered for meta-regression analysis of the partial contributions for multiple independent variables. Given these limitations, analysis of moderators is primarily motivated by a priori questions of interest, and findings are qualified by the recognition that small differences may be difficult to detect with the small sample employed and confounding and lurking variables may temper any observed differences between sub-groups. Occasionally, when a categorical variable had too few studies on one or more categories, the category was recoded, often into a binary variable, to enable a more reliable comparison. Table
Summary of moderator analyses for oral language outcomes.
Moderator (Sub-group) | Number in Sub-group | Effect Size Point-estimate | Standard Error of Estimate | p-value of Estimate | Q-within of Sub-group | I2 of Sub-group | Q-between in Random Effects Model | Observed Inter-correlation
Published | | | | | | | .601 (p=.438) | Yes
  Yes | 11 | .377 | .067 | .000 | 29.005 (p=.001) | 65.523 | |
  No | 2 | 1.159 | .330 | .099 | 3.683 (p=.09) | 64.681 | |
Study Quality | | | | | | | 4.089 (p=.129) | Yes
  High | 7 | .587 | .164 | .000 | 18.544 (p=.005) | 67.644 | |
  Medium | 4 | .761 | .364 | .036 | 8.266 (p=.041) | 63.077 | |
  Low | 2 | .174 | .167 | .299 | .028 (p=.866) | .000 | |
Instrument Type | | | | | | | 2.513 (p=.285) | Yes
  Researcher-created | 5 | .478 | .238 | .045 | 10.408 (p=.034) | 61.570 | |
  Standard-Narrow | 6 | .743 | .204 | .000 | 25.583 (p=.000) | 80.456 | |
  Standard-Broad | 2 | .031 | .420 | .941 | .0359 (p=.549) | .000 | |
Post Hoc Researcher Adjusted | | | | | | | 4.634 (p=.031) | Yes
  Yes | 2 | .174 | .167 | .299 | .028 (p=.866) | .000 | |
  No | 11 | .675 | .162 | .000 | 34.863 (p=.000) | 71.136 | |
Construct | | | | | | | 2.503 (p=.286) | Yes
  Cooperative | 2 | .105 | .315 | .738 | 10.283 (p=.068) | 51.378 | |
  Collaborative | 6 | .506 | .157 | .001 | .005 (p=.942) | .000 | |
  Peer Tutoring | 5 | .837 | .348 | .016 | 18.721 (p=.001) | 78.634 | |
Component | | | | | | | 1.035 (p=.309) | Yes
  Yes | 4 | .388 | .172 | .024 | 7.406 (p=.06) | 59.494 | |
  No | 9 | .651 | .193 | .001 | 24.013 (p=.002) | 66.684 | |
Setting | | | | | | | .380 (p=.538) | Yes
  EFL | 5 | .691 | .269 | .010 | 17.426 (p=.002) | 77.045 | |
  ESL | 8 | .498 | .161 | .002 | 17.332 (p=.015) | 59.612 | |
Segregated | | | | | | | 5.412 (p=.020) | Yes
  Yes | 2 | .230 | .088 | .009 | .966 (p=.326) | .000 | |
  Other (Not and Unknown) | 11 | .686 | .175 | .000 | 26.944 (p=.003) | 62.866 | |
Language of Instruction | | | | | | | .681 (p=.711) | Yes
  L1 (L1-only and bilingual) | 7 | .649 | .186 | .000 | 24.282 (p=.000) | 75.291 | |
  L2 Only | 4 | .427 | .215 | .047 | 2.36 (p=.501) | .000 | |
  Unknown | 2 | .702 | .535 | .189 | 9.946 (p=.002) | 89.946 | |
Culturally Relevant | | | | | | | .739 (p=.691) | Yes
  Yes | 3 | .413 | .196 | .035 | 7.405 (p=.025) | 72.933 | |
  No | 5 | .572 | .264 | .03 | 6.701 (p=.153) | 40.309 | |
  Not U.S.A. | 5 | .691 | .269 | .01 | 17.426 (p=.002) | 77.045 | |
Grade Level | | | | | | | .240 (p=.624) | Yes
  Elementary | 9 | .628 | .164 | .000 | 25.846 (p=.001) | 69.047 | |
  Other | 4 | .454 | .314 | .148 | 11.320 (p=.010) | 73.499 | |
SES | | | | | | | .194 (p=.908) | Yes
  Low | 5 | .518 | .193 | .007 | 6.821 (p=.146) | 41.36 | |
  High | 2 | .788 | .582 | .176 | 3.099 (p=.078) | 67.731 | |
  Unknown | 6 | .550 | .202 | .007 | 19.731 (p=.001) | 74.659 | |
Student Hispanic | | | | | | | .541 (p=.462) | |
  Hispanic | 7 | .472 | .181 | .009 | 15.801 (p=.015) | 62.027 | |
  Other (Asian, Arabic, Bangladeshi, Israeli) | 6 | .68 | .217 | .002 | 17.535 (p=.004) | 71.486 | |
Student Asian | | | | | | | .139 (p=.71) | |
  Asian | 3 | .696 | .376 | .064 | 7.206 (p=.027) | 72.244 | |
  Other | 10 | .545 | .15 | .000 | 28.272 (p=.001) | 68.166 | |
As indicated in the Q-between column, only two moderators were statistically significant at the p=.05 level: post hoc researcher adjusted and segregated. In cases where post-test effect sizes were unadjusted for pre-test differences by authors in the original study reports, the researcher of this meta-analysis adjusted post-test effect sizes post hoc. In these cases, post hoc adjustments resulted in much smaller effect sizes on average (G=.174) than unadjusted effect sizes (G=.675). This finding indicates that methodological rigor and care in synthesizing previous research can exert a large influence on reported results. The other significant moderator of the effectiveness of peer-mediated learning for improving oral language outcomes was whether or not the intervention occurred in settings where ELLs were segregated from their non-ELL peers. ELLs in segregated settings performed much lower (G=.230) than they did in settings that were not segregated or in settings for which segregation was unreported (G=.686). Some care should be taken when interpreting this result in particular. First, the confluence of segregated settings with ambiguous settings (i.e., researchers did not report if segregated) presents some conceptual challenges in interpreting the results because some of the ambiguous settings may very well have been segregated in practice. Second, the number of studies that reported being segregated was relatively small (n=2), and so the estimate is not as precise as it could have been.
For all other variables, differences in mean effect sizes were evident across variables, but none proved to be significant moderators. Because the sample size for oral language outcomes is relatively small, this general lack of statistically significant moderators likely represents a lack of statistical power to detect meaningful differences. Thus, some of these moderators might prove significant if additional studies were included, and future meta-analyses may benefit from larger sample sizes as the field continues to produce experimental and quasi-experimental evaluations of peer-mediated learning.
Summary of Included Studies and Main Effects
A random effects model of the un-corrected and un-Winsorized data provided a mean effect size estimate of .551 (SE=.111, p<.001) for the twenty-eight written language outcomes; however, after adjustments for outliers, pre-test differences, and cluster randomization, the mean effect size estimate decreased and the variance increased slightly (.486, SE=.121, p<.001), suggesting that outliers and cluster randomization had some noticeable impact on the original estimates. The adjusted distribution of written language outcomes is illustrated by the forest plot in Fig.
The distribution of effect sizes for written language outcomes was even more heterogeneous than the distribution of oral language outcomes. The Q statistic (97.135, df=27, p<.001) indicates that the observed effect sizes vary more than would be expected by sampling error alone, and the I2 statistic (72.204) indicates that approximately 72% of the observed variance in effect sizes exists between studies. Together, this suggests that moderator analyses might provide insight into what factors influence the effectiveness of peer-mediated learning for ELLs for written language outcomes.
Publication Bias for Written Language Outcomes
A recoding of the type of publication variable into a dummy-coded variable with 1=published and 0=unpublished indicated that 64.3% of the included sample were unpublished (i.e., technical reports and dissertations), while the other 35.7% were published journal articles. The mean effect size for published studies (.442, SE=.24) is not much smaller than the mean effect size for unpublished studies (.524, SE=.142). The difference between the mean effect sizes of -.082 provides a crude estimate of the upper bounds of potential publication bias.
The funnel plot in Fig.
We see in Fig.
In conclusion, these analyses provide no evidence that publication bias is likely for the distribution of studies reporting written language outcomes. Additionally, several studies in the sample have null or negative effect size estimates; thus, it seems unlikely that the literature search failed to uncover those studies that for one reason or another simply were not published because they failed to yield significantly positive results, and as indicated by the funnel plot and the difference in means between published and unpublished studies, the possible impact of studies lurking in the "file drawer" on the mean effect size estimates appears relatively minor in this case.
Moderator Analyses for Written Language Outcomes
The distribution of written language effect sizes was heterogeneous, as indicated by the Q and I2 statistics; consequently, we might expect post hoc examination of moderator variables to uncover some statistically significant moderators. The sample is large enough (n=28) and sufficiently powered for meta-regression analysis of the partial contributions of at least a few (e.g., 2-3) independent variables. As before, analysis of moderators is primarily motivated by a priori questions of interest, and findings remain qualified by the recognition that small differences may be difficult to detect with a sample of this size and that confounding and lurking variables may temper any observed differences between sub-groups. Table
Summary of Moderator Analyses for Written Language Outcomes
| Moderator (Sub-group) | Number in sub-group | Effect Size Point-estimate | Standard Error of estimate | p-value of estimate | Q-within of Sub-group | I2 of Sub-group | Q-between in Random Effects Model | Observed Inter-correlation |
|---|---|---|---|---|---|---|---|---|
| Published | | | | | | | .086 (p=.770) | Yes |
| Yes | 10 | .442 | .240 | .065 | 38.89 (p=.000) | 76.858 | | |
| No | 18 | .524 | .142 | .000 | 55.851 (p=.000) | 60.562 | | |
| Study Quality | | | | | | | 10.635 (p=.005) | Yes |
| High | 17 | .637 | .144 | .000 | 56.534 (p=.000) | 71.7 | | |
| Medium | 8 | .328 | .311 | .291 | 31.991 (p=.000) | 78.119 | | |
| Low | 3 | -.095 | .173 | .582 | .170 (p=.981) | .000 | | |
| Instrument Type | | | | | | | 1.107 (p=.575) | Yes |
| Researcher-created | 17 | .411 | .147 | .005 | 35.743 (p=.003) | 55.236 | | |
| Standard-Narrow | 7 | .338 | .168 | .033 | 50.012 (p=.000) | 88.003 | | |
| Standard-Broad | 4 | .746 | .420 | .045 | 5.677 (p=.128) | 47.156 | | |
| Post Hoc Researcher Adjusted | | | | | | | 9.058 (p=.003) | Yes |
| Yes | 3 | -.095 | .173 | .583 | .170 (p=.918) | .000 | | |
| No | 25 | .554 | .129 | .000 | 88.612 (p=.000) | 72.916 | | |
| Construct | | | | | | | 1.391 (p=.499) | Yes |
| Cooperative | 14 | .632 | .168 | .000 | 64.105 (p=.000) | 79.721 | | |
| Collaborative | 10 | .376 | .162 | .02 | 9.94 (p=.355) | 9.460 | | |
| Peer Tutoring | 4 | .310 | .414 | .454 | 19.234 (p=.000) | 84.403 | | |
| Component | | | | | | | 1.07 (p=.301) | Yes |
| Yes | 12 | .633 | .184 | .001 | 30.714 (p=.001) | 64.186 | | |
| No | 16 | .385 | .154 | .012 | 55.422 (p=.000) | 72.935 | | |
| Setting | | | | | | | .023 (p=.879) | Yes |
| EFL | 17 | .504 | .170 | .003 | 45.017 (p=.000) | 64.458 | | |
| ESL | 11 | .465 | .184 | .012 | 51.969 (p=.000) | 80.758 | | |
| Segregated | | | | | | | .504 (p=.478) | Yes |
| Yes | 5 | .373 | .135 | .006 | 5.755 (p=.218) | 30.942 | | |
| Other (Not and Unknown) | 23 | .518 | .155 | .001 | 91.38 (p=.000) | 75.952 | | |
| Language of Instruction | | | | | | | .274 (p=.872) | Yes |
| L1 (L1-only and bilingual) | 9 | .457 | .168 | .007 | 20.971 (p=.007) | 61.853 | | |
| L2 Only | 8 | .402 | .247 | .104 | 36.976 (p=.000) | 80.976 | | |
| Unknown | 11 | .583 | .258 | .024 | 38.447 (p=.000) | 73.99 | | |
| Culturally Relevant | | | | | | | .101 (p=.951) | Yes |
| Yes | 2 | .433 | .148 | .003 | .095 (p=.758) | 0.000 | | |
| No | 9 | .474 | .246 | .053 | 51.54 (p=.000) | 84.478 | | |
| Not U.S.A. | 17 | .504 | .17 | .003 | 45.017 (p=.000) | 64.458 | | |
| Grade Level | | | | | | | 10.863 (p=.004) | Yes |
| Elementary | 12 | .539 | .182 | .003 | 59.259 (p=.000) | 81.437 | | |
| Middle | 6 | -.007 | .134 | .961 | 2.841 (p=.724) | 0.000 | | |
| High | 10 | .7 | .204 | .001 | 17.633 (p=.039) | 49.047 | | |
| SES | | | | | | | .052 (p=.820) | Yes |
| Low | 11 | .516 | .214 | .016 | 45.141 (p=.000) | 77.847 | | |
| Other (Includes High and Unknown) | 17 | .456 | .147 | .002 | 48.222 (p=.000) | 66.820 | | |
| Student Hispanic | | | | | | | .005 (p=.945) | |
| Hispanic | 10 | .471 | .18 | .009 | 41.128 (p=.000) | 78.117 | | |
| Other (Asian, Arabic, African, Pakistani, Lebanese) | 18 | .488 | .172 | .005 | 54.233 (p=.000) | 68.654 | | |
| Student Asian | | | | | | | .697 (p=.404) | |
| Asian | 6 | .705 | .32 | .028 | 18.652 (p=.002) | 73.193 | | |
| Other | 22 | .418 | .125 | .001 | 67.671 (p=.000) | 68.967 | | |
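The Q-between statistics reported in the table compare weighted subgroup means against the weighted grand mean. A simplified sketch follows; it is illustrative rather than the analysis code used here, and it assumes the caller passes variances that already include any between-study component, so that their reciprocals are the random-effects weights:

```python
import numpy as np

def q_between(groups):
    """Moderator test: Q-between across subgroup mean effect sizes.

    groups: list of (effects, variances) pairs, where variances already
    include any between-study component so 1/v gives the analysis weights.
    Q-between is referred to a chi-square with (number of groups - 1) df.
    """
    means, weights = [], []
    for g, v in groups:
        w = 1.0 / np.asarray(v, float)
        means.append(np.sum(w * np.asarray(g, float)) / np.sum(w))
        weights.append(np.sum(w))
    means, weights = np.array(means), np.array(weights)
    grand = np.sum(weights * means) / np.sum(weights)
    return np.sum(weights * (means - grand) ** 2)
```

When the subgroup means coincide, Q-between is zero; large values relative to the chi-square reference distribution flag a significant moderator, as for study quality and grade level above.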
Like the distribution of oral language outcomes, the distribution of written language outcomes demonstrated few significant moderators, indicating that peer-mediated learning is effective across a number of methodological, setting, and participant variables. However, three moderators were statistically significant at the p=.05 level: study quality, post hoc researcher adjustment, and grade level. As with oral language outcomes, post hoc adjustment of written language outcomes resulted in much smaller effect sizes on average (G=-.095) than unadjusted outcomes (G=.554), with the direction of the effect actually switching to favor the comparison groups. For this distribution, study quality was also a significant moderator; as study quality increased, so did the magnitude of the mean effect size, a finding that is somewhat counterintuitive. One might expect high-quality designs to mitigate the influence of bias and accident, resulting in lower effects on average; however, this echoes other meta-analyses of peer-mediated instruction that reported that low-quality studies tended to yield lower effect sizes (e.g.,
Summary of Included Studies and Main Effects for Other Academic Outcomes
A random effects model of the un-corrected and un-Winsorized data provided a mean effect size estimate for the five other academic outcomes of (.234, SE=.079, p=.003); however, after adjustments for outliers, pre-test differences, and cluster randomization, the mean effect size estimate and the variance increased slightly (.250, SE=.13, p=.054), suggesting that outliers and cluster randomization had more impact on the standard error estimate than on the mean effect size estimate. Heterogeneity for the observed sample of other academic outcomes was statistically indistinguishable from zero (Q=1.882, p=.757, I2=0.00); thus, not only were there too few studies to reliably conduct moderator analyses for this distribution, but the empirical evidence also indicates that there is insufficient heterogeneity in effect sizes for moderators to explain. Fig.
Publication Bias for Other Academic Outcomes
A recoding of the type of publication variable into a dummy-coded variable with 1=published and 0=unpublished indicated that 80% of the included sample were published in journals; the other study was a dissertation. The difference between the mean effect size for published studies (G=.260, p=.078) and the mean for unpublished studies (G=.218, p=.424) is .042 and provides a conceptual upper bound on the effect of publication bias on the mean effect size estimate. A funnel plot of effect sizes plotted against the standard errors in Fig.
Egger’s regression test provides confirmatory evidence that publication bias is not a significant threat to the validity of the mean effect size estimate. As demonstrated in Fig.
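Egger's test regresses each standardized effect (g divided by its standard error) on precision (1/SE); an intercept far from zero signals funnel-plot asymmetry. A numpy-only sketch of the idea follows; the function name and data are illustrative, and this is not the software used for the analyses reported here:

```python
import numpy as np

def eggers_intercept(g, v):
    """Egger's regression asymmetry test (sketch).

    Regress g/SE on 1/SE by ordinary least squares; returns the intercept
    and its t statistic on n - 2 df. |t| well above ~2 hints at
    small-study effects such as publication bias.
    """
    g = np.asarray(g, float)
    se = np.sqrt(np.asarray(v, float))
    y, x = g / se, 1.0 / se              # standardized effect vs precision
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = np.sum(resid ** 2) / (n - 2)    # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)    # coefficient covariance matrix
    return beta[0], beta[0] / np.sqrt(cov[0, 0])
```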
In conclusion, the small sample of other academic outcomes shows a modest effect size of one quarter of a standard deviation that appears uninfluenced by publication bias. The small sample limits the viability of moderator analyses, and the lack of heterogeneity further discourages even exploratory analysis of the influence of moderators. The lack of included studies reporting outcomes for content areas like math, science, or social studies is similar to the What Works Clearinghouse, which reports far more language outcomes than math outcomes. Similarly, a number of near-miss studies reported other academic outcomes but were excluded because they failed to meet methodological or other inclusion criteria. In general, it appears that this is an emergent field of study, and future meta-analyses may prove useful as the field develops.
Summary of Included Studies and Main Effects for Attitudinal Outcomes
A random effects model of the un-corrected and un-Winsorized data generated a mean effect size estimate for the ten attitudinal outcomes of (.309, SE=.123, p=.012); however, after adjustments for outliers, pre-test differences, and cluster randomization, the mean effect size estimate and the variance increased noticeably (.419, SE=.194, p=.031), suggesting that outliers and cluster randomization had a moderate impact on the original estimates. Heterogeneity analyses indicate that the sample of effect sizes varies more than would be expected from sampling error alone, with about 69% of the variance occurring between studies (Q=28.806, p=.001, I2=68.756); thus, moderator analyses might be able to explain some of this variance. The forest plot of attitudinal outcomes is depicted in Fig.
Publication Bias for Attitudinal Outcomes
A recoding of the type of publication variable into a dummy-coded variable with 1=published and 0=unpublished indicated that 40% of the included sample were published, and the other 60% were dissertations. The mean effect size for published studies (.201, SE=.216) is considerably smaller than the mean effect size for unpublished studies (.565, SE=.305). The difference between the mean effect sizes of -.364 provides a crude estimate of the upper bound of potential publication bias.
Visual inspection of the funnel plot in Fig.
Egger’s regression test offers some evidence of probable publication bias for the included sample of attitudinal outcomes, corroborating the fairly large difference in means between published and unpublished studies already presented. As illustrated below in Fig.
Moderator Analyses for Attitudinal Outcomes
The distribution of attitudinal effect sizes was heterogeneous, as indicated by the Q and I2 statistics; consequently, we might expect post hoc examination of moderator variables to uncover some statistically significant moderators. However, the sample is fairly small (n=10) and underpowered for meta-regression analysis of the partial contributions of multiple independent variables. Given these limitations, analysis of moderators is primarily motivated by a priori questions of interest, and findings are qualified by the recognition that small differences may be difficult to detect with the small sample employed and that confounding and lurking variables may temper any observed differences between sub-groups. Table
Summary of moderator analyses for attitudinal outcomes.
| Moderator (Sub-group) | Number in sub-group | Effect Size Point-estimate | Standard Error of estimate | p-value of estimate | Q-within of Sub-group | I2 of Sub-group | Q-between in Random Effects Model | Observed Inter-correlation |
|---|---|---|---|---|---|---|---|---|
| Published | | | | | | | .947 (p=.330) | Yes |
| Yes | 4 | .201 | .216 | .064 | 5.232 (p=.156) | 42.666 | | |
| No | 6 | .565 | .305 | .352 | 21.834 (p=.001) | 77.1 | | |
| Study Quality | | | | | | | 5.422 (p=.020) | Yes |
| High | 7 | .650 | .254 | .011 | 19.624 (p=.003) | 69.426 | | |
| Medium | 3 | -.058 | .167 | .728 | 1.424 (p=.491) | .000 | | |
| Low | 0 | | | | | | | |
| Instrument Type | | | | | | | 2.382 (p=.123) | Yes |
| Researcher-created | 5 | .711 | .36 | .048 | 17.538 (p=.002) | 77.192 | | |
| Standardized (Broad and Narrow) | 5 | .108 | .151 | .475 | 4.954 (p=.292) | 19.257 | | |
| Post Hoc Researcher Adjusted | | | | | | | 5.383 (p=.020) | Yes |
| Yes | 1 | -.254 | .259 | .327 | .000 (p=1.0) | .000 | | |
| No | 9 | .509 | .202 | .012 | 23.275 (p=.003) | 65.628 | | |
| Construct | | | | | | | 4.845 (p=.089) | Yes |
| Cooperative | 5 | .181 | .14 | .196 | 1.366 (p=.85) | .000 | | |
| Collaborative | 3 | .141 | .275 | .608 | 3.879 (p=.144) | 48.442 | | |
| Peer Tutoring | 2 | 1.525 | .603 | .011 | 3.723 (p=.054) | 73.142 | | |
| Component | | | | | | | .134 (p=.715) | Yes |
| Yes | 2 | .523 | .278 | .06 | .442 (p=.506) | .000 | | |
| No | 8 | .391 | .23 | .089 | 27.643 (p=.000) | 74.677 | | |
| Setting | | | | | | | .336 (p=.562) | Yes |
| EFL | 7 | .466 | .267 | .08 | 26.195 (p=.000) | 77.095 | | |
| ESL | 3 | .264 | .225 | .239 | 2.461 (p=.292) | 18.745 | | |
| Segregated | | | | | | | .918 (p=.338) | Yes |
| Yes | 2 | .176 | .229 | .442 | 1.243 (p=.265) | 19.543 | | |
| Other (Not and Unknown) | 8 | .5 | .249 | .045 | 26.984 (p=.000) | 74.059 | | |
| Language of Instruction | | | | | | | .973 (p=.615) | Yes |
| L1 (L1-only and bilingual) | 4 | .651 | .4 | .104 | 19.997 (p=.000) | 84.998 | | |
| L2 Only | 3 | .316 | .258 | .22 | 1.155 (p=.561) | .000 | | |
| Unknown | 3 | .169 | .281 | .547 | 4.78 (p=.092) | 58.157 | | |
| Culturally Relevant | | | | | | | .336 (p=.562) | Yes |
| Yes | 0 | | | | | | | |
| No | 3 | .264 | .225 | .239 | 2.461 (p=.292) | 18.745 | | |
| Not U.S.A. | 7 | .466 | .267 | .08 | 26.195 (p=.000) | 77.095 | | |
| Grade Level | | | | | | | 2.237 (p=.135) | Yes |
| Elementary | 6 | .667 | .333 | .045 | 21.943 (p=.001) | 77.213 | | |
| Middle | 0 | | | | | | | |
| High | 4 | .119 | .153 | .434 | 3.322 (p=.345) | 9.073 | | |
| SES | | | | | | | .919 (p=.338) | Yes |
| Low | 3 | .168 | .205 | .412 | 1.97 (p=.373) | .000 | | |
| Other (Includes High and Unknown) | 7 | .487 | .261 | .062 | 45.141 (p=.000) | 77.138 | | |
| Student Hispanic | | | | | | | .004 (p=.95) | |
| Hispanic | 4 | .387 | .221 | .081 | 4.096 (p=.251) | 26.76 | | |
| Other (Arabic, Asian, and Turkish) | 6 | .41 | .292 | .16 | 24.666 (p=.000) | 79.729 | | |
| Student Asian | | | | | | | 1.166 (p=.280) | |
| Asian | 2 | 1.166 | .913 | .202 | 13.835 (p=.000) | 92.772 | | |
| Other | 8 | .171 | .125 | .170 | 7.735 (p=.357) | 9.497 | | |
As with the other outcomes already discussed, most of the moderators proved insignificant predictors of variability in the effectiveness of peer-mediated learning at promoting attitudinal outcomes for ELLs; most likely, the low power prevented the detection of other meaningful effects. Nonetheless, a few variables proved to be significant (or nearly significant) moderators of attitudinal outcomes: post hoc researcher adjustment, study quality, and the type of peer-mediated learning. The only variable to consistently prove significant as a moderator across outcome types was post hoc researcher adjustment for effect sizes that were unadjusted by the original researchers, and as before, post hoc adjustment resulted in much smaller average effect sizes (G=-.254) than unadjusted effect sizes (G=.509). Another methodological variable proved a significant moderator of attitudinal outcomes; in this case, study quality proved significant, and as with written outcomes, higher quality studies were associated with higher effect sizes. Finally, the type of peer-mediated learning (i.e., Construct) approached statistical significance, with peer tutoring studies (G=1.525) reporting much larger effect sizes than either cooperative (G=.181) or collaborative (G=.141) studies. However, only two studies in this distribution of outcomes reported using peer tutoring, and consequently, caution should be used when interpreting this result. Nonetheless, given the reliability of the estimate (p=.011), it seems likely that an effect size of this magnitude is fairly meaningful despite the small sample size upon which the estimate is based.
While Chapter 4 was organized by outcome type, the remainder of the paper is organized by the research questions presented in Chapter 3. As such, Chapter 5 is intended to synthesize findings across outcome types, and this requires a fairly organic combination of quantitative, formal hypothesis testing analysis and qualitative, pattern-seeking analysis. After addressing each of the research questions, a final section presents important limitations of this study and provides some recommendations for future research.
Research Question 1: Is peer-mediated instruction effective at promoting language, academic, or attitudinal learning for English language learners in K-12 settings?
Research Question 1 is the core question of the meta-analysis, and everything else is secondary or exploratory in comparison. Essentially, this question asks if peer-mediated learning works for ELLs, which is the most basic of effectiveness questions. Taken together, the results of the main effects analyses for all four of the available outcome types support the assertion that peer-mediated learning is very effective at promoting a number of learning outcomes for ELLs.
Specifically, the results for oral language outcomes (.578, SE=.136, p<.001) and written language outcomes (.486, SE=.121, p<.001) confirm Hypothesis 1a, which asserted that language outcomes would be significantly larger for interventions utilizing peer-mediated learning than for control conditions. Both estimates are highly reliable at α=.001, and both estimates appear unaffected by publication bias. Thus, data indicate that the alternative hypothesis of a significant difference favoring peer-mediated learning over teacher-centered or individualistic learning for ELLs cannot be rejected. Moreover, these effect sizes are of large enough magnitude to be practically significant. Compared to previous meta-analyses of cooperative learning, which found effect sizes in the range of .13-1.04 (
Similarly, the main effects analyses for other academic outcomes support the assertion in Hypothesis 1b that peer-mediated learning would produce larger academic gains than control conditions. The mean effect size for other academic outcomes (.250, SE=.13, p=.054) is just significant at α=.05, though the estimate is based on a modest sample that appeared somewhat influenced by outliers and methodological concerns. After post hoc adjustments were made, the reliability of the estimate dropped from p=.003 to p=.054, suggesting that strong claims about the reliability of the estimate warrant some caution. Moreover, the correction of bias induced by cluster randomization reduced heterogeneity in the sample to zero, indicating that moderator analyses were unsuitable for this distribution. Nonetheless, publication bias seems unlikely for this distribution of outcomes. The magnitude of the mean effect size of .250 appears a little smaller than the effect sizes of cooperative learning on academic outcomes reported by
Finally, the main effects analysis of attitudinal outcomes indicates that peer-mediated learning is effective at promoting motivation and similar psychologically-oriented outcomes for ELLs. The mean effect size estimate (.419, SE=.194, p=.031) is large and statistically significant at α=.05. However, it appears likely that the estimate is affected by publication bias, thus the magnitude of the estimate may be larger than it would be if all studies conducted had been published. As it stands, the current mean effect size estimate is comparable to the magnitude of previous syntheses of cooperative learning, in general (
In conclusion, analysis of all four outcome types indicates that the answer to research question 1 is yes, peer-mediated learning is effective at promoting a number of learning outcomes for ELLs. In fact, the estimates tended to be quite large in comparison to other instructional approaches, suggesting that peer-mediated learning is especially effective for ELLs. That effects for language outcomes are larger than effects for academic outcomes is consistent with previous syntheses supporting the linguistic rationale for peer-mediated learning. On the other hand, a sociocultural theory of learning would explain the difference by arguing that academic learning is largely mediated by language, and thus, ELLs must learn the language of the content areas before they can master the academic content. However, it could simply be that the small sample of academic outcomes needs to include more studies to accurately capture the effectiveness of peer-mediated learning at promoting academic learning. Unfortunately, the design of this study is insufficient to definitively discern the correct answer, and these explanations remain largely speculative.
Nonetheless, the results of the first research question answer the call of the National Literacy Panel on Language-Minority Children and Youth to determine if the various aspects of effective instruction highlighted by qualitative research are individually effective: “…these factors need to either be bundled and tested experimentally as an intervention package or examined as separate components to determine whether they actually lead to improved student performance” (
Research Question 2: What variables in instructional design, content area, setting, learners, or research design moderate the effectiveness of peer-mediated learning for English language learners?
The second research question is intended to provide a more nuanced understanding of the answer to research question 1; essentially, the first question answers “What works?”, and the second question attempts to answer “For whom, and under what conditions?” The following section details the answers to a large number of specific hypotheses about the influence of particular moderators and concludes with a summarizing synthesis of the effects of moderators across outcome types.
Given ambivalence in the previous literature regarding the effectiveness of specific cooperative, collaborative, and peer tutoring approaches, Hypothesis 2a suggested that there would be no significant difference among the three peer-mediated constructs, and the results of moderator analyses across the three outcome types generally support this hypothesis. For oral and written language outcomes, Construct was not a significant predictor, and Construct only approached significance as a predictor for attitudinal outcomes. Notably, the effect size estimate for peer tutoring was very large (ES=1.525) for the attitudinal distribution, and it was based on only two studies. Thus, the fact that the moderator appeared nearly significant for this outcome distribution may very well reflect a larger-than-average estimate resulting from a very small sample of studies. Moreover, while peer tutoring provided the largest effect sizes in two of the three distributions (attitudinal and oral language), cooperative learning was the largest in written language outcomes, which was the distribution with the largest sample of included studies. Thus, even a qualitative analysis of the rank order of the three constructs suggests that no single version of peer-mediated learning was consistently more effective than the others. This affirms a theoretical orientation of this meta-analysis: from a sociocultural perspective, peer-mediated learning is effective in general because it is through mediated interaction that ELLs learn best. However, the fact that peer tutoring and cooperative learning are the two most structured forms of peer-mediated learning also lends tentative support to claims in the literature that high structure promotes the most learning (e.g.,
Hypothesis 2b claimed that the language setting, EFL or ESL, would not significantly moderate the effectiveness of peer-mediated learning for ELLs. Despite significant differences between the two types of settings (e.g., availability of native speakers and amount of exposure to the target language), both fields advocate the use of interactive methods, and consequently, a null hypothesis was forwarded. Empirical evidence across all three available outcome types suggests that the null hypothesis of no difference between EFL and ESL settings cannot be rejected. Setting was not a significant moderator for any of the outcome types; in fact, the moderator did not even approach significance for any of the distributions. Interestingly, mean effect sizes were actually larger in EFL settings across all three outcome types (i.e., oral language, written language, and attitudinal). This is surprising given that EFL settings provide less exposure to English input and fewer native-speaker models; however, it supports output models of second language acquisition (e.g.,
Hypothesis 2c posited no significant difference in the effectiveness of peer-mediated learning at different grade levels. To some extent, this is a participant-level question about the effectiveness of peer-mediated learning with students of different ages, but it is analyzed here as a setting-level moderator to reflect differences in pedagogy and instructional delivery associated with these various grade levels. In practice, this moderator addresses aspects of both setting and participant.
Results of moderator analyses across outcome types provide ambivalent support for this hypothesis. For oral language and attitudinal outcomes, Grade was not a significant moderator, though it was analyzed as different bivariate variables for oral outcomes (i.e., elementary vs. other) and attitudinal outcomes (elementary vs. high school) because of the availability of data in each distribution. However, for written language outcomes, which contained sufficient studies to analyze all three grade levels, Grade proved to be a significant moderator of effectiveness (Q=10.863, p=.004), mostly because the mean effect size was very low for middle school. In fact, middle school was consistently lower than elementary or high school estimates, suggesting that peer-mediated learning might not be as effective for middle school ELLs. This is markedly different from the general pattern for educational intervention studies, which tend to report larger effect sizes for middle school than either elementary or high school (
Hypothesis 2d could not be directly tested as a moderator in this meta-analysis because the sample of studies included only studies conducted in classrooms.
Hypothesis 2e posited no significant difference between interventions that were entirely peer-mediated (e.g., Jigsaw) and those for which peer-mediated learning was one component of a complex intervention (e.g., Bilingual Cooperative Integrated Reading Comprehension), and this moderator was intended to test a claim by Slavin that complex interventions like Success for All provide the greatest benefits (e.g.,
Hypothesis 2f posited no significant difference in the effectiveness of peer-mediated learning for students from differing language backgrounds. Due to limitations in the included sample and the reported data, and because culture and language interact in complex ways, student ethnicity was used as a proxy measure of language background. Moderator analyses for all three outcomes suggest that the null hypothesis of no significant difference cannot be rejected. In fact, this variable was tested in two different ways: Hispanic vs. Other and Asian vs. Other. A number of important limitations of these coding categories should be mentioned. First, neither Hispanic nor Asian is a monolithic category; each contains a wide diversity of language, cultural, and geographic variability. Secondly, comparing these two categories to all others faces the same limitation of masking important variability in language and cultural difference. However, these two categories were chosen because the included sample contained a particularly large number of Hispanic, or Spanish-speaking, participants; because Latinos are the largest group of ELLs in the United States and Asians the fastest growing; and because at least some research suggested peer-mediated learning may be ineffective for Asian students (e.g.,
Hypothesis 2g predicted no significant difference in the effectiveness of peer-mediated learning for students from high- or low-SES backgrounds, and moderator analyses across all three outcome types support this null hypothesis. Notably, SES was analyzed somewhat differently for written language outcomes (i.e., low vs other) than for oral language or attitudinal outcomes because of a lack of sufficient studies in the other two categories. Also, it is noteworthy that for all three outcome types, Unknown was the most frequently coded category, suggesting that findings are somewhat tentative and reflect a lack of careful reporting in the literature base.
Finally, Hypotheses 2h and 2i predicted a significant difference favoring high-quality studies. Specifically, 2h posited that high-quality studies (i.e., those that tested for pre-test differences AND adjusted for pre-test differences) would outperform medium- or low-quality studies, and moderator analyses for written language and attitudinal outcomes support this alternative hypothesis. However, study quality was not a significant predictor for oral language outcomes, and medium quality studies actually reported the highest average effect sizes. Thus, moderator analyses provide somewhat ambivalent support for Hypothesis 2h. Hypothesis 2i predicted a significant difference favoring higher-dosage studies (i.e., more total contacts) over lower-dosage studies, and moderator analyses across all three outcome types failed to support this hypothesis. Thus, the null hypothesis of no significant difference could not be rejected for the moderating influence of dosage.
Finally, another study quality moderator, for which there was no a priori hypothesis, proved important: post hoc researcher adjustment, which indicated that this researcher subtracted the pre-test mean from the post-test mean in order to control for unadjusted pre-test differences. This is the only moderator variable that proved significant for all three outcome types, and this finding indicates that not controlling for pre-test differences can have a very large impact on effect size estimates.
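The post hoc adjustment described here is essentially a difference-in-differences: each group's pre-test mean is subtracted from its post-test mean before the contrast is standardized. A minimal sketch with hypothetical means follows (the function name is illustrative, and the pooled SD would normally come from each study's reported statistics):

```python
def pretest_adjusted_g(pre_t, post_t, pre_c, post_c, sd_pooled):
    """Standardize the difference in pre-to-post gains rather than the
    raw post-test contrast, removing unadjusted pre-test differences."""
    return ((post_t - pre_t) - (post_c - pre_c)) / sd_pooled

# Treatment started 4 points ahead of control at pre-test, so the raw
# post-test contrast (75 - 70) / 10 = 0.5 overstates the effect; the
# gain-score contrast (7 - 6) / 10 = 0.1 removes the head start.
g_adj = pretest_adjusted_g(pre_t=68, post_t=75, pre_c=64, post_c=70,
                           sd_pooled=10.0)
```

This illustrates why unadjusted effect sizes can run much larger than adjusted ones whenever treatment groups begin with an advantage.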
Research Question 3: In what ways do select issues of power and equity impact the effectiveness of peer-mediated methods?
This third research question is intended to situate the more typical effectiveness findings just discussed within the equity-oriented statement of the problem presented in Chapter 1; that is, the intention of this research question is to expand the typical effectiveness questions of what works, for whom, and under what conditions to include equity-driven variables that the literature indicates are crucial for the academic success of ELLs. To that end, the following hypotheses examine the influence of a number of equity moderators; however, to be clear, the included variables are not exhaustive, nor does the operationalization of equity implicit in the selection of moderating variables represent the most complex conception of equity available. Rather, these are explorations of equity and of how equity-oriented variables may influence the effectiveness of a particular kind of instruction for ELLs.
Hypothesis 3a was an alternative hypothesis that predicted lower effect sizes for ELLs in settings where they are segregated from their peers. This hypothesis is complicated by the fact that many bilingual models intentionally segregate ELLs in order to provide extended, targeted language instruction. Nonetheless, exposure to native language peers offers linguistic, social, and academic advantages that motivate the prediction that ELLs will perform worse in segregated settings. Moderator analyses across the three outcome types offer mixed evidence. For oral language outcomes, segregation was a significant moderator, and ELLs demonstrated larger oral language gains in non-segregated settings, as predicted. Qualitative analyses of the written language and attitudinal distributions likewise indicate that non-segregated settings reported higher average effect sizes, which, taken with the significant effect for oral language outcomes, offers some tentative support for the hypothesis.
As indicated in Table
Similarly, Hypotheses 3c and 3e posited that higher quality teachers would result in more learning gains for ELLs, but very few studies actually reported this information, so formal moderator analyses to test these two hypotheses were not possible.
Hypothesis 3d, on the other hand, predicted that culturally-relevant instruction would lead to higher learning gains for ELLs. Again, very few studies reported this information, but because the coding was dichotomous and identified whether or not authors made even a cursory claim of cultural relevance, it was possible to code “no” even when authors did not report the information. Moderator analyses failed to support the hypothesis, however. For attitudinal outcomes, not one study claimed to be even slightly culturally relevant. For oral language and written language outcomes, qualitative analysis indicates that those studies claiming any cultural relevance actually reported lower effect sizes on average. Overall, the very low bar for coding studies as culturally relevant resulted in surprisingly few studies coded as such, indicating that very little can be said about the moderating effect of strong forms of culturally-relevant instruction on the effectiveness of peer mediation for ELLs.
Finally, Hypothesis 3f predicted that interventions using students’ native language would be more effective than those using only English. This represents an empirical test of the application of the largest literature base in equity-oriented effectiveness research for ELLs: five meta-analyses of the effectiveness of using students’ native language have consistently found that bilingual models outperform English-only models, and this hypothesis is intended to extend that finding to a particular instructional approach. As coded for these analyses, moderator analyses across all three outcome types consistently failed to support the assertion that using students’ native language produced larger effects than using only English. Notably, for all three outcome types, one study reported using students’ L1 exclusively (see Suppl. material
Overall, the hypotheses about the importance of equity demonstrate that effectiveness research continues to focus on academic and psychological factors to the exclusion of issues of power and equity. Very few studies reported sufficient information to code these variables, and consequently, the claims that could be tested or supported are relatively few and tentative. Despite these shortcomings, the analyses offer some support for claims that equity variables moderate the effectiveness of peer mediation for ELLs. For instance, segregation proved to be a significant moderator for oral language outcomes, and for all three outcome types, segregated settings produced smaller effect sizes than non-segregated settings. Similarly, effect sizes for all three outcome types were larger for interventions that used students’ native language for instruction.
Limitations and Future Directions
These findings consistently indicate that peer-mediated learning is effective for ELLs; nonetheless, there are a number of important limitations to consider. For instance, this meta-analysis is limited by the reporting in the original studies, and as discussed, many important variables were either excluded from formal analyses or modified in some way because of limitations in the extant literature base. Similarly, these findings are based on a modest sample of studies, and analyses of some outcome types were severely limited by sample size. Future research may benefit from a growing literature base. The lack of statistically significant moderators, for instance, likely reflects a lack of statistical power to detect practically meaningful differences rather than strong evidence that no difference actually exists. Future meta-analyses may benefit from the inclusion of additional studies that seem likely to be conducted, given the ongoing interest in cooperative learning research for ELLs indicated by the large proportion of recent studies included in this sample.
Furthermore, the inclusion of low- and medium-quality studies may influence the findings, and some certainly argue that only the highest-quality studies should be included in research syntheses. As argued, however, research on ELLs represents an emergent field, and much effort was made to analyze the influence of study quality on the effects reported in this meta-analysis. Of course, all secondary data analyses are limited by the quality of the data they analyze, and this limitation is hardly unique to this particular meta-analysis.
Another limitation common to meta-analyses was the availability of studies and data. Considerable effort was made to identify and retrieve the entire population of studies conducted on the effectiveness of peer mediation, but certainly, some studies were missed. Moreover, some studies deemed relevant and qualified were missing data; even after attempts to contact the authors, some studies were simply too old, and the original authors no longer had access to the data. Similarly, this meta-analysis is a product of its particular time, and its search tools (e.g., electronic databases and e-mail) are likely biased towards more recent research. Thus, the findings reported in this meta-analysis are limited by the availability of data, and missing data may affect the internal validity of the results, as well as the ability of the sample to accurately estimate general population parameters.
Finally, a number of variables of interest were operationalized in ways that reflected the availability of data or that allowed for reliable coding. However, these operationalizations likely simplified the constructs of interest (e.g., equity); consequently, the findings presented in this study may be of only limited use for those doing research within any one of these fields. Similarly, the expansion of certain constructs (e.g., ELL) to include multiple variables (e.g., ESL and EFL) may affect the generalizability of these findings.
Future research should examine other potential moderators, including setting variables (e.g., laboratory settings), instructional variables (e.g., task type), teacher variables (e.g., beliefs and attitudes), and student variables (e.g., social capital and student use of the L1) that are known to influence the effectiveness of peer-mediated methods and the learning of ELLs. Similarly, study quality variables (e.g., fidelity of implementation) were generally under-reported in this sample, and future research should examine the moderating influence these may exert on the mean effect size. Additionally, future research should explore in more detail the mechanisms that make peer-mediated learning effective for ELLs; for example, why does peer-mediated learning appear more effective at promoting language outcomes than academic outcomes? Clearly, more attention should be paid to important factors like the certification and experience of teachers, the adequacy of the facilities, and the length of residence or previous schooling of ELLs. The nearly complete absence of these data in the literature base for this study marks an unacceptable knowledge gap, especially given a clear literature base demonstrating the importance of these variables for ELLs.
This work was supported, in part, by Vanderbilt's Experimental Education Research Training (ExpERT) grant (David S. Cordray, Director; grant number R305B040110). I am grateful for the faith the ExpERT folks demonstrated in selecting me, and the financial and technical support and training they provided were instrumental in the completion of this study. In particular, Mark W. Lipsey, Director of the Peabody Research Institute, and David S. Cordray, Director of the ExpERT Program at Vanderbilt University, provided countless hours of guidance and support throughout my time at Vanderbilt.
I am also grateful for the support and guidance of my Dissertation Committee. To David Dickinson, thank you for the tough, insightful readings of my MAP and dissertation. You did exactly what I asked you to do. To Bridget Dalton, the kindness with which you tempered your feedback was always welcome, and your questions and comments always drove me to think deeply about some aspect of my work. To Mark Lipsey, your technical expertise was invaluable, and even when suggesting significant changes or advancing challenging critiques, you always made me feel that this work was important and valued. Mostly, to my advisor, mentor, and friend, Robert Jiménez, you have taught me more than I expected to be able to learn these last five years, and in the process you’ve become more than just another colleague to me. I am inspired by your example and more than grateful for all you’ve done to prepare me. I look forward to a long career as your colleague.
Finally, I am grateful beyond words for the love and support of my family. To my wife and daughter, you’ve sacrificed more hours than I care to remember in support of this accomplishment. Your love has been my refuge and my strength through these last five years, and I hope to share with you the fruits that your seeds of love have nurtured. To my parents, both by blood and by bond, your financial and emotional support made so much of this journey possible, and your own lives’ works are the models for the work I still hope to accomplish.
These are studies that were potentially relevant to the meta-analyses but were ultimately excluded during inclusion coding. Future researchers might find this list especially valuable.
English language learner is only one of many terms that refer to linguistically diverse students. Other terms, like limited-English proficient and language minority, convey deficiency-oriented or disempowering connotations.
It is worth noting that “peer-mediated learning” is sometimes used to refer to a more specific subset of these approaches, especially when used with students with learning disabilities (e.g.
The important theoretical issues raised in this meta-analysis are largely distinct from the questions analyzed and synthesized in the Major Area Paper to which this comment refers. However, the idea that sociocultural theory might prove heuristically useful is explored in this paper. Thus, little explanation for this bias is given here, and readers are encouraged to examine the evidence that warrants this presumption.
Nonetheless, this meta-analysis primarily employs the term effectiveness to emphasize the ability of peer-mediated approaches to improve outcomes for ELLs on discrete measures or instruments, even when those measures assess constructs like out-group relations.
Notably, this is the same research that informed the historic Brown v Board decision that created the legal foundation for the desegregation, if not integration, of public schools in the United States.
The authors actually report the inverse-variance adjustment for small samples as d+, but it is based on Hedges’ original work and is more commonly referred to as Hedges’ g; as such, figures are reported here as g.
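For readers unfamiliar with this small-sample adjustment, the standard conversion from Cohen’s d to Hedges’ g can be sketched as follows. This is an illustrative sketch only, not drawn from any study in the sample; the function name and the example values are hypothetical.

```python
def hedges_g(d, n1, n2):
    """Apply the small-sample correction to Cohen's d, yielding Hedges' g.

    The correction factor J = 1 - 3 / (4 * df - 1), with df = n1 + n2 - 2,
    shrinks d slightly to remove its upward bias in small samples.
    """
    df = n1 + n2 - 2
    j = 1 - 3 / (4 * df - 1)
    return j * d

# With two groups of 10, the correction shrinks d = 0.50 by roughly 4%:
print(round(hedges_g(0.50, 10, 10), 3))  # -> 0.479
```

As the sample sizes grow, J approaches 1 and g converges to d, which is why the distinction matters mainly for the small studies common in this literature base.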
It is important to distinguish this assertion from a deficit view of ELLs. Asserting that English proficiency is a barrier to mainstream instruction is not intended to be equivalent to an assertion that ELLs are deficient learners. All ELLs come to school proficient in at least one language, and many are proficient in several. Rather, like the landmark ruling in Lau v Nichols, the assertion is intended to indicate that most instruction in the US is provided in English by monolingual, White teachers; and without affirmative efforts to make the curriculum accessible to ELLs, these students do not generally have a chance to succeed in most US classrooms.
While theoretically distinct, the more individualistic and cognitive orientations (e.g., traditional second language acquisition interaction and cooperative learning) and the more socially-oriented (e.g., sociocultural theory) perspectives share conceptual common ground. Thus, although the theoretical differences are acknowledged, the assertion of a conceptual common ground enables the inclusion of studies from all three theoretical orientations.
While there was some discussion of second language and foreign language differences in the results, the authors report too few FL settings to make substantial claims.