
11 Experimental and quasi-experimental designs

John Rogers and Andrea Révész

Introduction

Researchers within the field of applied linguistics have long used experiments to investigate cause–effect relationships regarding the use and learning of second languages (L2s). In experimental research, one or more variables are altered and the effects of this change on another variable are examined. This change or experimental manipulation is usually referred to as the treatment. Researchers typically draw upon either experimental or quasi-experimental research designs to determine whether there is a causal relationship between the treatment and the outcome. This chapter outlines key features and provides examples of common experimental and quasi-experimental research designs. It also makes recommendations for how experimental designs might best be applied and utilized within applied linguistics research.

Experimental and quasi-experimental research

Experimental and quasi-experimental research designs examine whether there is a causal relationship between independent and dependent variables. Simply defined, the independent variable is the variable of influence and the dependent variable is the variable that is being influenced (Loewen & Plonsky, 2016). In other words, the independent variable is expected to bring about some variation or change in the dependent variable. For example, in a study examining the impact of oral corrective feedback on grammatical development, corrective feedback will serve as the independent variable and grammatical development as the dependent variable. Moderating variables are another type of variable often of interest in experimental and quasi-experimental research. Moderating variables are defined as variables that modify the relationship between an independent variable and a dependent variable. If the previous study of corrective feedback also investigates how working memory may influence the extent to which learners benefit from feedback (e.g., Révész, 2012a), working memory will function as a moderating variable in the design.

Non-experimental designs can also be used to investigate cause–effect relationships between independent and dependent variables, but there are a number of defining features that mark true experimental research. True experiments involve the manipulation of one or more independent variables, and the dependent variables are carefully measured, typically in the form of pre- and posttesting. True experiments also include a control group and an experimental group. The control group only takes part in the pre- and posttesting, whereas the experimental group receives the experimental treatment in addition to completing the pre- and posttesting. Finally, true experiments are characterized by random assignment; that is, participants are randomly placed into the control and the experimental condition following a chance procedure (Gravetter & Forzano, 2018; Hatch & Lazaraton, 1991; Kirk, 2009; Loewen & Plonsky, 2016; Nunan, 1992).
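The chance procedure behind random assignment can be sketched in a few lines of code. The following Python fragment is an illustrative sketch only, not taken from any study cited here; the participant labels and condition names are invented for the example.

```python
import random

def randomly_assign(participants, conditions=("control", "experimental"), seed=None):
    """Place participants into conditions following a chance procedure."""
    rng = random.Random(seed)          # seeded only to make the example reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)              # every ordering of participants is equally likely
    groups = {condition: [] for condition in conditions}
    for i, participant in enumerate(shuffled):
        # deal shuffled participants out round-robin so group sizes stay balanced
        groups[conditions[i % len(conditions)]].append(participant)
    return groups

# hypothetical pool of 20 learners
groups = randomly_assign([f"P{i:02d}" for i in range(1, 21)], seed=7)
```

Because assignment follows the shuffle, any preexisting differences among participants are distributed by chance rather than by selection; in a real study the seed would of course not be fixed.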

The main feature that distinguishes non-experiments from true experiments is the lack of random assignment. Quasi-experiments are a subtype of non-experiments that attempt to mimic randomized, true experiments in rigor and experimental structure but lack random assignment (Cook & Wong, 2008; Kirk, 2009). Quasi-experimental studies do not require a true control group, but may include a comparison group. A comparison group is an additional experimental group that receives a different experimental treatment. Non-experiments may also take the form of pre-experimental designs. Pre-experimental designs use neither a control nor a comparison group (Nunan, 1992). As such, experimental and quasi-experimental designs allow researchers to draw more unambiguous conclusions as to the causal relationship between two variables (Marsden & Torgerson, 2012).

The quality of experimental research is usually considered in terms of its reliability and validity. Reliability refers to the extent to which a measurement or an experimental procedure elicits consistent interpretations about the construct that it sets out to measure (Norris & Ortega, 2003). The reliability of an experimental study may suffer due to various sources of random error in measurement, including issues to do with the context, data collection procedures, characteristics of the instruments, analytical procedures, and participant idiosyncrasies (Norris & Ortega, 2003). Reliability is considered a prerequisite for validity but does not guarantee it. Validity refers to the soundness of a study (Loewen & Plonsky, 2016); that is, the degree to which the results of a study accurately answer the question that it set out to answer (Gravetter & Forzano, 2018; Révész, 2012b). Any aspect of the experiment that raises doubts as to whether the results have led to accurate and meaningful interpretations threatens the validity of the research. There are many types of validity that a researcher may wish to take into consideration when designing a research project (see Loewen & Plonsky, 2016; Mackey & Gass, 2016; Shadish, Cook, & Campbell, 2002 for an overview), two of which, internal and external validity, are of particular relevance in this chapter.

Internal validity relates to the design of the study and captures the extent to which the manipulations in the independent variable(s) (e.g., the presence/absence of treatment, different types of treatments) are responsible for the observed changes in the dependent variable. A study can claim internal validity if the results can only be explained by the independent variable, whereas a study lacks internal validity if the results may have been influenced by factors other than the independent variable. Any extraneous factor that may allow for an alternative explanation poses a threat to internal validity. Threats to the internal validity of a study may be external, such as a coincidental outside event that influences the results, or internal, including factors to do with the soundness of the research design and procedures (Campbell, 1957; McCleary, McDowall, & Bartos, 2017; Shadish et al., 2002). Steps to help ensure internal validity include careful sampling, thorough piloting of instruments and procedures, adherence to the experimental procedure, and accurate data analysis (Loewen & Plonsky, 2016; Mackey & Gass, 2016; Shadish et al., 2002).

External validity refers to the degree to which the results of a particular study hold true outside of the particular study; that is, the extent to which the results are generalizable. The generalizability of a study can be considered from various perspectives: whether the results are generalizable from the research participants to the wider population, from one research study to another, and from the research study to a real-world situation. External validity should not be assumed and is best controlled through replication (see Marsden, this volume; also McCleary et al., 2017; Porte, 2012; Porte & McManus, 2019; Shadish et al., 2002).

It is widely acknowledged that in experimental research there is a constant tension between internal and external validity (Chaudron, 2003; Hulstijn, 1997). For example, psycholinguistic studies typically involve tightly controlled experimental conditions to eliminate or minimize the effects of potential confounding variables (Hulstijn et al., 2014). However, as a result of emphasizing control, the experimental conditions may become so artificial and unnatural that they no longer resemble how language is used and learned in the real world, thus reducing external validity. Despite this tension, all experimental studies should strive to maximize both internal and external validity by striking a balance between sound study design and generalizability (Gravetter & Forzano, 2018).

Common research designs in experimental and quasi-experimental research

When deciding upon an experimental design, there are a number of questions that researchers need to consider to ensure that the internal and external validity of the study are optimized. These include reflecting on the type of variables studied, the number of independent variables investigated, the absence or presence of pretesting, the number of treatment sessions required, and the size and nature of the sample to be selected. Each design option has its pros and cons, thus researchers inevitably need to make compromises in the decision-making process (Mackey & Gass, 2016). In the sections to follow, we introduce five common research designs used within experimental and quasi-experimental research, highlighting their advantages and limitations with a view to helping researchers select designs that are best suited to address their research questions, while also taking into account constraints related to practicality and feasibility.

Pretest–posttest design

The pretest–posttest control group design is probably the most common experimental research design (Cook & Wong, 2008). In this design, the experimental group takes part in some type of treatment or intervention (marked by X in Table 11.1), which can consist of single or multiple training sessions. The design also includes a pretest and a posttest, in which both the experimental and control groups participate. The purpose of the pretest is to ensure the comparability of the two groups prior to the treatment, whereas the posttest allows the researchers to determine the immediate effects of the treatment on the outcome variable(s). In addition to the pretest and immediate posttest, a delayed posttest or posttests are often included to examine the effects of the treatment over the longer term. The inclusion of the control group enables researchers to determine whether any observed changes from the pretest to the posttest in the experimental group are the result of the experimental treatment or can be attributed to other influences such as testing effects or maturation. As both the experimental and the control group take the tests at the same time, time-related confounds are minimized (Gravetter & Forzano, 2018).

Table 11.1 Pretest–posttest control group design

Experimental group   O   X   O
Control group        O       O


There are several considerations when designing the testing sessions. Regarding timing, it has been recommended that the pretest be administered a minimum of one week prior to the treatment session (Hulstijn, 2003) to decrease the likelihood that the effects of the treatment are confounded by testing effects that may arise from completing the pretest. The immediate posttest is typically administered immediately following the treatment phase of the experiment. The timing of the delayed posttests varies: delayed posttests can be administered one week, one month, or even several months following treatment. In terms of content and procedures, each testing session (pretest, posttest, and delayed posttest) should be comparable. Within a testing session, single or multiple outcome measures may be employed. While single outcome measures are more practical to administer, the use of multiple outcome measures, if carefully selected, is likely to provide a fuller picture of second language development (e.g., Webb, 2005). An example of a study employing a pretest–posttest design is provided next.

Example 1: a pretest–posttest design

Experiment: Peters and Webb (2018, Experiment 1) utilized an experimental pretest–posttest design to examine the effect of TV viewing on the incidental learning of L2 vocabulary.
Independent variable: viewing versus not viewing L2 television
Dependent variable: form recognition and meaning recall of L2 vocabulary
Design: the participants, Dutch learners of L2 English, were randomly assigned to either a true control group (n = 27) or an experimental group (n = 36). The experiment consisted of three sessions: a pretesting session (one week prior to treatment), the treatment session, and a posttesting session (administered one week following treatment). The control group took part in the testing sessions only. The experimental group, in addition to completing the pretest and posttest, participated in a treatment that included viewing a TV program.

Despite its utility and practicality, there are some limitations to the pretest–posttest design. A main issue is that the pretest may sensitize participants to the focus of the experiment, and this, in turn, may influence the results. To give an example, if participants notice that the pretest assesses their vocabulary knowledge, they might be inspired to pay more attention to vocabulary during the treatment. One way to control for this possibility is to include distractor items in the tests. This, however, has the obvious practical disadvantage of prolonging the length of the testing sessions. Another potential threat to the validity of this design is that participants in the control and experimental groups may communicate about the study outside the experiment, which might also contaminate the findings. Finally, a pretest–posttest design can provide only a limited picture of the L2 learning process. Longitudinal designs, such as the time-series design, are more suitable for capturing the effects of longer-term treatments on L2 development.

Time-series design

A time-series design is an example of a longitudinal design in which researchers collect samples of language on a regular basis over a set period (Kirk, 2009; Mellow, Reeder, & Forster, 1996). By collecting data on multiple occasions, time-series designs can allow insight into the time course of language development, including changes that may be immediate, gradual, delayed, incubated, or residual (Mellow et al., 1996; Mellow, 2012), as well as the permanency of any effects resulting from a treatment. A time-series design is characterized by multiple observations both before and after the treatment. The number of pretreatment and posttreatment observations can vary, and there is no need to have the same number of observations pre- and posttreatment (Kirk, 2009). The treatment may entail a single or multiple treatment sessions. Whether involving a single or multiple trainings, the treatment can vary in length, from brief to extended sessions. Table 11.2 provides an illustration of a time-series design, with a single treatment and eight observations, four before the treatment and four after the treatment. An example time-series design by Ishida (2004) is described next.

Example 2: a time-series design

Experiment: Ishida (2004) utilized a time-series design to investigate the impact of recasting on development in the use of the Japanese te-i-(ru) construction.
Independent variable: presence versus absence of recasting
Dependent variable: accuracy in the use of the Japanese te-i-(ru) construction, as reflected in accuracy rates during oral performance
Design: the participants were four learners of L2 Japanese, who took part in eight 30-minute one-on-one conversation sessions. The first two sessions served as the pretest, the middle four as the treatment, and the last two as the posttest. Two participants also participated in a delayed posttest seven weeks after the last posttest. The treatment involved providing recasts in response to errors in the use of the Japanese te-i-(ru) construction.

The use of multiple pre- and posttests in time-series designs is instrumental in increasing the internal validity of the findings. The multiple pretests enable researchers to test whether there are any trends in the data before the treatment session. If trends are observed prior to the treatment, this indicates that the posttest scores might be influenced by factors other than the treatment, such as testing effects, fatigue, and maturation (Gravetter & Forzano, 2018). Similarly, the multiple posttests make it possible to obtain a richer account of L2 development than a single posttest would allow for. It is possible, for instance, that a treatment only has a temporary effect that fades over time (Mackey & Gass, 2016), which can only be captured if multiple posttests are included in the design.

Time-series designs, however, fare less well in terms of external validity. Due to the larger number of observations and the richer analysis of language development they make possible, time-series designs usually include a smaller number of participants than quantitative designs with fewer observational points. This inevitably has a negative impact on the generalizability of the findings to the wider population.
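The trend check that multiple pretests make possible can be illustrated with a small computation. The sketch below fits a least-squares slope to a series of pretest scores; the score values are invented for illustration, and in practice researchers would use a statistical package rather than hand-rolled code.

```python
def pretreatment_slope(scores):
    """Least-squares slope of scores across observation points O1, O2, ...
    A slope near zero suggests no preexisting trend (e.g., testing effects
    or maturation) before the treatment is introduced."""
    n = len(scores)
    xs = list(range(1, n + 1))
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

stable = [52.0, 51.0, 53.0, 52.0]    # hypothetical pretest scores with no drift
drifting = [50.0, 54.0, 58.0, 62.0]  # hypothetical scores rising 4 points per test
```

A clearly non-zero slope across the pretreatment observations, as in the second series, would warn the researcher that posttest gains cannot be attributed to the treatment alone.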

Table 11.2 Time-series design with a single treatment

Experimental group        O1 O2 O3 O4 X O5 O6 O7 O8
Control/comparison group  O1 O2 O3 O4   O5 O6 O7 O8

Latin-square design

A Latin-square design is frequently used within experiments that utilize multiple data collection instruments. This design can be traced back to Fisher (1925); it gets its name from an ancient puzzle concerned with the number of ways that Latin letters can be arranged in a square matrix so that each letter appears once in each row and once in each column (Kirk, 2009). A Latin square is a table with the same number of rows and columns that can be used to counterbalance data collection instruments and to help control against test- and task-order effects (see Richardson, 2018 for a recent review). Simply put, in a Latin-square design, the ordering of instruments (e.g., tests or tasks) is different for various participants or groups of participants. For instance, Lambert, Kormos, and Minn (2017) used a Latin-square design to investigate the effects of task repetition on L2 oral fluency. Participants carried out four different tasks: three monologue tasks and an opinion dialogue task. To make sure that the order of the tasks did not influence the results, the participants were randomly assigned to four groups. Each group completed the four tasks in a different order following a Latin-square design, as shown in Table 11.3. Latin squares are also commonly employed when multiple versions of tests are included in a study. For example, to avoid practice effects, studies with pretest–posttest–delayed posttest designs often use three versions of all testing instruments, and these are typically administered in a Latin-square design across participants in the testing sessions. Of course, besides counterbalancing instruments, Latin-square designs can be applied in studies with the primary goal of examining task- or test-order effects.
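A balanced ordering of this kind is straightforward to generate. The sketch below builds a cyclic Latin square from a list of tasks; with the four task types from Lambert et al. (2017) it produces orderings of the kind shown in Table 11.3, though the code itself is our illustration rather than the authors' procedure.

```python
def latin_square(items):
    """Cyclic Latin square: row g is the item list rotated by g positions,
    so each item occupies every serial position exactly once across groups."""
    n = len(items)
    return [[items[(g + pos) % n] for pos in range(n)] for g in range(n)]

tasks = ["Instruction monologue", "Narration monologue",
         "Opinion monologue", "Opinion dialogue"]
orders = latin_square(tasks)  # one task order per group of participants
```

Because each task appears once per row and once per column, no task is systematically advantaged or disadvantaged by its position in the session.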

Repeated-measures design

Repeated-measures designs, also known as within-participants designs, are characterized by a single group of participants who take part in all the different treatment conditions and/or are measured at multiple times (Abbuhl & Mackey, 2017; Gravetter & Forzano, 2018). In a within-participants design, the participant is subjected to all levels of the independent variable. This design derives its name from the fact that it involves 'repeated' measurements of the same participant. Within-participants designs differ from between-participants designs, where the treatment conditions are assigned to different groups of participants; that is, different participants are tested on the various levels of the independent variable. Lambert et al.'s (2017) study, presented earlier, also constitutes an example of a repeated-measures design, as all participants completed all four tasks. Another example of a study adopting a repeated-measures design, Rogers and Cheung (2018), is given next.

Example 3: a pretest–posttest within-participants design

Experiment: Rogers and Cheung (2018) investigated the impact of spacing on L2 vocabulary learning in an authentic classroom setting.
Independent variable: temporal spacing of treatment sessions (1 day versus 8 days)
Dependent variable: learning of English adjectives, measured by performance on a multiple-choice picture identification task
Design: the participants were Cantonese primary school students of L2 English in four different intact classes. They were taught half of the target vocabulary items under spaced-short conditions (one day between treatment sessions) and half of the items under spaced-long conditions (eight days between treatment sessions). The items were counterbalanced across the two treatment conditions. All participants took part in the pretest and posttest as well as the treatment.

In this study, rather than assigning each of the four participating classes to a different experimental condition, the researchers manipulated the independent variable within participants; that is, each class studied half of the target items under one experimental condition and the other half under another experimental condition.

Table 11.3 Example of a Latin-square design

Group  Task order
1      Instruction monologue, Narration monologue, Opinion monologue, Opinion dialogue
2      Narration monologue, Opinion monologue, Opinion dialogue, Instruction monologue
3      Opinion monologue, Opinion dialogue, Instruction monologue, Narration monologue
4      Opinion dialogue, Instruction monologue, Narration monologue, Opinion monologue

Source: Lambert et al., 2017

There are several advantages and disadvantages associated with the use of repeated-measures designs. This type of design is advantageous in that it helps control for potential confounds, such as class effects and individual differences between learners, which might arise from the lack of randomized assignment in quasi-experimental research or low group sizes in true experimental studies. Given that the different measurements come from the same individuals, group equivalence can automatically be assumed. An additional benefit of repeated-measures designs is that fewer participants are needed to attain sufficient power, as compared to between-participants designs. A disadvantage is that repeated-measures designs may be affected by order effects; that is, the results might, at least in part, be attributed to the order in which the different types of treatment conditions are administered rather than the difference in the conditions themselves. For example, results may deteriorate due to fatigue and boredom or improve as a result of more practice and task familiarity. Such order effects may be reduced by counterbalancing treatment conditions across participants (Rogers, 2017), for example, through adopting a Latin-square design.
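Counterbalancing items across conditions, as in the Rogers and Cheung (2018) example, can be sketched as follows. The condition names echo that study's short and long spacing conditions, but the splitting rule and the word list are invented for illustration.

```python
def counterbalance_items(items, condition_a="spaced_short", condition_b="spaced_long"):
    """Split items into two halves and swap the halves between two groups,
    so every item is studied under both conditions across the sample."""
    half_one = items[0::2]  # alternate items so the two halves stay comparable
    half_two = items[1::2]
    group_1 = {condition_a: half_one, condition_b: half_two}
    group_2 = {condition_a: half_two, condition_b: half_one}
    return group_1, group_2

# hypothetical target adjectives
words = ["tiny", "rapid", "fierce", "gloomy", "sturdy", "vivid"]
group_1, group_2 = counterbalance_items(words)
```

With the halves swapped between groups, any difference in item difficulty is spread evenly across the two spacing conditions rather than confounded with one of them.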

Factorial design

Factorial designs include more than one independent variable; that is, factorial designs are employed to investigate the effects of two or more independent variables on the dependent variable. The independent variables in a factorial design are also referred to as factors. Factorial designs allow researchers to examine not only the impact of each independent variable separately but also the combined effects of the independent variables on the dependent variable. The separate effects of the independent variables are described as main effects and their combined effects are referred to as interaction effects. In factorial designs, a notation system is used to denote the number of levels associated with each independent variable. For instance, in a 2 × 3 design, there are two independent variables or factors: the first factor has two levels and the second factor has three. Factorial designs can include between-participants or within-participants factors only or can combine between- and within-participants factors. Factorial designs that include both between-participants and within-participants factors are usually described as mixed factorial designs.

Zalbidea (2017) provides a recent example of a study utilizing a factorial design. The researcher employed a mixed 2 × 2 factorial design to examine the impact of task complexity and modality on L2 performance. The two independent variables were task complexity, a within-participants factor, and modality, a between-participants factor. As shown in Table 11.4, each of the two independent factors had two levels (task complexity: simple versus complex; modality: written versus spoken). Task complexity was counterbalanced across participants to avoid order effects. Through adopting a factorial design, Zalbidea was not only able to examine the impact of modality and task complexity independently but also to tease out how these independent factors interacted in influencing task performance.
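The cells of such a design can be enumerated mechanically. The sketch below lays out the 2 × 2 cells and the counterbalanced within-participants orders following the description of Zalbidea (2017) above; the lowercase level names are our own shorthand, not the study's labels.

```python
from itertools import product

# Factor levels, following the description in the text
modality = ["written", "spoken"]      # between-participants factor
complexity = ["simple", "complex"]    # within-participants factor

# The four cells of the 2 x 2 design: every combination of factor levels
cells = list(product(modality, complexity))

# Counterbalanced task orders for the within-participants factor
task_orders = [("complex", "simple"), ("simple", "complex")]

# Each participant group pairs one modality with one task order,
# yielding four groups, as in the Table 11.4 layout
participant_groups = list(product(modality, task_orders))
```

Crossing the factors in this way is what makes both main effects and the modality-by-complexity interaction estimable from the same data.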


Considerations when designing an experiment

A full discussion of all the decisions to be made when designing an experimental or quasi-experimental study is beyond the scope of the current chapter (see Mackey & Gass, 2016, for a fuller discussion). However, there are several key considerations that we would like to highlight here.

Assignment of participants to experimental conditions

Randomized experimental designs are considered the gold standard for research investigating causal relationships (Cook & Wong, 2008). As such, randomized assignment is preferred over non-randomization in that it eliminates systematic differences that may preexist among groups (Kirk, 2009; Plonsky, 2017). It is not surprising, therefore, that in some research domains non-randomized designs have systematically been shown to result in smaller effect sizes than experimental research, presumably due to extraneous factors that are less closely controlled in the absence of randomization (e.g., Bloom, Michalopoulos, & Hill, 2005). However, in applied linguistics research, random assignment is not always possible due to reasons of practicality and/or ethical concerns. Further, randomization might not be appropriate given the objectives of the research. For instance, instructed second language acquisition researchers often wish to trial instructional interventions in authentic learning environments involving the use of intact classes. Clearly, the lack of random assignment in such cases may open the door to potential confounds that can limit the internal validity of the study. However, the resulting threats to internal validity may be offset by the enhanced ecological validity afforded by conducting research in a context that closely resembles the natural classroom environments to which the results are meant to be generalized (Mackey, 2017). To conclude, when deciding whether or not to randomize participant assignment, researchers need to carefully consider the objectives of the study, while taking account of potential practical constraints and ethical issues.

Table 11.4 Example of a mixed 2 × 2 factorial design, based on Zalbidea (2017)

                             Written modality (N = 16)        Spoken modality (N = 16)
Order of task performance    Complex task, then Simple task   Complex task, then Simple task
                             Simple task, then Complex task   Simple task, then Complex task

Control or comparison group

Another key consideration is whether to include a true control group or a comparison group in quasi-experimental research. While the use of a control group is generally recommended, it is often not possible to include a true control group in quasi-experimental research for practical or ethical reasons (e.g., Mackey & Gass, 2016; Plonsky, 2017). It is also worth noting that, in some circumstances, the inclusion of a comparison group might, in fact, be the preferred option. For instance, as mentioned earlier, when researchers investigate the impact of a particular instructional intervention, they may decide that intact classes constitute the most ecologically valid setting for the research to take place (see Mackey & Gass, 2016; Mackey, 2017; Plonsky, 2017 for discussions). In this case, a comparison group, engaged in normal classroom instruction, may serve as the best baseline for the experimental condition. Using a comparison rather than a control group might also offer advantages in some experimental contexts. For instance, Hamrick and Sachs (2018) have argued that the use of a trained control (i.e., comparison) group rather than a true control group may help control for hidden bias among participants in experimental SLA research utilizing artificial language systems.

Controlling for extraneous variables

The hallmark of experimental and quasi-experimental designs is the use of strict experimental control to maintain the internal validity of the findings. As such, researchers should take care to control for extraneous variables and to document how they have done so when reporting their research. Researchers can help guard the internal validity of their research design in several ways. Some key methods include employing random assignment to avoid selection bias, using a control and/or a comparison group to control for the effects of testing, using multiple pre- and posttests to assess preexisting trends and gain a fuller picture of longer-term treatment effects, establishing that test versions designed to be parallel are indeed comparable, piloting instruments and procedures, and reducing test- and task-order effects.

Reporting

Finally, it is also worth considering what details to include when writing up an experimental study. A general rule of thumb is that the description of the methodology should be sufficiently detailed to enable replication. To achieve this, it is essential to include details about the sampling procedures, the sample, the number and timing of the treatment and testing sessions (both their duration and the amount of time between sessions), the instruments used in the treatment and testing sessions, and the steps and procedures followed. It is also important to highlight how potential extraneous variables were controlled for. Although the importance of detailed reporting is widely acknowledged in the field of applied linguistics, crucial methodological details are often left unaccounted for in published research. For example, published research studies often do not include information about the number and length of treatment sessions and the amount of time separating them. Given that the frequency and duration of treatment sessions and the interval between them have been shown to influence learning and retention (Rogers, 2017), it is recommended that researchers include such details when writing up reports on experimental research.

Conclusion

This chapter has reviewed basic concepts in experimental and quasi-experimental research and outlined a number of experimental designs that are commonly used in the field of applied linguistics. Throughout the chapter, we have largely focused on internal validity: in particular, how the architecture of experimental designs can help control for confounding variables that might otherwise prevent an unambiguous interpretation of the findings. By way of conclusion,


however, we would like to emphasize the importance of striking a balance between internal validity and external validity; that is, the degree to which the results of a study generalize beyond the context in which the study took place. To achieve this, we would recommend that researchers, at the planning stage of an experiment, consider the contexts that their research aims to generalize to, and validate their experimental materials and procedures with respect to these contexts. For instance, research that aims to generalize to L2 classrooms might begin by examining the instructional practices and learning behaviors that are characteristic of these learning environments. These data might then inform the materials and procedures of the experiment (Lightbown & Spada, 2019). By undertaking such considerations, researchers could avoid overly artificial and/or arbitrary experimental manipulations. We hope that the descriptions, discussion, and examples provided here will help applied linguists to reach a good balance between internal and external validity by deepening their understanding of the tools and methods available in experimental and quasi-experimental research.

References

Abbuhl, R., & Mackey, A. (2017). Second language acquisition research methods. In K. King & N. H. Hornberger (Eds.), Research methods in language and education (3rd ed., pp. 183–193). New York, NY: Springer.
Bloom, H. S., Michalopoulos, C., & Hill, C. (2005). Using experiments to assess nonexperimental comparison-group methods for measuring program effects. In H. S. Bloom (Ed.), Learning more from social experiments (pp. 173–235). New York, NY: Russell Sage Foundation.
Campbell, D. (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 54, 297–312.
Chaudron, C. (2003). Data collection in SLA research. In C. J. Doughty & M. H. Long (Eds.), The handbook of second language acquisition (pp. 762–828). Malden, MA: Wiley-Blackwell.
Cook, T., & Wong, V. (2008). Better quasi-experimental practice. In P. Alasuutari, L. Bickman, & J. Brannen (Eds.), The Sage handbook of social research methods (pp. 134–164). London: Sage.
Fisher, R. (1925). Statistical methods for research workers. London: Oliver & Boyd.
Gravetter, F., & Forzano, L. (2018). Research methods for the behavioral sciences. Boston: Cengage.
Hamrick, P., & Sachs, R. (2018). Establishing evidence of learning in experiments in artificial linguistic systems. Studies in Second Language Acquisition, 40(1), 153–169.
Hatch, E., & Lazaraton, A. (1991). The research manual: Design and statistics for applied linguistics. Boston: Heinle & Heinle.
Hulstijn, J. H. (1997). Second language acquisition research in the laboratory: Possibilities and limitations. Studies in Second Language Acquisition, 19(2), 131–143.
Hulstijn, J. H. (2003). Incidental and intentional learning. In C. J. Doughty & M. H. Long (Eds.), The handbook of second language acquisition (pp. 349–381). Malden, MA: Blackwell.
Hulstijn, J. H., Young, R. F., Ortega, L., Bigelow, M., DeKeyser, R., Ellis, N. C., & Talmy, S. (2014). Bridging the gap: Cognitive and social approaches to research in second language learning and teaching. Studies in Second Language Acquisition, 36(3), 361–421.
Ishida, M. (2004). Effects of recasts on the acquisition of the aspectual form -te i-(ru) by learners of Japanese as a foreign language. Language Learning, 54, 311–394.
Kirk, R. (2009). Experimental design. In R. E. Millsap & A. Maydeu-Olivares (Eds.), The Sage handbook of quantitative methods in psychology (pp. 23–45). London: Sage.
Lambert, C., Kormos, J., & Minn, D. (2017). Task repetition and second language speech processing. Studies in Second Language Acquisition, 39(1), 167–196.
Lightbown, P., & Spada, N. (2019, March). In it together: Teachers, researchers, and classroom SLA. Plenary presented at the annual meeting of the American Association of Applied Linguistics, Atlanta, Georgia, USA.
Loewen, S., & Plonsky, L. (2016). An A–Z of applied linguistics research methods. London: Palgrave Macmillan.
Mackey, A. (2017). Classroom-based research. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 541–561). New York, NY: Routledge.


Mackey, A., & Gass, S. (2016). Second language research: Methodology and design. New York, NY: Routledge.
Marsden, E., & Torgerson, C. J. (2012). Single group, pre- and post-test research designs: Some methodological concerns. Oxford Review of Education, 38, 583–616.
McCleary, R., McDowall, D., & Bartos, B. (2017). Design and analysis of time series experiments. Oxford: Oxford University Press.
Mellow, J. (2012). Time series. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics (pp. 1–5). West Sussex, UK: Wiley-Blackwell.
Mellow, J. D., Reeder, K., & Forster, E. (1996). Using the time-series design to investigate the effects of pedagogic intervention on SLA. Studies in Second Language Acquisition, 18, 325–350.
Norris, J., & Ortega, L. (2003). Defining and measuring SLA. In C. Doughty & M. Long (Eds.), Handbook of second language acquisition (pp. 717–761). Malden, MA: Wiley-Blackwell.
Nunan, D. (1992). Research methods in language learning. Cambridge: Cambridge University Press.
Peters, E., & Webb, S. (2018). Incidental vocabulary acquisition through viewing L2 television and factors that affect learning. Studies in Second Language Acquisition, 40, 551–577.
Plonsky, L. (2017). Quantitative research methods. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 505–521). New York, NY: Routledge.
Porte, G. (Ed.). (2012). Replication research in applied linguistics. Cambridge: Cambridge University Press.
Porte, G., & McManus, K. (2019). Doing replication research in applied linguistics. New York, NY: Routledge.
Révész, A. (2012a). Working memory and the observed effectiveness of recasts on different L2 outcome measures. Language Learning, 62(1), 93–132.
Révész, A. (2012b). Coding second language data validly and reliably. In A. Mackey & S. M. Gass (Eds.), Research methods in second language acquisition: A practical guide (pp. 203–221). Oxford: Wiley-Blackwell.
Richardson, J. T. E. (2018). The use of Latin-square designs in educational and psychological research. Educational Research Review, 24, 84–97.
Rogers, J. (2017). The spacing effect and its relevance to second language acquisition. Applied Linguistics, 38(6), 906–911.
Rogers, J., & Cheung, A. (2018). Input spacing and the learning of L2 vocabulary in a classroom context. Language Teaching Research [online first 15 October 2018].
Shadish, W., Cook, T., & Campbell, D. (2002). Experimental and quasi-experimental designs for generalized causal inference. New York, NY: Houghton Mifflin.
Webb, S. (2005). Receptive and productive vocabulary learning: The effects of reading and writing on word knowledge. Studies in Second Language Acquisition, 27(1), 33–52.
Zalbidea, J. (2017). One task fits all? The roles of task complexity, modality, and working memory capacity in L2 performance. The Modern Language Journal, 101(2), 335–352.

... Because the students came from a range of grade levels and from different schools across several cities, it is not known what level of instruction the students had prior to the year in which this experiment was conducted. However, the use of a pre-survey does provide a snapshot of student understanding and perception of the topic prior to the experiment, increasing attributability of the post-survey results (Rogers and Révész 2020). The author of this study did not interact with students before or during the experiment but did conduct debriefing sessions with the students in their classrooms to explain the experimental results. ...

... After at least 1 week, students logged back into the survey, where they were individually, randomly assigned to one of the treatment texts through the randomization feature in Qualtrics. A delay between the pre-survey and the treatment was employed to decrease internal validity threats due to an interaction between testing and treatment exposure (Rogers and Révész 2020). Immediately after reading the treatment text, students took a post-survey. ...

... social norms [F (1, 445) = 0.20, p = .652], and behaviour [F (1, 445) = 0.26, p = .610].These results suggest that randomization at the individual level was effective and, as such, reduces validity concerns due to selection bias(Rogers and Révész 2020). ...

  • K.C. Busch K.C. Busch

In US school settings and materials, climate change is often framed as an uncertain phenomenon. However, the effect of such denialist representations on youth's perceptions of climate change has not been empirically tested. To address this gap in the literature, this article reports on a survey-based experiment testing two framings of uncertainty about the causes and effects of climate change—one with a high level of uncertainty and one with a low level of uncertainty—on students' knowledge, attitudes, and behaviours related to climate change. The experiment was conducted with 453 middle and high school students . Students who read a text portraying climate change with high uncertainty reported lower levels of certainty about human-caused climate change . To explore how the students engaged cognitive resources when reading the experimental texts, regression analyses were used to test two hypotheses. The Knowledge Thesis predicts that youth will use their prior knowledge to evaluate the text, and the Norms Thesis predicts that youth will use the perceived norms of their social group to evaluate the text. Results suggested that students did not respond to the treatment differentially, given their differing levels of prior knowledge nor social norms accepting of climate change . Implications for practice include the necessity of explicit scaffolds to support deep critical engagement with informational, or dis-informational, text about climate change.

... The case for viewing these three aspects as prerequisites for generalizable experimental sample designs is both historical and modern, with numerous prominent voices in statistical and L2 literature promoting the benefits of sample-size planning (e.g., Brysbaert, 2019;Cohen, 1988;Lakens et al., 2018), randomization (e.g., Fisher, 1935;Rogers & Révész, 2020), and multisite sampling (e.g., Moranski & Ziegler, 2021;Morgan-Short et al., 2018). However, assessment of published L2 vocabulary experimental samples in relation to these three aspects is relatively scarce. ...

... While random assignment at the participant level is the gold standard (see Rogers & Révész, 2020) and corresponds to Fisher's (1935) guidance, L2-centric reviews have made the distinction between random assignment at either the class/group or participant level. In classroom-centric SLA subfields such as L2 IVA, assessing random assignment at the class/group level is sensible as research is often conducted on intact classes. ...

... Despite the historical (Fisher, 1935) and current (Rogers & & Révész, 2020) case for participant-level random assignment as the gold standard, some have argued that nonrandom procedures are also suitable. Farsani and Babaii (2020), for instance, coded studies that undertook purposeful assignment. ...

In this focused methodological synthesis, the sample construction procedures of 110 second language (L2) instructed vocabulary interventions were assessed in relation to effect size–driven sample-size planning, randomization, and multisite usage. These three areas were investigated because inferential testing makes better generalizations when researchers consider them during the sample construction process. Only nine reports used effect sizes to plan or justify sample sizes in any fashion, with only one engaging in an a priori power procedure referencing vocabulary-centric effect sizes from previous research. Randomized assignment was observed in 56% of the reports while no report involved randomized sampling. Approximately 15% of the samples observed were constructed from multiple sites and none of these empirically investigated the effect of site clustering. Leveraging the synthesized findings, we conclude by offering suggestions for future L2 instructed vocabulary researchers to consider a priori effect size–driven sample planning processes, randomization, and multisite usage when constructing samples.

... As highlighted by methodologists (e.g., Kuehl, 2000), experiments in the strictest sense do not have to include a pretest, a measurement of the outcome/dependent variable before the treatment. Recent L2 experimental guidance (Rogers & Révész, 2020), however, has conceptualized pretests as vital to establishing pre-treatment equivalency among the experimental groups. This makes sense as L2 learners will most likely come into classrooms with varying experiences learning the target language. ...

... Randomized (or random) assignment is one side of the randomization process which one expects to find in experimental designs, with the other being randomized sampling or randomly drawing participants from the population (Kuehl, 2000). Random assignment refers to assigning experimental conditions randomly to participants; the gold standard for this process is at the participant level (Rogers & Révész, 2020). When 'writing up' experiments in reports, there is a need to explicitly detail the random assignment process given its importance to experimental designs. ...

... Recent examples of such sampling procedures (e.g., Hiver & Al-Hoorie, 2020), see sample sizes over 1000 drawn from various regions within the country and/or intended population. It is therefore unsurprising that recent L2 methods guidance has either emphasized randomized assignment (e.g., Rogers & Révész, 2020) or mentioned randomization in a general sense (e.g., Gass et al., 2020). Many L2 researchers most likely are practically constrained from constructing randomized samples. ...

With the advent of COVID-19, learning has transitioned from the classroom to online platforms (Tang et al., 2020). Interestingly, this forced migration to online teaching has coincided with the rise of the popularity of flipped learning in education in general (Låg & Sæle, 2019) and ELT and L2 in particular (Mehring, 2018). Flipped learning, in a general sense, inverts the traditional learning paradigm by presenting new content to students before and outside of the class with subsequent class time used for interacting and engaging with said content (for extensive consideration of L2 flipped learning; see Mehring, 2018). Unsurprisingly, recent research has investigated the potential of flipped classrooms against the backdrop of online teaching in response to COVID-19. Tang and colleagues (2020), for instance, investigated Chinese students' perceptions of flipped learning in online environments vis-à-vis traditional methods. It is likewise reasonable to expect L2 researchers to enhance their already substantial interest in flipped learning in response to the current migration to online teaching. What is more is that even after COVID-19 subsides, there is no reason to assume the L2 academic community's interest in flipped learning will wane as interest in it had been growing before the pandemic (see e.g., Bonyadi, 2018).

... External validity is best established through replication (Porte & McManus, 2018;Shadish et al., 2002). It is widely acknowledged that there is a constant tension between internal and external validity in experimental research (e.g., Hulstijn, 1997;Rogers & Révész, 2020). Related to external validity is the construct of ecological validity, which is related to the "ecology" of a particular context. ...

... Although these studies have taken place in a classroom, they have imposed artificial experimental conditions that would not be present normally in a classroom environment. At best, this has meant that the experimental conditions have not been validated with regard to the learning environment in which the study takes place (Rogers & Révész, 2020). At worst, the studies have imposed artificial experimental conditions that do not reflect an authentic learning environment, such as not allowing participants to take notes (e.g., Küpper-Tetzel et al., 2014) or forbidding participants from engaging in specific cognitive strategies during instruction (e.g., Pavlik & Anderson, 2005). ...

... In addition, there are even fewer studies examining the effects of input spacing on the learning of L2 vocabulary in authentic teaching and learning contexts with nonadult populations of learners. By carrying out such a replication, the present study would meet wider calls for more ecologically valid research within the field of SLA with nontraditional populations (Kasprowicz & Marsden, 2018;Lightbown & Spada, 2019;Rogers & Révész, 2020;Spada, 2005Spada, , 2015. ...

This study is a conceptual replication of Rogers and Cheung's (2018) investigation into distribution of practice effects on the learning of L2 vocabulary in child EFL classrooms in Hong Kong. Following a pretest, treatment, delayed posttest design, 66 primary school students (Cantonese L1) studied 20 vocabulary items over three training episodes under spaced-short (1-day interval) or spaced-long (8-day interval) learning conditions. The spacing of the vocabulary items was manipulated within-participants, and learning was assessed using crossword puzzles following a 4-week delay. While Rogers and Cheung (2018) resulted in minimal overall learning with a slight advantage for the spaced-short group, this study found large learning gains across the experimental conditions with no significant differences between the two learning schedules. Taken together, these results provide evidence that the results from previous research examining input spacing with adult populations in laboratory contexts might not generalize to authentic child learning contexts.

... Punjab SWD gave us permission to sample the aging residents at one state-run "Aafiyat" old age home, in a city of Punjab, which houses 45 cognitively sound residents. We deemed it ethical to ask all 45 of the residents to be part of our intervention and chose a quasi-experiment design for this study (Handley et al., 2018;Rogers & Révész, 2020). We used a pre-and post-survey to measure the effect of the intervention without a control group (Harris et al., 2006;Kampenes et al., 2009). ...

There has been no research in Pakistan about how to improve quality of life (QOL) of aging populations through intergenera-tional learning. In this study we aimed to deliver an intervention for intergenerational learning to assess the impact on QOL through a quasi-experiment research design. We also aimed to identify which types of intergenerational learning activities improve QOL and how the activities may be improved. We gained permission to deliver the intervention from a state-run old age center in Punjab. Though the intervention started with 42 participants, we were left with 18 participants at the end of the three-month intervention. The results show posttest improvement in: (i) sleep (t = 3.01, p < .05), (ii) life enjoyment (t = 2.26, p < .05), and (iii) psychological health (t = 2.04, p = .05). In addition, participants with more education exhibited significant improvement in QOL after the intervention. We were also able to compile a list of 19 suggestions by participants for overall changes in learning activities, changes in specific interventions delivered, and suggestions for more types of interventions. We conclude that intergenerational learning improves QOL, and recommend suggestions for life satisfaction, and the planning of old age home centers. This study has implications for aging policy across developing and South Asian populations.

... Nonetheless, we offer some ideas most relevant to our review. First, although randomized experimental designs are generally the "gold standard" for experimental research (Rogers and Révész, 2019), students usually opt into bridge programs, making random assignment impossible and selection bias likely. Therefore, it is important to consider the factors that could impact students' self-selection into a program. ...

University science, technology, engineering, and math (STEM) summer bridge programs provide incoming STEM university students additional course work and preparation before they begin their studies. These programs are designed to reduce attrition and increase the diversity of students pursuing STEM majors and STEM career paths. A meta-analysis of 16 STEM summer bridge programs was conducted. Results showed that program participation had a medium-sized effect on first-year overall grade point average (d = 0.34) and first-year university retention (Odds Ratio [OR] = 1.747). Although this meta-analytic research reflects a limited amount of available quantitative academic data on summer STEM bridge programs, this study nonetheless provides important quantitative inroads into much-needed research on programs' objective effectiveness. These results articulate the importance of thoughtful experimental design and how further research might guide STEM bridge program development to increase the success and retention of matriculating STEM students.

... Finally, Bird's (2010) and Serrano's (2011) studies are among the few studies to have investigated distribution of practice effects in authentic classroom contexts (see also Miles, 2014;Kasprowicz et al., 2019). Replications of these studies would meet wider calls for ecologically valid classroom-based research within the field of SLA (Spada, 2005(Spada, , 2015Kasprowicz & Marsden, 2018;Lightbown & Spada, 2019;Rogers & Révész, 2020). ...

  • John Rogers John Rogers

This paper proposes the replication of Bird (2010) and Serrano (2011), studies which have examined distribution of practice effects in second language acquisition (SLA). These studies, which took place in authentic classroom contexts, produced conflicting results regarding the degree to which the learning of a second language benefits from distributed instruction. In the first part of the paper, I discuss the distribution of practice research in the learning and teaching of second languages. I then describe Bird's (2010) and Serrano's (2011) work highlighting the strengths and limitations of the approaches of these studies. Finally, a number of approaches to approximate and conceptual replications are suggested for each study in order to assess the reliability, internal validity, and generalizability of the original findings.

... Pemberian prestest ditujukan untuk mengetahui keadaan awal dari variabel yang akan diteliti. Pemberian posttest bertujuan untuk mengukur efek yang dihasilkan terhadap variabel yang diteliti setelah pemberian treatment kepada partisipan yang diteliti (Rogers & Révész, 2020). Alur desain penelitian secara lengkap dapat dilihat pada gambar 1. ...

  • Suroso MS, Psikolog Suroso MS, Psikolog
  • Fandy Maramis
  • Muhammad Farid

Increasing prosocial behavior can help solve social problems that occur in the community. Cultivation of prosocial behavior can be done at school through character learning. This study aims to examine the effectiveness of character learning to improve prosocial behavior in working together, helping and respecting the rights and welfare of others in high school adolescents. The experimental research design used in this study was one group pretest-posttest. Research participants numbered 21 class XI Xin Zhong Surabaya High School students who were selected based on purposive sampling techniques. The research instrument used a prosocial scale compiled by researchers with a reliability coefficient of ? = 0.898. The results of the analysis using Wilcoxon show that character learning is effective for improving the prosocial behavior of high school adolescents. It is recommended that teachers use the character learning module to improve the prosocial behavior of high school adolescents. Keywords: Character learning; Learning effectiveness; Prosocial behavior Abstrak Peningkatan perilaku prososial dapat membantu menyelesaikan permasalahan sosial yang terjadi di masyarakat. Penanaman perilaku prososial dapat dilakukan di sekolah melalui pembelajaran karakter. Penelitian ini bertujuan untuk menguji efektivitas pembelajaran karakter untuk meningkatkan perilaku prososial bekerja sama, menolong dan menghargai hak dan kesejahteraan orang lain pada remaja SMA. Desain penelitian eksperimen yang digunakan dalam penelitian ini adalah one group pretest-posttest. Partisipan penelitian berjumlah 21 siswa kelas XI SMA Xin Zhong Surabaya yang terpilih berdasarkan teknik purposive sampling. Instrumen penelitian menggunakan skala prososial yang disusun oleh peneliti dengan koefisien reliabilitas sebesar ?= 0,898. Hasil analisis menggunakan Wilcoxon menunjukkan bahwa pembelajaran karakter efektif untuk meningkatkan perilaku prososial remaja SMA. 
Disarankan agar para guru menggunakan modul pembelajaran karakter untuk meningkatkan perilaku prososial remaja SMA. Kata kunci: Efektivititas Pembelajaran; Pembelajaran Karakter; Perilaku Prososial

  • Ziqian Xia
  • Yurong Liu

Promoting pro-environmental behaviors is an effective means of reducing carbon emissions at the individual end, but the measurement of behaviors has long been a problem for scholars. Especially in environmental psychology community, the complexity of social policies and habitat implies greater difficulty in measuring. Due to the limitations of traditional questionnaire, laboratory, and naturalistic observation methods, environmental psychologists need more realistic, accurate, and cost-effective ways to measure behavior. The rapid development of IoT technology lights up the hope for achieving this goal, and its large-scale popularization will bring great changes to the research community. This paper reviews the current methods and their limitations, proposes a framework for measuring behavior using IoT devices, and points out its future research directions.

The purpose of this article is to present the methodology and the results of active learning application in an Engineering Education institution in Brazil, using repeated measures experimental design. It involved taking course samples from 7 engineering programs, providing 6 classes, 202 participant students and 296 class-hours. The design had a strict implementation plan where each course content was taught in two subsequent stages using traditional and active learning approaches respectively. Similar grade assessments were applied to both stages. The classes were observed using a classroom observation protocol, to check for the profile change from passive to active in-class behaviours. The consolidated results demonstrated that student performance in the second assessment was 14% better, after the application of active learning techniques (~40% of the grades standard deviation). The article aims to contribute to current research and inform future studies about the effectiveness of active learning methods in Engineering Education.

  • Graeme Porte
  • Kevin McManus Kevin McManus

Doing Replication Research in Applied Linguistics is the only book available to specifically discuss the applied aspects of how to carry out replication studies in Second Language Acquisition. This text takes the reader from seeking out a suitable study for replication, through deciding on the most valuable form of replication approach to its execution, discussion, and writing up for publication. A step-by-step decision-making approach to the activities guides the reader/student through the replication research process from the initial search for a target study to replicate, through the setting up, execution, analysis, and dissemination of the finished work.

This study examined the optimal learning schedule for second language vocabulary within an authentic classroom setting in Hong Kong. Following a pretest, treatment, delayed posttest design, fifty-two primary school students (Cantonese L1) studied twenty English adjectives over two learning episodes under spaced-short (1-day interval) or spaced-long (8-day interval) learning conditions. The spacing of the vocabulary items was manipulated within-participants, and learning was assessed on a multiple-choice posttest, administered following a four-week delay. In contrast to previous laboratory-based findings, the results here indicated superior learning of the items presented under the spaced-short format, suggesting that lag effects might be attenuated by age, learning context and teaching procedure.

  • Elke Peters Elke Peters
  • Stuart Webb

Research has begun to demonstrate that L2 words can be learned incidentally through watching audio-visual materials. Although there are a large number of studies that have investigated incidental vocabulary learning through reading a single text, there are no studies that have explored incidental vocabulary learning through viewing a single full-length TV program. The present study fills this gap. Additionally, three word-related variables (frequency of occurrence, cognateness, word relevance) and one learner-related variable (prior vocabulary knowledge) that might contribute to incidental vocabulary learning were examined. Two experiments were conducted with Dutch-speaking EFL learners to measure the effects of viewing TV on form recognition and meaning recall (Experiment 1) and meaning recognition (Experiment 2). The findings showed that viewing TV resulted in incidental vocabulary learning at the level of meaning recall and meaning recognition. The research also revealed that learning was affected by frequency of occurrence, prior vocabulary knowledge, and cognateness.

  • John Rogers John Rogers

This commentary discusses some theoretical and methodological issues related to research on the spacing effect in second language acquisition research (SLA). There has been a growing interest in SLA in how the temporal distribution of input might impact language development. SLA research in this area has frequently drawn upon the rich field of cognitive psychology as a motivation for research and a context for the discussion of results. However, there are a number of nonconformities between these two fields, including how key constructs have been operationalized and measured. A better understanding of these conceptual divergences will allow SLA to advance with more systematic and robust research into the impact of input spacing on second language development.

  • Kathleen Bardovi-Harlig
  • Melissa Bowles
  • Yuko Butler Yuko Butler
  • Xian Zhang

This is an ambitious work, covering the whole breadth of the field from its theoretical underpinnings to research and teaching methodology. The Editors have managed to recruit a stellar panel of contributors, resulting in the kind of 'all you ever wanted to know about instructed SLA' collection that should be found on the shelves of every good library. " Zoltán Dörnyei, University of Nottingham, UK The Routledge Handbook of Instructed Second Language Acquisition is the first collection of state-of-the-art papers pertaining to Instructed Second Language Acquisition (ISLA). Written by 45 world-renowned experts, the entries are full-length articles detailing pertinent issues with up-to-date references. Each chapter serves three purposes: (1) provide a review of current literature and discussions of cutting edge issues; (2) share the authors' understanding of, and approaches to, the issues; and (3) provide direct links between research and practice. In short, based on the chapters in this handbook, ISLA has attained a level of theoretical and methodological maturity that provides a solid foundation for future empirical and pedagogical discovery. This handbook is the ideal resource for researchers, graduate students, upper-level undergraduate students, teachers, and teacher-educators who are interested in second language learning and teaching.

  • John T. E. Richardson

A Latin square is a matrix containing the same number of rows and columns. The cell entries are a sequence of symbols inserted in such a way that each symbol occurs only once in each row and only once in each column. Fisher (1925) proposed that Latin squares could be useful in experimental designs for controlling the effects of extraneous variables. He argued that a Latin square should be chosen at random from the set of possible Latin squares that would fit a research design and that the Latin-square design should be carried through into the data analysis. Psychological researchers have advanced our appreciation of Latin-square designs, but they have made only moderate use of them and have not heeded Fisher's prescriptions. Educational researchers have used them even less and are vulnerable to similar criticisms. Nevertheless, the judicious use of Latin-square designs is a powerful tool for experimental researchers.
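The row and column properties described above can be constructed mechanically. The sketch below is not code from the article; it is a minimal illustration of the standard zig-zag construction of a balanced Latin square for an even number of conditions, the variant commonly used to counterbalance condition order in within-participants designs. The function name `balanced_latin_square` and the four-condition example are assumptions for illustration.

```python
def balanced_latin_square(n):
    """Build an n x n balanced Latin square (n even).

    Row i gives the condition order for participant group i. Each
    condition appears exactly once in every row and every column,
    and each condition immediately precedes every other condition
    exactly once across rows (first-order carryover balance).
    """
    square = []
    for i in range(n):
        # Standard sequence 0, 1, n-1, 2, n-2, ... shifted by i.
        row = [((j // 2 + 1 if j % 2 else n - j // 2) + i) % n
               for j in range(n)]
        square.append(row)
    return square

# Hypothetical example: four treatment conditions (0-3), so four
# participant groups, each receiving the conditions in a different
# order, with every condition occupying each serial position once.
orders = balanced_latin_square(4)
```

For an odd number of conditions, a common remedy is to append the mirror image of each row, doubling the number of orders but restoring carryover balance.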

  • Rebekha Abbuhl
  • Alison Mackey

Since its inception in the 1960s, the field of second language acquisition (SLA) has sought to document and explore how children and adults acquire a nonnative language. Researchers have investigated the linguistic, cognitive, social, contextual, psychological, and neurobiological characteristics of second language (L2) learning, processing, and use. Typical research questions include: What are the characteristics of learner interlanguage? How do individual differences, such as working memory capacity, impact the learning of an L2? How does the social context (such as stay-at-home vs. study abroad) influence the fluency, accuracy, and complexity of learner language? How do different types of motivation impact the learning process? How is the L2 processed in the learner's mind and how is this affected by age of acquisition? To investigate these and many other questions, SLA researchers have at their disposal a large array of research designs. In this chapter, we will discuss various research designs, including quantitative, qualitative, and mixed methods traditions. We will also address current works in progress and examine recent topics of concern related to the conducting of research on L2 learning. Finally, we will conclude with future directions for SLA research.

  • Janire Zalbidea

The present study explores the independent and interactive effects of task complexity and task modality on linguistic dimensions of second language (L2) performance and investigates how these effects are modulated by individual differences in working memory capacity. Thirty-two intermediate learners of L2 Spanish completed less and more complex versions of the same type of argumentative task in the speaking and writing modalities. Perceived complexity questionnaires were administered as measures of cognitive load to both L2 learners and native speakers to independently validate task complexity manipulations. Task performance was analyzed in terms of general (complexity and accuracy) as well as task-relevant (conjunctions) linguistic measures. Quantitative analyses revealed that task modality played a larger role than task complexity in inducing improved linguistic performance during task-based work: Speaking tasks brought about more syntactically complex output while writing tasks favored more lexically complex and more accurate language. In addition, relationships of working memory capacity with various linguistic measures were attested, but only when the cognitive complexity of tasks was enhanced.

Artificial linguistic systems (ALSs) offer many potential benefits for second language acquisition (SLA) research. Nonetheless, their use in experiments with posttest-only designs can give rise to internal validity problems depending on the baseline that is employed to establish evidence of learning. Researchers in this area often compare experimental groups' performance against (a) statistical chance, (b) untrained control groups' performance, and/or (c) trained control groups' performance. However, each of these methods can involve unwarranted tacit assumptions, limitations, and challenges from a variety of sources (e.g., preexisting perceptual biases, participants' fabrication of rules, knowledge gained during the test), any of which might produce systematic response patterns that overlap with the linguistic target even in the absence of learning during training. After illustrating these challenges, we offer some brief recommendations regarding how triangulation and more sophisticated statistical approaches may help researchers to draw more appropriate conclusions going forward.
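Of the three baselines listed, comparison against statistical chance is the simplest to make concrete. The snippet below is an illustrative sketch, not taken from the paper: it implements an exact one-tailed binomial test using only the Python standard library, with made-up trial counts. The function name and the example figures (32 correct out of 48 two-alternative judgements) are assumptions.

```python
from math import comb

def binomial_p_at_least(k, n, p=0.5):
    """Exact one-tailed probability of observing k or more correct
    responses out of n trials if the participant were guessing
    with per-trial success probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

# Hypothetical posttest: 32 correct out of 48 two-alternative
# grammaticality judgements, chance level p = .5.
p_value = binomial_p_at_least(32, 48)
```

As the abstract cautions, a significant result against chance alone does not rule out preexisting perceptual biases or test-induced learning, which is why triangulation across baselines is recommended.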


Source: https://www.researchgate.net/publication/334250281_Experimental_and_quasi-experimental_designs
