The Incidence of "Causal" Statements in Teaching-and-Learning Research Journals
University of Texas, Austin
University of Arizona
University of Texas, Austin
The authors examined the methodologies of articles in teaching-and-learning research journals, published in 1994 and in 2004, and classified them as either intervention (based on researcher-manipulated variables) or nonintervention. Consistent with the findings of Hsieh et al., intervention research articles declined from 45% in 1994 to 33% in 2004. For nonintervention articles, the authors recorded the incidence of "causal" statements (e.g., if teachers/schools/parents did X, then student/child outcome Y would likely result). Nonintervention research articles containing causal statements increased from 34% in 1994 to 43% in 2004. It appears that at the same time intervention studies are becoming less prevalent in the teaching-and-learning research literature, researchers are more inclined to include causal statements in nonintervention studies.
Key Words: randomized trials, causal statements, causal conclusions, intervention research, causation
In the field of teaching and learning, researchers are frequently interested in identifying practices that educators might use to improve student outcomes, such as academic achievement. Recommendations for educators require that some type of causal connection be established between the treatment and the outcome. As Raudenbush (2005) noted, "Causal questions, questions about the impact of alternative policies and practices, have emerged as priorities in educational research" (p. 25) during the past few years. Shadish, Cook, and Campbell (2002) noted that it was the philosopher John Stuart Mill who established three necessary conditions for causal relationships: (a) the cause is related to the effect, (b) no plausible alternative explanation for the effect exists other than the cause, and (c) the cause precedes the effect. These are basically identical to the three conditions mentioned recently by Frazier, Tix, and Barron (2004): (a) association between the variables, (b) the association is nonspurious (i.e., isolated), and (c) the cause precedes the effect in time (i.e., direction). Satisfaction of these criteria can be seen as falling along a continuum, with one end of the continuum defined by nonexperimental correlation studies that merely establish an association between two variables and the other defined by experiments with random assignment to conditions (Frazier et al., p. 127). Raudenbush echoed this sentiment concerning the appropriateness of randomized experiments in providing the clearest answers to causal questions, as did Thompson, Diamond, McWilliam, Snyder, and Snyder (2005) in concluding that "definitive causal conclusions in quantitative research can only be reached on the basis of true randomized trials" (p. 182).
Summarizing the causal-conclusion ingredients in the context of an undergraduate course on scientific and statistical thinking, Derry, Levin, Osana, Jones, and Peterson (2000) provided students with the following acronym:
If an appropriate Comparison reveals Again and Again evidence of a Relationship between a treatment and a desired outcome while Eliminating all other competing explanatory variables, then such CAREful experimentation yields evidence of the treatment's effectiveness that is scientifically convincing. (Derry et al., 2000, Figure 1, p. 754)
Although most agree that controlled researcher manipulation of variables is an essential step to establishing causation, many authors attempt to draw causal conclusions on the basis of nonmanipulated correlational studies. Frazier et al. (2004) examined multiple-regression "mediation" studies that appeared in the Journal of Counseling Psychology in 2001 and found that although all were nonintervention studies, most researchers used causal language in describing their results:
Virtually all of the mediational analyses in the studies we reviewed were performed with cross-sectional correlation data. Very few attempts were made to control for common causes of the mediator and the outcome. The direction of the relations among variables often was unclear. Nonetheless, authors typically discussed results using causal language. Authors sometimes acknowledged that no causal conclusions could be drawn, even though they used causal language. (Frazier et al., 2004, pp. 129-130)
Shadish et al. (2002) discussed the difficulty in drawing causal conclusions from cross-sectional correlation data:
In cross-sectional studies in which all the data are gathered on the respondents at one time, the researcher may not even know if the cause precedes the effect.
When these studies are used for causal purposes, the missing design features can be problematic unless much is already known about which alternative interpretations are plausible, unless those that are plausible can be validly measured, and unless the substantive model used for statistical adjustment is well-specified. These are difficult conditions to meet in the real world of research practice, and therefore many commentators doubt the potential of such designs to support strong causal inferences in most cases. (Shadish et al., 2002, p. 18) The field of teaching and learning has witnessed a decrease in articles reporting intervention (researcher-manipulated "treatments") studies. Hsieh et al. (2005) examined five relevant empirical journals (American Educational Research Journal [AERJ], Cognition and Instruction [C&I], Contemporary Educational Psychology [CEP], Journal of Educational Psychology [JEdP], and Journal of Experimental Education [JXE]) in 1983 and from 1995 to 2004 and found that the percentage of intervention research articles decreased during the 21-year period (from 52% in 1983 to 45% in 1995 to 33% in 2004). In speculating as to why this decrease had occurred, Hsieh et al. proposed that perhaps authors are becoming more comfortable with offering causal conclusions based on nonintervention studies. Hsieh et al. provided examples of nonintervention research articles that contained causal conclusions (e.g., If teachers/schools/parents would do X, then student/child outcome Y would likely result) in either the abstract or discussion sections. Yet how often does this practice of making causal conclusions based on nonintervention studies occur? In the present study, we sought to investigate the prevalence of this practice empirically. 
In particular, we examined articles in journals publishing teaching-and-learning research to determine how frequently authors are offering causal conclusions or recommendations for researchers and educators based on results from nonintervention research.
We reviewed 274 empirical articles that appeared in five teaching-and-learning research journals (AERJ, C&I, CEP, JEdP, and JXE) in 1994 and 2004. These five journals were selected on the basis of their publishing primary research in teaching and learning, the discipline of concern here. We selected 1994 and 2004 as the years for the review because 2004 represented the most recent year in which we could review all journal articles and 1994 was a decade earlier, allowing for a comparison over time. We classified each article as intervention, correlational, qualitative, or descriptive. An article was deemed an intervention study if the researcher assessed the effects of one or more researcher-manipulated variables on one or more participant outcome measures. Intervention articles included researcher-manipulated interventions or treatments applied in randomized, matched, or intact groups (i.e., quasi-experimental) and within-subjects and single-case designs. We are, of course, aware that causal conclusions generally flow more plausibly from randomized intervention studies than from nonrandomized intervention studies. In the present investigation, however, the major distinction was between intervention and nonintervention studies, and no distinction was made between intervention studies that did and did not incorporate randomization. Because of our decision to include nonrandomized intervention studies in the "appropriate causal language" category here, the incidence of reported studies manifesting unjustified causal language is likely underestimated. At the same time, causal-conclusion caveats associated with randomized intervention studies are provided by Levin and O'Donnell (1999). Correlational articles reported studies where the researcher manipulated no variables and quantitative data were analyzed to observe relations among variables.
Such analyses encompassed correlation and regression (simple, multiple, and multivariate), hierarchical linear modeling (HLM), structural equations modeling (SEM), and static group mean comparisons based on analyses of variance and covariance. It should be noted that one might use any of the foregoing statistical procedures to analyze data from randomized intervention experiments. However, if the researcher manipulated no variables, we classified those articles as correlational. Finally, qualitative and descriptive articles were defined as studies in which no manipulation of variables occurred and no relationships between variables were measured. Qualitative articles included outcome measures but no numerical data, whereas descriptive articles included outcome measures along with numerical data in the form of descriptive statistics (e.g., frequencies, percentages, and other sample summary measures) but with no accompanying inferential statistical tests or probability-based estimation procedures. For all empirical articles, we searched in the Abstract and Discussion sections for sentences where the author(s) used causal language to make recommendations for educators. We operationally defined a causal statement as one where the author(s) explicitly stated that if teachers/schools/parents would do X, then student/child outcome Y would likely result. Examples of statements classified as causal, as well as those classified as noncausal, are presented in Table 1. An initial examination of intervention research articles indicated that the vast majority contained causal conclusions. Moreover, there appeared to be little or no difference between randomized and non-randomized intervention studies with respect to the incidence of such conclusions. We therefore decided to focus on the three research methodologies where causal conclusions are least justified, namely, correlational, qualitative, and descriptive studies. 
Additionally, insofar as our journal survey is a descriptive study according to the present classification scheme, we present only descriptive statistics and speculative interpretations, and attempt to avoid offering our own causal conclusions or recommendations.
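The four-way classification described above can be read as a short decision rule. The following Python sketch is only an illustration of that rule as stated in the text; the `Article` record and its field names are hypothetical and were not part of the coding procedure the authors report.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical coding fields, named here for illustration only.
    manipulated_variables: bool  # did the researcher manipulate any variable?
    relations_analyzed: bool     # were relations among variables analyzed quantitatively?
    numerical_data: bool         # are numerical (descriptive) data reported?


def classify(article: Article) -> str:
    """Apply the classification rules in the order the text gives them."""
    if article.manipulated_variables:
        # Randomized, quasi-experimental, within-subjects, and single-case designs.
        return "intervention"
    if article.relations_analyzed:
        # Correlation, regression, HLM, SEM, static group comparisons, etc.
        return "correlational"
    if article.numerical_data:
        # Descriptive statistics only, with no inferential tests.
        return "descriptive"
    return "qualitative"


if __name__ == "__main__":
    # An SEM study with no manipulated variables counts as correlational.
    print(classify(Article(False, True, True)))
```

Note that the rule keys on design, not on the statistical technique: an article analyzed with HLM or SEM is still classified as intervention if the researcher manipulated a variable.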
All 274 articles were initially examined and classified by the first author. A different author then examined and classified 99 randomly selected articles from the 274. The agreement (number of agreements divided by the total number possible) was 93% on methodology classification (with no disagreements concerning intervention vs. nonintervention) and 97% on whether interventions used random assignment. For the causal statements classification, the agreement was 100% in terms of whether a nonintervention article included at least one causal statement. The only occasions where the two raters did not have perfect agreement on causal statements were when there were multiple statements in an article and the raters did not report the same statement.
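The agreement index used here (number of agreements divided by the total number possible) is simple percent agreement. A minimal Python sketch follows; the function name and the five example ratings are hypothetical and are not data from the study.

```python
def percent_agreement(rater_a, rater_b):
    """Percent agreement: agreements divided by the total number of paired ratings."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must classify the same articles.")
    if not rater_a:
        raise ValueError("At least one paired rating is required.")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * agreements / len(rater_a)


if __name__ == "__main__":
    # Hypothetical ratings for five articles (illustration only).
    first = ["intervention", "correlational", "qualitative", "descriptive", "correlational"]
    second = ["intervention", "correlational", "qualitative", "correlational", "correlational"]
    print(percent_agreement(first, second))  # 4 of 5 agree -> 80.0
```

As a rough check on the reported figure, and assuming all 99 double-coded articles entered the denominator, 92 agreements out of 99 would round to 93%. Percent agreement does not correct for chance agreement, which is a limitation of this index rather than of the sketch.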
Tables 2 to 4 display the findings. From Table 2 it may be seen that intervention research articles have proportionately declined from 1994 to 2004, whereas both correlational and qualitative research articles have increased. For each individual journal, various trends are apparent. For AERJ, the key difference between 1994 and 2004 is the increase in qualitative articles. In 1994, only 30% of the empirical articles were qualitative, in contrast to 56% in 2004. For CEP, the primary change resides in the lower percentage of intervention articles: In 1994, 58% of the articles were intervention whereas by 2004, this had decreased to 36%. Finally, for JEdP, a notable increase can be seen in correlational articles, from 46% in 1994 to 61% in 2004. In contrast to JEdP, correlational studies in C&I decreased from 36% in 1994 to 0% in 2004. Table 3 reveals that randomized intervention studies also declined during the 10-year period, with 40% of the articles based on treatment randomization in 1994 as compared to 25% in 2004.
Table 4 reveals that across all surveyed journals, between 1994 and 2004, the percentage of studies based on nonintervention methodologies (i.e., correlational, qualitative, descriptive) that contained causal statements increased from 34% to 43%. The within-journal changes in Table 4 are also worth noting, as not all journals experienced an increase. For example, for CEP, the percentage of articles containing causal statements increased from 21% to 69%, and for JEdP, causal-statement articles increased from 36% to 42% from 1994 to 2004. In contrast, C&I went from 17% to 0%, which relates to the number of correlational studies published in C&I during each time period. A decrease in the percentage of nonintervention articles that contained causal statements was also witnessed in AERJ, with 47% recorded in 1994 and 33% in 2004.
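When reading the changes reported above, it may help to separate the change in percentage points from the relative change. The short Python sketch below does this arithmetic for the overall figures given in the text (34% to 43% for nonintervention articles containing causal statements; 45% to 33% for intervention articles); the function name is hypothetical.

```python
def percentage_change(before_pct, after_pct):
    """Return (change in percentage points, relative change in percent)."""
    points = after_pct - before_pct
    relative = 100.0 * points / before_pct
    return points, relative


if __name__ == "__main__":
    # Figures taken from the text: causal statements in nonintervention articles.
    points, relative = percentage_change(34, 43)
    print(f"{points:+d} points, {relative:+.1f}% relative")  # +9 points, +26.5% relative

    # Figures taken from the text: share of intervention articles.
    points, relative = percentage_change(45, 33)
    print(f"{points:+d} points, {relative:+.1f}% relative")  # -12 points, -26.7% relative
```

These are descriptive calculations only; they say nothing about whether the differences would be statistically reliable.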
Two conclusions can be drawn from our findings. First, articles on teaching-and-learning intervention research have declined across the five selected journals during the past 10 years (see also Hsieh et al., 2005). In their place, nonintervention research incorporating correlational and qualitative methodologies has become more popular. Second (and ironically, given the first conclusion), during the same 10-year period, on average, the prevalence of causal conclusions in nonintervention articles has increased in the same five journals. As was noted earlier, the incidence of causal language from nonintervention studies in the surveyed journals is undoubtedly underestimated because nonrandomized intervention studies (where causal language is more likely to be inappropriate than in randomized intervention studies) were included in the "appropriate causal language" intervention-research category. In addition, it must be remembered that the five journals were selected to represent the published teaching-and-learning research literature. The content (including the language) of articles in these journals is shaped by reviewers and editors who are primed to discover and modify unjustified causal-conclusion usage. Given the plausible assumption that language corrections are less likely to occur in the unpublished research literature (e.g., conference papers, student theses and dissertations, ERIC documents), this would serve further to underestimate the incidence of causal conclusions in reports of empirical teaching-and-learning research.
One might argue that even though the incidence of causal language in nonintervention research articles is increasing, readers are unlikely to take such conclusions seriously. This is because authors clearly state that nonintervention methodology was used and some authors even go to great lengths to remind the reader that causal conclusions are inappropriate (even though they may make causal conclusions later, in the Discussion section).
To assess the validity of that argument, we recently conducted an experiment in which undergraduates read brief "research" articles based on differing methodologies and strengths of causal language (Pituch, Thomas, Levin, & Robinson, 2006). These two factors were varied to determine their separate and joint influences on readers' believability in or confidence about intervention-outcome causal connections. We selected undergraduates for the study because they are assumed to be relatively uninformed readers of scholarly research articles. As such, undergraduates' interpretations of reported research findings might be comparable to the interpretations made by educators and other practitioners. As evidence against the argument that readers do not take nonintervention research causal conclusions seriously, we found that our participants made no distinction between experimental, correlational, or qualitative methodologies when it came to evaluating the validity of causal claims. We are currently conducting a follow-up experiment using more methodologically and statistically informed readers.
Why are intervention research studies decreasing while causal statements appearing in nonintervention research are increasing? Perhaps, as Hsieh et al. (2005) speculated, researchers are tiring of the obstacles and complications associated with intervention research and have become more comfortable using other methods to offer conclusions about educational practices. These researchers may be seduced by the lure of increasingly available "magical" data-analysis tools such as HLM and SEM. With regard to the latter, a contributor to the causal-language problem may be the equivalent nomenclature by which SEM is known, namely, causal modeling. Twenty years ago, when these statistical methods were first becoming popular, Biddle and Martin (1987) warned of their misuse by researchers:
It seems clear that SEM techniques are misunderstood by many.
Misuse of these techniques has appeared in major journals. Users have made inappropriate claims for findings generated through these procedures and have applied them mindlessly in data analysis. Researchers have been urged to adopt these techniques for questionable purposes. . . . Causal models are sometimes assessed with cross-sectional data from a single-wave field study, and users seem somewhat less tempted to state unvarnished, causal conclusions for such data . . . but even in (three-or-more-wave field studies) the confirmation of the model does not guarantee a causal relation. However complex the model and study, it is quite possible that other variables the user has failed to consider would account for the relations observed; hence, this would invalidate causal conclusions. (pp. 4, 9)
These data-analytic tools are not the first to be wielded "for questionable purposes" (Biddle & Martin, 1987, p. 4). Researchers' misapplications of statistical significance testing caused such an outrage that the American Psychological Association formed a task force in 1996 to consider banning its use (Wilkinson & the APA Task Force on Statistical Inference, 1999). Certainly it is not the fault of HLM and SEM that they are relied on by researchers to squeeze causality from correlation data, just as there is no justification for a blanket indictment of statistical significance testing because of researchers' incomplete understanding of its appropriate application and legitimate warrants.
But why are today's educational researchers becoming more comfortable making recommendations based on nonintervention research? Again, we can only speculate. First, as has been noted previously (Levin & O'Donnell, 1999), today's graduate students are being exposed to a wide variety of attractive (and seemingly interchangeable) methodological approaches in their programs of study.
These students may then launch their research careers with an incomplete understanding of the necessary conditions for establishing causal connections between treatments and outcomes. Recently, GradPsych, an American Psychological Association newsletter for graduate students, published an article that featured the 10 most common dissertation discussion mistakes, from a study by Susan Nolen-Hoeksema of Yale University. Number 6 on the list was lapsing into causal language when one's data are correlational (Azar, 2006). Analogously, in a study of journal editors' views of necessary criteria for a study to be publishable, Lounds et al. (2002) listed the top 10 criteria. Number 1 was that the design was appropriate to the question being asked. Despite these ideals, perhaps the journal review process has changed in the past decade, such that (a) it is becoming permissible for authors to publish articles that make unjustified causal claims and (b) journal editors and reviewers are allowing more unwarranted statements to slip by unnoticed.
Second, it is possible that increased pressures on new faculty members seeking tenure lead them to design more "manageable" research studies that do not permit causal conclusions. Yet such conclusions are offered by researchers to enhance the appearance of their studies' importance. This situation is reminiscent of Stanovich's (1999) observation of educational researchers who report their findings through the "back door," a bait-and-switch strategy in which
. . . [researchers conducting preliminary, exploratory, or descriptive studies] want to make comparative (quantitative) statements, but do not want to be held to the accepted canons for justifying quantitative statements. In the phrasing of Bertrand Russell (1919, p. 71), they want to achieve by stealth what should be achieved by honest toil. (p. 268)
Thus, authors of nonintervention studies may believe that their research will have greater "salability" and impact if educational prescriptions can be offered from their correlational findings, such as those illustrated by the "causal language" examples in Table 1. In addition, reviewers or editors may ask authors to include causal statements in order to make a study (or a journal) seem more attractive to practitioners. In fact, the present authors have occasionally been asked by editors to provide clear directives to practitioners, so that they will be equipped with immediately usable materials, approaches, or strategies based on the study's findings. Both of these possibilities, if true, would confirm recent concerns that educational research is becoming less scientifically credible (e.g., Halpern, 2005; Levin & O'Donnell, 1999; Mayer, 2005; Whitehurst, 2003).
The American Educational Research Association (AERA) recently formed a task force on reporting research methods to consider the poor reputation of educational research. Gene Glass commented on AERA's problems in addressing shortcomings in educational research: "AERA has more to do with legitimizing certain messages . . . than it does with advancing our understanding of education" (Robinson, 2004, p. 29). In 2003, a presidential invited session at the annual meeting of AERA featured past AERA presidents discussing the state of educational research. The audio of the comments in this session (Number 45.010) is available from http://www.softconference.com/storefront/230421. Several past presidents commented on the problems with AERA and its failure to take on a leadership role in helping educational research overcome its poor reputation. Former AERA president Alan Schoenfeld received a letter from an AERA member who commented that "there isn't much research being reported at AERA" and raised the question, "Should we rename the organization the American Educational Opinion Association?"
The member further stated:
This is a serious problem. We serve a profession that has little regard for knowledge as the basis for decision-making. By encouraging anything that passes for inquiry to be a valid way of discovering answers to complex questions, we support a culture of intuition and artistry rather than building reliable research bases and robust theories. (Saxe & Schoenfeld, 1998, p. 41)
Levin and O'Donnell (1999, pp. 194-196) pondered precisely the same "AEOA" question in relation to both the content of this letter and other AERA annual meeting sessions. Schoenfeld also suggested that educational researchers are the Rodney Dangerfields ("I get no respect") of academia, and this is partly because of overblown claims in the research literature. In the 2003 AERA presidential invited session, another former president, Penelope Peterson, noted, "We have huge problems with quality. We have few examples of replicable research that has affected educational practice. Oftentimes the best research that really affects practice is not being done in the schools of education." Finally, James Popham remarked, "I want to propose a new evaluative criterion: What is the evidence that this activity is likely to improve the educational process?"
G. Reid Lyon, then chief of the Child Development and Behavior Branch of the National Institute of Child Health and Human Development (NICHD) at the National Institutes of Health (NIH), in his testimony before the Committee on Education and the Workforce, U.S. House of Representatives, Washington, D.C., on May 4, 2000, had this to say about the state of educational research:
Historically, education research has not had a significant impact on educational policies and classroom instructional practices. The reasons for this persistent gap between the guidance that education research hopefully provides and the teaching practices that teachers use on a day-to-day basis are many. . . .
First, as recently found by the National Reading Panel (NRP), much of the education research published in archival journals and disseminated to researchers, teachers, and policy makers is of uneven, and often not good, quality. It is important to understand that the trustworthiness of any research study is predicated on two major elements: (1) the suitability of the proposed research design or methodology to address the specific question posed by the study; and (2) the scientific rigor of the methodology itself. For the results to be trustworthy, a study must use the appropriate methodology and apply it in a rigorous manner. For example, if the question is one of effectiveness (let's say, how effective are specific instructional approaches in teaching children to read), then the only type of research design able to specifically address the question of cause and effect is an experimental or quasi-experimental approach. Such studies are quantitative in nature. In fact, this was the type of research approach selected by the National Reading Panel. To quote the NRP Report, "To make a determination that any instructional practice could be or should be adopted widely to improve reading achievement requires that the belief, assumption, or claim supporting the practice can be causally linked to a particular outcome. The highest standard of evidence for such a claim is the experimental study, in which it is shown that treatment can make such changes and effect such outcomes." (National Reading Panel, 1999, pp. 17)
Given the foregoing reflections and comments about the impact of educational research, what do we, as a research community, want to consider as adequate evidence for causal statements from our research findings? A speculative solution would be to identify clearly the acceptable designs for causal statements and to involve journal editors and reviewers in the process of monitoring these claims in the scholarly research literature.
Both seasoned veterans and beginning graduate students may find it enlightening to determine the extent to which causal statements are (in)appropriately used in a variety of contexts (e.g., published research studies, dissertations, class discussions) within the boundaries of different methodologies. For example, when correlational studies report selected or "controlled" comparisons of group means and standardized mean-difference effect sizes, or when randomized experimental studies focus on correlation coefficients or strength-of-relationships indices as outcome measures, it is easy for both researchers and readers to confuse methodology with analysis. It must be recognized that analytic techniques such as SEM and HLM can be applied in both correlational and experimental research contexts. A focus on improved training in understanding how research can be designed to provide more support for causal inferences may be especially helpful for investigators studying the effects of educational treatments across multiple years. Longitudinal studies of educational treatments typically encounter unforeseen obstacles (e.g., student mobility and attrition, changes in teachers and peer groups, treatment diffusion), which make attributing outcome differences to specific treatments much more difficult than in a short-term experimental study. Analogous to criteria established by the health sciences' CONSORT (Moher, Jones, & Lepage, 2001) and TREND (Des Jarlais, Lyles, & Crepaz, 2004) guidelines for randomized clinical trials and quasi-experimental research, respectively, it may also be useful to establish and apply criteria for claims that can be made based on differing methodologies, including both unjustified causal conclusions in nonexperimental research (of concern in the present study) and unjustified generalizations in more artificial or analog experimental research contexts.
As journal readers, we have an obligation to search an article for information about how the data were collected so we are not unduly influenced by unwarranted conclusions.
DANIEL H. ROBINSON is an Associate Professor of Educational Psychology at the University of Texas at Austin, SZB 504, Austin, TX 78712-1296; e-mail: dan.robinson{at}mail.utexas.edu. He specializes in computer applications for learning and educational team environments. JOEL R. LEVIN is a Professor of Educational Psychology, College of Education, University of Arizona, Tucson, AZ 85721; e-mail: jrlevin{at}u.arizona.edu. He specializes in cognitive processes and strategies, as well as in the development of statistical and methodological tools for educational research. GREG D. THOMAS is a Doctoral Student in Educational Psychology at the University of Texas at Austin, SZB 504, Austin, TX 78712-1296; e-mail: gdthom{at}mail.utexas.edu. He specializes in team performance in education. KEENAN A. PITUCH is Assistant Professor of Educational Psychology at the University of Texas at Austin, SZB 504, Austin, TX 78712-1296; e-mail: keenan.pituch{at}mail.utexas.edu. He specializes in multilevel modeling, evaluation methodology, and mediation analysis methods. SHARON VAUGHN is the H. E. Hartfelder/The Southland Corporation Regents Chair in Human Resource Development and Professor of Special Education at the University of Texas at Austin, SZB 504, Austin, TX 78712-1296; e-mail: srvaughnum{at}aol.com. She specializes in interventions for students with reading difficulties and students who are English language learners. Received for publication October 31, 2005. Revision received May 11, 2006. Accepted for publication May 30, 2006.
American Educational Research Journal, Vol. 44, No. 2, 400-413 (2007)






