Students' Perceptions of Characteristics of Effective College Teachers: A Validity Study of a Teaching Evaluation Form Using a Mixed-Methods Analysis
University of South Florida
University of Central Arkansas
University of Arkansas, Fayetteville
University of Central Arkansas
This study used a multistage mixed-methods analysis to assess the content-related validity (i.e., item validity, sampling validity) and construct-related validity (i.e., substantive validity, structural validity, outcome validity, generalizability) of a teaching evaluation form (TEF) by examining students' perceptions of characteristics of effective college teachers. Participants were 912 undergraduate and graduate students (10.7% of student body) from various academic majors enrolled at a public university. A sequential mixed-methods analysis led to the development of the CARE-RESPECTED Model of Teaching Evaluation, which represented characteristics that students considered to reflect effective college teaching, comprising four meta-themes (communicator, advocate, responsible, empowering) and nine themes (responsive, enthusiast, student centered, professional, expert, connector, transmitter, ethical, and director). Three of the most prevalent themes were not represented by any of the TEF items; also, endorsement of most themes varied by student attribute (e.g., gender, age), calling into question the content- and construct-related validity of the TEF scores.
Key Words: college teaching, mixed methods, teaching evaluation form, validity
In this era of standards and accountability, institutions of higher learning have increased their use of student rating scales as an evaluative component of the teaching system (Seldin, 1993). Virtually all teachers at most universities and colleges are either required or expected to administer to their students some type of teaching evaluation form (TEF) at one or more points during each course offering (Dommeyer, Baum, Chapman, & Hanna, 2002; Onwuegbuzie, Daniel, & Collins, 2006, in press). Typically, TEFs serve as formative and summative evaluations that are used in an official capacity by administrators and faculty for one or more of the following purposes: (a) to facilitate curricular decisions (i.e., improve teaching effectiveness); (b) to formulate personnel decisions related to tenure, promotion, merit pay, and the like; and (c) as an information source to be used by students as they select future courses and instructors (Gray & Bergmann, 2003; Marsh & Roche, 1993; Seldin, 1993).
TEFs were first administered formally in the 1920s, with students at the University of Washington responding to what is credited as being the first TEF (Guthrie, 1954; Kulik, 2001). Ory (2000) described the progression of TEFs as encompassing several distinct periods that marked the perceived need for information by a specific audience (i.e., stakeholder). Specifically, in the 1960s, student campus organizations collected TEF data in an attempt to meet students' demands for accountability and informed course selections. In the 1970s, TEF ratings were used to enhance faculty development. In the 1980s to 1990s, TEFs were used mainly for administrative purposes rather than for student or faculty improvement.
In recent years, as a response to the increased focus on improving higher education and requiring institutional accountability, the public, the legal community, and faculty are demanding TEFs with greater trustworthiness and utility (Ory, 2000). Since its inception, the major objective of the TEF has been to evaluate the quality of faculty teaching by providing information useful to both administrators and faculty (Marsh, 1987; Seldin, 1993). As observed by Seldin (1993), TEFs receive more scrutiny from administrators and faculty than do other measures of teaching effectiveness (e.g., student performance, classroom observations, faculty self-reports). Used as a summative evaluation measure, TEFs serve as an indicator of accountability by playing a central role in administrative decisions about faculty tenure, promotion, merit pay raises, teaching awards, and selection of full-time and adjunct faculty members to teach specific courses (Kulik, 2001). As a formative evaluation instrument, faculty may use data from TEFs to improve their own levels of instruction and those of their graduate teaching assistants. In turn, TEF data may be used by faculty and graduate teaching assistants to document their teaching when applying for jobs. Furthermore, students can use information from TEFs as one criterion for making decisions about course selection or deciding between multiple sections of the same course taught by different teachers. Also, TEF data regularly are used to facilitate research on teaching and learning (Babad, 2001; Gray & Bergmann, 2003; Kulik, 2001; Marsh, 1987; Marsh & Roche, 1993; Seldin, 1993; Spencer & Schmelkin, 2002). Although TEF forms might contain one or more open-ended items that allow students to disclose their attitudes toward their instructors teaching style and efficacy, these instruments typically contain either exclusively or predominantly one or more rating scales containing Likert-type items (Onwuegbuzie et al., 2006, in press). 
It is responses to these scales that are given the most weight by administrators and other decision makers. In fact, TEFs often are used as the sole measure of teacher effectiveness (Washburn & Thornton, 1996).
Several researchers have investigated the score reliability of TEFs. However, these findings have been mixed (Haskell, 1997), with the majority of studies yielding TEF scores with large reliability coefficients (e.g., Marsh & Bailey, 1993; Peterson & Kauchak, 1982; Seldin, 1984) and with only a few studies (e.g., Simmons, 1996) reporting inadequate score reliability coefficients. Even if it can be demonstrated that a TEF consistently yields scores with adequate reliability coefficients, it does not imply that these scores will yield valid scores because evidence of score reliability, although essential, is not sufficient for establishing evidence of score validity (Crocker & Algina, 1986; Onwuegbuzie & Daniel, 2002, 2004). Validity is the extent to which scores generated by an instrument measure the characteristic or variable they are intended to measure for a specific population, whereas validation refers to the process of systematically collecting evidence to provide justification for the set of inferences that are intended to be drawn from scores yielded by an instrument (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA, APA, & NCME], 1999). In validation studies, traditionally, researchers seek to provide one or more of three types of evidences: content-related validity (i.e., the extent to which the items on an instrument represent the content being measured), criterion-related validity (i.e., the extent to which scores on an instrument are related to an independent external/criterion variable believed to measure directly the underlying attribute or behavior), and construct-related validity (i.e., the extent to which an instrument can be interpreted as a meaningful measure of some characteristic or quality). However, it should be noted that these three elements do not represent three distinct types of validity but rather a unitary concept (AERA, APA, & NCME, 1999). 
Onwuegbuzie et al. (in press) have provided a conceptual framework that builds on Messick's (1989, 1995) theory of validity. Specifically, these authors have combined the traditional notion of validity with Messick's conceptualization of validity to yield a reconceptualization of validity that Onwuegbuzie et al. called a meta-validation model, as presented in Figure 1. Although treated as a unitary concept, it can be seen in Figure 1 that content-, criterion-, and construct-related validity can be subdivided into areas of evidence. All of these areas of evidence are needed when assessing the score validity of TEFs. Thus, the conceptual framework presented in Figure 1 serves as a schema for the score validation of TEFs.
Criterion-Related Validity
Criterion-related validity comprises concurrent validity (i.e., the extent to which scores on an instrument are related to scores on another, already-established instrument administered approximately simultaneously or to a measurement of some other criterion that is available at the same point in time as the scores on the instrument of interest) and predictive validity (i.e., the extent to which scores on an instrument are related to scores on another, already-established instrument administered in the future or to a measurement of some other criterion that is available at a future point in time as the scores on the instrument of interest). Of the three evidences of validity, criterion-related validity evidence has been the strongest. In particular, using meta-analysis techniques, P. A. Cohen (1981) reported an average correlation of .43 between student achievement and ratings of the instructor and an average correlation of .47 between student performance and ratings of the course. However, as noted by Onwuegbuzie et al. (in press), it is possible or even likely that the positive relationship between student rating and achievement found in the bulk of the literature represents a "positive manifold" effect, wherein individuals who attain the highest levels of course performance tend to give their instructors credit for their success, whether or not this credit is justified. As such, evidence of criterion-related validity is difficult to establish for TEFs using solely quantitative techniques.
Content-Related Validity
Construct-Related Validity
Structural validity involves evaluating how well the scoring structure of the instrument corresponds to the construct domain. Evidence of structural validity typically is obtained via exploratory factor analyses, whereby the dimensions of the measure are determined. However, sole use of exploratory factor analyses culminates in items being included on TEFs, not because they represent characteristics of effective instruction as identified in the literature but because they represent dimensions underlying the instrument, which likely was developed atheoretically. As concluded by Ory and Ryan (2001), this is "somewhat like analyzing student responses to hundreds of math items, grouping the items into response-based clusters, and then identifying the clusters as essential skills necessary to solve math problems" (p. 35). As such, structural validity evidence primarily should involve comparison of items on TEFs to effective attributes identified in the existing literature.
Comparative validity involves convergent validity (i.e., scores yielded from the instrument of interest being highly correlated with scores from other instruments that measure the same construct), discriminant validity (i.e., scores generated from the instrument of interest being slightly but not significantly related to scores from instruments that measure concepts theoretically and empirically related to but not the same as the construct of interest), and divergent validity (i.e., scores yielded from the instrument of interest not being correlated with measures of constructs antithetical to the construct of interest). Several studies have yielded evidence of convergent validity.
In particular, TEF scores have been found to be related positively to self-ratings (Blackburn & Clark, 1975; Marsh, Overall, & Kessler, 1979), observer ratings (Feldman, 1989; Murray, 1983), peer ratings (Doyle & Crichton, 1978; Feldman, 1989; Ory, Braskamp, & Pieper, 1980), and alumni ratings (Centra, 1974; Overall & Marsh, 1980). However, scant evidence of discriminant and divergent validity has been provided. For instance, TEF scores have been found to be related to attributes that do not necessarily reflect effective instruction, such as showmanship (Naftulin, Ware, & Donnelly, 1973), body language (Ambady & Rosenthal, 1992), grading leniency (Greenwald & Gillmore, 1997), and vocal pitch and gestures (Williams & Ceci, 1997). Outcome validity refers to the meaning of scores and the intended and unintended consequences of using the instrument (Messick, 1989, 1995). Outcome validity data appear to provide the weakest evidence of validity because it requires "an appraisal of the value implications of the theory underlying student ratings" (Ory & Ryan, 2001, p. 38). That is, administrators respond to questions such as Does the content of the TEF reflect characteristics of effective instruction that are valued by students? Finally, generalizability pertains to the extent that meaning and use associated with a set of scores can be generalized to other populations. Unfortunately, researchers have found differences in TEF ratings as a function of several factors, such as academic discipline (Centra & Creech, 1976; Feldman, 1978) and course level (Aleamoni, 1981; Braskamp, Brandenberg, & Ory, 1984). Therefore, it is not clear whether the association documented between TEF ratings and student achievement is invariant across all contexts, thereby making it difficult to make any generalizations about this relationship. Thus, more evidence is needed.
As can be seen, much more validity evidence is needed regarding TEFs. Unless it is demonstrated that TEFs yield scores that are valid, as contended by Gray and Bergmann (2003), these instruments may be subject to misuse and abuse by administrators, representing "an instrument of unwarranted and unjust termination for large numbers of junior faculty and a source of humiliation for many of their senior colleagues" (p. 44). Theall and Franklin (2001) provided several recommendations for TEFs. In particular, they stated the following: "Include all stakeholders in decisions about the evaluation process by establishing policy process" (p. 52). This recommendation has intuitive appeal. Yet the most important stakeholders—namely, the students themselves—typically are omitted from the process of developing TEFs. Although research has documented an array of variables that are considered characteristics of effective teaching, the bulk of this research base has used measures that were developed from the perspectives of faculty and administrators—not from students perspectives (Ory & Ryan, 2001). Indeed, as noted by Ory and Ryan (2001), "It is fair to say that many of the forms used today have been developed from other existing forms without much thought to theory or construct domains" (p. 32). A few researchers have examined students perceptions of effective college instructors. Specifically, using students perspectives as their data source, Crumbley, Henry, and Kratchman (2001) reported that undergraduate and graduate students (n = 530) identified the following instructor traits that were likely to affect positively students evaluations of their college instructor: teaching style (88.8%), presentation skills (89.4%), enthusiasm (82.2%), preparation and organization (87.3%), and fairness related to grading (89.8%). Results also indicated that graduate students, in contrast to undergraduate students, placed stronger emphasis on a structured classroom environment. 
Factors likely to lower students' evaluations were associated with students' perceptions that the content taught was insufficient to achieve the expected grade (46.5%), being asked embarrassing questions by the instructor (41.9%), and the instructor appearing inexperienced (41%). In addition, factors associated with testing (i.e., administering pop quizzes) and grading (i.e., harsh grading, notable amount of homework) were likely to lower students' evaluations of their instructors. Sheehan (1999) asked undergraduate and graduate psychology students attending a public university in the United States to identify characteristics of effective teaching by responding to a survey instrument. Results of regression analyses indicated that the following variables predicted 69% of the variance in the criterion variable of teacher effectiveness: informative lectures, tests, papers evaluating course content, instructor preparation, interesting lectures, and degree that the course was perceived as challenging. More recently, Spencer and Schmelkin (2002) found that students representing sophomores, juniors, and seniors attending a private U.S. university perceived effective teaching as characterized by college instructors' personal characteristics: demonstrating concern for students, valuing student opinions, clarity in communication, and openness toward varied opinions. Greimel-Fuhrmann and Geyer's (2003) evaluation of interview data indicated that undergraduate students' perceptions of their instructors and the overall instructional quality of the courses were influenced positively by teachers who provided clear explanations of subject content, who were responsive to students' questions and viewpoints, and who used a creative approach toward instruction beyond the scope of the course textbook. Other factors influencing students' perceptions included teachers demonstrating a sense of humor and maintaining a balanced or fair approach toward classroom discipline.
Results of an exploratory factor analysis identified subject-oriented teacher, student-oriented teacher, and classroom management as factors accounting for 69% of the variance in students global ratings of their instructors (i.e., ". . . is a good teacher" and "I am satisfied with my teacher") and global ratings concerning student acquisition of domain-specific knowledge. Adjectives describing a subject-oriented teacher were (a) provides clear explanations, (b) repeats information, and (c) presents concrete examples. A student-oriented teacher was defined as student friendly, patient, and fair. Classroom management was defined as maintaining consistent discipline and effective time management. In their study, Okpala and Ellis (2005) examined data obtained from 218 U.S. college students regarding their perceptions of teacher quality components. The following five qualities emerged as key components: caring for students and their learning (89.6%), teaching skills (83.2%), content knowledge (76.8%), dedication to teaching (75.3%), and verbal skills (73.9%). Several researchers who have attempted to identify characteristics of effective college teachers have addressed college faculty. In particular, in their analysis of the perspectives of faculty (n = 99) and students (n = 231) regarding characteristics of effective teaching, Schaeffer, Epting, Zinn, and Buskit (2003) found strong similarities between the two groups when participants identified and ranked what they believed to be the most important 10 of 28 qualities representing effective college teaching. Although specific order of qualities differed, both groups agreed on 8 of the top 10 traits: approachable, creative and interesting, encouraging and caring, enthusiastic, flexible and open-minded, knowledgeable, realistic expectations and fair, and respectful. Kane, Sandretto, and Heath (2004) also attempted to identify the qualities of excellent college teachers. 
For their study, investigators asked heads of university science departments to nominate lecturers whom they deemed excellent teachers. The criteria for the nominations were based upon both peer and student perceptions of the faculty members quality of teaching and upon the faculty members demonstrated interest in exploring her or his own teaching practice. Investigators noted that a number of nomination letters referenced student evaluations. Five themes representing excellence resulted from the analysis of data from the 17 faculty participants. These were knowledge of subject, pedagogical skill (e.g., clear communicator, one who makes real-world connections, organized, motivating), interpersonal relationships (e.g., respect for and interest in students, empathetic and caring), research/ teaching nexus (e.g., integration of research into teaching), and personality (e.g., exhibits enthusiasm and passion, has a sense of humor, is approachable, builds honest relationships).
Although the few studies on students' perceptions of effective college instructors have yielded useful information, the researchers did not specify whether the perceptions that emerged were reflected by the TEFs used by the respective institutions. Bearing in mind the important role that TEFs play in colleges, universities, and other institutions of further and higher learning, it is vital that much more validity evidence be collected. Because the goal of TEFs is to make local decisions (e.g., tenure, promotion, merit pay, teaching awards), it makes sense to collect such validity evidence one institution at a time and then use generalization techniques such as meta-analysis (Glass, 1976, 1977; Glass, McGaw, & Smith, 1981), meta-summaries (Sandelowski & Barroso, 2003), and meta-validation (Onwuegbuzie et al., in press) to paint a holistic picture of the appropriateness and utility of TEFs. With this in mind, the purpose of this study was to conduct a validity study of a TEF by examining students' perceptions of characteristics of effective college teachers. Using mixed-methods techniques, the researchers assessed the content-related validity and construct-related validity pertaining to a TEF. With respect to content-related validity, the item validity and sampling validity pertaining to the selected TEF were examined. With regard to construct-related validity, substantive validity was examined via an assessment of the theoretical analysis of the knowledge, skills, and processes hypothesized to underlie respondents' scores; structural validity was assessed by comparing items on the TEF to effective attributes identified both in the extant literature and by the current sample; outcome validity was evaluated via an appraisal of some of the intended and unintended consequences of using the TEF; and generalizability was evaluated via an examination of the invariance of students' perceptions of characteristics of effective college teachers (e.g., males vs. females, graduate students vs. undergraduate students). Simply put, we examined areas of validity evidence of a TEF that have received scant attention. The following mixed-methods research question was addressed: What is the content-related validity (i.e., item validity, sampling validity) and construct-related validity (i.e., substantive validity, structural validity, outcome validity, generalizability) pertaining to a TEF? Using Newman, Ridenour, Newman, and DeMarco's (2003) typology, the goal of this mixed-methods research study was to have a personal, institutional, and/or organizational impact on future TEFs. The objectives of this mixed-methods inquiry were threefold: (a) exploration, (b) description, and (c) explanation (Johnson & Christensen, 2004). As such, it was hoped that the results of the current investigation would contribute to the extant literature and provide information useful for developing more effective TEFs.
Participants
Participants were 912 college students who were attending a midsize public university in a midsouthern state. The sample size represented 10.66% of the student body at the university where the study took place. These students were enrolled in 68 degree programs (e.g., education, mathematics, history, sociology, dietetics, journalism, nursing, prepharmacy, premedical) that represented all six colleges. The sample was selected purposively utilizing a criterion sampling scheme (Miles & Huberman, 1994; Onwuegbuzie & Collins, in press; Patton, 1990). The majority of the sample was female (74.3%). With respect to ethnicity, the respondents comprised Caucasian American (85.4%), African American (11.0%), Asian American (1.0%), Hispanic (0.4%), Native American (0.9%), and other (1.3%). Ages ranged from 18 to 58 years (M = 23.00, SD = 6.26). With regard to level of student (i.e., undergraduate vs. graduate), 77.04% represented undergraduate students. A total of 76 students were preservice teachers. Although these demographics do not exactly match the larger population at the university, they appear to be at least somewhat representative. In particular, at the university where the study took place, 61% of the student population is female. With respect to ethnicity, the university population comprises 76% Caucasian American, 16% African American, 1% Asian American, 0.9% Hispanic, 0.86% Native American, and 2.7% unknown; of the total student population, 89% are undergraduates. The sample members had taken an average of 32.24 (SD = 41.14) undergraduate or 22.33 (SD = 31.62) graduate credit hours, with a mean undergraduate grade point average (GPA) of 2.80 (SD = 2.29) and mean graduate GPA of 3.18 (SD = 1.25) on a 4-point scale. Finally, the sample members' number of offspring ranged from 0 to 6 (M = 0.32, SD = 0.84).
Because all 912 participants contributed to both the qualitative and quantitative phases of the study, and the qualitative phase preceded the quantitative phases, the mixed-methods sampling design used was a sequential design using identical samples (Collins, Onwuegbuzie, & Jiao, 2006, in press; Onwuegbuzie & Collins, in press).
Setting
Teaching Evaluation Form
Instruments and Procedure
To maximize its content-related validity, the questionnaire was pilot-tested on 225 students at two universities that were selected via a maximum variation sampling technique (Miles & Huberman, 1994): one university (n = 110) that was similar in enrollment size and Carnegie Foundation classification to the university where the study took place and one Research I university (n = 115). Modifications to the instrument were made during this pilot stage, as needed.
Research design
Analysis
The sequential mixed-methods analysis (SMMA) consisted of four stages. The first stage involved a thematic analysis (i.e., exploratory stage) to analyze students' responses regarding their perceptions of characteristics of effective college teachers (Goetz & LeCompte, 1984). The goal of this analytical method was to understand phenomena from the perspective of those being studied (Goetz & LeCompte, 1984). The thematic analysis was generative, inductive, and constructive because it required the inquirer(s) to bracket or suspend all preconceptions (i.e., epoche) to minimize bias (Moustakas, 1994). Thus, the researchers were careful not to form any a priori hypotheses or expectations with respect to students' perceptions of effective college instructors. The thematic analysis undertaken in this study involved the methodology of reduction (Creswell, 1998). With reduction, the qualitative data "sharpens, sorts, focuses, discards, and organizes data in such a way that final conclusions can be drawn and verified" (Miles & Huberman, 1994, p. 11) while retaining the context in which these data occurred (Onwuegbuzie & Teddlie, 2003). Specifically, a modification of Colaizzi's (1978) analytic methodology was used that contained five procedural steps. These steps were as follows: (a) All the students' words, phrases, and sentences were read to obtain a feeling for them. (b) These students' responses were then unitized (Glaser & Strauss, 1967). (c) These units of information then were used as the basis for extracting a list of nonrepetitive, nonoverlapping significant statements (i.e., horizonalization of data; Creswell, 1998), with each statement given equal weight. Units were eliminated that contained the same or similar statements such that each unit corresponded to a unique instructional characteristic. (d) Meanings were formulated by elucidating the meaning of each significant statement (i.e., unit).
Finally, (e) clusters of themes were organized from the aggregate formulated meanings, with each cluster consisting of units that were deemed similar in content; therefore, each cluster represented a unique emergent theme (i.e., method of constant comparison; Glaser & Strauss, 1967; Lincoln & Guba, 1985). Specifically, the analysts compared each subsequent significant statement with previous codes such that similar clusters were labeled with the same code. After all the data had been coded, the codes were grouped by similarity, and a theme was identified and documented based on each grouping (Leech & Onwuegbuzie, in press-a). These clusters of themes were compared to the original descriptions to verify the clusters (Leech & Onwuegbuzie, in press-a). This was undertaken to ensure that no original descriptions made by the students were unaccounted for by the cluster of themes and that no cluster contained units that were not in the original descriptions. These themes were created a posteriori (Constas, 1992). As such, each significant statement was linked to a formulated meaning and to a theme. This five-step method of thematic analysis was used to identify a number of themes pertaining to students' perceptions of characteristics of effective college instructors. The locus of typology development was investigative, stemming from the intellectual constructions of the researchers (Constas, 1992). The source for naming of categories also was investigative (Constas, 1992). Double coding (Miles & Huberman, 1994) was used for categorization verification, which took the form of interrater reliability. Consequently, the verification component of categorization was empirical (Constas, 1992). Specifically, three of the researchers independently coded the students' responses and determined the emergent themes. These themes were compared and the rate of agreement determined (i.e., interrater reliability).
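Agreement among several independent coders is usually summarized with a chance-corrected statistic. The sketch below computes Fleiss' kappa in plain Python for three hypothetical coders. Treating Fleiss' formulation as equivalent to the multirater kappa cited in this section is an assumption, and the function names and toy theme codes are illustrative only. The interpretation bands follow the Altman (1991) cutoffs quoted in this section.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that were each coded by the same number of raters.

    ratings[i] holds the category labels the raters assigned to item i.
    This data layout is a hypothetical choice, not the authors' format.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    totals = Counter()     # category counts pooled over all items
    agreement_sum = 0.0    # running sum of the per-item agreement P_i
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    expected = sum((c / (n_items * n_raters)) ** 2 for c in totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


def altman_band(kappa):
    """Verbal label for kappa using the Altman (1991) cutoffs."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical example: three coders assign a theme to each of four units.
coded_units = [
    ["responsive", "responsive", "responsive"],
    ["responsive", "responsive", "expert"],
    ["expert", "expert", "expert"],
    ["responsive", "expert", "expert"],
]
kappa = fleiss_kappa(coded_units)
print(round(kappa, 3), altman_band(kappa))  # prints: 0.333 fair
```

In this toy case the coders agree fully on two units and split two to one on the others, so the observed agreement (.667) exceeds the chance agreement (.50) only modestly.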
Because more than two raters were involved, the multirater Kappa measure was used to provide information regarding the degree to which raters achieved the possible agreement beyond any agreement that could be expected to occur merely by chance (Siegel & Castellan, 1988). Because a quantitative technique (i.e., interrater reliability) was employed as a validation technique, in addition to being empirical, the verification component of categorization was technical (Constas, 1992). The verification approach was accomplished a posteriori (Constas, 1992). The following criteria were used to interpret the Kappa coefficient: < .20 = poor agreement, .21–.40 = fair agreement, .41–.60 = moderate agreement, .61–.80 = good agreement, .81–1.00 = very good agreement (Altman, 1991). An additional method of interrater reliability, namely, peer debriefing, was used to legitimize the data interpretations. Peer debriefing provides a logically based external evaluation of the research process (Glesne & Peshkin, 1992; Lincoln & Guba, 1985; Maxwell, 2005; Merriam, 1988; Newman & Benz, 1998). The ("disinterested") peer selected was a college professor from another institution who had no stake in the findings and interpretations and who served as "devil's advocate" in an attempt to keep the data interpretations as "honest" as possible (Lincoln & Guba, 1985, p. 308).
The second stage of the sequential qualitative–quantitative mixed-methods analysis involved utilizing descriptive statistics (i.e., exploratory stage) to analyze the hierarchical structure of the emergent themes (Onwuegbuzie & Teddlie, 2003). Specifically, each theme was quantitized (Tashakkori & Teddlie, 1998). That is, if a student listed a characteristic that was eventually unitized under a particular theme, then a score of 1 would be given to the theme for the student response; a score of 0 would be given otherwise.
This dichotomization led to the formation of an interrespondent matrix (i.e., Student × Theme Matrix) (Onwuegbuzie, 2003a; Onwuegbuzie & Teddlie, 2003). This matrix consisted only of 0s and 1s.1 By calculating the frequency of each theme from the interrespondent matrix, percentages were computed to determine the prevalence rate of each theme.2
The third stage of the sequential qualitative–quantitative mixed-methods analysis involved the use of the aforementioned interrespondent matrix to conduct an exploratory factor analysis to determine the underlying structure of these themes (i.e., exploratory stage). More specifically, the interrespondent matrix was converted to a matrix of bivariate associations among the responses pertaining to each of the emergent themes (Thompson, 2004). These bivariate associations represented tetrachoric correlation coefficients because the themes had been quantitized to dichotomous data (i.e., 0 vs. 1), and tetrachoric correlation coefficients are appropriate to use when one is determining the relationship between two (artificial) dichotomous variables.3,4 Thus, the matrix of tetrachoric correlation coefficients was the basis of the exploratory factor analysis. This factor analysis determined the number of factors underlying the themes. These factors, or latent constructs, yielded meta-themes (Onwuegbuzie, 2003a) such that each meta-theme contained one or more of the emergent themes. The trace, or proportion of variance explained by each factor after rotation, served as an effect size index for each meta-theme (Onwuegbuzie, 2003a).5 Furthermore, the combined effect size pertaining to each meta-theme was computed (Onwuegbuzie, 2003a).6 By determining the hierarchical relationship between the themes, in addition to being empirical and technical, the verification component of categorization was rational (Constas, 1992).
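Stages 2 and 3 can be illustrated on a toy data set. The sketch below quantitizes hypothetical coded responses into a Student × Theme matrix of 0s and 1s, computes each theme's prevalence rate, and estimates the association between two themes with the cos-pi approximation to the tetrachoric correlation. The approximation is a stand-in chosen for brevity, not the estimator used in the study, and all function names and data are illustrative assumptions; only the theme labels come from the abstract.

```python
import math

# Three of the nine themes named in the abstract; the responses are invented.
THEMES = ["responsive", "expert", "ethical"]


def quantitize(coded_responses, themes):
    """Build the Student x Theme matrix: 1 if the student's response was
    unitized under the theme, 0 otherwise."""
    return [[1 if theme in response else 0 for theme in themes]
            for response in coded_responses]


def prevalence_rates(matrix):
    """Percentage of students endorsing each theme (one value per column)."""
    n_students = len(matrix)
    return [100.0 * sum(column) / n_students for column in zip(*matrix)]


def cos_pi_tetrachoric(x, y):
    """Cos-pi approximation to the tetrachoric correlation of two 0/1 vectors.

    Assumption: used here only as a simple closed-form illustration; a
    maximum-likelihood estimator would normally be preferred.
    """
    a = b = c = d = 0  # cells of the 2 x 2 table: 11, 10, 01, 00
    for u, v in zip(x, y):
        if u and v:
            a += 1
        elif u:
            b += 1
        elif v:
            c += 1
        else:
            d += 1
    if b * c == 0:
        if a * d == 0:
            raise ValueError("association is undefined for this 2 x 2 table")
        return 1.0  # no discordant pairs in one off-diagonal cell
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


# Hypothetical coded responses: the set of themes each student endorsed.
coded = [
    {"responsive", "expert"},
    {"expert"},
    {"responsive"},
    {"expert", "ethical"},
]
matrix = quantitize(coded, THEMES)
rates = prevalence_rates(matrix)
print(matrix)  # [[1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
print(rates)   # [50.0, 75.0, 25.0]
```

With a realistic sample, the pairwise coefficients for all themes would be collected into a correlation matrix and passed to an exploratory factor analysis routine in a statistics package.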
The fourth and final stage of the sequential qualitative–quantitative mixed-methods analysis (i.e., confirmatory analyses) involved the determination of antecedent correlates of the emergent themes that were extracted in Stage 1 and quantitized in Stage 2. This phase utilized the interrespondent matrix to undertake (a) a series of Fisher's exact tests to determine which demographic variables were related to each of the themes and (b) a canonical correlation analysis to examine the multivariate relationship between the themes and the demographic variables. Specifically, a canonical correlation analysis (Cliff & Krus, 1976; Darlington, Weinberg, & Walberg, 1973; Thompson, 1980, 1984) was used to determine this multivariate relationship. For each statistically significant canonical coefficient, standardized canonical function coefficients and structure coefficients were computed. These coefficients served as inferential-based effect sizes (Onwuegbuzie, 2003a). Onwuegbuzie and Teddlie (2003) identified the following seven stages of the mixed-methods data analysis process: (a) data reduction, (b) data display, (c) data transformation, (d) data correlation, (e) data consolidation, (f) data comparison, and (g) data integration. These authors defined data reduction as reducing the dimensionality of the quantitative data (e.g., via descriptive statistics, exploratory factor analysis, cluster analysis) and the qualitative data (e.g., via exploratory thematic analysis, memoing). Data display refers to describing visually the qualitative data (e.g., graphs, charts, matrices, checklists, rubrics, networks, and Venn diagrams) and quantitative data (e.g., tables, graphs). 
This is followed, if needed, by the data transformation stage, in which qualitative data are converted into numerical codes that can be analyzed statistically (i.e., quantitized; Tashakkori & Teddlie, 1998) and/or quantitative data are converted into narrative codes that can be analyzed qualitatively (i.e., qualitized; Tashakkori & Teddlie, 1998). Data correlation, the next step, involves qualitative data being correlated with quantitized data or quantitative data being correlated with qualitized data. This is followed by data consolidation, whereby both quantitative and qualitative data are combined to create new or consolidated variables, data sets, or codes. The next stage, data comparison, involves comparing data from the qualitative and quantitative data sources. Data integration is the final stage of the mixed-methods data analysis process, whereby both qualitative and quantitative data are integrated into either a coherent whole or two separate sets (i.e., qualitative and quantitative) of coherent wholes. In implementing the four-stage mixed-methods data analysis framework, the researchers incorporated five of the seven stages of Onwuegbuzie and Teddlie's (2003) model, namely, data reduction, data display, data transformation, data correlation, and data integration. Using Collins, Onwuegbuzie, and Sutton's (2006) rationale and purpose (RAP) model, the rationale for conducting the mixed-methods study could be classified as (a) participant enrichment, (b) instrument fidelity, and (c) significance enhancement. Participant enrichment represents the mixing of quantitative and qualitative approaches for the rationale of optimizing the sample (e.g., increasing the number of participants). Instrument fidelity refers to procedures used by the researcher(s) to maximize the utility and/or appropriateness of the instruments used in the study, whether quantitative or qualitative. 
Significance enhancement denotes mixing qualitative and quantitative techniques to maximize the interpretations of data (i.e., quantitative data can be used to enhance qualitative analyses, qualitative data can be used to enhance statistical analyses, or both). With respect to participant enrichment, the present researchers approached instructors/professors before the study began to solicit participation of their students and thus maximize the participation rate. With regard to instrument fidelity, the researchers (a) collected qualitative data (e.g., respondents' perceptions of the questionnaire) and quantitative data (e.g., response rate information, missing data information) before the study began (i.e., pilot phase) and (b) used member checking techniques to assess the appropriateness of the questionnaire and the adequacy of the time allotted to complete it, after the major data collection phases. Finally, with respect to significance enhancement, the researchers used a combination of qualitative and quantitative analyses to get more out of their initial data both during and after the study, thereby enhancing the significance of their findings (Onwuegbuzie & Leech, 2004a). Moreover, the researchers sought to use mixed-methods data-analytic techniques in an attempt to combine descriptive precision (i.e., Stages 1 and 3) with empirical precision (i.e., Stages 2 to 4) (Caracelli & Greene, 1993; Johnson & Onwuegbuzie, 2004; Onwuegbuzie & Leech, 2006). Figure 2 provides a visual representation of how the RAP model was utilized in the current inquiry.
Stage 1 Analysis
Every participant provided at least three characteristics they believed effective college instructors possess or demonstrate. The participants listed a total of 2,991 significant statements describing effective college teachers. This represented a mean of 3.28 significant statements per sample member. Examples of the significant statements and their corresponding formulated meanings and the themes that emerged from the students' responses are presented in Table 1. This table reveals that the following nine themes surfaced from the students' responses: student centered, expert, professional, enthusiast, transmitter, connector, director, ethical, and responsive. The descriptions of each of the nine themes are presented in Table 2. Examples of student centered include "willingness to listen to students," "compassionate," and "caring"; examples of expert include "intelligent" and "knowledgeable"; examples of professional are "reliable," "self-discipline," "diligence," and "responsible"; words that represent enthusiast include "encouragement," "enthusiasm," and "positive attitude"; words that describe transmitter are "good communication," "speaking clearly," and "fluent English"; examples that characterize connector include "open door policy," "available," and "around when students need help"; director includes descriptors such as "flexible," "organized," and "well prepared for class"; ethical is represented by words such as "consistency," "fair evaluator," and "respectful"; finally, examples that depict responsive include "quick turnaround," "understandable," and "informative."
The interrater reliability (i.e., multirater Kappa) associated with the three researchers who independently coded the students' responses and determined the emergent themes was 93% (SE = 0.7), which can be interpreted as indicating very good agreement. Furthermore, based on the data, the "disinterested" peer agreed with all nine emergent themes. The only discrepancies pertained to the labels given to some of the themes. As a result of these discrepancies,7 the "disinterested" peer and coders scheduled an additional meeting to agree on more appropriate labels for the themes and meta-themes. This led to relabeled themes and meta-themes that were not only more insightful but also evolved into meaningful acronyms, as can be seen in the following sections.
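To make the agreement index concrete, the following sketch computes a multirater kappa and maps it onto the Altman (1991) bands quoted earlier. It assumes a Fleiss-type formulation of multirater kappa; the article does not give its formula, so this is an illustration rather than the authors' exact computation. The rating counts are hypothetical, and treating a value of exactly .20 as "poor" is also an assumption, because the quoted bands leave that boundary unspecified.

```python
from typing import List

def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss-type multirater kappa.

    counts[i][j] is the number of raters who assigned item i to
    category j; every row must sum to the same number of raters.
    Undefined (division by zero) if all ratings fall in one category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Observed agreement: proportion of agreeing rater pairs per item.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    p_cats = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_chance = sum(p * p for p in p_cats)
    return (p_observed - p_chance) / (1 - p_chance)

def altman_band(kappa: float) -> str:
    """Interpretation bands cited in the article (Altman, 1991)."""
    if kappa <= 0.20:   # boundary handling is an assumption
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"

# Hypothetical example: 3 raters, 4 statements, 2 candidate themes.
ratings = [[3, 0], [0, 3], [3, 0], [2, 1]]
kappa = fleiss_kappa(ratings)   # 0.625
band = altman_band(kappa)       # "good"
```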
Stage 2 Analysis
Stage 3 Analysis
An exploratory factor analysis was used to determine the number of factors underlying the nine themes. This analysis was conducted because it was expected that two or more of these themes would cluster together. Specifically, a maximum likelihood factor analysis was used. This technique, which gives better estimates than does principal factor analysis (Bickel & Doksum, 1977), is perhaps the most common method of factor analysis (Lawley & Maxwell, 1971). As recommended by Kieffer (1999) and Onwuegbuzie and Daniel (2003), the correlation matrix was used to undertake the factor analysis. An orthogonal (i.e., varimax) rotation was employed because of the expected small correlations among the themes. This analysis was used to extract the latent constructs. As conceptualized by Onwuegbuzie (2003a), these factors represented meta-themes. The eigenvalue-greater-than-one rule, also known as K1 (Kaiser, 1958), was used to determine an appropriate number of factors to retain. This technique resulted in four factors (i.e., meta-themes). The "scree" test, which represents a plot of eigenvalues against the factors in descending order (Cattell, 1966; Zwick & Velicer, 1986), also suggested that four factors be retained. This four-factor solution is presented in Table 4. Using a cutoff correlation of .3, recommended by Lambert and Durand (1975) as an acceptable minimum value for pattern/structure coefficients, Table 4 reveals that the following themes had pattern/structure coefficients with large effect sizes on the first factor: student centered and professional; the following themes had pattern/structure coefficients with large effect sizes on the second factor: connector, transmitter, and responsive; the following themes had pattern/structure coefficients with large effect sizes on the third factor: director and ethical; and the following themes had pattern/structure coefficients with large effect sizes on the fourth factor: enthusiast and expert. 
The first meta-theme (i.e., Factor 1) was labeled advocate. The second meta-theme was termed communicator. The third meta-theme represented responsible. Finally, the fourth meta-theme denoted empowering. Interestingly, within the advocate meta-theme (i.e., Factor 1), the student-centered and professional themes were negatively related. Also, within the responsible meta-theme (i.e., Factor 3), the director and ethical themes were inversely related. The descriptions of each of the four meta-themes are presented in Table 5. The thematic structure is presented in Figure 3. This figure illustrates the relationships among the themes and meta-themes arising from students' perceptions of the characteristics of effective college instructors.
An examination of the trace (i.e., the proportion of variance explained, or eigenvalue, after rotation; Hetzel, 1996) revealed that the advocate meta-theme (i.e., Factor 1) explained 14.44% of the total variance, the communicator meta-theme (i.e., Factor 2) accounted for 13.79% of the variance, the responsible meta-theme (i.e., Factor 3) explained 12.86% of the variance, and the empowering meta-theme (i.e., Factor 4) accounted for 11.76% of the variance. These four meta-themes combined explained 52.86% of the total variance. Interestingly, this proportion of total variance explained is consistent with that typically explained in factor solutions (Henson, Capraro, & Capraro, 2004; Henson & Roberts, 2006). Furthermore, this total proportion of variance, which provides an effect size index,8 can be considered large. The effect sizes associated with the four meta-themes (i.e., proportion of characteristics identified per meta-themes)9 were as follows: advocate (81.0%), communicator (43.7%), responsible (41.1%), and empowering (59.6%).
Stage 4 Analysis
With respect to level of student, graduate students (59.6%) were statistically significantly more likely to deem being an expert in one's field as characteristic of effective instruction than were undergraduate students (39.7%). Cramér's V effect size was .17. Moreover, these graduate students were 2.24 times (95% CI = 1.64, 3.08) more likely than were undergraduates to endorse being an expert. Similarly, graduate students (32.2%) were statistically significantly more likely to consider being a director to exemplify effective instruction than were undergraduate students (18.9%). Cramér's V effect size was .14. These graduate students were 2.03 times (95% CI = 1.44, 2.88) more likely than were undergraduate students to endorse being a director. With regard to preservice teacher status, preservice teachers (40.8%) were statistically significantly less likely to endorse student centeredness as being indicative of effective instruction than were the other students (60.7%). Cramér's V effect size was .11. Moreover, preservice teachers were 2.24 times (95% CI = 1.39, 3.61) less likely than were other students to endorse student centeredness. Conversely, preservice teachers (44.7%) were statistically significantly more likely to deem being ethical as characterizing effective instruction than were the remaining students (19.5%). Cramér's V effect size was .17. These preservice teachers were 2.29 times (95% CI = 1.72, 3.05) more likely than were other students to endorse ethicalness. Similarly, preservice teachers (23.3%) were statistically significantly more likely to endorse being a director as representing effective instruction than were the other students (6.6%). Cramér's V effect size was .11. These preservice teachers were 4.30 times (95% CI = 1.71, 10.81) more likely than were other students to endorse being a director. 
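The "times more likely" figures reported above behave like odds ratios. As a rough check, and assuming that interpretation (the article does not name the statistic), an odds ratio can be recovered from two endorsement percentages. The sketch below uses percentages quoted in the text; because the authors presumably worked from raw counts, small rounding differences are to be expected, and the confidence intervals cannot be reproduced without those counts.

```python
def odds(p: float) -> float:
    """Convert a proportion (0 < p < 1) to odds."""
    return p / (1.0 - p)

def odds_ratio(p_group_a: float, p_group_b: float) -> float:
    """Odds of endorsement in group A relative to group B."""
    return odds(p_group_a) / odds(p_group_b)

# Expert theme: graduate (59.6%) vs. undergraduate (39.7%) students.
expert_or = odds_ratio(0.596, 0.397)

# Student-centered theme: other students (60.7%) vs. preservice
# teachers (40.8%), i.e., how much less likely preservice teachers were.
student_centered_or = odds_ratio(0.607, 0.408)

# Director theme: preservice teachers (23.3%) vs. other students (6.6%).
director_or = odds_ratio(0.233, 0.066)

print(round(expert_or, 2), round(student_centered_or, 2), round(director_or, 2))
```

Using odds rather than a simple ratio of percentages matters here: the ratio of 59.6% to 39.7% is only about 1.5, whereas the corresponding odds ratio is about 2.24.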
A series of point-biserial correlation coefficients was conducted to correlate each of the nine themes with each of the following four demographic variables: age, GPA, number of credit hours taken, and number of offspring. After applying the Bonferroni adjustment to control for family-wise error, only three associations were statistically significant: (a) Older students were more likely to endorse professionalism as an effective instructional characteristic (r = .12, p < .001), (b) students with the most credit hours were more likely to endorse ethicalness (r = .14, p < .001), and (c) students with the most credit hours were less likely to endorse being a director (r = –.09, p < .001); however, all three correlations were small. A canonical correlation analysis was undertaken to examine the relationship between the nine themes and the eight demographic variables. The nine themes were treated as the dependent set of variables, whereas the following variables were used as the independent multivariate profile: gender, race, level of student, preservice teacher status, age, GPA, number of credit hours taken, and number of offspring. The number of canonical functions (i.e., factors) that can be generated for a given data set is equal to the number of variables in the smaller of the two variable sets (Thompson, 1980, 1984, 1988, 1990). Because nine themes were correlated with eight independent variables, eight canonical functions were generated. The canonical analysis revealed that the eight canonical correlations combined were statistically significant (p < .0001). Also, when the first canonical root was excluded, the remaining seven canonical roots were statistically significant (p < .0001; Canonical Rc1 = .31). Similarly, when the first and second canonical roots were excluded, the remaining six canonical roots were statistically significant (p < .0001; Canonical Rc1 = .23). 
Furthermore, when the first three canonical roots were excluded, the remaining five canonical roots were statistically significant (p < .001; Canonical Rc1 = .21). However, when the first four canonical roots were excluded, the remaining four canonical roots were not statistically significant. In fact, removal of subsequent canonical roots did not lead to statistical significance. Together, these results suggested that the first three canonical functions were both statistically significant and practically significant (J. Cohen, 1988), but the remaining five roots were not statistically significant. Data pertaining to the first canonical root are presented in Table 6. This table provides both standardized function coefficients and structure coefficients. Using a cutoff correlation of .3 (Lambert & Durand, 1975), the standardized canonical function coefficients revealed that student centered, professional, and director made important contributions to the set of themes, with student centered and director being the major contributors. With respect to the demographic set, one's gender, level of student, and preservice teacher status made noteworthy contributions. The structure coefficients pertaining to the first canonical function revealed that student centered, ethical, and director made important contributions (i.e., were practically significant) to the first canonical variate. The square of the structure coefficient indicated that these variables explained 20.3%, 20.3%, and 33.6% of the variance, respectively. With regard to the demographic cluster, preservice teacher status made the strongest contribution, followed by level of student, number of credit hours, and gender. These variables explained 65.6%, 34.8%, 18.5%, and 9.0% of the variance, respectively.
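The link between structure coefficients and explained variance used above is simply a square. The sketch below works backward from the percentages reported for the demographic set on the first canonical function to the implied absolute structure coefficients, and compares them with the .3 cutoff. The actual coefficients (including their signs) are in Table 6, which is not reproduced here, so the recovered magnitudes are approximations; the variable names are illustrative.

```python
import math

CUTOFF = 0.3  # Lambert and Durand (1975) minimum, as cited in the text

# Variance explained (%) on the first canonical function, from the text.
explained_pct = {
    "preservice teacher status": 65.6,
    "level of student": 34.8,
    "number of credit hours": 18.5,
    "gender": 9.0,
}

def variance_explained(structure_coefficient: float) -> float:
    """Percentage of variance explained = squared structure coefficient."""
    return 100.0 * structure_coefficient ** 2

def implied_magnitude(pct: float) -> float:
    """Absolute structure coefficient implied by a variance percentage."""
    return math.sqrt(pct / 100.0)

implied = {name: implied_magnitude(pct) for name, pct in explained_pct.items()}

# The number of canonical functions equals the size of the smaller set.
n_themes, n_demographics = 9, 8
n_functions = min(n_themes, n_demographics)

for name, r in implied.items():
    print(f"{name}: |r_s| is about {r:.2f}")
```

A coefficient sitting exactly at the .3 cutoff corresponds to 9.0% of variance explained, which is why gender appears as the weakest of the noteworthy demographic contributors.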
Comparing the standardized and structure coefficients identified professional as a suppressor variable because the standardized coefficient associated with this variable was large, whereas the corresponding structure coefficient was relatively small (Onwuegbuzie & Daniel, 2003). Suppressor variables are variables that assist in the prediction of dependent variables due to their correlation with other independent variables (Tabachnick & Fidell, 2006). Table 7 presents data pertaining to the second canonical root, containing both standardized function coefficients and structure coefficients. The standardized canonical function coefficients revealed that enthusiast and expert made important contributions to the set of themes, with expert being the major contributor. With respect to the demographic set, one's gender, age, level of student, and number of credit hours made noteworthy contributions. The structure coefficients pertaining to the second canonical function revealed that enthusiast (21.2% explained variance), student centered (11.6% explained variance), and expert (49.0% explained variance) made important contributions. With regard to the demographic cluster, level of student (36.0% explained variance) made the strongest contribution, followed by age (34.8% explained variance), number of credit hours (13.7% explained variance), and number of offspring (11.6% explained variance). Comparing the standardized and structure coefficients implicated gender as a suppressor variable because the standardized coefficient associated with this variable was large, whereas the corresponding structure coefficient was relatively small.
Table 8 presents data pertaining to the third canonical root, containing both standardized function coefficients and structure coefficients. The standardized canonical function coefficients revealed that enthusiast, student centered, professional, ethical, expert, and director made important contributions to the set of themes, with enthusiast and director being the major contributors. With respect to the demographic set, one's age, race, level of student, and preservice teacher status made similarly noteworthy contributions. The structure coefficients pertaining to the third canonical function revealed that enthusiast (20.3% explained variance), student centered (16.0% explained variance), professional (9.6% explained variance), ethical (10.9% explained variance), expert (10.2% explained variance), and director (16.8% explained variance) made important contributions. With regard to the demographic cluster, race (30.3% explained variance) made the strongest contribution, followed by level of student (15.2% explained variance), number of offspring (15.2% explained variance), and age (10.2% explained variance). Comparing the standardized and structure coefficients identified preservice teacher status as a suppressor variable because the standardized coefficient associated with this variable was large, whereas the corresponding structure coefficient was relatively small.
In sum, the results of the canonical correlation analysis involving the themes suggest that gender, race, age, level of student, preservice teacher status, number of offspring, and number of credit hours are related in some combination to enthusiast, student centered, professional, ethical, expert, and director. Of the demographic variable set, only GPA did not appear to play a role in the prediction of the themes. On the dependent set, the following three variables consistently were not involved in any of the three multivariate relationships: connector, transmitter, and responsive. A canonical correlation analysis also was undertaken to examine the relationship between the four meta-themes and the eight demographic variables. The four meta-themes were treated as the dependent set of variables, whereas the eight demographic variables again were utilized as the independent multivariate profile. The canonical analysis revealed that the four canonical correlations combined were statistically significant (p < .0001). When the first canonical root was excluded, the remaining three canonical roots were statistically significant (p < .0001; Canonical Rc1 = .23). Similarly, when the first and second canonical roots were excluded, the remaining two canonical roots were statistically significant (p < .0001; Canonical Rc1 = .21). However, when the first three canonical roots were excluded, the remaining canonical root was not statistically significant. Together, these results suggested that the first two canonical functions were both statistically significant and practically significant (J. Cohen, 1988), but the remaining two roots were not statistically significant. Data pertaining to the first canonical root are presented in Table 9. Using Lambert and Durand's (1975) cutoff, the standardized canonical function coefficients revealed that responsible and empowering made important contributions to the set of meta-themes, with empowering being the slightly larger contributor. 
With respect to the demographic set, age, race, level of student, and preservice teacher status made noteworthy contributions, with level of student making by far the largest contribution. The structure coefficients pertaining to the first canonical function revealed that advocate (13.0% explained variance), responsible (37.2% explained variance), and empowering (47.6% explained variance) made important contributions to the first canonical variate. With regard to the demographic cluster, race (24.0% explained variance), level of student (25.0% explained variance), and preservice teacher status (13.7% explained variance) each made important contributions. Comparing the standardized and structure coefficients implicated age as a suppressor variable because the standardized coefficient associated with this variable was large, whereas the corresponding structure coefficient was relatively small.
Data pertaining to the second canonical root are presented in Table 10. Using Lambert and Durand's (1975) cutoff, the standardized canonical function coefficients revealed that communicator, advocate, and responsible made important contributions to the set of meta-themes, with advocate being by far the major contributor. With respect to the demographic set, gender, level of student, and preservice teacher status made noteworthy contributions, with gender making the largest contribution. The structure coefficients pertaining to the second canonical function revealed that advocate (74.0% explained variance) made a significant contribution to the second canonical variate. With regard to the demographic cluster, gender (13.6% explained variance), age (11.6% explained variance), GPA (10.2% explained variance), level of student (27.0% explained variance), and preservice teacher status (14.4% explained variance) each made important contributions. Comparing the standardized and structure coefficients did not reveal any suppressor variables.
In sum, the results of the canonical correlation analysis involving the meta-themes suggest that gender, race, age, GPA, level of student, and preservice teacher status are related in some combination to all four meta-themes: namely, communicator, advocate, responsible, and empowering. Of the demographic variable set, only number of credit hours and number of offspring did not appear to play a role in the prediction of the meta-themes.
The purpose of this study was to conduct a validity study of a TEF by examining students' perceptions of characteristics of effective college teachers, as well as to examine factors that are associated with their perceptions. Participants were 912 undergraduate and graduate students from various academic majors enrolled in a public university in a midsouthern state. Because the sample represented students at a single university (i.e., threat to population validity and ecological validity) whose perspectives about effective teachers were gathered at a single point in time (i.e., threat to temporal validity), the extent to which the present findings are generalizable (i.e., have adequate external validity) to students from other institutions, particularly those from other regions of the United States, is not clear. In addition, with respect to internal validity, instrumentation was a threat. Specifically, the validity of responses might have been affected by the fact that the students' perceptions were assessed via a relatively brief self-report instrument (Onwuegbuzie, 2003b). However, as stated in Note 1, member checking data revealed that the time allocated for the completion of the survey was more than sufficient for students to express their views of characteristics of effective teachers, which resulted in more than 200 hours of data, in turn yielding nearly 3,000 significant statements. At the time of the study, the university had 8,555 undergraduate and graduate students enrolled. The sample for this investigation represented 10.7% of the total population and reflected 68 degree programs offered by the university. As such, the findings are representative, at least to some degree, of many students at that institution. In fact, the sample size far exceeded the recommended minimum sample size of 368 for a population size of 9,000 individuals (Krejcie & Morgan, 1970). 
Notwithstanding, the interpretations that follow pertain only to students at the institution where the study took place. Also, the subgroup sizes were large enough to conduct null hypothesis significance tests with very high (i.e., > .95) statistical power (Onwuegbuzie & Leech, 2004b).
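The minimum sample size of 368 cited above can be reproduced from the formula commonly attributed to Krejcie and Morgan (1970). The sketch below assumes the defaults usually used with that formula (a chi-square value of 3.841 for one degree of freedom at the .95 confidence level, a population proportion of .50, and a margin of error of .05); the article itself reports only the resulting table value, and the function name is illustrative.

```python
def krejcie_morgan(population: int,
                   chi_square: float = 3.841,
                   proportion: float = 0.5,
                   margin: float = 0.05) -> float:
    """Minimum sample size for a finite population.

    s = X^2 * N * P * (1 - P) / (d^2 * (N - 1) + X^2 * P * (1 - P))
    The default parameter values are assumptions, not taken from the article.
    """
    pq = proportion * (1.0 - proportion)
    numerator = chi_square * population * pq
    denominator = margin ** 2 * (population - 1) + chi_square * pq
    return numerator / denominator

# Population size used in the article's comparison.
required_for_9000 = round(krejcie_morgan(9000))   # 368

# Actual enrollment reported in the article.
required_for_8555 = round(krejcie_morgan(8555))   # 368

sample_size = 912
print(required_for_9000, required_for_8555, sample_size >= required_for_9000)
```

Under these assumptions the requirement is the same (368) whether the rounded population of 9,000 or the reported enrollment of 8,555 is used, so the study's sample of 912 exceeds it in either case.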
Mixed-Methods Validity
Notwithstanding, the remaining seven legitimation types were addressed. Specifically, sample integration legitimation was optimized by using large and identical samples for both the qualitative and quantitative approaches. This enabled the researchers justifiably to combine the inferences that emerged from both approaches into meta-inferences (i.e., coherent set inference; Tashakkori & Teddlie, 2003, 2006). Inside–outside legitimation was enhanced by capturing the participants' voices regarding their perceptions of characteristics of effective college instructors (i.e., insiders' views), as well as comparing their perceptions to the TEF items (outsiders' views). Weakness minimization legitimation was improved by combining descriptive precision (i.e., stemming from qualitative analyses) with empirical precision (i.e., stemming from quantitative analyses). Paradigmatic mixing legitimation was enhanced by using a fully mixed-methods research design (Leech & Onwuegbuzie, 2005, in press-b), as well as by undergoing all major steps of the mixed-methods research process (Onwuegbuzie & Leech, 2006). Commensurability legitimation was addressed by using a team of researchers that was diverse with respect to research orientation (e.g., qualitative, quantitative, and mixed-methods research orientations all were represented), college teaching experience (e.g., assistant professor, associate professor, and full professor titles all were represented), and discipline (e.g., special educator, educational foundations specialist, educational assessment, teacher educator, distance-learning specialist, instructional technology specialist, research methodologist). 
Multiple validities legitimation was enhanced by using the RAP model to optimize participant enrichment, instrument fidelity, and significance enhancement, as well as by using techniques (e.g., interrater reliability, member checking, debriefing) that addressed as many threats to the legitimation of both the qualitative and quantitative findings as possible. Finally, political legitimation was addressed by using rigorous qualitative and quantitative techniques. Nevertheless, despite the extremely rigorous nature of the research design, replications of this inquiry are needed to assess the reliability of the current results. These replications should include the use of other mixed-methods research designs and techniques so that sequential legitimation and conversion legitimation could be addressed.
Stage 1 and Stage 2 Analyses
Although the context is primary and secondary schools, the American Association of School Administrators' (AASA's) two-element conceptualization of effective teachers can be used to classify these nine themes. The AASA concluded that characteristics of effective teachers tended to fall into two categories: (a) management and instructional techniques and (b) personal characteristics (Demmon-Berger, 1986). Specifically, three themes (i.e., student centered, enthusiast, ethical) reflect the category of personal characteristics, whereas the remaining six themes (i.e., expert, professional, transmitter, connector, director, responsive) can be classified as representing management and instructional techniques. Comparing the results of the current study to the AASA's conceptualization revealed that a similarly high proportion of the present sample of college students noted one or more characteristics representing the personal characteristic domain (80.5%), as did those who rated a trait representing management and instructional techniques (88.8%). McNemar's test indicated no statistically significant relationship (p > .05) between the AASA's two response categories. Specifically, college students who rated a personal characteristic as being evidence of an effective teacher were neither more nor less likely to rate a management and instructional technique. This suggests that personal characteristics and management and instructional techniques appear to represent constructs that are somewhat independent. The finding that the student-centered theme represented descriptors that received the greatest endorsement is consistent with the results of both Witcher, Onwuegbuzie, and Minor (2001) and Minor, Onwuegbuzie, Witcher, and James (2002), who assessed preservice teachers' perceptions about characteristics of effective teachers in the context of primary and secondary classroom settings. Witcher et al. 
reported an endorsement rate of 79.5% for the student-centered theme, and Minor et al. documented a 55.2% prevalence rate, both of which represented the highest levels of endorsement in their respective studies. In the present investigation, 58.9% of the sample members provided one or more descriptors that typified a student-centered disposition. All three proportions, which represent very large effect sizes, suggest strongly that student centeredness is considered to be the most important characteristic of effective instruction for teachers at the elementary, secondary, and postsecondary levels. Therefore, as was the case for preservice teachers (Minor et al., 2002; Witcher et al., 2001), college students in the present study, overall, identified the interpersonal context as the most important indicator of effective instruction. This study's finding that student centered represented descriptors receiving the strongest student endorsement is consistent with the results of Greimel-Fuhrmann and Geyer's (2003) study that identified a student-oriented teacher (i.e., student friendly, patient, and fair) as an attribute of an effective college teacher. The characteristics of presentation skills, enthusiasm, fairness in grading (Crumbley et al., 2001), and clarity in communication (Spencer & Schmelkin, 2002) are similar to the present study's themes of transmitter, enthusiast, and ethical, respectively. Witcher et al. (2001) identified the following six characteristics of effective teaching perceived by preservice teachers: student centeredness, enthusiastic about teaching, ethicalness, classroom and behavior management, teaching methodology, and knowledge of subject. Minor et al. (2002), in a follow-up study, replicated these six characteristics and found an additional characteristic, namely, professional. Comparing and contrasting these two sets of findings with the present results reveals several similarities and differences. 
Specifically, in the current investigation, the following themes from the Witcher et al. and Minor et al. studies were directly replicated: student centered, enthusiast, ethical, and expert (i.e., knowledge of subject area). Also, the professional theme identified in Minor et al.'s inquiry was directly replicated. In addition, the director theme that emerged in the present investigation appears to represent a combination of the classroom and behavior management and teaching methodology themes identified in these previous studies.

Three additional themes emerged in the present study: transmitter (23.46% endorsement rate), responsive (5.04% endorsement rate), and connector (23.25% endorsement rate). These themes have intuitive appeal, bearing in mind the nature of higher education. The emergence of the transmitter and responsive themes likely resulted from the fact that the material covered and homework assigned at the college level can be extremely complex. As such, many students need clear, explicit instructions and detailed feedback. In public schools, classroom teachers are more accessible as teachers are on-site for most, if not all, of the school day. In contrast, college instructors are expected to engage actively in research and service activities that must be undertaken outside their offices. As such, the amount of time that instructors are available for students (i.e., office hours) varies from department to department, college to college, and university to university. In addition, the requirements imposed by administrators for faculty office hours vary. Some institutions have no office-hour requirements for professors, whereas others expect a minimum of 10 office hours per week. Furthermore, the majority of current undergraduate and graduate students are actively employed while enrolled in college, with a significant proportion working on a full-time basis (Cuccaro-Alamin & Choy, 1998; Horn, 1994).
Thus, many students find it difficult to schedule appointments with their instructors during posted office hours. These factors may explain why connector, which includes being accessible, was deemed a characteristic of effective teachers by nearly one fourth of the sample members.
Stage 3 Analysis

In addition to the communicator meta-theme, three other meta-themes emerged: advocate, comprising student centered and professional; responsible, representing director and ethical; and empowering, consisting of expert and enthusiast. The finding within the advocate meta-theme that student centered and professional themes were negatively related suggests that college students who were the most likely to endorse being student centered as a characteristic of effective teaching tended to be the least likely to endorse being professional as an effective trait, and vice versa. This result is interesting because it suggests that, to some extent, many students view student centeredness and professionalism as lying on opposite ends of a continuum. It is possible that they have experienced teachers who give the impression of being the most professional because they exhibit traits such as efficiency, self-discipline, and responsibility, yet, at the same time, are least likely to display student-centered characteristics such as willingness to listen to students, compassion, and care. This should be the subject of future investigations.

Within the responsible meta-theme, the director and ethical themes also were inversely related. In other words, students who deemed ethical to represent characteristics of effective college instructors, at the same time, tended not to endorse being a director, and vice versa. Indeed, of the sample members who endorsed the ethical theme, 89.3% did not endorse the director theme, yielding an odds ratio of 2.34 (95% CI = 1.53, 3.57). Unfortunately, it is beyond the scope of the present investigation to explain this finding. Thus, follow-up studies using qualitative techniques are needed.

The most compelling finding pertaining to the meta-themes was that the initial letters of their labels (communicator, advocate, responsible, empowering) form the acronym CARE. According to The American Heritage College Dictionary (1997, p.
212), the following definitions are given for the word care: "Close attention," "watchful oversight," "charge or supervision," "attentive assistance or treatment to those in need," "to provide needed assistance or watchful supervision," and "to have a liking or attachment." All of these definitions are particularly pertinent to the field of college teaching. Therefore, the acronym CARE is extremely apt.
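As a rough consistency check on the odds ratio reported above for the ethical and director themes, the sketch below recovers the point estimate and the standard error implied by the published interval. It assumes, and this is an assumption not stated in the article, that the interval is a Wald interval computed on the log-odds scale, exp(ln OR ± 1.96 × SE); all variable names are illustrative.

```python
import math

# Values reported in the text for the ethical vs. director themes.
reported_or = 2.34
ci_low, ci_high = 1.53, 3.57

# Assumption (not stated in the article): a Wald interval on the log scale,
# i.e., exp(ln(OR) +/- z * SE) with z = 1.96 for 95% coverage.
z = 1.96

# Under that assumption the interval is symmetric around ln(OR), so the
# point estimate is the geometric mean of the two limits.
implied_or = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)

# The half-width of the interval on the log scale equals z * SE.
implied_se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)

print(f"implied odds ratio: {implied_or:.2f}")
print(f"implied SE of ln(OR): {implied_se:.3f}")
```

Under the stated assumption, the implied point estimate agrees with the reported odds ratio to two decimal places, which suggests the interval and the estimate are internally consistent.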
Stage 4 Analysis

The second canonical correlation solution indicated that enthusiast, expert, and student centered composed a set related to the following demographic variables: gender, age, level of student, number of credit hours, and number of offspring. Therefore, these three themes represent a combination of college students' perceptions that can be predicted by their gender, age, level of study, number of credit hours undertaken, and number of offspring. An inspection of the signs of the coefficients indicates that expert is inversely related to enthusiast and student centered. Interestingly, enthusiast and expert represent the empowering meta-theme, whereas student centered represents the advocate meta-theme.

The third canonical correlation solution indicated that enthusiast, student centered, professional, ethical, expert, and director composed a set related to the following demographic variables: age, race, level of student, preservice teacher status, and number of offspring. Thus, advocate (i.e., student centered, professional), empowering (i.e., enthusiast, expert), and responsible (i.e., ethical, director) represent a combination of college students' perceptions that can be predicted by their age, race, level of student, preservice teacher status, and number of offspring. An inspection of the signs of the coefficients indicates that the two themes that represent the advocate meta-theme are inversely related to the remaining themes that represent this latent variable (i.e., enthusiast, expert, ethical, director).
Meta-themes

The findings that gender, race, age, level of student, preservice teacher status, number of offspring, and number of credit hours are related in some combination to enthusiast, student centered, professional, ethical, expert, and director and that gender, race, age, GPA, level of student, and preservice teacher status are related in some combination to the four meta-themes suggest that individual differences exist with respect to students' perceptions of the characteristics of effective college teachers. Thus, any instrument that omits items that represent any of the emergent themes or meta-themes may lead to a particular group of students (e.g., graduates, minority students) being "disenfranchised," inasmuch as the instructional attributes that these students perceive play an important role in optimizing their levels of course performance are not available to them for rating. In turn, such an omission would represent a serious threat to the content- and construct-related validity pertaining to the TEF.

Furthermore, the relationships found between the majority of the demographic variables and several themes and meta-themes suggest that when interpreting responses to items contained in TEFs, administrators should consider the demographic profile of the underlying class. Unfortunately, this does not appear to be the current practice. According to Schmelkin, Spencer, and Gellman (1997), many administrators unwisely aggregate responses for the purpose of summative evaluation and comparison with peers without taking into account the context in which the class was taught. For instance, the finding that female students tend to place more weight on student centeredness than do male students, although replicating the findings of Witcher et al.
(2001), suggests that a class with predominantly or exclusively female students (often the case in education courses) might scrutinize the instructor's degree of student centeredness to a greater extent than might a class containing primarily males (often the case in courses involving the hard sciences). Similarly, a class containing mainly Caucasian American students is more likely to assess the instructor's level of enthusiasm than is a class predominantly containing minority students (Minor et al., 2002).
Comparison of Findings With TEF

Four themes were not represented by any of the items in the university evaluation form. These were student centered, expert, enthusiast, and ethical. Disturbingly, student centered, expert, and enthusiast represent three of the most prevalent themes endorsed by the college sample. In an effort to begin the process of generalizing the present findings, the researchers, who, between them, have taught at three Research I/Research Extensive and two Research II/Research Intensive institutions, also examined the TEFs used at these sites. It was found that for each of these five institutions, at least three of these themes (i.e., student centered, enthusiast, and ethical) were not represented by any of the items in the corresponding teacher evaluation form. This discrepancy calls into serious question the content-related validity (i.e., item validity, sampling validity) and construct-related validity (i.e., structural validity, outcome validity, generalizability) pertaining to these TEFs.

There appears to be a clear gap between what the developers of TEFs consider to be characteristics of effective instructors and what students deem to be the most important traits. Moreover, this gap suggests that students' criteria for assessing college instructors may not be adequately represented in TEFs; this might adversely affect students' ability to critique their instructors in a comprehensive manner. Thus, even if the scores yielded by this university evaluation form are reliable, the overall score validity of the TEF is in question. In an era in which information gleaned from TEFs is used to make decisions about faculty regarding tenure, promotion, and merit pay issues, this potential threat to validity is disturbing and warrants further research.
Conclusion

The next step in the process is to design and score-validate an instrument that provides formative and summative information about the efficacy of instruction based upon the various themes and meta-themes making up the CARE-RESPECTED Model of Teaching Evaluation that emerged from this study. The researchers presently are undertaking this task and hope that the outcome will provide a useful data-driven instrument that clearly benefits all stakeholders: college administrators, teachers, and, above all, students.
ANTHONY J. ONWUEGBUZIE is a professor of educational measurement and research in the Department of Educational Measurement and Research, College of Education, University of South Florida, 4202 East Fowler Avenue, EDU 162, Tampa, FL 33620-7750; e-mail: tonyonwuegbuzie{at}aol.com. He specializes in mixed methods, qualitative research, statistics, measurement, educational psychology, and teacher education.

ANN E. WITCHER is a professor in the Department of Middle/Secondary Education and Instructional Technologies, University of Central Arkansas, 104D Mashburn Hall, Conway, AR 72035; e-mail: annw{at}uca.edu. Her specialization area is educational foundations, especially philosophy of education.

KATHLEEN M. T. COLLINS is an associate professor in the Department of Curriculum & Instruction, University of Arkansas, 310 Peabody Hall, Fayetteville, AR 72701; e-mail: kcollinsknob{at}cs.com. Her specializations are special populations, mixed-methods research, and education of postsecondary students.

JANET D. FILER is an assistant professor in the Department of Early Childhood and Special Education, University of Central Arkansas, 136 Mashburn Hall, Conway, AR 72035; e-mail: janetf{at}uca.edu. Her specializations are families, technology, personnel preparation, educational assessment, educational programming, and young children with disabilities and their families.

CHERYL D. WIEDMAIER is an assistant professor in the Department of Middle/Secondary Education and Instructional Technologies, University of Central Arkansas, 104B Mashburn Hall, Conway, AR 72035; e-mail: cherylw{at}uca.edu. Her specializations are distance teaching/learning, instructional technologies, and training/adult education.

CHRIS W. MOORE is pursuing a master of arts in teaching degree at the Department of Middle/Secondary Education and Instructional Technologies, University of Central Arkansas, Conway, AR 72035; e-mail: chmoor{at}tcworks.net.
Special interests focus on integrating 20 years of information technology experience into the K-12 learning environment and sharing with others the benefits of midcareer conversion to the education profession.

This manuscript was adapted from Onwuegbuzie and Johnson (2006). Reprinted with kind permission of the Mid-South Educational Research Association and the editors of Research in the Schools. Correspondence should be addressed to Anthony J. Onwuegbuzie, Department of Educational Measurement and Research, College of Education, University of South Florida, 4202 East Fowler Avenue, EDU 162, Tampa, FL 33620-7750; e-mail: tonyonwuegbuzie{at}aol.com.
1 This quantitizing of themes led to the computation of what Onwuegbuzie (2003a) called manifest effect sizes (i.e., effect sizes pertaining to observable content). Manifest effect sizes are effect sizes that pertain to observable content (Onwuegbuzie & Teddlie, 2003).
2 These prevalence rates provided frequency effect size measures (Onwuegbuzie, 2003a). Frequency effect size measures represent the frequency of themes within a sample that can be converted to a percentage (i.e., prevalence rate) (Onwuegbuzie & Teddlie, 2003).
3 It should be noted that tetrachoric correlation coefficients are based on the assumption that for each manifest dichotomous variable, there is a normally distributed latent continuous variable with zero mean and unit variance. For the present investigation, it was assumed that the extent to which each participant contributed to a theme, as indicated by the order in which the significant statements were presented, represented a normally distributed latent continuous variable. Unfortunately, this assumption could not be tested given only the manifest variable (Nelson, Rehm, Bedirhan, Grant, & Chatterji, 1999). However, this assumption was deemed reasonable given the large sample size (i.e., n = 912).
4 As noted by Bernstein and Teng (1989), dichotomous items are less likely to yield artifacts using factor analytic techniques than are multicategory (Likert-type) items. For more justification about conducting exploratory factor analyses on inter-respondent matrices, see Onwuegbuzie (2003a).
5 More specifically, the trace served as a latent effect size for each meta-theme (Onwuegbuzie, 2003a). A latent effect size is an effect size pertaining to nonobservable, underlying aspects of the phenomenon being studied (Onwuegbuzie & Teddlie, 2003).
6 The combined frequency effect size for themes within each meta-theme represented a manifest effect size (Onwuegbuzie, 2003a).
7 This additional meeting also was prompted by one of the anonymous reviewers, who questioned some of the labels given to the themes/meta-themes and asked the researchers to derive themes that were more "insightful." Thus, we graciously thank this anonymous reviewer for providing such an important recommendation.
8 This effect size represents a latent effect size.
9 These effect sizes represent manifest effect sizes.

Received for publication October 12, 2004. Revision received July 26, 2006. Accepted for publication August 5, 2006.
American Educational Research Journal, Vol. 44, No. 1, 113-160 (2007)



Note. → = sequential; + = concurrent.
A Bonferroni-adjusted significance level (α = .05/9 = .0056) was applied for each demographic variable to control for family-wise error. With respect to gender, females (62.3%) tended to place statistically significantly more weight on student centeredness as a measure of instructional effectiveness than did males (49.4%). The effect size associated with this relationship, as measured by Cramér's V, was .12. Furthermore, females were 1.70 times (95% confidence interval [CI] = 1.26, 2.29) more likely than were males to endorse student centeredness. However, gender was not statistically significantly related to any other theme. With respect to race, Caucasian American students (31.6%) were statistically significantly more likely to endorse enthusiastic about teaching as a characteristic of effective instruction than were minority students (19.5%). Cramér's V effect size was .09. More specifically, Caucasian American students were 1.61 times (95% CI = 1.12, 2.32) more likely than were minority students to endorse enthusiasm.
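The adjusted significance level and the gender odds ratio reported above can be approximated from the published figures alone. The sketch below is illustrative rather than the authors' procedure: the function names are hypothetical, and because it starts from the rounded percentages (62.3% and 49.4%) instead of the raw counts, the odds ratio it produces may differ slightly from the reported 1.70.

```python
def adjusted_alpha(alpha: float, n_tests: int) -> float:
    """Divide the nominal alpha by the number of tests (here, nine themes)."""
    return alpha / n_tests


def odds_ratio(p_group1: float, p_group2: float) -> float:
    """Odds ratio for endorsing a theme, computed from two endorsement proportions."""
    odds1 = p_group1 / (1 - p_group1)
    odds2 = p_group2 / (1 - p_group2)
    return odds1 / odds2


# Nominal alpha of .05 spread across the nine themes, as in the text.
alpha_adj = adjusted_alpha(0.05, 9)

# Rounded endorsement rates for student centeredness reported in the text:
# females 62.3%, males 49.4%.
or_gender = odds_ratio(0.623, 0.494)

print(f"adjusted alpha: {alpha_adj:.4f}")
print(f"odds ratio (females vs. males): {or_gender:.2f}")
```

Working from proportions keeps the example tied to numbers that appear in the text; reproducing the confidence interval would additionally require the cell counts, which are not given here.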






