A Framework for Reflecting on Assessment and Evaluation

(Headliner, Symposium 4: Assessment Strategies)

Glen S. Aikenhead
Curriculum Studies
University of Saskatchewan
Saskatoon, SK, S7N 0X1

Published in

Globalization of Science Education: International Conference on Science Education

(pages 195-199)

Seoul, Korea

May 26-30, 1997

Organized by the Korean Education Development Institute

Co-Sponsored by ICASE and UNESCO


Prof. Black's (1997) highly informative presentation addressed many significant issues in assessment and evaluation. He explored this wide array of problems and solutions with his feet firmly planted in the pragmatic reality of teachers in less-than-ideal classrooms working within the dictates of national politics and cultures.

My paper arises from the need to put his ideas into one framework so I can better understand how the significant issues are interrelated; for example, how is validity related to "the improvement of learning" as well as to "certification of students" and "the problem of equity among students"? The purpose of my paper is, therefore, to clarify the whole domain of assessment and evaluation so that people from different nations may have a common framework with which to clarify and communicate national differences in assessment and evaluation practices.

Two Fundamental Concepts

Science educators often want their students to distinguish between observation and interpretation. To observe is to collect information about events, while to interpret is to make meaning of that information by drawing inferences or by judging the match between the information and some abstract pattern. The distinction between observation and interpretation, although sometimes difficult, leads to more precise critical thinking in science classes.

These two science concepts metaphorically define two concepts we use in education: observing students' work, and interpreting the worth of that work. However, a confusion arises, particularly in English, over what to name each concept. Some educators, by convention, use the same term for both concepts, which tends to force one to equate the methods of collecting data with the methods of interpreting those data. This can cause problems. The recent convention in Canada is to refer to the act of collecting student work as assessment and the act of judging that collection of work as evaluation. Conventions in other English-speaking countries may define those terms differently. I notice that Prof. Black consistently uses the term "assessment" throughout his paper, although the term "evaluation" does appear.

The distinction between assessment and evaluation helped me understand Prof. Black's quotation from Messick (1989, p. 13), the quotation that defined validity. The Canadian convention recognizes Messick's definition of validity as simply the adequacy and appropriateness of the evaluation of students. The Canadian convention allowed me to think about another topic not addressed by Messick's definition, the validity of student assessment. Obviously we are concerned about the consistency between assessment techniques and evaluative judgements, but this does not mean that we should necessarily use the same concept of validity for both assessment and evaluation. Similarly, our thinking might be sharpened if we discuss summative versus formative evaluation in the context of judgements made about student work, and not confuse our thinking with summative and formative assessment (monitoring student work).

The distinction between assessment and evaluation also guides our thinking about student participation. Prof. Black discussed the advantages and problems of "pupil self-assessment." These could be clarified by making the distinction between self-assessment and self-evaluation. Almost all students can engage in self-assessment (helping to decide what information should be collected and being largely responsible for collecting it), whereas problems can ensue when students are expected to engage in self-evaluation (as Prof. Black described).

A Framework for Assessment and Evaluation

Because educational assessment and evaluation fall within the domain of social science, we can turn to the social sciences to clarify what we do in education. For instance, Habermas (1971, p. 308) recognized that the social sciences comprise three different orientations (or paradigms): the empirical-analytic, the interpretive, and the critical-theoretic. These three paradigms represent the complete domain of assessment and evaluation (Ryan, 1988). Rather than focusing narrowly on a two-category system -- traditional versus alternative -- Ryan expands our thinking by adding a third category (critical-theoretic) that has direct implications for reform. The three paradigms will serve as a framework for thinking about assessment and evaluation.

Ryan (1988) described Habermas's three orientations in terms of assessment and evaluation paradigms as follows:

1. Empirical-analytic: Western technical rationalism embodied in logical positivist origins. This amounts to the traditional standardized approach to assessment and evaluation.

2. Interpretive: understanding students' language, concepts, and actions from the point of view of the student. Alternative assessment techniques such as portfolios and concept mapping illustrate this paradigm.

3. Critical-theoretic: the elimination of oppressive human relationships (oppressive is defined in terms of forced assimilation). Two examples would be: assessment rubrics for thoughtful decision making developed collaboratively between teacher and students, and student self-evaluation.

Prof. Black addressed issues in all three paradigms. For example, he discussed (1) the issue of standardized tests, which falls within the empirical-analytic paradigm; (2) the issue of formative assessment, which is clearly within the interpretive paradigm; and (3) the issues of equity and student empowerment, both of which fit primarily within the critical-theoretic paradigm. We must be able to function eclectically in all three paradigms. For the sake of clear thinking, however, we should not lose sight of the paradigm we are in at any given moment.

An advantage to using a three-paradigm framework is its ability to locate certain assessment and evaluation issues within specific paradigms. When we locate an issue (e.g. validity) within a paradigm, we can connect that issue with other issues (e.g. purposes of education) within that same paradigm. In addition, we can explore the different meanings of the same issue (e.g. validity) across different paradigms, thereby recognizing that some disagreements between educators may arise from the fact that the educators are functioning within different paradigms. By identifying those paradigms we clarify our discussions of assessment and evaluation. Several topics are mentioned below to illustrate the power of the proposed framework.

Purposes of Assessment and Evaluation

When Prof. Black stated that there were "three main purposes of assessment in education" (certification, policy development, and serving teaching and learning), I interpreted him as saying:

Within the empirical-analytic paradigm, it is important to consider student certification and the development of educational policy; while within the interpretive paradigm, it is important to consider what knowledge, skills, and values students are actually learning.

Given all three paradigms to consider, we could include yet another "main purpose of assessment" for discussion, one within the critical-theoretic paradigm (e.g. to ensure that families of higher socioeconomic status are privileged over families with less status).

Alternatively, purposes of assessment and evaluation could be discussed in terms of (1) the social control of schools (the empirical-analytic paradigm), (2) the improvement of teaching (the interpretive paradigm), and (3) the empowerment of students (the critical-theoretic paradigm).


The three-paradigm framework can draw our attention to issues that may otherwise go missing in our discussion of assessment and evaluation. Perhaps Champagne and Newell (1992) had these three paradigms in mind when they analyzed the assessment of scientific literacy in terms of technical issues, pedagogical issues, and social/political/cultural issues -- issues that fall within the empirical-analytic, the interpretive, and the critical-theoretic paradigms, respectively.

Purposes of Science Education

The purpose of education in a nation's schools varies from country to country. However, rational action in any country depends on the consistency between a nation's purpose of education and the paradigm of assessment and evaluation embraced by its education authority. Within the empirical-analytic paradigm, the purpose of science education has been to identify and nurture (through competition) the elite student. The alternative, science education for all students (UNESCO, 1994), belongs to both the interpretive and critical-theoretic paradigms. Reforming a nation's science education toward "science for all" therefore requires us to shift away from the empirical-analytic paradigm whenever we engage in assessment and evaluation practices. When Prof. Black discussed, for instance, formative assessment or student self-evaluation, he was outside the empirical-analytic paradigm. Therefore, his ideas would be consistent with the other two paradigms; for example, they would enhance a science-for-all curriculum but would not be useful to a science-for-the-elite curriculum.

Outcome Focus

Different paradigms will direct us to assess different instructional outcomes. The empirical-analytic paradigm focuses on the product of instruction, the student's tangible work.

Prof. Black points out the value gained by also considering how the student produced the product, that is, the process. The interpretive paradigm focuses on both the product and process of instructional outcomes.

There is a third aspect to consider, the one embraced by the critical-theoretic paradigm of assessment and evaluation: context. The cultural or social context in which assessment takes place has a great influence on both the process and product of a student's work. Prof. Black alluded to this issue of context when he discussed the reliability of performance assessments. The critical-theoretic paradigm focuses on product, process, and context of instructional outcomes.

Validity

Champagne and Newell (1992) discussed validity in three different ways, each corresponding to a different Habermas paradigm. Within the empirical-analytic paradigm, a psychometric viewpoint was adopted, where validity is based on spreading out student scores in a technically defensible way. On the other hand, within the interpretive paradigm, a pedagogical viewpoint was assumed, with a focus on improving teaching and learning. And lastly, within the critical-theoretic paradigm, a political-social viewpoint on validity was held. It can lead to various criteria of validity, depending upon the culture: screening students into winners and losers (e.g. science for the elite), ensuring equity of opportunity (e.g. science for all), or exercising social control over schools.

Often, discussions of psychometric validity mask the real concern for social control over schools. Social control relates to the purpose of assessment and evaluation in the empirical-analytic paradigm. In other words, we should recognize that psychometric validity sustains a social-control purpose because they both belong to the same paradigm, and thus we might choose to challenge a narrow discussion of psychometric validity in order to uncover issues of purpose.

Messick (1989) defined validity (quoted by Prof. Black) in a way that corresponds to the empirical-analytic paradigm. Having identified the paradigm in which Messick's definition functions, we need a definition applicable to the other paradigms. A definition within the interpretive paradigm was presented by Aikenhead and Ryan (1992, p. 487), a definition that distills to trustworthiness -- the trust that one educator or researcher has for the work of another. Eclectically, we need to draw upon both paradigms' definitions, as we shift paradigms to meet the needs of the particular moment.

Scientific Literacy

The assessment and evaluation of scientific literacy has traditionally been conducted well within the empirical-analytic paradigm. Student responses are either right or wrong, with little interpretation and little consideration for context. Scores are standardized against such norms as statistical distributions or judgments by panels of experts. Assessment tends to be confounded with evaluation, thereby merging the two concepts into one.

An alternative type of instrument that operates within the interpretive paradigm was developed by Aikenhead and Ryan (1992). By collaborating with students, the researchers developed an empirically based, multiple-choice assessment instrument. The empirical basis consisted of the students' work itself. As a consequence, students are generally able to express their personal and reasoned viewpoints in their own language when they respond to this instrument.

In some environmental education programs, teachers monitor the degree to which students take social action on an issue. This activist-oriented technique belongs to the critical-theoretic paradigm.


As Champagne and Newell (1992) point out, certain cultures demand that assessment be simplistic, competitive, and unidimensional in order to distinguish winners from losers. Tests in those cultures are designed "on the assumption that knowledge can be represented by an accumulation of bits of information and that there is one right answer" (p. 846). On the other hand, "alternative assessment is based on the assumption that knowledge is actively constructed by the child and varies from one context to another" (p. 847). We can now identify these two positions as exemplifying the empirical-analytic and interpretive paradigms, respectively. Moreover, using the three-paradigm framework, we can now ask ourselves the question, "What does the critical-theoretic paradigm say about knowledge?" This question leads to other issues such as: whose knowledge is privileged in the assessment? whose social interactions have cultural capital? whose goals define the criteria for evaluation and how are these goals established? These issues are discussed by O'Loughlin (1992).

References

Aikenhead, G.S., & Ryan, A.G. (1992). The development of a new instrument: "Views on Science-Technology-Society" (VOSTS). Science Education, 76, 477-491.

Black, P.J. (1997, May). Assessment in the service of science education. A paper presented at the International Conference on Science Education: "Globalization of Science Education." Seoul, Korea.

Champagne, A.B., & Newell, S.T. (1992). Directions for research and development: Alternative methods of assessing scientific literacy. Journal of Research in Science Teaching, 29, 841-869.

Habermas, J. (1971). Knowledge and human interest. Boston: Beacon.

Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational measurement (3rd edition) (pp. 12-103). London: Collier Macmillan.

O'Loughlin, M. (1992). Rethinking science education: Beyond Piagetian constructivism toward a sociocultural model of teaching and learning.

Ryan, A.G. (1988). Program evaluation within the paradigm: Mapping the territory. Knowledge: Creation, Diffusion, Utilization, 10(1), 25-47.

UNESCO. (1994). The project 2000+ declaration: The way forward. Paris: UNESCO.