Assessment Tools



psychometric and statistical
mark wilson

technology based
sean brophy

PSYCHOMETRIC AND STATISTICAL

The place of psychometric and statistical tools in assessment must be understood in terms of their use within a process of evidence gathering and interpretation. To see this, consider the assessment triangle featured in a recent National Research Council report and shown in Figure 1. The central problem in assessment is making inferences about cognition from limited observations. Psychometric and statistical tools are located at the interpretation vertex of the triangle, where their role is to negotiate between the other two vertices. To appreciate the way these tools work, one must first understand the other two vertices; hence, each vertex is described below. Robert Mislevy and Mark Wilson describe similar and more elaborated approaches. The triangle should be seen as a process that is repeated multiple times, both in assessment development and in the application of assessment in education.

The Assessment Triangle

Ideally, all assessment design should begin with a theory of student cognition, or set of beliefs about how students represent knowledge and develop competence in a particular area. This theory should be the basis for a construct: the cognitive characteristic to be measured. Constructs are based on what the "assessment designers" know from foundational research in psychology and instruction and on experts' judgments about how students develop in a particular area. The construct distinguishes expert performance from other levels of performance in the area and considers those levels from the perspective of a developmental process. Furthermore, educational outcomes are not as straightforward as measuring

FIGURE 1

height or weight; the attributes to be measured are often mental activities that are not directly observable.

Having delineated the nature of the construct, one then has to determine what kinds of observations of behavior, products, and actions can provide evidence for making inferences concerning the construct, while avoiding data that hold little value as evidence for the construct. In the classroom context, observations of learning activity are the relevant things that learners say and do (such as their words, actions, gestures, products, and performances) that can be observed and recorded. Teachers make assessments of student learning based on a wide range of student activity, including observation and discussion in the classroom, written work done at home or in class, quizzes, final exams, and so forth. In large-scale assessment, standardized assessment tasks are designed to elicit evidence of student learning. These may cover a similar variety of performance types as classroom-based assessments, but they are often drawn from a much narrower range.

At the interpretation vertex is located the chain of reasoning from the observations to the construct. In classroom assessment, the teacher usually interprets student activity using an intuitive or qualitative model of reasoning, comparing what she sees with what she would expect competent performance to look like. In large-scale assessment, given a mass of complex data with little background information about the students' ongoing learning activities to aid in interpretation, the interpretation model is usually a psychometric or statistical model, which amounts to a characterization or summarization of patterns that one would expect to see in the data at different levels of competence.

Standard Psychometric Models

Probably the most common form that the cognition vertex takes in educational assessment is that of a continuous latent variable: cognition is seen as being describable as a single progression from less to more, or lower to higher. This is the basis for all three of the dominant current approaches to educational measurement: classical test theory (CTT), which was thoroughly summarized by Frederick Lord and Melvin Novick; generalizability theory (GT), which was surveyed by Robert Brennan; and item response theory (IRT), which was surveyed in the volume by Wim van der Linden and Ronald Hambleton. In CTT, the continuous variable is termed the true score, and is thought of as the long-run mean of the observed total score. In GT, the effects of various measurement design factors on this variable are analyzed using an analysis-of-variance approach. In IRT, the focus shifts to modeling individual item responses: the probability of an item response is seen as a function of the underlying latent variable representing student competence (often denoted by θ) and parameters representing the item and the measurement context.
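To make the IRT formulation concrete, the sketch below implements the simplest case, the Rasch (one-parameter logistic) model, in which the probability of a correct response depends only on the difference between the student's θ and the item's difficulty. This is an illustrative Python sketch; the values and variable names are invented, not drawn from the works cited here.

```python
import numpy as np

def rasch_probability(theta, difficulty):
    """Rasch model item response function:
    P(correct | theta, b) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + np.exp(-(theta - difficulty)))

# A student slightly above the scale origin (theta = 0.5) facing items
# of increasing difficulty: the success probability falls as b rises.
for b in [-1.0, 0.0, 1.0, 2.0]:
    print(f"difficulty {b:+.1f}: P(correct) = {rasch_probability(0.5, b):.2f}")
```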

The most fundamental item parameter is the item difficulty, but others are used where they help represent the characteristics of the assessment situation. The difficulty parameter can usually be seen as relating the item responses to the construct (i.e., the θ variable). Other parameters may relate to characteristics of the observations. For example, differing item slopes can be interpreted as indicating that item responses may also depend on other (unmodeled) dimensions besides the θ variable (these are sometimes called discrimination parameters, although that can lead to confusion with the classical discrimination index). Parameters for the lower and upper item asymptotes can be interpreted as indicating where item responses have "floor" and "ceiling" rates (the lower asymptote is often called a guessing parameter). There is a debate in the literature about whether to include parameters beyond the basic difficulty parameter in order to make the response model more flexible, or whether to see such parameters as deviations from a regularity condition (specific objectivity) that threaten the interpretability of the results.
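These additional item parameters can be illustrated by extending the basic logistic function to a four-parameter form with a slope (discrimination), a lower asymptote (the "guessing" floor), and an upper asymptote (the ceiling). This is a hedged sketch with invented values; the Rasch model of the previous sketch is the special case a = 1, c = 0, d = 1.

```python
import numpy as np

def four_pl_probability(theta, b, a=1.0, c=0.0, d=1.0):
    """Four-parameter logistic item response function:
    b = difficulty, a = slope ("discrimination"),
    c = lower asymptote ("guessing" floor), d = upper asymptote (ceiling).
    With a = 1, c = 0, d = 1 it reduces to the Rasch model."""
    logistic = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return c + (d - c) * logistic

# Same ability and difficulty, but a floor of 0.25 (e.g., a four-option
# multiple-choice item) keeps the probability from falling below 0.25.
print(four_pl_probability(-2.0, 0.0))          # basic model: about 0.12
print(four_pl_probability(-2.0, 0.0, c=0.25))  # with a guessing floor: about 0.34
```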

A second possible form for the construct is a set of discrete classes, ordered or unordered depending on the theory. The equivalent psychometric models are termed latent class (LC) models, because they attempt to classify the students on the basis of their responses, as in the work of Edward Haertel. In these models, the form of cognition, such as a problem-solving strategy, is thought of as being possible only within certain classes. An example might be strategy usage, where a latent class approach would be useful when students can be adequately described using only a certain number of different classes. These classes could be ordered by some criterion, say cognitive sophistication, or they could have more complex relations to one another.
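The latent class idea can be sketched with a toy two-class model: each class has its own profile of item success probabilities, and Bayes' rule gives the posterior probability that a student with a given response pattern belongs to each class. The class profiles and priors below are invented for illustration only.

```python
import numpy as np

# Hypothetical two-class model for three items: each row gives one
# latent class's probability of answering each item correctly.
class_item_probs = np.array([
    [0.9, 0.8, 0.7],   # class 0: e.g., uses an effective strategy
    [0.3, 0.2, 0.4],   # class 1: e.g., uses a naive strategy
])
class_priors = np.array([0.5, 0.5])

def class_posterior(responses):
    """Posterior probability of each latent class given 0/1 responses,
    assuming responses are independent within a class."""
    responses = np.asarray(responses)
    likelihoods = np.prod(
        class_item_probs ** responses * (1 - class_item_probs) ** (1 - responses),
        axis=1,
    )
    joint = class_priors * likelihoods
    return joint / joint.sum()

print(class_posterior([1, 1, 0]))  # weight shifts toward class 0
print(class_posterior([0, 0, 1]))  # weight shifts toward class 1
```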

There are other complexities of the assessment context that can be added to these models. First, the construct can be seen as being composed of more than a single attribute. With a continuous construct, this possibility is generally handled by a factor analysis model when a classical approach is taken, and by a multidimensional item response model (MIRM) when starting from the item response approach, as in the work by Mark Reckase and by Raymond Adams and his colleagues. In contrast to the account above, where parameters were added to the model of the item to make it more complex, here it is the model of the student that is being enhanced. These models allow one to incorporate evidence about several constructs into the assessment situation.
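One common (compensatory) form of the multidimensional model makes the logit a weighted sum of several ability dimensions, so strength on one dimension can offset weakness on another. The sketch below is illustrative only; the loadings and abilities are invented.

```python
import numpy as np

def mirm_probability(theta, loadings, difficulty):
    """Compensatory multidimensional item response function: the logit is
    a weighted sum of the ability dimensions minus the item difficulty."""
    logit = np.dot(loadings, theta) - difficulty
    return 1.0 / (1.0 + np.exp(-logit))

# A student strong on one dimension (say, computation) and weak on a
# second (say, interpretation), answering an item that draws on both.
theta = np.array([1.0, -0.5])
loadings = np.array([0.8, 0.6])
print(mirm_probability(theta, loadings, difficulty=0.0))  # about 0.62
```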

There are other ways that complexities of the assessment situation can be built into the measurement models. For example, authors such as Susan Embretson, Bengt Muthen, and Siek-Toon Khoo have shown how repeated assessments over time can be seen as indicators of a new construct: a construct related to patterns of change in the original construct. In another type of example, authors such as Gerhard Fischer have added linear effect parameters, similar to those available in GT, to model observational effects such as rater characteristics and item design factors, and also to model complexities of the construct (e.g., components of the construct that influence item difficulty, such as classes of cognitive strategies).
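Fischer's linear logistic idea can be sketched as follows: each item's difficulty is expressed as the sum of effects for the design factors (for example, the cognitive operations) the item requires. The design matrix and effect sizes below are invented for illustration.

```python
import numpy as np

# Hypothetical design matrix: rows are items, columns mark which
# cognitive operations (1) each item requires.
design = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
])
# Hypothetical difficulty contributed by each operation (in logits).
operation_effects = np.array([-0.5, 0.8, 1.2])

# Linear logistic structure: an item's difficulty is the sum of the
# effects of the operations it requires.
item_difficulties = design @ operation_effects
print(item_difficulties)  # [-0.5  0.3  1.5]

def probability_correct(theta, b):
    return 1.0 / (1.0 + np.exp(-(theta - b)))

print(probability_correct(0.0, item_difficulties))
```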

Incorporating Cognitive Elements in Standard Psychometric Models

An approach called developmental assessment, developed by Geoffrey Masters and colleagues building on the seminal work of Benjamin Wright and using the Rasch model, enhances the interpretability of measures by displaying the variable graphically as a progress map or construct map. Mark Wilson and Kathryn Sloane have discussed an example, shown in Figure 2, where the levels, called Criterion Zones in the figure, are defined in Figure 3. The idea is that many important features of assessments can be displayed in a technically accurate way by using the familiar strengths of a map to convey complicated measurement techniques and ideas. For example, one central idea is that the meaning of the construct can be conveyed by examining the order of item locations along the map, and the same technique has been used by Wilson as the basis for gathering validity evidence. In Figure 2, one can see how an individual student's assessments over time can be displayed in a meaningful way in terms of the Criterion Zones. The same approach can be adapted for reporting group results of assessments, and even large national surveys (e.g., that of Australia's Department of Employment, Education and Youth Affairs in 1997).
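A real progress map such as the one in Figure 2 is graphical, but the underlying idea, placing item locations and student measures on a single scale so that their order carries the meaning of the construct, can be sketched in a few lines. The item labels and numbers below are invented.

```python
# Hypothetical item difficulties (in logits) and one student's measure,
# all on the same scale; listing them in order gives a crude, text-only
# version of a construct map.
item_locations = {
    "design an investigation": 1.9,
    "explain a pattern": 0.8,
    "apply a rule": -0.3,
    "recall a fact": -1.5,
}
student_name, student_measure = "Student A", 0.5

rows = [(loc, label) for label, loc in item_locations.items()]
rows.append((student_measure, f">>> {student_name} (estimated measure)"))
for location, label in sorted(rows, reverse=True):
    print(f"{location:+5.1f}  {label}")
```

Reading down the printout, the student's location falls between the items she has likely mastered and those she has not, which is the interpretive move a construct map supports.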

One can also examine the patterns of results of individual students to help diagnose individual differences. An example from the Grade Map software developed by Wilson and his colleagues is shown in Figure 4. Here, an overall index of "fit" was used to flag the responses of subject Amy Brown that needed extra attention. In the figure, the expected result for each item for Amy Brown is shown using the gray band across the middle, while the observed results are shown by the black shading. Clearly Amy has responded in surprising ways to several items, and a content analysis of those items may prove interesting. An analogous technique has been developed by Kikumi Tatsuoka (1990, 1995) with the advantage of focusing attention on specific cognitive diagnoses.
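The particular fit index used in GradeMap is not described here; as one common illustration of the general idea, the outfit mean square under the Rasch model averages the squared standardized residuals of a student's responses, and values well above 1 flag surprising patterns like Amy Brown's. The difficulties and response strings below are invented.

```python
import numpy as np

def rasch_p(theta, difficulties):
    return 1.0 / (1.0 + np.exp(-(theta - np.asarray(difficulties))))

def outfit_mean_square(theta, difficulties, responses):
    """Mean of squared standardized residuals under the Rasch model;
    values far above 1 indicate an unexpected response pattern."""
    p = rasch_p(theta, difficulties)
    x = np.asarray(responses)
    return np.mean((x - p) ** 2 / (p * (1 - p)))

difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(outfit_mean_square(0.0, difficulties, [1, 1, 1, 0, 0]))  # expected pattern
print(outfit_mean_square(0.0, difficulties, [0, 0, 1, 1, 1]))  # surprising pattern
```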

Adding Cognitive Structure to Psychometric Models

One can go a step further than the previous strategy of incorporating interpretative techniques into the assessment reporting: elements of the construct can be directly represented as parameters of the psychometric model. From a statistical point of view, this would most often be the preferred tactic.

FIGURE 2

In practice, however, it may add to the complexity of interpretation, so the merits should be considered for each application. A relatively straightforward example of this is the incorporation of differential item functioning (DIF) parameters into the psychometric model. Such parameters adjust other parameters (usually item difficulty parameters) for different effects between (known) groups of respondents. DIF has most often been seen as an item flaw needing to be corrected, but in this context such parameters could be used to allow for different construct effects, such as the use of different solution strategies or linguistic differences.
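The way a DIF parameter enters the model can be sketched by shifting an item's effective difficulty for one (known) group of respondents. The group indicator and effect size below are illustrative.

```python
import numpy as np

def probability_correct(theta, b, group=0, dif=0.0):
    """Rasch-type model with a DIF term: the item is harder (or easier)
    by `dif` logits for respondents in group 1."""
    effective_difficulty = b + dif * group
    return 1.0 / (1.0 + np.exp(-(theta - effective_difficulty)))

# Same ability and same nominal item difficulty, but a +0.6 logit DIF
# effect lowers the success probability for members of group 1.
print(probability_correct(0.0, 0.0, group=0, dif=0.6))  # 0.50
print(probability_correct(0.0, 0.0, group=1, dif=0.6))  # about 0.35
```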

Another general strategy is the delineation of hierarchical classes of observation that group together the original observations to make them more interpretable. This can be seen as acting on either the student or the item aspects of the psychometric model. This could be seen as a way to split up the students into latent groups for diagnostic purposes as in the work of Edward Haertel and David Wiley. Or it could be seen as a way to split up the items into classes, allowing interpretation of student results at the level of, say, classes of skills rather than at the individual item level, as in the work of Rianne Janssen and her colleagues. Wilson has combined the continuum and latent class approaches, thus allowing constructs that are partly continuous and partly discontinuous. For example, the Saltus Model is designed to incorporate stage-like developmental changes along with more standard incremental increases in skill, as illustrated in the work of Mislevy and Wilson.

Generalized Approaches to Psychometric Modeling of Cognitive Structures

Several general approaches have been proposed. One is the Unified Model, developed by Louis DiBello and his colleagues, which is based on the assumption that task analyses can classify students' performances into distinct latent classes. A second general approach, Peter Pirolli and Mark Wilson's M2RCML, is based on a distinction between knowledge-level learning, as manifested by variations in solution strategies, and symbol-level learning, as manifested by variations in the success of application of those strategies. In work by Karen Draney and her colleagues, this approach has been applied to data on learning with a Lisp tutor and to a rule-assessment analysis of reasoning about the balance scale.

A very general approach to modeling such structures, called Bayes Nets, has been developed by statisticians working in other fields. Two kinds of variables appear in a Bayes Net for educational assessment: those that concern aspects of students' knowledge and skill, and those that concern aspects of the things students say, do, or make. All the psychometric models discussed in this entry reflect this kind of reasoning, and all can be expressed as particular implementations of Bayes Nets. The models described above each evolved in its own special niche, with researchers gaining experience in its use, writing computer programs, and developing a catalog of exemplars. Bayes Nets have been used as the statistical model underlying such complex assessment contexts as intelligent tutoring systems, as in the example by Mislevy and Drew Gitomer.
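A toy version of this probability-based reasoning, with one binary skill variable and two conditionally independent observed responses, is sketched below. The conditional probabilities are invented; real assessment Bayes Nets involve many variables and dedicated inference software.

```python
# Hypothetical network: a binary "skill" node and two observed task
# outcomes that depend on it.
prior_skill = 0.5
p_correct_given_skill = {True: [0.9, 0.8], False: [0.2, 0.3]}

def posterior_skill(responses):
    """Posterior probability that the student has the skill, given 0/1
    responses to the two tasks."""
    def likelihood(skill):
        lik = 1.0
        for p, x in zip(p_correct_given_skill[skill], responses):
            lik *= p if x else (1 - p)
        return lik
    joint_has = prior_skill * likelihood(True)
    joint_lacks = (1 - prior_skill) * likelihood(False)
    return joint_has / (joint_has + joint_lacks)

print(posterior_skill([1, 1]))  # both correct: skill very likely (about 0.92)
print(posterior_skill([0, 1]))  # mixed evidence: skill now less likely (0.25)
```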

Appraisal of Psychometric Models and Future Directions

The psychometric models discussed above provide explicit, formal rules for integrating the many pieces of information that may be relevant to specific inferences drawn from observation of assessment tasks. Certain kinds of assessment applications require the capabilities of formal statistical models for the interpretation element of the assessment triangle. The psychometric models available in the early twenty-first century can support many of the kinds of inferences that curriculum theory and cognitive science suggest are important to pursue. In particular, it is possible to characterize students in terms of multiple aspects of proficiency, rather than a single score; chart students' progress over time, instead of simply measuring performance at a particular point in time; deal with multiple paths or alternative methods of valued performance; model, monitor, and improve judgments based on informed evaluations; and model performance not only at the level of students, but also at the levels of groups, classes, schools, and states.

Unfortunately, many of the newer models and methods are not widely used because they are not easily understood or are not packaged in accessible ways for those without a strong technical background. Much hard work remains to focus psychometric model building on the critical features of models of cognition and learning and on observations that reveal meaningful cognitive processes in a particular domain. If anything, the task has become more difficult because an additional step is now required: determining simultaneously the inferences that must be drawn, the observations needed, the tasks that will provide them, and the statistical models that will express the necessary patterns most efficiently. Therefore, having a broad array of models available does not mean that the measurement model problem is solved. More work is needed on relating the characteristics of measurement models to the specifics of theoretical constructs and types of observations. The longstanding tradition of leaving scientists, educators, task designers, and psychometricians each to their own realms represents perhaps the most serious barrier to the necessary progress.

FIGURE 3

See also: Assessment, subentries on Classroom Assessment, National Assessment of Educational Progress; Testing, subentry on Standardized Tests and High-Stakes Assessment.

bibliography

Adams, Raymond J.; Wilson, Mark; and Wang, Wen-Chung. 1997. "The Multidimensional Random Coefficient Multinomial Logit Model." Applied Psychological Measurement 21 (1):1–23.

Brennan, Robert L. 2002. Generalizability Theory. New York: Springer-Verlag.

Bryk, Anthony S., and Raudenbush, Stephen. 1992. Hierarchical Linear Models: Applications and Data Analysis Methods. Newbury Park, CA: Sage.

Department of Employment, Education, and Youth Affairs (DEETYA). 1997. National School English Literacy Survey. Canberra, Australia: Department of Employment, Education, and Youth Affairs.

DiBello, Louis V.; Stout, William F.; and Roussos, Louis A. 1995. "Unified Cognitive/Psychometric Diagnostic Assessment Likelihood-Based Classification Techniques." In Cognitively Diagnostic Assessment, ed. Paul D. Nichols, Susan F. Chipman, and Robert L. Brennan. Hillsdale, NJ: Erlbaum.

Draney, Karen L.; Pirolli, Peter; and Wilson, Mark. 1995. "A Measurement Model for a Complex Cognitive Skill." In Cognitively Diagnostic Assessment, ed. Paul D. Nichols, Susan F. Chipman, and Robert L. Brennan. Hillsdale, NJ: Erlbaum.

FIGURE 4

Embretson, Susan E. 1996. "Multicomponent Response Models." In Handbook of Modern Item Response Theory, ed. Wim J. van der Linden and Ronald K. Hambleton. New York: Springer.

Fischer, Gerhard. 1977. "Linear Logistic Test Models: Theory and Application." In Structural Models of Thinking and Learning, ed. Hans Spada and Willem Kempf. Bern, Germany: Huber.

Haertel, Edward H. 1990. "Continuous and Discrete Latent Structure Models for Item Response Data." Psychometrika 55:477–494.

Haertel, Edward H., and Wiley, David E. 1993. "Representations of Ability Structures: Implications for Testing." In Test Theory for a New Generation of Tests, ed. Norman Frederiksen, Robert J. Mislevy, and Isaac I. Bejar. Hillsdale, NJ: Erlbaum.

Hambleton, Ronald K.; Swaminathan, Hariharan; and Rogers, H. Jane. 1991. Fundamentals of Item Response Theory. Newbury Park, CA: Sage.

Janssen, Rianne; Tuerlinckx, Francis; Meulders, Michel; and De Boeck, Paul. 2000. "An Hierarchical IRT Model for Mastery Classification." Journal of Educational and Behavioral Statistics 25 (3):285–306.

Lord, Frederick M., and Novick, Melvin R. 1968. Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley.

Mislevy, Robert J. 1996. "Test Theory Reconceived." Journal of Educational Measurement 33 (4):379–416.

Mislevy, Robert J., and Gitomer, Drew H. 1996. "The Role of Probability-Based Inference in an Intelligent Tutoring System." User Modeling and User-Adapted Interaction 5:253–282.

Mislevy, Robert J., and Wilson, Mark. 1996. "Marginal Maximum Likelihood Estimation for a Psychometric Model of Discontinuous Development." Psychometrika 61:41–71.

Muthen, Bengt O., and Khoo, Siek-Toon. 1998. "Longitudinal Studies of Achievement Growth Using Latent Variable Modeling." Learning and Individual Differences 10:73–101.

National Research Council. 2001. Knowing What Students Know: The Science and Design of Educational Assessment. Washington, DC: National Academy Press.

Pirolli, Peter, and Wilson, Mark. 1998. "A Theory of the Measurement of Knowledge Content, Access, and Learning." Psychological Review 105 (1):58–82.

Reckase, Mark D. 1972. "Development and Application of a Multivariate Logistic Latent Trait Model." Ph.D. diss., Syracuse University, Syracuse, NY.

Tatsuoka, Kikumi K. 1995. "Architecture of Knowledge Structures and Cognitive Diagnosis: A Statistical Pattern Recognition and Classification Approach." In Cognitively Diagnostic Assessment, ed. Paul D. Nichols, Susan F. Chipman, and Robert L. Brennan. Hillsdale, NJ: Erlbaum.

Van der Linden, Wim J., and Hambleton, Ronald K., eds. 1996. Handbook of Modern Item Response Theory. New York: Springer.

Wilson, Mark. 2002. Measurement: A Constructive Approach. Berkeley: BEAR Center, University of California, Berkeley.

Wilson, Mark; Draney, Karen; and Kennedy, Cathleen. 2001. Grade Map. Berkeley: BEAR Center, University of California, Berkeley.

Wilson, Mark, and Sloane, Kathryn. 2000. "From Principles to Practice: An Embedded Assessment System." Applied Measurement in Education 13 (2):181–208.

Wright, Benjamin D. 1977. "Solving Measurement Problems with the Rasch Model." Journal of Educational Measurement 14:97–116.

Mark Wilson

TECHNOLOGY BASED

Assessment methods can be learning opportunities for students, though identifying methods that accomplish this can be challenging. Some new instructional methods may target learning outcomes that traditional assessment methods fail to measure. Automated assessment methods, such as multiple-choice and short-answer questions, work well for testing the retrieval of facts, the manipulation of rote procedures, the solving of multiple-step problems, and the processing of textual information. With carefully designed multiple-choice items it is possible to have students demonstrate their ability to perform causal reasoning and solve multiple-step problems. However, students' participation in these kinds of traditional assessment activities does not necessarily help them "learn" and develop complex skills.

What students and teachers need are multiple opportunities to apply new information to complex situations and receive feedback on how well they are progressing toward developing the ability to synthesize and communicate ideas and to systematically approach and solve problems. Technologies are emerging that can assess students' ability to gather, synthesize, and communicate information in a way that helps improve their understanding and that informs teachers how they can improve their instruction. Several instructional techniques and technologies have been designed to help students develop complex skills and help teachers develop students' causal reasoning, diagnose problem-solving abilities, and facilitate writing.

Developing Causal Reasoning

Volumes of information can be shared efficiently by using a graphical image. Integrating ideas into graphical form can be a powerful way for students to learn and to demonstrate what they know and understand. One example of how information is portrayed in graphical form is a common street map. Expert mapmakers communicate a wealth of information about the complicated network of roads and transportation routes that connect the various locations in a city. Mapmakers use colored lines to indicate which roads have higher speed limits, such as expressways and highways. People can therefore use the map as a tool to plan the fastest route for a trip, rather than the shortest one, which may include city roads that have lower speed limits or are potentially congested. A road map efficiently illustrates the relative location of one place and its relation to another, as well as information about the roads that connect these locations.

Learners can share what they know about a complex topic by creating a concept map, which illustrates the major factors of a topic and uses descriptive links to detail the relationships between these factors. For example, a river is a complex ecosystem containing a variety of elements, such as fish, macroinvertebrates, plants, bacteria, and oxygen, which are highly dependent on one another. Changes in any of these elements can have a ripple effect that is difficult to determine without some method of representing the links between the elements. Scientists often create a concept map to help keep track of the interdependencies in a complex system. A concept map such as the one in Figure 1 can help students notice when their intuitions are not enough to make sense of a complex situation. Creating a concept map can therefore be a very authentic activity for students as they explore the intricacies of science.

Concept-mapping activities also provide an excellent opportunity for assessment. Students can demonstrate their current conceptions of a system at various stages of their inquiry. A simple method of evaluating concept maps is to compare what the learners create with what an expert creates. A point can be given for each relevant element and link identified by the students, and a second point can be added for correctly labeling the link (e.g., produce, as in "plants produce oxygen"). A common observation is that when students begin exploring a topic area like ecosystems, their concept maps contain only a few elements, many of which are irrelevant, and they use very few links or labels for the links.

FIGURE 1

If they do include links, they are often unable to describe what the links represent, even though they know there is some dependency between the factors. As students investigate more about a system and how it works, they are often able to redraw their maps to include the relevant elements, links, and labels that illustrate the interdependence of the elements of the system. However, grading these maps multiple times can be very time-consuming for a teacher.

New computer software has been developed to provide students with a simple interface to create hierarchical concept maps. The software can also score students' performance on these concept maps, depending on the goals of the instruction. For example, students' concept maps can be compared with those of experts for completeness and accuracy. An expert's model of a system would include a complete list of factors and named links. Comparing a student's map to an expert's map provides a method of identifying if students know what factors are relevant, as well as the relationships between factors.
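The compare-to-expert scoring rule described above can be sketched directly: one point for each expert link the student reproduces between the right pair of elements, and a second point when the label on that link also matches. The maps below are invented, and actual concept-mapping software may score differently.

```python
# Expert and student concept maps as {(source, target): label} dictionaries.
expert_map = {
    ("plants", "oxygen"): "produce",
    ("fish", "plants"): "consume",
    ("bacteria", "oxygen"): "consume",
}
student_map = {
    ("plants", "oxygen"): "produce",  # correct link, correct label
    ("fish", "plants"): "like",       # correct link, wrong label
    ("fish", "oxygen"): "produce",    # link not in the expert map
}

def score_concept_map(student, expert):
    """One point per expert link the student includes, plus one more
    point when the label on that link also matches."""
    score = 0
    for link, label in student.items():
        if link in expert:
            score += 1
            if label == expert[link]:
                score += 1
    return score

print(score_concept_map(student_map, expert_map), "of", 2 * len(expert_map))
```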

Causal maps are similar to concept maps but include information about how one factor influences another. For example, the relationship "fish consume plants" in Figure 1 could also be expressed as "fish decrease plants," and "plants produce oxygen" as "plants increase oxygen." The visual representation, with qualitative information about the relationship between factors, gives students an illustration they can use to make predictions about what will happen to the system when one of the factors changes. They can use the causal map as a tool to answer the question "What happens if we add a larger number of fish to a river?" Students can then follow the increase and decrease links to derive a hypothesis about what they think might happen to the other factors.
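The prediction step can be sketched by following the increase and decrease links outward from the perturbed factor. The map below is a toy version of the river example, not the actual Figure 1.

```python
# Toy causal map: each edge records whether an increase in the source
# factor increases (+1) or decreases (-1) the target factor.
causal_links = {
    "fish": [("plants", -1)],     # fish consume (decrease) plants
    "plants": [("oxygen", +1)],   # plants produce (increase) oxygen
    "bacteria": [("oxygen", -1)],
}

def predict_effects(start, direction=+1, effects=None):
    """Propagate a change (+1 increase, -1 decrease) in one factor
    through the causal links, recording the predicted direction of
    change for each downstream factor reached."""
    if effects is None:
        effects = {}
    for target, sign in causal_links.get(start, []):
        predicted = direction * sign
        if target not in effects:
            effects[target] = predicted
            predict_effects(target, predicted, effects)
    return effects

# "What happens if we add a larger number of fish to the river?"
print(predict_effects("fish", +1))  # {'plants': -1, 'oxygen': -1}
```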

Causal maps also provide a method to use technology as both an instructional tool and an assessment tool to measure students' understanding of a complex system. A research group at the Learning Sciences Institute at Vanderbilt University (formerly the Learning Technology Center) has created a computer system, called teachable agents, that provides students with a method to articulate what they know and to test their ideas by helping a virtual agent use their knowledge to answer questions. Students teach this agent how a particular system works by creating a causal map of a system, which becomes the agent's representation of its knowledge. Testing how well the agent has learned is accomplished by asking the agent questions about the relationships in the system. The agent reasons through questions using the causal map the students have provided them.

As the agent reasons through a question, the factors and links are highlighted to illustrate what information it is using to make a decision on how to answer the question. If the causal map is incomplete or has contradictory information, then the computer agent will explain that it doesn't know, or is confused about what to do next. The feedback from watching the computer agent "think" about the problem can help students identify what knowledge is missing or incorrectly illustrated in their causal map. Therefore, students learn by having to debug how their agent thinks about the world. This kind of computer tool tests a student's understanding of the processes associated with a system and provides an automatic method for self-assessment.

Diagnosing Problem-Solving Abilities

Problem solving is a process that incorporates a wide range of knowledge and skills, including identifying problems, defining the source of a problem, and exploring potential solutions. Many challenging problems, such as designing a house, creating a business plan, diagnosing a disease, troubleshooting a circuit, or analyzing how something works, involve a range of activities. Such situations require the ability to make decisions using the available information, as well as the inquiry processes necessary to locate new information. This process can also include making reasonable assumptions that help constrain a problem, thus making it easier to identify a potential solution. Novices often do not have enough background knowledge to make these decisions and rely instead on a trial-and-error search for solutions. If a teacher could watch each student solve problems and ask questions about why they made certain decisions, the teacher could learn more about what the students understand and monitor their progress toward developing good problem-solving skills.

The IMMEX system, created by Ron Stevens at UCLA, is a web-based problem-solving simulation environment that tracks many of the decisions a person makes while attempting to solve a problem. Stevens initially created IMMEX to help young immunologists practice their clinical skills. These interns are given a case study detailing a patient's symptoms, and they must make a range of decisions to efficiently and conclusively determine what is wrong with the patient. They must choose from a range of resources, including lab tests, experts' comments, and the patient's answers to questions, to gather evidence supporting a specific diagnosis. Each decision can have a cost associated with it in terms of both time and money. The interns must use their current medical knowledge to make good decisions about which resources to use and when to use them. The IMMEX system tracks these decisions and reports them in the form of a node-and-link graph (visually similar to a concept map) that indicates the order in which the resources were accessed. In addition, a neural network can compare the decision path an intern takes with the decision path of an expert doctor to identify where the intern is making poor decisions. Students can use these traces to help them evaluate the strategies they use to solve a problem and to learn about more optimal strategies. An instructor can also use these decision traces to evaluate common errors made by students. The result is a system that provides students with the opportunity to solve complex problems and receive automated feedback they can use to improve their performance, while professors can use it to refine their instruction to better meet students' needs. IMMEX now has programs created for K-12 education.
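The actual IMMEX analysis classifies decision paths with a neural network, but the basic idea of comparing a student's sequence of resource choices with an expert's can be illustrated with something as simple as an edit distance. The resource names below are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two decision sequences: the number
    of insertions, deletions, and substitutions needed to turn one into
    the other (smaller means more similar)."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[-1][-1]

expert_path = ["history", "exam", "blood test", "diagnosis"]
student_path = ["blood test", "x-ray", "exam", "blood test", "diagnosis"]
print(edit_distance(student_path, expert_path))  # 2 extra or changed decisions
```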

Facilitating Writing

Writing is a fundamental skill that requires careful use of language to communicate ideas. Learning to write well takes practice and feedback on content, form, style, grammar, and spelling. Essay and report writing are therefore critical assessment tools used to capture students' ability to bring together ideas related to a course of study. However, a teacher can only provide a limited amount of feedback on each draft of a student's essay. Therefore, the teacher's feedback may consist of short comments in the margin, punctuation and grammar correction, or a brief note at the end summarizing what content is missing or what ideas are still unclear. Realistically, a teacher can only give this feedback on a single draft before students hand in a final version of their essays. Most word processors can help students check their spelling and some mechanical grammar errors, which can help reduce the load on the teacher. What students need is a method for reflecting on the content they've written.

Latent semantic analysis (LSA) has great potential for assisting students in evaluating the content of their essays. LSA can correlate the content of a student's essay with the content of experts' writings (from textbooks and other authoritative sources). The program uses a statistical technique to evaluate the language experts use to communicate ideas in their published writings on a specific topic; students' essays are evaluated with the same technique. LSA can compare each student's writing with the writing of experts and create a report indicating, on a scale from 1 to 5, how well the paper correlates in content. The numerical output does not give students specific feedback on what content needs to change, but it helps them identify when more work needs to be done. Students can rewrite and submit their papers to the LSA system as many times as necessary to improve the rating. The result should be that students' final essays are of much higher quality when they hand them in to the teacher. In addition, the students must take on a larger role in evaluating their own work before handing in the final project, allowing the teacher to spend more time evaluating the content, creativity, and synthesis of ideas.
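Production LSA systems are trained on large reference corpora, but the underlying comparison, projecting texts into a reduced semantic space and measuring their similarity there, can be sketched with TF-IDF plus a truncated SVD from scikit-learn standing in for a full LSA pipeline. The texts below are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Invented stand-ins for expert writings and one student essay; a real
# system would be trained on a much larger corpus.
expert_texts = [
    "Plants in a river produce oxygen through photosynthesis.",
    "Fish consume plants and depend on dissolved oxygen to survive.",
    "Bacteria consume oxygen as they break down organic matter.",
]
student_essay = "Fish eat plants, and plants make the oxygen fish need."

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(expert_texts + [student_essay])

# Project into a low-dimensional "semantic" space, then compare the
# student essay with the centroid of the expert texts.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(tfidf)
expert_centroid = reduced[:-1].mean(axis=0, keepdims=True)
similarity = cosine_similarity(reduced[-1:], expert_centroid)[0, 0]
print(f"semantic similarity to the expert writings: {similarity:.2f}")
```

A similarity score of this kind could then be mapped onto a coarse 1-to-5 rating of the sort described above.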

Summary

Assessment of abilities such as problem solving, written communication, and reasoning can be a difficult and time-consuming task for teachers. Performance assessment methods such as class projects and presentations are important final assessments of students' ability to demonstrate what research they have done, as well as their ability to synthesize and communicate their ideas. Unfortunately, teachers often do not have enough time to give students multiple opportunities to engage in these kinds of activities, or to give them sufficient feedback before they perform these final demonstrations of what they have learned. Systems like teachable agents, IMMEX, and LSA provide a method for students to test what they know in a very authentic way as they progress toward their final objectives. These technologies provide a level of feedback that requires the students to reflect on their performance and define their own learning goals to increase their performance. In addition, teachers can use an aggregate of this feedback to evaluate where a class may need assistance. Technology can provide assessment methods that inform students on where they need assistance and that require the learners to define their own learning outcomes.

See also: Assessment, subentries on Classroom Assessment, Dynamic Assessment; Assessment Tools, subentry on Psychometric and Statistical.

bibliography

Biswas, Gautam; Schwartz, Daniel L.; Bransford, John D.; and Teachable Agents Group at Vanderbilt. 2001. "Technology Support for Complex Problem Solving: From SAD Environments to AI." In Smart Machines in Education: The Coming Revolution in Educational Technology, ed. Kenneth D. Forbus and Paul J. Feltovich. Menlo Park, CA: AAAI Press.

Landauer, Thomas K., and Dumais, Susan T. 1997. "A Solution to Plato's Problem: The Latent Semantic Analysis Theory of the Acquisition, Induction, and Representation of Knowledge." Psychological Review 104:211–240.

internet resources

Chen, Eva J.; Chung, Gregory K. W. K.; Klein, Davina C.; De Vries, Linda F.; and Burnam, Bruce. 2001. How Teachers Use IMMEX in the Classroom. Report from the National Center for Research on Evaluation, Standards, and Student Testing. <www.immex.ucla.edu/TopMenu/WhatsNew/EvaluationForTeachers.pdf>.

University of Colorado, Boulder. 2001. Latent Semantic Analysis at the University of Colorado, Boulder. <http://lsa.colorado.edu>.

IMMEX. 2001. <www.immex.ucla.edu>.

Teachable Agents Group at Vanderbilt. 2001. <www.vuse.vanderbilt.edu/~vxx>.

Sean Brophy
