Educators have been gathering information on students’ learning outcomes for many years. The process has been described as observing, examining, testing, quizzing, measuring, evaluating, appraising and assessing, but the primary goal of determining students’ educational achievement has remained constant. According to Popham (1999), assessing students’ learning against the goals of a professional development program is more complex than most people anticipate, because it entails more than simply documenting students’ current learning status. Most development goals involve changes in students, specifically improvements. Relevant information must therefore be gathered at appropriate points in time. To determine whether students are improving, it may be necessary to assess them at the point of entry and again at a later point. Comparisons with other students may also be necessary to isolate the effects of the professional development program (Johnson & Johnson, 1994). Without knowing the students’ position at the beginning, or without comparing them with others at the same level, it may be difficult to determine whether any improvement or change has actually occurred. Gathering information at a later point is also important for measuring retention and long-term learning (Popham, 1999).

A program’s intended learning goals for students usually determine the procedures used to collect evaluation information. The outcomes to be measured can be cognitive, affective or psychomotor (Popham, 1999). In any educational program, the procedures and instruments used to assess the program’s effects are central. Teachers have several ways of measuring students’ learning, and the choice of an assessment procedure depends on the stated objectives. This paper critically evaluates standardized tests and alternative assessment approaches.

Importance of Assessment

According to Linn & Gronlund (2000), measurement refers to assigning numbers to certain characteristics of people, objects or events according to a rule-governed system. In a classroom context, the rules used to assign the numbers normally create a ranking that shows how much of the attribute different students possess. They also define evaluation as making judgments about the worth or value of a set of measures using a rule-governed system. Assessing students’ learning provides all involved parties with a clear summary of how well the student has met the teacher’s goals. Assessment is also important for monitoring students’ progress. Teachers need to know whether their students are synthesizing the instruction and understanding the material covered over time. This enables the teacher to make arrangements, such as remedial instruction, for students whose understanding is slower or faulty (Linn & Gronlund, 2000).

Discovering that a student has difficulty understanding, and cannot learn at the same pace as the rest of the class, allows the teacher to decide on an appropriate and timely course of action. Assessing a student’s performance also has other positive effects on various aspects of learning and instruction. According to Brookhart (2000), classroom assessment directs students to what is important to learn and influences their motivation and their perception of their own competence. Assessment also structures students’ approaches to personal study and fosters the development of improved learning strategies and skills. It is one of the most potent forces influencing learning.

Standardized Tests

Goals and Strengths

Many programs use commercially available standardized tests to measure students’ academic achievement. Standardized tests are administered and scored in a standard, consistent manner. They are composed of a set of open-ended or constructed-response items meant to measure a higher degree of cognitive skill (William, 2006). The manner of scoring is usually predetermined, and the procedures, conditions of administration and interpretations are consistent and standard. This consistency in administration and scoring allows more reliable comparison of results across test takers. The use of standardized exams in the US expanded considerably in the 20th century, particularly after the Second World War. This was driven in part by the need to standardize a highly decentralized education system.

Standardized tests are designed to provide the best possible match to what is viewed as the typical curriculum at a specific grade level. They provide quantifiable information (scores, proficiency levels, etc.) and outcomes that can be used in screening programs, for example, to identify students who may require additional assessment (Silbert & Hintze, 2005). They are also advantageous because they provide information on a student’s areas of strength and weakness. Standardized tests allow a comparison between a student and peers of the same grade or age, and hence an assessment of development. They can be used to assess a student’s progress over time, for instance, by re-administering a test after an intervention or a remedial program. The results can be used to generalize about a student’s skills. They can also show whether a student is improving uniformly, by comparing results in one subject with results in another (Silbert & Hintze, 2005).


Many people consider standardized tests important because they measure students using a consistent process, hold teachers more accountable and make it easier to understand where problems occur. However, the social and cultural repercussions of standardized tests have been criticized. Becker (2001) argues that since these tests are designed by people in a position of power, cultural bias against the “have-nots” can arise.

Since the current system rewards high-performing schools and sanctions poorly performing ones, critics argue that standardized tests reward those at an advantage while the disadvantaged continue to suffer. The system is viewed as exacerbating the race and class divide in society through the education system (Burns, Dean & Klar, 2004). Another concern is the increasing pressure on teachers to produce high test results. As a result, teachers may teach to the test instead of exploring approaches that may not produce results on paper.

Research on students’ achievement has highlighted a problem associated with over-reliance on standardized tests. Such tests are now administered at every grade level, and the success or failure of programs is defined in terms of test scores. Teachers’ and administrators’ salaries and job security are also linked to students’ performance in standardized tests. The main areas of criticism are the content of assessment, the format of items and item bias (Fuchs et al., 1991).

Standardized tests often rely on multiple-choice questions. This item format provides greater coverage of content and objectives, as well as efficient scoring. However, what the format actually measures is the ability to identify the right answer. This type of response does not necessarily correspond to the responses students regularly exhibit in the classroom, for example, the acquisition and synthesis of information (Deno, 2003). If students are not familiar with the structure in which the item format requires them to respond, their test performance may be affected. In another scenario, a student may identify the correct form when it appears as a discrete test item, but use the form incorrectly in communication contexts. In this case, the results of a standardized test may make a student appear more proficient than actual performance would show (Shapiro, 2004).

The inclusion of items that are biased against some kinds of students has also been a cause for criticism. These students include ethnic minorities, students with limited English proficiency, and rural or inner-city students. The criticism is based on the view that the items reflect the culture, language and/or learning style of the middle-class majority (Shapiro, 2004). Test companies have endeavored to remove culture-based items from tests, but removing questions from a meaningful context has itself proved challenging for minority students.

It is also argued that this method measures only superficial knowledge or learning, because students can easily cram what they think will appear in the test and give other areas little attention. Standardized tests may also fail to match the specific objectives and goals of a program or institution (Klecker, 2000). This makes them unlikely to provide the most appropriate way to evaluate the program.

Criterion-referenced data are considered more useful than norm-referenced data, the type produced by standardized tests. This view rests on the premise that norm-referenced data do not clearly show the progress of a student over time. Criterion-referenced data also allow easy administration of pre- and post-tests to measure development, whereas administering a standardized test twice may be cost-prohibitive (Becker, 2001). Norm-referenced data may be based on norms that do not come from a true national sample, which can make the comparison unfair to those taking the test. Another challenge is that it may prove difficult to isolate what changes are required, since such testing is more summative than formative. This makes it difficult to identify areas of weakness and develop means to assist students with difficulties. In addition, receiving the results on time is also a challenge (Brookhart, 2000).
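The contrast between the two kinds of data can be illustrated with a small sketch. It is only an illustration under stated assumptions: the scores, the norm groups and the mastery cutoff are hypothetical, and the function names are not taken from any published test. A student whose raw score rises while the norm group rises by the same amount keeps the same percentile rank, whereas a criterion-referenced reading shows the gain directly.

```python
from bisect import bisect_left


def percentile_rank(score, norm_scores):
    """Norm-referenced reading: percent of the norm group scoring below `score`."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # count of scores strictly below
    return 100.0 * below / len(ordered)


def criterion_gain(pre_score, post_score):
    """Criterion-referenced reading: change measured against the content itself."""
    return post_score - pre_score


def meets_criterion(score, cutoff):
    """True when the score reaches the mastery cutoff."""
    return score >= cutoff


# Hypothetical data: the whole norm group improves by 10 points between testings.
norm_fall = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
norm_spring = [s + 10 for s in norm_fall]
MASTERY_CUTOFF = 70  # hypothetical cutoff

pre, post = 55, 65  # hypothetical student scores

rank_pre = percentile_rank(pre, norm_fall)
rank_post = percentile_rank(post, norm_spring)
gain = criterion_gain(pre, post)
mastered = meets_criterion(post, MASTERY_CUTOFF)

print(f"percentile rank: {rank_pre:.0f} -> {rank_post:.0f}")
print(f"criterion gain: {gain} points, mastery reached: {mastered}")
```

In this hypothetical case the percentile rank does not move even though the student has gained ten points, which is the sense in which norm-referenced data may hide individual progress.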

Critics argue that the measures used in standardized tests fail to inform instruction adequately. For some students with disabilities, standardized administration may not be possible. Accommodations may need to be made to allow these students to take a test in the established standardized way (Becker, 2000). However, these accommodations can become modifications to the trait being measured. The items used in standardized tests are also frequently unrelated to the behaviors and tasks required in a classroom setting.

Recommendations for Improvement

Non-cognitive factors such as fatigue, attention and anxiety can influence the results of a test taken at a single point in time. Such results therefore reflect the students’ ability or behavior only at that point in time. The results of standardized tests also fail to provide the information needed for curricular restructuring or instructional change (Gay & Airasian, 1999). These procedures also prevent the examiner from determining the conditions under which students’ performance may improve. The case of students with a language deficit illustrates the shortcomings of this system. Assessing these students in a language they have not mastered is unfair, since their level of performance may be higher if instruction is delivered in a language they are proficient in (Popham, 1999).

The goal of education is to produce morally developed citizens who fit well in society. Education also gives an individual the opportunity to develop skills and learn how to solve problems. Since people have different skills and abilities, it is important to acknowledge this and provide different forms of evaluating students’ achievement to avoid bias (Griffin, 1994). Measuring the cognitive (knowledge and understanding), affective (attitudes, beliefs and dispositions) and psychomotor (skills, behaviors and practices) outcomes of a program is important.

Any assessment task or procedure must take cultural, racial, class and gender differences into account. In this context, there are strong arguments for educators to consider alternative methods of assessing students and evaluating programs (Klecker, 2000). Even when educators continue to use standardized tests, it is necessary to supplement them with other types of assessment. Such additional forms of assessment include the following.

Group Tasks or Activities

Students’ learning can alternatively, or additionally, be assessed through their performance in group tasks or activities. For instance, if the written test covered 25 of 50 items, the students can be put into groups to cover the remaining 25. These groups are composed of students who work together to tackle a complex problem or carry out a detailed experiment. An appropriate group activity is structured so that each student has a vital role in the task. Group tasks, like any other authentic cooperative learning activity, should combine individual accountability and group responsibility (Johnson & Johnson, 1994).

A popular design for group tasks is to have students perform an activity as a group, after which each student produces a written product based on that experience. Most group tasks and activities are used by individual teachers as part of their instructional process, but some large-scale assessment systems include them as well. As explained by Popham (1999), they provide information on two key learning goals:

  • They give information on a student’s ability to apply skills to produce outcomes that can be evaluated.
  • They show a student’s ability to work with others in a team to find solutions to problems.

However, it is important to address the limitations of group tasks and activities. Developing group tasks that ensure the involvement of all students can prove challenging and time-consuming (Popham, 1999). Scoring students’ responses can also be time-consuming, especially if the class is large. With proper planning, group tasks and activities are the best way to assess students if teamwork is one of the intended goals. Group tasks can also provide an important source of information on complex learning results when paired with specific scoring criteria that students are taught before the group activity (Klecker, 2000). Cooperative groups enhance students’ understanding of concepts through verbal interaction with peers. They also give the teacher information on the cognitive processes students employ in giving responses. Group tasks are also important in reinforcing the learning environment in a classroom (Johnson & Johnson, 1994).

Portfolios and Other Collections of Students’ Work

Portfolios are compilations of students’ work that show what they have achieved so far. Most portfolios include collections of students’ written papers and other work completed in the process of learning (William, 2006). These collections demonstrate the progress of a student over the years. This form of assessment encourages the participation of all interested individuals (teachers, students, parents) in documenting the learning process. The papers are derived from students’ daily classroom work. The process involves sampling students’ work, recording students’ observations of their learning experiences, and evaluating students’ processes and outcomes. Although information from this type of assessment can be used for grading purposes, the main goal is to improve instructional methods and students’ learning (Shapiro, 2004).

Curriculum-Based Assessment

Although it falls under criterion-referenced testing, curriculum-based assessment is considered an alternative to traditional standardized norm-referenced academic testing. It refers to measurement that relies on direct observation and recording of students’ performance in the local curriculum as a basis for gathering information to make instructional decisions (Deno, 2003). Curriculum-based assessment (CBA) has also been termed direct assessment of educational skills, and is based on the assumption that assessment should cover what has been taught. CBA involves repeated measurement of students’ academic skills (Linn & Gronlund, 2000). In each area of learning, probes are selected and used to gauge students’ performance. The probes are developed from curricular materials available in the students’ immediate learning environment. CBAs therefore provide a structured method of assessing students’ performance based on curricular assignments used in their actual learning environment (Brookhart, 2000). The basic argument underpinning this approach is that, in evaluating their progress, students should be observed in their academic environment.
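One way the repeated measurement described above might be summarized is as a trend across successive probes. The sketch below is an assumption for illustration rather than a procedure prescribed by the cited authors: the weekly probe scores are hypothetical, and the ordinary least-squares slope is simply one common way to express average change per session.

```python
def trend_slope(scores):
    """Least-squares slope of probe scores against session index (0, 1, 2, ...)."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two probe scores are needed")
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    # Covariance-style and variance-style sums for the slope formula.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(scores))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


# Hypothetical weekly probe results (e.g. items answered correctly per probe).
weekly_probe_scores = [22, 25, 24, 28, 31, 33]

slope = trend_slope(weekly_probe_scores)
print(f"average change per week: {slope:.1f}")
```

A positive slope suggests the student is progressing under current instruction, while a flat or negative slope could prompt the teacher to adjust it. Any threshold for what counts as adequate progress would have to come from the local curriculum.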

Dynamic Assessment

Dynamic assessment is a type of learning assessment that makes use of an active teaching process. The goal of this process is to modify an individual’s cognitive functioning and observe the resulting changes in the examinee’s learning and problem-solving strategies. The main goals of dynamic assessment are to:

  • Assess the ability of a student to identify the principles behind a problem and use this understanding to provide a solution.
  • Assess the most appropriate type and amount of teaching required to teach a student the specific principle.
  • Understand any cognitive deficits and non-cognitive factors that help to explain failure in students’ performance, and whether teaching can modify such factors (Gay & Airasian, 1999).

Dynamic assessment contrasts with standardized (static) assessment, in which examiners present items to examinees without providing guidance or any other form of intervention designed to improve the students’ performance (Brookhart, 2000). In static assessment, an individual’s deficits and disabilities are accepted and the environment is modified to allow the person to work within the identified limitations (Johnson, Johnson & Holubec, 1994). In contrast, dynamic assessment is based on active modification, in which efforts are made to remediate the deficits or to provide the individual with alternative strategies for solving problems that compensate for areas of weakness.

High-stakes testing is linked to over-reliance on standardized tests as the primary means of assessment and as the principal source of curriculum content. High-stakes testing has critical consequences for students, since a single measure (a standardized test score) may determine graduation or promotion to the next level. Instructors are now held accountable for students’ performance (Becker, 2001). Without doubt, testing and accountability are important aspects of a program, since assessment practices are the key to accountability and improved teaching processes. However, opposition arises from the use of standardized tests as a single measure of assessment.

Making decisions about progress, promotion and graduation using a single indicator of an individual’s learning violates the ethics of teaching. The higher the stakes of testing, the greater the emphasis teachers will place on test preparation and teaching to the test, as opposed to meaningful learning. Assessment should be driven by innovative curriculum design and effective teaching practices. Other types of assessment should be used to supplement standardized tests in order to enhance the spirit of research among students. As asserted by Becker (2001), no clear evidence exists that high test scores reflect actual improvement in students’ learning, either at the individual or the group level. No single measure can serve as a definitive measure of a student’s knowledge. Student assessment is constructive if the educational approaches used are research-oriented and emphasize equity in academic processes. Not all students can demonstrate what they have learnt through standardized tests. Biased assessment policies and practices should not be used, since they limit learning opportunities for individuals and hinder curriculum development and teaching.

