Can the Testing Industry Meet Growing Demand?
The passage of the No Child Left Behind (NCLB) Act significantly increased the prominence of standardized testing in the nation's K-12 schools. Every state will have to test every student repeatedly in reading, mathematics, and science, and student scores will be a critical measure of how well the schools are fulfilling their mission. For the Bush administration, testing is the key to accountability, and analysis of the results will be a powerful influence on future curriculum and classroom practice.
Since most school systems contract out test preparation to private firms, one practical question to ask as the country begins to implement the NCLB testing requirements is whether the testing industry can meet the demand for tests that will be coming from the states. In answering this question, we have to consider everything that schools and the public expect from such a high-stakes testing program.
Even before the passage of NCLB, states were asking for custom tests aligned with specific standards. They wanted combinations of multiple-choice and open-ended items that could, as deemed appropriate, be released to the public as part of the public awareness initiative so critical to high-stakes testing programs. States wanted electronic data banks of test questions linked to specific content requirements of the curriculum. They wanted multiple versions of each test to address issues of security and the opportunity to field test questions for use in future tests. States wanted tests in large print, audio, Braille, and numerous languages to meet the needs of special populations. In addition to tests pegged to state standards, they wanted norm-referenced tests that would compare each student's performance to that of other students across the state and country. They also wanted practice tests, study guides, data management tools, electronic reports, and the ability to manipulate and merge data from a variety of different sources. Of course, they all wanted to administer tests as late as possible in the school year and then to receive results before the end of school.
As daunting as this sounds, the testing industry believes it is up to the challenge. Although the NCLB testing requirement is usually described in the press as something new for the schools, the reality is that virtually all U.S. school systems are already conducting testing programs in response to previous federal legislation or their own desire for accountability. NCLB is more likely to require a reconfiguration than an expansion of testing. States such as Maryland, Alabama, and Pennsylvania have already indicated that they will be eliminating their current testing programs and replacing them with testing that meets the NCLB requirements. We expect to see the states narrowing their testing to meet a few key criteria: States will be demanding tests that allow for rapid and meaningful reporting of results, that focus measurement on achievement of critical state standards, and that are carefully aligned with the required curriculum. If the states are reasonable in their demands, the challenge to the testing industry will be serious but not insurmountable.
History of testing
The signing of NCLB certainly did highlight the power of assessment in the lives of students, teachers, parents, school officials, and policymakers. Although seen by some as a dramatic development in testing, in a fundamental way this law was the logical next step in the natural evolution of the state of achievement testing and standards-based education reform that began with the first Elementary and Secondary Education Act (ESEA) in 1965.
ESEA required the states to assess the progress of students who were receiving federal support through programs such as Title I for low-income students. The Improving America's Schools Act of 1994 amended ESEA by extending the testing requirement to all students. Time and research had shown that for all children to learn, the entire school had to be focused on the learning of all children. The de facto segregation of students into regular classrooms and "special services" classrooms had to end. The 1994 legislation required all states to have standards of academic achievement; to assess students at three stages (grades 3-5, 6-8, and 9-12) to determine if they were meeting the standards; and to implement an accountability system that would identify schools where students were not making adequate progress.
Although it took six years of discussion to fully articulate the requirements of the Improving America's Schools Act, the testing industry began to see significant growth in requests for state-owned standards-based assessments soon after the bill was passed. In the mid-nineties, the states began to ask for their own custom-designed tests rather than the off-the-shelf tests that the companies had already prepared. Test publishers found themselves in the business of developing content, manufacturing tests, and scoring and reporting on assessments that were not part of their inventory and, in fact, did not last beyond a single administration. This stressed the capacity of the testing industry. At Harcourt we found ourselves preparing more than 20 new editions of tests in less than half the usual time.
This shift away from test-publisher products to state-owned, work-for-hire products opened the door for new companies to enter the industry and grow in the marketplace. The new companies were able to compete for business, because there was no longer a need to have a full catalog of already-prepared tests. With more companies looking to hire testing experts and the states also needing to add staff with testing expertise, the competition for talent was fierce. The sudden surge in demand made it difficult to maintain quality. Sometimes errors were found in test results, reports from the companies were sometimes late, and the cost per student increased. The public was not happy.
By 2000, the maturation of the standards-based era became evident. At that point, 49 of the states plus the District of Columbia and Puerto Rico had content standards in at least some subjects. Twenty-four of the states had both content standards and performance standards. Although testing was becoming commonplace, its importance was not yet clearly established, and states varied considerably in how they used test information. With the majority of states having progressed this far into standards-based education reform, the stage was set for NCLB, which made the tests the primary criterion for determining school accountability for individual student learning.
The 1999-2000 Annual Survey of State Student Assessment Programs prepared by the Council of Chief State School Officers (CCSSO) revealed that all states except Nebraska and Iowa had mandated testing in place. All of these 48 states assessed reading, mathematics, and science at some grade levels. Alabama, Arizona, California, Florida, Idaho, Kentucky, Louisiana, New Mexico, Tennessee, and Texas had met the full requirement of NCLB by mandating testing programs in grades 3-8 and one high school grade--though it is not certain that all had developed standards-based tests in all areas. Therefore, at least 40 states need to expand their testing programs to be compliant with NCLB. At first glance this task seems impossible.
Viewed a different way, the burden is not that overwhelming. According to CCSSO data, which includes 49 states plus the District of Columbia, Puerto Rico, Guam, and the Department of Defense schools, a total of 1,823 tests are being administered in all subjects in all grades in 57 political units. For each of these political units to meet the NCLB standard of testing students in grades 3-8 and one high school grade in mathematics, reading, and science would require the administration of 1,197 tests, a 34 percent reduction from the total number of tests now being administered.
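The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above and assumes that each political unit gives exactly one test per subject per required grade. The variable names are illustrative only.

```python
# Figures quoted in the text above.
political_units = 57      # units counted in the CCSSO data
current_tests = 1823      # tests now administered, all subjects and grades

# NCLB minimum: grades 3-8 (six grades) plus one high school grade,
# each tested in mathematics, reading, and science.
required_grades = 6 + 1
required_subjects = 3

# Assumption: one test per subject per grade in each political unit.
tests_per_unit = required_grades * required_subjects
required_tests = political_units * tests_per_unit

# Reduction relative to the number of tests given today.
reduction = (current_tests - required_tests) / current_tests

print(f"Tests per unit: {tests_per_unit}")
print(f"Tests required under NCLB: {required_tests}")
print(f"Reduction from current total: {reduction:.0%}")
```

Under that assumption, 21 tests per unit across 57 units gives the 1,197 figure, about 34 percent fewer than the 1,823 tests now in use.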
The NCLB does not require the use of open-ended questions, which are expensive and time consuming to grade. If the 20 or so states that now use open-ended questions eliminate them from their testing, it will result in a further decrease in the total testing burden. Changes in the volume of questions being field-tested could also have a significant effect on the total testing burden. In the interest of keeping the public informed about what is being tested, many states now release the full content of each test after it is administered. This means that a completely new test must be prepared each year. The items for next year's test must be field-tested by including them in this year's test. Since only a few questions can be included in each test, it is necessary to prepare 30 to 40 separate versions of the test in order to be able to test enough items.
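The need for so many versions follows from simple division, as the rough sketch below illustrates. The item counts are hypothetical, chosen only to show how the range of 30 to 40 versions can arise. They are not drawn from any state program, and the function name is likewise an assumption.

```python
import math


def forms_needed(items_to_field_test: int, slots_per_form: int) -> int:
    """Number of test versions needed so every new item is tried out once.

    Each version carries only a few field-test slots alongside the
    scored questions, so the item pool is spread across many versions.
    """
    return math.ceil(items_to_field_test / slots_per_form)


# Hypothetical numbers: a fully released test must be replaced each year,
# so a large pool of candidate items has to be field-tested.
candidate_items = 210   # assumed size of next year's candidate pool
slots_per_form = 6      # assumed field-test slots in each version

print(forms_needed(candidate_items, slots_per_form))
```

With these assumed values the pool requires 35 versions. Halving the pool or doubling the slots per version would cut the count roughly in half.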
If the states decide not to release the test each year, they need to produce only three versions of each test to maintain a secure testing program: the primary test; a version that students who fail the test could take for retesting; and a backup version to use in the event of a security violation. This would greatly reduce the cost and difficulty of preparing tests. The time and money saved in this way could be redirected toward efforts to understand how the test results can be used to improve classroom practice or curriculum design. In addition, resources could be devoted to preparing developmental scales that would make it possible to determine if an individual student is making adequate annual progress.
The expansion of testing in the 1990s forced the testing industry to add staff and expand its capacity in other ways. Having this added capacity in place will make it possible to take on the challenge of developing a new generation of tests.
For the testing industry, NCLB is about assuring our customers that we can build standards-based tests consistent with the research requirements and third-party alignment of NCLB, and that we can score the tests and report the results promptly with 100 percent accuracy. On the development side, regardless of whether customers reconfigure their testing programs, test publishers must be able to demonstrate their alignment methodology and to have their alignment studies validated by independent third parties.
The content itself is broadly prescribed by the legislation. Each state is responsible for setting detailed standards. It has already become apparent that there will be considerable variation in how states set and express their standards. The number and nature of the standards will determine how each state test is designed, and most of the states were not thinking about test design while developing their standards. In many states, the standards are numerous and detailed. Testing the students on all of these standards would require tests that take far more time than the states want to devote to testing. Revising the standards is not practical. Most states followed very well articulated processes for widespread review and participation in the content standards formulation, and they have neither the will nor the time to repeat the process. States will therefore have to make difficult decisions about what to include in their tests.
Some standards--for example, to read five fiction and five nonfiction books in ninth grade--simply cannot be tested. Others, such as oral reading and speech fluency, cannot be measured on a paper-and-pencil test. But most states still have too many standards to assess in a reasonable amount of time. A state that decides that it cannot test for all standards has two options. The first is to establish superordinate standards, each of which is meant to incorporate a group of more specific standards. This will provide a broad-brush picture of student achievement but not the type of detail needed to fine-tune classroom activities. The second is to identify a core of critical standards that are tested every year, with other standards being tested periodically. This option is the more useful of the two for providing detailed guidance for teachers.
Once this decision is made, a state and a testing company must work together to identify test items that correspond with the standards and fulfill the NCLB requirements. The states and the industry have been doing this type of work during the 1990s. What is different this time is that the test also has to satisfy the federal government. Each state will have a development partner and an external auditor in the person of the federal government.
Because the test results will have powerful consequences for schools, scoring accuracy is of primary importance. Each test publisher understands this well and strives diligently to make zero errors a reality for all test scores.
It seems quite clear that year-to-year reporting on individual students and on certain cohorts will be required to satisfy NCLB. Year-to-year tracking of individual students will require assigning a unique identifier to each student so that the student can be followed from grade to grade, school to school, and district to district. When states do not provide this unique identifier, test publishers will have to find a way to do it.
To make the tracking of individual scores over time meaningful, it will be necessary to establish a developmental scale that provides a model for how much a student should be advancing from year to year. The designations of basic, proficient, and advanced are too broad to be useful for helping an individual. It would seem much more appropriate to monitor student learning along a fine-grained developmental scale that can register small changes in progress.
Both the states and the testing companies will find it difficult to recruit expert staff. As the criticality of scores increases under NCLB, test publishers will be challenged to secure the psychometric resources required to make zero-error scoring and reporting a reality for all testing programs. With the number of psychometricians graduating from doctoral programs already too low to meet the demand, companies will be hard pressed to recruit and maintain such talent. State education departments will need to hire additional psychometricians to manage the test preparation process and to analyze results and apply the findings to classroom practice.
States will have to convene working groups of teachers, administrators, curriculum specialists, and testing experts for the time-consuming task of selecting curriculum content, developing standards, and aligning test questions with the standards. Active teachers need to be included in these groups, but it is not clear how someone could teach and attend an endless series of meetings at the same time.
If the states were uncertain about the wisdom of maintaining their own state testing programs as well as introducing new tests linked to NCLB, the decline in state tax revenues resulting from a slowing economy should help them make the decision. For virtually all the states, the wise path is to narrow their focus and limit their expenses by concentrating on meeting the basic NCLB requirements. This will not only make their burden bearable but also enable the testing industry to focus its resources so that it can sustain quality while developing new tests for all the states. From the perspective of the industry, it makes more sense to figure out what can be done well than to try to do everything. In particular, we should be ensuring that the scoring and reporting of results are accurate and devoting adequate resources to the task of using the test results to enhance the quality of instruction. We must remember that the purpose is not more and better tests, but more effective teaching and a better education for all students.
Margaret Jorgensen (firstname.lastname@example.org) is vice president for product development, psychometrics, and research at Harcourt Educational Measurement in San Antonio, Texas.