Wednesday, January 9, 2013

Guest writer Doug McRae on the proposed revamping of testing

California Superintendent of Schools Tom Torlakson yesterday unveiled a proposal to revamp the testing system for California schools. Here's my story on the topic.

As is often the case with topics on testing, I rely on the expertise of Doug McRae, statistician extraordinaire and former top executive with one of the leading testing companies in the country. He has a lot to say about the proposal, so here's what he wrote in response to Torlakson's detailed proposal. (You can also find that in the story.)

By Doug McRae

General Comments

The purpose of a K-12 assessment system is the most important factor in the design of a new assessment system. The primary purpose defined by the report is found on page 4, where the report says the recommended system “has as its primary purpose to model and promote high-quality teaching and student learning activities.” This statement of purpose deviates from the usual purpose of a K-12 statewide assessment system, which is to measure the results of instruction. CA’s previous STAR system had measuring the results of instruction as its primary purpose, so the State Superintendent of Public Instruction (SPI) is recommending a radical departure from the previous statewide assessments.

Based on many years designing and developing large-scale K-12 testing systems, I can say that the two purposes cited above are mutually incompatible. Systems designed for purposes of instruction do not serve the needs of measuring the results of instruction, and systems designed for purposes of measuring the results of instruction do not serve instructional needs well. The report itself says this in a one-sided way on page 5, when it says that STAR is valid for comparing school and district performance (i.e., measuring the results of instruction) but was not designed to support instructional uses. What the report neglects to say is that assessment systems designed for instructional purposes are not designed to serve the needs of measuring the results of instruction (i.e., accountability uses) well.

The theme of the report focuses on the use of assessment data for instruction. Data for instructional purposes is a vital ingredient of a comprehensive instructional system, but such data need to reside within the instructional system. An instructional system should have strong local school and district control so it can be aligned with its other components, such as instructional materials and professional development. A statewide assessment system, by contrast, is centrally controlled, with strict standardization, testing windows, and test security that limit its utility for instruction. In effect, the SPI recommendation replaces accountability testing with an approach to statewide testing motivated by helping instructional systems. As such, the report has a strong anti-accountability theme. Indeed, over the past 20 years, one of the constants in the debates over statewide assessment systems in the US has been the conflict between advocates and detractors of accountability for our K-12 schools. Advocating for instructional tests rather than tests that measure the results of instruction is in effect an anti-accountability Trojan Horse strategy: it focuses on the benefits of instructional tests to hide an intent to minimize the use of accountability tests.

Rather than abandon the accountability purpose for CA’s statewide testing system, a far better approach would be to advocate for instructional tests as part of CA local control instructional systems, and reserve the mandated statewide assessment system efforts for measuring the results of instruction, or the accountability purpose.

The report provides very little information on three other aspects of its recommendations:

• Tests for instruction will take increased testing time – this is acknowledged generally in the report but no specific information is provided for how much increased time. From the information available from the Smarter Balanced consortium, the summative portion of the tests they plan to provide will increase CA’s statewide assessment time anywhere from 50 percent to 300 percent.
• Tests for instruction will require increased financial resources – again, this is generally acknowledged in the report but no specific information is provided for how many additional dollars will be needed. Based on information currently available on the Smarter Balanced tests that the SPI anticipates will serve as the core for a future statewide assessment system, the costs will increase anywhere from 50 percent to 300 percent.
• The prospects for California being able to implement a computer-adaptive testing system by 2014-15 are not addressed at all in the report. California will have to upgrade its technology hardware and bandwidth significantly just to permit schools to administer computer-adaptive tests. In addition, whether students will be ready to take computerized tests by 2014-15 is an unknown that has yet to be explored at all. Frankly, the chances that California will have the hardware capability or the student experience needed to implement a statewide computerized test by 2014-15 are slim to none. [More on this under detailed recommendation (2) below.]

Detailed Recommendations

The report provides 12 detailed recommendations on pp 41-47, with a summary table on p 48. I’ll now provide initial observations on these 12 recommendations.

(1) Suspend Portions of STAR Beginning Spring 2014
This recommendation is to suspend all parts of STAR not used for federal accountability reporting, starting in 2014. No rationale is given for this recommendation other than “suspending assessments . . . . will allow staff and stakeholders to focus attention, efforts, and resources on building a new assessment and accountability system.” The only logical way to interpret this recommendation is that the SPI and CDE staff oppose accountability testing and want to use their time to design a statewide instructional assessment system. The recommendation ignores the benefits of the substantial amount of data generated by the current STAR program, and the achievement data trends it supplies for schools and districts as well as the state. A far more nuanced recommendation would be to suspend the portions of STAR that generate duplicative and/or unneeded information; there are some major portions of our current statewide assessment system (STAR as well as CAHSEE) that can be modified without losing valuable specific information and/or trend information.

(2) Fully Implement Smarter Balanced E/LA and Math Tests in 2014-15
We have no idea yet whether we can implement a computerized test for 3-4 million kids by 2014-15. As indicated above, hardware and bandwidth upgrades of unknown quantities will be needed, and students have not been prepped to generate valid and reliable scores from computerized assessments. The report notes under this recommendation that for students unable to access a computer, a paper-and-pencil version of the SBAC assessment will be provided for up to 3 years. However, there is no discussion of whether the two versions of the SBAC assessments will yield comparable scores that would allow for local school, local district, or statewide aggregation. The technical considerations for generating comparable scores, particularly for variable-form computer-adaptive tests and fixed-form paper-and-pencil tests, are substantial, and indeed the state of the art for providing such comparisons is in its infancy. The “phase-in over 3 years” recommendation is shaky, much more so for computer-adaptive tests than for fixed-form computer-administered tests and counterpart fixed-form paper-and-pencil tests.

(3) Use Grade 11 SBAC E/LA and Mathematics Tests as Indicators of College Readiness
Other than college admission tests such as the College Board SAT and the ACT, the idea of measuring college readiness and/or career readiness across the full spectrum of grade 11 enrollment is a relatively new idea, and frankly assessment system designers do not have much experience with this concept. Initial progress by both consortia (Smarter Balanced and PARCC) now developing “next generation” assessments has been slow. The idea that we will have a fully functional college and/or career readiness assessment instrument by 2014-15 is fanciful thinking.

(4) Develop and Administer Science Assessments Aligned to New Science Standards
No standards yet, no details yet, no timeline yet, other than that these new assessments will be like the E/LA and Math assessments in terms of their instructional qualities. A pretty empty recommendation right now.

(5) Use Multistate Consortia Alternate Assessments to Replace CAPA
To date, virtually no information has been disseminated on CA’s involvement in the NCSC consortium of which CA is a member, nor to my knowledge has the consortium itself disseminated information on its progress.

(6) Determine the Need for Assessments in Languages Other than English
Primary language assessments are needed to evaluate the effectiveness of various bilingual education programs (i.e., to measure the results of instruction for these programs). California has had an ineffective primary language testing program for years, since the current test was not designed to provide valid accountability information.

(7) Assess the Full Curriculum with Model Instructional Tests
As indicated under General Comments above, this recommendation is a fine contribution to a local control comprehensive instructional system, but does not belong in a large scale standardized secure centrally controlled statewide assessment system.

(8) Invest in Interim, Diagnostic, and Formative Testing Tools
Again, as indicated under General Comments above, these tools are fine for a locally controlled comprehensive instructional system, but not for a centrally controlled statewide assessment system. There is a significant danger that including these components in a statewide assessment system will result in their simply being used to game a statewide summative assessment system designed to measure the results of instruction — in effect, state-encouraged institutional tools to “teach to the test” in ways more pervasive than current “teaching to the test” efforts. Teaching-to-the-test efforts degrade both the validity of statewide assessment results and good instructional practices.

(9) Consider Alternatives to CAHSEE
CAHSEE is essentially a redundant test for a substantial number of 10th graders in CA. We can use STAR results from earlier grades as “early qualification” tests for the CAHSEE high school graduation requirement. This efficiency for our statewide assessment system has been ignored for years by Sacramento policymakers, and is a major contributor to overtesting our high school students.

(10) Explore Matriculation Exams
Finally, a recommendation I can agree with . . . . .

(11) Conduct Comparability Studies
These studies are absolutely necessary, but they will impose a substantial time requirement on students who must take both tests. The representative samples needed may be problematic if the phased implementation over three years recommended above takes place and certain subgroups of students (i.e., our underserved subgroups) all take paper-and-pencil tests while other subgroups (non-minorities, high SES) take computer-adaptive tests. Achieving adequate study designs for these studies will not be trivial.

(12) Maintain a Continuous Cycle of Improvement
Always a good recommendation for any “future” report . . . . . fills out a nice round number for recommendations.
