Because standardized test scores have been widely criticized as a measure of student ability, economists have tried to improve educational accountability through more advanced statistical techniques. Rather than holding teachers and schools accountable for test score levels alone, researchers have developed measures that account for student growth. The most innovative system devised by scholars today is known as value-added measurement. Although value-added is a substantial improvement over previous accountability measures, there are still many problems that the system cannot avoid.
What is Value-Added?
Value-added measurement usually relies on student fixed effects regression to determine the impacts of teachers and schools on student test scores. Since many unobservable student characteristics are relatively constant over time, this technique can control for several factors that could otherwise produce biased estimates of the impacts of schooling. The strongest types of value-added systems have information on individual students over time and a baseline measure of achievement. As a result, the model can estimate how a student is predicted to perform and compare that prediction to how the student actually performs. If the student moves into a new school year and performs better than predicted, scholars interpret the difference as the effect of the teacher or school, depending on the model.
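The predicted-versus-actual comparison can be illustrated with a minimal sketch. It deliberately swaps in a much simpler technique than the student fixed effects regression described above: it predicts this year's score from last year's score with a one-variable least squares line, then averages each teacher's residuals. The function name `value_added`, the teacher labels, and all scores are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

def value_added(records):
    """records: list of (teacher, baseline_score, current_score).

    Returns {teacher: mean residual}, where a residual is the actual
    current score minus the score predicted from the baseline alone.
    """
    xs = [r[1] for r in records]
    ys = [r[2] for r in records]
    x_bar, y_bar = mean(xs), mean(ys)

    # Ordinary least squares slope and intercept for current ~ baseline.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = defaultdict(list)
    for teacher, base, current in records:
        predicted = intercept + slope * base
        residuals[teacher].append(current - predicted)
    return {t: mean(v) for t, v in residuals.items()}

# Hypothetical data: (teacher, last year's score, this year's score).
records = [
    ("A", 40, 52), ("A", 60, 72), ("A", 80, 92),
    ("B", 40, 48), ("B", 60, 68), ("B", 80, 88),
]
estimates = value_added(records)
print(estimates)
```

In this toy data, teacher A's students land two points above their predicted scores and teacher B's land two points below, so the sketch credits A with a positive estimate and B with a negative one. Whether that difference really reflects the teachers depends on the assumptions discussed in the rest of this piece.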
Random-Sorting & Predicting Scores
First, this framework assumes that students switch from one teacher to another exogenously, that is, for reasons unrelated to how they would otherwise be expected to perform. In other words, the model assumes that principals are not doing their jobs and are randomly assigning students to teachers. We would hope that this is not the case, especially if we believe the principal should assign students to teachers based on their abilities, needs, interests, and learning styles.
In addition, these models use students' observable characteristics (and baseline test scores) to predict how they ought to do the following year. This is problematic because anything the model cannot observe is left out of the prediction. The model predicts a value based on things such as skin color, income level, and a crude measure of innate ability. Yet two minority children from households with similar incomes and similar test scores in the previous year will still differ in ways that affect their achievement in subsequent years.
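A small simulation can make the sorting problem concrete. It is only a sketch under stated assumptions: two hypothetical teachers, A and B, have exactly the same true effect, and an unobserved trait (called `motivation` here) drives part of each student's gain. All coefficients and the sample size are invented for illustration.

```python
import random
from statistics import mean

def estimated_gap(students, assign):
    """Mean gain under teacher A minus mean gain under teacher B."""
    gains = {"A": [], "B": []}
    for motivation, gain in students:
        gains[assign(motivation)].append(gain)
    return mean(gains["A"]) - mean(gains["B"])

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Each student: (unobserved motivation, test score gain).
# Both teachers have the same true effect, so no teacher term enters the gain.
students = []
for _ in range(2000):
    motivation = rng.gauss(0, 1)
    gain = 5 + 3 * motivation + rng.gauss(0, 1)
    students.append((motivation, gain))

# Case 1: coin-flip assignment, unrelated to motivation.
random_gap = estimated_gap(students, lambda m: "A" if rng.random() < 0.5 else "B")

# Case 2: the principal places the more motivated students with teacher A.
sorted_gap = estimated_gap(students, lambda m: "A" if m > 0 else "B")

print(f"gap under random assignment: {random_gap:.2f}")
print(f"gap under sorted assignment: {sorted_gap:.2f}")
```

Under coin-flip assignment the estimated gap between the two teachers should sit close to zero, which matches the truth. Under sorted assignment teacher A appears far more effective, even though the only difference is which students were placed in the classroom.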
Narrowly Focused Tests Needed
Jacob and Rothstein (2016) discuss issues with other assumptions that researchers are quick to make about these measures, and the potentially damaging consequences of those assumptions. For instance, if we want to use value-added to hold individual teachers accountable, the assessments must have an extremely narrow focus. If we really want to attribute the growth of student achievement to individual subject-area teachers, we must ensure that the assessment does not capture information or skills that come from more than one classroom. Additionally, to improve the measure, we would need the assessments to capture abilities that are malleable solely within individual classrooms. In attempting to hold teachers more accountable by improving the value-added measure in this way, we are likely to do much harm to students.
The Scaling Assumption
Perhaps even more importantly, even the best value-added models today assume that test scores are interval-scaled (Ballou, 2009). In other words, a test score gain from zero to 50 and a gain from 50 to 100 are assumed to reflect equal changes in cognitive skills. This is a heroic assumption; learning to recite the alphabet is not the same jump in cognitive ability as going from reciting the alphabet to reading words. Since the underlying attribute (aptitude or ability) that we care about is not directly measured, our crude measures do not justify the assumption that every point awarded on a given assessment represents the same amount of it.
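To see why this matters, consider a minimal sketch in which the same students are scored on two scales that rank every student identically. The scores, the two teachers, and the `convex` rescaling are all hypothetical; the rescaling is simply one order-preserving transformation among many that stretches the top of the range.

```python
def mean_gain(pairs, rescale=lambda s: s):
    """Average of rescale(after) - rescale(before) over (before, after) pairs."""
    return sum(rescale(after) - rescale(before) for before, after in pairs) / len(pairs)

# Hypothetical (before, after) scores on a 0-100 test.
teacher_a = [(20, 40), (25, 45)]   # low-scoring students
teacher_b = [(70, 85), (75, 90)]   # high-scoring students

def convex(score):
    """An order-preserving rescaling that stretches the top of the range."""
    return score ** 2 / 100

raw_a, raw_b = mean_gain(teacher_a), mean_gain(teacher_b)
new_a, new_b = mean_gain(teacher_a, convex), mean_gain(teacher_b, convex)

print(f"raw scale:      A gains {raw_a}, B gains {raw_b}")
print(f"rescaled scale: A gains {new_a}, B gains {new_b}")
```

On the raw scale teacher A's students gain more than teacher B's; on the rescaled version the ranking of the two teachers flips. If nothing guarantees that equal score gains mean equal gains in skill, there is no principled reason to prefer one scale over the other, and therefore no principled ranking of the two teachers.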
However, test scores that are not interval-scaled could be used to predict other measures that are, such as graduation and earnings. These long-term outcomes, though, would take an enormous amount of time and effort to collect and analyze. Further, even if these measures were costless to collect and analyze, such a model would require test scores to actually predict graduation and earnings. As we know, there is a huge disconnect in the literature between test score outcomes and long-term outcomes. The voucher programs in Milwaukee and DC, for example, produced little or no test score gain, but large increases in high school graduation (Cowen et al., 2013; Wolf et al., 2013). Conversely, studies have shown that some charter schools had a large test score impact with no change in high school graduation (Angrist et al., 2014; Dobbie & Fryer, 2014; Tuttle et al., 2015; Unterman et al., 2016). In addition, relying on test scores alone could cause us to overestimate achievement gaps since, on average, black students are more likely to graduate than white students with the same standardized test scores.
Important Unmeasurable Skills
And what about non-cognitive skills? Even if we could, theoretically, perfectly measure the impacts that teachers and schools have on student cognitive skills, we would still miss the other half of the equation. Focusing on cognitive, academic skills is probably beneficial for some students, but it can be harmful to many others. If schools are meant to alter the life-trajectory of children, they must shape citizenship skills, determination, and conscientiousness. Those skills cannot be measured accurately at this time, and probably will not be measured accurately in the future.
Even the most refined measure that we have, value-added modeling, relies on the heroic assumptions of exogenous sorting and interval-scaled assessments. Even if the model somehow did not rely on these assumptions, the accountability system would require someone to decide what ought to be assessed, how much students ought to grow, and what the goals of education ought to be for all members of society.
Uniform decisions such as these cannot, and will not, work for children with unique interests, abilities, and learning styles. The only way to account for diverse children is to make schools accountable to the desires of families. The only way to do that is to allow individual families to choose the type of education that is best for their children.
Image Source: http://creativeelectron.com/wp-content/uploads/2015/03/Statistical-Analysis-X-Ray-sm.jpg