We Measure What Matters, Which Is Why Subgroups in ESSA Accountability Systems Are Important
November 01, 2018 02:04 pm
Recently, Mike Petrilli wrote about the Alliance for Excellent Education’s analysis of state ESSA plans in which we found that twelve states do not ensure subgroups are universally included in school ratings. While acknowledging that this could be an issue, Mike, with an assist from Aaron Churchill, used Ohio data to make the case that we were (mostly) crying wolf: Including subgroups in school ratings doesn’t matter because subgroup performance is almost always reflected in schoolwide averages, at least when using value-added measures. Specifically, Mike and Aaron showed how school-level growth data for “all students” in Ohio tends to be strongly correlated with school-level growth data for “Black” and “low-income” students. Very few schools would have received both an “A” or “B” grade for “all students” growth and a “D” or “F” for the specific subgroup’s growth. Mike concluded that we “should stop fretting about this particular aspect of state accountability systems” and move on.
I agree that there are other aspects of accountability that should be investigated, and All4Ed has plans to do so as ESSA implementation continues. And I agree that I’m an uber-wonk (thanks, I think?). But I disagree that All4Ed and other advocates for historically underserved students should stop worrying about the inclusion, or lack thereof, of subgroups in school ratings.
Statistics aren’t the only consideration when building an accountability system. By Mike’s logic, if school-level math and reading proficiency data tend to be highly related, we could eliminate math achievement from school ratings and rely on reading alone as a good-enough proxy. No one is arguing for that, however, because student mastery of both subjects is important and including both in accountability systems is one of the clearest signals states can send that a high-quality school means teaching students well in both areas. Isn’t it just as true that a school should not be considered a high-quality learning environment if it fails to teach subgroups of students effectively? And what does it say about a state’s values if subgroups are missing from school ratings?
To boot, it turns out “all students” data isn’t always such a great proxy for subgroup data, even when you’re looking at growth (which is less likely to be correlated with student demographics). Here are two examples, also using school-level growth data[i] from Ohio:
Students with disabilities: Nearly half of the schools receiving an “A” for “all students” growth earn a “C” or worse for the growth of students with disabilities—information that would be lost if only the “all students” data were considered. In total, this affects about 450 schools, or 17 percent of all Ohio schools with growth data for both “all students” and students with disabilities.
Low-performing students: Likewise, about one-third of all schools with an “A” for “all students” growth receive a “C” or lower grade for the growth of the lowest-performing students (i.e., students whose prior achievement placed them in the bottom 20 percent statewide). That’s about 330 schools, or 12 percent of all Ohio schools with growth data for both “all students” and the low-performing subgroup.
Maybe that’s why Ohio doesn’t rely on “all students” data alone. The state chose to include growth for these two subgroups—plus “all students” growth and growth for gifted students—in the “Progress” component of its A–F school grades. Further, schools can’t get an “A” on the Progress component if they receive a “C” (or lower) for growth of any of the three subgroups; the schools highlighted above can do no better than a “B.” In addition, Ohio uses disaggregated data in the “Gap Closing” component, which examines achievement and graduation rates for students of color, low-income students, English learners, and students with disabilities, along with English language proficiency. For these reasons, Ohio is not among the states we flagged for concern.
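The capping rule described above is a simple guardrail, and that simplicity is part of its appeal. A minimal sketch of the logic (the function name and grade representation are illustrative, not Ohio’s actual implementation, and it assumes the component grade has already been computed):

```python
# Illustrative sketch of an Ohio-style cap: a school's Progress grade
# cannot be an "A" if any subgroup's growth grade is "C" or lower.
GRADES = ["A", "B", "C", "D", "F"]  # ordered best to worst

def cap_progress_grade(component_grade: str, subgroup_grades: list[str]) -> str:
    """Lower an 'A' component grade to a 'B' when any subgroup earns a 'C' or worse."""
    worst = max(subgroup_grades, key=GRADES.index)  # highest index = worst grade
    if component_grade == "A" and GRADES.index(worst) >= GRADES.index("C"):
        return "B"
    return component_grade
```

Under this rule, a school with “A”-level overall growth but a “C” for students with disabilities tops out at a “B”—exactly the outcome described above.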
Even better, Ohio makes these disaggregated indicators matter. Over one-third of a school’s grade can be based on subgroup performance.[ii] As a result, not a single “A” school overall received a “C” or worse on Gap Closing. In fact, 99 percent of “A” schools in Ohio also received an “A” for Gap Closing. There are also safeguards to ensure that the Gap Closing component grade is lowered in schools where any individual subgroup demonstrates very low achievement or graduation rates.
And that’s how it should be. Building an accountability system isn’t merely a statistical exercise states go through to meet the technical requirements of federal law. It’s one of the most effective tools states have to signal what they value. For example, states added new measures under ESSA—like college and career readiness, chronic absenteeism, and suspension rates—to better align accountability systems to critical policy goals. Is improved performance for subgroups of students not a critical policy goal?
State leaders and organizations representing them have spent the last two years touting how states will use ESSA to advance equity for historically underserved students. But the signal states send when their rating systems only examine “all students” data is precisely the opposite.
Fortunately, the mismatch between states’ values and accountability policies has an easy solution. Yes, states could legislate or regulate a new system that includes a significant weight for subgroups on key indicators—like the one used in Ohio. Given that ESSA plans were just approved, however, that’s unlikely to be an appealing option. Instead, states could adopt a business rule to ensure ratings reflect subgroup performance. Seven states in our analysis (Illinois, Kentucky, Louisiana, Maine, Massachusetts, Nevada and Rhode Island) deploy these rules already, and they can work with any kind of rating—whether five-star reviews, A–F grades, or a descriptive label. In some states, a school cannot receive the highest or second-highest rating if it has a consistently underperforming subgroup (e.g., a school cannot simultaneously be identified for targeted support and be among the highest-rated in the state). In others, the rating is lowered one level if the school has a low-performing subgroup (i.e., a “B” school could become a “C”). Either way, these rules build coherence between school ratings and school identification, are easy to implement and understand, and reinforce the value that a high-quality school must serve subgroups of students well.
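Part of why these business rules are easy to implement is that they reduce to one conditional. A minimal sketch of the “lower the rating one level” variant (names and rating scale are illustrative, not any particular state’s system):

```python
# Illustrative business rule: lower a school's overall rating one level
# if it has a low-performing subgroup. Rating scale is hypothetical.
RATINGS = ["F", "D", "C", "B", "A"]  # ordered worst to best

def apply_subgroup_rule(overall: str, has_low_performing_subgroup: bool) -> str:
    """Drop the rating one level (never below the floor) for a low-performing subgroup."""
    if has_low_performing_subgroup:
        return RATINGS[max(0, RATINGS.index(overall) - 1)]
    return overall
```

The same structure works for the other variant—blocking the top two ratings outright—by returning a capped rating instead of a decremented one.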
Mike is right on another point: We don’t yet know how many schools will receive high overall ratings under ESSA despite having a low-performing subgroup. But we do know that states can take simple steps to make sure the number is zero and reinforce their commitment to equity at the same time. Until they do, I reserve the right to keep worrying.
[i] The Progress component of Ohio’s A–F grading system is based on a weighted average of four sub-measures: growth in math and reading for all students (55%), students with disabilities (15%), the lowest-performing students (15%), and gifted students (15%). The data used here are from the 2017-2018 School Report Cards Data Spreadsheets, which include a school’s overall grade, its Gap Closing component grade, its Progress component grade, and grades for each of the four groups of students examined within the Progress component.
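For readers who want to see the footnote’s weights in action, the Progress component is a straightforward weighted average. A small illustration (the 0–100 score scale here is hypothetical; Ohio’s actual index differs):

```python
# Illustrative weighted average using the footnote's Progress weights.
# The 0-100 scores below are made up for demonstration only.
weights = {"all_students": 0.55, "disabilities": 0.15,
           "lowest_performing": 0.15, "gifted": 0.15}
scores = {"all_students": 80, "disabilities": 60,
          "lowest_performing": 70, "gifted": 75}
progress = sum(weights[g] * scores[g] for g in weights)  # 74.75
```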
[ii] Ohio awards school grades to both districts and schools based on six components. Because not every component is applicable to each school, the precise percentage of the overall grade based on subgroup data varies based on the grade levels the school serves.
This post originally appeared in Flypaper.
Anne Hyslop is assistant director, policy development & government relations at All4Ed.
Featured Image by Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action