Making the Grade: Debating School Performance Ratings
Phillip Lovell, Vice President of Policy Development and Government Relations, Alliance for Excellent Education
Michael J. Petrilli, President, Thomas B. Fordham Institute
On January 11, 2018, the Alliance for Excellent Education held a webinar on school performance ratings. When it comes to school accountability, education advocates tend to split into two camps. One camp believes that, just as students receive grades on their performance, schools should receive a single summative rating, such as an A–F letter grade or a score of 0–100 on a performance index. Doing so, they believe, provides parents and the public with clear, transparent, and actionable information on a school’s performance and puts necessary pressure on the education system to drive improvement in its most underserved schools.
The other camp believes that school performance is too complex to be boiled down to a single figure, and worries that these summative ratings provide insufficient information to parents and stakeholders to inform continuous improvement in low-performing schools.
The Thomas B. Fordham Institute recently released Rating the Ratings: An Analysis of the 51 ESSA Accountability Plans. The report examines rating systems submitted by all fifty states and the District of Columbia, grading them on how clear their rating systems are, whether they incent a focus on all kids, and whether they weight growth along with performance for schools.
This webinar presented findings from Rating the Ratings, followed by a conversation on the merits and downfalls of summative ratings between ratings-proponent Michael Petrilli, president of the Thomas B. Fordham Institute, and ratings-skeptic Phillip Lovell, vice president of policy development and government relations at the Alliance for Excellent Education.
This was not your typical panel where each participant agrees with the next; watch the webinar for a lively discussion.
Please direct questions concerning the webinar to email@example.com. If you are unable to watch the webinar live, an archived version will be available at https://www.all4ed.org/webinars approximately 1–2 business days after the event airs.
The Alliance for Excellent Education is a Washington, DC–based national policy, practice, and advocacy organization dedicated to ensuring that all students, particularly those who are historically underserved, graduate from high school ready for success in college, work, and citizenship. www.all4ed.org
Phillip Lovell: Good afternoon and welcome to today’s webinar, Making the Grade: Debating School Performance Ratings. My name is Phillip Lovell and I’m the vice president of policy and government relations here at the Alliance for Excellent Education. I’m pleased to be joined today by Mike Petrilli, president of the Fordham Institute.
We’ve got a hot topic for today. Policy wonks care about the nuts and bolts of school ratings, everything from whether we should have school ratings in the first place to what the individual indicators should be and how much they should weigh. But this is a topic that isn’t only important to policy people like myself and Mike Petrilli. School ratings are important to parents, to businesses, to real estate agents, as everyone wants to live in a good neighborhood with a good school.
So, what is a good school? And how do you measure it? Everyone acknowledges that education is complex. Some people, like Mike Petrilli, believe it’s important to boil that complexity down to a simple, understandable rating, like an A to F letter grade. Others, like myself, are a little bit more skeptical.
Under the Every Student Succeeds Act, or ESSA, states have flexibility to assign an A to F grade or some other rating or not. They also have flexibility when it comes to what goes into those ratings. But there are some federally set ground rules. For example, the law says that state accountability systems have to include at least four indicators: academic achievement, as measured by proficiency in math and reading; progress in English language proficiency for English learners; a state-selected measure of school quality or student success; and then, for elementary and middle schools, growth or another academic indicator, and for high schools, graduation rates. The law also says that some of these indicators have to weigh more than others and that the performance of historically underserved students must be measured as part of each indicator. There are other requirements, but by and large, these are the high points.
Now, it’s important to note that ESSA doesn’t require states to assign an A to F or a similar rating. Under the law, states must have an accountability system that identifies schools for support and improvement. But the decision to use a rating like A to F is up to them. The policies that will determine school ratings are being developed right now. Under ESSA, each state must develop a plan to implement the new law and have that plan approved by the secretary of education.
These plans include the state’s rating system. Seventeen states and the District of Columbia have been approved. Just two were approved yesterday, and the remaining states are currently pending approval at the U.S. Department of Education. A number of organizations, including us here at All4Ed, have been analyzing the ESSA state plans with an eye toward not just compliance with the statute, but also their focus on equity. You can find our ESSA equity dashboards at the link below.
The Fordham Institute has also been looking at ESSA plans. They’ve analyzed each state’s proposed school rating system. In November, they issued a report, Rating the Ratings: An Analysis of the 51 ESSA Accountability Plans. Mike is going to discuss that report, as well as the pros and cons of rating systems. But before we get to that, we’re going to have just a few housekeeping items.
First, please follow us today on Twitter using #ESSA. To ask a question of either myself or Mike, you can use the box below. The webinar will be archived for later viewing at www.all4ed.org/webinars.
Now, let me formally introduce our special guest, Mike Petrilli. In addition to serving as president of the Fordham Institute, Mike is a research fellow at Stanford University’s Hoover Institution. He’s executive editor of Education Next and a distinguished senior fellow for the Education Commission of the States. He’s an award-winning writer. He’s author of The Diverse Schools Dilemma and editor of Education for Upward Mobility. And Mike helped to create the U.S. Department of Education’s Office of Innovation and Improvement. Pleased to have you on here with us today.
Michael J. Petrilli: Great. Thanks for having me, Phillip. Good to be here.
Phillip Lovell: Excellent. So, let’s get started by having you describe why an A to F system is important, describe your report, and share some of the findings from your study.
Michael J. Petrilli: All right. That sounds good. So, I’m going to go ahead and click through a PowerPoint here to hit the highlights of what we looked at and what we found when we looked at the 51 ESSA plans. So, let’s go ahead. So, again, Rating the Ratings.
What we wanted to do was focus in on just a handful of things, Phillip. There are lots of other groups out there that have been evaluating state plans. Probably the best known one is the one from Bellwether Education Partners and the Collaborative for Student Success. They had a very comprehensive review, looked at lots of pieces of what the states said.
We homed in just on the accountability systems and really on the ratings, the report cards that states have to put out where they tell the public and tell parents how each school in the state is performing. In our view – and this is definitely based on our own normative values – we think that those ratings should have really three objectives. The first is to assign ratings that are clear and intuitive. I mean, this seems pretty straightforward. If you’re going to go through this process and you’re going to try to provide transparency to the public and also put some pressure on schools to focus on improvement and do the tough things it takes to improve, then the ratings need to be clear.
Back in the No Child Left Behind years, that was not always the case. Often, not the case.
Phillip Lovell: What does AYP mean?
Michael J. Petrilli: Yeah, these ratings like AYP and continuous improvement and – you know those – there were states where you’d look at their ratings and it’d say something like, “Level seven five.” These things didn’t make sense to anybody. So, we prefer to see A to F because that’s very clear to people in education. Doesn’t have to be that. Could be five stars, 1 to 100, something that is clear.
The second thing is to send a signal to schools that in order to be considered a good school, you have to help all kids make progress, not just kids who are low-performing or not just kids that are right near the proficiency line on state tests. And again, this comes from research from the No Child Left Behind years, where it was clear that in some places at least, there was a whole lot of focus on the quote “bubble kids,” the kids right near the line, right above and right below that proficiency line, and not a lot of attention on everybody else. That is, of course, not fair and a terrible way to drive behavior at the school level. We want to make sure that every kid is paid attention to.
Then, third, to make sure that we are fairly measuring all schools, including high poverty ones. So, again, back to No Child Left Behind. We had this problem where especially towards the end, virtually every high poverty school in the country was labeled a failure. And that’s because we were judging them based mostly on proficiency rates and graduation rates. And those kinds of rates are very strongly correlated with family background and with students’ previous achievement.
So, if you’re a high school and you have a lot of kids – most kids coming in to you two to three levels behind and you’re going to be measured by whether or not kids get to the proficiency bar, well, guess what? You’re going to be declared a failure. So what we argue here in this report is to focus instead much more on what schools control. That is, how much progress kids make from one year to the next. Now, look, you’ll notice that all of this discussion is still largely about student outcomes and student achievement.
That is something that at Fordham, we still think is incredibly important. That’s not to say that you can’t look at some other things as well. Of course, the law opens the door to looking at other things like social and emotional learning or feedback from parents or teachers or even students. Fine with all of that. Now, I think most people who have reviewed these plans have found that so far, there’s not a lot of that –
Phillip Lovell: There’s actually not very much in there. For all the flexibility, there’s –
Michael J. Petrilli: No. And the reason is, there’s not many measures out there that are ready for prime time. The reason we use standardized testing is because it is valid and reliable. We all know there’re downsides, but it is hard to find other alternatives. But we’re open to them. But we do believe that schools should be evaluated primarily on whether kids are seeing success because of that school. You measure that by looking at where kids are when they come in and where kids are when they leave or from year to year.
So, that’s what we were looking for when we dug into the plans. Again, clear, intuitive labels for each category. We gave a strong, medium, or weak. Again, strong, A to F, five stars, something like that. Medium would be text labels, as long as they’re somewhat easy to understand. They weren’t gibberish. Then, the weak ones were the ones where it was really just these data dashboards where you’ve got tons of information as required by the law, broken down. But, no kind of summative assessment.
Now, we’re going to talk a lot about this, Phillip. Note that states didn’t have to have just a single rating. Okay? If they wanted to have five grades, for example, that would be fine. They’d still get a strong here. But if they had nothing but just a slew of data that they were asking parents, taxpayers –
Phillip Lovell: No context around –
Michael J. Petrilli: No context. That didn’t count. And if all they did was come up with a way of determining which schools were going to be subject to interventions, which is required by the law, which is really the lowest five to ten percent of schools, plus schools that maybe aren’t doing well by subgroups. If that’s all they did, they got a weak. Because what that meant is that 90 percent or let’s say 80 percent of schools out there are not going to get any kind of rating. Nobody’s going to really know how they’re doing.
So how did states do on this front? Actually, quite well. This is where we are quite optimistic about the state plans under ESSA. Maybe one of the only organizations out there who actually think states did a pretty good job. At least on this. 69 percent got strongs. Arizona is one of the A to F states, for example, so there’s a state that did quite well. We found that basically 40 of the states decided to have some kind of summative rating for their schools. 40. Keep in mind, Phillip, that that number could have been zero.
Phillip Lovell: They didn’t have to under the law.
Michael J. Petrilli: They did not have to. They could have all said, “We’re just getting out of the rating business.” In my view, in our view, that would have been a big step back for accountability. But, so we think it’s pretty good news that states, given this flexibility, decided to stick with it.
The focus on all students, here, we wanted to see various ways that states could indicate that all kids matter. That is either by using a performance index or scale scores. Not to get too wonky, but basically to say, “Hey, if you’re going to look at how many kids, what percentage of kids get to proficiency, make sure you look at other performance levels, too. Like, how many kids get to basic, how many kids get to advanced. Then, turn that into an index. Or just look at scale scores, which are basically just the average score that the kids get all mushed together. Then, look at growth as well.” We’re open to different kinds of growth models, but growth over time.
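To make the distinction concrete, here is a minimal Python sketch of a proficiency rate versus a performance index. The point values per performance level, the level names, and the student counts are hypothetical and chosen only for illustration; actual state indexes use their own levels and weights.

```python
# Hypothetical point values per performance level (an assumption for
# illustration; real state performance indexes define their own).
LEVEL_POINTS = {
    "below_basic": 0.0,
    "basic": 0.5,
    "proficient": 1.0,
    "advanced": 1.25,
}


def proficiency_rate(counts):
    """Share of students scoring proficient or advanced."""
    total = sum(counts.values())
    return (counts["proficient"] + counts["advanced"]) / total


def performance_index(counts):
    """Average points per student, counting every performance level."""
    total = sum(counts.values())
    points = sum(LEVEL_POINTS[level] * n for level, n in counts.items())
    return points / total


# Made-up school: ten students move from below basic to basic and
# ten move from proficient to advanced between the two years.
before = {"below_basic": 40, "basic": 20, "proficient": 30, "advanced": 10}
after = {"below_basic": 30, "basic": 30, "proficient": 20, "advanced": 20}

for label, counts in (("before", before), ("after", after)):
    print(label, proficiency_rate(counts), performance_index(counts))
```

In this made-up case the proficiency rate is the same in both years, because no student crossed the proficiency line, while the index rises because it credits movement at every level.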
So, if at least 50 percent of the rating came from that, we viewed that as strong. At least 33 percent as medium. And below that was weak. Here again, the states did pretty well. Not quite as well. But 45 percent strong, which is certainly a majority. I would say about two-thirds of the states we think are doing either okay or quite well on this front.
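The cutoffs mentioned here (at least 50 percent strong, at least 33 percent medium, weak below that) can be written as a small rule. This is a sketch of the thresholds as we read them in the discussion, applied to the share of a state's rating that comes from a performance index or scale scores plus growth; the function name and the sample percentages are hypothetical, not taken from the report.

```python
def rate_focus_on_all_students(share_pct):
    """Classify a plan by the percentage of its school rating that comes
    from a performance index or scale scores, plus growth.

    Thresholds follow the cutoffs stated in the discussion:
    at least 50 percent is strong, at least 33 percent is medium,
    and anything lower is weak.
    """
    if share_pct >= 50:
        return "strong"
    if share_pct >= 33:
        return "medium"
    return "weak"


# Made-up shares, for illustration only.
for share in (95, 50, 40, 20):
    print(share, rate_focus_on_all_students(share))
```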
So, that’s again, good news. We don’t think we’re going to see the same problem we saw under No Child Left Behind, with the focus on just the bubble kids or just the low performers. We’re hopeful. We at Fordham are very concerned about high-achieving, low-income kids. We think they’re going to get a lot more attention under this law. Colorado, we think, is a real exemplar here with 95 percent of their rating focused on scale scores and growth. Again, sending that message that everybody’s got to make improvement. And then fair to high poverty schools, a little different here, again, focusing on growth as the best way to –
Phillip Lovell: Is growth really the theme of the –
Michael J. Petrilli: Growth is the theme.
Phillip Lovell: Of the Fordham analysis.
Michael J. Petrilli: Now here, we are open to different kinds of growth. Some states, for example, have growth to proficiency. Are low performing kids making progress to the proficiency bar? Or growth for certain subgroups, we’re fine with that. We just want to see growth.
This is where the states, in our view, actually do the worst. That there’s still too many states out there, if you look at this, who don’t put enough focus on growth. We worry that that means in lots of states at least, it’s still going to be very hard for high-poverty schools to get good ratings, just because, again, the kids are coming in so far behind and they’re being asked to get them to this very high standard. There’ll be some schools that are amazing schools that are able to do that. But we worry that there’s going to be plenty of high-poverty schools given Cs and Ds or the equivalent that are actually doing a pretty good job. And it’s not going to show up in the state ratings.
So, that is what we looked at, Phillip. Let me just say one thing about why we think these ratings are important. There’s folks out there – California has probably been the most vocal, officials from California – making the argument that we should not do this school rating thing anymore. What we should do is provide lots of data, lots of information, so that when school leaders sit down with their teams and they’re working on school improvement plans, they have a lot of information that they can draw from and they can use that for improvement.
I would say that that’s fine. We have no problem with dashboards. But it should be both/and. School improvement teams should have access to lots of data. But we do think it’s important for the parents and the public to have access to some kind of summative judgement about how the school’s doing. Again, doesn’t have to be a single grade, doesn’t have to be all boiled down to one A or one F.
It could be maybe a grade for growth and another grade for achievement and maybe one for social and emotional learning. You could have a handful of grades like you see on a report card. But, some kind of summative judgement so that we’re telling parents the truth. If their kid is in a school that is being judged by a good accountability system and yet, is still not making any progress – okay, let’s say it’s an F on achievement, but it’s also an F on growth. That school is brain dead. The parents deserve to know. If all they get is a web site with all these databases and a blizzard of numbers, they may not get that message.
Now, it is – one other thing that they might do, that those parents might end up doing is going to other places that are willing to give them a rating. Like greatschools.net. So I’d also say to states, “Look, if you’re not willing to come out and tell a bottom line, someone else is.”
Phillip Lovell: Someone else is.
Michael J. Petrilli: And they’re going to do it probably using data that are much less nuanced. They’re probably just going to look at proficiency rates or they’re just going to look at test scores. So, you’re – if you want to compete with GreatSchools as a rating that people pay attention to, then you need to make it user friendly.
Phillip Lovell: Very helpful. Thanks very much for the presentation. I agree with a decent amount of what you’re saying. Clarity is important. Parents, the public, students, the business community, what have you –
Michael J. Petrilli: You should stop right there, Phillip. You should stop right there because that’s it. You agree.
Phillip Lovell: All of those things are true. What I get concerned about is the difference between clarity and accuracy. So, when I go to the doctor, for example, I go there. They take my height. Hasn’t changed. They take my weight. Unfortunately, that’s gone up a bit. They take my blood pressure, couple other tests, and I get told how I’m doing on each individual thing.
They don’t wrap it up into a summative score because each individual thing is different. It seems to me that with education, it’s fairly similar. If you get – if a school gets an A, the message by and large, school’s good. If you get an F or a D, by and large, school’s not so good. If you get a B or a C especially, what does that really tell you, knowing that these different things are – they’re just that. They’re very different. So what does the letter grade actually tell you?
Michael J. Petrilli: So let me go back with another analogy from the doctor’s. When I go, I also often get a blood test. When I get that test back, they’re looking at 30 different things and it’s all in small type. I’m trying to figure out – they’ll tell you something that I don’t even know what the heck this stuff is, and what the range is and what your number means, and are you in range or not. Well, that’s all I got.
It turned out that there was something in that test showing that there was a problem. I might miss it. My eyes are glazing over. I don’t know what it means. It’s important for the doctor to also say, “Hey, I have analyzed your blood test and here’s the bottom line. You need to worry about x, y, or z. Here’s what we’re going to do about it.”
Phillip Lovell: True.
Michael J. Petrilli: I’m asking for both here. Where I agree with you, Phillip, is that there is a danger that if it’s a single, summative grade and that’s it, that invariably, when you mush all these different things together, then you’re going to lose some context and some accuracy. That could be a problem. Probably, just because of the way the math works, you’re going to end up with a whole lot of schools in the middle. It’s going to be the case that a lot of schools are strong on one thing and weak on another, so everybody gets a C. That may not be very useful.
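The "everybody gets a C" effect can be illustrated with a short Python sketch. It assumes an equal-weight average of two component grades on a 4-point scale, which is a simplification chosen for illustration and not any state's actual formula; the school names and grades are made up.

```python
import math

# Conventional 4-point scale; the equal weighting below is an assumption.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
POINTS_TO_GRADE = {points: grade for grade, points in GRADE_POINTS.items()}


def summative_grade(component_grades):
    """Equal-weight average of component grades, mapped back to a letter.

    Halves round up (for example, an average of 2.5 becomes a B).
    """
    mean = sum(GRADE_POINTS[g] for g in component_grades.values()) / len(component_grades)
    return POINTS_TO_GRADE[math.floor(mean + 0.5)]


# Three made-up schools with very different profiles.
schools = {
    "School X": {"achievement": "F", "growth": "A"},
    "School Y": {"achievement": "A", "growth": "F"},
    "School Z": {"achievement": "C", "growth": "C"},
}

for name, grades in schools.items():
    print(name, grades, summative_grade(grades))
```

All three schools come out with the same single grade under this averaging, even though their component grades tell very different stories, which is why reporting the component grades alongside any summative grade preserves the detail.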
Phillip Lovell: You’re making my point.
Michael J. Petrilli: Yes. I hear that. I think where we have agreement is that maybe there is a happy medium, which is to say, there should be grades, they should be clear. But they don’t have to be single. Maybe a handful. In Ohio, where we do a lot of on-the-ground work, we like their report card on the whole. Like a lot of the other ratings out there, they do A to F.
But it’s actually so many grades. I think they’re up to something like 15 grades that get A to Fs. It’s now too much. It’s overwhelming. We have made a proposal. We tried to shrink it back down to, I think, six. Manageable.
And can set some clear – and as a charter school authorizer, we think about those grades in Ohio. We think about the two grades that are the most important: the achievement grade. Now, that one they use a performance index, but still, it is largely correlated with demographics and prior achievement. So you tend to see high-poverty schools get low grades on that.
Phillip Lovell: And growth?
Michael J. Petrilli: And growth.
Phillip Lovell: It’s shocking. I’m so surprised.
Michael J. Petrilli: And what you’ll see, Phillip, is you’ll see schools like the fantastic KIPP school in Columbus will get like a D on achievement and an A on growth. That A is much more representative. But, if there’s a school that gets an F on achievement and an F on growth and that happens two or three years in a row, that tells you that school is brain dead. There’s a big problem there. So if you don’t have –
Phillip Lovell: Brain dead, yes.
Michael J. Petrilli: In California, how would you ever know? I don’t know. Are people going to count out the number of reds? I mean, they have to look at this incredible blizzard of information and nobody in an official position is willing to say, “Here’s the bottom line. Your school stinks.”
Phillip Lovell: I do think it is important, though, to differentiate between where a school may be low performing and not. Math is different from reading is different from graduation rates is different from growth. So, I’m glad that you have agreed with the notion that it’s not just the single summative score, that it’s the –
Michael J. Petrilli: I concede on that point.
Phillip Lovell: That’s great.
Michael J. Petrilli: It might be worth though, Phillip, just spending a few minutes here talking about why it’s important to have these external ratings in the first place. Because I could understand, if you’re an educator, let’s say you’re a principal. You might say, “Why are you trying to rate me?” Or, “Why are you trying to put all this pressure on me?” And I can get that.
What you want, you want data that you can use to build an improvement plan. You’d like to see how you’re doing compared to similar schools or if there are some lessons to be learned by other – maybe there’s a school near you that’s really crushing it on one of these indicators and you want to know that. But, say, “Just give me that data. You don’t have to hit me over the head with a grade.” My response would be that, look, in any part of our society, certainly in the public sector, we have this challenge of how to hold big institutions accountable.
In the private sector we have the same problem. You go in, there’s big bureaucratic companies that provide poor service or where the quality is uneven. It’s not like it’s perfect there. But at least in the private sector, there is this thing called competition. That, you hope, that if it’s working, if the market’s working, whatever the field, whether it’s restaurants or insurance or whatever, people – these companies have an incentive to keep their customers happy, to earn more profits.
That forces them to try to keep focusing on getting better and to make difficult decisions. If there’s an employee that is not doing what they need to do or is in the wrong role, to address it rather than just look the other way and work around it. If they have a vendor that’s not doing what they need to do, to actually address it. To make the tough decisions. And my theory, Phillip, is that basically every organization in the history of the world, of any size, over time becomes complacent and becomes more bureaucratic and just the performance lags unless there’s some countervailing force.
Again, the private sector, competition. Doesn’t always work. We all go into bad restaurants. But it’s a reasonable force. How are you going to do that in the public sector? It’s a real challenge.
How do you make DMVs better? How do you make police departments better? How do you make public universities and hospitals better? It’s not an easy thing. But the hope is by putting some pressure on these institutions, and this accountability for results is one way to do it.
To say, we’re going to look at all the data. We’re going to try to do it in a fair way, as you described in our report. And we’re going to come out with a judgement about how you’re doing. That this is going to put some pressure on you. It’s also going to give the superintendent or the principal some leverage to say, “Hey, if we don’t want to get a bad grade on this report, which is going to be a negative for us in a variety of ways, we need to make some changes.
“We need to fire the textbook vendor that we’ve been using and go in a different direction. We need to address low teacher performance that’s not working. We need to stop investing in this technology, the thing that’s not working, and invest in this instead.” That’s the hope.
Phillip Lovell: So essentially, you’re seeing the letter grades as giving somewhat of both the extra external push to make change and sort of the sense of urgency around –
Michael J. Petrilli: Exactly. And there’s evidence from the No Child Left Behind days that this pressure, even though the law had all kinds of problems and the design looks archaic now by our standards, it did work. In the late ’90s and early 2000s when states adopted consequential accountability systems, then No Child Left Behind came along, those early adopter states, they saw improvements in performance. The other states saw improvements once No Child Left Behind happened, especially for the lowest performing kids who were the focus of those reforms.
So, some of it is that it can work. Now look, if we had every single school in our country, 100,000 schools or whatever we’ve got, each one of those had a phenomenal principal who woke up every day asking, “How can we improve the school? Let’s look at our data. Let’s galvanize change.” We wouldn’t need any of this.
But, that’s not the world we live in. I’d say, that’s not the world in any field. You just don’t have that kind of amazing leadership in every line position that you would necessarily need. You need something else. The hope is that the push gets us there.
Look, for those schools that have great leaders, it is overkill. They don’t need it. They’re going to get better anyway. There’s way too few of those schools.
Phillip Lovell: You’re definitely right about the need for capacity to be enhanced at all levels there. But, to me, the value of the dashboard type approach is that whether you’re doing well or whether you’re doing poorly, it’s important to know. Whether it’s grad rates, reading proficiency, growth, it’s the –
Michael J. Petrilli: But Phillip, you’re going to have – every state’s going to have a dashboard. Right? They have to, under the law, publish data on every single one of their indicators. So, they’ll have it.
Phillip Lovell: The downfall, I think, or part of the downfall of the summative score is the way in which high performance on one indicator, as you were saying, can mask low performance on another. We see this all the time, say, with graduation rates and achievement. Where you’ve got – we’ve looked at state data. In one state, for example, there are 72 high schools where looking at African American students, the graduation rate for African American students was in the 80s. Pretty high. Not perfect, but pretty good.
Their math proficiency was in the 50s or below. That’s not so great. You put these things together and you add together a bunch of other indicators. Again, what do you come up with? What would be your counsel to states to ensure that if you’re going to move to a letter grading system, that you don’t have a scenario where high achievement on one thing masks low achievement on something else. Because then you don’t really have the transparency that we agree is really important for these systems to be communicating.
Michael J. Petrilli: So your worry would be even though the grad rate or let’s say the math proficiency rate is reported, it might be buried and could be –
Phillip Lovell: So let’s say you get a B and it’s unclear. So if you get a B, general thinking is, “Okay, school’s decent.” What’s not then – even though there’s data somewhere else that you have to click around for, the focus gets on the B. So then, the pockets of low performance, they get missed.
Michael J. Petrilli: First, I’d say, so, states have to set priorities. So let’s imagine that a state like Ohio decides, “Okay, we’re not going to have just one rating. We’re not to just have a dashboard. We’re going to have six grades that we really want to have attention to.” So you’ve got to think through, which of those do you want to have? Do you want to have graduation rates be in one? Do you want to have achievement be one?
Maybe you want to have it so that you have for a high school, graduation rate. That is an important measure. Achievement, so you know that, well, these kids are graduating, but they’re graduating with pretty low proficiency levels. But then also growth, because growth is something they can actually control. That might be the way to do it.
But go back to your example. I would argue that these high schools, if their proficiency rates are low, but let’s imagine that their growth rates are high and their graduation rates are high, I would argue that they’re some pretty darn good high schools. That, again, if you’re a high school, especially, because you’re at the end of the education system, you are going to have kids coming in way below grade level. And you have no control over that.
Phillip Lovell: But the issue though is that with the single, summative score, whether they’re growing or whether they’re not growing, could easily be missed based on other things in the system, if you’re not looking at – if you don’t have an Ohio-esque approach where you’re looking – where it’s front and center and actually – we’ll actually show a slide from an Ohio school because of the point of agreement. I think that they do a good job of not just focusing on the single, summative score. You can see fairly clearly where each –
Michael J. Petrilli: And look, it does mean that if you’re going to have a summative score like Ohio plans to, you better make sure that it’s putting the weight on the right stuff. Again, that’s the focus of our report. Yeah.
Phillip Lovell: Interesting though, with your report, even you guys didn’t give each state just a single, summative score, which I respected. I thought it was interesting, given the argument being made. But you gave schools strong, medium, weak. I thought that really spoke to the need for some nuance and that it’s hard to – I’m not terribly worried about being fair to adults in the system. Obviously, we should be fair. But to me, the ultimate concern is, are we being clear and accurate? To me, the clear dashboard, the multiple ratings, you get to both clarity and accuracy. Whereas, if you just have one, you lose the accuracy for the clarity.
Michael J. Petrilli: But you are arguing, I think, like I am, for maybe a handful of grades.
Phillip Lovell: I think –
Michael J. Petrilli: That’s really different than a dashboard with a hundred boxes on it.
Phillip Lovell: I agree the hundred boxes with colors here and colors there may be useful for some things. Not for transparency for most parents and the public. I think that should all be publicly available.
Michael J. Petrilli: Yeah.
Phillip Lovell: But I agree with you, it’s not helpful to have to search for, “Are students that look like mine or children that look like mine graduating?” That needs to be clearer. I think that it’s putting that into some context, whether it’s a letter grade or five-star system. I think that’s okay. What really worries me is when we have the focus on just a single summative and there’s not more than that, because I think the single summative sort of sucks the air out of the room, sucks the attention off the page.
In fact, so, I have an example of this particularly when it comes to student subgroups, with historically underserved kids. When we go to the PowerPoint slides. So, this is an example we wanted to – this doesn’t show the name of the school, it doesn’t show the state that it’s from. But we wanted to illustrate this point. This is an example, again, of a high school where they’re rated an excellent in 2015. We looked. They’re also rated an excellent for the year following, they didn’t have ratings yet.
So when we look at this, excellent school. Good on growth. Things are looking pretty good. But then, when you look down into the details, you see, “Hey, wait a second.” African American graduation rate: 60 percent. Not so excellent. So how do we make sure that this front page of the report card, which shows an excellent school, really reflects all kids, including historically underserved kids? When we had the – under the original Obama regulation, there was a requirement saying that if you had a group of consistently underperforming students, the letter grade had to be reduced.
Some states do a decent job – in some cases, more than a decent job – of ensuring that historically underserved kids are represented. Tennessee, 40 percent of their letter grade is based on student subgroups. In Louisiana and Illinois, you can’t get the highest score in the system if you have a consistently underperforming group of kids. But a lot of states don’t. So what would be some advice you have for schools to avoid the situation in which it says excellent front and center, but then you peel back the onion and it’s not just a small problem for historically underserved kids, but you know, 60 percent graduation rate, that’s a big deal.
Michael J. Petrilli: Look, I think that the basic insight is right. It makes sense to have a letter grade drop if certain subgroups are not performing well. I would question – I’m curious about why this dynamic works, just in terms of the math of it. How is it that the school is able to get –
Phillip Lovell: So it comes down to how things are weighted. In most systems, individual student subgroups don’t actually count in their weighting, as the law actually says that they’re supposed to. But the way this is playing out, and the Department of Ed, I don’t think, really has been looking at this carefully enough. Then, what’s interesting is that when you look at the math, even if you do, so let’s say you take your four required indicators, then you have math and reading as part of that. So it’s really five.
If you divide each of those five into the required subgroups, the performance of each subgroup on each indicator matters for less than three percent of an index. So at that point, the math speaks against the single, summative score because then, your low-performing African American kids or another subgroup, it’s just mathematically not going to show up in the letter grade.
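The arithmetic behind the "less than three percent" figure can be sketched roughly as follows. This is only an illustration: it assumes the index is split evenly across indicators and subgroups, and the subgroup count of seven is an assumption, since the number of reported subgroups varies by state.

```python
# Hypothetical illustration of how little one subgroup on one indicator
# can weigh in an evenly split index. Equal weighting and the subgroup
# count are assumptions, not figures from any state's plan.

def cell_weight(num_indicators: int, num_subgroups: int) -> float:
    """Share of the total index carried by one subgroup on one indicator."""
    return 1.0 / (num_indicators * num_subgroups)

NUM_INDICATORS = 5  # four required indicators, with math and reading counted separately
NUM_SUBGROUPS = 7   # assumed for illustration; varies by state

weight = cell_weight(NUM_INDICATORS, NUM_SUBGROUPS)
print(f"One subgroup on one indicator: {weight:.1%} of the index")
```

Under these assumptions, any one subgroup's result on any one indicator moves the index by under three percent, which is the dynamic being described.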
Michael J. Petrilli: Or, another way to say it is, I suspect that this must be a school where the African American subgroup is quite small.
Phillip Lovell: Well, it’s about ten percent of the population.
Michael J. Petrilli: But that’s small. If it were larger –
Phillip Lovell: How does ten percent of –
Michael J. Petrilli: But if it were larger, then the graduation rate as a whole would have been dragged down more. That would show up in the rating. I understand. That is – I think you’re going to be able to find some schools like this. I think they’re going to be relatively rare, in part, honestly, because of the nature of our school system being so segregated and so – that most of the schools, especially that serve low-income kids and kids of color are highly segregated. They’re high poverty. They’re –
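The point about subgroup size can be illustrated with a weighted average. The 60 percent subgroup graduation rate and the ten percent population share come from the discussion; the 95 percent rate for all other students is a made-up number used only to show the effect.

```python
# Sketch: a schoolwide rate is a weighted average of group rates, so a small
# subgroup's low rate barely moves it. OTHER_RATE is hypothetical.

def overall_rate(subgroup_share: float, subgroup_rate: float, other_rate: float) -> float:
    """Schoolwide rate when one subgroup makes up `subgroup_share` of students."""
    return subgroup_share * subgroup_rate + (1.0 - subgroup_share) * other_rate

SUBGROUP_RATE = 0.60  # from the example discussed
OTHER_RATE = 0.95     # assumed for illustration only

small_subgroup = overall_rate(0.10, SUBGROUP_RATE, OTHER_RATE)
large_subgroup = overall_rate(0.50, SUBGROUP_RATE, OTHER_RATE)

print(f"Subgroup is 10% of school: overall rate {small_subgroup:.1%}")
print(f"Subgroup is 50% of school: overall rate {large_subgroup:.1%}")
```

With these numbers, the schoolwide rate stays above 90 percent when the subgroup is a tenth of the school and falls well below it when the subgroup is half the school.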
Phillip Lovell: What’s interesting though is that, it’s actually not the case. A couple years back, Education Trust actually – and they’re big summative score proponents – they did a report and found that in one state, a different state, in schools that got an A, on average, African American students’ proficiency rate: 58 percent.
Michael J. Petrilli: Okay. Let’s pause on that. Why are we judging schools by proficiency rates? Why? This is – if you ask any scholar out there, anybody who studies evaluation for a living, how do you evaluate schools? I guarantee you, 100 percent of them will say, “Do not use proficiency rates.” Because those proficiency rates are measures of anything that happened in that kid’s life before they got to that test. And if you’re especially a middle school or a high school, it has a huge amount to do with the schooling they got before they got to you.
Phillip Lovell: All of that is true –
Michael J. Petrilli: If they did the same analysis and found that there were a lot of schools out there that were getting As where the African American kids weren’t making any progress, value added or student growth percentiles, however you measure it, I would be concerned about that. I would be. But the proficiency rate, this is where, Phillip, I feel like a lot of reformers, we end up guilty of just not being good at math. Also there, the irony is striking, having utopian goals.
Which is to say that we are at a time when we’ve set the standards much higher. When about 40 percent of kids are meeting the standard, which is about the same level of how many kids are leaving our system ready for college. So we’ve set that bar at quite a high level. The percentages are much lower for poor and minority kids for a million reasons that we can talk about. We’re all working on changing that.
But to say that a 58 percent proficiency rating for African Americans is bad, well, actually, that’s better than the national average for kids as a whole. And it’s probably three times as high for African American kids.
Phillip Lovell: But if you were to tell a parent that their school is good, it’s an A school, but four out of ten – tell an African American parent – but four out of ten students who look like your child, they’re not reading at grade level.
Michael J. Petrilli: Again, proficiency no longer means grade level.
Phillip Lovell: In the grand scheme of things. That is the case –
Michael J. Petrilli: Four out of ten are not on track for college. Right?
Phillip Lovell: Yeah. Is that – to me, that’s just – it’s clear, but it’s not accurate.
Michael J. Petrilli: Well, see, all right. It gets you back to this idea of saying – let’s say then, we’re all agreeing to have a couple of grades. So that school gets an A for growth and it gets a C for achievement. Then, the parent says, “Well, what does that mean?” Then you say, “That means that the population is achieving so-so in terms of how many kids are hitting this proficiency level. But, man they’re making a lot of progress over time.”
That, to me, is an accurate read on what’s happening. I certainly don’t want that school to get only a C or a D because, Phillip, then you’ve told the school – they figured this out in No Child Left Behind. They said, “Look, no matter what we do, we work miracles in this school and the best we’re ever going to get is a C or a D because we’re helping kids make huge progress, but they’re starting three grade levels behind when they get to us.”
Phillip Lovell: So I definitely agree with you that we should be valuing, rewarding, promoting, and expanding schools that can show that they’re taking students from two to three years behind and then bringing them up to grade level. But it does concern me. We have tons of stories of students who are valedictorians of their class and then you go to post-secondary, go to college, and they need remediation.
Michael J. Petrilli: This is why – again, an important distinction for reformers out there, Phillip, is – I made the case again and again that growth is the way to evaluate schools. But when it comes to kids, it is hugely important that they and their families know how they are doing against these benchmarks. So for them, they need to know growth, too. They want to know a trajectory of progress every year. But they really also do want to know, are you at this on-track level or not? How much progress will you need to make to get caught up? Totally. So that you don’t have kids that are valedictorians and think they’re on track ending up in remedial education.
But we have to think about these things differently and be smart about it or else, you know what’s going to happen? We’re back in the same trap with No Child Left Behind, where people say, “Well, we have to set the goal at 100 percent proficiency because otherwise, anything less – do you want anything less for your own child?” You know? So we’re back into the utopian land where we set goals that are unachievable or we create this dynamic where we say, “Well, the only way we’re going to get to 100 percent is if we define proficiency down. So now, we can get 100 percent of kids to this minimal level proficiency. So we’ll do that.”
So, if we want to maintain high standards, we got to accept 40 percent of kids reaching those high standards. Our goal –
Phillip Lovell: There’s a huge disconnect between – which I think we’ll agree. There’s a huge disconnect between the percentage of kids who are graduating, on average 84 percent nationwide, and then you look at the college readiness rate and, like you’re saying, it’s in the 40s along with the proficiency rate. That gap is hugely problematic.
Michael J. Petrilli: So we want to increase that percentage. The worst thing we could do right now is to say, “Unless you’re getting 100 percent of kids to college and career readiness, you’re not an A school.”
Phillip Lovell: So I agree with that. By and large, that’s not what’s happening in the state systems. The states’ goals are by and large, I would say, pretty high. You would say utopian. But progress against those goals doesn’t actually matter on assessments.
Michael J. Petrilli: That’s why we didn’t pay attention to the goals. That’s right.
Phillip Lovell: So, the goals are aspirational. It says we care about all kids and we want all kids to achieve. A number of states didn’t even base their goals on prior performance data. It was really just an aspirational exercise. And we asked, in our assessment of state goals – we, even if a state’s goals were, say in the 60s, but it represented monumental growth over time, we weren’t going to ding them for not then saying, “No, actually, you have to have 90 percent.” So, totally understand that.
I think the issue though is that within these systems, if it all comes down to the letter grade, that letter grade really has to mean something. What we’re finding is that because of the math, the student subgroup performance isn’t well reflected in that ultimate letter grade.
Michael J. Petrilli: And I hear that. I think there’re going to be some schools with relatively small percentages of – where the African American kids are ten percent of the student body where that risk is there.
Phillip Lovell: What we’ve seen in the data – and we’re all going to those data wonks and policy geeks amongst us and we’ll be diving into how this works when the state systems that are being approved now actually roll out. As we’ve looked at this information in the past, it definitely varies. So in the state that was represented in the example we gave, that scenario of high letter grade but low grad rate for a subgroup, that was ten percent of the high school. So, it was a substantial number. We’ve seen similar data for other states where it’s like half of the schools have this scenario.
This idea of the segregation amongst the schools is definitely true. But it’s also true that – I forget what the exact number is – but there are a whole lot of school districts and schools that are still predominantly white. So, this dynamic really plays out that if we’re really going to value all kids and put emphasis on all kids – outside the major metropolitan areas, then – and we’re going to have these letter grades, we’ve got to make sure they count for all kids.
Michael J. Petrilli: Right, right.
Phillip Lovell: So why don’t we get to some questions that have come from our audience. We appreciate when folks weigh in with questions before we even have the webinar. So, first, this question comes from Mike from Chicago, similar to a question we received from Andrew from Phoenix. We’ve talked about this, so we can just summarize. Mike asks, “What effective alternative other than the A to F methodologies exist to evaluate the effectiveness of K to 12 schools?”
Andrew from Phoenix asks, “Are we arriving at this dichotomy for the topic: either no grade or a single grade? What are your views on the dashboard concept involving several indicators?” I think by and large, we actually agree that the best case – I think we disagree on the value of the ultimate summative. But we agree on the value of multiple ratings for multiple indicators, so long as you don’t get like 15.
Michael J. Petrilli: I’d say a handful.
Phillip Lovell: A handful.
Michael J. Petrilli: Imagine a kid’s report card has, usually, five or six subjects. A high school report card’s like five or six grades.
Phillip Lovell: So actually, if we can show the slide real fast. This is an example of a school in Ohio. So, hats off to our friends in Ohio for the clarity here, where you can see – and what’s interesting, so this is school grade coming in 2018. My argument would be, I’m not sure how much value the school grade actually adds because you see here, there’s an A for graduation rate. There’s a D for achievement. There’s a B for progress. So, what’s the school grade going to end up being? Is it a C or a D?
Michael J. Petrilli: Right. All the schools in the state are going to get a C.
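How the three component grades in the Ohio example (A for graduation rate, D for achievement, B for progress) roll up into one grade depends entirely on the weights. The sketch below uses a simple four-point scale and two made-up weighting schemes; neither is Ohio's actual formula.

```python
# Sketch: the same component grades produce different composites under
# different weights. The point scale and both weight sets are hypothetical.

POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def composite(grades: dict, weights: dict) -> float:
    """Weighted average of component grades on a 0-4 scale."""
    total_weight = sum(weights[name] for name in grades)
    return sum(POINTS[grades[name]] * weights[name] for name in grades) / total_weight

grades = {"graduation": "A", "achievement": "D", "progress": "B"}

equal_weights = {"graduation": 1, "achievement": 1, "progress": 1}
achievement_heavy = {"graduation": 1, "achievement": 3, "progress": 1}

print(f"Equal weights:     {composite(grades, equal_weights):.2f}")
print(f"Achievement-heavy: {composite(grades, achievement_heavy):.2f}")
```

Equal weights put this school between a C and a B, while weighting achievement three times as heavily pulls it down to a flat C, which is why the choice of weights matters so much for any summative score.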
Phillip Lovell: Right. So like, what does that actually mean? Here, because you have the different components, I actually think that it’s both clear and accurate. Now, the one downside, I would say, here – because there’s only so much that can fit on a single piece of paper – is that the performance of historically underserved kids, you’d have to click for more information. I wish that there was a way to clearly illustrate that. Oh, wait a second. There might be.
So here, this is a dashboard that we developed here after looking through some state examples, where here you can have that single, summative score for different indicators. It’s not really a single, summative score. We included the school rating out to the top left, intentionally smaller than the other items. But you can see clearly how subgroups are doing in these different areas. Now, incidentally, this data comes from a real school and from pre-ESSA.
The C in the grad rate, the C in the math proficiency rate, the A in the reading proficiency, it translated into an A. So again, come back to – I know you were saying, where’s growth here? But it, to me, just goes to the point of, if you focus on that single summative and leave out these other points –
Michael J. Petrilli: Can I wonk it up one more level?
Phillip Lovell: Please do, please do, yes.
Michael J. Petrilli: So, looking at this screen, the one other tricky thing is: let’s say we do want to do more growth, well, once you’re looking at growth and you’re looking at scores from year to year, you need an even bigger data set in order for those growth scores to be valid.
Phillip Lovell: And more testing.
Michael J. Petrilli: And more – so here’s the challenge. You want to move to growth, but then, if you want to look at, for example, growth for African American students, if that’s a small group of kids, that’s not going to be reliable. You see this in Ohio, where we’re not capable of breaking out value-added for subgroups, at least a lot of the subgroups. So, the very schools they’re worried about –
Phillip Lovell: Which speaks to another wonky issue, is that I think we need smaller n sizes so that you can –
Michael J. Petrilli: But some of it is just because the data that they’re bouncing around to. It’s the same reason why value-added at a teacher level is problematic, because there are not enough kids, elementary especially. So, if you want to look at growth and you want to look at subgroups, it’ll work if the subgroups are big enough. But if the subgroups are big enough, that’s going to impact the overall growth score anyways. If the subgroups are too small, you probably aren’t going to be able to get a read on it. So it really, in theory –
Phillip Lovell: You’ve got these tradeoffs. So many of these state systems, it’s growth of all kids. It’s not growth of subgroups.
Michael J. Petrilli: Right, right.
Phillip Lovell: So if the population is big enough, it’ll have an impact. If it’s not, then, you know. So, just a reminder that if you’d like to send a question – we don’t have that much time, but please do send your questions to firstname.lastname@example.org. So, a few other items here. This one comes from Michelin, from Quintin township Michigan. She asks, “What determines the criteria of a school’s grade? Is it our state legislature or is it the federal legislature?” And also, “If a school gets a bad grade, like a C or lower, how can they improve?”
Michael J. Petrilli: Hmm. Wow. How much time do you have? Good questions. So, first of all, the law that was drafted and enacted by Congress sets, as you said at the beginning, a bunch of guidelines, some parameters. So, they have to include student achievement and they can include student growth. They got to look at student graduation rates. They can – so, the basic infrastructure is set by Congress.
But then, this law gives a whole lot more flexibility back to the states. So the states get to really flesh it out. In most states, the process was led by the state department of education and the state superintendent. In some states, though, the legislatures did get involved and got engaged in passing laws saying what they thought should be in these systems.
Phillip Lovell: ____ Maryland in particular.
Michael J. Petrilli: Yeah. So, it can be a combination. Kind of depends on the state, who gets to decide. Basically, if the legislatures decide they want to be involved, then they –
Phillip Lovell: They’re involved. Yes.
Michael J. Petrilli: As you go forward. Now, look, in terms of the low performing schools, this is a whole ‘nother topic we haven’t really talked about, which is part of the conversation with accountability. Which is, if you have a chronically low-performing school, under the law it’s supposed to be labeled as needing an intervention, then what? States are supposed to say what their plan is for those schools. Most people that have looked at that part of the plan have come away pretty disappointed. Now, I think the states had an incentive to just say very little here.
Phillip Lovell: They weren’t required to say very much. And the difference between what’s on paper and what actually happens can be pretty significant. So it’s –
Michael J. Petrilli: And let’s be honest, Phillip. There’s not a lot of great examples to point to of things that work for turning around low-performing schools.
Phillip Lovell: So there – that’s – there’s a whole – there’s an additional series of webinars there. I actually think that we’ve seen many examples of success. I point to New York City and their small schools of choice and Linked Learning in California. But I think that the – to me, the issue is that we haven’t really been able to take what’s working in different places, learn from it and – not replicate it – every place is different. I don’t think you can necessarily replicate it. But sort of apply what’s –
Michael J. Petrilli: Take it to scale. I mean, I guess what I’m saying though, Phil – what should the state of Michigan say it’s going to do for low-performing schools? Now, I have opinions about that. I, too, would say, “Well, I think what Louisiana did with its Recovery School District looks pretty promising, where you create a district where you house all the low-performing schools. You get rid of the teacher union contract. You get rid of the rules and regulations that are getting in the way. You get more resources in. Promising.”
I think what Texas is doing, where they’re saying, “Hey, what we want to do, at least with some of our federal money, is start brand new charter schools in the communities that have lots of low performing schools.” Not doing turnarounds, but saying, “Let’s create some high-quality schools from scratch that can be better alternatives than what kids are stuck in today.” So there are opinions about that. But I don’t think – I think we just have to be very humble that when it comes to what to do with chronically low-performing schools, nobody has completely figured that out yet.
Phillip Lovell: Yeah. Approaching this with a healthy dose of humility is definitely the way to go. So, Tray from Phoenix asks – you mentioned value-added earlier and this question gets to that. He says, “I’m curious to hear the panelists’ thoughts about the inputs that go into the school letter grade. For example, in many states, teachers’ or the school’s value-added scores contribute to a school’s grade, even though there are numerous debates surrounding the purported technical issues with value-added measures. Couldn’t we argue that school grades that use these potentially faulty inputs are biased and inaccurate from the start?” I’m sure you have opinions about –
Michael J. Petrilli: I do. If I’m hearing this question right from Tray – let’s be clear to differentiate between value-added for teachers and value-added for schools. So, same basic notion, which is, let’s look at the progress kids are making from one year to the next. Some of these models are more complicated than others. They make predictions based on the kids’ trajectory or other issues. But, the issue with the teachers, by and large, is that especially in elementary schools, they’ve only got 25 or 30 kids.
Therefore, it’s not enough data to be really valid and reliable. These scores can bounce around a lot just randomly, if a bunch of kids are sick on test day or whatever. So, we don’t have the same confidence that these are actually doing a good job differentiating between different teachers. At a school level, because there are more kids, you should have enough data where these value-added scores are more reliable. Now, they still have issues. You still want to look over several years. For smaller schools you can have a problem. But, they tend to be better.
Other states use student growth percentiles, which is another way of looking at it. There’s different ways to do this. This is a technology that exists and that is worth embracing because it’s way better than the other alternative, which is just to look at how many kids are passing the test.
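The reliability point about classroom-sized versus school-sized groups follows from how the standard error of an average shrinks with the number of students. The sketch below is a deliberate simplification: it treats a growth score as a plain mean of independent student results with an assumed spread of 1.0, which is not how actual value-added models or student growth percentiles are computed. The group sizes are illustrative.

```python
import math

# Simplified sketch: standard error of a mean score for groups of different
# sizes. ASSUMED_SD and the group sizes are illustrative assumptions.

def standard_error(student_sd: float, n_students: int) -> float:
    """Standard error of the mean for n independent student scores."""
    return student_sd / math.sqrt(n_students)

ASSUMED_SD = 1.0  # spread of individual student growth, in arbitrary units

for label, n in [("one classroom", 25), ("one school", 400)]:
    se = standard_error(ASSUMED_SD, n)
    print(f"{label:>13} (n={n:3d}): standard error {se:.2f}")
```

Going from 25 students to 400 cuts the noise in the average by a factor of four under these assumptions, which is the intuition behind trusting school-level growth more than teacher-level growth, and behind the trouble with small subgroups.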
Phillip Lovell: So, John from Austin, Texas asks, “Expectations for 21st century teaching and learning, such as project-based learning and student-initiated instruction seem to be at odds with high-stakes teaching, teaching to the test. High-stakes teaching on the other hand, seems to be required to meet the expectations for school accountability performance ratings. Can you explain how schools can incorporate 21st century teaching and learning expectations while also meeting performance ratings that are directly tied to high-stakes testing?”
Michael J. Petrilli: Well, I note that this question’s from Texas. I don’t think that’s completely out of line because, look, Texas, unlike most states, did not adopt the Common Core. One way to address this, to say that we want our teaching and our learning to be more holistic, is to move to standards that themselves are expecting kids to be able to do more than just regurgitate stuff. To be able to write well, do research well, to deal in math with conceptual understanding and all the rest that’s in the Common Core. So, Texas, you should adopt the Common Core. We’ll see when that happens.
Then, the assessments in many states are much better. Now, that gets you so far. The same with when we talk about high schools and say, “Well, if you’re teaching to a test, that’s not great. But, if it’s a good test, like the AP exams, where kids have to really show what they know, both on multiple choice kind of things, but also in essays or other, then we feel like it’s signaling – pushing schools in the direction of pretty good teaching.” Now, it still might not be enough for people who are believers in project-based learning. And there, the question is again, can you come up with other ways to validate quality work in those kinds of domains?
There are efforts out there. New Hampshire, for example, has been experimenting for years trying to develop different kinds of assessments that look more like portfolios, more project-based. The challenge is, they can be expensive and it’s also really hard to get them to be valid and reliable.
Phillip Lovell: It’s an issue of scale. I don’t think that the push and pull between standardized testing and emphasis on what some of us call deeper learning, higher level thinking skills – I really don’t think that there needs to be a dichotomy, one pulling at the other. More like you were saying, with PARCC and Smarter Balanced we have assessments that measure higher thinking skills in a way that’s much more accurate, sophisticated than we had under NCLB. There’s actually, written into the law, states can incorporate projects, portfolios, performance assessments as part of that assessment. The issue really is one, as you were saying, of validity, reliability. How do you do it at scale in such a way that you won’t have some kids in rich districts that are constructing airplanes and you have kids in lesser resourced districts that are making paper airplanes and calling that the project?
So in moving in this direction, I know everything takes time. I think that states have really done a good job of incorporating these measures of college and career readiness. They definitely span the gamut. Some states do a better job of this than others. Like, Louisiana, Tennessee are at the – have some good measures there.
But pushing this idea, the other thing that I think – I know you’ll agree with this – that will push the accountability system is pushing this direction, states that are providing credit for not just proficiency but above proficiency. Which is, I know one of the items that was in your report.
Michael J. Petrilli: Really interesting study that just came out. We wrote about it in our weekly newsletter, the Education Gadfly, available on our website at excellence.net.
Phillip Lovell: There you go.
Michael J. Petrilli: Fascinating study out of Michigan, really encouraging, where they were able to look at basically value-added by test scores of high schools. They looked at the 8th grade test scores for kids versus the 11th grade test scores. What they were able to find was that the kids who – the high schools that did better in terms of value-added, their students went on to do better in college. After controlling for all kinds of things. Now, the worry was that if high schools are focused on beefing up, getting those reading and math scores up, they would ignore the deeper learning stuff and kids would not do as well.
Well, it turns out – at least in this one study – the things that these schools are doing that are helping kids read and write and do math better are also helping them prepare for more success in college. And, not just in English and in math, but in all subjects. Even found that kids who went to better high schools, judged by value-added, did better in welding. To me, that indicates that these tests, while imperfect, are measuring some batch of skills that matter in the real world.
Phillip Lovell: Interesting. Definitely. So, James from Pittsburgh, Pennsylvania asks, “Pennsylvania has elected to use a dashboard model rather than a summative rating. What should parents look for in the dashboard to know if their neighborhood, magnet, or charter school are providing a quality education?”
Michael J. Petrilli: That’s a good question. I’m going to sound like a broken record. I would focus mostly on growth. If there’s a growth score, a value-added score, I would hone in on that. Texas has an interesting model, by the way.
They basically have said, “What we’re going to do is we’re going to look at growth and we’re going to look at achievement. Whichever one is higher, that’s the grade you get.” What’s kind of cool about that is if you’re a pretty affluent school in the suburbs, your achievement’s probably pretty high. And everybody thinks you’re a great school. These are the schools that everybody clamors to get into. So, fine, you get an A.
Maybe your value-added, your growth isn’t that great. But you know what? It’s harder when you’ve got a lot of kids at the top anyways. They might be topping out of these tests or they might – and it’s fine. They’re still – nobody’s going to argue that these are bad schools. All right? So that’s fine.
And if you’re a high poverty school and your achievement’s not great, but your growth is amazing, then you are considered an A school. Again back to my KIPP analogy in Ohio, that that school gets an A. I think that’s actually pretty interesting. Now, we have some concerns about it for other reasons. I’d also say, if you’re talking about a pretty affluent school, yeah, the achievement stuff is worth paying attention to. If it’s the high poverty school, growth, growth, growth.
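The "whichever one is higher" rule described above can be sketched in a few lines. This models only the rule as it is described in this conversation, not the full Texas accountability system, and the two example schools are hypothetical.

```python
# Sketch of a "better of achievement or growth" rule. Models only the rule
# as described in the discussion; the example schools are hypothetical.

GRADE_ORDER = "FDCBA"  # lowest to highest

def better_of(achievement: str, growth: str) -> str:
    """Return the higher of two letter grades."""
    return max(achievement, growth, key=GRADE_ORDER.index)

affluent_suburban = better_of(achievement="A", growth="C")
high_poverty_high_growth = better_of(achievement="D", growth="A")

print(f"Affluent suburban school:         {affluent_suburban}")
print(f"High-poverty, high-growth school: {high_poverty_high_growth}")
```

Both hypothetical schools come out with an A, for different reasons, which is the feature being praised here.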
Phillip Lovell: Gotcha. Well, with that, I think we will bring today’s webinar to a close. This has been, I think, an important discussion, one that will be continued in the halls of Congress, state legislatures, departments of education, even some dinner tables. I want to thank you very much for joining us today. We will be having, I’m sure, many more webinars.
Hopefully if you’re watching this, you’re on our list. Know that this webinar will be archived at www.all4ed.org/webinars. I’m Phillip Lovell for the Alliance for Excellent Education. Thanks again to Mike Petrilli. Enjoy the rest of your day.
[End of Audio]