Assessing Deeper Learning: The Ohio Performance Assessment Pilot Project
January 16, 2014
3:00 pm – 4:00 pm EST
The Alliance for Excellent Education Invites You to Attend a Webinar on
Assessing Deeper Learning: The Ohio Performance Assessment Pilot Project
Mariana Haynes, PhD, Senior Fellow, Alliance for Excellent Education
Stuart Kahl, PhD, Founding Principal, Measured Progress
Lauren Monowar-Jones, PhD, Program Coordinator of Performance Assessment, Office of Curriculum and Assessment, Ohio Department of Education
Please join the Alliance for Excellent Education for a webinar on using curriculum-embedded performance measures to help students learn and demonstrate the deeper learning competencies they need for college and a career. The webinar will focus on the Ohio Performance Assessment Pilot Project (OPAPP), which includes a system of learning and assessment tasks aligned with the Common Core State Standards. Ohio has taken a unique approach to the pilot by including sustained, collaborative professional learning throughout all components of the program, including formative assessment to support student learning, technical training, and the writing and scoring of assessment tasks.
The panelists will explore the use of performance tasks to elicit and assess complex thinking and communication skills. They will examine what this means for designing curricula and varied structures for professional learning to provide teachers with the knowledge and skills to help all students attain high-level cognitive and intrapersonal skills. Panelists will also address questions submitted by webinar viewers from across the country.
Please direct questions concerning the webinar to email@example.com.
If you are unable to watch the webinar live, an archived version will be available at all4ed.org/webinars-events/, usually one or two days after the event airs.
I’m Mariana Haynes, senior fellow for the Alliance for Excellent Education, a nonprofit policy and advocacy organization based here in Washington, D.C. We are delighted to have you join us for today’s webinar on using curriculum-embedded performance assessments to help students learn and demonstrate deeper learning competencies. These competencies include a deep understanding of content and the ability to use that knowledge to think critically and solve problems. They also include the ability to communicate effectively using a variety of media, the ability to collaborate with peers, the capacity to reflect on one’s own learning, and the mindsets that foster learning.
So we’re going to learn about the Ohio Performance Assessment Pilot Project, or OPAPP, and how it is designed to elicit and assess deeper learning competencies. Ohio has taken a unique approach to piloting curriculum-embedded performance assessments. The pilot includes a system of both learning tasks, used for formative purposes, and assessment tasks, all aligned with the Common Core State Standards and the Next Generation Science Standards. It also integrates sustained, collaborative professional learning throughout all components of the program.
Our distinguished guests will discuss the design and the implementation of curriculum-embedded performance assessments that capture important dimensions of student learning. These formative learning tasks afford powerful feedback to teachers and students, providing information about where students are in their learning and where they need to be relative to specific learning goals.
So, let’s meet our guests. With us in our studio today is Doctor Stuart Kahl. He’s founding principal and former CEO of Measured Progress, an educational testing company working with over 20 states on their assessment programs. Stuart has over 35 years of experience in large-scale assessment. His current interests include assessment literacy, formative assessment, and curriculum-embedded performance assessment. In some quarters, he is considered the conscience of the testing industry. He was recognized by the Association of Test Publishers with the 2010 ATP Award for Professional Contributions and Service to Testing.
Next in our studio is Doctor Lauren Monowar-Jones, the program coordinator of performance assessment with the Office of Curriculum and Assessment at the Ohio Department of Education. Lauren holds graduate degrees in astronomy and physics and has worked at the Ohio Department of Education for the past seven years as a science consultant in both the curriculum and the assessment offices. She is an adjunct instructor at Columbus State Community College, where she teaches online introductory astronomy courses.
If you would like to ask questions of our webinar guests, please do so using the form below this video window, and we will turn to your questions from time to time. Also, if you’re on Twitter, we encourage you to tweet about this webinar using the deeper learning hashtag that you’ll see in the left corner of the video window. So let me set the stage for today’s webinar: profound national and global changes have prompted educators to rethink the competencies students need and the assessments used to measure them.
Growing concerns about how to significantly improve the quality of education for all students are generating healthy debate about the use of meaningful assessments that capture outcomes beyond simple academic content knowledge. Critics of traditional assessments have pointed out that much of testing seems to have little to do with learning, and looking at assessment practices in this country over the last decade makes it all too easy to say that the critics may have a point. Richard Elmore, noted professor at the Harvard Graduate School of Education, contends that the real accountability system is in the tasks that students are asked to do. Students must know not only what they are expected to do, but also how they are expected to do it, and what knowledge and skills they need in order to learn it.
As a result, students become more self-reliant in planning how to approach a learning task and in gauging their understanding and progress. This urgent need to change classroom teaching and learning was underscored by the December release of results from the Program for International Student Assessment, known as PISA. PISA is a test of reading, mathematics, and science given every three years to 15-year-olds in the United States and more than 65 countries worldwide. Since 2009, the proportion of top performers in the United States has declined in reading and math. U.S. rankings fell from 25th to 26th in math and from 14th to 17th in reading. It is important to look beyond the rankings to examine what can be learned from high-performing systems about how they teach and assess students’ deeper learning competencies.
First, understanding what students know and can do is essential to effective teaching. Second, an abundance of literature shows that teachers’ use of high-quality classroom formative assessment, used to discover what a student does and does not understand, produces some of the largest effects on student achievement reported in the educational literature. Yet the majority of teachers, both novice and veteran, find evaluating and responding to students’ learning one of the most challenging elements of teaching.
So let’s turn to our guests to learn how states can provide teachers with the tools and professional learning they need to strengthen the connection between curriculum-embedded performance assessment, teaching practices, and student learning. First we’re going to turn to Stuart. Stuart, Measured Progress has a long history in the world of performance assessment. Would you walk us through why states are moving toward the use of these kinds of assessments, and how their design can promote higher-order thinking skills?
Okay, thank you, Mariana. I appreciate being here, and this is a topic, needless to say, that’s dear to my heart, and I think it’s a very important one. I see my job, by the way, as providing an introduction for the person next to me: a little history and some background in performance assessment and curriculum-embedded performance assessment, because the issues I’ll talk about around that topic are addressed by the program that Lauren has in place, and that’s really important.
Let me start by talking about the call for deeper learning. You ask why the states are moving in that direction. Well, it’s easy. There are all kinds of pressures: from political, business, and education leaders, and from the standards. I don’t have to go through these in great detail, but the Common Core, the science standards, and so on all call for deeper learning. So I think that’s certainly reason number one: there’s a demand for it. Reason number two: it’s just good practice. It’s not new. We’ve been here before; we’ve seen the need for students to have higher-order skills that they don’t have to the degree they need, and so it’s good practice.
I should say we get diverted sometimes. Different programs lead us away from this kind of learning. The tremendous demand for assessment in multiple grades and subjects for high-stakes accountability pushed us toward the most efficient type of testing, which really doesn’t tap higher-order thinking skills as well. So it seems like it’s time to address the larger goals of education. By that I mean we know that a kid’s ability to add two- and three-digit numbers with and without regrouping is not a major goal of education.
Being able to apply skills like that is, and it seems to me we should be dealing with those more directly, both in instruction and in assessment. What do we mean by deeper learning? It’s not too hard to give you a definition. There are lots of them, but the National Research Council, the NRC, had a nice way of breaking down 21st-century skills into cognitive, interpersonal, and intrapersonal skills. Deeper learning really fits in the cognitive skill category.
That said, when we talk about curriculum-embedded performance assessment, interpersonal and intrapersonal skills clearly can play a role when kids are engaged in the activities that are part of it. So the NRC’s definition of deeper learning, as you can see, stresses the application of foundational knowledge and skills to real problems; that’s the basis of it. There are two things that I and others in my office feel are game changers: formative assessment and curriculum-embedded performance assessment.
It’s interesting that I picked these, because they happen at the classroom level. That’s where the action is; it’s where we want to have an impact in terms of reform. We want to see teachers and students, in many cases, spending their time differently. Formative assessment is a process, an instructional process. It has many steps, one of which is collecting evidence of student learning. Sometimes I’m concerned that my colleagues in the industry have pirated the term, so that people associate formative assessment with a test. That’s not the case; it’s a process.
Because it’s a process, improving it requires professional development, not the purchase of tools. And effective professional development is ongoing, on the job, and collaborative, which certainly speaks to the needs of teachers and educators. Curriculum-embedded performance assessment is, like formative assessment, something that, if it’s done right, means you must be doing a lot of other things right. With formative assessment, my point was that you must be doing professional development right.
Well, in the case of performance assessment, I think it’s a similar situation. It requires an awful lot of other things to be going well, such as support within the educational environment. As for a definition of performance assessment, you can find all kinds. The one we’re thinking of in this day and age is more engaging activities that lead to products, performances, or presentations that can be scored or otherwise evaluated for either formative or summative purposes. Curriculum-embedded performance assessment, obviously, is not always at the end of something; rather, it’s ongoing, during instruction.
I like to think of curriculum-embedded performance assessments as instructional units. In an instructional unit, there is a series of activities that serve as both learning and evidence-gathering activities. Some of them can lead to products that might be used for formative purposes; some might be used for summative purposes. So that’s my view of what curriculum-embedded performance assessment is: these are really instructional units. Where are things happening? There are a lot of places where good performance assessment is going on, I think mostly at the school and district level.
There are states engaged in this, and more and more are moving toward it, though their programs are still in development. I don’t have to talk about OPAPP, because you’re going to hear a lot about that, but to me it is the model; it’s just a wonderful program. Massachusetts is in the process of developing something that might be similar, but it’s really just beginning. At the level of school and district consortia, there’s Quality Performance Assessment from the Center for Collaborative Education in Boston, which I think is a collaboration of schools in three states that have made a tremendous commitment, not just to dabble in performance assessment, but to make it a major focus throughout the school year.
The New York Performance Standards Consortium is another; that’s a collaboration of, I believe, 28 schools in New York, and you can go on their websites and see what kinds of things they’re up to. I’d like to spend the last two or three minutes talking about issues in performance assessment from the authentic assessment era, call it the 1990s. That was the highlight of my career, because I thought there was some wonderful stuff going on. But at that time, we didn’t do things as well as we would now, because we’ve learned a lot, and one of the issues had to do with content quality.
A lot of the programs back then were large-scale portfolio assessments, and much of what went into those portfolios was left up to teachers to decide, without a whole lot of training and background. We saw worksheets, we saw test scores, we saw all kinds of things, and that was a problem. The second problem related to content was that there wasn’t the concern there is now about the alignment of tests to curriculum, to standards, and so on.
So a lot of those activities were fun and games, but not really tackling important concepts and skills. There just wasn’t the attention to alignment that there is nowadays. Efficiency was another issue: back then, performance assessment was expensive and time-consuming. As you’ll see in the next presentation, that’s dealt with, because if assessment is part of instruction, if it’s built into instructional units, it’s efficient in that way.
Then there were lots of misconceptions, which probably had a significant impact on the status of performance assessment around the turn of the century. I’ll talk about those briefly and then end there.
Too time-consuming. The tasks were time-consuming, and that was a problem. But if they’re curriculum-embedded, these are the lessons people will be using anyway. They’re not additive, which answers the next concern: that performance assessment was an additional commitment disconnected from the required curriculum. That’s addressed, too, with the new CEPA systems.
Less reliable than multiple choice. I could talk for hours on this. It drives me crazy, because it’s just not true. I can get the same reliability with eight to ten constructed-response questions as with a 50-point multiple-choice test. That’s a fact, it’s easily demonstrated, and it’s in the technical manuals of all the state assessments that use both modes of assessment. So it’s a myth, and yet you hear that concern. A lot of it has to do with the fact that human scoring is not perfect. But when you guess the right answer on a multiple-choice question, you get a point for not knowing the answer. There are all kinds of reasons why multiple choice, item for item, is not as reliable as a more extended task in a non-selected-response format. So basically that’s just a myth.
Human scoring is subjective. The same thing applies. Human scoring doesn’t have to be perfect. It’s a question of how many of these tasks you have to get the reliability you need, and also how well they sample the domain. Sampling of the domain is another issue, and that’s why many states have mixed models: a performance component as well as a selected-response component. I mention these issues now because we’re at a critical time for this type of assessment. I think it could go either direction, because these are still issues people are going to bring up.
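Kahl’s point that reliability is a question of how many tasks you have can be made concrete with the Spearman-Brown prophecy formula, a standard classical-test-theory result that predicts the reliability of a score built from n parallel items as nρ / (1 + (n - 1)ρ), where ρ is the reliability of a single item. The Python sketch below is illustrative only. The single-item reliabilities (0.40 per constructed-response task, 0.10 per multiple-choice item) are hypothetical values chosen for the example, not figures from the webinar or any state technical manual, and the formula assumes the items are parallel, which real tests only approximate.

```python
from typing import Optional


def spearman_brown(rho_single: float, n: int) -> float:
    """Predicted reliability of a score built from n parallel items,
    each with single-item reliability rho_single (Spearman-Brown prophecy formula)."""
    return n * rho_single / (1 + (n - 1) * rho_single)


def items_needed(rho_single: float, target: float, max_items: int = 500) -> Optional[int]:
    """Smallest number of parallel items whose predicted reliability reaches target,
    or None if the target is not reached within max_items."""
    for n in range(1, max_items + 1):
        if spearman_brown(rho_single, n) >= target:
            return n
    return None


# Hypothetical single-item reliabilities, chosen only for illustration;
# they are not taken from the webinar or from any state technical manual.
CR_TASK = 0.40  # one extended constructed-response task (assumed)
MC_ITEM = 0.10  # one multiple-choice item; guessing adds noise (assumed)

if __name__ == "__main__":
    print(f"9 constructed-response tasks: {spearman_brown(CR_TASK, 9):.3f}")   # 0.857
    print(f"50 multiple-choice items:     {spearman_brown(MC_ITEM, 50):.3f}")  # 0.847
    print("Tasks needed to reach 0.85:", items_needed(CR_TASK, 0.85))          # 9
```

Under these assumed values, nine constructed-response tasks and fifty multiple-choice items land at similar predicted reliability (about 0.86 versus 0.85), which mirrors the shape of the comparison Kahl describes. If the per-task reliability is estimated with rater disagreement included, imperfect human scoring simply shows up as a lower per-task value, so it raises the number of tasks needed rather than ruling the format out; actual per-task figures would have to come from a program’s own technical data.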
Thank you. >> Thank you so much. It clearly seems that all of us, whether practitioners, district and state leaders, or policymakers, have a long way to go in really understanding assessment well. We just need to work on our own assessment literacy to do justice to the kinds of professional learning and tools that students need to achieve these very ambitious standards. So that’s something for all of us to take to heart: we need to educate ourselves about many of these issues, which do have a certain amount of complexity to them.
So, as I mentioned at the top, I’m really delighted, Lauren, that you’re with us today as the head of this very exciting pilot that Stuart has lauded. Before we start, we have a video that you were so gracious to provide. It will give us an opportunity to hear from those doing the frontline implementation about the pilot’s impact on their teaching practice, which is fundamental to all of this. We’re going to roll that now, and then we’ll come back to you, Lauren.
Lauren, thank you. What role has professional learning played in the pilot, and how has it affected instructional practice?
>> Thanks for having me. I appreciate being here. The Ohio Performance Assessment Pilot Project has come a long way. We started in 2008 with foundation money, working with Stanford on curriculum-embedded performance assessments. At that time, we were really just interested in learning whether performance assessment could work in Ohio and how it might pan out. Toward the end of that funding cycle, about the time Ohio was awarded its Race to the Top grant, we came up with a different model that we wanted to try. I’ll talk about that in a minute.
Before I get to the model, I’d like to tell you more about the pilot and where we are so far. One of my colleagues describes the project as a large-scale assessment in miniature. It has all of the elements of a large-scale assessment, with external review committees, a vendor that writes items, and experts in the department who review them. To give you an idea of the scope of the project, we’ve engaged teachers in grades 3 through 12. We’ve written tasks in math, English language arts, science, social studies, and career-tech pathways. We’ve piloted 140 learning tasks and 148 assessment tasks, and I’ll explain the difference between the two in a moment. We’ve trained 686 teachers to provide instruction using the project model and trained as many teachers to score these tasks. We’ve scored 23,438 student responses, always scoring each response at least twice. We’ve developed teacher training materials for in-person sessions and for electronic on-demand sessions. And we’ve documented the project using an outside connection and used a videographer to get the lovely video of the teachers. Generally, we’ve experimented with the aspects of a large-scale assessment system.
The model we’re piloting is called the Task Dyad Learning System, and there’s a schematic of it on this slide. “Dyad” doesn’t stand for anything; I get asked that a lot. A dyad is a pair of closely coupled tasks: an assessment task, used to measure performance against the content standards, and a learning task, used to provide a context in which the student learns the skills that will be assessed. The learning task comprises instruction, often direct instruction; student practice, often using the same software interface as the assessment; and the teaching practice of making observations and giving additional instruction or feedback to the whole class or to individual students as necessary.
The intent is to ensure that all students have had the opportunity to learn what will be assessed. The objective of the model is not to catch the student not knowing during the assessment, but to catch the student showing that he or she has the skills implied by the content standard and taught with the aid of the learning task. To reiterate, the relationship between the learning task and the assessment task is like the relationship between what you do with a driver’s permit and a driver’s license.
That’s one of my favorite analogies. You go to the Bureau of Motor Vehicles, the BMV, as we call it in Ohio, and you take a test on the rules of the road. Once you pass the test showing that you know the rules of the road, you’re given a permit with which you practice driving. If you have brave parents, they will go driving with you, and usually you will practice things that will not be on the driver’s test at all. In Ohio, the test is on a closed course. Nonetheless, our students don’t practice on a closed course; they practice on real roads. The reason is primarily to prepare them for the real application of that practice. The learning task and the assessment task have a similar relationship. In the learning task, students practice things that may not be on the assessment task but that will make them good drivers, so to speak. Then the assessment task assesses something that they have practiced.
So let me give you a little more information about the scope of the pilot, which is massive, to say the least. I’ll talk briefly about each component. So far we’ve successfully developed learning and assessment tasks. Our vendor, Measured Progress, did this at first. Now we have teachers writing the learning tasks and coaches writing the assessment tasks for our middle school pilot, with a pretty good level of success. We’ve piloted dyads with teachers in grades 3 through 12 in the four content areas and career-tech pathways. Teachers who have piloted have also scored their own assessment tasks in a double-blind situation, so they don’t know whether they’re scoring their own students’ work or someone else’s.
One of the initial features of the Gates-funded project was the use of teachers as readers. When teachers are taught to score, that experience sharpens a teacher’s understanding of the item, the scoring process, and the content standards all at once. Moreover, when teachers score students taught by others, they might see something in the students’ responses that they can use with their own students next session. Early on, scoring training and the scoring of student responses took so long that we projected it might take days of teacher time. We have worked to refine the assessment tasks and the rubrics to reduce scoring time, and the process now takes about one day to complete for any group of teachers. We’ve also field-tested dyads that were piloted and revised, with teachers who did the online training for implementation. Those field-test teachers also scored their own students’ work and the work of other students in a scoring event that took only one day to complete. We have developed a set of online professional learning modules to prepare teachers to implement the dyads in their classrooms, and we’ve modified these based on user feedback. We’re in our third trial this month.
We’ve also collected a lot of very interesting data from several experiments. We collected data that should tell us whether it matters if you teach a learning task or not. We’ve collected data to tell us whether it matters if you are trained to score face-to-face or online. We’ve collected data to tell us whether teachers are biased for or against their own students when they score. And we’re also comparing students’ performance on the dyads with their performance on the state assessments.
Finally, we’re scaling up. We have a model and a vendor who’s going to help us implement the scaled-up version of this model, and our plan is to be in full operational mode by spring 2015.
So, the lessons we’ve learned along the way: quite a few, actually.
The first lesson is not anything new. Face-to-face professional learning is clearly most effective, much more effective than online professional learning. Online learning can still be effective, even if it isn’t the most effective, provided it includes group work. We learned that after we gave the first group of teachers free rein to design their own implementation of the online learning modules: when they did not work together in groups, they ran into technical issues, and that derailed many teachers from completing the modules. Also, the online modules initially did not include many optional sections. We found it was important in online professional learning to give learners flexibility to choose what they need most and to differentiate, which is what we ask teachers to do all the time.
A second lesson, which is probably not a big surprise either, is that technology is a very big challenge. Besides teachers not being comfortable with technology, we noticed that teachers and technology specialists were not communicating very well with one another. And there is, of course, the limitation that a lot of K-12 institutions are dealing with: how to provide access for all students. They may be limited to a computer lab, or they may be intent on protecting students from being able to sort through information. Both of those create barriers, or interesting obstacles, that we’ve had to learn to overcome.
A third lesson is the light bulb moment. One thing I’ve noticed, and I haven’t figured out how to make it happen any faster, is that the teachers involved in the project really understand it only after they have scored their students’ responses to the assessment task. Up until that moment, no matter how many times I try to tell them the answers to their questions, they’re unwilling to hear it. One of my friends says that one of the failures is providing answers to questions that have not yet been asked, and I think that’s the problem. Until they see what happens with the student responses, they don’t ask questions like “What am I supposed to do with this learning task?” or “How does this task relate to the assessment task?” in a way where they’re willing to receive the answer. What I will say is that when teachers score student work, teaching improves quickly. The next time they do a learning task, students do significantly better because the teachers know how to teach it. They understand right away what’s important about the learning task.
The fourth lesson learned is that students love technology. Again, not a big surprise. But another issue is that teachers need some help learning how to use technology effectively with students. It’s not that teachers aren’t using technology; I think it’s that they’re using very simple tools that aren’t very effective in helping students learn. So we need to help teachers find these wonderful resources and use them effectively.
My last lesson learned is that curriculum-embedded performance assessment requires best practices to be effective. For optimal implementation, it’s really important that teachers are fully using all of the resources in their toolboxes as teachers to engage students. If they’re not using all of those resources, such as formative instructional techniques, feedback, and reengagement, the implementation will not be effective. Teachers need training that supports using these practices, and time to be reflective. We haven’t been giving teachers enough time to think. In this pilot we’ve provided teachers with a lot of that time, and in real life I think that’s something we need to think about changing.
Some challenges we ran into involved online delivery of the tasks. That’s a new challenge, and it involves many layers: not just teachers, but technology staff and district contacts, with information going back and forth. It’s just difficult; districts aren’t used to getting these types of information to the right people. That’s coming with all of the new online testing, so I think it’s good that we learned what the challenges are and how much we need to help districts with them.
Another issue is that teachers are reticent to teach without knowing all the answers, especially in high school. We’ve struggled with helping teachers figure out how to be comfortable in that uncomfortable situation. Also, keeping the tasks, goals, and rubrics aligned to one another is a challenge. Teachers who are writing the learning tasks now sometimes get great ideas for activities that don’t quite match up with their stated learning goal, so we end up with things that don’t quite mesh. I’ve had a lot of fun developing interactive activities to help teachers overcome some of these pitfalls that they keep running into.
Finally, under the model the state has adopted to continue this work, we’re going to provide these learning systems for the untested grades in science and social studies between grades 3 and 8. For science, Ohio tests in fifth and eighth grades; in the other years we’ll be offering the Task Dyad Learning System. Our model is that teachers will create learning task shells at summer institutes. Vendors will finish the learning tasks and create the associated assessment tasks. Then teachers, again, will be involved in scoring the assessment tasks, with moderators to train them. This is the model we’ve been using in the pilot, and there are a lot of benefits to having teachers involved at this level. I’ve seen it in other states, and I’d like to bring it to Ohio, so I’m excited about that aspect of it.
>> Thank you very much.
>> You’re welcome.
Haynes: >> I just wanted to ask a couple of questions. You made a decision early on to use an online platform, even though there are some challenges around using technology. Can you talk about why you decided to do that?
Jones >> Well, I saw that we were moving toward an online assessment system. At the time of the RFP, we hadn’t chosen which consortium we were going to go with. We were in Smarter Balanced, but both consortia were going to be offering online assessment systems. I felt that it made sense to try out some of the online delivery pieces of that, just to work out the level-one and level-two bugs. And we learned so much because of that. I’m really glad that we did it. Eventually everyone got on board with it, but there was a lot of skepticism that we would end up with a vendor who could provide all of that and an online delivery system. Thank you, Measured Progress.
>> Measured Progress does do this.
Haynes >> As we begin our panel discussion, I want to remind viewers that you can ask questions via the box below the video window. So Stuart, can you go through again how curriculum-embedded performance assessments can help teachers tackle two big problems now? One is increasing the rigor of what all students are asked to do, given these new expectations for what they need to know and do. The other is addressing gaps in students’ learning and performance.
Kahl >> Well, the first part of that, the rigor: that’s why we’re moving there. We know that the selected-response format has problems addressing some of the skills we’re most concerned about. That’s why we’re moving to performance assessment: to engage students in bigger types of problems, in activities set in real-world situations where they have to apply an awful lot of different skills, as opposed to everything in isolation. That’s number one. Number two, higher-order skills. That’s why we’ve moved to performance assessment. So the rigor is kind of there; that’s what these tasks are intended to do. The gaps, this is interesting. I do believe that it can do a lot to deal with the learning gaps, for two reasons. Number one, we talked about the curriculum-embedded performance assessments as being units. We saw from the examples that there are learning tasks and assessment tasks. The learning tasks are in many cases generating evidence of learning during instruction, and that’s formative assessment evidence gathering, the evidence-gathering step in the formative assessment process. And –
>> What does that allow you to do? Once you have that evidence, once –
>> Once you have the evidence, the other steps in the process are basically providing feedback, rich, effective, descriptive feedback, not “six right out of eight,” okay? And also adjusting instruction, adjusting learning activities and so on, to fill or close the learning gaps. Now when you talk about learning and performance gaps between different populations and so on, the research shows clearly that effective formative assessment, which is part of what we’re talking about here, has the biggest benefit for students who are disadvantaged or struggling. And I might also add that when you talk about formative assessment, you talk about assessment for learning, and a big part of the rationale for that has to do with motivation to learn. I think there’s no question that when kids engage in activities for formative purposes, which may not be graded, their focus shifts. They’re motivated to learn, as opposed to getting the grade and being satisfied with 80% correct. To me, that’s the other aspect of it.
>> And maybe you can pick up on this same question, and also talk about grading in relation to these formative tasks.
Jones >> Yeah. For the learning tasks, we deliberately made it impossible for teachers to assign grades. We wanted to make sure that they weren’t going to inadvertently start using them for some other kind of summative grading system. It was a challenge; the teachers didn’t like that at first. But they began to see the results quickly, as you heard in the video that we showed at the beginning. Many of them, after trying it, recognized the benefits. That was a very positive thing. Another piece about the learning tasks is that our learning tasks in the task dyad learning system include an extensive teacher’s guide that gets into formative instructional techniques that would be appropriate for addressing misconceptions. We used Ohio’s model curriculum for the Common Core standards and our own state standards in science and social studies to help identify areas that might be complicated to teach, and strategies to get around the misconceptions that students often have.
>> Thank you. I want to look at some of the skills and practices teachers need to have as part of this formative process. You mentioned one of them; maybe you can elaborate: reengagement. What is that about? What does that mean?
>> Reengagement is a process of reteaching, but the idea is to involve the entire class, even the students who really got it, in the process of reteaching. Studies have shown that this type of reengagement can really help all the learners learn. What it does fundamentally for performance assessment is help all of the learners develop a common understanding of what the expectation of performance is. It’s sometimes difficult for students to use a rubric or a question to understand what the teacher is expecting of them. Sometimes providing examples is helpful. Reengagement is the process of taking actual student work, showing it to the class (anonymously, of course), deconstructing some of the responses, and asking: This response is good; why is it good? What’s not good about this response? Every response has good and bad things about it. A great YouTube video about that is “My Favorite No.”
>> I’ll check that out.
>> When you were talking about motivation and engagement, those kinds of processes sort of reengage students, which I guess is the point of the terminology that’s used. But it takes students to another place in terms of their learning.
Jones >> Exactly. Listen, there’s another aspect of it, too: formative assessment is a process that has multiple steps involved, and part of it is the use of other students as resources. To me, when we talk about reform, we talk about the way teachers and students spend their time. And time keeps coming up. I keep hearing, “I don’t have time for the assessment and still teach.” You’ve got to change that mode of instruction where the teacher is the source of all information presented to the students, then you test and move on. The idea is that students on their own can do a lot to help each other. In some ways that should free up a teacher’s time to be a facilitator, to be moderating things, to be going from group to group or whatever, and doing a variety of things in terms of gathering evidence. To me, that’s another skill, almost a management skill –
>> Changing the dynamic so that the learners are much more active in accessing their peers as resources and –
>> By the way, making the change is not easy. This is a culture shift, a mindset shift or something. The first time you try some of this, it may not work so well. The first time you don’t grade something for the kids, you might find –
>> It makes you nervous.
>> – that it doesn’t work, and you quit it. You know. That’s not the issue. You want the kids to know: “This isn’t graded, but I know when it is graded. If I can do this now, I’ll be fine then.” So that’s a critical part of formative assessment, too.
Haynes >> One challenge is for teachers to feel comfortable when they don’t have all the answers, or when they don’t know exactly where a task will take them, and so forth. Can you say more about that?
>> Yeah. I think it’s especially difficult in the high school grades, where teachers are expected to be content experts. So asking a question that you don’t know the answer to is hard. A lot of times I try to get teachers to start out by pretending they don’t know the answer for a little bit. Sometimes that’s more comfortable for them. The point is that what students need to learn is that there won’t always be somebody with an answer to a question that you have to ask. And providing them the opportunity to have an experience where there is no one in the room who has “the answer” is really important.
Haynes >> We have a question from a viewer about where they can see examples of these tasks.
Jones >> There’s a link on the OPAPP website. Go to www.education.ohio.gov/opapp; at the bottom of that page, there is a link to a resource called iLearnOhio, with a username and password that are open to anyone to use. I don’t remember them off the top of my head, but you can get in and see four example tasks. They’re learning tasks only: there’s one for high school biology, one for Algebra I, one for high school social studies (I believe a history task), and a 10th-grade-level task.
Haynes >> A question from Sunny from Denver, who asks: Were any state policies adopted to enable this process, or is there consideration that they’re going to have to generate the policy framework for any of this work?
>> Right. So the pilot was just a part of the Race to the Top grant; that’s what enabled the pilot to happen. Moving forward, the state model that we’ve adopted is a part of our non-summative assessment plan. At this point in time, I’m not aware of any policies that have been developed to establish whether teachers will have to use them or not. The fact that we’re offering them is unique. Not many states are offering their teachers formative assessments that mimic what the summative assessments will look like.
>> Some of the questions we’re getting are about how you’re going to integrate the performance assessments with the accountability system.
>> The learning systems are going to be outside of the summative system. They won’t be part of accountability at this point in time, at least not at the federal level. We are offering, though, end-of-year tests. We are a PARCC state, so our English and mathematics exams will be PARCC assessments, which include a performance-based assessment and an end-of-year assessment that will be combined into one score, and we’re mimicking those assessments for science and social studies.
>> A lot of overlap.
>> A lot of overlap, yes.
Haynes >> So there are a couple of questions about costs. Taylor from D.C. and Circy from Iowa asked about the estimated costs of the pilot and what that might look like going forward as you try to scale.
>> We did a cost estimate, and I was surprised that it was not very high per student, mainly because our model is not to have hundreds of these performance assessments. It costs a lot to develop them. But per student, and Ohio has about 1.8 million students, the cost is pretty low: less than a dollar per student. I believe it’s more like 25 cents per student –
>> Really? There’s a perception that this is very expensive to develop.
>> It can be if you want to have ten every year, yeah; that’s more expensive. But if we’re looking at one dyad each year for the full grade level, it’s not that expensive counted per student.
>> A lot of the work thus far that was funded through Race to the Top is something you’ll have going forward, so those development costs will come down, in theory.
>> Yeah. What we’ve created from the pilot are more like prototypes of what will be used in the non-summative assessment system. How the state chooses to use them moving forward has not yet been indicated to me. We’ve not released anything that will be used in the summative sense, whether for local summative purposes or predictive purposes; I don’t know what they want to use it for. We haven’t put out any of the assessment tasks. We’ve only put out learning tasks so far. More will be released by the end of the semester, I’m sure.
Haynes >> Terrific. Do you have thoughts about this, as you work with other states, in terms of things they wrestle with around all of this?
Kahl >> I do. But you’re not going to be able to force me to name a dollar figure. There are so many variables that putting a price tag on it is very difficult. It is true that there are an awful lot of efficiencies in what we’re talking about. What might have been expensive years ago may not be expensive now. The costly part of large-scale performance assessment in the past was gathering all the work and scoring it, that kind of stuff. If the teachers are doing the scoring, that’s a savings. The scores have got to be moderated, so there is a cost of auditing scores and so on, but that’s done on a smaller scale, and there are good techniques that we know for doing it. We’ve also got submission of the work happening electronically. So there’s a lot of savings nowadays. To me, it’s really worth the expenditure, whatever it might be. But then there’s the development of tasks, with teachers developing them, and maintaining that quality. We think tasks should go through the same review that multiple-choice items have gone through for years: bias and sensitivity, alignment, all kinds of issues like that have to be addressed in those reviews. That should be done for the performance tasks as well. You’ve got a bunch of committee work, and vendor work in terms of the logistics of all of this. But to me, the development of these things should be ongoing, with teachers submitting, reviewing, revising, and posting, that kind of thing. And so, yeah, you could probably put a price tag on any combination of all these factors and add it up. But I don’t think it’s extraordinary.
>> And I would say that, developmentally, the only thing that is different for the learning tasks compared with a standard assessment item is that the learning tasks do need to be piloted in front of students.
One of the things we’ve noticed is that our first attempt at writing a learning task, and teachers’ first attempt at writing a learning task, doesn’t always result in the types of responses that you’d expect. I think pretty much anyone who’s worked on scoring constructed responses at the state level would understand this. You cannot possibly anticipate every response a student will give. Some of them will just be a big surprise, and then you have to rethink: How did I frame this question? How did I set up this particular task so that students could respond? So I would say piloting is one extra piece that has to be added in.
>> Okay. And as you’ve mentioned, the act of scoring provides teachers with some of the most powerful professional learning in all of this. You mentioned a couple of times that going through and scoring work is almost like an aha moment for them. Is it possible to use technology to increase reliability, or the ability to practice and score? Are you using technology for those purposes?
Jones >> Yeah, we are. We’ve tried both a face-to-face scoring situation and a remote scoring situation, because we felt that scaling a face-to-face scoring situation to every teacher in the state was not really realistic. So we tried a distributed scoring model. Measured Progress worked with us to develop training modules for the teachers to be trained to score the assessment tasks, and we think it’s been very successful. One limitation is that it was a little bit hard to include validity papers and large practice sets. So moving forward, I would say we probably need to increase the scoring materials that we’re creating so teachers can practice. We’ve seen pretty good reliability in the way teachers are scoring.
Haynes >> Are you seeing that in terms of reliability of –
>> In general, when it comes to scoring constructed responses or extended constructed responses, most of the companies now have systems that are distributed at some level, and their training is often online. The constant monitoring of scoring consistency, the readings, looking at the discrepancies and dealing with them when scores differ by more than one point: there are systems for that which can be applied, I think, to the products that come out of performance assessment, particularly if they’re submitted electronically and people can look at them, yeah.
>> That sounds sophisticated, for all kinds of reasons. In terms of, like you said, the level of expertise since the ’90s in using technology to really refine some of these scoring methodologies.
>> In the early half of the ’90s we were taking live papers and sorting them out into a dozen boxes, and you had a dozen tables –
>> I remember those days.
>> I mean –
>> Yes.
>> That’s long gone.
Haynes >> Okay. We have a question from Randy from Columbus, in your neck of the woods, about the school schedule. How is the school schedule set up for this? Is there time to collaborate on how teachers are using or scoring these tasks?
Jones >> Yeah, that’s a good question. With the pilot, we were able to provide funding to the schools so that they could get substitutes and get time off for the teachers. But we also had an agreement with every school that participated in the pilot that it would provide at least one hour of collaborative work time a week for the teachers involved in the pilot. Most of the schools already had something like that built in. They already had team meetings that would happen on a weekly basis, or twice weekly for a half-hour each. As long as they had an hour a week where they were able to work together, we were pleased with that as a minimum requirement. So I don’t think it had a big impact on what was already happening in most of the participating schools. Our professional development load in the first semester was very high for our face-to-face cohorts, though. They had eight days out of class, which was stressful for a lot of them.
>> But the teachers, how were they recruited into the pilot?
Jones >> We had an open application process. For the first two cohorts, it was open to Race to the Top districts only. For the field test cohorts, we opened it to all the districts in the entire state. And we selected them based on their responses to the questions in the application and their readiness to participate.
>> Did you get a pretty good response?
>> We did. The first one was competitive. We had about 21 districts apply, and we could take only ten. So it’s –
>> Great response.
Haynes >> We have a lot of questions about what’s available. Folks want these resources. Are the tests available? Is there anything we can share with viewers in terms of how they may be able to see these? You mentioned one.
>> The four tasks are available on the iLearnOhio website. We are moving toward getting the training modules available. Right now we’re in the last round of piloting. For the month of January, they’re live, and they’re being used by a group of teachers. Perhaps later this semester, by March, we’ll have a home for the training modules. It may be the same as the home that we have right now.
>> Can they go to the OPAPP site?
>> Yes, definitely check the OPAPP site. I’m sure once we get a home for the PD modules, I will post where they are and get access for anyone who wants to find them.
Haynes >> How has this impacted the role of leaders in schools? Do they have to be integrally involved in this? What is their role in making this successful within their school?
Jones >> I think that in most of the elementary schools that have participated, the leaders, the principals, have attended a lot of the professional development. I think that’s very important for the teachers to get that kind of support. In the high schools, I’ve seen a lot of that support come more from the curriculum director or someone else from the district office. Getting support in terms of being given time, and not having other duties piled on top of them in addition to this, has been important for the teachers in the project.
>> Excuse me. I mentioned the Center for Collaborative Education in Boston and their QPA, Quality Performance Assessment, program. They have an implementation guide about what it takes to implement this kind of program over a multi-year period. So it’s not something you can do overnight, going from A to B just like that. It’s very challenging. And a lot of that has to do with what the roles and responsibilities of the leaders and the teachers and so on are. So that’s a –
>> There’s a lot out there, isn’t there? There’s much work that has been going on for many years now that can come to the fore, and the common standards make it accessible and transferable to other places.
>> I think it’s a lot easier sometimes to do something at the school or district level than it is statewide.
>> That is true.
Haynes >> So I think we’re just about out of time. Any final comments? I applaud you for this marvelous enterprise. I think we are all very eager to hear how this goes, and again, viewers will be looking for resources that may be available. We will post a few things where the archived video will be housed in the next day or so. But again, I really want to thank you, Lauren, and Stuart very much for spending time with us to share this exciting work. And thanks to our viewers for joining us today. Have a great day. Thank you.