NB: Everything said of PARCC here applies, as well, to all the standardized tests in English language arts currently being used by state departments of education. In other words, PARCC is singled out, here, as an example of a general phenomenon.
The Common Core Curriculum Commissariat College and Career Ready Assessment Program (CCCCCCRAP) foisted unlawfully on the states by the federal Department of Miseducation needs to be scrapped. Here are a few of the reasons why:
The CCSS ELA exams are invalid and unreliable.
First, much of attainment in ELA consists in world knowledge (knowledge of what: the stuff of declarative memories of subject matter). Here are some examples, to illustrate what I mean:
Tolstoy wrote War and Peace; naturalist writers show people being determined by forces outside their control; Vonnegut was a prisoner of war in Dresden during the bombing; in spoken language, variation in pitch produces melody, and variation in volume, or stress, produces rhythm; “Why the Sea Is Salt” is a pourquoi tale; courtly love is a convention of medieval writing; the difference between a fable and a parable is whether the characters are animal or human; Boo Radley saves the lives of Jem and Scout; the couplet at the end of an Elizabethan sonnet often summarizes; Shakespeare’s As You Like It and Erving Goffman’s The Presentation of Self in Everyday Life both employ the metaphor of the world as a theatre and life as performance; Yeats identified his beloved Maud Gonne with Helen of Troy; Dylan Thomas’s “Do Not Go Gentle into That Good Night” is a villanelle; a reductio assumes the truth of a statement and then shows that it leads to a contradiction; American Puritan writers favored local governance; a pleonasm is a redundant phrase like “I’m reading words.”
The “standards” being tested cover almost no world knowledge and so the tests based on those standards miss much of what constitutes attainment in this subject. Imagine a test of biology that left out almost all world knowledge about biology and covered only biology “skills” like—I don’t know—slide-staining ability—and you’ll get what I mean here. Knowledge matters. This has been a problem with all of these summative standardized tests in ELA since their inception. Based as they are on a vaguely worded list of “skills,” they are almost entirely content free.
Second, much of attainment in ELA consists in procedural knowledge (knowledge of how: the stuff of procedural memories of subject matter). However, the “standards” being tested define skills so vaguely and so generally that they cannot be validly operationalized for testing purposes as written. Consider this gem of a “standard,” repeated at every level in the CC$$: “Cite strong and thorough textual evidence to support analysis of what the text says explicitly as well as inferences drawn from the text.” What kinds of inference are we talking about here? What constitutes “strong evidence,” given that different kinds of inference require evidence of different kinds? There are three main types of inference: induction, abduction, and deduction. There are entire sciences devoted to varieties of inference within each. There are, for example, several varieties of logic (propositional, predicate, modal, deontic, and so on); probability, hypothesis testing, and inferential statistics generally; and competing schools of thought and whole disciplines devoted to what constitutes evidence in literary studies. The “standard” is so vague that, in fact, one cannot reliably and validly measure whether a student has met it. That this is so is clearly illustrated by the fact that I can design a question on this “standard” that a head of lettuce could answer or one that would engender heated debate among a roomful of PhD literary critics. And certainly, a couple of questions on some high-stakes standardized test are not going to determine whether one has “mastered” a “standard” as vague and vast as this one is. The “standards,” as written, are not specific enough about the procedures a student has learned to be at all testable. A testable standard would have to look more like this: “Write a fable of fewer than 250 words that is appropriate for children and contains the following elements of a fable: animals, a conflict, and a moral.”
Note that I am NOT suggesting that we should replace Coleman’s childish bullet list with standards like that one. I am simply pointing out that his list, as written, is not validly or reliably testable in the way that whether a child knows the multiplication table for single-digit natural numbers is testable.
Clearly, the people who hacked together these “standards” didn’t think carefully about what they were doing or what they were saying or whether these “standards” were, in fact, testable, even though they knew that these “standards” would be made the subject of tests. That’s how thoughtless, how heedless, the creators of these “standards” were.
Third, nothing that students do on these exams EVEN REMOTELY resembles reading and writing as it is actually done in the real world. The test consists largely of what I call New Criticism Lite, or New Criticism for Dummies—inane exercises on identification of examples of literary elements that for the most part skip over entirely what is being communicated in the piece of writing. In other words, these are tests of literature that for the most part skip over the literature, tests of the reading of informative texts that for the most part skip over the content of those texts. Since what is done on these tests does not resemble, even remotely, what actual readers and writers do in the real world when they actually read and write, the tests, ipso facto, cannot be valid tests of real reading and writing. For more on this subject, see the following: https://bobshepherdonline.wordpress.com/2014/04/10/on-developing-curricula-in-the-age-of-the-thought-police/.
Fourth, standard practice in standardized test development requires that the testing instrument be validated. Such validation requires that the test maker show that the test correlates strongly with other accepted measures of what is being tested, both generally and specifically (that is, with regard to specific materials and/or skills being tested). No such validation was done for these tests. NONE. And as they are written, based on the standards they are based upon, none COULD BE done. Where is the independent measure of proficiency in CCSS.Literacy.ELA.11-12.4b against which the PARCC items that are supposed to measure that standard have been validated? Answer: There is no such measure. None. So PARCC has not been validated against one, obviously. LOL. So the tests fail to meet a minimal standard for a high-stakes standardized assessment: that they have been independently validated.
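For readers who want to see what "correlates strongly with other accepted measures" amounts to in practice, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two score lists are invented, the function name pearson_r is my own, and a real validation study would involve far more than a single correlation coefficient. The point is only that the calculation requires a second, independent measure to exist in the first place.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores in each list must vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: the same eight students' scores on the new test and on
# an independent, already-accepted measure of the same skill.
new_test = [12, 15, 9, 20, 17, 11, 14, 18]
accepted_measure = [55, 61, 48, 79, 70, 50, 60, 74]

r = pearson_r(new_test, accepted_measure)
print(f"correlation with the accepted measure: {r:.2f}")
```

Without the second list, there is nothing to pass to the function, which is the situation described above for the individual standards.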
The test formats are inappropriate.
First, the tests consist largely of objective-format items (multiple-choice and evidence-based selected response, or EBSR). These item types are most appropriate for testing low-level skills (e.g., recall of discrete factual detail). However, on these tests, such item formats are pressed into a kind of service for which they are, generally, not appropriate: they are used to test “higher-order thinking.” The test questions therefore tend to be tricky and convoluted. The test makers, these days, all insist that every answer choice be plausible. Well, what does plausible mean? At a minimum, it means “reasonable.” So the questions are supposed to deal with higher-order thinking, and the wrong answers are all supposed to be reasonable, and the test questions end up being extraordinarily complex, confusing, and tricky, all because the “experts” who designed these tests didn’t understand the most basic stuff about creating assessments, for example, that objective question formats are generally not great for testing higher-order thinking. For many of the released sample questions, arguably, no answer choice is correct, or more than one is correct, or the question simply is not answerable as written.
Second, at the early grades, the tests end up being as much a test of keyboarding skills as of attainment in ELA. The online testing format is entirely inappropriate for most third graders.
The tests are diagnostically and instructionally useless.
Many kinds of assessment—diagnostic assessment, formative assessment, performative assessment, some classroom summative assessment—have instructional value. They can be used to inform instruction and/or are themselves instructive. The results of these tests are not broken down in any way that is of diagnostic or instructional use. Teachers and students cannot even see the tests to find out what students got wrong on them and why. So the tests are of no diagnostic or instructional value. None. None whatsoever.
The tests have enormous incurred costs and opportunity costs.
First, they steal away valuable instructional time. Administrators at many schools now report that they spend as much as a third of the school year preparing students to take these tests. That time includes the actual time spent taking the tests, the time spent taking pretests and benchmark tests and other practice tests, the time spent on test prep materials, the time spent doing exercises and activities in textbooks and online materials that have been modeled on the test questions in order to prepare kids to answer questions of those kinds, and the time spent on reporting, data analysis, data chats, proctoring, and other test housekeeping.
Second, they have enormous cost in dollars. In 2010-11, the US spent $1.7 billion on state standardized testing alone. Under CC$$, this increases. The PARCC contract by itself is worth over a billion dollars to Pearson in the first three years, and you have to add to that the cost of SBAC and the other state tests (another billion and a half?). No one, to my knowledge, has accurately estimated the cost of the computer upgrades that will be necessary for online testing of every child, but those costs probably run to $50 or $60 billion. This is money that could be spent on stuff that matters: on making sure that poor kids have eye exams and warm clothes and food in their bellies, on making sure that libraries are open and that schools have nurses on duty to keep kids from dying. How many dead kids is all this testing worth, given that it is, again, of no instructional value? IF THE ANSWER TO THAT IS NOT OBVIOUS TO YOU, YOU SHOULD NOT BE ALLOWED ANYWHERE NEAR A SCHOOL OR AN EDUCATIONAL POLICY-MAKING DESK.
The tests distort curricula and pedagogy.
The tests drive how and what people teach, and they drive much of what is created by curriculum developers. Every curriculum developer in ELA in the country now begins every project with a spreadsheet containing Lord Coleman’s puerile bullet list of “standards” in one column and, in the next column over, where these “standards” are “covered” by the project. Imagine a unit on the Civil War that concentrated mostly on matters like the relative sizes of Union and Rebel cannonballs. The distortion of curricula is a vast subject, so I won’t go into it in this brief note. Suffice it to say that the distortions are grave. In U.S. curriculum development today, the tail is wagging the dog. As a result of the new “standards” and the high-stakes tests on them, much of ELA instruction in the US has been reduced to trivial exercises on applying these “standards” to random, trivial snippets of text. This is obscene.
The tests are abusive and demotivating.
Our prime directive as educators is to nurture intrinsic motivation—to create independent, life-long learners. The tests create climates of anxiety and fear. Both science and common sense teach that extrinsic punishment and reward systems like this testing system are highly DEMOTIVATING for cognitive tasks. The summative standardized testing system is a really, really backward extrinsic punishment and reward approach to motivation. It reminds me of the line from the alphabet in the Puritan New England Primer, the first textbook published on these shores:
The idle Fool
Is whip’t in school.
The tests have shown no positive results.
We have had almost two decades, now, of standards-and-testing-based accountability under NCLB. We have seen only minuscule increases in outcomes, and those are well within the margin of error of the calculations. Simply from the Hawthorne Effect, we should have seen SOME improvement! That we have not suggests that the testing has actually DECREASED OUTCOMES, which is consistent with what we know about the demotivational effects of extrinsic punishment and reward systems. It’s the height of stupidity to look at a clearly failed approach and to say, “Gee, we should do a lot more of that.”
The tests not only do not decrease but actually worsen achievement and gender gaps.
Both the achievement and gender gaps in educational performance are largely due to motivational issues, and these tests and the curricula and pedagogical strategies tied to them are extremely demotivating. They create new expectations and new hurdles that widen existing gaps instead of closing them. Ten percent fewer boys than girls, BTW, received a proficient score on the NY CCSS exams, and this at a time when 60 percent of kids in college and three-fifths of people in MA programs are female. The CCSS exams drive more regimentation and standardization of curricula, which will further turn off kids already turned off by school, causing more to tune out and drop out.
How do you prevent another mugging by standardized test of your kid? Have him or her opt out of the test. It’s your duty, as a parent, to ensure that your child is not subjected to abuse, and that is what the high-stakes standardized tests are: abuse.
Even if the tests were valid and reliable, the results from them, as reported, cannot be trusted.
High-stakes testing encourages gaming the system. State departments of education routinely do this, setting cut scores for proficiency levels wherever they wish to in order to make whatever point they want to make this carnival season. If they are in the process of introducing a new test and want to show that their draconian new accountability system is needed, then they set the cut scores high. If they’ve had their new test in place for a while, they set the cut scores lower so that they can show improvement. And sickeningly, journalists and politicians fall for this. Many school systems around the country have been caught changing student answer sheets. But the most rampant form of cheating takes the form of drilling kids, over and over and over, on sample test questions because simple familiarity with test question formats improves scores. Scores, not learning.
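The arithmetic of the cut-score game is simple enough to sketch in a few lines of Python. The scores, the two cut scores, and the function name percent_proficient below are all made up for illustration; they are not taken from any actual state test. What the sketch shows is that the same students, with the same answers, yield very different "proficiency" rates depending solely on where the line is drawn.

```python
def percent_proficient(scores, cut_score):
    """Percentage of students (0-100) scoring at or above the cut score."""
    if not scores:
        raise ValueError("scores must not be empty")
    passing = sum(1 for s in scores if s >= cut_score)
    return 100.0 * passing / len(scores)


# Hypothetical scale scores for ten students. Nothing about the students
# or their answers changes between the two calculations below.
scores = [610, 640, 655, 670, 690, 705, 720, 735, 750, 780]

# A high cut score when the goal is to show that a new system is needed...
high_cut = percent_proficient(scores, 725)
# ...and a lower one when the goal is to show improvement.
low_cut = percent_proficient(scores, 660)

print(f"cut score 725: {high_cut:.0f}% proficient")
print(f"cut score 660: {low_cut:.0f}% proficient")
```

With these invented numbers, moving the cut score from 725 to 660 raises the reported proficiency rate from 30 percent to 70 percent without any student learning anything.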
This message not brought to you by
PARCC: Spell that backward
MAP to nowhere
Scholastic Common Core Achievement Test (SCCAT)
The Bill and Melinda Gates Foundation (“All your base are belong to us”)
For more pieces by Bob Shepherd on the topic of Education “Reform,” go here: https://bobshepherdonline.wordpress.com/category/ed-reform/
For more pieces on the teaching of literature and writing, go here: https://bobshepherdonline.wordpress.com/category/teaching-literature-and-writing/