Killing the Messenger: Attacks on the SAT

Steven Dutch, Natural and Applied Sciences, University of Wisconsin - Green Bay

It would be a serious mistake to think corporations like Big Tobacco have a monopoly on socially irresponsible denial games. In the past few decades, there has been widespread concern over the decline in American educational standards. One of the major indicators of educational decline has been the steady fall of scores on college entrance examinations such as the SAT, or Scholastic Aptitude Test. For many years, though, the response of many educators to the falling scores has been flat denial that the decline had any significance. The denial games continue to the present day.

National score data for the SAT are first available for 1952. Between then and 1963, SAT scores held constant or even increased, despite the fact that the proportion of high-school students taking the SAT rose from 7% in 1952 to 30% in 1963, and thus that many less-qualified students were taking the test. It seems reasonable to conclude that the quality of American education held steady or even increased a bit during those years. In 1964, scores declined, and by 1970, national average scores on the verbal aptitude portion of the SAT had fallen from 478 out of a possible 800 to 460; mathematical aptitude scores fell from 502 to 488. When millions of people are taking the test, even a small variation in the average can be significant. By 1977, verbal scores were down to 429, math scores to 470. By 1981, scores had declined for 19 consecutive years; verbal scores had fallen a total of 54 points to 424, math scores had fallen 36 points to 466. In 1982, for the first time in two decades, scores rose: math by one point, verbal by two.
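The totals quoted above are easy to check. The following is only a sketch that re-does the subtraction using the averages given in this paragraph; the label "peak" for the pre-decline figures and the names `VERBAL`, `MATH` and `total_decline` are my own shorthand, not anything published by ETS.

```python
# National average SAT scores as quoted in the paragraph above.
# "peak" is an assumed label for the pre-decline averages (478 and 502).
VERBAL = [("peak", 478), ("1970", 460), ("1977", 429), ("1981", 424)]
MATH = [("peak", 502), ("1970", 488), ("1977", 470), ("1981", 466)]


def total_decline(series):
    """Return the drop from the first listed average to the last."""
    first_score = series[0][1]
    last_score = series[-1][1]
    return first_score - last_score


def always_falling(series):
    """Return True if every listed average is lower than the one before it."""
    scores = [score for _, score in series]
    return all(later < earlier for earlier, later in zip(scores, scores[1:]))


if __name__ == "__main__":
    print("verbal decline:", total_decline(VERBAL))
    print("math decline:", total_decline(MATH))
    print("verbal always falling:", always_falling(VERBAL))
    print("math always falling:", always_falling(MATH))
```

The computed drops agree with the 54-point and 36-point figures cited in the text.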

If the drop in test scores had lasted only a few years or amounted to only a few points, we might be justified in writing the shift off as a statistical fluke. Denying a 19-year decline is the educational equivalent of denying that smoking causes lung cancer. After critics of the SAT charged that the decline in scores up to 1970 had no significance but merely reflected drift in the grading standards, the Educational Testing Service (ETS), the author of the SAT, conducted a survey. They found there had indeed been a drift, but in the direction of leniency, and the actual decline had been ten points greater than scores indicated! Another explanation that had been advanced was that the influx of disadvantaged minorities beginning in the mid-1960's tended to lower test scores because less-skilled students who would not otherwise have taken the SAT were taking it. Most analysts of the decline in test scores have concluded that this effect was strongest in the late 1960's and early 1970's and may account for up to half of the decline. In recent years, minority test scores, though still lower than white scores, have increased. Black math scores rose 12 points and verbal scores 9 points between 1976 and 1982, while white scores dropped ten points in math and seven points in verbal skills in the same period! After we allow for all the demographic effects, we are still left with almost a full decade during which many educational theorists either denied there was a problem or, worse yet, attacked the tests.

The "Deep Thinker" Fallacy

One of the commonest attacks on testing methods might be called the "deep thinker" fallacy. Banesh Hoffmann, a major critic of testing, in his book The Tyranny of Testing presented the following example of a true-false question that is true on one level but false on another: George Washington was born on February 22, 1732 -- True/False. A mediocre thinker might immediately answer true, but a deeper thinker, knowing that Britain adopted the Gregorian Calendar after Washington was born, and that Washington's original birthdate was February 11, might well suffer some confusion.

If only it were true that this was a common problem! There would be no need for me to have pages on pseudoscience! A deep thinker might well lose an occasional point because of such a question, but that loss is more than made up by the extra points gained by being able to answer more difficult questions that others miss. The student is not taking the test in a vacuum, either; he or she knows why the test is being given. If the questions are generally simple, the answer is likely to be "true"; in a course on astronomy or the history of science the expected answer might well be "false", but the student who is a really deep thinker (and paying attention in class) should have little trouble telling the two situations apart.

One example of a test question on the SAT that was successfully challenged involved a circle rolling around the circumference of another circle three times larger in diameter. The question: how many revolutions does the smaller circle make? Before going on, answer the question. Then answer these: what happens if the stationary coin is twice as large as the rolling coin? The same diameter? Half the diameter? Now explain why.

The expected answer on the SAT was three, but an outside observer would actually see four: three from the rolling and one more from travelling around the circle. The question is exactly like asking whether the Moon rotates; as seen from the Earth, no, but as seen from anyplace else, yes -- once every month as it travels around the Earth. Thousands of students picked up a few points when ETS allowed four as a possible answer, but there is not the slightest reason to think that more than a tiny fraction of them could reason out correctly why four might be a correct answer. To get credit, they should have been asked to explain why four was a correct answer. (Incidentally, the answers to the other questions above are: relative to the stationary coin, 2, 1 and 1/2, and relative to an outside observer, 3, 2, and 1-1/2.)
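The rule behind these answers is short enough to write down. The sketch below assumes, as the question does, that the small circle rolls without slipping around the outside of the stationary one; the function name `revolutions` is mine. The count relative to the stationary circle is just the ratio of the diameters, and the outside observer sees one extra turn for the trip around.

```python
from fractions import Fraction


def revolutions(fixed_diameter, rolling_diameter):
    """Turns made by a circle rolling once around the outside of a fixed circle.

    Returns a pair: (turns relative to the fixed circle,
                     turns seen by an outside observer).
    Assumes rolling without slipping.
    """
    ratio = Fraction(fixed_diameter) / Fraction(rolling_diameter)
    # One lap around the fixed circle adds exactly one turn for an outsider.
    return ratio, ratio + 1


if __name__ == "__main__":
    # Diameter pairs: the SAT case, then the three follow-up questions.
    cases = [(3, 1), (2, 1), (1, 1), (1, 2)]
    for fixed, rolling in cases:
        relative, outside = revolutions(fixed, rolling)
        print(f"fixed={fixed} rolling={rolling}: "
              f"relative={relative} outside={outside}")
```

For the SAT case the function gives three and four, and for the follow-up questions it gives the same pairs listed in the parenthesis above.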

The "deep thinker" fallacy is simply a variation on the "Galileo fallacy" so widespread among cranks: they persecuted Galileo and he was right. They persecute me, therefore I am also right. In this context, a brilliant student got the rolling-circle question wrong. I also got the question wrong, therefore, I must be brilliant. A very widespread version involving Einstein goes that Einstein didn't like school and was considered dull by his teachers, therefore any student who doesn't like school and is considered dull by his teachers is a potential Einstein.

The Nader Games

Few people have been more critical of corporate denial games than Ralph Nader and his followers, but a Nader Group study, The Reign of ETS, by Allan Nairn and Associates, is one of the classic attacks on testing. The study attacks the SAT and similar tests for not testing relevant skills, for being useless as a predictor of future performance, and for being primarily a measure of social class.

The charge that test scores do not predict future performance is based on the fact that test scores have only a weak correlation with first-year college grades and even lower correlation with later grades and lifetime earnings. Would the group accept college grades and lifetime earnings as measures of competence in some other context? Of course not! College students major in a wide variety of subjects, so that a student with poor math or verbal skills can all too easily find ways to avoid subjects that tax those skills and end up with a high, if meaningless, grade-point average. It is also a sad fact that a semi-literate athlete or entertainer earns more than a college professor. Criticizing tests because they fail to agree with measures that measure nothing is bizarre methodology. The real questions should be, how do students with poor math scores on the SAT do in situations that require mathematical skills? How well do students with poor verbal scores on the SAT do when confronted with the need to read complex literature or write something of their own? And really, to deal squarely with the issue of unfairness, the question is how often do students with poor SAT scores perform at a high level in college, without requiring remedial education?

The best way to consider whether the tests measure relevant skills is to look at two actual SAT questions.

(1) Pick the word that is most nearly opposite:
MITIGATE: A. Solidify B. Humiliate C. Deviate D. Intensify
(2) A family drove 116 miles in four hours. At that rate, how many hours would it take them to drive 203 miles?
(1) 5-1/2 (2) 6 (3) 6-1/2 (4) 7 (5) 7-1/2

Despite all the inherent problems with testing (and they are many), these are simply not demanding questions! A person who has difficulty answering these questions can hardly be called literate. Saying that these questions do not test relevant skills is a bit like arguing that, because blind and deaf people can live normal lives, therefore tests of vision and hearing are not relevant.
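To show how little the driving question demands, here is the whole calculation as a sketch. The function name `hours_needed` is mine, and the only assumption is the one the question states: the family keeps driving at the same rate.

```python
from fractions import Fraction


def hours_needed(known_miles, known_hours, target_miles):
    """Hours to cover target_miles at the rate implied by the known trip."""
    rate = Fraction(known_miles, known_hours)  # miles per hour
    return target_miles / rate


if __name__ == "__main__":
    rate = Fraction(116, 4)
    print("rate in miles per hour:", rate)
    print("hours for 203 miles:", hours_needed(116, 4, 203))
```

The rate is 29 miles per hour, and 203 divided by 29 is exactly 7 hours, the fourth choice listed.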

All of the complaints about testing become utterly irrelevant when weighed against the simple fact that the tests are so trivial that no literate person should have problems with them. The math required is nothing more than simple algebra and grade-school geometry, and the factual knowledge required is nil. Students are not expected to know the name of a single historical figure, state or country, star, planet, plant, or animal.

The sociological reasoning of the study is interesting. Consider the following three quotes from The Reign of ETS.

The SAT discriminates among virtually all levels of the country's class structure -- across both income and occupation. The more money a person's family makes, the higher that person tends to score ... (p. 200)
Bowles and Herbert Gintis found that people's eventual earnings were explained not by their test scores but mainly by the social class they started from and by various personality characteristics ... (p. 79)
Test scores themselves bear little significant relationship to lifetime earnings success. (p. 79)

First of all, if income is mostly correlated with one's social class, why criticize the SAT for failing to predict income? It makes about as much sense as criticizing the test because it fails to predict height or hair color. More interestingly, the three statements are mutually incompatible; if income and test scores both increase in proportion to social class, then how can there be no apparent correlation between income and test scores? People with high incomes, who come from affluent families, should tend to have higher test scores even if the scores do reflect nothing more than social class. And how could we be certain which factor causes the high income? Is it at all possible that affluent people are affluent because they have more discipline and better attitudes toward learning? Something is seriously wrong with the statistical methodology.

The usual statistical measure of correlation is called the "correlation coefficient". It has a value of 1 for perfect correlation, zero for no correlation at all, and -1 for perfect negative correlation. The correlation coefficient between SAT scores and first-year college grades is 0.35. This is a rather moderate correlation; it would be more useful to examine the correlation between math SAT scores and mathematics grades, and so on. The Nader Group prefers a different measure, the "percentage of perfect prediction", which is the square of the correlation coefficient; in this case 0.119, which is impressively smaller than 0.35. On this basis, Nairn and Associates claim that the SAT scores account for only 11.9% of perfect prediction of first-year college grades and are essentially worthless.
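The conversion between the two measures is a single multiplication, sketched below. The function name `share_of_variance` is mine, not a term used by the study. Note that squaring the rounded figure 0.35 gives 0.1225 rather than 0.119; the quoted 0.119 corresponds to a coefficient of about 0.345, which I am assuming is the unrounded value behind the figures above.

```python
import math


def share_of_variance(r):
    """Square a correlation coefficient (the study's 'perfect prediction')."""
    return r * r


if __name__ == "__main__":
    rounded_r = 0.35
    print("0.35 squared:", round(share_of_variance(rounded_r), 4))

    # Working backwards from the quoted 0.119 to the coefficient it implies.
    implied_r = math.sqrt(0.119)
    print("coefficient implied by 0.119:", round(implied_r, 3))
```

Either way the arithmetic makes the same point: squaring any coefficient between 0 and 1 produces a smaller number, so the choice of measure changes how the figure looks without changing the underlying relationship.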

Now, what's the correlation coefficient between social class and SAT score? Here Nairn and Associates pursue a very different course. For the link between test scores and grades they quote a single coefficient; for the link between family income and test scores they instead provide lengthy tables comparing test scores and average family income. There is indeed a correlation, but without the correlation coefficient, we have no way of knowing how significant the correlation is -- and there is no mention of the correlation coefficient anywhere in the discussion! There is a passing reference to it in a footnote in the back of the book, and the value turns out to be 0.35! The level of correlation that supposedly makes tests worthless in predicting grades suddenly becomes ironclad proof that social class determines test scores!

There is, for the record, no doubt that test scores among disadvantaged students are lower than among the affluent. Using this correlation to deny the reality that literacy is lower among the poor than among the rich is about the most socially irresponsible stance imaginable. It in fact amounts to intellectual apartheid. No surer means of keeping the poor isolated (and dependent on social service programs?) could possibly be imagined. Reform college admission procedures, provide more remedial services -- anything but use statistical mumbo-jumbo to deny reality!


Created 8 July 1998, Last Update 24 May, 2020

Not an official UW Green Bay site