UPDATE: Given the large number of (largely new) visitors coming to this post, and the various updates I am adding to it, I will leave it at the top for a few days. Regular readers are encouraged to scroll down for new postings. Thanks
The NRC ranking of doctoral programs for 2005-06 is finally out . A few quick points, and then some details about the results relevant to philosophy:
1. The NRC ranking is not a ranking of faculty quality, so it is not really comparable to the PGR, even the 2004-06 PGR, the closest in time to the data the NRC collected, which was for 2005-06. While in every past NRC report the NRC collected and published expert opinion about faculty and program quality, this time it did not. Only one of the 20 variables used by the NRC even has any connection to faculty quality, and that variable (major awards and grants [adjusted for faculty size], such as Guggenheims, NSF Fellowships, American Academy membership, and the like) is only a very weak indicator, for a host of obvious reasons. Only a small number of such awards occurred during the time period studied, so just one or two faculty can make a huge difference to the results in a small department. And many of these awards favor some areas of philosophy over others: philosophers of science and logicians can get NSF awards, while other philosophers usually can't; Guggenheim and ACLS Fellowships appear to favor historians of philosophy and value theorists over people in philosophy of mind and epistemology; and American Academy membership is 'chummy' and tends to go to "friends of friends" and older faculty, making it a better backward-looking than forward-looking metric: a school with great younger faculty won't register on it.
2. The NRC ranking purports to be a measure of program quality or attractiveness. (The idea that you could measure program quality without having any real measure of faculty quality is, in itself, astonishing.) It purports to do this by aggregating twenty different factors in the humanities, broken into three categories: Research Activity (meaning the one qualitative variable noted above, plus per capita publications, which imposes no quality control for journal, publisher, impact, etc., so is largely meaningless); Student Support and Outcomes (e.g., graduate student funding [rich, private schools fare better on average on this metric], job placement [without, as far as I can tell, any audit of the data schools reported], time-to-degree, availability of student health insurance, and several other variables); and Diversity of the Academic Environment (i.e., ethnic and gender diversity of the faculty and student body). Note an irony about the use of job placement, which I assume got significant weight in both of the overall rankings (about which more below). A school reporting job placement in 2005-06 for the preceding five years would be reporting on the success of students who chose the school in the early-to-mid-1990s. The last NRC report, which included a regular reputational ranking of faculty quality, came out in 1995. My guess is that the correlation between the job placement statistics and the 1995 NRC report (and the mid-to-late 90s PGRs) is pretty strong, but that, of course, is because job placement is always a backward-looking measure.
(I do want to emphasize that there is no indication that the NRC audited any of the self-reported data from schools: job placement, time-to-degree, even the faculty rosters and the CVs. I know from PGR experience that departments err in their self-reporting of information in only one direction. If I'm mistaken about this, please let me know.)
From this mass of data, the NRC constructed two rankings. The R-Ranking assigned weights to these variables in order to mimic, in effect, the results of a secret reputational survey of an unknown number of putative experts in each field (seriously). More precisely: "a sample group of faculty were asked to rate a sample of programs in their fields. Then, a statistical analysis was used to calculate how the 20 program characteristics would need to be weighed in order to reproduce most closely the sample ratings. In other words, the analysis attempted to understand how much importance faculty implicitly attached to various program characteristics when they rate the sample of programs." The NRC does not report the results of its reputational survey, amazingly. Nor is it clear on my reading whether there was any reason to think that faculty evaluating programs were even aware of, let alone interested in, some of the NRC variables. The NRC insists the R-Ranking is not a reputational survey, and that is right. It's essentially a weird and not very reliable approximation of a reputational survey by an unknown group of evaluators of unknown size. (UPDATE: On page 198 of the NRC report, we learn that "up to 200" evaluators for each field were surveyed, and on p. 286, we learn that a total of 171 philosophy faculty [no indication of how they were chosen, or what distribution of expertise or areas they represented] were each asked to evaluate not more than 50 programs; that each program had an average of 46.7 faculty evaluate it, with a low of 34 faculty for some programs, and a high of 57 faculty for some others. A typical PGR survey collects responses from between 250 and 300 faculty for each program evaluated, and, of course, the list of evaluators is public. Note that not all the philosophy programs were evaluated by any rater--see the comments by Stigler in Update #8, below.)
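To make the regression idea in the quoted passage concrete, here is a minimal toy sketch: fit the weights that best reproduce a sample of reputational ratings from the measured program characteristics, then apply those weights to every program. All sizes, weights, and data here are illustrative inventions, not the NRC's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 40 sampled programs measured on 20 characteristics
# (publications per capita, awards, funding, etc. -- all synthetic here).
n_programs, n_vars = 40, 20
X = rng.normal(size=(n_programs, n_vars))

# Suppose raters' average scores implicitly depend on only a few characteristics.
true_weights = np.zeros(n_vars)
true_weights[:3] = [0.6, 0.3, 0.1]
ratings = X @ true_weights + rng.normal(scale=0.1, size=n_programs)

# The R-Ranking-style step: solve for the weights that best reproduce
# the sample ratings from the 20 characteristics (ordinary least squares).
implied_weights, *_ = np.linalg.lstsq(X, ratings, rcond=None)

# Those implied weights are then applied to ALL programs -- including
# any a rater never evaluated -- to produce a ranking (best first).
scores = X @ implied_weights
ranking = np.argsort(-scores)
```

The sketch makes the key feature vivid: programs the raters never saw are ranked entirely by the regression's implied weights, which is exactly why unsurveyed programs can end up badly "out of synch" with the surveyed ones.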
The NRC also calculated an S-Ranking, which assigned weights to the variables based on the weights respondents in each field said the criteria deserved. This is vulnerable to the confusions Ned Block (NYU) noted.
Given the huge range of variables and the baroque methodology (which will no doubt generate its own cottage industry of commentary), it should not be surprising that the results (for Philosophy) qualify as somewhere between "odd" and "inexplicable."
3. The huge time lag--the NRC Report released today is already five years out of date--is quite significant, especially for the "Research Activity" measures, which, because they employ per capita (or percentage-of-faculty) measures, can be quite sensitive to just one or two faculty movements. And since the Research Activity measures were, it appears, given the most weight in the R- and S-Rankings, these changes matter a great deal. (One guesses, of course--I can't tell from the material I've seen--that it was the Awards & Grants measure that dominated, since per capita productivity is such an obviously poor measure.) So consider that Yale, which according to the NRC is not close to the top 25 in either the R-Ranking or in "Research Activity," did not have on its 2005-06 faculty two highly 'decorated' and recognized senior philosophers: Stephen Darwall, who moved from Michigan, and Thomas Pogge, who moved from Columbia. (Even without them, it seems bizarre that Yale was not in the top 25 for "Research Activity.") Given the relatively small size of the Yale department, it's hard to see how just these two, even by the NRC's criteria, would not have changed the results significantly. Similarly, the 2005-06 University of Chicago faculty roster would have included John Haugeland (Guggenheim winner, now deceased), Charles Larmore (Fellow of the American Academy, now moved to Brown), and William Wimsatt (productive and influential philosopher of biology, now retired). These examples could be multiplied in both directions, though I think the important point to remember is that the NRC was not really measuring the philosophical quality of the faculty.
Putting these criticisms and concerns to one side, there is data in the NRC Report that should be of interest to prospective students, particularly the systematic data gathered on time-to-degree. I hope the NRC will make that data easily available on the Internet.
Below the fold, a sample of some of the results for Philosophy.
Here is a ranking of the top 25 philosophy programs based on their R-Rating. Each program's rank was given as a 5th-to-95th-percentile range, based on 500 permutations of the weighted variables, to allow for margins of error. For ease of presentation, I simply sum the low and high ranks (the range itself appears in the second set of parentheses):
1. Rutgers University, New Brunswick (4) (1-3)
2. Princeton University (8) (1-7)
3. University of Michigan, Ann Arbor (9) (1-8)
4. New York University (10) (2-8)
5. University of California, Berkeley (13) (2-11)
6. University of Chicago (14) (3-11)
7. University of Pittsburgh (Philosophy) (20) (4-16)
8. Boston University (21) (5-16)
8. Cornell University (21) (2-19)
8. Massachusetts Institute of Technology (21) (3-18)
11. University of Pittsburgh (HPS) (23) (5-18)
12. Duke University (25) (5-20)
13. Brown University (26) (8-18)
13. University of Notre Dame (26) (7-19)
15. Harvard University (27) (8-19)
16. University of North Carolina, Chapel Hill (34) (11-23)
17. Stanford University (35) (11-24)
18. Columbia University (37) (11-26)
19. University of California, San Diego (42) (15-27)
20. Syracuse University (47) (13-34)
21. University of Texas, Austin (48) (13-35)
22. Boston College (50) (13-37)
22. Georgetown University (50) (17-33)
24. Carnegie-Mellon University (53) (16-37)
25. University of California, Davis (54) (19-35)
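The scoring convention behind the list above can be sketched in a few lines, using some of the reported ranges as data (the summing convention is mine, as described before the list):

```python
# Each program's R-Ranking is reported as a (5th percentile, 95th percentile)
# rank range; the list above orders programs by the sum of the two bounds.
programs = {
    "Rutgers": (1, 3),
    "Princeton": (1, 7),
    "Michigan": (1, 8),
    "NYU": (2, 8),
}

def bound_sum(rank_range):
    """Sum the low and high bounds of a rank range."""
    low, high = rank_range
    return low + high

ordered = sorted(programs, key=lambda p: bound_sum(programs[p]))
# Rutgers (1+3=4) comes first, then Princeton (8), Michigan (9), NYU (10).
```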
Among the top departments that didn't make the NRC's top 25 were Arizona (sum of 58, range of 21-37) and UCLA (sum of 59, range of 19-40). Other bizarre results: UC Irvine (142), Yale University (76), Southern California (104). The absurdly low scores for Yale and USC, which have both improved in quality significantly over the last decade, may be attributable to the job placement factor, which has the problem noted above.
Here are the top schools for 2005-06 based on "Research Activity" according to the NRC, meaning per capita productivity and percentage of grants/awards during the survey period (e.g., Guggenheims, NSF grants, American Academy membership and the like); I have again summed the 5th and 95th percentile rankings (e.g., Brown, with a range of 15-34, gets a 49 in the chart below):
1. Princeton University (6)
2. Columbia University (8)
2. Stanford University (8)
4. Rutgers University, New Brunswick (9)
4. University of Chicago (9)
6. University of Michigan, Ann Arbor (13)
7. University of California, Berkeley (14)
8. Massachusetts Institute of Technology (15)
9. Carnegie-Mellon University (18)
10. University of Hawaii (24)
11. Duke University (25)
11. New York University (25)
13. University of Pennsylvania (31)
14. University of California, San Diego (32)
15. University of Wisconsin, Madison (38)
16. University of Texas, Austin (41)
17. State University of New York, Stony Brook (43)
17. University of Miami (43)
19. State University of New York, Binghamton (45)
20. University of Rochester (46)
21. Brown University (49)
22. University of Notre Dame (51)
23. University of Colorado, Boulder (52)
24. Pennsylvania State University (53)
25. Harvard University (54)
25. University of Pittsburgh History and Philosophy of Science (54)
Just outside the top 25, but still in the top 40, were Cornell University (66), University of Arizona (65), University of California, Riverside (64), and University of North Carolina, Chapel Hill (75), among others. Well outside the top 40 were UCLA (107), Pittsburgh Philosophy (88), and Yale (112). One might ask whether it is a sign of strength, or an embarrassment, to do well in a ranking of "Research Activity" that fails to put UCLA, Pittsburgh, and Yale in the top 40, let alone the top 20. Of course, if we bear in mind that this is mainly a measure of productivity without regard to quality or impact, the results may make more sense.
As I have a chance to digest more of the NRC report (which is huge), I will add updates to this post.
UPDATE #1: There's an amazing (and informative) piece in IHE revealing that even the authors of the report don't really believe the results! An excerpt:
The advance briefing for reporters covering today's release of the National Research Council's ratings of doctoral programs may have made history as the first time a group doing rankings held a news conference at which it seemed to be largely trying to write them off.
While the NRC committee that produced the rankings defended its efforts and the resulting mass of data on doctoral programs now available, no one on the committee endorsed the actual rankings -- and committee members went out of their way to say that there might well be better ways to rank -- better than either of the two methods unveiled....
Rankings have been criticized in the past for suggesting false levels of precision, but that isn't a criticism you'll hear about this process.
"We can't say this is the 10th best program. We can say it's probably between 5th and 20th," said Jeremiah P. Ostriker, chair of the NRC committee that prepared the rankings, and a professor of astronomy and former provost at Princeton University. The approach used is "a little bit unsatisfactory, but at least it's honest," he said. When one of the reporters on a telephone briefing about the rankings asked Ostriker and his fellow panelists if any of them would "defend the rankings," none did so....
Richard Wheeler, a member of the NRC committee who is interim vice chancellor for academic affairs at the University of Illinois at Urbana-Champaign, said that the results of the rankings "would have been quite different had the methodology been tweaked in different ways," and "we don't want to claim that these are the only possible results."
UPDATE #2: A representative sample of how a university (Princeton) is reporting the results.
UPDATE #3: CHE has put together a nice tool for exploring the underlying data. (Having now spent some time on it, I highly recommend it--click on Humanities, and then Philosophy, then choose a department and it will show you that department's score in the different categories relative to the median score in that category and the range; then click on another department to see how it compares to the initial department chosen.)
UPDATE #4: Stanford's press release is a bit more sober than Princeton's!
UPDATE #5: "Awards per faculty member" was, one suspects (and the results bear this out), deemed an important factor in the S- and R-Rankings, so let's take a look at the results for Philosophy in that category. On p. 41 of the NRC Report, the authors explain that "data from a review of 1,393 awards and honors from various scholarly organizations were used for this variable. The awards were identified by the committee as 'Highly Prestigious' or 'Prestigious,' with the former given a weight five times that of the latter." I have not yet found a list of what counts as "Highly Prestigious" and what counts as "Prestigious," but a safe bet is that an NSF, or a Guggenheim, or an NEH Fellowship, or election to the American Academy of Arts & Sciences counts as "Highly Prestigious" while something like, say, the APA's Kavka Prize for an article in political philosophy counts as "Prestigious." Here are the top ten in "Awards per faculty member" according to the NRC (the per capita figure in parentheses):
1. University of Michigan, Ann Arbor (4.91)
2. Duke University (4.70)
3. Massachusetts Institute of Technology (3.92)
4. Cornell University (3.90)
5. Princeton University (3.87)
6. Stanford University (3.81)
7. Rutgers University, New Brunswick (3.34)
8. University of Chicago (3.32)
9. University of California, Berkeley (2.99)
10. Brown University (2.84)
These are all major research departments, but surely the particular ordering of them here bears almost no relationship to the overall quality of the faculty. The particular ordering is also no doubt an artifact of the particular time period studied.
UPDATE #6: Philosopher Don Hubin (Ohio State) writes:
I’m writing to make a quick correction and then mention a number of points that might be of interest to you in your analysis of the NRC assessment of research doctoral programs.
- The NRC threw out the university-reported placement data and relied on the Survey of Earned Doctorates. This has problems of its own, but not the ones you raise about the university-reported data.
- The NRC decided to rely only on GRE verbal scores for all disciplines in the humanities. This potentially skews results when we compare programs that are likely to rely more heavily on the quantitative score (say one with a heavy emphasis on logic or another highly formal area of philosophy) or the analytical writing score with programs that are likely to be concerned almost exclusively with the writing score. So, if a program is rejecting people with high writing scores in favor of those with more balanced scores or particularly high quantitative scores, there will be a differential effect of throwing out the information on GRE quantitative and analytical writing scores. (This might help to explain some of the variation between the S-Rankings and the R-Rankings. My idea is that many people said that they thought GRE scores were very important in measuring the quality of a doctoral program thinking of all facets of the GRE. Then, when the NRC used only the verbal, programs that are less likely to rely on quantitative scores rose relative to others. But when people ranked the programs and the NRC did the regression analysis to determine the relevant factors, GRE scores receded in importance because the GRE scores that the NRC used weren’t as highly correlated with the rankings as they would have been if the full GRE information had been used.) It’s an interesting example of how the classification of philosophy as one of the humanities can skew the rankings. For social sciences, the NRC ignored the verbal portion of the GRE. On the flip side, Linguistics at OSU is in Humanities, but because the NRC classifies it as a social science, its students’ quantitative scores were counted and not their verbal scores.
- The “research activity” score:
- First, relied solely on ISI data for articles (not books—which were self-reported on CVs) and ignored citations (in the humanities—in the natural and social sciences, citations were counted).
- Looked at publications of those on the faculty roster in 2006, calculated a publication score (5 for books and 1 for articles [which may, itself, skew things within the profession if certain sub-specialties are more likely to produce their research in books than articles]) for all publications between 1986 and 2006, and then divided that number by the number of faculty in 2006. This means that departments with younger faculty will be disadvantaged relative to those with more faculty who were publishing throughout the 20-year period. This is a complex issue. One could argue that a faculty that has more people with a longer time span of publications is, in virtue of that, stronger than another with younger faculty, even if the second is publishing at a much faster rate now. But it is an effect of the method used.
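The formula Hubin describes (5 points per book, 1 per article, totaled for the 2006 roster's 1986-2006 output, then divided by the 2006 faculty count) can be sketched as follows; all the department numbers are hypothetical:

```python
# Weights per Hubin's summary of the NRC method.
BOOK_WEIGHT = 5
ARTICLE_WEIGHT = 1

def per_capita_publication_score(books, articles, faculty_2006):
    """Score all 1986-2006 publications credited to the 2006 roster,
    then divide by the 2006 faculty headcount."""
    total = BOOK_WEIGHT * books + ARTICLE_WEIGHT * articles
    return total / faculty_2006

# A senior-heavy department with 20 years of accumulated output...
veteran_dept = per_capita_publication_score(books=30, articles=200, faculty_2006=25)
# ...versus a same-sized department of recent hires publishing faster *now*,
# but with fewer career-total publications credited to the 2006 roster.
young_dept = per_capita_publication_score(books=8, articles=80, faculty_2006=25)
# veteran_dept (14.0) > young_dept (4.8): the method rewards tenure span,
# which is Hubin's point about younger faculties being disadvantaged.
```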
The enormous weight given to books relative to articles explains a lot, I suspect, about the somewhat bizarre 'research activity' and R-Ranking results in philosophy. More on that soon.
UPDATE #7: The slightly embarrassing spectacle of an astrophysicist at Penn State defending the ranking methodology because his program ranked well prompts another astrophysicist, Stacy McGaugh (Maryland), to make a scathingly funny observation in the comment section:
[T]his sort of ranking is an exercise so nearly devoid of meaning as to be nearly useless. Perhaps even dangerous, as university administrators might invest their resources accordingly.
Robust statistical masturbation is merely vigorous, not virtuous. I do not doubt the good intentions of those who did the analysis, but astronomy (like physics) has become too diverse for this to be meaningful, as you imply in your preceding posts. Compare X-ray to planetary astronomy. Citation rates are way different. Does that mean one is intrinsically more important? NASA ought to seriously realign its budget if so, and not along the lines suggested by the latest decadal survey.
I would say this is a matter of comparing apples and oranges, but really it is more like averaging oranges and cats. You can call the 5th percentile the head and the 95th percentile the tail, but neither end purrs and both smell funny.
UPDATE #8: There's a quite damning analysis by Stephen Stigler, a leading statistics scholar here at the University of Chicago, which deserves to be read in full; I'll just quote part:
Since the premise of all the 2006 ratings is that a simple weighted average of rather crude variables can capture genuine quality (mathematically assuming that quality lies on a one dimensional subspace of the 21 dimensional space of it and the variables), I believe the project was doomed from the start when they downplayed reputation, the most valuable measure of earlier studies. If a program was not surveyed (most were not), and differed other than linearly from others (thought of as surrogates) that were surveyed, it would find itself out of synch with those surrogates. The NRC refuses to tell which were surveyed. But even if all had been surveyed the results would be dubious, since the assumption that quality can be captured by such a crude weighted average of crude measures is not credible. For example, the Dimensional Ranking of “Research Activity” in most fields is based on weights of the four components that are close to equal, and is hence about the same as would come from a simple average of the four component indices. But measures like “Publications per Faculty” and “Awards per faculty” will depend crucially on how the faculty is defined, and these definitions varied (considerably in some cases) among programs, as they should be expected to. And the data on publications and citations came from a Web of Science data set that reflected activity well before 2006 and relied upon a difficult and problematic file matching procedure (e.g. by author’s zip code) to link to then-current faculty lists. The data had a built-in long time lag in 2006 and are quite ancient now.
The results do not include the actual ratings or rankings but give instead what are purported to be bounds (5% and 95%). That these reflect mostly trivial variations in weights (see spreadsheet for coefficient SDs in different fields) indicates that the distinctions being drawn are mainly trivial; that is, the variables being weighted do not capture interesting differences in quality between programs. One use, if absolutely necessary, of these bounds is to take the upper limit (5%) as the ranking, and the R Rankings at least might have a slight input from reputation in some cases. But little credence should be given to any of them.
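Stigler's point that the reported 5th-95th percentile rank ranges reflect mostly trivial variation in the weights can be illustrated with a toy simulation of the perturbed-weights procedure. Everything here is invented for illustration: the data, the number of programs, and the perturbation scale are not the NRC's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 30 programs measured on 4 "research activity" components.
n_programs, n_vars = 30, 4
X = rng.normal(size=(n_programs, n_vars))
base_weights = np.full(n_vars, 0.25)  # near-equal weights, per Stigler

# Re-rank under 500 random perturbations of the weights, recording each
# program's rank (1 = best) every time.
n_trials = 500
ranks = np.empty((n_trials, n_programs))
for t in range(n_trials):
    w = base_weights + rng.normal(scale=0.05, size=n_vars)
    scores = X @ w
    # Convert scores to ranks: highest score gets rank 1.
    ranks[t] = scores.argsort()[::-1].argsort() + 1

# Report each program as a (5th percentile, 95th percentile) rank range,
# the form in which the NRC published its results.
lo = np.percentile(ranks, 5, axis=0)
hi = np.percentile(ranks, 95, axis=0)
```

When the perturbations are small, most programs' ranges stay narrow, which is Stigler's diagnosis: the reported spread measures sensitivity to weight wiggling, not meaningful differences in quality.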
UPDATE #9: This may be a first for an NRC report on graduate programs: a major professional association representing scholars (in this case, computer scientists) has issued a statement denouncing the NRC ranking, and an NRC official concedes the correctness of the criticisms!