I'll have more to say then, but now seems a good time to bring forward Ned Block's earlier comments about the methodology and its peculiarities:
I am not criticizing using polling to derive rankings. That is what the PGR does and I (like a lot of other people) think the PGR ratings, despite flaws, are good enough to be of considerable utility. However, it is already clear that the NRC procedures are so poor as to make their results specious and misleading. It is important to be clear on what the NRC did. The 21 variables that they used (e.g. number of faculty publications, whether students have health insurance, percentage of international students) were not arrived at by any kind of research. They were a result of deliberation within the NRC committee. There were two methods of “weighting” the 21 variables (20 in the case of the humanities since citation data in the humanities was inadequate)....
The first method was just asking people about their views. These were the “direct weights”, and the method used was NOT complicated or hard to understand. They asked people which characteristics were “the most important to program quality”. (The one minor complication was that they asked this question within each of 3 categories and then asked people a similar question about the categories themselves.) My point against this procedure is that in asking people which characteristics were “the most important to program quality” they did not distinguish between indicators of (i.e. factors that give information about) program quality and features that themselves constituted a kind of program quality even if not very relevant to the all-important intellectual quality of the program. I mentioned that the percentage of students with portable fellowships (Mellons, NSFs) is an indicator or assay but not itself a kind of program quality. I mentioned health insurance as something that is itself a kind of quality but not much of an indicator. This is an absolutely crucial confusion since (at least I would argue) the use of many of the variables, including the 5 measures of diversity (out of the 20 variables), could not be justified on the basis that they are significant indicators of the intellectual quality of the program. The whole rationale of assembling a lot of weak indicators into a stronger measure is undermined if the question asked is straightforwardly ambiguous in this obvious way. In addition, it is a repeated result in experimental psychology that people who can make an expert evaluation (e.g. a doctor evaluating symptoms or a faculty member evaluating a graduate application) are not usually very good at saying what the factors are that justify the evaluation or which are more important. Thus academics may know high-quality work when they see it but not what justifies the evaluation.
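For readers who want a concrete picture of what hierarchical "direct weights" of this sort amount to, here is a minimal sketch. The category names, variables, and numbers are invented for illustration; this is only the general shape of the procedure Block describes (within-category importance scaled by category importance), not the NRC's actual computation.

```python
# Illustrative sketch only: categories, variables, and numbers are invented.
# Shows how "direct weights" gathered hierarchically might be combined.

# Hypothetical average importance respondents assigned to each of 3 categories.
category_weights = {"faculty": 0.5, "students": 0.3, "program": 0.2}

# Hypothetical average importance assigned to characteristics *within* each category.
within_category_weights = {
    "faculty": {"publications": 0.6, "grants": 0.4},
    "students": {"gre": 0.7, "health_insurance": 0.3},
    "program": {"diversity": 1.0},
}

# A variable's overall direct weight is its within-category weight
# scaled by the weight of its category.
direct_weights = {
    var: category_weights[cat] * w
    for cat, vars_ in within_category_weights.items()
    for var, w in vars_.items()
}

print(direct_weights)
# {'publications': 0.3, 'grants': 0.2, 'gre': 0.21, 'health_insurance': 0.09, 'diversity': 0.2}
```

The ambiguity Block points to lives entirely in the survey question that produces these numbers: the arithmetic cannot tell whether a respondent rated health insurance highly as an indicator of quality or as a good in its own right.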
The second method of weighting the variables is the one that is hard to understand. But in overview it is simple. They ascertained which weights made the variables correlate best with a PGR-like ranking. Here my point is that those who object to the PGR ranking (e.g. Christopher Gauker) can hardly have high expectations of a ranking to the extent that it is derived from a PGR-like ranking. A further point is that the NRC has not told us HOW WELL their weighted variables predict the PGR-like ranking. And they are not revealing the PGR-like ranking itself. I think this is an astonishing piece of intellectual acrobatics. They got PGR-like rankings from their respondents but won’t tell us what they were, on the ground that they involve mere opinions about the quality of philosophy programs; yet they think well enough of those opinions to use them to choose the weights of the variables they used. They actually describe themselves as using the PGR-like rating to weight “objective variables” so as to “imitate, to the extent achievable, the judgment criteria of the initially surveyed faculty” [i.e. the PGR-like rating]. The chair of the NRC committee (Ostriker) and one of the members (Kuh) cite a joint 2003 book as having shown how to do this. Christopher Gauker is right that for each evaluated program, they did provide a link to a page with information about the program. One thing I like about their PGR-like ranking (the one they aren’t going to tell us the results of because it isn’t “objective” enough) is that they asked for both a rating and how familiar the rater was with each program. This is something the PGR could learn from (though at the cost of complicating the process).
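The general idea of "choosing weights so the variables best predict a reputational rating" can be sketched as an ordinary least-squares fit. The data below are random stand-ins, and this is not the NRC's actual model, only the shape of the technique; it also shows the figure Block says the NRC never reports, namely how well the weighted variables actually predict the rating.

```python
# Illustrative sketch only: random made-up data standing in for 20 program
# variables and a reputational ("PGR-like") rating across 90 programs.
import numpy as np

rng = np.random.default_rng(0)
n_programs, n_variables = 90, 20

X = rng.normal(size=(n_programs, n_variables))      # standardized program variables
reputational_rating = rng.normal(size=n_programs)   # stand-in for the survey-based rating

# Least-squares weights: the linear combination of the variables that
# best fits the reputational rating.
X1 = np.column_stack([X, np.ones(n_programs)])      # add an intercept column
weights, *_ = np.linalg.lstsq(X1, reputational_rating, rcond=None)

# The undisclosed quantity: how well the weighted variables predict the
# rating (here measured as R^2 of the fit).
predicted = X1 @ weights
ss_res = np.sum((reputational_rating - predicted) ** 2)
ss_tot = np.sum((reputational_rating - reputational_rating.mean()) ** 2)
print("R^2 =", 1 - ss_res / ss_tot)
```

Whatever the real fitting machinery, the output ranking inherits its authority from the reputational rating it is tuned to imitate, which is exactly Block's point against critics of the PGR.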
The NRC methodology is so technical (see Ostriker & Kuh, 2003) that most readers will lack the expertise to understand it. The key fact to focus on, though, is that if you eyeball the list of 20 variables they used, only 4 have much prima facie plausibility as indicators of intellectual quality (as opposed to other kinds of goodness) in a philosophy program. Those are the number of faculty publications, the percentage of faculty holding grants, faculty awards, and students' GRE scores. The first of these is rendered uninteresting because no attempt is made to evaluate the substantiveness of the publications or the quality of the venues. One point for articles, five for books; that’s it. The percentage of faculty holding grants reflects only whether faculty happened to hold a grant at the very moment they filled in the form, which renders it uninteresting as well. GREs… well, we all have our own opinion of the value of GREs. The award variable seems to me the most significant, although as Brian pointed out, some awards just perpetuate traditional hierarchies. Putting it all together, no matter how wonderfully advanced their method of weighting these variables is, it is unlikely that the information is there in those variables to make a highly significant predictor of actual intellectual quality of philosophy programs. And when you add the fact that it is already 3 [ed.: now five] years out of date, the word ‘pathetic’ comes to mind.
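A toy version of the counting rule Block describes (the one-point-per-article, five-points-per-book figures are from the text; everything else here is illustrative) makes the information loss plain: very different publication records collapse to the same score.

```python
# Toy version of the publication-count rule Block describes: one point per
# article, five per book, with no adjustment for venue or substance.
def publication_score(articles: int, books: int) -> int:
    return 1 * articles + 5 * books

# Two quite different records come out identical:
print(publication_score(articles=10, books=0))  # 10
print(publication_score(articles=0, books=2))   # 10
```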