Using Google Scholar to Assess the Impact of Philosophical Work (J. Stanley)

Academics spend much time trying to assess the relative merits of work in an area. There is no sure-fire way to do this, of course. But citation indices are one method to assess the impact work has had on an area. Though philosophers are loath to use them, they are widely used in other disciplines. Citation indices of course do not tell us everything we need to know to make such a judgment. Much work is of very high quality, but sufficiently specialized to be of interest to only a very few. Conversely, someone can write a paper that sparks a great deal of interest for its obvious flaws. Nevertheless, one can hope that citation indices could give us at least some sense of the major themes in a subject area. My sense is that as philosophy has become more specialized, more and more philosophers have simply lost contact with what is being currently discussed in journals and books. One might hope that citation indices could provide a rough objective map of the terrain of an area that can be used in place of word-of-mouth.

Since I discovered Google Scholar about six months ago, I’ve been comparing its citation results to my general sense of what is going on in fields in which I work. Generally, it seems quite accurate – papers that have had a significant impact in an area have had correspondingly greater hits on Google Scholar than papers that have had smaller impacts. For example, some test cases: two much-admired recent papers that have created significant literatures in epistemology are Jim Pryor’s paper, “The Skeptic and the Dogmatist” and Adam Elga’s “Self-Locating Belief and the Sleeping Beauty Problem”. Google Scholar correctly reveals this; these are two of the most cited papers in epistemology since 2000 (68 for Pryor’s paper, and 40 for Elga’s paper). Keith DeRose’s “Solving the Skeptical Problem” is one of the most influential papers in epistemology written in the past thirty years, and Google Scholar again reveals this; it has 187 hits, despite being published as recently as 1995. Ted Sider’s book Four-Dimensionalism and Timothy Williamson’s Knowledge and Its Limits have been hugely important works, and Google Scholar clearly reveals this (266 hits and 303 hits respectively, despite publication dates of 2001 and 2000, respectively). One paper that has had no impact whatsoever in its field is Jason Stanley’s 1998 contribution to the literature on personal identity, “Persons and their Properties”. Again, Google Scholar correctly reveals this, since this paper has no hits.

There are of course pitfalls to using Google Scholar. First, one should refrain from comparing hit numbers across areas of philosophy. Some areas of philosophy (e.g. philosophy of mind that borders on philosophy of psychology and philosophy of language that borders on linguistics) are cross-disciplinary, and so have created literatures in multiple fields. This naturally increases the number of researchers reacting to these papers, and correspondingly the number of hits. If one wants to compare the impact just in philosophy of a certain work, this makes things difficult. Furthermore, some areas of philosophy seem to involve more citation than others, or simply more researchers. So one must take care to compare (e.g.) only work in history of modern to other work in history of modern, or work in meta-ethics to other work in meta-ethics. Finally, it takes a number of years for the impact of a work to register on Google Scholar. The publication date of an article is a very large factor, as the older an article or book is the more hits it will receive. It is not yet possible to use Google Scholar to assess the impact of publications from 2004 or after. So in judging the relative impact of work, it’s best to compare work that was published at roughly the same time. Nevertheless, after several months of procrastinating with it, in areas such as metaphysics, epistemology, philosophy of language, and philosophy of mind, when used with appropriate caution, it does deliver results that accord with my sense of what papers and books have created some of the major debates in these areas.

Comments

One additional note of caution: Google Scholar picks up and counts self-citations, and it also picks up citations in on-line CVs, dustjackets of books which might list other books published by the same press, and so on.

A further quirk should be taken into account. It appears that a single reference in a BBS piece (target article, commentary, or author's/authors' response) generates a number of hits equal to the total number of commentaries on the relevant target article plus two (the target article and the response to commentaries) -- a number that is frequently in the thirties or forties. BBS collects together the works cited by authors of the target article and their commentators into a single, master list of works cited. It seems that Google Scholar then treats this master list as the works-cited list for each commentary, target article, and author’s/authors’ response, taken individually. So, double-check BBS citations; what looks like, say, thirty-five hits is likely to be only one or two.
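The arithmetic behind this quirk can be sketched in a few lines of Python. This is only an illustration of the pattern described above, not a description of how Google Scholar actually works internally; the function names and the sample numbers are hypothetical.

```python
def bbs_inflated_hits(num_commentaries):
    """Hits one BBS reference appears to generate: one per commentary,
    plus the target article and the authors' response."""
    return num_commentaries + 2


def adjusted_hits(raw_hits, num_commentaries, genuine_bbs_citations=1):
    """Rough correction: remove the inflated BBS hits and add back the
    number of BBS pieces assumed to really cite the work (an assumption)."""
    return raw_hits - bbs_inflated_hits(num_commentaries) + genuine_bbs_citations


# Hypothetical case: a BBS treatment with 33 commentaries.
print(bbs_inflated_hits(33))   # 35 apparent hits from one reference
print(adjusted_hits(50, 33))   # 50 raw hits shrink to 16
```

The correction assumes you have already checked how many BBS pieces genuinely cite the work; the default of one is just a placeholder.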

The "relative merit" of a work and its "impact" as measured by a citation index are distinct things. How would Frege have scored in his lifetime on Google Scholar if it had existed? Not well. A citation index is no substitute for the exercise of judgment no matter how hard that may be.

Another thing to note is that it doesn't include every journal yet, and for many journals can only search more recent papers. Thus, if you're looking at a relatively old paper (say from the '60s or '70s), a lot of the papers that actually cited it won't have been searched, so the number will be artificially low. Also, if some important journal in a field isn't indexed, then articles in that field may well have artificially low numbers too.

Anyway, I find it really useful just for getting the articles, especially if I can't be bothered to remember the online address for the journal it's in, or head in to the library that day.

And the comparison with the Web of Science? The latter counts only citations in published papers, not in papers on websites, and only from selected journals. Is there a respect in which Google Scholar is better?

About Frege: the "exercise of judgement" by the bulk of his contemporaries would likewise not have ranked him highly. The exercise of judgement today does rank him highly, but so does a citation count today. It's when he's assessed, not how, that matters.

Citation analysis is fallible, for reasons like those Jason states. But is any other method of assessment, including "judgement," clearly less so?

My impression is that google scholar is pretty frustrating for research in the history of philosophy. It seems to turn up classic (foundational) texts, but not hot ones (ones with immediate contemporary relevance). For instance, the search "aristotle psychology" (my current field of research), turns up Kahn's AGP article from 1966(!), Brentano, Ross, and Hammond (1902 I think). But no Burnyeat, Caston, Modrak, or Wedin. Or at least not on the first 2 pages. This is not a good sample.

Although I just did "aristotle mind" instead of "aristotle psychology" and the results were much better.

For those who are really fascinated by this sort of thing, you can obtain a calculation of your 'h-number' based on Google Scholar, which is meant to be a cross-discipline citation ranking.

A scientist has index h if h of his Np papers have at least h citations each, and the other (Np - h) papers have at most h citations each. (For more details, see Wikipedia, of course: http://en.wikipedia.org/wiki/Hirsch_number.)


The site where I first saw this doesn't operate any more because Google has blocked it:

http://www.brics.dk/~mis/hnumber.html

However, there is said to be a 'far superior' one, which requires you to download a programme to your own computer, at:

http://www.harzing.com/resources.htm#/pop.htm

I haven't tried this, but when I was told about the first one I tried it out thinking that it was bound to be hilarious, but to my horror found that, on the whole, the better-regarded a philosopher the higher their h-number, although it only worked reliably for people who were already very well-known.

I was told that some social scientists think that this is such a good measure of research quality that we should replace the Research Assessment Exercise with it, and the whole thing could be done on a computer overnight.

FWIW, I've been told by seemingly reliable sources that the average "half life" for philosophy articles (the period within which they see half of the citations they will eventually get) is 20 years or so. So my guess is that most of what you'll get that's recent will be good and some of it may be important, but that some good and important stuff takes a while to make an impression. (Frege's half life would obviously be much longer than this.)

Tom, as I see it google scholar's primary advantage over web of science for these purposes is ease of use. One doesn't have to cope with separate middle initial/ no middle initial searches; it recognizes complete first names, where web of science starts with first initials and then makes the user decide which results are relevant; the ability to include author name and other search words in the same search lets the user either find one article or be sure that the search is focused the right way.

For reasons people have mentioned, GS is imprecise-- it won't generate the true number of citations in all-but-only peer-reviewed journals. But for work within the same subfield and the same generation, I think it's *unbiased*, which is what one typically cares about for the comparative purpose of citation count rankings. And the ease of use is much, much greater: fewer steps, boolean searches, better name recognition, etc, etc. If one is trying to get a sense of citations attached to a number of articles or authors, the time savings add up quickly.

I downloaded Harzing's program and played around with it a bit. You have to eliminate all the irrelevant results (for example there are other Michael Kremers in the world with many more publications than me to their credit!). You can narrow your search by year or by field, but only as narrow as "Humanities, Arts, and Social Sciences". The program gives you not only the h-index but several others. The two most interesting appeared to me to be:

h-index: if your h-index is n, n is the greatest number such that you have at least n papers with n or more citations each.

g-index: if your g-index is n, then n is the greatest number such that your n most-cited papers have at least n^2 citations when taken together.

g-index is never lower than h-index, and g-index can give a different picture than h-index if there is one highly cited paper.
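To make the two definitions concrete, here is a minimal Python sketch. The function names and the citation counts are made up for illustration and are not taken from Harzing's program. This version also caps the g-index at the number of papers in the list, which is an assumption about how such tools behave.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the g most-cited papers together have
    at least g*g citations (capped at the number of papers)."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical record dominated by one heavily cited paper.
skewed = [100, 7, 5, 4, 4, 4, 1, 1, 1, 1]
print(h_index(skewed), g_index(skewed))  # 4 10

# Hypothetical even record: five papers with five citations each.
even = [5, 5, 5, 5, 5]
print(h_index(even), g_index(even))      # 5 5
```

In the skewed record the single paper with 100 citations pulls the g-index well above the h-index, while in the even record the two indices coincide.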

Anyway, I found the following interesting -- you can study not only individual citation impact but also journal impact. So I tried looking at journal impact for the period 1997-2007 for some well-known journals.

Here are some results (I did try to screen out irrelevant results -- typing in journal title "philosophical quarterly" gives you APQ and PPQ papers for example -- note that you only have to do this for the top n results where n is g-index, since results below that don't figure in either h or g-index).

Journal of Philosophy: h = 26, g = 34
Mind: h = 21, g = 27
Philosophical Studies: h = 21, g = 29
Philosophy and Phenomenological Research: h = 19, g = 27
Philosophical Review: h = 18, g = 29
Nous: h = 18, g = 26
Synthese: h = 14, g = 21
Ethics: h = 14, g = 19
Journal of Philosophical Logic: h = 13, g = 18
Australasian Journal of Philosophy: h = 13, g = 19
Erkenntnis: h = 13, g = 17
Proceedings of the Aristotelian Society: h = 12, g = 19
Philosophical Quarterly: h = 11, g = 16
Canadian Journal of Philosophy: h = 10, g = 14
European Journal of Philosophy: h = 10, g = 12
American Philosophical Quarterly: h = 9, g = 13
Pacific Philosophical Quarterly: h = 8, g = 14
Philosophy: h = 5, g = 8

So far this mainly corresponds to what I expected. The comparison between, say, Phil Studies and Phil Review clearly reflects at least in part the relative number of papers published in these two journals (Phil Studies coming out more frequently and publishing more papers per issue).

On the other hand, here are a couple of astounding numbers reflecting other factors:

Behavioral and Brain Sciences: h = 68, g = 167.

This must reflect the way in which Google counts citations for BBS papers, noted by Rob Rupert above. But I also expect it has something to do with the way people working in that field cite papers. Something similar seems to be at work in the following, without any influence from anything like BBS's citation system:

Linguistics and Philosophy: h = 33, g = 53.

(In fact, possibly the numbers for Journal of Philosophical Logic above are affected by similar citation tendencies within the subfield.)

And here's one example to show how the g-index can diverge from the h-index:

Review of Metaphysics: h = 4, g = 23. The high g-index reflects one paper with 472 citations -- the next most cited paper has only 7 citations, and there were only 6 papers with 4 or more citations. (The h-index is 4, not 6, because there were not 6 papers with 6 or more citations.) The 23 papers going into the g-index included several with 1 citation each.

Michael:

Interesting about the journal h-indexes, but isn't the calculation affected by how many papers a journal publishes in a given year, i.e., the more papers a journal publishes the better its chance of having (large) n papers with (large) n citations? (In the same way, an individual who publishes more papers is likely to have a higher h-index.) This would explain why Phil Studies, which publishes a gazillion pages a year, has a higher index than Phil Review; it may also explain the high indexes of Synthese and Erkenntnis. When the h-index is applied to individuals, it makes sense to give credit for publishing more, since someone who does that is, at least potentially, making more of a contribution. But I don't think it makes sense for journals -- publishing more papers in a year just isn't a merit in a journal.

Tom,

"isn't the calculation affected by how many papers a journal publishes in a given year"

Yes, and I said that in my post! Precisely about Phil Studies and Phil Review, in fact.

That's entirely right; these scores reflect quantity as well as quality. So the Philosophical Review does pretty well for a low-volume journal.

But I think the scores are still fairly interesting for a lot of reasons. I'm kind of stunned by J Phil's score. The difference between 26 and 21 on a measure like this is enormous. And, apart from with respect to Phil Review, the difference can't be explained by a greater volume of papers published. I would not have guessed that. Nor would I have guessed how even the scores are after the first six. Really interesting stuff.

Sorry, Michael: I was reading and posting quickly at the end of the day. About another of your suggestions, it would indeed be interesting to know how different citation practices are between different subfields in philosophy, i.e. do epistemologists cite more than ethicists? The differences between disciplines are enormous. I remember a citation analysis done on a 20-year span on the Web of Science and while the average philosophy paper was cited (I think) 1.2 times, the average economics paper was cited 5.5 times, and in some of the sciences the number got up into the teens and twenties. (English Lit was, I think, 0.6 citations per paper, or maybe 0.2.) Though Google Scholar counts more things than the Web of Science, e.g. papers on websites, a philosophy paper cited 10 times (as is needed to contribute to an h-index of 10, like CJP's) is already an unusually heavily cited paper.

Brian,

Well, I should correct the J Phil score. I looked at it again, and it seems I neglected to eliminate the false positives on that one (Canadian J Phil, European J Phil, Australasian J Phil). Even so the score is high:

J Phil: h-index 24, g-index 32.

Though not as far ahead of the pack as I first thought.

My experience is that the social sciences cite very heavily, compared with philosophy, and that students are trained to cite absolutely everything that has been written about their subject.

There's a lesson here -- if you want to look good in these indices, write something that social scientists feel obliged to cite. I wrote a book which is (understandably) not widely cited in philosophy, but is a contribution to a literature in the social sciences which continues to grow exponentially, so that it is very heavily cited despite, I suspect, being almost completely unread. (I suspect this because one prominent contributor to the literature cites it regularly, but I know he hasn't read it because I could tell from his review of it!)

Here are a couple of thoughts about Jason's post. First, no measurement of impact will ever be used with caution by university administrators. Just take a look at the "caution" with which administrators use the US News and World Report rankings or the Leiter Report rankings. Second, I want to emphasize just how inaccurate a measurement device Google Scholar is, even when used with the caution Jason rightly recommends. 1. A paper need not be widely cited in order to have a significant impact on the profession. For example, if I think that a paper has solved a problem, I probably won't cite it in future published work, since much of my published work concerns problems that, in my view, have yet to be solved. 2. It can easily happen that a widely cited paper does not have a significant impact on the profession. One of the most common reasons for citing a paper is to criticize it. So the more flaws a paper has, especially if it is written by a well known philosopher, the more Google hits it will generate. Why rely on a measurement device that ranks flawed papers by big shots as having greater impact than fabulous papers by relative unknowns? 3. The more "connected" one is in the profession, the more likely it is that one's work will be read, and so the more likely it is that one's work will be cited. Conversely, the less "connected" one is, the less likely it is that one's work will be cited. So citation indices often don't track work that *should* have a significant impact on the profession.

Sam,

My claim was never that Google Scholar accurately measured philosophical "fabulousness"! Obviously, my paper "Persons and their Properties" is a fabulous paper; indeed it is the ne plus ultra of fabulous papers. It just has had no effect on the field whatsoever. Rather, my claim was that Google Scholar is an accurate indicator of which papers have made substantial impacts on a field. The way I judged this was by taking the fields I knew well and writing down lists of papers that seem to me to have made the greatest impact in terms of shaping debates in those fields. Then I wrote down a list of those papers in the fields I know best that, for whatever reason, really haven't affected people's thinking. The papers in the first category received very high google scholar rankings, and the papers in the second category generally didn't. Then I asked people in other fields which papers from the last ten years they took to fall into each of these categories, and I repeated the experiment. Those results seemed accurate as well.

Of course, the property of having an impact in the field is a property an object has partly in virtue of sociological facts. But I strongly disagree with your claim that the facts have to do with being "connected". Many (most?) "connected" people have never written highly cited google scholar papers. Many people who are now "connected" are so not because of who they knew in graduate school, but because they have written influential work, work that now has a very high google scholar rank.

Jason,

1. I too am focused on impact rather than fabulousness. Judging by Google Scholar, my "fabulous" papers have had virtually zero impact. 2. It stands to reason that the papers that have shaped debates will have a high GS rank and that the papers that haven't shaped debates will have a low GS rank. What I would like to know is whether there are papers with high GS rank that have not shaped debates, and whether there are papers with low GS rank that have shaped debates. This is the question of false positives and false negatives. As far as I can tell, your informal method of judging the relation between impact and GS rank does not answer this question. 3. I completely agree that many connected folks have written papers with low GS rank. What I would like to know is whether being connected makes it more likely that one's work will achieve a high GS rank. Of course, it's hard to distinguish cause from effect here, just by looking at connectedness and GS rank. Does high GS rank result in connectedness? Does connectedness result in high GS rank? My own sense (based on no evidence whatsoever) is that the answer is yes to both questions.

Anecdotal reports leave me with the impression that administrators are fairly cautious with their use of U.S. News. They are less so with the PGR, but, of course, the PGR is a much better measure, so there's less need for caution!

I'm fairly agnostic on how good a tool GS is for measuring impact on the field. I've been using GS for a while now -- I find it helpful not only for finding papers on-line that I'm interested in, but especially for finding on-line papers that have already discussed papers I'm interested in. So I think I have *something* of a sense of how it works, but am far from an expert. Based on my highly fallible rough sense, I think that how many citations get counted is quite sensitive to whether those discussing a work are fairly "techie" or "wired" scholars. As has been mentioned, I think GS not only picks up citations from published sources, but also from on-line drafts of papers that scholars have posted on their web sites, etc. So the way to score big on GS is to have a paper that's popular among, for instance, those who post drafts of their papers on-line, and are in other ways relatively "wired." Jason advises not comparing hits across areas of philosophy, partly because work in some areas is more likely to be of interest to those outside of philosophy, and would therefore skew the results as a measure of impact within philosophy. Some, in the interests of interdisciplinarity, will be interested in impact on other areas as well as within philosophy. But here it becomes important if -- as I think is the case -- fields vary quite a bit in how "techie" or "wired" they are. It seems to me that papers that are of interest to, say, psychologists or linguists are likely to generate a high rate of citations that will be counted by GS. By comparison, someone working in say, the history of philosophy may well write material that has an impact among those, for example, who work in history departments, but will get relatively little credit for this on GS, because history is not a very "wired" discipline (or so is my sense of it). Likewise, some portions of philosophy seem more "wired" than others. 
This is likely true not only across areas of philosophy, but perhaps when things are broken down in other ways. Younger philosophers tend to be more wired than older ones, perhaps, in which case those whose work is popular among the younger set will generate a better GS score, even within an area of philosophy, than those whose work has had as great an impact, but among older philosophers. Anyway, in various ways I think you have to watch out for apparent differences being not so much real differences in impact as differences in impact *within the relatively heavily "wired" portions of the field(s)*.

Well, anyway, that's just a further note of caution. But, like I say, for all I know GS may be a relatively good way of measuring impact, especially when one watches out for certain sources of distortion in its measuring.

What I really want to speak against is a *possible* suggestion of what Jason writes. (It's only possible, and Jason may well agree with me here.) As has come out in other blog discussions, I am very concerned that the prestige of the graduate program one gets into and the prestige of one's first job play too big a role in how well one is able to compete for the "goodies" dispensed in our field -- that, say, when a department is hiring, they are too likely to go for a candidate who, because s/he is in a high-prestige job already & travels in high-prestige circles, gets a lot of good word-of-mouth as opposed to a candidate who, though stuck in a relatively low-prestige job, has done better work. And I know this is a concern Jason very much shares. My suggestion (which I think Jason agrees with) has been to give publication record -- by which I basically mean just what one can tell about a candidate's publications from their CV (how many papers published and in which journals) -- relatively more weight, and good word-of-mouth among the well-connected, very little weight. [Of course, in the late stages of a decision, one should read a lot of the work of the various candidates, and not just go by what shows up on a CV. What I'm talking about here is the earlier stages, where you simply can't read a whole lot of material by all the candidates, and you have to use faster means to narrow the field to those who will be looked at more carefully. It's here that I'm suggesting putting a lot of weight on publication records and very little on word-of-mouth.]

And what I'm worried about is the *possible* suggestion, that some may perhaps take from the first paragraph of Jason's post, that impact-as-measured-by-GS is also a good consideration to use in such settings in place of word-of-mouth to address fairness concerns. And here, connecting with one of Sam's points, the substantial likelihood that such impact, even if well-measured, is already heavily influenced by one's connectedness is enough to make me think this isn't a good use of GS-measured impact -- though one might have other reasons to be interested in the relative impact of various papers such that, for all I know, one is well-served in carefully using GS scores.

(Oh, and one possible way to exercise some more caution in using GS citations is this: click on the "cited by" number and take a look at what the sources of these citations are -- lots of journal articles in journals you know to be good, or lots of citations in papers on people's personal web sites?)
