AlternateScienceMetrics that might actually work

Last week an actual, real-life, if pretty satirical, paper was published by Neil Hall in Genome Biology (really? really): ‘The Kardashian index: a measure of discrepant social media profile for scientists’, in which he proposed a metric of impact that relates the number of Twitter followers to the number of citations of papers in scientific journals. The idea is that there are scientists who are “overvalued” because they Tweet more than they are cited, drawing a parallel with the career of a Kardashian: famous, but not for having done anything truly important (you know, like throwing a ball real good, or looking chiseled and handsome on a movie screen).

For those not in the sciences, or not obsessed with publication metrics, this is a reaction to the commonly used h-index, a measure of scientific productivity. Here ‘productivity’ is traditionally viewed as publications in scientific journals, and the number of times your work gets cited (referenced) in other published papers is seen as a measure of your ‘impact’. The h-index is the largest number h such that you have h papers with h or more citations each. So if I’ve published 10 papers, I rank them by number of citations and find that only 5 of those papers have 5 or more citations; my h-index is 5. There is A LOT of debate about this metric and its uses, which include making decisions for tenure/promotion and hiring.
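For concreteness, here’s a minimal Python sketch of the h-index calculation described above (the citation counts are made up to match the 10-paper example):

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # this paper still clears the bar
        else:
            break
    return h

# The example from the text: 10 papers, only 5 with 5+ citations
print(h_index([25, 12, 8, 6, 5, 4, 3, 2, 1, 0]))  # 5
```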

Well, the paper itself has drawn quite a bit of well-placed criticism and prompted a brilliant correction from Red Ink. Though I sympathize with Neil Hall and think he actually did a good thing to prompt all the discussion, and it really was satire (his paper is mostly a journal-based troll), the criticism is spot on. First, for the idea that Twitter activity is less impactful than publishing in scientific journals, a concept that seems positively quaint, outdated, and wrong-headed about scientific communication (a good post here about that). This idea also prompted a blog post from Keith Bradnam, who suggested that we could look at the Kardashian Index much more productively if we flipped it on its head, and proposed the Tesla index, a measure of scientific isolation. Possibly this is what Dr. Hall had in mind when he wrote it. Second, that Kim Kardashian has “not achieved anything consequential in science, politics or the arts” and “is one of the most followed people on twitter” and that this is a bad thing. Also that the joke “punches down” and thus isn’t really funny, as put here. I have numerous thoughts on this one from many aspects of pop culture but won’t go into those here.

So the paper spawned a hashtag, #AlternateScienceMetrics, where scientists and others suggested other funny (and sometimes celebrity-named) metrics for evaluating scientific impact or other things. These are really funny and you can check out summaries here and here and a Storify here. I tweeted one of these (see below) that has now become my most retweeted Tweet (quite modest by most standards, but hey, over 100 RTs!). This got me thinking: how many of these ideas would actually work? That is, how many #AlternateScienceMetrics could be reasonably and objectively calculated, and what useful information would they tell us? I combed through the suggestions to highlight some of them here. I note that there is some sarcasm/satire hiding here and there too. You’ve been warned.

    • Name: The Kanye Index
    • What it is: Number of self citations/number of total citations
    • What it tells you: How much an author cites their own work.
    • The good: High index means that the authors value their own work and are likely building on their previous work
    • The bad: The authors are blowing their own horn and trying to inflate their own h-indices.

    This is actually something that people think about seriously, as pointed out in this discussion (h/t PLoS Labs). Essentially, from this analysis it looks like self-citations in life science papers are low relative to other disciplines: 21% of all citations in life science papers are self-citations, but this figure *doubles* in engineering, where 42% of citations are self-citations. The point is that self-citations aren’t a bad thing: they allow important promotion of visibility, and artificially suppressing self-citation may not be a good thing. I use self-citations since a lot of times my current work builds on my previous work, which is the most relevant to cite (generally along with other papers that are not from my group too). Ironically, this first entry in my list of potentially useful #AlternateScienceMetrics is a self-reference.

    • Name: The Tesla Index
    • What it is: Number of total citations/number of Twitter followers
    • What it tells you: Balance of traditional scientific publications with social media presence.
    • The good: A low index means you value social media for scientific outreach.
    • The bad: A high index suggests scientific isolation; the scientific conversation may be happening without you.

    I personally like Keith Bradnam’s Tesla Index as a measure of scientific isolation (essentially the number of citations you have divided by the number of Twitter followers). I think the importance of traditional scientific journals as THE way to disseminate your science is waning. They are still important, and lend an air of confidence to the conclusions stated there (which may or may not be well-founded), but a lot of very important scientific discussion is happening elsewhere. Even in terms of how we find out about scientific studies published in traditional journals, outlets like Twitter are playing increasingly important roles. So, increasingly, a measure of scientific isolation might be important.

    • Name: The Bechdel Index
    • What it is: Number of papers with two or more women coauthors
    • High: You’re helping to effect a positive change.
    • Low: You’re not paying attention to the gender disparities in the sciences.

    The Bechdel Index is a great suggestion and has a body of work behind it. I’ve posted about some of these issues here and here; essentially they look at gender discrepancies in science and the scientific literature. There are some starter overviews of these problems here and here, but it’s a really big issue. As an example, one of these studies shows that the number of times a work is cited is correlated with the gender of its first author, which is pretty staggering if you think about it.

    • Name: The Similarity Index
    • What it is: Some kind of similarity measure in the papers you’ve published
    • What it tells you: How much you recycle very similar text and work.
    • The good: Low index would indicate a diversity and novelty in your work and writing.
    • The bad: High index indicates that you plagiarize from yourself and/or that you tend to try to milk a project for as much as it’s worth.

    Interestingly, I actually found a great example of this and blogged about it here. The group I found (all sharing the surname Katoh) has an h-index of over 50, achieved by publishing a whole bunch of essentially identical (and essentially useless) papers.

    • Name: The Never Ending Story Index
    • What it is: Time in review multiplied by number of resubmissions
    • What it tells you: How difficult it was to get this paper published.
    • The good: Small numbers might mean you’re really good at writing good papers the first time.
    • The bad: Large numbers would mean you spend a LOT of time revising your paper.

    This can be difficult information to get, though some journals do report it (PLoS journals will give you time in review). I’ve also gathered that data for my own papers; I blogged about it here.

    • Name: Rejection Index
    • What it is: Percentage of your papers that were published relative to those rejected. I would amend this to published/all papers, so it’d be a true percentage (see second Tweet below).
    • What it tells you: How hard you’re trying?
    • High: You are rocking it and very rarely get papers rejected. Alternatively you are extremely cautious and probably don’t publish a lot. Could be an indication of a perfectionist.
    • Low: Trying really hard and getting shot down a lot. Or you have a lot of irons in the fire and not too concerned with how individual papers fare.

    Like the previous metric this one would be hard to track and would require self reporting from individual authors. Although you could probably get some of this information (at a broad level) from journals who report their percentage of accepted papers- that doesn’t tell you about individual authors though.

    • Name: The Teaching/Research Metric
    • What it is: Maybe hours spent teaching divided by hours in research
    • What it tells you: How much of your effort is devoted to activity that should result in papers.

    This is a good idea and points out something that I think a lot of professors with teaching duties have to balance (I’m not one of them, but I’m pretty sure this is true). I’d bet they sometimes feel that their teaching load is expected, but not taken into account when the publication metrics are evaluated.

    • Name: The MENDEL Index
    • What it is: Score of your paper divided by the impact factor of the journal where it was published
    • What it tells you: If your papers are being targeted at appropriate journals.
    • High: Indicates that your paper is more impactful than the average paper published in the journal.
    • Low: Indicates your paper is less impactful than the average paper published in the journal.

    I’ve done this kind of analysis on my own publications (read about it here) and stratified my publications by career stage (graduate student, post-doc, PI). This showed that my impact (by this measure) has continued to increase- which is good!
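    A minimal sketch of how a ratio like this might be computed. The function name and the use of average citations per year as the paper’s ‘score’ are my assumptions, since the tweet doesn’t pin down a formula:

```python
def mendel_index(citations_per_year, journal_impact_factor):
    """Hypothetical MENDEL index: a paper's average citations per year
    divided by the impact factor of the journal that published it.
    Values above 1 suggest the paper outperforms the journal's average."""
    return citations_per_year / journal_impact_factor

# A paper averaging 6 citations/year in a journal with impact factor 3:
print(mendel_index(6.0, 3.0))  # 2.0
```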

    • Name: The Two-Body Factor
    • What it is: Number of citations you have versus number of citations your spouse has.
    • What it tells you: For two career scientists this could indicate who might be the ‘trailing’ spouse (though see below).
    • High: You’re more impactful than your spouse.
    • Low: Your spouse is more impactful than you.

    This is an interesting idea for a metric for an important problem. But it’s not likely that it would really address any specific problem. I mean, if you’re in this relationship you probably already know what’s up, right? And if you’re not in the same sub-sub-sub discipline as your spouse, it’s unlikely that the comparison would really be fair. If you’re looking for jobs, it is perfectly reasonable that the spouse with a lower number of citations could be more highly sought after because they fit what the job is looking for very well. My wife, who is now a nurse, and I could calculate this factor, but the only papers with her name on them have my name on them as well.

    • Name: The Clique Index
    • What it is: Your citations relative to your friend’s citations
    • What it tells you: Where you are in the pecking order of your close friends (with regard to publications).
    • High: You are a sciencing god among your friends. They all want to be coauthors with you to increase their citations.
    • Low: Maybe hoping that some of your friends’ success will rub off on you?

    Or maybe you just like your friends and don’t really care what their citation numbers are like (but still totally check on them regularly. You know, just for ‘fun’).

    • Name: The Monogamy Index
    • What it is: Percentage of papers published in a single journal.
    • What it tells you: Not sure. Could be an indicator that you are in such a specific sub-sub-sub-sub-field that you can only publish in that one journal. Or that you really like that one journal. Or that the chief editor of that one journal is your mom.

    • Name: The Atomic Index
    • What it is: Number of papers published relative to the total number of papers you should have published.
    • What it tells you: Whether you are parceling up your work appropriately.
    • High: You tend to take apart studies that should be one paper and break them into chunks. Probably to pad your CV.
    • Low: Maybe you should think about breaking up that 25 page, 15 figure, 12 experiment paper into a couple of smaller ones?

    This would be a very useful metric, but I can’t see how you could really calculate it, aside from manually going through papers and evaluating them.

    • Name: The Irrelevance Factor
    • What it is: Citations divided by the number of years since the paper was published. I suggest adding a weighting factor for years since publication to increase the utility of this metric.
    • What it tells you: How much long-term impact are you having on the field?
    • High: Your paper(s) has a long term impact and it’s still being cited even years later.
    • Low: Your paper was a flash in the pan or it never was very impactful (in terms of other people reading it and citing it). Or you’re an under-recognized genius. Spend more time self-citing and promoting your work on Twitter!

    My reformulation would look something like this: sum(Cy*y*w), where Cy is the citations for year y (year 1 being the first year of publication) and w is a weighting factor. You could make w a nonlinear function of y if you wanted to get fancy.
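    A sketch of this reformulation in Python (the yearly citation counts are invented); note how it rewards a steady citation stream over an early spike:

```python
def weighted_citation_score(yearly_citations, w=1.0):
    """sum(Cy * y * w): citations received in year y (year 1 = year of
    publication), weighted so that later-year citations count more."""
    return sum(c * y * w for y, c in enumerate(yearly_citations, start=1))

steady = [5, 5, 5, 5, 5]   # 25 citations, spread evenly
spike  = [20, 4, 1, 0, 0]  # 25 citations, mostly in year 1
print(weighted_citation_score(steady))  # 75.0
print(weighted_citation_score(spike))   # 31.0
```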



So if you’ve made it to this point, here’s my summary. There are a lot of potentially useful metrics that evaluate different aspects of scientific productivity and/or weight for and against particular confounding factors. As humans we LOVE to have one single metric that summarizes everything. This is not how the world works. At all. But there we are. There are some very good efforts to change the ways that we, as scientists, evaluate our impact, including ImpactStory, and there have been many suggestions of much more complicated metrics than what I’ve described here, if you’re interested.

The good, the bad, and the ugly: Open access, peer review, investigative reporting, and pit bulls

We all have strong feelings about things based on anecdotal evidence; it’s part of human nature. Science is aimed at testing those anecdotal feelings (we call them hypotheses) in a more rigorous fashion, to support or refute our gut feelings about a subject. Many times those gut feelings are wrong, especially about new concepts and ideas that come along. Open access publishing certainly falls into this category: a new and interesting business model that many people have very strong feelings about. There is, therefore, a need for the second part: scientific studies that illuminate how well it’s working.

Recently the very prestigious journal Science published an article, titillatingly titled, “Who’s Afraid of Peer Review: A spoof paper concocted by Science reveals little or no scrutiny at many open-access journals.” I’ve seen it posted and reposted on Twitter and Facebook by a number of colleagues, and, indeed, when I first read about it I was intrigued. The post has been accompanied by sentiments such as “I never trusted open access” or “now you know why you get so many emails from open access journals”- in other words, gut feelings about the overall quality of open access journals.

Here’s the basic rundown: John Bohannon concocted a fake but believable scientific paper with a critical flaw. He submitted it to a large number of open access journals under different names, then recorded which journals accepted it, along with the correspondence with each journal, some of which is pretty damning (i.e. it looks like they didn’t do any peer review on the paper). Several high-profile open access journals like PLoS One rejected the paper. But many journals accepted the flawed paper. On one hand, the study is an ambitious and groundbreaking investigation into how well journals execute peer review, the heart of scientific publishing. The author is to be commended on this undertaking, which is considerably more comprehensive (in terms of the number of journals targeted) than anything in the past.

On the other hand, the ‘study’, which concludes that open access peer review is flawed, is itself deeply flawed and was not, in fact, peer reviewed (it is categorized as a “News” piece for Science). The reason is really simple: the ‘study’ was not conceived as a scientific study at all. It was investigative reporting, which is much different. The goal of investigative reporting is to call attention to important and oftentimes unrecognized problems. In this way Dr. Bohannon’s piece was probably quite successful, because it does highlight the very lax or non-existent peer review at a large number of journals. However, the focus on open access is harmful misdirection that only muddies the waters.

Here’s what’s not in question: Dr. Bohannon found that a large number of the journals he submitted his fake paper to seemed to accept it with little or no peer review. (However, it is worth noting that Gunther Eysenbach, an editor for a journal that was contacted, reports that he rejected the paper because it was out of the journal’s scope, and that his journal was, for some reason, not included in the final list of journals in Bohannon’s piece.)

What this says about peer review in general is striking: this fake paper was flawed in a pretty serious way and should not have passed peer review. This conclusion of the paper is a good and important one: peer review is flawed for a surprising number of journals (or just non-existent).

What the results do not say is anything about whether open access contributes to this problem. Open access was not a variable in Dr. Bohannon’s study. However, it is one of the main conclusions of the paper- that the open access model is flawed. So essentially, this ‘study’ is falsely representing the results of a study that was not designed to answer the question posed: are open access journals more likely than for-pay journals to have shoddy peer review processes? No for-pay journals were tested in the sting, thus no results. It MAY be that open access is worse than for-pay in terms of peer review, but THIS WAS NOT TESTED BY THE STUDY. Partly this is the fault of the promotion for the piece by Science, which does play up the open access angle quite a bit- but it is really implicit in the study itself. Interestingly, this is how Dr. Bohannon describes the spoof paper’s second flawed experiment:

The second experiment is more outrageous. The control cells were not exposed to any radiation at all. So the observed “interactive effect” is nothing more than the standard inhibition of cell growth by radiation. Indeed, it would be impossible to conclude anything from this experiment.

Thus neatly summarizing the fundamental flaw in his own study- the control journals (more traditional for-pay journals) were not queried at all so nothing can be concluded from this study- in terms of open access anyway.

The heart of the problem is that the very well-respected journal Science is now asking the reader to accept conclusions that are not based in the scientific method. This is the equivalent of stating that pitbulls are more dangerous than other breeds because they bite 10,000 people per year in the US (I just made that figure up). End of story. How many people were bitten by other breeds? We don’t know because we didn’t look at those statistics. How do we support our conclusion? Because people feel that pitbulls are more dangerous than other breeds- just as some scientists distrust open access journals as “predatory” or worse. So, in a very real way the well-respected for-pay journal Science is preying upon the ‘gut feelings’ of readers who may distrust open access and feeding them with pseudoscience, or at least pseudo conclusions about open access.

A number of very smart and well-spoken (well, written) people have posted on this subject and made some other excellent points. See posts from Michael Eisen, Björn Brembs, Paul Baskin, and Gunther Eysenbach on the subject.

Gaming the system: How to get an astronomical h-index with little scientific impact

The old scientific adage “publish or perish” has garnered a lot of debate lately. I’ve posted about my own scientific impact as well as the impact of papers published about computational methods that are named versus unnamed in the title. Certainly publications remain the currency of scientific careers, for better or worse- though I think this is changing with more emphasis being placed on other, more flexible and open, forms of scientific outreach. There’s a lot of talk about this subject from various places including ByteSizeBiology, Peter Lawrence, and Michael Eisen – to name a few.

The purpose of this post is to highlight an instance of abuse of the system, in kind of a funny (odd, surprising, shocking) way. This is similar in spirit to recent reports that a math paper generated by an algorithm that strings mathematical words together was accepted by a journal.

I was searching gene names to research a paper I was writing a couple of years ago and started to notice a weird pattern. Some genes were mostly absent from the literature (that is, no one has actually studied their function, and they haven’t been highlighted in any other screen-type studies that identify lots of things). However, a number of publications on completely different genes looked suspiciously similar. Many of these had titles that included the words “integrative genomic analyses” or “identification and characterization of [gene] in silico”; they all had two authors, M. Katoh and M. Katoh or Y. Katoh (though some had more); and most were published in a few journals, the International Journal of Molecular Medicine and the International Journal of Oncology, both with low but respectable impact factors (1.8 or so). Many, though not all, of these papers seem to be rehashed digests of information obtained from databases, combined with review-type information about potential functions related to cancer or biomedicine. This PubMed search retrieves most of these citations for your amusement.

A quick search in Web of Knowledge for “Katoh M” as an author and “INTERNATIONAL JOURNAL OF ONCOLOGY” as a publication retrieves 99 publications, with a jaw-dropping h-index of 48 (the h-index here measures the impact of a group of publications). Results from the “INTERNATIONAL JOURNAL OF MOLECULAR MEDICINE” were only slightly less impressive (an h-index of 37, also with exactly 99 publications; see the screen capture of results below). Following up with a search of the three main names here, Masaru, Masuko, or Yuriko (there was also a mysteriously named “Mom Katoh”, who may be the ringleader of the bunch, but she/he only had a couple of publications) retrieved 216 publications with a combined h-index of 56, a number that any biologist would die for (or at least should be very happy with).

Web of Knowledge Search for Katoh M


Masaru is affiliated with the apparently reputable National Cancer Research Institute in Japan. But Masuko and Yuriko don’t seem to be closely affiliated to any place in particular (judging by a Google search).

Some of these publications may, in fact, be valuable and have valuable information and results in them- I certainly haven’t gone through each and every one. However, a large number of these “integrative genomic analyses” are not useful and seem to have been targeted at genes with little characterization and are written based on template text. The high citation number that they get, then, may be due to lack of care on the part of those citing the publication, and they are included simply because they appear to be the only comprehensive functional study of a particular gene that has turned up in the study. It certainly emphasizes the need for caution when “filling in” citations for a publication that are not central to the main story (and thus writers, myself certainly included, are less critical about the source of their citations).

How important is having a name for your computational method?

When building software tools, databases, or reporting approaches to data analysis or modeling, choosing a name is important. I started out writing this post with the notion that this is true, searched briefly for evidence to back me up, then realized that I could do this analysis myself. Or at least enough to get an idea of how important having a name for your method might be.

Here’s what I did: I gathered all publications in the journal Bioinformatics published between 2004 and 2008 (3517 or so) from Web of Knowledge/Web of Science. I then identified the publications that referenced software tools or databases by starting with “[name]: [title of paper]”, giving approximately 954 publications (there are more than this that fit the bill; more on that in a minute). I calculated the mean number of citations for the publications in each group (not adjusting for years in publication); that’s the “All” comparison in the figure below. The difference shows that publications that use a name garner more citations (and thus have more ‘impact’ by this measure), and this was statistically significant by t-test (p = 0.005). However, this could be due to the difference in the nature of the publications. Perhaps tools are just more likely to be cited than scientific studies about specific systems (I think they are). So I went through an arbitrary selection of 500 of the publications without a name and identified a conservative set of 158 that looked like they could have had names associated with them, based on their titles. This was a bit of an arbitrary endeavor, but I think I did an OK job. That comparison is the “Matched” comparison below and shows a much more marked difference.

You can find a spreadsheet with my analysis here: Bioinformatics_Pubs_WOS_2008

The bottom line: the publications with named methods garnered over three times the number of citations of the pubs with no names, and this was also statistically significant (p = 0.05, owing to the smaller number of publications in the matched set).
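A comparison like the one above can be sketched with Welch’s two-sample t statistic; the citation counts below are toy numbers for illustration, not the data from my spreadsheet:

```python
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (mean_a - mean_b) / (var_a / len(a) + var_b / len(b)) ** 0.5

# Toy citation counts for papers with named vs unnamed methods:
named = [30, 45, 12, 60, 25, 38]
unnamed = [8, 14, 5, 11, 9, 7]
print(round(welch_t(named, unnamed), 2))  # 3.76 -- a large separation in means
```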

Impact analysis of pubs with named methods versus unnamed methods


There are a number of ways I could improve this comparison, and I’d be happy to entertain suggestions. However, I think the results are quite interesting. There are some reasons they might hold that are unrelated to actually having a name. The first I can think of is that the named publications are likely to be application notes, which describe the release of more mature, tested software, whereas the non-named publications may describe more of the research and proving of the method. That is, named publications may be more likely to have a tool that is actually usable by others (and thus citable), while the other kind of publication may not even provide software at all. A good way to examine this would be to construct a matched set of publications that have no named method but do have associated software (or a web interface). However, I really don’t have time for that; it sounds painfully boring.

However, another non-exclusive notion that this result suggests is that simply the presence of a recognizable, easily usable name for a method increases the likelihood that it will be cited in future work. This allows association of the complicated and hard-to-describe process that is described in the paper with a “handle” for the method that is easy to remember. This is actually fairly interesting psychologically and suggests what I believe many scientists already realize, that marketing (the choice of a good name for example) can be key in scientific impact. We can debate on whether or not that’s a good thing, but it’s generally true in science.

So these results seem to suggest that a way to increase scientific impact is to name your method. Though, of course, correlation does not imply causation, so it certainly might not work that way. I’m really interested in seeing if there are patterns in the choice of name that extend to impact, but I’m not sure how to do that. The length (number of characters) of the name has no correlation with the number of citations, but that’s as far as I’ve gotten. Any suggestions?