Asked and answered: Computational Biology Contribution?

So someone asked me this question today: “as a computational biologist, how can you be useful to the world?”. OK, so they didn’t ask me, per se; they got to my blog by typing the question into a search engine, and I saw it on my WordPress stats page (see the bottom of this post). Which made me think- “I don’t know what page they were directed to, but I know I haven’t addressed that specific question before on my blog”. So here’s a quick answer, especially relevant since I’ve been talking with CS people about this at the ACM-BCB meeting over the last few days.

As a computational biologist how can you be useful to the world?

  1. Choose your questions carefully. Make sure that the algorithm you’re developing, the software you’re designing, or the fundamental hypothesis you’re researching is actually something that people (see collaborators, below) are interested in and see the value of. Identify the gaps in the biology that you can address. Don’t build new software for the sake of building new software- generally people (see collaborators) don’t care about a different way to do the same thing, even if it’s moderately better than the old way.
  2. Collaborate with biologists, clinicians, public health experts, etc. Go to the people who have the problems. What they can offer you is focus on important problems, which will improve the impact of your research (you want NIH funding? You HAVE to have impact, and probably collaborators). What you can give them is a solution to a problem that they are actually facing. Approach the relationship with care, though, since this is where the language barrier between fields can be very difficult (more on this from me in a forthcoming post). Make sure that you interact with these collaborators throughout the process- that way you don’t go off and do something completely different from what they had in their heads.
  3. In research be rigorous. The last thing that anyone in any discipline needs is a study that hasn’t considered validation, generalizability, statistical significance, or comparison against a gold standard (or a reasonable facsimile thereof). Consider at least running your ideas by a statistician- they can be very helpful- or by a senior computational biologist mentor.
  4. In software development be thoughtful. Consider the robustness of your code- have you tested it extensively? How will average users (see collaborators, above) get their data into it? How will average users interpret the results of your methods? Put effort into working with those collaborators to define the user interface and user experience. They don’t (up to a point) care about execution times, as long as the software finishes in a reasonable amount of time (have your software estimate time to completion and display it) and gives good results. They do care if they can’t use it (or rather, they completely don’t care and will stop working with you on the spot).
  5. Sometimes people don’t know what they need until they see it. This is a tip for at least 10th-level computational biologists (to make a D&D analogy). This was a tenet of Steve Jobs at Apple and I believe it to be true. Sometimes someone with passion and skill has to break new ground and do something that no one is asking for, but that people will LOVE and won’t know how they ever lived without. IT IS HIGHLY LIKELY THAT THIS IS NOT YOU. This is a pretty sure route to madness, wearing a tin hat, and spouting “you fools! you’ll never understand my GENIUS”- keep that in mind.
  6. If you’re a computational biologist with some experience, make sure that you pass it along. Attend conferences where there are likely to be younger faculty/staff members, students, and post-docs. Comment on their posters and engage. When possible, suggest or make connections with collaborators (see above) for them. Question them closely on the four points above- just asking the questions may be an effective way of conveying their importance. Organize sessions at these conferences. In your own institution, be an accessible and engaged mentor. This has the most potential to increase your impact on the world. It’s true.

Next week: “pathogens found in confectionary” (OK- probably not going to get to that one, but interesting anyway)

People be searchin’

AlternateScienceMetrics that might actually work

Last week an actual, real-life, if pretty satirical, paper was published by Neil Hall in Genome Biology (really? really): ‘The Kardashian index: a measure of discrepant social media profile for scientists’. In it he proposed a metric of impact that relates the number of Twitter followers a scientist has to the number of citations of their papers in scientific journals. The idea is that there are scientists who are “overvalued” because they Tweet more than they are cited- drawing a parallel with the careers of the Kardashians, who are famous, but not for having done anything truly important (you know, like throwing a ball real good, or looking chiseled and handsome on a movie screen).

For those not in the sciences, or not obsessed with publication metrics, this is a reaction to the commonly used h-index, a measure of scientific productivity. Here ‘productivity’ is traditionally viewed as publications in scientific journals, and the number of times your work gets cited (referenced) in other published papers is seen as a measure of your ‘impact’. The h-index is the largest number h such that h of your papers have h or more citations each. So if I’ve published 10 papers, I rank them by number of citations; if only 5 of those papers have 5 or more citations, my h-index is 5. There is A LOT of debate about this metric and its uses, which include making decisions for tenure/promotion and hiring.
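Since the definition trips people up, here’s a minimal sketch of that calculation (the citation counts are made up to match the worked example above- real h-index tools pull these from a bibliographic database):

    # Minimal h-index sketch: the largest h such that h papers
    # each have at least h citations.
    def h_index(citations):
        ranked = sorted(citations, reverse=True)  # most-cited papers first
        h = 0
        for rank, cites in enumerate(ranked, start=1):
            if cites >= rank:
                h = rank  # this paper still clears the bar
            else:
                break
        return h

    # The worked example: 10 papers, only 5 with 5 or more citations.
    print(h_index([25, 14, 9, 7, 5, 4, 3, 2, 1, 0]))  # -> 5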

Well, the paper itself has drawn quite a bit of well-placed criticism and prompted a brilliant correction from Red Ink. Though I sympathize with Neil Hall and think he actually did a good thing to prompt all the discussion- and it really was satire (his paper is mostly a journal-based troll)- the criticism is spot on. First, there’s the idea that Twitter activity is less impactful than publishing in scientific journals, a notion about scientific communication that seems positively quaint, outdated, and wrong-headed (a good post here about that). This idea also prompted a blog post from Keith Bradnam, who suggested that we could look at the Kardashian Index much more productively if we flipped it on its head, and proposed the Tesla Index, a measure of scientific isolation. Possibly this is what Dr. Hall had in mind when he wrote it. Second, there’s the assertion that Kim Kardashian has “not achieved anything consequential in science, politics or the arts” yet “is one of the most followed people on twitter”, and that this is a bad thing. Also, the joke “punches down” and thus isn’t really funny- as put here. I have numerous thoughts on this one from many aspects of pop culture, but I won’t go into those here.

So the paper spawned a hashtag, #AlternateScienceMetrics, where scientists and others suggested other funny (and sometimes celebrity-named) metrics for evaluating scientific impact or other things. These are really funny and you can check out summaries here and here and a storify here. I tweeted one of these (see below) that has now become my most retweeted Tweet (quite modest by most standards, but hey, over 100 RTs!). This got me thinking: how many of these ideas would actually work? That is, which #AlternateScienceMetrics could be reasonably and objectively calculated, and what useful information would they give us? I combed through the suggestions to highlight some of those here- and I note that there is some sarcasm/satire hiding here and there too. You’ve been warned.

    • Name: The Kanye Index
    • What it is: Number of self citations/number of total citations
    • What it tells you: How much does an author cite their own work.
    • The good: High index means that the authors value their own work and are likely building on their previous work
    • The bad: The authors are blowing their own horn and trying to inflate their own h-indices.

    This is actually something that people think about seriously, as pointed out in this discussion (h/t PLoS Labs). Essentially, that analysis suggests that self-citation in the life sciences is low relative to other disciplines: 21% of all citations in life science papers are self-citations, but the rate is double that in engineering, where 42% of citations are self-citations. The point is that self-citations aren’t inherently a bad thing- they allow important promotion of visibility, and artificially suppressing self-citation may not be a good thing. I use self-citations because a lot of the time my current work (the work being described) builds on my previous work, which is the most relevant to cite (generally along with other papers that are not from my group too). Ironically, this first entry in my list of potentially useful #AlternateScienceMetrics is itself a self-reference. A sketch of the calculation follows.
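    As a sketch, a self-citation counter could look something like this, assuming you can get author sets for each paper and its references (the data structures here are hypothetical, not any real API):

        # Kanye Index sketch: self-citations / total citations.
        # A citation counts as a self-citation if the citing and cited
        # papers share at least one author.
        def kanye_index(papers):
            self_cites = total_cites = 0
            for paper in papers:
                for ref in paper["references"]:
                    total_cites += 1
                    if paper["authors"] & ref["authors"]:  # shared author
                        self_cites += 1
            return self_cites / total_cites if total_cites else 0.0

        earlier = {"authors": {"Smith", "Jones"}, "references": []}
        other = {"authors": {"Nguyen"}, "references": []}
        new = {"authors": {"Smith", "Lee"}, "references": [earlier, other]}
        print(kanye_index([new]))  # -> 0.5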


    • Name: The Tesla Index
    • What it is: Number of Twitter followers/number of total citations
    • What it tells you: Balance of social media presence with traditional scientific publications.
    • The good: High index means you value social media for scientific outreach
    • The bad: The authors spend more time on the social media than doing ‘real’ scientific work.

    I personally like Keith Bradnam’s Tesla Index to measure scientific isolation (essentially the number of citations you have divided by the number of Twitter followers). I think the importance of traditional scientific journals as THE way to disseminate your science is waning. They are still important, and they lend an air of confidence to the conclusions stated there (which may or may not be well-founded), but a lot of very important scientific discussion is happening elsewhere. Even in terms of how we find out about studies published in traditional journals, outlets like Twitter are playing increasingly important roles. So, increasingly, a measure of scientific isolation might be important.
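    Both indices are trivial to compute if you have the two counts. Here’s a sketch of the pair- the index as stated in the list above, and Bradnam’s inverted version (the example numbers are invented):

        # Kardashian-style index: followers per citation (as defined above).
        # Tesla Index: citations per follower- Bradnam's measure of
        # scientific isolation.
        def kardashian_index(followers, citations):
            return followers / citations if citations else float("inf")

        def tesla_index(followers, citations):
            return citations / followers if followers else float("inf")

        print(kardashian_index(followers=2000, citations=500))  # -> 4.0
        print(tesla_index(followers=2000, citations=500))       # -> 0.25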


    • Name: The Bechdel Index
    • What it is: Number of papers with two or more women coauthors
    • High: You’re helping to effect a positive change.
    • Low: You’re not paying attention to the gender disparities in the sciences.

    The Bechdel Index is a great suggestion and has a body of work behind it. I’ve posted about some of these issues here and here. Essentially this looks at gender discrepancies in science and the scientific literature. There are some starter overviews of these problems here and here, but it’s a really big issue. As an example, one of these studies shows that the number of times a work is cited is correlated with the gender of its first author- which is pretty staggering if you think about it.


    • Name: The Similarity Index
    • What it is: Some kind of similarity measure across the papers you’ve published
    • What it tells you: How much you recycle very similar text and work.
    • The good: Low index would indicate a diversity and novelty in your work and writing.
    • The bad: High index indicates that you plagiarize from yourself and/or that you tend to try to milk a project for as much as it’s worth.

    Interestingly, I actually found a great example of this and blogged about it here. The group I found (all sharing the surname Katoh) has an h-index of over 50, achieved by publishing a whole bunch of essentially identical (and essentially useless) papers.
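    For illustration, a crude Similarity Index could be built from Python’s standard library- here the mean pairwise similarity of your abstracts (real text-reuse detection would be far more sophisticated, and the abstracts below are invented):

        # Toy Similarity Index: mean pairwise similarity of abstracts,
        # using difflib's ratio (0 = no overlap, 1 = identical).
        from difflib import SequenceMatcher
        from itertools import combinations

        def similarity_index(abstracts):
            pairs = list(combinations(abstracts, 2))
            if not pairs:
                return 0.0
            scores = (SequenceMatcher(None, a, b).ratio() for a, b in pairs)
            return sum(scores) / len(pairs)

        papers = [
            "GeneA is a novel target of the Wnt signaling pathway in cancer.",
            "GeneB is a novel target of the Wnt signaling pathway in cancer.",
            "We report the draft genome sequence of a thermophilic archaeon.",
        ]
        print(round(similarity_index(papers), 2))  # near-duplicates push this up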


    • Name: The Never Ending Story Index
    • What it is: Time in review multiplied by number of resubmissions
    • What it tells you: How difficult it was to get this paper published.
    • The good: Small numbers might mean you’re really good at writing good papers the first time.
    • The bad: Large numbers would mean you spend a LOT of time revising your paper.

    This can be difficult information to get, though some journals do report it (PLoS journals will give you time in review). I’ve also gathered that data for my own papers- I blogged about it here.


    • Name: Rejection Index
    • What it is: Percentage of papers you’ve had published relative to rejected. I would amend this to published/all submitted so it’d be a percentage (see second Tweet below).
    • What it tells you: How hard you’re trying.
    • High: You are rocking it and very rarely get papers rejected. Alternatively, you are extremely cautious and probably don’t publish a lot. Could be an indication of a perfectionist.
    • Low: You’re trying really hard and getting shot down a lot. Or you have a lot of irons in the fire and aren’t too concerned with how individual papers fare.

    Like the previous metric this one would be hard to track and would require self reporting from individual authors. Although you could probably get some of this information (at a broad level) from journals who report their percentage of accepted papers- that doesn’t tell you about individual authors though.


    • Name: The Teaching/Research Metric
    • What it is: Maybe hours spent teaching divided by hours in research
    • What it tells you: How much of your effort is devoted to activity that should result in papers.

    This is a good idea and points out something that I think a lot of professors with teaching duties have to balance (I’m not one of them, but I’m pretty sure this is true). I’d bet they sometimes feel that their teaching load is expected of them but not taken into account when publication metrics are evaluated.


    • Name: The MENDEL Index
    • What it is: Citations of your paper divided by the impact factor of the journal where it was published
    • What it tells you: If your papers are being targeted at appropriate journals.
    • High: Indicates that your paper is more impactful than the average paper published in the journal.
    • Low: Indicates your paper is less impactful than the average paper published in the journal.

    I’ve done this kind of analysis on my own publications (read about it here) and stratified my publications by career stage (graduate student, post-doc, PI). This showed that my impact (by this measure) has continued to increase- which is good!


    • Name: The Two-Body Factor
    • What it is: Number of citations you have versus number of citations your spouse has.
    • What it tells you: For two career scientists this could indicate who might be the ‘trailing’ spouse (though see below).
    • High: You’re more impactful than your spouse.
    • Low: Your spouse is more impactful than you.

    This is an interesting idea for a metric for an important problem. But it’s not likely that it would really address any specific problem- I mean, if you’re in this relationship you probably already know what’s up, right? And if you’re not in the same sub-sub-sub discipline as your spouse, it’s unlikely that the comparison would really be fair. If you’re looking for jobs, it is perfectly reasonable that the spouse with fewer citations could be more highly sought after because they fit what the job is looking for very well. My wife, who is now a nurse, and I could calculate this factor, but the only papers she has her name on, my name is on as well.


    • Name: The Clique Index
    • What it is: Your citations relative to your friend’s citations
    • What it tells you: Where you are in the pecking order of your close friends (with regard to publications).
    • High: You are a sciencing god among your friends. They all want to be coauthors with you to increase their citations.
    • Low: Maybe hoping that some of your friends’ success will rub off on you?

    Or maybe you just like your friends and don’t really care what their citation numbers are like (but still totally check on them regularly. You know, just for ‘fun’).


    • Name: The Monogamy Index
    • What it is: Percentage of papers published in a single journal.
    • What it tells you: Not sure. Could be an indicator that you are in such a specific sub-sub-sub-sub-field that you can only publish in that one journal. Or that you really like that one journal. Or that the chief editor of that one journal is your mom.

    • Name: The Atomic Index
    • What it is: Number of papers published relative to the total number of papers you should have published.
    • What it tells you: Are you parceling out your work appropriately?
    • High: You tend to take apart studies that should be one paper and break them into chunks. Probably to pad your CV.
    • Low: Maybe you should think about breaking up that 25 page, 15 figure, 12 experiment paper into a couple of smaller ones?

    This would be a very useful metric but I can’t see how you could really calculate it, aside from manually going through papers and evaluating.


    • Name: The Irrelevance Factor
    • What it is: Citations divided by number of years the paper has been published. I suggest adding in a weighting factor for years since publication to increase the utility of this metric.
    • What it tells you: How much long-term impact are you having on the field?
    • High: Your paper(s) has a long term impact and it’s still being cited even years later.
    • Low: Your paper was a flash in the pan or it never was very impactful (in terms of other people reading it and citing it). Or you’re an under-recognized genius. Spend more time self-citing and promoting your work on Twitter!

    My reformulation would look something like this: sum(C_y * y * w), where C_y is the citations in year y (with y = 1 being the first year of publication) and w is a weighting factor. You could make w a nonlinear function of y if you wanted to get fancy.
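    In code, that reformulation is a one-liner; the citation counts below are invented to show the intended behavior:

        # Irrelevance Factor reformulation: sum of C_y * y * w, where C_y is
        # the citation count in year y of the paper's life (y = 1 is the
        # first year) and w is a weighting factor favoring later citations.
        def irrelevance_factor(citations_by_year, w=0.1):
            return sum(c * y * w for y, c in enumerate(citations_by_year, start=1))

        # A paper still cited years later outscores a flash in the pan.
        steady = [5, 5, 5, 5, 5]
        flash = [20, 4, 1, 0, 0]
        print(irrelevance_factor(steady), irrelevance_factor(flash))  # 7.5 vs ~3.1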


So if you’ve made it to this point, here’s my summary. There are a lot of potentially useful metrics that evaluate different aspects of scientific productivity and/or weight for or against particular confounding factors. As humans we LOVE to have one single metric that summarizes everything. This is not how the world works. At all. But there we are. There are some very good efforts to change the ways that we, as scientists, evaluate our impact, including ImpactStory, and there have been many suggestions of much more complicated metrics than the ones I’ve described here, if you’re interested.

Gender bias in scientific publishing

The short version: This is a good paper about an important topic: gender bias in publication. The authors try to address two main questions: what is the relationship between gender and research output, and what is the relationship between author gender and paper impact? The study shows a gender difference in the number of papers published, but apparently fails to control for the relative number of researchers of each gender in each field. This means that the first point of the paper- that women publish less than men- can’t be separated from the well-known gender imbalance in most of these fields, i.e. that there are more men than women. This seems like a strange oversight, and it’s only briefly mentioned in the paper. The second point, which is made well and clearly, is that papers authored by women are cited less than those authored by men. This is the only real take-home of the paper, though it is a very important and alarming one.
What the paper does say: that papers authored by women are cited less than those authored by men.
What the paper does NOT say: that women are less productive than men, on average, in terms of publishing papers.
The slightly longer version
This study on gender bias in scientific publishing is a really comprehensive look at gender and publishing world-wide (though it is biased toward the US). The authors do a good job of laying out previous work in this area and then indicate that they are interested in looking at scientific productivity with respect to differences in gender. The first stated goal is to provide an analysis of “the relationship between gender and research output (for which our proxy was authorship on published papers).”
The study is not in any way incorrect (that I can see in my fairly cursory read-through), but it does present the data in a way that is a bit misleading. Most of the paper describes gathering pretty comprehensive data on gender in published papers relative to author position, geographic location, and several other variables. This is then used to ‘show’ that women are less productive than men in scientific publication, but it omits a terribly important step- the authors never seem to normalize for the ratio of women to men in positions that publish at all. That is, their results very clearly reiterate that there is a gender bias in the positions themselves- but they don’t say anything (that I can see) about the productivity of individuals (how many papers were published per author, for example).
They do mention this issue in their final discussion:
“UNESCO data show that in 17% of countries an equal number of men and women are scientists. Yet we found a grimmer picture: fewer than 6% of countries represented in the Web of Science come close to achieving gender parity in terms of papers published.”
And, though this is true, it seems like a less-than-satisfying analysis of the data.
On the other hand, the result that they show last- the number of times a paper is cited when a male or female name appears in various author positions- is pretty compelling and is really their novel finding. This is a pretty sobering analysis, and the authors provide some ideas on how to address the issue, which seems to be part of the larger problem of providing equal opportunities and advantages to women in science.