The Truth

What do you think the truth is? That is, what do you think the concept of “truth” actually means? Is it an absolute- a destination that you can reach if you just try hard enough? Or is it something else- a road that stretches out in front of you and constantly changes as you progress and add more evidence?

Well?

[Comic: The Truth]

Writing Yourself Into A Corner

I’ve been fascinated with the idea of investment, and how it can color your thoughts, feelings, and opinions about something. Not the monetary sense of the word (though probably that too) but the emotional and intellectual sense. If you’ve ever been in a bad relationship you might have fallen prey to this reasoning- “I’m in this relationship and I’m not getting out because reasons, so admitting that it’s absolutely terrible for me is unthinkable, so I’m going to pretend like it’s not and I’m going to believe that it’s not and I’m going to tell everyone that I’m doing great”. I really believe this can be a motivating factor for a big chunk of human behavior.

And it’s certainly a problem in science. When you become too invested in an idea or an approach or a tool- that is, you’ve spent a considerable amount of time researching or promoting it- it can be very difficult to distance yourself from that thing and admit that you might have it wrong. That would be unthinkable.

Sometimes this investment pitfall is contagious. If you’re on a project working with others toward common goals, the problem of investment can become more complicated. That is, if I’ve said something, and some amount of group effort has been put into this idea, but it turns out I was wrong about it, it can be difficult to raise that to the rest of the group. Though, I note, it is really imperative that it be raised. This can become more difficult if the ideas or preliminary results you’ve put forward become part of the project- through presentations made by others or through further investment of project resources to follow up on these leads.

I think this sometimes happens when you’re writing an early draft of a document- though the effect can be more subtle here. If you write words down and put out ideas that are generally sound and on-point it can be hard for you, or others who may edit the paper after you, to erase them. More importantly, a first draft, no matter how preliminary or draft-y, can establish an organization that can be hard to break. Clearly, if there are parts that really don’t work, or don’t fit, or aren’t true, they can be removed fairly easily. The bigger problems lie in those parts that are *pretty good*. I’ve looked back at my own preliminary drafts and realized (after a whole lot of work trying to get things to fit) that the initial overall organization was somehow wrong- and that I really needed to rip it all apart and start over, at least in terms of the organization. I’ve also seen this in other people’s work, where something just doesn’t seem right about a paper but I can’t quite put my finger on what- at least not without a bunch of effort.

Does this mean that you should very carefully plan out your preliminary drafts? Not at all. That’s essentially the route to complete gridlock and non-productivity. Rather, you should be aware of this problem and be willing to be flexible. Realize that what you put down on the paper for the first draft (or early versions of analysis) is subject to change- and make others you are working with aware of this explicitly (simply labeling something as “preliminary analysis” or “rough draft” isn’t explicit enough). And don’t be afraid to back away from it if it’s not working out. It’s much better if that happens earlier in the process than later- that is, it’s better to completely tear down a final draft of a paper than to have reviewers completely miss the point of what you’re trying to say after you’ve submitted it.

[Comic: Writing Yourself Into a Corner]

The Numerology of License Plates

I posted a while back about encountering two vehicles with the same 3-letter code as mine on their license plates while driving to work one morning. Interestingly, in the following months I found myself paying more and more attention to license plates and saw at least 6-7 other vehicles in the area (a small three-city region with about 200K residents) with the same code.

Spooky. I started to feel like there was some kind of cosmological numerology going on in license plates around me that was trying to send me a message. BUT WHAT WAS IT?

A conclusion I drew from thinking about the probability of that happening was:

it is evident that there can be multiple underlying and often hidden explanatory variables that may be influencing such probabilities [from my post]

It was suggested that part of my noticing the plates could have been confirmation bias: I was looking for something, so I noticed that thing more than normal against a pretty variable and unconnected background. I’m sure that’s true. However, I was sitting in traffic one evening (yes, we do have *some* traffic around here) and saw three plates that started with the letters ARK in the space of about 5 minutes. Weird.

So THEN I started really looking at the plates around me and noticed a strong underlying variable that pretty much explains it all. But it’s kinda interesting. I first noticed that Washington state seems to have recently switched from three number-three letter plates to three letter-four number plates. I then noticed that the starting letters for both kinds of plates were in a narrow range: W-Z for the old plates and A-C for the new plates. There don’t seem to be *any* plates outside that range right now (surveying a couple of hundred plates over the last couple of days). W is really underrepresented, as is C- the tails of the distribution. This makes me guess that there’s a rolling distribution with a window of about 6 starting letters for license plates (in the state of Washington; other states have other systems or are on a different pattern). This probably changes with time as people renew their plates, buy new vehicles, and get rid of the old. So the effective size of the license plate universe I tried to calculate in my previous post is much smaller than what I was thinking.
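
Here’s a rough back-of-the-envelope sketch of what that shrinkage would do to the numbers (a minimal Python sketch; the six-letter window, and the assumption that only the first letter is constrained, are my guesses from casual plate-watching, not anything official from the DOL):

```python
# Back-of-the-envelope: how much smaller is the "effective" plate universe
# if the first letter is confined to a rolling window of about 6 letters?
# The window size is a guess from casual observation, not DOL policy.

full_universe = 26 ** 3                      # all possible 3-letter codes
window_letters = 6                           # assumed window for the first letter
effective_universe = window_letters * 26 * 26

print(f"All 3-letter codes:      {full_universe:,}")        # 17,576
print(f"With a 6-letter window:  {effective_universe:,}")   # 4,056
print(f"Shrinkage factor:        {full_universe / effective_universe:.1f}x")
```

If that guess is anywhere near right, the pool of letter codes actually on the road at any one time is roughly a quarter of the full 17,576- which goes a long way toward explaining the repeats.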

I don’t know why I find this so interesting but it really is. I know this is just some system that the Washington State Department of Licensing has and I could probably go to an office and just ask, but it seems like it’s a metaphor for larger problems of coincidence, underlying mechanisms, and science. I’m actually pretty satisfied with my findings, even though they won’t be published as a journal article (hey- you’re still reading, right?). On my way to pick up lunch today I noticed some more ARK plates (4) and these two sitting right next to each other (also 3 other ABG plates in other parts of the parking lot).

[Photo: license plates]

The universe IS trying to tell me something. It’s about science, stupid.

Magic Hands

Too good to be true or too good to pass up?

There’s been a lot of discussion about the importance of replication in science (read an extensive and very thoughtful post about that here) and notable occurrences of non-reproducible science being published in high-impact journals. Consider the recent retraction of the two STAP stem cell papers from Nature and the accompanying debate over who should be blamed and how. Or the publication of a study (see also my post about this) in which research labs responsible for high-impact publications were challenged to reproduce their findings- it showed that many of these findings could not be replicated, even in the same labs they were originally performed in. These, and similar cases and studies, indicate serious problems in the scientific process- especially, it seems, for some high-profile studies published in high-impact journals.

I was surprised, therefore, at the reaction of some older, very experienced PIs recently after a talk I gave at a university. I mentioned these problems, and briefly explained the results of the reproducibility study to them- that, in 90% of the cases, the same lab could not reproduce the results that they had previously published. They were generally unfazed. “Oh”, one said, “probably just a post-doc with magic hands that’s no longer in the group”. And all agreed on the difficulty of reproducing results for difficult and complicated experiments.

So my question is: do these fabled lab technicians actually exist? Are there those people who can “just get things to work”? And is this actually a good thing for science?

I have some personal experience in this area. I was quite good at futzing around to get a protocol to work the first time. I would get great results. Once. Then I would continue to ‘innovate’ and find that I couldn’t replicate my previous work. In my early experiences I sometimes didn’t keep notes well enough to allow me to go back to the point where I’d gotten it to work. Which was quite disturbing and could send me into a non-productive tailspin of trying to replicate the important results. Other times I’d written things down sufficiently that I could get them to work again. And still other times I found that someone else in the lab could consistently get better results out of the EXACT SAME protocol- apparently followed the same way. They had magic hands. Something about the way they did things just *worked*. There were some protocols in the lab that just seemed to need this magic touch- some people had it and some people didn’t. But does that mean that the results these protocols produced were wrong?

What kinds of procedures seem to require “magic hands”? One example is from when I was doing electron microscopy (EM) as a graduate student. We were working constantly at improving our protocols for making two-dimensional protein crystals for EM. This was delicate work, which involved mixing protein with a buffer in a small droplet, layering on a special lipid, incubating for some amount of time to let the crystals form, then lifting the fragile lipid monolayer (hopefully with protein crystals) off onto an EM grid, and finally staining with an electron-dense stain or flash-freezing in liquid nitrogen. The buffers would change, the protein preparations would change, the incubation conditions would change, and how the EM grids were applied to our incubation droplets to lift off the delicate 2D crystals was subject to variation. Any one of these things could scuttle getting good crystals and would therefore produce a non-replication situation. There were several of us in the lab who did this successfully- but it didn’t always work, and it took some time to develop the right ‘touch’. The number of factors that *potentially* contributed to success or failure was daunting and a bit disturbing- and sometimes didn’t seem to be amenable to communication in a written protocol. The line between superstition and required steps was very thin.

But this is true of many protocols that I worked with throughout my lab career* – they were often complicated, multi-step procedures that could be affected by many variables, from the ambient temperature and humidity to who prepared the growth media and when. Not that all of these variables DID affect the outcomes, but when an experiment failed there was a long list of possible causes. And the secret with this long list? It probably didn’t include all the factors that did affect the outcome. There were likely hidden factors that could be causing problems. So is someone with magic hands lucky, gifted, or simply persistent? I know of a few examples where all three qualities were likely present- with the last one being, in a way, most important. Yes, my collaborator’s post-doc was able to do amazing things and get amazing results. But (and I know this was the case) she worked really long and hard to get them. In some cases she probably repeated experiments many, many times before she got them to work. And then she repeated the exact combination again. And again. And sometimes even that wasn’t enough (oops, the buffer ran out and had to be remade, but the lot number on the bottle was different, and weren’t they working on the DI water supply last week? Now my experiment doesn’t work anymore.)

So perhaps it’s not so surprising that many of the key findings from these papers couldn’t be repeated, even in the same labs. For one thing, there was not the same incentive to get it to work- so that post-doc, or another graduate student who’d taken over the same duties, probably tried once to repeat the experiment. Maybe twice. Didn’t work. Huh. That’s unfortunate. And that’s about as much time as we’re going to put into this little exercise. The protocols could be difficult, complicated, and have many known and unknown variables affecting their outcomes.

But does it mean that all these results are incorrect? Does it mean that the underlying mechanisms or biology that was discovered was just plain wrong? No. Not necessarily. Most, if not all, of these high-profile publications that failed to repeat spawned many follow-on experiments and studies. It’s likely that many of the findings were borne out by orthogonal experiments- that is, experiments that test implications of these findings, and by extension the results of the original finding itself. Because of its nature this study was conducted anonymously- so we don’t really know, but it’s probably true. An important point, and one that was brought up by the experienced PIs I was talking with, is that sometimes direct replication may not be the most important thing. Important, yes. But perhaps not deal-killing if it doesn’t work. The results still might stand IF, and only if, second, third, and fourth orthogonal experiments can be performed that tell the same story.

Does this mean that you actually can make stem cells by treating regular cultured cells with an acid bath? Well, probably not. For some of these surprising, high-profile findings the ‘replication’ that is discussed is other labs trying to see if the finding is correct. So they try the protocols that have been reported, but it’s likely that they also try other orthogonal experiments that would, if positive, support the original claim.

"OMG! This would be so amazing if it's true- so, it MUST be true!"

So this gets back to my earlier discussions on the scientific method and the importance of being your own worst skeptic (see here and here). For every positive result the first reaction should be “this is wrong”, followed by, “but- if it WERE right then X, Y, and Z would have to be true. And we can test X, Y, and Z by…”. The burden of scientific ‘truth’** is in replication, but in replication of the finding– NOT NECESSARILY in replication of the identical experiments.

*I was a labbie for quite a few of my formative years. That is, I actually got my hands dirty and did real, honest-to-god experiments, with Eppendorf tubes, vortexers, water baths, cell culture, the whole bit. Then I converted and became what I am today – a creature purely of silicon and code. Which suits me quite well. This is all just to lend my post a bit of “I kinda know what I’m talking about here- at least somewhat”.

** where I’m using a very scientific meaning of truth here, which is actually something like “a finding that has extensive support through multiple lines of complementary evidence”

This one goes to 11…

The famous Spinal Tap quote (see the video here) is great because Nigel is explaining how his amp is better than other rockers’ amps since you can turn it up to 11. “Why don’t you just make ten louder and make ten be the top number and make that a little louder?” asks the mockumentarian Rob Reiner. Good question.

The humor in this scene reminds me strongly of this recent paper on the introduction of an artificial nucleotide base pair into a bacterium. Essentially, they got a bacterium to incorporate an artificial nucleotide pair into its DNA; it replicates stably (that is, the new pair stays in the bacterial DNA for generations), and it’s not removed by the DNA repair mechanisms that look for problems in the DNA. Novel nucleotides are not new- researchers have created a large number of them- but incorporation into DNA had previously been done only in limited ways in test tube (in vitro) systems, not in a living organism. This is really a pretty cool technical achievement – the researchers had to solve a number of complicated problems to get this to work and, more importantly, it’s likely that they got very lucky with their choices (where ‘luck’ here is a combination of knowledge, trial and error, and actual bona fide luck).

The paper itself doesn’t really overstate its implications. The only implications statement in the paper comes at the end:

In the future, this organism, or a variant with the UBP incorporated at other episomal or chromosomal loci, should provide a synthetic biology platform to orthogonally re-engineer cells, with applications ranging from site-specific labelling of nucleic acids in living cells to the construction of orthogonal transcription networks and eventually the production and evolution of proteins with multiple, different unnatural amino acids.

And all of this seems very reasonable and potentially achievable.

However, as happens with many high-profile papers, the press coverage I’ve seen on this is terrible. It ranges from Gizmodo touting that “scientists have created alien DNA” (only for a very limited definition of ‘alien’) to New Scientist stating that researchers have expanded the ‘genetic code’ of a bacterium (not really- a code needs to have meaning, that is, to be translatable into something that has meaning, and this advance doesn’t do that yet). Perhaps the most troubling, though, is the coverage from NPR, largely based on an interview with the senior author of the paper. In this piece Floyd Romesberg introduces a simple, and largely apt, analogy for what his work has done:

Maybe you get three consonants and one vowel. Maybe there are some words you can write and you can string them together to make, sort of, primitive stories. But if you could have a couple extra letters, there’s more that you could write. Having the ability to store increased information would allow you to write more interesting words, bigger words, more complicated words, more nuanced words, better stories. – from NPR interview

He goes on to say:

It’s not so much that I think life needs more genetic information but I think that there are things that we could really learn and drugs that could be developed by getting cells to be able to do more

So it’s not a bad analogy, but one where he’s essentially said: “This one goes to 11.” And it points out exactly why this work is so limited in its implications (exactly the opposite of what he’s trying to point out by using it, BTW). They have added, in a very constrained and limited way, two letters to the standard ATCG alphabet used by nearly all life. Will this introduce the ability to build more complex or useful biological systems? Not in the slightest. Imagine that we added a couple of letters to the English alphabet. Now we give those extra letters to someone like William Shakespeare. Does anyone think that he would be able to do more with more letters? Write better, more complex, more interesting, more profound plays or sonnets? No, of course not. Even if you gave him a whole bunch of new words that contained the new letters (which they haven’t done at all in this paper- they haven’t actually introduced this new addition into the code itself, only into the alphabet), he would likely have produced very similar works. Maybe those works would be slightly shorter, but they would NOT contain more information. Adding a letter to the alphabet doesn’t increase the information or the complexity of the code. It just doesn’t. The computer that I’m writing this on is based on a binary alphabet (1 and 0 are the only letters it uses) and yet I’m able to put these together (with the help of the underlying OS and software) into complicated and information-rich constructs. Having a computer based on 0, 1, AND 2 wouldn’t help me write this post any more better [sic].
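
To put a number on the ‘slightly shorter’ point, here’s a minimal sketch of my own (a toy illustration, nothing from the paper): the same amount of information can be written in any alphabet of two or more symbols- a bigger alphabet just shortens the message.

```python
import math

# How many symbols does it take to encode the same message (say, one of
# 1,000,000 bits) in alphabets of different sizes? Bigger alphabets make
# the message shorter; they don't let you say anything new.

message_bits = 1_000_000
for alphabet_size in (2, 4, 6, 26):
    symbols_needed = math.ceil(message_bits / math.log2(alphabet_size))
    print(f"alphabet of {alphabet_size:>2} symbols -> "
          f"{symbols_needed:,} symbols for the same message")
```

Six bases instead of four buys compactness at best; the expressive power of the genetic code depends on what the symbols get translated into, and this paper doesn’t touch that.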

An image of ACTUAL (fictional) alien DNA. This one has an 8 stranded helix, which CLEARLY makes it more complicated than our humdrum double stranded type. Clearly. (from the movie The Fifth Element)

The idea that this would lead to the development of new drugs, new forms of life, new biology is a far, far, far distant stretch that causes confusion and even fear. The problem here is not purely driven by misunderstanding and misrepresentation of the work by science journalists (though it looks like there’s some of that) but also by the actions and statements of the senior author himself. As I mentioned, this is a sound paper and is pretty interesting- a technical achievement. It may indeed lead to some interesting new discoveries and methodologies that may be broadly applicable. But it’s not alien DNA, it’s not going to help us cure cancer with new drugs, and it’s not going to make biology more complex. It might, however, make our rocker friends green with jealousy when we reveal that we have six nucleotide bases compared to their paltry four, because “these go to eleven” (Nigel Tufnel).

Coincidence

I had a weird thing happen on my way in to work this morning. On the main road just a short distance from my parking lot I noticed that the SUV in front of me had the same three letter combination on their license plate as mine, “YGK”. Then I noticed that the car in front of THEM had the SAME three letter combination! Wow. What are the odds of that happening? Well, I’m not going to tell you the odds of that happening, because I don’t really know. But it did happen. An odd coincidence for sure, but maybe not as cosmically-connected as you might be inclined to think.

First off, let’s think about the odds of drawing the same 3-letter combination from a hat two times in a row (approximating what happened here, because my license plate is fixed). There are 26^3 = 17,576 different possible 3-letter combinations- I suppose probably minus one or two for words that aren’t allowed, like “ASS” and, ummm, well maybe there’s another. The chances of drawing two of the same out of a hat would be 1/17,576 X 1/17,576 – about 1 in 300 million. So this means that you could sit and draw letters out of this hat every second (that is, drawing two sets of three letters out every second) for about 10 years before you’d be likely to have this happen. Now clearly I’m simplifying here- but still. So for my license plate story I’d be unlikely to have this happen in my lifetime, since I’m only driving every now and then and I’m not generally even paying attention to other people’s license plates to see if this has happened or not.
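
For the curious, here’s that arithmetic as a quick Python sketch (the simplifying assumptions are the same ones I just made: every 3-letter code equally likely, independent draws, and one draw per second):

```python
# Rough odds of the morning-commute coincidence, assuming every 3-letter
# code is equally likely and the two other plates are independent draws.

combos = 26 ** 3                       # 17,576 possible 3-letter codes
p_two_matches = (1 / combos) ** 2      # both plates ahead of me match mine

seconds_per_year = 60 * 60 * 24 * 365
print(f"P(two plates in a row match mine): 1 in {1 / p_two_matches:,.0f}")
print(f"At one draw per second, expected wait: "
      f"{1 / p_two_matches / seconds_per_year:.1f} years")
```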

So here are some reasons why it’s not TOO surprising that it did happen. First, assuming all combinations are used, there are up to about 1,000 other vehicles in WA state with the same letters (one for each possible number part), which narrows the field a bit- but only a bit, since there are ~6 million registered vehicles (at least in 2012, though some portion of these have the longer 7 number/letter plates). Second, it is likely that these are issued in order of request (though I’m not 100% sure about that; it would seem to make sense). That means that vehicles purchased about the same time as mine (2001) are probably far more likely to have the same set of letters. That’s been about 13 years, which means that those vehicles are going to be of a certain age. I would also include geography- since that could be another influencing factor as to which numbers/letters you get- but I did get my license plate on the other side of the state. I don’t have a clear idea of how this would bias the probability of seeing three license plates in a row, but it fits into my next point, which is hidden or partially hidden explanatory variables.

When my wife and I lived in Portland, far before we had such encumbrances as kids to drag us down, we often did a bunch of activities on a weekend. I started to be surprised to notice some of the same people turning up at different places- parks, restaurants, bookstores, museums, etc.- far across town. This happened more than you’d expect in a moderately sized city. Interestingly, this also happened in Seattle once we had a kid. And it happens all the time in our current city(ies), which are much smaller. My idea about this is that it’s not surprising at all. Our choice of activities and times is dictated or heavily influenced by our age, interests, kidlet status, etc.- as are other people’s. So instead of thinking of the chances of repeatedly bumping into the same set of people out of the entire population, think about the chances if the background distribution is much more limited, constrained (in part) by those interests and other personal constraints. The probability of this happening then rises considerably because you’re considering a smaller number of possible people. I’m sure this has been described before in statistics and would love it if someone knew what it’s called (leave a comment).
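
A toy calculation makes the point (the pool sizes and the number of strangers you actually register per outing are made-up numbers, just to show the shape of the effect):

```python
# Toy model: you notice k strangers per outing, all drawn from a pool of
# size N. What's the chance at least one of today's k was also among the
# k you saw last time? The numbers below are invented for illustration.

def p_repeat_encounter(pool_size: int, noticed_per_outing: int) -> float:
    """P(at least one of today's strangers was also seen last outing)."""
    p_miss_one = 1 - noticed_per_outing / pool_size
    return 1 - p_miss_one ** noticed_per_outing

k = 50  # strangers you actually register per outing (a guess)
for pool in (600_000, 20_000, 5_000):
    print(f"pool of {pool:>7,}: P(repeat encounter) = "
          f"{p_repeat_encounter(pool, k):.3f}")
```

Shrink the effective pool from ‘the whole metro area’ to ‘people with the same age, interests, and kidlet status who go to the same kinds of places at the same times’ and a repeat encounter stops being a fluke and starts being something you should expect every few weekends.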

How does this fit into my license plate experience? I don’t really have a clear idea, but it is evident that there can be multiple underlying and often hidden explanatory variables that may be influencing such probabilities. Perhaps my workplace is enriched in people who think like me and hold on to vehicles for a long time- AND who purchased vehicles at about the same time. I think that’s probably likely, though I have no idea how to test it. If that’s true then the chances of running into someone else with the same letters on their plates, or two people at the same time, would have to go up quite a lot. Still, what are the odds?

The false dichotomy of multiple hypothesis testing

[Disclaimer: I’m not a statistician, but I do play one at work from time to time. If I’ve gotten something wrong here please point it out to me. This is an evolving thought process for me that’s part of the larger picture of what the scientific method does and doesn’t mean- not the definitive truth about multiple hypothesis testing.]

There’s a division in research between hypothesis-driven and discovery-driven endeavors. In hypothesis-driven research you start out with a model of what’s going on (this can be explicitly stated or just the amalgamation of what’s known about the system you’re studying) and then design an experiment to test that hypothesis (see my discussions on the scientific method here and here). In discovery-driven research you start out with more general questions (that can easily be stated as hypotheses, but often aren’t) and generate larger amounts of data, then search the data for relationships using statistical methods (or other discovery-based methods).

The problem with analysis of large amounts of data is that when you’re applying a statistical test to a dataset you are actually testing many, many hypotheses at once. This means that your level of surprise at finding something that you call significant (arbitrarily but traditionally a p-value of less than 0.05) may be inflated by the fact that you’re looking a whole bunch of times, thus increasing the odds that you’ll observe SOMETHING just on random chance alone (see the excellent xkcd cartoon below, which I’ll use as a running example). So you need to apply some kind of multiple hypothesis correction to your statistical results to reduce the chances that you’ll fool yourself into thinking that you’ve got something real when actually you’ve just got something random. In the xkcd example below, a multiple hypothesis correction using Bonferroni’s method (one of the simplest and most conservative corrections) would suggest that the threshold for significance should be moved to 0.05/20 = 0.0025, since 20 different tests were performed.
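
For concreteness, here’s what that correction looks like in a few lines of Python, using the 20 jelly-bean-color tests from the cartoon as the example (the p-values are invented):

```python
# Bonferroni correction: with 20 jelly-bean colors tested, the per-test
# threshold drops from 0.05 to 0.05/20 = 0.0025. P-values are invented.

p_values = {"green": 0.010, "purple": 0.300, "mauve": 0.002, "teal": 0.740}
n_tests, alpha = 20, 0.05
bonferroni_threshold = alpha / n_tests

for color, p in p_values.items():
    naive = "significant" if p < alpha else "not significant"
    corrected = "significant" if p < bonferroni_threshold else "not significant"
    print(f"{color:>6}: p={p:.3f}  uncorrected: {naive:<15}  corrected: {corrected}")
```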

Here’s where the problem of a false dichotomy occurs. Many researchers who analyze large amounts of data believe that utilizing a hypothesis-based approach mitigates the effect of multiple hypothesis testing on their results. That is, they believe that they can focus their investigation of the data on a subset constrained by a model/hypothesis and thus reduce the effect that multiple hypothesis testing has on their analysis. Instead of looking at 10,000 proteins in a study they now look at only the 25 proteins that are thought to be present in a particular pathway of interest (where the pathway here represents the model based on existing knowledge). This is like saying, “we believe that jelly beans in the blue-green color range cause acne” and then drawing your significance threshold at 0.05/4 = 0.0125, since there are ~4 jelly beans tested that are in the blue-green color range (not sure if ‘lilac’ counts or not- that would make 5). All well and good EXCEPT for the fact that the actual chance of detecting something by random chance HASN’T changed. In large-scale data analysis (transcriptome analysis, e.g.) you’ve still MEASURED everything else. You’ve just chosen to limit your investigation to a smaller subset and then can ‘go easy’ on your multiple hypothesis correction.
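
A small simulation shows why. Everything below is null- no real effects anywhere- and the feature counts are arbitrary, but the point stands: chance produces the same number of nominal ‘hits’ whether or not you choose to look at them.

```python
import random

# Simulate a study where NONE of the 10,000 measured features has a real
# effect. Under the null, p-values are uniform on [0, 1]. The feature and
# subset sizes are arbitrary choices for illustration.

random.seed(1)
n_features, n_subset, alpha = 10_000, 25, 0.05

p_values = [random.random() for _ in range(n_features)]

hits_everywhere = sum(p < alpha for p in p_values)
pathway = p_values[:n_subset]   # pretend these 25 are your pathway of interest
hits_in_pathway = sum(p < alpha for p in pathway)

print(f"Null features with p < {alpha}: {hits_everywhere} of {n_features}")
print(f"...of which sit inside your 25-feature pathway: {hits_in_pathway}")
# Looking only at the pathway changes the correction you apply,
# not how many chance hits the experiment actually generated.
```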

The counter-argument that might be made to this point is that by doing this you’re testing a specific hypothesis- one that you believe to be true and that may be supported by existing data. This is a reasonable point in one sense: existing information that supports your result may lend it some credence. But on the other hand it doesn’t change the fact that you still could be finding more things by chance than you realize, because you simply hadn’t looked at the rest of your data. It turns out that this is true not just of analysis of big data, but also of some kinds of traditional experiments aimed at testing individual, associative hypotheses. The difference there is that it is technically infeasible to actually test a large number of the background cases (you’re generally limited to one or two negative controls). Also, a mechanistic hypothesis (as opposed to an associative one) is based on intervention, which tells you something different and so is not (as) subject to these considerations.

Imagine that you’ve dropped your car keys in the street and you don’t know what they look like (maybe you borrowed a friend’s car). You’re pretty sure you dropped them in front of the coffee shop, on a block with 7 other shops on it- but you did walk the length of the block before you noticed the keys were gone. You walk directly back to look in front of the coffee shop and find a set of keys. Great, you’re done. You found your keys, right? But what if you’d looked in front of the other stores and found other sets of keys? The fact that you didn’t look doesn’t make it any less likely that you’re wrong about these keys (your existing knowledge/model/hypothesis, “I dropped them in front of the coffee shop”, could easily be wrong).

[xkcd: Significant]

Not being part of the rumor mill

I had something happen today that made me stop and think. I repeated a bit of ‘knowledge’- something science-y that had to do with a celebrity. This is a factoid that I have repeated many other times, and each time I state it with a good deal of authority in my voice and with the security that it is “fact”. Someone who was in the room said, “really?” Of course, as a quick Google check of several sites (including snopes.com) showed, this was, at best, an unsubstantiated rumor, and probably just plain untrue. But the memory voice in my head had spoken with such authority! How could it be WRONG? I’m generally pretty good at picking out bits of misinformation that other people present and checking them, but I realized that I’m not always so good about detecting it when I do it myself.

Of course, this is how rumors get spread and disinformation gets disseminated. As scientists we are not immune to it- even if we’d like to think we are. And we can actually be big players in it. You see, people believe us. We speak with the authority of many years of schooling and many big science-y words. And the real danger is repeating or producing factoids that fall within “science” but outside what we’re really experts in (where we should know better). Because many non-scientists see us as experts IN SCIENCE. People hear us spout some random science-ish factoid and they LISTEN to us. And then they, in turn, repeat what we’ve said, except that this time they say it with authority because it was stated, with authority, by a reputable source: US. And I realized that this was the exact same reason it seemed like fact to me- because it had been presented to me AS FACT by someone I looked up to and trusted.

So this is just a note of caution about being your own worst critic- even in normal conversation, and especially when it comes to those slightly-too-plausible factoids. Though it may not seem like it sometimes, people do listen to us.

Gender bias in scientific publishing

The short version: This is a good paper about an important topic, gender bias in publication. The authors try to address two main points: what is the relationship between gender and research output, and what is the relationship between author gender and paper impact? The study shows a bias in the number of papers published by gender, but apparently fails to control for the relative number of researchers of each gender in each field. This means that the first point of the paper, that women publish less than men, can’t be separated from the well-known gender bias in most of these fields- i.e. there are more men than women. This seems like a strange oversight, and it’s only briefly mentioned in the paper. The second point, which is made well and clearly, is that papers authored by women are cited less than those authored by men. This is the only real take-home of the paper, though it is a very important and alarming one.

What the paper does say: that papers authored by women are cited less than those authored by men.

What the paper does NOT say: that women are less productive than men, on average, in terms of publishing papers.

The slightly longer version
This study on gender bias in scientific publishing is a really comprehensive look at gender and publishing worldwide (though it is biased toward the US). The authors do a good job of laying out previous work in this area and then indicate that they are interested in looking at scientific productivity with respect to differences in gender. The first stated goal is to provide an analysis of “the relationship between gender and research output (for which our proxy was authorship on published papers).”

The study is not in any way incorrect (that I can see in my fairly cursory read-through) but it does present the data in a way that is a bit misleading. Most of the paper describes gathering pretty comprehensive data on gender in published papers relative to author position, geographic location, and several other variables. This is then used to ‘show’ that women are less productive than men in scientific publication, but it omits a terribly important step- they never seem to normalize for the ratio of women to men in positions that might be publishing at all. That is, their results very clearly reiterate that there is a gender bias in the positions themselves- but they don’t say anything (that I can see) about the productivity of individuals (how many papers were published by each author, for example).
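
To make the missing normalization concrete, here’s a tiny, entirely hypothetical example (the headcounts and paper counts are invented, not taken from the study):

```python
# Hypothetical numbers, purely to illustrate the missing normalization:
# raw paper counts can differ by gender even when per-researcher output
# is identical, simply because the fields employ more men than women.

researchers = {"men": 70_000, "women": 30_000}   # invented headcounts
papers = {"men": 140_000, "women": 60_000}       # invented paper counts

for gender in researchers:
    per_capita = papers[gender] / researchers[gender]
    print(f"{gender:>5}: {papers[gender]:,} papers, "
          f"{per_capita:.1f} papers per researcher")
# Both come out to 2.0 papers per researcher: the raw gap in counts
# reflects headcount, not individual productivity.
```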

They do mention this issue in their final discussion:

UNESCO data show that in 17% of countries an equal number of men and women are scientists. Yet we found a grimmer picture: fewer than 6% of countries represented in the Web of Science come close to achieving gender parity in terms of papers published.

And, though this is true, it seems like a less-than-satisfying analysis of the data.

On the other hand, the result that they show at the end- the number of times a paper is cited when a male or female name appears in various author positions- is pretty compelling and is really their novel finding. This is a pretty sobering analysis, and the authors provide some ideas on how to address this issue, which seems to be part of the larger problem of providing equal opportunities and advantages to women in science.

Reviewer 3… was RIGHT!

I’m just taking a pass at revising a paper I haven’t really looked at in about six months. I’m coming to a sobering realization: reviewer 3 was right! The paper did deserve to be rejected because of the way it was written and, in spots, its poor presentation.

I’ve noticed this before, but this was a pretty good example. The paper was originally reviewed for a conference, and the bulk of the critique was that it was hard to understand and that some of the data that should have been there wasn’t presented. Because I didn’t get a shot at resubmitting (it being a conference) I decided to do a bit more analysis and quickly realized that a lot of the results I’d come up with (but not all) weren’t valid. Or rather, they didn’t validate in another dataset. The reviewers didn’t catch that, but it meant that I shelved the paper for a while until I had time to really revise it.

Now I’ve redone the analysis, updated it with results that actually work, and have been working on the paper. There are lots of places in the paper where I clearly was blinded by my own knowledge at the time- and I think that’s very common. That is, I presented ideas and results without adequate explanation. At the time it all made sense to me because I was in the moment- but now it seems confusing, even to me. One reviewer wrote that it was “difficult for me to assess its biological significance in its current form” and another that “I find the manuscript difficult to follow.” Yet another noted that the paper “lacks a strong biological hypothesis”, which was mainly due to poor presentation on my part.

There were some more substantive comments as well, and I’m addressing those in my revision, but this was a good wake-up call for someone like me, with a number of manuscripts under my belt, to be more careful about reading my own work with a fresh eye and to have more colleagues or collaborators read my work before it goes in. One thing that I like to do (but often don’t) is to have someone not involved with the manuscript or the project take a read over the paper. That way you get really fresh eyes- like those of a reviewer- that can point out places where things just don’t add up. Wish me luck for the next round with this paper!