A recent technical comment in Science (here) reminded me of a post I’d been meaning to write. We need to be our own worst critics. And by “we” I’m specifically talking about the bioinformaticians and computational biologists who are doing lots of transformations on lots of data all the time, but this generally applies to any scientist.
The technical comment I referred to is behind a paywall so I’ll summarize. The first group published the discovery of a mechanism for X-linked dosage compensation in Drosophila based on, among other things, ChIP-seq data (to determine transcription factor binding to DNA). The authors of the comment found that the initial analysis of the data had used an inappropriate normalization step – and the error is pretty simple: instead of multiplying a ratio by a factor (the square root of the number of bins used in a moving average), they multiplied the log2 transform of the ratio by that factor. This greatly exaggerated the ratios and artificially induced a statistically significant difference where there was none. Importantly, the authors of the comment noticed this when,
> We noticed that the analysis by Conrad et al. reported unusually high Pol II ChIP enrichment levels. The average enrichment at the promoters of bound genes was reported to be ~30,000-fold over input (~15 on a log2 scale), orders of magnitude higher than what is typical of robust ChIP-seq experiments.
This is important because it means there was an obvious red flag that the original authors SHOULD have seen and wondered about at some point. If they had wondered about it, they SHOULD have looked further into their analysis and done some simple tests to determine whether what they were seeing (a 30,000-fold increase) was actually reasonable. In all likelihood they would have found their error. Of course, they may not have ended up with a story that could be published in Science – but at least they would not have had the embarrassment of being caught out this way. This is not to say that there is any indication of wrongdoing on the part of the original authors; it seems they made an honest mistake.
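The arithmetic behind this kind of error is easy to sketch. Here is a toy example (the numbers are purely illustrative, not taken from the actual papers) showing how scaling the log2 transform of a ratio, instead of the ratio itself, can inflate a modest enrichment into the ~30,000-fold (~15 on a log2 scale) range that the comment’s authors flagged:

```python
import math

# Illustrative values only -- NOT the actual data or bin counts from the papers.
ratio = 2.83                 # a modest ChIP/input enrichment ratio
n_bins = 100                 # bins in a hypothetical moving average
factor = math.sqrt(n_bins)   # = 10.0

# Intended normalization: scale the ratio itself.
correct = factor * ratio     # 28.3-fold enrichment

# The error: scale the log2 of the ratio, then read it as a log2 enrichment.
log2_scaled = factor * math.log2(ratio)   # ~15 "on the log2 scale"
wrong = 2 ** log2_scaled                  # ~33,000-fold enrichment

print(f"correct: {correct:.1f}-fold, erroneous: {wrong:,.0f}-fold")
```

A quick back-of-the-envelope check like this, asking whether a reported enrichment could plausibly come from any reasonable raw ratio, is exactly the kind of simple test that would have caught the problem early.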
In this story the authors likely fell prey to confirmation bias, the tendency to believe results that support your hypothesis. This is a particularly enticing and tricky bias, and I have fallen prey to it many times. As far as I know, these errors have never made it into any of my published work. However, falling for particularly egregious examples (arising from mistakes in machine learning applications, for example) trains you to be on the lookout for it in other situations. Essentially it boils down to the following:
- Be suspicious of all your results.
- Be especially suspicious of results that support your hypothesis.
- Your suspicion should be proportional to the quality of the results. That is, the better the results, the more suspicious you should be of them and the more rigorously you should try to disprove them.
This is essentially wrapped up in the scientific method (my post about that here) – but it bears repeating and revisiting. You need to be extremely critical of your own work. If something works, check to make sure that it actually does work. If it works extremely well, be very suspicious and look at the problem from multiple angles. If you don’t, someone else may, and they may not write about you as nicely as YOU would.
The example I give above is nice in its clarity and it resulted in calling into question the findings of a Science paper (which is embarrassing). However, there are much, much worse cases with more serious consequences.
Take, for instance, the work Keith Baggerly and Kevin Coombes did to uncover a series of cancer papers that had multiple data processing, analysis, and interpretation errors. The NY Times ran a good piece on it. The case is more complicated and involves both (likely) unintentional errors in processing, analysis, or interpretation and, possibly, more serious issues of impropriety. I won’t go into the details here, but their original paper in The Annals of Applied Statistics, “Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology“, should be required reading for any bioinformatics or computational biology researcher. The paper painstakingly and clearly goes through the results of several high-profile papers from the same group and reconstructs, first, the steps they must have taken to get the results they did; second, where the errors occurred; and finally, the results if the analysis had been done correctly.
Their conclusions are startling and scary: they found that the methods were oftentimes not described clearly enough for a reader to reconstruct what was done, and they found a number of easily explainable errors that SHOULD have been caught by the researchers.
These problems were associated with one group and one particular approach, but I can easily recognize the first, if not the second, in many papers. That is, it is oftentimes very difficult to tell what was actually done to process and analyze the data. Steps that have to be there are missing from the methods sections, parameters for programs are omitted, data is referred to but not provided, and the list goes on. I’m sure that I’ve been guilty of this from time to time. It is difficult to remember that writing the “boring” parts of the methods may be what actually ensures that someone else can do what you’ve done. And sharing your data? That’s just a no-brainer, but something that is too often overlooked in the rush to publish.
So these are cautionary tales. For those of us handling lots of data of different types for different purposes, and running many different kinds of analyses to obtain predictions, we must always be on guard against our own worst enemy: ourselves and the errors we might make. And we must be our own worst (and best) critics: if something seems too good to be true, it probably is.