Oh yeah, it’s significant. REALLY significant.

Matthew Hankins recently wrote a very nice post cataloging the ways that researchers try to indicate that some result is *this* close to being significant but doesn’t quite make the cut. His point is a very good one: a result from a statistical test is either significant, meaning it passes some rather arbitrary threshold (say, p < 0.05), or it isn’t. There’s no almost in significance, no trending toward significance, no flirting with significance. It’s significant or it’s not significant, period.

I thought it would be useful to also catalog the flip side of this coin: what about when a result passes a significance test and keeps on going? These results are still “just significant”, not “ultra-incredibly significant”, since significance is a binary value. Accordingly, I have assembled a list of ways that authors have expressed that a result has a very low p value and thus is very significant. I also sampled real publications and found that the actual use of these phrases danced lightly about the verge of sending out tendrils to touch something that is close to significance. I feel that this result really wanted to be significant and was moving in that direction, toward significance. With a little more effort and if everyone believes, it can be significant. Say it with me, “I believe it’s significant. I believe it’s significant” (p value 0.99).

Disclaimer: I know that I have, on more than one occasion, been a perpetrator of both of these errors, stating that a result is ‘close to significant’ or is ‘highly significant’. I’ll try to be better in the future.

Thanks to Shanon White for the idea for this post. This is an incomplete list; if you have other examples, please add them to the comments or tweet them with the hashtag #ohsosignificant.

  • highly significant (p<1e-8)
  • very significant (p<0.01)
  • extremely significant (p<0.0001)
  • whoah baby that’s significant (p<1e-6)
  • I got your significance right here (p<0.0002)
  • Holy Sh*t! In your FACE, b*tches. This sh*t’s significant (p<1e-90)
  • By the power of Greyskull, we have the significance! (p<1e-23)
  • Say hello to my little significance (p<1e-14)
  • You can’t HANDLE the significance (p<1e-30)
  • BAM! There it is daawwg! That’s significance right there! (p<1e-20)
  • You call THAT significant? That’s not significant. THIS is significant (p<1e-45)
  • the mostest significant in the whole wide world (p<1e-29)
  • Neener neener neener motherf**ker (p<1e-65)
  • significance of the utmost elevated level (p<1e-9)
  • Oh that’s good. Really good. Actually I’m thinking that might be Science or Nature good it’s that good. Holy crap, this is actually working. For once it’s working. Oh god I’m so excited, I’m going to totally rub it in the faces of my smug thesis committee. That’ll show them. Yeah. Oh god I hope it’s not wrong. Please let it be not wrong (p<1e-18)
  • solidly, unequivocally significant (p<1e-12)
  • Bonferroni? We don’t need no stinkin’ Bonferroni (p<1e-56)
  • First, we brought you a significant result (p<0.05). Then we rolled out a very significant result (p<0.01). But can we go further? That’s just crazy, right? Nope. We did it, presenting our new ultra significant result (p<1e-20), now with smoother trending.

Here’s the serious part of this post

This is a semantic argument at its heart. It’s a valuable, important, and true point that statistical significance does not come in shades of gray; it either is or it isn’t. However, we as intelligent, statistically savvy readers interpret these statements, or at least those that are on the border and not hyperbole, as meaning, “if we were to shift the arbitrary threshold we used for statistical significance to a more lenient/conservative value, then the result in question would now meet our new criterion for significance”. Yes, the authors should have just set that level of significance to start with and not bothered to backtrack to make a point. And yes, many of the real statements in Matthew’s post and the (mostly) fake statements in mine are far out and just silly (a p value of 0.3 being ‘nearly’ significant, really!?). But the important thing is that you clearly and completely report your findings and the methods you used to arrive at those findings (and conclusions), and that you provide access to your data so that the interested reader can make their own judgement.
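To make the point concrete, here is a minimal sketch of what these borderline statements amount to; the p value and thresholds are made up for illustration. The same result flips between “significant” and “not significant” only because the arbitrary cutoff moves, not because the evidence changes.

```python
# Minimal sketch: significance is a binary decision against whichever
# threshold was chosen in advance. The p value and thresholds below are
# invented for illustration.
p_value = 0.03

for alpha in (0.05, 0.01):
    verdict = "significant" if p_value < alpha else "not significant"
    print(f"threshold alpha = {alpha}: p = {p_value} is {verdict}")
```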


Job opening: worst critic. Better fill it for yourself, otherwise someone else will.

A recent technical comment in Science (here) reminded me of a post I’d been meaning to write. We need to be our own worst critics. And by “we” I’m specifically talking about the bioinformaticians and computational biologists who are doing lots of transformations with lots of data all the time, but this generally applies to any scientist.

The technical comment I referred to is behind a paywall, so I’ll summarize. The original group published the discovery of a mechanism for X-linked dosage compensation in Drosophila based on, among other things, ChIP-seq data (used to determine transcription factor binding to DNA). The authors of the comment found that the initial analysis of the data had used an inappropriate normalization step, and the error is pretty simple: instead of multiplying a ratio by a factor (the square root of the number of bins used in a moving average), they multiplied the log2 transform of the ratio by the factor. This greatly exaggerated the ratios and artificially induced a statistically significant difference where there was none. Importantly, the authors of the comment noticed this when:

We noticed that the analysis by Conrad et al. reported unusually high Pol II ChIP enrichment levels. The average enrichment at the promoters of bound genes was reported to be ~30,000-fold over input (~15 on a log2 scale), orders of magnitude higher than what is typical of robust ChIP-seq experiments.

This is important because it means there was an obvious red flag that the original authors SHOULD have seen and wondered about at some point. Had they wondered about it, they SHOULD have looked further into their analysis and done some simple tests to determine whether what they were seeing (a 30,000-fold increase) was actually reasonable. In all likelihood they would have found their error. Of course, they may not have ended up with a story that could be published in Science, but at least they would not have had the embarrassment of being caught out this way. This is not to say that there is any indication of wrongdoing on the part of the original authors; it seems that they made an honest mistake.
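As a rough numerical sketch of why a reported ~15 on the log2 scale (~30,000-fold) should raise eyebrows, here is the kind of back-of-the-envelope check that flags the error described above. The enrichment ratio and bin count below are hypothetical, chosen only for illustration; they are not taken from the actual study.

```python
import math

# Hypothetical values for illustration only (not from the actual study):
# a modest 2-fold ChIP/input enrichment and a 200-bin moving average.
ratio = 2.0
n_bins = 200
factor = math.sqrt(n_bins)                       # ~14.1

# Scaling as the comment describes it should have been done:
# apply the factor to the ratio itself.
scaled_ratio = ratio * factor
print(f"scaled ratio: ~{scaled_ratio:.0f}-fold")

# The error described in the comment: apply the factor to log2(ratio),
# then read the result as if it were still a log2 enrichment.
scaled_log2 = math.log2(ratio) * factor          # ~14.1 "log2 units"
print(f"apparent enrichment: ~{2**scaled_log2:,.0f}-fold")
```

A modest 2-fold enrichment turns into an apparent enrichment in the tens of thousands of fold, which is exactly the sort of implausible number that should trigger a second look at the pipeline.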

In this story the authors likely fell prey to confirmation bias, the tendency to believe results that support your hypothesis. This is a particularly enticing and tricky bias, and I have fallen prey to it many times. As far as I know, these errors have never made it into any of my published work, but falling for particularly egregious examples (arising from mistakes in machine learning applications, say) trains you to be on the lookout for it in other situations. Essentially it boils down to the following:

  1. Be suspicious of all your results.
  2. Be especially suspicious of results that support your hypothesis.
  3. Your level of suspicion should be proportional to the quality of the results. That is, the better the results look, the more suspicious you should be of them and the more rigorously you should try to disprove them.

This is essentially wrapped up in the scientific method (my post about that here), but it bears repeating and revisiting. You need to be extremely critical of your own work. If something works, check to make sure that it actually does work. If it works extremely well, be very suspicious and look at the problem from multiple angles. If you don’t, someone else may, and they may not write such nice things about you as YOU would.

The example above is nice in its clarity, and it resulted in the findings of a Science paper being called into question (which is embarrassing). However, there are much, much worse cases with more serious consequences.

Take, for instance, the work Keith Baggerly and Kevin Coombes did to uncover a series of cancer papers that had multiple data processing, analysis, and interpretation errors. The NY Times ran a good piece on it. That case is more complicated: it involves (likely) unintentional errors in processing, analysis, or interpretation, and could actually involve more serious issues of impropriety. I won’t go into the details here, but their original paper in The Annals of Applied Statistics, “Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology“, should be required reading for any bioinformatics or computational biology researcher. The paper painstakingly and clearly goes through the results of several high-profile papers from the same group and reconstructs, first, the steps the authors must have taken to get the results they did; second, where the errors occurred; and finally, the results if the analysis had been done correctly.

Their conclusions are startling and scary: the methods were often not described clearly enough for a reader to reconstruct what was done, and there were a number of easily explainable errors that SHOULD have been caught by the researchers.

These problems were associated with one group and a particular approach, but I can easily recognize the first problem (unclear descriptions of methods), if not the second (outright errors), in many papers. That is, it is often very difficult to tell what has actually been done to process and analyze the data. Steps that have to be there are missing from the methods sections, parameters for programs are omitted, data is referred to but not provided, and the list goes on. I’m sure that I’ve been guilty of this from time to time. It is easy to forget that writing the “boring” parts of the methods may actually ensure that someone else can do what you’ve done. And sharing your data? That’s just a no-brainer, but something that is too often overlooked in the rush to publish.

So these are cautionary tales. For those of us handling lots of data of different types for different purposes and running many different kinds of analysis to obtain predictions, we must always be on guard against our own worst enemy: ourselves and the errors we might make. And we must be our own worst (and best) critics: if something seems too good to be true, it probably is.


Time to review for scientific publications revisited

Anna Sharman wrote a couple of excellent posts about time to first response for journals and time to publication after acceptance. Following up on my previous post on time spent at the journal (from submission to acceptance), she wrote:

So I went back through my email archive and reconstructed the process for all the papers I previously listed, plus a couple more. I also corrected a few inaccuracies in my previous report (I had been lumping the two rejections that I then resubmitted anew together with their eventually accepted versions, which wasn’t really fair to the journals). The results below show the time from submission to first response (in all cases except the one marked with an asterisk, this is when I received the first reviews), the overall time to acceptance, and the time from acceptance to publication. The publication time is the date the article first appears on a website, since most of these journals have epub-before-‘print’ policies. PLoS journals don’t have a physical volume (there is no paper-and-glue PLoS journal), but they do release volumes, collections of articles, at set times.

Overall, my reanalysis decreased the mean total time at the journal (from 5.7 months to 4.9 months) and showed that the actual time spent under review (as opposed to time when I was revising the paper in response to reviewers’ comments) was about half that, roughly 2.6 months. I would be interested to see whether this is typical, since this is one variable that could be very specific to the way that I work.

The outlier in the analysis is my PLoS One publication that was first considered at PLoS Computational Biology. It appears to have a very short turnaround time, but only because the editor at PLoS One evaluated my responses to the reviews received from PLoS Computational Biology and made a decision on that basis.

Finally, this analysis does not take into account several cases where the effective time to acceptance was much longer. The aforementioned PLoS One publication was actually first submitted to the RECOMB Systems Biology conference, where it reviewed well (a process of about 2.5 months), and was then recommended for consideration at PLoS Computational Biology, where the reviews were less favorable. From start to finish, it was close to a year before the paper was actually published. Likewise, the BMC Systems Biology publication that was rejected and then resubmitted went through a long process of editorial consideration at the end, because we had challenged inappropriate reviews at the editorial level; this added a lot to the time we had the paper (and, by my accounting, shortened the time counted as in review).

The original impetus for the post was a question from Nick:

And the current analysis revises my initial assessment, since what Nick was really asking about was the turnaround time, that is, the time from submission to receipt of the first reviews for the paper. In that respect, 100 days is quite a bit longer than normal (as judged by my limited analysis here), since the mean turnaround times I see are about two months.

Table (revised). Survey of time in review for a number of my own papers. Times are in months unless otherwise noted.

| PMID | Journal | Time to first review | Months until acceptance | Months spent in review | Acceptance to publication |
|---|---|---|---|---|---|
| 23335946 | Expert opinions | 1.9 | 4.7 | 1.9 | 0.9 |
| 22546282 | BMC Sys Bio (rejected) | 2.0 |  | 2.0 |  |
| 22546282 | BMC Sys Bio (new submission) | 1.9 | 11.2 | 3.0 | 1.3 |
| 23071432 | PLoS CB | 3.1 | 12.2 | 7.6 | 1.8 |
| 22745654 | PLoS One | 2.7 | 6.6 | 4.2 | 2.7 |
| 22074594 | BMC Sys Bio | 3.2 | 3.4 | 3.3 | 1.0 |
| 21698331 | Mol BioSystems | 1.8 | 4.6 | 3.0 | 1.0 |
| 21339814 | PLoS CB (rejected) | 1.0 |  | 1.0 |  |
| 21339814 | PLoS One (new submission *) | 0.4 | 0.4 | 0.4 | 0.9 |
| 20974833 | Infection and Immunity | 1.8 | 1.8 | 2.2 | 0.7 |
| 20877914 | Mol BioSystems | 0.8 | 1.2 | 0.9 | 1.4 |
| 19390620 | PLoS Pathogens | 1.8 | 4.4 | 2.4 | 4.6 |
| 20974834 | Infection and Immunity | 2.0 | 2.9 | 2.0 | 0.4 |
| Mean (months) |  | 1.9 | 4.9 | 2.6 | 1.5 |
| Std dev (days) |  | 24 | 119 | 57 | 35 |
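For the curious, here is a quick sketch of one way to reproduce the summary rows above. It assumes the two rejected submissions (which have no acceptance date) are excluded and that the spread is a sample standard deviation; the values are transcribed from the “Months until acceptance” column.

```python
import statistics

# "Months until acceptance" column from the table above, excluding the two
# rejected submissions (which never reached acceptance).
months_to_acceptance = [4.7, 11.2, 12.2, 6.6, 3.4, 4.6, 0.4, 1.8, 1.2, 4.4, 2.9]

mean_months = statistics.mean(months_to_acceptance)   # ~4.9 months, as in the table
sd_months = statistics.stdev(months_to_acceptance)    # ~3.8 months, roughly the ~119 days reported
print(f"mean: {mean_months:.1f} months, sd: {sd_months:.1f} months")
```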


Fact from fiction: The scientific method is alive and well

Weirdly, I’ve read two blog posts today from apparently completely independent sources (here and here) that both state essentially the same thing: that the scientific method is harmful to creativity and is not the “only way to do science”. I’ve posted about the scientific method previously, here and here. While I applaud their efforts to make science more approachable to all, and I do agree that conceiving of the scientific method too rigidly is a mistake, the basic premise of these posts is absolutely wrong.

Both use examples of observation being an integral part of science:

In 1928 Alexander Fleming accidentally left a cover off a petri dish used to cultivate bacteria. The plate was contaminated by a mold that contained penicillin. In this case, there was no problem or question to start with. It was an accident. –Rhett Allain

and:

In some instances, scientists may use computers to model, or simulate, conditions. Other times, researchers will test ideas in the real world. Sometimes they begin an experiment with no idea what may happen. They might disturb some system just to see what happens, –Jennifer Cutraro


It is certainly true that chance observations can start the process of scientific investigation, but in the first example (which the author follows with several other examples from other fields, all with the same basic problem), the observation being described is the very start and not the end product. Fleming did NOT discover penicillin simply by observing a petri dish that had been left open by accident. In fact he probably initially had a hunch (based on an internal model of some kind that he’d built up during his career and the results he observed from this accident) that something was interesting and worth pursuing further using, wait for it, THE SCIENTIFIC METHOD. This would involve constructing controlled tests to isolate the source of the mold (and indeed to show that it was mold at all), to validate what it was doing (for example, maybe the mold grew but had no effect on the growth of the bacteria, which were instead killed by a sudden draft of cold air; maybe aliens came into the lab at night and killed off the bacteria one by one; maybe…), and then to identify the factor that was causing the effect (magic pixie dust! churches! very small rocks!). I’m not up on the history, but I’m sure that this actually took many years and used the scientific method many times in the kind of iterative cycle that I’ve described in my previous posts. The fact that the initial observation was accidental has almost no bearing on the subsequent application of the scientific method to follow it up. Many accidents or weird observations are discarded as being uninteresting or not worth pursuing, sometimes in error.

The second blog post simply lists several examples of ways that a scientist might start out using the scientific method (similar to the story of penicillin), and Ms. Cutraro then uses the words ‘test’ and ‘experiment’, which are both components of the scientific method. She is not describing scientific discovery; she’s describing the very first steps toward scientific discovery. Ms. Cutraro writes:

In contrast, geologists, scientists who study the history of Earth as recorded in rocks, won’t necessarily do experiments, Schweingruber points out. “They’re going into the field, looking at landforms, looking at clues and doing a reconstruction to figure out the past,” she explains. Geologists are still collecting evidence, “but it’s a different kind of evidence.”

This kind of science does not generally involve physical experiments, that’s true. However, gathering evidence to support or discard a model is a version of the scientific method. The process of “looking at clues” and “doing a reconstruction” can be part of the scientific method (in fact, if you’re looking at clues then you are certainly using the scientific method). Imagine we identify a landform that seems to have been formed by running water. We (as geologists) can test this idea by observing many such landforms, the terrain around them, and other features that either support the idea that the landform was formed by water or do not, in which case a new hypothesis/model must be formed. We may not be able to physically test the hypothesis by running a gigantic experiment involving tons of rock and millions of years of running water, but it is still very much science.

So my position is that the scientific method, in the broad sense I’ve previously outlined, is inherent to our ability to discriminate fact from fiction; in fact, the two are essentially the same thing. The formal hypothesis is often unstated or unconscious, but it is always present, and it is part of how we learn. To state that we can discover things without using the scientific method is misleading (at best).

And, importantly, both of these discussions sell the idea of a model short.

…make models of stuff. Really, that is what we do in science. We try to make equations or conceptual ideas or computer programs that can agree with real life and predict future events in real life. That is science. –Rhett Allain

That is, they seem to treat the idea of a model as separate from the process of the scientific method, which it is not. A model, whether it is a conceptual gathering of existing knowledge into a picture of “how things should be” or something instantiated, like a computer algorithm, is an absolute requirement of the scientific method and can’t be separated from it. In fact, a model does not exist outside the scientific method. If a model predicts future events, then those predictions must be validated using the… yes, that again.

So perhaps what the authors of these two posts really mean (and some of their writing suggests this) is that the traditional view of the scientific method, as a rigidly defined set of steps, is not wholly comprehensive. Each of those steps must be thought of in terms of what it means and how it applies to everyday science, and indeed to the rest of life. Science IS the scientific method. It is the way that we learn things about reality. And it is the only way we can exclude sets of plausible fictions to guide us toward fact.