Too good to be true or too good to pass up?
There’s been a lot of discussion about the importance of replication in science (read an extensive and very thoughtful post about that here) and notable occurrences of non-reproducible science being published in high-impact journals. The recent retraction of the two STAP stem cell papers from Nature and accompanying debate over who should be blamed and how. The publication of a study (see also my post about this) in which research labs responsible for high-impact publications were challenged to reproduce their findings showed that many of these findings could not be replicated, in the same labs they were originally performed in. These, and similar cases and studies, indicate serious problems in the scientific process- especially, it seems, for some high-profile studies published in high-impact journals.
I was surprised, therefore, at the reaction of some older, very experienced PIs recently after a talk I gave at a university. I mentioned these problems, and briefly explained the results of the study on reproducibility to them- that, in 90% of the cases, the same lab could not reproduce the results that they had previously published. They were generally nonplussed. “Oh”, one said, “probably just a post-doc with magic hands that’s no longer in the group”. And all agreed on the difficulty of reproducing results for difficult and complicated experiments.
So my question is: do these fabled lab technicians actually exist? Are there those people who can “just get things to work”? And is this actually a good thing for science?
I have some personal experience in this area. I was quite good at futzing around with getting a protocol to work the first time. I would get great results. Once. Then I would continue to ‘innovate’ and find that I couldn’t replicate my previous work. In my early experiences I sometimes would not keep notes well enough to allow me to go back to the point where I got it to work. Which was quite disturbing and could send me into a non-productive tailspin of trying to replicate the important results. Other times I’d written things down sufficiently that I could get them to work again. And still others I found that someone else in the lab could consistently get better results out of the EXACT SAME protocol- apparently followed the same way. They had magic hands. Something about the way they did things just *worked*. There were some protocols in the lab that just seemed to need this magic touch- some people had it and some people didn’t. But does that mean that the results these protocols produced were wrong?
What kinds of procedures seem to require “magic hands”? One example is from when I was doing electron microscopy (EM) as a graduate student. We were working constantly at improving our protocols for making two-dimensional protein crystals for EM. This was delicate work, which involved mixing protein with a buffer in a small droplet, layering on a special lipid, incubating for some amount of time to let the crystals form, then lifting the fragile lipid monolayer (hopefully with protein crystals) off onto an EM grid and finally staining with an electron dense stain or flash freezing in liquid nitrogen. The buffers would change, the protein preparations would change, the incubation conditions would change, and how the EM grids were applied to our incubation droplets to lift off the delicate 2D crystals was subject to variation. Any one of these things could scuttle getting good crystals and would therefore produce a non-replication situation. There were several of us in the lab that did this and were successful in getting it to work- but it didn’t always work and it took some time to develop the right ‘touch’ to get it to work. The number of factors that *potentially* contributed to success or failure was daunting and a bit disturbing- and sometimes didn’t seem to be amenable to communication in a written protocol. The line between superstition and required steps was very thin.
But this is true of many protocols that I worked with throughout my lab career* – they were often complicated, multi-step procedures that could be affected by many variables- from the ambient temperature and humidity to who prepared the growth media and when. Not that all of these variables DID affect the outcomes but when an experiment failed there were a long list of possible causes. And the secret with this long list? It probably didn’t include all the factors that did affect the outcome. There were likely hidden factors that could be causing problems. So is someone with magic hands lucky, gifted, or simply persistent? I know of a few examples where all three qualities were likely present- with the last one being, in a way, most important. Yes, my collaborator’s post-doc was able to do amazing things and get amazing results. But (and I know this was the case) she worked really long and hard to get them. She probably repeated experiments many, many times ins some cases before she got it to work. And then she repeated the exact combination to repeat the experiments again. And again. And sometimes even that wasn’t enough (oops, the buffer ran out and had to be remade, but the lot number on the bottle was different, and weren’t they working on the DI water supply last week? Now my experiment doesn’t work anymore.)
So perhaps it’s not so surprising that many of these key findings from these papers couldn’t be repeated, even in the same labs. There was not the same incentive to get it to work for one thing- so that post-doc or another graduate student who’s taken over the same duties, probably tried once to repeat the experiment. Maybe twice. Didn’t work. Huh? That’s unfortunate. And that’s about as much time as we’re going to put in to this little exercise. The protocols could be difficult, complicated, and have many known and unknown variables affecting their outcomes.
But does it mean that all these results are incorrect? Does it mean that the underlying mechanisms or biology that was discovered was just plain wrong? No. Not necessarily. Most, if not all, of these high-profile publications that failed to repeat spawned many follow-on experiments and studies. It’s likely that many of the findings were borne out by orthogonal experiments, that is, experiments that test implications of these findings, and by extension the results of the original finding itself. Because of the nature of this study it was conducted anonymously- so we don’t really know, but it’s probably true. This was an important point, and one that was brought up by these experienced PIs I was talking with, is that sometimes direct replication may not be the most important thing. Important, yes. But perhaps not deal-killing if it doesn’t work. The results still might stand IF, and only if, second, third, and fourth orthogonal experiments can be performed that tell the same story.
Does this mean that you actually can make stem cells by treating regular cultured cells with an acid bath? Well, probably not. For some of these surprising, high-profile findings the ‘replication’ that is discussed is other labs trying to see if the finding is correct. So they try the protocols that have been reported, but it’s likely that they also try other orthogonal experiments that would, if positive, support the original claim.
“OMG! This would be so amazing if it’s true- so, it MUST be true!”
So this gets back to my earlier discussions on the scientific method and the importance of being your own worst skeptic (see here and here). For every positive result the first reaction should be “this is wrong”, followed by, “but- if it WERE right then X, Y, and Z would have to be true. And we can test X, Y, and Z by…”. The burden of scientific ‘truth’** is in replication, but in replication of the finding– NOT NECESSARILY in replication of the identical experiments.
*I was a labbie for quite a few of my formative years. That is, I actually got my hands dirty and did real, honest-to-god experiments, with Eppendorf tubes, vortexers, water baths, cell culture, the whole bit. Then I converted and became what I am today – a creature purely of silicon and code. Which suits me quite well. This is all just to add to my post a “I kinda know what I’m talking about here- at least somewhat”.
** where I using a very scientific meaning of truth here, which is actually something like “a finding that has extensive support through multiple lines of complementary evidence”