I’ve been thinking lately about how events in your academic life can have unintended, and oftentimes unrecognized, downstream effects. Recently I realized that I’m having trouble putting together a couple of papers that I’m supposed to be leading. After some reflection I concluded that at least one reason is that I’ve been affected by the long, tortuous, and somewhat degrading process of trying to get a large and rather important paper published. This paper has been in the works, and through multiple submission/revision cycles, for around five years. After that long it really starts to wear on your academic psyche, though that can be hard to recognize. I think my failure to get that paper published (so far) is partly holding me back from putting together these other papers. Partly this is about the continuing and varied forms of rejection you experience in this process, but partly it’s about the fact that there’s something sitting there that shouldn’t be sitting there. Even though I don’t currently have any active tasks to complete for that problem paper, it still weighs on me.
The silver lining is that once I recognized this was a factor, things started to seem easier with those projects and the story I was trying to tell. Anyway, I think we academics should have our own therapists who specialize in problems like this. It would be very helpful.
This comic is inspired not by real interactions I’ve had with developers (no developer has ever volunteered to get within 20 paces of my code), but by online discussions about the importance of ‘proper’ coding. Here’s a comic from xkcd that makes a different point:
My reaction to this, as a bench-biology-trained computational biologist who has never taken a programming class, is “who cares?” If it works, really, who cares?
Sure, there are very good reasons for standard programming practices and for clean, efficient code, even in bioinformatics (or especially so). But these apply almost exclusively to approaches you already have a lot of experience with: you’ve worked out the bugs, figured out how the approach behaves with the underlying data, and made sure it’s actually useful biologically. Getting to that point is at least 75% of my job. For any particular problem I’m working on, I try out and discard many approaches. It’s important to keep a record of these attempts, but that code doesn’t have to be clean or efficient. There are exceptions: if code takes a loooong time to run even once, you probably want to make it as efficient as you can. For the vast majority of what I do, though, even with large amounts of data, I can tell whether something is working in a reasonable amount of time using inefficient code (anything written in R, for example).
The other case where good coding matters is when you want the code to be usable by other people. This is an incredibly important part of computational biology, and I’m not trying to downplay it here. It applies when you’re reasonably certain the code will be read or used by others in your own group, and when you publish or release the code to a wider audience.
For further reading on this subject, here’s a post from Byte Size Biology that covers some great ideas for writing *research* code. And here is a dissenting opinion from Living in an Ivory Basement touting the importance of good programming practices. (Note: I don’t disagree, but I do believe that at least 75% of the coding I do shouldn’t be held to such a high bar; it isn’t necessary, and I’d never get anything done.) Finally, here are some of my thoughts on how coding really follows the scientific method.