Writing Yourself Into A Corner

I’ve been fascinated with the idea of investment, and how it can color your thoughts, feelings, and opinions about something. Not the monetary sense of the word (though probably that too) but the emotional and intellectual sense. If you’ve ever been in a bad relationship you might have fallen prey to this reasoning: “I’m in this relationship and I’m not getting out because reasons, so admitting that it’s absolutely terrible for me is unthinkable, so I’m going to pretend like it’s not, and I’m going to believe that it’s not, and I’m going to tell everyone that I’m doing great.” I really believe this can be a motivating factor for a big chunk of human behavior.

And it’s certainly a problem in science. When you become too invested in an idea or an approach or a tool- that is, you’ve spent a considerable amount of time researching or promoting it- it can be very difficult to distance yourself from that thing and admit that you might have it wrong. That would be unthinkable.

Sometimes this investment pitfall is contagious. If you’re on a project working with others toward common goals, the problem of investment becomes more complicated. That is, if I’ve said something, and some amount of group effort has been put into this idea, but it turns out I was wrong about it, it can be difficult to raise that with the rest of the group. I note, though, that it is really imperative that it be raised. This becomes even more difficult if the ideas or preliminary results you’ve put forward become part of the project- through presentations made by others or through further investment of project resources to follow up on these leads.

I think this sometimes happens when you’re writing an early draft of a document- though the effect can be more subtle here. If you write words down and put out ideas that are generally sound and on-point, it can be hard for you, or others who may edit the paper after you, to erase them. More importantly, a first draft, no matter how preliminary or draft-y, can establish an organization that is hard to break. Clearly, if there are parts that really don’t work, or don’t fit, or aren’t true, they can be removed fairly easily. The bigger problems lie in those parts that are *pretty good*. I’ve looked back at my own preliminary drafts and realized (after a whole lot of work trying to get things to fit) that the initial overall organization was somehow wrong- and that I really needed to rip it all apart and start over, at least in terms of the organization. I’ve also seen this in other people’s work, where something just doesn’t seem right about a paper, but I can’t quite put my finger on what- at least not without a bunch of effort.

Does this mean that you should very carefully plan out your preliminary drafts? Not at all. That’s essentially the route to complete gridlock and non-productivity. Rather, you should be aware of this problem and be willing to be flexible. Realize that what you put down on the paper for the first draft (or early versions of analysis) is subject to change- and make others you are working with aware of this explicitly (simply labeling something as “preliminary analysis” or “rough draft” isn’t explicit enough). And don’t be afraid to back away from it if it’s not working out. It’s much better if that happens earlier in the process than later- that is, it’s better to completely tear down a final draft of a paper than to have reviewers completely miss the point of what you’re trying to say after you’ve submitted it.


Big Data Showdown

One of the toughest parts of collaborative science is communication across disciplines. I’ve had many (generally initial) conversations with bench biologists, clinicians, and sometimes others that go approximately like this:

“So, tell me what you can do with my data.”

“OK- tell me what questions you’re asking.”

“Um… that kinda depends on what you can do with it.”

“Well, that kinda depends on what you’re interested in…”

And this continues.

But the great part- the part about it that I really love- is that given two interested parties you’ll sometimes work to a point of mutual understanding, figuring out the borders and potential of each other’s skills and knowledge. And you generally work out a way of communicating that suits both sides and (mostly) works to get the job done. This is really when you start to hit the point of synergistic collaboration- and also, sadly, usually about the time you run out of funding to do the research.

Goodbye to two good friends

(Note: this post isn’t nearly as sad as it might seem from the title or the introduction below)

Yesterday I lost two close friends. We had been friends for five years, though our relationship had extended through a tumultuous 10 or so months before that. Given that we still have unfinished business I expect our friendships to straggle on a little longer. But really, it’s over. These friends helped me grow in a number of important ways- they made me more mature, taught me to deal with different personalities, and forced me to communicate more clearly and to take criticism in a constructive light. The friendships both challenged me in different ways and supported me through a fragile time in my life. I will miss both of these friends for some different reasons- and some of the same reasons.

Like many friendships, these ended because of what other people thought about them. A small number of people had comments on our friendship- some of those comments, upon reflection, were probably well-placed; others certainly were not. But that outside influence is what really broke us apart. I hope that we can become friends again in the future- but we both will have changed so much in the intervening time that we may well be unrecognizable to each other. Still, it would be nice to continue this friendship.

Bye Bye

Farewell Systems Biology of Enteropathogens and Systems Virology Centers – you will be missed but not forgotten.

Here are a few mementos of our time together….

  1. Ansong C, Schrimpe-Rutledge AC, Mitchell H, Chauhan S, Jones MB, Kim Y-M, McAteer K, Deatherage B, Dubois JL, Brewer HM, Frank BC, McDermott JE, Metz TO, Peterson SN, Motin VL, Adkins JN: A multi-omic systems approach to elucidating Yersinia virulence mechanisms. Molecular Biosystems 2012. In press.
  2. McDermott JE, Corley C, Rasmussen AL, Diamond DL, Katze MG, Waters KM: Using network analysis to identify therapeutic targets from global proteomics data. BMC Systems Biology 2012, 6:28.
  3. Yoon H, Ansong C, McDermott JE, Gritsenko M, Smith RD, Heffron F, Adkins JN: Systems analysis of multiple regulator perturbations allows discovery of virulence factors in Salmonella. BMC Systems Biology 2011, 5:100.
  4. Niemann GS, Brown RN, Gustin JK, Stufkens A, Shaikh-Kidwai AS, Li J, McDermott JE, Brewer HM, Schepmoes A, Smith RD, et al: Discovery of novel secreted virulence factors from Salmonella enterica serovar Typhimurium by proteomic analysis of culture supernatants. Infect Immun 2011, 79(1):33-43.
  5. McDermott JE, Yoon H, Nakayasu ES, Metz TO, Hyduke DR, Kidwai AS, Palsson BO, Adkins JN, Heffron F: Technologies and approaches to elucidate and model the virulence program of Salmonella. Front Microbiol 2011, 2:121.
  6. McDermott JE, Shankaran H, Eisfeld AJ, Belisle SE, Neumann G, Li C, McWeeney SK, Sabourin CL, Kawaoka Y, Katze MG, et al: Conserved host response to highly pathogenic avian influenza virus infection in human cell culture, mouse and macaque model systems. BMC Systems Biology 2011, 5(1):190.
  7. McDermott JE, Corrigan A, Peterson E, Oehmen C, Niemann G, Cambronne ED, Sharp D, Adkins JN, Samudrala R, Heffron F: Computational prediction of type III and IV secreted effectors in gram-negative bacteria. Infect Immun 2011, 79(1):23-32.
  8. McDermott JE, Archuleta M, Thrall BD, Adkins JN, Waters KM: Controlling the response: predictive modeling of a highly central, pathogen-targeted core response module in macrophage activation. PLoS ONE 2011, 6(2):e14673.
  9. Aderem A, Adkins JN, Ansong C, Galagan J, Kaiser S, Korth MJ, Law GL, McDermott JE, Proll SC, Rosenberger C, et al: A systems biology approach to infectious disease research: innovating the pathogen-host research paradigm. MBio 2011, 2(1):e00325-00310.
  10. Buchko GW, Niemann G, Baker ES, Belov ME, Heffron F, Adkins JN, McDermott JE: A multi-pronged search for a common structural motif in the secretion signal of Salmonella enterica serovar Typhimurium type III effector proteins. Molecular Biosystems 2011, 6(12):2448-58.
  11. Lawrence PK, Kittichotirat W, Bumgarner RE, McDermott JE, Herndon DR, Knowles DP, Srikumaran S: Genome sequences of Mannheimia haemolytica serotype A2: ovine and bovine isolates. J Bacteriol 2010, 192(4):1167-1168.
  12. Yoon H, McDermott JE, Porwollik S, McClelland M, Heffron F: Coordinated regulation of virulence during systemic infection of Salmonella enterica serovar Typhimurium. PLoS Pathog 2009, 5(2):e1000306.
  13. *Taylor RC, Singhal M, Weller J, Khoshnevis S, Shi L, McDermott J: A network inference workflow applied to virulence-related processes in Salmonella typhimurium. Annals of the New York Academy of Sciences 2009, 1158:143-158.
  14. *Shi L, Chowdhury SM, Smallwood HS, Yoon H, Mottaz-Brewer HM, Norbeck AD, McDermott JE, Clauss TRW, Heffron F, Smith RD, Adkins JN: Proteomic Investigation of the Time Course Responses of RAW 264.7 Macrophages to Salmonella Infection. Infection and Immunity 2009, 77(8):3227-33.
  15. *Shi L, Ansong C, Smallwood H, Rommereim L, McDermott JE, Brewer HM, Norbeck AD, Taylor RC, Gustin JK, Heffron F, Smith RD, Adkins JN: Proteome of Salmonella Enterica Serotype Typhimurium Grown in a Low Mg/pH Medium. J Proteomics Bioinform 2009, 2:388-397.
  16. *Samudrala R, Heffron F, McDermott JE: Accurate prediction of secreted substrates and identification of a conserved putative secretion signal for type III secretion systems. PLoS Pathog 2009, 5(4):e1000375.
  17. *McDermott JE, Taylor RC, Yoon H, Heffron F: Bottlenecks and hubs in inferred networks are important for virulence in Salmonella typhimurium. J Comput Biol 2009, 16(2):169-180.
  18. *Ansong C, Yoon H, Norbeck AD, Gustin JK, McDermott JE, Mottaz HM, Rue J, Adkins JN, Heffron F, Smith RD: Proteomics Analysis of the Causative Agent of Typhoid Fever. J Proteome Res 2008.

*these were really from slightly before our time- but I’ll count them here anyway

Finding your keys in a mass of high-throughput data

There is a common scientific allegory that is often used in criticism of discovery-based science. A person is looking around at night under a lamppost when a passerby asks, “What are you doing?” “Looking for my keys,” the person replies. “Oh, did you lose them here?” asks the concerned citizen. “No,” the person replies, “I lost them over there, but the light is much better here.”


The argument as applied to science in a nutshell is that we commonly ask questions based on where the ‘light is good’- that is, where we have the technology to be able to answer the question, rather than asking a better question in the first place. This recent piece covering several critiques of cancer genomics projects is a good example, and uses the analogy liberally throughout- referencing its use in the original articles covered.

One indictment of the large-scale NCI project, The Cancer Genome Atlas (TCGA), is as follows:

The Cancer Genome Atlas, an ambitious effort to chart and catalog all the significant mutations that every important cancer can possibly accrue. But these efforts have largely ended up finding more of the same. The Cancer Genome Atlas is a very significant repository, but it may end up accumulating data that’s irrelevant for actually understanding or curing cancer.

Here’s my fundamental problem with the metaphor, and its use as a criticism of scientific endeavors such as the TCGA. The problem is this: we don’t know where the keys are a priori, and the light is brightest under the lamppost, so we would be stupid NOT to look there first. The indictment comes post hoc, and so benefits from knowing the answer (or at least having a good idea of the answer). Unraveling this anti-metaphor: we don’t know what causes cancer, genomics seems like a likely place to look and we have the technology to do so, and if we had started by looking elsewhere critics would be left wondering why we didn’t look in the obvious place first. Even if we didn’t find anything new with the TCGA (and the jury is still out on that point), that is a positive step forward in the understanding of cancer- it means that we’ve looked in genomics and haven’t found the answer there. This can be used to drive the next round of investigation. If we hadn’t done it, we simply wouldn’t know, and that would prevent taking certain steps to move forward.

Of course, the value of the metaphor is that it can be used to urge caution in investigation. If we have a good notion that our keys are not under the light, then maybe we ought to be thinking about going to get our flashlights to look in the right area to start with. We should also be very careful that, in funding the large projects to look where the light is, we don’t sacrifice other projects that may end up yielding fruit. It is true that large projects tend to be over-hyped, with the answers promised (or all but promised) before the work has really begun. Part of this is necessary salesmanship to get these things off the ground at all, but overselling does not reflect well on anyone in the long run. Finally, the momentum of large projects or motivating ideas (“sequence all cancer genomes”) can be significant and may carry the ideas beyond what is useful. When we’ve figured out that the keys are not under the lamppost we had better figure out where to look next rather than combing over the same well-lit ground.

Part of this piece reflects very well on work that I’m involved with- proteomic and phosphoproteomic characterization of tumors from the TCGA under the Clinical Proteomic Tumor Analysis Consortium (CPTAC):

“as Yaffe points out, the real action takes place at the level of proteins, in the intricacies of the signaling pathways involving hundreds of protein hubs whose perturbation is key to a cancer cell’s survival. When drugs kill cancer cells they don’t target genes, they directly target proteins”

So examining the signaling pathways that are involved in cancer directly, as opposed to looking at gene expression or modification as a proxy of activity, may indeed be the way to go to elucidate the causes of cancer. We believe that integrating this kind of information, which is closer to actual function, with the depth of knowledge provided by the TCGA will give significant insight into the biology of cancer and its underlying causes. But we’ve only started looking under that lamppost.

So, the next time you hear someone using this analogy as an indictment of a project or approach, ask yourself whether they are making the argument post hoc- that is, “they looked under the lamppost and didn’t find anything, so their approach was flawed”. It wasn’t- it was likely the clearest, most logical step, and the one most likely to yield fruit given a reasonable cost-benefit assessment.


Eight red flags in bioinformatics analyses

A recent comment in Nature by C. Glenn Begley outlines six red flags warning that basic science research won’t be reproducible. An excellent read and excellent points. The point of this comment, based on the experience of writing two papers in which:

Researchers — including me and my colleagues — had just reported that the majority of preclinical cancer papers in top-tier journals could not be reproduced, even by the investigators themselves [1,2].

was to summarize the common problems observed in the non-reproducible papers surveyed, since the author could not reveal the identities of the papers themselves. Results in a whopping 90% of the papers surveyed could not be reproduced, in some cases even by the same researchers in the same lab using the same protocols and reagents. The ‘red flags’ are really warnings to researchers about ways they can fool themselves (as well as reviewers and readers of high-profile journals) and things they should do to avoid falling into the traps found by the survey. These kinds of issues are major problems in the analysis of high-throughput data for biomarker studies and other purposes as well. As I was reading this I realized that I’d written several posts about these issues as applied to bioinformatics and computational biology research. Therefore, here is my brief summary of these six red flags, plus two more that are specific to high-throughput analysis, as they apply to computational analysis- linking to my previous posts or those of others where applicable.

  1. Were experiments performed blinded? This is something I hadn’t previously considered directly, but my post on how it’s easy to fool yourself in science does address this. In some cases blinding your bioinformatic analysis might be possible, and it would certainly be very helpful in making sure that you’re not ‘guiding’ your findings to a predetermined answer. The cases where this is especially important are those where the analysis is directly targeted at addressing a hypothesis. In these cases a solution may be to have a colleague review the results in a blinded manner- though this may take more thought and work than reviewing the results of a limited set of Western blots.
  2. Were basic experiments repeated? This is one place where high-throughput methodology and analysis might have a step up on ‘traditional’ science involving (for example) Western blots. Though it’s a tough fight and sometimes not done correctly, the need for replicates is well-recognized as discussed in my recent post on the subject. In studies where the point is determining patterns from high-throughput data (biomarker studies, for example) it is also quite important to see if the study has found their pattern in an independent dataset. Often cross-validation is used as a substitute for an independent dataset- but this falls short. Many biomarkers have been found not to generalize to different datasets (other patient cohorts). Examination of the pattern in at least one other independent dataset strengthens the claim of reproducibility considerably.
  3. Were all the results presented? This is an important point but can be tricky in analysis that involves many ‘discovery’ focused analyses. It is not important to present every comparison, statistical test, heatmap, or network generated during the entire arc of the analysis process. However, when addressing hypotheses (see my post on the scientific method as applied in computational biology) that are critical to the arguments presented in a study it is essential that you present your results, even where those results are confusing or partly unclear. Obviously, this needs to be undertaken through a filter to balance readability and telling a coherent story– but results that partly do not support the hypothesis are very important to present.
  4. Were there positive and negative controls? This is just incredibly central to the scientific method but is a problem in high-throughput data analysis. At the most basic level- analyzing the raw (or mostly raw) data from instruments- this is commonly performed but never reported. In a number of recent cases in my group we’ve found real problems in the data that were revealed simply by looking at these built-in controls, or by figuring out what basic comparisons could be used as controls (for example, do gene expression measurements from biological replicates correlate with each other?). What statistical associations do you expect to see, and what do you expect not to see? These checks are good for preventing you from fooling yourself- and if they are important they should be presented.
  5. Were reagents validated? For data analysis this should be: “Was the code used to perform the analysis validated?” I’ve not written much on this but there are several out there who make it a central point in their discussions including Titus Brown. Among his posts on this subject are here, here, and here. If your code (an extremely important reagent in a computational experiment) does not function as it should the results of your analyses will be incorrect. A great example of this is from a group that hunted down a bunch of errors in a series of high-profile cancer papers I posted about recently. The authors of those papers were NOT careful about checking that the results of their analyses were correct.
  6. Were statistical tests appropriate? There is just too much to write on this subject in relation to data analysis. There are many ways to go wrong here- inappropriate data for a test, inappropriate assumptions, inappropriate data distribution. I am not a statistician so I will not weigh in on the possibilities here. But it’s important. Really important. Important enough that if you’re not a statistician you should have a good friend/colleague who is and can provide specific advice to you about how to handle statistical analysis.
  7. New! Was multiple hypothesis correction correctly applied? This is really an addition to flag #6 above, specific to high-throughput data analysis. Multiple hypothesis correction is very important in high-throughput data analysis because of the number of statistical comparisons being made. It is a way of filtering the predictions or statistical relationships observed to provide more conservative estimates. Essentially it extends the question “how likely is it that the difference I observed in one measurement occurred by chance?” to the population-level question “how likely is it that I would find this difference by chance if I looked at a whole bunch of measurements?”. Know it. Understand it. Use it. (A small sketch of one common correction procedure follows this list.)
  8. New! Was an appropriate background distribution used? Again, an extension to flag #6. When judging the significance of findings it is very important to choose a correct background distribution for your test. An example is in proteomics analysis: if you want to know which functional groups are overrepresented in a global proteomics dataset, should you choose your background to be all proteins coded for by the genome? No- because the set of proteins that can be measured by proteomics is (in general) highly biased to start with. So to get an appropriate idea of which functional groups are enriched you should choose the proteins actually observed across all conditions as your background (the second sketch below this list makes the contrast concrete).
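
To make flag #7 concrete, here is a minimal sketch (in Python, with simulated p-values, so the numbers are purely illustrative) of the Benjamini-Hochberg procedure, one common way of controlling the false discovery rate across many parallel tests. The hand-rolled function is just for exposition- an established implementation (for example, multipletests(..., method='fdr_bh') in statsmodels) does the same job.

```python
# Benjamini-Hochberg FDR correction for a vector of p-values from many
# parallel tests (e.g., per-gene differential expression). The p-values
# below are simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
# Simulate 10,000 tests: 9,900 nulls (uniform p-values) and 100 true effects.
pvals = np.concatenate([rng.uniform(size=9900), rng.uniform(0, 1e-4, size=100)])

def benjamini_hochberg(p):
    """Return BH-adjusted p-values (q-values) in the original order."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)                          # sort p-values ascending
    ranked = p[order] * n / np.arange(1, n + 1)    # p_(i) * n / i
    # Enforce monotonicity from the largest rank downward and cap at 1.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1].clip(max=1.0)
    q = np.empty(n)
    q[order] = q_sorted                            # restore original order
    return q

q = benjamini_hochberg(pvals)
print("raw p < 0.05:        ", np.sum(pvals < 0.05))  # hundreds of false positives
print("BH-adjusted q < 0.05:", np.sum(q < 0.05))      # mostly just the true effects
```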

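For flag #8, here is a similarly minimal sketch of the same enrichment question asked against two different backgrounds, using SciPy’s hypergeometric distribution. All of the counts are made up for illustration; the point is only that an overly broad background (every protein coded by the genome) can make an ordinary overlap look wildly significant, while the appropriate background (the proteins actually observed) gives a much more honest answer.

```python
# Functional enrichment of a hit list tested against two different backgrounds.
# SciPy parameterizes the hypergeometric distribution as
# hypergeom(M=population size, n=annotated in population, N=size of hit list).
from scipy.stats import hypergeom

# Hypothetical counts: 40 of 200 differentially abundant proteins carry
# some functional annotation of interest.
hits_annotated, hits_total = 40, 200

# Background 1 (inappropriate): all proteins coded by the genome.
genome_size, genome_annotated = 20000, 1200
p_genome = hypergeom.sf(hits_annotated - 1, genome_size, genome_annotated, hits_total)

# Background 2 (appropriate): only proteins actually observed in the experiment.
observed_size, observed_annotated = 3000, 450
p_observed = hypergeom.sf(hits_annotated - 1, observed_size, observed_annotated, hits_total)

print(f"p vs. whole-genome background:      {p_genome:.2e}")    # looks wildly significant
print(f"p vs. observed-proteins background: {p_observed:.2e}")  # far more modest
```
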
The comment by Glenn Begley wraps up with this statement about why these problems are still present in research:

Every biologist wants and often needs to get a paper into Nature or Science or Cell, yet the scientific community fails to recognize the perverse incentive this creates.

I think this is true, but you could substitute “any peer-reviewed journal” for “Nature or Science or Cell”- the problem comes at all levels. It’s also true that these problems are particularly relevant to high-throughput data analysis: such studies can be less hypothesis-directed and more discovery-oriented, they are generally more expensive (and so attract more scrutiny of the results, in some cases), and there is rampant enthusiasm for, and overselling of, the potential results arising from these kinds of studies.

Illustration from Derek Roczen

The big question: will following these rules improve reproducibility in high-throughput data analysis? The Comment describes these as things that were present in the reproducible studies (that small 10% of the papers), but does that mean that paying attention to them will improve reproducibility, especially in the case of high-throughput data analysis? There are issues that are more specific to high-throughput data (my flags #7 and #8, above), but essentially these flags are a great starting point for evaluating the integrity of a computational study. With high-throughput methods, and their resulting papers, gaining importance all the time, we need to consider these flags both as producers and as consumers of such studies.

References

  1. Prinz, F., Schlange, T. & Asadullah, K. Nature Rev. Drug Discov. 10, 712 (2011).
  2. Begley, C. G. & Ellis, L. M. Nature 483, 531–533 (2012).

Leading a collaborative scientific paper: My tips on cat herding

Large collaborative research projects, centers, or consortia have a single goal: to be funded for another round. That’s completely cynical, but it is not so far off the truth. The point of these projects is to advance science by bringing together many different experts in many different areas to do more than what could be done in a single R01-size endeavor. If no project-wide collaborative papers come out of this effort and go to high-profile journals, there will be nothing- or very little- to support the claim that the project was successful. Why not just fund 3-8 R01-sized projects that can work in isolation and accomplish the same thing or more? So publications are important.

The second thing to understand is that, in my experience, there’s no such thing as a ‘group-written’ paper. Not truly. Someone always needs to step forward and take ownership of the paper to drive things forward; otherwise it’s dead in the water. Maybe it can be two people, maybe it can be more- but I’ve never seen it happen. So someone needs to step forward and be chief cat herder. This is a thankless job, but if it results in a solid, collaborative manuscript it can be very satisfying. Not to mention the fact that you will (or very much SHOULD) have your name first in the author order.

Here’s my metaphor for spearheading such a monster, errrr… paper.

Imagine that you’ve gathered a painter, a sculptor who works in clay, a sculptor who works with metal, and a DJ in a room- actually in many cases they’re not even in the same room, they’re distributed around the country in their own studios. Around the room (or in their studios) you have a canvas and paint, a block of clay, a pile of metal, and a box of vinyl. Your job is to assemble a work of art that incorporates all those elements together, blends them where appropriate, and is clear about how the pieces all fit together. You have a limited time to accomplish this. Art critics will be visiting after you’re finished to evaluate your work. Go.

Here is my list of thoughts on how to approach this kind of problem.

  1. Don’t think of this as a collaborative paper. In all likelihood the actual driving of the paper will be done by one person, and that’s you. If you wait around for everyone to chime in, contribute, and take ownership of their sections, you will never get anything done. If you aren’t the leader of the paper but the leader isn’t leading, it MAY be possible to just start the process and take over leadership. This can be politically dangerous and really depends on the specifics of the project and collaborations, but it’s something to keep in mind. You could be a hero.
  2. Think of this as a collaborative paper. This is a collaborative effort. I realize that this is directly contradictory to my first point. However, it is very important that you don’t lose sight of the fact that you are not the expert in many areas of the paper that you have to put together. Make use of others’ expertise but try to put this in direct requests for input of well-defined portions.
  3. Have a basic understanding of each component. This is really important. Everyone has different expertise and you will not become an expert in a new area by writing a paper. Don’t try. But if there are things that you really are not familiar with that need to go into the paper brush up on them by reading (actually reading from start to finish) previous papers from the group or current review articles in the area. This will allow you to understand at least where the collaborator is coming from and what they can offer.
  4. Don’t overload collaborators with many outlines and drafts. This will only make your collaborators stop paying attention. Instead try to put out one or two outlines, with discussion (teleconference or in person) between. Also with the draft, work with individuals to get portions completed instead of doing everything in multiple rounds of drafts that are commented on by everyone.
  5. Choose a way of collaborating on writing and communicate it to contributors. If you use MS Word for drafts, make sure everyone works with the “Track Changes” option turned on. Otherwise it’s a nightmare to figure out what parts have been changed. Part of your job will be to manually merge all these changes into a single document. This is a tremendous pain in the ass, but it allows you to evaluate all contributions and make decisions about what to include or how things should be worded. Google Docs seems to work well for producing drafts collaboratively, but at some point the draft should be moved to a single document for finalization.
  6. At the early stages include, don’t exclude. Welcome everyone’s input and suggestions. At some point it may be necessary to make hard decisions about directions of the paper and that may make people unhappy. That’s something you have to live with- but try to listen to the group about these decisions. If there are people with suggestions on more work to do (either experiments/analysis or writing) and their suggestions seem reasonable, make it clear that it’s up to them to carry through with the actual work and try to get a timeline from them for completion. If their piece is essential to the project make sure that you have a plan for extracting this from them- there’s probably a nicer way to put this, but that’s the idea.
  7. At the later stages don’t let newcomers (or others) distract from the plan. If they have really great suggestions, listen to them. If their suggestions seem to distract from the story you are telling fall back on the, “well that’s a great idea, why don’t you investigate that and we can include it if the reviewers request it”- that is, after submission and review.
  8. Have a strategy to create the story you’re going to tell. It can be very difficult to start on a paper cold, when there’s only been discussion about what should be done. A reasonable approach is to do some preliminary analysis yourself then take this to the larger group for input. Make it clear that this is only one possible path and that you’re just trying to promote discussion. Make sure you’re telling a story- this is actually what a scientific paper is about. Be flexible about what the story is. It has to be consistent with the data available- but you may choose to incorporate portions of the results and leave out others that do not help the story along. See also my post on how to write a scientific paper.
  9. Try to avoid redundant effort. Generally this isn’t an issue because everyone is an expert in different areas so the actual work shouldn’t be redundant. Sometimes data analysis needs to be defined to avoid redundancy. If there are large sections to be written (such as an Introduction) it’s better to break it into smaller bits for different people to work on and call this out in the outline or draft so people are clear on who’s doing what. Everyone can revise/comment on all sections toward the end and that’s easier to merge than two disparate documents that are trying to talk about the same thing.
  10. Navigate author order and authorship carefully. This is tremendously important for most people on the project. The critical positions to identify are first author and last author (for biology papers, anyway). If you are leading the paper you should be first author, but always remember that for many journals you can specify two or even three ‘first’ authors. For this kind of paper that might be necessary. Don’t try to limit authorship too much. These kinds of papers will have lots of authors. But try to be consistent; if you accept suggestions from everyone’s groups wholesale, it can cause conflicts. For example, one group might consider technicians who performed the work to be worthy of authorship; if you say OK to this, the other groups may chime in with all their technicians, and so on. Follow the rules of authorship that you feel comfortable with and believe are ethically consistent, but remember that many, many people may have made significant contributions to the paper. This can be one of the most politically treacherous portions of the paper- have fun!
  11. Find a champion. Identify a senior author who you can communicate with and who you believe will support your positions, or at least will listen to your positions. There may arise situations that require having someone with authority agreeing with you to get others to fall in line.

Finally, here’s an example of a large collaborative research paper that I’ve recently published. It didn’t turn out quite as grand as I’d hoped (what paper does?) but it’s still a nice example of integrating the input of many different groups. I am currently working on (leading) at least three more such papers that are in various stages of being completed.

McDermott JE, Shankaran H, Eisfeld AJ, Belisle SE, Neumann G, Li C, McWeeney S, Sabourin C, Kawaoka Y, Katze MG, Waters KM (2011). Conserved host response to highly pathogenic avian influenza virus infection in human cell culture, mouse and macaque model systems. BMC Systems Biology 5(1):190.