What is a hypothesis?

So I got this comment from a reviewer on one of my grants:

The use of the term “hypothesis” throughout this application is confusing. In research, hypotheses pertain to phenomena that can be empirically observed. Observation can then validate or refute a hypothesis. The hypotheses in this application pertain to models not to actual phenomena. Of course the PI may hypothesize that his models will work, but that is not hypothesis-driven research.

There are a lot of things I can say about this statement, which really rankles. As a thought experiment, replace all occurrences of the word “model” in the comment above with “Western blot”. Does the comment still hold?

At this point it may be informative to get some definitions, keeping in mind that the _working_ definitions in science can have somewhat different connotations.

From Google:

Hypothesis: a supposition or proposed explanation made on the basis of limited evidence as a starting point for further investigation.

This definition says nothing about empirical observation- and I would argue that it would be fairly widely accepted in biological sciences research, though the underpinnings of the reviewer’s comment- empirically observed phenomena- probably are in the minds of many biologists.

So then, also from Google:

Empirical: based on, concerned with, or verifiable by observation or experience rather than theory or pure logic.

Here’s where the real meat of the discussion is. Empirical evidence is based on observation or experience as opposed to being based on theory or pure logic. It’s important to understand that the “models” being referred to in my grant are machine learning statistical models that have been derived from sequence data (that is, observation).

I would argue that including some theory or logic in a model that’s based on observation is exactly what science is about- this is what the basis of a hypothesis IS. All the hypotheses considered in my proposal were based on empirical observation, filtered through some form of logic/theory (if X is true then it’s reasonable to conclude Y), and would be tested by returning to empirical observations (either of protein sequences or experimentation at the actual lab bench).

I believe that the reviewer was confused by the use of statistics, which is a largely empirical endeavor (based on the observation of data- though filtered through theory), and computation, which they do not see as empirical. Back to my original thought experiment: there are a lot of assumptions, theory, and logic that go into the interpretation of a Western blot- or any other common lab experiment. However, this does not mean that we can’t use them to formulate further hypotheses.

This debate is really fundamental to my scientific identity. I am a biologist who uses computers (algorithms, visualization, statistics, machine learning and more) to do biology. If the reviewer is correct, then I’m pretty much out of a job I guess. Or I have to settle back on “data analyst” as a job title (which is certainly a good part of my job, but not the core of it).

So I’d appreciate feedback and discussion on this. I’m interested to hear what other people think about this point.

Proposal gambit – Betting the ranch

Last spring I posted about a proposal I’d put in where I’d published the key piece of preliminary data in F1000 Research, a journal that offers post-publication peer review.

The idea was that I could get my paper published (it’s available here) and accessible to reviewers prior to submission of my grant. It could then be peer-reviewed, and I could address the revisions after that. This strategy was driven by the lag time between proposal submission and review for NIH, which is about 4 months. It used to be possible to include papers that hadn’t been formally accepted by a journal as an appendix to NIH grants, but that hasn’t been allowed for some time now. So I figured this might be a pretty good way to get preliminary data out to the grant reviewers in published form with a quick turnaround- or at least that the submission-to-review lag could double as review time for the paper.

I was able to get my paper submitted to F1000 Research and obtained a DOI and URL that I could include as a citation in my grant. Details here.

The review for the grant was completed in early June of this year and the results were not what I had hoped- the grant wasn’t even scored, despite being totally awesome (of course, right?). But for this post I’ll focus on the parts that are pertinent to the “gambit”- the use of post-publication peer review as preliminary data.

The results here were mostly discouraging regarding the use of post-publication peer review in this way, which was disappointing. But let me briefly describe the timeline, which is important for understanding a large caveat about the results.

I received first-round reviews from two reviewers in a blindingly fast 10 and 16 days after initial submission. Both were encouraging but had some substantial (and substantially helpful) requests. You can read them here and here. It took me longer than it should have to address these completely, though I did some new analysis and added additional explanation on several important points. I then resubmitted around May 12th. However, due to some kind of issue the revised version wasn’t made available by F1000 Research until May 29th. Given that the NIH review panel met in the first week of June, it is likely that the grant reviewers didn’t see the revised (and much improved) version. The paper reviewers then returned final comments in early June (again, blindingly fast). You can read those here and here. The paper was accepted/approved/indexed in mid-June.

The grant had comments from three reviewers and each had something to say about the paper as preliminary data.

The first reviewer had the most negative comments.

It is not appropriate to point reviewers to a paper in order to save space in the proposal.

On its own this comment is pretty odd and makes me think that the reviewer was annoyed by the approach. So I can’t refer to a paper as preliminary data? On the face of it this is absolutely ridiculous. Science, and the accumulation of scientific knowledge, just doesn’t work in a way that allows you to include all your preliminary data completely (as well as your research approach and everything else) in the space of a 12-page grant. However, their further comments (which directly follow this one) shed some light on their thinking.

The PILGram approach should have been described in sufficient detail in the proposal to allow us to adequately assess it. The space currently used to lecture us on generative models could have been better used to actually provide details about the methods being developed.

So reading between the (somewhat grumpy) lines I think they mean to say that I should have done a better job of presenting some important details in the text itself. But my guess is that the first reviewer was not thrilled by the prospect of using a post-publication peer reviewed paper as preliminary data for the grant. Not thrilled.

  • Reviewer 1: Thumbs down.

The second reviewer’s comment:

The investigators revised the proposal according to prior reviews and included further details about the method in the form of a recently ‘published’ paper (the quotes are due to the fact that the paper was submitted to a journal that accepts and posts submissions even after peer review – F1000 Research). The public reviewers’ comments on the paper itself raise several concerns with the method proposed and whether it actually works sufficiently well.

This comment, unfortunately, is likely due to the timeline I presented above. I think they saw the first version of the paper, read the reviewers’ comments, and figured that there were holes in the whole approach. Even if my revisions had been available, it seems like there still would have been issues, unless the paper had already received final approval.

  • Reviewer 2: Thumbs down- although maybe not with the annoyed thrusting motions that the first reviewer was presumably making.

Finally, the third reviewer (contrary to scientific lore) was the most gentle.

A recent publication is suggested by the PI as a source of details, but there aren’t many in that manuscript either.

I’m a little puzzled about this since the paper is pretty comprehensive. But maybe this is an effect of reading the first version, not the final version. So I would call this neutral on the approach.

  • Reviewer 3: No decision.

Summary

The takeaway from this gambit is mixed.

I think if it had been executed better (by me) I could have gotten the final approval through by the time the grant reviewers were looking at it, and then a lot of the hesitation and negative feelings would have gone away. Of course, this would depend on having paper reviewers as quick as the ones I got- which certainly isn’t a sure thing.

I think that the views of biologists on preprints, post-publication review, and other ‘alternative’ publishing options are changing. Hopefully more biologists will start using these methods- because, frankly, in a lot of cases they make a lot more sense than the traditional closed-access, non-transparent peer review process.

However, the field can be slow to change. I will probably try this, or something like this, again. Honestly, what do I have to lose exactly? Overall, this was a positive experience and one where I believe I was able to make a contribution to science. I just hope my next grant is a better substrate for this kind of experiment.


Best Practices

[Comic: Best Practices]

This comic is inspired, not by real interactions I’ve had with developers (no developer has ever volunteered to get within 20 paces of my code), but rather by discussions online on the importance of ‘proper’ coding. Here’s a comic from xkcd which has a different point:

My reaction to this– as a bench biology-trained computational biologist who has never taken a computer programming class– is “who cares?” If it works, really, who cares?

Sure, there are very good reasons for standard programming practices and clean, efficient code- even in bioinformatics (or especially so). But those practices apply almost exclusively to approaches you already have quite a bit of experience with: you’ve worked out the bugs, figured out how the code behaves with the underlying data, and made sure it’s actually useful in terms of the biology. At least 75% of my job is trying, and discarding, many approaches for any particular problem I’m working on. It’s important to have a record of these attempts, but that code doesn’t have to be clean or efficient. There are exceptions- when code takes a loooong time to run even once, you probably want to make it as efficient as you can- but for the vast majority of the things I do, even with large amounts of data, I can determine whether they’re working in a reasonable amount of time using inefficient code (anything written in R, for example).

The other part, where good coding is important, is when you want the code to be usable by other people. This is an incredibly important part of computational biology and I’m not trying to downplay its importance here. This is when you’re relatively certain that the code will be looked at and/or used by other people in your own group and when you publish or release the code to a wider audience.

For further reading on this subject, here’s a post from Byte Size Biology that covers some great ideas for writing *research* code. And here is some dissenting opinion from Living in an Ivory Basement touting the importance of good programming practices (note: I don’t disagree, but I do believe that at least 75% of the coding I do shouldn’t have to clear such a high bar- it isn’t necessary, and I’d never get anything done). Finally, here are some of my thoughts on how coding really follows the scientific method.

Another word about balance

[4/17/2015 updated: A reader pointed out that my formulae for specificity and accuracy contained errors. It turns out that both measures were being calculated correctly, just a typing error on the blog. I’ve corrected them below.] 

TL;DR summary

Evaluating a binary classifier on an artificially balanced set of positive and negative examples (which is commonly done in this field) can cause underestimation of the method’s accuracy but vast overestimation of its positive predictive value (PPV). Since PPV is likely the only metric that really matters to one particularly important kind of end user- the biologist who wants to take a couple of your predicted novel positives into the lab- this is a potentially very big problem with how performance is reported.

The long version

Previously I wrote a post about the importance of having a naturally balanced set of positive and negative examples when evaluating the performance of a binary classifier produced by machine learning methods. I’ve continued to think about this problem and realized that I didn’t have a very good handle on what kinds of effects artificially balanced sets would have on performance estimates. Though the metrics I’m using are very simple, I felt it would be worthwhile to demonstrate the effects, so I ran a simple simulation.

  1. I produced random prediction sets with a set proportion of positives predicted correctly (85%) and a set proportion of negatives predicted correctly (95%).
  2. The ‘naturally’ occurring ratio of positive to negative examples could be varied but for the figures below I used 1:100.
  3. I varied the ratio of positive to negative examples used to estimate performance and
  4. Calculated several commonly used measures of performance:
    1. Accuracy ((TP+TN)/(TP+FP+TN+FN); that is, the percentage of all predictions, positive or negative, that are correct)
    2. Specificity (TN/(TN+FP); that is, the percentage of negative examples that are correctly predicted as negative)
    3. AUC (area under the receiver operating characteristic curve; a summary metric that is commonly used in classification to evaluate performance)
    4. Positive predictive value (TP/(TP+FP); that is, out of all positive predictions what percentage are correct)
    5. False discovery rate (FDR; 1-PPV; percentage of positive predictions that are wrong)
  5. Repeated these calculations with 20 different random prediction sets
  6. Plotted the results as box plots, which summarize the median (dark line in the middle), the interquartile range (the box), and 1.5 times the interquartile range from the box (the whiskers)- dots above or below are outside this range. (A minimal sketch of this simulation follows the list.)
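
For concreteness, here’s a minimal sketch of this simulation in Python. This is an illustrative re-creation, not my original code: the 85%/95% rates are the same made-up values as above, and AUC is omitted since it needs prediction scores rather than hard calls.

```python
import random

def simulate(n_pos, n_neg, sens=0.85, spec=0.95):
    """Simulate hard calls from a classifier that gets `sens` of positives
    and `spec` of negatives right on average; return TP, FP, TN, FN."""
    tp = sum(random.random() < sens for _ in range(n_pos))
    tn = sum(random.random() < spec for _ in range(n_neg))
    return tp, n_neg - tn, tn, n_pos - tp

def metrics(tp, fp, tn, fn):
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "FDR": fp / (tp + fp),
    }

random.seed(42)
n_pos = 1000
# 1:1 and 1:10 artificial splits versus the 'natural' 1:100 ratio
for neg_per_pos in (1, 10, 100):
    runs = [metrics(*simulate(n_pos, n_pos * neg_per_pos)) for _ in range(20)]
    means = {k: round(sum(r[k] for r in runs) / len(runs), 3) for k in runs[0]}
    print(f"1:{neg_per_pos}", means)
```

Running this shows PPV dropping from ~0.94 at 1:1 to ~0.15 at 1:100, while specificity stays put and accuracy shifts only modestly- the same qualitative pattern as the plots below.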

The results are not surprising but do demonstrate the pitfalls of using artificially balanced data sets. Keep in mind that there are many publications that limit their training and evaluation datasets to a 1:1 ratio of positive to negative examples.

Accuracy

Accuracy estimates are actually worse than they should be for the artificial splits because fewer of the negative results are being considered.

Specificity

Specificity stays largely the same and is a good estimate because it isn’t affected by the ratio of negative to positive examples. Sensitivity (the same measure but for positive examples) also doesn’t change, for the same reason.

AUC

Happily the AUC doesn’t actually change that much- mostly it’s just much more variable with smaller ratios of negatives to positives. So an AUC from a 1:1 split should be considered to be in the right ballpark, but maybe off from the real value by a bit.

Positive predictive value (PPV)

Aaaand there’s where things go to hell.

False discovery rate (FDR)

Same thing here. The FDR is extremely high (>90%) in the real dataset, but the artificially balanced sets vastly underestimate it.


Why is this a problem?

The last two plots, PPV and FDR, are where the real trouble is. The problem is that the artificial splits vastly overestimate PPV and underestimate FDR (note that the Y axis scale on these plots runs from 0 to close to 1). Why is this important? Because, in general, PPV is what an end user is likely to care about. I’m thinking of the end user who wants to use your great new method for predicting that proteins are members of some very important functional class. They will apply your method to their own examples (say, their newly sequenced bacterium) and rank the positive predictions. They couldn’t care less about the negative predictions because that’s not what they’re interested in. So they take the top few predictions to the lab (they can’t afford to do hundreds- only the best few, say 5, predictions) and experimentally validate them.

If your method’s PPV is actually 95%, it’s fairly likely that all 5 of their predictions will pan out (it’s NEVER really as likely as that due to all kinds of factors, but for the sake of argument), making them very happy and allowing the poor grad student whose project it is to actually graduate.

However, the actual PPV from the example above is about 5%. This means that the poor grad student who slaves for weeks over experiments to validate at least ONE of your stinking predictions will probably end up empty-handed for their efforts and will have to spend another 3 years struggling to get their project to the point of graduation.

Given a large enough ratio in the real dataset (e.g. protein-protein interactions, where the number of positive examples in human is somewhere around 50-100k but the number of negatives is somewhere around 4.5×10^8, a ratio of ~1:10000) the real PPV can fall to essentially 0 while the artificially estimated PPV stays very high.
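
To make that concrete, PPV can be computed in closed form from sensitivity, specificity, and the class ratio. A small sketch, reusing the same illustrative 85%/95% rates from the simulation above (assumed values for illustration, not measurements of any real method):

```python
def ppv(sens, spec, pos_to_neg):
    # Prevalence is the fraction of all examples that are positive;
    # PPV = TP / (TP + FP), rewritten in terms of prevalence.
    prev = pos_to_neg / (pos_to_neg + 1.0)
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

print(ppv(0.85, 0.95, 1.0))        # 1:1 artificial split   -> ~0.94
print(ppv(0.85, 0.95, 1 / 100))    # 1:100 'natural' ratio  -> ~0.15
print(ppv(0.85, 0.95, 1 / 10000))  # ~1:10000, PPI-scale    -> ~0.002
```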

So, don’t be that bioinformatician who publishes the paper with performance results based on a vastly artificial balance of positive versus negative examples that ruins some poor graduate student’s life down the road.


Big Data Showdown

One of the toughest parts of collaborative science is communication across disciplines. I’ve had many (generally initial) conversations with bench biologists, clinicians, and sometimes others that go approximately like:

“So, tell me what you can do with my data.”

“OK- tell me what questions you’re asking.”

“Um… that kinda depends on what you can do with it.”

“Well, that kinda depends on what you’re interested in…”

And this continues.

But the great part- the part about it that I really love- is that given two interested parties you’ll sometimes work to a point of mutual understanding, figuring out the borders and potential of each other’s skills and knowledge. And you generally work out a way of communicating that suits both sides and (mostly) works to get the job done. This is really when you start to hit the point of synergistic collaboration- and also, sadly, usually about the time you run out of funding to do the research.
[Comic: Big Data Showdown]

Asked and answered: Computational Biology Contribution?

So someone asked me this question today: “as a computational biologist, how can you be useful to the world?”. OK, so they didn’t ask me per se- they got to my blog by typing the question into a search engine, and I saw it on my WordPress stats page (see the bottom of this post). Which made me think: “I don’t know what page they were directed to, but I know I haven’t addressed that specific question before on my blog”. So here’s a quick answer, especially relevant since I’ve been talking with CS people about this at the ACM-BCB meeting the last few days.

As a computational biologist how can you be useful to the world?

  1. Choose your questions carefully. Make sure that the algorithm you’re developing, the software that you’re designing, the fundamental hypothesis that you’re researching is actually one that people (see collaborators, below) are interested in and see the value in. Identify the gaps in the biology that you can address. Don’t build new software for the sake of building new software- generally people (see collaborators) don’t care about a different way to do the same thing, even if it’s moderately better than the old way.
  2. Collaborate with biologists, clinicians, public health experts, etc. Go to the people who have the problems. What they can offer you is focus on important problems that will improve the impact of your research (you want NIH funding? You HAVE to have impact and probably collaborators). What you can give them is a solution to a problem that they are actually facing. Approach the relationship with care, though, since this is where the language barrier between fields can be very difficult (a forthcoming post from me on this). Make sure that you interact with these collaborators throughout the process- that way you don’t go off and do something completely different from what they had in their heads.
  3. In research be rigorous. The last thing that anyone in any discipline needs is a study that has not considered validation, generalizability, statistical significance, or having a gold-standard or reasonable facsimile thereof to compare to. Consider collaborating with a statistician to at least run your ideas by- they can be very helpful, or a senior computational biologist mentor.
  4. In software development be thoughtful. Consider robustness of your code- have you tested it extensively? How will average users (see collaborators, above) be able to get their data into it? How will average users be able to interpret the results of your methods? Put effort into working with those collaborators to define the user interface and user experience. They don’t (to a point) care about execution times as long as it finishes in a reasonable amount of time (have your software estimate time to completion and display it) and it gives good results. They do care if they can’t use it (or rather they completely don’t care and will stop working with you on the spot).
  5. Sometimes people don’t know what they need until they see it. This is a tip for at least 10th level computational biologists (to make a D&D analogy). This was a tenet of Steve Jobs of Apple and I believe it to be true. Sometimes, someone with passion and skill has to break new ground and do something that no one is asking them to do but that they will LOVE and won’t know how they lived without it. IT IS HIGHLY LIKELY THAT THIS IS NOT YOU. This is a pretty sure route to madness, wearing a tin hat, and spouting “you fools! you’ll never understand my GENIUS”- keep that in mind.
  6. For a computational biologist with some experience: make sure that you pass it along. Attend conferences where there are likely to be younger faculty/staff members, students, and post-docs. Comment on their posters and engage. When possible, suggest or make connections with collaborators (see above) for them. Question them closely on the points above- just asking the questions may be an effective way of conveying their importance. Organize sessions at these conferences. In your own institution be an accessible and engaged mentor. This has the most potential to increase your impact on the world. It’s true.

Next week: “pathogens found in confectionary” (OK- probably not going to get to that one, but interesting anyway)

People be searchin’

Human Protein Tweetbots

I came up with an interesting idea today based on someone’s joke at a meeting. I’m paraphrasing here, but the joke was “let’s just get all the proteins Facebook accounts and let their graph algorithms sort everything out”. Which isn’t as nutty as it sounds- at least using some of FB’s algorithms, if they’re available, to figure out interesting biology from protein networks. But it got me thinking about social media and computational biology.

The Cellular Social Network Can be a tough place

Scientists use Twitter for a lot of different purposes. One of these is to keep abreast of the scientific literature. This is generally done by following other scientists in disciplines that are relevant to your work, journals and preprint archives that post their newest papers as they’re published, and other aggregators like professional societies and special interest groups.

Many biologists have broad interests, but even journals for your sub-sub-sub field publish papers that you might not be that interested in. Many biologists also have specific genes, proteins, complexes, or pathways that are of interest to them.

My thought was simple. Spawn a bunch of Tweetbots (each with their own Twitter account) that would be tied to a specific gene/protein, complex, or pathway. These Tweetbots would search PubMed (and possibly other sources) and post links to ‘relevant’ new publications – probably simply containing the name of the protein or an alias. I think that you could probably set some kind of popularity bar for actually having a Tweetbot (e.g. BRCA1 would certainly have one, but a protein like SLC10A4 might not).
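
As a rough sketch, the update cycle for one of these Tweetbots might look something like this in Python. The PubMed query uses NCBI’s real E-utilities esearch endpoint; post_tweet is a hypothetical placeholder, since actually posting would require Twitter API credentials for each account:

```python
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def new_pubmed_ids(term, days=7):
    """Return PubMed IDs for records mentioning `term` from the last `days` days."""
    params = {
        "db": "pubmed",
        "term": term,
        "reldate": days,      # restrict to the last `days` days
        "datetype": "edat",   # by Entrez record date
        "retmode": "json",
    }
    resp = requests.get(EUTILS, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

def post_tweet(text):
    # Hypothetical stub: wiring up per-account Twitter credentials goes here.
    print(text)

# One Tweetbot account per gene/protein of interest.
for protein in ("BRCA1", "TP53"):
    for pmid in new_pubmed_ids(protein):
        post_tweet(f"New {protein} paper: https://pubmed.ncbi.nlm.nih.gov/{pmid}/")
```

A real version would also need alias handling (protein names are notoriously ambiguous) and some memory of which PMIDs each bot has already tweeted.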

Sure there are other ways you can do this- for example you can set up automatic notifications on PubMed that email you new publications with keywords- and there might already be specific apps that try to do something like this- but they’re not Twitter. One potential roadblock would be the process of opening so many Twitter accounts- which I’m thinking you can’t do automatically (but don’t know that for sure). To make it useful you’d probably have to start out with at least 1000 of them, maybe more, but wouldn’t need to do all proteins (!) or even all ~30K human proteins.

I’m interested in getting feedback about this idea. I’m not likely to implement it myself (though could probably)- but would other biologists see this as useful? Interesting? Could you see any other applications or twists to make it better?


The false dichotomy of multiple hypothesis testing

[Disclaimer: I’m not a statistician, but I do play one at work from time to time. If I’ve gotten something wrong here please point it out to me. This is an evolving thought process for me that’s part of the larger picture of what the scientific method does and doesn’t mean- not the definitive truth about multiple hypothesis testing.]

There’s a division in research between hypothesis-driven and discovery-driven endeavors. In hypothesis-driven research you start out with a model of what’s going on (this can be explicitly stated or just the amalgamation of what’s known about the system you’re studying) and then design an experiment to test that hypothesis (see my discussions on the scientific method here and here). In discovery-driven research you start out with more general questions (that can easily be stated as hypotheses, but often aren’t) and generate larger amounts of data, then search the data for relationships using statistical methods (or other discovery-based methods).

The problem with analysis of large amounts of data is that when you apply a statistical test to a dataset you are actually testing many, many hypotheses at once. This means that your level of surprise at finding something you call significant (arbitrarily but traditionally a p-value of less than 0.05) may be inflated by the fact that you looked so many times, increasing the odds that you’ll observe SOMETHING by random chance alone (see the excellent xkcd cartoon below, which I’ll refer to as an example). So you need to apply some kind of multiple hypothesis correction to your statistical results to reduce the chances of fooling yourself into thinking you’ve got something real when actually you’ve just got something random. In the xkcd example below, a multiple hypothesis correction using Bonferroni’s method (one of the simplest and most conservative corrections) would move the threshold for significance to 0.05/20 = 0.0025, since 20 different tests were performed.

Here’s where the problem of a false dichotomy occurs. Many researchers who analyze large amounts of data believe that utilizing a hypothesis-based approach mitigates the effect of multiple hypothesis testing on their results. That is, they believe they can focus their investigation of the data on a subset constrained by a model/hypothesis and thus reduce the effect that multiple hypothesis testing has on their analysis. Instead of looking at 10,000 proteins in a study they now look at only the 25 proteins thought to be present in a particular pathway of interest (where the pathway represents the model based on existing knowledge). This is like saying, “we believe that jelly beans in the blue-green color range cause acne” and then drawing your significance threshold at 0.05/4 = 0.0125, since ~4 of the jelly beans tested are in the blue-green color range (not sure if ‘lilac’ counts or not- that would make 5). All well and good EXCEPT for the fact that the actual chance of detecting something by random chance HASN’T changed. In large-scale data analysis (transcriptome analysis, e.g.) you’ve still MEASURED everything else. You’ve just chosen to limit your investigation to a smaller subset so that you can ‘go easy’ on your multiple hypothesis correction.
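
A toy simulation makes the point; here’s a sketch in Python under the assumption that nothing is truly significant (every test is null), so every ‘hit’ is a false positive:

```python
import random

random.seed(0)
n_tests = 10_000   # e.g. proteins actually measured
subset = 25        # the pathway you chose to 'focus' on

# Under the null hypothesis, p-values are uniform on [0, 1].
pvals = [random.random() for _ in range(n_tests)]

# Naive 0.05 threshold: ~5% of null tests look 'significant'.
print(sum(p < 0.05 for p in pvals))                    # ~500 false positives

# Bonferroni over everything measured: false positives almost never slip through.
print(sum(p < 0.05 / n_tests for p in pvals))          # usually 0

# Correcting only for the 25-protein subset looks fine within the subset...
print(sum(p < 0.05 / subset for p in pvals[:subset]))  # usually 0

# ...but the other 9,975 measured-but-ignored tests still hold hundreds
# of would-be 'hits' produced by chance alone.
print(sum(p < 0.05 for p in pvals[subset:]))           # ~499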

The counter-argument that might be made is that by doing this you’re testing a specific hypothesis, one that you believe to be true and that may be supported by existing data. This is a reasonable point in one sense: existing information supporting your result may lend it credence. But on the other hand it doesn’t change the fact that you could still be finding more things by chance than you realize, simply because you haven’t looked at the rest of your data. It turns out that this is true not just of big data analysis, but also of some kinds of traditional experiments aimed at testing individual (associative) hypotheses. The difference there is that it is technically infeasible to test a large number of the background cases (you’re generally limited to one or two negative controls). Also, a mechanistic hypothesis (as opposed to an associative one) is based on intervention, which tells you something different and so is not (as) subject to these considerations.

Imagine that you’ve dropped your car keys in the street and you don’t know what they look like (maybe you borrowed a friend’s car). You’re pretty sure you dropped them in front of the coffee shop on a block with 7 other shops on it- but you did walk the length of the block before you noticed the keys were gone. You walk directly back to look in front of the coffee shop and find a set of keys. Great, you’re done. You found your keys, right? But what if you had looked in front of the other stores and found other sets of keys there too? You didn’t look- but that doesn’t make it any less likely that you’re wrong about the keys you found (your existing knowledge/model/hypothesis, “I dropped them in front of the coffee shop”, could easily be wrong).

[xkcd comic: Significant]

Spaghetti plots? Sashimi? Food-themed Plots for Science!

For whatever reason bioinformaticians and other plot makers like to name (or re-name) plotting methods with food themes. Just saw this paper for “Sashimi plots” to represent alternative isoform expression from RNA-seq data.

Sashimi plots: Quantitative visualization of alternative isoform expression from RNA-seq data

That prompted me to post this from my Tumblr http://scieastereggs.tumblr.com (growing collection of funny bits in scientific publications):

————————————————————————————————

Spaghetti plots? Lasagne? OK then I can do rigatoni plots

This possibly somewhat satirical paper makes the case for “lasagna plots”, following on the spaghetti plots that are popular in some fields for representing longitudinal data. Lasagna plots are presented as an alternative for large datasets, though the authors state: “To remain consistent with the Italian cuisine-themed spaghetti plot, we refer to heatmaps as ‘lasagna plots.’” The remainder of the paper is a pretty straight-on discussion and demonstration of why and when these plots are better than spaghetti plots.

Lasagna plots: A saucy alternative to spaghetti plots

Bruce J. Swihart, Brian Caffo, Bryan D. James, Matthew Strand, Brian S. Schwartz, Naresh M. Punjabi

Interestingly, a recent paper reimagines heatmaps as “quilt” plots (though less satirically so). This opens whole new doors in the thematic renaming of methods for plotting data.

(h/t @leonidkruglyak)

But, in keeping with the Italian cuisine-themed spaghetti and lasagne plots: Now introducing Rigatoni plots!

(no pasta was harmed in the making of this plot. Well, OK. It was harmed a little)

Need to show outliers? Tasty, tasty outliers? No problem! (thanks @Lewis_Lab)

(capers. They’re capers)
