Friday, July 22, 2016

Having my cake and eating it too?

Several years ago I blogged about some of the challenges around doing science in a field with emerging methodological standards.  Today, a person going by the handle "Student" posted a set of pointed questions on that post, which I am choosing to respond to here as a new post rather than burying my reply in the comments.  Here are the comments:

Dr. Poldrack has been at the forefront of advocating for increased rigor and reproducibility in neuroimaging and cognitive neuroscience. This paper provides many useful pieces of advice concerning the reporting of fMRI studies, and my comments are related to this paper and to other papers published by Dr. Poldrack. One of the sections in this paper deals specifically with the reporting of methods and associated parameters related to the control of type I error across multiple tests. In this section, Dr. Poldrack and colleagues write that "When cluster-based inference is used, this should be clearly noted and both the threshold used to create the clusters and the threshold for cluster size should be reported". I strongly agree with this sentiment, but find it frustrating that in later papers, Dr. Poldrack seemingly disregards his own advice with regard to the reporting of extent thresholds, opting to report only that data were cluster-corrected at P<0.05 (e.g. http://cercor.oxfordjournals.org/content/20/3/524.long, http://cercor.oxfordjournals.org/cgi/content/abstract/18/8/1923, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2876211/). In another paper (http://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19915091/), the methods report that "Z (Gaussianised T ) statistic images were thresholded using cluster-corrected statistics with a height threshold of Z > 2.3 (unless otherwise noted) and a cluster probability threshold of P < 0.05, whole- brain corrected using the theory of Gaussian random fields", although every figure presented in the paper notes that the statistical maps shown were thresholded at Z>1.96, P<0.05, corrected. This last instance is particularly confusing, and borders on being misleading. While these are arguably minor omissions, I find it odd that I am thus far unable to find a paper where Dr. Poldrack actually follows his own advice here.  
In another opinion paper regarding fMRI analyses and reporting (http://www.ncbi.nlm.nih.gov/pubmed/21856431), Dr. Poldrack states “Some simple methodological improvements could make a big difference. First, the field needs to agree that inference based on uncorrected statistical results is not acceptable (cf. Bennett et al., 2009). Many researchers have digested this important fact, but it is still common to see results presented at thresholds such as uncorrected p<.005. Because such uncorrected thresholds do not adapt to the data (e.g., the number of voxels tests or their spatial smoothness), they are certain to be invalid in almost every situation (potentially being either overly liberal or overly conservative).” This is a good point, but given the fact that Dr. Poldrack has published papers in high impact journals that rely heavily on inferences from data using uncorrected thresholds (e.g. http://www.ncbi.nlm.nih.gov/pubmed/16157284), and does not appear to have issued any statements to the journals regarding their validity, one wonders whether Dr. Poldrack wants to have his cake and eat it too, so to say. A similar point can be made regarding Dr. Poldrack’s attitude regarding the use of small volume correction. In this paper, he states “Second, I have become increasingly concerned about the use of “small volume corrections” to address the multiple testing problem. The use of a priori masks to constrain statistical testing is perfectly legitimate, but one often gets the feeling that the masks used for small volume correction were chosen after seeing the initial results (perhaps after a whole-brain corrected analysis was not significant). In such a case, any inferences based on these corrections are circular and the statistics are useless”. While this is also true, one wonders whether Dr. Poldrack only trusts his group to use this tool correctly, since it is frequently employed in his papers. 
In a third opinion paper (http://www.ncbi.nlm.nih.gov/pubmed/20571517), Dr. Poldrack discusses the problem of circularity in fMRI analyses. While this is also an important topic, Dr. Poldrack’s group has also published papers using circular analyses (e.g. http://www.jneurosci.org/content/27/14/3743.full.pdf, http://www.jneurosci.org/content/26/9/2424, http://www.ncbi.nlm.nih.gov/pubmed/17255512). 
I would like to note that the reason for this comment is not to malign Dr. Poldrack or his research, but rather to attempt to clarify Dr. Poldrack’s opinion of how others should view his previous research when it fails to meet the rigorous standards that he persistently endorses. I am very much in agreement with Dr. Poldrack that rigorous methodology and transparency are important foundations for building a strong science. As a graduate student, it is frustrating to see high-profile scientists such as Dr. Poldrack call for increased methodological rigor by new researchers (typically while, rightfully, labeling work that does not meet methodological standards as being unreliable) when they (1) have benefited (and arguably continue to benefit) from the relatively lower barriers to entry that come from having entered a research field before the emergence of a rigid methodological framework (i.e. in having Neuron/PNAS/Science papers on their CV that would not be published in a low-tier journal today due to their methodological problems) , and (2) not applying the same level of criticism or skepticism to their own previous work as they do to emerging work when it does not meet current standards of rigor or transparency. I would like to know what Dr. Poldrack’s opinions are on these issues. I greatly appreciate any time and/or effort spent reading and/or replying to this comment. 

I appreciate these comments; in fact, I have been struggling with exactly these same issues myself, and my realizations about the shortcomings of our past approaches to fMRI analysis have shaken me deeply.  Student is exactly right that I have been a coauthor on papers using methods or reporting standards that I now publicly claim to be inappropriate.  S/he is also right that my career has benefited substantially from papers published in high-profile journals using methods that I now claim to be inappropriate.  I'm not going to either defend or denounce the specific papers that the commentator mentions.  I agree that some of my past papers used methods or standards that we would now find problematic, but I am actually heartened by that: if we were still satisfied with the same methods we were using 15 years ago, that would suggest that our science had not progressed very far.  Some of those results have been replicated (at least conceptually), which is also heartening, but that's not really a defense.

I also appreciate Student's frustration with the fact that someone like myself can become prominent doing studies that are seemingly lacking according to today's standards, but then criticize the field for doing the same thing.  But at the same time I would ask: Is there a better alternative?  Would you rather that I defended those older techniques just because they were the basis for my career?  Should I lose my position in the field because I followed what we thought were best practices at the time but which turned out to be flawed?  Alternatively, should I spend my entire career re-analyzing my old datasets to make sure that my previous claims withstand every new methodological development?  My answer to these questions has been to try to use the best methods I can, and to be as open and transparent as possible.  Here I'd like to outline a few of the ways in which we have tried to do better.

First, I would note that if someone wishes to look back at the data from our previous studies and reanalyze them, almost all of those data are available openly through openfmri.org, and in fact some of them have been the basis for previous analyses of reproducibility.  My lab and I have also spent a good deal of time and effort advocating for and supporting data sharing by other labs, because we think that this is ultimately one of the best ways to address questions about reproducibility (as I discussed in the recent piece by Greg Miller in Science).

Second, we have done our best to weed out questionable research practices and p-hacking.  I have become increasingly convinced of the utility of pre-registration, and I am now committed to pre-registering every new study that our lab does (starting with our first registration, committed this week).  We are also moving towards the standard use of discovery and validation samples for all of our future studies, to ensure that any results we report are replicable.  This is challenging given the cost of fMRI studies, and it means that we will probably do less science, but that's part of the bargain.

Third, we have done our best to share everything.  For example, in the MyConnectome study we shared the entire raw dataset, and we also put an immense amount of work into sharing a reproducible analysis workflow.  Similarly, we now put all of our analysis code online upon publication, if not earlier.

None of this is a guarantee, and I'm almost certain that in 20 years, either a very gray (and probably much more crotchety) version of myself or someone else will come along and tell us why the analyses we were doing in 2016 were wrong in some way that seems completely obvious in hindsight.  That's not something I will get defensive about, because it means that we are progressing as a science.  But it also doesn't mean that we aren't justified in doing what we are doing now: trying to follow the best practices that we know.





Saturday, May 21, 2016

Scam journals will literally publish crap

In the last couple of years, researchers have started to experience an onslaught of invitations to attend scam conferences and submit papers to scam journals.  Many of these seem to emanate from the OMICS group of Henderson, NV and its various subsidiaries.  A couple of months ago I decided to start trolling these scammers, just to see if I could get a reaction.  After sending many of these trolling replies, I finally got a response yesterday, which speaks to the complete lack of quality of these journals.

This was the solicitation:
On May 20, 2016, at 12:55 AM, Abnormal and Behavioural Psychology <behaviouralpsychol@omicsinc.com> wrote: 
Dear Dr. Russell A. Poldrack,
Greetings from the Journal of Abnormal and Behavioural Psychology
Journal of Abnormal and Behavioural Psychology is successfully publishing quality articles with the support of eminent scientists like you.
We have chosen selective scientists who have contributed excellent work, Thus I kindly request you to contribute a (Research, Review, Mini Review, Short commentary) or any type of article.
The Journal is indexed in with EBSCO (A-Z), Google Scholar, SHERPA-Romeo, Open J-gate, Journal Seek, Electronic Journals Library, Academic Keys, Safety Lit and many more reputed indexing databases.
 
We publish your manuscript within seven days of Acceptance. For your Excellent Research work we are offering huge discount in the publishing fee (70%). So, we will charge you only 300 USD. This huge offer we are giving in this month only. 
...
With kind regards
Sincerely,
Joyce V. Andria

I had previously received exactly this same solicitation about a month ago, to which I had responded like this:
Dear Ms Andria, 
Thanks for your message.  I just spent three minutes reading and thinking about your email.  My rate for commercial consulting is $500/hour.  Can you please remit your payment of $25 to me at the address below?  I’m sure you can understand that the messages from your organization take valuable time away from scientists, and that you would agree that it’s only fair to remunerate us for this time.
I look forward to receiving your payment promptly.  If you do not remit within 30 days, I will be forced to send this invoice out for collection.
Sincerely,
Russ Poldrack
I got no response to that message.  So when I received the new message, I decided to step up my troll-fu:
Dear Ms. Andria,
Many thanks for your message soliciting a (Research, Review, Mini Review, Short commentary) or any type of article for your journal. I have a paper that I would like to submit but I am not sure what kind of article it qualifies as. The title is "Tracking the gut microbiome". The paper does not include any text; it is composed entirely of photos of my bowel movements taken every morning for one year. Please let me know if your journal has the capability to publish such a paper; I have found that many other journals are not interested.
Sincerely,
Russell Poldrack
Within 12 hours, I had a response:
From: Abnormal and Behavioural Psychology <behaviouralpsychol@omicsinc.com>
Subject: RE: Appreciated your Excellent Research work
Date: May 20, 2016 at 9:47:28 PM PDT
To: "'Russell Alan Poldrack'" <russpold@stanford.edu>
Dear Dr. Russell A. Poldrack,

Greetings from the Journal of Abnormal and Behavioural Psychology

Thank you for your reply.

I hereby inform you that your article entitled: “Tracking the gut microbiome” is an image type article.

We are happy to know that you want to publish your manuscript with us.

We are waiting for your  earliest submission.

We want to introduce your research work in this month to our Journal. We will be honored to be a part of your scientific journey.

Kindly submit your article on before 26th may, 2016.


Awaiting your response.,

With kind regards
Sincerely,
Anna Watson
Journal Coordinator
Journal of Advances in Automobile Engineering
There you have it: These journals will literally publish complete crap. I hope the rest of you will join me in trolling these parasites - post your trolls and any results in the comments.

Friday, May 20, 2016

Advice for learning to code from scratch

I met this week with a psychology student who was interested in learning to code but had absolutely no experience.  I personally think it’s a travesty that programming is not part of the basic psychology curriculum, because doing novel and interesting research in psychology increasingly requires the ability to collect and work with large datasets and build new analysis tools, which are almost impossible without solid coding skills.  

Because it’s been a while since I learned to code (back when programs were stored on cassette tapes), I decided to ask my friends on the interwebs for some suggestions.  I got some really great feedback, which I thought I would synthesize for others who might be in the same boat.  

Some of the big questions that one should probably answer before getting started are:

  1. Why do you want to learn to code?  For most people who land in my office, it’s because they want to be able to analyze and wrangle data, run simulations, implement computational models, or create experiments to collect data.  
  2. How do you learn best?  I can’t stand watching videos, but some people swear by them.  Some people like to just jump in and start doing, whereas others like to learn the concepts and theories first.  Different strokes...
  3. What language should you start with?  This is the stuff of religious wars.  What’s important to realize, though, is that learning to program is not the same as learning to use a specific language.  Programming is about how to think algorithmically to solve problems; the specific language is just an expression of that thinking.  That said, languages differ in lots of ways, and some are more useful than others for particular purposes.  My feeling is that one should start by learning a first-class language, because it will be easier to learn good practices that are more general.  Your choice of a general purpose language should probably be driven by the field you are in; neuroscientists are increasingly turning to Python, whereas in genomics it seems that Java is very popular.  I personally think that Python offers a nice mix of power and usability, and it’s the language that I encourage everyone to start with.  However, if all you care about doing is performing statistical analyses, then learning R might be your first choice, whereas if you just want to build experiments for mTurk, then JavaScript might be the answer.  There may be some problem for which MATLAB is the right answer, but I’m no longer sure what it is. A caveat to all of this is that if you have friends or colleagues who are programming, then you should strongly consider using whatever language they are using, because they will be your best source of help.
  4. What problem do you want to solve?  Some people can learn for the sake of learning, but I find that I need a problem in order to keep me motivated.  I would recommend thinking of a relevant problem that you want to solve and then targeting your learning towards that problem.  One good general strategy is to find a paper in your area of research interest and try to implement its analysis. Another (suggested by Christina van Heer) is to take some data output from an experiment (e.g. in an Excel file), read it in, and compute some basic statistics, as in the sketch below.  If you don't have your own data, another alternative is to take a large open dataset (such as health data from NHANES or an openfmri dataset from openfmri.org) and try to wrangle the data into a format that lets you ask an interesting question.
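To make that last suggestion concrete, here is a minimal sketch in Python using pandas (the file name and column names are made up for the example, so don't expect it to run against your own data without changes):

    import pandas as pd

    # Read trial-level data exported from an experiment
    # (hypothetical file and column names; pd.read_excel() works similarly for .xlsx files).
    df = pd.read_csv("experiment_output.csv")

    # Basic descriptive statistics for all numeric columns.
    print(df.describe())

    # Mean reaction time and accuracy for each condition.
    summary = df.groupby("condition")[["reaction_time", "correct"]].mean()
    print(summary)

Even a tiny exercise like this forces you to deal with the realities of reading files, handling missing values, and summarizing data, which is where most of the real learning happens.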
OK then, so where do you look for help in getting started?

The overwhelming favorite in my social media poll was Codecademy.  It offers interactive exercises in lots of different languages, including Python.  Another Pythonic suggestion was http://learnpythonthehardway.org/book/, which looks quite good.

For those of you who prefer video courses, there were also a number of votes for online offerings from Coursera and FutureLearn; if that's how you learn best, these would be a good option.  There were a number of other suggestions as well, including sites with various potentially useful tips.

Finally, it’s also worth keeping an eye out for local Software Carpentry workshops.

If you have additional suggestions, please leave them in the comments!

Monday, April 18, 2016

How folksy is psychology? The linguistic history of cognitive ontologies

I just returned from a fabulous meeting on Rethinking the Taxonomy of Psychology, hosted by Mike Anderson, Tim Bayne, and Jackie Sullivan.  I think that in another life I must have been a philosopher, because I always have so much fun hanging out with them, and this time was no different.  In particular, the discussions at this meeting moved from simply talking about whether there is a problem with our ontology (which is old hat at this point) to specifically how we can think about using neuroscience to revise the ontology.  I was particularly excited to see all of the interest from a group of young philosophers whose work spans philosophy and cognitive neuroscience, and who I am counting on to keep the field moving forward!

I have long made the point that the conceptual structure of current psychology is not radically different from that of William James in the 19th century.  This seems plausible on its face if you look at some of the section headings from his 1890 Principles of Psychology, such as “To How Many Things Can We Attend At Once?”:
  • “The Varieties Of Attention.”
  • “The Improvement Of Discrimination By Practice”
  • “The Perception Of Time.”
  • “Accuracy Of Our Estimate Of Short Durations”
  • “To What Cerebral Process Is The Sense Of Time Due?”
  • “Forgetting.”
  • “The Neural Process Which Underlies Imagination”
  • “Is Perception Unconscious Inference?”
  • “How The Blind Perceive Space.”
  • “Emotion Follows Upon The Bodily Expression In The Coarser Emotions At Least.”
  • “No Special Brain-Centres For Emotion”
  • “Action After Deliberation”
Beyond the sometimes flowery language, these are all topics that one could imagine appearing in research papers today.  For my talk, though, I wanted to see if there was more direct evidence that the psychological ontology has changed less since the pre-scientific era (and is thus more "folksy") than the ontologies of other sciences.   To address this, I did a set of analyses that looked at the linguistic history of terms in the contemporary psychological ontology (as defined in the Cognitive Atlas) as compared to terms from contemporary biology (as enshrined in the Gene Ontology).  I started (with a bit of help from Vanessa Sochat) by examining the proportion of terms from the Cognitive Atlas that were present in James' Principles (from the full text available here).  This showed that 22.9% of the terms in our current ontology were present in James's text (some examples: goal, deductive reasoning, effort, false memory, object perception, visual attention, task set, anxiety, mental imagery, unconscious perception, internal speech, primary memory, theory of mind, judgment).
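For anyone who wants to try this kind of term-overlap analysis themselves, here is a minimal sketch of the general idea in Python (the file names and the simple whole-phrase matching criterion are illustrative assumptions, not the exact procedure we used):

    import re

    def load_terms(path):
        # Load one ontology term per line, lowercased.
        with open(path) as f:
            return [line.strip().lower() for line in f if line.strip()]

    def term_overlap(terms, text):
        # Return the set of terms that appear as whole words/phrases in the text.
        text = re.sub(r"\s+", " ", text.lower())
        found = set()
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                found.add(term)
        return found

    if __name__ == "__main__":
        terms = load_terms("cognitive_atlas_terms.txt")   # hypothetical file name
        with open("james_principles_fulltext.txt") as f:  # hypothetical file name
            principles = f.read()
        present = term_overlap(terms, principles)
        print("%d of %d terms present (%.1f%%)"
              % (len(present), len(terms), 100.0 * len(present) / len(terms)))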

How does this compare to biology?  To ask this, I obtained two biology textbooks published around the same time as James' Principles (T. H. Huxley's Course of Elementary Instruction in Practical Biology from 1892, and T. J. Parker's Lessons in Elementary Biology from 1893), which are both available in full text from Google Books.  In each of these books I assessed the presence of each term from the Gene Ontology, separately for each of the GO subdomains (biological processes, molecular functions, and cellular components).  Here are the results:

GO subdomain (total terms)       Huxley        Parker        Overlap
biological processes (28,566)    0.09% (26)    0.1% (32)     20
molecular functions (10,057)     0             0             -
cellular components (3,903)      1.05% (41)    1.01% (40)    25

The percentages of overlap are much lower, perhaps not surprisingly since the number of GO terms is so much larger than the number of Cognitive Atlas terms.  But even the absolute numbers are substantially lower, and there is not one mention of any of the GO molecular functions (striking but completely unsurprising, since molecular biology would not be developed for many more decades).

These results were interesting, but it could be that they are specific to these particular books, so I generalized the analysis using the Google N-Gram corpus, which indexes the presence of individual words and phrases across more than 3 million books.  Using a python package that accesses the ngram viewer API, I estimated the presence of all of the Cognitive Atlas terms, as well as randomly selected subsets of each of the GO subdomains, in the English literature between 1800 and 2000; I'm planning to rerun the analysis on the full corpus using the downloaded version of the N-grams corpus, but the throttling required by this API prevented me from running the full sets of GO terms.  Here are the results for the Cognitive Atlas:

[Figure: proportion of Cognitive Atlas terms present in the Google N-gram corpus, 1800-2000]
It is difficult to imagine stronger evidence that the ontology of psychology is relying on pre-scientific concepts; around 80% of the one-word terms in the ontology were already in use in 1800! Compare this to the Gene Ontology terms (note that there were not enough single-word molecular function terms to get a reasonable estimate):

[Figures: proportion of Gene Ontology terms (biological processes and cellular components) present in the Google N-gram corpus, 1800-2000]

It's clear that while a few of the terms in these ontologies were in use prior to the development of the biosciences, the proportion is much smaller than what one sees for psychology. In my talk, I laid out two possibilities arising from this:

  1. Psychology has special access to its ontology that obviates the need for a rejection of folk concepts
  2. Psychology is due for a conceptual revolution that will leave behind at least some of our current concepts
My guess is that the truth lies somewhere in between.  The discussions that we had at the meeting in London provided some good ideas about how to conceptualize the kinds of changes that neuroscience might drive us to make to this ontology. Perhaps the biggest question to come out of the meeting was whether a data-driven approach can ever overcome the fact that the data were collected from experiments that are based on the current ontology. I am guessing that it can (given, for example, the close relations between brain activity during task and rest), but this remains one of the biggest questions to be answered.  Fortunately there seems to be lots of interest, and I'm looking forward to great progress on these questions in the next few years.

Friday, February 26, 2016

Reproducibility and quantitative training in psychology

We had a great Town Hall Meeting of our department earlier this week, focused on issues around reproducibility (which Mike Frank has already discussed on his blog).  A number of the questions raised by both faculty and graduate students centered around training, and this has gotten many of us thinking about how we should update our quantitative training to address these concerns.  Currently the graduate statistics course is fairly standard, covering basic topics in probability and statistics including basic probability theory, sampling distributions, null hypothesis testing, general(ized) linear models (regression, ANOVA), and mixed models, with exercises done primarily using R.  While many of these topics remain essential for psychologists and neuroscientists, it's equally clear that there are a number of other topics, highly relevant to issues of reproducibility, that we might want to cover:

  • the statistics of reproducibility (e.g., implications of power for predictive validity; Ioannidis, 2005; see the sketch after this list)
  • Bayesian estimation and inference
  • bias/variance tradeoffs and regularization
  • generalization and cross-validation
  • model-fitting and model comparison
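To make the first of these topics concrete, here is a minimal sketch of the positive predictive value calculation from Ioannidis (2005), showing how statistical power and the prior odds of a true effect jointly determine how likely a "significant" result is to be true (the example numbers at the bottom are purely illustrative assumptions):

    def ppv(power, prior_odds, alpha=0.05):
        # Positive predictive value of a significant finding
        # (Ioannidis, 2005, ignoring bias).
        # power: probability of detecting a true effect (1 - beta)
        # prior_odds: ratio of true to null relationships being tested (R)
        # alpha: type I error rate
        return (power * prior_odds) / (power * prior_odds + alpha)

    # A well-powered study of a plausible effect vs. an
    # underpowered study of a long-shot hypothesis.
    print(ppv(power=0.8, prior_odds=1.0))    # ~0.94
    print(ppv(power=0.2, prior_odds=0.1))    # ~0.29
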
There are also a number of topics that are clearly related to reproducibility but fall more squarely under the topic of "software hygiene":
  • data management
  • code validation and testing
  • version control
  • reproducible workflows (e.g., virtualization/containerization)
  • literate programming
I would love to hear your thoughts about what a 21st century graduate statistics course in psychology/neuroscience should cover; please leave comments below!

Wednesday, December 9, 2015

Reproducible analysis in the MyConnectome project

Today our paper describing the MyConnectome project was published in Nature Communications.  This paper is unlike any that I have ever worked on before (and probably ever will again), as it reflects analyses of data collected on myself over the course of 18 months from 2012-2014.  A lot has been said already about what the results might or might not mean.  What I want to discuss here is the journey that ultimately led me to develop a reproducible shared analysis platform for the study.

Data collection was completed in April 2014, shortly before I moved to the Bay Area, and much of that summer was spent analyzing the data.  As I got deeper into the analyses, it became clear that we needed a way to efficiently and automatically reproduce the entire set of analyses.  For example, there were a couple of times during the data analysis process when my colleagues at Wash U updated their preprocessing strategy, which meant that I had to rerun all of the statistical analyses that relied upon those preprocessed data. This ultimately led me to develop a python package (https://github.com/poldrack/myconnectome) that implements all of the statistical analyses (which use a mixture of python, R, and **cough** MATLAB) and provides a set of wrapper scripts to run them.  This package made it fairly easy for me to rerun the entire set of statistical analyses on my machine by executing a single script, and provided me with confidence that I could reproduce any of the results that went into the paper.  
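The wrapper-script pattern itself is simple; here is a minimal sketch of the general idea (the step names and paths are hypothetical and this is not the actual structure of the myconnectome package), in which a single driver script runs each analysis stage in order and stops if any stage fails:

    import subprocess
    import sys

    # Hypothetical ordered list of analysis steps; each is a command that can be
    # run from the shell (Python modules, R scripts, etc.).
    STEPS = [
        ["python", "-m", "analysis.timeseries"],
        ["Rscript", "analysis/metabolomics.R"],
        ["python", "-m", "analysis.make_figures"],
    ]

    def run_all(steps):
        for cmd in steps:
            print("running:", " ".join(cmd))
            result = subprocess.run(cmd)
            if result.returncode != 0:
                sys.exit("step failed: %s" % " ".join(cmd))

    if __name__ == "__main__":
        run_all(STEPS)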

The next question was: Can anyone else (including myself at some later date) reproduce the results?  I had performed the analyses on my Mac laptop using a fairly complex software stack involving many different R and python packages, using a fairly complex set of imaging, genomic, metabolomic, and behavioral data.  (The imaging and -omics data had been preprocessed on large clusters at the Texas Advanced Computing Center (TACC) and Washington University; I didn’t attempt to generalize this part of the workflow).  I started by trying to replicate the analyses on a Linux system; identifying all of the necessary dependencies was an exercise in patience, as the workflow would break at increasingly later points in the process.  Once I had the workflow running, the first analyses showed very different results between the platforms; after the panic subsided (fortunately this happened before the paper was submitted!), I tracked the problem down to the R forecast package on Linux versus Mac (code to replicate issue available here).  It turned out that the auto.arima() function (which is the workhorse of our time series analyses) returned substantially different results on Linux and Mac platforms if the Y variable was not scaled (due apparently to a bug on the Linux side), but very close results when the Y variable was scaled. Fortunately, the latest version of the forecast package (6.2) gives identical results across Linux and Mac regardless of scaling, but the experience showed just how fragile our results can be when we rely upon complex black-box analysis software, and how we shouldn't take cross-platform reproducibility for granted (see here for more on this issue in the context of MRI analysis).
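One lightweight safeguard against this kind of surprise (not something we had in place at the time) is a regression test that compares key numerical outputs against reference values stored on a trusted platform, within a tolerance, so that a platform or package change that shifts the results gets flagged immediately. A minimal sketch, with hypothetical file names and values:

    import json
    import numpy as np

    def check_against_reference(results, reference_path, rtol=1e-5):
        # Compare a dict of scalar results to stored reference values;
        # raise AssertionError if any value differs beyond the relative tolerance.
        with open(reference_path) as f:
            reference = json.load(f)
        for key, ref_value in reference.items():
            np.testing.assert_allclose(
                results[key], ref_value, rtol=rtol,
                err_msg="result %r differs from reference" % key)

    # Hypothetical usage: 'results' would come from the actual analysis code,
    # and reference_results.json would be generated once on a trusted platform.
    if __name__ == "__main__":
        results = {"arima_coef_drift": 0.0123, "n_significant_tests": 412}
        check_against_reference(results, "reference_results.json")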

Having generalized the analyses to a second platform, the next logical step was to generalize them to any machine.  After discussing the options with a number of people in the open science community, the two most popular candidates were provisioning of a virtual machine (VM) using Vagrant, or creating a Docker container.  I ultimately chose to go with the Vagrant solution, primarily because it was substantially easier; in principle you simply set up a Vagrantfile that describes all of the dependencies and type “vagrant up”.  Of course, this “easy” solution took many hours to actually implement successfully, because it required reconstructing all of the dependencies that I had taken for granted on the other systems, but once it was done we had a system that allows anyone to recreate the full set of statistical analyses exactly on their own machine; it is available at https://github.com/poldrack/myconnectome-vm.

A final step was to provide a straightforward way for people to view the complex set of results.  Our visualization guru, Vanessa Sochat, developed a flask application (https://github.com/vsoch/myconnectome-explore) that provides a front end to all of the HTML reports generated by the various analyses, as well as a results browser that allows one to browse the 38,363 statistical tests that were computed for the project.  This browser is available locally if one installs and runs the VM, and is also accessible publicly at http://results.myconnectome.org.
[Screenshot: dashboard for analyses]

[Screenshot: browser for time series analysis results]
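For readers who haven't built this kind of front end before, here is a minimal sketch of the general pattern using Flask and a hypothetical CSV of test results (this is not the actual myconnectome-explore code, and the column name is an assumption):

    from flask import Flask, request
    import pandas as pd

    app = Flask(__name__)

    # Hypothetical table of statistical test results, one row per test.
    results = pd.read_csv("all_statistical_tests.csv")

    @app.route("/")
    def browse():
        # Optional filtering, e.g. /?variable=rsfmri
        variable = request.args.get("variable")
        subset = results if variable is None else results[results["variable"] == variable]
        return subset.to_html(index=False)

    if __name__ == "__main__":
        app.run(debug=True)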

We have released code and data with papers in the past, but this is the first paper I have ever published that attempts to include a fully reproducible snapshot of the statistical analyses.  I learned a number of lessons in the process of doing this:
  1. The development of a reproducible workflow saved me from publishing a paper with demonstrably irreproducible results, due to the OS-specific software bug mentioned above.  This in itself makes the entire process worthwhile from my standpoint.
  2. Converting a standard workflow to a fully reproducible workflow is difficult. It took many hours of work beyond the standard analyses to develop a working VM with all of the analyses automatically run; that doesn’t even count the time that went into developing the browser. Had I started the work within a virtual machine from the beginning, it would have been much easier, but it still would have required extra work beyond that needed for the basic analyses.
  3. Ensuring the longevity of a working pipeline is even harder.  The week before the paper was set to be published, I tried a fresh install of the VM to make sure it was still working.  It wasn’t.  The problem was simple (miniconda had changed the name of its installation directory), but it highlighted a significant flaw in our strategy: we had not specified software versions in our VM provisioning.  I hope that we can add that in the future, but for now we have to keep our eyes out for the disruptive effects of software updates.
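One simple way to catch this kind of drift is to pin the expected package versions and check them at the start of every run. A sketch of the general idea in Python (the pinned package list here is hypothetical, and this is not what the myconnectome-vm provisioning actually does):

    from importlib.metadata import version  # Python 3.8+; pkg_resources on older versions

    # Hypothetical pinned versions; in practice these would be frozen when the
    # analyses are finalized (e.g., from `pip freeze`).
    PINNED = {
        "numpy": "1.26.4",
        "pandas": "2.2.2",
    }

    def check_versions(pinned):
        mismatches = {name: (want, version(name))
                      for name, want in pinned.items()
                      if version(name) != want}
        if mismatches:
            raise RuntimeError("version mismatches: %s" % mismatches)

    if __name__ == "__main__":
        check_versions(PINNED)
        print("all pinned package versions match")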
I look forward to your comments and suggestions about how to better implement reproducible workflows in the future, as this is one of the major interests of our Center for Reproducible Neuroscience.

Sunday, November 1, 2015

Are good science and great storytelling compatible?

Chris Chambers has a piece in the Guardian ("Are we finally getting serious about fixing science?") discussing a recent report about reproducibility from the UK Academy of Medical Sciences, based on a meeting held earlier this year in London. A main theme of the piece is that scientists need to focus more on doing good science and less on "storytelling":
Some time in 1999, as a 22 year-old fresh into an Australian PhD programme, I had my first academic paper rejected. “The results are only moderately interesting”, chided an anonymous reviewer. “The methods are solid but the findings are not very important”, said another. “We can only publish the most novel studies”, declared the editor as he frogmarched me and my boring paper to the door.
I immediately asked my supervisor where I’d gone wrong. Experiment conducted carefully? Tick. No major flaws? Tick. Filled a gap in the specialist literature? Tick. Surely it should be published even if the results were a bit dull? His answer taught me a lesson that is (sadly) important for all life scientists. “You have to build a narrative out of your results”, he said. “You’ve got to give them a story”. It was a bombshell. “But the results are the results!” I shouted over my coffee. “Shouldn’t we just let the data tell their own story?” A patient smile. “That’s just not how science works, Chris.”
He was right, of course, but perhaps it’s the way science should work. 

None of us in the reproducibility community would dispute that the overselling of results in service of high-profile publications is problematic, and I doubt that Chambers really believes that our papers should just be data dumps presented without context or explanation.  But by likening the creation of a compelling narrative about one's results to "selling cheap cars", this piece goes too far.  Great science is not just about generating reproducible results and "letting the data tell their own story"; it should also give us deeper insights into how the world works, and those insights are fundamentally built around and expressed through narratives, because humans are story-telling animals.    We have all had the experience of sitting through a research talk that involved lots of data and no story, and it's a painful experience; this speaks to the importance of solid narrative in our communication of scientific ideas.

Narrative becomes even more important when we think about conveying our science to the public. Non-scientists are not in a position to "let the data speak to them" because most of them don't speak the language of data; instead, they speak the language of human narrative. It is only by abstracting away from the data to come up with narratives such as "memory is not like a videotape recorder" or "self-control relies on the prefrontal cortex" that we can bring science to the public in a way that can actually have impact on behavior and policy.

I think it would be useful to stop conflating scientific storytelling with "embellishing and cherry-picking".   Great storytelling (be it spoken or written) is just as important to the scientific enterprise as great methods, and we shouldn't let our zeal for the latter eclipse the importance of the former.