Saturday, May 21, 2016

Scam journals will literally publish crap

In the last couple of years, researchers have started to experience an onslaught of invitations to attend scam conferences and submit papers to scam journals.  Many of these seem to emanate from the OMICS group of Henderson, NV and its various subsidiaries.  A couple of months ago I decided to start trolling these scammers, just to see if I could get a reaction.  After sending many such replies, I finally got a response yesterday that speaks to the complete lack of quality of these journals.

This was the solicitation:
On May 20, 2016, at 12:55 AM, Abnormal and Behavioural Psychology <behaviouralpsychol@omicsinc.com> wrote: 
Dear Dr. Russell A. Poldrack,
Greetings from the Journal of Abnormal and Behavioural Psychology
Journal of Abnormal and Behavioural Psychology is successfully publishing quality articles with the support of eminent scientists like you.
We have chosen selective scientists who have contributed excellent work, Thus I kindly request you to contribute a (Research, Review, Mini Review, Short commentary) or any type of article.
The Journal is indexed in with EBSCO (A-Z), Google Scholar, SHERPA-Romeo, Open J-gate, Journal Seek, Electronic Journals Library, Academic Keys, Safety Lit and many more reputed indexing databases.
 
We publish your manuscript within seven days of Acceptance. For your Excellent Research work we are offering huge discount in the publishing fee (70%). So, we will charge you only 300 USD. This huge offer we are giving in this month only. 
...
With kind regards
Sincerely,
Joyce V. Andria

I had previously received exactly this same solicitation about a month ago, to which I had responded like this:
Dear Ms Andria, 
Thanks for your message.  I just spent three minutes reading and thinking about your email.  My rate for commercial consulting is $500/hour.  Can you please remit your payment of $25 to me at the address below?  I’m sure you can understand that the messages from your organization take valuable time away from scientists, and that you would agree that it’s only fair to remunerate us for this time.
I look forward to receiving your payment promptly.  If you do not remit within 30 days I will be forced to send this invoice out for collection.
Sincerely,
Russ Poldrack
I got no response to that message.  So when I received the new message, I decided to step up my troll-fu:
Dear Ms. Andria,
Many thanks for your message soliciting a (Research, Review, Mini Review, Short commentary) or any type of article for your journal. I have a paper that I would like to submit but I am not sure what kind of article it qualifies as. The title is "Tracking the gut microbiome". The paper does not include any text; it is composed entirely of photos of my bowel movements taken every morning for one year. Please let me know if your journal has the capability to publish such a paper; I have found that many other journals are not interested.
Sincerely,
Russell Poldrack
Within 12 hours, I had a response:
From: Abnormal and Behavioural Psychology <behaviouralpsychol@omicsinc.com>
Subject: RE: Appreciated your Excellent Research work
Date: May 20, 2016 at 9:47:28 PM PDT
To: "'Russell Alan Poldrack'" <russpold@stanford.edu>
Dear Dr. Russell A. Poldrack,

Greetings from the Journal of Abnormal and Behavioural Psychology

Thank you for your reply.

I hereby inform you that your article entitled: “Tracking the gut microbiome” is an image type article.

We are happy to know that you want to publish your manuscript with us.

We are waiting for your  earliest submission.

We want to introduce your research work in this month to our Journal. We will be honored to be a part of your scientific journey.

Kindly submit your article on before 26th may, 2016.


Awaiting your response.,

With kind regards
Sincerely,
Anna Watson
Journal Coordinator
Journal of Advances in Automobile Engineering
There you have it: These journals will literally publish complete crap. I hope the rest of you will join me in trolling these parasites - post your trolls and any results in the comments.

Friday, May 20, 2016

Advice for learning to code from scratch

I met this week with a psychology student who was interested in learning to code but had absolutely no experience.  I personally think it’s a travesty that programming is not part of the basic psychology curriculum, because doing novel and interesting research in psychology increasingly requires the ability to collect and work with large datasets and to build new analysis tools, both of which are nearly impossible without solid coding skills.

Because it’s been a while since I learned to code (back when programs were stored on cassette tapes), I decided to ask my friends on the interwebs for some suggestions.  I got some really great feedback, which I thought I would synthesize for others who might be in the same boat.  

Some of the big questions that one should probably answer before getting started are:

  1. Why do you want to learn to code?  For most people who land in my office, it’s because they want to be able to analyze and wrangle data, run simulations, implement computational models, or create experiments to collect data.  
  2. How do you learn best?  I can’t stand watching videos, but some people swear by them.  Some people like to just jump in and start doing, whereas others like to learn the concepts and theories first.  Different strokes...
  3. What language should you start with?  This is the stuff of religious wars.  What’s important to realize, though, is that learning to program is not the same as learning to use a specific language.  Programming is about how to think algorithmically to solve problems; the specific language is just an expression of that thinking.  That said, languages differ in lots of ways, and some are more useful than others for particular purposes.  My feeling is that one should start by learning a first-class language, because it will be easier to learn good practices that are more general.  Your choice of a general-purpose language should probably be driven by the field you are in; neuroscientists are increasingly turning to Python, whereas in genomics it seems that Java is very popular.  I personally think that Python offers a nice mix of power and usability, and it’s the language that I encourage everyone to start with.  However, if all you care about is performing statistical analyses, then R might be your first choice, whereas if you just want to build experiments for mTurk, then JavaScript might be the answer.  There may be some problem for which MATLAB is the right answer, but I’m no longer sure what it is.  A caveat to all of this is that if you have friends or colleagues who are programming, then you should strongly consider using whatever language they are using, because they will be your best source of help.
  4. What problem do you want to solve?  Some people can learn for the sake of learning, but I find that I need a problem in order to keep me motivated.  I would recommend thinking of a relevant problem that you want to solve and then targeting your learning towards that problem.  One good general strategy is to find a paper in your area of research interest and try to implement its analysis.  Another (suggested by Christina van Heer) is to take some data output from an experiment (e.g. in an Excel file), read it in, and compute some basic statistics (see the sketch after this list).  If you don't have your own data, another alternative is to take a large open dataset (such as health data from NHANES or an openfmri dataset from openfmri.org) and try to wrangle the data into a format that lets you ask an interesting question.
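To make that last suggestion concrete, here is a minimal sketch in Python of the "read your data in and compute some basic statistics" exercise. The file name and column names are made-up placeholders rather than a real dataset.

```python
# Minimal sketch: read an exported data file and compute basic statistics.
# "reaction_times.csv" and its columns ("subject", "condition", "rt") are
# hypothetical placeholders - substitute your own experiment's output.
import pandas as pd

df = pd.read_csv("reaction_times.csv")

# overall descriptive statistics for the reaction time column
print(df["rt"].describe())

# mean, standard deviation, and count of RT broken down by condition
print(df.groupby("condition")["rt"].agg(["mean", "std", "count"]))
```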
OK then, so where do you look for help in getting started?

The overwhelming favorite in my social media poll was Codecademy.  It offers interactive exercises in lots of different languages, including Python.  Another Pythonic suggestion was http://learnpythonthehardway.org/book/, which looks quite good.

For those of you who prefer video courses, there were also a number of votes for online offerings, including those from Coursera and FutureLearn; if you learn well from videos, these would be a good option.

There were a number of other suggestions as well, including various sites with potentially useful tips for getting started.
Finally, it’s also worth keeping an eye out for local Software Carpentry workshops.

If you have additional suggestions, please leave them in the comments!

Monday, April 18, 2016

How folksy is psychology? The linguistic history of cognitive ontologies

I just returned from a fabulous meeting on Rethinking the Taxonomy of Psychology, hosted by Mike Anderson, Tim Bayne, and Jackie Sullivan.  I think that in another life I must have been a philosopher, because I always have so much fun hanging out with them, and this time was no different.  In particular, the discussions at this meeting moved from simply talking about whether there is a problem with our ontology (which is old hat at this point) to specifically how we can think about using neuroscience to revise the ontology.  I was particularly excited to see all of the interest from a group of young philosophers whose work spans philosophy and cognitive neuroscience, and whom I am counting on to keep the field moving forward!

I have long made the point that the conceptual structure of current psychology is not radically different from that of William James in the 19th century.  This seems plausible on its face if you look at some of the section headings from his 1890 Principles of Psychology:
  • “To How Many Things Can We Attend At Once?”
  • “The Varieties Of Attention.”
  • “The Improvement Of Discrimination By Practice”
  • “The Perception Of Time.”
  • “Accuracy Of Our Estimate Of Short Durations”
  • “To What Cerebral Process Is The Sense Of Time Due?”
  • “Forgetting.”
  • “The Neural Process Which Underlies Imagination”
  • “Is Perception Unconscious Inference?”
  • “How The Blind Perceive Space.”
  • “Emotion Follows Upon The Bodily Expression In The Coarser Emotions At Least.”
  • “No Special Brain-Centres For Emotion”
  • “Action After Deliberation”
Beyond the sometimes flowery language, these are all topics that one could imagine appearing in research papers today.  For my talk, though, I wanted to see whether there was more direct evidence that the psychological ontology has changed less since then (and is thus more "folksy") than the ontologies of other sciences.  To address this, I did a set of analyses that looked at the linguistic history of terms in the contemporary psychological ontology (as defined in the Cognitive Atlas) as compared to terms from contemporary biology (as enshrined in the Gene Ontology).  I started (with a bit of help from Vanessa Sochat) by examining the proportion of terms from the Cognitive Atlas that were present in James' Principles (from the full text available here).  This showed that 22.9% of the terms in our current ontology were present in James's text (some examples: goal, deductive reasoning, effort, false memory, object perception, visual attention, task set, anxiety, mental imagery, unconscious perception, internal speech, primary memory, theory of mind, judgment).
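For readers who want to try something similar, here is a minimal sketch of the kind of term matching involved; this is not the actual analysis code, and both file names are placeholders (a plain-text copy of James's Principles and a list of Cognitive Atlas terms, one per line).

```python
# Sketch of the term-coverage check: what proportion of ontology terms
# appear verbatim in a historical text?  File names are placeholders.
def term_coverage(terms_file, text_file):
    with open(text_file) as f:
        text = f.read().lower()
    with open(terms_file) as f:
        terms = [line.strip().lower() for line in f if line.strip()]
    present = [t for t in terms if t in text]
    return len(present) / len(terms), present

if __name__ == "__main__":
    proportion, hits = term_coverage("cognitive_atlas_terms.txt",
                                     "james_principles.txt")
    print(f"{proportion:.1%} of terms present; examples: {hits[:5]}")
```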

How does this compare to biology?  To ask this, I obtained two biology textbooks published around the same time as James' Principles (T. H. Huxley's Course of Elementary Instruction in Practical Biology from 1892, and T. J. Parker's Lessons in Elementary Biology from 1893), which are both available in full text from Google Books.  In each of these books I assessed the presence of each term from the Gene Ontology, separately for each of the GO subdomains (biological processes, molecular functions, and cellular components).  Here are the results:

GO subdomain (# of terms)       Huxley        Parker        Overlap
biological process (28,566)     0.09% (26)    0.1% (32)     20
molecular functions (10,057)    0             0             -
cellular components (3,903)     1.05% (41)    1.01% (40)    25

The percentages of overlap are much lower, perhaps not surprisingly since the number of GO terms is so much larger than the number of Cognitive Atlas terms.  But even the absolute numbers are substantially lower, and there is not one mention of any of the GO molecular functions (striking but completely unsurprising, since molecular biology would not be developed for many more decades).

These results were interesting, but it could be that they are specific to these particular books, so I generalized the analysis using the Google N-gram corpus, which indexes the presence of individual words and phrases across more than 3 million books.  Using a python package that accesses the ngram viewer API, I estimated the presence of all of the Cognitive Atlas terms, as well as randomly selected subsets of each of the GO subdomains, in the English literature between 1800 and 2000.  (I'm planning to rerun the analysis on the full corpus using the downloadable version of the N-grams corpus, but the API's throttling prevented me from querying the full sets of GO terms.)  The results for the Cognitive Atlas are shown below; for anyone who wants to try this themselves, the query boils down to something like the following sketch.
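Note that this sketch hits the unofficial JSON endpoint behind the Ngram Viewer rather than the specific wrapper package I used, so treat the URL, parameters, and response format as assumptions that may change (and expect to be throttled if you query many terms).

```python
# Rough sketch of an Ngram Viewer query; the endpoint and parameters are
# undocumented assumptions, not a supported API.
import requests

def ngram_timeseries(phrase, year_start=1800, year_end=2000):
    resp = requests.get(
        "https://books.google.com/ngrams/json",
        params={
            "content": phrase,
            "year_start": year_start,
            "year_end": year_end,
            "smoothing": 0,
        },
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    return data[0]["timeseries"] if data else []

# a term counts as already "in use" in 1800 if its frequency then is nonzero
series = ngram_timeseries("mental imagery")
print("in use in 1800:", bool(series) and series[0] > 0)
```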

It is difficult to imagine stronger evidence that the ontology of psychology relies on pre-scientific concepts; around 80% of the one-word terms in the ontology were already in use in 1800! Compare this to the Gene Ontology terms (note that there were not enough single-word molecular function terms to get a reasonable estimate):

It's clear that while a few of the terms in these ontologies were in use prior to the development of the biosciences, the proportion is much smaller than what one sees for psychology. In my talk, I laid out two possibilities arising from this:

  1. Psychology has special access to its ontology that obviates the need for a rejection of folk concepts
  2. Psychology is due for a conceptual revolution that will leave behind at least some of our current concepts
My guess is that the truth lies somewhere in between these.  The discussions that we had at the meeting in London provided some good ideas about how to conceptualize the kinds of changes that neuroscience might drive us to make to this ontology. Perhaps the biggest question to come out of the meeting was whether a data-driven approach can ever overcome the fact that the data were collected from experiments that are based on the current ontology. I am guessing that it can (given, e.g., the close relations between brain activity during task and rest), but this remains one of the biggest questions to be answered.  Fortunately there seems to be lots of interest, and I'm looking forward to great progress on these questions in the next few years.

Friday, February 26, 2016

Reproducibility and quantitative training in psychology

We had a great Town Hall Meeting of our department earlier this week focused on issues around reproducibility, which Mike Frank has already discussed on his blog.  A number of the questions raised by both faculty and graduate students centered on training, and this has gotten many of us thinking about how we should update our quantitative training to address these concerns.  Currently the graduate statistics course is fairly standard, covering basic probability theory, sampling distributions, null hypothesis testing, general(ized) linear models (regression, ANOVA), and mixed models, with exercises done primarily in R.  While many of these topics remain essential for psychologists and neuroscientists, it's equally clear that there are a number of other topics highly relevant to reproducibility that we might want to cover:

  • the statistics of reproducibility (e.g., implications of power for predictive validity; Ioannidis, 2005; see the sketch after this list)
  • Bayesian estimation and inference
  • bias/variance tradeoffs and regularization
  • generalization and cross-validation
  • model-fitting and model comparison
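As a concrete example of the first bullet, here is a minimal sketch of the power/predictive-validity relationship from Ioannidis (2005): the probability that a statistically significant finding is true (PPV), given the pre-study odds that a tested effect is real, the power, and the alpha level. Bias and multiple testing teams are ignored here, and the default pre-study odds value is purely an illustrative assumption.

```python
# Positive predictive value of a "significant" result (Ioannidis, 2005),
# ignoring bias: PPV = power * R / (power * R + alpha), where R is the
# pre-study odds that a tested relationship is true.
def ppv(power, alpha=0.05, prior_odds=0.25):
    return (power * prior_odds) / (power * prior_odds + alpha)

if __name__ == "__main__":
    for power in (0.2, 0.5, 0.8):
        print(f"power = {power:.1f} -> PPV = {ppv(power):.2f}")
```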
There are also a number of topics that are clearly related to reproducibility but fall more squarely under the heading of "software hygiene":
  • data management
  • code validation and testing (see the sketch after this list)
  • version control
  • reproducible workflows (e.g., virtualization/containerization)
  • literate programming
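To make the "code validation and testing" bullet concrete, here is a toy example of the kind of unit test I have in mind, runnable with pytest; the function and expected values are made up for illustration.

```python
# Toy example of code validation: a unit test for a small analysis helper.
import numpy as np

def standardize(x):
    """Return z-scores of a 1-D array (population SD)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def test_standardize_has_zero_mean_and_unit_sd():
    z = standardize([1, 2, 3, 4, 5])
    assert np.isclose(z.mean(), 0.0)
    assert np.isclose(z.std(), 1.0)
```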
I would love to hear your thoughts about what a 21st-century graduate statistics course in psychology/neuroscience should cover - please leave comments below!

Wednesday, December 9, 2015

Reproducible analysis in the MyConnectome project

Today our paper describing the MyConnectome project was published in Nature Communications.  This paper is unlike any that I have ever worked on before (and probably ever will again), as it reflects analyses of data collected on myself over the course of 18 months from 2012-2014.  A lot has been said already about what the results might or might not mean.  What I want to discuss here is the journey that ultimately led me to develop a reproducible shared analysis platform for the study.

Data collection was completed in April 2014, shortly before I moved to the Bay Area, and much of that summer was spent analyzing the data.  As I got deeper into the analyses, it became clear that we needed a way to efficiently and automatically reproduce the entire set of analyses.  For example, there were a couple of times during the data analysis process when my colleagues at Wash U updated their preprocessing strategy, which meant that I had to rerun all of the statistical analyses that relied upon those preprocessed data. This ultimately led me to develop a python package (https://github.com/poldrack/myconnectome) that implements all of the statistical analyses (which use a mixture of python, R, and **cough** MATLAB) and provides a set of wrapper scripts to run them.  This package made it fairly easy for me to rerun the entire set of statistical analyses on my machine by executing a single script, and provided me with confidence that I could reproduce any of the results that went into the paper.  
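To give a flavor of what such a wrapper looks like (this is a generic sketch, not the actual myconnectome code, and the script names are placeholders), the idea is simply to run every analysis step from a single entry point and fail loudly if any step breaks:

```python
# Generic sketch of a "run everything" wrapper; script names are placeholders.
import subprocess
import sys

ANALYSIS_STEPS = [
    ["python", "prepare_behavioral_data.py"],
    ["Rscript", "run_timeseries_analyses.R"],
    ["python", "make_html_reports.py"],
]

def run_all():
    for cmd in ANALYSIS_STEPS:
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop immediately if a step fails

if __name__ == "__main__":
    try:
        run_all()
    except subprocess.CalledProcessError as err:
        sys.exit(f"analysis step failed: {err}")
```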

The next question was: Can anyone else (including myself at some later date) reproduce the results?  I had performed the analyses on my Mac laptop using a fairly complex software stack involving many different R and python packages, using a fairly complex set of imaging, genomic, metabolomic, and behavioral data.  (The imaging and -omics data had been preprocessed on large clusters at the Texas Advanced Computing Center (TACC) and Washington University; I didn’t attempt to generalize this part of the workflow).  I started by trying to replicate the analyses on a Linux system; identifying all of the necessary dependencies was an exercise in patience, as the workflow would break at increasingly later points in the process.  Once I had the workflow running, the first analyses showed very different results between the platforms; after the panic subsided (fortunately this happened before the paper was submitted!), I tracked the problem down to the R forecast package on Linux versus Mac (code to replicate issue available here).  It turned out that the auto.arima() function (which is the workhorse of our time series analyses) returned substantially different results on Linux and Mac platforms if the Y variable was not scaled (due apparently to a bug on the Linux side), but very close results when the Y variable was scaled. Fortunately, the latest version of the forecast package (6.2) gives identical results across Linux and Mac regardless of scaling, but the experience showed just how fragile our results can be when we rely upon complex black-box analysis software, and how we shouldn't take cross-platform reproducibility for granted (see here for more on this issue in the context of MRI analysis).

Having generalized the analyses to a second platform, the next logical step was to generalize them to any machine.  After discussing the options with a number of people in the open science community, the two most popular candidates were provisioning a virtual machine (VM) using Vagrant or creating a Docker container.  I ultimately chose to go with the Vagrant solution, primarily because it was substantially easier; in principle you simply set up a Vagrantfile that describes all of the dependencies, and type “vagrant up”.  Of course, this “easy” solution took many hours to actually implement successfully because it required reconstruction of all of the dependencies that I had taken for granted on the other systems, but once it was done we had a system that allows anyone to recreate the full set of statistical analyses exactly on their own machine, which is available at https://github.com/poldrack/myconnectome-vm

A final step was to provide a straightforward way for people to view the complex set of results.  Our visualization guru, Vanessa Sochat, developed a flask application (https://github.com/vsoch/myconnectome-explore) that provides a front end to all of the HTML reports generated by the various analyses, as well as a results browser that allows one to browse the 38,363 statistical tests that were computed for the project.  This browser is available locally if one installs and runs the VM, and is also accessible publicly from http://results.myconnectome.org
Dashboard for analyses

Browser for timeseries analysis results
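For readers curious about the general shape of such a results browser, here is a toy Flask sketch of the idea; it is not the actual myconnectome-explore code, and the results file and column names are hypothetical placeholders.

```python
# Toy sketch of a results browser (not the actual myconnectome-explore app).
# "all_test_results.csv" and its columns (var1, var2, pval) are placeholders.
from flask import Flask, render_template_string
import pandas as pd

app = Flask(__name__)

TEMPLATE = """
<h1>Results browser</h1>
<table border="1">
  <tr><th>variable 1</th><th>variable 2</th><th>p-value</th></tr>
  {% for row in rows %}
  <tr><td>{{ row.var1 }}</td><td>{{ row.var2 }}</td><td>{{ row.pval }}</td></tr>
  {% endfor %}
</table>
"""

@app.route("/")
def index():
    results = pd.read_csv("all_test_results.csv")
    return render_template_string(TEMPLATE, rows=list(results.itertuples()))

if __name__ == "__main__":
    app.run(debug=True)
```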

We have released code and data with papers in the past, but this is the first paper I have ever published that attempts to include a fully reproducible snapshot of the statistical analyses.  I learned a number of lessons in the process of doing this:
  1. The development of a reproducible workflow saved me from publishing a paper with demonstrably irreproducible results, due to the OS-specific software bug mentioned above.  This in itself makes the entire process worthwhile from my standpoint.
  2. Converting a standard workflow to a fully reproducible workflow is difficult. It took many hours of work beyond the standard analyses in order to develop a working VM with all of the analyses automatically run; that doesn’t even count the time that went into developing the browser. Had I started the work within a virtual machine from the beginning, it would have been much easier, but it still would have required extra work beyond that needed for the basic analyses.
  3. Ensuring longevity of a working pipeline is even harder.  The week before the paper was set to be published, I tried a fresh install of the VM to make sure it was still working.  It wasn’t.  The problem was simple (miniconda had changed the name of its installation directory), and it highlighted a significant flaw in our strategy: we had not specified software versions in our VM provisioning (see the sketch below).  I hope that we can add that in the future, but for now we have to keep our eyes out for the disruptive effects of software updates.
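The kind of fix I have in mind is simply pinning exact versions in the provisioning step, along the lines of the following excerpt; the package versions shown here are purely illustrative.

```
# hypothetical excerpt of a requirements.txt used during VM provisioning;
# pinning exact versions keeps the environment from drifting as packages update
numpy==1.9.2
scipy==0.15.1
pandas==0.16.2
flask==0.10.1
```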
I look forward to your comments and suggestions about how to better implement reproducible workflows in the future, as this is one of the major interests of our Center for Reproducible Neuroscience.

Sunday, November 1, 2015

Are good science and great storytelling compatible?

Chris Chambers has a piece in the Guardian ("Are we finally getting serious about fixing science?") discussing a recent report about reproducibility from the UK Academy of Medical Sciences, based on a meeting held earlier this year in London. A main theme of the piece is that scientists need to focus more on doing good science and less on "storytelling":
Some time in 1999, as a 22 year-old fresh into an Australian PhD programme, I had my first academic paper rejected. “The results are only moderately interesting”, chided an anonymous reviewer. “The methods are solid but the findings are not very important”, said another. “We can only publish the most novel studies”, declared the editor as he frogmarched me and my boring paper to the door.
I immediately asked my supervisor where I’d gone wrong. Experiment conducted carefully? Tick. No major flaws? Tick. Filled a gap in the specialist literature? Tick. Surely it should be published even if the results were a bit dull? His answer taught me a lesson that is (sadly) important for all life scientists. “You have to build a narrative out of your results”, he said. “You’ve got to give them a story”. It was a bombshell. “But the results are the results!” I shouted over my coffee. “Shouldn’t we just let the data tell their own story?” A patient smile. “That’s just not how science works, Chris.”
He was right, of course, but perhaps it’s the way science should work. 

None of us in the reproducibility community would dispute that the overselling of results in service of high-profile publications is problematic, and I doubt that Chambers really believes that our papers should just be data dumps presented without context or explanation.  But by likening the creation of a compelling narrative about one's results to "selling cheap cars", this piece goes too far.  Great science is not just about generating reproducible results and "letting the data tell their own story"; it should also give us deeper insights into how the world works, and those insights are fundamentally built around and expressed through narratives, because humans are story-telling animals.  We have all sat through a research talk with lots of data and no story, and it's a painful experience; this speaks to the importance of solid narrative in the communication of scientific ideas.

Narrative becomes even more important when we think about conveying our science to the public. Non-scientists are not in a position to "let the data speak to them" because most of them don't speak the language of data; instead, they speak the language of human narrative. It is only by abstracting away from the data to come up with narratives such as "memory is not like a videotape recorder" or "self-control relies on the prefrontal cortex" that we can bring science to the public in a way that can actually have impact on behavior and policy.

I think it would be useful to stop conflating scientific storytelling with "embellishing and cherry-picking".   Great storytelling (be it spoken or written) is just as important to the scientific enterprise as great methods, and we shouldn't let our zeal for the latter eclipse the importance of the former.

Wednesday, August 26, 2015

New course on decision making: Seeking feedback

I am currently developing a new course on the psychology of decision making that I will teach at Stanford in the Spring Quarter of 2016. I've looked at the various textbooks on this topic and I'm not particularly happy with any of them, so I am rolling my own syllabus and will use readings from the primary literature.  I have developed a draft syllabus and would love to get feedback: Are there important topics that I am missing?  Different readings that I should consider?  Topics I should consider dropping?  Please leave comments with your suggestions, or email me at poldrack@gmail.com!

Part 1: What is a decision? 

1. Varieties of decision making (overview of course)


Part 2: Normative decision theory: How an optimal system should make decisions

2. Axiomatic approach from economics
- TBD reading on expected utility theory


3. Bayesian decision theory
Körding, K. P. (2007). Decision Theory: What “Should” the Nervous System Do? Science, 318(5850), 606–610. http://doi.org/10.1126/science.1142998

4. Information accumulation
Smith & Ratcliff, 2004, Psychology and neurobiology of simple decisions.  TINS.


Part 3: Psychology: How humans make decisions

5. Anomalies: the ascendance of psychology and behavioral economics
Kahneman, D. (2003). A perspective on judgment and choice. American Psychologist,
58, 697-720

6. Judgment: Anchoring and adjustment
Chapman, G.B. & Johnson, E.J. (2002). Incorporating the irrelevant: Anchors in
judgment of belief and value

7. Heuristics: availability, representativeness
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases.
Science, 185, 1124-1131. 

8. Risk and uncertainty: Risk perception, risk attitudes
Slovic, P. (1987). Perception of risk. Science, 236, 280-285

9. Prospect theory 
Kahneman, D. & Tversky A. (1984). Choices, values, and frames. American
Psychologist, 39, 341–350.

10. Framing, endowment effects, and applications of prospect theory
Kahneman, D., Knetsch, J.L., & Thaler, R.H. (1991). The endowment effect, loss
aversion, and status quo bias. Journal of Economic Perspectives, 5, 193-206.

11. Varieties of utility
Kahneman, Wakker, & Sarin (1997). Back to Bentham: Explorations of experienced utility.  Quarterly Journal of Economics.

12. Intertemporal choice and self-control
Mischel, W., Shoda, Y., & Rodriguez, M.L. (1989). Delay of gratification in children. Science, 244, pp. 933-938.

13. Emotion and decision making
Rottenstreich, Y. & Hsee, C.K. (2001). Money, kisses and electric shocks: On the
affective psychology of risk. Psychological Science, 12, 185-190.

14. Social decision making and game theory
TBD

Part 4: Neuroscience of decision making

15. Neuroscience of simple decisions
Sugrue, Corrado, & Newsome (2005). Choosing the greater of two goods: neural currencies for valuation and decision making. Nature Reviews Neuroscience.

16. Neuroscience of Value-based decision making
Rangel et al., 2008, A framework for studying the neurobiology of value-based decision making

17. Reinforcement learning and dopamine, wanting/liking
Schultz, Montague, and Dayan (1997) A neural substrate of prediction and reward

18. Decision making in simple organisms
Reading TBD; possibilities include C. elegans, snails, and slime mold.


Part 5: Ethical issues

19. Free will
Roskies (2006) Neuroscientific challenges to free will and responsibility.
OR:
Shadlen & Roskies (2012). The neurobiology of decision-making and responsibility: reconciling mechanism and mindedness.