Week 4: All About Pre-Registration

Presently there is a large degree of variability in how pre-registration is understood and applied in psychological science. Based on what I have seen on social media, what I have read in papers and other scholarly work, and actual pre-registrations from different labs, there is no agreed-upon definition of pre-registration or set of guiding principles for when and how to implement one. This is perhaps to be expected at such an early stage of adoption by academics who are not used to publicly sharing their ideas prior to testing them, and a non-trivial number of academics remain skeptical of the practice. The goal of this week’s class was to introduce the students to a definition of pre-registration, to discuss some common “yes, but…” reasons for not pre-registering hypotheses, methods, and data analytic plans, and to share some resources for how to implement a pre-registration.

Here is my working definition of pre-registration: Stating as clearly and specifically as possible what you plan to do, and how, before doing it, in a manner that is verifiable by others.

If you have an idea that you would like to test with new or existing data, you can share your idea and your plans for testing it before doing so. The alternative is not sharing this information before testing your idea, meaning it is either (a) shared to some degree in a manuscript written after testing your idea, or (b) not shared at all because you chose not to write a manuscript after testing your idea. In my opinion, sharing before testing your idea is a better option than (maybe) sharing after. I therefore suggest that academics pre-register all of their planned research pursuits. In this post I will attempt to explain why.

There is no one correct way to implement a pre-registration, and a pre-registration itself is no guarantee that your hypotheses, methods, and/or data analytic approach were sound. Stating in a manuscript that your idea was pre-registered also says nothing about how specific your hypothesis was, or whether you followed your pre-registered protocol as specified. Importantly, however, these things are now verifiable by reading the pre-registration materials. It is worth taking the time to learn how best to communicate your intentions in advance of a given research pursuit, perhaps seeking feedback from other experts during this process, on the assumption that consumers of your research will take the time to read your pre-registration materials.

Common “Yes, but…” Arguments Against Pre-Registration

Here are four common “yes, but…” arguments I hear regarding why a given researcher cannot implement pre-registrations for his or her research:

  • It only applies when you have specific, confirmatory hypotheses. “My work is often theoretically guided but I do not always test specific, confirmatory hypotheses from the outset.”
  • It is simply not feasible or practical for complex study designs (e.g., longitudinal designs, large scale observational studies).
  • The data are already collected so (a) “I have nothing to pre-register”, and/or (b) “I have already analyzed some of the data so I can’t pre-register.”
  • It puts limits on what can be done with the data. “I may have some hypotheses and plans to test them, but as a scientist I need to go to where the data takes me and therefore do not want to be limited to only the analyses I could think of in advance. Pre-registration can stifle creativity and even scientific discovery.”

The short answer to each of these arguments is: nope. According to the definition of pre-registration I put forward, it is always possible to state what you plan to do before you do it, as long as you are open and transparent about the state of the research pursuit in question. Here are some longer answers to these four arguments:

  • Pre-registration is not only for purely confirmatory research. It applies equally well to research that is largely exploratory, or somewhere in between exploratory and confirmatory. If, for example, you plan to collect self-report personality data from a large group of individuals and follow them over time to observe variability in different personality traits but you are not sure what that variability should look like, you can say that in a pre-registration. If you are not sure whether the association between some theoretical constructs you assessed in your study should resemble patterns of mediation or moderation and want to test both, you can say that in a pre-registration. If you want to collect responses on many scales that may or may not be correlated with each other from a sample of students in an effort to select some of these scales for use in another study, you can say that in a pre-registration. Here is the pattern that is unfolding: after saying what you would like to do with your data collection and/or data analysis, simply add “you can say that in a pre-registration”. Pre-registering vague ideas or exploratory research also helps prevent the researcher from using the words “As predicted…” in future publications using results from this research.
  • It is not necessary for a pre-registration to include every single possible hypothesis and accompanying data analytic plan for a given data set. If you first plan to analyze a subset of the data from, for example, a large sample of married couples assessed over two years (with data collected at 8 time points), you can pre-register those plans. If you later decide to test a different set of ideas using different data collected from this sample, you can pre-register those plans at another time.
  • Data may already be collected, but it is still possible to state in advance your idea and how you plan to use existing data to test this idea. Be upfront about your prior experiences with this data set and how your new ideas were generated.
  • Pre-registration does not put limits on what you can do, but rather helps distinguish analyses that were planned in advance from those that were conducted post hoc (or more confirmatory analyses from more exploratory ones). Ideas and data analytic decisions that are made because of experiences working with the data (ideas and decisions you did not have prior to working with the data) are exploratory and should be labeled as such. Of course, follow-up analyses are often needed and can lead to new, perhaps unexpected, patterns of results (that will need to be replicated at some point with independent data to properly test these hypotheses).

At this point in my conversations with those skeptical of pre-registration, they often say “Ok, so I can pre-register my ideas, but it will not fix ALL the problems!” Agreed. The goal of pre-registration, though, is not to fix all the problems with the academic research process. It can help solve the problem of Hypothesizing After the Results are Known (HARKing): stating, in a manuscript written after all of the data analyses have been completed, that the hypotheses put forward in that manuscript were crafted exactly as specified prior to data collection and/or data analysis. Solving that problem would be a huge achievement.

The skeptic at this point often suggests that researchers could simply game the system by pre-registering their ideas and data analytic plans after looking at their data! If so, they would benefit at the expense of honest, hard-working scientists. Yes, researchers could do that. But if they did, they would be committing fraud, not very different from the formerly successful scientists in our field who faked their data and subsequently lost their jobs. If we assume that outright fraud is rare in our field now, I think we can further assume that pre-registration fraud will also be rare.

For the converted skeptic, this is where we discuss what tools are available to implement pre-registrations, and what information should be included in a pre-registration. I will save that discussion for another day, but there are some very useful resources available to assist with putting together useful and informative pre-registrations for all your research needs, such as:

 

Week 3: Open Notebook

The focus of discussion this week was on elements of the open science process to consider at the beginning of the research process. We started with a brief discussion of how the benefits of open and high powered research outweigh the costs (LeBel, Campbell, & Loving, 2017). I then provided a tutorial on how to use the Open Science Framework (OSF) to manage the research workflow. Here is a link to a useful webinar. When I talk to people who are not familiar with the OSF, they are often confused about the differences between “projects” and “registrations”, and think the OSF is primarily used to “pre-register” study hypotheses. I share my belief that whereas the OSF is useful for pre-registrations, it is even more useful as a free web platform for managing the research process with collaborators. In fact, a researcher could choose to keep all project pages private (i.e., not share them with the public) and never pre-register anything at all, but use the OSF to efficiently manage his or her research workflow. That seems unlikely to happen, but the point is that projects on the OSF are dynamic, with a great deal of functionality over the course of the research process. In our lab we use the OSF to store research materials (scales, procedures, data files, code, ethics applications, and so on), but also more and more to document communications between collaborators during the research process. And that is where open notebook comes in.

Open notebook is “…the practice of making the entire primary record of a research project publicly available online as it is recorded.” At first blush this does not seem all that different from basic “open science” principles whereby the researcher publicly shares research materials, specifies hypotheses in advance, outlines a data analytic plan, and then later on shares data, meta-data, and code (all topics to be covered in future classes). But the concept of open notebook also includes sharing the laboratory notebook, something that often contains communications between collaborators, as well as “dear diary” types of entries, that shed light on the decision-making process throughout the project. An open notebook can take on many different forms, and there are some excellent examples from the medical bio-sciences here, here, and here. These three examples take the form of dedicated web pages with regular updates, and I admire their commitment to “extreme open science”.

Rather than create a dedicated website for our lab notebooks, I wanted to develop an approach that uses what our lab already uses on a daily basis—the OSF. That is where all of our research materials are stored for our projects already, but what has been missing to date is the nature of the communications between colleagues during the research process that results in particular decisions being made. And decisions need to be made regularly as new issues arise that were not considered earlier.

Along with current graduate student Nicolyn Charlot, we are trying out the following open notebook approach. First, rather than using email for basic communications for a current project (this one), we decided to make use of the “comments” function that is available for every project page (the “word bubble” icon located at the top right of the screen). That way our messages are documented over time as they occur and are embedded with all of our research materials. With notifications for comments set to “on” (under the settings tab), we receive emails when a new comment has been added. Second, because many of our decisions are made during lab meetings and not via email, we decided to briefly document the decision-making process following each lab meeting in a shared Google Doc that is linked to a component, titled “Open Notebook”, nested within the main project page. Here is the open notebook for our project. It is within this component where we communicate using the comments function (click on it and see what we have so far), and where we keep the shared file. Our goal will be to have an “Open Notebook” component for all new projects going forward as a way to document the decision-making process, and to add a more personal element to the project pages beyond simply being a repository of files.

I am curious to see how it works out.

Week 2: Why Should Science be Open and Reproducible?

In one word: evaluation.

The motto of the Royal Society is Nullius in verba, or “Take nobody’s word for it”. When evaluating scientific claims, this means focusing on the appropriateness of the research process (e.g., study design, analytic design, interpretation of outcomes, and so on) rather than accepting the claims based on the authority of the individual(s) proposing them. To evaluate the reported outcomes of a study, it is therefore critical to have access to details of the research process that produced those outcomes.

For many years now, these details have typically been shared in the print pages of academic journals. Journals have page budgets, or a contractually agreed upon number of print pages that can appear in a one-year period. Longer individual papers therefore translate into fewer papers published in a given journal each year, meaning that editors have often asked authors to shorten their papers during the review process. Given the complexity of many experimental designs, and the volume of information collected in many large scale observational studies, it is not feasible to include all methodological details or results within each paper. Authors have typically provided as much detail about how the study was conducted as they felt necessary to interpret the results presented, with the promise that omitted details were available upon request.

But are these details really available upon request? There are many reasons to believe that at least some details would not be. For example, individuals may change careers and no longer have access to these materials; over time we all die and are not able to provide these details; issues with storage (of both physical and digital copies) can arise that render the details no longer accessible; people are busy and may not have the time to search for these details when asked. And the list goes on. As suggested by the readings for this week, it turns out that the majority of requested research details are not in fact available upon request! This includes access to data and analytic code, which are important to have available given the non-trivial number of statistical reporting errors in the literature. Nullius in verba implies that we should not take the word of researchers who say “available upon request”.

The paper by Simmons, Nelson, and Simonsohn (2011) in Psychological Science, as well as the interesting information presented on www.flexiblemeasures.com, also suggests that evaluating scientific claims requires knowing which research decisions (e.g., hypotheses, use of measures, data analytic plans) were made before data collection and/or data analysis and which were made after. For example, regarding the use of the Competitive Reaction Time Task (CRTT) in research assessing aggressive behavior, it has been shown that responses to this task have been quantified for data analysis in 156 different ways within 130 academic publications (http://www.flexiblemeasures.com/crtt/)! It seems likely that some of the decisions regarding how to quantify the CRTT within these publications were made during the process of data analysis (i.e., using a data-driven approach to find a quantification strategy that yielded “significant” results), but it is impossible to determine which ones in the absence of pre-registered data analytic strategies. Simmons et al. (2011) also demonstrate how the application of different Questionable Research Practices (QRPs) (e.g., checking for significance, collecting more data, and checking again; trying out a few different DVs; adding covariates and/or moderators) can inflate the Type I error rate well beyond the 5% level typically adopted by researchers using a frequentist data analytic approach. We need to know when decisions were made during the research process, and why, in order to properly evaluate the reported results of the research.
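To make the first of those QRPs concrete, here is a minimal simulation sketch (not code from Simmons et al.). It assumes the null hypothesis is true (scores drawn from a normal distribution with mean 0 and SD 1), uses a simple one-sample z-test with known SD rather than the t-tests used in the paper, and picks 20 initial participants plus 10 more as illustrative numbers. The function names are hypothetical. It compares the false positive rate when the researcher tests once against the rate when the researcher tests, and if the result is not significant, collects more data and tests again.

```python
import math
import random


def z_test_p(sample):
    """Two-sided p-value for H0: mean == 0, assuming a known SD of 1 (a simplification)."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(n_sims=20000, n_first=20, n_extra=10, alpha=0.05, seed=1):
    """Return (rate with one planned test, rate with 'test, add data, test again')."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_sims):
        # The null is true here: every score comes from a distribution with mean 0
        data = [rng.gauss(0, 1) for _ in range(n_first)]
        significant = z_test_p(data) < alpha
        fixed_hits += significant
        if not significant:
            # The QRP: not significant yet, so collect more data and check again
            data += [rng.gauss(0, 1) for _ in range(n_extra)]
            significant = z_test_p(data) < alpha
        peeking_hits += significant
    return fixed_hits / n_sims, peeking_hits / n_sims


fixed, peeking = false_positive_rates()
print(f"single planned test: {fixed:.3f}, test-then-add-data: {peeking:.3f}")
```

The single planned test should stay close to the nominal 5%, while the "check, collect more, check again" strategy produces noticeably more false positives even though no effect exists. A pre-registered stopping rule is what lets a reader know which of these two procedures produced a reported result.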

Current technology allows for sharing important details of the research process as the research is being considered, conducted, analyzed, and written up into a manuscript. We are no longer limited by how many print pages a traditional journal is budgeted to print each year and then mail to subscribers. We are also not limited with respect to when we share details of the research process—we no longer have to wait to publish everything we can squeeze into a manuscript.

This is an exciting time to be a scientist, but of course it is scary for some scientists to consider all of these changes occurring in a fairly short period of time. In my opinion, greater openness and transparency of the research process is the future of science. There is so much that is being done, and that can be done, to ensure our research is open and reproducible. Next week we start at the beginning of the research process.

Open and Reproducible Science—Introduction to my new Graduate Course

In 2014 my lab began the transition to using open and reproducible research practices (I wrote a blog post about it). Almost a year later, and after a steep learning curve, I realized I needed to organize my open science. There was a lot of discussion within the field of psychology at that time on the idea of open science, but few suggestions on how to actually implement different open science practices throughout the research workflow for different types of research projects. Along with Etienne LeBel and Tim Loving, we thought it through, published a paper in December 2014 with some specific recommendations (or so they seemed at the time), and then our lab made it up as we went along. To my pleasant surprise I was asked to give a workshop on “doing open science” in November 2015 at the University of Toronto, Mississauga. I really enjoyed talking to faculty and students about this topic, and I was honored to be asked to give similar workshops and talks at many different places during the next two years. Overall, I have now given 14 presentations on open and reproducible science in Canada, the USA, New Zealand, and Turkey (at the bottom of this post there is a list of these talks, including links to slides and recordings where available). I am also very happy to see that in the 3 or so years since we published our open science recommendations, many journals in the field of psychology have been changing their editorial policies to align with open science practices.

As I developed and tweaked my slides over this two-year period, I learned a lot more about (a) what is being done in different fields to enhance research transparency and reproducibility, and (b) what can be done with existing technology. With all of this information, I decided to create a new graduate course called “Open and Reproducible Science” so I could share with trainees how they can begin their research career in a way that makes their future publications more open to evaluation and more likely to be reproducible (in many ways) by others (something I suggested was lacking in our graduate training programs here). I put together a syllabus and solicited feedback via Twitter. I received many helpful suggestions, as well as two offers for guest lectures: one from Seth Green from CodeOcean.com, and one from Joanne Paterson, a librarian at Western University. Click here to see what I ended up putting together for this inaugural course. I am excited that 16 grad students from different areas of my psychology department enrolled for this course, beginning January 11, 2018.

My goal is to write expanded lecture notes in a blog post for each week of the class. In these posts I will discuss my planned talking points for each class, as well as flesh out specific examples of how one might use different open science practices throughout the research workflow. Ok, now I need to go re-read the assigned article for the first class: Munafò et al.’s (2017) “A manifesto for reproducible science”.

Invited Talks on Open Science and Replication

2015

  • November 3, Workshop presented at the University of Toronto, Mississauga (Psychology), Canada

2016

  • January 28, Pre-Conference of the Society of Personality and Social Psychology (SPSP), San Diego, USA
  • June 10, Conference of the Canadian Psychological Association, Victoria, Canada
  • October 3, York University (Psychology), Canada (audio recording)
  • October 11, University of Toronto (Psychology), Canada
  • October 19, University of Guelph (Family Relations and Applied Nutrition), Canada
  • October 21, Illinois State University (Psychology), USA
  • November 11, Victoria University Wellington (Psychology), New Zealand
  • November 24, University of Western Ontario (Clinical Area), Canada
  • December 2, University of Western Ontario (Developmental Area), Canada

2017

  • January 19, Workshop presented at Sabanci University, Istanbul, Turkey (with thanks to a Travel Grant awarded to Asuman Buyukcan-Tetik and me from the European Association of Social Psychology)
  • March 10, Western Research Forum Panel Discussion on Open Access: “What’s in it for me?”, London, Canada
  • May 25, Workshop presented at the conference of the Association for Psychological Science (APS), Boston, USA
  • November 10, Plenary address, conference of the Society for the Scientific Study of Sexuality (SSSS), Atlanta, USA

Pre-Registered Publications From Our Lab

Updated

Below is a list of studies published as of October 20, 2017 that had pre-registered (a) hypotheses, (b) procedure and materials, and (in most cases) (c) a data analytic plan. We have more original empirical studies under review and in preparation. When I compiled this list I found it interesting that of the six original empirical pre-registered publications, three are in open access journals. We are also currently collecting data for a registered report involving videotaping lab-based interactions between romantically involved partners. Our lab has also been active in conducting, and publishing, replication studies. We have published seven replication studies to date, including one Registered Replication Report including data from 16 independent labs. Four of these publications resulted from the group project in my graduate research methods course. Another publication (“Self-esteem, relationship threat…”) was conducted by all members of the lab and involved running over 200 romantically involved couples through a lab-based manipulation, one couple at a time; it took one year to collect the data. Upon publication of this paper, however, we did receive our $1000 pre-registration challenge prize money and had a wonderful “lab night out”.

In my view the practice of pre-registration has been helpful in many ways, such as (a) helping clarify what we truly expect to emerge and what we simply think might happen, (b) making us ask ourselves why we are including each measure (why is it relevant/important?), (c) allowing us to develop our data analytic code while data is being collected because we already thought out our data analytic plan, and (d) making it easier to write the manuscript when we are finished given that we have already largely written the methods section, as well as written the rationale for our hypotheses and data analytic plan. I will let others judge if this practice has stifled our creativity (but look at this study, not yet published, before making your final judgement: https://osf.io/yksxt/).

Note: links to the OSF project pages are located on the journal name.

Original Empirical Studies

Dobson, K., Campbell, L., & Stanton, S.C.E. (in press). Are you coming on to me? Bias and accuracy in couples’ perceptions of sexual advances. Journal of Social and Personal Relationships.

Kohut, T., Balzarini, R.N., Fisher, W.A., & Campbell, L. (in press). Pornography’s associations with open sexual communication and relationship closeness vary as a function of dyadic patterns of pornography use within heterosexual relationships. Journal of Social and Personal Relationships.

Balzarini, R.N., Campbell, L., Kohut, T., Holmes, B.M., Lehmiller, J.J., Harman, J.J., & Atkins, N. (2017). Perceptions of primary and secondary relationships in polyamory. PLoS ONE 12(5): e0177841. https://doi.org/10.1371/journal.pone.0177841.

Buyukcan-Tetik, A., Campbell, L., Finkenauer, C., Karremans, J.C., & Kappen, G. (2017). Ideal standards, acceptance, and relationship satisfaction: Latitudes of differential effects. Frontiers in Psychology, doi: 10.3389/fpsyg.2017.01691.

Campbell, L., Chin, K., & Stanton, S.C.E. (2016). Initial evidence that individuals form new relationships with partners that more closely match their ideal preferences. Collabra, 2(1), p.2. DOI: http://doi.org/10.1525/collabra.24

Stanton, S.C.E., & Campbell, L. (2016). Attachment avoidance and amends-making: A case advocating the need for attempting to replicate one’s own work. Journal of Experimental Social Psychology, 67, 43-49.

In Principle Agreement (Registered Report)

Hahn, C., Campbell, L., Pink, J.C., & Stanton, S.C.E. (In principle agreement). The role of adult attachment orientation in information-seeking strategies employed by romantic partners. Comprehensive Results in Social Psychology.

Replication Studies

Babcock, S., Li, Y., Sinclair, V., Thomson, C., & Campbell, L. (2017). Two replications of an investigation on empathy and utilitarian judgment across socioeconomic status. Scientific Data 4, Article number: 160129, doi: 10.1038/sdata.2016.129

Balakrishnan, A., Palma, P.A., Patenaude, J., & Campbell, L. (2017). A 4-study replication of the moderating effects of greed on socioeconomic status and unethical behaviour. Scientific Data 4, Article number: 160120, doi: 10.1038/sdata.2016.120

Balzarini, R.N., Dobson, K., Chin, K., & Campbell, L. (2017). Does exposure to erotica reduce attraction and love for romantic partners in men? Independent replications of Kenrick, Gutierres, and Goldberg (1989) study 2. Journal of Experimental Social Psychology, 70, 191-197.

Campbell, L., Balzarini, R.N., Kohut, T., Dobson, K., Hahn, C.M., Moroz, S.E., & Stanton, S.C.E. (2017). Self-esteem, relationship threat, and dependency regulation: Independent replication of Murray, Rose, Bellavia, Holmes, and Kusche (2002) Study 3. Journal of Research in Personality. https://doi.org/10.1016/j.jrp.2017.04.001.

Cheung, I., Campbell, L., & LeBel, E.P., …Yong, J.C. (2016). Registered replication report: Study 1 from Finkel, Rusbult, Kumashiro, & Hannon (2002). Perspectives on Psychological Science, 11, 750-764.

Connors, S., Khamitov, M., Moroz, S., Campbell, L., & Henderson, C. (2016). Time, money, and happiness: Does putting a price on time affect our ability to smell the roses? Journal of Experimental Social Psychology, 67, 60-64.

LeBel, E.P., & Campbell, L. (2013). Heightened sensitivity to temperature cues in highly anxious individuals: Real or elusive phenomenon? Psychological Science, 24, 2128-2130.

A Commitment to Better Research Practices (BRPs) in Psychological Science

Scientific research is an attempt to identify a working truth about the world that is as independent of ideology as possible.  As we appear to be entering a time of heightened skepticism about the value of scientific information, we feel it is important to emphasize and foster research practices that enhance the integrity of scientific data and thus scientific information. We have therefore created a list of better research practices that we believe, if followed, would enhance the reproducibility and reliability of psychological science. The proposed methodological practices are applicable for exploratory or confirmatory research, and for observational or experimental methods.

  1. If testing a specific hypothesis, pre-register your research[1], so others can know that the forthcoming tests are informative. Report the planned analyses as confirmatory, and report any other analyses or any deviations from the planned analyses as exploratory.
  2. If conducting exploratory research, present it as exploratory. Then, document the research by posting materials, such as measures, procedures, and analytical code, so future researchers can benefit from them. Also, set out research expectations and plans in advance of analyses, because little, if any, research is truly exploratory. State the goals and parameters of your study as clearly as possible before beginning data analysis.
  3. Consider data sharing options prior to data collection (e.g., complete a data management plan; include necessary language in the consent form), and make data and associated meta-data needed to reproduce results available to others, preferably in a trusted and stable repository. Note that this does not imply full public disclosure of all data. If there are reasons why data can’t be made available (e.g., containing clinically sensitive information), clarify that up-front and delineate the path available for others to acquire your data in order to reproduce your analyses.
  4. If some form of hypothesis testing is being used or an attempt is being made to accurately estimate an effect size, use power analysis to plan research before conducting it so that it is maximally informative.
  5. To the best of your ability, maximize the power of your research to detect the smallest effect size you are interested in (e.g., increase sample size, use within-subjects designs, use better, more precise measures, use stronger manipulations, etc.). Also, in order to increase the power of your research, consider collaborating with other labs, for example via StudySwap (https://osf.io/view/studyswap/). Be open to sharing existing data with other labs in order to pool data for a more robust study.
  6. If you find a result that you believe to be informative, make sure the result is robust. For smaller lab studies this means directly replicating your own work or, even better, having another lab replicate your finding, again via something like StudySwap.  For larger studies, this may mean finding highly similar data, archival or otherwise, to replicate results. When other large studies are known in advance, seek to pool data before analysis. If the samples are large enough, consider employing cross-validation techniques, such as splitting samples into random halves, to confirm results. For unique studies, checking robustness may mean testing multiple alternative models and/or statistical controls to see if the effect is robust to multiple alternative hypotheses, confounds, and analytical approaches.
  7. Avoid performing conceptual replications of your own research in the absence of evidence that the original result is robust and/or without pre-registering the study. A pre-registered direct replication is the best evidence that an original result is robust.
  8. Once some level of evidence has been achieved that the effect is robust (e.g., a successful direct replication), by all means do conceptual replications, as conceptual replications can provide important evidence for the generalizability of a finding and the robustness of a theory.
  9. To the extent possible, report null findings. In science, null news from reasonably powered studies is informative news.
  10. To the extent possible, report small effects. Given the uncertainty about the robustness of results across psychological science, we do not have a clear understanding of when effect sizes are “too small” to matter. As many effects previously thought to be large have turned out to be small, be open to finding evidence of effects of many sizes, particularly under conditions of large N and sound measurement.
  11. When others are interested in replicating your work, be cooperative if they ask for input. Of course, one of the benefits of pre-registration is that there may be less of a need to interact with those interested in replicating your work.
  12. If researchers fail to replicate your work, continue to be cooperative. Even in an ideal world where all studies are appropriately powered, there will still be failures to replicate because of sampling variance alone. If the failed replication was done well and had high power to detect the effect, at least consider the possibility that your original result could be a false positive. Given this inevitability, and the possibility of true moderators of an effect, aspire to work with researchers who fail to find your effect so as to provide more data and information to the larger scientific community, which is heavily invested in knowing what is and is not true about your findings.
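Item 5 asks for enough power to detect the smallest effect size of interest. As a rough illustration only (it is not part of the recommendations above), here is a minimal Python sketch of the standard normal-approximation sample-size formula for a two-sided test comparing two means, n per group = 2((z₁₋α/₂ + z₁₋β) / d)². The function names are hypothetical, and the within-subjects version assumes the two repeated measures correlate r, so that Cohen's d converts to d_z = d / sqrt(2(1 − r)).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def _z_sum(alpha, power):
    """z for a two-sided alpha plus z for the desired power."""
    return _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)


def n_per_group_between(d, alpha=0.05, power=0.80):
    """Approximate participants per group needed to detect Cohen's d with a
    two-sided two-sample comparison (normal approximation)."""
    return math.ceil(2 * (_z_sum(alpha, power) / d) ** 2)


def n_within(d, r, alpha=0.05, power=0.80):
    """Approximate participants needed for a two-sided paired comparison,
    assuming the two repeated measures correlate r, so d_z = d / sqrt(2 * (1 - r))."""
    d_z = d / math.sqrt(2 * (1 - r))
    return math.ceil((_z_sum(alpha, power) / d_z) ** 2)


# Smallest effect size of interest, chosen for illustration only.
sesoi = 0.2
print("between-subjects, per group:", n_per_group_between(sesoi))
print("within-subjects, total:", n_within(sesoi, r=0.5))
```

Under these assumptions, d = 0.2 calls for roughly 393 participants per group (786 in total) between subjects, versus roughly 197 participants each measured twice when r = .5, which is one reason within-subjects designs are listed as a way to gain power. Exact t-test calculations (e.g., in G*Power or R's pwr package) give slightly larger numbers, so treat these figures as lower bounds.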

We should note that these proposed practices are complementary to other statements of commitment, such as the commitment to research transparency (http://www.researchtransparency.org/). We would also note that the proposed practices are aspirational. Ideally, our field will adopt many, if not all, of these practices. But we also understand that change is difficult and takes time. In the interim, it would be ideal to reward any movement toward better research practices.

Brent W. Roberts, Rolf A. Zwaan, Lorne Campbell

[1] van ’t Veer, A. E., & Giner-Sorolla, R. (2016). Pre-registration in social psychology—A discussion and suggested template. Journal of Experimental Social Psychology, 67, 2–12. doi:10.1016/j.jesp.2016.03.004

The Twelve Days of Open Science

On the first day of Open Science, research mavericks gave to me, a huge study called the RP:P.

On the second day of Open Science, research mavericks gave to me, the Center for Open Science, And a huge study called the RP:P.

On the third day of Open Science, research mavericks gave to me, a list of QRPs, The Center for Open Science, And a huge study called the RP:P.

On the fourth day of Open Science, research mavericks gave to me, The Open Science Framework, A list of QRPs, The Center for Open Science, And a huge study called the RP:P.

On the fifth day of Open Science, research mavericks gave to me, P-Hacking! The Open Science Framework, A list of QRPs, The Center for Open Science, And a huge study called the RP:P.

On the sixth day of Open Science, research mavericks gave to me, replication studies, P-Hacking! The Open Science Framework, A list of QRPs, The Center for Open Science, And a huge study called the RP:P.

On the seventh day of Open Science, research mavericks gave to me, a team of second string researchers, replication studies, P-Hacking! The Open Science Framework, A list of QRPs, The Center for Open Science, And a huge study called the RP:P.

On the eighth day of Open Science, research mavericks gave to me, registered replication reports, a team of second string researchers, replication studies, P-Hacking! The Open Science Framework, A list of QRPs, The Center for Open Science, And a huge study called the RP:P.

On the ninth day of Open Science, research mavericks gave to me, open access journals, registered replication reports, a team of second string researchers, replication studies, P-Hacking! The Open Science Framework, A list of QRPs, The Center for Open Science, And a huge study called the RP:P.

On the tenth day of Open Science, research mavericks gave to me, pre-print servers, open access journals, registered replication reports, a team of second string researchers, replication studies, P-Hacking! The Open Science Framework, A list of QRPs, The Center for Open Science, And a huge study called the RP:P.

On the eleventh day of Open Science, research mavericks gave to me, pre-registration, pre-print servers, open access journals, registered replication reports, a team of second string researchers, replication studies, P-Hacking! The Open Science Framework, A list of QRPs, The Center for Open Science, And a huge study called the RP:P.

On the twelfth day of Open Science, research mavericks gave to me, Facebook discussion groups, pre-registration, pre-print servers, open access journals, registered replication reports, a team of second string researchers, replication studies, P-Hacking! The Open Science Framework, A list of QRPs, The Center for Open Science, And a huge study called the RP:P.

My 2016 Open Science Tour

I have been asked nine times in 2016 to discuss my views on open science and replication, particularly in my field of social psychology (see my “Open Science Tour” dates below). During these talks, and in the discussions that followed, people wanted to know what exactly open science is, and how a researcher might go about employing open science practices.

Faculty and students asked me many similar questions, so I thought I would create a list of these frequently asked questions. I do not provide a summary of my responses to these questions, instead wanting readers to consider how they would respond. So, how would you answer these questions? (public Google Doc for posting answers)

  1. Given that many findings are not, and in many cases cannot be, predicted in advance, how can I pre-register my hypotheses?
  2. If my research is not confirmatory, do I need to use open science practices? Isn’t open science only “needed” when very clear hypotheses are being tested?
  3. How can I share data?
    • What data do I “need” to share? (All of it? Raw data? Aggregated data?)
    • What platforms are available for data sharing? (and what is the “best” one?)
    • What format/software should be used?
    • Is this really necessary?
    • How should I present this to my research ethics board?
  4. Can I publicly share materials that are copyrighted?
  5. What is a data analytic plan?
  6. Is it really important to share code/syntax from my analyses?
  7. Can’t researchers simply “game the system”? That is, conduct research first, then pre-register after results are known (PRARKing), and submit for publication?
  8. Can shared data, or even methods/procedures, be treated as unique “citable units”?
  9. If I pilot test a procedure in order to obtain the desired effects, should the “failed” pilot studies be reported?
    • If so, won’t this bias the literature by diluting the evidence in favor of the desired/predicted effect obtained in later studies?
  10. How much importance should I place on statistical power?
    • Given that effect sizes are not necessarily knowable in advance, and straightforward procedures are not available for more complex designs, is it reasonable to expect a power analysis for every study/every analysis?
  11. If I use open science practices but others do not, can they benefit more in terms of publishing more papers because of fewer “restrictions” on them?
    • If yes, how is this fair?

Unique questions from students:

  1. Could adopting open science practices result in fewer publications?
  2. Might hiring committees be biased against applicants that are pro open science?
  3. If a student wants to engage in open science practices, but his/her advisor is against this, what should this student do?
  4. If a student wants to publish studies with null findings, but his/her advisor is against this, what should this student do?
  5. Will I “need” to start engaging in open science practices soon?
  6. Will it look good, or bad, to have a replication study (studies) on my CV?
  7. What is the web address for the Open Science Framework? How do I get started?

My Open Science tour dates in 2016 (links to slides provided):

  • January 28, Pre-Conference of the Society of Personality and Social Psychology (SPSP), San Diego, USA
  • June 10, Conference of the Canadian Psychological Association, Victoria, Canada
  • October 3, York University (Psychology), Canada (audio recording)
  • October 11, University of Toronto (Psychology), Canada
  • October 19, University of Guelph (Family Relations and Applied Nutrition), Canada
  • October 21, Illinois State University (Psychology), USA
  • November 11, Victoria University Wellington (Psychology), New Zealand
  • November 24, University of Western Ontario (Clinical Area), Canada
  • December 2, University of Western Ontario (Developmental Area), Canada

Tone Deaf

In my field of psychological science there have been many discussions over the past few years about the way an argument is expressed: its tone. A common theme is the general desire for academic discussions to be positive and respectful, and not mean and antagonistic. With the release of Susan Fiske’s commentary on the state of scientific communication (see a detailed discussion of the commentary in the context of other developments in the field over the past decade here), the discussion of “tone” has heated up again. This is particularly true for the Facebook discussion group “PsychMap”, where the tone of communication is closely monitored.

The following, of course, is simply my own opinion, and I respect that others disagree with it, but I do not really care that much about the tone of an argument. A person can offer up a positive, or neutral, argument and be full of shit, or not. A person can offer up a negative, sarcastic, even rude argument and be on the mark, or not. If you have sat through a few faculty meetings, you will know exactly what I mean. Personally, I do my best (and sometimes my best is not good enough, to be honest) to focus on the argument being presented and not on how the argument is presented. I can only control (a) how I decide to put forward my own arguments (asshole or angel, or somewhere in between), and (b) how I respond to others’ arguments. In my opinion the tone of an argument reflects more on the person delivering the argument than on the target of the argument. I accept that if I choose to deliver my arguments in a manner most of my colleagues would perceive as obnoxious and combative, I may not be taken so seriously by these colleagues for very long. I therefore choose to be positive, or at least direct in a fairly neutral manner, with the majority of my arguments (as I hope is reflected in my blog posts over the past year and a half, and in my papers on meta-scientific issues). For the same reasons, I prefer discussions not to be officially moderated, and to let people own the words they choose to use to present their views. The field of academic psychology is literally a community of highly educated individuals who are smart enough to know the difference between shit and Shinola; we can figure out if an argument, however presented, has substance or not.

And for what it’s worth, it seems to me that the majority of discussions I am privy to in private and on social media are positive and constructive in tone. That is nice.


Organize your Data and Code for Sharing from the Start

On September 12, 2016, experimental psychologist Christopher Ferguson created a “go-fund-me” page to raise funds for access to an existing data set that was used to advance scientific arguments in a scientific publication (link here). In Ferguson’s own words: “So I spoke with the Flourishing Families project staff who manage the dataset from which the study was published and which was authored by one of their scholars. They agreed to send the data file, but require I cover the expenses for the data file preparation ($300/hour, $450 in total; you can see the invoice here).” Ferguson’s request has generated a lot of discussion on social media (this link as well), with many individuals disappointed that data used to support ideas put forward in a scientific publication are only available after a big fee is paid. Others feel a fee is warranted given the effort required to assemble the requested data into one file, along with instructions on how to use that file. And in the words of one commenter, “But I also know people who work with giant longitudinal datasets, and preparing just the codebook for one of those, in a way that will make sense to people outside the research team, can take weeks.” (highlighting added by me).

As someone who has collected data over time from large numbers of romantically involved couples, I agree that it would take some time to prepare these data sets and codebooks for others to understand. But I think this is really a shame, and a problem in need of a solution. If it takes me weeks to prepare documentation explaining my dataset organization to outsiders, I am guessing it would take the same amount of time to explain the same dataset organization to my future self (e.g., when running new analyses with an existing data set), or to a new graduate student who wants to use the data to test new ideas, not to mention people outside of the lab. This seems highly inefficient for in-lab research activities, and it represents a potential loss of valuable data to the field, given that others may never have access to my data if (a) I am too busy to spend weeks (or even hours, for other data sets) putting everything together for others to make sense of my data, or (b) I die before I put these documents together (I am 43 with a love of red meat, so I could drop dead tomorrow. I think twice before buying green bananas).

So what is my proposed solution? Organize your data and code from the start with the assumption that you will need to share this information (see also “Why scientists must share their research code”). Create a data management plan at the beginning of all your research projects. Consider how the data will be organized, where it will be stored, and where the code for data cleaning/variable generation, analyses, and plots will be stored. Create meta-data (information about your dataset) along the way, updating as needed; consider where to store this meta-data from the beginning. If you follow these steps, your data, meta-data, and code can be available for sharing in a manner understandable to other competent researchers in a matter of minutes, not weeks. Even for complex data sets. Your future self will thank you. Your future graduate students will thank you. Your future colleagues will praise your foresight long after you are dead, as your [organized] data will live on.
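One way to put the advice to create meta-data along the way into practice is to keep a machine-readable codebook next to the data file and check it whenever the data change. The Python sketch below is a minimal illustration under assumptions chosen for the example: the data live in a CSV file, the codebook is saved as JSON, and the variable names (couple_id, partner, wave, satisfaction) and helper functions are hypothetical stand-ins for a couples study, not a real dataset or an established tool.

```python
import csv
import json
from pathlib import Path

# Hypothetical codebook for a couples study; every variable name, label,
# and code here is illustrative, not taken from a real dataset.
CODEBOOK = {
    "couple_id": {"label": "Couple identifier", "type": "string"},
    "partner": {
        "label": "Partner within couple",
        "type": "integer",
        "values": {"1": "Partner A", "2": "Partner B"},
    },
    "wave": {"label": "Measurement occasion (1 = baseline)", "type": "integer"},
    "satisfaction": {
        "label": "Relationship satisfaction, mean of items (1-7 scale)",
        "type": "float",
        "missing": "-99",
    },
}


def read_header(csv_path):
    """Return the column names from the first row of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


def audit_codebook(header, codebook):
    """Compare data columns with codebook entries.

    Returns (undocumented, unused): columns that have no codebook entry,
    and codebook entries that no longer match any column.
    """
    undocumented = [col for col in header if col not in codebook]
    unused = [name for name in codebook if name not in header]
    return undocumented, unused


def save_codebook(codebook, out_path):
    """Write the codebook as human-readable JSON so it travels with the data."""
    Path(out_path).write_text(json.dumps(codebook, indent=2), encoding="utf-8")
```

For example, calling audit_codebook(read_header("data/wave1.csv"), CODEBOOK) (a hypothetical path) before each analysis flags any new column that still lacks documentation, and save_codebook(CODEBOOK, "data/wave1_codebook.json") keeps the meta-data in the same folder as the data it describes, alongside the cleaning and analysis code.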

Update: see Candice Morey’s post on the same topic.