
Week 7: Data Management Plans

 

I admit that until not too long ago I was unfamiliar with the term “data management plan”. In graduate school I was not trained specifically in data management. Rather, collecting, handling, storing, and sharing data were tasks left for me to figure out how best to handle, and I usually did so on the fly. The result is that there are data sets I can no longer find (probably on a floppy disk in a box somewhere-ish), and data sets I can find but can no longer interpret given the absence of annotation or meta-data. I also used different strategies for naming files and study variables than my grad student colleagues working on similar projects in the same lab (i.e., a lack of consistency within the lab). Yuck. But academics typically become famous because of the cool positive results they report, not because they are meticulous about curating their data sets for future use. Given that as a field we rarely share our data publicly, or even privately when requested, thinking about data management is likely seen by many as a waste of time.

That type of thinking, however, is both wrong and problematic. Here are some reasons why:

  • A lot of data is generated every single year by thousands of researchers, and it is a huge waste that most of it is hidden away from other researchers as well as the public. Others could put it to good use, but this orphan data is likely never to see the light of day. Imagine a shirt maker who manufactures 100 shirts every day but sells only 5 of them to customers and throws the rest in a pile in a warehouse. In a short amount of time, there are vastly more shirts in the warehouse than on people’s backs. A lot of time and energy is wasted manufacturing and storing the 95 shirts each day that will never be worn.
  • Researchers use the results of data analyses to persuade others about relations among variables. When persuaded, these results form the building blocks of theories and can also be used to set policy by government agencies. The only way to properly evaluate the claims made by researchers is to have access to study materials, procedures, data, and data analytic code (among other things). Without such access, trust is a surrogate for information, and proper evaluation of scientific claims is impossible.
  • Your future self, and future lab members, will benefit immensely from good data management.

That is why in this week of the class we discussed the importance of data management plans. In short, “Research Data Management is the process of organizing, describing, cleaning, enhancing and preserving data for future use by yourself or other researchers.”

I am thankful for the assistance of the librarians at Western University who met with the class to review a data management template put together by the Portage Network. I provide the template below. Ideally, researchers answer each of these questions prior to data collection and then store the document in a repository of their choice (e.g., the Open Science Framework). Writing a data management plan for all projects that collect unique data is an excellent habit for all researchers to develop.

Template Made Available by Portage Network

DMP title

Project Name:

Principal Investigator/Researcher:

Description:

Institution:

Data Collection

What types of data will you collect, create, link to, acquire and/or record?

What file formats will your data be collected in? Will these formats allow for data re-use, sharing and long-term access to the data?

What conventions and procedures will you use to structure, name and version control your files to help you and others better understand how your data are organized?

Documentation and Metadata

What documentation will be needed for the data to be read and interpreted correctly in the future?

How will you make sure that documentation is created or captured consistently throughout your project?

Storage and Backup

What are the anticipated storage requirements for your project, in terms of storage space (in megabytes, gigabytes, terabytes, etc.) and the length of time you will be storing it?

How and where will your data be stored and backed up during your research project?

How will the research team and other collaborators access, modify, and contribute data throughout the project?

Preservation

Where will you deposit your data for long-term preservation and access at the end of your research project?

Indicate how you will ensure your data is preservation ready. Consider preservation-friendly file formats, ensuring file integrity, anonymization and deidentification, inclusion of supporting documentation.

Sharing and Reuse

What data will you be sharing and in what form? (e.g. raw, processed, analyzed, final).

Have you considered what type of end-user license to include with your data?

What steps will be taken to help the research community know that your data exists?

Responsibilities and Resources

Identify who will be responsible for managing this project’s data during and after the project and the major data management tasks for which they will be responsible.

How will responsibilities for managing data activities be handled if substantive changes happen in the personnel overseeing the project’s data, including a change of Principal Investigator?

What resources will you require to implement your data management plan? What do you estimate the overall cost for data management to be?

Ethics and Legal Compliance

If your research project includes sensitive data, how will you ensure that it is securely managed and accessible only to approved members of the project?

If applicable, what strategies will you undertake to address secondary uses of sensitive data?

How will you manage legal, ethical, and intellectual property issues?

This document was generated by DMP Assistant (https://assistant.portagenetwork.ca)

Additional Resources

Presentations on engaging scientists in good data management

Blog post on data management best practices.

What Policies Would I Want to Implement if I Were a Journal Editor?

If I were asked to consider being an editor of a psychology journal, here is a wish list of policies I would like to implement*:

For all submitted articles

  • Encourage pre-registration of (a) hypotheses, and (b) data analytic plans, and provide guidance to authors on how to pre-register different types of studies (e.g., prior to data collection, following data collection, longitudinal designs)
  • The manuscript needs to include a discussion of sample size rationale, inclusion/exclusion criteria, and data collection stopping rules (already implemented at many journals, of course)
  • A list of all study materials/scales available to the researcher(s) when conducting analyses for studies presented in the manuscript, and the order of presentation, needs to be available to the reviewers and editor
  • Additionally, all study materials (e.g., copies of questionnaires, coding schemes, stimuli) and procedures presented in the manuscript need to be available to the reviewers and editor
  • The data analytic code needed to reproduce the results in the manuscript needs to be available to reviewers and the editor
  • The data, and meta-data, needed to reproduce the results presented in the manuscript need to be available to reviewers and the editor
  • Prior to the manuscript being sent out for review, the data analytic code provided will be run using the data provided to reproduce the results presented in the manuscript. If major discrepancies emerge that make the reported results difficult or impossible to interpret, the manuscript will be returned to the authors. Otherwise, a report of the outcome of this process will be provided to the reviewers, and ultimately to the author(s).
  • All reviews and editorial correspondence will be made publicly available, but reviewers will not be required to sign their reviews

For all accepted articles

  • The list of all study materials/scales available to the researcher(s) when conducting analyses for studies presented in the manuscript, and the order of presentation, needs to be made publicly available prior to acceptance
  • All study materials and procedures presented in the manuscript must be made publicly available prior to acceptance (copyrighted material excluded)
  • All data analytic code required to reproduce the results in the manuscript needs to be made publicly available prior to acceptance
  • The data and meta-data needed to reproduce the results in the manuscript need to be placed on a third party server. Author(s) can petition the editor for the data not to be made publicly available, but instead available upon request. A policy would need to be established to allow the editor to approve requests for data when the corresponding author is not available (e.g., retired, not responding to email, or has passed away).
  • Prior to acceptance, a pre-print of the manuscript needs to be made publicly available (if one has not yet been made available)

Other offerings

  • Have a registered reports option
  • Strongly encourage direct replications of presented results within each submitted manuscript when feasible (e.g., via cross-validation or another independent sample)
  • Adopt the “Pottery Barn rule”—replications of studies that were previously published in the journal will be considered for publication

I have likely missed some important policies to promote and encourage being more open and transparent during the research process, but this seems like a good start.

* I realize this will likely mean I will not be asked to consider being a journal editor anytime in the near future

Week 6: Sharing Materials, Procedures, and Data Analytic Plans

In this class we discussed the importance of sharing study materials, procedures, and as many decisions as possible regarding planned (and exploratory) analyses. Sharing materials and detailed procedures allows other researchers to reproduce the study in their own labs without needing to contact the original researcher(s). This is important because the original researcher(s) may no longer have access to old study materials (e.g., lost, changed jobs, old files thrown out, or has left academia and did not bring materials/files along), and eventually all researchers will die. Publicly sharing research materials and procedures helps ensure that your science does not die with you.

A few years ago my lab decided to conduct a close replication of an important study in the field of relationship science—study 3 of Murray et al. (2002). In this study, both partners from 67 heterosexual couples participated in a laboratory experiment to see if individuals relatively high or low in self-esteem responded differently in the face of a relationship threat. Partners were seated in the same room, but with their backs to each other, to answer a number of paper-and-pencil questionnaires. The researchers manipulated relationship threat in half of the couples by leading one partner to believe that his or her partner perceived there to be many problems in the relationship (a very clever manipulation, to be honest). The predicted two-way interaction between self-reported self-esteem and experimental condition emerged for some of the self-reported dependent variables, showing that in the “relationship threat” condition individuals with low self-esteem minimized the importance of the relationship whereas those with high self-esteem affirmed their love for their partners. After reading the paper closely to assemble the materials and scales needed to reproduce the study procedures, we realized that many of these scales were created by the original researchers and that we would need to ask the corresponding author (Dr. Sandra Murray) for copies or we could not run the planned replication study. Dr. Murray responded positively to our request and luckily had copies of the study materials, which she forwarded to us. We were able to run our replication study, and a pre-print of the article containing all of the study details and links to OSF (for materials, data, code) can be found here.

The moral of this story is that if we (or others, of course) had not attempted to replicate this study, and if Dr. Murray no longer had access to these study materials, it simply would not have been possible for anyone else to closely reproduce the full set of procedures for this study going forward. This seems to be the case for the majority of published research in the field of psychology—the reported results remain in print, but the study materials and lab notes discussing how to, for example, properly employ manipulations are lost to future generations of researchers. But it does not have to be this way anymore.

The New Normal

Ideally, for every research project the researcher(s) should publicly share the following on a site such as the Open Science Framework (OSF) (go here to learn more about how to use the OSF to start sharing with the research community):

  • A list of all measures, with citations and web links where available, used in the study. List in order of presentation to participants, or discuss procedures used to randomize the order of presentation
  • A copy of all measures used in the study, except for copyrighted material. This is particularly important for measures created in-house given that the items are not readily available to others
  • Provide instructions for how to score all scales used in the study (e.g., indicate which items are reverse coded, whether scores for each participant are calculated as the average of all items on the scale, and so on); a brief scoring sketch appears after this list
  • Detailed description of any manipulations. Did you use single or double blinding?
  • Detailed description of the interactions between researchers and participants in lab settings
  • Description of the study design
  • Consider creating a flow-chart of the study design, discussing what happened at each stage of the study from beginning to end
  • Post pictures of the study setting (e.g., lab rooms) where appropriate
  • Consider creating a methods video (i.e., record a typical run of the experimental protocol with mock participants)
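As a concrete illustration of the scoring instructions mentioned in the list above, here is a minimal sketch in R. The data frame raw_data, the item names, and the 1–7 response scale are all hypothetical; the point is simply that posting a few lines like these alongside your materials removes any guesswork about reverse coding and scale construction.

    # Hypothetical 4-item scale answered on a 1-7 scale; items 2 and 4 are reverse keyed
    raw_data$item2_r <- 8 - raw_data$item2   # 8 - x flips a 1-7 response scale
    raw_data$item4_r <- 8 - raw_data$item4
    raw_data$scale_mean <- rowMeans(
      raw_data[, c("item1", "item2_r", "item3", "item4_r")],
      na.rm = TRUE                           # scale score = mean of available items
    )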

Sharing data analytic plans created prior to conducting analyses is also important to help clarify the difference between what was expected up front and what was assessed after getting started with the planned analyses. This plan should be included in the study pre-registration (see here for more information on pre-registration).

Here are some things to consider including in a data analytic plan:

  • If not included elsewhere, discuss the stopping rule to be used to terminate data collection (e.g., after recruiting a minimum of x participants, data collection will cease)
  • Indicate rules for removing participants from the data set (e.g., failed attention checks, scores considered outliers based on some pre-determined criteria, participants who did not meet pre-determined inclusion and/or exclusion criteria); a brief example appears after this list
  • Consider what types of descriptive data are important to present for your study (e.g., correlations, means, and standard deviations of study variables, frequency of responses on key variables)
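Here is a minimal sketch in R of what pre-registered exclusion rules can look like when written as code rather than prose. The data frame dat and the variables attention_check and rt_mean are hypothetical, and the |z| > 3 outlier rule is just one example of a pre-determined criterion.

    # Keep participants who passed the attention check and whose mean reaction time
    # falls within 3 standard deviations of the sample mean (pre-determined criteria)
    z_rt      <- as.numeric(scale(dat$rt_mean))
    dat_clean <- dat[dat$attention_check == 1 & abs(z_rt) <= 3, ]
    nrow(dat) - nrow(dat_clean)   # number of participants excluded, to be reported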

Confirmatory Analyses

  • Define your alpha level (e.g., .05, .01, .005)
  • Do you plan to use any type of correction for your alpha level given the number of planned tests?
  • Given your hypotheses and the methods used to test them, what types of statistical analyses are appropriate to use? Discuss what tests you plan to use, what variables will serve as predictors and outcomes, and the critical effect(s) for each planned model (a brief sketch appears after this list).
  • What options will you consider if your planned models violate assumptions of that particular test?
  • How do you intend to conduct simple effects analyses (if applicable)?
  • Where appropriate, consider providing a figure of expected outcomes for each planned model
  • What type of effect size will you be reporting?
  • Will you present 95% confidence intervals for your effects, or some other indication of the precision of your estimates?
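To make these points concrete, here is a minimal sketch in R of how a planned confirmatory model might be written out in advance. The variables (condition, a mean-centered self_esteem score, and the outcome closeness) and the data frame dat_clean are hypothetical, and the Holm adjustment is just one example of correcting the alpha level across several planned tests.

    # Planned model: the condition x self_esteem interaction is the critical effect
    fit <- lm(closeness ~ condition * self_esteem, data = dat_clean)
    summary(fit)                 # test statistics for each planned effect
    confint(fit, level = 0.95)   # 95% confidence intervals for each estimate

    # If several critical effects are planned across models, correct the alpha level,
    # e.g., with a Holm adjustment applied to the full set of planned p values
    p_planned <- c(dv1 = 0.021, dv2 = 0.048, dv3 = 0.004)   # placeholder p values
    p.adjust(p_planned, method = "holm")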

Exploratory Analyses

  • Can include many of the same elements relevant for confirmatory analyses (e.g., define your alpha, consideration of assumptions of tests to be used)
  • Provide a guiding framework for how exploratory analyses will be approached (an example of this can be found here)

Remember, sharing is caring.

Reference:

Murray, S. L., Rose, P., Bellavia, G., Holmes, J., & Kusche, A. (2002). When rejection stings: How self-esteem constrains relationship-enhancement processes. Journal of Personality and Social Psychology, 83, 556–573.

Week 5: Power and Study Design

Graduate Class: Open and Reproducible Science (syllabus)

This week, we discussed statistical power (the probability of detecting an effect when it in fact exists) and study design. When attempting to calibrate power for a given study, a lot of choices need to be made regarding, for example, the smallest effect size of interest (SESOI), researcher resources, study design, and the number of participants to recruit. There are a lot of wonderful resources available to help researchers understand all of this much better, and we reviewed a subset of these resources in this class.

In this space I want to focus more specifically on one of the in-class demonstrations regarding low power and overestimating effect sizes. Using the p-hacker shiny app, we simulated data for an experiment with two conditions containing 20 participants each, across 10 dependent variables (seed set at 108). We assumed that a true effect exists in the population, but that this effect is very small: d = .05. Using the “pwr” package in R, the statistical power of this study is estimated to be .052, meaning there is only a 5.2% chance of obtaining a statistically significant effect. Actually running a study with such low power would be a waste of time and resources, of course, but luckily simulating one takes very little time. We did need to run a few “studies” to find an effect, but the study using seed 108 was a winner; its “results” are described below.


Excellent, we obtained two significant effects! With a bit of p-hacking we could likely turn the p value for the combined DV from .064 (non-significant) to less than .05 (considered significant)*. But for now let’s focus on the results for DV2. The p value for the F test was .023, a respectable degree of significance, one might say. The effect size for this test (obtained using an online effect size calculator) is d = .751 (95% CI: .109 to 1.39). We have therefore overestimated the true effect size by a massive degree (roughly d = .70). Other simulations with larger true effect sizes and sample sizes continued to demonstrate that the true effect size was overestimated, usually by non-trivial amounts. The students were surprised by all of this.
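For readers who want to recreate this demonstration, here is a minimal sketch in R. The first call reproduces the power estimate reported above using the pwr package; the simulation that follows is not the p-hacker app itself, but a simple illustration (with assumed parameters matching the class example) of why the effects that do reach significance in such underpowered studies overestimate the true effect size.

    library(pwr)
    pwr.t.test(n = 20, d = 0.05, sig.level = .05, type = "two.sample")  # power is roughly .052

    set.seed(108)   # the seed from the class demo, reused here for illustration
    n_sims <- 10000 # number of simulated "studies"
    n      <- 20    # participants per condition
    true_d <- 0.05  # the (very small) true effect in the population
    observed_d <- replicate(n_sims, {
      control   <- rnorm(n, mean = 0,      sd = 1)
      treatment <- rnorm(n, mean = true_d, sd = 1)
      if (t.test(treatment, control, var.equal = TRUE)$p.value < .05) {
        # Cohen's d from the two sample means and the pooled standard deviation
        (mean(treatment) - mean(control)) /
          sqrt(((n - 1) * var(treatment) + (n - 1) * var(control)) / (2 * n - 2))
      } else {
        NA_real_   # non-significant "studies" are set aside, as they often are in practice
      }
    })
    mean(!is.na(observed_d))             # proportion of significant studies: close to .05
    mean(abs(observed_d), na.rm = TRUE)  # average |d| among them: far larger than .05

Running this a few times makes the point quickly: when power is barely above the alpha level, only wildly inflated estimates make it past the significance filter.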

A goal of this class was to demonstrate to students the decisions that need to be made when seriously considering statistical power during the study design phase of the research. As part of this goal, I did my best to show them that they *need* to seriously consider statistical power when designing their studies. A goal of the entire course is to convince the students of the importance of openly sharing their decision making process at all phases of the research process.

*Indeed, by removing the highest value in the control condition, the p value for the combined DV moves from .064 to .026.

Week 4: All About Pre-Registration

Presently there is a large degree of variability regarding the understanding and application of pre-registration in psychological science. From what I have seen on social media, read in papers and other scholarly work, and from reading actual pre-registrations from different labs, there is no agreed upon definition of pre-registration or guiding principles for when and how to implement a pre-registration. This is perhaps to be expected at such an early stage of adoption by academics not used to publicly sharing their ideas prior to testing them, with a non-trivial number of academics remaining skeptical of the practice. The goal of this week’s class was to introduce the students to a definition of pre-registration, to discuss some common “yes, but…” reasons for not pre-registering hypotheses, methods and data analytic plans, and to share some resources for how to implement a pre-registration.

Here is my working definition of pre-registration: Stating as clearly and specifically as possible what you plan to do, and how, before doing it, in a manner that is verifiable by others.

If you have an idea that you would like to test with new or existing data, you can share your idea and your plans for testing it before doing so. The alternative is not sharing this information before testing your idea, meaning it either (a) gets shared to some degree in a manuscript written after testing your idea, or (b) is not shared at all because you chose not to write such a manuscript. In my opinion sharing before, versus (maybe) after, testing your idea is the better option. I therefore suggest that academics pre-register all of their planned research pursuits. In this post I will attempt to explain why.

There is no one correct way to implement a pre-registration, and a pre-registration itself is no guarantee that your hypotheses, methods, and/or data analytic approach were sound. Stating in a manuscript that your idea was pre-registered also does not imply the degree of specificity of your hypothesis, or that you followed your pre-registered protocol as specified. Importantly, however, these things are now verifiable by reading the pre-registration materials. It is worth taking the time to learn how to best communicate your intentions in advance of a given research pursuit, perhaps seeking feedback from other experts during this process, with the assumption that consumers of your research will take the time to read your pre-registration materials.

Common “Yes, but…” Arguments Against Pre-Registration

Here are four common “yes, but…” arguments I hear regarding why a given researcher cannot implement pre-registrations for his or her research:

  • It only applies when you have specific, confirmatory hypotheses. “My work is often theoretically guided but I do not always test specific, confirmatory hypotheses from the outset.”
  • It is simply not feasible or practical for complex study designs (e.g., longitudinal designs, large scale observational studies).
  • The data are already collected so (a) “I have nothing to pre-register”, and/or (b) “I have already analyzed some of the data so I can’t pre-register.”
  • It puts limits on what can be done with the data. “I may have some hypotheses and plans to test them, but as a scientist I need to go to where the data takes me and therefore do not want to be limited to only the analyses I could think of in advance. Pre-registration can stifle creativity and even scientific discovery.”

The short answer to each of these arguments is: nope. According to the definition of pre-registration I put forward, it is always possible to state what you plan to do before you do it, as long as you are open and transparent about the state of the research pursuit in question. Here are some longer answers to these four arguments:

  • Pre-registration is not only for purely confirmatory research. It applies equally well to research that is largely exploratory, or somewhere in between exploratory and confirmatory. If, for example, you plan to collect self-report personality data from a large group of individuals and follow them over time to observe variability in different personality traits, but you are not sure what that variability should look like, you can say that in a pre-registration. If you are not sure whether the association between some theoretical constructs you assessed in your study should resemble patterns of mediation or moderation and want to test both, you can say that in a pre-registration. If you want to collect responses on many scales that may or may not be correlated with each other from a sample of students in an effort to select some of these scales for use in another study, you can say that in a pre-registration. Here is the pattern that is unfolding: after saying what you would like to do with your data collection and/or data analysis, simply add “I can say that in a pre-registration”. Pre-registering vague ideas or exploratory research also helps prevent the researcher from using the words “As predicted…” in future publications based on results from this research.
  • It is not necessary for a pre-registration to include every single possible hypothesis and accompanying data analytic plan for a given data set. If you first plan to analyze a subset of the data from, for example, a large sample of married couples assessed over two years (with data collected at 8 time points), you can pre-register those plans. If you decide to analyze a different set of ideas with different data collected from this sample, you can pre-register that at another time.
  • Data may already be collected, but it is still possible to state in advance your idea and how you plan to use the existing data to test it. Be upfront about your prior experiences with this data set and how your new ideas were generated.
  • Pre-registration does not put limits on what you can do, but rather helps distinguish analyses that were planned in advance from those that were conducted post hoc (or between more confirmatory and more exploratory analyses). Ideas and data analytic decisions that are made because of experiences working with the data (ideas and decisions you did not have prior to working with the data) are exploratory and should be labeled as such. Of course follow-up analyses are often needed and can lead to new, perhaps unexpected, patterns of results (which will need to be replicated at some point with independent data to properly test these new hypotheses).

At this point in my conversations with those skeptical of pre-registration, they often say “Ok, so I can pre-register my ideas, but it will not fix ALL the problems!” Agreed. The goal of pre-registration, though, is not to fix all the problems with the academic research process. It can help solve the problem of Hypothesizing After the Results are Known (HARKing)—that is, claiming in a manuscript written after all of the data analyses have been completed that the hypotheses put forward in that manuscript were crafted exactly as specified prior to data collection and/or data analysis. Solving that problem would be a huge achievement.

The skeptic at this point often suggests that researchers could simply game the system by pre-registering their ideas and data analytic plans after looking at their data! If so, they would benefit at the expense of honest, hard working scientists. Yes, researchers could do that. But if they did, they would be committing fraud, not very different from the likes of formerly successful scientists in our field who faked their data and subsequently lost their jobs. If we assume that outright fraud is rare in our field now, I think we can further assume that it will remain rare with respect to pre-registration fraud.

For the converted skeptic, this is where we discuss what tools are available to implement pre-registrations, and what information should be included in a pre-registration. I will save that discussion for another day, but there are some very useful resources available to assist with putting together useful and informative pre-registrations for all your research needs, such as:

 

Week 3: Open Notebook

The focus of discussion this week was on elements of the open science process to consider at the beginning of the research process. We started with a brief discussion of how the benefits of open and high powered research outweigh the costs (LeBel, Campbell, & Loving, 2017). I then provided a tutorial on how to use the Open Science Framework (OSF) to manage the research workflow. Here is a link to a useful webinar. When I talk to people who are not familiar with using the OSF, they are confused about the differences between “projects” and “registrations”, and think the OSF is primarily used to “pre-register” study hypotheses. I share my belief that whereas the OSF is useful for pre-registrations, it is even more useful as a free web platform for managing the research process with collaborators. In fact, a researcher could choose to keep all project pages private (i.e., not share them with the public) and never pre-register anything at all, but still use the OSF to efficiently manage his or her research workflow. That seems unlikely to happen, but the point is that projects on the OSF are dynamic, with a great deal of functionality over the course of the research process. In our lab we use the OSF to store research materials (scales, procedures, data files, code, ethics applications, and so on), but also, more and more, to document communications between collaborators during the research process. And that is where the open notebook comes in.

Open notebook is “…the practice of making the entire primary record of a research project publicly available online as it is recorded.” At first blush this does not seem all that different from basic “open science” principles whereby the researcher publicly shares research materials, specifies hypotheses in advance, outlines a data analytic plan, and then later on shares data, meta-data, and code (all topics to be covered in future classes). But the concept of an open notebook also includes sharing the laboratory notebook, something that often contains communications between collaborators, as well as “dear diary” types of entries, that shed light on the decision making process throughout the research process. An open notebook can take on many different forms, and there are some excellent examples from the medical bio-sciences here, here, and here. These three examples take the form of dedicated web pages with regular updates, and I admire their commitment to “extreme open science”.

Rather than create a dedicated website for our lab notebooks, I wanted to develop an approach that relies on what our lab already uses on a daily basis—the OSF. That is where all of our research materials are already stored for our projects, but what has been missing to date is a record of the communications between colleagues during the research process that result in particular decisions being made. And decisions need to be made regularly as new issues arise that were not considered earlier.

Along with current graduate student Nicolyn Charlot, I am trying out the following open notebook approach. First, rather than using email for basic communications for a current project (this one), we decided to make use of the “comments” function that is available for every project page (the “word bubble” icon located at the top right of the screen). That way our messages are documented over time as they occur and are embedded with all of our research materials. With notifications for comments set to “on” (under the settings tab), we receive emails when a new comment has been added. Second, because many of our decisions are made during lab meetings and not via email, we decided to briefly document the decision making process following each lab meeting in a shared Google Doc that is linked to a component nested within the main project page titled “Open Notebook”. Here is the open notebook for our project. It is within this component that we communicate using the comments function (click on it and see what we have so far), and where we keep the shared file. Our goal is to have an “Open Notebook” component for all new projects going forward as a way to document the decision making process, and to add a more personal element to the project pages beyond simply being a repository of files.

I am curious to see how it works out.

Week 2: Why Should Science be Open and Reproducible?

In one word: evaluation.

The motto of the Royal Society is Nullius in verba, or “Take nobody’s word for it”. When evaluating scientific claims, this means focusing on the appropriateness of the research process (e.g., study design, analytic design, interpretation of outcomes, and so on) rather than accepting the claims based on the authority of the individual(s) proposing them. To evaluate the reported outcomes of a study, it is therefore critical to have access to details of the research process that produced those outcomes.

For many years now, these details have typically been shared in the print pages of academic journals. Academic journals typically have page budgets, or a contractually agreed upon number of print pages that can appear in a one-year period. Longer individual papers therefore translate into fewer papers published in a given journal each year, meaning that editors have often asked authors to shorten their papers during the review process. Given the complexity of many experimental designs, and the volume of information collected in many large scale observational studies, it is not feasible to include all methodological details or results within each paper. Authors have typically provided as much detail about how the study was conducted as they felt was necessary to interpret the results presented, with the promise that omitted details were available upon request.

But are these details really available upon request? There are many reasons to believe that at least some details would not be. For example, individuals may change careers and no longer have access to these materials; over time we all die and are no longer able to provide these details; issues with storage (of both physical and digital copies) can arise that render the details inaccessible; people are busy and may not have the time to search for these details when asked. And the list goes on. As suggested by the readings for this week, it turns out that the majority of requested research details are not in fact available upon request! This includes access to data and analytic code, things that are important to have available given the non-trivial number of statistical reporting errors in the literature. Nullius in verba implies that we should not take the word of researchers who say “available upon request”.

The paper by Simmons, Nelson, and Simonsohn (2011) in Psychological Science, as well as the interesting information presented on www.flexiblemeasures.com, also suggests that the evaluation of scientific claims requires knowing which research decisions (e.g., hypotheses, use of measures, data analytic plans) were made prior to data collection and/or data analysis versus after. For example, regarding the use of the Competitive Reaction Time Task (CRTT) in research assessing aggressive behavior, it has been shown that responses to this task have been quantified for data analysis in 156 different ways across 130 academic publications (http://www.flexiblemeasures.com/crtt/)! It seems likely that some of the decisions regarding how to quantify the CRTT within these publications were made during the process of data analysis (i.e., using a data-driven approach to determine a quantification strategy that yielded “significant” results), but it is impossible to determine which ones in the absence of pre-registration of data analytic strategies. Simmons et al. (2011) also demonstrate how the application of different Questionable Research Practices (QRPs) (e.g., checking for significance, collecting more data, and checking again; trying out a few different DVs; adding covariates and/or moderators) can inflate the Type I error rate well beyond the 5% level typically adopted by researchers using a frequentist data analytic approach. We need to know when decisions were made during the research process, and why, in order to properly evaluate the reported results of the research.
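To give a concrete feel for this inflation, a simulation in the spirit of Simmons et al. (2011) can be run in a few lines of R. This is a minimal sketch of just one QRP, not their full demonstration: two correlated dependent variables plus their average are tested, there is no true effect anywhere, and a “significant” result on any of the three tests is counted as a success. The correlation of .50 and the sample size of 20 per condition are assumed values for illustration.

    library(MASS)   # for mvrnorm, to simulate two correlated DVs
    set.seed(1)
    n_sims <- 5000
    n      <- 20                                  # participants per condition
    sigma  <- matrix(c(1, .5, .5, 1), nrow = 2)   # the two DVs correlate at r = .50
    false_positive <- replicate(n_sims, {
      control   <- mvrnorm(n, mu = c(0, 0), Sigma = sigma)   # no true effect
      treatment <- mvrnorm(n, mu = c(0, 0), Sigma = sigma)
      p1 <- t.test(treatment[, 1], control[, 1])$p.value
      p2 <- t.test(treatment[, 2], control[, 2])$p.value
      p3 <- t.test(rowMeans(treatment), rowMeans(control))$p.value
      min(p1, p2, p3) < .05                       # "significant" if any of the three tests works
    })
    mean(false_positive)   # the false positive rate climbs noticeably above the nominal .05

Adding optional stopping, covariates, or moderators to the sketch pushes the false positive rate higher still, which is exactly the point Simmons and colleagues make.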

Current technology allows for sharing important details of the research process as the research is being considered, conducted, analyzed, and written up into a manuscript. We are no longer limited by how many print pages a traditional journal is budgeted to print each year and then mail to subscribers. We are also not limited with respect to when we share details of the research process—we no longer have to wait to publish everything we can squeeze into a manuscript.

This is an exciting time to be a scientist, but of course it is scary for some scientists to consider all of these changes occurring in a fairly short period of time. In my opinion, greater openness and transparency of the research process is the future of science. There is so much that is being done, and that can be done, to ensure our research is open and reproducible. Next week we start at the beginning of the research process.

Open and Reproducible Science—Introduction to my new Graduate Course

In 2014 my lab began the transition to using open and reproducible research practices (I wrote a blog post about it). Almost a year later, and after a steep learning curve, I realized I needed to organize my open science. There was a lot of discussion within the field of psychology at that time about the idea of open science, but few suggestions on how to actually implement different open science practices throughout the research workflow for different types of research projects. Along with Etienne LeBel and Tim Loving, I thought it through, published a paper in December 2014 with some specific recommendations (or so they seemed at the time), and then our lab made it up as we went along. To my pleasant surprise I was asked to give a workshop on “doing open science” in November 2015 at the University of Toronto, Mississauga. I really enjoyed talking to faculty and students about this topic, and I was honored to be asked to give similar workshops/talks at many different places over the next two years. Overall, I have now given 14 presentations on open and reproducible science in Canada, the USA, New Zealand, and Turkey (at the bottom of this post there is a list of these talks, including links to slides and recordings where available). I am also very happy to see that in the three or so years since publishing our open science recommendations, many journals in the field of psychology are changing their editorial policies to align with open science practices.

As I developed and tweaked my slides over this two-year period, I learned a lot more about (a) what is being done in different fields to enhance research transparency and reproducibility, and (b) what can be done with existing technology. With all of this information, I decided to create a new graduate course called “Open and Reproducible Science” so I could share with trainees how they can begin their research careers in a way that makes their future publications more open to evaluation and more likely to be reproducible (in many ways) by others (something I suggested was lacking in our graduate training programs here). I put together a syllabus and solicited feedback via Twitter. I received many helpful suggestions, as well as two offers for guest lectures—one by Seth Green from CodeOcean.com, and one from Joanne Paterson, a librarian at Western University. Click here to see what I ended up putting together for this inaugural course. I am excited that 16 grad students from different areas of my psychology department enrolled for this course, beginning January 11, 2018.

My goal is to write expanded lecture notes in a blog post for each week of the class. In these posts I will discuss my planned talking points for each class, as well as flesh out specific examples of how one might use different open science practices throughout the research workflow. Ok, now I need to go re-read the assigned article for the first class: Munafo et al.’s (2017) “A manifesto for reproducible science”.

Invited Talks on Open Science and Replication

2015

  • November 3, Workshop presented at the University of Toronto, Mississauga (Psychology), Canada

2016

  • January 28, Pre-Conference of the Society of Personality and Social Psychology (SPSP), San Diego, USA
  • June 10, Conference of the Canadian Psychological Association, Victoria, Canada
  • October 3, York University (Psychology), Canada (audio recording)
  • October 11, University of Toronto (Psychology), Canada
  • October 19, University of Guelph (Family Relations and Applied Nutrition), Canada
  • October 21, Illinois State University (Psychology), USA
  • November 11, Victoria University Wellington (Psychology), New Zealand
  • November 24, University of Western Ontario (Clinical Area), Canada
  • December 2, University of Western Ontario (Developmental Area), Canada

2017

  • January 19, Workshop presented at Sabanci University, Istanbul, Turkey (with thanks to a Travel Grant awarded to Asuman Buyuckan-Tetik and me from the European Association of Social Psychology)
  • March 10, Western Research Forum Panel Discussion on Open Access: “What’s in it for me?”, London, Canada
  • May 25, Workshop presented at the conference of the Association for Psychological Science (APS), Boston, USA
  • November 10, Plenary address, conference of the Society for the Scientific Study of Sexuality (SSSS), Atlanta, USA

Pre-Registered Publications From Our Lab

Updated

Below is a list of now published studies (as of October 20, 2017) that had pre-registered (a) hypotheses, (b) procedure and materials, and (in most cases) (c) a data analytic plan. We have more original empirical studies under review and in preparation. When I compiled this list I found it interesting that of the six original empirical pre-registered publications, three are in open access journals. We also are currently collecting data for a registered report involving videotaping lab based interactions between romantically involved partners. Our lab has also been active in conducting, and publishing, replication studies. We have published seven replication studies to date, including one Registered Replication Report including data from 16 independent labs. Four of these publications resulted from the group project in my graduate research methods course. Another publication (“Self-esteem, relationship threat…”) was conducted by all members of the lab and involved running over 200 romantically involved couples through a lab based manipulation, one couple at a time; it took one year to collect the data. Upon publication of this paper, however, we did receive our $1000 pre-registration challenge prize money and had a wonderful “lab night out”.

In my view the practice of pre-registration has been helpful in many ways, such as (a) helping clarify what we truly expect to emerge and what we simply think might happen, (b) making us ask ourselves why we are including each measure (why is it relevant/important?), (c) allowing us to develop our data analytic code while data is being collected because we already thought out our data analytic plan, and (d) making it easier to write the manuscript when we are finished given that we have already largely written the methods section, as well as written the rationale for our hypotheses and data analytic plan. I will let others judge if this practice has stifled our creativity (but look at this study, not yet published, before making your final judgement: https://osf.io/yksxt/).

Note: links to the OSF project pages are located on the journal name.

Original Empirical Studies

Dobson, K., Campbell, L., & Stanton, S.C.E. (in press). Are you coming on to me? Bias and accuracy in couples’ perceptions of sexual advances. Journal of Social and Personal Relationships.

Kohut, T., Balzarini, R.N., Fisher, W.A., & Campbell, L. (in press). Pornography’s associations with open sexual communication and relationship closeness vary as a function of dyadic patterns of pornography use within heterosexual relationships. Journal of Social and Personal Relationships.

Balzarini, R.N., Campbell, L., Kohut, T., Holmes, B.M., Lehmiller, J.J., Harman, J.J., & Atkins, N. (2017). Perceptions of primary and secondary relationships in polyamory. PLoS ONE 12(5): e0177841. https://doi.org/10.1371/journal.pone.0177841.

Buyukcan-Tetik, A., Campbell, L., Finkenauer, C., Karremans, J.C., & Kappen, G. (2017). Ideal standards, acceptance, and relationship satisfaction: Latitudes of differential effects. Frontiers in Psychology, doi: 10.3389/fpsyg.2017.01691.

Campbell, L., Chin, K., & Stanton, S.C.E. (2016). Initial evidence that individuals form new relationships with partners that more closely match their ideal preferences. Collabra, 2(1), p.2. DOI: http://doi.org/10.1525/collabra.24

Stanton, S.C.E., & Campbell, L. (2016). Attachment avoidance and amends-making: A case advocating the need for attempting to replicate one’s own work. Journal of Experimental Social Psychology, 67, 43-49.

In Principle Agreement (Registered Report)

Hahn, C., Campbell, L., Pink, J.C., & Stanton, S.C.E. (In principle agreement). The role of adult attachment orientation in information-seeking strategies employed by romantic partners. Comprehensive Results in Social Psychology.

Replication Studies

Babcock, S., Li, Y., Sinclair, V., Thomson, C., & Campbell, L. (2017). Two replications of an investigation on empathy and utilitarian judgment across socioeconomic status. Scientific Data 4, Article number: 160129, doi: 10.1038/sdata.2016.129

Balakrishnan, A., Palma, P.A., Patenaude, J., & Campbell, L. (2017). A 4-study replication of the moderating effects of greed on socioeconomic status and unethical behaviour. Scientific Data 4, Article number: 160120, doi: 10.1038/sdata.2016.120

Balzarini, R.N., Dobson, K., Chin, K., & Campbell, L. (2017). Does exposure to erotica reduce attraction and love for romantic partners in men? Independent replications of Kenrick, Gutierres, and Goldberg (1989) study 2. Journal of Experimental Social Psychology, 70, 191-197.

Campbell, L., Balzarini, R.N., Kohut, T., Dobson, K., Hahn, C.M., Moroz, S.E., & Stanton, S.C.E. (2017). Self-esteem, relationship threat, and dependency regulation: Independent replication of Murray, Rose, Bellavia, Holmes, and Kusche (2002) Study 3. Journal of Research in Personality. https://doi.org/10.1016/j.jrp.2017.04.001.

Cheung, I., Campbell, L., & LeBel, E.P., …Yong, J.C. (2016). Registered replication report: Study 1 from Finkel, Rusbult, Kumashiro, & Hannon (2002). Perspectives on Psychological Science, 11, 750-764.

Clerke, A. S., Brown, M., Forchuk, C., & Campbell, L. (2018). Association between Social Class, Greed, and Unethical Behaviour: A Replication Study. Collabra: Psychology, 4(1), 35.

Connors, S., Khamitov, M., Moroz, S., Campbell, L., & Henderson, C. (2016). Time, money, and happiness: Does putting a price on time affect our ability to smell the roses? Journal of Experimental Social Psychology, 67, 60-64.

LeBel, E.P., & Campbell, L. (2013). Heightened sensitivity to temperature cues in highly anxious individuals: Real or elusive phenomenon? Psychological Science, 24, 2128-2130.

A Commitment to Better Research Practices (BRPs) in Psychological Science

Scientific research is an attempt to identify a working truth about the world that is as independent of ideology as possible.  As we appear to be entering a time of heightened skepticism about the value of scientific information, we feel it is important to emphasize and foster research practices that enhance the integrity of scientific data and thus scientific information. We have therefore created a list of better research practices that we believe, if followed, would enhance the reproducibility and reliability of psychological science. The proposed methodological practices are applicable for exploratory or confirmatory research, and for observational or experimental methods.

  1. If testing a specific hypothesis, pre-register your research[1], so others can know that the forthcoming tests are informative. Report the planned analyses as confirmatory, and report any other analyses or any deviations from the planned analyses as exploratory.
  2. If conducting exploratory research, present it as exploratory. Then, document the research by posting materials, such as measures, procedures, and analytical code so future researchers can benefit from them. Also, make research expectations and plans in advance of analyses—little, if any, research is truly exploratory. State the goals and parameters of your study as clearly as possible before beginning data analysis.
  3. Consider data sharing options prior to data collection (e.g., complete a data management plan; include necessary language in the consent form), and make data and associated meta-data needed to reproduce results available to others, preferably in a trusted and stable repository. Note that this does not imply full public disclosure of all data. If there are reasons why data can’t be made available (e.g., containing clinically sensitive information), clarify that up-front and delineate the path available for others to acquire your data in order to reproduce your analyses.
  4. If some form of hypothesis testing is being used or an attempt is being made to accurately estimate an effect size, use power analysis to plan research before conducting it so that it is maximally informative.
  5. To the best of your ability maximize the power of your research to reach the power necessary to test the smallest effect size you are interested in testing (e.g., increase sample size, use within-subjects designs, use better, more precise measures, use stronger manipulations, etc.). Also, in order to increase the power of your research, consider collaborating with other labs, for example via StudySwap (https://osf.io/view/studyswap/). Be open to sharing existing data with other labs in order to pool data for a more robust study.
  6. If you find a result that you believe to be informative, make sure the result is robust. For smaller lab studies this means directly replicating your own work or, even better, having another lab replicate your finding, again via something like StudySwap.  For larger studies, this may mean finding highly similar data, archival or otherwise, to replicate results. When other large studies are known in advance, seek to pool data before analysis. If the samples are large enough, consider employing cross-validation techniques, such as splitting samples into random halves, to confirm results. For unique studies, checking robustness may mean testing multiple alternative models and/or statistical controls to see if the effect is robust to multiple alternative hypotheses, confounds, and analytical approaches.
  7. Avoid performing conceptual replications of your own research in the absence of evidence that the original result is robust and/or without pre-registering the study. A pre-registered direct replication is the best evidence that an original result is robust.
  8. Once some level of evidence has been achieved that the effect is robust (e.g., a successful direct replication), by all means do conceptual replications, as conceptual replications can provide important evidence for the generalizability of a finding and the robustness of a theory.
  9. To the extent possible, report null findings. In science, null news from reasonably powered studies is informative news.
  10. To the extent possible, report small effects. Given the uncertainty about the robustness of results across psychological science, we do not have a clear understanding of when effect sizes are “too small” to matter. As many effects previously thought to be large are small, be open to finding evidence of effects of many sizes, particularly under conditions of large N and sound measurement.
  11. When others are interested in replicating your work be cooperative if they ask for input. Of course, one of the benefits of pre-registration is that there may be less of a need to interact with those interested in replicating your work.
  12. If researchers fail to replicate your work continue to be cooperative. Even in an ideal world where all studies are appropriately powered, there will still be failures to replicate because of sampling variance alone. If the failed replication was done well and had high power to detect the effect, at least consider the possibility that your original result could be a false positive. Given this inevitability, and the possibility of true moderators of an effect, aspire to work with researchers who fail to find your effect so as to provide more data and information to the larger scientific community that is heavily invested in knowing what is true or not about your findings.

We should note that these proposed practices are complementary to other statements of commitment, such as the commitment to research transparency (http://www.researchtransparency.org/). We would also note that the proposed practices are aspirational. Ideally, our field will adopt many, if not all, of these practices. But we also understand that change is difficult and takes time. In the interim, it would be ideal to reward any movement toward better research practices.

Brent W. Roberts, Rolf A. Zwaan, Lorne Campbell

[1] van ’t Veer, A. E., & Giner-Sorolla, R. (2016). Pre-registration in social psychology—A discussion and suggested template. Journal of Experimental Social Psychology, 67, 2–12. doi:10.1016/j.jesp.2016.03.004