What if I can’t do Open Science?

Note: This post was written for the March 2016 newsletter of the Australian Psychological Society’s Psychology of Relationships Interest Group (PORIG). I have made a few small changes, and have added the following link to a talk on similar issues that I gave at the “Navigating the New Era of Social & Personality Psychology” preconference of the 2016 SPSP conference: https://www.youtube.com/watch?v=QdUtnA8vUn8 (the audio is a bit wonky).

This is a question I hear from time to time, particularly from relationship science scholars. Although the bulk of extant relationship research involves data collected from one individual in a relationship, typically at one point in time (see Kashy, Campbell, & Harris, 2006), a sizeable minority of the field’s work involves much more complex data (e.g., longitudinal, self-report, and observational dyadic data). Adding to this complexity are the time and expense of recruiting dating and/or married couples for these studies, the difficulty of obtaining and coding behavioural/interactive data, and in some cases the expense of obtaining particular measures (e.g., hormonal assays of saliva and/or blood). For these complex studies it is often challenging to obtain the large samples needed to increase statistical power, and thus to reduce both the probability of Type II errors and the false-discovery rate. Moreover, studies of this magnitude rarely set out to test one pre-specified hypothesis; instead, these projects collect a large amount of data across a number of measures/constructs, with the goal of testing many hypotheses yet to be formulated. It is these types of projects (those in development, those currently being run, and those already completed that offer a large amount of available data) that researchers often have in mind when they ask, “What if I can’t do open science?”

What is open science? Briefly, “open science” refers to the public sharing of all aspects of the research process (for more details see https://en.wikipedia.org/wiki/Open_science; see also https://osf.io/3swkp/). This sharing involves, for example, (a) publicly disclosing study hypotheses before testing them, (b) making all study materials and procedures available (e.g., on the Open Science Framework, https://osf.io/), and (c) publicly disclosing a data analytic plan (i.e., how you plan to test your hypotheses given the measures/procedures of your study). Is it possible to “do” open science with the kinds of complex study designs, described above, that many relationship scholars use?

In my opinion, yes; not only is it possible, it is being done. The complex designs employed by relationship scholars (and others), however, do require unique open science solutions. My colleagues Timothy Loving and Etienne LeBel and I suggested many such solutions for these complex designs in a paper published in Personal Relationships in 2014 (Campbell, Loving, & LeBel, 2014). Rather than reiterate the points we made in that paper, I want to briefly share how my research team has engaged in open science practices for different types of research projects in relationship science.

(1) If my research is largely exploratory, how can I publicly disclose hypotheses and data analytic plans?

When research projects are largely exploratory, you can still share your study materials, just as you would for confirmatory projects. You can also briefly state that the project is meant to explore possible associations among certain study variables, and explain the reason(s) for its exploratory nature. If appropriate, you can also provide a set of guidelines for how you plan to explore the collected data (templates for different types of disclosures, for both confirmatory and exploratory research, can be found here: https://osf.io/m7f8d/).

For example, Kiersten Dobson, a graduate student in my lab, posted the following information on the OSF (link here: https://osf.io/4xcpy/): a description of the study (including the planned sample and the analytic goals for the exploratory analyses), study materials, and methods. She then posted a copy of the obtained dataset and described the follow-up research, currently under way, that grew out of the results of the initial exploratory study.

(2) What if the data I am using to test my hypotheses comes from a large dataset that already exists?

In this instance, if the dataset is not your own it may not be possible to publicly post all of the study materials and methods. You can, however, post a document that outlines all of the measures you plan to use from this dataset to test your hypotheses (note that it’s not necessary to include this information in your manuscript; rather, post this information on the OSF and simply link to it in your manuscript). You can also disclose the hypotheses you plan to test and the proposed data analytic plan. If the dataset is your own, you can also post a copy of all study materials.

Early in 2015, Asuman Buyukcan-Tetik, then a graduate student at VU Amsterdam (now a Professor at Sabanci University, Istanbul), visited my lab for three months. We proposed testing new hypotheses by analyzing existing data: a large dataset collected under the supervision of PI Catrin Finkenauer. Before conducting any analyses, Asuman publicly posted and pre-registered (i.e., “froze” the files so that they cannot be edited or deleted) information about (a) the project and hypotheses, (b) the method, and (c) the planned analytic strategy. Note that this strategy included a few alternative options that depended to some degree on the outcome of the initial planned analyses, because we were only partly certain of what we expected to find (link here: https://osf.io/d7x2p/).

In 2014, Rhonda Balzarini, a graduate student in my lab, was given the opportunity to use a large dataset (over 3000 participants) collected by a group of researchers based at universities throughout the USA (PIs: Bjarne Holmes, Justin Lehmiller, Jennifer Harman, and Nicole Atkins). Rhonda and I met regularly for about two months in the fall of 2014 to discuss our research interests with respect to the dataset and to derive specific hypotheses before looking at any of the actual data (i.e., no peeking). Then, prior to conducting any analyses, Rhonda publicly disclosed and pre-registered our hypotheses and the methods and measures to be used in our analyses (link here: https://osf.io/vs574/).

(3) What if I plan to test more than one set of hypotheses in my study? And what if I don’t yet know what the other hypotheses are? How could I possibly “pre-register” them?

Big, complex dyadic studies, as mentioned above, rarely set out to test one set of hypotheses. It is entirely possible, however, to make all study materials and procedures publicly available, and to disclose the first set of planned hypotheses along with a data analytic plan. This is the approach taken by Taylor Kohut, a post-doc in my lab, for a large-scale longitudinal dyadic study with an experimental intervention (3 conditions) initiated at the midpoint of the study. Rather than trying to explain the study here, I will simply refer you to all of the study information (and I do mean ALL of it) posted on the OSF: https://osf.io/yksxt/ (including the study rationale, methods/measures, analytic strategy, and recruitment plan). We are certain to develop new hypotheses in the future, ones that have not yet been considered or discussed. When that time comes, we will simply add a new component to the OSF project describing the new hypotheses, the data analytic plan, and the variables from the original study we plan to use.

(4) What if I can’t make data available? Or code?

There are at least two common concerns with sharing data: (1) other people can use it and benefit from your efforts, and (2) participants might be able to identify their own, or their partner’s, sensitive data. With respect to the first concern, the OSF allows users to create a DOI for any file, including datasets, so that anyone who chooses to use your data can properly cite it. Additionally, OSF users can license their datasets in seconds, making it legally mandatory for anyone using the dataset to properly cite it. With respect to the second concern, there are many ways to de-identify datasets, and also ways to restrict access to datasets to other researchers (e.g., post the data to a private project page or component on the OSF, and grant access to that page or component when other researchers request it). This is admittedly a big issue that warrants a much longer discussion, and we discuss it in more detail in a paper in press at the Journal of Personality and Social Psychology (LeBel, Campbell, & Loving, in press; a pre-acceptance draft of this manuscript is available here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2616384).
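To make the de-identification point a little more concrete, here is a minimal sketch in Python of a few common steps for dyadic data. It assumes a hypothetical layout with one record per partner and made-up column names (`couple_id`, `partner`, `name`, `email`, `phone`, `age`); the function `deidentify` is illustrative only, not a tool from my lab or a feature of the OSF. It drops direct identifiers, replaces each couple's ID with a random code shared by both partners (so the dyadic structure survives), coarsens exact age into 10-year bands, and shuffles row order. Steps like these reduce, but do not eliminate, the chance that partners can recognize each other's responses.

```python
import random

# Hypothetical column names for direct identifiers; adjust to your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def deidentify(rows, seed=None):
    """Return a de-identified copy of dyadic data (one dict per partner).

    Both partners in a couple receive the same new random code, so the
    dyadic structure is preserved while the original couple IDs are dropped.
    Keep any seed private: with the seed and the original IDs, the mapping
    could be recreated.
    """
    rng = random.Random(seed)
    original_ids = sorted({row["couple_id"] for row in rows})
    # Draw unique 4-digit codes (this simple version supports up to 9000 couples).
    new_codes = rng.sample(range(1000, 10000), len(original_ids))
    id_map = dict(zip(original_ids, new_codes))

    cleaned = []
    for row in rows:
        # Drop direct identifiers entirely; the original dicts are not modified.
        new_row = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        new_row["couple_id"] = id_map[row["couple_id"]]
        if "age" in new_row:
            # Coarsen exact age into a 10-year band, e.g. 34 -> "30-39".
            lower = (new_row["age"] // 10) * 10
            new_row["age"] = f"{lower}-{lower + 9}"
        cleaned.append(new_row)

    # Shuffle rows so record order does not reveal the original ID order.
    rng.shuffle(cleaned)
    return cleaned


if __name__ == "__main__":
    example = [
        {"couple_id": "C01", "partner": 1, "name": "Alex", "email": "a@example.com", "age": 34, "satisfaction": 6},
        {"couple_id": "C01", "partner": 2, "name": "Sam", "email": "s@example.com", "age": 31, "satisfaction": 5},
        {"couple_id": "C02", "partner": 1, "name": "Jo", "email": "j@example.com", "age": 27, "satisfaction": 7},
    ]
    for record in deidentify(example):
        print(record)
```

Keeping the shared couple code, rather than giving each partner an independent code, is the design choice that matters for dyadic analyses: the de-identified file can still be analyzed with the same dyadic models as the original.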

In my lab we are moving toward posting the data needed to reproduce the results of any analyses reported in our manuscripts, along with the code/syntax we used to run our models. At https://osf.io/ryfse/ you can find the datasets and code needed to reproduce the analyses in a paper in press at the Journal of Experimental Social Psychology, and at https://osf.io/me7jp/ you can find the data files (available upon request, but physically present in a component linked to this OSF page) and the SAS code needed to reproduce the analyses presented in this publication: http://www.collabra.org/articles/10.1525/collabra.24/. One benefit of this practice is that, before submitting for peer review, we independently re-run all study analyses to confirm that we can reproduce the results reported in the manuscript.

There are undoubtedly other scenarios not discussed here that require novel open science solutions. My goal was to share a few of the questions I have heard most often, and show some of the answers we have come up with in my lab. Open science can be done with complex study designs within the field of relationship science, because it is being done. I therefore suggest that the question I posed at the outset of this piece be changed from “What if I can’t do open science?” to “How can I do open science with this study?”.

We now have the tools to move away from the more “closed science” practices that have been typical of our field until now. Using these tools is of course a choice, but not using them is also a choice. I have chosen to engage in open science practices in my own research going forward. My experiences so far suggest to me that open science practices have not stifled my creativity, limited what I choose to study, limited the exploration of ideas, or otherwise burdened my ability to discover new things (such as they were).

References

Campbell, L., Loving, T. J., & LeBel, E. P. (2014). Enhancing transparency of the research process to increase accuracy of findings: A guide for relationship researchers. Personal Relationships, 21(4), 531-545.

Kashy, D. A., Campbell, L., & Harris, D. W. (2006). Advances in data analytic approaches for relationships research: The broad utility of hierarchical linear modeling. In A. Vangelisti & D. Perlman (Eds.), The Cambridge Handbook of Personal Relationships (pp. 73-90). New York: Cambridge University Press.