The 5Ws of Preregistration

This post is a long-awaited follow-up to my February 2018 post titled “All About Pre-Registration”. I began that post with the following: “Presently there is a large degree of variability regarding the understanding and application of pre-registration in psychological science.” Six years later, there STILL is no consensus about many aspects of preregistration (yes, I am dropping the “-“). During this time I have personally found it a little odd that so many opinions have been expressed in published papers, and particularly on social media, about preregistration, yet the people making these comments likely hold divergent views *of* preregistration. Some of the arguments against preregistration that make me shake my head the most are that “it is not a panacea” (so the bar for introducing practices to potentially improve our science is that they have to be a solution for ALL problems in our field?), “it stifles creativity and exploration” (well, maybe for you…), “I cannot be expected to know all my hypotheses with my big study that has longitudinal components” (totally get it, but when I read your papers you often say “as expected” or “as predicted”, so…), “preregistration is only for purely confirmatory research” (ok, but what is your definition of preregistration, because it is not one I am familiar with), and “people deviate from them so preregistration does not work” (work for what exactly? And let’s forget for a moment that the only way to obtain information regarding deviations between research intentions as stated in a preregistration and actions as written about in a published article is because the preregistration exists). 
But I also hear proponents of preregistration say things like “preregistration is not needed for that type of research” (for now, of course, preregistration is not something that seems to be “needed” to publish research papers in general, in that it is not universally mandatory, but this statement implies there is only value in preregistering certain kinds of research activities, as well as only certain types of information regarding these research activities).

Given the lack of general agreement about the nature and goals of preregistration, I have put together in this post answers to the 5Ws of preregistration: Who? What? Where? When? and Why? The answers are obviously not agreed upon by everyone (see the first paragraph of this post), and they are based on my own experiences with preregistration and other open science practices over the past ten years. That includes working with my own lab to generate preregistrations for our research that ranges from primarily confirmatory to primarily exploratory (and everything in between), discussing the practice with colleagues when invited to present on the topic at conferences as well as in invited departmental talks, reading social media posts way too often, and publishing papers on the topic (e.g., Campbell, Loving, & LeBel, 2014; LeBel, Campbell, & Loving, 2017; Moshontz et al., 2018; Nosek et al., 2019).

To begin, I put together a cheat sheet of the five Ws of preregistration, with brief answers, below:

The 5Ws of Preregistration:

Who? – Any individual or group of researchers that evaluates ideas via the collection and analysis of data.

What? – Preregistration is both a concept as well as a concrete action, or actions. A working definition of preregistration that is generally agreed upon by those tasked with engaging in the practice is therefore needed to encourage progress on standardizing the practice of preregistration. An agreed-upon definition, and a subsequent translation of this definition into a standard set of practices regarding preregistration, currently does not exist.

When? – Typically, preregistration should occur prior to carrying out the proposed actions that are specified in the preregistration.

Where? – Preregistration information should be timestamped and publicly available in perpetuity without the option to delete shared information. Any updates or additions to a preregistration should be transparent to anyone viewing the preregistration information.

Why? – The goal, or goals, motivating the practice of preregistration. There is currently a great deal of disagreement regarding the purpose of preregistration, meaning it is possible for one person to declare that preregistration “works” and another to declare that it “does not work”, with each statement being correct given the different inferred goals of the practice. This is not a desirable situation.

In the next sections, I provide expanded answers to the five Ws of preregistration (this is the first draft of this post on January 30, 2024, and I will likely add to it and lightly edit).

Who?

In my field of psychology, people typically develop and advance their careers by generating research questions and/or hypotheses that can be assessed by designing research studies and comparing study outcomes with expectations. Results that contrast with expectations can be very useful to generate novel insights. A researcher may also evaluate data in the absence of expectations or with vague expectations, developing new research ideas based on the pattern of observations across the set of available variables in a given data set. Overall, any researcher that obtains data for the purpose of discovery has the option to share up front in an open and transparent manner their research intentions and planned actions.

What?

In January of 2024 I conducted an informal internet search for definitions of preregistration. The top hits of my search appear below.

APA.org: “Preregistration allows researchers to specify and share details of their research in a public registry before conducting the study.”

COS.io: “When you preregister your research, you’re simply specifying your research plan in advance of your study and submitting it to a registry. Preregistration separates hypothesis-generating (exploratory) from hypothesis-testing (confirmatory) research.”

“What is pre-registration? Pre-registration involves making hypotheses, analytic plans, and any other relevant methodological information for a study publicly available before collecting data.”

COS from May 13, 2023: “Preregistration is the practice of documenting your research plan at the beginning of your study and storing that plan in a read-only public repository such as OSF Registries or the National Library of Medicine’s Clinical Trials Registry.”

Surrey.ac.uk: “The goal is to create a transparent plan, ahead of beginning your study/accessing existing data. So, your preregistration will include details about the study, hypotheses (if you have them), your design, sampling plan, search strategy (for reviews) variables, exclusion and inclusion criteria, and the analysis plan.”

The Administration for Children and Families (USA): “Pre-registration is the practice of deciding your research and analysis plan prior to starting your study and sharing it publicly, like submitting it to a registry.”

PLOS.org: “Preregistration is the practice of formally depositing a study design in a repository—and, optionally, submitting it for peer review at a journal—before conducting a scientific investigation.”

Common Themes: Publicly sharing details of planned research projects in an appropriate registry prior to carrying out the planned research actions.

My working definition of preregistration:

Stating as clearly and specifically as possible what you plan to do, and how, before doing it, in a manner that is verifiable by others (From: https://www.lornecampbell.org/?p=181)

All of the above definitions of preregistration are consistent in that they refer to publicly sharing intentions prior to actions. Where many differences of opinion occur is not with these somewhat vague definitions, but instead with the translation from the conceptual (“create and publicly share a research plan prior to carrying out that plan”) to the concrete (ranging from “you can share plans for all of your research” to “share only what is needed and/or useful for specific data-analytic decisions for confirmatory hypotheses and nothing else”). It seems to me, therefore, that there is actually some general agreement regarding the higher-order definition of preregistration, but important differences between researchers on the translation of this definition for different types of research.

In the figure below, I refer to research intentions and actions. With respect to intentions, I roughly dissect research projects into explorations and hypothesis testing research. Explorations loosely include purely descriptive research (e.g., assessing base rates of behavior in a given population) as well as “fishing” (e.g., calculating correlations between study variables, running a lot of models that include different variables or the same variables with different combinations of items, trimming the sample, and so on). Hypothesis testing includes evaluating vague hypotheses (e.g., two variables should be positively correlated, the mean for group A should be higher than group B) as well as very concrete hypotheses (e.g., using data obtained via specific measures, the correlation between two variables should be in a given range, or the means for Groups A and B should be in a given range and differ by more than 1 unit). These categories are simply meant to cover research ranging from very limited expectations regarding outcomes at one end to very specific expectations regarding outcomes at the other. With respect to actions, I list four categories of the types of concrete information regarding a researcher’s intentions that can be publicly shared. What is obvious from this conceptualization of the research process as it relates to preregistration is that it is primarily the degree of specificity of the information shared (actions) that changes as a function of the research intentions, NOT whether information should be shared at all. In other words, you can preregister all of your research projects, with some preregistrations including a lot of very specific information compared to others. If at this point you disagree with what I just said regarding preregistration, it is very likely we have different views on the goal(s) of preregistration. So keep reading until the end.

When?

Prior to doing what you plan to do (will be updated with more information)

Where?

OSF, aspredicted.org, git/github (will be updated with more information)

Why?

Overall, what is the desired outcome, or outcomes, associated with the practice of preregistration? Ask people that believe in the value of preregistration, and those that do not, this question and I predict that you will get a variety of responses. Whereas researchers may have general agreement about a higher-order definition of preregistration, there are important differences of opinion with respect to what types of research should be preregistered, and I believe this is largely because of disparate beliefs about the goal(s) of preregistration. What does it help with?

Before moving on, when researchers ask this type of question it sounds like this in my head: “If you want me to share my research intentions with you before I have done said research then you need to prove to me that there is some value in doing so or else I declare it is all a waste of time.” Here is something we wrote in a paper published in 2014: “Ideally we should not need to persuade researchers of the benefits of disclosing details of the research process; instead, researchers should need to provide solid rationale for not openly sharing these details.” (Campbell, Loving, & LeBel, 2014, p. 542). I suppose what I am about to say now is somewhat controversial, but, consistent with the 2014 version of myself, I believe researchers should preregister by default unless there is a solid rationale for not doing so, and that rationale should itself be shared.

Primary Goal of Preregistration: The open and transparent sharing of the decision-making process, from intentions to proposed actions, for a research endeavour.

The availability of this information should allow for achieving many sub-goals (when relevant to a particular research endeavour), such as:

  • Defining research questions and/or pre-planned hypotheses as formed by the researcher(s) prior to collecting data and/or examining existing data
    • Often described as delineating exploratory from confirmatory tests of research questions and/or hypotheses
      • Whereas some explorations can be planned in advance (e.g., obtaining base rate information for some outcomes in a given group or groups; wanting to see differences in a variety of measures between particular groups; estimating correlations among responses to particular measures), other explorations occur during the process of examining the data obtained to assess the research questions and/or hypotheses
  • Detailing with appropriate specificity the research methods the researcher(s) plan to use to evaluate the research questions and/or hypotheses, including:
    • Target population and planned sample
    • Rationale for sample size
    • The planned information to be obtained and how the researcher(s) will obtain this information (e.g., want to obtain the age of participants in the sample, so will ask study participants to indicate their age in years and months or ask study participants to indicate their date of birth; want to assess self-esteem so will ask participants to complete a particular self-esteem scale or scales)
    • Any planned manipulation in the study design, described in appropriate detail to allow others to implement the manipulation in a like manner
    • The planned procedures for conducting all aspects of the study
    • Any data exclusion rules
    • Plans for how data will be analyzed in specific detail as appropriate given the nature of the research questions and/or hypotheses as well as the methods used to obtain the data
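
To make the above concrete, here is a minimal sketch of the skeleton such a preregistration might take. Every study detail below is hypothetical and purely illustrative; real templates, such as those available via OSF Registries, are considerably more detailed:

```
Title: Does self-esteem relate to life satisfaction? (hypothetical example)

Hypotheses
  H1: Scores on measure A will be positively correlated with scores on measure B.

Sampling plan
  Target population: first-year undergraduate students
  Planned N: 200 (hypothetical; justified by a power analysis in the real document)

Measures
  A: a 10-item self-esteem scale (named in full in the real document)
  B: a single-item life-satisfaction rating

Exclusion rules
  Participants who fail either of two attention checks will be excluded.

Analysis plan
  Pearson correlation between A and B; two-tailed test, alpha = .05.
```

The point of the skeleton is not the particular entries but that each decision is stated before the data are examined, at whatever level of specificity the research intentions allow.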

Secondary Goals of Preregistration: Preregistration is believed by some advocates to help increase the quality of published research reports in the following ways:

  • Eliminate HARKing, or hypothesizing after the results are known. In other words, viewing the results from a given set of analyses and then tailoring hypotheses to match the results obtained
  • Eliminate p-hacking, or conducting many statistical tests in many different ways in order to obtain a p value that is lower than the traditionally accepted cutoff of .05, and then presenting this result (or results) as the only test that was conducted to test the hypothesis
  • Eliminate outcome switching, or stating that a given result was the primary test planned all along when in fact it was not
  • Enhance the severity of a statistical test of a concrete hypothesis by following the pre-specified data analytic plan
  • Enhance the reproducibility of the study methods and procedures, as well as the data analyses. Each type of reproducibility can assist with (a) the evaluation of study claims, as well as (b) re-running the study to determine the replicability of the findings in another sample of participants
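
The p-hacking concern in the list above can be illustrated with a short simulation (a sketch with illustrative numbers, not a model of any particular study). Under a true null hypothesis, p-values are uniformly distributed between 0 and 1, so a researcher who tries many independent tests and reports any p < .05 will “find” a significant result far more often than 5% of the time:

```python
import random

# Under a true null hypothesis, p-values are uniform on (0, 1), so we can
# simulate each test's p-value directly as a uniform random draw.
random.seed(42)

k = 20           # number of tests tried per simulated "study" (illustrative)
n_sims = 10_000  # number of simulated studies
alpha = 0.05

# Count studies in which at least one of the k tests comes out "significant".
false_positive_studies = sum(
    any(random.random() < alpha for _ in range(k))
    for _ in range(n_sims)
)
rate = false_positive_studies / n_sims

print(f"Chance of at least one p < .05 across {k} tests: {rate:.2f}")
# Analytically this is 1 - (1 - .05)**20, roughly .64 rather than .05.
```

A pre-specified analysis plan removes exactly this degree of freedom: the one test reported is the one test that was planned.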

Any preregistration, even one that is poorly constructed, helps achieve the primary goal of being open and transparent regarding one’s intentions and proposed actions. Some preregistrations may assist with achieving the secondary goals, whereas some may not. But being open and transparent with your research intentions and proposed actions in a preregistration achieves the primary goal of preregistration. So do it.

References

Campbell, L., Loving, T.J., & LeBel, E.P. (2014). Enhancing transparency of the research process to increase accuracy of findings: A guide for relationship researchers. Personal Relationships, 21, 531-545. DOI: 10.1111/pere.12053

LeBel, E.P., Campbell, L., & Loving, T.J. (2017). Benefits of open and high-powered research outweigh costs. Journal of Personality and Social Psychology, 113, 230-243. DOI: 10.1037/pspi0000049

Moshontz, H., Campbell, L., … Chartier, C.R. (2018). The psychological science accelerator: Advancing psychology through a distributed collaborative network. Advances in Methods and Practices in Psychological Science, 1, 510-515. DOI: 10.1177/2515245918797607

Nosek, B.A., Beck, E., Campbell, L., Flake, J.K., Hardwicke, T.E., Mellor, D.T., van’t Veer, A.E., & Vazire, S. (2019). Preregistration is hard, and worthwhile. Trends in Cognitive Sciences. DOI: 10.1016/j.tics.2019.07.009

Reflections on Supervising Graduate Students

I stumbled across an email I sent within the past two years reflecting on my general approach to working with graduate students (context for sending the email not important). I made a few edits to remove names and shorten it a little. Here you go…

I always do my best to let the student know that this is their education and their future, not mine. I have no particular expectations from students for myself; rather, I want to do what I can to help them achieve what they would like to get out of graduate school. If they want to do a lot of research and publish papers, then I support data collection and prioritize writing with them. If they want to do some research but want to focus energies on teaching, etc., then I pivot my support accordingly. In more recent years, I have had students interested in using their research and teaching skills in industry or in staff (compared to Faculty) positions at Universities. I provide the guidance that I can at the time, and when I lack expertise/knowledge I am ok admitting that to the student and discussing what we can do to find information for their interests.

I try not to get in their way, and I try to provide support when it is needed along the way. I also do my best not to take credit for their work, and promote them when I can (e.g., within the department, to colleagues, on social media by sharing their work and achievements). In other words, let them take centre stage for their work because they need it and deserve it. 

I have had students, sometimes in tears, in my office concerned about never being able to do enough, be good enough, etc., compared to all of their peers. First, I simply let them speak and listen to them. I also don’t tell them they are wrong, but rather tell them their feelings are understandable. I discuss the difficulties surrounding their position and the future job market for R1-type jobs (I use the term “R1” because a lot of people understand that to mean “research focused position at what is considered a top tier University”). Then I usually draw my silly picture of a stick person standing in front of what looks like a big mountain. It is impossible to jump to the top of the mountain (simply a long line in front of my stick person). Then I draw a ramp from the top of the mountain that eventually rests on the ground, and suggest to the student that the best way to the top is to go behind the peak and take the ramp. They will get there, but it takes time. The problem with being in year 1/2 of graduate school is that they only see where they need to get to (top of the mountain), and cannot see the many years of training and great things they will likely accomplish during that training as they reach the top. But they will learn a lot, they will do great things, and they will finally reach the top. What is waiting for them is uncertain, but this is ok. The point is they are allowed to feel stressed about the ultimate goal and reaching the top of that mountain, but for now they need to focus on a series of shorter term goals. For example, in year 2 of our Masters program the goal is to defend their Masters thesis as well as be a part of non-thesis related research. Pretend the defense date is in July. 
Now we work backwards from there, and drop in milestones that need to be achieved to defend (e.g., final draft to committee, time to write final draft, final date of data collection in order to run planned analyses and write results/discussion, [all the other things here], have idea worked out with advisor and share with committee). At this point, I suggest they only really have to focus on what needs to be done this week, knowing that if they achieve these smaller goals along the way the big goal of defending in July will happen. And part of this process includes, for me at least, dealing with conflicting duties. If I want to do something this week, but I am worried about another task that needs to get done, I simply ask myself what needs to get done first? Once I decide, I tell myself not to worry about the other task anymore unless I plan to drop everything else and work on that task. So, don’t worry about something I know I am not going to do today, knowing that it will get done later, and knowing that I am doing something else important today. 
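
The backwards-planning exercise above can even be sketched in a few lines of code. This is a toy illustration only: the defense date, milestone names, and lead times below are all hypothetical, and a real plan would be worked out with the student and committee:

```python
from datetime import date, timedelta

# Hypothetical milestones, counted backwards from the defense date.
# Lead times (in days before the defense) are illustrative, not prescriptive.
defense = date(2024, 7, 15)
lead_times = [
    ("final draft to committee", 21),
    ("finish writing final draft", 45),
    ("final date of data collection", 90),
    ("share thesis idea with committee", 240),
]

# Convert each lead time into a concrete due date.
schedule = [(task, defense - timedelta(days=d)) for task, d in lead_times]
for task, due in schedule:
    print(f"{due.isoformat()}  {task}")
```

Once the milestones have dates, the student only has to look at what is due this week, trusting that the small goals add up to the big one.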

I also tell students not to compare themselves with their peers (as best they can). I did this all the time of course, but after getting my first tenure track job straight out of grad school (5 years in the program), these comparisons were not good for my self-esteem! (so many people with longer programs and then post-docs; e.g., some people I knew did 6 years of post-docs before getting a TT position, so I did not want to compare pubs early on with them!). Instead, compare yourself with yourself a year ago. Have you made contributions at a rate that you feel is good for you? Being able to do this has helped me a great deal these past 22 years post PhD. 

Lastly, different people sometimes need/want different approaches to supervising. I remember one student (now a faculty member at a great place) having moments of doubt. On a few occasions we had heart to heart talks about their doubts, with the student wondering if they were good enough. I listened a lot longer in those moments before really digging in with my feedback. But it usually ramped up to me writing down papers they had published, had under review, data they were working with at the moment, papers they were writing, and projects that were planned, etc. I then counted up stuff that was “accepted”, likely to be “accepted”, and likely to be sent out in the near future (and how I thought it would get published—just a matter of where it will be published and when). I think we did that at least 10 times. Another student who is also a faculty member at a great place was different. This student did not doubt being good enough (and I do NOT mean that in a bad way at all; I simply mean they had healthy self-confidence), but had other concerns that we worked through. 

Long story short, I see my job as helping them achieve to the best of our abilities what they would like to achieve with their graduate education. That’s it and that’s all (cheesy pop culture reference that will not make much sense in the near future, inserted here as evidence that I am an aging Professor).

How to Create a New Open Access Academic Journal with no Fees

There are many concerns regarding for-profit academic publishers realizing huge profit margins for distributing research products that cost them no money. That is, I conduct my research with my own resources and submit a manuscript detailing my results to a journal, but the journal did not incur any costs for my research. There are also many concerns regarding open access journals that charge high author processing fees (or APCs). A lot of the debates about all of these concerns seem to boil down to these two options being the ONLY two options for publishing academic research. Just goes to show that no matter how many people with PhDs get involved it doesn’t mean that the problem will get solved. So, how might any one of us start our own peer reviewed open access journal with no APCs? A few different options it seems. Here are a few of my thoughts after thinking about this for a few hours one morning.  

First, a little bit about my background with academic publishing. I am a social psychologist who has been publishing research in academic journals for over 20 years. I have been a reviewer, an associate editor, and was the editor of Personal Relationships (Masthead Editor between 2010-2014, but started processing new submissions in July 2008). As the editor of PR I put together a team of associate editors, worked with the for-profit publisher to move the submission/review process to Manuscript Central (up to that point the submission/review process was run entirely via email and Excel spreadsheets), and processed approximately 160 submissions per year for four years. That said, I am not up to date with current publishing software and technologies, and I do not have the skills relevant to creating these technologies.  

So, with only my current working knowledge of available free online platforms and technologies, and not being in the publishing industry, I attempt to MacGyver (that should give some hints of my time here on earth, and means “to make, form, or repair (something) with what is conveniently on hand”) a new open access academic journal that has no author processing fees and has minimal expenses for the people operating the journal. My research has typically focused on close relationships (particularly romantic relationships), so I will create here a new journal called Close Relationship Science ™. Here is what I came up with, but I would love to hear how this could be improved or even replaced by something better. 

Getting Set Up

  • Write a proposal for the new journal (here are some things to consider)
    • Executive summary that outlines the vision of the new journal, including why it is being introduced, aims and scope, and so on
    • A longer plan that discusses each of the above points in more detail
      • Include a discussion of existing journals that publish similar material and their current status as “open vs. closed” regarding access and open science practices, whether they are controlled by a society and/or a for-profit publisher, and so on
      • Include a consideration of the growth of the journal in the first five years of operation
  • Share the proposal with a few others that you would like to recruit to help get the journal started
    • Have some meetings and establish a temporary board of directors for the establishment of the new journal
  • With the board establish some preliminary operating procedures regarding governance of the new journal as well as policies for the submission/review/publishing process
  • With the board decide on the initial editor (likely one of the board members, but who knows)
  • Set up the Open Journal Systems (OJS) platform for the journal (see next section below)
  • Register domain name for the new journal (~$15 / year based on a few quick searches) and put together a basic website for the journal
    • I have limited technical skills, but even I at this point can put together a basic website and make it viewable by the world
  • Use the new website to advertise the existence of the new journal, its aims and scope, the board of directors, and the new editor
  • At the same time establish a social media presence for the new journal on various platforms
  • Put out a call for people to be part of the editorial board to review new submissions via various sources, such as social media outlets, society chat/email groups, personal emails from the editor/board members
  • When the team has been assembled, advertise that the journal is now open for new submissions

The submission/review/publishing process

  • One good option seems to be the Open Journal Systems (OJS) platform. The open access journal Meta Psychology uses this platform, and my own University (Western University, in London, ON, Canada) “offers a no-fee local publication facility for students and faculty who wish to publish an online open access journal” (link here)
    • A list of journals here that Western University currently helps host/manage using the OJS platform
    • Because this is my fantasy, I will use the OJS platform and seek guidance from the librarians at Western to get the process started
  • Use available resources (e.g., librarians, web links, real people that are kind) to get the journal completely set up on the OJS platform. Here is one such resource.
    • Do as many trial runs with the website as needed with mock submissions to make sure it works properly for submitting authors, the editor, and reviewers
  • Use this system to accept submissions, assign reviews, and see submissions through to publication
  • All publications will appear online only and be assigned a DOI
  • Use the social media sites created for the journal to share news, insights, and publications

Overall it seems possible for one person to begin the process to create a new academic journal that is free to authors and has minimal expenses otherwise. Now imagine if one of the societies you are a member of put together a task force to look into creating their own open access journal that is free to authors and used some of your yearly membership fees (or some of those massive conference fees) to support the journal? It’s easy if you try. 

Psychology Grad Course: Open and Reproducible Science

I have written several posts about the open and reproducible graduate course for psychology that I created and have taught a few times. This post is simply to pull together all relevant links to one location. Below I link to two versions of the syllabus as well as to my lecture summaries for each class. The lecture summaries do not reflect the only material discussed in each class, but they do highlight the main theme for each class.

Syllabus:

  • Syllabus from 2018 (first time teaching this course)
  • Syllabus from 2020

Lecture Summaries (based on 2018 syllabus):

  • Week 1: Introduction to the course
  • Week 2: Why Should Science be Open and Reproducible?
  • Week 3: Open Notebook
  • Week 4: All About Pre-Registration
  • Week 5: Power and Study Design
  • Week 6: Sharing Procedures, Materials, and Data Analytic Plans
  • Week 7: Data Management Plans
  • Weeks 8 and 9: Sharing Data as well as Code/Syntax
  • Week 10: Openly Sharing Research Reports/Manuscripts
  • Week 11: Extended Research Networks
  • Week 12: Transforming Discovery

Week 12: Transforming Discovery

In this final week of the class I discussed the idea of transforming the process of discovery in our own labs. When I started my undergraduate program in 1992 I learned about the discoveries of others in my courses and textbooks. When I started my graduate program in 1996 I also started to implement what I had learned about the process of discovery with my own research. During this time I often asked myself, and others, “what is typically done by other researchers in this area of research?” I wanted to make sure I was doing what was considered acceptable at the time. For example, using a particular measure of attachment orientations because it seemed to be used a lot and thus would not be questioned by reviewers. Or using a particular analytic method because it was frequently used and thus seemingly defensible during the review process. And when it came time to analyze data I learned to run a lot of different models with a number of different combinations of responses across multiple measures to find results that were consistent with our original expectations. Then, when it came time to write manuscripts, I learned not only what results to include but also what to exclude. It simply felt to me that “this is how science is done” in our field, so I did it that way.

But really it did not have to be done in the manner briefly described above. I was searching for what I thought was the best way to be a scientist and in the process also discovered the existing norms for how to be a scientist. Sometimes existing norms overlap with what is best for scientific discovery and dissemination, but sometimes they don’t. I don’t claim to know exactly how scientists should go about making their discoveries, but I do feel strongly that when we feel confident enough to share our results publicly we need to also share to the best of our abilities at the time how exactly we obtained those results. For me that is like a latent construct of open science practices–be open and transparent–that influences what I do throughout the research process. In the early days of the open science movement, I felt that actions could speak just as loud as all the words that were flying around. When others were wondering out loud if sharing details of the research process was worth it, might be costly to the researcher, etc., I felt that I could simply point to our own experiences as examples of how it could be done. Yes, you can preregister hypotheses and *still* conduct analyses not planned in advance. Yes, you can even preregister exploratory research and there is value in doing so. Yes, you can share the measures you used in your study. Yes, you can be open and transparent with longitudinal research designs. Yes, you can share data all the time (sometimes publicly, sometimes via other means). Yes, you can share the syntax you used to produce the results you presented in your manuscript. Yes, you can preprint your work. And on and on. Debates are fun and all, but when many researchers pondered whether they could/should do these things, we simply did them. 
One of my hopes was that when new graduate students were learning about the process of discovery, they might stumble across some of our open and transparent research practices and think it was something they could do, something that was becoming normative in the field. Whereas some colleagues saw open science practices as warning flags for the process of discovery, I encouraged my students to see them as challenges we have the opportunity to solve within our own process of discovery. With the existence of today’s technologies there are numerous ways to share our research process and make it available for scrutiny, and no solid arguments for keeping this process “available upon request”. That is also the reason why I wanted to teach a course on open and reproducible science. I wanted to do what I could to share these tools with early career researchers in hopes that they would see value in adopting them in their own research.

My final take home message here: when sharing the results of your research also share how you obtained those results as openly and transparently as possible.

I feel relieved to finally finish this series of posts to accompany the weekly lectures for my course on open and reproducible science. It seemed obvious to me that being open and transparent could also apply to the courses we teach, meaning we could share syllabi and course notes. This series of posts serves as my own personal lecture notes for each class. If you have read them I hope they have been of some value.

Week 11: Extended Research Networks

In this class I introduced to students the idea of scaling up open science practices for use in extended research networks. When I first taught this course many of these initiatives were relatively new and some were untested, and the students were excited about the possibility of these large scale collaborations. I will only discuss a few of the current extended research networks in this post.

One of the earlier extended research networks that incorporated open science practices was the Registered Replication Reports (RRR) initiative, originally offered by Perspectives on Psychological Science and headed up by Daniel Simons. The basic idea was that individuals or groups of researchers would propose a study that they would like to re-run on a large scale with a number of independent labs, after input from the original author(s). When a proposal was approved the submitting researchers worked closely with Dan and the original author(s) to reproduce the methods of the original study as closely as possible (or in some cases settle on a particular method/approach they felt was optimal to assess the effect of interest). A call would then go out to the research community, asking others to use the agreed upon methods/measures and collect a given amount of data to contribute to the project. All study details were shared with this extended group on the Open Science Framework. When the data were collected, members of the extended research team submitted their data to the person in charge of overseeing the statistical analyses. This person was not part of any of the research groups, and the statistical analyses as well as the syntax used to run them were agreed upon in advance. The team that submitted the original proposal for the replication project worked with Dan and the original author to draft a methods/results section in advance of knowing the results; the goal was to be able to drop the results into the already prepared manuscript. When all was ready the results would be revealed. I participated in one of these projects (I wrote about it here). Here is a link to the final product. Overall these projects focused on large-scale replication research; RRRs are now offered via the journal Advances in Methods and Practices in Psychological Science.

Another successful extended research network is the Many Babies initiative. From their website, Many Babies “is a collaborative project for replication and best practices in developmental psychology research. Our goal is to bring researchers together to address difficult outstanding theoretical and methodological questions about the nature of early development and how it is studied.” The basic idea here is to enhance collaboration between labs all over the world that collect data from babies in an open and transparent manner. This also helps with increasing sample sizes, given that any individual lab faces challenges collecting data from large samples of babies. Check it out.

Lastly I will mention the Psychological Science Accelerator. From their website: “The Psychological Science Accelerator is a globally distributed network of psychological science laboratories with 1328 members representing 84 countries on all six populated continents, that coordinates data collection for democratically selected studies.” They have many committees to assist with every aspect of the research process for everyone involved (e.g., translation, ethics review, statistical analyses, and so on), and the entire process is guided by open and transparent research interests. I was part of some of the early discussions of this initiative and have been very impressed with the leadership team during the handful of years it has existed. They are truly inspirational. This type of large-scale extended research network seems to be an ideal way to test ideas with lots of data, but more importantly data from all over the world. This allows for testing group/cultural differences in the effects of interest. Check out the results from the first project of this initiative here.

I have not gone into any detail on the Many Labs projects that sparked a lot of discussion, or other initiatives that sought to bring together researchers from different universities and countries to collectively test hypotheses in an open and transparent manner. Overall, there are many exciting options available to researchers at all stages of their career to get involved in these extended research networks.

Week 10: Openly Sharing Research Reports/Manuscripts

When I first taught this course, preprint servers, and other online resources for sharing research reports and manuscripts, were not as popular or well known as they are today (Spring 2023). My goal with this class was to introduce the idea of sharing manuscripts prior to/after publication, as well as in lieu of publication in a peer reviewed journal. I showed them a few different options available at the time, including the one hosted by the library system at Western University (where we are located).

Overall the students seemed concerned about how it would be perceived to share a manuscript publicly before it was accepted for publication at a peer reviewed journal (e.g., “will the journal want to publish my paper if I have already ‘published’ it?”). As part of this discussion I showed them Sherpa Romeo, a site that lets one view the open access policies of many journals and thus helps one determine whether they can/should share a preprint of a manuscript. The students were also concerned, however, with sharing a copy of the paper that was accepted for publication in case the journal would forbid this practice (and maybe even revoke acceptance of a manuscript); Sherpa Romeo is helpful here as well. A lot of fear is associated with sharing outside the mainstream publication system! Fair enough; that is why I teach this material in the class and have an open discussion where I make sure to listen to the concerns of the students.

In this class I also discussed thinking beyond the typical research report as material worthy of sharing publicly. For example, stimuli used in the research that will not be part of the manuscript but that others may want to use for their own research. I discussed how they could share this material in such a way that it could be both used and cited. It was appealing to the students to think that they could have aspects of their research beyond the manuscript itself appear in, for example, Google Scholar and also be cited. The same goes for unique methods as well as data sets. Lastly, we discussed the idea of open peer review and its associated pros and cons.

I have been sharing preprints for many years now, mostly (but not exclusively) on PsyArXiv. Most of the manuscripts shared there are now published in peer reviewed journals, but some are not. For example, here is a brief paper now published at the Journal of Research in Personality that is also on PsyArXiv. Google Scholar tells me the published paper has been cited a whopping 4 times. But as you can see on PsyArXiv it has been downloaded over 2000 times to date. This may mean absolutely nothing, but perhaps it means that the paper is having an impact not measured by citations alone. Also, you can see on PsyArXiv that after the paper is published in a peer reviewed journal the author(s) can update the preprint with the published DOI. One example of a manuscript that exists only as a preprint focuses on a qualitative analysis of “ghosting” (in this case relationship dissolution by ending all contact with a partner) that was led by former awesome graduate student Rebecca Koessler. This paper has been downloaded over 3000 times, suggesting it has been helpful in some way to others; if it had remained tucked away on our hard drives it obviously would not have received this level of attention. Interestingly enough this preprint has also been cited 11 times according to Google Scholar. From this perspective it was therefore of value to share this research as a preprint even though it was not published in a peer reviewed journal. My approach to open science practices has been to lead by example, so I appreciate that my own experiences with sharing preprints have resulted in noticeable attention to the research whether or not the paper is published in a peer reviewed journal. I will likely use these papers, and others, as examples of sharing preprints if I teach this course again.

Weeks 8 and 9: Sharing Data as well as Code/Syntax

In week 8 of this course I discussed the importance of sharing data (when appropriate) and the code/syntax used to produce the results presented in a manuscript. This is something I have discussed in a prior blog post. Rather than dive into the details of how this can be accomplished, I will instead ask the reader to imagine the times they finished up a manuscript that included data analyses to submit for review. When analyzing data, you, the researcher, had (1) a data set, (2) knowledge of how that data set is organized, (3) knowledge of how all variables of interest were collected and/or computed, and (4) data analytic software that could access the data set and execute analyses of interest. All of the results you selected to present in your manuscript required code/syntax to generate. This code/syntax was either written by you, the researcher, or by the program you used if you opted to use drop down menus to make analytic selections. At any rate, prior to completing that manuscript to submit for peer review there was a time when you had access to and knowledge of the data set being analyzed as well as all of the code/syntax used to generate all of the results selected to be presented in your manuscript. The take home message from this lecture: share all of this information in some way so that others can evaluate and reproduce your results. A lot has been written and published on how to do this. So just do it.
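To make the four ingredients above concrete, here is a minimal, purely illustrative sketch (in Python rather than R, and with made-up variable names and data) of what a single shareable analysis script can look like: the data, the documentation of how composite variables are computed, and the code that generates the reported numbers all travel together.

```python
# Illustrative sketch only: a self-contained, shareable analysis script.
# All file, variable, and item names here are hypothetical.
import csv
import io
import statistics

# In practice this would be a data file posted alongside the manuscript
# (e.g., on the Open Science Framework); an inline string keeps the
# sketch self-contained.
RAW = """participant,item1,item2,item3
1,4,5,3
2,2,3,2
3,5,5,4
"""

def scale_score(row):
    # Document HOW composite variables were computed, not just that they exist:
    # here, the mean of three hypothetical items.
    return (int(row["item1"]) + int(row["item2"]) + int(row["item3"])) / 3

rows = list(csv.DictReader(io.StringIO(RAW)))
scores = [scale_score(r) for r in rows]

# The descriptive statistics reported in the manuscript should be
# regenerable from this script alone.
print(f"N = {len(scores)}, M = {statistics.mean(scores):.2f}, "
      f"SD = {statistics.stdev(scores):.2f}")
```

The point is not the particular language or statistics; it is that everything a reader needs to reproduce the reported results already existed on your machine at submission time, so sharing it is mostly a matter of packaging.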

In week 9 I was fortunate to have Seth Green, who at that time was a Developer Advocate at Code Ocean, visit my class. Code Ocean is one platform for sharing data and code such that others can re-run your analyses (or create their own analyses with your data) in the cloud, without needing to download anything or have access to the multiple data analytic software platforms supported by this site. Also, once you are able to run your own analyses on this site and then share them with others, the analyses will always run going forward. When preparing for this class I discovered that my one-year-old R code for a published paper would no longer run because an update to a needed package “broke the code” in a function that computed scale scores, and I have heard many others share stories of code no longer working after updates to R, RStudio, and/or packages used within R. I am guessing this issue exists for other data analytic platforms as well. A former student in our graduate program, Dr. Rebecca Koessler, published a capsule on Code Ocean that you can review here.
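One low-effort way to guard against the "code rot" described above is to record the exact software environment alongside the shared code, so readers know which versions produced the published results even if later updates break the script. Here is an illustrative sketch in Python (an analogue of the R situation; in R, tools such as renv or a saved sessionInfo() output serve a similar purpose); the package list shown is hypothetical.

```python
# Illustrative sketch: write down the interpreter and package versions
# that produced your results, and share that manifest with the code.
import sys
import importlib.metadata

def environment_manifest(packages):
    """Return one line per dependency with the installed version,
    plus the interpreter version itself."""
    lines = [f"python=={sys.version.split()[0]}"]
    for name in packages:
        try:
            lines.append(f"{name}=={importlib.metadata.version(name)}")
        except importlib.metadata.PackageNotFoundError:
            lines.append(f"{name}==(not installed)")
    return lines

# 'pip' is listed only because it is almost always present; in a real
# project you would list your actual analysis dependencies.
manifest = environment_manifest(["pip"])
print("\n".join(manifest))  # save this file next to your code and data
```

Platforms like Code Ocean solve the same problem more thoroughly by freezing the whole computational environment, but even a plain version manifest is far better than nothing.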

Again, the take home message from this class was that the data as well as the code/syntax already existed in a form that allowed you, the researcher, to generate everything you placed in your manuscript, and that with a little work it is fairly easy to share all of that with others for evaluation or reuse. I suggested that as a researcher one should adopt the mindset of being open by default and only withhold study information (in this case data as well as code/syntax) when a solid reason exists to do so. The traditional “closed by default” approach needs to be shelved.

Hello, again, hello

I last posted here in June of 2018 (almost 4 ½ years ago). I was attempting to catch up on posting about each week of my then new graduate class on open and reproducible science (the class ended in April of that year). I never did catch up.

Life events that began a few years before 2018 were, and still are, very challenging to experience. I suppose I felt that focusing more on professional tasks, such as teaching new courses and taking on additional roles with journals and so on, would help me move on. But it didn’t. Instead, additional challenging life events that I would normally be able to absorb and handle stopped me in my tracks, and I dropped the ball on several commitments. That made me feel very guilty. Then in late 2019 I was surprised to briefly experience professional harassment, privately via email, from a colleague I knew. When I told this person that I experienced their actions as harassment, they replied that they did not agree with my assessment. Then of course, early in 2020 the COVID-19 pandemic hit. Life still has not returned to “normal”. It has all been very overwhelming, and it took me a while to realize that I was not going to “feel better” and pick up where I left off years earlier, but rather that I needed to find a new starting place. I am not 100% certain yet what that starting place looks like. Life within my profession is not as enjoyable for me as it was in the past, but life outside of work is better. So for now I will consider this my starting place.

What I really want to do now is finish something that I started: complete a few blog posts about the remaining weeks of my graduate course on open and reproducible science (5 more posts to go). I have taught this course twice now and really enjoyed it each time. We’ll see if I get to adding more posts on other topics to this space after that.

Take care and be kind to yourself.

Week 7: Data Management Plans

I admit that until not too long ago I was unfamiliar with the term “data management plan”. In graduate school I was not trained specifically in data management. Rather, collecting, handling, storing, and sharing data were tasks left for me to figure out, and I usually did so on the fly. The result is that there are data sets I can no longer find (probably on a floppy disk in a box somewhere’ish), and some data sets I can find but can no longer interpret given the absence of annotation or meta-data. I also used different strategies for naming files and study variables than my grad student colleagues working on similar projects in the same lab (i.e., a lack of consistency within the lab). Yuck. But academics typically become famous because of the cool positive results they report, not because they are meticulous about curating their data sets for future use. Given that as a field we rarely share our data publicly, or even privately when requested, thinking about data management is likely seen by many as a waste of time.

That type of thinking, however, is both wrong and problematic. Here are some reasons why:

  • A lot of data is generated every single year by thousands of researchers, and it is a huge waste that most of it is hidden away from other researchers as well as the public. Others could put it to good use, but this orphan data is likely never to see the light of day. Imagine a shirt maker manufactures 100 shirts every day, but only sells 5 of them to customers and throws the rest in a pile in a warehouse. In a short amount of time there are far more shirts in the warehouse than on people’s backs. A lot of time and energy is wasted manufacturing and storing the 95 shirts each day that will never be worn.
  • Researchers use the results of data analyses to persuade others about relations among variables. When persuaded, these results form the building blocks of theories and can also be used to set policy by government agencies. The only way to properly evaluate the claims made by researchers is to have access to study materials, procedures, data, and data analytic code (among other things). Without such access, trust is a surrogate for information, and proper evaluation of scientific claims is impossible.
  • Your future self, and future lab members, will benefit immensely from good data management.

That is why in this week of the class we discussed the importance of data management plans. In short, “Research Data Management is the process of organizing, describing, cleaning, enhancing and preserving data for future use by yourself or other researchers.”

I am thankful for the assistance of librarians at Western University, who met with the class to review a data management template put together by the Portage Network. I provide the template below. Ideally, researchers answer each of these questions prior to data collection and then store the document in a repository of their choice (e.g., the Open Science Framework). Writing a data management plan for all projects that collect unique data is an excellent habit for all researchers to develop.

Template Made Available by Portage Network

DMP title

Project Name:

Principal Investigator/Researcher:

Description:

Institution:

Data Collection

What types of data will you collect, create, link to, acquire and/or record?

What file formats will your data be collected in? Will these formats allow for data re-use, sharing and long-term access to the data?

What conventions and procedures will you use to structure, name and version control your files to help you and others better understand how your data are organized?

Documentation and Metadata

What documentation will be needed for the data to be read and interpreted correctly in the future?

How will you make sure that documentation is created or captured consistently throughout your project?

Storage and Backup

What are the anticipated storage requirements for your project, in terms of storage space (in megabytes, gigabytes, terabytes, etc.) and the length of time you will be storing it?

How and where will your data be stored and backed up during your research project?

How will the research team and other collaborators access, modify, and contribute data throughout the project?

Preservation

Where will you deposit your data for long-term preservation and access at the end of your research project?

Indicate how you will ensure your data is preservation ready. Consider preservation-friendly file formats, ensuring file integrity, anonymization and deidentification, inclusion of supporting documentation.

Sharing and Reuse

What data will you be sharing and in what form? (e.g. raw, processed, analyzed, final).

Have you considered what type of end-user license to include with your data?

What steps will be taken to help the research community know that your data exists?

Responsibilities and Resources

Identify who will be responsible for managing this project’s data during and after the project and the major data management tasks for which they will be responsible.

How will responsibilities for managing data activities be handled if substantive changes happen in the personnel overseeing the project’s data, including a change of Principal Investigator?

What resources will you require to implement your data management plan? What do you estimate the overall cost for data management to be?

Ethics and Legal Compliance

If your research project includes sensitive data, how will you ensure that it is securely managed and accessible only to approved members of the project?

If applicable, what strategies will you undertake to address secondary uses of sensitive data?

How will you manage legal, ethical, and intellectual property issues?

This document was generated by DMP Assistant (https://assistant.portagenetwork.ca)

Additional Resources

Presentations on engaging scientists in good data management

Blog post on data management best practices.