Tone Deaf

In my field of psychological science there have been many discussions over the past few years about the way an argument is expressed, its tone. A common theme is the general desire for academic discussions to be positive and respectful rather than mean and antagonistic. With the release of Susan Fiske’s commentary on the state of scientific communication (see a detailed discussion of the commentary in the context of other developments in the field over the past decade here), the discussion of “tone” has heated up again. This is particularly true for the Facebook discussion group “PsychMap”, where the tone of communication is closely monitored.

The following of course is simply my own opinion, and I respect that others disagree with it, but I do not really care that much about the tone of an argument. A person can offer up a positive, or neutral, argument and be full of shit, or not. A person can offer up a negative, sarcastic, even rude argument and be on the mark, or not. If you have sat through a few faculty meetings you will know exactly what I mean. Personally, I do my best (and sometimes my best is not good enough, to be honest) to focus on the argument being presented and not on how the argument is presented. I can only control (a) how I decide to put forward my own arguments (asshole or angel, or somewhere in between), and (b) how I respond to others’ arguments. In my opinion the tone of an argument reflects more on the person delivering it than on its target. I accept that if I choose to deliver my arguments in a manner most of my colleagues would perceive as obnoxious and combative, I may not be taken seriously by these colleagues for very long. I personally therefore choose to be positive, or at least direct in a fairly neutral manner, with the majority of my arguments (hopefully as reflected in my blog posts over the past year and a half, and in my papers on meta-scientific issues). I also prefer discussions not to be officially moderated, and to let people own the words they choose to use to present their views. The field of academic psychology is a community of highly educated individuals who are smart enough to know the difference between shit and Shinola; we can figure out if an argument, however presented, has substance or not.

And for what it’s worth, it seems to me that the majority of discussions I am privy to in private and on social media are positive and constructive in tone. That is nice.

 

Organize your Data and Code for Sharing from the Start

On September 12, 2016, experimental psychologist Christopher Ferguson created a “go-fund-me” page to raise funds for access to an existing data set that was used to advance scientific arguments in a scientific publication (link here). In Ferguson’s own words: “So I spoke with the Flourishing Families project staff who manage the dataset from which the study was published and which was authored by one of their scholars. They agreed to send the data file, but require I cover the expenses for the data file preparation ($300/hour, $450 in total; you can see the invoice here).” Ferguson’s request has generated a lot of discussion on social media (this link as well), with many individuals disappointed that data used to support ideas put forward in a scientific publication are only available after a big fee is paid. Others feel a fee is warranted given the amount of effort required to put the requested data together into one file, along with instructions regarding how to use it. And in the words of one commenter, “But I also know people who work with giant longitudinal datasets, and preparing just the codebook for one of those, in a way that will make sense to people outside the research team, can take weeks.” (highlighting added by me).

As someone who has collected data over time from large numbers of romantically involved couples, I agree that it would take some time to prepare these data sets and codebooks for others to understand. But I think this is a shame really, and a problem in need of a solution. If it takes me weeks to prepare documentation explaining my dataset organization to outsiders, I am guessing it would take about the same amount of time to explain that organization to my future self (e.g., when running new analyses with an existing data set), or to a new graduate student who wants to use the data to test new ideas, not to mention people outside of the lab. This seems highly inefficient for in-lab research activities, and it represents the potential loss of valuable data to the field given that others may never have access to my data in the event that (a) I am too busy to spend weeks (or even hours for other data sets) putting everything together for others to make sense of my data, or (b) I die before I put these documents together (I am 43 with a love of red meat, so I could drop dead tomorrow; I think twice before buying green bananas).

So what is my proposed solution? Organize your data and code from the start with the assumption that you will need to share this information (see also “Why scientists must share their research code”). Create a data management plan at the beginning of all your research projects. Consider how the data will be organized, where it will be stored, and where the code for data cleaning/variable generation, analyses, and plots will be stored. Create meta-data (information about your dataset) along the way, updating as needed; consider where to store this meta-data from the beginning. If you follow these steps, your data, meta-data, and code can be available for sharing in a manner understandable to other competent researchers in a matter of minutes, not weeks. Even for complex data sets. Your future self will thank you. Your future graduate students will thank you. Your future colleagues will praise your foresight long after you are dead, as your [organized] data will live on.
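As a concrete (and minimal) sketch of what this can look like in practice, the few lines of R below save a data file and a matching codebook together at the moment the data are created, rather than weeks later; the variable names and file paths are hypothetical examples, not a prescription:

```r
# Minimal sketch: write the data and a matching codebook side by side from the start.
# Variable names and file paths are hypothetical examples.
dat <- data.frame(
  couple_id  = c(1, 1, 2, 2),
  partner    = c("A", "B", "A", "B"),
  commit_t1  = c(6.2, 5.8, 4.9, 5.1),  # commitment at Time 1 (1-7 scale)
  forgive_t1 = c(5.5, 6.0, 4.1, 4.8)   # forgiveness at Time 1 (1-7 scale)
)

codebook <- data.frame(
  variable    = names(dat),
  description = c("Unique couple identifier",
                  "Partner within couple (A or B)",
                  "Commitment at Time 1; mean of 7 items, 1-7 scale",
                  "Forgiveness at Time 1; mean of 5 items, 1-7 scale")
)

dir.create("data", showWarnings = FALSE)
write.csv(dat,      "data/study1_data.csv",     row.names = FALSE)
write.csv(codebook, "data/study1_codebook.csv", row.names = FALSE)
```

A codebook built this way grows with the dataset, so sharing it later requires no extra weeks of reconstruction.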

Update: see Candice Morey’s post on the same topic.

 

How to Publish an Open Access Edited Volume on the Open Science Framework (OSF)

Edited volumes are collections of chapters on a particular topic by various experts. In my own experience as a co-editor of three edited volumes, the editors select the topic, select and invite the experts (or authors), and identify a publisher. Once secured, a publisher typically offers a cash advance to the editor(s) along with a small percentage of sales going forward in the form of royalties. The publisher may also provide reviewing services for the collection of chapters, and will advertise the edited volume when it is released. The two primary ways for consumers to access the chapters are to (a) purchase the book, or (b) obtain a copy of the book from a library.

With technological advances it is now possible to publish edited volumes without a professional publishing company. Why would someone choose not to use a publishing company? They are, after all, publication experts. Perhaps the biggest reason is that the resulting volume will be open access, available to anyone with an internet connection free of charge. There are also some career advantages to sharing knowledge open access. And, simply put, a publishing company is not needed for every publication project.

There are very likely many different ways to publish an edited volume without using a professional publishing company. Below, I outline one possibility that involves using the Open Science Framework (OSF). Suggestions for improving these suggested steps are welcome.

Steps to Using the OSF to Publish an Open Access Edited Volume

  1. Identify a topic for the edited volume, and then identify a list of experts that you would like to invite to contribute chapters.
  2. If you do not have an OSF account, create one (it is free). Create a new project page for your edited volume, and give it the title of the proposed edited volume. Select one of the licensing options for your project to grant copyright permission for this work.
  3. Draft a proposal for your edited volume (e.g., the need for this particular collection of chapters, goals of the volume, target audience, and so on). Add this file to the project page.
  4. Send an email inviting potential authors, providing a link to your OSF project page so they can read your proposal.
    • You can make the project page public from the start and simply share the link, or,
    • You can keep the project page private during the development of the edited volume and “share” a read-only link to the project page with prospective authors only.
  5. Ask all authors who accepted the invitation to create an OSF account. Then create a component for each individual chapter; components are part of the parent project, but are treated as independent entities in the OSF. Use the proposed title for each chapter as the title of the component. Add the author(s) as administrators for the relevant component (e.g., A. Smith has agreed to author chapter #4; add A. Smith as an administrator of component #4). (A scripted sketch of this setup appears after this list.)
  6. Ask authors to upload a copy of their first draft by the selected deadline. Provide feedback on every chapter.
    • One option is to download a copy of the chapter, make edits using the track changes option, and then upload a copy of the edited chapter using the same title as the original in order to take advantage of the “version control” function of the OSF (i.e., all versions of the chapter will be available on the project page in chronological order, with the most recent version at the top of the list).
  7. Ask authors to upload their revised chapter using the same title (again to take advantage of the “version control” function of the OSF).
  8. When the chapters are completed, “register” the project and all components. This will “freeze” all of the files, meaning changes can no longer be made. The registered components, or chapters, represent the final version of the edited volume. Then…
    • Make all of the components, as well as the main project registration, public;
    • Enable the “comments” option so that anyone can post comments within each component (e.g., to discuss the material presented in the chapter);
    • Click the link to obtain a Digital Object Identifier (DOI) for each component (i.e., chapter).
  9. Advertise the edited volume
    • Use social media, including Facebook discussion groups and Twitter (among others). Encourage readers to leave comments for each chapter on the OSF pages;
    • Ask your University to issue a press release;
    • Ask your librarian for tips on how to advertise your new Open Access edited volume (librarians are an excellent resource!!).
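For editors who prefer to script parts of this setup (steps 2, 3, and 5), something along the following lines may work. This is a rough sketch that assumes the osfr R package (check its documentation before relying on it); the project and chapter titles are placeholders, and adding administrators, registering, and minting DOIs are still handled through the OSF website.

```r
# Rough sketch assuming the osfr package; titles and file names are
# hypothetical placeholders.
library(osfr)

osf_auth()  # reads a personal access token (e.g., from the OSF_PAT environment variable)

# Step 2: create the parent project for the volume
volume <- osf_create_project(title = "Example Open Access Edited Volume")

# Step 3: add the volume proposal to the project page
osf_upload(volume, path = "volume_proposal.pdf")

# Step 5: create one component per chapter
chapter_titles <- c("Chapter 1: Example Topic A", "Chapter 2: Example Topic B")
for (title in chapter_titles) {
  osf_create_component(volume, title = title)
}
# Authors are then added as administrators of their components on the OSF site.
```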

Prior to following these steps to create your own Open Access edited volume on the OSF (or by using a different approach), there are some pros and cons to consider:

Pros

  • You have created an edited volume that is completely Open Access
  • The volume costs no money to create, no money to advertise, and no money to purchase
  • Because the chapters are freely available, they are likely to reach a wider audience than a traditional edited volume released by a for-profit publishing company, and therefore to have a greater scientific impact

Cons

  • You do not receive a cash advance or royalties
  • You do not receive any assistance from a publisher for reviewing or advertising
  • This approach is new compared to traditional publishing, and therefore you may be concerned that you will not receive proper credit from others (e.g., people evaluating your contributions to science when deciding to hand out grant funds, jobs, promotions, and so on)

Final Thoughts

There is usually more than one way to achieve the same aim. Professional publishing companies work with academics to create many edited volumes every year, but creating an edited volume does not inherently require the assistance of a professional publishing company. The purpose of this post was to present one alternative using the functionality of the Open Science Framework to publish an edited volume that is Open Access. I am sure there are even more ways to achieve this aim.

How much Research is Confirmatory Versus Exploratory?

The president of APS is nervous about pre-registration, or the idea of writing down study goals and hypotheses prior to collecting and/or analyzing data. One concern is that we do not have any data on whether or not pre-registration puts limits on exploration within research programs. If researchers are required to pre-register study goals and/or hypotheses, and given that in many instances good ideas are developed after seeing the data (not always before), then many good ideas may never be tested. This is of course a fair concern worthy of discussion.*

But what we perhaps need to know first is approximately how much of our collective research is exploratory at present. We know that over 90% of all journal articles report statistically significant effects (no citation required), presumably for hypotheses developed prior to data collection. If so, then these data analyses have presumably been conducted in a confirmatory manner (i.e., to test hypotheses developed prior to data collection and/or analyses). Pre-registering these confirmatory hypotheses should therefore not be problematic or stifle discovery, particularly given current options that make pre-registering hypotheses very easy (e.g., the Open Science Framework, aspredicted.org). If these confirmatory hypotheses took time to develop via exploratory research, then a massive amount of exploratory research is currently not being reported in any publication outlet; this research represents the large part of the iceberg hidden beneath public perception, with the small confirmatory bit of research peeking into public awareness. If so, we should collectively figure out a way to make this large body of exploratory research, and the details of how these explorations helped researchers develop their confirmatory hypotheses, publicly available. This is important stuff!**

To the extent that the current literature is not primarily presenting a priori hypotheses and confirmatory data analyses, however, it will contain a blend of confirmatory hypotheses and hypotheses developed during and/or after data analyses (i.e., exploration within the research program). Given that over 90% of all journal articles report statistically significant effects, and that not all articles clearly delineate confirmatory hypotheses from those developed through exploration of the data being presented, it is an open question how much research is confirmatory versus exploratory. Pre-registration of study goals and/or hypotheses, both confirmatory and exploratory (and everything in between), may be one way to answer this question. And perhaps before setting up large scale randomized control trials to determine whether pre-registration limits exploration, we should know just how much exploration is actually going on, as well as the links between this exploration and the confirmatory hypotheses that are subsequently developed. Many of us seem to agree that exploration is very important, so let’s make an effort to document our explorations more clearly and openly.

 

* Russ Poldrack is on record as not being nervous about pre-registration

** “stuff” is a technical term of course

An Inside Perspective of a Registered Replication Report (RRR)

Update: Dan Simons and Bobbie Spellman discuss this Registered Replication Report, and others, on NPR’s “Science Friday”

In the spring of 2014 we (i.e., Irene Cheung, Lorne Campbell and Etienne LeBel) decided to submit a proposal to Perspectives on Psychological Science for a Registered Replication Report (RRR) focusing on Study 1 of Finkel, Rusbult, Kumashiro and Hannon’s (2002) paper testing the causal association between commitment and forgiveness. The product of over 2 years of work by many people including us, the tireless Dan Simons (Editor of the RRR series), a cooperative and always responsive Eli Finkel (the lead author of the research to be replicated), and researchers from 15 other labs all over the world, is now finally published online (http://www.psychologicalscience.org/pdf/Finkel_RRR_FINAL.pdf). Here is our inside perspective of how the process unfolded for this RRR.

The initial vetting stage for the RRR was fairly straightforward. We answered some simple questions on the Replication Pre-Proposal Form, and provided the rationale for why we believed Study 1 of Finkel et al.’s (2002) manuscript was a good candidate for an RRR (e.g., the paper is highly cited, is theoretically important, and no prior direct replications have been published). After receiving positive feedback, we were asked to provide a more thorough breakdown of the original study and the feasibility of having multiple labs all over the world conduct the same project independently. In a Replication Proposal and Review Form totaling 47 pages, we provided information regarding (a) the original study and effect(s) of interest, (b) sample characteristics of the original and proposed replication studies (including power analysis), (c) researcher characteristics (including relevant training of the researcher collecting data from participants), (d) experimental design of the original and proposed studies, (e) data collection (including any proposed differences from the original study), and (f) target data analysis (of both the original and planned replication studies). After receiving excellent feedback and making many edits, a draft of this document was sent to the original corresponding author (Eli Finkel). Eli very quickly provided thorough feedback, and forwarded copies of the original study materials. He also provided thoughtful feedback throughout the process as we made many decisions on how to conduct the replication study, and ultimately vetted the final protocol. The RRR editors eventually gave us the green light to go forward with the project.

We were then required to organize the project. The study was programmed on Qualtrics, the protocol requirements were created, the project page on the Open Science Framework (OSF) was established, and eventually a call went out for interested researchers to submit a proposal to independently run the study and contribute data. It is nearly impossible to estimate the number of emails sent around between Dan, our team, and Eli during this time, or the number of small changes made to all of the materials along the way. By the fall of 2015, all participating labs were ready to start collecting data. Participating labs simply needed to download the necessary materials from the OSF project page, and Dan provided support to many of the labs throughout the process. Prior to data collection, the study was pre-registered on the OSF. Data collection was complete by January 2016, and it was time to prepare the R code needed to analyze the data from each lab as well as conduct the planned meta-analyses. Edison Choe (working for APS) wrote the full set of code (verified by Courtney Soderberg from the OSF), and our team helped test it. The code was tweaked many times along the way to make small adjustments. All labs then ran the code with their data and submitted the data and results to their own OSF pages, while our team wrote the manuscript before seeing the full set of results from all labs. Dan and Eli provided feedback on numerous occasions, and the full set of results was not released to us until the manuscript was considered acceptable by all parties. After the results were released we incorporated them into the manuscript and wrote a discussion section. Eli then wrote a response. After making many small edits, and sending copious amounts of email around to Dan and Eli, the manuscript was complete. All participating labs were then provided a copy of the manuscript to review for any required edits, and asked not to discuss the results with anyone not associated with the RRR until the paper was published online. Not surprisingly, a few more edits were indeed required. When completed, the manuscript was sent to the publisher and appeared online first within a week.
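The actual analysis scripts are on the RRR’s OSF project page; purely for illustration, the core of the across-lab analysis is a random-effects meta-analysis of per-lab effect sizes, which in R might look like the sketch below. The lab names, effect sizes, and use of the metafor package are my own illustrative assumptions, not the RRR code itself.

```r
# Illustrative sketch only (not the actual RRR code): a random-effects
# meta-analysis of per-lab effect sizes using the metafor package.
library(metafor)

labs <- data.frame(
  lab = c("Lab 1", "Lab 2", "Lab 3"),  # hypothetical labs
  yi  = c(0.21, 0.05, 0.12),           # per-lab effect size estimates
  vi  = c(0.015, 0.012, 0.020)         # sampling variance of each estimate
)

res <- rma(yi = yi, vi = vi, data = labs, method = "REML")
summary(res)                   # meta-analytic estimate across labs
forest(res, slab = labs$lab)   # forest plot of lab-level and pooled estimates
```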

Overall, this was a monumental task. The manuscript can be read in minutes, the results digested in a few quick glances at the forest plots. Getting to this point, however, required the time, attention and effort of many individuals over 2 years. Seeing an RRR through to completion requires a lot of dedication, hard work, and painstaking attention to detail; it is not to be entered into lightly. But the process itself, in our opinion, represents the best of what Science can be—researchers working together in an open and transparent manner and sharing the outcome of the research process regardless of the outcome. And the outcome of this process is a wonderful set of publicly available data that helps provide more accurate estimates of the originally reported effect size(s). It is a model of what the scientific process should be, and is slowly becoming.

From the proposing authors of this RRR:

Irene Cheung

Lorne Campbell

Etienne LeBel

I Think This Much is True*

Here are 12 things that I** think are true regarding the future of how we do science, based on meta-scientific discussions over the past few years and on observing the varied developments in many different places (many of which have sprung up as part of these discussions):

Ten years*** from now,

  1. How we share the results of our scientific efforts will not look the same. The “article” format will very likely continue to be the primary vehicle for sharing research results, but the publishing and distribution of articles will take on many different forms (as is already beginning to occur).
  2. Science will be more open. It seems highly unlikely that researchers will return to only sharing limited output of the research process, at the end of the research process, in print publications. Instead, it seems that mechanisms for sharing even more of the research process will continue to grow. And it will grow in ways not anticipated today.
  3. Trust in the integrity and reproducibility of data will become more important than trust in the research, because data and code will be openly available for the vast majority of articles (perhaps reaching 75% to 90%).
  4. Researchers will be officially credited with what are today not yet considered contributions (e.g., shiny-apps, blogs, published data sets, etc.). Publications will always be important, but other types of contributions will be recognized by the scientific community and by University administrators.
  5. Findings that are replicable (whether context specific results, or results obtained across many contexts) will be valued more than those that are “surprising”.
  6. Statistics training in graduate programs will still be insufficient. But statistics training will be viewed less as a necessary evil and more as simply necessary.
  7. Computational science will become more popular and prevalent. Some computational approaches include the use of (a) machine learning, (b) simulation of complex models, (c) advanced statistical approaches, and (d) methods for making sense of large and complex data sets (e.g., big data).
  8. When researchers call upon theory it will be more frequently to generate hypotheses rather than help explain an obtained pattern of results post-hoc. Hopefully this means that more theories start to be subjected to risky tests, and thus developed in a cumulative manner (but this second part is not a prediction, but rather a wish).
  9. More articles will contain efforts to directly replicate initial results in a line of research. Direct replications will complement, not replace, conceptual replications.
  10. More effort will be made to distinguish between confirmatory and exploratory analyses. Pre-registration of hypotheses and data analytic plans will be a big part of this effort.
  11. There will be a growing use of within-person, compared to between-person, research designs. One reason for the growing popularity of these designs will be the realization of the greater statistical power and sensitivity of these designs.
  12. Because of all of the above, a much larger proportion of published findings will be (demonstrably) replicable across independent labs. The “Many Labs 100”**** project will support this prediction.

It is my sincere wish to be around ten years from now to see the degree to which these things I think are true come to be. It will also be interesting to observe developments that were not on the horizon (or at least not on my horizon) today!

* sorry Wally Lamb

** not just me actually. Etienne LeBel (@eplebel; curatescience.org) contributed some novel ideas to this post. I also want to see him around ten years from now.

*** seemed like a good timeline

**** hard to predict, really, how many “Many Labs” projects will have been ushered into being in ten years

Access Not Denied

Last year I wrote a blog post about the inevitable discussion co-authors have about which journal to submit a manuscript to for possible publication (link: http://wp.me/p6ma5O-l). Shortly thereafter I submitted a manuscript to the relatively new Open Access (OA) journal Collabra. I was asked at some point to provide some feedback regarding my choice to send my manuscript for consideration at Collabra rather than other possible outlets. Here was my response:

“Overall, I feel that the traditional publication model where societies have contracts with private publishers for one or more journals, with the number of contracted print pages determining how many papers can be published per year, and access to published manuscripts being restricted to those with subscriptions, is no longer the best model for advancing scientific discovery. I therefore support alternative publication outlets that are open access, focus on the quality of the research methods used to test hypotheses and less on the pattern of results obtained, and are not limited in terms of how many articles can be published each year by arbitrary page limitations. Collabra is one such outlet, and I like to see that scholars such as Simine Vazire and Rolf Zwaan, as well as many other associate editors, are part of the editorial team. For those reasons I chose to submit my original research to Collabra and will likely do so again.”

The traditional academic journals referred to above undoubtedly publish excellent research, but my reading of the state of science tells me we are moving inexorably toward open access for publicly funded research. Current technology allows research to be shared at minimal or no cost, with or without pre-publication peer review (and all of it with post-publication peer review in one way or another). For example, the journal Judgment and Decision Making is open access, encourages open science practices, and has no author fees. To stick around, traditional journals will need to figure out a new business model for making their product freely available, and current OA journals with fairly high author fees will need to figure out ways to reduce or eliminate those fees.

It’s the end of traditional journals as we know it, and I feel fine.

Teaching Reproducibility to Graduate Students: A Hands-on Approach

A few years ago I made changes to the syllabus for my graduate course on research methods in social psychology in response to what is widely referred to as the “replicability crisis”. The biggest change was to remove the typical “research proposal” requirement that students completed individually and replace it with a hands-on group replication project (you can view the syllabus of the course here: https://osf.io/nxytf/). In 13 weeks (the length of the course), the group replication project proceeds as follows:

  • Week 1: Groups of up to 4 students are established. Their first task is to identify social psychological research published within the past few years that (a) collected data online (e.g., via a University subject pool, or Amazon’s Mechanical Turk) to minimize time on data collection, and (b) is of interest to group members. Students typically share their research interests when introducing themselves to the class, and form groups with students that share similar interests (e.g., close relationships, stereotypes and prejudice, judgement and decision making).
  • Week 2: As a class, we review the research identified by each group, paying close attention to study details (e.g., is it possible to reproduce the study methods and procedures from the methods section alone? Do descriptions of statistical models and results make sense?). Following this discussion, we select one published study for each group to closely replicate.
  • Week 3: Each group writes a “replication report” that attempts to reproduce as best as possible the exact methods, procedures, and statistical models used in the study to be replicated. Groups make note of important study information that is not available in the published manuscript (e.g., instructions for participant recruitment, items for scales created by the authors, how covariates were coded and entered into statistical models). Next, groups draft an email to be sent to the corresponding author discussing the project. After I review and edit the email, the replication report is sent along with this email as well as questions asking for additional information as needed (example of email sent to corresponding authors: https://osf.io/q7ps8/; example of replication report: https://osf.io/nrkej/).
  • Week 4: Groups edit the replication report as needed based on feedback from the corresponding author. For the 5 groups, across two years, that have worked on these projects, the corresponding authors have responded with very helpful feedback within days in every instance so far! Groups then submit a research ethics application, and wait (example ethics application: https://osf.io/bncq4/). It typically takes 4-6 weeks to obtain ethics approval.
  • While awaiting ethics approval, groups prepare all study materials and procedures on Qualtrics. Groups are also encouraged to prepare data analytic code for the primary analyses. Shortly after receiving final ethics approval, groups pre-register study hypotheses on the Open Science Framework (OSF; e.g., https://osf.io/exnqu/). This typically occurs around week 10.
  • Following pre-registration, data collection begins via Amazon’s Mechanical Turk (other options are available, but we have used MTurk to this point). Data are typically collected within 48 hours. To date I have funded data collection for each project.
  • Each group analyzes their data, conducts follow-up analyses as needed, and prepares a written report to be submitted at the end of the course (week 13).
  • After all reports are graded, and the course is officially over, I meet with each group to discuss the next steps for their projects. If needed we contact the corresponding author of the original study to ask for additional study information. In all instances we collect at least one additional large sample of participants to provide a more thorough test of hypotheses (in some instances we have collected three or four additional samples, adding additional conditions or sampling from a different population). When appropriate the results of the replication samples and original study are meta-analyzed (e.g., https://osf.io/y86zp/).
  • I work with each group to prepare a manuscript for publication in a peer reviewed journal. Presently one manuscript is in press at the Journal of Experimental Social Psychology (see pre-print here: http://ir.lib.uwo.ca/psychologypub/102/; two replication samples with ~500 participants total). Four other manuscripts will be submitted very soon; these four manuscripts attempt to replicate four different published studies, and contain data from 12 replication samples with over 2000 participants.

Lessons Learned by the Students:

  1. It is not possible to completely reproduce the methods and procedures of the published studies selected to date without requiring additional information from the corresponding authors. This surprises the students, and allows for in-class discussions of how to make their own research more reproducible by others (e.g., posting study materials and procedures on the OSF).
  2. The data analytic approach used in the original research is also often not described in enough detail to reproduce without asking questions of the corresponding author.
  3. The students take great care to work with original authors in a respectful manner, and to reproduce the study design as closely as possible. This approach has resulted in the original researchers being very helpful and responsive.
  4. When the results from their replication attempts (with large samples) are not similar to the published results the students are genuinely surprised.* This allows for a class discussion on the possible reasons for the inability to replicate published results. It also highlights to them the importance of replicating their own results, when feasible, prior to submitting manuscripts for publication.
  5. The students learn that the methods and results sections of published papers are very, very valuable. For example, they now focus more attention on statistical power, whether the paper includes a direct/close replication, and the effect size of key findings. They also pay more attention to the reproducibility of the methods and procedures as described in the paper. If a given study was low powered, not replicated, and yielded a large effect size, and if the methods and procedures as described are not reproducible, they are now more likely to question the conclusions put forward by the authors in the discussion section.

Overall, the students develop what I feel is a healthy skepticism of results from a single study (someone else’s or their own). They learn the importance of statistical power first hand, and that published effect sizes may overestimate the true effect size. They also learn the value of sharing study materials and procedures to increase the reproducibility of their own research. In my opinion, lectures and readings alone are not as effective at teaching these lessons as running an actual close replication of published research.
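One exercise that drives the power lesson home is a simple sample-size calculation. The sketch below uses the pwr package with hypothetical effect sizes (not values from any particular study) to show how a true effect smaller than the published one changes what a well-powered replication requires:

```r
# Sketch of the kind of power calculations students run (pwr package);
# the effect sizes are hypothetical examples, not from any particular study.
library(pwr)

# Per-group n for 80% power at alpha = .05 (two-tailed), two-group design
pwr.t.test(d = 0.50, power = 0.80, sig.level = 0.05)  # roughly 64 per group
pwr.t.test(d = 0.25, power = 0.80, sig.level = 0.05)  # roughly 253 per group

# Power of a small original study (n = 30 per group) to detect d = 0.25
pwr.t.test(n = 30, d = 0.25, sig.level = 0.05)        # well under 50% power
```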

* Of the five replication projects completed in my class to date (containing a total of 13 large independent samples), evidence consistent with the original published research has been obtained in 1 project (20%)

What if I can’t do Open Science?

Note: This post was written for the March 2016 newsletter of the Australian Psychological Society’s Psychology of Relationships Interest Group (PORIG). I have made a few small changes, and now include the following link to a talk I gave discussing similar issues at the “Navigating the New Era of Social & Personality Psychology” preconference of the 2016 SPSP main conference: https://www.youtube.com/watch?v=QdUtnA8vUn8 (audio is a bit wonky).

This is a question I hear from time to time, particularly from relationship science scholars. Although the bulk of extant relationship research has involved data collected from one individual in a relationship, typically at one point in time (see Kashy, Campbell & Harris, 2006), a sizeable minority of the field’s work involves data that are much more complex (e.g., longitudinal, self-report, and observational dyadic data). Adding to this complexity are the time and expense associated with recruiting dating and/or married couples for these studies, the difficulty of obtaining and coding behavioural/interactive data, and in some cases the expense associated with obtaining particular measures (e.g., hormonal assays of saliva and/or blood). It is often challenging to obtain the large samples necessary to increase statistical power and thus reduce the probability of Type II errors, and the false-discovery rate, for these complex studies. And studies of this magnitude rarely set out to test one pre-specified hypothesis; instead, these projects collect a large amount of data across a number of measures/constructs with the goal of testing many hypotheses to be forged in the future. It is these types of projects (ones in development, those being run presently, as well as those already completed that offer a large amount of available data) that researchers often have in mind when they ask, “What if I can’t do open science?”.

What is open science? Briefly, “open science” refers to the public sharing of all aspects of the research process (for more details see: https://en.wikipedia.org/wiki/Open_science; see also https://osf.io/3swkp/). This sharing involves, for example, (a) publicly disclosing study hypotheses prior to actually testing them, (b) making available all of the study materials and procedures (e.g., on the Open Science Framework, https://osf.io/), and (c) publicly disclosing a data analytic plan (i.e., how you plan to test your hypotheses given the measures/procedures of your study). Is it possible to “do” open science when implementing complex study designs as discussed above that are enacted by many relationship scholars?

In my opinion, yes; not only is it possible, it is being done. The complex designs employed by relationship scholars (and others), however, do require unique open science solutions. I, along with my colleagues Timothy Loving and Etienne LeBel, suggested many unique solutions for these complex designs in a paper we published in Personal Relationships in 2014. Instead of reiterating the points we made in that paper, I want to instead briefly share how my research team has engaged in open science practices for different types of research projects in the field of relationship science.

(1) If my research is largely exploratory, how can I publicly disclose hypotheses and data analytic plans?

When research projects are largely exploratory, you can share your study materials just as you would with confirmatory research projects. You can also briefly state that your research project is meant to explore possible associations among certain study variables, and the reason(s) for the exploratory nature of the study. You can also provide, if appropriate, a set of guidelines for how you plan to explore the data collected (templates for different types of disclosures, for both confirmatory and exploratory research, can be found here: https://osf.io/m7f8d/).

For example, Kiersten Dobson (a graduate student in my lab), has posted the following information on the OSF (link here: https://osf.io/4xcpy/): description of the study (including planned sample and analytic goals for the exploratory analyses), study materials, and methods. She then posted a copy of the obtained data set, and discussed the follow-up research currently being conducted that followed from the results of the initial exploratory study.

(2) What if the data I am using to test my hypotheses comes from a large dataset that already exists?

In this instance, if the dataset is not your own it may not be possible to publicly post all of the study materials and methods. You can, however, post a document that outlines all of the measures you plan to use from this dataset to test your hypotheses (note that it’s not necessary to include this information in your manuscript; rather, post this information on the OSF and simply link to it in your manuscript). You can also disclose the hypotheses you plan to test and the proposed data analytic plan. If the dataset is your own, you can also post a copy of all study materials.

Early in 2015 a graduate student from the VU Amsterdam (Asuman Buyukcan-Tetik; now a Professor at Sabanci University, Istanbul) visited my lab for three months. We proposed testing new hypotheses by analyzing already existing data—a large dataset collected under the supervision of the PI Catrin Finkenauer. Prior to conducting analyses, Asuman publicly posted, and pre-registered (i.e., the files are “frozen” and cannot be edited or deleted), information about (a) the project and hypotheses, (b) the method, and (c) the strategy of planned analyses. I want to point out that the strategy of analysis contained a few different options that were dependent to some degree on the outcome of the initial analyses planned (i.e., we were only partly certain of what we expected to find; link here: https://osf.io/d7x2p/).

Also in 2014, Rhonda Balzarini, a graduate student in my lab, was given the opportunity to use a large dataset (over 3000 participants) collected by a group of researchers based in Universities throughout the USA (PIs, Bjarne Holmes, Justin Lehmiller, Jennifer Harman, and Nicole Atkins). Rhonda and I met regularly for about two months in the fall of 2014 to discuss our research interests with respect to the dataset and to derive specific hypotheses prior to looking at any of the actual data (i.e., no peeking). Rhonda then publicly disclosed (and pre-registered), prior to analyses, our hypotheses, the methods and measures used in our analyses (link here: https://osf.io/vs574/).

(3) What if in my study I plan to test more than one set of hypotheses? Also, maybe I don’t know what the other hypotheses are yet, so how could I possibly “pre-register” them?

Big, complex dyadic studies, as mentioned above, rarely set out to test one set of hypotheses. It is entirely possible, however, to make all study materials/procedures publicly available, and to disclose the first set of planned hypotheses along with a data analytic plan. This is the approach taken by Taylor Kohut, a post-doc in my lab, for a large scale longitudinal dyadic study with an experimental intervention (3 conditions) initiated at the midpoint of the study. Instead of trying to explain the study here, I will simply refer you to all of the study information (and I do mean ALL of it) posted on the OSF: https://osf.io/yksxt/ (including study rationale, methods/measures, analytic strategy, and our recruitment plan). It is guaranteed that we will develop new hypotheses in the future, hypotheses that have not yet been considered or discussed. When that time comes we will simply add a new component to the OSF project that discusses our new hypotheses and data analytic plan, and what variables from the original study we plan to use.

(4) What if I can’t make data available? Or code?

There are at least a few concerns with sharing data: (1) other people can use it and benefit from your efforts, and (2) what if participants can identify their own, or their partner’s, sensitive data? With respect to the first concern, the OSF allows users to create a DOI for all files, including datasets, so that if someone does choose to use your data they can properly cite the dataset. Additionally, users on the OSF can license their datasets in seconds, making it legally mandatory for anyone using the dataset to properly cite its use. With respect to the second concern, there are many ways to de-identify data sets, and ways to restrict the use of datasets to other researchers (e.g., post the data on the OSF to a private project page or component, and then grant access to that page or component when asked by other researchers). This is admittedly a big issue that warrants a much longer discussion, and is something we discuss in more detail in a paper in press at the Journal of Personality and Social Psychology (LeBel, Campbell, & Loving, in press; for a pre-accepted draft of this manuscript click here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2616384).
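As a small illustration of the de-identification point (the file and variable names below are hypothetical), a few lines of R can drop direct identifiers and replace couple IDs with arbitrary codes before a data set is shared:

```r
# Minimal sketch of de-identifying a dyadic data set before sharing
# (file and variable names are hypothetical examples).
raw <- read.csv("data/couples_raw.csv")

# Drop direct identifiers
drop_vars <- c("name", "email", "ip_address", "birth_date")
shared <- raw[, !(names(raw) %in% drop_vars)]

# Replace original couple IDs with arbitrary codes so records cannot be linked back
set.seed(42)
old_ids <- unique(shared$couple_id)
new_ids <- sample(seq_along(old_ids))
shared$couple_id <- new_ids[match(shared$couple_id, old_ids)]

write.csv(shared, "data/couples_shareable.csv", row.names = FALSE)
```

Steps like these do not remove the need for judgment about what is potentially identifying in sensitive dyadic data, but they show that basic de-identification can be scripted rather than done by hand.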

In my lab we are moving toward posting data needed to reproduce the results of any analyses reported in our manuscripts, and we also post the code/syntax that we used to run our models. At this link https://osf.io/ryfse/ you can find data sets and code needed to reproduce the analyses in a paper in press at the Journal of Experimental Social Psychology, and at this link https://osf.io/me7jp/ you can find the data files (available upon request, but they are physically present in a component linked to this OSF page) and code (for SAS) needed to reproduce the analyses presented in this publication: http://www.collabra.org/articles/10.1525/collabra.24/. One benefit of this practice is that we independently run all study analyses to ensure we can reproduce the results reported in the manuscript prior to submitting for peer review.

There are undoubtedly other scenarios not discussed here that require novel open science solutions. My goal was to share a few of the questions I have heard most often, and show some of the answers we have come up with in my lab. Open science can be done with complex study designs within the field of relationship science, because it is being done. I therefore suggest that the question I posed at the outset of this piece be changed from “What if I can’t do open science?” to “How can I do open science with this study?”.

We now have the tools available to move away from the more “closed science” practices that have been typical of our field to this time. Using these tools is of course a choice, but not using them is also a choice. I have chosen to engage in open science practices for my own research going forward. My experiences so far suggest to me that open science practices have not stifled my creativity, limited what I choose to study, limited the exploration of ideas, or otherwise burdened my ability to discover new things (such as they were).

References

Campbell, L., Loving, T. J., & LeBel, E. P. (2014). Enhancing transparency of the research process to increase accuracy of findings: A guide for relationship researchers. Personal Relationships, 21(4), 531-545.

Kashy, D.A., Campbell, L., & Harris, D.W. (2006). Advances in data analytic approaches for relationships research: The broad utility of hierarchical linear modeling. In A. Vangelisti & D. Perlman (Eds.), The Cambridge Handbook of Personal Relationships (pp. 73-90). New York: Cambridge University Press.

Why are Top Journals Top Journals?

Researchers need to publish manuscripts to advance both science and their careers. This latter fact was made clear recently as I was a member of a departmental committee tasked with evaluating the performance of each faculty member. When it came time to discuss publications there were two themes: (1) how many publications did this person have in the given time frame (we “expect” approximately 3 per year), and (2) were they published in top journals? No mention of the actual research conducted, or the strength of the methods employed, because the committee was not asked to read any of these publications. The number of publications and the prestige of the publication outlet therefore served as proxies for research performance.

I want to focus this discussion on the prestige of academic journals. What makes a journal a Top Journal? We all seem to know one when we see one, but specifically defining why one journal is better than another is tricky. When I ask this question of others a variety of factors are discussed, such as (a) rejection rate (a higher rejection rate seems to equal a better journal), (b) being a place where successful academics tend to publish their work, (c) the impact of the research published in the journal on the rest of the field, (d) visibility of the journal (i.e., are most people in the field familiar with it?), (e) the perception that research needs to be particularly novel/ground-breaking to be published in the journal, and so on. These all seem like reasonable points, but they suggest a fairly static hierarchy of journal prestige, and they imply that the quality of the research reported in any particular manuscript is a function of the prestige of the journal in which it appears.

I then conducted some Internet searches to see how journal prestige is assessed. Most ranking systems that I could find rely largely on citation counts of articles published within a journal (e.g., the much loved, and loathed, impact factor). For example, the SCImago journal ranking provides information on thousands of journals, including a long list of psychology journals that can be ranked on a few different factors, including: SJR (a measure of “prestige”), H-index (the number of articles in a journal cited at least a given number of times), and Impact Factor. A great feature of this site is that you can download the rankings in an Excel file. I downloaded the file, and decided to add some ranking information recently put together by Uli Schimmack: the R-index of the journal.* I direct readers to Uli’s blog to learn more about the R-index and how it is calculated, but briefly, it uses estimates of post-hoc power calculated for each published article, and R-index scores increase when power is higher and decrease when publication bias is present (according to Uli, and awaiting verification). So, it is calculated from information presented in each paper published in a given journal (i.e., from p-values of reported statistical tests that are then used to estimate post-hoc power; see also the N-pact factor), rather than from how many times each paper published in a given journal is cited by others.

So, do indices based on citation counts correlate with an index based on the post-hoc power of published articles? No. The scatterplot below indicates a slightly negative relation between the R-index and the IF of a journal. There is R code available here to recreate this scatterplot and to calculate correlations with the SJR and H-index as well (spoiler alert—the R-index does not significantly correlate with them either).
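The linked code is the authoritative version; a bare-bones sketch of the same idea, assuming a file with one row per journal and hypothetically named columns for the impact factor and R-index, would look like this:

```r
# Bare-bones sketch (column and file names are hypothetical; see the linked
# code and data for the actual analysis).
journals <- read.csv("journal_rankings.csv")   # 50 journals: IF, SJR, Hindex, Rindex

cor.test(journals$Rindex, journals$IF)         # correlation between R-index and IF

plot(journals$IF, journals$Rindex,
     xlab = "Impact Factor", ylab = "R-index",
     main = "Journal IF and R-index")
abline(lm(Rindex ~ IF, data = journals))       # slightly negative slope
```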

So, Top Journals do not seem to publish studies with relatively more post-hoc power and thus results more likely to replicate compared to lower tier (dare I say “specialty”) journals (at least according to the R-index). Is journal prestige therefore merely popularity?

* The R-index currently ranks 54 psychology journals (more to come, I believe). The R-index was calculated for each section of JPSP, whereas the SCImago rankings give scores for the entire journal; I therefore selected the highest R-index score for JPSP. A few journals listed in the R-index ranking were not included in the SCImago rankings. Overall, a total of 50 R-indices were entered into the data file. Also, I used the R-index calculated for articles published in journals between 2010 and 2014 to be more consistent with the time frame of the SCImago rankings (based on 2014 numbers).

[Figure: scatterplot of journal Impact Factor (IF) versus R-index]