Weeks 8 and 9: Sharing Data as well as Code/Syntax

In week 8 of this course I discussed the importance of sharing data (when appropriate) and the code/syntax used to produce the results presented in a manuscript. This is something I have discussed in a prior blog post. Rather than dive into the details of how this can be accomplished, I will instead ask the reader to recall a time they finished a manuscript that included data analyses and prepared it for submission. When analyzing the data, you, the researcher, had (1) a data set, (2) knowledge of how that data set was organized, (3) knowledge of how all variables of interest were collected and/or computed, and (4) data analytic software that could access the data set and execute the analyses of interest. Every result you chose to present in your manuscript required code/syntax to generate. That code/syntax was either written by you or generated by the program, if you used drop-down menus to make analytic selections. Either way, before you submitted the manuscript for peer review there was a time when you had access to, and knowledge of, the data set as well as all of the code/syntax used to generate every result in the manuscript. The take-home message from this lecture: share all of this information in some way so that others can evaluate and reproduce your results. A lot has been written and published on how to do this. So just do it.

In week 9 I was fortunate to have Seth Green, who at that time was a Developer Advocate at Code Ocean, visit my class. Code Ocean is one platform for sharing data and code so that others can re-run your analyses (or create their own analyses with your data) in the cloud, without needing to download anything or have access to the multiple data analytic software platforms the site supports. Also, once you are able to run your own analyses on this site and share them with others, the analyses will always run going forward. When preparing for this class I discovered that my one-year-old R code for a published paper would no longer run because an update to a required package “broke the code” in a function that computed scale scores. I have heard many others share stories of code no longer working after updates to R, RStudio, and/or the packages used within R. I am guessing this issue exists for other data analytic platforms as well. A former graduate student in our graduate program, Dr. Rebecca Koessler, published a capsule on Code Ocean that you can review here.
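One small habit that can make this kind of breakage easier to diagnose is saving a record of the software versions alongside the analysis code. The following is only a minimal sketch in Python, not a description of how Code Ocean works; the function name `snapshot_environment` and the package list passed to it are hypothetical and chosen for illustration. R users can get similar information from `sessionInfo()`.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Return the interpreter version and the installed version of each package.

    `packages` is a list of distribution names chosen by the analyst
    (hypothetical here; list whatever your analysis actually imports).
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly rather than failing.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": versions,
    }


if __name__ == "__main__":
    # "pip" is only a placeholder package name for this illustration.
    print(json.dumps(snapshot_environment(["pip"]), indent=2))
```

Saving this output next to the shared data and code tells a later reader (including your future self) which versions the results were produced with, even if newer releases no longer run the code.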

Again, the take-home message from this class was that the data and the code/syntax already existed in a form that allowed you, the researcher, to generate everything you placed in your manuscript, and that with a little work it is fairly easy to share all of that with others for evaluation or reuse. I suggested that researchers adopt the mindset of being open by default and withhold study information (in this case data as well as code/syntax) only when a solid reason exists to do so. The traditional “closed by default” approach needs to be shelved.
