Week 7: Data Management Plans

 

I admit that until not too long ago I was unfamiliar with the term “data management plan”. In graduate school I was not trained specifically in data management. Rather, collecting, handling, storing, and sharing data were tasks left for me to decide how best to handle, and I usually did so on the fly. The result is that there are data sets I can no longer find (probably on a floppy disk in a box somewhere), and some data sets that I can find but can no longer interpret given the absence of annotation or metadata. I also used different strategies for naming files and study variables than my grad student colleagues working on similar projects in the same lab (i.e., a lack of consistency within the lab). Yuck. But academics typically become famous because of the cool positive results they report, not because they are meticulous about curating their data sets for future use. Given that as a field we rarely share our data publicly, or even privately when requested, thinking about data management is likely seen by many as a waste of time.

That type of thinking, however, is both wrong and problematic. Here are some reasons why:

  • A lot of data is generated every single year by thousands of researchers, and it is a huge waste that most of it is hidden away from other researchers as well as the public. Others could put it to good use, but this orphaned data is likely never to see the light of day. Imagine a shirt maker that manufactures 100 shirts every day, but sells only 5 of them to customers and throws the rest in a pile in a warehouse. In a short amount of time, there are far more shirts in the warehouse than on people’s backs. A lot of time and energy is wasted manufacturing and storing the 95 shirts each day that will never be worn.
  • Researchers use the results of data analyses to persuade others about relations among variables. When others are persuaded, these results form the building blocks of theories and can also be used by government agencies to set policy. The only way to properly evaluate the claims made by researchers is to have access to study materials, procedures, data, and data analytic code (among other things). Without such access, trust is a surrogate for information, and proper evaluation of scientific claims is impossible.
  • Your future self, and future lab members, will benefit immensely from good data management.

That is why in this week of the class we discussed the importance of data management plans. In short, “Research Data Management is the process of organizing, describing, cleaning, enhancing and preserving data for future use by yourself or other researchers.”

I am thankful for the assistance of librarians at Western University, who met with the class to review a data management template that was put together by the Portage Network. I provide the template below. Ideally, researchers answer each of these questions prior to data collection and then store the document in a repository of their choice (e.g., the Open Science Framework). Writing a data management plan for every project that collects unique data is an excellent habit for all researchers to develop.

Template Made Available by Portage Network

DMP title

Project Name:

Principal Investigator/Researcher:

Description:

Institution:

Data Collection

What types of data will you collect, create, link to, acquire and/or record?

What file formats will your data be collected in? Will these formats allow for data re-use, sharing and long-term access to the data?

What conventions and procedures will you use to structure, name and version control your files to help you and others better understand how your data are organized?
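One way a lab might answer the naming question is to agree on a single pattern and generate file names from it rather than typing them by hand. The sketch below is only an illustration and is not part of the Portage template; the pattern (project, content, collection date, version) and the function name `build_filename` are assumptions chosen for the example.

```python
import re
from datetime import date


def build_filename(project: str, content: str, collected: date,
                   version: int, ext: str) -> str:
    """Return a name shaped like 'project_content_YYYY-MM-DD_v01.ext'.

    The pattern itself is a hypothetical lab convention, not a standard.
    """
    def slug(text: str) -> str:
        # Lowercase the text and collapse every run of characters that is
        # not a letter or digit into a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    if version < 1:
        raise ValueError("version must be 1 or greater")

    # ISO dates (YYYY-MM-DD) sort chronologically when sorted as text,
    # and zero-padded versions keep v02 ahead of v10.
    extension = ext.lstrip(".").lower()
    return (f"{slug(project)}_{slug(content)}_"
            f"{collected.isoformat()}_v{version:02d}.{extension}")


print(build_filename("Trust Study", "Baseline Survey", date(2019, 3, 1), 2, ".CSV"))
# prints: trust-study_baseline-survey_2019-03-01_v02.csv
```

Whatever pattern is chosen, the point is that everyone in the lab uses the same one and that it is written down in the plan.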

Documentation and Metadata

What documentation will be needed for the data to be read and interpreted correctly in the future?

How will you make sure that documentation is created or captured consistently throughout your project?

Storage and Backup

What are the anticipated storage requirements for your project, in terms of storage space (in megabytes, gigabytes, terabytes, etc.) and the length of time you will be storing it?

How and where will your data be stored and backed up during your research project?

How will the research team and other collaborators access, modify, and contribute data throughout the project?

Preservation

Where will you deposit your data for long-term preservation and access at the end of your research project?

Indicate how you will ensure your data is preservation ready. Consider preservation-friendly file formats, ensuring file integrity, anonymization and deidentification, inclusion of supporting documentation.
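For the file integrity part of this question, a common approach is to record a checksum for every file at deposit time and compare against it later. The following is a minimal sketch under that assumption, using only Python's standard library; the function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical and are not something the template prescribes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the manifest that are now missing or altered."""
    current = build_manifest(folder)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

A manifest built when the data are deposited can be saved alongside the data set; running `changed_files` against it later returns an empty list if nothing in the manifest has been altered or lost.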

Sharing and Reuse

What data will you be sharing and in what form? (e.g. raw, processed, analyzed, final).

Have you considered what type of end-user license to include with your data?

What steps will be taken to help the research community know that your data exists?

Responsibilities and Resources

Identify who will be responsible for managing this project’s data during and after the project and the major data management tasks for which they will be responsible.

How will responsibilities for managing data activities be handled if substantive changes happen in the personnel overseeing the project’s data, including a change of Principal Investigator?

What resources will you require to implement your data management plan? What do you estimate the overall cost for data management to be?

Ethics and Legal Compliance

If your research project includes sensitive data, how will you ensure that it is securely managed and accessible only to approved members of the project?

If applicable, what strategies will you undertake to address secondary uses of sensitive data?

How will you manage legal, ethical, and intellectual property issues?

This document was generated by DMP Assistant (https://assistant.portagenetwork.ca)

Additional Resources

Presentations on engaging scientists in good data management

Blog post on data management best practices.
