The Rebecca J. Holz Series in Research Data Management is a monthly lecture series hosted during the spring and fall academic semesters. Research Data Services invites speakers from a variety of disciplines to talk about their research or involvement with data.
On October 12th, Matthew Garcia, a PhD candidate in Forest Science in the Department of Forest & Wildlife Ecology at UW-Madison, gave a talk entitled “The accurate data management plan (if such exists) in the presence of ‘Big Data’”. His slides are embedded below and are also available on the Research Data Services Speakerdeck page.
As data sets grow ever larger and funding requirements to share them become more common, Garcia described two important but often missing pieces of data management plans: where will this “big data” be stored, and who will pay for it? He used his own research as an example, noting that when he began his project, he anticipated gathering approximately 5 terabytes of data to work with. He ended up with approximately 40 terabytes, eight times his original estimate.
To deal with these issues, Garcia suggested that data management plans should not be isolated from proposal budgets. Instead, the costs of data should be explicitly included in the budget, covering sharing, storage, administration, and personnel time. The plan should also account for contingencies, such as a volume of data larger than expected, as was the case with Garcia’s research. Garcia believes a data management plan that anticipates the costs and issues associated with “big data” is more complete and more useful, and that it can help reveal potential problems around data management and dissemination at the time of proposal.
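As a rough illustration of why a contingency matters, the arithmetic behind a storage line item can be sketched in a few lines of Python. This is not taken from Garcia's talk: the function `storage_budget`, the rate of $100 per terabyte per year, and the three-year project length are all hypothetical placeholders. Only the 5 and 40 terabyte figures come from his example.

```python
def storage_budget(expected_tb, cost_per_tb_year, years, contingency_factor=1.0):
    """Estimate total storage cost for a proposal budget.

    contingency_factor scales the expected volume to allow for growth
    (1.0 means no allowance at all).
    """
    if expected_tb < 0 or cost_per_tb_year < 0 or years < 0:
        raise ValueError("volume, rate, and duration must be non-negative")
    if contingency_factor < 1.0:
        raise ValueError("contingency_factor must be at least 1.0")
    return expected_tb * contingency_factor * cost_per_tb_year * years


# Hypothetical inputs: neither figure comes from the talk.
RATE = 100.0  # assumed dollars per terabyte per year
YEARS = 3     # assumed project length

# Volumes from Garcia's example: 5 TB anticipated, about 40 TB in the end.
planned = storage_budget(5, RATE, YEARS)
actual = storage_budget(40, RATE, YEARS)

print(f"Budgeted for 5 TB:  ${planned:,.2f}")
print(f"Needed for 40 TB:   ${actual:,.2f}")
print(f"Shortfall:          ${actual - planned:,.2f}")
```

Under these assumed numbers, a budget written for the anticipated volume covers only one eighth of the eventual storage cost. Passing `contingency_factor=8` would have closed the gap, though in practice a proposal writer would have to choose that factor before knowing how much the data will grow.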