Happy New Year! The start of a new year and a new semester is as good a time as any to evaluate your data management practices. Here are some reminders about data management best practices, groups on campus that can help you manage your data, and some upcoming opportunities to sharpen your skills.
What Is Kaggle?
Kaggle is an online community of data scientists and machine learning practitioners, owned by Google. Kaggle launched in 2010 as a site for machine learning competitions (Google acquired it in 2017), and it has since expanded into a public data sharing platform and a host for machine learning educational resources.
From September 3 to 5, the Workshop on Open Citations was held in Bologna, where researchers, scholarly publishers, funders, policy makers, and advocates for open citations gathered to present new tools and practices for the creation, management, and reuse of citation data, and to take part in a hackathon.
Information from DMPTool
Research Data Services is excited to share that DMPTool released version 3 on February 27, 2018! For those unfamiliar with it, DMPTool helps you understand the data management plan (DMP) requirements of federal funders, write your own DMP, and share your DMP with others.
DMPTool noted that the new version includes the following updates:
- “Up-to-date Funder requirements (34 templates and more to come…)
- A searchable list of participating institutions with & without single sign-on
- New user interface with streamlined plan writing pages
- Website translations coming soon (Spanish, French, German, Brazilian Portuguese, Japanese, contact us to contribute another language!)
- Quick-start guide for creating a DMP (downloadable pdf coming soon)
- General data management guidance”
To access DMPTool with your UW-Madison NetID, visit DMPTool and click “Sign In” in the upper-right corner of your screen. From the drop-down menu that appears, select option 1, “Your Institution”. Type “Wisconsin” into the text box that appears, select “University of Wisconsin-Madison” from the options, and click “Go”. From there, the usual NetID login process should appear.
The RDS team will be updating DMPTool with more UW-Madison-specific help in the future, so keep an eye on the blog for that announcement! Until then, if you have any questions about DMPTool, feel free to contact us!
Content is adapted from the Love Your Data website.
As we reach the last few days of Love Your Data Week, let’s talk about a harder topic: data sharing. Sharing is a great way to give and get credit, and it’s also required by some federal funding agencies. Today’s post introduces the key components of sharing and provides an activity to help you become comfortable with it. If you have any questions or want to let us know how you shared your data, reach out to us on Twitter!
Respect Your Data – Give & Get Credit
Data are increasingly valued as scholarly products in their own right rather than as a byproduct of the research process. Federal funding agencies and publishers are encouraging, and sometimes requiring, researchers to share data created with public funds. For you as a researcher, sharing your data can increase the impact of your work, lead to new collaborations or projects, enable verification of your published results, give you credit as the creator, and provide great resources for education and training. Data sharing also benefits the broader scientific community, funders, and the public: it encourages scientific inquiry and debate, increases transparency, reduces the cost of duplicating data collection, and enables informed public policy.
There are many ways to comply with these requirements – talk to your local librarian to figure out how, where, and when to share your data.
- Share your data upon publication.
- Share your data in an open, accessible, and machine-readable format (e.g., CSV rather than XLSX, ODF rather than DOCX).
- Deposit your data in a subject or institutional repository so your colleagues can find and use it.
- Deposit your data in your institution’s repository to enable long term preservation.
- License your data so people know what they can do with it.
- Tell people how to cite your data.
- When choosing a repository, ask how it supports tracking use of your data. Does it provide a handle or DOI? Can you see how many views and downloads your data receive? Is it indexed by Google, Google Scholar, or the Data Citation Index?
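For the “tell people how to cite your data” step, it can help to see how the pieces of a data citation fit together. The Python sketch below assembles a citation in roughly the order DataCite recommends (Creator, Publication Year, Title, Version, Publisher, Identifier). The `DatasetRecord` class and `format_citation` function are hypothetical helpers written for illustration, and the DOI is a placeholder; always check your repository’s or publisher’s preferred citation style.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata for a shared dataset."""
    creators: List[str]  # "Family, Given" strings
    year: int            # publication year of the dataset
    title: str
    version: str
    publisher: str       # usually the repository that holds the data
    doi: str             # bare DOI, e.g. "10.1234/example.5678" (placeholder)


def format_citation(rec: DatasetRecord) -> str:
    """Build a citation string roughly in the DataCite-recommended order:
    Creator (PublicationYear). Title. Version. Publisher. Identifier."""
    creators = "; ".join(rec.creators)
    parts = [
        f"{creators} ({rec.year}).",
        f"{rec.title}.",
        f"Version {rec.version}.",
        f"{rec.publisher}.",
        f"https://doi.org/{rec.doi}",  # the DOI written as a resolvable link
    ]
    return " ".join(parts)


if __name__ == "__main__":
    example = DatasetRecord(
        creators=["Doe, Jane", "Roe, Richard"],
        year=2018,
        title="Example survey responses",
        version="1.0",
        publisher="Example Repository",
        doi="10.1234/example.5678",
    )
    print(format_citation(example))
```

Putting the DOI at the end as a full https://doi.org/ link means readers can click straight through to the dataset, and repositories can count that use.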
Things to Avoid
- “Data available upon request” is NOT sharing the data.
- Sharing data in PDF files.
- Sharing raw data if the publication doesn’t provide sufficient detail to replicate your results.
Take the plunge and share some of your data today! Check out the list of resources below, or contact your local librarians to get started.
If your data are not quite ready to go public, go check out 1-2 of the repositories below and see what kinds of data are already being shared.
If you have used someone else’s data, make sure you are giving them credit. Take a minute to learn how to cite data:
by Brianna Marshall, Digital Curation Coordinator
This is part three of a three-part series in which I explore platforms for archiving and sharing your data. Read the first post in the series, focused on UW’s institutional repository, MINDS@UW, or the second post, focused on the data repository Dryad.
To help you better understand your options, here are the areas I will address for each platform:
- Background information on who can use it and what type of content is appropriate
- Options for sharing and access
- Archiving and preservation benefits the platform offers
- Whether the platform complies with the forthcoming OSTP mandate
figshare is a discipline-neutral platform for sharing research in many formats, including figures, datasets, media, papers, posters, presentations and filesets. All items uploaded to figshare are citable, shareable and discoverable.
Sharing and access
All publicly available research outputs are shared under open licenses. By default, figures, media, posters, papers, and filesets are available under a CC-BY license, datasets under CC0, and software/code under the MIT license. Learn more about sharing your research on figshare.
Archiving and preservation
figshare notes that items will be retained for the lifetime of the repository and that its sustainability model “includes the continued hosting and persistence of all public research outputs.” Research outputs are stored directly in Amazon Web Services’ S3 buckets. Data files and metadata are backed up nightly and replicated into multiple copies in the online system. Learn more about figshare’s preservation policies.
The OSTP mandate requires federal agencies with more than $100 million in annual research and development expenditures to develop plans to increase public access to the results of the research they fund. This will likely mean that data must be publicly accessible and have an assigned DOI (though you’ll need to check with your funding agency for the exact requirements). Every item uploaded to figshare is assigned a DataCite DOI, so as long as your data are set to public, figshare is a good candidate for complying with the mandate.
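If you are recording DOIs for your datasets (for example, in a data management plan or a progress report), a quick sanity check can catch copy-and-paste mistakes and turn a bare DOI into a resolvable link. The Python sketch below assumes the common `10.<registrant>/<suffix>` shape; its pattern is a simplified check rather than the full DOI syntax, and `normalize_doi` is a hypothetical helper, not part of figshare’s or DataCite’s tools.

```python
import re
from typing import Optional

# Simplified DOI shape (an assumption, not the full DOI syntax):
# "10." + a numeric registrant code (optionally dotted) + "/" + a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(?:\.\d+)*/\S+$")

# Common ways a DOI is written before the "10." part.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value: str) -> Optional[str]:
    """Return a https://doi.org/ link for a DOI, or None if it doesn't look like one."""
    doi = value.strip()
    lowered = doi.lower()
    # Strip a leading resolver URL or "doi:" label, if present (case-insensitive).
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        return None
    return "https://doi.org/" + doi


if __name__ == "__main__":
    for raw in ["doi:10.1234/example.5678", "https://doi.org/10.1234/abc", "available on request"]:
        print(raw, "->", normalize_doi(raw))
```

A check like this only confirms that a string looks like a DOI; to be sure the link works and points at your public dataset, open it in a browser.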
Have additional questions or concerns about where you should archive your data? Contact us.