The Challenge of Manual DOI Creation
Managing research data efficiently is crucial in academic institutions, and Digital Object Identifiers (DOIs) play a key role in ensuring research outputs are easily accessible, citable, and trackable. However, the process of creating DOIs manually can be time-consuming, error-prone, and resource-intensive.
At Research Data Services, our team is responsible for generating DOIs for a growing number of research datasets in the institutional repository. Before automation, the process required manually formatting metadata, checking it for accuracy, and uploading records to DataCite one at a time. This approach was not only inefficient but also risked introducing inconsistencies in how metadata was handled.
Recognizing the need for a streamlined solution, we developed an automated workflow to optimize the DOI creation process.
The Solution: Automating DOI Creation
To improve efficiency, we designed a system that automates the key steps involved in DOI generation and metadata management. This workflow includes:
- Extracting metadata from the MINDS@UW repository
- Cleaning and formatting data to align with DataCite standards
- Validating metadata to ensure completeness and accuracy
- Bulk uploading data to generate DOIs efficiently
- Generating a final report summarizing DOI statuses and any errors
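The validation step in the list above can be as simple as a completeness check that runs before anything is sent to DataCite. The sketch below is a minimal illustration, not the team's actual script. It assumes each record is a Python dict keyed by DataCite REST API attribute names, and `REQUIRED_FIELDS` and `validate_record` are hypothetical names. The required fields mirror DataCite's mandatory properties (creators, titles, publisher, publication year, resource type) plus a landing-page `url`, which DataCite needs before a DOI can be registered or published.

```python
# Hypothetical completeness check. Field names follow DataCite REST API attribute
# names; the DOI itself is not checked because it is assigned at upload time.
REQUIRED_FIELDS = ("creators", "titles", "publisher", "publicationYear", "types", "url")


def validate_record(record: dict) -> list:
    """Return a list of problems found in one record; an empty list means it passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat absent keys and empty values the same way.
        if record.get(field) in (None, "", [], {}):
            problems.append(f"missing {field}")

    year = record.get("publicationYear")
    if year not in (None, "") and not (str(year).isdigit() and len(str(year)) == 4):
        problems.append(f"publicationYear is not a four-digit year: {year!r}")

    types = record.get("types")
    if isinstance(types, dict) and types and not types.get("resourceTypeGeneral"):
        problems.append("types has no resourceTypeGeneral")
    return problems
```

A record that returns an empty list can move on to upload; anything else is held back and its problems carried into the final report.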
By eliminating manual intervention in these steps, the system significantly reduces processing time and enhances accuracy. Cameron Cook, Data & Digital Scholarship Manager, notes that “this solution has saved hours for staff, especially in multiple recent instances where they had bulk items to upload to the repository. The process was also easy to learn and implement for staff.”
How the Automated Process Works
Step 1: Extracting Metadata
- The system retrieves metadata, including titles, authors, publication dates, and descriptions from the MINDS@UW repository.
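The source does not say how metadata is pulled from MINDS@UW. One common route for repositories is to harvest simple Dublin Core records over OAI-PMH, so the sketch below assumes that approach and only parses an already-downloaded `ListRecords` response. The function names and the output dict layout are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH responses that carry simple Dublin Core (oai_dc).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def _dc_values(dc: ET.Element, tag: str) -> list:
    """All non-empty, stripped values of one Dublin Core element, in document order."""
    return [el.text.strip() for el in dc.findall(f"dc:{tag}", NS) if el.text and el.text.strip()]


def parse_oai_dc(xml_text: str) -> list:
    """Turn an OAI-PMH ListRecords response into one plain dict per record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        records.append({
            "oai_identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "titles": _dc_values(dc, "title"),
            "creators": _dc_values(dc, "creator"),
            "dates": _dc_values(dc, "date"),
            "descriptions": _dc_values(dc, "description"),
        })
    return records
```

A real harvest would also follow OAI-PMH resumption tokens to page through large result sets; that loop is omitted here.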
Step 2: Data Cleaning & Formatting
- The raw data often contains inconsistencies, such as missing fields or incorrect structures.
- The automation script rearranges, formats, and validates the data to meet DOI registration requirements.
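A minimal sketch of the cleaning and rearranging step follows, assuming extracted records shaped like plain dicts of string lists (`creators`, `titles`, `dates`, `descriptions`). It collapses stray whitespace, splits "Family, Given" names, drops duplicate authors, pulls a four-digit year out of free-text dates, and emits DataCite REST API attribute names. The function names are hypothetical. Treating every item as a `Dataset` is also an assumption, as is passing in the publisher and landing-page URL as parameters.

```python
import re

# First plausible four-digit year (1800-2099) in a free-text date string.
YEAR_PATTERN = re.compile(r"\b(1[89]\d\d|20\d\d)\b")


def clean_text(value: str) -> str:
    """Collapse internal runs of whitespace and trim both ends."""
    return " ".join(value.split())


def to_creator(raw: str) -> dict:
    """Split 'Family, Given' into DataCite creator fields; otherwise keep the name as-is."""
    name = clean_text(raw)
    family, _, given = name.partition(",")
    family, given = family.strip(), given.strip()
    if family and given:
        return {"name": f"{family}, {given}", "nameType": "Personal",
                "givenName": given, "familyName": family}
    return {"name": name}


def to_datacite_attributes(record: dict, publisher: str, landing_url: str) -> dict:
    """Rearrange one extracted record into DataCite REST API attributes."""
    creators, seen = [], set()
    for raw in record.get("creators", []):
        creator = to_creator(raw)
        # Skip blank names and repeated author entries, keeping first-seen order.
        if creator["name"] and creator["name"] not in seen:
            seen.add(creator["name"])
            creators.append(creator)

    year = None
    for date in record.get("dates", []):
        match = YEAR_PATTERN.search(date)
        if match:
            year = int(match.group(1))
            break

    return {
        "creators": creators,
        "titles": [{"title": clean_text(t)} for t in record.get("titles", []) if clean_text(t)],
        "publisher": publisher,
        "publicationYear": year,
        # Assumption: every item is a dataset, given the repository's research-data focus.
        "types": {"resourceTypeGeneral": "Dataset"},
        "descriptions": [{"description": clean_text(d), "descriptionType": "Abstract"}
                         for d in record.get("descriptions", []) if clean_text(d)],
        "url": landing_url,
    }
```

Fields that cannot be recovered (for example, no parseable year) are left as `None` rather than guessed, so a later completeness check can flag them.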
Step 3: Bulk DOI Creation
- The cleaned and formatted metadata is uploaded in bulk through a separate script that uses the DataCite API, eliminating the need to enter records one by one.
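The source says only that a separate script uses the DataCite API. A minimal sketch of that step, assuming the DataCite REST API's `POST /dois` endpoint with HTTP Basic authentication (repository ID and password), is shown below. It defaults to DataCite's test system. `build_payload`, `register_doi`, and `bulk_register` are hypothetical names, and the prefix and credentials are placeholders the caller supplies.

```python
import base64
import json
import urllib.error
import urllib.request
from typing import Optional

# DataCite's test system. Production uses https://api.datacite.org/dois.
API_URL = "https://api.test.datacite.org/dois"


def build_payload(attributes: dict, prefix: str, event: Optional[str] = None) -> dict:
    """Wrap attributes in the JSON:API envelope DataCite's REST API expects.

    Sending only a prefix lets DataCite generate the DOI suffix. With no event the
    DOI is created as a draft; event="publish" makes it findable.
    """
    attrs = dict(attributes, prefix=prefix)  # copy, so the caller's dict is untouched
    if event:
        attrs["event"] = event
    return {"data": {"type": "dois", "attributes": attrs}}


def register_doi(payload: dict, repository_id: str, password: str,
                 api_url: str = API_URL) -> tuple:
    """POST one payload; return (HTTP status, parsed body or error text)."""
    token = base64.b64encode(f"{repository_id}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/vnd.api+json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:  # DataCite rejected the request (e.g. 422)
        return err.code, {"error": err.read().decode("utf-8", errors="replace")}
    except urllib.error.URLError as err:  # network failure; keep the batch going
        return 0, {"error": str(err.reason)}


def bulk_register(records: list, prefix: str, repository_id: str, password: str) -> list:
    """Register each cleaned record and collect one result row per record."""
    results = []
    for attributes in records:
        status, body = register_doi(build_payload(attributes, prefix, event="publish"),
                                    repository_id, password)
        data = body.get("data", {})
        results.append({
            "title": (attributes.get("titles") or [{}])[0].get("title", ""),
            "http_status": status,
            "doi": data.get("id"),  # DataCite returns the new DOI as data.id
            "state": data.get("attributes", {}).get("state"),
            "error": body.get("error", ""),
        })
    return results
```

Running against the test system first, and only switching `api_url` to production once a batch comes back clean, keeps experimental DOIs out of the public registry.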
Step 4: Status Reporting
- After upload, the system generates an Excel report that provides a status update for each DOI (successful, pending, or failed).
- Any failed records are flagged so they can be resolved quickly.
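The report step can be sketched with Python's standard `csv` module. This is a swap for illustration: the team's script produces an Excel file, and a library such as openpyxl would be needed for a native `.xlsx`, but Excel opens CSV directly. The mapping from upload results to "successful", "pending", and "failed" is our assumption: a non-201 response counts as failed, and a created DOI still in DataCite's `draft` state counts as pending. `classify` and `write_report` are hypothetical names.

```python
import csv


def classify(result: dict) -> str:
    """Map one upload result to the report's three statuses (assumed mapping)."""
    if result.get("http_status") != 201:
        return "failed"
    return "successful" if result.get("state") in ("findable", "registered") else "pending"


def write_report(results: list, path: str) -> dict:
    """Write one row per record, failed rows first, and return counts per status."""
    counts = {"successful": 0, "pending": 0, "failed": 0}
    # Stable sort: failed rows (key False) move to the top, others keep their order.
    ordered = sorted(results, key=lambda r: classify(r) != "failed")
    # "utf-8-sig" writes a byte-order mark so Excel detects UTF-8 (accented names survive).
    with open(path, "w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.DictWriter(fh, fieldnames=["title", "doi", "status", "error"])
        writer.writeheader()
        for result in ordered:
            status = classify(result)
            counts[status] += 1
            writer.writerow({
                "title": result.get("title", ""),
                "doi": result.get("doi") or "",
                "status": status,
                "error": result.get("error", "") if status == "failed" else "",
            })
    return counts
```

Putting failed rows at the top of the file is one simple way to flag them for follow-up; the returned counts give a one-line summary of the batch.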
Impact & Key Benefits
- Reduced Manual Effort – The automation significantly reduces the time required to format and upload metadata, cutting manual processing time from 4 hours to 15 minutes.
- Improved Data Accuracy – Automating metadata validation minimizes errors and ensures compliance with DOI standards.
- Scalability & Efficiency – The system can handle large volumes of research data, making it easier to manage growing repositories.
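Taking the reported figures at face value (about 240 minutes of manual work before automation and 15 minutes after), the saving works out as follows:

```latex
\frac{240\,\text{min} - 15\,\text{min}}{240\,\text{min}} = \frac{225}{240} = 0.9375 \approx 94\%,
\qquad
\frac{240\,\text{min}}{15\,\text{min}} = 16
```

That is roughly a 94% reduction in hands-on time, or the same work completed about 16 times faster.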
By implementing this automation, our team has transformed the DOI registration process into a seamless, scalable workflow that supports researchers more effectively.
Future Enhancements & Next Steps
While the current system has streamlined DOI creation, we aim to improve efficiency further by integrating API-based, real-time DOI generation, which would reduce our dependency on batch processing and further limit manual intervention.
Conclusion
Automating DOI creation has been a significant step forward in improving research data management for the repository. By leveraging automation, we have reduced manual workload, enhanced data accuracy, and ensured a more efficient workflow for DOI generation.
For institutions managing large-scale research repositories, similar automation solutions can enhance efficiency, improve metadata quality, and streamline research data accessibility.
Rahil Virani is currently pursuing his Master’s degree in Information Science. Through his role as a Research Data Analyst & Initiatives Assistant for UW-Madison Libraries, he provides technical and data support for researchers and teams, particularly focusing on MINDS@UW and Research Data Services (RDS). His passion lies in improving library digital services through automation and data-driven research, aiming to enhance their efficiency and their reach across the diverse UW student community. His work involves gathering data and delivering actionable insights to support strategic planning and targeted initiatives.