Information adapted from the Tabula website.
What is Tabula?
If you’ve ever needed data that exists only in a PDF, you’ve likely discovered that you can’t easily copy and paste it, which makes the data difficult to use. Tabula is a free, open-source tool you can use for “liberating data tables locked inside PDF files.”
For an example of Tabula being used to extract data for a visualization project, check out this blog post by the Jane Speaks Initiative. Other examples can also be found on the Tabula website.
What can Tabula help you do?
Tabula runs in your web browser. You browse to the PDF containing the data you need, select the portion of the PDF containing the data tables, and then extract the data from the tables into a CSV file or a Microsoft Excel spreadsheet.
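Once a table has been exported to CSV, it can be read with any tool that understands CSV. The following is a minimal sketch using Python's standard csv module. The sample text stands in for an exported file, and its column names and values are invented for illustration; they do not come from Tabula or any real PDF.

```python
import csv
import io

# Hypothetical stand-in for a CSV file exported from Tabula.
# The headers and values below are invented for illustration only.
SAMPLE_EXPORT = "Item,Count\napples,10\npears,20\n"


def load_table(csv_text):
    """Parse CSV text into a list of dictionaries keyed by column header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


rows = load_table(SAMPLE_EXPORT)
for row in rows:
    # Every value arrives as a string; convert numeric columns explicitly.
    print(row["Item"], int(row["Count"]))
```

With a real export, you would open the file (for example with open(path, newline="")) and pass the file object to csv.DictReader instead of using the in-memory sample.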
How do you get it?
You can download Tabula for free from its website. It is also available on GitHub.
What else should you know?
Tabula works only with text-based PDFs; the developers note that it will not work with scanned documents. Tabula is available for Windows, Mac OS X, and Linux operating systems.
Written by Chiu-chuang Lu Chou; Information adapted from OpenICPSR
OpenICPSR is a self-service data repository for researchers who need to deposit their social and behavioral science research data to comply with public access requirements. Researchers can share up to 2 GB of data in OpenICPSR for free. Researchers prepare all data and documentation files necessary to allow their data collection to be read and interpreted independently. They also prepare metadata so that their data can be searched and discovered in the ICPSR catalog and major search engines. A DOI and a data citation are provided to the depositor after the data are published.
Depositors receive data download reports from OpenICPSR. All OpenICPSR data are governed by the Creative Commons Attribution 4.0 License. Server-side encryption is used to encrypt all files uploaded to OpenICPSR. Data deposited under the self-deposit package are distributed and preserved as-is, exactly as they arrive, without the standard curation and preservation features available with the professional curation package.
OpenICPSR offers a Professional Curation Package to researchers who would like to use ICPSR’s curation services, including full metadata generation, a bibliography search, statistical package conversion, and user support. The cost of professional curation is based on the number of variables and the complexity of the data. To learn more about OpenICPSR, please visit their website.
Written by Heather Wacha
Documenting DH is a project from the Digital Humanities Research Network (DHRN). It consists of a series of audio interviews with humanities scholars and students around the University of Wisconsin-Madison campus. Each interviewee is given a chance to talk about how they view data, work with data, manage data, or teach others about data. Most recently, we interviewed Dorothea Salo, a faculty associate in the iSchool at the University of Wisconsin-Madison. She is also the founder and director of RADD (Recovering Analog and Digital Data). Through her commitment to the preservation of data, she has built a career in which she has been instrumental to the digital community at the University of Wisconsin-Madison. Her interview is now accessible on the DHRN website.
In this series, members of the RDS team share links to research data-related stories, resources, and news that caught their eye each month. Feel free to share your favorite stories with us on Twitter @UWMadRschSvcs!
A new tool called Seek & Blastn is being developed to find errors in research involving nucleotide sequences.
Tracking university students’ library usage can help improve services, but it raises important privacy concerns. Read how some universities are using library data here.
Looking for tips on organizing data in spreadsheets? This article may help.
OCLC Research is working on a project to examine RDM at four research universities.
Digital Preservation Coalition released “a ‘Bit List’ of Digitally Endangered Species” categorizing the risk levels of different types of digital materials.
twarc, a command-line tool for archiving Twitter JSON, can now output CSV files.
Interested in working with your data in R? Check out the Programming Historian lessons on “R Basics with Tabular Data” and “Data Wrangling and Management in R”.
Information adapted from CodingBat Python website.
What is CodingBat Python?
CodingBat Python is a website that offers Python coding problems you can work through for practice (it also offers Java problems). It was created by Nick Parlante, a computer science lecturer at Stanford. It’s geared towards beginners, although some knowledge of Python is required. The website notes that these problems are the sort you’d encounter in a first or second computer science course.
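To give a sense of the level involved, here is an illustrative exercise in the style of an introductory practice problem: given a list of integers, return how many of them are even. This example was written for this post as a sketch and is not copied from the CodingBat site; the function name and test values are our own.

```python
def count_evens(nums):
    """Return the number of even integers in the list nums."""
    count = 0
    for n in nums:
        # An integer is even when dividing by 2 leaves no remainder.
        if n % 2 == 0:
            count += 1
    return count


print(count_evens([2, 1, 2, 3, 4]))
print(count_evens([1, 3, 5]))
```

Practice sites of this kind typically ask you to write only the function body and then run it against a set of test inputs, so small, self-contained functions like this one are a good fit.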
The Rebecca J. Holz Series in Research Data Management is a monthly lecture series hosted during the spring and fall academic semesters. Research Data Services invites speakers from a variety of disciplines to talk about their research or involvement with data.
On November 15, Morton Ann Gernsbacher, Vilas Research Professor and Sir Frederic Bartlett Professor at UW-Madison, gave a talk titled “Benefits of Open Data and Open Stimuli”. Her slides are embedded below. There is a growing trend among scientists to make their research reproducible by increasing its transparency, and Professor Gernsbacher described four ways researchers can do this: preregistering the study, providing open materials, sharing open data, and supporting open access.
Information from the position listing on the Jobs at UW site.
UW-Madison’s new Data Science Hub (DS Hub) is seeking a Data Science Facilitator! See below for the position summary. To view more information about the position as well as requirements and qualifications, visit the listing on the Jobs at UW site.
The Data Science Hub at the Wisconsin Institute for Discovery (WID) provides a focal point for programs dedicated to research and application of modern techniques to the management, storage, and analysis of complex data sets. The Data Science Hub (DS Hub) is seeking an individual to advance the research activities of faculty members, students, and staff in a broad range of scholarly disciplines that rely on data science methods. The Data Science Facilitator will consult with researchers on campus to recommend appropriate solutions to data science problems impeding their research. The successful candidate will gain a wide range of skills at this job and will have the opportunity to work with experts in a range of research areas and data-centric technologies through Data Science Hub partnerships.
This position will work closely with personnel at the Data Science Hub and its on-campus partners, which include but are not limited to: the Advanced Computing Initiative, the Bioinformatics Resource Center, the Biometry program, the Center for High Throughput Computing, the Center for Predictive Computational Phenotyping, the Humanities Research Bridge, Research Data Services, the Social Sciences Computing Cooperative, and many others.