Tool: Tabula

Information adapted from the Tabula website.

What is Tabula?

If you’ve ever needed data that exists only in a PDF, you’ve likely discovered that you can’t easily copy and paste it, which makes the data hard to use. Tabula is a free, open-source tool you can use for “liberating data tables locked inside PDF files.”

For an example of Tabula being used to extract data for a visualization project, check out this blog post by the Jane Speaks Initiative. Other examples can also be found on the Tabula website.

What can Tabula help you do?

Tabula runs in your web browser. You browse to the PDF containing the data you need, select the portion of the page that holds the data tables, and extract the data into a CSV file or a Microsoft Excel spreadsheet.
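
Once you have a CSV file, you can load it into almost any analysis tool. As a minimal sketch (not part of Tabula itself), the Python snippet below reads an exported file using only the standard library. The function name load_table and the cleanup steps are assumptions made for illustration; your own export may need different handling.

```python
import csv


def load_table(path):
    """Read a CSV file (for example, one exported from Tabula) into a list of dicts.

    Hypothetical helper for illustration; it is not part of Tabula.
    """
    # "utf-8-sig" reads plain UTF-8 and also tolerates a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        rows = []
        for row in reader:
            cleaned = {}
            for key, value in row.items():
                # DictReader stores surplus cells under the key None; skip them.
                if key is None:
                    continue
                # Short rows yield None values; treat those as empty strings.
                cleaned[key.strip()] = (value or "").strip()
            rows.append(cleaned)
        return rows
```

For example, calling load_table("tabula-export.csv") (a placeholder filename) returns one dictionary per table row, keyed by the column headers. Every value comes back as text, so convert numeric columns yourself before doing any calculations.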

How do you get it?

You can download Tabula for free from its website. It is also available on GitHub.

What else should you know?

Tabula works only with text-based PDFs; the developers note that it will not work with scanned documents, which typically store each page as an image rather than as selectable text. Tabula is available for Windows, Mac OS X, and Linux.