Definition

A sustainable digital format is one that is compatible, for the foreseeable future, with software needed to open and read it.

Relevance

To read most types of digital data, you need to open them in compatible software. Unfortunately, as software applications change or disappear over time, data file formats can become obsolete. If your data format risks becoming obsolete during the data's useful lifetime, you may need to migrate the data to a new format. The resources needed for this migration can be included as a budget item in your data plan.
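As a rough illustration of how migration needs might be estimated, the sketch below counts the files in a project folder by extension and flags those in an at-risk set. The set `AT_RISK_EXTENSIONS` and the function names are assumptions made for this example (the extensions are drawn from the proprietary and software-specific formats named in the table under Recommendations); adjust them to your own project.

```python
from collections import Counter
from pathlib import Path

# Assumption: an illustrative set of proprietary extensions, not an official list.
AT_RISK_EXTENSIONS = {".doc", ".xls", ".mdb", ".sav", ".dta", ".psd", ".dwg"}


def inventory_formats(root):
    """Count the files under `root` by lowercase file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts


def at_risk(counts, risky=AT_RISK_EXTENSIONS):
    """Keep only the extensions that appear in the at-risk set."""
    return {ext: n for ext, n in counts.items() if ext in risky}
```

Calling `at_risk(inventory_formats("my_project"))` on a hypothetical folder returns a dictionary of flagged extensions and file counts, which can feed a migration estimate.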

Recommendations

Wherever possible, select data formats with the following sustainability attributes. The format (1) adheres to publicly documented specifications (rather than proprietary ones), as such file types are in widespread use and readable with readily available software; (2) is self-describing, containing embedded metadata that helps interpret the context and structure of the data file; and (3) retains as much of the original information as possible.

Type of data | Preferred Formats | Acceptable Formats
Publications and Scholarly Documents
PDF/UA, PDF/A, PDF (.pdf)
PDF (.pdf)
Documentation and scripts
Rich Text Format (.rtf)
PDF/UA, PDF/A, PDF (.pdf)
HTML (.xhtml, .htm)
OpenDocument Text (.odt)
Plain text (.txt)
Widely-used formats: MS Word (.doc, .docx), MS Excel (.xls, .xlsx)
XML-marked-up text (.xml) according to an appropriate DTD or schema (e.g., XHTML 1.0)
Textual data
Rich Text Format (.rtf)
Plain text, ASCII (.txt)
eXtensible Markup Language (.xml) text according to an appropriate schema
Hypertext Markup Language (HTML)
Widely-used formats: MS Word (.doc, .docx)
Software-specific formats: NUD*IST, NVivo, ATLAS.ti
Tabular data with minimal metadata
(Column headings, variable names, etc.)
Comma-separated values (.csv)
Tab-delimited file (.tab)
Delimited text with SQL data definition statements
Delimited text (.txt) with characters not present in data used as delimiters
Widely-used formats: MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf), OpenDocument Spreadsheet (.ods)
Audio data
Waveform Audio Format (.wav)
MPEG-1 Audio Layer 3 (.mp3) if original created in this format
Audio Interchange File Format (.aif)
Free Lossless Audio Codec (FLAC, .flac)
Video data
MPEG-4 (.mp4)
OGG video (.ogv, .ogg)
AVCHD video (.avchd)
AVI (.avi)
MOV (.mov)
Tabular data with extensive metadata
(Variable data, code labels, defined missing values, etc.)
SPSS portable format (.por)
Delimited text and command file (SPSS, Stata, SAS, etc.)
Structured text or mark-up file of metadata information (DDI in an XML file)
Proprietary formats of statistical packages: SPSS (.sav), Stata (.dta), MS Access (.mdb/.accdb)
Geospatial data
(Vector and raster data)
ESRI Shapefile (.shp, .shx, .dbf, .prj, .sbx, .sbn)
Geo-referenced TIFF (.tif, .tfw)
CAD data (.dwg)
Tabular GIS attribute data
Geography Markup Language (.gml)
ESRI Geodatabase format (.mdb)
MapInfo Interchange Format (.mif) for vector data
Keyhole Markup Language (.kml)
Adobe Illustrator (.ai), CAD data (.dxf or .svg)
Binary formats of GIS and CAD packages
Image data
TIFF 6.0 uncompressed (.tif)
JPEG (.jpeg, .jpg, .jp2) if original created in this format
GIF (.gif)
TIFF other versions (.tif, .tiff)
RAW image format (.raw)
Photoshop files (.psd)
BMP (.bmp)
PNG (.png)
Adobe Portable Document Format (PDF/A, .pdf)
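
One of the options listed above for tabular data is delimited text with SQL data definition statements. As a minimal sketch of what such an export could look like, the example below writes a database table to a CSV file with column headings and saves the table's `CREATE TABLE` statement alongside it. It assumes the data happen to live in a SQLite database, which this guide does not mention; the function name `export_table` and the file paths are hypothetical.

```python
import csv
import sqlite3


def export_table(conn, table, csv_path, ddl_path):
    """Write one table to CSV (with headings) and its CREATE statement to a .sql file."""
    # Look up the data definition statement SQLite stored for this table.
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if row is None:
        raise ValueError(f"no such table: {table}")
    with open(ddl_path, "w", encoding="utf-8") as f:
        f.write(row[0] + ";\n")

    # The table name was checked above; escape embedded quotes before quoting it.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    headings = [col[0] for col in cursor.description]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headings)  # column headings as minimal metadata
        writer.writerows(cursor)   # one CSV row per table row
```

The CSV file carries the values and column headings, while the `.sql` file documents the column types, so the pair can be read without the original database software.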

Resources

Resource | Source
FAQs about digital audio and video formats | National Archives
Sustainability of Digital Formats | Library of Congress