Definition

A sustainable digital format is one that will remain compatible, for the foreseeable future, with the software needed to open and read it.

Relevance

To read most types of digital data, you need to open the files in compatible software. Unfortunately, as software applications change or disappear over time, data file formats can become obsolete. If there is a risk of your data format becoming obsolete during the data's useful lifetime, you may need to migrate the data to a new format. The resources needed for this migration could be included as a budget item in your data plan.

Recommendations

Wherever possible, select data formats that have the following sustainability attributes: (1) they adhere to publicly documented specifications (rather than proprietary ones), as such file types are typically in widespread use and readable with readily available software; (2) they are self-describing, containing embedded metadata that helps interpret the context and structure of the data file; and (3) they retain as much of the original information as possible.
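
As a rough illustration of these attributes, the sketch below writes a small table to comma-separated values (CSV), a publicly documented format, with the column headings stored in the file itself, and then reads it back. The sample rows, the column names, and the helper name `write_csv` are hypothetical and chosen only for the example; it assumes Python's standard `csv` module is an acceptable tool for the job.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
rows = [
    {"respondent_id": 1, "age": 34, "region": "North"},
    {"respondent_id": 2, "age": 51, "region": "South"},
]


def write_csv(records, handle):
    """Write records as CSV with a header row so the file carries its own column names."""
    writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)


# An in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
write_csv(rows, buffer)

# Read the data back using only the information stored in the file.
buffer.seek(0)
restored = list(csv.DictReader(buffer))
```

Note that the values read back are all text: CSV keeps the column headings but not the data types, so some of the original information has to be recorded elsewhere, for example in accompanying documentation.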

| Type of Data | Recommended Formats | Acceptable Formats |
| --- | --- | --- |
| Tabular data with extensive metadata: variable labels, code labels, and defined missing values | SPSS portable format (.por); Delimited text and command ('setup') file (SPSS, Stata, SAS, etc.); Structured text or mark-up file of metadata information, e.g. DDI XML file | Proprietary formats of statistical packages: SPSS (.sav), Stata (.dta), MS Access (.mdb/.accdb) |
| Tabular data with minimal metadata: column headings, variable names | Comma-separated values (.csv); Tab-delimited file (.tab); Delimited text with SQL data definition statements | Delimited text (.txt) with characters not present in data used as delimiters; Widely-used formats: MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf), OpenDocument Spreadsheet (.ods) |
| Geospatial data: vector and raster data | ESRI Shapefile (.shp, .shx, .dbf, .prj, .sbx, .sbn); Geo-referenced TIFF (.tif, .tfw); CAD data (.dwg); Tabular GIS attribute data; Geography Markup Language (.gml) | ESRI Geodatabase format (.mdb); MapInfo Interchange Format (.mif) for vector data; Keyhole Mark-up Language (.kml); Adobe Illustrator (.ai), CAD data (.dxf or .svg); Binary formats of GIS and CAD packages |
| Textual data | Rich Text Format (.rtf); Plain Text, ASCII (.txt); eXtensible Mark-up Language (.xml) text according to an appropriate Document Type Definition (DTD) or schema | Hypertext Mark-up Language (HTML); Widely-used formats: MS Word (.doc/.docx); Some software-specific formats (NUD*IST, NVivo, ATLAS.ti) |
| Image data | TIFF 6.0 uncompressed (.tif) | JPEG (.jpeg, .jpg, .jp2) if original created in this format; GIF (.gif); TIFF other versions (.tif, .tiff); RAW image format (.raw); Photoshop files (.psd); BMP (.bmp); PNG (.png); Adobe Portable Document Format (PDF/A, PDF) (.pdf) |
| Audio data | Free Lossless Audio Codec (FLAC) (.flac) | MPEG-1 Audio Layer 3 (.mp3) if original created in this format; Audio Interchange File Format (.aif); Waveform Audio Format (.wav) |
| Video data | MPEG-4 (.mp4); OGG video (.ogv, .ogg); Motion JPEG 2000 (.mj2) | AVCHD video (.avchd) |
| Documentation and scripts | Rich Text Format (.rtf); PDF/UA, PDF/A, or PDF (.pdf); XHTML or HTML (.xhtml, .htm); OpenDocument Text (.odt) | Plain text (.txt); Widely-used formats: MS Word (.doc/.docx), MS Excel (.xls, .xlsx); XML marked-up text (.xml) according to an appropriate DTD (Document Type Definition) or schema |
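
The recommendations above could be turned into a quick first-pass check of a file's format, as in the sketch below. The function name `format_tier` and the two sets are hypothetical; they cover only a subset of the listed extensions, and the sketch assumes the file extension alone identifies the format. Extensions whose tier depends on the type of data or the format version (such as .txt, .tif, .pdf, .xml, and .dbf) are deliberately left out and reported as "unlisted".

```python
from pathlib import Path

# Subset of extensions that appear only in the "Recommended Formats" column.
RECOMMENDED = {
    ".por", ".csv", ".tab", ".shp", ".gml", ".dwg",
    ".rtf", ".flac", ".mp4", ".ogv", ".mj2", ".odt",
}

# Subset of extensions that appear only in the "Acceptable Formats" column.
ACCEPTABLE = {
    ".sav", ".dta", ".mdb", ".accdb", ".xls", ".xlsx", ".ods",
    ".mif", ".kml", ".doc", ".docx", ".jpg", ".jpeg", ".gif",
    ".png", ".bmp", ".psd", ".mp3", ".aif", ".wav", ".avchd",
}


def format_tier(filename: str) -> str:
    """Return 'recommended', 'acceptable', or 'unlisted' based on the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in RECOMMENDED:
        return "recommended"
    if ext in ACCEPTABLE:
        return "acceptable"
    # Ambiguous or unknown extensions need a manual look at the recommendations.
    return "unlisted"
```

For example, `format_tier("survey.sav")` returns `"acceptable"`, which might prompt exporting the data to one of the recommended formats before deposit. A result of `"unlisted"` means only that the sketch cannot decide, not that the format is unsuitable.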

Resources

| Resource | Source |
| --- | --- |
| FAQs about digital audio and video formats | National Archives |
| Sustainability of Digital Formats | Library of Congress |