OpenCitations Enhances Citation Data with COCI

OpenCitations has been working to make citation data easier to discover and retrieve. In July, OpenCitations released COCI, the OpenCitations Index of Crossref open DOI-to-DOI references. The initial release of COCI treated citations as first-class data entities, indexing the open reference data in Crossref and making that information machine-readable. The July release also included the OpenCitations Corpus (OCC), a repository of downloadable bibliographic and citation data. OpenCitations has since built on the data model it created and released the newest version of COCI this week: the data model has been extended, and the index now contains almost 450 million citation links between DOIs, derived from Crossref reference data.

The extended data model now includes classes for describing two kinds of self-citation: journal self-citations and author self-citations.

Image: the data model used to describe the citation data included in any Open Citation Index (retrieved from opencitations.wordpress.com).
  • Journal self-citations are citations in which the citing and cited entities are published in the same journal. This is determined by comparing, using Crossref data, the ISSNs of the journals in which the two articles are published: a citation is described as a journal self-citation only if the two articles share at least one ISSN.
  • Author self-citations are citations in which the citing and cited entities have at least one author in common, as determined from the ORCIDs recorded in Crossref: if the citing and cited articles share any ORCID, the citation is considered an author self-citation.
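The two rules above boil down to a set intersection on identifiers. The following is a minimal sketch of that logic, not OpenCitations' actual code: the `Article` class is a hypothetical stand-in for Crossref metadata, and the normalization of ISSNs (ignoring hyphens) and ORCIDs (accepting `https://orcid.org/` URLs) is an assumption added for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical stand-in for the Crossref metadata of one article."""
    doi: str
    issns: set[str] = field(default_factory=set)   # ISSNs of the journal it appeared in
    orcids: set[str] = field(default_factory=set)  # ORCIDs of its authors, if recorded


def _norm_issn(issn: str) -> str:
    # Assumption: treat "1234-5678" and "12345678" as the same ISSN.
    return issn.replace("-", "").strip().upper()


def _norm_orcid(orcid: str) -> str:
    # Assumption: accept bare iDs and "https://orcid.org/..." URLs alike.
    return orcid.strip().rstrip("/").rsplit("/", 1)[-1].upper()


def is_journal_self_citation(citing: Article, cited: Article) -> bool:
    """True if the citing and cited articles share at least one ISSN."""
    shared = {_norm_issn(i) for i in citing.issns} & {_norm_issn(i) for i in cited.issns}
    return bool(shared)


def is_author_self_citation(citing: Article, cited: Article) -> bool:
    """True if the citing and cited articles share at least one ORCID."""
    shared = {_norm_orcid(o) for o in citing.orcids} & {_norm_orcid(o) for o in cited.orcids}
    return bool(shared)


if __name__ == "__main__":
    # Illustrative records only; the DOIs, ISSNs, and ORCIDs are placeholders.
    a = Article("10.1000/example.1", {"1234-5678"}, {"0000-0002-1825-0097"})
    b = Article("10.1000/example.2", {"12345678"}, {"https://orcid.org/0000-0002-1825-0097"})
    c = Article("10.1000/example.3")  # no ISSN or ORCID recorded
    print(is_journal_self_citation(a, b), is_author_self_citation(a, b))  # True True
    print(is_journal_self_citation(a, c), is_author_self_citation(a, c))  # False False
```

One consequence of this approach, as sketched here, is that missing metadata can only produce a "no": an article without recorded ORCIDs can never be matched as an author self-citation, so counts derived this way are best read as lower bounds.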

OpenCitations has also updated its REST API to reflect the changes in the COCI data model. It is now possible to include or exclude journal or author self-citations when using the API to retrieve data.
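The API applies this choice on the server side; the sketch below only illustrates the include/exclude semantics on the client side and does not reproduce the API's actual query parameters. It assumes, hypothetically, that each retrieved citation record is a dictionary with `journal_sc` and `author_sc` flags set to `"yes"` or `"no"`.

```python
def filter_self_citations(records, include_journal_sc=True, include_author_sc=True):
    """Return the records that survive the requested self-citation filters.

    Assumption: each record is a dict carrying "journal_sc" and "author_sc"
    flags whose values are "yes" or "no".
    """
    kept = []
    for rec in records:
        if not include_journal_sc and rec.get("journal_sc") == "yes":
            continue  # drop journal self-citations
        if not include_author_sc and rec.get("author_sc") == "yes":
            continue  # drop author self-citations
        kept.append(rec)
    return kept


if __name__ == "__main__":
    sample = [  # hypothetical records with placeholder DOIs
        {"citing": "10.1000/x", "cited": "10.1000/y", "journal_sc": "yes", "author_sc": "no"},
        {"citing": "10.1000/x", "cited": "10.1000/z", "journal_sc": "no", "author_sc": "yes"},
        {"citing": "10.1000/x", "cited": "10.1000/w", "journal_sc": "no", "author_sc": "no"},
    ]
    print(len(filter_self_citations(sample, include_journal_sc=False)))  # 2
    print(len(filter_self_citations(sample, include_journal_sc=False, include_author_sc=False)))  # 1
```

Because the two flags are independent, a citation can be both a journal and an author self-citation; excluding either kind removes it.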

In addition to the REST API, all of the data in COCI can be queried through the COCI SPARQL endpoint, searched using the COCI Search Interface, and downloaded as data dumps on figshare in CSV and N-Triples formats.
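For the CSV dumps, a streaming pass is enough to tally self-citations without loading hundreds of millions of rows into memory. This is a minimal sketch that assumes, hypothetically, that the CSV header includes `journal_sc` and `author_sc` columns holding `"yes"`/`"no"`; check the actual header of the downloaded files before relying on it.

```python
import csv
import io
from collections import Counter


def count_self_citations(csv_file) -> Counter:
    """Stream a COCI-style CSV and count all, journal self-, and author self-citations.

    Assumption: the header includes "journal_sc" and "author_sc" columns
    with "yes"/"no" values.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):  # reads one row at a time
        counts["total"] += 1
        if row.get("journal_sc") == "yes":
            counts["journal_sc"] += 1
        if row.get("author_sc") == "yes":
            counts["author_sc"] += 1
    return counts


if __name__ == "__main__":
    # Placeholder rows standing in for a real dump file.
    sample = io.StringIO(
        "oci,citing,cited,journal_sc,author_sc\n"
        "x1,10.1000/a,10.1000/b,yes,no\n"
        "x2,10.1000/a,10.1000/c,yes,yes\n"
        "x3,10.1000/d,10.1000/b,no,no\n"
    )
    print(count_self_citations(sample))
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the `StringIO` sample; `csv.DictReader` still reads it row by row.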

The next extension to COCI, currently in the works, will include indexes of citations from sources such as Wikidata and DataCite.