https://doi.org/10.6084/m9.figshare.5522143
https://w3id.org/people/essepuntato/presentations/the-open-citations-revolution.html
silvio.peroni@unibo.it 0000-0003-0530-4305 @essepuntato
DASPLab, DISI, Università di Bologna
1st Workshop on Enabling Open Semantic Science (SemSci 2017), 21 October 2017, Vienna, Austria
Citations are the primary tool to acknowledge others' prior work on a particular topic. They enable one to find key publications within a particular field, and are also used for research purposes: e.g. people working in Bibliometrics, Informetrics, and Scientometrics use them to analyse the complex relationships that exist within huge networks of citations of scholarly works. In addition, citation data are important for assessing the quality of research by means of metrics and indicators calculated from citation databases. However, the cruel reality is that citations have been locked up in closed silos for years, and often they can only be accessed by paying significant subscription fees.
But the scenario is quickly changing. In recent years, several initiatives (I4OC, OpenCitations, WikiCite, Springer Nature SciGraph, LORC, etc.) have started to promote the availability of open citation data. In this talk I will introduce some of the most significant efforts in the area, focussing on how Semantic Publishing technologies have been adopted to enable a FAIR publication of open citation data.
Oxford dictionary: refer to (a passage, book, or author) as evidence for or justification of an argument or statement, especially in a scholarly work
Isaac Newton (1675): If I have seen further, it is by standing on the shoulders of giants
Citations are unanimously recognised as crucial for knitting together our scientific and cultural knowledge
Everything is a reference
Semantic overload!
A citation is a conceptual directional link from a citing entity to a cited entity
Citations instantiated by the inclusion of a bibliographic reference (1) are different from those defined by an in-text reference pointer + citation context (2)
Generate a plain citation link
Generate a citation link with a specific citation function
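The definition of a citation as a directional link, optionally qualified by a citation function, maps naturally onto a small record type. The following Python sketch is only illustrative: the `Citation` class, its fields and the DOIs are hypothetical, and the citation function is assumed to be given as a CiTO property name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """Hypothetical model of a directional link from a citing to a cited entity."""
    citing: str                      # identifier of the citing entity, e.g. a DOI
    cited: str                       # identifier of the cited entity
    function: Optional[str] = None   # e.g. "cito:extends"; None for a plain citation link

    def is_plain(self) -> bool:
        # A plain link says nothing about why the cited entity is cited
        return self.function is None


# (1) derived from a bibliographic reference: only the plain link is available
plain = Citation("10.1234/citing", "10.1234/cited")
# (2) derived from an in-text reference pointer plus its citation context
typed = Citation("10.1234/citing", "10.1234/cited", "cito:usesMethodIn")

print(plain.is_plain(), typed.is_plain())  # True False
```

Making the record frozen keeps citations hashable, so they can be stored in sets and deduplicated.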
A citation index is an index of (plain) citations between publications
Some are freely accessible but not downloadable, e.g. Google Scholar
The ones regarded as most authoritative by institutions worldwide, namely Scopus and Web of Science, can be accessed only by paying significant access fees
They are usually oriented more towards human readability than towards machine readability and data re-use
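At its core, a citation index inverts plain citation links so that, for each cited publication, one can retrieve the publications citing it. A minimal Python sketch, assuming links are (citing, cited) pairs of identifiers; the function names and the sample identifiers are hypothetical, and real indexes also store bibliographic metadata.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_citation_index(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each cited entity to the set of entities citing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for citing, cited in links:
        index[cited].add(citing)  # a set ignores duplicate links
    return dict(index)


def citation_counts(index: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Return (cited entity, number of citing entities) pairs, most cited first."""
    return sorted(((cited, len(citing)) for cited, citing in index.items()),
                  key=lambda pair: (-pair[1], pair[0]))


links = [("A", "C"), ("B", "C"), ("B", "D"), ("A", "C")]  # hypothetical links
index = build_citation_index(links)
print(citation_counts(index))  # [('C', 2), ('D', 1)]
```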
What about machine-readable open citation data?
The Initiative for Open Citations (I4OC, https://i4oc.org) is a collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data
Goal: promote the availability of data on citations that are:
structured - available in common and machine-readable formats
separable - no need to access the source bibliographic products
open - freely accessible and reusable
Publishers deposit their reference data with Crossref, but the default state for the data is closed (i.e. not accessible through the Crossref REST API)
However, one email is enough to make the data open
Challenge: persuade a group of influential publishers to release their data
What happened: before the I4OC launch (6 April 2017), only 1% of the publications in Crossref had open references
49 scholarly publishers have opened their references, including the following major ones:
Commercial publishers - Association for Computing Machinery, BMJ, De Gruyter, eLife, EMBO Press, Hindawi, IOS Press, PeerJ, Pensoft Publishers, Portland Press, Public Library of Science, Springer Nature, Taylor & Francis, Wiley
University and scholarly presses - Cambridge University Press, Cold Spring Harbor Laboratory Press, Company of Biologists, Edinburgh University Press, MIT Press, Rockefeller University Press
Learned societies - American Association for the Advancement of Science (AAAS), American Physical Society, American Society for Cell Biology, International Union of Crystallography, Proceedings of the National Academy of Sciences (PNAS), Royal Society of Chemistry, The Royal Society
API call: http://api.crossref.org/works/10.1007/978-3-319-46547-0_18
{
  "message": {
    "DOI": "10.1007/978-3-319-46547-0_18",
    "title": ["FOOD: FOod in Open Data"],
    "reference": [{
      "key": "18_CR1",
      "unstructured": "Falco, R., Gangemi, A., Peroni, S., Vitali, F.: Modelling OWL ontologies with Graffoo. In: The Semantic Web: ESWC 2014 Satellite Events, pp. 320-325 (2014). http://dx.doi.org/10.1007/978-3-319-11955-7_42",
      "DOI": "10.1007/978-3-319-11955-7_42",
      "doi-asserted-by": "crossref"
    }, {
      "key": "18_CR2",
      "unstructured": "Ferragina, P., Scaiella, U.: On-the-fly annotation of short text fragments (by Wikipedia entities). In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM 2010), pp. 1625-1628 (2010). http://dx.doi.org/10.1145/1871437.1871689",
      "DOI": "10.1145/1871437.1871689",
      "doi-asserted-by": "crossref"
    }, ... ],
    ...
  }
}
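Extracting citation links from such a response mostly consists of pairing the DOI of the work with the DOIs of its references. A minimal Python sketch, assuming the response layout of the excerpt above; `fetch_work` and `doi_citations` are hypothetical helper names, and references that carry only an `unstructured` string are skipped rather than parsed. The `reference` field is present only when the publisher has opened its references.

```python
import json
import urllib.request
from typing import Dict, List, Tuple

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi: str) -> Dict:
    """Retrieve the metadata of a work from the Crossref REST API (needs network access)."""
    with urllib.request.urlopen(CROSSREF_WORKS + doi) as response:
        return json.load(response)


def doi_citations(record: Dict) -> List[Tuple[str, str]]:
    """Return (citing DOI, cited DOI) pairs for the references that carry a DOI."""
    message = record["message"]
    citing = message["DOI"]
    return [(citing, ref["DOI"]) for ref in message.get("reference", []) if "DOI" in ref]


# Trimmed version of the response above, plus one hypothetical reference without a DOI
sample = {"message": {"DOI": "10.1007/978-3-319-46547-0_18", "reference": [
    {"key": "18_CR1", "DOI": "10.1007/978-3-319-11955-7_42"},
    {"key": "18_CR2", "DOI": "10.1145/1871437.1871689"},
    {"key": "hypothetical", "unstructured": "a reference without DOI"},
]}}
print(doi_citations(sample))
```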
Crossref citation data do not follow all the FAIR data principles
F1: no persistent identifier is defined for metadata - just API URLs
I1: no use of formal language for knowledge representation - just JSON
I2: the vocabulary used in JSON is not FAIR
R1.2: only limited provenance information, and no change tracking
R1.3: use of a custom JSON format created specifically for that purpose, rather than a community standard
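One possible way to address I1 and I2 is to re-express the same links in RDF with a shared vocabulary such as CiTO. The Python sketch below serialises (citing DOI, cited DOI) pairs as N-Triples using `cito:cites`; turning DOIs into `https://doi.org/` URIs is an assumption made for illustration, and the sketch does nothing for F1 or R1.2, which require identifiers and provenance for the citations themselves.

```python
from typing import Iterable, List, Tuple
from urllib.parse import quote

CITO_CITES = "http://purl.org/spar/cito/cites"
DOI_BASE = "https://doi.org/"  # assumed URI scheme for identifying works


def doi_to_uri(doi: str) -> str:
    # Percent-encode characters that are unsafe in a URI, keeping the "/" separators
    return DOI_BASE + quote(doi, safe="/")


def to_ntriples(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Serialise (citing DOI, cited DOI) pairs as N-Triples statements using cito:cites."""
    return [f"<{doi_to_uri(citing)}> <{CITO_CITES}> <{doi_to_uri(cited)}> ."
            for citing, cited in pairs]


for line in to_ntriples([("10.1007/978-3-319-46547-0_18", "10.1145/1871437.1871689")]):
    print(line)
```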
Since Crossref data are the main building block for FAIR open citation data, how can we reach FAIRness?
The OpenCitations Corpus (OCC, http://opencitations.net) is a LOD repository of CC0 citation data
It provides >11M citation links from ~260,000 citing articles to ~6M cited resources + provenance information
Data accessible via HTTP URI (content negotiation), SPARQL endpoint, and monthly dumps (on Figshare)
Reuse of external APIs (Europe PubMed Central, Crossref, ORCID) and models (SPAR Ontologies)
More info: "One year of the OpenCitations Corpus", talk on October 24, 2017, 10:50, room Stolz 2
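Content negotiation means that the same HTTP URI can return different serialisations depending on the `Accept` header of the request. A minimal Python sketch using only the standard library; the resource URI is an assumed example of an OpenCitations Corpus identifier, and whether a given format such as `text/turtle` is served should be checked against the corpus documentation.

```python
import urllib.request

# Assumed example of a corpus resource URI; check the OCC documentation for real identifiers
RESOURCE_URI = "https://w3id.org/oc/corpus/br/1"


def rdf_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build a GET request asking for a given RDF serialisation via content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch_rdf(uri: str, media_type: str = "text/turtle") -> str:
    """Dereference the URI (needs network access) and return the body as text."""
    with urllib.request.urlopen(rdf_request(uri, media_type)) as response:
        return response.read().decode("utf-8")


request = rdf_request(RESOURCE_URI)
print(request.get_method(), request.get_header("Accept"))  # GET text/turtle
```

Changing `media_type` (e.g. to `application/ld+json`) is all it takes to ask the same URI for another serialisation, provided the server supports it.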
WikiCite is an initiative to build a bibliographic database in Wikidata to serve all Wikimedia projects
Wikidata now includes 36M citation links, expressed with the cites (P2860) property
All the data are released under CC0
Additional tools built by the community, e.g. Scholia
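Because the cites (P2860) property is ordinary Wikidata data, citation links can be retrieved through the Wikidata Query Service SPARQL endpoint. A minimal Python sketch that only builds the request; `Q42` is an arbitrary placeholder item rather than a scholarly work, and the `User-Agent` value is a hypothetical contact string (the service asks clients to identify themselves).

```python
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def cited_works_query(qid: str, limit: int = 10) -> str:
    """SPARQL query listing the works that the given item cites via P2860."""
    # The wd: and wdt: prefixes are predefined by the Wikidata Query Service
    return f"""SELECT ?cited WHERE {{
  wd:{qid} wdt:P2860 ?cited .
}} LIMIT {limit}"""


def sparql_request(query: str) -> urllib.request.Request:
    """Build a GET request asking the endpoint for SPARQL JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "citation-example/0.1 (example@example.org)",  # hypothetical contact
    })


query = cited_works_query("Q42")  # placeholder item
request = sparql_request(query)
print(query)
```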
From https://doi.org/10.1108/JD-12-2013-0166:
When the citation links exist but are not described, the traveller through the city of scholarly publishing travels without specific directions, and will get lost in the maze of bridges that form the citation network
The Citation Typing Ontology (CiTO, http://purl.org/spar/cito) enables characterization of the nature or type of citations, both factually (cito:usesMethodIn, cito:extends, etc.) and rhetorically (cito:agreesWith, cito:disputes, etc.)
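A typed citation simply replaces the generic `cito:cites` with a more specific CiTO property. The Python sketch below accepts only the properties named above plus `cito:cites`; restricting the check to this subset is an illustrative choice, since CiTO defines many more properties, and the example URIs are hypothetical.

```python
CITO = "http://purl.org/spar/cito/"
# Illustrative subset: the properties mentioned above plus the generic cito:cites
KNOWN_FUNCTIONS = {"cites", "usesMethodIn", "extends", "agreesWith", "disputes"}


def typed_citation(citing_uri: str, cited_uri: str, function: str = "cites") -> str:
    """Return one N-Triples statement linking citing to cited through a CiTO property."""
    if function not in KNOWN_FUNCTIONS:
        raise ValueError(f"unsupported citation function: {function}")
    return f"<{citing_uri}> <{CITO}{function}> <{cited_uri}> ."


print(typed_citation("http://example.org/paper/a", "http://example.org/paper/b", "disputes"))
```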
Semantic Lancet (http://semanticlancet.eu) is a prototype focused on building LOD on scholarly publications
It contains bibliographic data, abstracts and citations
The Linked Open Research Cloud (https://linkedresearch.org/cloud) is a new project which makes available an inbox to receive Linked Data Notifications about scholarly communication resources
CC0 expected for notifications + provenance information
Notifications must meet the requirements of certain notification data shapes: article, annotations, citations
CiTO properties (i.e. subproperties of cito:cites) are suggested for specifying citation links
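Linked Data Notifications are delivered by POSTing a JSON-LD document to the target inbox. The Python sketch below builds such a request without sending it; the inbox URL is hypothetical (a real one is discovered via the `ldp:inbox` relation of the target resource), and the payload layout is an assumption, since LORC's own notification data shapes may require additional fields.

```python
import json
import urllib.request

# Hypothetical inbox URL; a real inbox is discovered via the target's ldp:inbox relation
INBOX_URL = "https://example.org/inbox/"


def citation_notification(citing: str, cited: str, function: str = "cites") -> dict:
    """JSON-LD payload stating that `citing` cites `cited` through a CiTO property.

    The layout is an assumption, not LORC's actual notification data shape.
    """
    return {
        "@context": {"cito": "http://purl.org/spar/cito/"},
        "@id": citing,
        f"cito:{function}": {"@id": cited},
    }


def notification_request(inbox: str, payload: dict) -> urllib.request.Request:
    """Build, but do not send, the POST request that delivers a notification."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        inbox, data=body, method="POST",
        headers={"Content-Type": "application/ld+json"},
    )


payload = citation_notification("http://example.org/paper/a", "http://example.org/paper/b")
request = notification_request(INBOX_URL, payload)
# urllib.request.urlopen(request) would actually deliver the notification
```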
Key benefit for open citation data from I4OC home:
The creation of a public citation graph to explore connections between knowledge fields, and to follow the evolution of ideas and scholarly disciplines.
Having one single FAIR dataset with the complete citation graph of the whole scholarly literature is not feasible
Each initiative should provide its partial data and link its resources with those included in other citation datasets
Interlinking should be an additional mandatory I in FAIR if we want to reach a real public and global citation graph
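Interlinking can start from identifiers that several datasets share, such as DOIs. A minimal Python sketch, assuming each dataset is reduced to a mapping from its own resource URIs to DOIs; it emits `owl:sameAs` statements between resources describing the same DOI, comparing DOIs case-insensitively because DOI names are case-insensitive. All URIs and DOIs in the example are hypothetical.

```python
from typing import Dict, List

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_links(dataset_a: Dict[str, str], dataset_b: Dict[str, str]) -> List[str]:
    """Link resources of two datasets that describe the same DOI.

    Each dataset maps a resource URI to the DOI of the work it describes.
    If dataset_b maps several URIs to one DOI, only the last one is kept.
    """
    by_doi = {doi.lower(): uri for uri, doi in dataset_b.items()}
    links = []
    for uri_a, doi in sorted(dataset_a.items()):
        uri_b = by_doi.get(doi.lower())
        if uri_b is not None:
            links.append(f"<{uri_a}> <{OWL_SAME_AS}> <{uri_b}> .")
    return links


a = {"http://example.org/a/1": "10.1234/ABC", "http://example.org/a/2": "10.1234/xyz"}
b = {"http://example.org/b/9": "10.1234/abc"}
print(same_as_links(a, b))
```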
Authors and journal editors:
ask your publisher to free citation data
contribute actively to community initiatives with your citation data (e.g. share them in LORC)
Publishers:
join I4OC (https://i4oc.org/#faqs)
it is in the publisher's interests - better discoverability and use of its published content, both subscription access and open access