Introduction

The amount of data available on the World Wide Web (the Web) is increasing rapidly, and finding relevant information by searching the Web is a daily challenge. Traditional search techniques rely on textual matching of words and do not take into consideration the semantic information behind the textual content. The Semantic Web is an approach that tries to overcome these disadvantages by representing knowledge on the World Wide Web in a way that can be interpreted by machines. In particular, these data are expressed by means of RDF, a data model that enables one to define information in the form of machine-readable subject-predicate-object statements. Usually these RDF statements are stored in a particular kind of RDF database called a triplestore, which can be queried by means of SPARQL, the query language for RDF data.

SPARQL is a very powerful query language that can be used to look for data that follow specific patterns. When institutions such as the British Library and the British Museum, and projects such as Wikidata and DBpedia, want to make their RDF data available to the public, they usually provide a specialised Web interface to a SPARQL endpoint of their triplestore, so as to enable users to conduct programmatic searches for particular information, which is returned in one or more formats (usually HTML, XML, JSON, and CSV). However, the SPARQL query language is quite complex to learn, and is normally usable only by experts in Semantic Web technologies, remaining completely obscure to ordinary Web users.

In order to make such SPARQL endpoints usable by a broader audience, without obliging users to become experts in Semantic Web technology, we have developed OSCAR, the OpenCitations RDF Search Application, previously described at the SAVE-SD 2018 Workshop (co-located with The Web Conference 2018). OSCAR is a user-friendly search platform that can be used with any RDF triplestore providing a SPARQL endpoint, and which is entirely built without the integration of external application components. It provides a configurable mechanism that allows one to query a triplestore by means of a textual user input following definable rules, while in the background one or more SPARQL queries elaborate the user requests. The main idea is that Semantic Web experts need only be employed in the initial configuration of the system to work with a particular triplestore, by customizing a particular configuration file that provides the text-search interface and that then enables any user to query and filter the results returned by the underlying SPARQL queries by means of appropriate facets and values.

The development of OSCAR is one of the outcomes of the OpenCitations Enhancement Project, funded by the Alfred P. Sloan Foundation and run by OpenCitations. One of the main aims of OpenCitations is to build an open repository of scholarly citation data with accurate citation information (bibliographic references) harvested from the scholarly literature. Currently, OpenCitations provides two different datasets, i.e. the OpenCitations Corpus (OCC) and COCI (the OpenCitations Index of Crossref open DOI-to-DOI citations). These datasets are provided in RDF format and available for querying via two separate SPARQL endpoints. OSCAR was successfully configured and integrated inside the OpenCitations website so as to search on these datasets, thus permitting ordinary Web users to compose and obtain responses to simple textual queries.

The previous version of OSCAR (Version 1.0), described in , was able to accept free-text queries, which were analysed so as to understand the user intent, and then executed in the background by employing the appropriate SPARQL query. Since then, we have developed new features in response to users' needs and the outcomes of the usability studies described in . This paper reports these new features, made possible by additions to the OSCAR architecture. Specifically:

Users now have the ability, using an advanced query interface, to create multiple field queries and to combine them using logical connectors. For example, it is possible to query for articles published in 2015 that have ‘John Michael’ as one of the authors. The logical connectors available are AND, OR, and AND NOT, so as to mimic existing approaches for building such complex queries, like the one implemented in Scopus.
A set of preprocessing functions is available that can be applied to the initial input in order to modify its form, so as to make it suitable for use within the specified SPARQL queries – e.g. “provide the lowercase version of the input DOI”.
The table of results returned by a SPARQL query is extended with new columns containing additional data retrieved by calling external services, such as REST APIs.
Specific conversion rules are employed to modify the values in the results table into formats more appropriate for visualisation purposes – e.g. an ISO date such as 2018-11-27 is presented as 27 November 2018.
The organisation of the configuration file that has to be created to customise OSCAR for a particular SPARQL endpoint has been restructured into a more intuitive and comprehensible form. In addition, it permits the customization of additional stylistic and filtering features.

To demonstrate the current usage of OSCAR, and its reusability in contexts different from the one for which it was developed (i.e. OpenCitations), we have analysed the traffic log of its use on the OpenCitations databases, and we demonstrate a configuration of an OSCAR instance that works with the Wikidata SPARQL endpoint.

The rest of the article is organized as follows. In we describe some of the most important existing SPARQL-based searching tools. In , we describe OSCAR and discuss its model definition and architectural form, with a special focus on the new features. In , we demonstrate its use with the OpenCitations datasets (COCI and OCC) and with Wikidata, while, in , we give some statistics about its use within OpenCitations. Finally, in , we conclude the article and sketch out some future work.

OSCAR, the OpenCitations RDF Search Application

OSCAR, the OpenCitations RDF Search Application, is an open source stand-alone JavaScript application which can be embedded in a webpage so as to provide a human-friendly interface to search for data within RDF triplestores by means of SPARQL queries. It is possible to set up OSCAR to work with a particular SPARQL endpoint by editing a JSON document which specifies how the SPARQL queries are sent to that endpoint, and how the returned query results should be visualized, according to the predefined tabular view that OSCAR provides. The source code and documentation of OSCAR are available on GitHub at https://github.com/opencitations/oscar.

When OSCAR was presented for the first time at the SAVE-SD 2018 Workshop, we needed to follow a number of specific requirements, based on our experience and our observation of users' requirements when requesting the data included in the OpenCitations datasets. In particular:

It must permit one to operate on and post-process the result set returned by the execution of a SPARQL query. These operations could be applied to one or more of the result fields included in the tabular interface presented to a user, and needed to be done dynamically in real time without any further querying of the SPARQL endpoint.
Each part of OSCAR – interface, functionalities and queries – must be customizable according to the user's needs. This operation should be easily handled through a specific configuration module.
OSCAR must be easily configured to work with any RDF triplestore providing a SPARQL endpoint, and must also be easy to integrate as a new module within any webpage.

As a consequence of the outcomes of the usability test described in and in response to additional feedback gathered from users of OSCAR through personal communication, we have extended the aforementioned requirements with the following ones, so as to significantly improve the searching experience and potential of OSCAR:

To allow an additional more sophisticated advanced search form, which connects a number of rule-oriented queries using logical connectors, specifically AND, OR and AND NOT.
To pre-process the query input provided by a user into a more useful form for the construction of the SPARQL query by the application of some heuristics.
To make available additional post-processing operations, such as integrating additional data into the table of results by calling external services (e.g. REST APIs) and/or converting specific values in the result table into a more appropriate form for visualisation purposes, e.g. converting ISO dates (2018-11-27) into natural dates (27 November 2018).

In the following subsections, we describe the general architecture of OSCAR, its workflow, and how its customisation (via the configuration file) works, focusing in particular on the new components integrated into Version 2.0 of OSCAR presented in this article, since a detailed discussion of the features of Version 1.0 has already been provided.

Architecture of OSCAR

All the functionalities implemented by OSCAR are executed in the browser (client side), so as to make it easily reusable in different contexts and with different Web sites, without the need to handle specific programming languages for running back-end scripts. In particular, each OSCAR instance is defined by three files:

search.js, the main core of the tool, which handles its behaviour and defines its model;
search-conf.js, the configuration file which defines all the parameters and customises OSCAR to work with a specific SPARQL endpoint;
search.css, the CSS stylesheet that defines the layout and other stylistic aspects of the OSCAR user interface.

All these files need first to be imported into an HTML page that will provide the user with the OSCAR text query interface. In addition, a skeleton HTML snippet needs to be included in such a Web page, that will be populated with the result of the OSCAR search operation. This snippet is defined as follows:

               
<div id="search" class="search">
  <div id="search_extra" class="search-extra"></div>
  <div id="search_header" class="search-header">
    <div id="rows_per_page"></div>
    <div id="sort_results"></div>
  </div>
  <div id="search_body" class="search-body">
    <div id="search_filters" class="search-filters">
      <div id="limit_results"></div>
      <div id="filter_btns"></div>
      <div id="filter_values_list"></div>
    </div>
    <div id="search_results" class="search-results"></div>
  </div>
</div>

The skeleton layout of the aforementioned OSCAR results interface (element div with attribute @id = search) is composed of three main sections, defined by specific div elements: the section extra (@id = search_extra), the section header (@id = search_header), and the section body (@id = search_body).

The section extra can be used to make available additional functionalities and operations on the results of a search operation. Currently, it includes a mechanism for exporting the results shown as a CSV file. The section header contains components that allow one to modify the table of results from a visual perspective – e.g. by specifying the maximum number of rows to be visualized per page, and by sorting the results according to a specific column or field. Finally, the section body is where the results are actually shown. It contains a table populated with the results obtained from the query execution, and a series of filters that enable a user to refine the results, so as to keep or exclude specific values.

The organisation of the structure of the aforementioned sections (and of all the subsections they contain) can be customized according to particular needs. In particular, one can decide which components are to be included within or excluded from the results Web page by keeping within that Web page the relevant HTML fragment, or by omitting it. Furthermore, while OSCAR provides a set of basic layout rules for all the components, these can be freely customised so as to align them with the particular style required by the Web site under consideration.

The Workflow

The workflow implemented by OSCAR is described in , where we introduce all the operations that OSCAR enables, and the various steps it runs as consequences of such operations. The process starts with the generation of the search interface, which is the mechanism that lets a user choose between two searching options: either (1) to input a free textual query within the text search box provided by the interface, or (2) to perform an advanced search using multiple field queries, connected using the AND, OR, and AND NOT logical connectors. There are thus two different workflows, according to the searching choice made by the user.

In the case of a free-text search, when a query is run (by pressing the enter key or by clicking on the lens provided in the interface to the right of the free-text field), OSCAR determines which SPARQL query it has to execute in order to provide results that match the particular textual input specified. As described in more detail in , the configuration file allows one to specify a sequence of rules, each defining a SPARQL query and a particular regular expression. OSCAR iterates over the rules in the order in which they appear in the sequence, and it runs the related SPARQL query, after applying a number of heuristics (defined in the configuration file), only if the input text matches the regular expression specified in the rule under consideration. If no results are returned by that particular SPARQL query, OSCAR moves on to the next rule and its associated SPARQL query, until a result is returned or no rules remain.
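The rule-selection loop just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the code found in search.js: the function name runFreeTextSearch, the executeQuery callback (a stand-in for whatever sends a query to the endpoint), and the two sample rules with their regular expressions are all hypothetical.

```javascript
// Hypothetical sketch of the free-text rule loop; not taken from search.js.
// `rules` mimics the "rules" array of the configuration file, and
// `executeQuery` stands in for the code that queries the SPARQL endpoint.
function runFreeTextSearch(text, rules, executeQuery) {
  for (const rule of rules) {
    // Skip rules that are not enabled for the free-text search.
    if (!rule.freetext) continue;
    // A rule is activated only if the input matches its regular expression.
    if (!new RegExp(rule.regex).test(text)) continue;
    const rows = executeQuery(rule, text);
    // Stop at the first rule whose query returns at least one row.
    if (rows.length > 0) return { rule: rule.name, rows };
  }
  // No rule matched, or every matching query came back empty.
  return { rule: null, rows: [] };
}

// Illustrative rules (assumed patterns) and a fake endpoint that only
// knows about one title.
const sampleRules = [
  { name: "doi", freetext: true, regex: "^10\\.\\d{4,9}/\\S+$" },
  { name: "any_text", freetext: true, regex: ".+" },
];
const fakeExecute = (rule, text) =>
  rule.name === "any_text" && text === "machine learning"
    ? [{ title: "A paper about machine learning" }]
    : [];

const outcome = runFreeTextSearch("machine learning", sampleRules, fakeExecute);
console.log(outcome.rule); // any_text
```

Because the rules are tried in order, more specific activators (such as a DOI pattern) would normally be listed before catch-all ones.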

On the other hand, if a user chooses to run an advanced search, the workflow starts directly from the application of the heuristics and the execution of a complex SPARQL query, which is built by combining several SPARQL group patterns through the appropriate constructs (e.g. UNION and FILTER NOT EXISTS) according to the logical connectors chosen by the user from the Web interface. Once we have a set of results returned by the SPARQL query, we move directly to the post-processing phase. This workflow is shown by means of red arrows in .
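One possible way of turning the logical connectors into SPARQL constructs is sketched below. The mapping (AND as juxtaposed group patterns, OR as UNION, AND NOT as FILTER NOT EXISTS), the strict left-to-right folding, the connector keys, and the ex: properties in the sample patterns are assumptions made for illustration; the combination logic actually used by OSCAR may differ.

```javascript
// Hypothetical mapping from the interface connectors to SPARQL constructs.
const CONNECTORS = {
  and: (left, right) => `{ ${left} ${right} }`,
  or: (left, right) => `{ ${left} UNION ${right} }`,
  and_not: (left, right) => `{ ${left} FILTER NOT EXISTS ${right} }`,
};

// `parts[0]` carries only a pattern; every later part also names the
// connector that links it to everything combined so far (left to right).
function combinePatterns(parts) {
  return parts.slice(1).reduce((acc, part) => {
    const combine = CONNECTORS[part.connector];
    if (!combine) throw new Error(`Unknown connector: ${part.connector}`);
    return combine(acc, part.pattern);
  }, parts[0].pattern);
}

// (title contains 'semantic' OR 'open citations') AND author is 'shotton'.
// The ex: properties are placeholders, not terms of a real vocabulary.
const combined = combinePatterns([
  { pattern: '{ ?iri ex:titleWord "semantic" }' },
  { connector: "or", pattern: '{ ?iri ex:titleWord "open citations" }' },
  { connector: "and", pattern: '{ ?iri ex:authorName "shotton" }' },
]);
console.log(combined);
```

The resulting string is a single group pattern, which is the shape needed for insertion into a larger SELECT query.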

Once a result is returned, other additional operations are executed. First, OSCAR checks if some of the fields returned in the result table actually represent URL links for values of other fields – according to what is specified in the configuration file – and, if that is the case, it creates explicit links in the resulting Web page. For instance, if we consider a simple two-column table where each row describes the title of an article and the URL from which one can retrieve its full metadata, then OSCAR can be configured to show the article title as a clickable link that allows one to go to the descriptive page for that article, by incorporating into the title the related URL that would otherwise have been displayed in the second column.

Then, OSCAR performs two new operations we have built for Version 2.0, if they are activated in the configuration file. First, it calls external services using as parameters the values present in the table returned by the SPARQL query, so as to integrate and/or extend the current table of results with additional information (e.g. a new column). For instance, considering the metadata describing a particular bibliographic resource (such as those available in the OpenCitations Corpus), it is possible to call the Crossref API with the DOI of the bibliographic resource (already specified in the table returned after the SPARQL query), to retrieve the ISSN of the journal in which that bibliographic resource has been published, and then to add this new value under a new ‘issn’ column. Second, OSCAR enables one to expose the values of specific columns in the table according to a new format following precise transformation rules (expressed as regular expressions) specified in the configuration file. For instance, the given name of a person (e.g. “John”) could be mapped into a new shape which keeps only its first letter followed by a dot (e.g. “J.”).
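The second of these operations, value conversion driven by regular expressions, can be illustrated with the sketch below. It covers only the conversion step, not the call to an external service. The rule format (a regex paired with a replacement), the field names date and given_name, and the function convertRow are assumptions chosen for the example and do not reproduce the actual configuration syntax.

```javascript
// Month names used to render ISO dates in a natural form.
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Hypothetical conversion rules: one regular expression and one
// replacement (a string or a function) per field to be converted.
const conversionRules = {
  // 2018-11-27 -> 27 November 2018
  date: {
    regex: /^(\d{4})-(\d{2})-(\d{2})$/,
    replace: (_match, y, m, d) => `${Number(d)} ${MONTHS[Number(m) - 1]} ${y}`,
  },
  // John -> J.
  given_name: { regex: /^(\w)\w*$/, replace: "$1." },
};

// Return a copy of the row in which every field with a rule is converted;
// values that do not match their rule's regex are left untouched.
function convertRow(row, rules) {
  const converted = { ...row };
  for (const [field, rule] of Object.entries(rules)) {
    if (typeof converted[field] === "string") {
      converted[field] = converted[field].replace(rule.regex, rule.replace);
    }
  }
  return converted;
}

const sample = convertRow({ date: "2018-11-27", given_name: "John" }, conversionRules);
console.log(sample.date);       // 27 November 2018
console.log(sample.given_name); // J.
```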

After these steps, OSCAR performs a grouping operation following the parameters indicated in the configuration file. This kind of operation allows one to group multiple rows of the results table according to a particular field (a key); all the other fields of such rows are collected together, for example by concatenating their textual values. For instance, consider the following query to be executed on the OpenCitations Corpus SPARQL endpoint:

               
SELECT ?title ?iri ?author {
  ?iri 
    dcterms:title ?title ;
    pro:isDocumentContextFor [
      pro:withRole pro:author ;
      pro:isHeldBy [
        foaf:familyName ?author
      ]
    ]
}

This query will return a three-column table which includes the title of a bibliographic resource, its IRI, and the family name of the author. If a certain bibliographic resource has more than one author, several rows will be returned (one for each author of the article), each repeating the title and IRI of the bibliographic resource and listing one of its authors in the third field. The grouping operation performed by OSCAR allows us to group all these authors into one author cell in the third column, so as to provide just one row per bibliographic resource in the result table.
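A minimal sketch of such a grouping step is given below. It reuses the shape of the "group_by" entry shown in the configuration excerpt (keys and concats), but the function groupRows, the separator, and the sample rows (with example.org IRIs) are hypothetical and only illustrate the idea.

```javascript
// Group rows that share the same key fields, concatenating the values of
// the fields listed in `concats`. Hypothetical helper, not OSCAR's code.
function groupRows(rows, groupBy, separator = ", ") {
  const groups = new Map();
  for (const row of rows) {
    const key = groupBy.keys.map((k) => row[k]).join("\u0000");
    if (!groups.has(key)) {
      // First row of the group: start a list for every concatenated field.
      const merged = { ...row };
      for (const field of groupBy.concats) merged[field] = [row[field]];
      groups.set(key, merged);
    } else {
      const merged = groups.get(key);
      for (const field of groupBy.concats) {
        // Avoid repeating a value that was already collected.
        if (!merged[field].includes(row[field])) merged[field].push(row[field]);
      }
    }
  }
  // Turn the collected lists into single display strings.
  return [...groups.values()].map((merged) => {
    const out = { ...merged };
    for (const field of groupBy.concats) out[field] = merged[field].join(separator);
    return out;
  });
}

// Made-up result rows: one row per (resource, author) pair.
const rawRows = [
  { iri: "https://example.org/br/1", title: "First paper", author: "Peroni" },
  { iri: "https://example.org/br/1", title: "First paper", author: "Shotton" },
  { iri: "https://example.org/br/2", title: "Second paper", author: "Heibi" },
];
const grouped = groupRows(rawRows, { keys: ["iri"], concats: ["author"] });
console.log(grouped.length);    // 2
console.log(grouped[0].author); // Peroni, Shotton
```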

Finally, OSCAR allows one to display in the Web page only a specific subset of the fields returned by the SPARQL endpoint, according to the specification given within the configuration file. For instance, using the same example provided above, in this phase we can exclude the second column containing the IRI of the bibliographic resource, since this IRI could have already been incorporated into a clickable link added to the article title in the first column.
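Link incorporation and field selection can be pictured together, as in the sketch below. The field descriptors loosely follow the "fields" entries of the configuration excerpt (value, title and link with its field and prefix), while the function toDisplayRow, the cell structure and the sample row are assumptions made for this example.

```javascript
// Hypothetical field definitions: only these fields are displayed, and the
// title is rendered as a link whose target comes from the (hidden) IRI field.
const displayFields = [
  { value: "title", title: "Title", link: { field: "iri", prefix: "" } },
  { value: "author", title: "Authors" },
];

// Build the cells to show for one result row. Fields that are not listed
// in `fields` (here, the IRI) never reach the page as columns of their own.
function toDisplayRow(row, fields) {
  return fields.map((field) => {
    const cell = { column: field.title, text: row[field.value] };
    if (field.link) cell.href = field.link.prefix + row[field.link.field];
    return cell;
  });
}

const cells = toDisplayRow(
  { title: "First paper", iri: "https://example.org/br/1", author: "Peroni, Shotton" },
  displayFields
);
console.log(cells.length);  // 2
console.log(cells[0].href); // https://example.org/br/1
```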

All the data obtained by the aforementioned operations are initialized and stored internally in four different forms, called native data, filtered data, sorted data and visualised data respectively. Native data are the complete original result-set after the execution of the aforementioned operations. Filtered data are the subset of the native data after the application of filtering operations executed by a user through the OSCAR web interface (e.g. show only the articles published in 2016). Sorted data are the subset of the filtered data after the execution (still by the user through the Web interface) of sorting operations (e.g. sort the rows in descending order according to the number of citations that the bibliographic resources have received). Finally, visualised data are the subset of the sorted data that are displayed in the Web page (for example, the first twenty results), while the others are hidden behind a pagination mechanism so as to avoid filling up the entire page with all the results.

It is worth mentioning that, in the initialization phase, before filtering and sorting, the filtered data and sorted data are equivalent to the native data, while the visualised data (i.e. those actually shown in the webpage) are a subset of the sorted data initially created using the display parameters specified in the configuration file. The filtered and sorted data are then subsequently modified as a consequence of the filtering and sorting operations undertaken by a user through the OSCAR Web interface. In fact, once all the various data are initialised, OSCAR builds its layout and interface, and thus enables the user to interact with the results by executing certain types of operations on the data – i.e. exporting, filtering, sorting and visualising, introduced above. All the aforementioned operations, with the exception of the exporting operation, result in updating the user interface, which shows only the new subset of visualised data obtained as a consequence of each operation, as summarized in .

Step	Operation	Data modified	Description
Export	Export into a CSV file	Sorted data	The sorted data are exported into a CSV file.
Filter	Show all results	Filtered data	The filtered data are reset to the native data.
Filter	Modify number of results	Filtered data	Reduce the filtered data to a specified number of rows.
Filter	Filter by field	Filtered data	Exclude or show only the filtered data equal to some specific values of a certain field.
Sort	Sort results by field	Sorted data	Sort (in ascending or descending order) all the filtered data according to the value of a particular field.
Visualize	Browse pages	Visualised data	Show the visualized data, organized into pages, page by page.
Visualize	Modify number of rows	Visualised data	Increase or decrease the number of visualized data rows shown at any one time in the Web page.

All the possible operations that a user can perform on the results returned by a free-text search, arranged by the steps in the OSCAR workflow in the order that they are executed.
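The relationship between native, filtered, sorted and visualised data can be made concrete with the following sketch. The state object, the function names and the sample rows are hypothetical. As a simplification, filtering here resets any previous ordering, and sorting only handles numeric fields.

```javascript
// Recompute the visualised data (the current page) from the sorted data.
function refreshPage(state) {
  const start = state.page * state.rowsPerPage;
  state.visualised = state.sorted.slice(start, start + state.rowsPerPage);
  return state;
}

// At initialisation, filtered and sorted data are copies of the native data.
function initState(nativeData, rowsPerPage) {
  return refreshPage({
    native: nativeData,
    filtered: nativeData.slice(),
    sorted: nativeData.slice(),
    visualised: [],
    rowsPerPage,
    page: 0,
  });
}

// Keep only the filtered rows whose field equals the given value.
function filterByField(state, field, value) {
  state.filtered = state.filtered.filter((row) => row[field] === value);
  state.sorted = state.filtered.slice(); // simplification: ordering is reset
  state.page = 0;
  return refreshPage(state);
}

// Sort the filtered data by a numeric field; native data are never touched.
function sortByField(state, field, descending = false) {
  state.sorted = state.filtered
    .slice()
    .sort((a, b) => (descending ? b[field] - a[field] : a[field] - b[field]));
  state.page = 0;
  return refreshPage(state);
}

// Made-up rows: show only 2016 articles, most cited first, two per page.
const state = initState(
  [
    { title: "A", year: 2016, cited_by: 5 },
    { title: "B", year: 2015, cited_by: 9 },
    { title: "C", year: 2016, cited_by: 12 },
    { title: "D", year: 2016, cited_by: 1 },
  ],
  2
);
filterByField(state, "year", 2016);
sortByField(state, "cited_by", true);
console.log(state.visualised.map((row) => row.title)); // [ 'C', 'A' ]
```

Keeping the native data untouched is what makes an operation such as "Show all results" possible without querying the SPARQL endpoint again.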

Customising OSCAR

OSCAR offers a flexible way to customise its behaviour according to different needs. In particular, an adopter has to modify a particular configuration file (i.e. search-conf.js, which contains a JSON object) so as to customize the tool for the particular SPARQL endpoint to be queried – as illustrated in the documentation of the tool available on the GitHub repository. An excerpt of an exemplar configuration file is shown as follows (while a full example is available online):

               
{
  "sparql_endpoint": "https://w3id.org/oc/sparql",

  "prefixes": [
    { "prefix":"cito", "iri":"http://purl.org/spar/cito/" },
    { "prefix":"dcterms", "iri":"http://purl.org/dc/terms/" },
    … 
  ],

  "rules": [
    {
      "name":"doi",
      "advanced": true,
      "freetext": true,
      "heuristics": [[lower_case]],
      "category": "document",
      "regex":"(10.\\d{4,9}\/[-._;()/:A-Za-z0-9][^\\s]+)",
      "query": [
        "{",
        "?iri datacite:hasIdentifier/literal:hasLiteralValue '[[VAR]]' .",
        "}"
      ]
    },
    ...
  ],

  "categories": [
    {
      "name": "document",
      "label": "Document",
      "macro_query": [
        "SELECT DISTINCT ?iri ?short_iri ?short_iri_id ?browser_iri ?doi …",
        "WHERE {",
          "[[RULE]]",
          "OPTIONAL { … }}",
      ],
      "fields": [
        {
          "iskey": true, "value":"short_iri", 
          "label":{"field":"short_iri_id"},
          "title": "Corpus ID", "column_width":"15%",
          "type": "text", 
          "sort": {"value": "short_iri.label", "type":"int"}, 
          "link": {"field":"browser_iri","prefix":""}
        },
        … 
      ],
      "group_by": {"keys":["iri"], "concats":["author"]},
      "ext_data": {
        "crossref4doi": {
          "name": call_crossref, "param": {"fields":["doi"]},     
          "async": true}
      }
    },
    … 
  ],
  … 
}

This configuration file allows one to specify the SPARQL endpoint to connect with for running SPARQL queries, and the SPARQL prefixes to use in the various queries. In addition, it enables the specification of the rules for executing the appropriate SPARQL queries. In particular, each rule includes a name, an activator (i.e. a regular expression shaping a particular string pattern), a category describing the types of data that will be collected (see below), and the SPARQL query to include into the macro query, defined under the specified category in order to build the correct sequence of SPARQL group patterns to execute once the activator matches with the textual input query provided by the user. In the case of an advanced search, we might have multiple queries from several rules which need to be connected, following the logical connectors specified in the user interface (AND, OR, and AND NOT), with the corresponding SPARQL constructs, and then moved inside the extended macro query defined under the corresponding category of the rule. A specific boolean flag can determine whether a specific rule should be taken into consideration for the free-text and/or the advanced search. The pre-processing functions (key heuristics) are also defined in the ‘rule’ block. They are listed in the order they must be called, and the result returned by the first will be used as the input for the second, and so on.
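The way in which a rule, its heuristics and the macro query of its category could fit together is sketched below. The placeholders [[VAR]] and [[RULE]] and the key names come from the configuration excerpt; everything else is an assumption made for illustration: the function names applyHeuristics and buildQuery, the flat list of heuristics, the simplified macro query, and the made-up DOI. The sketch also performs no escaping of the user input, which a real implementation would need to consider.

```javascript
// A sample pre-processing function, mirroring the "lower_case" heuristic.
const lower_case = (text) => text.toLowerCase();

// Heuristics are applied in order: the output of one feeds the next.
function applyHeuristics(input, heuristics) {
  return heuristics.reduce((value, fn) => fn(value), input);
}

// Assemble the final SPARQL query for one rule and one user input.
function buildQuery(config, rule, userInput) {
  const category = config.categories.find((c) => c.name === rule.category);
  const value = applyHeuristics(userInput, rule.heuristics);
  // Function replacements avoid any special handling of "$" in the input.
  const rulePattern = rule.query.join("\n").replace("[[VAR]]", () => value);
  const prefixes = config.prefixes
    .map((p) => `PREFIX ${p.prefix}: <${p.iri}>`)
    .join("\n");
  const body = category.macro_query.join("\n").replace("[[RULE]]", () => rulePattern);
  return `${prefixes}\n${body}`;
}

// A reduced, illustrative configuration (not the one used by OpenCitations).
const sampleConfig = {
  prefixes: [
    { prefix: "datacite", iri: "http://purl.org/spar/datacite/" },
    { prefix: "literal", iri: "http://www.essepuntato.it/2010/06/literalreification/" },
  ],
  rules: [
    {
      name: "doi",
      category: "document",
      heuristics: [lower_case],
      query: ["{", "?iri datacite:hasIdentifier/literal:hasLiteralValue '[[VAR]]' .", "}"],
    },
  ],
  categories: [
    { name: "document", macro_query: ["SELECT DISTINCT ?iri WHERE {", "[[RULE]]", "}"] },
  ],
};

// "10.1000/ABC123" is a made-up DOI used only to show the lowercasing.
const builtQuery = buildQuery(sampleConfig, sampleConfig.rules[0], "10.1000/ABC123");
console.log(builtQuery);
```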

Finally, the configuration file also comprises the categories, i.e. particular descriptive operations that are applied to the results returned by the SPARQL query built from the macro query defined inside them, after the combination of the queries for the selected rules. Each category includes a name and a set of SPARQL query SELECT variables. Each of these variables is accompanied by information about its presentation mechanisms (e.g. the label to use for presenting it in the Web page, and the width of the table column in which to put the related values), and about other filtering operations that can be applied to the values associated with that variable (e.g. the operations link, group and select described in ).

Configuring OSCAR for OpenCitations and Wikidata

An important aspect of OSCAR concerns its flexibility to be adapted to work with any SPARQL endpoint. The first version of OSCAR, presented at the SAVE-SD workshop, was demonstrated to work with three RDF datasets: OpenCitations Corpus, ScholarlyData and Wikidata.

In this article, we present new and more detailed configurations, compliant with the new version of OSCAR, which enable one to search on the two main datasets of OpenCitations, i.e. the OpenCitations Corpus (OCC) and COCI, and on a precise Wikidata subset, i.e. that dedicated to the description of scholarly articles. In the following subsections, we analyse each case separately and show the configurations made to OSCAR for each chosen searching strategy (free-text and advanced search), the features included, and the appearance of the final interface generated.

The OpenCitations Corpus

The OpenCitations Corpus (OCC) is an open repository of scholarly citation data and bibliographic information that we have developed. Originally this database was the main target and incentive for the development of OSCAR. Currently, the OCC contains 14 million citation links between more than 7 million bibliographic resources.

The OSCAR search interface for the OCC is available at http://opencitations.net/search. The use of OSCAR in this case enables the search of two entities included in the OCC: documents (bibliographic resources) and authors.

We have now configured OSCAR for both the free-text and the advanced search. In particular:

The free-text search allows the recognition of two different types of input: unique global identifiers (DOIs and ORCIDs, that identify published articles and authors, respectively), and any other textual string which could be used to identify the title of a document or the name of an author. It is worth mentioning that this text search string is not matched against the abstracts and the keywords of documents, since these data are not currently stored within the OCC. In , we show a screenshot of OSCAR after the execution of a free text search using the string ‘machine learning’.
The form used for the advanced search changes depending on the kind of entities we are looking for. In the case of bibliographic resources (i.e. published articles), three searching parameters are available which can be combined using logical connectors: the DOI value, a keyword to search for inside the title/subtitle of the bibliographic resource, and the last name of one of its authors. Alternatively, where the user is interested in searching for authors, three parameters are available through the interface: the ORCID, the last name, and the first name of the author. In , we show a screenshot of the advanced search interface available for the OCC. In that example, we are looking for all the documents containing either the words ‘semantic’ or ‘open citations’ in their title/subtitle, and having ‘shotton’ as the last name of one of the authors. Once the button ‘Search in OC’ is clicked and the query is executed, the results will appear in a table which looks like the one in .

The results interface of OSCAR for the OCC: the results shown are those obtained after the application of a free-text search using the string ‘machine learning’ (http://opencitations.net/search?text=machine+learning). Each row represents a bibliographic resource, while the fields represent (from left to right): the resource identifier in the OpenCitations Corpus (Corpus ID), the year of publication (Year), the title (Title), the list of authors (Authors), and how many times the bibliographic resource has been cited by other resources according to the data available in the OCC (Cited by).

The advanced search interface for the OCC. In this case we are looking for all the documents containing either the words ‘semantic’ or ‘open citations’ in their title/subtitle, and one of the authors with last name ‘shotton’ – http://opencitations.net/search?text=shotton&rule=author_text&text=semantic&rule=any_text&bc=or&text=open+citations&rule=any_text.

The configuration file of this instance of OSCAR is available online at: http://opencitations.net/static/js/search-conf.js.

The COCI dataset

COCI, the OpenCitations Index of Crossref open DOI-to-DOI references, is an RDF dataset containing details of all the citations that are specified by the open references to DOI-identified works present in Crossref. These citations are treated as first-class data entities, with accompanying properties including the citation's timespan, modelled according to the data model described in the Open Citation Indexes webpage. COCI was first created and released in June 2018, and currently contains almost 450 million citation links between 46 million bibliographic resources.

In this case, we have configured OSCAR to be used only through an advanced search interface, which is currently available at http://opencitations.net/index/coci/search. Users have three possible searching parameters available to be combined: the value of the citing DOI, the value of the cited DOI, and the Open Citation Identifier (OCI) of the citation. These fields may be combined and connected in a complex query using the usual logical connectors. shows the result interface of OSCAR after searching for the value ‘10.1186/1756-8722-6-59’ as citing DOI in COCI. In this case, the values within the fields Citing references and Cited references are provided by querying the Crossref REST API with the DOI of the citing and cited entities.

The results interface for COCI in the OpenCitations website after using its advanced search option and executing a query to look for all the resources (citation entities) having the value ‘10.1186/1756-8722-6-59’ as citing DOI – http://opencitations.net/index/coci/search?text=10.1186%2F1756-8722-6-59&rule=citingdoi.

The configuration file of this instance of OSCAR is available online at: http://opencitations.net/static/js/search-conf-coci.js.

Wikidata

Wikidata is a free open knowledge base which acts as a central store for the structured data of Wikimedia Foundation projects including Wikipedia, and of other sites and services. Wikidata offers a SPARQL query service and already has its own powerful Web graphical user interface to help users construct SPARQL queries. Our OSCAR customisation to the Wikidata SPARQL endpoint is thus made entirely for demonstration purposes, rather than to provide new functionality for Wikidata users. While Wikidata contains a wide variety of information, we decided to limit our customisation of OSCAR to bibliographic entities within the scholarly domain.

We have consulted previous articles that discuss how to query the Wikidata dataset, and the actual data model used by Wikidata. We have built an entirely new interface dedicated to querying Wikidata, available at https://opencitations.github.io/oscar/example/v2/wikidata.html. This interface was also recently presented at the WikiCite 2018 conference in Berkeley, California.

The OSCAR configuration for Wikidata includes both the free-text and the advanced search options. Users can decide whether they want to search for scholarly documents or their authors. In the case of scholarly documents, users can retrieve them by typing: (1) a DOI, (2) the name of the journal in which the document has been published, (3) the cited articles, (4) the articles referenced, (5) the earliest publication year, (6) the Wikidata QID, or (7) a free textual input. All these options can be combined to build a complex query through the advanced search option. For instance, the advanced query built in the figure below asks to retrieve all the articles citing the scholarly document with DOI 10.1145/2362499.2362502, where the citing articles have been published in the Journal of Documentation.

The OSCAR interface for querying the Wikidata scholarly documents. At the top right of the page there is an input box dedicated to the free-text search, while at the bottom there is a section dedicated to the advanced search. The query written in the interface asks to retrieve all the articles citing “10.1145/2362499.2362502” published in “Journal of Documentation”.
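To give an idea of the kind of SPARQL query that sits behind such a request, the following sketch builds a query for the articles citing a given DOI, optionally restricted by journal name and earliest publication year. It is an illustration only and is not taken from the OSCAR configuration file: the function name `build_citing_articles_query` is hypothetical, and the query assumes the Wikidata properties P356 (DOI), P2860 (cites work), P1433 (published in) and P577 (publication date), together with the prefixes predefined by the Wikidata Query Service.

```python
def _escape(value):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_citing_articles_query(cited_doi, journal_label=None,
                                min_year=None, limit=100):
    """Build a SPARQL query for the articles citing `cited_doi`.

    Assumptions (not taken from the OSCAR configuration): DOIs are
    matched in upper case via wdt:P356, and the journal is matched
    through its exact English label.
    """
    lines = [
        "SELECT DISTINCT ?citing ?citingLabel WHERE {",
        f'  ?cited wdt:P356 "{_escape(cited_doi.upper())}" .',
        "  ?citing wdt:P2860 ?cited .",
    ]
    if journal_label is not None:
        # Restrict the citing articles to one journal.
        lines.append("  ?citing wdt:P1433 ?journal .")
        lines.append(f'  ?journal rdfs:label "{_escape(journal_label)}"@en .')
    if min_year is not None:
        # Keep only citing articles published in min_year or later.
        lines.append("  ?citing wdt:P577 ?date .")
        lines.append(f"  FILTER(YEAR(?date) >= {int(min_year)})")
    lines.append(
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }'
    )
    lines.append("}")
    lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_citing_articles_query("10.1145/2362499.2362502",
                                      journal_label="Journal of Documentation"))
```

The resulting string could then be submitted to a SPARQL endpoint. In OSCAR the equivalent queries are defined once by a Semantic Web expert in the configuration file, so that ordinary users never need to write them.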

Where the required results concern authors rather than publications, users can retrieve results by typing: (1) the ORCID, (2) a DOI of a specific work, (3) the job or profession of the author, (4) the last name, or (5) the first name. As for queries concerning publications, users can also decide to build a complex query and combine these options using the AND / OR / AND NOT logical connectors.
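The effect of the three logical connectors can be illustrated with ordinary set operations. The sketch below is a conceptual model only, under the assumption that each search rule yields a set of matching resource identifiers and that connectors are applied from left to right; it does not describe how OSCAR evaluates the connectors internally. The function name `combine` and the identifiers used in the example are hypothetical.

```python
# Each connector maps to a set operation on resource identifiers.
CONNECTORS = {
    "AND": lambda left, right: left & right,      # in both result sets
    "OR": lambda left, right: left | right,       # in either result set
    "AND NOT": lambda left, right: left - right,  # in left but not in right
}


def combine(first, rest):
    """Fold (connector, matches) pairs onto an initial result set.

    `first` is the set matched by the first rule; `rest` is a list of
    (connector, set) pairs applied from left to right.
    """
    result = set(first)
    for connector, matches in rest:
        result = CONNECTORS[connector](result, set(matches))
    return result


if __name__ == "__main__":
    by_doi = {"Q1", "Q2", "Q3"}      # hypothetical matches for a DOI rule
    by_job = {"Q2", "Q3", "Q4"}      # hypothetical matches for a profession rule
    by_last_name = {"Q3"}            # hypothetical matches for a last-name rule
    print(combine(by_doi, [("AND", by_job), ("AND NOT", by_last_name)]))
```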

The demo currently available online already presents a set of example queries to try. The figure below demonstrates how OSCAR shows the results retrieved after asking for the list of articles citing the scholarly article with DOI 10.1016/J.WEBSEM.2012.08.001, with a publication year no earlier than 2002.

The results interface of OSCAR for Wikidata after using its advanced search to retrieve the list of articles citing 10.1016/J.WEBSEM.2012.08.001, where the citing publications have a publication year of 2002 or later. Each row represents a publication, while the fields represent (from left to right): the resource identifier in Wikidata (Q-ID), the title (Work title), the list of authors (Authors), the number of citations (Cited), and the year of publication (Date).

The configuration file of this instance of OSCAR is available online at: https://opencitations.github.io/oscar/example/v2/static/js/search-conf-wikidata.js.

Usage statistics from OpenCitations

We have been collecting and monitoring the usage of OSCAR inside the OpenCitations website for both the OCC and COCI datasets. These data refer to the accesses to OSCAR since its launch in OpenCitations in February 2018. The statistics and graphs we show in this section highlight the community uptake of OSCAR and the way it has been used.

This section is divided into two parts. In the first, we discuss the general usage of OSCAR since its first integration inside the OpenCitations website. In the second, we analyse the different kinds of query that users have performed.

General usage

We gathered statistics on the accesses to OSCAR through the OpenCitations website for both the OCC and COCI datasets maintained by OpenCitations. The graph in the figure below shows the number of queries launched for each dataset, from February 2018 (the date when OSCAR was launched and integrated inside the OpenCitations website) to September 2018. In addition to the total number of accesses, the graph shows the number of queries that led to a further navigation to browse one or more of the resources that had been found and listed in the results table. In particular, this navigation starts by clicking on the contents of the results table, so as to access the metadata related to that particular entity (e.g. a document or an author). The resources are browsed using another tool called LUCINDA, which is made available by OpenCitations to provide an HTML description of the data of a particular entity included in the OpenCitations datasets. The description of LUCINDA is beyond the scope of this work; for further reading we recommend visiting the repository and documentation of LUCINDA at https://github.com/opencitations/lucinda.

From the graph we can notice two peaks in the usage of OSCAR for OCC, in March 2018 (the month after its launch) and in June 2018. The latter peak is probably due to the integration of LUCINDA in the website, which allowed the searched items to be browsed and visualized: we can see a high number of accesses that led to a redirection from OSCAR to the resource browser page. In the COCI case, the peak occurred in July 2018, i.e. the month after its official release.

The number of queries launched through OSCAR from the OpenCitations web site, searching for COCI or OCC resources, for each different month starting from February 2018 to September 2018. For each different dataset we show the number of queries that led to a further navigation to browse the metadata related to the searched resources. Note that the vertical axis uses a logarithmic scale.

We also wanted to monitor the usage of the new advanced search feature added to OSCAR, and its ability to build complex queries with multiple restrictions by means of logical connectors. From the usage data we can notice that this new feature is still not very popular among the users searching the OpenCitations datasets. This statistic is significant, and suggests the need for a further analysis of the usability of the advanced search and of how we could improve it, so as to encourage users to use it.

Queries

In this section we want to answer the question ‘what types of queries do users perform?’. We made two different analyses, one for the OCC and one for the COCI case.

In the case of the OCC dataset, we wanted to see which category, documents or authors, is searched most by users. The first figure below shows these queries for each month from March to September 2018. No values are reported for February 2018, because the first version of OSCAR had no feature to distinguish between an author query and a document query: the only input accepted was free text, without any category attributed to it. From a general perspective, we notice that users are more interested in retrieving documents than authors. In addition, the results returned for a document query also include the corresponding authors, who can be browsed further. Therefore, users might search for authors indirectly by first looking at their works.

The number of queries launched through OSCAR from the OpenCitations web site, searching for Author or Document resources inside the OCC corpus, for each different month starting from February 2018 to September 2018.

The number of queries launched through OSCAR from the OpenCitations web site, searching for Citation resources inside the COCI dataset, by entering the Citing/Cited DOI, or the actual OCI of the resource, for each different month starting from April 2018 to September 2018.

In the OSCAR instance for COCI, we have just one possible category, since the only type of resource included in the dataset is ‘Citations’. Therefore, we analysed the type of queries that users specified. In particular, there are three possible ways to retrieve the resources in COCI, by giving: (1) the citing DOI, (2) the cited DOI, or (3) the OCI of the wanted resource. As we can see from the corresponding graph, the numbers start to rise from July (as the earlier graph of general usage also confirms), and we notice that users prefer typing a citing DOI as input, so as to retrieve the list of all the citations made by it (the reference list). Besides this, a low number for the OCI input is to be expected, since users are less likely to know the specific OCI of a citation.

Abstract