Citation Context and Reasons Ontology-CCRO: A Model for Citation Reasons Between Research Articles


Research papers can be viewed as a networked information space: a collection of information entities interconnected by directed links, commonly known as a citation graph. The citation graph can be enriched with meaningful relations between the citing and the cited articles, using semantic tags to express the reason for each citation. We explored the existing tags and evaluated how well they represent a citation’s context and reasons. From the published literature we identified more than 150 citation reasons that could be represented as citation tags, many of which have overlapping and diffuse meanings. A citation graph is a forest of graphs with hundreds of nodes in each graph. Manually annotating such a large volume of graphs with citation reasons requires a huge effort and is nearly impossible, so citation reasons need to be discovered automatically and with high accuracy. The first step toward this is to develop a minimal set of citation contexts and reasons that are, as far as possible, disjoint. Representing these reasons formally as an ontology would greatly help a reasoning system, and a formally defined set of reasons can enable machine-learning algorithms to identify them. By adopting a well-defined scientific methodology for formulating an ontology of citation reasons, we reduced the 150 reasons to only eight reason classes through an iterative process of sentiment analysis, collaborative meanings and experts’ opinions. Based on our findings and experiments, we propose an ontology for citations’ context and reasons, CCRO, that provides the abstract conceptualization required to organize citation relations. CCRO has been verified, validated and assessed using well-defined procedures and tools proposed in the literature for ontology evaluation. The results show that the proposed ontology is concise, complete and consistent.
To instantiate the ontology classes and map them onto real data, we developed a Mapping Graph that links verbs with predicative complements in the English language, the verbs extracted from the selected corpus using NLP, and our CCRO classes.
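
The idea of a mapping graph can be illustrated with a minimal sketch. It assumes a two-layer lookup in which a verb found in the corpus is first linked to a reference verb and the reference verb is then linked to a reason class. The verbs, the class names and the function `map_verb_to_class` are all hypothetical placeholders chosen for illustration; they are not the actual CCRO vocabulary or the structure of the real Mapping Graph.

```python
# Hypothetical two-layer mapping graph:
#   corpus verb -> reference verb -> reason class
# All verbs and class names are placeholders, not the actual CCRO vocabulary.
CORPUS_TO_REFERENCE = {
    "utilise": "use",
    "employ": "use",
    "build on": "extend",
    "refute": "contradict",
}

REFERENCE_TO_CLASS = {
    "use": "UsesReason",
    "extend": "ExtendsReason",
    "contradict": "DisagreesReason",
}


def map_verb_to_class(verb):
    """Follow the edges verb -> reference verb -> class.

    A verb that is already a reference verb is looked up directly.
    Returns None when the graph contains no path for the verb.
    """
    verb = verb.lower().strip()
    reference = CORPUS_TO_REFERENCE.get(verb, verb)
    return REFERENCE_TO_CLASS.get(reference)
```

Keeping the two layers separate means that a newly observed corpus verb only needs one new edge to an existing reference verb, without touching the class assignments.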

Artificial Intelligence and Natural Language Processing have become major contributors to document classification and information extraction. Classifying citations into different classes has attracted considerable attention because of the large volume of citations available in digital libraries. Citation classification is typically based on sentiment analysis, where various techniques are applied to citation texts, mainly to classify them as “Positive”, “Negative” or “Neutral”. Building on CCRO, the next step adopts an ontology-based approach to extract citation reasons and instantiate ontology classes and properties on two different corpora of citation sentences. One corpus is a publicly available data set, while the other was manually curated by us. The process has two parts. The first part is an interface for manually annotating each citation text in the selected corpora with CCRO properties. A team of carefully selected annotators annotated each citation to achieve high inter-annotator agreement. The second part focuses on the automatic extraction of these reasons. Using Natural Language Processing, the Mapping Graph and the reporting verb in a citation sentence, the citation’s reason is extracted and mapped onto a CCRO property. Accuracy is then calculated by comparing the automatic mapping against the manual one. In our experiments, the algorithm achieves an overall accuracy of 85.4% on the publicly available corpus and 96.6% on our own corpus of citation sentences.
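
The comparison between manual and automatic mapping can be sketched as follows. This is a simplified illustration, not the algorithm used in the experiments: a naive word-level lookup stands in for the NLP pipeline, and the lexicon `VERB_TO_PROPERTY`, the property names and the example sentences are invented for the purpose of the sketch.

```python
import re

# Hypothetical lexicon: inflected reporting verbs -> placeholder property names.
# These are NOT the actual CCRO properties.
VERB_TO_PROPERTY = {
    "uses": "usesProperty",
    "used": "usesProperty",
    "extends": "extendsProperty",
    "extended": "extendsProperty",
    "contradicts": "disagreesProperty",
}


def extract_reason(sentence):
    """Return the property of the first known reporting verb, or None.

    A plain word scan replaces real tokenization and parsing here.
    """
    for token in re.findall(r"[a-z]+", sentence.lower()):
        if token in VERB_TO_PROPERTY:
            return VERB_TO_PROPERTY[token]
    return None


def accuracy(manual, automatic):
    """Fraction of citations where the automatic label equals the manual one."""
    if len(manual) != len(automatic):
        raise ValueError("label lists must have the same length")
    if not manual:
        return 0.0
    matches = sum(m == a for m, a in zip(manual, automatic))
    return matches / len(manual)


# Invented citation sentences with invented manual annotations.
sentences = [
    "Our method extends the parser of [3].",
    "We used the corpus from [7].",
    "This result contradicts [9].",
    "See [2] for details.",
]
manual = ["extendsProperty", "usesProperty", "disagreesProperty", "usesProperty"]
automatic = [extract_reason(s) for s in sentences]
print(accuracy(manual, automatic))  # 0.75: the last sentence has no known verb
```

The last sentence shows the typical failure case for a verb-driven approach: a citation without a recognizable reporting verb receives no label and counts against accuracy.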

The number of research articles has grown exponentially. With such a huge digital infrastructure, there is a need to gather actionable intelligence from millions of papers that lack any useful semantics, and also a need to improve the way new research articles are authored and disseminated so as to build a knowledge base. To address both sides of the problem, two different applications of CCRO are formulated: one that deals with legacy data, and another that deals with authoring new research with useful semantics.

A citation graph has the potential to reveal important and interesting information about the history of a particular line of scholarly research over its life cycle. Citation graphs can be enriched with semantic tags, so that scientific papers are interconnected by citation reasons. For the first application, our selected corpora are first converted to a Semantic Citation Graph using CCRO properties. With the help of guidelines provided in the literature, five different queries are then formulated, for example to discover the evolutionary path of a scholarly activity, to find the current state of a research topic, and to examine different schools of thought around a problem.
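
Two of these query types can be sketched over a toy graph. The sketch assumes the Semantic Citation Graph is held as a list of (citing, reason, cited) triples; the paper identifiers, the reason labels and the functions `evolutionary_path` and `frontier` are hypothetical and do not reproduce the five queries formulated in this work.

```python
# (citing paper, reason label, cited paper)
# Identifiers and reason labels are invented for illustration.
EDGES = [
    ("P2", "extends", "P1"),
    ("P3", "extends", "P2"),
    ("P4", "uses", "P1"),
    ("P5", "disagrees", "P3"),
]


def evolutionary_path(paper, edges, reason="extends"):
    """Walk reason-labelled edges from a paper back to the work it builds on.

    Simplification: each paper is assumed to have at most one outgoing
    edge with the given reason; if it has several, the last one wins.
    """
    next_hop = {citing: cited for citing, r, cited in edges if r == reason}
    path = [paper]
    # Stop at a paper with no matching edge, or if a cycle would be entered.
    while path[-1] in next_hop and next_hop[path[-1]] not in path:
        path.append(next_hop[path[-1]])
    return path


def frontier(edges, reason="extends"):
    """Papers that build on earlier work but are not yet built upon."""
    citing = {c for c, r, _ in edges if r == reason}
    cited = {d for _, r, d in edges if r == reason}
    return sorted(citing - cited)


print(evolutionary_path("P3", EDGES))  # ['P3', 'P2', 'P1']
print(frontier(EDGES))                 # ['P3']
```

Neither query is possible on a plain citation graph, because without the reason label an edge that builds on a paper cannot be told apart from one that merely uses or disputes it.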

However, one of the best sources of knowledge about the reason for a citation is the author of the paper. Authors of scholarly articles cite other articles for particular reasons. Integrating these citation reasons into an authoring system can help authors choose a reason while citing. For our second application, we therefore propose a Semantic LaTeX that integrates CCRO properties within a LaTeX document. Semantically tagging citations with reasons using CCRO properties can create a discourse relation between research papers. Furthermore, embedding these structures within an RDF data store enables the creation of semantic publications, which become a foundational artifact for the Semantic Publishing Ecosystem, while the linked resources become part of the current Web of Data.
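
How reason-tagged citations in a LaTeX source might end up as RDF can be sketched as below. Everything specific here is an assumption made for illustration: the macro `\reasoncite{reason}{key}` is an invented syntax and not the one defined by the proposed Semantic LaTeX, and the two `example.org` namespaces are placeholders rather than published IRIs. The output follows the N-Triples line format.

```python
import re

# Hypothetical macro syntax: \reasoncite{<reason>}{<bibkey>}.
# This is an invented command, not the syntax of the proposed Semantic LaTeX.
MACRO = re.compile(r"\\reasoncite\{([^{}]+)\}\{([^{}]+)\}")

# Placeholder namespaces; real IRIs would come from the ontology and publisher.
ONTOLOGY_NS = "http://example.org/ccro#"
PAPER_NS = "http://example.org/paper/"


def to_ntriples(citing_key, latex_source):
    """Emit one N-Triples line per reason-tagged citation in the source."""
    lines = []
    for reason, cited_key in MACRO.findall(latex_source):
        lines.append(
            f"<{PAPER_NS}{citing_key}> "
            f"<{ONTOLOGY_NS}{reason}> "
            f"<{PAPER_NS}{cited_key}> ."
        )
    return lines


source = (
    r"We build on \reasoncite{extends}{smith2010} "
    r"and apply \reasoncite{uses}{lee2012}."
)
for triple in to_ntriples("me2020", source):
    print(triple)
```

Each emitted line is a subject, predicate and object, so the citing paper, the reason and the cited paper can be loaded into any RDF store that accepts N-Triples.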
