Context Driven Aspect based Citation Sentiment Analysis
Scientific papers are linked to previous research contributions through citations. The citing author's stance towards the cited material may be positive, negative, or objective. This thesis proposes a technique for identifying the citing author’s sentiment towards the cited paper by extracting unigram, bigram, trigram, and five-gram adjective and adverb patterns from the citation text. After part-of-speech tagging of the citation text, I used a sentence parser to extract linguistic features comprising adjectives, adverbs, and n-grams. A sentiment score is then assigned to classify each citation as positive, negative, or neutral. The proposed technique is evaluated against manually classified citation text and compared with two commercial tools, SEMANTRIA and THEYSAY. The results show that the proposed approach performs comparably to its commercial counterparts, with an average precision, recall, and accuracy of 90%, 81.82%, and 85.91%, respectively. Further, this thesis presents a novel approach to identifying aspect-level sentiment. The approach consists of two levels. At the first level, it extracts aspects from the citation sentences using the pattern of opinionated phrases around each aspect. At the second level, it detects the sentiment polarity of each identified aspect from nearby words and associates it with the corresponding aspect category using a linguistic rule-based approach. The approach considers three features: ‘N-gram after’, ‘N-gram before’, and ‘N-gram around’. The results show that the ‘N-gram around’ feature performs better than the others, and that SVM outperforms the other classifiers for all n-gram models, with an average precision of 0.82, recall of 0.807, and accuracy of 0.89. This thesis also investigates how citation texts and their associated sentiments are distributed across the IMRaD structure.
The results show that positive sentiment towards the cited paper is most commonly expressed at the start of the research paper, i.e., in the “Introduction”, followed by the “Discussion” section. The most significant finding is that the “Discussion” section contains the largest number of negative citation contexts, compared with “Results” and “Introduction”, while the majority of objective citation mentions are found in the “Literature” section.
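
As an illustration of the ‘N-gram before’, ‘N-gram after’, and ‘N-gram around’ features named in the abstract, the following minimal Python sketch collects the words surrounding an aspect term. It rests on assumptions not stated in the abstract: whitespace tokenization, a fixed window of n tokens on each side, and ‘around’ taken as the concatenation of the two windows. The function name `window_features` and the example sentence are hypothetical, and the thesis may define these features differently.

```python
from typing import Dict, List


def window_features(tokens: List[str], aspect_index: int, n: int) -> Dict[str, List[str]]:
    """Collect up to n tokens before, after, and around an aspect token.

    Assumption: 'around' is the 'before' window followed by the 'after'
    window, with the aspect token itself left out.
    """
    # Clamp the start at 0 so the slice never wraps past the sentence start.
    before = tokens[max(0, aspect_index - n):aspect_index]
    # Slicing past the end of the list is safe and simply yields fewer tokens.
    after = tokens[aspect_index + 1:aspect_index + 1 + n]
    return {"before": before, "after": after, "around": before + after}


# Hypothetical citation sentence with "method" as the aspect term.
tokens = "the proposed method is very efficient but lacks robust evaluation".split()
feats = window_features(tokens, tokens.index("method"), 2)
print(feats)
```

In a setup like the one the abstract describes, such token windows would be turned into feature vectors and passed to a classifier such as an SVM; that step is omitted here.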