Predicting Scientific Impact of Authors


Predicting the future impact of a researcher is a critical task because it informs many decisions, such as identifying candidates for research grants, job recruitment, and promotions. One of the key metrics for evaluating a researcher's impact is the h-index, which is inherently field-specific. Prediction models of the h-index for different fields have been proposed in the literature. However, each of these models is developed for a specific field, and its performance is not evaluated across multiple fields. Because citation patterns and other factors vary across fields, the same model may behave differently in another field. Yet an h-index prediction model may need to be applied across fields, for example, to compare two researchers from different fields who apply for the same job. With existing approaches, the future impact of such researchers would be compared using different models. There is thus a need for a model that performs well across multiple disciplines. Moreover, existing prediction models do not perform well for young researchers, i.e., researchers with a low h-index or little experience, so young researchers are excluded from the experiments on prediction models in the literature. This study addresses these two research gaps: a prediction model is proposed for the field of Computer Science, tested for the field of Physics, and evaluated for young researchers as well. We consider several features of fundamental importance to authors, including existing features from the literature, such as average citations and number of publications, and newly defined features, such as citations in impact factor journals and the average h-index of all coauthors. We use these features to predict the future impact of researchers over the next five years.
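For reference, the h-index is conventionally defined as the largest h such that the researcher has h papers with at least h citations each. The following is a minimal Python sketch of that standard definition; the function name `h_index` and the citation counts are illustrative assumptions, not taken from the paper.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The paper at this rank still has enough citations.
            h = rank
        else:
            # Every later paper has fewer citations, so h cannot grow.
            break
    return h


# Made-up citation counts for one hypothetical author.
example_citations = [10, 8, 5, 4, 3]
print(h_index(example_citations))
```

Because the value depends only on raw citation counts, fields with different citation volumes yield different typical h-index ranges, which is the field dependence noted above.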
Machine learning techniques such as regression and neural networks are used to find the best set of parameters for h-index prediction for scientists of all career ages. R2 and RMSE are used as performance metrics to measure accuracy. Experiments on a large ArnetMiner data set achieved up to 97% R2 with 0.27 RMSE for one year and 90% R2 with 0.60 RMSE for five years. The models proposed for the field of Computer Science are further evaluated for the field of Physics on a data set acquired from the Open Academic Graph (OAG). The proposed model exhibits reasonably good results for Physics as well: R2 of 86% with an RMSE of 0.15 for one year, and R2 of 66% with an RMSE of 0.29 for five years. However, the performance of the proposed models is not satisfactory for young researchers: R2 is 67% for one year and 55% for five years, which is very low compared with the values obtained on the full data set. This poses a challenge for impact prediction of young researchers. To tackle this challenge, a new measure, the NS-Index, is proposed in this study. According to our findings, the proposed index performs well in identifying the future impact of young researchers. Our experiments indicate that the NS-Index of a young researcher is a better reflection of their future performance for up to three years. However, for predicting the performance of young researchers beyond three years, our proposed h-index prediction model performs better.
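The two evaluation metrics can be sketched as follows, assuming the standard textbook definitions of RMSE and the coefficient of determination (R2). The abstract does not state the exact formulas or any normalization used, so the function names and the sample values below are illustrative assumptions only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up future h-index values and model predictions for four authors.
actual_h = [5, 12, 8, 20]
predicted_h = [6, 11, 8, 18]
print(rmse(actual_h, predicted_h), r_squared(actual_h, predicted_h))
```

Under these definitions, an R2 reported as 97% corresponds to a value of 0.97, and a lower RMSE means predictions lie closer to the observed h-index values.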
