Title

Improving Software Fault Prediction Using Novel Metrics Based on Data Flow Volume and Coupling Complexity

Abstract

Timely prediction of faulty modules is of key importance in software testing and is generally achieved with software fault prediction (SFP) techniques. SFP lets testers focus on fault-prone modules and prioritize test cases. It also assists integration testing, and hence significantly reduces testing effort and improves testing quality. The SFP community accepts the effectiveness of coupling between software modules in SFP. More specifically, the coupling metrics proposed by Martin, Henry, and Chidamber are reported to be more useful. However, two important aspects of coupling, namely data flow volume and coupling levels, have not been addressed so far. To address this gap, we propose two coupling metrics, Vovel-in and Vovel-out, that incorporate these two aspects in order to improve the performance of SFP. We experimented with five public datasets: Apache Lucene 2.4, Eclipse Equinox Framework 3.4, Eclipse JDT Core 3.4, Eclipse PDE UI 3.4.1, and Mylyn 3.1. These datasets provide class-level values for numerous metrics along with the faults reported in each class. We selected five coupling metrics from the datasets because of their reported effectiveness in SFP. We then extended the datasets with the proposed Vovel metrics, extracted from the projects' source code using JavaParser. We first performed univariate logistic regression to assess the significance of each included coupling metric; all metrics were found to be significantly correlated with faults. Next, we performed a Spearman correlation analysis among all the coupling metrics in the datasets to check for duplicate information. A weak correlation exists between the metrics, but it is not strong enough to justify dropping any of them. Finally, we conducted an experiment using multivariate logistic regression to analyze the performance achieved by including the Vovel metrics.
The significance of the results is verified statistically using the Wilcoxon test. The F-measure results show significantly improved predictive performance when the proposed metrics are used in combination with conventional class-level coupling metrics. In this thesis, we empirically evaluated the impact of coupling metrics, and more specifically of data flow volume and coupling level, on SFP. The results show that including these factors significantly improves SFP.
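
For readers unfamiliar with the F-measure used in the comparison above, the following is a minimal Python sketch of how it can be computed for two fault predictors. It is an illustration only, not the code used in the thesis: the function name `f_measure` and the label lists are hypothetical, and the values are made up rather than taken from the five datasets.

```python
def f_measure(actual, predicted):
    """Return (precision, recall, F1) for binary labels, where 1 marks a faulty class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # faulty, predicted faulty
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # clean, predicted faulty
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # faulty, predicted clean

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels for eight classes (1 = faulty, 0 = clean); not data from the study.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
baseline = [1, 0, 0, 0, 0, 1, 1, 0]  # assumed output of a model using conventional metrics only
extended = [1, 0, 1, 0, 0, 0, 1, 0]  # assumed output of a model that also uses the new metrics

for name, predicted in (("baseline", baseline), ("extended", extended)):
    precision, recall, f1 = f_measure(actual, predicted)
    print(f"{name}: precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In a study of this kind, such scores would be collected per dataset or per validation fold for both models, and the paired differences would then be checked with a test such as the Wilcoxon signed-rank test.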
