Predicting Open-Source Software Quality Using Statistical and Machine Learning Techniques
Phadke, Amit Ashok
Advisor: Allen, Edward B.
Committee: Boggess, Julian E.; Bridges, Susan M.
Developing high-quality software is the goal of every software development organization. Software quality models are commonly used to assess and improve software quality. Built from data on past releases of a system, these models can identify the modules likely to be fault-prone in the next release. This information is useful to the open-source software community, both developers and users. Developers can use it to clean up or rebuild the faulty modules, thus improving the system, and users can make informed decisions about the quality of the product. This thesis builds quality models using logistic regression, neural networks, decision trees, and genetic algorithms and compares their performance. Our results show an overall accuracy of 65% to 85% with a Type II misclassification rate of approximately 20% to 35%. The methods perform comparably, with only minor variations.
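The accuracy and misclassification figures quoted in the abstract can be illustrated with a minimal sketch. It assumes the convention common in software quality modeling, which the abstract itself does not spell out: a Type I error labels a not-fault-prone module as fault-prone, a Type II error labels a fault-prone module as not fault-prone, and each rate is taken over the modules of the corresponding actual class. The function name and the sample labels are hypothetical and are not data from the thesis.

```python
def misclassification_rates(actual, predicted):
    """Return overall accuracy and Type I / Type II misclassification rates.

    actual, predicted: equal-length sequences of booleans,
    where True means "fault-prone" (assumed encoding).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(actual, predicted))
    n_fault_prone = sum(1 for a, _ in pairs if a)
    n_not_fault_prone = len(pairs) - n_fault_prone

    # Type I: not fault-prone module predicted as fault-prone.
    type_i = sum(1 for a, p in pairs if not a and p)
    # Type II: fault-prone module predicted as not fault-prone.
    type_ii = sum(1 for a, p in pairs if a and not p)

    def rate(errors, total):
        # Undefined when a class has no modules; report NaN instead of failing.
        return errors / total if total else float("nan")

    return {
        "accuracy": (len(pairs) - type_i - type_ii) / len(pairs),
        "type_i_rate": rate(type_i, n_not_fault_prone),
        "type_ii_rate": rate(type_ii, n_fault_prone),
    }


if __name__ == "__main__":
    # Hypothetical labels for ten modules: four fault-prone, six not.
    actual = [True, True, True, True, False, False, False, False, False, False]
    predicted = [True, True, True, False, False, False, False, False, True, True]
    print(misclassification_rates(actual, predicted))
```

In fault prediction a Type II error is usually the costlier one, since a faulty module escapes attention, which is presumably why the abstract reports that rate alongside overall accuracy.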