Learning Optimal Bayesian Networks with Heuristic Search
Malone, Brandon M.
Bayesian networks are a widely used graphical model which formalize reasoning under uncertainty. Unfortunately, construction of a Bayesian network by an expert is timeconsuming, and, in some cases, all expertsmay not agree on the best structure for a problem domain. Additionally, for some complex systems such as those present in molecular biology, experts with an understanding of the entire domain and how individual components interact may not exist. In these cases, we must learn the network structure from available data. This dissertation focuses on score-based structure learning. In this context, a scoring function is used to measure the goodness of fit of a structure to data. The goal is to find the structure which optimizes the scoring function. The first contribution of this dissertation is a shortest-path finding perspective for the problem of learning optimal Bayesian network structures. This perspective builds on earlier dynamic programming strategies, but, as we show, offers much more flexibility. Second, we develop a set of data structures to improve the efficiency of many of the integral calculations for structure learning. Most of these data structures benefit our algorithms, dynamic programming and other formulations of the structure learning problem. Next, we introduce a suite of algorithms that leverage the new data structures and shortest-path finding perspective for structure learning. These algorithms take advantage of a number of new heuristic functions to ignore provably sub-optimal parts of the search space. They also exploit regularities in the search that previous approaches could not. All of the algorithms we present have their own advantages. Some minimize work in a provable sense; others use external memory such as hard disk to scale to datasets with more variables. Several of the algorithms quickly find solutions and improve them as long as they are given more resources. Our algorithms improve the state of the art in structure learning by running faster, using less memory and incorporating other desirable characteristics, such as anytime behavior. We also pose unanswered questions to drive research into the future.