Large-Scale Graph Visual Analytics
Swan, J. Edward, II
CommitteePerkins, Andy D.
Wong, Pak Chung
Large-scale graph analysis and visualization is becoming a more challenging task, due to the increasing amount of graph data. This dissertation focuses on methods to ease the task of exploring large-scale graphs through graph sampling and visualization. Graph sampling aims to reduce the complexity of graph drawing, while preserving properties of the original graph, allowing analysis of the smaller sample which yields the characteristics similar to those of the original graph. Graph visualization is an effective and intuitive approach to observing structures within graph data. For large-scale graphs, graph sampling and visualization are straightforward strategies to gain insights into common issues that are often encountered. This dissertation evaluates commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates based on random graphs, small-world graphs, scale-free graphs, and real-world graphs. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task. In addition, this thesis proposes three types of distributed sampling algorithms and develops a sampling package on Spark. Compared with traditional/non-distributed graph sampling approaches, the scalable distributed sampling approaches are as reliable as the traditional/non-distributed graph sampling techniques, and they bring much needed improvement to sampling efficiency, especially with regards to topology-based sampling. This benchmark study in traditional/non-distributed graph sampling is also applicable to distributed graph sampling as well. A contribution to the area of graph visualization is also made through the presentation of a scalable graph visualization system-BGS (Big Graph Surfer) that creates hierarchical structure from an original graph and provides interactive navigation along the hierarchy by expanding or collapsing clusters when visualizing large-scale graphs. A distributed computing framework-Spark provides the backend for BGS on clustering and visualization. This architecture makes it capable of visualizing a graph up to 1 billion nodes or edges in real-time. In addition, BGS provides a series of hierarchy and graph exploration methods, such as hierarchy view, hierarchy navigation, hierarchy search, graph view, graph navigation, graph search, and other useful interactions. These functionalities facilitate the exploration of very large-scale graphs. Evaluation of BGS is performed through application to several representative of large-scale graph datasets and comparison with other existing graph visualization tools in scalability, usability, and flexibility. The dissertation concludes with a summarization of the contributions and their improvement on large-scale graph analysis and visualization, and a discussion about possible future work on this research field.