Graphs and graph analytics facilitate new approaches to machine learning. They
also make it possible to extract new insights from the same datasets used in
traditional machine learning experiments. For this reason, many researchers are
seeking to exploit graph databases in pursuit of better performance for their predictive models. However, the construction of a graph from relational or flat models such
as CSV files is not a straightforward transformation. A careful selection of nodes
and relationships is required to ensure an optimal construction of the target graph.
Overly large graphs can cause performance issues for a number of graph algorithms,
and thus graph compression is an important part of the construction process. This
research has two components: the use of graphs to integrate multiple data sources
and a graph transformation methodology to create the integrated schema and
populate the graph. We validate our approach by applying link prediction and
community detection to the graphs built using our methodology.
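
To make the CSV-to-graph step concrete, the following is a minimal sketch and not the methodology described in this work. It assumes a hypothetical two-column CSV (customer, product), treats each distinct value as a node and each row as a relationship, and scores candidate customer-to-customer links with the Jaccard coefficient, one common link prediction heuristic. The data, column names, and function names are illustrative assumptions.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

# Hypothetical flat data: each row relates one customer to one product.
CSV_TEXT = """customer,product
alice,laptop
alice,mouse
bob,laptop
bob,mouse
bob,keyboard
carol,keyboard
"""


def build_graph(csv_text):
    """Build an undirected bipartite graph as a node -> neighbour-set map.

    Nodes are (label, value) tuples so that a customer and a product
    with the same name would not be merged by accident.
    """
    adjacency = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        customer = ("customer", row["customer"])
        product = ("product", row["product"])
        adjacency[customer].add(product)
        adjacency[product].add(customer)
    return dict(adjacency)


def jaccard(adjacency, u, v):
    """Jaccard coefficient of the neighbour sets of u and v."""
    union = adjacency[u] | adjacency[v]
    if not union:
        return 0.0
    return len(adjacency[u] & adjacency[v]) / len(union)


def rank_customer_pairs(adjacency):
    """Score every customer pair, highest Jaccard score first."""
    customers = sorted(n for n in adjacency if n[0] == "customer")
    scored = [
        (jaccard(adjacency, u, v), u[1], v[1])
        for u, v in combinations(customers, 2)
    ]
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


if __name__ == "__main__":
    graph = build_graph(CSV_TEXT)
    for score, a, b in rank_customer_pairs(graph):
        print(f"{a} - {b}: {score:.2f}")
```

Even in this toy case the modelling choice matters: promoting products to nodes rather than storing them as customer properties is what lets the shared-neighbour score be computed at all.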