Classical force-directed algorithms, like spring embedders, are by far the most popular graph visualization techniques. One of the key components of this success is the simplicity of their implementation and the effectiveness of the resulting drawings. Spring embedders and their variants make the final user only a few lines of code away from an effective layout of a network. They model the graph as a physical system, where vertices are equally-charged electrical particles that repeal each other and edges act like springs that give rise to attractive forces. Computing a drawing corresponds to finding an equilibrium state of the force system by a simple iterative approach.
The main drawback of spring embedders is that they are relatively expensive in terms of computational resources, which gives rise to scalability problems even for graphs with a few thousands vertices.
Motivated by the increasing availability of scalable cloud computing services, we conceived GI.L.A., a simple force-directed algorithm for a distributed architecture. GILA is implemented using Giraph, and runs on top of Hadoop. Such implementation allows our algorithm to compute drawings of graphs with million edges using commodity hardware (if you already have your own Hadoop cluster) or cheap PaaS platforms (e.g. Amazon EC2).
GILA is designed according to the Think-Like-A-Vertex (TLAV) paradigm. TLAV is a vertex-centric approach to design a distributed algorithm from the perspective of a vertex rather than of the whole graph. It improves locality, demonstrates linear scalability, and can be easily adopted to reinterpret many centralized iterative graph. Also, TLAV approaches overcome the limits of other popular distributed paradigms like MapReduce, which are often poor-performing for iterative graph algorithms.
Our algorithm shares the resolutive process of Fruchterman-Reingold (FR from now on) force directed, and it comes packed with two different force models: FR’s and LinLog. The two can be swapped via command line. It is easy to add custom force models and cooling strategies, please read the Subclassing Guide for more details.
GILA, on a 10 machine EC2 cluster with R3.xlarge instances, was capable of drawing graphs with more than 1 million edges in reasonable time; the drawing quality was comparable to the one achieved by centralized FR. A complete survey of our experimental results can be found on our paper.
The algorithmic pipeline is the following:
After a first pruning phase in which all one degree vertices are removed, the graph is partitioned in order to assign the vertices to the machines improving data locality. When the layout phase is completed, one degree vertices are reinsterted back into the graph close to their only neighbor using a strategy that can be chosen via command line. To add custom reintegration strategies, please read the Subclassing Guide.