This class contains the implementation of the Spinner partitioner. It has been
left largely untouched, but the "master" computation and other methods have been
modified to adapt to Dafne's algorithm.
Modifications include the code for pruning the graph and the correct order of
computations in order to complete the the pre-process stage. Data structures have been updated
to match the ones used in Dafne.
What follows is taken from the original comments.
##################################################################################
Implements the Spinner edge-based balanced k-way partitioning of a graph.
The algorithm partitions N vertices across k partitions, trying to keep the
number of edges in each partition similar. The algorithm is also able to
adapt a previous partitioning to a set of updates to the graph, e.g. adding
or removing vertices and edges. The algorithm can also adapt a previous
partitioning to updates to the number of partitions, e.g. adding or removing
partitions.
The algorithm works as follows:
1) The vertices are assigned to partitions according to the following
heuristics: a) If we are computing a new partitioning, assign the vertex to a
random partition b) If we are adapting the partitioning to graph changes: (i)
If the vertex was previously partitioned, assign the previous label (ii) If
the vertex is new, assign a random partition c) If we are adapting to changes
to the number of partitions: (i) If we are adding partitions: assign the
vertex to one of the new partitions (uniformly at random) with probability p
(see paper), (ii) If we are removing partitions: assign vertices belonging to
the removed partitions to the other partitions uniformly at random. After the
vertices are initialized, they communicate their label to their neighbors,
and update the partition loads according to their assignments.
2) Each vertex computes the score for each label based on loads and the
labels from incoming neighbors. If a new partition has higher score (or
depending on the heuristics used), the vertex decides to try to migrate
during the following superstep. Otherwise, it does nothing.
3) Interested vertices try to migrate according to the ratio of vertices who
want to migrate to a partition i and the remaining capacity of partition i.
Vertices who succeed in the migration update the partition loads and
communicate their migration to their neighbors true a message.
4) Point (2) and (3) keep on being alternating until convergence is reach,
hence till the global score of the partitioning does not change for a number
of times over a certain threshold.
Each vertex stores the position of its neighbors in the edge values, to avoid
re-communicating labels at each iteration also for non-migrating vertices.
Due to the random access to edges, this class performs much better when using
OpenHashMapEdges class provided with this code.
To use the partitioning computed by this class in Giraph, see
PrefixHashPartitionerFactor
,
PrefixHashWorkerPartitioner
, and
LongWritable
.