[Nauty] Parallelizing generation

Sat Jan 16 14:03:51 AEDT 2021

Hi Tomas,

I suspect you are using parameters for which the splitting is not good.

Consider the case of generating all bicoloured graphs with 8 vertices on 
each side, in total 14,685,630,688 graphs.

One part:  3872 sec
10 parts:  392.9 sec each (factor 9.85)
100 parts: 41.0 sec each (factor 94.4)
1000 parts: 5.33 sec each (factor 726)
10000 parts: 1.74 sec each (factor 2220)

This is close to linear speedup up to around 1000 parts.  I'll explain 
how it is done.  Remember that a part is specified as a pair res/mod, 
where 0<=res<=mod-1.

The generation has the form of a tree, with output graphs as leaves.  
Draw a line across the tree at some level (the splitting level) and 
number the nodes on that level modulo mod (starting at 0).  Part res/mod 
corresponds to the portion of the tree descended from those nodes 
numbered res.

Note that the entire tree must be generated as far as the splitting 
level, which is an overhead for every part. In the example above, it is 
about 1.35 seconds, which is why the parallelism gets poor when the time 
per part isn't much more than the overhead. Check that the time per part 
is close to 1.35 + 3870/N for N parts.

In the case of different classes of bipartite graphs, various things can 
go wrong.  If the shape of the tree is very irregular, the size of the 
parts can vary a lot, which matters if your application relies on 
waiting for the slowest part.  If the tree is strongly pruned in the 
levels above the splitting level, it could mean that the splitting 
overhead becomes a larger fraction of the total. Sometimes there is no 
splitting level that gives both acceptable evenness of splitting and 
acceptable overhead.

I have used two splitting levels together on occasion. This feature is 
not in the code but it is something to try if changing the splitting 
level is not enough to solve the problem.

Brendan.

On 16/1/21 2:15 am, Tomas Peitl wrote:
> Hi everyone,
>
> I've got a question about parallelization in generators such as genbg 
> (I'm using a modified version of genbg provided by Brendan some time 
> ago, but that shouldn't make a difference I hope).
>
> I noticed that the speedup from parallelization using the res/mod 
> feature pretty much plateaus at around 5-6 cores, and it doesn't 
> really make too much sense to use more than that. Is this more or less 
> correct, or am I missing something? To be fair, the speedup does grow 
> a bit even beyond 6 cores (depends on the particular case I guess), 
> but the dependence on the number of jobs looks more logarithmic than 
> linear (didn't really try to fit a curve). So if you have more than 
> one generation task, it makes sense to use fewer cores per task, and 
> instead run several tasks in parallel.
>
> I have some aggressive custom pruning, but on the other hand each job 
> generates roughly the same number of results, and the running times 
> are also more or less uniformly distributed, so the splitting of the 
> search space seems OK.
>
> How does the res/mod feature work? Does each job still have to 
> generate everything, just with the advantage of quickly skipping over 
> graphs that belong to other jobs? Or can it run in "O(N/j)"?
>
> Many thanks,
> Tomas
>
> _______________________________________________
> Nauty mailing list
> Nauty at anu.edu.au
> http://mailman.anu.edu.au/mailman/listinfo/nauty