[Nauty] Parallelizing generation

Mon Jan 18 21:09:50 AEDT 2021

Hi Brendan,

Thanks for the detailed explanation!

How is the splitting level determined? Can I set it manually, or is it 
simply the first level that has size >= mod? Can I diagnose this; for 
instance see where the splitting occurred and possibly count how much 
time was spent on pruning and generation before and after that?

It seems that the space is split into roughly even parts, so a plausible 
explanation is indeed that generating the shared tree takes too long. 
But on the other hand I'm not trying to use that many cores (30-50) and 
I already see behavior that suggests it may take as much as 300 seconds 
for the shared tree, which I can't quite believe (this is for bicoloured 
12 + 11 vertices with pruning and a matching on the 12 vertices, using 
up to 32 cores, the output has ~140k graphs, running times and output 
sizes are roughly even across parts).

Best,
Tomas

On 16.01.21 04:03, Brendan McKay wrote:
> Hi Tomas,
>
> I suspect you are using parameters for which the splitting is not good.
>
> Consider the case of generating all bicoloured graphs with 8 vertices 
> on each side, in total 14,685,630,688 graphs.
>
> One part:  3872 sec
> 10 parts:  392.9 sec each (factor 9.85)
> 100 parts: 41.0 sec each (factor 94.4)
> 1000 parts: 5.33 sec each (factor 726)
> 10000 parts: 1.74 sec each (factor 2220)
>
> This is close to linear speedup up to around 1000 parts.  I'll explain 
> how it is done.  Remember that a part is specified as a pair res/mod, 
> where 0<=res<=mod-1.
>
> The generation has the form of a tree, with output graphs as leaves.  
> Draw a line across the tree at some level (the splitting level) and 
> number the nodes on that level modulo mod (starting at 0).  Part 
> res/mod corresponds to the portion of the tree descended from those 
> nodes numbered res.
>
> Note that the entire tree must be generated as far as the splitting 
> level, which is an overhead for every part. In the example above, it 
> is about 1.35 seconds, which is why the parallelism gets poor when the 
> time per part isn't much more than the overhead. Check that the time 
> per part is close to 1.35 + 3870/N for N parts.
>
> In the case of different classes of bipartite graphs, various things 
> can go wrong.  If the shape of the tree is very irregular, the size of 
> the parts can vary a lot, which matters if your application relies on 
> waiting for the slowest part.  If the tree is strongly pruned in the 
> levels above the splitting level, it could mean that the splitting 
> overhead becomes a larger fraction of the total. Sometimes there is no 
> splitting level that gives both acceptable evenness of splitting and 
> acceptable overhead.
>
> I have used two splitting levels together on occasion. This feature is 
> not in the code but it is something to try if changing the splitting 
> level is not enough to solve the problem.
>
> Brendan.
>
>
> On 16/1/21 2:15 am, Tomas Peitl wrote:
>> Hi everyone,
>>
>> I've got a question about parallelization in generators such as genbg 
>> (I'm using a modified version of genbg provided by Brendan some time 
>> ago, but that shouldn't make a difference I hope).
>>
>> I noticed that the speedup from parallelization using the res/mod 
>> feature pretty much plateaus at around 5-6 cores, and it doesn't 
>> really make too much sense to use more than that. Is this more or 
>> less correct, or am I missing something? To be fair, the speedup does 
>> grow a bit even beyond 6 cores (depends on the particular case I 
>> guess), but the dependence on the number of jobs looks more 
>> logarithmic than linear (didn't really try to fit a curve). So if you 
>> have more than one generation task, it makes sense to use fewer cores 
>> per task, and instead run several tasks in parallel.
>>
>> I have some aggressive custom pruning, but on the other hand each job 
>> generates roughly the same number of results, and the running times 
>> are also more or less uniformly distributed, so the splitting of the 
>> search space seems OK.
>>
>> How does the res/mod feature work? Does each job still have to 
>> generate everything, just with the advantage of quickly skipping over 
>> graphs that belong to other jobs? Or can it run in "O(N/j)"?
>>
>> Many thanks,
>> Tomas
>>
>> _______________________________________________
>> Nauty mailing list
>> Nauty at anu.edu.au
>> http://mailman.anu.edu.au/mailman/listinfo/nauty
> _______________________________________________
> Nauty mailing list
> Nauty at anu.edu.au
> http://mailman.anu.edu.au/mailman/listinfo/nauty