[Nauty-list] shortg with a large number of inputs

Brendan McKay bdm at cs.anu.edu.au
Mon Oct 15 18:40:42 EST 2007


Hi Ken,

The best thing you can do is to forget about using dreadnaut format
when you have serious work to do. Write the graphs directly into
sparse6 or graph6 format from your program. Then they will take up
much less space and you won't need to bother with dretog.

To write into sparse6 or graph6 format, use the procedures in gtools.c:
writeg6(FILE *f, graph *g, int m, int n)
writes6(FILE *f, graph *g, int m, int n)
 - if the graph is internally in packed format
writes6_sg(FILE *f, sparsegraph *g)
 - if the graph is internally in sparse (adjacency-list) format

For your parameters, sparse6 is a bit more compact than graph6.

I never tried to make nauty/gtools large-file friendly.  I should!
Meanwhile, here's a trick.  I made a 4GB file xx in dreadnaut format.

% dretog xx
Can't open input file xx
>E gtools: File too large

% dretog <xx
...correct execution

The reason is that the C library allows seeking and position queries
on ordinary files opened with fopen(), so it needs large-file
pointers and counters. Seeking and position queries are not allowed
on standard input, so the library doesn't care how much data there is.
Reading the data from a pipe works too.

Incidentally, version 2.4b7 now on the nauty page
     http://cs.anu.edu.au/~bdm/nauty/
has a new shortg with a switch -Tdir to specify "dir" as the directory
for temporary files.

Cheers, Brendan.

* Ken Ryan <kjryan at bgsu.edu> [071015 09:01]:
> Thanks, Brendan.
> 
> Temporary files are being written to the new directory that I
> specified using your instructions, so it appears that your fix worked.
> I wanted to verify (for myself and the mailing list) that my sort now
> handles larger files, but instead first ran into large-file problems
> while trying larger cases.  Unfortunately, I am not able to rerun
> the case where that sort error manifested since I do not recall the
> exact inputs I used with my C program.  Before getting your response,
> I ended up solving that case (discussed below) by partitioning the job
> further and using shortg on each of the smaller files or sub-pieces.
> 
> I am working with bipartite undirected graphs with N+k vertices and
> Nk/2 edges.  For example, with N=256 and k=16, the problem (mentioned
> above) had 282,959 input graphs (and 186,595 isomorphism classes).  I
> ran my C code (compiled with large file support), and the resulting
> human readable file of all these input graphs in dreadnaut format was
> 2.7GB.  The final step would have been to run dretog followed by
> shortg, but dretog gave an error due to input file size. I think that
> an answer to one of the following will improve my approach:
> 
> 1) Do you have a makefile which compiles gtools applications for large file 
> support?
> 
> 2) In a different format which I use to store the graphs, the 282,959
> input graphs take up only 0.5GB.  It is when I represent them as the
> needed graphs with N+k vertices that they total 2.7GB.  However, after
> dretog the resulting file in g6 format would be (much) less than
> 2.7GB.  While I have the current graph in memory in my program is it
> possible to write it out in g6 format and by-pass writing the large
> 2.7GB file?
> 
> OR
> 
> 3) Something else that I'm missing?
> 
> Regards,
> Ken
> P.S. Thanks to Keith for his response.  My system does have GNU sort.
> 
> 
> 
> At 03:49 AM 10/13/2007, Brendan McKay wrote:
> 
> >Hi Ken,
> >
> >I never heard of any such problem caused by pipes and doubt if
> >that is the correct explanation.  My theory is that the sort
> >process is running out of temporary disk space.
> >
> >The usual behaviour of sort on unix-like systems is to create
> >temporary files in /tmp.  If /tmp has insufficient space left,
> >sort will crash (to check:  "df -m /tmp" answers in megabytes).
> >
> >A proper fix to this problem would be to have an option on shortg
> >to specify the place to put temporary files.  This has been on my
> >list of things to do for a long time but I just moved it higher.
> >
> >Meanwhile, you can try two things:
> >1. Define the environment variable TMPDIR before running shortg
> >   to be the name of a directory with enough space.  If such a
> >   directory is "dir" (use a full path), this is done by:
> >      In bash:    export TMPDIR=dir
> >      In tcsh:    setenv TMPDIR dir
> >   This method will of course only work if your sort program obeys
> >   the TMPDIR variable.
> >2. Modify shortg.c like this (about line 92):
> >#define SORTCOMMAND  SORTPROG,SORTPROG,"-T","dir","-u","+0","-1"
> >#define VSORTCOMMAND1  SORTPROG,SORTPROG,"-T","dir"
> >#define VSORTCOMMAND2  SORTPROG,SORTPROG,"-T","dir","+0","-1","+2"
> >where dir is as in choice #1.  Then remake it.
> >
> >In both #1 and #2 you should be able to use the current directory
> >under the name '.' .
> >
> >Please let me know if this proves to be the solution.
> >
> >Brendan.
> >
> >
> >* Ken Ryan <kjryan at bgsu.edu> [071013 01:32]:
> >>
> >> I observed that shortg crashes if a large set of graphs is given as
> >> input.  shortg passes the graphs, in canonical form, to the unix
> >> sort utility, and if the number of graphs is large then sort crashes,
> >> which in turn causes shortg to crash.  I was told by the system
> >> administrator that there is a bottleneck due to the pipe being used
> >> to pass the graphs to sort.  I was wondering if there is a fix to
> >> this situation.  If needed, I will provide additional details ...
> >>
> >> I really appreciate your time and help on my last few questions.
> >>
> >> Regards,
> >>
> >> Ken Ryan, Assistant Professor
> >> Applied Statistics and Operations Research
> >> Bowling Green State University
> >> Bowling Green, Ohio 43403-0267



