[Perl.sig] Big-O notation

Wed Mar 14 09:01:32 EST 2007

Posted on behalf of Jacinta Richardson.

==== Upcoming courses ====

    We're proud to be running the following courses over the next couple of
    months:

            Programming Perl
                    Canberra:  17th - 20th April  (Early bird: 26th March)
                    Perth:     15th - 18th May

            Object Oriented Perl
                    Melbourne:   1st -  2nd May   (Early bird:  2nd April)
                    Sydney:     19th - 20th June
                    Canberra:   26th - 27th June

            Databases Programming with Perl:
                    Melbourne:   3rd -  4th May   (Early bird:  2nd April)
                    Sydney:     21st - 22nd June
                    Canberra:   28th - 29th June

            Web Development with Perl:
                    Melbourne:  23rd - 24th May

            Perl Security:
                    Melbourne:  25th May

==== Big-O notation ====

    Big-O notation is a mathematical construct used to describe algorithmic
    complexity. In particular it is allows efficiency comparisons of
    different algorithms for large data sets.

    Typical orders are (in order of desirability):

   O(1) also ``C''
        Constant time. Computation does not take appreciably longer as the
        data set grows. For example, a hash lookup is considered to occur in
        constant time.

   ``O(log N)''
        Logarithmic. Computation time grows as N grows, but this growth is
        small for large N. For example doing a binary search over a sorted
        list to find an item.

   ``O((log N)^c)''
        Poly-logarithmic. Computation time grows as N grows, but this extra
        growth in time is still small for large N.

   O(N)
        Linear. Computation time grows in equal proportion to N's growth.
        For example, adding a value to each item in a list is linear.

   ``O(N log N)''
        Quasi-linear. Computation time grows only slightly faster than N's
        growth, for large N.

   ``O(N^2)'' and ``O(N^c)''
        Quadratic and Polynomial. For ``N^2'': doubling ``N'' will quadruple
        execution time, ``N^3'': doubling ``N'' increases execution time by
        a factor of 8. For large N, ``O(N^2)'' and above rapidly become
        unusable.

   ``O(n!)''
        Factorial. Almost never a good idea, large for even small N. 5! =
        120.

    It is an interesting property of Big-O notation that if a function can
    be written as a sum of other functions, then only the fastest growing
    function matters. This is because the growth in that function will
    rapidly dwarf the growth in the others. For example:

                    cN + dN^2      => O(N^2)  for any constants c and d
                    N^4 + N!       => O(N!)

==   Why Big-O matters   ==

    Understanding the basics of how to calculate Big-O notation can help us
    identify bad choices in our code and spot likely causes of slow
    execution time. For example consider the following attempt to find all
    the elements in list1 which are also in list2:

            my @list1 = (10 .. 100);
            my @list2 = (50 .. 200);

            # Walk over @list1
            foreach my $element (@list1) {

                    # Check in @list2 to see if it's there
                    foreach my $item (@list2) {

                            if($item eq $element) {
                                    print "$item is in both lists\n";
                            }
                    }
            }

    If @list1 and @list2 are small, this code works very quickly. However,
    if we use:

            my @list1 = (10 .. 10000);
            my @list2 = (5000 .. 2000000);

    then the code takes a very long time. Let's call the size of @list1
    ``N'' and the size of @list2 ``M'' then what we're doing is:

            for each N
                    for each M
                            do something

    You can see here that this means we walk over @list1 once, ``but'' for
    each item we also walk over the whole of @list2. This means we're
    dealing with ``O(N M)''. As the smaller list approaches the same size as
    the larger, this approaches ``O(N^2)'' which means we should investigate
    whether a faster possibility exists.

    We can rewrite the above code as follows:

            my @list1 = (10 .. 100);        # length N
            my @list2 = (50 .. 200);        # length M

            # Create a hash of all the values in @list1: O(N)
            my %in_list1;
            foreach my $element (@list1) {
                    $in_list1{$element} = ();
            }

            # Walk over each element in @list2 and compare against the hash
            # O(M)
            foreach my $item (@list2) {
                    if(exists $in_list1{$item}) {
                            print "$item is in both lists\n";
                    }
            }

    Now we can see that the end result is ``O(N+M)'' - we walk over @list1
    once, and we walk over @list2 once. This is an O(N) solution so we know
    it's good. Note that we're achieving this trade-off by using extra
    memory to store our %in_list1 hash. For very large ``N'' this needs to
    be considered.

    The above style problem comes up regularly, even though it may not quite
    look as simple as this. For example, imagine that you have a flat file
    containing addresses for businesses, and you wish to display a subset of
    those based on a list of suburbs. You could go through the whole file
    for each suburb (``O(N M)'') or you could adapt the above solution to
    store either the addresses or suburbs in a hash (depending on your
    requirements) and go through each only once. The same style solution can
    also be used to find the complement of a list.

==   Ignoring constants   ==

    Big-O notation does not predict how fast something will run, but rather
    gives us information about how well it will scale. For example doubling
    the size of ``N'' for a ``O(N^2)'' problem will *quadruple* the program
    execution time regardless of whether the function complexity is actually
    ``O(0.5 N^2)'' or ``O(10000 N^2)''.

    Constants only matter when comparing functions of different
    complexities. For example ``O(100*N)'' is slower than ``O(N^2 /100)''
    until ``N'' is greater than 10,000.

==   See also   ==

    Mathematical coverage for Big-O notation can be found on Wikipedia at:
    <http://en.wikipedia.org/wiki/Big_O_notation>.

    There are also some great notes on Big-O notation on perlmonks.org

   *   <http://www.perlmonks.org/?node_id=573138>,

   *   <http://www.perlmonks.org/?node_id=227909> and

   *   <http://www.perlmonks.org/?node_id=25833>.

==== Corporate Courses ====

    http://perltraining.com.au/corporate.html

    Do you have a large group, or the need for training at a particular
    time? Perl Training Australia is happy to arrange a course in the time
    and place that best suits you. For more information read our page on
    Corporate Courses at http://perltraining.com.au/corporate.html or call
    us on +61-3-9354-6001.
_______________________________________________
This Perl tooltip and associated text is Copyright Perl Training Australia.
You may freely distribute this text so long as it is distributed in full
with this Copyright noticed attached.

If you have any questions please don't hesitate to contact us:
   Email: contact at perltraining.com.au
   Phone: 03 9354 6001

Perl-tips mailing list
To change your subscription details visit:
http://perltraining.com.au/cgi-bin/mailman/listinfo/perl-tips