[LINK] Re: Google for universities

David Hawking David.Hawking@cmis.CSIRO.AU
Tue, 30 Jan 2001 00:51:56 +1100 (EST)


Danny,
	Thanks for your comments, even if they are somewhat 
pessimistic.  I hope my responses are of some interest.

> Hmm... dedicated hardware for intranet search?  That seems unlikely
> to take off - it's overkill for small organisations, while big ones
> will mostly prefer to configure a software solution on whatever their
> server platform of choice is, rather than adding an extra system.
> And if I'm going to sell my soul to proprietary software, I'd want
> to go with a "big player", to reduce the chance of being left with
> unsupported and unsupportable legacy software.

a. The hardware cost for a P@NOPTIC box is really quite small and 
can be reduced still further when only a small number of pages need
to be searched.  We've built full-scale demo searches for universities using
only about $2k worth of hardware.  Of course the cost rises when you want
full fault tolerance and very large capacity.

b. For small organisations we propose a bureau service similar to 
the type offered by Google but we won't need to charge as much.

c. The advantage of the dedicated hardware approach is that 
we can avoid nearly all of the potentially harmful interactions
between the search service and hardware/OS/webserver/applications.
It's a model which has been successful in some other network 
applications.

d. In principle, we're happy to license you the software to run on
your hardware.  And it's fine for you to run other things on the 
same box provided you manage the interactions.   

e. If you choose to sell your soul to us rather than to some of
the "big players" we're almost certain to charge you less for
the privilege ... and we do try to support open standards.

f. I take considerable exception to the claim of "unsupportable".  We
have put a lot of effort into turning a high-performance research 
prototype into a maintainable product.

g. Despite your pessimism, I am cautiously optimistic that there will
be sufficient market for P@NOPTIC to justify ongoing support and 
development.  If I'm wrong and at some stage we cease to renew 
P@NOPTIC support contracts, it is almost certain we would make full 
source available to (at least) our former customers.   

> 
> It might be a cool product (though it's not clear from the web site
> how much it does that something like htdig doesn't), but I don't think
> there's much of a commercial niche for it - why not open source it
> and let the world play with it, learn from it, and benefit from it?
> I hope that's still one of the reasons CSIRO and universities exist,
> and that making money isn't everything...

h. I haven't done a direct comparison with htdig but I can give you
some statistics about the P@NOPTIC service at ANU:
        h/w:             dual 450 MHz Pentium, 1 GB RAM
        pages:           ~ 420,000
        servers:         ~ 180
        crawl-time:      ~ 24 hrs (being reasonably polite)
        index-time:      < 1 hr
        query capacity:  ~ 250,000 new queries/day
                           - faster from cache
        index-size:      ~ 25% of data size
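
To put those figures in per-second terms, here is a rough back-of-envelope
sketch.  It assumes the load is spread evenly over the day, which real
crawling and query traffic will not be, and the function names are
made up purely for illustration.

```python
# Back-of-envelope rates derived from the approximate figures quoted above.
# Assumption: load is spread evenly across the day (real traffic is bursty).
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def crawl_rate(pages, hours):
    """Average pages fetched per second over the whole crawl."""
    return pages / (hours * SECONDS_PER_HOUR)


def query_rate(queries_per_day):
    """Average new (uncached) queries per second."""
    return queries_per_day / SECONDS_PER_DAY


def index_size(data_size, ratio=0.25):
    """Index size in the same units as data_size, at ~25% of the data."""
    return data_size * ratio


if __name__ == "__main__":
    print(f"crawl: {crawl_rate(420_000, 24):.1f} pages/s")
    print(f"query: {query_rate(250_000):.1f} queries/s")
```

So the crawler averages roughly 5 pages a second across about 180 servers,
and the quoted query capacity works out to roughly 3 new queries a second.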

i. P@NOPTIC provides:
	- metadata/content search
	- departmental searches off central index (>60 at ANU)
	- internal/external views 
	- organisational thesaurus
	- query-biased summaries
	- proven document ranking algorithms.
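
For anyone unfamiliar with the term, a query-biased summary is a result
snippet built from the parts of a document that best match the query,
rather than a fixed abstract.  The sketch below is only a toy illustration
of the general idea (pick the sentences sharing the most words with the
query); it is not the P@NOPTIC implementation, and the function name and
scoring rule are assumptions.

```python
import re


def query_biased_summary(text, query, max_sentences=2):
    """Return the sentences of `text` sharing the most words with `query`.

    Toy scoring rule (an assumption): count of distinct query words
    that also appear in the sentence.
    """
    query_terms = set(re.findall(r"\w+", query.lower()))
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    scored = []
    for pos, sentence in enumerate(sentences):
        words = set(re.findall(r"\w+", sentence.lower()))
        score = len(query_terms & words)
        if score > 0:
            scored.append((score, pos, sentence))

    # Best-scoring sentences first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    # Emit the chosen sentences in their original document order.
    return " ".join(s for _, _, s in sorted(top, key=lambda t: t[1]))


if __name__ == "__main__":
    page = ("The library opens at nine. "
            "Exam timetables are published in May. "
            "Parking permits cost money.")
    print(query_biased_summary(page, "exam timetables"))
```

The same page would therefore be summarised differently for a query about
parking than for one about exams.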

j. My team are really keen to continue our research into even better
algorithms for search.  The reality of current CSIRO funding
is that the best way we can do so is by using P@NOPTIC to
help us meet our external earnings targets.  

Dave
-- 
David Hawking
CSIRO Mathematical and Information Sciences
GPO Box 664
Canberra ACT 2601
Australia
email: David.Hawking@cmis.csiro.au   phone: 61-2-6216 7060  fax: 61-2-6216 7111
http://cs.anu.edu.au/people/David.Hawking