[LINK] Google Inadequacies [Was Re: Tony and ANU]
Danny Yee
danny at anatomy.usyd.edu.au
Mon Feb 23 12:05:10 AEDT 2009
Roger Clarke wrote:
> As I understand it, Panoptic uses the (longstanding, until Google)
> convention of occurrence of words or strings within text, including
> location of use (e.g. use-in-title counts more than use-in-text) and
> frequency of use.
>
> Google uses the (at least in 1998) novel approach of (also? or
> exclusively?) determining precedence according to the number of pages
> that point to it, which by inference means how authoritative the page
> is considered by other page-authors:
> http://en.wikipedia.org/wiki/PageRank

Google also uses in-document cues such as word occurrences and
locations. And Panoptic also uses link structure.

But my understanding is that the Panoptic search index is built on
a per-organisation basis, and has no way of using the information
provided by external links. If an "important" (*) web site at
Harvard links to a web page at Sydney using the anchor text "frog",
for example, that's information Google has available to weight its
ranking but Panoptic doesn't.
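
To make the anchor-text point concrete, here is a minimal sketch in Python.
It is only an illustration, not how either Google or Panoptic is actually
implemented: the link data, the page URLs, and the helper names
(host_in_domain, anchor_index) are all made up for the example. It builds an
anchor-text index twice, once from every known link and once from links whose
source lies inside the organisation's own domain.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical link data: (source page, target page, anchor text).
LINKS = [
    ("http://www.harvard.edu/biology/",
     "http://www.usyd.edu.au/amphibians.html", "frog"),
    ("http://www.usyd.edu.au/index.html",
     "http://www.usyd.edu.au/amphibians.html", "research"),
]


def host_in_domain(url, domain):
    """True if the URL's host is the domain itself or a subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def anchor_index(links, org_domain=None):
    """Map each anchor word to a count of links per target page.

    If org_domain is given, only links whose *source* is inside that
    domain are used, mimicking a per-organisation index.
    """
    index = defaultdict(lambda: defaultdict(int))
    for source, target, anchor in links:
        if org_domain is not None and not host_in_domain(source, org_domain):
            continue  # an external link is invisible to a local-only index
        for word in anchor.lower().split():
            index[word][target] += 1
    return index


full_web = anchor_index(LINKS)                            # whole-web view
org_only = anchor_index(LINKS, org_domain="usyd.edu.au")  # local view
```

In this toy data the word "frog" is associated with the Sydney page only in
full_web; org_only never sees the Harvard link, so it has no entry for it.
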
Countering this, there are web sites that people would consider
"part" of Sydney Uni - an example would be something like
http://www.bosch.org.au/ - which are indexed by the Panoptic engine
but not included in a "site:usyd.edu.au" search on Google. Here local
knowledge allows for smarter search restrictions. (Though Google
has enough information from link structure to know that this
site is related to usyd.edu.au, and could presumably implement a
"siteandrelated:" operator.)
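
As a rough sketch of how such a "siteandrelated:" operator might decide what
counts as related, the Python below treats an external host as related when it
both links into the domain and is linked from it. This is purely hypothetical:
no such operator exists as far as I know, and the link data, the min_links
threshold, and the function names are assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical link data: (source page, target page).
LINKS = [
    ("http://www.usyd.edu.au/medicine/", "http://www.bosch.org.au/"),
    ("http://www.bosch.org.au/about.html", "http://www.usyd.edu.au/"),
    ("http://www.usyd.edu.au/news/", "http://www.example.com/"),
]


def host(url):
    """Return the host part of a URL, or an empty string if there is none."""
    return urlparse(url).hostname or ""


def in_domain(h, domain):
    """True if host h is the domain itself or a subdomain of it."""
    return h == domain or h.endswith("." + domain)


def related_hosts(links, domain, min_links=2):
    """External hosts with links in both directions to and from the domain.

    min_links is an arbitrary threshold on the total number of links
    exchanged; a real system would need something far more careful.
    """
    outbound = Counter()  # links from the domain to each external host
    inbound = Counter()   # links from each external host into the domain
    for source, target in links:
        s, t = host(source), host(target)
        if in_domain(s, domain) and not in_domain(t, domain):
            outbound[t] += 1
        elif in_domain(t, domain) and not in_domain(s, domain):
            inbound[s] += 1
    return {
        h for h in outbound
        if inbound[h] > 0 and outbound[h] + inbound[h] >= min_links
    }


related = related_hosts(LINKS, "usyd.edu.au")
```

With this toy data, related contains www.bosch.org.au but not www.example.com,
since the latter is linked to only once and never links back.
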
None of this is specific to Panoptic (or Google). There are many web
sites -- including many Australian government ones -- where Google
(or Yahoo, or other full-web search services) can provide a more
useful way to search the site than the "built in" search functionality.

(*) I agree with Roger that notions of "importance" are in the eye
of the beholder to a certain extent, but they're not completely
idiosyncratic.

Danny.