[LINK] deep web

Michael Still mikal at stillhq.com
Tue Feb 24 09:41:43 AEDT 2009


Ivan Trundle wrote:
> On 24/02/2009, at 8:59 AM, Michael Still wrote:
> 
>> Eric Scheid wrote:
>>> On 24/2/09 8:23 AM, "Stilgherrian" <stil at stilgherrian.com> wrote:
>>>
>>>> (There's also another question. If you put information on the public
>>>> web, why *wouldn't* you want it indexed so people can find it?  
>>>> Either
>>>> you want it public or you don't. Don't you?)
>>> can a robot tell the difference between a finite database of  
>>> information and
>>> an infinitely large dynamically generated information space?
>> Imagine you're a first year computer science student... Surely you can
>> think of a way of avoiding infinite loops?
> 
> Who's to say that it's an infinite loop? (though, technically, it's  
> unlikely to be infinite otherwise)
> 
> It's the known unknowns that worry me most.

It's a scheduling problem... Controlling access to a limited resource
(crawl time in this case) in a fair and efficient manner. It sounds a lot
like things the kernel is doing all the time -- IO scheduling, CPU
scheduling, network QoS, and so on. I'm suggesting that the same
algorithms could apply to this case as well.
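A minimal sketch of what I mean, treating per-site crawl time like a CPU
time slice handed out round-robin, with a per-site cap so an infinitely
large dynamically generated site can't starve the finite ones. The site
names, budgets, and queue layout here are all hypothetical, purely for
illustration:

```python
from collections import deque

class CrawlScheduler:
    """Round-robin crawl scheduler with per-site budgets (a sketch)."""

    def __init__(self, budget_per_turn=2, max_pages_per_site=5):
        self.sites = deque()           # round-robin queue of (site, url queue)
        self.budget = budget_per_turn  # pages attempted per site per turn
        self.cap = max_pages_per_site  # hard cap: a dynamic site can't run forever
        self.seen = set()              # visited URLs -- breaks crawl loops
        self.fetched = {}              # pages fetched per site so far

    def add(self, site, urls):
        self.sites.append((site, deque(urls)))
        self.fetched.setdefault(site, 0)

    def run(self):
        order = []                     # record of fetch order, for the demo
        while self.sites:
            site, urls = self.sites.popleft()
            for _ in range(self.budget):
                if not urls or self.fetched[site] >= self.cap:
                    break
                url = urls.popleft()
                if url in self.seen:   # already crawled: skip it
                    continue
                self.seen.add(url)
                self.fetched[site] += 1
                order.append(url)      # a real crawler would fetch here
            if urls and self.fetched[site] < self.cap:
                # re-queue the site for its next time slice
                self.sites.append((site, urls))
        return order
```

With a budget of 2 and a cap of 3, a site with four queued pages only ever
gets three fetched, interleaved fairly with the other sites -- the same
shape as fair-share CPU or IO scheduling.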

Mikal
