[LINK] deep web
Michael Still
mikal at stillhq.com
Tue Feb 24 09:41:43 AEDT 2009
Ivan Trundle wrote:
> On 24/02/2009, at 8:59 AM, Michael Still wrote:
>
>> Eric Scheid wrote:
>>> On 24/2/09 8:23 AM, "Stilgherrian" <stil at stilgherrian.com> wrote:
>>>
>>>> (There's also another question. If you put information on the public
>>>> web, why *wouldn't* you want it indexed so people can find it?
>>>> Either
>>>> you want it public or you don't. Don't you?)
>>> can a robot tell the difference between a finite database of
>>> information and
>>> an infinitely large dynamically generated information space?
>> Imagine you're a first year computer science student... Surely you can
>> think of a way of avoiding infinite loops?
>
> Who's to say that it's an infinite loop? (though, technically, it's
> unlikely to be infinite otherwise)
>
> It's the known unknowns that worry me most.
It's a scheduling problem... Controlling access to a limited resource
(crawl time in this case) in a fair / efficient manner. It sounds a lot
like things the kernel is doing all the time -- IO scheduling, CPU
scheduling, network QoS, etc. I'm suggesting that the same algorithms
could apply to this case as well.
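As a rough illustration of that analogy (my sketch, not anything from the thread -- the class and parameter names are made up), a crawler could hand out fetch slots round-robin per host, the way a kernel scheduler hands out time slices, with a hard per-host page cap to bound "infinitely large" dynamically generated sites:

```python
from collections import deque

class FairCrawlScheduler:
    """Round-robin crawl scheduler: each host gets an equal share of
    fetch slots per cycle, so one huge (or endlessly generated) site
    cannot starve the others -- analogous to kernel CPU/IO scheduling."""

    def __init__(self, slots_per_host=2, max_pages_per_host=100):
        self.queues = {}                  # host -> deque of pending URLs
        self.fetched = {}                 # host -> pages fetched so far
        self.slots_per_host = slots_per_host
        # Hard cap: bounds crawl of a dynamically generated "infinite" space.
        self.max_pages_per_host = max_pages_per_host

    def enqueue(self, host, url):
        self.queues.setdefault(host, deque()).append(url)

    def next_batch(self):
        """One scheduling cycle: take up to slots_per_host URLs from each
        host, never exceeding that host's remaining page budget."""
        batch = []
        for host, queue in self.queues.items():
            budget = self.max_pages_per_host - self.fetched.get(host, 0)
            take = min(self.slots_per_host, budget, len(queue))
            for _ in range(take):
                batch.append(queue.popleft())
            self.fetched[host] = self.fetched.get(host, 0) + take
        return batch
```

A site that generates pages forever just keeps refilling its queue, but it only ever receives its fair share of slots per cycle and stops getting any once it hits its cap -- fairness and termination, without the crawler needing to know which sites are finite.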
Mikal