[LINK] More to the world than physics [was: Google's WiFi bungle]

Thu May 20 09:08:21 AEST 2010

On Thu, May 20, 2010 at 05:33:13AM +1000, Stephen Wilson wrote:
> If only physics was the best or only model for understanding the world, 
> you'd be OK.  But it's not, and your shrill protestations and ongoing 
> category errors prove my point that there is a chasm between the worlds 
> of technologists and privacy policy makers.  

now it's technologists vs privacy policy makers.  please make up your
mind and quit shifting the goal posts.

if you'll recall - the original dispute was that I, and others, said
this whole wifi "issue" was just a media beat-up. which it is. and it
has the usual elements of witch-hunt and rabble-rousing.

> But you're wrong to treat concepts like private and public like       
> physical properties.                                                  

1. i'm not treating them like physical properties, but as matters of
objective fact.

2. i'm not wrong in doing that.  it is demonstrably true.

> Yet again, I bring you back to a point of law: the operable term is   
> "persional information", and not "public" or "private".               

and, as i've said several times now, you need to first prove that there
was any "personal information" in the packets received.

you're constantly making the *assumption* that there was.  and then
immediately judging google to be guilty of privacy infringement.

this may seem quaint and old-fashioned but in my view, people (and
corporations) ought to be judged on what they've actually done, not on
what a hysterical and ignorant mob assumes they might have done.

> If a data set contains information about a person, where their        
> identity is apparent or can be readily determined, then that          
> data stream is called "personal information" and it's subject to      
> information privacy law.

and *IF* is the operative word here.

you're posing a question, and then leaping immediately to the conclusion
("guilty") without even bothering to answer it.

> Collecting (and hanging on to) payload data is very different from 
> collecting wifi addresses because of the issue of primary purpose.  

you're wrongly assuming that "collecting payload data" and "collecting
wifi addresses" are two different actions. they're not. they're one
action, scanning: you scan, you get a whole load of data, then you
filter out the stuff you don't want.
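
a rough sketch of what i mean, in python. this is illustrative only: the
frames are plain dicts with field names i made up for the example, not a
real 802.11 capture format, and it is certainly not anyone's actual
scanning code. the point is just that scan() hands back everything it
heard, and narrowing that down to network identifiers is a separate
filtering step done afterwards.

```python
# illustrative only: frames are plain dicts with hypothetical field names,
# not a real 802.11 capture format.

def scan():
    """pretend scan: returns every frame heard, wanted or not."""
    return [
        {"type": "beacon", "bssid": "00:11:22:33:44:55", "ssid": "home-net", "payload": b""},
        {"type": "data", "bssid": "00:11:22:33:44:55", "ssid": None, "payload": b"\x8f\x02\xa1"},
        {"type": "beacon", "bssid": "66:77:88:99:aa:bb", "ssid": "cafe", "payload": b""},
        {"type": "data", "bssid": "66:77:88:99:aa:bb", "ssid": None, "payload": b"\x00\x17"},
    ]


def keep_network_ids(frames):
    """filter step: keep (bssid, ssid) pairs from beacons, drop everything else."""
    return sorted({(f["bssid"], f["ssid"]) for f in frames if f["type"] == "beacon"})


frames = scan()                       # one action: you get the lot
networks = keep_network_ids(frames)   # separate step: throw away what you don't want
```

note that there is no "scan for addresses only" call in the sketch. the
unwanted frames arrive whether you asked for them or not, and dropping
them is extra work.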

> To go back to your examples of mail servers collecting personal       
> information, that colelction is intrinsic to how the e-mail system    
> works, and I wouldn't think it was unjustifiable.                     
>
> But if a mail service operator then put that personal information     
> to another unrelated purpose, without informing the individuals       
> concerned, then they may have breached information privacy law.       

yes, of course.

and there's another "if".

so, the next thing you need to prove (after proving that there was
"personal information" in it) is that google used, or intended to use,
any of the collected information.

you can't just *assume* that they did - especially when that assumption
is contrary to the way that wifi scanning actually works.

1. you're assuming that when you scan, you just get the minimal data that
you want and have to go out of your way to gather any extra data.

if only things worked that way. my job, and that of many other IT
people, would be MUCH simpler.

in reality, when you scan, you get everything and you have to take extra
steps to get rid of the junk that you don't want.

in reality, extracting the signal you want from the noise you don't is
always the hardest part of the job, and it's always tedious, error-prone
and pretty much impossible to get right first time. it's always an
iterative process of incrementally stripping back more and more junk to
leave behind the target data.

which brings us to:

2. it's not in the least bit unusual in ANY data acquisition job
(whether you're talking about getting data from sensors or from a
scientific instrument or from web-server log files or from scanning for
wireless networks etc) to separate the tasks of data gathering and data
processing. in fact, it's normal and SOP to do things that way.

which is why it does not surprise me in the least, nor do i think it
unusually suspicious, for google to have done that. i'd be surprised if
they did it any other way.

it's like a video recorder, you press "RECORD" and it records everything
that was broadcast. later you go back and cut out or skip the ads.
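
the gather-now, process-later split can be sketched like this. again,
this is a toy under stated assumptions: the record layout ("kind",
"value"), the file name and the function names are all hypothetical and
chosen only for the example. the gather stage writes every raw reading to
disk untouched, and the process stage is run later, as many times as it
takes, with progressively tighter filters.

```python
import json
import os
import tempfile


def gather(readings, path):
    """stage 1: dump every raw reading to disk, no filtering at all."""
    with open(path, "w") as fh:
        for reading in readings:
            fh.write(json.dumps(reading) + "\n")


def process(path, wanted):
    """stage 2, run later: re-read the raw dump, keep only the wanted kinds."""
    kept = []
    with open(path) as fh:
        for line in fh:
            reading = json.loads(line)
            if reading["kind"] in wanted:
                kept.append(reading)
    return kept


# hypothetical raw readings: mostly junk, a little of what we are after
raw = [
    {"kind": "beacon", "value": "home-net"},
    {"kind": "noise", "value": "???"},
    {"kind": "beacon", "value": "cafe"},
    {"kind": "data", "value": "junk"},
]

path = os.path.join(tempfile.mkdtemp(), "raw.jsonl")
gather(raw, path)

first_pass = process(path, {"beacon", "data"})   # first attempt, still too broad
second_pass = process(path, {"beacon"})          # tightened after looking at the results
```

because the raw dump is kept, the filter can be fixed and re-run without
going back out to collect the data again, which is the whole reason for
splitting the two stages.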

what i'm seeing in this mess is a whole lot of people making wrong
assumptions that the technology works contrary to the way that it
actually does, then leaping to conclusions about what the extra data
contained ("ooh. it's not just SSIDs or MAC addresses, so it *must* be
personal information"), then they assume that google used or intended
to use this assumed personal info, and finally they automatically judge
google guilty of breaching various privacy laws around the world. case
closed.

i'm appalled at how easily otherwise rational people are sucked into
both trial-by-media and trial-by-political-grandstanding just because
there's technology involved.

i'm especially appalled that it's happening on LINK where we've seen and
discussed this phenomenon many times over the years.

BTW, if google were trying to extract "personal information" from the
recorded data then it would also be subject to the points i made above
about extracting signal from noise. while it's possible that there *may*
have been *some* personal information in the data, it would have been
minuscule compared to the rest of the garbage - and even more difficult
to extract than purely technical information like SSID or MAC address
because it's subjective and not easily identifiable.

(admittedly, google ARE experts at extracting signal from noise. still
doesn't mean it would be easy...and getting useful data from random
network packets is nowhere near as simple as getting it from formatted
html, pdf, text, and other document types)

any data set gathered by google is not going to be conveniently ordered
and easily accessible - it's more like a garbage truck full of several
streets' worth of domestic garbage that may or may not have a few
fragments of someone's carelessly discarded bank statements hidden in
it. if you can find them, and clean off the rotting tomato, then you
might end up with enough fragments to assemble together in order, and
then, yes, you might have some personal information about someone. or
you might just end up with a bit of smudged paper with a bank logo on
it.

which is why i keep making the point that you have to prove that there
actually was any personal information in the data that google gathered.
it's extremely unlikely that there was, OR that there would have been
anywhere near enough to have been worth the effort of extracting it.

> Which is basically what the Buzz fuss was all about. 

which is the root cause of this wifi hysteria - Buzz had serious
privacy problems, so now anything google does is assumed to be a privacy
infringement whether it actually is or not.

craig

-- 
craig sanders <cas at taz.net.au>