[LINK] XML isn't evil, just misunderstood

Tue Nov 11 08:11:02 AEDT 2008

Jim,

I guess it's the misuse that gets me rather than the use ...

> Doesn't anyone remember the good old days of trying to reverse engineer
> maverick file formats that were different for every application?  When
> data is in xml you don't have to spend time working out the element
> order, where the commas and spaces go or if they're meant to be pipes or
> whatever.  It might make a simple csv list more difficult to wrangle but
> for any difficult data it's a godsend: you can read and write it with
> just about any recent environment using library functions with little
> work.
That's the theory ... I guess I'll have to defer to Craig's comment about
how often XML is done badly. The problem comes down to two things, in my
experience:

(a) people who roll their own XML assume that the consumer of the data
will be an expert, or that the "in house" XML reader actually works,
which only holds true until the person maintaining the dataset forgets
to pass on a change to the person maintaining the parser;

(b) I can't, for the life of me, find a 'generic' parser - a stand-alone
app that would (for example) take an arbitrary XML file and produce a
table of "tags" so I can make sense of it. That would be a godsend ...
but it seems my Googling is letting me down.
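
Lacking such a tool, a rough sketch of the idea in Python might look like
the following. It is only an illustration under stated assumptions, not an
existing utility: the names tag_table and main, and the "path/@attribute"
notation, are made up here. It relies on the standard library's
xml.etree.ElementTree.iterparse to walk the file as it is read rather than
parsing the whole tree first.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def tag_table(source):
    """Count every element path and attribute path in an XML document.

    `source` is a filename or a binary file object. Paths look like
    'root/child/grandchild'; attributes are recorded as 'root/child/@name'.
    (Hypothetical helper, not part of any library.)
    """
    counts = Counter()
    path = []  # stack of tags from the root down to the current element
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            here = "/".join(path)
            counts[here] += 1
            for attr in elem.attrib:
                counts[here + "/@" + attr] += 1
        else:
            path.pop()
            elem.clear()  # drop text and children we no longer need
    return counts


def main(argv):
    """Print the table for the file named on the command line."""
    counts = tag_table(argv[1])
    width = max(len(p) for p in counts)
    for p in sorted(counts):
        print(f"{p:<{width}}  {counts[p]}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Run against an arbitrary file, this prints one line per distinct tag path
with the number of times it occurs, which is usually enough to see the
shape of an unfamiliar dataset. Namespaced tags appear in ElementTree's
"{uri}name" form.
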
>   Creating a full xml spec for a data set can be a significant task
> for complex datasets, but once you've got it you can just about put your
> feet up - well, on the usual round of import/export/semantic problems at
> least.
>
> XML is alleged to be human readable but it isn't really; human
> "decodable" is more accurate.  It's hard to see how it could be human
> readable, given it can handle extremely complex data.  There's a
> trade-off between human readability and robustness.  If you happen to be
> human, you need a good viewer that can fold up detail and search for
> items of interest.  Another downside is that the equivalent xml file is
> many times the size of a concise data format.  However, xml is very
> amenable to zipping.  My main objection to xml in practice is that some
> zealots have tried to use it as a database rather than a storage and
> transfer format.  Except for a few special applications, the decades of
> development and optimisation of conventional databases produce a vastly
> better result.
>
> You might like CSV but it doesn't scale: you just can't put an
> OpenOffice document in a CSV file, without going insane.  
<grin> But I don't have to put the document *into* CSV, just get the
data *out*. Someone else's insanity isn't my problem...</grin>

I'm not sure about "not scaling". I will admit to inflexibility of the
contents (as you note below). But my favourite example, the Census data,
ships as CSV with hundreds of tables and (I suppose) millions of data
elements, so it's scaled okay, and because it's mundane and
strictly-formatted, I can script the import quite quickly. The same goes
for another "difficult" dataset, the ACMA radiocomm license database.

These really would be a pain if shipped as XML...
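
As a hedged illustration of the kind of scripted import described above,
here is a minimal Python sketch that loads one strictly-formatted CSV table
into SQLite using only the standard library. The function name import_csv
and the table and column names in the usage note are hypothetical, not the
actual Census or ACMA layouts. It assumes the first row is a header, every
data row has the same number of fields, and the header names contain no
double quotes.

```python
import csv
import sqlite3


def import_csv(conn, table, csv_file):
    """Load a header-plus-rows CSV file object into a new SQLite table.

    Returns the number of rows in the table afterwards. All values are
    stored as text; type conversion is left to later queries.
    (Hypothetical helper, written for illustration only.)
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # first row names the columns
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    # Skip completely blank lines; any other malformed row raises an error,
    # which is what we want for data that is supposed to be strict.
    rows = (row for row in reader if row)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
```

Typical use would be to open each file with open(name, newline="") and call
import_csv(conn, "some_table", f) in a loop over the shipped tables. The
point is that nothing here depends on the content of any one table, so
hundreds of tables cost no more scripting than one.
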
> And if you
> think about designing a format for OOO files you'll see why they chose
> xml: All the low level formatting decisions are handled immediately by
> xml.
...There, I guess, is where XML disappears into runaway developer-itis.
I realise I'm speaking purely as Richard, but I don't need a dataset to
try and tell me about its formatting, I just want to get it into the
database and use it. But once you say "describe the data structure,
include the data, *and* carry the formatting" then things are getting a
bit fat and difficult...

If someone argued that XML is an improvement for document formats, I
couldn't disagree. But for a consumer of a dataset, it's an
irritation ...

RC
>   You've still got the massive problem of designing the data
> structure but you know the low level semantics aren't going to bite you.
> Once you have the spec you can easily and independently test data for
> compliance - which means that when you're trying to debug the actual
> application you can have a lot of confidence in your test data set.  As
> we've all heard, OfficeXML has some serious problems but they don't come
> within a light year of the intractability of the old doc format.  
>
> Jim