[LINK] XML isn't evil, just misunderstood

Birch Jim Jim.Birch at dhhs.tas.gov.au
Mon Nov 10 16:31:50 AEDT 2008


XML is brilliant.  Brilliant things are often misunderstood.  It's got
to be one of the great advances in software technology to date.  I'll
admit, it's possibly not best for everything, but it's got a big range
of applications and solves a lot of problems.

Doesn't anyone remember the good old days of trying to reverse engineer
maverick file formats that were different for every application?  When
data is in XML you don't have to spend time working out the element
order, where the commas and spaces go, or whether they're meant to be
pipes or whatever.  It might make a simple CSV list more difficult to
wrangle, but for any difficult data it's a godsend: you can read and
write it in just about any recent environment using library functions,
with little work.  Creating a full XML spec can be a significant task
for complex datasets, but once you've got it you can just about put your
feet up - well, on the usual round of import/export/semantic problems at
least.
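
To make the "library functions with little work" point concrete, here
is a minimal sketch using Python's standard xml.etree.ElementTree.  The
sample data and the names (books, book, title, year, read_books,
add_book) are invented for illustration; nothing here comes from a real
spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data. Note the second <book> lists its children
# in a different order; lookups by name do not care.
SAMPLE = """<books>
  <book id="1"><title>Dune</title><year>1965</year></book>
  <book id="2"><year>1984</year><title>Neuromancer</title></book>
</books>"""

def read_books(xml_text):
    """Return (id, title, year) tuples, found by element name, not position."""
    root = ET.fromstring(xml_text)
    return [
        (b.get("id"), b.findtext("title"), int(b.findtext("year")))
        for b in root.findall("book")
    ]

def add_book(xml_text, book_id, title, year):
    """Append one <book> and return the document as a string."""
    root = ET.fromstring(xml_text)
    book = ET.SubElement(root, "book", id=str(book_id))
    ET.SubElement(book, "title").text = title
    ET.SubElement(book, "year").text = str(year)
    return ET.tostring(root, encoding="unicode")

records = read_books(SAMPLE)
updated = add_book(SAMPLE, 3, "Snow Crash", 1992)
```

The library takes care of delimiters, quoting and child order, so the
reading code only has to name the elements it wants.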

XML is alleged to be human readable but it isn't really; human
"decodable" is more accurate.  It's hard to see how it could be human
readable, given it can handle extremely complex data.  There's a
trade-off between human readability and robustness.  If you happen to be
human, you need a good viewer that can fold up detail and search for
items of interest.  Another downside is that the equivalent XML file can
be many times the size of the same data in a concise format.  However,
XML is very amenable to zipping.  My main objection to XML in practice
is that some zealots have tried to use it as a database rather than a
storage and transfer format.  Except for a few special applications, the
decades of development and optimisation of conventional databases
produce a vastly better result.
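
The size and zipping claims are easy to check for yourself.  The sketch
below writes the same made-up rows as CSV and as XML, then compresses
both with zlib from the Python standard library.  The row layout and
the helper names (as_csv, as_xml) are assumptions for illustration, and
the actual numbers will depend on your data.

```python
import zlib

def as_csv(rows):
    """Render rows as a bare comma-separated list, one row per line."""
    return "".join(f"{i},{name},{price}\n" for i, name, price in rows)

def as_xml(rows):
    """Render the same rows with one tagged element per field."""
    body = "".join(
        f"<item><id>{i}</id><name>{name}</name><price>{price}</price></item>\n"
        for i, name, price in rows
    )
    return f"<items>\n{body}</items>\n"

# Hypothetical data: 1000 simple rows.
rows = [(i, f"widget{i}", i * 3 % 100) for i in range(1000)]

sizes = {}
for label, text in (("csv", as_csv(rows)), ("xml", as_xml(rows))):
    raw = text.encode("utf-8")
    # Store (uncompressed bytes, zlib-compressed bytes) for each format.
    sizes[label] = (len(raw), len(zlib.compress(raw)))
    print(label, sizes[label])
```

The repeated tag names are exactly the sort of redundancy a compressor
removes, which is why the XML overhead shrinks once the file is zipped.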

You might like CSV but it doesn't scale: you just can't put an
OpenOffice document in a CSV file without going insane.  And if you
think about designing a format for OOO files you'll see why they chose
XML: all the low level formatting decisions are handled immediately by
XML.  You've still got the massive problem of designing the data
structure but you know the low level semantics aren't going to bite you.
Once you have the spec you can easily and independently test data for
compliance - which means that when you're trying to debug the actual
application you can have a lot of confidence in your test data set.  As
we've all heard, OfficeXML has some serious problems but they don't come
within a light year of the intractability of the old doc format.
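
As a rough illustration of testing data independently of the
application: in practice you would hand a DTD, XML Schema or RELAX NG
file to a validating parser.  The Python standard library doesn't
validate against schemas, so the sketch below is only a hand-rolled
stand-in, with an invented mini-spec (REQUIRED) and an invented check()
helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-spec: every <book> needs these children, and each
# child's text must convert to the given type.
REQUIRED = {"title": str, "year": int}

def check(xml_text):
    """Return a list of problems; an empty list means the data passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The parser itself rejects anything that is not well-formed.
        return [f"not well-formed: {exc}"]
    problems = []
    for n, book in enumerate(root.findall("book"), start=1):
        for tag, typ in REQUIRED.items():
            text = book.findtext(tag)
            if text is None:
                problems.append(f"book {n}: missing <{tag}>")
                continue
            try:
                typ(text)
            except ValueError:
                problems.append(f"book {n}: <{tag}> is not a valid {typ.__name__}")
    return problems

good = "<books><book><title>Dune</title><year>1965</year></book></books>"
bad = "<books><book><title>Dune</title></book></books>"
broken = "<books><book><title>Dune</books>"
```

Here check(good) returns an empty list, check(bad) reports the missing
year, and check(broken) reports that the file is not well-formed, all
without running the application that will eventually consume the data.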

Jim




    

