[LINK] XML isn't evil, just misunderstood

Roger Clarke Roger.Clarke at xamax.com.au
Sat Nov 8 10:46:37 AEDT 2008

Here I go bludging off link yet again.

By sheer good fortune, last night I happened to be doing the latter 
parts of the 'Markup Languages' segment of the course I run up here 
at Uni Hong Kong.  (That's how I segue across from HTML and XML DTDs, 
into client-side and server-side processing capabilities, then onto 
Web Services/SOA and Web 2.0).

After I'd done the XML Schema intro and examples, I relayed the 
essence of yesterday's thread (which I found valuable, and 
sympathised with), in order to make the point that it was a powerful 
tool, but that using it wasn't a picnic.

Now I'd like to post to the candidates' Discussion Board an edited 
version of your post this morning Craig (i.e. selected segments of 
it).  Is that okay with you?

The lead-in I give them is consistent with what you said.  The 
difference is that I'm relying on decades-out-of-date actual 
expertise plus reading and interpolation (and teacher's licence). 
Your description will speak to them much more directly.

Thanks in advance!  (And ongoing thanks to the Link Institute for 
enabling me to seem to some people to know more than I actually do).

At 10:30 AM +1100 8/11/08, Craig Sanders wrote:
>On Fri, Nov 07, 2008 at 08:50:40PM +1100, Richard Chirgwin wrote:
>>  I've tried to see things in the light of standardisation, extensibility
>>  and power, and I can't. XML is evil.
>actually, XML is a useful way of transmitting both the data and the
>meta-data describing that data.  The metadata not only gives the name of
>each field, it also describes the data-type of each field (e.g. integer
>or floating point number, string, etc), whether it is an optional or
>required field, whether it is unique per record (e.g. an ID field) or
>can have multiple instances (e.g. a list), and so on.
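To sketch Craig's point in Python (the record, element names, and attributes here are invented for illustration, not taken from any real schema): an XML document can carry the field names, types, and optionality alongside the values themselves.

```python
# Illustrative only: a record whose XML carries field names, types and
# optionality inline, so a consumer needs no out-of-band documentation.
import xml.etree.ElementTree as ET

doc = """
<record>
  <id type="integer" required="true">42</id>
  <name type="string" required="true">Ada Lovelace</name>
  <email type="string" required="false">ada@example.org</email>
</record>
"""

root = ET.fromstring(doc)
for field in root:
    # Each element names itself and declares its type and optionality.
    print(field.tag, field.attrib["type"], field.attrib["required"], field.text)
```

In practice the type and cardinality rules live in a DTD or XML Schema rather than inline attributes, but the principle is the same: the description travels with the data.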
>these details are necessary because data often doesn't fit neatly into
>simple flat files like CSV - at least, not without either loss of detail
>(i.e. the data-file equivalent of lossy compression) or space and
>bandwidth wasting repetition, or both.
>it is true XML is often misused, and used inappropriately, but the same
>can be said of any technology.
>the most common misuse of XML is to regard it as a data storage protocol(*)
>when it should, for the most part, be seen as purely a data TRANSFER
>protocol - a convenient and documented way of moving data from one
>system to another without loss of descriptive meta-data.
>(*) e.g. databases with built-in XML storage engines to satisfy buzzword
>compliance.  Oracle is one of several offenders here.
>>  Just two examples.
>>  If I buy the ABS CDs, I get hundreds of CSV files, which with a little
>>  script that took an hour to work out, once, I can import into a database
>>  in very little time at all. The data set is huge, but everything works
>>  easy-as.
>scripting is also fairly straight-forward with XML data files, with most
>scripting languages containing libraries/modules for working with XML.
>with the added bonus that you get to work with particular fields by
>*NAME* rather than by field *NUMBER* (which screws up if the field order
>changes or if any fields are added to or deleted from the file format).
>e.g. out of the dozens of perl modules (some specialised, some generic)
>for working with XML files, my two favourites for Q&D scripting are:
>XML::Simple     - Easy API to maintain XML (esp config files)
>XML::Mini       - Perl implementation of the XML::Mini XML create/parse ...
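The name-versus-number point can be sketched in Python's standard library (data and field names invented for the example; the same idea holds for the Perl modules above).

```python
# Sketch: the same record as CSV and as XML.  CSV access is positional;
# XML access is by name, so it survives reordering and new fields.
import csv
import io
import xml.etree.ElementTree as ET

csv_data = "42,Ada Lovelace,ada@example.org\n"
row = next(csv.reader(io.StringIO(csv_data)))
email_by_position = row[2]  # silently wrong if a column is ever inserted

xml_data = ("<record><id>42</id><name>Ada Lovelace</name>"
            "<email>ada@example.org</email></record>")
record = ET.fromstring(xml_data)
email_by_name = record.findtext("email")  # unaffected by field order

print(email_by_position, email_by_name)
```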
>>  Openstreetmap.org allows you to export maps. In X-M-damn-L. You need
>>  a parser to do anything with the data, of which there are several,
>>  none of which work properly. You cannot, without first studying XML
>>  and poring over the schema, do anything off your own bat with the
>>  data. You can't load the data into a database without a parser, which
>>  won't work properly. You can't put the data into a GIS without a
>>  parser, which won't work properly.
>there are numerous tools to convert XML data to flat-files like
>CSV or other formats - for example xml2[1].  They're relatively
>straight-forward and simple, and they're even easy to write *BECAUSE*
>the structure of the data is well-defined in an XML document, so there's
>no need to *guess* what any given field is.
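A minimal flattener along those lines is only a few lines of Python (the input data and field list are invented; real tools like xml2 handle arbitrary structures, this is just the shape of the idea):

```python
# Rough sketch of an XML-to-CSV conversion for one-record-per-element
# data.  The header row comes free: the tags name the columns.
import csv
import io
import xml.etree.ElementTree as ET

xml_data = """
<people>
  <person><id>1</id><name>Ada</name></person>
  <person><id>2</id><name>Grace</name></person>
</people>
"""

root = ET.fromstring(xml_data)
out = io.StringIO()
writer = csv.writer(out)
fields = ["id", "name"]
writer.writerow(fields)
for person in root.findall("person"):
    writer.writerow([person.findtext(f) for f in fields])

print(out.getvalue())
```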
>as for poring over the schema, at least there *IS* a schema to pore
>over. i've wasted many days out of my life trying to figure out what
>some undocumented field in the middle of a CSV file is for or, worse,
>trying to figure out exactly which fields in the CSV file contain the
>data elements i'm interested in extracting and reporting on (it's not
>always obvious, especially if you have CSV lines with dozens or even
>hundreds of fields, all with similar-looking data. e.g. a line with 20 or
>30 numeric fields of which only 2 or 3 are of interest to your current needs).
>some CSV files helpfully have the field names in a comment as the first
>line of the file. not all of them. it's very useful, but it doesn't
>solve all CSV annoyances.
>BTW, don't get me wrong. i'm not saying that CSV sucks and XML should
>be used for everything.  I'm saying that XML, like CSV, has its uses as
>well as its annoyances and limitations.  There are some kinds of data
>that fit perfectly in CSV-style one-line-per-record flat files.  And
>there are other kinds of data that just don't, which are better suited
>to a hierarchical structured data format like XML.
>i use both routinely, and it's really not that difficult to convert from
>one to the other as needed by your task at hand.
>where XML particularly shines is that it gives you the ability to say
>"ah, just give me a dump of everything in XML format and i'll extract
>what i need from that" instead of having to laboriously identify and
>list exactly which fields you want and ask for just them, only to
>find that you forgot one or more fields (or didn't know about - in my
>experience, you often don't know what fields you need until AFTER you've
>seen the data and if you only see a CSV-dump subset you'll never know
>that the field you really want is available).
>[1] xml2 - XML/Unix Processing Tools
>from the man page for 'xml2':
>        xml2 - convert xml documents in a flat format
>        2xml - convert flat format into xml
>        html2 - convert html documents in a flat format
>        2html - convert flat format into html
>        csv2 - convert csv files in a flat format
>        2csv - convert flat format into csv
>        <xml2|2xml|html2|2html|csv2|2csv> > outfile < infile
>        There are six tools.  None of them take any command-line
>        arguments.  They are all simple filters which can be used to
>        read files from standard input in one format and output it to
>        standard output in another format.
>        The flat format used by the tools is specific to these tools.
>        It is a syntax for representing structured markup in a way
>        that makes it easy to process with line-oriented tools.  The
>        same format is used for HTML and XML; in fact, you can think
>        of html2 as converting HTML to XHTML and running xml2 on the
>        result; likewise 2html and 2xml.  (Of course, this isn't how the
>        implementation works.)
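A rough Python emulation of that path=value flat style (a sketch only, not xml2 itself; the sample OpenStreetMap-ish input is invented) shows why it is so friendly to grep, awk and friends, since every leaf and attribute becomes one line:

```python
# Emulate a path=value flat dump of an XML tree, one line per
# attribute and per non-empty text node (illustrative sketch).
import xml.etree.ElementTree as ET

def flatten(elem, prefix=""):
    """Return one 'path=value' line per attribute and text leaf."""
    path = prefix + "/" + elem.tag
    lines = []
    for name, value in elem.attrib.items():
        lines.append(path + "/@" + name + "=" + value)
    if elem.text and elem.text.strip():
        lines.append(path + "=" + elem.text.strip())
    for child in elem:
        lines.extend(flatten(child, path))
    return lines

sample = '<map><node id="7"><name>Oak St</name></node></map>'
for line in flatten(ET.fromstring(sample)):
    print(line)
```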
>similarly, there are tools for analysing an XML file (with or without
>the associated DTD) and giving you a compact summary of what kinds of
>records are in the file and what fields are available.
>IMO, the self-documenting nature of XML more than makes up for the
>slight extra hassle of parsing/extracting from it, especially when that
>extra hassle is mostly handled automatically by existing tools and
>libraries.
>craig sanders <cas at taz.net.au>

Roger Clarke                  http://www.anu.edu.au/people/Roger.Clarke/

Xamax Consultancy Pty Ltd      78 Sidaway St, Chapman ACT 2611 AUSTRALIA
                    Tel: +61 2 6288 1472, and 6288 6916
mailto:Roger.Clarke at xamax.com.au                http://www.xamax.com.au/

Visiting Professor in Info Science & Eng  Australian National University
Visiting Professor in the eCommerce Program      University of Hong Kong
Visiting Professor in the Cyberspace Law & Policy Centre      Uni of NSW
