XML, Not Your Mother's Markup Language

Dateline: 8/26/98

XML, the eXtensible Markup Language, is getting serious discussion these days on the VRML mailing list, so it seems like a good idea to give a brief (and the key word is brief) description of some of XML's features. XML is being developed and used by the W3C (World Wide Web Consortium) as a next-generation markup language. But first, a little background and, be still my heart, more acronyms.

The mother of all markup languages is SGML, the Standard Generalized Markup Language (ISO 8879). Certainly the most comprehensive collection of SGML and XML information is maintained by Robin Cover, who has kept an extensive bibliography for many years. It has now evolved into "The SGML/XML Web Page", originally supported by SoftQuad, Inc. and the Summer Institute of Linguistics and now supported by OASIS, the Organization for the Advancement of Structured Information Standards. No, I am not making this up :-)

In oversimplified terms, XML is really a "lite" version of SGML or, as some in the SGML community call it, "Stealth SGML". What's the big deal, you ask? Well, XML (and SGML) let you define document structures. The structure of a document is, for example, the fact that a book has a Foreword, Table of Contents, Chapters, and an Afterword.
Within Chapters are the ChapterTitle, Headings, and Paragraphs. These names are formalized into a structural definition called a Document Type Definition (DTD). HTML is related to SGML because there now exists an HTML DTD. Originally this was not true, but that's long-past history. In formal terms HTML is an application of SGML, and XML is a simplified subset of SGML. In practical terms XML lets you define your own tags. Once XML is widely deployed (of course the spec has to actually get finished), web browsers should be able to read DTDs and "understand" your markup. So if you want to define a markup language for cooking recipes, fine. If you want a markup language for cyberspace objects, that's fine also. XML lets you define meta-data: descriptions of data. In fact the W3C is using XML for all sorts of things. PICS (the content rating specification), SMIL (the streamed multimedia specification), and others are specifications being defined with XML. It is the extensibility that makes XML so powerful.
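To make the "define your own tags" idea concrete, here is a minimal sketch of what a recipe markup language might look like, with a small DTD embedded in the document. The tag names (recipe, title, ingredient, step) and the quantity attribute are invented for illustration and are not taken from any real recipe DTD. The Python standard library parser used here reads the DTD but does not validate against it, so treat this as a demonstration of the structure rather than of validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe vocabulary: every tag and attribute name below is an
# assumption made up for this example.
RECIPE_XML = """<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!ELEMENT recipe (title, ingredient+, step+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient quantity CDATA #IMPLIED>
  <!ELEMENT step (#PCDATA)>
]>
<recipe>
  <title>Pancakes</title>
  <ingredient quantity="2 cups">flour</ingredient>
  <ingredient quantity="2">eggs</ingredient>
  <step>Mix the ingredients.</step>
  <step>Fry on a hot griddle.</step>
</recipe>
"""


def summarize(xml_text):
    """Parse the recipe and return its parts as plain Python data."""
    # ElementTree is non-validating: the DTD is parsed but not enforced.
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "ingredients": [
            (item.get("quantity"), item.text) for item in root.findall("ingredient")
        ],
        "steps": [step.text for step in root.findall("step")],
    }


recipe = summarize(RECIPE_XML)
print(recipe["title"])
for quantity, name in recipe["ingredients"]:
    print(quantity, name)
```

The point is that nothing in the parser knows about recipes. The structure comes entirely from the markup the author chose to define.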

The other "BIG" thing about XML is the much richer linking structure it supports. No longer will you be limited to simple uni-directional links. You will be able to specify targets such as (paraphrasing here) "go to the 4th word of the 5th paragraph in the 3rd chapter". Links can also be bidirectional, so you can find out who pointed at you. Most of the XML linking concepts were taken from the Text Encoding Initiative (TEI), an amazing collection of DTDs developed by humanities scholars for the markup of literature of all types. So before any VRML group goes on to yet another definition of content, I'd sure suggest checking this stuff out!
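The "4th word of the 5th paragraph in the 3rd chapter" kind of target can be sketched in a few lines. This is only an illustration of the idea of addressing by position within a document's structure. It does not use the actual XML linking syntax, and the book, chapter, and para tag names plus the resolve function are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical book markup, invented for this example.
BOOK_XML = """<book>
  <chapter><para>Chapter one has a single paragraph.</para></chapter>
  <chapter><para>Chapter two is just as short.</para></chapter>
  <chapter>
    <para>First paragraph.</para>
    <para>Second paragraph.</para>
    <para>Third paragraph.</para>
    <para>Fourth paragraph.</para>
    <para>It is the extensibility that makes XML so powerful.</para>
  </chapter>
</book>
"""


def resolve(root, chapter, paragraph, word):
    """Return the Nth word of the Nth paragraph of the Nth chapter (1-based).

    Returns None when the target does not exist.
    """
    # ElementTree paths accept 1-based positional predicates such as chapter[3].
    para = root.find(f"chapter[{chapter}]/para[{paragraph}]")
    if para is None:
        return None
    words = "".join(para.itertext()).split()
    if 1 <= word <= len(words):
        return words[word - 1]
    return None


book = ET.fromstring(BOOK_XML)
print(resolve(book, 3, 5, 4))  # prints "extensibility"
```

A target like this only works because the document has explicit structure to count through, which is exactly what the DTD gives you.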

It seems clear that it is possible to mark up VRML with XML. The question is, would you want to? Is this the right approach? The answer is not clear, but probably not. VRML, as a computer graphics file format, does have some characteristics that make it inherently different from text: compression and streaming, to name a couple. Some level of integration with XML, however, does make sense. Rather than shoving lots of ad-hoc textual fields into InfoNodes and other fields, putting XML into VRML might make sense. It would be parsable, and a whole bunch of free XML software already exists.
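As a rough sketch of what "putting XML into VRML" could look like, the example below stores a small XML description in the info string of a VRML97 WorldInfo node and then reads it back with an off-the-shelf XML parser. The choice of WorldInfo, the metadata tag names (object, name, author, units), and both helper functions are assumptions for illustration. Nobody has standardized this, and a real design might look quite different.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata vocabulary for a cyberspace object; the tag names
# are invented for this example.
METADATA_XML = (
    "<object>"
    "<name>Teapot</name>"
    "<author>A. Modeler</author>"
    "<units>meters</units>"
    "</object>"
)


def to_worldinfo(xml_text):
    """Wrap XML text in a VRML97 WorldInfo node (assumed carrier for the data)."""
    # VRML strings are double-quoted, so escape backslashes and quotes.
    escaped = xml_text.replace("\\", "\\\\").replace('"', '\\"')
    return '#VRML V2.0 utf8\nWorldInfo {\n  info [ "' + escaped + '" ]\n}\n'


def read_metadata(xml_text):
    """Parse the XML metadata into a dictionary of tag name to text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


print(to_worldinfo(METADATA_XML))
print(read_metadata(METADATA_XML))
```

The graphics stay in VRML where compression and streaming matter, and only the descriptive text becomes XML, which any free XML parser can then handle.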

Have fun reading about XML!
