The Rise of XML in the Web Development Industry

In 1996, at the height of the browser wars, the W3C had also begun working on the eXtensible Markup Language (XML). XML was a separate project from HTML, but its background is important to understanding XHTML, which combines XML and HTML and is the subject of this book. When the XML project began, the aforementioned SGML, a complex method of structuring text for later processing, was being used primarily for very large projects, those involving millions of pages of documents. Recall that HTML at this time was a severely limited way of formatting documents for transport over the Internet and for display via a Web browser. The W3C’s objective in developing XML was to create a markup language that had the power, but not the complexity, of SGML.

Defining XML

The following excerpt from “XML in 10 points” (located at XML-in-10-points) defines the parameters of XML:

XML is a method for putting structured data in a text file. For “structured data,” think of such things as spreadsheets, address books, configuration parameters, financial transactions, technical drawings, etc. Programs that produce such data often also store it on disk, for which they can use either a binary format or a text format. The latter allows you, if necessary, to look at the data without the program that produced it. XML is a set of rules, guidelines, conventions, whatever you want to call them, for designing text formats for such data, in a way that produces files that are easy to generate and read (by a computer), that are unambiguous, and that avoid common pitfalls, such as lack of extensibility, lack of support for internationalization/localization, and platform-dependency.
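As a rough illustration of "structured data in a text file", the sketch below writes a tiny address book as XML and reads it back. It uses Python's standard xml.etree.ElementTree module; the element names (addressbook, person, name, city) and the sample records are invented for this example and do not come from the excerpt.

```python
import xml.etree.ElementTree as ET

# Hypothetical address-book records; names and fields are illustrative only.
contacts = [
    {"id": "1", "name": "Ada Lovelace", "city": "London"},
    {"id": "2", "name": "Alan Turing", "city": "Wilmslow"},
]

def build_address_book(records):
    """Turn a list of dicts into an <addressbook> element tree."""
    root = ET.Element("addressbook")
    for rec in records:
        # Each record becomes a <person> element carrying an id attribute.
        person = ET.SubElement(root, "person", id=rec["id"])
        ET.SubElement(person, "name").text = rec["name"]
        ET.SubElement(person, "city").text = rec["city"]
    return root

# Serialize to plain text: this string is what would be stored on disk.
xml_text = ET.tostring(build_address_book(contacts), encoding="unicode")
print(xml_text)

# Any program with an XML parser can read the data back,
# without the program that produced it.
parsed = ET.fromstring(xml_text)
names = [p.findtext("name") for p in parsed.findall("person")]
print(names)
```

The tag names carry no built-in meaning here. The reading code decides that "name" holds a person's name, which is the division of labor the excerpt goes on to describe.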

XML looks a bit like HTML but isn’t HTML. Like HTML, XML makes use of tags (words bracketed by ‘<’ and ‘>’) and attributes (of the form name="value"), but while HTML specifies what each tag & attribute means (and often how the text between them will look in a browser), XML uses the tags only to delimit pieces of data, and leaves the interpretation of the data completely to the application that reads it. In other words, if you see “<p>” in an XML file, don’t assume it is a paragraph. Depending on the context, it may be a price, a parameter, a person, a p... (b.t.w., who says it has to be a word with a “p”?)

XML is text, but isn’t meant to be read. XML files are text files, as I said above, but even less than HTML are they meant to be read by humans. They are text files, because that allows experts (such as programmers) to more easily debug applications; and in emergencies, they can use a simple text editor to fix a broken XML file. But the rules for XML files are much stricter than for HTML. A forgotten tag or an attribute without quotes makes the file unusable, while in HTML such practice is often explicitly allowed, or at least tolerated. It is written in the official XML specification: applications are not allowed to try to second-guess the creator of a broken XML file; if the file is broken, an application has to stop right there and issue an error.

XML is a family of technologies. There is XML 1.0, the specification that defines what “tags” and “attributes” are, but around XML 1.0, there is a growing set of optional modules that provide sets of tags & attributes, or guidelines for specific tasks. There is, e.g., XLink (still in development as of November 1999), which describes a standard way to add hyperlinks to an XML file. XPointer & XFragments (also still being developed) are syntaxes for pointing to parts of an XML document. (An XPointer is a bit like a URL, but instead of pointing to documents on the Web, it points to pieces of data inside an XML file.)
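The strictness described in the excerpt can be seen with any conforming parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the sample documents, the item and sku names, and the try_parse helper are made up for illustration. It also uses a p element as a price rather than a paragraph, to show that the meaning of a tag is left to the application.

```python
import xml.etree.ElementTree as ET

def try_parse(text):
    """Return (True, root tag) if text is well-formed XML, else (False, error message)."""
    try:
        return True, ET.fromstring(text).tag
    except ET.ParseError as err:
        # The parser stops at the first error instead of guessing what was meant.
        return False, str(err)

# Hypothetical samples: one well-formed, two broken in the ways the excerpt mentions.
well_formed = '<item sku="A1"><p>9.99</p></item>'
missing_close = '<item sku="A1"><p>9.99</item>'      # forgotten </p> tag
unquoted_attr = '<item sku=A1><p>9.99</p></item>'    # attribute value without quotes

for sample in (well_formed, missing_close, unquoted_attr):
    ok, detail = try_parse(sample)
    print("ok" if ok else "rejected", "-", detail)

# In this invented format, <p> is a price; only the application knows that.
price = float(ET.fromstring(well_formed).findtext("p"))
print(price)
```

An HTML parser in a browser would typically render something for the two broken samples. An XML parser reports an error and produces no document at all.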

CSS, the style sheet language, is applicable to XML as it is to HTML. XSL (autumn 1999) is the advanced language for expressing style sheets. It is based on XSLT, a transformation language that is often useful outside XSL as well, for rearranging, adding or deleting tags & attributes. The DOM is a standard set of function calls for manipulating XML (and HTML) files from a programming language. XML Namespaces is a specification that describes how you can associate a URL with every single tag and attribute in an XML document. What that URL is used for is up to the application that reads the URL, though. (RDF, W3C’s standard for metadata, uses it to link every piece of metadata to a file defining the type of that data.) XML Schemas 1 and 2 help developers to precisely define their own XML-based formats. There are several more modules and tools available or under development. Keep an eye on W3C’s technical reports page.

XML is verbose, but that is not a problem. Since XML is a text format, and it uses tags to delimit the data, XML files are nearly always larger than comparable binary formats. That was a conscious decision by the XML developers. The advantages of a text format are evident, and the disadvantages can usually be compensated at a different level. Disk space isn’t as expensive anymore as it used to be, and programs like zip and gzip can compress files very well and very fast. Those programs are available for nearly all platforms (and are usually free). In addition, communication protocols such as modem protocols and HTTP/1.1 (the core protocol of the Web) can compress data on the fly, thus saving bandwidth as effectively as a binary format.
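To get a feel for how well repetitive markup compresses, here is a small sketch using Python's standard gzip module. The transaction data is synthetic and invented for this example, so the sizes it prints say nothing about any real workload, and the sketch does not compare against a binary format.

```python
import gzip

# Synthetic, deliberately repetitive XML payload (hypothetical element names).
rows = "".join(
    f'<transaction id="{i}"><amount currency="USD">{i * 10}.00</amount></transaction>'
    for i in range(1000)
)
xml_bytes = f"<transactions>{rows}</transactions>".encode("utf-8")

# Compress with gzip, the same format the gzip command-line tool produces.
compressed = gzip.compress(xml_bytes)

print("raw bytes:       ", len(xml_bytes))
print("compressed bytes:", len(compressed))

# Compression is lossless: decompressing returns the original XML text.
restored = gzip.decompress(compressed)
print("round trip ok:   ", restored == xml_bytes)
```

The repeated tag and attribute names are exactly what a general-purpose compressor handles well, which is why the verbosity of the format costs less on disk or on the wire than the raw file size suggests.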
