What is the relationship between XML and other markup languages, such as HTML or SGML?

If you use the Internet, you probably know that HTML (Hypertext Markup Language) is the markup language used to create World Wide Web pages. Both HTML and XML derive from an earlier markup language called SGML (Standard Generalized Markup Language): HTML was defined as an application of SGML, while XML is a simplified subset of it. SGML is a complicated set of rules for defining document structures. XML does the same job using fewer rules. Because XML is a less complicated derivative of SGML, it is easier to implement on large networks such as the Internet. The primary role of XML is to define data.

XML delivers the power of SGML without the complexity. It omits the SGML features that make authoring difficult or costly, yet it preserves most of the flexibility and richness associated with SGML.

Web browsers use a combined parsing and presentation engine that is tolerant of markup problems. Sloppy markup in HTML pages is ignored or interpreted in a browser-specific way. For example, if a closing tag is omitted in an HTML document, the browser attempts to guess where the closing tag should have been. If the browser encounters a tag or attribute that it does not recognize (such as one supported only by a different brand of browser), that tag or attribute is ignored.
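The tolerance described above can be observed with a small sketch. It uses Python's standard html.parser module as a stand-in for a browser's parser, which is an assumption made for illustration only: real browsers apply much more elaborate error recovery. The TagLogger class, the parse_sloppy_html function, and the sample markup are hypothetical names invented for this example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records every start and end tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def parse_sloppy_html(markup):
    """Feed markup to the tolerant parser and return the tag events seen."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


# Neither <p> is ever closed, and <blink> is not a standard element.
# The parser reports what it finds and raises no error.
sloppy = "<p>First paragraph<p>Second <blink>paragraph</blink>"
events = parse_sloppy_html(sloppy)
print(events)
```

The parser never complains about the two missing closing tags. It simply reports the tags it encountered and leaves the consuming program to decide what the author meant.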

The loose, uncontrolled nature of HTML makes it impossible to predict exactly how a web page will be displayed. Browsers attempt to render something on-screen, however odd, rather than display validation error messages.

Because HTML is presentation-oriented, it uses markup tags for formatting as well as for defining structure, and the complexity of that formatting can make it difficult to locate data in HTML documents. HTML was not originally designed to provide precise control over the layout of page elements. To compensate, savvy page designers use tables, style sheets, and DHTML layers to control the placement of text and graphics. This creates visually appealing web pages at the expense of clear-cut document structures.

Complex web pages bury data in a mix of structures in the information stream. The lack of structural consistency in HTML documents makes it difficult for computer programs to locate, extract, or update data. XML resolves this problem by demanding that document authors get structure and syntax right.
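As a minimal sketch of what this strictness buys a program, the following example uses Python's standard xml.etree.ElementTree module. The order document, its element and attribute names, and the read_price function are all hypothetical and invented for illustration. The point is that a well-formed document lets a program go straight to the data it wants, while a document with a missing closing tag is rejected outright rather than guessed at.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: the second one never closes <item>.
WELL_FORMED = (
    "<order><item sku='A1'>Widget</item>"
    "<price currency='USD'>9.95</price></order>"
)
MALFORMED = "<order><item>Widget<price>9.95</price></order>"


def read_price(xml_text):
    """Return (currency, amount), or None if the document is unusable."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # The XML parser refuses markup that is not well-formed.
        return None
    price = root.find("price")
    if price is None or price.text is None:
        return None
    return (price.get("currency"), float(price.text))


print(read_price(WELL_FORMED))
print(read_price(MALFORMED))
```

Because the structure is guaranteed, the program needs only one lookup to reach the price. No layout tables or formatting tags have to be skipped along the way.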