Structure of an XML file

An XML file is just a structured text file.  The best way to understand XML is to look at example files.  Listing 1 below contains three records from a movie database.  Each record contains two fields: the title of a movie, and its genre.

The example file is formatted with blank lines and indentation to make it easier to read.  Most applications ignore this white space between elements, although a parser may still preserve it and pass it along.  Likewise, any bold text and the line numbers in the listing are for illustration purposes only.  Actual XML files do not contain line numbers.

Listing 1: Sample XML File

1  <?xml version="1.0"?>
2  <movies>
3    <movie>
4       <title>The Ghost and Mr. Chicken</title>
5       <genre>Comedy</genre>
6    </movie>
7    <movie>
8       <title>Gone with the Wind</title>
9       <genre>Drama</genre>
10   </movie>
11   <movie>
12      <title>ThunderBall</title>
13      <genre>Adventure</genre>
14   </movie>
15 </movies>
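
As a rough illustration of how a program sees these records, the sketch below reads the sample file with Python's standard xml.etree.ElementTree module.  The names SAMPLE and read_records are made up for this example, and the XML is held in a string (without the line numbers) rather than loaded from a file.

```python
import xml.etree.ElementTree as ET

# Listing 1 without the illustrative line numbers.
SAMPLE = """<?xml version="1.0"?>
<movies>
  <movie>
    <title>The Ghost and Mr. Chicken</title>
    <genre>Comedy</genre>
  </movie>
  <movie>
    <title>Gone with the Wind</title>
    <genre>Drama</genre>
  </movie>
  <movie>
    <title>ThunderBall</title>
    <genre>Adventure</genre>
  </movie>
</movies>"""


def read_records(xml_text):
    """Return one (title, genre) pair per <movie> record."""
    root = ET.fromstring(xml_text)  # the <movies> element
    return [
        (movie.findtext("title"), movie.findtext("genre"))
        for movie in root.findall("movie")
    ]


records = read_records(SAMPLE)
for title, genre in records:
    print(f"{title}: {genre}")
```

Each record comes back as a pair of fields, which matches the description of the file as three records with two fields each.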

XML Declaration

Line 1 contains the XML declaration, which looks like a processing instruction.  This statement tells parsers that the file contains XML and which version of XML it uses.  The remainder of the file is composed of XML elements.  Each element consists of a start tag, an end tag, and whatever appears between them.  XML data is just information that appears between tags.

The terms tag and element are often used interchangeably, but they are not the same thing.  A tag is the markup that names something.  An element is a complete unit: a start tag, its matching end tag, and everything between them.  In our example, <title> is a tag, and <title>Gone with the Wind</title> is an element.  Elements are the basic building blocks of XML files.  Elements can be nested inside other elements.

Rules that govern tags

Tags are governed by a few basic rules:

  • Tag names are case-sensitive.  <movie>, <Movie>, and <MOVIE> are not equivalent.  Attribute names are also case-sensitive.

  • Tag names must begin with an alphabetic character, an underscore, or a colon.  In practice, avoid the colon, which is reserved for namespaces.

  • Tag and attribute names should not begin with "xml" (in any mix of upper and lower case), which is reserved.

  • All tags must be closed.  A start tag must be closed by a corresponding end tag.  Empty elements can use a forward slash as a shortcut for the end tag (e.g. <movie/> is equivalent to <movie></movie>).
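
Several of these rules can be checked by handing small fragments to a parser.  The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is an assumption made for this example and is not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Tag names are case-sensitive, so </Movie> does not close <movie>.
case_mismatch_ok = is_well_formed("<movie></Movie>")

# A tag name cannot begin with a digit.
digit_start_ok = is_well_formed("<1movie></1movie>")

# Every start tag needs an end tag.
unclosed_ok = is_well_formed("<movie>")

# The self-closing form parses to the same empty element as the long form.
short_form = ET.fromstring("<movie/>")
long_form = ET.fromstring("<movie></movie>")
same_element = (short_form.tag, short_form.text, len(short_form)) == (
    long_form.tag,
    long_form.text,
    len(long_form),
)

print(case_mismatch_ok, digit_start_ok, unclosed_ok, same_element)
```

The first three fragments are rejected, while the two forms of the empty element come out identical.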

The Root Element

Line 2 defines the root element.  Since an XML document is a tree of elements, each document has a single root element that denotes the beginning and end of the XML statements in the file.  In the example, the root element begins with a start tag <movies> and is closed by an end tag </movies>.  All other elements are nested inside the root element. 
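
The single-root rule can be seen with a short experiment.  This is a minimal sketch using Python's standard xml.etree.ElementTree module; the variable names are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Parsing returns the root element; everything else hangs beneath it.
root = ET.fromstring("<movies><movie/><movie/></movies>")
root_tag = root.tag
child_tags = [child.tag for child in root]

# A second top-level element breaks the single-root rule.
try:
    ET.fromstring("<movies/><movies/>")
    second_root_accepted = True
except ET.ParseError:
    second_root_accepted = False

print(root_tag, child_tags, second_root_accepted)
```

The parser hands back <movies> as the root with its two children, and it refuses the text that contains two top-level elements.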

Child Elements

Line 3 identifies <movie> as a child of the <movies> root element.  Parent-child relationships are common in XML files.  Parent elements can have many children.  All elements must be properly closed, meaning that each element has a start tag and an end tag.  Likewise, tags must be properly nested: the end tag of a child must appear before the end tag of its parent.  For example:

<title>ThunderBall<genre>Adventure</title></genre> is incorrect.
<title>ThunderBall<genre>Adventure</genre></title> is correct.
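
The balancing rule amounts to a stack: each start tag is pushed, and each end tag must match the most recently opened tag.  The sketch below is a deliberately simplified illustration of that idea, not a real XML parser.  The name tags_balanced and the regular expression are assumptions made for this example, and comments, CDATA sections, and attribute values containing angle brackets are not handled.

```python
import re

# Matches a start tag, an end tag, or a self-closing tag.
# Group 1: "/" for an end tag.  Group 2: the tag name.
# Group 3: "/" for a self-closing tag.
TAG = re.compile(r"<(/?)([A-Za-z_:][\w:.-]*)[^<>]*?(/?)>")


def tags_balanced(xml_text):
    """Return True if every end tag closes the most recently opened tag."""
    open_tags = []
    for match in TAG.finditer(xml_text):
        is_end, name, is_self_closing = match.groups()
        if is_self_closing:
            continue  # opens and closes in one step
        if is_end:
            if not open_tags or open_tags.pop() != name:
                return False  # end tag does not match the innermost element
        else:
            open_tags.append(name)
    return not open_tags  # anything left over was never closed


incorrect = "<title>ThunderBall<genre>Adventure</title></genre>"
correct = "<title>ThunderBall<genre>Adventure</genre></title>"
print(tags_balanced(incorrect), tags_balanced(correct))
```

In the incorrect line, </title> arrives while <genre> is still the innermost open element, so the check fails at that point.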

Line 4 contains some data (the title of a movie) between tags that identify the data.

Line 5 contains a different data item. In this case, it is a movie category between genre tags.

Line 6 closes this movie element.

This basic structure is repeated in lines 7 through 14, which define two more records.

Line 15 contains the closing tag for the root element.