XML Syntax
The syntax rules of XML are simple and strict. They are easy to learn and easy to use.
Because of this, writing software that can read and manipulate XML is straightforward.
An example XML document
XML documents use a self-describing and simple syntax.
<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The first line in the document, the XML declaration, defines the XML version of the document. In this case the document conforms to the 1.0 specification of XML.
The next line describes the root element of the document (as if it were saying: "this document is a note"):
<note>
The next 4 lines describe 4 child elements of the root (to, from, heading, and body):
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
And finally the last line defines the end of the root element:
</note>
Can you detect from this example that the XML document contains a note to Tove from Jani?
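To see how software reads this structure, here is a minimal sketch that parses the note with Python's standard xml.etree.ElementTree module. The choice of Python and the variable names (NOTE_XML, child_tags) are assumptions made for illustration; they are not part of the tutorial.

```python
import xml.etree.ElementTree as ET

# The example note document from above, held in a string (name is illustrative).
NOTE_XML = """<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

# Parsing returns the root element of the document.
root = ET.fromstring(NOTE_XML)

# Iterating over an element yields its direct children in document order.
child_tags = [child.tag for child in root]

print(root.tag)
print(child_tags)
print(root.find("from").text, "->", root.find("to").text)
```

The parser needs no knowledge of what a "note" is; the element names alone are enough to pull out the sender and the recipient.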
All XML elements must have a closing tag
With XML, it is illegal to omit the closing tag.
In HTML some elements do not have to have a closing tag. The following code is legal in HTML:
<p>This is a paragraph
<p>This is another paragraph
In XML all elements must have a closing tag like this:
<p>This is a paragraph</p>
<p>This is another paragraph</p>
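A parser enforces the closing-tag rule. The sketch below uses Python's standard xml.etree.ElementTree; the helper is_well_formed and the wrapping <doc> element are hypothetical names introduced here, with <doc> added only so that each sample has a single root.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# HTML-style paragraphs without closing tags (wrapped in a hypothetical <doc> root).
unclosed = "<doc><p>This is a paragraph<p>This is another paragraph</doc>"

# The same paragraphs with every element closed.
closed = "<doc><p>This is a paragraph</p><p>This is another paragraph</p></doc>"

print(is_well_formed(unclosed))
print(is_well_formed(closed))
```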
XML tags are case sensitive
Unlike HTML, XML tags are case sensitive.
With XML, the tag <Letter> is different from the tag <letter>.
Opening and closing tags must therefore be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>
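The case rule can be checked the same way. This is a small sketch using Python's standard xml.etree.ElementTree; the helper name parses is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the text is accepted by the parser, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The opening and closing tags differ only in case, so they do not match.
mismatched = "<Message>This is incorrect</message>"
matched = "<message>This is correct</message>"

# <Letter> and <letter> are two different element names.
upper_tag = ET.fromstring("<Letter/>").tag
lower_tag = ET.fromstring("<letter/>").tag

print(parses(mismatched), parses(matched))
print(upper_tag == lower_tag)
```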
All XML elements must be properly nested
Improper nesting of tags makes no sense to XML.
In HTML some elements can be improperly nested within each other like this:
<b><i>This text is bold and italic</b></i>
In XML all elements must be properly nested within each other like this:
<b><i>This text is bold and italic</i></b>
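As a rough illustration, the two nesting orders can be fed to Python's standard xml.etree.ElementTree parser. The helper name nests_correctly is hypothetical.

```python
import xml.etree.ElementTree as ET

def nests_correctly(text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# <i> is opened inside <b>, so it must be closed before <b> is closed.
overlapping = "<b><i>This text is bold and italic</b></i>"
nested = "<b><i>This text is bold and italic</i></b>"

print(nests_correctly(overlapping))
print(nests_correctly(nested))

# With proper nesting, the inner element is simply a child of the outer one.
inner = ET.fromstring(nested)[0]
print(inner.tag, "-", inner.text)
```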
All XML documents must have a root tag
The first tag in an XML document is the root tag.
All XML documents must contain a single tag pair to define the root element. All other elements must be nested within the root element. Any element can have sub elements (children). Sub elements must be correctly nested within their parent element:
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
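The single-root rule and the parent/child structure can be sketched with Python's standard xml.etree.ElementTree. The helper has_single_root and the second top-level element <other> are assumptions introduced for the example.

```python
import xml.etree.ElementTree as ET

def has_single_root(text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One root element containing a child, which in turn contains a subchild.
one_root = "<root><child><subchild>.....</subchild></child></root>"

# Two top-level elements: the second one falls outside the root.
two_roots = "<root><child/></root><other/>"

print(has_single_root(one_root))
print(has_single_root(two_roots))

# Walk down the tree: root -> first child -> first subchild.
root = ET.fromstring(one_root)
path = [root.tag, root[0].tag, root[0][0].tag]
print(path)
```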
Attribute values must always be quoted
With XML, it is illegal to omit quotation marks around attribute values.
XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. Study the two XML documents below. The first one is incorrect, the second is correct:
<?xml version="1.0"?>
<note date=12/11/99>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

<?xml version="1.0"?>
<note date="12/11/99">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The error in the first document is that the date attribute in the note element is not quoted.
This is correct: date="12/11/99". This is incorrect: date=12/11/99.
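The two note documents can be compared with a short sketch based on Python's standard xml.etree.ElementTree. The shortened documents and the helper name try_parse are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def try_parse(text):
    """Return the root element, or None if the text is rejected by the parser."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None

# Shortened versions of the two note documents above.
unquoted = "<note date=12/11/99><to>Tove</to><from>Jani</from></note>"
quoted = '<note date="12/11/99"><to>Tove</to><from>Jani</from></note>'

bad_root = try_parse(unquoted)
good_root = try_parse(quoted)

print(bad_root is None)
# Attributes of an element are available by name once the document parses.
print(good_root.get("date"))
```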
With XML, white space is preserved
With XML, the white space in your document is not truncated.
This is unlike HTML. With HTML, a sentence like this: "Hello           my name is Tove" will be displayed like this: "Hello my name is Tove", because HTML collapses a run of white space into a single space.
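A small sketch of the difference, using Python's standard xml.etree.ElementTree. The <greeting> element and the variable names are hypothetical, and the HTML behaviour is only imitated here by collapsing runs of white space with ordinary string operations.

```python
import xml.etree.ElementTree as ET

# A sentence with a run of eleven spaces after "Hello".
sentence = "Hello" + " " * 11 + "my name is Tove"

# Wrap the sentence in a hypothetical <greeting> element and parse it.
xml_text = ET.fromstring("<greeting>" + sentence + "</greeting>").text

# Imitate HTML display: every run of white space becomes one space.
html_like = " ".join(sentence.split())

print(repr(xml_text))
print(repr(html_like))
print(xml_text == sentence)
```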
With XML, CR / LF is converted to LF
With XML, a new line is always stored as LF.
Have you ever heard of a typewriter? Well, a typewriter is a type of mechanical device that was used in the previous century :-)
After you have typed one line of text on a typewriter, you have to manually return the printing carriage to the left margin position and manually feed the paper up one line.
In Windows applications, a new line in the text is normally stored as a pair of CR LF (carriage return, line feed) characters. In Unix applications, a new line is normally stored as an LF character. Some applications use only a CR character to store a new line.
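The conversion can be observed with Python's standard xml.etree.ElementTree, which is assumed here to be backed by the usual expat parser. The <body> element and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The same two lines of text, stored with three different line-ending styles.
windows_style = "<body>line one\r\nline two</body>"   # CR LF
cr_only_style = "<body>line one\rline two</body>"     # CR only
unix_style = "<body>line one\nline two</body>"        # LF only

# Parse each document and collect the text the application would see.
texts = [ET.fromstring(doc).text
         for doc in (windows_style, cr_only_style, unix_style)]

for text in texts:
    print(repr(text))

# After parsing, no carriage return should be left in any of the results.
print(any("\r" in text for text in texts))
```

Because the parser hands over the same text in all three cases, an application that reads XML does not have to care which platform wrote the file.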