XML Elements
XML Elements are extensible and they have relationships.
XML Elements have simple naming rules.
XML Elements are Extensible
XML documents can be extended to carry more
information.
Look
at the following XML NOTE example:
<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't
forget me this weekend!</body>
</note>
Let's
imagine that we created an application that extracted the
<to>, <from>, and <body> elements from the XML
document to produce this output:
MESSAGE
To: Tove
From: Jani
Don't
forget me this weekend!
Imagine
that the author of the XML document added some extra information to
it:
<note>
<date>1999-08-01</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't
forget me this weekend!</body>
</note>
Should
the application break or crash?
No.
The application should still be able to find the <to>,
<from>, and <body> elements in the XML document and
produce the same output.
XML documents are Extensible.
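The application described above is not shown in this tutorial, but its behaviour can be sketched in Python with the standard xml.etree.ElementTree module. The function name render_message and the exact layout of the output are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>"""

EXTENDED = """<note>
<date>1999-08-01</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def render_message(xml_text):
    """Build the MESSAGE output from the <to>, <from> and <body> elements only."""
    note = ET.fromstring(xml_text)
    # findtext looks elements up by name, so extra siblings such as
    # <date> or <heading> are simply ignored.
    return "\n".join([
        "MESSAGE",
        "To: " + note.findtext("to"),
        "From: " + note.findtext("from"),
        note.findtext("body"),
    ])


print(render_message(ORIGINAL))
```

Because the elements are looked up by name and not by position, render_message(EXTENDED) returns the same text as render_message(ORIGINAL).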
XML Elements have Relationships
Elements are related as parents and children.
To
understand XML terminology, you have to know how relationships
between XML elements are named, and how element content is
described.
Imagine
that this is a description of a book:
Book Title: My First XML
Chapter 1: Introduction to XML
· What is HTML
· What is XML
Chapter 2: XML Syntax
· Elements must have a closing tag
· Elements must be correctly nested
Imagine
that this XML document describes the book:
<book>
<title>My First XML</title>
<prod id="33-657"
media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
<chapter>XML Syntax
<para>Elements must have a closing tag</para>
<para>Elements must be properly nested</para>
</chapter>
</book>
Book is the root element. Title and chapter are child
elements of book. Book is the parent element of both title and
chapter. Title and chapter are siblings (or sister elements)
because they have the same parent.
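These relationships can be explored from a program. The following is a minimal sketch using Python's standard xml.etree.ElementTree module on the book document; the variable names (parent_of, siblings) are made up for the example.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
<title>My First XML</title>
<prod id="33-657" media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
<chapter>XML Syntax
<para>Elements must have a closing tag</para>
<para>Elements must be properly nested</para>
</chapter>
</book>"""

root = ET.fromstring(BOOK)

# Iterating over an element yields its direct children.
children = [child.tag for child in root]

# ElementTree elements do not store a parent pointer, so build a
# child -> parent map by walking the whole tree once.
parent_of = {child: parent for parent in root.iter() for child in parent}

title = root.find("title")
siblings = [el.tag for el in parent_of[title] if el is not title]

print(root.tag)               # the root element
print(children)               # the child elements of book
print(parent_of[title].tag)   # the parent of title
print(siblings)               # the siblings of title
```

Note that prod is also a child of book, so it shows up among the children and among the siblings of title.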
Elements have Content
Elements can have different content types.
An XML element is everything from (including) the element's
start tag to (including) the element's end tag.
An element can have element content, mixed content, simple
content, or empty content. An element can also have attributes.
In the example above, book has element content, because it
contains other elements. Chapter has mixed content because it
contains both text and other elements. Para has simple content (or
text content) because it contains only text. Prod has empty content,
because there is nothing between its start tag and its end tag.
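The four content types can be told apart by a program. Below is a rough Python sketch using the standard xml.etree.ElementTree module on a shortened version of the book document. The helper name content_type is hypothetical, and treating whitespace-only text (the line breaks between child elements) as layout, not as content, is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
<title>My First XML</title>
<prod id="33-657" media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
</book>"""


def content_type(element):
    """Classify an element as element, mixed, simple or empty content."""
    has_children = len(element) > 0
    # Text directly inside the element is its own .text plus the .tail
    # of each child; whitespace-only text is treated as layout here.
    pieces = [element.text] + [child.tail for child in element]
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


root = ET.fromstring(BOOK)
for el in (root, root.find("chapter"), root.find("chapter/para"), root.find("prod")):
    print(el.tag, content_type(el))
```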
In the example above only the
prod element has attributes. The attribute named id has the value
"33-657". The attribute named media has the value
"paper".
Element Naming
XML elements must follow these naming rules:
· Names can contain letters, numbers, and other characters
· Names must not start with a number, "-" (hyphen) or "." (period); a leading "_" (underscore) is allowed
· Names must not start with the letters xml (or XML or Xml ..)
· Names can not contain spaces
Take care when you "invent" element names and
follow these simple rules:
Any name can be used, no words are reserved, but the idea is
to make names descriptive. Names with an underscore separator are
nice.
Examples: <first_name>, <last_name>.
Avoid "-" and "." in names. It could be a
mess if your software tried to subtract name from first (first-name)
or think that "name" is a property of the object
"first" (first.name).
Element names can be as long as you like, but don't
exaggerate. Names
should be short and simple, like this: <book_title> not like
this:
<the_title_of_the_book>.
XML documents often have a parallel database, where
field names correspond to element names. A good rule is to use the
naming rules of your database.
Non-English letters like éòá are perfectly legal in XML
element names, but watch out for problems if your software
doesn't support them.
The ":" should not be used in element names because
it is reserved to be used for something called namespaces (more
later).
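One way to try out a candidate name is to let an XML parser decide. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_legal_element_name is made up, and it is only a rough probe (it wraps the name in a one-element document and checks whether the parser accepts it), not a full implementation of the naming rules.

```python
import xml.etree.ElementTree as ET


def is_legal_element_name(name):
    """Return True if the parser accepts <name/> as a one-element document."""
    try:
        ET.fromstring("<" + name + "/>")
        return True
    except ET.ParseError:
        return False


for candidate in ["first_name", "first-name", "first.name", "1st_name", "first name"]:
    print(candidate, is_legal_element_name(candidate))
```

The names with "-" and "." are accepted by the parser. The advice above to avoid them is about the software that later handles the names, not about XML syntax.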
XML
Attributes
XML
elements can have attributes in the start tag, just like HTML.
Attributes
are used to provide additional information about elements.
XML
Attributes
XML
elements can have attributes.
From HTML
you will remember this: <IMG SRC="computer.gif">.
The SRC attribute provides additional information about the IMG
element.
In
HTML (and in XML) attributes provide additional information about
elements:
<img
src="computer.gif">
<a
href="demo.asp">
Attributes
often provide information that is not a part of the data. In the
example below, the file type is irrelevant to the data, but
important to the software that wants to manipulate the element:
<file
type="gif">computer.gif</file>
Use
of Elements vs. Attributes
Data
can be stored in elements or in attributes.
Take
a look at these examples:
<person
sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
In
the first example sex is an attribute. In the last, sex is an
element. Both examples provide the same information.
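From a program's point of view the two forms are read in slightly different ways. Here is a minimal Python sketch with the standard xml.etree.ElementTree module; the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = """<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>"""

AS_ELEMENT = """<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>"""

# An attribute is read from the start tag with .get().
sex_from_attribute = ET.fromstring(AS_ATTRIBUTE).get("sex")

# A child element is looked up by name and its text is read.
sex_from_element = ET.fromstring(AS_ELEMENT).findtext("sex")

print(sex_from_attribute, sex_from_element)
```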
There
are no rules about when to use attributes, and when to use elements.
My experience, however, is that attributes are handy in HTML, but in
XML you should try to avoid them. Use elements if the information
feels like data.
My
Favorite Way
I
like to store data in elements.
The
following three XML documents contain exactly the same information:
A date
attribute is used in the first example:
<note
date="12/11/99">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't
forget me this weekend!</body>
</note>
A
date element is used in the second example:
<note>
<date>12/11/99</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't
forget me this weekend!</body>
</note>
An
expanded date element is used in the third: (THIS IS MY FAVORITE):
<note>
<date>
<day>12</day>
<month>11</month>
<year>99</year>
</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't
forget me this weekend!</body>
</note>
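One practical advantage of the expanded form is that a program can read each part of the date directly. With "12/11/99" the program has to split the string and know which number is the day and which is the month. A small Python sketch, using the standard xml.etree.ElementTree module on a shortened copy of the third note (only <to> is kept after the date):

```python
import xml.etree.ElementTree as ET

NOTE = """<note>
<date>
<day>12</day>
<month>11</month>
<year>99</year>
</date>
<to>Tove</to>
</note>"""

date = ET.fromstring(NOTE).find("date")

# Each part has its own element, so no string splitting is needed and
# there is no doubt about which number is the day and which the month.
day = int(date.findtext("day"))
month = int(date.findtext("month"))
year = int(date.findtext("year"))

print(day, month, year)
```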
Avoid
using attributes?
Should
you avoid using attributes?
Here
are some of the problems with using attributes:
· attributes can not contain multiple values (elements can)
· attributes are not easily expandable (for future changes)
· attributes can not describe structures (child elements can)
· attributes are more difficult to manipulate by program code
· attribute values are not easy to test against a DTD
If
you use attributes as containers for data, you end up with documents
that are difficult to read and maintain. Try to use elements
to describe data. Use attributes only to provide information that is
not relevant to the data.
Don't
end up like this (if you think this looks like XML, you have not
understood the point):
<note
day="12" month="11" year="99"
to="Tove"
from="Jani" heading="Reminder"
body="Don't
forget me this weekend!">
</note>
An
Exception to my Attribute rule
Rules
always have exceptions.
My
rule about attributes has one too:
Sometimes
I assign ID references to elements. These ID references can be used
to access XML elements in much the same way as the NAME or ID
attributes in HTML. This example demonstrates this:
<messages>
<note ID="501">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note ID="502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not!</body>
</note>
</messages>
The ID in
these examples is just a counter, or a unique identifier, to
identify the different notes in the XML file, and not a part of the
note data.
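To show how such an ID can be used to get at one particular note, here is a minimal Python sketch with the standard xml.etree.ElementTree module. The helper name find_note is hypothetical.

```python
import xml.etree.ElementTree as ET

MESSAGES = """<messages>
<note ID="501">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note ID="502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not!</body>
</note>
</messages>"""


def find_note(root, note_id):
    """Return the <note> whose ID attribute equals note_id, or None."""
    # ElementTree supports a small XPath subset, including [@attr='value'].
    return root.find("note[@ID='" + note_id + "']")


root = ET.fromstring(MESSAGES)
reply = find_note(root, "502")
print(reply.findtext("heading"))
```

If no note carries the requested ID, find_note returns None, so a real application would check for that before reading the note.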
XML
Validation
XML
with correct syntax is Well Formed XML.
XML
validated against a DTD is Valid XML.
"Well
Formed" XML documents
A "Well Formed" XML document has correct
XML syntax.
A
"Well Formed" XML document is a document that conforms to
the XML syntax rules that were described in the previous chapters:
<?xml
version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't
forget me this weekend!</body>
</note>
"Valid"
XML documents
A "Valid" XML document also conforms to a
DTD.
A
"Valid" XML document is a "Well Formed" XML
document, which also conforms to the rules of a Document Type
Definition (DTD):
<?xml
version="1.0"?>
<!DOCTYPE
note SYSTEM "InternalNote.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't
forget me this weekend!</body>
</note>
Creating
and Displaying Your First XML Document
Creating
an XML Document
Since
XML is written in plain text, you can use your favorite editor.
Write
the following code:
<?xml version="1.0"?>

<!-- File Name: Inventory.xml -->
<INVENTORY>
<BOOK>
<TITLE>The Adventures of Huckleberry Finn</TITLE>
<AUTHOR>Mark Twain</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>298</PAGES>
<PRICE>$5.49</PRICE>
</BOOK>
<BOOK>
<TITLE>Leaves of Grass</TITLE>
<AUTHOR>Walt Whitman</AUTHOR>
<BINDING>hardcover</BINDING>
<PAGES>462</PAGES>
<PRICE>$7.75</PRICE>
</BOOK>
<BOOK>
<TITLE>The Legend of Sleepy Hollow</TITLE>
<AUTHOR>Washington Irving</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>98</PAGES>
<PRICE>$2.95</PRICE>
</BOOK>
<BOOK>
<TITLE>The Marble Faun</TITLE>
<AUTHOR>Nathaniel Hawthorne</AUTHOR>
<BINDING>trade paperback</BINDING>
<PAGES>473</PAGES>
<PRICE>$10.95</PRICE>
</BOOK>
<BOOK>
<TITLE>Moby-Dick</TITLE>
<AUTHOR>Herman Melville</AUTHOR>
<BINDING>hardcover</BINDING>
<PAGES>724</PAGES>
<PRICE>$9.95</PRICE>
</BOOK>
<BOOK>
<TITLE>The Portrait of a Lady</TITLE>
<AUTHOR>Henry James</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>256</PAGES>
<PRICE>$4.95</PRICE>
</BOOK>
<BOOK>
<TITLE>The Scarlet Letter</TITLE>
<AUTHOR>Nathaniel Hawthorne</AUTHOR>
<BINDING>trade paperback</BINDING>
<PAGES>253</PAGES>
<PRICE>$4.25</PRICE>
</BOOK>
<BOOK>
<TITLE>The Turn of the Screw</TITLE>
<AUTHOR>Henry James</AUTHOR>
<BINDING>trade paperback</BINDING>
<PAGES>384</PAGES>
<PRICE>$3.35</PRICE>
</BOOK>
</INVENTORY>
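Once the document exists, a program can read it. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module. To keep it short, only the first two books are embedded as a string; the dictionary keys (title, author, pages) are names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Two of the <BOOK> entries from Inventory.xml, shortened for the example.
INVENTORY = """<?xml version="1.0"?>

<!-- File Name: Inventory.xml -->
<INVENTORY>
<BOOK>
<TITLE>The Adventures of Huckleberry Finn</TITLE>
<AUTHOR>Mark Twain</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>298</PAGES>
<PRICE>$5.49</PRICE>
</BOOK>
<BOOK>
<TITLE>Leaves of Grass</TITLE>
<AUTHOR>Walt Whitman</AUTHOR>
<BINDING>hardcover</BINDING>
<PAGES>462</PAGES>
<PRICE>$7.75</PRICE>
</BOOK>
</INVENTORY>"""

root = ET.fromstring(INVENTORY)

# Collect one small record per <BOOK> element.
books = [
    {
        "title": book.findtext("TITLE"),
        "author": book.findtext("AUTHOR"),
        "pages": int(book.findtext("PAGES")),
    }
    for book in root.findall("BOOK")
]

for book in books:
    print(book["title"], "by", book["author"], "-", book["pages"], "pages")
```

To read the complete file from disk, ET.parse("Inventory.xml").getroot() could be used in place of ET.fromstring, assuming the file was saved under that name.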
The
Anatomy of an XML Document
An
XML document has two parts: the prolog
and the document element (root
element).
The
XML declaration is optional, even though the XML specification states
that it should be included. If it is included, it must appear at the very
beginning of the document. The second line is white space. You can
include any number of blank lines for readability; the XML processor
ignores them.
The
third line is a comment. You can type any text as a comment except --.
The prolog can also contain:
· The document type declaration (the DOCTYPE line, which points to or contains the DTD), which defines the type and the structure of the document. (It always comes after the XML declaration.)
· One or more processing instructions, which pass information through the XML processor to the application.
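The split between prolog and document element can be seen from a program. This is a small sketch using Python's standard xml.dom.minidom module on a cut-down inventory document (a single book, chosen only to keep the example short).

```python
from xml.dom import minidom

DOCUMENT = """<?xml version="1.0"?>

<!-- File Name: Inventory.xml -->
<INVENTORY>
<BOOK><TITLE>Moby-Dick</TITLE></BOOK>
</INVENTORY>"""

doc = minidom.parseString(DOCUMENT)

# Top-level nodes of the document: prolog items first, then the one
# document (root) element. The XML declaration is not exposed as a node.
for node in doc.childNodes:
    if node.nodeType == node.COMMENT_NODE:
        print("prolog comment:", node.data.strip())
    elif node.nodeType == node.ELEMENT_NODE:
        print("document element:", node.tagName)
```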
NOTE:
The
XML processor is the software module that reads the XML document and
provides access to the
document’s contents. It provides this access to another software module,
called the application, which manipulates and displays the document’s
contents.
The
document element
In an XML
document, the elements indicate the logical structure of the document
and contain the document’s information content. A typical element
consists of a start tag, the element content, and an end tag. The
element content can be character data, other nested elements, or a
combination of both.
The name
that appears at the beginning of the start tag and end tag is known
as the element type.
Some
Basic XML Rules
The
following are the rules for creating a well-formed XML document, one
that conforms to the minimal set of rules that allow the document to
be processed by a browser or other program.
· The document must have exactly one top-level element (root)
· Elements must be properly nested
· Each element must have a start tag and an end tag
· The element type name in the start tag must be exactly the same as in the end tag
· Element types are case sensitive
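The rules above can be checked by handing small documents to a parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the tiny test documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text without a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


checks = {
    "well-formed document": "<note><to>Tove</to></note>",
    "two top-level elements": "<note></note><note></note>",
    "improperly nested": "<note><to><from></to></from></note>",
    "missing end tag": "<note><to>Tove</note>",
    "case mismatch in tags": "<note><To>Tove</to></note>",
}

for label, text in checks.items():
    print(label, "->", is_well_formed(text))
```

Only the first document passes. Each of the others breaks one of the rules listed above.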