XML Elements

XML Elements are extensible and they have relationships.

XML Elements have simple naming rules.

XML Elements are Extensible  

XML documents can be extended to carry more information.

Look at the following XML NOTE example:

<note>

<to>Tove</to>

<from>Jani</from>

<body>Don't forget me this weekend!</body>

</note>

Let's imagine that we created an application that extracted the <to>, <from>, and <body> elements from the XML document to produce this output:

MESSAGE

To: Tove
From: Jani

Don't forget me this weekend!

Imagine that the author of the XML document added some extra information to it:

<note>

<date>1999-08-01</date>

<to>Tove</to>

<from>Jani</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>

Should the application break or crash?

No. The application should still be able to find the <to>, <from>, and <body> elements in the XML document and produce the same output.

XML documents are Extensible.
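The behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the application the text imagines: it assumes Python's standard xml.etree.ElementTree module, and the function name render_message is made up for the example. Because the elements are looked up by name, the added <date> and <heading> elements do not affect the result.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>"""

EXTENDED = """<note>
<date>1999-08-01</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def render_message(xml_text):
    """Build the MESSAGE output from the <to>, <from>, and <body> elements only."""
    note = ET.fromstring(xml_text)
    # findtext looks elements up by name, so extra sibling elements are ignored.
    return "MESSAGE\n\nTo: {}\nFrom: {}\n\n{}".format(
        note.findtext("to"), note.findtext("from"), note.findtext("body")
    )


print(render_message(EXTENDED))
```

Calling render_message on either version of the note gives the same text.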

XML Elements have Relationships

Elements are related as parents and children.

To understand XML terminology, you have to know how relationships between XML elements are named, and how element content is described.

Imagine that this is a description of a book:

Book Title: My First XML


Chapter 1: Introduction to XML

·        What is HTML

·        What is XML

 

Chapter 2: XML Syntax

·        Elements must have a closing tag

·        Elements must be correctly nested

Imagine that this XML document describes the book:

                <book>

                <title>My First XML</title>

                <prod id="33-657" media="paper"></prod>

                <chapter>Introduction to XML

                <para>What is HTML</para>

                <para>What is XML</para>

                </chapter>

                <chapter>XML Syntax

                <para>Elements must have a closing tag</para>

                <para>Elements must be properly nested</para>

                </chapter>

                </book>

Book is the root element. Title and chapter are child elements of book. Book is the parent element of both title and chapter. Title and chapter are siblings (or sister elements) because they have the same parent.
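These relationships can be explored with a short Python sketch. It assumes the standard xml.etree.ElementTree module; the names parent_of, children, and siblings are illustrative only. ElementTree elements do not keep a link to their parent, so the sketch builds that mapping itself.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
<title>My First XML</title>
<prod id="33-657" media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
<chapter>XML Syntax
<para>Elements must have a closing tag</para>
<para>Elements must be properly nested</para>
</chapter>
</book>"""

root = ET.fromstring(BOOK)

# Build a child -> parent map by visiting every element and its direct children.
parent_of = {child: parent for parent in root.iter() for child in parent}

title = root.find("title")
children = [child.tag for child in root]
# Siblings are the other children of the same parent.
siblings = [el.tag for el in parent_of[title] if el is not title]

print("root:", root.tag)
print("children of book:", children)
print("parent of title:", parent_of[title].tag)
print("siblings of title:", siblings)
```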

Elements have Content

Elements can have different content types.

An XML element is everything from (including) the element's start tag to (including) the element's end tag.

                An element can have element content, mixed content, simple content, or empty content. An element can also have attributes.

In the example above, book has element content, because it contains other elements. Chapter has mixed content because it contains both text and other elements. Para has simple content (or text content) because it contains only text. Prod has empty content, because there is nothing between its start and end tags; the information it carries is in its attributes.

                In the example above only the prod element has attributes. The attribute named id has the value "33-657". The attribute named media has the value "paper".
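The four content types can be told apart programmatically. The following Python sketch is one possible way to do it, assuming the standard xml.etree.ElementTree module; the function content_type is hypothetical, and treating whitespace-only text as "no text" is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def content_type(element):
    """Classify an element as having element, mixed, simple, or empty content."""
    has_children = len(element) > 0
    # Text can sit before the first child (element.text) or after any child (child.tail).
    pieces = [element.text] + [child.tail for child in element]
    # Assumption: whitespace used only for layout does not count as text.
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


BOOK = """<book>
<title>My First XML</title>
<prod id="33-657" media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
</book>"""

book = ET.fromstring(BOOK)
prod = book.find("prod")

for element in (book, book.find("chapter"), book.find("chapter/para"), prod):
    print(element.tag, "->", content_type(element))

# Attributes are available separately from the content.
print(prod.attrib)
```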

 

Element Naming

                XML elements must follow these naming rules:

·        Names can contain letters, numbers, and other characters

·        Names must not start with a number or a punctuation character such as "-" or "."; they may start with a letter or "_" (underscore)

·        Names must not start with the letters xml (or XML or Xml ..)

·        Names can not contain spaces

Take care when you "invent" element names and follow these simple rules:

Any name can be used and no words are reserved (apart from names starting with xml, as noted above), but the idea is to make names descriptive. Names with an underscore separator are nice.

                Examples: <first_name>, <last_name>.

                Avoid "-" and "." in names. It could be a mess if your software tried to subtract name from first (first-name) or think that "name" is a property of the object "first" (first.name).

Element names can be as long as you like, but don't exaggerate. Names should be short and simple, like this: <book_title>, not like this: <the_title_of_the_book>.

XML documents often have a parallel database, where field names correspond to element names. A good rule is to use the naming rules of your databases.

Non-English letters like éòá are perfectly legal in XML element names, but watch out for problems if your software vendor doesn't support them.

 The ":" should not be used in element names because it is reserved to be used for something called namespaces (more later).
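The naming rules can be approximated in code. The Python sketch below is a rough check, not the full name grammar of the XML specification: the function looks_like_valid_name is hypothetical, Python's isalpha and isalnum are only an approximation of the characters XML allows, and the ":" is rejected because it is reserved for namespaces. In XML 1.0 a name may start with a letter or an underscore, but not with a digit, "-", or ".".

```python
def looks_like_valid_name(name):
    """Rough check of the element naming rules; not the full XML Name grammar."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # A name starts with a letter or underscore, never a digit, "-" or ".".
    if not (first.isalpha() or first == "_"):
        return False
    # Names starting with "xml" (in any letter case) are reserved.
    if name.lower().startswith("xml"):
        return False
    # Later characters may be letters, digits, "_", "-" or "."; spaces are not allowed.
    return all(ch.isalnum() or ch in "_-." for ch in rest)


for candidate in ("first_name", "_id", "1st_name", "first name", "XmlData", "prénom"):
    print(candidate, "->", looks_like_valid_name(candidate))
```

Note that "first-name" passes this check: the hyphen is legal, and the advice above to avoid it is a matter of style rather than syntax.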

 

XML Attributes

XML elements can have attributes in the start tag, just like HTML.

Attributes are used to provide additional information about elements.

 

XML Attributes

XML elements can have attributes.

From HTML you will remember this: <IMG SRC="computer.gif">. The SRC attribute provides additional information about the IMG element.

In HTML (and in XML) attributes provide additional information about elements:

<img src="computer.gif">

<a href="demo.asp">

Attributes often provide information that is not a part of the data. In the example below, the file type is irrelevant to the data, but important to the software that wants to manipulate the element:

<file type="gif">computer.gif</file>
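As an illustration of how software might read both pieces of information, here is a small Python sketch using the standard xml.etree.ElementTree module. The variable names are made up for the example, and the size attribute is hypothetical; it is not part of the element above and is used only to show a fallback value.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring('<file type="gif">computer.gif</file>')

# The element content is the data itself.
file_name = element.text
# The attribute is information about the data, useful to the software.
file_type = element.get("type")
# get() returns a default when an attribute is absent ("size" is hypothetical).
file_size = element.get("size", "unknown")

print(file_name, file_type, file_size)
```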

Use of Elements vs. Attributes

Data can be stored in elements or in attributes.

Take a look at these examples:

<person sex="female">

  <firstname>Anna</firstname>

  <lastname>Smith</lastname>

</person>

 

<person>

  <sex>female</sex>

  <firstname>Anna</firstname>

  <lastname>Smith</lastname>

</person>

In the first example sex is an attribute. In the second, sex is an element. Both examples provide the same information.

There are no rules about when to use attributes and when to use elements. My experience, however, is that attributes are handy in HTML, but in XML you should try to avoid them. Use elements if the information feels like data.
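To show that both forms carry the same information, the following Python sketch reads the value from either one. It assumes the standard xml.etree.ElementTree module; the function read_sex is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = """<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>"""

AS_ELEMENT = """<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>"""


def read_sex(xml_text):
    """Return the sex value whether it is stored as an attribute or as a child element."""
    person = ET.fromstring(xml_text)
    value = person.get("sex")           # attribute form
    if value is None:
        value = person.findtext("sex")  # child element form
    return value


print(read_sex(AS_ATTRIBUTE), read_sex(AS_ELEMENT))
```

The reading code differs slightly for the two forms (get for the attribute, findtext for the element), which is the kind of detail an application has to agree on with the document author.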


My Favorite Way

I like to store data in elements.

The following three XML documents contain exactly the same information:

A date attribute is used in the first example:

<note date="12/11/99">

<to>Tove</to>

<from>Jani</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>

A date element is used in the second example:

<note>

<date>12/11/99</date>

<to>Tove</to>

<from>Jani</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>

An expanded date element is used in the third example (this is my favorite):

<note>

<date>

  <day>12</day>

  <month>11</month>

  <year>99</year>

</date>

<to>Tove</to>

<from>Jani</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>
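One practical benefit of the expanded form is that a program does not have to guess whether "12/11/99" means day/month or month/day. The Python sketch below reads the three child elements directly, assuming the standard xml.etree.ElementTree and datetime modules. The function note_date is hypothetical, and interpreting the two-digit year 99 as 1999 is an assumption made for the example.

```python
import datetime
import xml.etree.ElementTree as ET

NOTE = """<note>
<date>
  <day>12</day>
  <month>11</month>
  <year>99</year>
</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def note_date(xml_text, century=1900):
    """Build a date from the <day>, <month>, and <year> child elements."""
    date = ET.fromstring(xml_text).find("date")
    day = int(date.findtext("day"))
    month = int(date.findtext("month"))
    # Assumption: a two-digit year belongs to the given century.
    year = century + int(date.findtext("year"))
    return datetime.date(year, month, day)


print(note_date(NOTE))
```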

Avoid using attributes?

Should you avoid using attributes?

Here are some of the problems with using attributes:

·        attributes can not contain multiple values (elements can)

·        attributes are not easily expandable (for future changes)

·        attributes can not describe structures (child elements can)

·        attributes are more difficult to manipulate by program code

·        attribute values are not easy to test against a DTD

If you use attributes as containers for data, you end up with documents that are difficult to read and maintain. Try to use elements to describe data. Use attributes only to provide information that is not relevant to the data.

Don't end up like this (if you think this looks like XML, you have not understood the point):

<note day="12" month="11" year="99"

to="Tove" from="Jani" heading="Reminder"

body="Don't forget me this weekend!">

</note>

An Exception to my Attribute rule

Rules always have exceptions.

My rule about attributes has one too:

Sometimes I assign ID references to elements. These ID references can be used to access XML elements in much the same way as the NAME or ID attributes in HTML. This example demonstrates this:

<messages>

  <note ID="501">

    <to>Tove</to>

    <from>Jani</from>

    <heading>Reminder</heading>

    <body>Don't forget me this weekend!</body>

  </note>

 

  <note ID="502">

    <to>Jani</to>

    <from>Tove</from>

    <heading>Re: Reminder</heading>

    <body>I will not!</body>

  </note>

</messages>

The ID in these examples is just a counter, or a unique identifier, to identify the different notes in the XML file, and not a part of the note data.
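A program can use the ID attribute to pick out one note. The Python sketch below is a minimal example, assuming the standard xml.etree.ElementTree module and its limited XPath support; the function find_note is hypothetical.

```python
import xml.etree.ElementTree as ET

MESSAGES = """<messages>
  <note ID="501">
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note ID="502">
    <to>Jani</to>
    <from>Tove</from>
    <heading>Re: Reminder</heading>
    <body>I will not!</body>
  </note>
</messages>"""


def find_note(root, note_id):
    """Return the <note> child whose ID attribute equals note_id, or None."""
    # The [@attribute='value'] predicate selects elements by attribute value.
    return root.find("note[@ID='{}']".format(note_id))


root = ET.fromstring(MESSAGES)
note = find_note(root, "502")
print(note.findtext("from"), "->", note.findtext("to"), ":", note.findtext("body"))
```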


XML Validation

XML with correct syntax is Well Formed XML.

XML validated against a DTD is Valid XML.  


"Well Formed" XML documents

A "Well Formed" XML document has correct XML syntax.

A "Well Formed" XML document is a document that conforms to the XML syntax rules that were described in the previous chapters:

<?xml version="1.0"?>

<note>

<to>Tove</to>

<from>Jani</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>

"Valid" XML documents

A "Valid" XML document also conforms to a DTD.

A "Valid" XML document is a "Well Formed" XML document, which also conforms to the rules of a Document Type Definition (DTD):

<?xml version="1.0"?>

<!DOCTYPE note SYSTEM "InternalNote.dtd">

<note>

<to>Tove</to>

<from>Jani</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>

 

Creating and Displaying Your First XML Document

Creating an XML Document

Since XML is written in plain text you can use your favorite editor.

Write the following code. The first line is the XML declaration and the third line is a comment:

<?xml version="1.0"?>

 

<!-- File Name: Inventory.xml -->

 

<INVENTORY>

   <BOOK>

      <TITLE>The Adventures of Huckleberry Finn</TITLE>

      <AUTHOR>Mark Twain</AUTHOR>

      <BINDING>mass market paperback</BINDING>

      <PAGES>298</PAGES>

      <PRICE>$5.49</PRICE>

   </BOOK>

   <BOOK>

      <TITLE>Leaves of Grass</TITLE>

      <AUTHOR>Walt Whitman</AUTHOR>

      <BINDING>hardcover</BINDING>

      <PAGES>462</PAGES>

      <PRICE>$7.75</PRICE>

   </BOOK>

   <BOOK>

      <TITLE>The Legend of Sleepy Hollow</TITLE>

      <AUTHOR>Washington Irving</AUTHOR>

      <BINDING>mass market paperback</BINDING>

      <PAGES>98</PAGES>

      <PRICE>$2.95</PRICE>

   </BOOK>

   <BOOK>

      <TITLE>The Marble Faun</TITLE>

      <AUTHOR>Nathaniel Hawthorne</AUTHOR>

      <BINDING>trade paperback</BINDING>

      <PAGES>473</PAGES>

      <PRICE>$10.95</PRICE>

   </BOOK>

   <BOOK>

      <TITLE>Moby-Dick</TITLE>

      <AUTHOR>Herman Melville</AUTHOR>

      <BINDING>hardcover</BINDING>

      <PAGES>724</PAGES>

      <PRICE>$9.95</PRICE>

   </BOOK>

   <BOOK>

      <TITLE>The Portrait of a Lady</TITLE>

      <AUTHOR>Henry James</AUTHOR>

      <BINDING>mass market paperback</BINDING>

      <PAGES>256</PAGES>

      <PRICE>$4.95</PRICE>

   </BOOK>

   <BOOK>

      <TITLE>The Scarlet Letter</TITLE>

      <AUTHOR>Nathaniel Hawthorne</AUTHOR>

      <BINDING>trade paperback</BINDING>

      <PAGES>253</PAGES>

      <PRICE>$4.25</PRICE>

   </BOOK>

   <BOOK>

      <TITLE>The Turn of the Screw</TITLE>

      <AUTHOR>Henry James</AUTHOR>

      <BINDING>trade paperback</BINDING>

      <PAGES>384</PAGES>

      <PRICE>$3.35</PRICE>

   </BOOK>

</INVENTORY>
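Once the document exists, a program can read it. The following Python sketch is one hedged way to do so, assuming the standard xml.etree.ElementTree module. To keep the example short it embeds only the first two books rather than reading Inventory.xml from disk, and the function list_books is hypothetical. Note that the element names must be written in upper case, exactly as in the document.

```python
import xml.etree.ElementTree as ET

# First two books of Inventory.xml, embedded here to keep the sketch self-contained.
INVENTORY = """<?xml version="1.0"?>
<!-- File Name: Inventory.xml -->
<INVENTORY>
   <BOOK>
      <TITLE>The Adventures of Huckleberry Finn</TITLE>
      <AUTHOR>Mark Twain</AUTHOR>
      <BINDING>mass market paperback</BINDING>
      <PAGES>298</PAGES>
      <PRICE>$5.49</PRICE>
   </BOOK>
   <BOOK>
      <TITLE>Leaves of Grass</TITLE>
      <AUTHOR>Walt Whitman</AUTHOR>
      <BINDING>hardcover</BINDING>
      <PAGES>462</PAGES>
      <PRICE>$7.75</PRICE>
   </BOOK>
</INVENTORY>"""


def list_books(xml_text):
    """Return one dictionary per <BOOK> element."""
    inventory = ET.fromstring(xml_text)
    books = []
    for book in inventory.findall("BOOK"):
        books.append({
            "title": book.findtext("TITLE"),
            "author": book.findtext("AUTHOR"),
            "pages": int(book.findtext("PAGES")),
            # The price is stored as text with a leading "$".
            "price": float(book.findtext("PRICE").lstrip("$")),
        })
    return books


for entry in list_books(INVENTORY):
    print("{title} by {author}: {pages} pages, ${price:.2f}".format(**entry))
```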

 

 

The Anatomy of an XML Document

An XML document has two parts: the prolog and the document element (root element).

The XML declaration is optional, even though the XML specification states that it should be included. If present, it must appear at the very beginning of the document. The second line is white space. You can include any number of blank lines for readability; the XML processor ignores them.

The third line is a comment. You can type any text in a comment except --.

The prolog can also contain:

·        The document type declaration (DOCTYPE), which defines the type and the structure of the document. (It always comes after the XML declaration.)

·        One or more processing instructions, which carry information that the XML processor passes to the application.

NOTE: The XML processor is the software module that reads the XML document and provides access to the document’s contents. It provides this access to a software module called the application, which manipulates and displays the document’s contents.
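The split between processor and application can be illustrated with Python's standard xml.sax module, where the parser plays the role of the XML processor and a handler object plays the role of the application. This is only a sketch under that assumption; the class NoteApplication and its fields dictionary are made up for the example.

```python
import xml.sax


class NoteApplication(xml.sax.ContentHandler):
    """The 'application': receives the document's contents from the XML processor."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._text = []

    def startElement(self, name, attrs):
        # Start collecting text for the element that just opened.
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        # Keep the text of elements that contain more than layout whitespace.
        text = "".join(self._text).strip()
        if text:
            self.fields[name] = text
        self._text = []


NOTE = b"""<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>"""

application = NoteApplication()
# The parser (the 'processor') reads the document and calls the handler's methods.
xml.sax.parseString(NOTE, application)
print(application.fields)
```

The application never touches the raw text of the document; it only sees the elements and character data that the processor reports.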

 

The document element

In an XML document the elements indicate the logical structure of the document and contain the document’s information content. A typical element consists of a start tag, the element content, and the end tag. The element content can be character data, other nested elements, or a combination of both.

The name that appears at the beginning of the start tag and end tag is known as the element type.

 

Some Basic XML Rules

The following are the rules for creating a well-formed XML document, one that conforms to the minimal set of rules that allow the document to be processed by a browser or other program.

·        The document must have exactly one top-level element (the root)

·        Elements must be properly nested

·        Each element must have a start tag and an end tag

·        The element type name in the start tag must be exactly the same as in the end tag

·        Element type names are case sensitive
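Each of these rules can be checked by handing a document to a parser and seeing whether it is accepted. The Python sketch below assumes the standard xml.etree.ElementTree module, which raises ParseError for documents that are not well formed; the function is_well_formed and the sample documents are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One small document per rule; only the first obeys all of them.
EXAMPLES = {
    "well formed": "<note><to>Tove</to></note>",
    "two top-level elements": "<note></note><note></note>",
    "improperly nested": "<note><to>Tove</note></to>",
    "missing end tag": "<note><to>Tove</note>",
    "case mismatch in tags": "<note><to>Tove</To></note>",
}

for label, text in EXAMPLES.items():
    print(label, "->", is_well_formed(text))
```

A check like this says nothing about validity; it only confirms that the syntax rules are met.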