
XML Validation


XML with correct syntax is Well Formed XML.
XML validated against a DTD is Valid XML.

"Well Formed" XML documents
A "Well Formed" XML document has correct XML syntax.
A "Well Formed" XML document is a document that conforms to the XML syntax rules that were described in the previous chapters:
<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
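
A parser can confirm this, because a parser must reject text that is not well formed. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is part of the original tutorial); the helper name is_well_formed is made up for this illustration.

```python
import xml.etree.ElementTree as ET

NOTE = """<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def is_well_formed(text):
    """Return True if the parser accepts text, False if it reports a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The note document as written above.
print(is_well_formed(NOTE))

# The same text with one end tag removed is no longer well formed.
print(is_well_formed(NOTE.replace("</to>", "")))
```

Any conforming XML parser could be used in the same way; ElementTree is chosen here only because it ships with Python.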


"Valid" XML documents
A "Valid" XML document also conforms to a DTD.
A "Valid" XML document is a "Well Formed" XML document, which also conforms to the rules of a Document Type Definition (DTD):

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "InternalNote.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

Creating and Displaying Your First XML Document
Creating an XML Document

Since XML is written in plain text, you can use your favorite text editor.
Type the following document. Its first three lines (the XML declaration, a blank line, and a comment) make up the prolog:

<?xml version="1.0"?>

<!-- File Name: Inventory.xml -->
<INVENTORY>
<BOOK>
<TITLE>The Adventures of Huckleberry Finn</TITLE>
<AUTHOR>Mark Twain</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>298</PAGES>
<PRICE>$5.49</PRICE>
</BOOK>
<BOOK>
<TITLE>Leaves of Grass</TITLE>
<AUTHOR>Walt Whitman</AUTHOR>
<BINDING>hardcover</BINDING>
<PAGES>462</PAGES>
<PRICE>$7.75</PRICE>
</BOOK>
<BOOK>
<TITLE>The Legend of Sleepy Hollow</TITLE>
<AUTHOR>Washington Irving</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>98</PAGES>
<PRICE>$2.95</PRICE>
</BOOK>
<BOOK>
<TITLE>The Marble Faun</TITLE>
<AUTHOR>Nathaniel Hawthorne</AUTHOR>
<BINDING>trade paperback</BINDING>
<PAGES>473</PAGES>
<PRICE>$10.95</PRICE>
</BOOK>
<BOOK>
<TITLE>Moby-Dick</TITLE>
<AUTHOR>Herman Melville</AUTHOR>
<BINDING>hardcover</BINDING>
<PAGES>724</PAGES>
<PRICE>$9.95</PRICE>
</BOOK>
<BOOK>
<TITLE>The Portrait of a Lady</TITLE>
<AUTHOR>Henry James</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>256</PAGES>
<PRICE>$4.95</PRICE>
</BOOK>
<BOOK>
<TITLE>The Scarlet Letter</TITLE>
<AUTHOR>Nathaniel Hawthorne</AUTHOR>
<BINDING>trade paperback</BINDING>
<PAGES>253</PAGES>
<PRICE>$4.25</PRICE>
</BOOK>
<BOOK>
<TITLE>The Turn of the Screw</TITLE>
<AUTHOR>Henry James</AUTHOR>
<BINDING>trade paperback</BINDING>
<PAGES>384</PAGES>
<PRICE>$3.35</PRICE>
</BOOK>
</INVENTORY>

The Anatomy of an XML Document

An XML document has two parts: the prolog and the document element (the root element).

The first line of the example is the XML declaration. It is optional, although the XML specification states that it should be included. If it is present, it must appear at the very beginning of the document. The second line is white space. You can add any number of blank lines for readability; the XML processor ignores them.

The third line is a comment. A comment can contain any text except the string --.

The prolog can also contain:

  • The document type declaration (<!DOCTYPE ...>), which defines the type and structure of the document. It always comes after the XML declaration.
  • One or more processing instructions, which carry information that the XML processor passes to the application.

NOTE: The XML processor is the software module that reads the XML document and provides access to the document's contents. It provides this access to another software module, called the application, which manipulates and displays the document’s contents.

The document element

In an XML document, the elements indicate the logical structure of the document and contain the document’s information content. A typical element consists of a start-tag, the element content, and an end-tag. The element content can be character data, other nested elements, or a combination of both.

The name that appears at the beginning of the start-tag and end-tag is known as the element type.
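
The parts of an element can be seen by parsing a small fragment. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, where the tag attribute holds the element type and the text attribute holds leading character data.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<BOOK>"
    "<TITLE>Moby-Dick</TITLE>"
    "<AUTHOR>Herman Melville</AUTHOR>"
    "</BOOK>"
)

# The element type is the name shared by the start-tag and the end-tag.
print(book.tag)

# BOOK's content is nested elements; each child's content is character data.
for child in book:
    print(child.tag, "->", child.text)

child_types = [child.tag for child in book]
```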

Some Basic XML Rules

The following are the rules for creating a well-formed XML document, one that conforms to the minimal set of rules that allow the document to be processed by a browser or other program.

  • The document must have exactly one top-level element (the root)
  • Elements must be properly nested
  • Each element must have a start-tag and an end-tag
  • The element type name in the start-tag must be exactly the same as in the end-tag
  • Element type names are case sensitive
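
Each rule can be tried out by giving a parser a small document that breaks only that rule. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the sample fragments and the helper name parse_error are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# One small document per rule, each written to break that rule.
BROKEN = {
    "one top-level element": "<BOOK/><BOOK/>",
    "proper nesting": "<BOOK><TITLE>Leaves of Grass</BOOK></TITLE>",
    "start-tag and end-tag": "<BOOK><TITLE>Leaves of Grass</BOOK>",
    "matching names": "<BOOK><TITLE>Leaves of Grass</NAME></BOOK>",
    "case-sensitive names": "<BOOK><TITLE>Leaves of Grass</Title></BOOK>",
}


def parse_error(text):
    """Return the parser's error message, or None if text is well formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


for rule, text in BROKEN.items():
    print(f"{rule}: {parse_error(text)}")

# A fragment that follows all of the rules produces no error.
print(parse_error("<BOOK><TITLE>Leaves of Grass</TITLE></BOOK>"))
```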

Catch XML errors in IE5

Before IE displays your page, it checks the document for errors. If it catches an error, it displays an error message and does not display the document.

To see this, change the end tag of one of the TITLE elements to </Title>
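
The same experiment can be run without a browser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, whose ParseError carries a position attribute holding the line and column of the problem; the shortened inventory document is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The TITLE element is closed with </Title>, which is a different name.
DOC = """<?xml version="1.0"?>
<INVENTORY>
<BOOK>
<TITLE>Leaves of Grass</Title>
</BOOK>
</INVENTORY>"""

try:
    ET.fromstring(DOC)
    position = None
except ET.ParseError as err:
    position = err.position  # (line, column) reported by the parser
    print("Not well formed:", err)
```

The reported line points at the mismatched end tag, which is usually enough to locate the mistake in an editor.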

XML Technologies


This chapter contains a list of technologies that are important to the understanding and development of XML applications.


CSS - Cascading Style Sheets

CSS style sheets can be added to an XML document to provide display information.

XSL - eXtensible Stylesheet Language

XSL is far more powerful than CSS. It can be used to transform XML files into many different output formats.

DTD - Document Type Definition

A DTD can be used to define the legal building blocks of an XML document.

XML Schemas

Schemas are powerful alternatives to DTDs. Schemas are written in XML.

DOM - Document Object Model

The DOM defines interfaces, properties and methods to manipulate XML documents.
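
As an illustration of those interfaces, here is a minimal sketch assuming Python's standard xml.dom.minidom module, one of many DOM implementations. The sample inventory and the InStock attribute are made up for this example.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<INVENTORY>"
    "<BOOK><TITLE>Moby-Dick</TITLE><PAGES>724</PAGES></BOOK>"
    "<BOOK><TITLE>Leaves of Grass</TITLE><PAGES>462</PAGES></BOOK>"
    "</INVENTORY>"
)

# documentElement is the root; getElementsByTagName searches below it.
root = doc.documentElement
titles = [t.firstChild.data for t in root.getElementsByTagName("TITLE")]
print(root.tagName, titles)

# The DOM also allows changes: add a (hypothetical) attribute and serialize again.
first_book = root.getElementsByTagName("BOOK")[0]
first_book.setAttribute("InStock", "yes")
print(first_book.toxml())
```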

SAX - Simple API for XML

SAX is another interface to read and manipulate XML documents.
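
Where the DOM builds a tree, SAX reports the document as a stream of events. The following is a minimal sketch, assuming Python's standard xml.sax module; the class name TitleCollector and the sample data are made up for this illustration.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every TITLE element as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # a list while inside TITLE, otherwise None

    def startElement(self, name, attrs):
        if name == "TITLE":
            self._buffer = []

    def characters(self, content):
        # Character data may arrive in several pieces, so collect them all.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "TITLE":
            self.titles.append("".join(self._buffer))
            self._buffer = None


handler = TitleCollector()
xml.sax.parseString(
    b"<INVENTORY>"
    b"<BOOK><TITLE>Moby-Dick</TITLE></BOOK>"
    b"<BOOK><TITLE>The Marble Faun</TITLE></BOOK>"
    b"</INVENTORY>",
    handler,
)
print(handler.titles)
```

Because nothing is kept in memory except what the handler stores, this style suits large documents that are read once.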


Creating Well-Formed XML Documents


A well-formed document is one that meets the minimal set of criteria for a conforming XML document.

A valid XML document is one that is well formed and also conforms to a more rigid set of rules.

When creating a valid document you must fully define the structure of the document in a document type declaration in the document’s prolog.

The parts of a Well Formed XML Document


A well-formed XML document contains a prolog and a document element, and it can also include comments, processing instructions, and white space. In the following example, the first line is the XML declaration, the second is a comment, and the third is a processing instruction:

<?xml version='1.0' standalone='yes' ?>
<!-- File Name: Parts.xml -->
<?xml-stylesheet type="text/css" href="Inventory01.css"?>
<INVENTORY>
<BOOK>
<TITLE>The Adventures of Huckleberry Finn</TITLE>
<AUTHOR>Mark Twain</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>298</PAGES>
<PRICE>$5.49</PRICE>
</BOOK>
<BOOK>
<TITLE>The Marble Faun</TITLE>
<AUTHOR>Nathaniel Hawthorne</AUTHOR>
<BINDING>trade paperback</BINDING>
<PAGES>473</PAGES>
<PRICE>$10.95</PRICE>
</BOOK>
<BOOK>
<TITLE>Moby-Dick</TITLE>
<AUTHOR>Herman Melville</AUTHOR>
<BINDING>hardcover</BINDING>
<PAGES>724</PAGES>
<PRICE>$9.95</PRICE>
</BOOK>
<BOOK>
<TITLE>The Turn of the Screw</TITLE>
<AUTHOR>Henry James</AUTHOR>
<BINDING>trade paperback</BINDING>
<PAGES>384</PAGES>
<PRICE>$3.35</PRICE>
</BOOK>
</INVENTORY>
<!-- Comments, processing instructions, and white space
can also appear after the document element. -->
<?MyApp Parm1="value 1" Parm2="value 2" ?>

The version number in the XML declaration can be delimited by either single or double quotes. The standalone document declaration tells the processor whether external declarations are required to process the document; standalone='yes' lets the processor skip unnecessary processing of external files. You can use it even if the document has external markup declarations, as long as they don’t affect the content of the document passed from the XML processor to the application.
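
The items outside the document element can be listed with a DOM parser. This is a minimal sketch, assuming Python's standard xml.dom.minidom module; the document is a shortened copy of Parts.xml with a made-up trailing comment.

```python
from xml.dom.minidom import parseString

PARTS = """<?xml version='1.0' standalone='yes' ?>
<!-- File Name: Parts.xml -->
<?xml-stylesheet type="text/css" href="Inventory01.css"?>
<INVENTORY>
<BOOK><TITLE>Moby-Dick</TITLE></BOOK>
</INVENTORY>
<!-- A trailing comment. -->
<?MyApp Parm1="value 1" Parm2="value 2" ?>"""

doc = parseString(PARTS)

# Top-level nodes in document order. The XML declaration is not a node,
# and white space between these items is not reported.
top_level = [node.nodeName for node in doc.childNodes]
print(top_level)

# Processing instructions expose a target name.
targets = [
    node.target
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
print(targets)
```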

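Also look at the white space. The XML processor ignores white space (spaces, tabs, line feeds, and carriage returns) between the items in the prolog and around the document element; white space that is part of character data is not ignored.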
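
The difference can be checked with a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; both sample documents are made up for this illustration.

```python
import xml.etree.ElementTree as ET

spaced = ET.fromstring(
    "<?xml version='1.0'?>\n\n\n"
    "<BOOK>\n"
    "  <TITLE>  Moby-Dick  </TITLE>\n"
    "</BOOK>\n\n"
)
compact = ET.fromstring("<BOOK><TITLE>  Moby-Dick  </TITLE></BOOK>")

# Blank lines around the document element do not change the parsed structure.
print([child.tag for child in spaced] == [child.tag for child in compact])

# White space that is part of character data is handed to the application as is.
title_text = spaced.find("TITLE").text
print(repr(title_text))
```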

Adding Elements to the Document


The document must have exactly one top-level root with other elements nested within it.

This is well formed:

<?xml version="1.0"?>
<!-- File Name: Inventory.xml -->
<INVENTORY>
<BOOK>
<TITLE>The Adventures of Huckleberry Finn</TITLE>
<AUTHOR>Mark Twain</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>298</PAGES>
<PRICE>$5.49</PRICE>
</BOOK>
<BOOK>
<TITLE>Leaves of Grass</TITLE>
<AUTHOR>Walt Whitman</AUTHOR>
<BINDING>hardcover</BINDING>
<PAGES>462</PAGES>
<PRICE>$7.75</PRICE>
</BOOK>
</INVENTORY>

This is not well formed, because it has two top-level BOOK elements instead of a single root:

<?xml version="1.0"?>
<!-- File Name: Inventory.xml -->
<BOOK>
<TITLE>The Adventures of Huckleberry Finn</TITLE>
<AUTHOR>Mark Twain</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>298</PAGES>
<PRICE>$5.49</PRICE>
</BOOK>
<BOOK>
<TITLE>Leaves of Grass</TITLE>
<AUTHOR>Walt Whitman</AUTHOR>
<BINDING>hardcover</BINDING>
<PAGES>462</PAGES>
<PRICE>$7.75</PRICE>
</BOOK>
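
Elements must be properly nested. The following fragment is not well formed, because the BOOK end-tag appears before the TITLE end-tag:

<BOOK>
<TITLE>Leaves of Grass</BOOK></TITLE>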

An element that contains one or more nested elements is known as the parent element; an element nested directly within it is a child element.

The name that appears within the start-tag and end-tag is called the element type or generic identifier (GI).

The type name specifies a particular type or class of element, not a specific element.

Types of Element Content

  • Nested elements

    <BOOK>
    <TITLE>The Adventures of Huckleberry Finn</TITLE>
    <AUTHOR>Mark Twain</AUTHOR>
    <BINDING>mass market paperback</BINDING>
    <PAGES>298</PAGES>
    <PRICE>$5.49</PRICE>
    </BOOK>

  • Character data

    <TITLE>The Adventures of Huckleberry Finn
    <SUBTITLE>This is a test</SUBTITLE>
    </TITLE>

  • General entity references or character references

    <TITLE>The Adventures of Huckleberry Finn
    Author: &author;
    Document Name: “How to enter the &#60; character”
    </TITLE>
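
A parser replaces both kinds of reference before the application sees the text. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. The internal DTD subset that declares the author entity, and its replacement text, are assumptions added for this sketch, and the quotation marks are written as plain ASCII.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares the general entity "author";
# its replacement text is an assumption made for this sketch.
DOC = """<?xml version="1.0"?>
<!DOCTYPE TITLE [
<!ENTITY author "Mark Twain">
]>
<TITLE>The Adventures of Huckleberry Finn
Author: &author;
Document Name: "How to enter the &#60; character"
</TITLE>"""

title = ET.fromstring(DOC)

# &author; becomes the declared text and &#60; becomes the < character.
print(title.text)
```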

NOTE: When you open an XML document directly in IE, it only checks whether the document is well formed, not whether it is valid.
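
Many other parsers behave the same way. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which checks well-formedness but does not validate; the small DTD and document are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The DTD says a note holds exactly one "to" element,
# yet the document body uses "from" instead.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to)>
<!ELEMENT to (#PCDATA)>
]>
<note>
<from>Jani</from>
</note>"""

# ElementTree checks well-formedness only, so this parses without error
# even though the document is not valid against its own DTD.
note = ET.fromstring(DOC)
children = [child.tag for child in note]
print(children)
```

Checking validity requires a validating parser, which is a separate tool from the one used in this sketch.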