XML Basics
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
</bookstore>
A prolog defines the XML version and the character encoding: this line is called “prolog”
<?xml version="1.0" encoding="UTF-8"?>
next line is the root element of the document:
<bookstore>
The next line starts a <book> element: child element
<book category="cooking">
Every tag should have a closing tag.
Attribute values must always be quoted.
<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>
Entity References
Character like "<" has special meaning
This will generate an XML error:
<message>salary < 1000</message>
To avoid this error, replace the "<" character with an entity reference:
<message>salary < 1000</message>
< < less than
> > greater than
& & ampersand
' ' apostrophe
" " quotation mark
<!-- This is a -- comment -->
Strange, but allowed:
<!-- This is a - - comment -->
White-space is Preserved in XML
XML does not truncate multiple white-spaces (HTML truncates multiple white-spaces to one single white-space):
XML: Hello Tove
HTML: Hello Tove
XML Stores New Line as LF
Windows applications store a new line as: carriage return and line feed (CR+LF).
Unix and Mac OSX uses LF.
Old Mac systems uses CR.
XML stores a new line as LF.
What is an XML Element?
An XML element is everything from (including) the element's start tag to (including) the element's end tag.
<price>29.99</price>
An element can contain:
• text
• attributes
• other elements
• or a mix of the above
<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
<title>, <author>, <year>, and <price> have text content because they contain text (like 29.99).
<bookstore> and <book> have element contents, because they contain elements.
<book> has an attribute (category="children").
Note: you can have space after the name given in the start element tag, not before
<title > - allowed
< title>, -> Disallowed, Illegal
Empty XML Elements
In XML, you can indicate an empty element like this:
<element></element>
Or
<element />
XML Naming Rules
• Element names are case-sensitive
• Element names must start with a letter or underscore
• Element names cannot start with the letters xml (or XML, or Xml, etc)
• Element names can contain letters, digits, hyphens, underscores, and periods
• Element names cannot contain spaces
Best Naming Practices
Create descriptive names, like this: <person>, <firstname>, <lastname>.
Create short and simple names, like this: <book_title> not like this: <the_title_of_the_book>.
Avoid "-". If you name something "first-name", some software may think you want to subtract "name" from "first".
Avoid ".". If you name something "first.name", some software may think that "name" is a property of the object "first".
Avoid ":". Colons are reserved for namespaces (more later).
Non-English letters like éòá are perfectly legal in XML, but watch out for problems if your software doesn't support them.
Naming Styles
There are no naming styles defined for XML elements. But here are some commonly used:
Style Example Description
Lower case <firstname> All letters lower case
Upper case <FIRSTNAME>
Underscore <first_name> Underscore separates words
Pascal case <FirstName> Uppercase first letter in each word
Camel case <firstName> Uppercase first letter in each word except the first
XML Elements are Extensible
XML elements can be extended to carry more information.
Look at the following XML example:
<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>
Imagine that the author of the XML document added some extra information to it:
<note>
<date>2008-01-10</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Should the application break or crash?
No. The application should still be able to find the <to>, <from>, and <body> elements in the XML document and produce the same output.
This is one of the beauties of XML. It can be extended without breaking applications.
XML Attributes
Attribute values must always be quoted. Either single or double quotes can be used.
For a person's gender, the <person> element can be written like this:
<person gender="female">
or like this:
<person gender='female'>
If the attribute value itself contains double quotes you can use single quotes, like in this example:
<gangster name='George "Shotgun" Ziegler'>
or you can use character entities: this is more preferable
<gangster name="George "Shotgun" Ziegler">
Note: Metadata (data about data) should be stored as attributes, and the data itself should be stored as elements.
Below is more preferable
<note>
<date>
<year>2008</year>
<month>01</month>
<day>10</day>
</date>
<to>Tove</to>
<from>Jani</from>
</note>
Avoid XML Attributes?
Some things to consider when using attributes are:
• attributes cannot contain multiple values (elements can)
• attributes cannot contain tree structures (elements can)
• attributes are not easily expandable (for future changes)
Don't end up like this:
<note day="10" month="01" year="2008"
to="Tove" from="Jani" heading="Reminder"
body="Don't forget me this weekend!">
</note>
XML Attributes for Metadata
Sometimes ID references are assigned to elements. These IDs can be used to identify XML elements in much the same way as the id attribute in HTML. This example demonstrates this:
<messages>
<note id="501">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
</messages>
The id attributes above are for identifying the different notes. It is not a part of the note itself.
Metadata (data about data) should be stored as attributes, and the data itself should be stored as elements.
XML Namespaces
provide a method to avoid element name conflicts.
No comments:
Post a Comment