The following information is from The World Wide
Web Consortium
w3.org
If you are new to XML, it may be hard to know where to begin. This
summary in 10 points attempts to capture enough of the basic
concepts to enable a beginner to see the forest through the trees.
And if you are giving a presentation on XML, why not start with
these 10 points?
Structured data includes things like spreadsheets, address books,
configuration parameters, financial transactions, and technical
drawings. XML is a set of rules (you may also think of them as
guidelines or conventions) for designing text formats that let you
structure your data. XML is not a programming language, and you
don't have to be a programmer to use it or learn it. XML makes it
easy for a computer to generate data, read data, and ensure that the
data structure is unambiguous. XML avoids common pitfalls in
language design: it is extensible, platform-independent, and it
supports internationalization and localization. XML is fully
Unicode-compliant.
Like HTML, XML makes use of tags (words bracketed by '<'
and '>') and attributes (of the form
name="value"
).
While HTML specifies what each tag and attribute means, and often
how the text between them will look in a browser, XML uses the tags
only to delimit pieces of data, and leaves the interpretation of the
data completely to the application that reads it. In other words, if
you see "<p>" in an XML file, do not assume it is a paragraph.
Depending on the context, it may be a price, a parameter, a person,
a p... (and who says it has to be a word with a "p"?).
Programs that produce spreadsheets, address books, and other
structured data often store that data on disk, using either a binary
or text format. One advantage of a text format is that it allows
people, if necessary, to look at the data without the program that
produced it; in a pinch, you can read a text format with your
favorite text editor. Text formats also allow developers to more
easily debug applications. Like HTML, XML files are text files that
people shouldn't have to read, but may when the need arises.
Compared to HTML, the rules for XML files allow fewer variations. A
forgotten tag, or an attribute without quotes makes an XML file
unusable, while in HTML such practice is often explicitly allowed.
The official XML specification forbids applications from trying to
second-guess the creator of a broken XML file; if the file is
broken, an application has to stop right there and report an error.
Since XML is a text format and it uses tags to delimit the data, XML
files are nearly always larger than comparable binary formats. That
was a conscious decision by the designers of XML. The advantages of
a text format are evident (see point 3), and the disadvantages can
usually be compensated at a different level. Disk space is less
expensive than it used to be, and compression programs like zip and
gzip can
compress files very well and very fast. In addition, communication
protocols such as modem protocols and
HTTP/1.1, the core
protocol of the Web, can compress data on the fly, saving bandwidth
as effectively as a binary format.
XML 1.0 is the
specification that defines what "tags" and "attributes" are. Beyond
XML 1.0, "the XML family" is a growing set of modules that offer
useful services to accomplish important and frequently demanded
tasks.
XLink describes a
standard way to add hyperlinks to an XML file.
XPointer is a
syntax in development for pointing to parts of an XML document. An
XPointer is a bit like a URL, but instead of pointing to documents
on the Web, it points to pieces of data inside an XML file.
CSS, the style
sheet language, is applicable to XML as it is to HTML.
XSL is the
advanced language
for expressing style sheets. It is based on
XSLT,
a transformation language used for rearranging, adding and deleting
tags and attributes. The
DOM is a
standard set of function calls for manipulating XML (and HTML) files
from a programming language.
XML Schemas 1
and
2 help
developers to precisely define the structures of their own XML-based
formats. There are several more modules and tools available or under
development. Keep an eye on W3C's
technical reports page.
Development of XML started in 1996 and it has been a W3C
Recommendation since February 1998, which may make you suspect that
this is rather immature technology. In fact, the technology isn't
very new. Before XML there was SGML, developed in the early '80s, an
ISO standard since 1986, and widely used for large documentation
projects. The development of HTML started in 1990. The designers of
XML simply took the best parts of SGML, guided by the experience
with HTML, and produced something that is no less powerful than
SGML, and vastly more regular and simple to use. Some evolutions,
however, are hard to distinguish from revolutions... And it must be
said that while SGML is mostly used for technical documentation and
much less for other kinds of data, with XML it is exactly the
opposite.
There is an important XML application that is a document format:
W3C's XHTML, the successor to HTML. XHTML has many of the same
elements as HTML. The syntax has been changed slightly to conform to
the rules of XML. A format that is "XML-based" inherits the syntax
from XML and restricts it in certain ways (e.g, XHTML allows "<p>",
but not "<r>"); it also adds meaning to that syntax (XHTML says that
"<p>" stands for "paragraph", and not for "price", "person", or
anything else).
XML allows you to define a new document format by combining and
reusing other formats. Since two formats developed independently may
have elements or attributes with the same name, care must be taken
when combining those formats (does "<p>" mean "paragraph" from this
format or "person" from that one?). To eliminate name confusion when
combining formats, XML provides a
namespace
mechanism. XSL and RDF are good
examples of XML-based formats that use namespaces.
XML Schema is
designed to mirror this support for modularity at the level of
defining XML document structures, by making it easy to combine two
schemas to produce a third which covers a merged document structure.
W3C's Resource Description Framework (RDF)
is an XML text format that supports resource description and
metadata applications, such as music playlists, photo collections,
and bibliographies. For example, RDF might let you identify people
in a Web photo album using information from a personal contact list;
then your mail client could automatically start a message to those
people stating that their photos are on the Web. Just as HTML
integrated documents, images, menu systems, and forms applications
to launch the original Web, RDF provides tools to integrate even
more, to make the Web a little bit more into a Semantic Web. Just
like people need to have agreement on the meanings of the words they
employ in their communication, computers need mechanisms for
agreeing on the meanings of terms in order to communicate
effectively. Formal descriptions of terms in a certain area
(shopping or manufacturing, for example) are called ontologies and
are a necessary part of the Semantic Web. RDF, ontologies, and the
representation of meaning so that computers can help people do work
are all topics of the Semantic
Web Activity.
By choosing XML as the basis for a project, you gain access to a
large and growing community of tools (one of which may already do
what you need!) and engineers experienced in the technology. Opting
for XML is a bit like choosing SQL for databases: you still have to
build your own database and your own programs and procedures that
manipulate it, but there are many tools available and many people
who can help you. And since XML is license-free, you can build your
own software around it without paying anybody anything. The large
and growing support means that you are also not tied to a single
vendor. XML isn't always the best solution, but it is always
worth considering.
The above is copyright of WC3 and can be viewed at
w3.org
W3C Communications Team, w3t-comm@w3.org
Revised 13 Nov. 2001 (last update: $Date: 2003/06/02 19:08:30 $)
Created 27 Mar 1999 by Bert Bos
Also see further information at this address: http://www.w3.org/2008/xml10/