XML for Beginners, Part 1: Structured Documents, Plain Text, and Rendering
What is XML?
There are many definitions and descriptions of XML. Here is one that I like.
|XML gives us a way to create and maintain structured documents in plain text that can be rendered in a variety of different ways. |
Oops, there goes the jargon again
In the paragraphs that follow, I will explain the jargon:
- structured documents
- plain text
- rendering
What is a structured document? Let me answer this by providing an example.
A book is a structured document
In its simplest form, a book may be composed of chapters.
The chapters may be composed of sections.
The sections may contain illustrations and tables.
The tables are composed of rows and columns.
Thus, it should be possible to draw a picture that illustrates the structure of a book, and most people who are familiar with books will probably recognize it as such.
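The structure described above can also be sketched in code. The following is only a minimal illustration, assuming Python and its standard xml.etree.ElementTree module. The element names (book, chapter, section, illustration, table, row, column) and the title and caption attributes are my own hypothetical choices and are not part of any standard.

```python
import xml.etree.ElementTree as ET

# Build the book structure: book -> chapter -> section -> table -> row -> column.
# All element and attribute names are hypothetical, chosen for illustration only.
book = ET.Element("book", title="Sample Book")
chapter = ET.SubElement(book, "chapter", title="Chapter 1")
section = ET.SubElement(chapter, "section", title="Section 1.1")
ET.SubElement(section, "illustration", caption="A sample figure")
table = ET.SubElement(section, "table")
row = ET.SubElement(table, "row")
ET.SubElement(row, "column").text = "cell A"
ET.SubElement(row, "column").text = "cell B"


def outline(element, depth=0):
    """Return one indented line per element, showing how the parts nest."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


book_outline = outline(book)
print("\n".join(book_outline))
```

The printed outline is the "picture" of the book: each level of indentation is one level of structure.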
What do I mean by "plain text?"
Characters such as the letters of the alphabet and punctuation marks are represented in the computer by numeric values, similar to a simple substitution code that a child might devise.
ASCII is an encoding scheme
For example, in one popular encoding scheme (ASCII), the upper-case character "A" is represented by the value 65, a "B" is represented by the value 66, a "C" is represented by 67, etc.
The actual correspondence between the characters and the specific numeric values representing the characters has been described by several different encoding schemes over the years.
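If you would like to see these numeric values for yourself, here is a small sketch, assuming Python. The built-in functions ord() and chr() convert between a character and its numeric value, and encode("ascii") turns a string into the sequence of numbers that represents it.

```python
# Look up the numeric values for a few characters with ord().
codes = {ch: ord(ch) for ch in "ABC"}
print(codes)        # {'A': 65, 'B': 66, 'C': 67}

# Go the other way, from a numeric value back to a character.
print(chr(65))      # A

# A string of plain text is simply a sequence of such numbers.
raw = "XML".encode("ascii")
print(list(raw))    # [88, 77, 76]
```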
Where the name ASCII comes from
One of the most common and enduring schemes for encoding characters was published in the 1960s by a committee of the American Standards Association (the predecessor of today's ANSI).
The scheme is named the American Standard Code for Information Interchange, and it is commonly known by its initials as the ASCII code.
Here is what one author has to say about the ASCII code (or plain text). Note that the author's expansion of the acronym is not the official one.
|"This stands for American Standards Committee on Information Interchange. What it means in practice is plain text, that is to say text which is readable directly without using any special software. The advantage of ASCII is that it is a lowest common denominator which can be displayed on any platform. The disadvantage is that it is rather limited and somewhat boring. The text cannot display bold, italics or underlined fonts, and there is no scope for graphics or hypertext. However, it is simple, ... and is almost idiot-proof as a means of information exchange. To see a short example of ASCII click HERE, or to see a journal article in ASCII click HERE."|
XML is not confined to the ASCII code
XML is not confined to the use of the ASCII encoding scheme. Several different encoding schemes can be used.
However, all of them have been selected to make it possible to read a raw XML document without the requirement for any special software (other than perhaps a text editor or the DOS "type" command).
What is a raw XML document?
A raw XML document is the string of sequential characters that makes up the document, before any specific rendering has been applied to the document.
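Here is a rough sketch of what that means in practice, assuming Python and its standard xml.etree.ElementTree module. The note element and its content are hypothetical. The document contains one character (the "é") that ASCII cannot represent, so it is stored in UTF-8, which is one of the encodings that XML allows. Even so, the raw document is still just readable text.

```python
import xml.etree.ElementTree as ET

# A tiny raw XML document; the <note> element is a made-up example.
raw_text = '<?xml version="1.0" encoding="UTF-8"?>\n<note>café</note>'

# Stored on disk, the document is a sequence of bytes in the declared encoding.
raw_bytes = raw_text.encode("utf-8")

# The raw document can be read without any XML software at all.
print(raw_bytes.decode("utf-8"))

# ASCII alone cannot represent the "é", which is why other encodings are needed.
try:
    raw_text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("Fits in ASCII:", ascii_ok)

# An XML parser reads the same bytes and recovers the content.
root = ET.fromstring(raw_bytes)
print(root.tag, root.text)
```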
What is rendering?
In modern computer jargon, rendering typically means to present something for human consumption.
Rendering a drawing
For example, in a computer, drawings and images are nothing more or less than sets of numbers and possibly formulas. Those numbers and formulas, taken at face value, usually mean very little to most human observers.
Recognition by a human observer
When we speak of rendering a drawing or an image, we usually mean that we are going to present it in a way that makes it look like a drawing or an image to a human observer. In other words, we convert the numbers and formulas that comprise the drawing to a set of colored dots (pixels) that a human observer will recognize as a drawing.
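To make the idea concrete, here is a toy sketch, assuming Python. The render_circle function is a hypothetical name of my own. It takes a single number (a radius) and a formula (x*x + y*y <= radius*radius) and converts them into rows of "pixels," using the character "#" for a filled dot and "." for an empty one.

```python
def render_circle(radius):
    """Turn a radius and the circle formula into rows of character 'pixels'."""
    rows = []
    for y in range(-radius, radius + 1):
        row = ""
        for x in range(-radius, radius + 1):
            # A dot is filled when the point lies on or inside the circle.
            row += "#" if x * x + y * y <= radius * radius else "."
        rows.append(row)
    return rows


picture = render_circle(3)
print("\n".join(picture))
```

The number 3 and the formula mean little on their own, but most people will recognize the printed result as a rough picture of a disk.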
Rendering a document
When we speak of rendering a document, we usually mean that we are going to present it in a way that a human observer will recognize as a book, a newspaper, or some other document style, and will be able to read.
Passing information through typography
Rendering, in this case, often means to present some of the material in boldface, some of the material in italics, some of the material underlined, some of the material in color, etc.
Separate presentation from content
Raw XML doesn't exhibit any of these properties, such as boldface, italics, or color. Remember, a main objective of XML is to separate presentation from content. XML provides only the content. The presentation of that content must come from somewhere else.
Consider a newspaper
These days, there are at least two different ways to render a newspaper. One way is to print the information (daily news), mostly in black and white, on large sheets of low-grade paper commonly known as newsprint. This is the rendering format that ends up on my driveway each morning.
My online newspaper
Another way to render a newspaper is to present the information on a computer screen, usually in full color, with the information content trying to fight its way through dozens of animated advertisements.
For example, here is the sort of rendering format from CNN that ends up on my computer screen each day when I check for email messages.
The news doesn't change
The base information for the newspaper doesn't (or shouldn't) change for the newsprint and online renderings. After all, news is news and the content of the news shouldn't depend on how it is presented. What does change is the manner in which that information is presented.
A structured document
A newspaper is a structured document consisting of pages, columns, etc.
The great promise of XML
When the information content of a newspaper is created and maintained in XML, that same information content can be rendered on newsprint paper, on your computer screen, or potentially in other formats without having to rewrite the information content.
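The following sketch illustrates this promise on a very small scale, assuming Python and its standard library. The XML vocabulary (newspaper, story, headline, body) and the two functions render_text and render_html are hypothetical names that I made up for this example. They are not a real rendering engine. The point is only that one piece of content feeds two different presentations, and the content is never rewritten.

```python
import xml.etree.ElementTree as ET
from html import escape

# The content: no fonts, no colors, no layout. All element names are made up.
NEWS_XML = """<newspaper name="The Daily Example">
  <story>
    <headline>City Opens New Library</headline>
    <body>The library opens to the public on Monday.</body>
  </story>
</newspaper>"""


def render_text(root):
    """One presentation: plain text, with headlines shown in capitals."""
    lines = [root.get("name").upper(), ""]
    for story in root.iter("story"):
        lines.append(story.findtext("headline").upper())
        lines.append(story.findtext("body"))
    return "\n".join(lines)


def render_html(root):
    """Another presentation: HTML markup for display in a browser."""
    parts = ["<h1>%s</h1>" % escape(root.get("name"))]
    for story in root.iter("story"):
        parts.append("<h2>%s</h2>" % escape(story.findtext("headline")))
        parts.append("<p>%s</p>" % escape(story.findtext("body")))
    return "\n".join(parts)


root = ET.fromstring(NEWS_XML)
text_version = render_text(root)
html_version = render_html(root)
print(text_version)
print(html_version)
```

Adding a third presentation would mean writing a third function. The XML content itself would stay exactly as it is.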
Not necessarily boring
If you visit the above link to the journal article rendered solely in ASCII, you will probably agree that from a presentation viewpoint it is pretty boring (no offense intended to the author of the article).
However, documents created and maintained in plain text need not necessarily be boring.
When you combine a rendering engine with XML...
It is possible to apply a rendering engine (such as an XSL processor) to the XML content and to render that content in rich and exciting ways.
Separating content from presentation
XML is responsible for maintaining the content, independent of presentation.
A rendering engine, such as an XSL processor, is responsible for rendering that content in ways that are appropriate for the application.
(XSL is an advanced topic that we will be getting to in a few weeks.)
In my next article, I will discuss the mechanism by which XML and an appropriate rendering engine can use boring plain text to maintain content and to display richly formatted structured documents.
Copyright 2000, Richard G. Baldwin. Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.