XML for Beginners, Part 1: Structured Documents, Plain Text, and Rendering


Preface


Who in the world is Dick Baldwin?

Hello!  My real name is Richard Baldwin, although most people call me Dick.

Although I have authored numerous articles on XML in the past, this is the first XML article that I have authored under my new XML relationship with EarthWeb.

I maintain a consolidated index of hyperlinks to all of my XML articles at my personal website so that you can access earlier articles from there.

Not a new relationship

My relationship with EarthWeb is not a new one.  I have been publishing Java and Python articles on an EarthWeb site for quite some time.  If you have any interest in Java or Python programming, please look me up there as well.

My profession…

I am a college professor, private consultant, and technical author.  Like many in my field, I spend twelve to fourteen hours each day at the keyboard of my trusty Dell laptop.

I am very pleased to be here, and I promise to do my best to provide an XML resource that you will find both useful and productive.  So, read on!


Introduction

The title of this article is XML for Beginners, Part 1, and it really is meant for beginners in XML.

Experts skip this article

Those of you who already know a lot about XML can skip ahead to something more challenging, such as some of my articles on XSL.  You will find links to all of my articles at my personal website.

Beginners, keep reading

Those of you who are just getting your feet wet in this area (and may have found the XML water to be a little deep), keep reading.

I will throw you an XML lifeline in this and the next few articles.

No jargon allowed

Computer people are the world’s worst at inventing new jargon.  XML people seem to be the worst of the worst in this regard.

Go to an XML convention and everything that you hear will be X-this, X-that, X-everything.  Sometimes I get dizzy just trying to keep the various X’s separated from one another.

In this explanation of XML for beginners, I will try to avoid the use of jargon, or will at least explain the jargon the first time that I use it.


What is XML?

There are many definitions and descriptions for XML.  Here is one that I like.

XML gives us a way to create and maintain structured documents in plain text that can be rendered in a variety of different ways.

A primary objective of XML is to separate content from presentation.

Oops, there goes the jargon again

In the paragraphs that follow, I will explain the jargon:

  • structured documents
  • plain text
  • rendered

What do I mean by a structured document?

Let me answer this by providing an example.

A book is a structured document

In its simplest form, a book may be composed of chapters.

The chapters may be composed of sections.

The sections may contain illustrations and tables.

The tables are composed of rows and columns.

Thus, it should be possible to draw a picture that illustrates the structure of a book, and most people who are familiar with books will probably recognize it as such.
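To make that picture concrete, here is a minimal sketch in Python that draws the structure of a tiny book as an indented outline. The book itself and all of the element names (book, chapter, section, illustration, table, row, cell) are hypothetical; I made them up purely for illustration. The only tool assumed is the xml.etree.ElementTree module from the Python standard library.

```python
import xml.etree.ElementTree as ET

# A hypothetical book. The element names are illustrative only;
# nothing here is a standard vocabulary.
BOOK_XML = """<book>
  <chapter title="Getting Started">
    <section title="Overview">
      <illustration caption="A simple diagram"/>
      <table>
        <row><cell>A</cell><cell>B</cell></row>
        <row><cell>C</cell><cell>D</cell></row>
      </table>
    </section>
  </chapter>
</book>"""


def outline(element, depth=0):
    """Return one indented line per element, showing how the parts nest."""
    label = element.tag
    title = element.get("title")
    if title:
        label += ": " + title
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


book = ET.fromstring(BOOK_XML)
structure = outline(book)
print("\n".join(structure))
```

Each level of indentation in the printed outline corresponds to one level of containment: the book contains a chapter, the chapter contains a section, and so on down to the cells of the table.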

What do I mean by “plain text”?

Characters such as the letters of the alphabet and punctuation marks are represented in the computer by numeric values, similar to a simple substitution code that a child might devise.

ASCII is an encoding scheme

For example, in one popular encoding scheme (ASCII), the upper-case version of the character “A” is represented by the value 65, a “B” is represented by the value 66, a “C” is represented by 67, etc.

The actual correspondence between the characters and the specific numeric values representing the characters has been described by several different encoding schemes over the years.
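If you would like to see those numeric values for yourself, the following short Python sketch looks them up. It uses only the built-in ord and chr functions and the str.encode method; the variable names are my own.

```python
text = "ABC"

# ord() gives the numeric value that represents a character.
codes = [ord(ch) for ch in text]
print(codes)  # [65, 66, 67]

# chr() goes the other way, from the numeric value back to the character.
restored = "".join(chr(n) for n in codes)
print(restored)  # ABC

# Encoding the text as ASCII produces the same values, one byte per character.
raw = text.encode("ascii")
print(list(raw))  # [65, 66, 67]
```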

ASCII is an acronym

One of the most common and enduring schemes for encoding characters was devised a number of years ago and is named the American Standard Code for Information Interchange.

Given the initials of that name, this encoding scheme is commonly known as the ASCII code.

Here is what one author has to say about the ASCII code (or plain text).  Note that the quotation expands the acronym differently from the standard name given above.
 

“This stands for American Standards Committee on Information Interchange. What it means in practice is plain text, that is to say text which is readable directly without using any special software. The advantage of ASCII is that it is a lowest common denominator which can be displayed on any platform. The disadvantage is that it is rather limited and somewhat boring. The text cannot display bold, italics or underlined fonts, and there is no scope for graphics or hypertext. However, it is simple, … and is almost idiot-proof as a means of information exchange. To see a short example of ASCII click HERE, or to see a journal article in ASCII click HERE.”

XML is not confined to the ASCII code

XML is not confined to the use of the ASCII encoding scheme. Several different encoding schemes can be used.

However, all of them have been selected to make it possible to read a raw XML document without the requirement for any special software (other than perhaps a text editor or the DOS type command).

What is a raw XML document?

A raw XML document is the string of sequential characters that makes up the document, before any specific rendering has been applied to the document.
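As a small sketch of this idea, the following Python code treats a one-line XML document first as stored bytes, then as a readable string of characters, and finally as parsed content. The note element and its text are hypothetical, and I have assumed the UTF-8 encoding, which is one of the encoding schemes that XML allows. The word Café was chosen because its last letter is outside the ASCII range.

```python
import xml.etree.ElementTree as ET

# A hypothetical one-line document, stored as bytes in the UTF-8 encoding.
raw_bytes = "<note>Caf\u00e9 at noon</note>".encode("utf-8")

# Decoding the bytes yields the raw XML document: a string of characters
# that a person can read without any special software.
raw_text = raw_bytes.decode("utf-8")
print(raw_text)

# The accented letter needs two bytes in UTF-8, so the byte count is one
# more than the character count.
print(len(raw_text), len(raw_bytes))

# A parser reads the same bytes and hands back the content.
note = ET.fromstring(raw_bytes)
print(note.text)
```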

What is rendering?

In modern computer jargon, rendering typically means to present something for human consumption.

Rendering a drawing

For example, in a computer, drawings and images are nothing more or less than sets of numbers and possibly formulas.  Those numbers and formulas, taken at face value, usually mean very little to most human observers.

Recognition by a human observer

When we speak of rendering a drawing or an image, we usually mean that we are going to present it in a way that makes it look like a drawing or an image to a human observer. In other words, we convert the numbers and formulas that comprise the drawing to a set of colored dots (pixels) that a human observer will recognize as a drawing.

Rendering a document

When we speak of rendering a document, we usually mean that we are going to present it in a way that a human observer will recognize as a book, a newspaper, or some other document style that can be read.

Passing information through typography

Rendering, in this case, often means to present some of the material in boldface, some of the material in italics, some of the material underlined, some of the material in color, etc.

Separate presentation from content

Raw XML doesn’t exhibit any of these properties, such as boldface, italics, or color.  Remember, a main objective of XML is to separate presentation from content.  XML provides only the content.  The presentation of that content must come from somewhere else.

Consider a newspaper

These days, there are at least two different ways to render a newspaper. One way is to print the information (daily news), mostly in black and white, on large sheets of low-grade paper commonly known as newsprint. This is the rendering format that ends up on my driveway each morning.

My online newspaper

Another way to render a newspaper is to present the information on a computer screen, usually in full color, with the information content trying to fight its way through dozens of animated advertisements.

CNN online

For example, here is the sort of rendering format from CNN that ends up on my computer screen each day when I check for email messages.

The news doesn’t change

The base information for the newspaper doesn’t (or shouldn’t) change for the newsprint and online renderings. After all, news is news and the content of the news shouldn’t depend on how it is presented. What does change is the manner in which that information is presented.

A structured document

A newspaper is a structured document consisting of pages, columns, etc.

The great promise of XML

When the information content of a newspaper is created and maintained in XML, that same information content can be rendered on newsprint paper, on your computer screen, or potentially in other formats without having to rewrite the information content.
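Here is a minimal sketch of that promise in Python. One piece of XML content is rendered two different ways, once as plain text and once as HTML for a browser, and the content is never rewritten. The element names (newspaper, story, headline, body), the news story, and both rendering functions are hypothetical. This is hand-written code standing in for a real rendering engine, not an example of how any particular newspaper does it.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical news content. Notice that it says nothing about fonts,
# colors, or layout.
NEWS_XML = """<newspaper>
  <story>
    <headline>City Opens New Library</headline>
    <body>The doors open to the public on Monday.</body>
  </story>
</newspaper>"""


def render_plain(root):
    """Render each story as plain text with an underlined headline."""
    lines = []
    for story in root.iter("story"):
        headline = story.findtext("headline")
        lines.append(headline.upper())
        lines.append("-" * len(headline))
        lines.append(story.findtext("body"))
    return "\n".join(lines)


def render_html(root):
    """Render each story as an HTML fragment for a web browser."""
    parts = []
    for story in root.iter("story"):
        parts.append("<h1>" + escape(story.findtext("headline")) + "</h1>")
        parts.append("<p>" + escape(story.findtext("body")) + "</p>")
    return "\n".join(parts)


news = ET.fromstring(NEWS_XML)
plain = render_plain(news)
html_page = render_html(news)
print(plain)
print(html_page)
```

Both renderings draw on the same parsed content. Changing how the news looks means changing a rendering function, while the XML stays exactly as it was.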

Not necessarily boring

If you visit the above link to the journal article rendered solely in ASCII, you will probably agree that from a presentation viewpoint it is pretty boring (no offense intended to the author of the article).

However, documents created and maintained in plain text need not necessarily be boring.

When you combine a rendering engine with XML…

It is possible to apply a rendering engine (such as XSL) to the XML content and to render that content in rich and exciting ways.

Separating content from presentation

XML is responsible for maintaining the content, independent of presentation.

A rendering engine, such as XSL, is responsible for rendering that content in ways that are appropriate for the application.

(XSL is an advanced topic that we will be getting to in a few weeks.)


What’s Next?

In my next article I will discuss the mechanism by which XML and an appropriate rendering engine can use boring plain text to maintain content and to display richly-formatted structured documents.

Copyright 2000, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without  express written permission from Richard Baldwin is prohibited.
