XML is a metalanguage: a set of rules for structuring data in plain text. Its wide acceptance stems from its simple structure, and it is supported across many programming languages, especially for enterprise application messaging and B2B communication. Java is no exception. Java enterprise components can be deployed, though optionally, with simple XML descriptors. The primary advantage is that data in XML can be used by a consumer without delving into the program that produced it. However, XML communication is aimed mainly at exchanges between programs, as in B2B, rather than at end users.
Listing 1: article.xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog title="Article" publisher="www.developer.com">
    <article>
        <title>Creating RESTful Web Services with JAX-RS</title>
        <publish>November 5, 2013</publish>
        <author>Manoj Debnath</author>
        <link>http://www.developer.com/java/creating-restful-web-services-jax-rs/</link>
    </article>
    <article>
        ...
    </article>
</catalog>
XML Specification
The XML ecosystem of Java provides two specifications: JAXP (JSR 206) and JAXB (JSR 222). JAXP (Java API for XML Processing) is part of the Java SE platform; it provides APIs to parse XML documents using DOM (Document Object Model) and SAX (Simple API for XML), and to transform them using XSLT (Extensible Stylesheet Language Transformations). The main strength of JAXP is its flexibility and its ability to manipulate XML documents at a very low level. JAXB (Java Architecture for XML Binding) is instead a binding specification that facilitates marshalling and unmarshalling between XML documents and a set of POJOs with the help of various APIs and annotations. JAXB is closely integrated with JAX-RS web services and is better suited to enterprise development. Working with XML through JAXB is usually quicker than through JAXP in terms of development effort, because the mapping is declared with annotations rather than written by hand.
Listing 2: Catalog.java
public class Catalog {
    private String title;
    private String publishDate;
    private String author;
    private String link;

    //...constructor, getters, setters, toString
}
Listing 3: ParseXML.java
import java.util.List;

public class ParseXML {
    public static void main(String[] args) {
        // Parse article.xml with the SAX handler of Listing 4
        SAXParsing sax = new SAXParsing();
        sax.parseDocument();
        List<Catalog> list = sax.getListOfArticles();
        for (Catalog c : list) {
            System.out.println(c.toString());
        }

        /* Alternative: parse with the DOM class of Listing 5
        DOMParsing dom = new DOMParsing();
        dom.parseDocument();
        List<Catalog> list = dom.getListOfArticles();
        for (Catalog c : list) {
            System.out.println(c.toString());
        }
        */
    }
}
JAXP with SAX
If we want to access the XML data of Listing 1 with JAXP, we can use either the SAX or the DOM API. SAX is event based and requires less memory than DOM. SAX is a serial access mechanism for parsing XML documents: the parser generates events as it reads the document and delivers them through callback methods. The basic requirements for using the SAX 2.0 APIs are as follows:
- Import three packages: org.xml.sax for the SAX interfaces, javax.xml.parsers for SAXParser and SAXParserFactory, and org.xml.sax.helpers, which contains the DefaultHandler class.
- The DefaultHandler class is the default implementation of the ContentHandler interface (it also implements EntityResolver, DTDHandler, and ErrorHandler). Its callback methods receive the notifications of parsing events.
- The SAXParser class is used to parse the XML document.
- The static method newInstance() of the SAXParserFactory class creates the factory from which a SAXParser is obtained.
Parsing Steps
1. Use the newInstance() method of SAXParserFactory to create the factory object.
2. Create a SAXParser object with the newSAXParser() factory method.
3. Extend DefaultHandler, the default implementation class for the ContentHandler interface, and override the callbacks you need.
4. Parse the XML document with one of the overloaded parse() methods, passing the handler as an argument.
Listing 4: SAXParsing.java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParsing extends DefaultHandler {
    private List<Catalog> articles = new ArrayList<>();
    // Flags that record which element the parser has just entered
    private boolean title = false;
    private boolean publish = false;
    private boolean author = false;
    private boolean link = false;
    private Catalog c;

    public List<Catalog> getListOfArticles() {
        return articles;
    }

    public void parseDocument() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(new File("src/article.xml"), this);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Note: a parser may deliver the text of one element in several calls;
    // this simple handler keeps only the first chunk.
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (title) {
            c.setTitle(new String(ch, start, length));
            title = false;
        }
        if (publish) {
            c.setPublishDate(new String(ch, start, length));
            publish = false;
        }
        if (author) {
            c.setAuthor(new String(ch, start, length));
            author = false;
        }
        if (link) {
            c.setLink(new String(ch, start, length));
            link = false;
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        if (qName.equalsIgnoreCase("article")) {
            // A new <article> begins: create the object that will hold its data
            c = new Catalog();
            articles.add(c);
        }
        if (qName.equalsIgnoreCase("title")) title = true;
        if (qName.equalsIgnoreCase("publish")) publish = true;
        if (qName.equalsIgnoreCase("author")) author = true;
        if (qName.equalsIgnoreCase("link")) link = true;
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
    }
}
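One detail of the callback model is worth a small illustration: SAX allows a parser to deliver the text of a single element through more than one characters() call, for example around an entity reference such as &amp;. The following is a minimal sketch of one common way to handle that, by collecting text in a StringBuilder and reading it in endElement(). The class name SaxTextDemo, the parseTitles() helper, and the inline sample XML are assumptions made for illustration; they are not part of the listings above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: collects the <title> text of every <article>.
class SaxTextDemo extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inArticle = false;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        if (qName.equals("article")) {
            inArticle = true;
        }
        text.setLength(0); // start collecting text for the element just opened
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append instead of assign
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inArticle && qName.equals("title")) {
            titles.add(text.toString().trim());
        }
        if (qName.equals("article")) {
            inArticle = false;
        }
    }

    // Parses XML held in a string and returns the article titles in document order
    static List<String> parseTitles(String xml) throws Exception {
        SaxTextDemo handler = new SaxTextDemo();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog>"
                + "<article><title>Tom &amp; Jerry</title></article>"
                + "<article><title>Second</title></article>"
                + "</catalog>";
        System.out.println(parseTitles(xml));
    }
}
```

The same idea could be applied to the handler in Listing 4 by replacing the boolean flags with a single buffer that is read when each element ends.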
JAXP with DOM
DOM represents an XML document as a tree of nodes. DOM is memory intensive and comparatively slower than SAX, especially when parsing large XML documents. The essential parts and key points for using the DOM Level 3 API are as follows:
- The most important package for using the DOM API is org.w3c.dom.
- The NodeList interface represents an ordered list of nodes. This interface is used to traverse the DOM tree structure of an XML document.
Parsing Steps
1. Create the DOM parser factory with the newInstance() method of the DocumentBuilderFactory class.
2. Obtain the DOM parser, a DocumentBuilder, from the factory's newDocumentBuilder() method.
3. Parse the XML document to create the DOM tree.
4. Manipulate the XML content by accessing the DOM tree.
Listing 5: DOMParsing.java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DOMParsing {
    private List<Catalog> articles = new ArrayList<>();

    public List<Catalog> getListOfArticles() {
        return articles;
    }

    public void parseDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Build the whole document tree in memory
            Document doc = builder.parse(new File("src/article.xml"));
            doc.getDocumentElement().normalize();

            // Visit every <article> element and copy its children into a Catalog
            NodeList nodeList = doc.getElementsByTagName("article");
            for (int i = 0; i < nodeList.getLength(); i++) {
                Node node = nodeList.item(i);
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    Element element = (Element) node;
                    Catalog c = new Catalog();
                    c.setTitle(element.getElementsByTagName("title").item(0)
                            .getTextContent());
                    c.setAuthor(element.getElementsByTagName("author").item(0)
                            .getTextContent());
                    c.setPublishDate(element.getElementsByTagName("publish").item(0)
                            .getTextContent());
                    c.setLink(element.getElementsByTagName("link").item(0)
                            .getTextContent());
                    articles.add(c);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
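Listing 5 only reads from the tree. Since the last parsing step mentions manipulating the content, here is a minimal sketch of what a modification might look like once the tree is in memory. The class name DomEditDemo, its parse() and addArticle() helpers, and the inline sample XML are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: parses a small catalog and adds one more article to it.
class DomEditDemo {
    // Builds a DOM tree from XML held in a string
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Appends <article><title>...</title></article> to the root element
    static void addArticle(Document doc, String title) {
        Element article = doc.createElement("article");
        Element titleElement = doc.createElement("title");
        titleElement.setTextContent(title);
        article.appendChild(titleElement);
        doc.getDocumentElement().appendChild(article);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<catalog><article><title>First</title></article></catalog>");
        addArticle(doc, "Second");
        // Attributes on the root element can be set in place as well
        doc.getDocumentElement().setAttribute("publisher", "www.developer.com");
        System.out.println("articles: "
                + doc.getElementsByTagName("article").getLength());
    }
}
```

Because the whole tree stays in memory, edits like these are straightforward with DOM, which is the trade-off for its higher memory use compared with SAX.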
JAXB: Java Architecture for XML Binding
JAXB is the architecture for binding the schema of an XML document to plain old Java objects. It provides a higher level of abstraction than the SAX or DOM models and, thanks to its annotations, is very convenient to use. Any class can be made XML bound with the annotations and APIs provided by JAXB. Observe that in Listing 6 the Catalog class is turned into an XML-bound class with a single annotation, @XmlRootElement. JAXB can now bind the class back and forth between XML and Java. With the marshalling mechanism, JAXB can create the XML representation of a Catalog instance, in the style of the document in Listing 1.
Listing 6: Catalog class with JAXB
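import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Catalog {
    private String title;
    private String publishDate;
    private String author;
    private String link;

    //...constructor, getters, setters, toString
    // (JAXB needs a no-argument constructor to create instances)
}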
Marshalling is the transformation of an object into an XML document; inversely, unmarshalling takes an XML document as input and delivers a Catalog object. The primary advantage of this two-way mapping is that a developer can treat XML documents as if they were Java objects, without writing explicit parsing code as with the SAX and DOM parsers. JAXB annotations are very similar to those of JPA, in the sense that JPA maps objects to a relational database whereas JAXB maps objects to an XML document. A variation of Listing 6 is as follows:
Listing 7: Catalog class with JAXB
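import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class Catalog {
    // Written as an attribute of the root element instead of a child element
    @XmlAttribute(required = true)
    private String title;

    // Written as a <publish-date> element
    @XmlElement(name = "publish-date", defaultValue = "01/2013")
    private String publishDate;

    private String author;
    private String link;

    //...constructor, getters, setters, toString
}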
Conclusion
In this article I have tried to give an overview of how to parse XML documents and the different ways to do so. JAXB is the most convenient option among them, but SAX and DOM have their own uses, especially when manipulating XML documents at a very low level.
JAXP also includes the XSLT API, which is used to transform XML documents. An XML document containing XSLT transformation rules is commonly referred to as a stylesheet. An XSLT stylesheet merely describes a set of transformations; an XSLT processor is needed to apply them to an XML document.
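As a rough illustration of the XSLT API mentioned in this section, the sketch below applies a tiny stylesheet to a catalog document using the javax.xml.transform package that ships with JAXP. The stylesheet, the class name XsltDemo, and the transform() helper are assumptions made for this example; the stylesheet simply writes each article title followed by a semicolon as plain text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: runs an XSLT stylesheet over a catalog document.
class XsltDemo {
    // Illustrative stylesheet: outputs every article title followed by ';'
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/catalog\">"
            + "<xsl:for-each select=\"article\">"
            + "<xsl:value-of select=\"title\"/><xsl:text>;</xsl:text>"
            + "</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Applies STYLESHEET to the given XML and returns the result as a string
    static String transform(String xml) throws Exception {
        // The Transformer is the XSLT processor configured with the stylesheet
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog>"
                + "<article><title>First</title></article>"
                + "<article><title>Second</title></article>"
                + "</catalog>";
        System.out.println(transform(xml));
    }
}
```

In a real project the stylesheet would normally live in its own .xsl file and be loaded with a StreamSource built from that file.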