JAXP Parsing


JAXP, the standard Java interface for XML parsing, provides two ways to parse XML: DOM and SAX. A DOM (Document Object Model) parser reads the input into an XML tree exposed through the org.w3c.dom.Node API. A SAX parser builds no result tree; instead, it calls methods on a ContentHandler as it reads the input.

In general, DOM parsing is easier to understand, but SAX parsing can be more efficient because many applications can avoid creating an intermediate XML tree.
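
To make the contrast concrete, here is a minimal SAX sketch that uses only the standard JAXP and SAX classes. The class name SaxElementCounter and its countElements helper are hypothetical names chosen for illustration; they are not part of Resin or JAXP. The handler counts start tags as the parser reports them, so no XML tree is ever built.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxElementCounter {
    // Hypothetical helper: counts element start tags without building a tree.
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();

        final int[] count = {0};
        // DefaultHandler implements ContentHandler with empty callbacks;
        // only startElement is overridden here.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                count[0]++;
            }
        };

        parser.parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // One <a> element plus two <b> elements.
        System.out.println(countElements("<a><b/><b/></a>")); // prints 3
    }
}
```

The same count could be computed from a DOM tree, but the SAX version keeps only a single integer in memory regardless of document size.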

DOM parsing

For strict XML parsing to the DOM, the best technique is to use the standard JAXP API. That way, you can configure your application to use whichever XML parser is most convenient for you.

JAXP parsing uses the following steps:

  1. Create a DocumentBuilderFactory instance.
  2. Set any parser flags or properties.
  3. Create a DocumentBuilder from the factory; this is the parser.
  4. Parse the document.

Reading and Writing using the DOM
import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import com.caucho.xml.*;

...

// Create a new parser using the JAXP API (javax.xml.parsers)
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();

// Parse the file into a DOM Document (org.w3c.dom).
// The String argument to parse() is interpreted as a URI.
Document doc = parser.parse("test.xml");

// Create a new XML printer (com.caucho.xml)
FileOutputStream os = new FileOutputStream("out.xml");
try {
  XmlPrinter printer = new XmlPrinter(os);

  // Print the document
  printer.print(doc);
} finally {
  // Close the stream even if printing fails
  os.close();
}

Printing DOM objects is easily done by using Resin's XmlPrinter API.
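
The example above skips step 2 of the list, setting parser flags. The following sketch fills that gap using only standard JAXP calls. The class name ConfiguredDomParse and the sample document are assumptions made for illustration. It turns on namespace awareness and asks the parser to drop comments before creating the DocumentBuilder.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ConfiguredDomParse {
    // Hypothetical helper: parses a string with two factory flags set.
    static Document parse(String xml) throws Exception {
        // Step 1: create the factory.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Step 2: set flags before creating the parser.
        factory.setNamespaceAware(true);    // report namespace URIs and local names
        factory.setIgnoringComments(true);  // leave comment nodes out of the tree

        // Step 3: create the parser.
        DocumentBuilder parser = factory.newDocumentBuilder();

        // Step 4: parse the document.
        return parser.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<r xmlns='urn:example'><!-- note --><item/></r>");
        Element root = doc.getDocumentElement();
        System.out.println(root.getNamespaceURI());
        System.out.println(root.getChildNodes().getLength());
    }
}
```

Flags must be set on the factory before newDocumentBuilder() is called; a DocumentBuilder that already exists is not affected by later changes to the factory.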

Configuration

JAXP is a standard interface which supports pluggable XML parser implementations. JAXP selects the parser based on system properties. You can set the properties to select a different parser than the default one.

If you use libraries which include JAXP classes, those libraries might default to another, probably slower, XML parser. You may need to configure the system properties to ensure that you'll use Resin's fast parser.

JAXP Properties for Resin
system property                           Resin value
javax.xml.parsers.DocumentBuilderFactory  com.caucho.xml.parsers.XmlDocumentBuilderFactory
javax.xml.parsers.SAXParserFactory        com.caucho.xml.parsers.XmlSAXParserFactory
javax.xml.transform.TransformerFactory    com.caucho.xsl.Xsl

JAXP Properties for Xalan/Xerces
system property                           Xalan/Xerces value
javax.xml.parsers.DocumentBuilderFactory  org.apache.xerces.jaxp.DocumentBuilderFactoryImpl
javax.xml.parsers.SAXParserFactory        org.apache.xerces.jaxp.SAXParserFactoryImpl
javax.xml.transform.TransformerFactory    org.apache.xalan.processor.TransformerFactoryImpl
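
When it is unclear which implementation JAXP has picked, a quick check is to ask each factory for its concrete class. The sketch below uses only standard JAXP calls; the class name JaxpImplementations and its selected helper are hypothetical. The commented-out System.setProperty line shows where a value from the tables above would go, and it only works if that implementation is on the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

class JaxpImplementations {
    // Hypothetical helper: returns the class names of the factories JAXP selected.
    static String[] selected() {
        return new String[] {
            DocumentBuilderFactory.newInstance().getClass().getName(),
            SAXParserFactory.newInstance().getClass().getName(),
            TransformerFactory.newInstance().getClass().getName()
        };
    }

    public static void main(String[] args) {
        // To force a specific parser, set the property before calling newInstance().
        // This line assumes Resin's classes are on the classpath:
        // System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
        //                    "com.caucho.xml.parsers.XmlDocumentBuilderFactory");

        for (String name : selected()) {
            System.out.println(name);
        }
    }
}
```

The printed names depend on the JVM and on the libraries on the classpath, so the output differs between environments.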

Both resin.conf and web.xml let you set system properties on a per-application basis. For example:

<web-app>
  <system-property javax.xml.parsers.DocumentBuilderFactory=
                      "com.caucho.xml.parsers.XmlDocumentBuilderFactory"/>

  ...
</web-app>


Copyright © 1998-2002 Caucho Technology, Inc. All rights reserved.
Resin® is a registered trademark, and HardCore™ and Quercus™ are trademarks of Caucho Technology, Inc.