Last week we discussed the incorrect usage of Java EE in the EJApp Top 10 countdown. After that very broad topic, it’s time for a very specific one at number 7.
The unnecessary use of XML is a drag on the performance of your Java EE application. XML has a number of interesting properties that cause it to be used for a whole range of functions in the average Enterprise Java application:
- as a configuration file format, e.g. Java EE deployment descriptors and Spring configuration files,
- as a remoting protocol, e.g. SOAP or Burlap, and even
- as a data storage format, e.g. Apache Xindice and eXist.
Someone once said that "XML is the noun and Java is the verb" and that has caught on. Unfortunately, XML processing is heavy on both CPU cycles and memory usage.
This performance hit can be traced to a number of areas:
- Parsing. There are a number of ways to parse an XML document, each with its own trade-off between programming convenience and runtime performance:
  - SAX – its event-based model is efficient but can be hard to program against.
  - DOM – its tree model is easier to use but causes the whole XML document to be loaded into memory in one go. Alternative APIs using a tree model like JDOM, dom4j, and XOM suffer from the same problem.
  - StAX – its pull model is a cross between SAX and DOM. It allows the program to pull parsed fragments of the XML document into memory, thereby avoiding SAX’s complicated event model and DOM’s memory-inefficient tree model. Available as part of Java SE 6.
  When parsing an XML document, choose the parser that suits your needs, set its features (do you need validation? can you use a local DTD?), and tune its performance.
- Transformation. When transforming an XML document into another XML document, XSLT is used most of the time. Unfortunately XSLT processing is an expensive operation, both CPU- and memory-wise. Xalan-J is the most popular Java XSLT transformer and its performance can be improved by using compiled stylesheets and by tuning its usage. Other XSLT transformers may offer better performance.
- Generation. Generating an XML document (also known as XML serialization) is generally a lot faster than parsing XML. No complex logic is needed and few intermediate objects need to be created. For big documents, just make sure you write to a buffered stream rather than building up a String in memory.
- Data binding. Some applications don’t directly process the XML, but use XML data binding to transform the parsed XML document into POJOs (unmarshalling) and back (marshalling). Different data binding frameworks have different performance characteristics.
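To make the parsing and generation points above a bit more concrete, here is a minimal sketch of StAX pull parsing and streaming generation using the standard `javax.xml.stream` API. The class name `StaxSketch`, its two methods, and the `<books>`/`<title>` document format are hypothetical and only serve as illustration. Turning off DTD support is an assumption that fits this made-up format, not a general recommendation.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class; not part of any library.
class StaxSketch {

    // Pull-parses the input and collects the text of every <title> element.
    // Only the current event is held in memory, never the whole tree.
    static List<String> readTitles(Reader in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: the hypothetical format has no DTD, so skip DTD handling.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        List<String> titles = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads the element's text and moves to its end tag.
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    // Streams the titles out through a buffered writer instead of
    // concatenating the document into one big String first.
    static void writeTitles(List<String> titles, Writer out)
            throws XMLStreamException, IOException {
        Writer buffered = new BufferedWriter(out);
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(buffered);
        writer.writeStartDocument();
        writer.writeStartElement("books");
        for (String title : titles) {
            writer.writeStartElement("title");
            writer.writeCharacters(title); // escapes & and < for us
            writer.writeEndElement();
        }
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();     // closes the XML writer, not the underlying Writer
        buffered.flush();   // push any buffered characters to the target
    }

    public static void main(String[] args) throws Exception {
        StringWriter xml = new StringWriter();
        writeTitles(Arrays.asList("Dune", "Tom & Jerry"), xml);
        System.out.println(readTitles(new StringReader(xml.toString())));
    }
}
```

In a real application the `Reader` and `Writer` would typically wrap a file or socket stream rather than an in-memory string. The string is used here only to keep the sketch self-contained.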
Even though it is possible to tune the performance of your XML parser, transformer, and data binding framework, XML processing is never going to beat property files (for configuration files), Java serialization (for remoting), or plain CSV files and SQL databases (for data storage) when it comes down to performance. So don’t choose XML just because it is the hip thing to do!
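For comparison, reading configuration from a property file needs nothing beyond `java.util.Properties`. The following is only a sketch: the class name `PropertiesConfigSketch`, the keys `db.url` and `pool.size`, and the default pool size of 10 are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical example class; not part of any library.
class PropertiesConfigSketch {

    // Loads key=value pairs; no XML parser is involved at all.
    static Properties load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return props;
    }

    // Reads the (made-up) pool.size key, falling back to 10 when it is absent.
    static int poolSize(Properties props) {
        return Integer.parseInt(props.getProperty("pool.size", "10"));
    }

    public static void main(String[] args) throws IOException {
        // In practice this would be a FileReader or a classpath resource.
        String text = "db.url=jdbc:hsqldb:mem:demo\npool.size=25\n";
        Properties props = load(new StringReader(text));
        System.out.println(props.getProperty("db.url"));
        System.out.println(poolSize(props));
    }
}
```

The trade-off is that a property file is flat: if your configuration really is hierarchical, you give up some expressiveness in exchange for the cheaper parsing.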
More from this Series
- EJApp Top 10 BOF Session at JavaPolis 2006
- #1: Incorrect Database Usage
- #2: Unnecessary Remoting
- #3: Incorrectly Implemented Concurrency
- #4: Badly Performing Libraries
- #5: Excessive Memory Usage
- #6: Improper Caching
- #7: Unnecessary Use of XML
- #8: Incorrect Usage of Java EE
- #9: Incorrect Application Server Configuration
- #10: Excessive Logging
- EJApp Top 10 Countdown Wrap-Up