Getting Started with XMLBeans
Intro
XMLBeans provides intuitive ways to handle XML that make it easier for you to access and manipulate XML data and documents in Java.
Characteristics of XMLBeans approach to XML:
- It provides a familiar Java object-based view of XML data without losing access to the original, native XML structure.
- The XML's integrity as a document is not lost with XMLBeans. XML-oriented APIs commonly take the XML apart in order to bind to its parts. With XMLBeans, the entire XML instance document is handled as a whole. The XML data is stored in memory as XML. This means that the document order is preserved as well as the original element content with whitespace.
- With types generated from schema, access to XML instances is through JavaBean-like accessors, with get and set methods.
- It is designed with XML schema in mind from the beginning — XMLBeans supports all XML schema definitions.
- Access to XML is fast.
The starting point for XMLBeans is XML schema.
A schema (contained in an XSD file) is an XML document that defines a set of rules to which other XML documents must conform. The XML Schema specification provides a rich data model that allows you to express sophisticated structure and constraints on your data. For example, an XML schema can enforce control over how data is ordered in a document, or constraints on particular values (for example, a birth date that must be later than 1900). Unfortunately, the ability to enforce rules like this is typically not available in Java without writing custom code. XMLBeans honors schema constraints.
Note:Where an XML schema defines rules for an XML document, an XML instance is an XML document that conforms to the schema.
You compile a schema (XSD) file to generate a set of Java interfaces that mirror the schema. With these types, you process XML instance documents that conform to the schema. You bind an XML instance document to these types; changes made through the Java interface change the underlying XML representation.
Previous options for handling XML include using XML programming interfaces (such as DOM or SAX) or an XML marshalling/binding tool (such as JAXB). Because it lacks strong schema-oriented typing, navigation in a DOM-oriented model is more tedious and requires an understanding of the complete object model. JAXB provides support for the XML schema specification, but handles only a subset of it; XMLBeans supports all of it. Also, by storing the data in memory as XML, XMLBeans is able to reduce the overhead of marshalling and demarshalling.
Accessing XML Using Its Schema
To get a glimpse of the kinds of things you can do with XMLBeans, take a look at an example using XML for a purchase order. The purchase order XML contains data exchanged by two parties, such as two companies. Both parties need to be able to rely on a consistent message shape, and a schema specifies the common ground.
Here's what a purchase order XML instance might look like.
This XML includes a root element, purchase-order, that has three kinds of child elements: customer, date, line-item, and shipper. An intuitive, object-based view of this XML would provide an object representing the purchase-order element, and it would have methods for getting the date and for getting subordinate objects for customer, line-item, and shipper elements. Each of the last three would have its own methods for getting the data inside them as well.
Looking at the Schema
The following XML is the the schema for the preceding purchase order XML. It defines the XML's "shape" — what its elements are, what order they appear in, which are children of which, and so on.
This schema describes the purchase order XML instance by defining the following:
- Definitions for three complex types — customer, line-item, and shipper. These are the types used for the children of the purchase-order element. In schema, a complex type is one that defines an element that may have child elements and attributes. The sequence element nested in the complex type lists its child elements.
- These are also global types. They are global because they are at the top level of the schema (in other words, just beneath the schema root element). This means that they may be referenced from anywhere else in the schema.
- Use of simple types within the complex types. The name, address, and description elements (among others) are typed as simple types. As it happens, these are also built-in types. A built-in type (here, one with the "xs" prefix) is part of the schema specification. (The specification defines 46 built-in types.)
- A global element called purchase-order. This element definition includes a nested complex type definition that specifies the child elements for a purchase-order element. Notice that the complex type includes references to the other complex types defined in this schema.
In other words, the schema defines types for the child elements and describes their position as subordinate to the root element, purchase-order.
When you use the XMLBean compiler with an XSD file such as this one, you generate a JAR file containing the interfaces generated from the schema.
Writing Java Code That Uses the Interfaces
With the XMLBeans interfaces in your application, you can write code that uses the new types to handle XML based on the schema. Here's an example that extracts information about each of the ordered items in the purchase order XML, counts the items, and calculates a total of their prices. In particular, look at the use of types generated from the schema and imported as part of the org.openuri.easypo package.
The printItems method receives a File object containing the purchase order XML file.
Notice that types generated from the schema reflect what's in the XML:
- A PurchaseOrderDocument represents the global root element.
- A getPurchaseOrder method returns a PurchaseOrderDocument.PurchaseOrder type that contains child elements, including line-item. A getLineItemArray method returns a LineItem array containing the line-item elements.
- Other methods, such as getQuantity, getPrice, and so on, follow naturally from what the schema describes, returning corresponding children of the line-item element.
- The name of the package containing these types is derived from the schema's target namespace.
Capitalization and punctuation for generated type names follow Java convention. Also, while this example parses the XML from a file, other parse methods support a Java InputStream object, a Reader object, and so on.
The preceding Java code prints the following to the console:
Creating New XML Instances from Schema
As you've seen XMLBeans provides a "factory" class you can use to create new instances. The following example creates a new purchase-order element and adds a customer child element. It then inserts name and address child elements, creating the elements and setting their values with a single call to their set methods.
The following is the XML that results. Note that XMLBeans assigns the correct namespace based on the schema, using an "ns1" (or, "namespace 1") prefix. For practical purposes, the prefix itself doesn't really matter — it's the namespace URI (http://openuri.org/easypo) that defines the namespace. The prefix is merely a marker that represents it.
Note that all types (including those generated from schema) inherit from XmlObject, and so provide a Factory class. For an overview of the type system in which XmlObject fits, see XMLBeans Support for Built-In Schema Types. For reference information, see XmlObject Interface.
XMLBeans Hierarchy
The generated types you saw used in the preceding example are actually part of a hierarchy of XMLBeans types. This hierarchy is one of the ways in which XMLBeans presents an intuitive view of schema. At the top of the hierarchy is XmlObject, the base interface for XMLBeans types. Beneath this level, there are two main type categories: generated types that represent user-derived schema types, and included types that represent built-in schema types.
This topic has already introduced generated types. For more information, see Java Types Generated from User-Derived Schema Types.
Built-In Type Support
In addition to types generated from a given schema, XMLBeans provides 46 Java types that mirror the 46 built-in types defined by the XML schema specification. Where schema defines xs:string, xs:decimal, and xs:int, for example, XMLBeans provides XmlString , XmlDecimal , and XmlInt . Each of these also inherits from XmlObject, which corresponds to the built-in schema type xs:anyType.
XMLBeans provides a way for you to handle XML data as these built-in types. Where your schema includes an element whose type is, for example, xs:int, XMLBeans will provide a generated method designed to return an XmlInt. In addition, as you saw in the preceding example, for most types there will also be a method that returns a natural Java type such as int. The following two lines of code return the quantity element's value, but return it as different types.
In a sense both get methods navigate to the quantity element; the getQuantity method goes a step further and converts the elements value to the most appropriate natural Java type before returning it. (XMLBeans also provides a means for validating the XML as you work with it.)
If you know a bit about XML schema, XMLBeans types should seem fairly intuitive. If you don't, you'll learn a lot by experimenting with XMLBeans using your own schemas and XML instances based on them.
For more information on the methods of types generated from schema, see Methods for Types Generated From Schema. For more about the how XMLBeans represents built-in schema types, see XMLBeans Support for Built-In Schema Types.
Using XQuery Expressions
With XMLBeans you can use XQuery to query XML for specific pieces of data. XQuery is sometimes referred to as "SQL for XML" because it provides a mechanism to access data directly from XML documents, much as SQL provides a mechanism for accessing data in traditional databases.
XQuery borrows some of its syntax from XPath, a syntax for specifying nested data in XML. The following example returns all of the line-item elements whose price child elements have values less than or equal to 20.00:
This code creates a new cursor at the start of the document. From there, it uses the XmlCursor interface's execQuery method to execute the query expression. In this example, the method's parameter is an XQuery expression that simply says, "From my current location, navigate through the purchase-order element and retrieve those line-item elements whose value is less than or equal to 20.00." The $this variable means "the current position."
For more information about XQuery, see XQuery 1.0: An XML Query Language at the W3C web site.
Using XML Cursors
In the preceding example you may have noticed the XmlCursor interface. In addition to providing a way to execute XQuery expression, an XML cursors offers a fine-grained model for manipulating data. The XML cursor API, analogous to the DOM's object API, is simply a way to point at a particular piece of data. So, just like a cursor helps navigate through a word processing document, the XML cursor defines a location in XML where you can perform actions on the selected XML.
Cursors are ideal for moving through an XML document when there's no schema available. Once you've got the cursor at the location you're interested in, you can perform a variety of operations with it. For example, you can set and get values, insert and remove fragments of XML, copy fragments of XML to other parts of the document, and make other fine-grained changes to the XML document.
The following example uses an XML cursor to navigate to the customer element's name child element.
What's happening here? As with the earlier example, the code loads the XML from a File object. After loading the document, the code creates a cursor at its beginning. Moving the cursor a few times takes it to the nested name element. Once there, the getText method retrieves the element's value.
This is just an introduction to XML cursors. For more information about using cursors, see Navigating XML with Cursors.
Where to Go Next
- XMLBeans provides intuitive ways to handle XML, particularly if you're starting with schema. If you're accessing XML that's based on a schema, you'll probably find it most efficient to access the XML through generated types specific to the schema. To do this, you begin by compiling the schema to generate interfaces. For more information on using XMLBeans types generated by compiling schema, see Java Types Generated From User-Derived Schema Types and Methods for Types Generated From Schema.
- You might be interested in reading more about the type system on which XMLBeans is based, particularly if you're using types generated from schema. XMLBeans provides a hierarchical system of types that mirror what you find in the XML schema specification itself. If you're working with schema, you might find it helps to understand how these types work. For more information, see XMLBeans Support for Built-In Schema Types and Introduction to Schema Type Signatures.
- XMLBeans provides access to XML through XQuery, which borrows path syntax from XPath. With XQuery, you can specify specific fragments of XML data with or without schema. To learn more about using XQuery and XPath in XMLBeans, see Selecting XML with XQuery and XPath.
- You can use the XmlCursor interface for fine-grained navigation and manipulation of XML. For more information, see Navigating XML with Cursors.



