Selecting XML with XQuery and XPath

You can use XPath and XQuery to retrieve specific pieces of XML as you might retrieve data from a database. XQuery and XPath provide a syntax for specifying which elements and attributes you're interested in. The XMLBeans API provides two methods for executing XQuery and XPath expressions, and two ways to use them. The methods are selectPath for XPath and execQuery for XQuery.

You can call them from an XmlObject instance (or a generated type inheriting from it) or an XmlCursor instance. As noted below, each of the four combinations works slightly differently; be sure to keep these differences in mind when choosing your approach.

Note: Both XQuery and complex XPath expressions require additional classes on the class path, as noted in the sections that follow. Also, be sure to see the XMLBeans installation instructions.

Using XPath with the selectPath Method

You can execute XPath expressions using the selectPath method. When you use XPath with the selectPath method, the value returned is a view of values from the current document, not a copy of those values. In other words, changes your code makes to XML returned by the selectPath method change the XML in the document queried against. In contrast, with XQuery executed using the execQuery method, the value returned is a copy of values in the XML queried against.
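The "view, not a copy" behavior can be checked with a small sketch like the one below. It assumes the XMLBeans JAR is on the class path and uses only the simple path support; the class name SelectPathViewSketch and the helper changeFirstZip are hypothetical names chosen for illustration, not part of the XMLBeans API.

```java
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlException;
import org.apache.xmlbeans.XmlObject;

public class SelectPathViewSketch {
    static final String NS_DECL =
        "declare namespace xq='http://xmlbeans.apache.org/samples/xquery/employees';";

    // Hypothetical helper: edits the first <xq:zip> through a selectPath
    // result, then returns the text of the document that was queried.
    static String changeFirstZip(String xml, String newZip) {
        XmlObject doc;
        try {
            doc = XmlObject.Factory.parse(xml);
        } catch (XmlException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }

        // The array holds views onto nodes of doc, not copies.
        XmlObject[] zips = doc.selectPath(NS_DECL + "$this//xq:zip");
        if (zips.length > 0) {
            XmlCursor cursor = zips[0].newCursor();
            try {
                // Replace the element's text through the selected view.
                cursor.setTextValue(newZip);
            } finally {
                cursor.dispose();
            }
        }
        // Serialize the original document; it should reflect the edit.
        return doc.xmlText();
    }

    public static void main(String[] args) {
        String xml =
            "<xq:address xmlns:xq='http://xmlbeans.apache.org/samples/xquery/employees'>"
            + "<xq:zip>98115</xq:zip></xq:address>";
        System.out.println(changeFirstZip(xml, "98101"));
    }
}
```

If the edit shows up in the serialized document, the selection was a view onto the original XML rather than a copy.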

Note that XPath itself does not provide syntax for declaring prefix-to-URI bindings. For convenience, XMLBeans accepts the XQuery "declare namespace" syntax for this purpose, as the examples in this section show. You can consult the latest XQuery draft for the details of that syntax.

Note: By default, XMLBeans supports only very simple XPath expressions. To execute complex expressions — such as those with predicates, function calls, and the like — you will need xbean_xpath.jar on your class path. This JAR is among those created when you build XMLBeans from source.

Calling XmlObject.selectPath

When called from XmlObject (or a type that inherits from it), the selectPath method returns an array of objects. If the expression is executed against types generated from schema, then the type for the returned array is one of the Java types corresponding to the schema, and you can cast it accordingly.

For example, imagine you have the following XML containing employee information. You've compiled the schema describing this XML and the types generated from schema are available to your code.

<xq:employees xmlns:xq="http://xmlbeans.apache.org/samples/xquery/employees">
    <xq:employee>
        <xq:name>Fred Jones</xq:name>
        <xq:address location="home">
            <xq:street>900 Aurora Ave.</xq:street>
            <xq:city>Seattle</xq:city>
            <xq:state>WA</xq:state>
            <xq:zip>98115</xq:zip>
        </xq:address>
        <xq:address location="work">
            <xq:street>2011 152nd Avenue NE</xq:street>
            <xq:city>Redmond</xq:city>
            <xq:state>WA</xq:state>
            <xq:zip>98052</xq:zip>
        </xq:address>
        <xq:phone location="work">(425)555-5665</xq:phone>
        <xq:phone location="home">(206)555-5555</xq:phone>
        <xq:phone location="mobile">(206)555-4321</xq:phone>
    </xq:employee>
</xq:employees>

If you wanted to find the phone numbers whose area code was 206, you could capture the XPath expression in this way:

String queryExpression =
    "declare namespace xq='http://xmlbeans.apache.org/samples/xquery/employees';" +
    "$this/xq:employees/xq:employee/xq:phone[contains(., '(206)')]";

Notice in the query expression that the variable $this represents the current context node (the XmlObject that you are querying from). In this example you are querying from the document level XmlObject.

You could then print the results with code such as the following:

// Retrieve the matching phone elements and assign the results to the corresponding
// generated type.
PhoneType[] phones = (PhoneType[])empDoc.selectPath(queryExpression);

// Loop through the results, printing the value of the phone element.
for (int i = 0; i < phones.length; i++)
{
    System.out.println(phones[i].stringValue());
}
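
To see what the predicate in this expression selects without needing the generated types or xbean_xpath.jar, you can evaluate the same path with the JDK's built-in javax.xml.xpath API. This is a stand-in for illustration, not the XMLBeans API: the JDK has no $this variable, so the path starts at the document root with "/", and the prefix binding comes from a NamespaceContext instead of a "declare namespace" prolog. The class name PhonePredicateSketch and the method phonesIn206 are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PhonePredicateSketch {
    static final String NS = "http://xmlbeans.apache.org/samples/xquery/employees";

    // Trimmed copy of the employee document above (phone elements only).
    static final String SAMPLE =
        "<xq:employees xmlns:xq='" + NS + "'><xq:employee>"
        + "<xq:phone location='work'>(425)555-5665</xq:phone>"
        + "<xq:phone location='home'>(206)555-5555</xq:phone>"
        + "<xq:phone location='mobile'>(206)555-4321</xq:phone>"
        + "</xq:employee></xq:employees>";

    // Returns the text of every phone element whose content contains "(206)".
    static List<String> phonesIn206(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for prefixed names to match
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the "xq" prefix used in the expression to the namespace URI.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "xq".equals(prefix) ? NS : "";
                }
                @Override public String getPrefix(String uri) {
                    return NS.equals(uri) ? "xq" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                        ? Collections.singletonList("xq").iterator()
                        : Collections.<String>emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                "/xq:employees/xq:employee/xq:phone[contains(., '(206)')]",
                doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints [(206)555-5555, (206)555-4321]
        System.out.println(phonesIn206(SAMPLE));
    }
}
```

The work number is skipped because its text does not contain "(206)"; the home and mobile numbers are returned in document order.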

Calling XmlCursor.selectPath

When called from an XmlCursor instance, the selectPath method retrieves a list of selections, or locations in the XML. The selections are remembered by the cursor instance. You can use methods such as toNextSelection to navigate among them.

The selectPath method takes an XPath expression. If the expression returns any results, each of those results is added as a selection to the cursor's list of selections. You can move through these selections in the way you might use java.util.Iterator methods to move through a collection.

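For example, for a path such as $this/employees/employee, the cursor instance from which you called selectPath would include a selection for each employee element found by the expression. Note that the variable $this is always bound to the current context node, which in this example is the document. After calling the selectPath method, you would use various "selection"-related methods to work with the results. These methods include getSelectionCount, to retrieve the number of selections; toNextSelection, to move the cursor to the next selection in the list; toSelection(int), to move the cursor to the selection at a given index; hasNextSelection, to find out whether more selections follow the cursor's current position; and clearSelections, to empty the cursor's selection list without modifying the document.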

The following example shows how you might use selectPath, in combination with the push and pop methods, to maneuver through XML, retrieving specific values.

public void printZipsAndWorkPhones(XmlObject xml)
{
    // Declare the namespace that will be used.
    String xqNamespace =
        "declare namespace xq='http://xmlbeans.apache.org/samples/xquery/employees';";

    // Insert a cursor and move it to the first element.
    XmlCursor cursor = xml.newCursor();
    cursor.toFirstChild();

    // Save the cursor's current location by pushing it
    // onto a stack of saved locations.
    cursor.push();
    // Query for zip elements.
    cursor.selectPath(xqNamespace + "$this//xq:zip");

    // Loop through the list of selections, getting the value of
    // each element.
    while (cursor.toNextSelection())
    {
        System.out.println(cursor.getTextValue());
    }
    // Pop the saved location off the stack.
    cursor.pop();
    // Query again from the top, this time for work phone numbers.
    cursor.selectPath(xqNamespace + "$this//xq:phone[@location='work']");

    // Move the cursor to the first selection, if there is one, then print
    // that element's value.
    if (cursor.toNextSelection())
    {
        System.out.println(cursor.getTextValue());
    }
    // Dispose of the cursor.
    cursor.dispose();
}

Using selections is somewhat like tracking the locations of multiple cursors with a single cursor. This becomes especially clear when you remove the XML associated with a selection. The selection itself stays where the removed XML was, so it now sits immediately before whatever XML followed the removed content. In other words, removing XML leaves a gap that the following XML shifts up to fill, ending up immediately after the selection's location. This is exactly how a second cursor at that location would behave.

Finally, when using selections keep in mind that the list of selections is in a sense "live": the cursor you're working with keeps track of every selection in the list. For that reason, be sure to call the clearSelections method when you're finished with the selections, just as you should call the XmlCursor.dispose() method when you're finished using the cursor.
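
As a minimal sketch of index-based access and cleanup, the following code walks the selection list with getSelectionCount and toSelection(int), then releases the selections and the cursor in a finally block. It assumes the XMLBeans JAR is on the class path; the class name SelectionListSketch and the method zipValues are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlException;
import org.apache.xmlbeans.XmlObject;

public class SelectionListSketch {
    static final String NS_DECL =
        "declare namespace xq='http://xmlbeans.apache.org/samples/xquery/employees';";

    // Hypothetical helper: collects the text of every <xq:zip> element.
    static List<String> zipValues(String xml) {
        XmlObject doc;
        try {
            doc = XmlObject.Factory.parse(xml);
        } catch (XmlException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }

        List<String> zips = new ArrayList<>();
        XmlCursor cursor = doc.newCursor();
        try {
            cursor.selectPath(NS_DECL + "$this//xq:zip");

            // Visit each selection by index rather than with toNextSelection.
            int count = cursor.getSelectionCount();
            for (int i = 0; i < count; i++) {
                if (cursor.toSelection(i)) {
                    zips.add(cursor.getTextValue());
                }
            }
        } finally {
            // Stop tracking the selections, then release the cursor.
            cursor.clearSelections();
            cursor.dispose();
        }
        return zips;
    }

    public static void main(String[] args) {
        String xml =
            "<xq:employee xmlns:xq='http://xmlbeans.apache.org/samples/xquery/employees'>"
            + "<xq:zip>98115</xq:zip><xq:zip>98052</xq:zip></xq:employee>";
        System.out.println(zipValues(xml));
    }
}
```

Putting the cleanup in a finally block means the selections and the cursor are released even if reading a value throws.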

Using XQuery with the execQuery Method

You use the execQuery method to execute XQuery expressions. With XQuery expressions, XML returned is a copy of XML in the document queried against. In other words, changes your code makes to the values returned by execQuery are not reflected in the document queried against.

Note: To execute XQuery expressions, you must have the Saxon 8.1.1 JAR on your class path. Look for the download at the Saxon web site. This JAR is also included in the lib directory when you build XMLBeans from source.

Calling XmlObject.execQuery

As with selectPath, calling execQuery from an XmlObject instance will return an XmlObject array.

The following example retrieves every <zip> element (home and work) from the incoming XML, adding the elements as children to a new <zip-list> element.

public boolean collectZips(XmlObject empDoc)
{
    String namespaceDeclaration = 
        "declare namespace xq='http://xmlbeans.apache.org/samples/xquery/employees';";
    // The query is designed to return results, so return
    // true if it does.
    boolean hasResults = false;

    // The expression: Get the <zip> elements and return them as children 
    // of a new <zip-list> element.
    String queryExpression =
        "let $e := $this/xq:employees " +
        "return " +
        "<zip-list> " +
            "{for $z in $e/xq:employee/xq:address/xq:zip " +
            "return $z} " +
        "</zip-list>";

    // Execute the query. Results will be copies of the XML queried against,
    // stored as members of an XmlObject array.
    XmlObject[] results = 
        empDoc.execQuery(namespaceDeclaration + queryExpression);

    // Print the results.
    if (results.length > 0)
    {
        hasResults = true;
        System.out.println("The query results: \n");
        System.out.println(results[0].toString() + "\n");
    }
    return hasResults;
}

Calling XmlCursor.execQuery

Unlike the selectPath method called from a cursor, the execQuery method doesn't return void. Instead it returns an XmlCursor instance positioned at the beginning of a new XML document representing the query results. Rather than accessing results as selections, you use the cursor to move through the results in typical cursor fashion (for more information, see Navigating XML with Cursors). The models are very different.

As always, you can cast the results to a type generated from schema if you know that the results conform to that type.

The following example retrieves work <phone> elements from the incoming XML, then changes the number in the results.

public boolean updateWorkPhone(XmlObject empDoc)
{
    boolean hasResults = false;

    // Declare the namespace that will be used.
    String namespaceDeclaration =
        "declare namespace xq='http://xmlbeans.apache.org/samples/xquery/employees';";

    // A cursor instance to query with. It is left at the start of the
    // document so that $this is bound to the document in the query below.
    XmlCursor empCursor = empDoc.newCursor();

    // The expression: Get the <employee> elements with <state> elements whose
    // value is "WA", and return their work <phone> elements.
    String queryExpression =
        "for $e in $this/xq:employees/xq:employee " +
        "let $s := $e/xq:address/xq:state " +
        "where $s = 'WA' " +
        "return $e//xq:phone[@location='work']";

    // Execute the query. Results, if any, will be available at
    // the position of the resultCursor in a new XML document.
    XmlCursor resultCursor =
        empCursor.execQuery(namespaceDeclaration + queryExpression);

    System.out.println("The query results, element copies made " +
        "from the received document: \n");
    System.out.println(resultCursor.getObject().toString() + "\n");

    // If there are results, the results will be children of the fragment root
    // where the new cursor is positioned. This statement tests for children
    // and moves the cursor to the first one if it exists.
    if (resultCursor.toFirstChild())
    {
        hasResults = true;
        // Use the cursor to loop through the results, editing each sibling
        // element returned by the query.
        do
        {
            // Change the phone numbers.
            XmlCursor editCursor = resultCursor.newCursor();
            editCursor.toLastAttribute();
            editCursor.toNextToken();
            editCursor.removeXml();
            editCursor.insertChars("(206)555-1234");
            editCursor.dispose();
        } while (resultCursor.toNextSibling());

        resultCursor.toStartDoc();
        System.out.println("The query results after changes: \n");
        System.out.println(resultCursor.getObject().toString() + "\n");

        System.out.println("The received document -- note that it is unchanged. " +
            "Changes were made to the copy created by the execQuery method. \n");
        System.out.println(empDoc + "\n");
    }
    // Dispose of the cursors.
    resultCursor.dispose();
    empCursor.dispose();
    return hasResults;
}

Related Topics

Getting Started with XMLBeans