CS639 Class 15

Back to XPath

XPath is used in many XML-handling tools. XSLT uses it to find nodes to work on—see pg. 808 for examples. We aren’t covering XSLT, however.

Another example of XPath use: in databases

Databases can now handle XML along with their ordinary relational data. One very useful approach is an XML datatype, so that a single column value can hold a whole XML document. A given order, for example, can then have a row in the orders table with relational columns order_id, qty, price, etc., and also (say in column “req”) the XML document that came in from the web to make the order.

Oracle has good XML support in this way (since v 9.2 I think). Once an XML document is held in a column value, it can be queried by the Oracle SQL function

extractValue(tab.col, 'xpathexpr')

where the xpathexpr selects a certain element or attribute node. There are other related functions as well.

For example the following could get the qty from the XML in the req column of orders:

select extractValue(o.req, '/order/qty') from orders o
where order_id = 100;

This kind of extraction of information from the XML can be done to populate relational columns in the database from the incoming XML, while also preserving that original XML.

OK, end of sales pitch for XPath…

Look at weather.xml in Chap. 16, available in $cs639/xpath/weather.xml in the original Latin-1 encoding, and also weather2.xml in UTF-8 encoding.

Recall from class 5:

/weather/report/temperature -> 2 temperature nodes

/weather/report[locality = "Santa Monica"] -> 1 report

Localities where wind is NE:

/weather/report[wind/direction = "NE"]/locality

//report[locality = "Block Island"]/@longitude

Localities having the same temperature as Santa Monica:

/weather/report[temperature = /weather/report[locality = "Santa Monica"]/temperature]/locality

All temperature nodes under report at any level:

/weather/report//temperature

All elements: //*

All attributes //@*

All attributes named units: //@units

All textnodes: //text()

All textnodes of temperature elements: //temperature/text()

We can do all these queries with TestXPath, also in $cs639/xpath.
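The queries above can also be run directly from the JDK's javax.xml.xpath API. The following is only a minimal sketch, not TestXPath itself: the class name WeatherXPathSketch and the inline document are assumptions made for illustration. The element and attribute names come from the queries in these notes, but the values (temperatures, longitudes, wind directions) are made up and do not reproduce the real weather.xml.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WeatherXPathSketch {
    // Hypothetical stand-in for weather.xml: names follow the queries in the
    // notes, all values are invented for this example.
    static final String XML =
          "<weather>"
        + "<report longitude='-118.5'>"
        + "<locality>Santa Monica</locality>"
        + "<temperature units='F'>68</temperature>"
        + "<wind><direction>SW</direction></wind>"
        + "</report>"
        + "<report longitude='-71.6'>"
        + "<locality>Block Island</locality>"
        + "<temperature units='F'>55</temperature>"
        + "<wind><direction>NE</direction></wind>"
        + "</report>"
        + "</weather>";

    // Number of nodes selected by the expression.
    static int count(String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(
            expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        return hits.getLength();
    }

    // String value of the first node selected by the expression.
    static String first(String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("/weather/report/temperature"));
        System.out.println(first("/weather/report[wind/direction = 'NE']/locality"));
        System.out.println(first("//report[locality = 'Block Island']/@longitude"));
        System.out.println(count("//@units"));
        System.out.println(count("//temperature/text()"));
    }
}
```

Inside a Java string literal it is convenient to use single quotes for XPath string literals, as done here.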

What about XML with a namespace?

Note that weather.xml has no namespace. If we add a namespace to it, even a default namespace, almost none of these XPath queries work. If a query has an element name in it, as most do, each element name must be prefixed. For example, if w: is set up as a prefix for weather.xml-with-NS, we can put

/w:weather/w:report/w:temperature to find the 2 temperature nodes

//w:report[w:locality = "Block Island"]/@longitude (attribute longitude needs no prefix)

And similarly for the other queries.

Since //* has no element names, it will work OK without setting up a prefix.

Example in Harold, pg. 771: //stk:Price/@currency, where stk: was set up as a prefix for the queries.

Note that XPath syntax itself has no way to specify a prefix->URI binding, so there is no place to put one in the XPath expression. That is left up to the software environment. This problem is alluded to on pg. 768 of Harold.

Oracle case: just use an optional third argument for extractValue:

select extractValue(o.req, '/ord:order/ord:qty', 'xmlns:ord="http:/…"') from orders o
where order_id = 100;

The third argument sets up ord: as the prefix for the xpath expression. Multiple prefixes can be set up by separating them by spaces in the string.

XSLT case: example from edankert.com:

<xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="//edx:cd" xmlns:edx="http://www.edankert.com/examples/">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>

XPath Support in DOM

So far we’ve only covered XPath on no-namespace XML. In JDK DOM, we need to provide a NamespaceContext object to provide the prefix mapping. See http://www.edankert.com/defaultnamespaces.html for the handout example and explanation.

Recall that DOM parsing has namespace-aware false by default.

Correction to last class:

You can call getLocalName(), getPrefix(), and getNamespaceURI() to get the individual parts of the name, if the parsing was done with namespace-aware enabled. Surprisingly, getLocalName() returns null if namespace-aware was not enabled for DOM parsing.

As Harold says, we really should run the DOM parser with namespace-aware enabled.
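The effect of the namespace-aware setting on these three methods can be seen by parsing the same document both ways. This is a small sketch under stated assumptions: the class name LocalNameSketch and the one-line catalog document are made up for illustration (only the namespace URI is the one used in the edankert.com examples).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class LocalNameSketch {
    // Hypothetical miniature document using the e: prefix.
    static final String XML =
        "<e:catalog xmlns:e='http://www.edankert.com/examples/'><e:cd/></e:catalog>";

    // Parse XML with the given namespace-aware setting and return the root element.
    static Element parseRoot(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // the factory default is false
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(XML)))
                      .getDocumentElement();
    }

    static void report(String label, Element root) {
        System.out.println(label
            + ": nodeName=" + root.getNodeName()
            + " localName=" + root.getLocalName()
            + " prefix=" + root.getPrefix()
            + " namespaceURI=" + root.getNamespaceURI());
    }

    public static void main(String[] args) throws Exception {
        report("namespace-aware", parseRoot(true));
        report("not namespace-aware", parseRoot(false));
    }
}
```

In the non-aware case only getNodeName() is useful; it still returns the full qualified name e:catalog.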

TestXPath.java “secretly” uses DOM, in the sense that we never see the parser configuration, so we can’t control its namespace-awareness. Apparently, because the DOM parse happens inside the XPath support, namespace-aware is enabled, in spite of DOM parsing’s namespace-unaware default.

We can tell that TestXPath.java did have namespace-aware parsing, because we can get local names and URIs from the Nodes returned from, say, //* on a namespace-using doc (we have to edit the program slightly to see this).

But we can’t use TestXPath to search for certain elements by name in any document with a namespace, which is a huge handicap. We can try:

Try //section for book3.xml (with default NS) -> no results

Try //* for book3.xml -> see all the nodes

We can control namespace-awareness by doing the parsing to a Document in our own program and then using the XPath support on that Document. See TestXPathIgnoreNS.java. But turning off namespace-aware only helps in the default-NS case.
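The idea of parsing to a Document ourselves can be sketched as follows. This is not the course's TestXPathIgnoreNS.java; the class name, the tiny stand-in for book3.xml, and its namespace URI http://example.com/book are all assumptions for illustration. The behavior in the non-aware case is what these notes report for the JDK's built-in XPath implementation, so treat it as implementation behavior rather than something the XPath specification promises.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DefaultNsXPathSketch {
    // Hypothetical stand-in for book3.xml: a document with a default namespace.
    static final String XML =
        "<book xmlns='http://example.com/book'><section>Intro</section></book>";

    // Parse with the chosen namespace-aware setting, then count the XPath hits.
    static int count(String expr, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        Document doc = factory.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Namespace-aware: section is in a namespace, so the bare name finds nothing.
        System.out.println("aware   //section: " + count("//section", true));
        // The wildcard has no element name in it, so it still finds every element.
        System.out.println("aware   //*:       " + count("//*", true));
        // Not namespace-aware: the bare name is compared against the node name.
        System.out.println("unaware //section: " + count("//section", false));
    }
}
```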

The real solution is supporting a prefix for the XPath query itself. See XPathExample (handout).

It sets up a NamespaceContext object for the “edx” -> URI mapping, puts it in the XPath object, then does .evaluate with the “//edx:cd” XPath query.

This allows cd matches against catalog1.xml, which itself is using e: prefixes.

It is clear how to generalize this to multiple known NS’s for a given XML doc.
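For readers without the handout, the approach can be sketched roughly as below. This is not the handout's XPathExample: the class names and the inline stand-in for catalog1.xml are assumptions, and only the edx prefix, the e: prefix, and the namespace URI come from these notes.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PrefixedXPathSketch {
    static final String EDX_URI = "http://www.edankert.com/examples/";

    // Hypothetical stand-in for catalog1.xml; note that it uses the e: prefix.
    static final String XML =
          "<e:catalog xmlns:e='" + EDX_URI + "'>"
        + "<e:cd><e:title>First</e:title></e:cd>"
        + "<e:cd><e:title>Second</e:title></e:cd>"
        + "</e:catalog>";

    // Maps the query-side prefix edx to the namespace URI.
    static class EdxContext implements NamespaceContext {
        @Override public String getNamespaceURI(String prefix) {
            return "edx".equals(prefix) ? EDX_URI : XMLConstants.NULL_NS_URI;
        }
        // The reverse lookups are not needed for this sketch.
        @Override public String getPrefix(String uri) { return null; }
        @Override public Iterator<String> getPrefixes(String uri) {
            return Collections.emptyIterator();
        }
    }

    static int count(String expr) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for namespace matching
        Document doc = factory.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new EdxContext());
        NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("//edx:cd"));       // prefix differs from the document's e:
        System.out.println(count("//edx:cd/edx:title"));
    }
}
```

The prefix in the query (edx) and the prefix in the document (e) need not agree; only the URIs they map to are compared. Supporting several namespaces amounts to returning a different URI for each known prefix in getNamespaceURI.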

Super TestXPath for namespace-XML

How about a tool like TestXPath that lets the user query docs with arbitrary namespaces?

How can we build a tool that finds all the NS’s and gives them prefixes for the query to be input?

We can use SAX or DOM to find the NS’s:

SAX has a callback for each NS prefix mapping. See Harold’s subsection on Receiving Namespace Mappings, pg. 291-294.

DOM: Although XPath won’t work for matching element names until we have the NS’s under control, it can be used to find the NS-defining attributes using the wildcard-NS query:

//namespace::*

Here is the old TestXPath being used to find all the namespaces:

java TestXPath catalog1.xml

Enter XPath expression:

//namespace::*

Using file catalog1.xml, xpath = //namespace::*

Report of node.toString's and paths of all hits:

xmlns:e="http://www.edankert.com/examples/" /e:catalog/@xmlns:e

com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTMdefaultNamespaceDeclarationNode@10bc49d/e:catalog/@xmlns:xml

String value of first hit:

http://www.edankert.com/examples/

Once we have the namespace URIs, we can set up a NamespaceContext object, list the prefixes so the user can use them in the XPath expression, and then run the user’s XPath queries, even ones involving multiple NSs.
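One possible shape for such a tool is sketched below, using the SAX route (the startPrefixMapping callback) to collect the namespace URIs rather than the //namespace::* query. Everything named here is an assumption for illustration: the class and method names, the generated prefixes ns1, ns2, ..., and the inline document with its second namespace http://example.com/notes. A real tool would read the file and the query from the user.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceDiscoverySketch {
    // Hypothetical document with two namespaces, declared on different elements.
    static final String XML =
          "<e:catalog xmlns:e='http://www.edankert.com/examples/'>"
        + "<e:cd><x:note xmlns:x='http://example.com/notes'>live</x:note></e:cd>"
        + "<e:cd/>"
        + "</e:catalog>";

    // Pass 1: SAX reports every prefix mapping; keep the distinct URIs in order.
    static Set<String> findNamespaceUris(String xml) throws Exception {
        final Set<String> uris = new LinkedHashSet<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override public void startPrefixMapping(String prefix, String uri) {
                    uris.add(uri);
                }
            });
        return uris;
    }

    // Give each URI a generated prefix (ns1, ns2, ...) to show to the user.
    static Map<String, String> assignPrefixes(Set<String> uris) {
        Map<String, String> prefixToUri = new LinkedHashMap<>();
        int n = 1;
        for (String uri : uris) {
            prefixToUri.put("ns" + n++, uri);
        }
        return prefixToUri;
    }

    static NamespaceContext contextFor(final Map<String, String> prefixToUri) {
        return new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                String uri = prefixToUri.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) { return null; }
            @Override public Iterator<String> getPrefixes(String uri) {
                return Collections.emptyIterator();
            }
        };
    }

    // Pass 2: namespace-aware DOM parse, then the user's query with our prefixes.
    static int count(String xml, String expr, Map<String, String> prefixToUri)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(contextFor(prefixToUri));
        return ((NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET)).getLength();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> prefixes = assignPrefixes(findNamespaceUris(XML));
        System.out.println("Prefixes available for queries: " + prefixes);
        System.out.println(count(XML, "//ns1:cd/ns2:note", prefixes));
    }
}
```

The sketch parses the document twice, once with SAX and once with DOM, which keeps the two steps separate at the cost of some efficiency.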

Good to know that we can do XPath regardless of namespace use, since it is a powerful tool that can be a shortcut for many real projects.

Another use for Prefix-to-URI mappings: understanding attribute values with prefixes in them

The SAX coverage of NS prefix mappings covers another advanced topic involving NSs. Some uses of XML have NS prefixes showing up in attribute values or (unusually) in element content. In fact, we’re familiar with one such use, schemas. We often see, for example:

<xsd:element name="foo" type="xsd:decimal"/>

Here “xsd:decimal” is an attribute value using a NS prefix.

How can the parsing application find out what NS is being used here, i.e., its real name, the URI?

For this, we need the prefix to URI mapping very much like what the NamespaceContext provides.

SAX has a helper class that is more careful about making this mapping tree-position-dependent. If a NS is only valid in a certain subtree of the XML, it will only show up in the mapping when the parser is working on that subtree.
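The helper class in question is presumably org.xml.sax.helpers.NamespaceSupport; the sketch below uses it to resolve the prefix in a type attribute value to its URI. The class and method names (PrefixedAttributeSketch, resolveTypes) and the two-element schema fragment are assumptions for illustration. The push/pop bookkeeping follows the usual pattern: a new context is pushed for each element, before its first prefix mapping if it has any, and popped when the element ends.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.NamespaceSupport;

class PrefixedAttributeSketch {
    // Hypothetical schema fragment with a prefixed attribute value.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='foo' type='xsd:decimal'/>"
        + "</xsd:schema>";

    // Returns a map from each prefixed type value to the URI its prefix stands for.
    static Map<String, String> resolveTypes(String xml) throws Exception {
        final Map<String, String> resolved = new LinkedHashMap<>();
        final NamespaceSupport ns = new NamespaceSupport();

        DefaultHandler handler = new DefaultHandler() {
            private boolean needNewContext = true;

            @Override public void startPrefixMapping(String prefix, String uri) {
                // Mappings arrive before startElement, so open the context here.
                if (needNewContext) {
                    ns.pushContext();
                    needNewContext = false;
                }
                ns.declarePrefix(prefix, uri);
            }

            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                if (needNewContext) {
                    ns.pushContext(); // element declared no prefixes of its own
                }
                needNewContext = true;
                String type = atts.getValue("", "type"); // no-namespace attribute
                if (type != null && type.indexOf(':') > 0) {
                    String prefix = type.substring(0, type.indexOf(':'));
                    // Only prefixes in scope at this point in the tree are visible.
                    resolved.put(type, ns.getURI(prefix));
                }
            }

            @Override public void endElement(String uri, String localName, String qName) {
                ns.popContext();
            }
        };

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return resolved;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolveTypes(XSD));
    }
}
```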

Upcoming REST Project

REST in Practice

First 3 chapters: important points

Web tolerance of change, p.4—just live with 404s

Thinking in Resources, pg 4-7, URIs, URLs, etc

Resources and their representations: XML, JSON, text, etc., p. 8-9

REST background, p. 12

Idea of hypermedia, p. 13

Loose coupling, pg. 16

Richardson’s levels: pa2 is level 1, Chap 4 is level 2

(Richardson & Ruby are the authors of the first important book on REST, RESTful Web Services)

Restbucks shop basics, pg. 22-24, 28

Can skip Chap. 3, mostly on how not to do things

Except pg. 38: idea that we use GET only to get info, not change anything in the server, and def. of idempotent, operation that can be repeated without changing the result.

Also worth reading: Chap. 11

Are Web Services evil?

SOAP, WSDL

After midterm: tackle Chap. 4