CS639 Class 15

DOMException (page 486) -> RuntimeException.

This exception arguably should be a checked exception, but it is actually a RuntimeException, so IDEs don't remind us to put a catch clause in. We should add one anyway!
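A minimal sketch of catching DOMException explicitly, using the JDK's built-in DOM implementation. The class and method names (DomExceptionDemo, tryCreateElement) are made up for illustration. The illegal element name triggers the exception here only because the JDK's DOM checks names by default.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

final class DomExceptionDemo {
    /** Tries to create an element; returns the DOMException code, or -1 on success. */
    static short tryCreateElement(String name) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            // Checked exception: the compiler forces us to deal with this one.
            throw new IllegalStateException(e);
        }
        try {
            doc.createElement(name);
            return -1;
        } catch (DOMException e) {
            // Unchecked: nothing forces this catch clause, so it is easy to forget.
            return e.code;
        }
    }

    public static void main(String[] args) {
        System.out.println("order      -> " + tryCreateElement("order"));
        System.out.println("'bad name' -> " + tryCreateElement("bad name")
                + " (INVALID_CHARACTER_ERR = " + DOMException.INVALID_CHARACTER_ERR + ")");
    }
}
```

The compiler insists on handling ParserConfigurationException but says nothing about DOMException, which is exactly the trap described above.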

SAX vs DOM

- They take the same forms of input!

- SAX is lean & mean (all event handlers)

- SAX can handle huge documents (greater than memory size)

- If you only need a few bits of a sizable tree, consider SAX.

- It is easy to find all the nodes of a given tagname with DOM (as well as SAX)

- DOM can edit old XML and compose new XML, but it is good to validate the result before serializing it.

- DOM has XPath support, a huge advantage—more on this today
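To make the "find all nodes of a given tagname" comparison concrete, here is a small sketch that counts elements by name both ways. The XML string is a made-up sample, and the class and method names are hypothetical. Both parsers are left in their default (namespace-unaware) mode.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class TagCountDemo {
    // Made-up sample document, small enough to read at a glance.
    static final String XML =
        "<weather><report><temperature>64</temperature></report>"
      + "<report><temperature>70</temperature></report></weather>";

    /** DOM: build the whole tree, then ask it for the elements. */
    static int countWithDom(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** SAX: no tree at all, just count matching startElement events. */
    static int countWithSax(String xml, String tag) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println("DOM: " + countWithDom(XML, "temperature"));
        System.out.println("SAX: " + countWithSax(XML, "temperature"));
    }
}
```

The SAX version never holds more than one event in memory, which is why it scales to documents larger than memory. The DOM version is shorter but needs the whole tree.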

Back to XPath

XPath is used in many XML-handling tools. XSLT uses it to find nodes to work on—see pg. 808 for examples. We aren’t covering XSLT, however.

Another example of XPath use: in databases

Databases can now handle XML along with their ordinary relational data. A very useful way is by defining an XML datatype, so that a single column value can hold a whole XML document. Then a certain order, for example, can have a row in the orders table with relational columns order_id, qty, price, etc., and also (say in column “req”) the XML document that came in from the web to make the order.

Oracle has good XML support in this way (since v 9.2 I think). Once an XML document is held in a column value, it can be queried by the Oracle SQL function

extractValue(tab.col, 'xpathexpr')

where the xpathexpr selects a certain element or attribute node. There are other related functions as well.

For example the following could get the qty from the XML in the req column of orders:

select extractValue(o.req, '/order/qty') from orders o

where order_id = 100;

This kind of extraction of information from the XML can be done to populate relational columns in the database from the incoming XML, while also preserving that original XML.

OK, end of sales pitch for XPath…

Look at weather.xml in Chap. 16, available in $cs639/xpath/weather.xml in the original Latin-1 encoding, and also weather2.xml in UTF-8 encoding.

Recall from class 5:

/weather/report/temperature -> 2 temperature nodes

/weather/report[locality = "Santa Monica"] -> 1 report.

Localities where wind is NE:

/weather/report[wind/direction = "NE"]/locality

//report[locality = "Block Island"]/@longitude

localities having the same temperature as Santa Monica:

/weather/report[temperature = /weather/report[locality="Santa Monica"]/temperature]/locality

All temperature nodes under report at any level:

/weather/report//temperature

All elements: //*

All attributes //@*

All attributes named units: //@units

All textnodes: //text()

All textnodes of temperature elements: //temperature/text()

We can do all these queries with TestXPath, also in $cs639/xpath.
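For readers without access to TestXPath, the sketch below runs a few of the queries above through the JDK's javax.xml.xpath API. It is not the course's TestXPath. The embedded XML is a made-up stand-in for weather.xml (element names follow the queries, values are invented), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class WeatherXPathDemo {
    // Hypothetical stand-in for weather.xml; the real file's contents may differ.
    static final String XML =
        "<weather>"
      + "<report longitude='-118.5'><locality>Santa Monica</locality>"
      + "<temperature units='F'>64</temperature>"
      + "<wind><direction>W</direction></wind></report>"
      + "<report longitude='-71.6'><locality>Block Island</locality>"
      + "<temperature units='F'>64</temperature>"
      + "<wind><direction>NE</direction></wind></report>"
      + "</weather>";

    /** Number of nodes the expression selects. */
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    /** String value of the first selected node, or "" if there is none. */
    static String first(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        String[] queries = {
            "/weather/report/temperature",
            "/weather/report[wind/direction = 'NE']/locality",
            "//report[locality = 'Block Island']/@longitude",
            "/weather/report[temperature = /weather/report[locality='Santa Monica']/temperature]/locality",
            "//@units",
            "//temperature/text()"
        };
        for (String q : queries) {
            System.out.println(count(q) + " hit(s), first = \"" + first(q) + "\" for " + q);
        }
    }
}
```

Inside a Java string literal it is convenient to use single quotes for XPath string literals, since XPath accepts either kind. Note that the "same temperature as Santa Monica" query also returns Santa Monica itself.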

What about XPath on XML with a namespace?

Note that weather.xml has no namespace. If we add a namespace to it, even a default namespace, almost none of these XPath queries work. Any element name in a query (and most queries have one) must then be prefixed. For example, if w: is set up as a prefix for weather.xml-with-NS, we can put

/w:weather/w:report/w:temperature to find the 2 temperature nodes

//w:report[w:locality = "Block Island"]/@longitude (attribute longitude needs no prefix)

And similarly for the other queries.

Since //* has no element names, it will work OK without setting up a prefix.

Example in Harold, pg. 771: //stk:Price/@currency

Where stk: was set up for the queries.

Note that there’s no place to put the prefix->URI specification in the XPath expression: XPath syntax has no way to specify a prefix->URI binding. That is left up to the software environment. This problem is alluded to on pg. 768 of Harold.

Oracle case: just use an optional third argument for extractValue:

select extractValue(o.req, '/ord:order/ord:qty', 'xmlns:ord="http:/…"') from orders o

where order_id = 100;

The third argument sets up ord: as the prefix for the xpath expression. Multiple prefixes can be set up by separating them by spaces in the string.

XSLT case: example from edankert.com. Since XSLT is XML, the prefixes for XPath are declared the usual way, here xmlns:edx="http://www.edankert.com/examples/":

<xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="//edx:cd" xmlns:edx="http://www.edankert.com/examples/">

<xsl:apply-templates/>

</xsl:template>

</xsl:stylesheet>

XPath Support in DOM

So far we’ve only covered XPath in Java on no-namespace XML. In JDK DOM, we need to provide a NamespaceContext object to provide the prefix mapping. See http://www.edankert.com/defaultnamespaces.html for the handout example and explanation.

Recall that DOM parsing has namespace-aware false by default.

As Harold says, we really should run the DOM parser with namespace-awareness turned on, but we backed off on this for XPath handling to avoid dealing with prefixes in XPath expressions (like //x:method) and how to give them meaning.

TestXPath.java “secretly” uses DOM, in the sense that we never see the parser config, so we can’t control its namespace-awareness. Apparently, because the DOM is built by the XPath support, namespace-awareness is enabled, in spite of DOM’s supposed namespace-unaware default.

But we can’t use TestXPath to search for certain elements by name, a huge handicap, in any document with a namespace. We can try—

Try //section for book3.xml (with default NS) -> no results

Try //* for book3.xml, see all the nodes

We can control namespace-awareness by doing the parsing to a Document in our program and then using XPath support. See TestXPathIgnoreNS.java. But turning off namespace-aware only helps in the default-NS case.
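The effect can be sketched as follows. This is an independent illustration, not TestXPathIgnoreNS.java. The XML is a made-up stand-in for book3.xml, the namespace URI is invented, and the class and method names are hypothetical. The same unprefixed query is run against the same document parsed both ways.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DefaultNsDemo {
    // Made-up document with a default namespace (URI is an assumption).
    static final String XML =
        "<book xmlns='http://example.com/book'><section>Intro</section></book>";

    /** Parses XML with the given namespace-awareness, then counts XPath hits. */
    static int countHits(String expr, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware); // JAXP default is false
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("//section, NS-aware:   " + countHits("//section", true));
        System.out.println("//section, NS-unaware: " + countHits("//section", false));
        System.out.println("//*, NS-aware:         " + countHits("//*", true));
    }
}
```

With the JDK's built-in XPath engine, the namespace-unaware parse should let //section match, because the element then has no namespace as far as the DOM is concerned. If the document used an explicit prefix such as b:section, this trick would not help.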

The real solution is supporting a prefix for the XPath query itself. See XPathExample (handout).

It sets up a NamespaceContext object for “edx” -> URI, puts it in the XPath object, then does .evaluate with the “//edx:cd” XPath query.

This allows cd matches against catalog1.xml, which itself is using e: prefixes.

It is clear how to generalize this to multiple known NS’s for a given XML doc.
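A sketch of that generalization, assuming a simple map from prefix to URI. This is not the handout's XPathExample. The embedded XML is a made-up stand-in for catalog1.xml, and the names PrefixedXPathDemo, contextFor, and count are hypothetical. The document uses the prefix e: while the query uses edx:, and they match because both are bound to the same URI.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PrefixedXPathDemo {
    static final String EDX_URI = "http://www.edankert.com/examples/";

    // Made-up stand-in for catalog1.xml: the document itself uses prefix e:.
    static final String XML =
        "<e:catalog xmlns:e='" + EDX_URI + "'><e:cd>One</e:cd><e:cd>Two</e:cd></e:catalog>";

    /** Builds a NamespaceContext from any number of prefix -> URI bindings. */
    static NamespaceContext contextFor(final Map<String, String> prefixToUri) {
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) {
                    throw new IllegalArgumentException("prefix must not be null");
                }
                String uri = prefixToUri.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                Iterator<String> it = getPrefixes(uri);
                return it.hasNext() ? it.next() : null;
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                List<String> prefixes = new ArrayList<>();
                for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
                    if (e.getValue().equals(uri)) {
                        prefixes.add(e.getKey());
                    }
                }
                return prefixes.iterator();
            }
        };
    }

    /** Counts hits for a prefixed query against the namespace-aware DOM. */
    static int count(String expr, Map<String, String> prefixToUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(contextFor(prefixToUri));
            return ((NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET)).getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//edx:cd", Collections.singletonMap("edx", EDX_URI)));
    }
}
```

Adding more namespaces is just a matter of putting more entries in the map.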

Super TestXPath for namespace-XML

How about a tool like TestXPath that lets the user query docs with arbitrary namespaces?

How can we build a tool that finds all the NS’s and gives them prefixes for the query to be input?

We can use SAX or DOM to find the NS’s:

SAX has a callback for each NS prefix mapping. See Harold’s subsection on Receiving Namespace Mappings, pg. 291-294.

DOM: Although XPath won’t work for matching element names until we have the NS’s under control, it can be used to find the NS-defining attributes using the wildcard-NS query:

//namespace::*

Here is the old TestXPath being used to find all the namespaces:

java TestXPath catalog1.xml

Enter XPath expression:

//namespace::*

Using file catalog1.xml, xpath = //namespace::*

Report of node.toString's and paths of all hits:

xmlns:e="http://www.edankert.com/examples/" /e:catalog/@xmlns:e

com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTMdefaultNamespaceDeclarationNode@10bc49d /e:catalog/@xmlns:xml

String value of first hit:

http://www.edankert.com/examples/

Once we have the namespace URIs, we can set up a NamespaceContext object, list the prefixes for the user to use in the XPath expression, and then run the user’s XPath queries, even ones involving multiple NSs.
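The SAX route to collecting the namespaces might look like the sketch below. It is an illustration under assumptions, not course code. The generated prefixes ns1, ns2, ... are an invented convention, the sample XML is made up, and NamespaceCollector is a hypothetical name. Generating fresh prefixes sidesteps the default namespace, which has no prefix of its own to offer the user.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class NamespaceCollector extends DefaultHandler {
    // Generated query prefix -> namespace URI, in order of first appearance.
    private final Map<String, String> prefixToUri = new LinkedHashMap<>();

    @Override
    public void startPrefixMapping(String docPrefix, String uri) {
        // Ignore xmlns="" (undeclaration) and URIs we have already seen.
        if (uri.isEmpty() || prefixToUri.containsValue(uri)) {
            return;
        }
        prefixToUri.put("ns" + (prefixToUri.size() + 1), uri);
    }

    /** Parses the document once and returns one generated prefix per namespace URI. */
    static Map<String, String> collect(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefix-mapping callbacks
            NamespaceCollector handler = new NamespaceCollector();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.prefixToUri;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up document with one prefixed and one default namespace.
        String xml = "<e:catalog xmlns:e='http://www.edankert.com/examples/'"
                   + " xmlns='urn:example:default'><e:cd/><item/></e:catalog>";
        collect(xml).forEach((prefix, uri) -> System.out.println(prefix + " -> " + uri));
    }
}
```

The resulting map is the raw material for a NamespaceContext. The tool would print the prefix list and then accept queries written with those prefixes.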

Good to know that we can do XPath regardless of namespace use, since it is a powerful tool that can be a shortcut for many real projects.

Another use for Prefix-to-URI mappings: understanding attribute values with prefixes in them

The SAX coverage of NS prefix mappings covers another advanced topic involving NSs. Some uses of XML have NS prefixes showing up in attribute values or (unusually) in element content. In fact, we’re familiar with one such use, schemas. We often see, for example:

<xsd:element name="foo" type="xsd:decimal"/>

Here “xsd:decimal” is an attribute value using a NS prefix.

How can the parsing application find out what NS is being used here, i.e., its real name, the URI?

For this, we need the prefix to URI mapping very much like what the NamespaceContext provides.
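On the DOM side, one way to get such a mapping is the DOM Level 3 method Node.lookupNamespaceURI, which resolves a prefix using the declarations in scope at that node. The sketch below is an illustration only: the schema fragment is a minimal made-up example, and QNameValueDemo and typeNamespace are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class QNameValueDemo {
    static final String XSD_URI = "http://www.w3.org/2001/XMLSchema";

    // Minimal made-up schema fragment containing a prefixed attribute value.
    static final String XML =
        "<xsd:schema xmlns:xsd='" + XSD_URI + "'>"
      + "<xsd:element name='foo' type='xsd:decimal'/></xsd:schema>";

    /** Returns the namespace URI behind the prefix used in the type attribute. */
    static String typeNamespace(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so xmlns declarations are understood
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element el = (Element) doc.getElementsByTagNameNS(XSD_URI, "element").item(0);
            String type = el.getAttribute("type");          // e.g. "xsd:decimal"
            int colon = type.indexOf(':');
            // A null prefix asks for the default namespace in scope.
            String prefix = colon < 0 ? null : type.substring(0, colon);
            return el.lookupNamespaceURI(prefix);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(typeNamespace(XML));
    }
}
```

The parser never resolves the prefix inside the attribute value for us. To the parser, "xsd:decimal" is just a string, so the application has to split it and look the prefix up itself.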

Advanced Topic: XML tree position-dependent prefix->URI mappings: skip if you want

SAX has a helper class (pg. 292) that is more careful about making this mapping tree-position-dependent. If a NS is only valid in a certain subtree of the XML, it will only show up in the mapping when the parser is working on that subtree. (Often we don’t have to be this careful in practice, and can just use a global prefix->URI mapping.)

Note that the code on pg. 293 of the printed book is incomplete. We need to turn off “needNewContext” at startElement, since the tree position changes then. This is corrected in the online book. Don’t worry about actually using this—just remember it’s possible.

Consider Harold’s example on pg. 294. He’s trying to find the NS URI for x: in, say,

<b:MyElement type="x:MyType">

Similarly, we could have <b:MyElement>z:loc</b:MyElement>, and need NamespaceContext handling to find the URI for z:.

As Harold points out, if he was looking for b:’s URI, he wouldn’t have to go to all this trouble. He could just use the namespaceURI argument of startElement.

If he was looking for y:’s URI in

<b:MyElement y:foo="something">

He could use the Attributes method getIndex("y:foo") to get the index and then getURI(index).
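Both cases can be sketched in one SAX handler. This is not Harold's code, just an illustration of the same pattern using org.xml.sax.helpers.NamespaceSupport. The urn:example:* URIs, the sample element, and the names PrefixResolvingHandler and resolve are all made up.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.NamespaceSupport;

final class PrefixResolvingHandler extends DefaultHandler {
    private final NamespaceSupport namespaces = new NamespaceSupport();
    private boolean needNewContext = true;
    String typeUri; // URI behind the prefix inside type="x:MyType"
    String fooUri;  // URI of the attribute y:foo itself

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // Mappings arrive before startElement; open the element's context once.
        if (needNewContext) {
            namespaces.pushContext();
            needNewContext = false;
        }
        namespaces.declarePrefix(prefix, uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (needNewContext) {
            namespaces.pushContext(); // element declared no namespaces of its own
        }
        needNewContext = true;        // the next element needs a fresh context

        // Prefix inside an attribute VALUE: we must resolve it ourselves.
        String type = atts.getValue("type");
        if (type != null && type.indexOf(':') > 0) {
            typeUri = namespaces.getURI(type.substring(0, type.indexOf(':')));
        }
        // Prefix on an attribute NAME: the parser has already resolved it.
        int index = atts.getIndex("y:foo");
        if (index >= 0) {
            fooUri = atts.getURI(index);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        namespaces.popContext(); // leave this element's subtree
    }

    /** Returns { typeUri, fooUri } for the given document. */
    static String[] resolve(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            PrefixResolvingHandler handler = new PrefixResolvingHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return new String[] { handler.typeUri, handler.fooUri };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<b:MyElement xmlns:b='urn:example:b' xmlns:x='urn:example:x'"
                   + " xmlns:y='urn:example:y' type='x:MyType' y:foo='something'/>";
        String[] uris = resolve(xml);
        System.out.println("x: -> " + uris[0]);
        System.out.println("y: -> " + uris[1]);
    }
}
```

Every element gets exactly one pushContext (either in the first startPrefixMapping or in startElement) and one popContext, which is what keeps the mapping tied to the current tree position.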

End of Advanced Topic

Upcoming REST Project: next time, midterm on following Thurs.

REST in Practice

First 3 chapters: important points

Web tolerance of change, p.4—just live with 404s

Thinking in Resources, pg 4-7, URIs, URLs, etc

Resources and their representations: XML, JSON, text, etc., p. 8-9

REST background, p. 12

Idea of hypermedia, p. 13

Loose coupling, pg. 16

Richardson’s levels: pa2 is level 1, Chap 4 is level 2

(Richardson & Ruby are the authors of the first important book on REST, RESTful Web Services)

Restbucks shop basics, pg. 22-24, 28

Can skip Chap. 3, mostly on how not to do things

Except pg. 38: the idea that we use GET only to get info, not to change anything in the server, and the def. of idempotent: an operation that can be repeated without changing the result.

Also worth reading: Chap.11

Are Web Services evil?

SOAP, WSDL

After midterm: tackle Chap. 4