CS639 Class 13

 

Midterm Mar. 27

There will be no more assignments due before the midterm

Note: the midterm reading guide is linked from the class web page.

Handout on DOM and namespaces

 

After Midterm: REST Web Services

Read Webber et al, Chap 1-3: background, we’ve covered the underlying Web technology already.

Then start studying Chap 4, the CRUD service for coffee orders, first real example.  I’ll supply a Java project for this.

 

Intro to DOM programming

Back to Chapter 5 for intro program. (Chapter 5 intros both SAX and DOM)

 

Example 5.5, Pg 235-237 -> First DOM Example, see tree of objects in use, similarity to TestXPath code.

NOTE: This program uses class DOMImplementationImpl (not to be confused with DOMImplementation), also XMLSerializer, not in JDK.

Telltale sign of non-JDKness: look at the imports of “org.apache.xerces.parsers”, vs on page 239 “javax.xml.parsers”.

 

So skip to pg. 239 for the second version, which does work under the JDK.  The Transformer part is further explained on pg. 484 as a trivial “transformation”, i.e., no real change.

 

With DOM we can build XML representation in terms of objects as well as read XML files. And use XPath with the object tree.

 

Example 5.6 on page 239 is a program that uses JAXP, which is in the JDK. Here is the relevant code section:

 

            try {
                  // Build the request document
                  DocumentBuilderFactory builderFactory
                        = DocumentBuilderFactory.newInstance();
                  DocumentBuilder builder
                        = builderFactory.newDocumentBuilder();
                  Document request = builder.newDocument();
                  ...
            }

 

Steps required to use DOM

 

            1.         get a factory object (builderFactory  = DocumentBuilderFactory.newInstance())

 

            2.         get a document builder object (builder = builderFactory.newDocumentBuilder())

 

            3.         get a document object (builder.newDocument())

 

(Also, using code from pg. 496, we can use the builder object to get a DOMImplementation object, discussed later in this class)
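The three steps above can be put together in a minimal sketch. The class and method names here (DomStartSketch, newEmptyDocument, implementationFromBuilder) are made up for illustration and are not from the book; only the JAXP calls themselves come from the steps listed.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

// Hypothetical class name, used only for this sketch.
class DomStartSketch {
    static Document newEmptyDocument() throws Exception {
        // 1. get a factory object
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        // 2. get a document builder object
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        // 3. get a (still empty) document object
        return builder.newDocument();
    }

    // The builder can also hand out a DOMImplementation object.
    static DOMImplementation implementationFromBuilder() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.getDOMImplementation();
    }

    public static void main(String[] args) throws Exception {
        Document doc = newEmptyDocument();
        // A brand-new Document has no root element yet.
        System.out.println("root element so far: " + doc.getDocumentElement());
        System.out.println("have DOMImplementation: " + (implementationFromBuilder() != null));
    }
}
```

Note that the new Document is empty: nothing is in the tree until we create an Element and append it.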

 

 

Here is the tree depicting the XML request message on page 139:

 

               <methodCall>
              /             \
     <methodName>          <params>
          |                    |
 [calculateFibonacci]       <param>
                               |
                            <value>
                               |
                             <int>
                               |
                              [23]

 

 

           

 

To create this in DOM we need to create a Document object, then create an Element object, which originally stands by itself. We then attach this Element object to the Document object by appending it as a child.

 

We keep coming back to the Document object to create node objects, which are then appended to their parent objects. In this way we build up the tree.

Note that Element ISA Node and Text ISA Node and so forth.
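As a rough sketch of this create-then-append pattern, the request tree can be built as follows. This is written in the spirit of Example 5.6 but is not the book's code; the class name BuildRequestSketch and the method buildRequest are assumptions for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical class name, used only for this sketch.
class BuildRequestSketch {
    static Document buildRequest(String index) throws Exception {
        Document request = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

        // The Document creates the root Element, which at first stands by itself...
        Element methodCall = request.createElement("methodCall");
        // ...and is then attached to the Document as its child.
        request.appendChild(methodCall);

        Element methodName = request.createElement("methodName");
        Text name = request.createTextNode("calculateFibonacci");
        methodCall.appendChild(methodName);
        methodName.appendChild(name);

        // Each new node comes from the Document and is appended to its parent.
        Element params = request.createElement("params");
        methodCall.appendChild(params);
        Element param = request.createElement("param");
        params.appendChild(param);
        Element value = request.createElement("value");
        param.appendChild(value);
        Element intElement = request.createElement("int");
        value.appendChild(intElement);
        intElement.appendChild(request.createTextNode(index));
        return request;
    }

    public static void main(String[] args) throws Exception {
        Document request = buildRequest("23");
        System.out.println(request.getDocumentElement().getTagName());
    }
}
```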

 

 Pg 241: uses XSL support to serialize the DOM into text XML to be sent out on the network.

--specifically an “identity transform” – same info on both sides, DOM vs textXML.

We’re just using XSLT here for its capability of writing XML.
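A minimal sketch of the identity-transform idea, assuming we want the text XML in a String rather than on a network connection. The class name IdentityTransformSketch and its helper methods are hypothetical; the book's version writes to the connection's output stream instead.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this sketch.
class IdentityTransformSketch {
    static String serialize(Document doc) throws Exception {
        // newTransformer() with no stylesheet argument gives the identity transform.
        Transformer idTransform = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        // Source side: the DOM tree. Result side: text XML.
        idTransform.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // A tiny one-element document to serialize.
    static Document sampleDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("methodName");
        root.appendChild(doc.createTextNode("calculateFibonacci"));
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sampleDoc()));
    }
}
```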

 

But now we can avoid this obscure construction by using the DOM3 “Load-Save” module. This code is from $cs639/dom/FibonacciEx.java (like Example 5.6, except that it outputs the request XML to System.out rather than doing any XML-RPC):

 

                  // Use DOM3 "Load-Save" module to serialize DOM tree
                  DOMImplementation impl = request.getImplementation();
                  DOMImplementationLS domLS =
                        (DOMImplementationLS) impl.getFeature("LS", "3.0");
                  LSSerializer serializer = domLS.createLSSerializer();
                  LSOutput outLS = domLS.createLSOutput();
                  outLS.setByteStream(System.out);
                  // or use methodCall instead of request, for the same result
                  serializer.write(request, outLS);

 

For this XML RPC program, simply use outLS.setByteStream(out), where out is from the connection, as at top of pg. 241.

Although this approach is more straightforward, not calling on the powerful and (to us) mysterious XSLT, it isn’t perfect.  It mishandles comments and PIs that should show before the root element, by placing them after the root element, i.e., at the end of the document.  So perhaps we should stick with the older approach using XSLT.

 

Reading XML with DOM

Reading the response, i.e., the text XML coming in from the network, is shown on pg. 241, at the end of Example 5.6’s code.

 

builder.parse(in)

 

returns a DOM representation of the response.  (We could use an LSParser from the DOM3 Load-Save module here.)

 

Here is the node tree for the response document: this will show up under the Document node after parsing.  The program needs to extract the “55” out of the text node at the bottom.

 

(Actually there isn't any whitespace around <double>, but this picture gives the general idea.)

 

[Figure: node tree for the response document]

 

 

The DOM representation is held in variable “response”, of type Document.

 

We could navigate down the tree to the text node: first use response.getDocumentElement() to get the methodResponse element at the root, then getFirstChild to get its params child, and so on down the tree, and finally getFirstChild on the double element to get the text node with value 55.
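Here is a sketch of the step-by-step navigation. The response text is an assumed sample (not captured from the real server), and it deliberately has whitespace between some tags, so a plain getFirstChild would sometimes land on a whitespace Text node. The hypothetical helper firstElementChild skips over those.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class NavigateResponseSketch {
    // Assumed sample response, with whitespace between some tags.
    static final String RESPONSE =
        "<methodResponse>\n  <params>\n    <param>\n"
        + "      <value><double>55</double></value>\n"
        + "    </param>\n  </params>\n</methodResponse>";

    // First child that is an Element, skipping whitespace Text nodes, comments, etc.
    static Node firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return n;
            }
        }
        return null;
    }

    static String extractValue(String xml) throws Exception {
        Document response = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node methodResponse = response.getDocumentElement();
        Node params = firstElementChild(methodResponse);
        Node param = firstElementChild(params);
        Node value = firstElementChild(param);
        Node dbl = firstElementChild(value);
        // The double element's first (and only) child is the Text node.
        return dbl.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractValue(RESPONSE));
    }
}
```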

 

There’s another way, however, as shown in the program: find the “double” element by a search:

 

            response.getElementsByTagName("double")

 

searches the DOM tree for double elements and returns a NodeList (API, pg. 898).

Then we need the first child of the first node in the list: the text node with the answer. (There is only one text node under the “double” element.)
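The search approach can be sketched like this, again with an assumed sample response string and a hypothetical class name (SearchResponseSketch).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class SearchResponseSketch {
    static String findDouble(String xml) throws Exception {
        Document response = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Search the whole tree by tag name (no namespaces involved here).
        NodeList doubles = response.getElementsByTagName("double");
        Node firstDouble = doubles.item(0);       // first node in the list
        Node text = firstDouble.getFirstChild();  // its single Text child
        return text.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<methodResponse><params><param><value>"
                + "<double>55</double></value></param></params></methodResponse>";
        System.out.println(findDouble(xml));
    }
}
```

A real program should check that the NodeList is not empty before calling item(0); that check is left out here to keep the sketch short.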

 

Note that this program ignores the whole concept of namespaces. It assumes that all you need to specify an element name is the local name.  That’s because it has roots in DOM1, which didn’t support namespaces.  So this first example works for XML without namespaces.  We need to go on to understand how to use DOM3, which has namespace support.  The various methods that work with element and attribute names will have forms that take additional namespace arguments.

 

Namespace Support in SAX and DOM

 

SAX: namespaces are enabled by default: for details, see pp 328-329, where we find that startElement gets the namespaceURI and localName for the element, but the qualifiedName might require an additional feature turned on.

DOM: namespaces are not fully enabled by default: see pg. 463, so we need to turn on the namespace-aware feature to get proper support

 

Both SAX and DOM (with NS-aware set) can figure out what the NS is for each element as they parse and let us know—

SAX: at startElement call

DOM: by Node calls, pg.  904, see Node’s getNamespaceURI() and getPrefix()

 

Note that the NS URI returned by DOM Node’s getNamespaceURI() is only non-null for Element and Attr Nodes and, for Attr, only if there is an explicit prefix on the attribute in the XML (this is as expected, since attributes with no prefix are in no namespace).
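A small sketch that shows this on a parsed document. The sample XML is made up for illustration (an svg root with a default namespace, plus one unprefixed and one prefixed attribute), and the class name NamespaceInfoSketch is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class NamespaceInfoSketch {
    static final String SVG_NS = "http://www.w3.org/2000/svg";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";
    // Made-up sample: no whitespace, so <a> is the root's first child.
    static final String XML =
        "<svg xmlns='" + SVG_NS + "' xmlns:xlink='" + XLINK_NS + "'>"
        + "<a width='10' xlink:href='#top'/></svg>";

    static Element parseAndGetA() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return (Element) doc.getDocumentElement().getFirstChild();
    }

    public static void main(String[] args) throws Exception {
        Element a = parseAndGetA();
        Attr width = a.getAttributeNode("width");
        Attr href = a.getAttributeNodeNS(XLINK_NS, "href");
        System.out.println(a.getNamespaceURI());     // element picks up the default namespace
        System.out.println(width.getNamespaceURI()); // null: unprefixed attribute, no namespace
        System.out.println(href.getPrefix() + " -> " + href.getNamespaceURI());
    }
}
```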

 

So a newly parsed document has good namespace information.  But the DOM does not take responsibility for noticing if you move an Element around in the tree and end up giving it an inappropriate NS.  It just sets the namespace info for a Node when it creates the Node by parsing or by Document’s createElement calls and keeps it that way.

We can build a DOM tree with ridiculous NS info in it.  It’s great we can build DOM trees, but we should remember to check them out once built, for example, by validating the resulting documents (more on this later).

 

Chap 10: Just one example to show use of DOM with Namespaces

 

Example 10.5 SimpleSVG:

 

Creates doc with default namespace (bug referred to on pg. 504 has been fixed):

                   

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<!--An example from Chapter 10 of Processing XML with Java-->

<?xml-stylesheet type="text/css" href="standard.css"?>

<svg xmlns="http://www.w3.org/2000/svg"><desc>An example from Processing XML with Java</desc></svg>

 

Recall we are not trying to use DTDs and namespaces together, so we drop the svgDOCTYPE creation and just put null in createDocument’s third arg, as done in SimpleSVG1.java.

 

This program shows another way to get started on building a DOM document: get a DOMImplementation object from the builder, and use it to call createDocument, with arguments specifying the root element tagname and namespace. This one call creates both the Document and the Element for the root element, i.e., the minimal valid DOM tree.  Then we get the Element for the root element from the new Document and use it the same way as before in Example 5.6.

 

DOM3 methods using NS: now use doc.createElementNS() with NS URI arg.
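A minimal sketch of this way of starting a document, with a default namespace and no DOCTYPE. It is modeled on the description above, not copied from SimpleSVG1.java; the class name SvgDefaultNsSketch and method buildSvg are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this sketch.
class SvgDefaultNsSketch {
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    static Document buildSvg() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DOMImplementation impl = factory.newDocumentBuilder().getDOMImplementation();

        // One call creates both the Document and its root Element.
        // The third argument is the DOCTYPE; null means none.
        Document doc = impl.createDocument(SVG_NS, "svg", null);
        Element root = doc.getDocumentElement();

        // Child elements are created with the NS-aware method.
        Element desc = doc.createElementNS(SVG_NS, "desc");
        desc.appendChild(doc.createTextNode("An example from Processing XML with Java"));
        root.appendChild(desc);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Element root = buildSvg().getDocumentElement();
        System.out.println(root.getLocalName() + " in " + root.getNamespaceURI());
    }
}
```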

 

See SimpleSVG1.java in $cs639/dom and on handout.

 

Handout: like SimpleSVG, but with namespace prefixes. All you need to do is add them on the names. These methods take qualified name arguments.

This is made clear for createElementNS on pg. 897, and createDocument, pg. 900.
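A short sketch of the prefixed variant, assuming the prefix "svg" (any prefix would do). The class name SvgPrefixedSketch is hypothetical and this is not the handout code; it only shows that the qualified name argument carries the prefix.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this sketch.
class SvgPrefixedSketch {
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    static Document buildPrefixedSvg() throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        // Qualified names: prefix, colon, local name.
        Document doc = impl.createDocument(SVG_NS, "svg:svg", null);
        Element desc = doc.createElementNS(SVG_NS, "svg:desc");
        doc.getDocumentElement().appendChild(desc);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Element desc = (Element) buildPrefixedSvg().getDocumentElement().getFirstChild();
        System.out.println(desc.getNodeName());   // qualified name
        System.out.println(desc.getPrefix());     // prefix part
        System.out.println(desc.getLocalName());  // local part
    }
}
```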

 

DOMReader.java: this shows how to read in an XML document with prefixed namespaces and use a NS-aware tagname-search on it, that is, finding all the nodes with a certain localName in a certain NS.

 

Note the two feature settings used here:

·         one to set the “coalescing” feature, which puts CDATA text (after “resolution”) into the Text nodes along with the normal text. This is off by default.  More on this soon, with a CDATA example.

·         another to set namespace-aware, so DOM will maintain NS info on elements and attributes for us, and match up names based on NS and localName.  This is off by default.
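A sketch of the two settings together with an NS-aware tagname search. This is not DOMReader.java itself: the class name NsSearchSketch, the namespace URIs under example.com, and the sample XML are all made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class NsSearchSketch {
    static final String ORDER_NS = "http://example.com/orders"; // made-up namespace
    static final String XML =
        "<o:order xmlns:o='" + ORDER_NS + "' xmlns:x='http://example.com/other'>"
        + "<o:item>latte</o:item>"
        + "<x:item>ignored</x:item>"
        + "<o:item><![CDATA[tea & milk]]></o:item>"
        + "</o:order>";

    static NodeList findItems() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(true);     // CDATA text goes into Text nodes
        factory.setNamespaceAware(true); // keep NS info on elements and attributes
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        // Match on namespace URI plus local name; the prefix does not matter.
        return doc.getElementsByTagNameNS(ORDER_NS, "item");
    }

    public static void main(String[] args) throws Exception {
        NodeList items = findItems();
        for (int i = 0; i < items.getLength(); i++) {
            System.out.println(items.item(i).getTextContent());
        }
    }
}
```

The x:item element has the same local name but a different namespace, so the search skips it.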

 

These two are both good settings in general for reading into DOMs for data-centric apps.  However, if you want to use XPath on the DOM, you should leave namespace awareness off, unless you want to set up a NamespaceContext object to do the prefix-URI mapping for the names in the XPath query itself.
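For the case where we do want both namespace awareness and XPath, here is a sketch of a one-prefix NamespaceContext. The class name XPathNsSketch, the prefix "p", and the example.com namespace are assumptions for illustration. Note that the prefix in the query ("p") need not match the prefix in the document ("o"); the match is by URI. A full NamespaceContext should also handle null arguments and the reserved xml and xmlns prefixes, which this sketch skips.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class XPathNsSketch {
    static final String ORDER_NS = "http://example.com/orders"; // made-up namespace

    // Maps the single prefix "p" (used in the query) to ORDER_NS.
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "p".equals(prefix) ? ORDER_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return ORDER_NS.equals(uri) ? "p" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            String p = getPrefix(uri);
            List<String> prefixes =
                (p == null) ? Collections.<String>emptyList() : Collections.singletonList(p);
            return prefixes.iterator();
        }
    };

    static String firstItem(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(CONTEXT);
        // Returns the string value of the first matching node.
        return xpath.evaluate("/p:order/p:item[1]", doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<o:order xmlns:o='" + ORDER_NS + "'>"
                + "<o:item>latte</o:item><o:item>mocha</o:item></o:order>";
        System.out.println(firstItem(xml));
    }
}
```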

 

Chapter 9 DOM

 

DOM Modules: can skip, trusting JDK to have the important support

Optional notes

-         note that traversal and range are not in J2SE 5.0-6.0

Leaves 4 modules that are data-related (so we’re interested) & in J2SE5.0  and 6.0 (so we can easily use them)

Core, XML, Events and MutationEvents.

Add to this list Load-Save module, new with DOM3 and in JDK 6.0 and used in FibonacciEx.java discussed above.

 

Pg. 437: hasFeature() method of DOMImplementation interface provides info on features of a specific DOM implementation.  This is an interface with a funny name, since an “implementation” usually means a concrete class. 

 

Example 9.1: Java 6 has “DOMImplementation” as an interface, but no XMLDOMImplementation class, which this program needs.  Note the warning sign: “import oracle….”  This class is Oracle-specific.

 

How can we get an object like this, so we can ask about features? From a Document or a DocumentBuilder.

 

See Pg 897 DOM Document API – we can ask a Document for a DOMImplementation. We could reimplement this example (Ex 9.1) and find the features of the software.

 

See pg. 912 JAXP DocumentBuilder API—we can ask a DocumentBuilder for it too.

 

DOMImplementation API: pg. 900  See createDocument, another way to start a Document.
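A rough sketch of how Example 9.1 might be redone with only JDK classes: get the DOMImplementation from a DocumentBuilder and ask it about each module. The class name DomFeaturesSketch is hypothetical, and which features report true depends on the JDK's parser, so no particular output is promised here beyond basic Core and XML support.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;

// Hypothetical class name, used only for this sketch.
class DomFeaturesSketch {
    static DOMImplementation implementation() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // A Document could supply the same kind of object via getImplementation().
        return builder.getDOMImplementation();
    }

    public static void main(String[] args) throws Exception {
        DOMImplementation impl = implementation();
        String[] modules = {"Core", "XML", "Events", "MutationEvents",
                            "Traversal", "Range", "LS"};
        for (String module : modules) {
            // A null version asks whether any version of the feature is supported.
            System.out.println(module + ": " + impl.hasFeature(module, null));
        }
    }
}
```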

 End of optional notes

 

App-specific DOMS—can skip.

 

Trees – DOM Data Model

-         idea of “Nodes”, similar to XPath nodes.

-         Note that SAX did not use this terminology, and the XML 1.0 spec does not use the term “node”.

-         SAX has events reporting simple strings, etc, no objects that hold stuff. (Well, it does give us the attributes in a special object, but it’s not at all node-like.)

 

DOM tree:

-         the official tree of nodes

-         has various other nodes associated with certain tree nodes: attributes are the main case here

-         disconnected nodes.

 

Page 441: 12 kinds of nodes for DOM

The first 6 listed correspond to XPath nodes.

The Document node corresponds to the XPath root node.

 

For DOM: 12 kinds of Nodes

For XPath: 7 kinds of Nodes.

See pg. 760 for discussion of XPath vs. DOM
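To see several of the DOM node kinds in a parsed document, here is a sketch that walks the tree and names each node by its getNodeType() value. The class name NodeKindsSketch and the tiny sample document are made up for illustration; attributes are not visited, since they are not children in the tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class NodeKindsSketch {
    // Name a node's kind from its type code (the kinds marked "skip" fall to "other").
    static String kind(Node n) {
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE:               return "Document";
            case Node.ELEMENT_NODE:                return "Element";
            case Node.TEXT_NODE:                   return "Text";
            case Node.ATTRIBUTE_NODE:              return "Attr";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            case Node.COMMENT_NODE:                return "Comment";
            case Node.CDATA_SECTION_NODE:          return "CDATASection";
            case Node.DOCUMENT_TYPE_NODE:          return "DocumentType";
            default:                               return "other";
        }
    }

    // Depth-first walk over child nodes, one line per node, indented by depth.
    static void walk(Node n, String indent, StringBuilder out) {
        out.append(indent).append(kind(n)).append(' ').append(n.getNodeName()).append('\n');
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, indent + "  ", out);
        }
    }

    static String describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        walk(doc, "", out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describe("<!--c--><a><?pi x?>t<![CDATA[d]]></a>"));
    }
}
```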

 

Coalescing: Pg. 461, setCoalescing, needed for XPath compatibility, and generally useful

Also, setNamespaceAware, pg. 463, so namespace information is kept in Element and (some) Attr nodes.
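A small sketch of what coalescing changes, using a made-up element that mixes normal text with a CDATA section. The class name CoalescingSketch is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class CoalescingSketch {
    // Normal text followed directly by a CDATA section.
    static final String XML = "<a>one<![CDATA[ & two]]></a>";

    static NodeList childrenOfRoot(boolean coalescing) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(coalescing); // off by default
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement().getChildNodes();
    }

    public static void main(String[] args) throws Exception {
        // Off: a Text node and a separate CDATASection node.
        System.out.println("coalescing off: " + childrenOfRoot(false).getLength() + " children");
        // On: the CDATA text is merged into the adjacent Text node.
        NodeList merged = childrenOfRoot(true);
        System.out.println("coalescing on: " + merged.getLength() + " child, value ["
                + merged.item(0).getNodeValue() + "]");
    }
}
```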

  

DOM node                                                         Corresp. XPath node
---------------------------------------------------------------  -------------------------------------------
Document node                                                    Root node
Element node                                                     Element node
Text node (includes CDATA text if the coalescing feature is on)  Text node (but includes CDATA, entity text)
Attribute node                                                   Attribute node
PI node (skip)                                                   PI node
Comment node                                                     Comment node
CDATA node (missing if coalescing is on)                         --- Absorbed into text node
Document type node
Notation node (skip)
Document fragment node (skip)
Entity node (skip)                                               --- Resolved entities are in Text nodes
Entity Reference node (skip)

 

Note:  Namespace Nodes are in XPath but not in DOM. In DOM, namespace information for nodes is available via Node methods getNamespaceURI and getPrefix. See pg. 904. This setup is convenient for programmers—DOM has digested the whole document and determined from the various xmlns=... attributes what NS pertains to each element and attribute.

Note: The built-in entity references (&amp;, &lt;, &gt;, &quot; and &apos;) are always expanded during parsing, SAX or DOM. pg. 462.
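A quick sketch to check this for DOM: parse an element whose text uses built-in entity references and look at the resulting text. The class name BuiltInEntitiesSketch and the sample strings are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class BuiltInEntitiesSketch {
    // Text content of the root element, after parsing.
    static String textOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The references are replaced by the characters they stand for.
        System.out.println(textOf("<a>1 &lt; 2 &amp; 3 &gt; 2</a>"));
    }
}
```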