CS639 Reading

CS639 Reading Guide for Midterm

Note that the midterm is open book: books, notes, handouts, etc. are allowed.

Documents to study
Harold’s text
XML 1.0 Spec
XML Bible free Chap. 20 on XML Schemas
Documents referenced in homework.

Text Coverage
Chap. 1: Basic XML--everything except a few things described below.
pg. 20: The rules here are about the use of Unicode for the characters in a document. They correspond to the Char ::= rule in Sec. 2.2 of the XML 1.0 spec. Additionally, the CharData rule of Sec. 2.4 says that < and & must be escaped, etc. This is noted for attribute values on pg. 21 of the text but also applies to element content.
pg. 24-27: You can skip Processing Instructions and Entities. The only kind of entities we cover are the built-in entities &lt;, &gt; and &amp;, which work without a DTD. (Also &apos; and &quot; are in this group.) You should also be aware of character references, which look similar to entity references but are not actually entities, like &#x2022; to represent Unicode 0x2022.
pg. 34: Skip NMTOKEN*, ENTITY*, IDREFS, NOTATION
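To illustrate the note on built-in entities and character references above, here is a minimal sketch, assuming the JAXP parser bundled with the JDK. The class name EntityDemo and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: the parser replaces built-in entity references
// and character references with the characters they stand for.
class EntityDemo {
    // Parses a small document and returns the text content of its root element.
    static String parseText(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // No DTD is needed for &lt; &gt; &amp; or for the reference &#x2022;
        String xml = "<note>a &lt; b &amp;&amp; c &gt; d &#x2022; done</note>";
        System.out.println(parseText(xml));
    }
}
```

The application never sees the references themselves, only the resolved characters (here <, &, > and the bullet U+2022).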

pg. 38: Ex. 1.9. We have not covered xsd:simpleContent or xsd:extension yet, or the details of handling attributes in XML Schema. The xsd:element for “Product” is a good example of structure we have studied.
pg. 41: Skip Schematron
pg. 44-53: Just know what XSL and its two parts, XSLT and XSL-FO, do. Note that the select="Customer", etc., attributes of XSL use XPath queries to locate data. Note that we can use the JDK to do XSL processing. We noted that XSL can do XInclude processing.
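As a rough sketch of XSL processing with the JDK alone (the javax.xml.transform package), the following applies a tiny stylesheet whose select="Customer" is an XPath query. The stylesheet, the Order document, and the class name XsltDemo are assumptions made up for this illustration, not the text's example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: runs an XSLT stylesheet using only JDK classes.
class XsltDemo {
    // A made-up stylesheet: for the Order root, output the Customer's text.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/Order'>"
      + "<xsl:value-of select='Customer'/>"   // XPath, relative to Order
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms the given XML with STYLESHEET and returns the result text.
    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Order><Customer>Chez Fred</Customer>"
                   + "<Product>Birdsong Clock</Product></Order>";
        System.out.println(transform(xml));
    }
}
```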

Chap. 2
We covered everything to pg. 73, RSS, and you can skip RSS. We will later look at Atom, a more current syndication format.
Then we covered everything from pg. 77 to pg. 82, but then you can skip XML-RPC, and everything to SOAP, pg. 96.
Then we covered SOAP to pg. 99, and then skipped Faults and Encoding Styles and the rest of the chapter. We will never cover SOAP Encoding, as it is now considered obsolete. But the SOAP Envelope is crucial to classic web services. Note the difference between SOAP-ENV: and SOAP-ENC: and skip anything with SOAP-ENC:.

Chap. 3
We have covered everything to pg. 142, a Simple SOAP client, but not that section. Also, the important character set for XML is UTF-8, the encoding of Unicode in 8-bit bytes, using multiple bytes for non-ASCII characters. ISO-8859-1, aka Latin-1, is the default for HTML, and is not recommended (although tolerated) for XML.
Then we covered the section on Servlets, pp. 145-148. Note that we always use a web.xml with our servlets, so don’t worry about servlets running without a web.xml (discussed on p. 148.)
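To make the UTF-8 note above concrete, this small sketch counts the bytes that a few characters need. The class name Utf8Demo is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares encoded sizes of a few characters.
class Utf8Demo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));       // ASCII letter
        System.out.println(utf8Length("\u00e9"));  // e with acute accent (in Latin-1)
        System.out.println(utf8Length("\u2022"));  // bullet (not in Latin-1)
        // Latin-1 uses exactly one byte per character it can represent.
        System.out.println(
            "\u00e9".getBytes(StandardCharsets.ISO_8859_1).length);
    }
}
```

The three UTF-8 lengths are 1, 2 and 3 bytes, while the Latin-1 encoding of the accented e is a single byte.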

Chap. 5
We covered all the material on SAX, DOM and JAXP here. Note that Example 5.5 does not work with just Java 6, because of the Apache packages needed. However, Example 5.6 does work with Java 6, and is almost the same in terms of DOM use, so it is the one we're covering.
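For review, here is a minimal sketch of DOM parsing through JAXP using only JDK classes. It is not Example 5.6 from the text; the class name DomWalkDemo and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: parse with JAXP, then walk the DOM tree.
class DomWalkDemo {
    // Builds a DOM Document from a string of XML.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Recursively counts the element nodes at or below the given node.
    static int countElements(Node node) {
        int count = (node.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b/><b>text</b><!-- comment --></a>");
        System.out.println(countElements(doc));
    }
}
```

Text and comment nodes are visited by the walk but are not counted, so this document yields 3.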

Chap. 6 SAX
We covered everything except Receiving Processing Instructions, Receiving Skipped Entities, and Receiving Locators.
Note SAX APIs starting on pg. 869.
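As a reminder of the callback style, this sketch logs the SAX events for a small document. It assumes the JDK's JAXP SAX parser; the class name SaxDemo and the log format are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records element and text events as they arrive.
class SaxDemo {
    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                log.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName,
                                   String qName) {
                log.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text run in several chunks.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text:" + text);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(events("<a><b>hi</b></a>"));
    }
}
```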

Chap. 7 SAX, continued.
We covered everything except EntityResolver, lexical handler (pg. 339), declaration handler (pg. 343). The only Xerces custom features we covered were for validation. We didn't cover the DTDHandler interface, but its ability to provide default attribute values and attribute type is of note.
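A possible sketch of turning on validation follows. It uses the standard SAX validation feature URI rather than a Xerces-specific one, and validates against an internal DTD subset. The class name ValidationDemo and the sample documents are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: counts the validity errors a parser reports.
class ValidationDemo {
    static final String DTD =
        "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b (#PCDATA)>]>";

    static int countValidityErrors(String xml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Standard SAX feature; validation is off unless requested.
        reader.setFeature("http://xml.org/sax/features/validation", true);
        final int[] errors = {0};
        // DefaultHandler implements ErrorHandler; error() receives
        // validity violations, and parsing continues afterwards.
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countValidityErrors(DTD + "<a><b>ok</b></a>"));
        System.out.println(countValidityErrors(DTD + "<a><c/></a>"));
    }
}
```

The first document is valid, so no errors are counted. The second is well-formed but invalid, so at least one error is reported.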

Chap. 9 DOM
We are concentrating on DOM2, although DOM3 (in its scaled-down present form) is available in Java 6. Note that Example 9.1 depends on an Oracle package. We can get a DOMImplementation from a Document object--see pg. 897.
You can skip Application-Specific DOMs
pg. 441 on--Ignore Notation nodes, entity nodes, entity reference nodes (built-in entities are resolved for us, so not there in the DOM)
Various parsers--we are only covering Xerces, since it is built into Java 6 and used via JAXP, so you can skip from "Parsing Documents with a DOM Parser" up to "JAXP DocumentBuilder and DocumentBuilderFactory", which is the way we'll do it.
You can skip "DOM3 Load and Save"; it’s obsolete. See the supplied $cs639/FibonacciEx.java for how to save the DOM tree using current calls.
Note DOM APIs in Appendix.
Read to pg. 478, up to but not including Modifying the Tree. Then read pp. 489-490.
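The supplied FibonacciEx.java is not reproduced here. As a rough sketch of saving a DOM tree with JDK calls, the following assumes the JAXP identity Transformer, which is one common approach and may differ from what that file does. It also shows getting a DOMImplementation from a Document. The class name DomSaveDemo is made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: builds a small DOM tree and serializes it.
class DomSaveDemo {
    // Creates <greeting>hello</greeting> in memory.
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.appendChild(doc.createTextNode("hello"));
        doc.appendChild(root);
        return doc;
    }

    // Serializes the tree with an identity transform (no stylesheet).
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = build();
        // A Document can hand back the DOMImplementation that created it.
        DOMImplementation impl = doc.getImplementation();
        System.out.println(impl.hasFeature("Core", "2.0"));
        System.out.println(serialize(doc));
    }
}
```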

Chap. 10 DOM, continued

We looked at Example 10.5 to see how to use a default namespace and, with a modification of the code, how to use namespace prefixes too. Use the program as written for the default namespace case, or drop the DocumentType creation and use null for the third argument of createDocument. See $cs639/dom/SimpleSVG.java. To add prefixes, replace “svg” in the createDocument call with “s:svg” and “desc” in the createElementNS call with “s:desc” ($cs639/dom/SimpleSGG1.java). The bug described on pg. 504 has been fixed for namespaces, so the output will have <svg xmlns="http://www.w3.org/2000/svg"> in the default namespace case (fix the first line of pg. 504), and <s:svg xmlns:s="http://www.w3.org/2000/svg"> in the prefixed variation.
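The two variations can be sketched as follows. This is not the course's SimpleSVG.java; the class name NamespaceDemo, the helper methods, and the description text are assumptions made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: default-namespace and prefixed SVG documents.
class NamespaceDemo {
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    // prefix "" gives the default namespace case; "s" gives s:svg, s:desc.
    static Document build(String prefix) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DOMImplementation impl =
            factory.newDocumentBuilder().getDOMImplementation();
        String p = prefix.isEmpty() ? "" : prefix + ":";
        // Third argument null: no DocumentType node is created.
        Document doc = impl.createDocument(SVG_NS, p + "svg", null);
        Element desc = doc.createElementNS(SVG_NS, p + "desc");
        desc.appendChild(doc.createTextNode("An example"));
        doc.getDocumentElement().appendChild(desc);
        return doc;
    }

    // Serializes with an identity transform, without the XML declaration.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(build("")));
        System.out.println(serialize(build("s")));
    }
}
```

In both cases the elements are in the same namespace; only the qualified names and the form of the namespace declaration differ.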

New: Note the appendix on the DOM API, pp. 891-908. Fix Table A.1: Attr’s parent is null, not Element (see pp. 894-895); you need to use getOwnerElement() to get the enclosing Element. Add public String getTextContent() to Node’s API on pg. 905. It returns the concatenated text content of this node and its descendants (not including comments). It is especially useful for Element nodes, but it also provides the contents of text nodes and comment nodes.
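The two corrections above can be checked with a short sketch like this one, assuming the JDK's DOM implementation. The class name AttrDemo and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: Attr parent vs. owner element, and getTextContent.
class AttrDemo {
    static final String XML =
        "<p id='x'>one <b>two</b><!-- hidden --> three</p>";

    // Parses the sample document and returns its root element.
    static Element root() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element p = root();
        Attr id = p.getAttributeNode("id");
        // An Attr is not a child of its element, so it has no parent node.
        System.out.println(id.getParentNode() == null);
        // getOwnerElement() is the way back to the enclosing element.
        System.out.println(id.getOwnerElement() == p);
        // Text of the element and its descendants, with the comment skipped.
        System.out.println(p.getTextContent());
    }
}
```

Here getTextContent() on the p element gives "one two three": the text inside b is included and the comment is left out.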

Chap 16 XPath
We covered the intro part, to pg. 759, i.e., to "Location Paths", but no further.
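For the introductory material, a simple path expression can be tried with the JDK's javax.xml.xpath package. This is only a sketch; the class name XPathDemo and the Order document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates an XPath expression against XML text.
class XPathDemo {
    // Returns the string value of the expression's result.
    static String evaluate(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression,
                              new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Order><Customer>Chez Fred</Customer>"
                   + "<Product>Birdsong Clock</Product></Order>";
        // Selects the Customer child of the Order root element.
        System.out.println(evaluate(xml, "/Order/Customer"));
    }
}
```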