CS639 Class 10: handout on servlet1
PA2, servlet1, and servlet2 are available; the Pa1b solution is coming soon.
You need to install Tomcat at home; see the links near pa2 on the class web page.
Many cases to deal with: XML + DTD or not + schema or not + namespace or not.
No Namespaces (our previous coverage):
XML + DTD
XML + XSD (with no targetNS)
XML + DTD + XSD: uncommon
With Namespaces:
XML + NS
XML + NS + XSD (with targetNS = NS)
XML + NS + DTD: we will look at the case of SVG, the graphics language.
However, note that an NS doesn’t by itself have a document the way the DTD and XSD do. It’s just an identifier URI attached to XML to disambiguate names.
XML + NS case: only the XML is a document. The NS is just a construct, holding all the prefixed local names, plus element names if it’s being used as a default namespace.
In the XML+NS+XSD case, the XSD has the NS URI as its target namespace, so the XSD serves as a document for the NS, along with its job of expressing the XML’s structure. Thus the NS URI bridges between the XML doc and the XSD doc.
Well-known schemas are found by parsers at well-known places.
Parsers need help finding application schemas. We need to cover global attributes for this.
Global Attributes: These can be attached to elements of other namespaces. Look at the XLink example, pg. 31. xlink:type and xlink:href are global attributes, belonging to the “http://.../xlink” namespace but added to an element of the “http://.../Address” namespace.
<ShipTo xmlns="http://ns.cafeconleche.org/Address/"
xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="simple"
xlink:href="mailto:chezfred@yahoo.com"> <--global attributes
<GiftRecipient>Samuel Johnson</GiftRecipient>
<Street>271 Old Homestead Way</Street>
<City>Woonsocket</City>
<State>RI</State> <Zip>02895</Zip>
</ShipTo>
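As a rough illustration of what "belonging to the XLink namespace" means to a program, here is a minimal Java sketch using the JDK's DOM parser. The class and method names (GlobalAttributeDemo, xlinkHref) are made up for this handout; the point is only that a namespace-aware parser lets us look up the attribute by namespace URI plus local name, regardless of the prefix the document happens to use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the course code.
class GlobalAttributeDemo {
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Returns the root element's href attribute from the XLink namespace,
    // or "" if the root element has no such attribute.
    static String xlinkHref(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
            DocumentBuilder builder = factory.newDocumentBuilder();
            Element root = builder.parse(new InputSource(new StringReader(xml)))
                                  .getDocumentElement();
            // Lookup is by (namespace URI, local name); the prefix does not matter.
            return root.getAttributeNS(XLINK_NS, "href");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<ShipTo xmlns='http://ns.cafeconleche.org/Address/'"
            + " xmlns:xlink='http://www.w3.org/1999/xlink'"
            + " xlink:type='simple'"
            + " xlink:href='mailto:chezfred@yahoo.com'>"
            + "<City>Woonsocket</City></ShipTo>";
        System.out.println(xlinkHref(xml));
    }
}
```

An unprefixed href attribute on the same element would not be found by this lookup, since unprefixed attributes are in no namespace.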
This xlink construct sounds like an important technology (and has a Wikipedia page), but in fact it isn’t in general use in applications of XML. An exception is SVG, an important graphics language in XML, with a tutorial at http://www.w3schools.com/svg/default.asp. SVG stands for Scalable Vector Graphics. One of their examples is downloaded to sample.svg in our class web page, and you can see that Chrome (but not Internet Explorer) can display it and show a curved text. Here is its XML:
$cs639/sample.svg
<svg xmlns="http://www.w3.org/2000/svg"
version="1.1"
xmlns:xlink="http://www.w3.org/1999/xlink">
<defs>
<path id="path1" d="M75,20 a1,1 0 0,0 100,0" />
</defs>
<text x="10" y="100" style="fill:red;">
<textPath xlink:href="#path1">I love SVG I love SVG</textPath>
</text>
</svg>
You can see the xlink:href="#path1" attribute of <textPath>. Here #path1 is a URL that consists solely of a “fragment”. The fragment part of a URL comes at the end and locates something inside a resource. Here the URL points to another element in the current XML document, the one with id="path1". We also note that SVG doesn’t use xlink:type="simple" as seen in the book example.
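To see the "fragment only" structure of such a URL, we can hand it to java.net.URI. This is just a sketch: the class name FragmentDemo and the base URL http://example.org/sample.svg are made up for illustration (the real location of sample.svg is on the class web page).

```java
import java.net.URI;

// Hypothetical demo class, not part of the course code.
class FragmentDemo {
    // Returns the fragment part of a URI reference (the text after '#'), or null if none.
    static String fragmentOf(String reference) {
        return URI.create(reference).getFragment();
    }

    // Resolves a reference such as "#path1" against the URL of the containing document.
    static String resolveAgainst(String documentUrl, String reference) {
        return URI.create(documentUrl).resolve(URI.create(reference)).toString();
    }

    public static void main(String[] args) {
        System.out.println(fragmentOf("#path1"));
        // The base URL here is an assumption, used only to show resolution.
        System.out.println(resolveAgainst("http://example.org/sample.svg", "#path1"));
    }
}
```

Since the reference has no scheme, host, or path of its own, resolving it keeps the whole document URL and only appends the fragment, which is why it points back into the current document.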
Now it turns out that SVG does not have an XML Schema, but instead uses a DTD for validation. That brings up the strange case of using a namespace along with a DTD, even though DTDs know nothing of namespaces.
We can figure out what the DTD must look like, something like this for <textPath>, to require it to have an attribute named xlink:href:
<!ELEMENT textPath (#PCDATA)>
<!ATTLIST textPath xlink:href CDATA #REQUIRED>
and the svg element additionally needs an attribute of name xmlns:xlink, and a specific value:
<!ATTLIST svg xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink">
This is pretty odd, and means that in this case the xlink prefix is not a dummy variable, but instead a fixed-in-stone string.
Going back to the address.xml example:
If there is a schema for the Address NS, it must give permission for the “extra” attributes, or validation will fail. The simplest way is to allow any attribute: See address.xsd in $cs639/validate-ns for this:
<?xml version="1.0"?>
<xsd:schema targetNamespace="http://ns.cafeconleche.org/Address/"
xmlns="http://schemas.cs.umb.edu/book"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xsd:element name="ShipTo">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="GiftRecipient" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="Street" type="xsd:string"/>
<xsd:element name="City" type="xsd:string"/>
<xsd:element name="State" type="xsd:string"/>
<xsd:element name="Zip" type="xsd:string"/>
</xsd:sequence>
<xsd:anyAttribute/> <--- add to schema to allow global attributes (or any others)
</xsd:complexType>
</xsd:element>
</xsd:schema>
With the help of the validation service at http://www.w3.org/2001/03/webdata/xsv, with its somewhat more useful error messages, I got this working.
You also must provide the parser with access to the schema that establishes the attribute as a global attribute, namely, the XLink schema. We tell this important fact to the parser by linkage or other means (here linkage).
address.xml in $cs639/validate-ns:
<ShipTo xmlns="http://ns.cafeconleche.org/Address/"
xmlns:xlink="http://www.w3.org/1999/xlink"
xsi:schemaLocation="http://www.w3.org/1999/xlink
http://www.w3.org/1999/xlink.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xlink:href="mailto:chezfred@yahoo.com"> <!-- removed xlink:type here -->
<GiftRecipient>Samuel Johnson</GiftRecipient>
<Street>271 Old Homestead Way</Street >
<City>Woonsocket</City> <State>RI</State>
<Zip>02895</Zip>
</ShipTo>
java sax.Counter -s -v -schema address.xsd address.xml
address.xml: 598 ms (6 elems, 2 attrs, 0 spaces, 79 chars)
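The same kind of check can be sketched with the JDK's javax.xml.validation API instead of sax.Counter. This is an illustration under assumptions, not the class setup: the schema is cut down to a single City child, the class name AnyAttributeDemo is made up, and the wildcard uses processContents="lax" so the sketch runs without fetching the XLink schema. The address.xsd above uses the default (strict), which is why address.xml has to tell the parser where the XLink schema is.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the course code.
class AnyAttributeDemo {
    // A cut-down Address schema; attributeWildcard is spliced into the complex type.
    static String schema(String attributeWildcard) {
        return "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
             + " targetNamespace='http://ns.cafeconleche.org/Address/'"
             + " elementFormDefault='qualified'>"
             + "<xsd:element name='ShipTo'><xsd:complexType>"
             + "<xsd:sequence><xsd:element name='City' type='xsd:string'/></xsd:sequence>"
             + attributeWildcard
             + "</xsd:complexType></xsd:element></xsd:schema>";
    }

    static final String XML =
          "<ShipTo xmlns='http://ns.cafeconleche.org/Address/'"
        + " xmlns:xlink='http://www.w3.org/1999/xlink'"
        + " xlink:href='mailto:chezfred@yahoo.com'>"
        + "<City>Woonsocket</City></ShipTo>";

    // Returns null if xml is valid against xsd, otherwise the first error message.
    static String validate(String xsd, String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage(); // validation (or schema) error
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Without a wildcard, the "extra" xlink:href attribute is rejected.
        System.out.println(validate(schema(""), XML));
        // With a lax wildcard (an assumption of this sketch), it is accepted.
        System.out.println(validate(schema("<xsd:anyAttribute processContents='lax'/>"), XML));
    }
}
```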
To restrict the attribute to a particular namespace, add namespace="…" to anyAttribute, for example in address1.xsd:
…
<xsd:anyAttribute namespace="http://www.w3.org/1999/xlink"/>
To test it at http://www.w3.org/2001/03/webdata/xsv: enter the URLs for these files in the form:
http://www.cs.umb.edu/cs639/validate-ns/address.xml
http://www.cs.umb.edu/cs639/validate-ns/address.xsd
Linkage to Schema from XML: done with global attributes
Recall book2.xml, with its linkage to schema. This is done with the help of the XML Schema instance (XSI) namespace, another standard namespace, and its global attribute noNamespaceSchemaLocation:
<?xml version="1.0" encoding="ISO-8859-1"?>
<book xsi:noNamespaceSchemaLocation="book.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<title>Data on the Web</title> ...
This is an example of a global attribute (noNamespaceSchemaLocation) that doesn’t need to be in the schema, since it’s part of the infrastructure.
Here the xsi prefix for the XML instance NS is set up with the xmlns:xsi=”URI_of_XMLInstance”. By XML instance we mean the XML document itself, rather than the schema. The XML document needs to point to its schema, which it does with the help of the XSI namespace.
Obviously we need a different construct for the XSD linkage in the NS+XSD case:
· Without namespace: xsi:noNamespaceSchemaLocation with value “URL”
· With namespace: xsi:schemaLocation with value “URI URL”, two URL-syntax strings separated by whitespace, the first for the namespace URI and the second for the schema’s URL.
Example of the second form: address.xml above.
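To make the “URI URL” pairing concrete, here is a small Java sketch that reads xsi:schemaLocation off a root element and splits it into (namespace URI, schema URL) pairs. The class and method names are made up for this handout, and this is only an illustration of the attribute's format, not how Xerces itself processes the hint.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the course code.
class SchemaLocationDemo {
    // Maps each namespace URI in the root's xsi:schemaLocation to its schema URL.
    static Map<String, String> schemaLocations(String xml) {
        Map<String, String> pairs = new LinkedHashMap<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            // A global attribute: looked up by the XSI namespace URI, not by prefix.
            String hint = root.getAttributeNS(
                    XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI, "schemaLocation").trim();
            if (!hint.isEmpty()) {
                String[] tokens = hint.split("\\s+"); // whitespace-separated URI/URL pairs
                for (int i = 0; i + 1 < tokens.length; i += 2) {
                    pairs.put(tokens[i], tokens[i + 1]);
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String xml =
              "<ShipTo xmlns='http://ns.cafeconleche.org/Address/'"
            + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
            + " xsi:schemaLocation='http://www.w3.org/1999/xlink\n"
            + "    http://www.w3.org/1999/xlink.xsd'/>";
        System.out.println(schemaLocations(xml));
    }
}
```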
Intro to pa2: set up a servlet to serve out PA1 XML. Use the request URL to specify the Java file to process (the REST way).
Ex: http://users2.cs.umb.edu:xxxxx/pa2/examples.sort.Sortex.xml to get XML describing SortEx.
Then write a client that uses XPath to print out methods.
Write a second client that uses SAX directly.
Deploying Servlets: servlet1, servlet2
The servlet1 is in $cs639/servlet1, etc.
To try it out:
cd cs639
mkdir servlet1
cp -r $cs639/servlet1/* servlet1
cd servlet1
ant build
ant deploy -> copies to your webapps dir in Tomcat.
ant test1
Looking at servlet1 Handout:
web.xml specifies how the request URI is mapped to the servlet’s .class file, so tomcat knows what to load into its JVM.
This is done by match-up on servlet-name content (here “HelloWorld”) between the servlet and servlet-mapping elements.
So here the URL-ending “/servlet/HelloWorld” specifies “cs639.xml.servlet.HelloWorld”, understood to be under WEB-INF/classes in the webapp’s deployed area, i.e., $CATALINA_HOME/webapps/servlet1/WEB-INF/classes/cs639/xml/servlet/HelloWorld.class.
The full URL that matches, for my tomcat, is http://users2.cs.umb.edu:11600/servlet1/servlet/HelloWorld
Here we see host, port, webapp name, and finally /servlet/HelloWorld, which is handled by servlet1’s web.xml.
In fact, longer URLs (with query strings or ;xxx or both) would also match, such as http://users2.cs.umb.edu:11600/servlet1/servlet/HelloWorld?x=10
The x=10 would never be processed by this servlet, however.
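The pieces of such a URL can be pulled apart with java.net.URI, as in the sketch below. The class and helper names are made up, and the split into "webapp name" and "rest of path" is a simplification that assumes the webapp is named by the first path segment, as it is for servlet1 deployed under webapps; it is not Tomcat's actual matching code.

```java
import java.net.URI;

// Hypothetical demo class, not part of the course code.
class RequestUrlDemo {
    // First path segment, e.g. "/servlet1" (assumed here to name the webapp).
    static String contextPath(URI u) {
        String path = u.getPath();
        int second = path.indexOf('/', 1);
        return second < 0 ? path : path.substring(0, second);
    }

    // The rest of the path, which web.xml's servlet-mapping is matched against.
    static String pathWithinWebapp(URI u) {
        String path = u.getPath();
        int second = path.indexOf('/', 1);
        return second < 0 ? "" : path.substring(second);
    }

    public static void main(String[] args) {
        URI u = URI.create(
            "http://users2.cs.umb.edu:11600/servlet1/servlet/HelloWorld?x=10");
        System.out.println("host:   " + u.getHost());
        System.out.println("port:   " + u.getPort());
        System.out.println("webapp: " + contextPath(u));
        System.out.println("path:   " + pathWithinWebapp(u));
        System.out.println("query:  " + u.getQuery()); // not part of the path match
    }
}
```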
--->We looked at the book example, pg. 146. A servlet producing XML, useful for pa2. Much like servlet2.
We looked at web.xml from servlet1 and servlet2 as an example of an XML doc with both a namespace and a schema associated with it.
Related example from XML Primer: allow an element to have any XHTML elements and attributes:
<element name="htmlExample">
<complexType>
<sequence>
<any namespace="http://www.w3.org/1999/xhtml"
minOccurs="1" maxOccurs="unbounded"
processContents="skip"/>
</sequence>
<anyAttribute namespace="http://www.w3.org/1999/xhtml"/>
</complexType>
</element>
See examples htmlExample.xml/xsd in $cs639/validate-ns. Note that the <anyAttribute> here is not needed, since there are no global attributes in the XHTML schema.
Another example of a standard namespace is the SOAP namespace, specifically the SOAP Envelope namespace.
This is the request studied above with the standard SOAP envelope around it, from pg. 97. It shows an example with one default NS and one non-default NS in use:
<?xml …?>
<SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"> <-- SOAP-ENV is a prefix
<SOAP-ENV:Body>
<getQuote xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
<symbol>RHAT</symbol>
</getQuote>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
Here we see local names Envelope and Body of the SOAP envelope NS.
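A namespace-aware SAX parser reports exactly this split into namespace URI and local name. The following is a minimal sketch using the JDK's SAX parser; the class name SoapNamesDemo and the {uri}localName output format are choices made for this handout.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the course code.
class SoapNamesDemo {
    // Lists every element in document order as "{namespace URI}localName".
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to get uri and localName filled in
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        names.add("{" + uri + "}" + localName);
                    }
                });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String soap =
              "<SOAP-ENV:Envelope"
            + " xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/'>"
            + "<SOAP-ENV:Body>"
            + "<getQuote xmlns='http://namespaces.cafeconleche.org/xmljava/ch2/'>"
            + "<symbol>RHAT</symbol>"
            + "</getQuote>"
            + "</SOAP-ENV:Body>"
            + "</SOAP-ENV:Envelope>";
        for (String name : elementNames(soap)) {
            System.out.println(name);
        }
    }
}
```

Envelope and Body come out in the SOAP envelope namespace via the prefix, while getQuote and symbol come out in the application's namespace via the default-namespace declaration on getQuote.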
We looked into how the SOAP envelope schema can handle the message part inside, which is designed by the app developers.
It uses <any ...>. This schema is in Appendix B, pp. 969-972. The <any> element is on pg. 971, inside schema element name=”Body”.
We have not yet shown how to validate XML using both the SOAP envelope schema and the app schema for the contents of Body. Need to import one schema into another. That’s a more advanced topic.
Also note there is another SOAP schema in Appendix B, for “encoding”. We will never use this encoding technique. It is now obsolete. Skip pp. 104-end of chapter 2.
We are skipping now to Chapter 5, Reading XML. We should return to the multiple-schema case sometime later.
We will see that reading XML in a Java program (or other programming language) is greatly aided by an XML Parser.
XML Parsers, the quick summary: SAX, DOM are the most important APIs. Only other one we need from Chap. 5 is JAXP. StAX is newly important, but not in Ch. 5.
All these are in the current JDK.
Chapter 5: some discussion of which SAX and DOM parser distribution to choose, ending with conclusion that Xerces is best.
-- Luckily, that’s the one in the current JDK, as predicted on pg. 228.
--So we don’t have to agonize over this, or obtain additional jars, just use the JDK
Starting from the basic idea of reading XML…
In order to read XML we must be able to accept files in UTF-8 encoding and turn them into the Unicode that Java uses.
This is accomplished with the following code snippet from page 215 of our text, also in servlet2’s EchoHtml.java:
Reader reader = new InputStreamReader( in, "UTF-8" );
Here in is an InputStream, i.e., a byte stream.
InputStreamReader is a bridge class from byte streams to character streams (as OutputStreamWriter is on the output side). Given "UTF-8", it knows how to decode UTF-8 bytes into the UTF-16 chars used by Java Strings.
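Here is a small self-contained sketch of that decoding step, using a ByteArrayInputStream in place of a real network or file stream. The class name Utf8Demo is made up; the example bytes spell "café", where the é takes two bytes in UTF-8 but one Java char.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the course code.
class Utf8Demo {
    // Decodes UTF-8 bytes into a Java String by reading chars through an InputStreamReader.
    static String decode(byte[] utf8) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) { // read() returns one char, or -1 at end
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "café" in UTF-8: the é (U+00E9) is the two bytes 0xC3 0xA9.
        byte[] bytes = { 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9 };
        String s = decode(bytes);
        System.out.println(bytes.length + " bytes -> " + s.length() + " chars: " + s);
    }
}
```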
Suppose we were trying to read the XML from page 211-212 of the text, to obtain just the number in the middle:
<?xml version="1.0"?>
<methodResponse>
<params>
<param>
<value><double>28657</double></value>
</param>
</params>
</methodResponse>
We could parse the XML "by hand" to extract the single number, 28657, as is done by the code on page 215, which looks for the string "<value><double>". This is very fragile and will break down if there is whitespace between the start tags.
The code on page 215 is ugly and not fully internationalized. It would break on supplementary Unicode characters, which take two Java chars each.
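To see what "two chars each" means, here is a tiny sketch with one supplementary character, U+1D11E (the musical G clef). The class name is made up; the calls are standard String and Character methods.

```java
// Hypothetical demo class, not part of the course code.
class SupplementaryDemo {
    // Number of Unicode characters (code points) in s, as opposed to Java chars.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String clef = new String(Character.toChars(0x1D11E)); // one Unicode character
        System.out.println("chars:       " + clef.length());
        System.out.println("code points: " + codePoints(clef));
        // The two chars are a surrogate pair; neither is a character by itself.
        System.out.println(Character.isHighSurrogate(clef.charAt(0)));
        System.out.println(Character.isLowSurrogate(clef.charAt(1)));
    }
}
```

Code that handles text one char at a time can split such a pair in the middle, which is the kind of breakage meant above.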
A quick fix to support internationalization is to use String instead of char, as is done in the servlet2 code:
Reader reader = new InputStreamReader(in, "UTF-8"); // as on pg. 215
BufferedReader br = new BufferedReader(reader); // in itself is still the InputStream
StringBuilder sb = new StringBuilder(); // collects the document text
String line;
while ((line = br.readLine()) != null) // read into String, not by bytes
    sb.append(line); // the end-of-line is not preserved, but we don’t need it
// servlet2’s code uses println of PrintWriter to keep eol’s
However, the code following this part on pg. 215 is still dependent on an exact “<value><double>” match. It uses String’s indexOf to search for “<value><double>” in the string, and also for “</double></value>”, and thus traps the location of the desired number.
Note: the following is not covered in class
We could improve this code using the Scanner class, which was new in Java 5. Scanners work something like scanf in C: they parse the text looking for items specified by regular expressions. You can construct a Scanner from almost anything that provides a sequence of chars. The following allows whitespace between <value> and <double>, etc.
Scanner s = new Scanner( in, "UTF-8" ); // in is an InputStream, as above.
// findWithinHorizon returns null if the pattern is not found; horizon 0 means no limit
String found = s.findWithinHorizon( "<value>\\s*<double>(\\d*)</double>\\s*</value>", 0 );
// (\d*) is a "group", like %d in scanf
if ( found != null ) {
    MatchResult result = s.match( );
    String value = result.group( 1 ); // group 0 is the whole match; group 1 is the digits
} else {
    ...
}
where "<value>\\s*<double>(\\d*)</double>\\s*</value>" is a regular expression, really <value>\s*<double>(\d*)</double>\s*</value>, but we had to escape each \ in the String. The parentheses of (\d*) make it a group (of digits) and \s* matches 0 or more whitespace chars
End of skipped-in-class example
But hand parsing is not the way to go. We want a parser that knows XML to do the work for us.
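As a preview of what that buys us, here is a sketch that pulls the same number out of the methodResponse document using the XPath support in the JDK, which runs a real XML parser underneath. The class and method names are made up for this handout; whitespace between the tags no longer matters.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the course code.
class XPathExtractDemo {
    // Returns the text of /methodResponse/params/param/value/double ("" if absent).
    static String extractDouble(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/methodResponse/params/param/value/double",
                                  new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<?xml version=\"1.0\"?>\n"
            + "<methodResponse>\n"
            + "  <params>\n"
            + "    <param>\n"
            + "      <value>\n"
            + "        <double>28657</double>\n"
            + "      </value>\n"
            + "    </param>\n"
            + "  </params>\n"
            + "</methodResponse>\n";
        System.out.println(extractDouble(xml));
    }
}
```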