CS639 Class 10: handout on servlet1

PA2, servlet1, and servlet2 are available; the PA1b solution is coming soon.

You need to install Tomcat at home; see the links near PA2 on the class web page.

Many cases to deal with: XML, with or without a DTD, with or without a schema, with or without a namespace.

No Namespaces: our previous coverage:

XML + DTD

XML + XSD (with no targetNS)

XML+ DTD+XSD: uncommon

With Namespaces:

XML + NS

XML + NS +XSD (with targetNS = NS)

XML+NS+DTD: will look at case of SVG, the graphics language.

However, note that a NS doesn’t by itself have a NS document the way the DTD and XSD have docs. It’s just an identifier URI attached to XML to disambiguate names.

XML + NS case: only the XML is a document. The NS is just a construct, holding all the prefixed local names, plus the unprefixed element names if it’s being used as the default namespace.

In the XML+NS+XSD case, the XSD has the NS URI as its target namespace, so the XSD serves as a document for the NS, along with its job to express the XML’s structure. Thus the NS URI bridges between the XML doc and the XSD doc.

Well-known schemas are found by parsers at well-known places

Parsers need help finding application schemas. Need to cover global attributes for this.

Global Attributes: These can be attached to elements of other namespaces. Look at XLink example, pg. 31. xlink:type and xlink:href are global attributes, belonging to the “http://.../xlink” namespace but added to an element of the “http:/.../Address” namespace.

<ShipTo xmlns="http://ns.cafeconleche.org/Address/"

xmlns:xlink="http://www.w3.org/1999/xlink"

xlink:type="simple" xlink:href="mailto:chezfred@yahoo.com"> <--global attributes

<GiftRecipient>Samuel Johnson</GiftRecipient>

<Street>271 Old Homestead Way</Street >

<City>Woonsocket</City> <State>RI</State> <Zip>02895</Zip>

</ShipTo>
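
A minimal Java sketch (not from the book) of how a namespace-aware parser sees these global attributes. The class name GlobalAttrDemo is made up for illustration, and only one child element is kept to stay short. The attributes are looked up by namespace URI plus local name, so the xlink prefix itself never appears in the lookup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class GlobalAttrDemo {
    public static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Shortened copy of the ShipTo example (single quotes are legal in XML).
    public static final String XML =
        "<ShipTo xmlns='http://ns.cafeconleche.org/Address/'\n"
      + "        xmlns:xlink='http://www.w3.org/1999/xlink'\n"
      + "        xlink:type='simple' xlink:href='mailto:chezfred@yahoo.com'>\n"
      + "  <GiftRecipient>Samuel Johnson</GiftRecipient>\n"
      + "</ShipTo>";

    public static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // JAXP parsers ignore namespaces unless asked
        DocumentBuilder b = f.newDocumentBuilder();
        return b.parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element shipTo = parseRoot(XML);
        // The element belongs to the Address namespace (the default namespace)...
        System.out.println(shipTo.getNamespaceURI());
        // ...while the two attributes belong to the XLink namespace.
        System.out.println(shipTo.getAttributeNS(XLINK_NS, "type"));
        System.out.println(shipTo.getAttributeNS(XLINK_NS, "href"));
    }
}
```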

This xlink construct sounds like an important technology (and has a Wikipedia page), but in fact it isn’t in general use in applications of XML. An exception is SVG, an important graphics language in XML, with a tutorial at http://www.w3schools.com/svg/default.asp. SVG stands for Scalable Vector Graphics. One of their examples is downloaded to sample.svg in our class web page, and you can see that Chrome (but not Internet Explorer) can display it and show a curved text. Here is its XML:

$cs639/sample.svg

xmlns:xlink="http://www.w3.org/1999/xlink">

<defs>

</defs>

</text>

</svg>

You can see the xlink:href="#path1" attribute of <textPath>. Here #path1 is a URL which consists solely of a “fragment”. The fragment part of a URL is at the end, and locates something inside a resource. Here the URL is pointing to another element in the current XML document, the one with id="path1". We also note that SVG doesn’t use xlink:type="simple" as seen in the book example.
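
To make the fragment idea concrete, here is a small sketch using java.net.URI. The class name FragmentDemo is made up, and the base URL for sample.svg is only an assumed location on the class web site, used to show how a fragment-only reference resolves against the document that contains it.

```java
import java.net.URI;

public class FragmentDemo {
    // Returns the fragment part of a URI reference (the text after '#').
    public static String fragmentOf(String ref) {
        return URI.create(ref).getFragment();
    }

    // Resolves a reference against the URI of the document containing it.
    public static String resolve(String base, String ref) {
        return URI.create(base).resolve(URI.create(ref)).toString();
    }

    public static void main(String[] args) {
        System.out.println(fragmentOf("#path1")); // path1

        // Assumed URL for sample.svg, for illustration only.
        String base = "http://www.cs.umb.edu/cs639/sample.svg";
        // A fragment-only reference keeps every other part of the base URI.
        System.out.println(resolve(base, "#path1"));
    }
}
```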

Now it turns out that SVG does not have an XML Schema, but instead uses a DTD for validation. That brings up the strange case of using a namespace along with a DTD, even though DTDs know nothing of namespaces.

We can figure out what the DTD must look like, something like this for <textPath>, to require it to have an attribute named xlink:href:

<!ELEMENT textPath (#PCDATA)>

<!ATTLIST textPath

xlink:href CDATA #REQUIRED>

and the svg element additionally needs an attribute of name xmlns:xlink, and a specific value:

<!ATTLIST svg xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink">

This is pretty odd, and means that in this case the xlink prefix is not a dummy variable, but instead a fixed-in-stone string.
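
The following sketch tries this out with a toy DTD, not the real SVG DTD: it contains only the declarations discussed above, placed in an internal subset. The class name DtdPrefixDemo and the content model for svg are assumptions for illustration. The same document validates with the xlink prefix and fails when the prefix is changed to xl, even though both prefixes are bound to the same namespace URI.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdPrefixDemo {
    // Toy internal DTD subset; the real SVG DTD is far larger.
    static final String DTD =
        "<!DOCTYPE svg [\n"
      + "<!ELEMENT svg (textPath)>\n"
      + "<!ATTLIST svg xmlns:xlink CDATA #FIXED 'http://www.w3.org/1999/xlink'>\n"
      + "<!ELEMENT textPath (#PCDATA)>\n"
      + "<!ATTLIST textPath xlink:href CDATA #REQUIRED>\n"
      + "]>\n";

    public static final String WITH_XLINK =
        "<svg xmlns:xlink='http://www.w3.org/1999/xlink'>"
      + "<textPath xlink:href='#path1'>curved text</textPath></svg>";

    // Same namespace URI, different prefix.
    public static final String WITH_XL =
        "<svg xmlns:xl='http://www.w3.org/1999/xlink'>"
      + "<textPath xl:href='#path1'>curved text</textPath></svg>";

    // Returns the DTD validation error messages (empty list means valid).
    public static List<String> validate(String body) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setValidating(true); // turns on DTD validation
        DocumentBuilder b = f.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        b.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) {
                errors.add(e.getMessage());
            }
        });
        b.parse(new InputSource(new StringReader(DTD + body)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("xlink prefix: " + validate(WITH_XLINK).size() + " errors");
        System.out.println("xl prefix:    " + validate(WITH_XL).size() + " errors");
    }
}
```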

Going back to the address.xml example:

If there is a schema for the Address NS, it must give permission for the “extra” attributes, or validation will fail. The simplest way is to allow any attribute; see address.xsd in $cs639/validate-ns for this:

<?xml version="1.0"?>

<xsd:schema targetNamespace="http://ns.cafeconleche.org/Address/"

xmlns="http://schemas.cs.umb.edu/book"

xmlns:xsd="http://www.w3.org/2001/XMLSchema"

elementFormDefault="qualified">

<xsd:element name="ShipTo">

<xsd:complexType>

<xsd:sequence>

<xsd:element name="GiftRecipient" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>

<xsd:element name="Street" type="xsd:string"/>

<xsd:element name="City" type="xsd:string"/>

<xsd:element name="State" type="xsd:string"/>

<xsd:element name="Zip" type="xsd:string"/>

</xsd:sequence>

<xsd:anyAttribute/> <---add to schema to allow global attributes (or any others)

</xsd:complexType>

</xsd:element>

</xsd:schema>

With the help of the validation service at http://www.w3.org/2001/03/webdata/xsv, with its somewhat more useful error messages, I got this working.

You also must provide the parser with access to the schema that establishes the attribute as a global attribute, namely, the XLink schema. We tell this important fact to the parser by linkage or other means (here linkage).

address.xml in $cs639/validate-ns:

<ShipTo xmlns="http://ns.cafeconleche.org/Address/"

xmlns:xlink="http://www.w3.org/1999/xlink"

xsi:schemaLocation="http://www.w3.org/1999/xlink http://www.w3.org/1999/xlink.xsd"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xlink:href="mailto:chezfred@yahoo.com">

<GiftRecipient>Samuel Johnson</GiftRecipient>

<Street>271 Old Homestead Way</Street >

<City>Woonsocket</City> <State>RI</State> <Zip>02895</Zip>

</ShipTo>

java sax.Counter -s -v -schema address.xsd address.xml

address.xml: 598 ms (6 elems, 2 attrs, 0 spaces, 79 chars)
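
The same check can be sketched in Java with the JDK's javax.xml.validation API instead of sax.Counter. This is an illustration under assumptions, not the class setup: the schemas are cut down to one child element, the one-attribute XLINK_XSD string is a stand-in for the real XLink schema so the example needs no network access, and both schemas are handed to the SchemaFactory directly rather than found through xsi:schemaLocation. The class name AnyAttributeDemo is made up.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class AnyAttributeDemo {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Cut-down Address schema, with or without the <xsd:anyAttribute/> wildcard.
    static String addressXsd(boolean allowAnyAttribute) {
        return "<xsd:schema xmlns:xsd='" + XSD_NS + "'"
             + " targetNamespace='http://ns.cafeconleche.org/Address/'"
             + " elementFormDefault='qualified'>"
             + "<xsd:element name='ShipTo'><xsd:complexType><xsd:sequence>"
             + "<xsd:element name='Street' type='xsd:string'/>"
             + "</xsd:sequence>"
             + (allowAnyAttribute ? "<xsd:anyAttribute/>" : "")
             + "</xsd:complexType></xsd:element></xsd:schema>";
    }

    // Stand-in for the XLink schema: declares href as a global attribute.
    static final String XLINK_XSD =
        "<xsd:schema xmlns:xsd='" + XSD_NS + "'"
      + " targetNamespace='http://www.w3.org/1999/xlink'>"
      + "<xsd:attribute name='href' type='xsd:anyURI'/></xsd:schema>";

    static final String XML =
        "<ShipTo xmlns='http://ns.cafeconleche.org/Address/'"
      + " xmlns:xlink='http://www.w3.org/1999/xlink'"
      + " xlink:href='mailto:chezfred@yahoo.com'>"
      + "<Street>271 Old Homestead Way</Street></ShipTo>";

    public static boolean isValid(boolean allowAnyAttribute) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new Source[] {
            new StreamSource(new StringReader(XLINK_XSD)),
            new StreamSource(new StringReader(addressXsd(allowAnyAttribute)))
        });
        try {
            // With no error handler set, the first validation error is thrown.
            schema.newValidator().validate(new StreamSource(new StringReader(XML)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("with anyAttribute:    " + isValid(true));
        System.out.println("without anyAttribute: " + isValid(false));
    }
}
```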

To restrict the attribute to a particular namespace, add namespace="…" to anyAttribute, for example in address1.xsd:

…

<xsd:anyAttribute namespace="http://www.w3.org/1999/xlink"/>

To test it at http://www.w3.org/2001/03/webdata/xsv: enter the URLs for these files in the form:

http://www.cs.umb.edu/cs639/validate-ns/address.xml http://www.cs.umb.edu/cs639/validate-ns/address.xsd

Linkage to Schema from XML: done with global attributes

Recall book2.xml, with its linkage to schema. This is done with the help of the XML Schema instance namespace (XMLSchema-instance), another standard namespace, and its global attribute noNamespaceSchemaLocation:

<?xml version="1.0" encoding="ISO-8859-1"?>

<book xsi:noNamespaceSchemaLocation="book.xsd"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<title>Data on the Web</title> ...

This is an example of a global attribute (noNamespaceSchemaLocation) that doesn’t need to be in the schema, since it’s part of the infrastructure.

Here the xsi prefix for the XML Schema instance NS is set up with xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance". By “instance” we mean the XML document itself, rather than the schema. The XML document needs to point to its schema, which it does with the help of the XSI namespace.

Obviously we need a different construct for the XSD linkage in the NS+XSD case:

· Without namespace: xsi:noNamespaceSchemaLocation with value “URL”

· With namespace: xsi:schemaLocation with value “URI URL”, two URL-syntax strings separated by whitespace, the first for the namespace URI and the second for the schema’s URL.

Example of second form: address.xml above.
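
As a small illustration of the “URI URL” format, the sketch below splits an xsi:schemaLocation value into (namespace URI, schema URL) pairs. The class name SchemaLocationDemo is made up; a real parser does this internally. The value may hold several pairs, one per namespace, all separated by whitespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaLocationDemo {
    /** Splits an xsi:schemaLocation value into namespace-URI -> schema-URL pairs. */
    public static Map<String, String> parse(String value) {
        String[] tokens = value.trim().split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("expected pairs of 'URI URL': " + value);
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length; i += 2) {
            pairs.put(tokens[i], tokens[i + 1]); // namespace URI, then schema URL
        }
        return pairs;
    }

    public static void main(String[] args) {
        // The value used in address.xml.
        String value = "http://www.w3.org/1999/xlink http://www.w3.org/1999/xlink.xsd";
        for (Map.Entry<String, String> e : parse(value).entrySet()) {
            System.out.println("namespace " + e.getKey() + " -> schema at " + e.getValue());
        }
    }
}
```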

Intro to pa2--set up servlet to serve out PA1 XML. Use the request URL to specify the Java file to process (the REST way).

Ex: http://users2.cs.umb.edu:xxxxx/pa2/examples.sort.Sortex.xml to get XML describing SortEx.

Then write a client that uses XPath to print out methods.
Write a second client that uses SAX directly.

Deploying Servlets: servlet1, servlet2

servlet1 is in $cs639/servlet1, and similarly for servlet2.

To try it out:

cd cs639

mkdir servlet1

cp -r $cs639/servlet1/* servlet1

cd servlet1

ant build

ant deploy -> copies to your webapps dir in Tomcat.

ant test1

Looking at servlet1 Handout:

web.xml specifies how the request URI is mapped to the servlet’s .class file, so tomcat knows what to load into its JVM.

This is done by match-up on servlet-name content (here “HelloWorld”) between the servlet and servlet-mapping elements.

So here the URL-ending “/servlet/HelloWorld” specifies “cs639.xml.servlet.HelloWorld”, understood to be under WEB-INF/classes in the webapp’s deployed area.

i.e., $CATALINA_HOME/webapps/servlet1/WEB-INF/classes/cs639/xml/servlet/HelloWorld.class.

The full URL that matches, for my tomcat, is http://users2.cs.umb.edu:11600/servlet1/servlet/HelloWorld

Here we see host, port, webapp name, and finally /servlet/HelloWorld, which is handled by servlet1’s web.xml.

In fact, longer URLs (with query strings or ;xxx or both) would also match, such as http://users2.cs.umb.edu:11600/servlet1/servlet/HelloWorld?x=10

The x=10 would never be processed by this servlet, however.
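
The servlet-name match-up described above can be sketched as a lookup over a web.xml document. This is only an illustration of the idea, not how Tomcat implements it. The WEB_XML string is an assumed, cut-down version of servlet1's web.xml (no namespace or schema linkage), the class name WebXmlLookup is made up, and only exact url-pattern matches are handled.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WebXmlLookup {
    // Assumed, simplified web.xml content for servlet1.
    public static final String WEB_XML =
        "<web-app>\n"
      + "  <servlet>\n"
      + "    <servlet-name>HelloWorld</servlet-name>\n"
      + "    <servlet-class>cs639.xml.servlet.HelloWorld</servlet-class>\n"
      + "  </servlet>\n"
      + "  <servlet-mapping>\n"
      + "    <servlet-name>HelloWorld</servlet-name>\n"
      + "    <url-pattern>/servlet/HelloWorld</url-pattern>\n"
      + "  </servlet-mapping>\n"
      + "</web-app>";

    // Text of the first descendant element with the given tag (assumed present).
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    /** Maps a URL path (after the webapp name) to a servlet class name, or null. */
    public static String classForPath(String webXml, String path) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(webXml)));
        NodeList mappings = doc.getElementsByTagName("servlet-mapping");
        for (int i = 0; i < mappings.getLength(); i++) {
            Element mapping = (Element) mappings.item(i);
            if (!childText(mapping, "url-pattern").equals(path)) continue;
            String name = childText(mapping, "servlet-name");
            // Find the <servlet> element carrying the same servlet-name.
            NodeList servlets = doc.getElementsByTagName("servlet");
            for (int j = 0; j < servlets.getLength(); j++) {
                Element servlet = (Element) servlets.item(j);
                if (childText(servlet, "servlet-name").equals(name)) {
                    return childText(servlet, "servlet-class");
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(classForPath(WEB_XML, "/servlet/HelloWorld"));
    }
}
```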

--->We looked at the book example, pg. 146. A servlet producing XML, useful for pa2. Much like servlet2.

We looked at web.xml from servlet1 and servlet2 as an example of an XML doc with both a namespace and a schema associated with it.

Related example from XML Primer: allow an element to have any XHTML elements and attributes:

<any namespace="http://www.w3.org/1999/xhtml"

minOccurs="1" maxOccurs="unbounded"

processContents="skip"/>

</sequence>

</complexType>

</element>

See examples htmlExample.xml/xsd in $cs639/validate-ns. Note that the <anyAttribute> here is not needed, since there are no global attributes in the XHTML schema.

Another example of a standard namespace is the SOAP Namespace, specifically the SOAP Envelope Namespace.

This is the request studied above with the standard SOAP envelope around it, from pg. 97:

This shows an example with one default NS and one non-default NS in use.

<?xml …?>

<SOAP-ENV:Envelope

xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"> <-- SOAP-ENV is a prefix

<SOAP-ENV:Body>

</getQuote>

</SOAP-ENV:Body>

</SOAP-ENV:Envelope>

Here we see local names Envelope and Body of the SOAP envelope NS.
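
A namespace-aware SAX parse reports each element as a (namespace URI, local name) pair, which is a handy way to see which names belong to which NS. In this sketch the class name SoapNamesDemo is made up, and the body content is hypothetical: the http://example.com/quotes namespace and the symbol child are placeholders, not the book's request.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SoapNamesDemo {
    // Envelope from the SOAP envelope NS; body content in a made-up default NS.
    public static final String XML =
        "<SOAP-ENV:Envelope xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/'>\n"
      + "  <SOAP-ENV:Body>\n"
      + "    <getQuote xmlns='http://example.com/quotes'>\n"
      + "      <symbol>XYZ</symbol>\n"
      + "    </getQuote>\n"
      + "  </SOAP-ENV:Body>\n"
      + "</SOAP-ENV:Envelope>";

    /** Returns each element name as {namespaceURI}localName, in document order. */
    public static List<String> names(String xml) throws Exception {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setNamespaceAware(true); // needed to get uri and localName filled in
        List<String> out = new ArrayList<>();
        f.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.add("{" + uri + "}" + localName);
            }
        });
        return out;
    }

    public static void main(String[] args) throws Exception {
        for (String name : names(XML)) {
            System.out.println(name);
        }
    }
}
```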

We looked into how the SOAP envelope schema can handle the message part inside, which is designed by the app developers.

It uses <any ...>. This schema is in Appendix B, pp. 969-972. The <any> element is on pg. 971, inside schema element name=”Body”.

We have not yet shown how to validate XML using both the SOAP envelope schema and the app schema for the contents of Body. Need to import one schema into another. That’s a more advanced topic.

Also note there is another SOAP schema in Appendix B, for “encoding”. We will never use this encoding technique. It is now obsolete. Skip pp. 104-end of chapter 2.

We are skipping now to Chapter 5, Reading XML. We should return to the multiple schema case later sometime.

Reading XML in Java programs

We will see that reading XML in a Java program (or other programming language) is greatly aided by an XML Parser.

XML Parsers, the quick summary: SAX, DOM are the most important APIs. Only other one we need from Chap. 5 is JAXP. StAX is newly important, but not in Ch. 5.

All these are in the current JDK.

Chapter 5: some discussion of which SAX and DOM parser distribution to choose, ending with conclusion that Xerces is best.

-- Luckily, that’s the one in the current JDK, as predicted on pg. 228.

--So we don’t have to agonize over this, or obtain additional jars, just use the JDK

Starting from the basic idea of reading XML…

In order to read XML we must be able to accept files in UTF-8 encoding and turn them into the Unicode that Java uses.

This is accomplished with the following code snippet from page 215 of our text, also in servlet2’s EchoHtml.java:

Reader reader = new InputStreamReader( in, "UTF-8" );

Here in is an InputStream, i.e., a byte stream.

InputStreamReader is a bridge class from bytes to characters (as OutputStreamWriter is on the output side, from characters to bytes). Here it decodes UTF-8 bytes into the UTF-16 chars that Java Strings use.
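
A small sketch of the decoding step, using an in-memory byte array in place of a real file or network stream. The class name DecodeDemo and the sample text are made up for illustration. The point is that the byte count and the char count differ once a non-ASCII character appears.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    /** Decodes a UTF-8 byte stream into a Java String. */
    public static String readAll(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) { // read() returns one UTF-16 char at a time
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "cafe" with an accented e (U+00E9), which takes two bytes in UTF-8.
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(bytes));
        System.out.println(bytes.length);  // 5 bytes
        System.out.println(text.length()); // 4 chars
    }
}
```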

Suppose we were trying to read the XML from page 211-212 of the text, to obtain just the number in the middle:

<?xml version="1.0"?>

<param>

</param>

</params>

</methodResponse>

We could parse the XML "by hand" to extract the single number, 28657, as is done by the code on page 215, which looks for the string "<value><double>". This is very fragile: it breaks down if there is whitespace between the start tags.

The code on page 215 is also ugly and not fully internationalized. It would break on supplementary Unicode characters, which occupy two Java chars (a surrogate pair).
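
To see what a supplementary character looks like in Java, here is a tiny sketch (the class name SupplementaryDemo is made up). U+1D11E, the musical G clef, lies outside the 16-bit range, so one character occupies two chars in a String.

```java
public class SupplementaryDemo {
    /** Builds a one-character String from a Unicode code point. */
    public static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String clef = fromCodePoint(0x1D11E); // MUSICAL SYMBOL G CLEF
        System.out.println(clef.length());                             // 2 chars
        System.out.println(clef.codePointCount(0, clef.length()));     // 1 character
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(clef.charAt(1)));  // true
    }
}
```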

A quick fix to support internationalization is to use String instead of char, as is done in the servlet2 code:

Reader reader = new InputStreamReader(in, "UTF-8"); // as on pg. 215; in is the InputStream

BufferedReader br = new BufferedReader(reader); // separate name, so it does not clash with in

StringBuilder sb = new StringBuilder();

String line;

while ((line = br.readLine()) != null) // read whole lines into a String, not char by char

    sb.append(line); // the end-of-line is not preserved, but we don’t need it

// servlet2’s code uses println of PrintWriter to keep eol’s

However, the code following this part on pg. 215 still depends on an exact “<value><double>” match. It uses String’s indexOf to search for “<value><double>” in the string, and also for “</double></value>”, and thus brackets the location of the desired number.
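
The indexOf approach can be sketched as follows. This is a reconstruction of the idea only, not the book's code, and the class name IndexOfExtract is made up. The second call shows the fragility: one space between the tags and the search fails.

```java
public class IndexOfExtract {
    static final String OPEN = "<value><double>";
    static final String CLOSE = "</double></value>";

    /** Returns the text between the tags, or null if the exact tag strings are absent. */
    public static String extract(String xml) {
        int start = xml.indexOf(OPEN);
        if (start < 0) return null;
        start += OPEN.length();             // skip past the start tags
        int end = xml.indexOf(CLOSE, start);
        if (end < 0) return null;
        return xml.substring(start, end);
    }

    public static void main(String[] args) {
        // Exact match: works.
        System.out.println(extract("<param><value><double>28657</double></value></param>"));
        // Whitespace between the tags: the exact-string search finds nothing.
        System.out.println(extract("<param><value> <double>28657</double> </value></param>"));
    }
}
```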

Note: the following is not covered in class

We could improve this code using the Scanner class, which was introduced in Java 5. Scanners work something like scanf in C: they parse the text looking for items specified by regular expressions. You can construct a Scanner from almost anything that provides a sequence of chars. The following allows whitespace between <value> and <double>, etc.

Scanner s = new Scanner( in, "UTF-8" ); // in is an InputStream, as above.

// findWithinHorizon returns null if the pattern is not found; horizon 0 means no limit

String found = s.findWithinHorizon( "<value>\\s*<double>(\\d*)</double>\\s*</value>", 0 );

// (\d*) is a "group", like %d in scanf

if ( found != null ) {

    MatchResult result = s.match( );

    String value = result.group( 1 ); // group 0 is the whole match; group 1 is the digits

} else {

...

}

where "<value>\\s*<double>(\\d*)</double>\\s*</value>" is a regular expression, really <value>\s*<double>(\d*)</double>\s*</value>, but we had to escape each \ in the String. The parentheses of (\d*) make it a group (of digits), and \s* matches 0 or more whitespace chars.

End of skipped-in-class example

But hand parsing is not the way to go. We want a parser that knows XML to do the work for us.
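
As a preview of what a parser buys us, here is a sketch that pulls out the same number with the JDK's XPath API, which parses the XML underneath. The class name ParserExtract is made up, and the XML string assumes the usual XML-RPC response layout (methodResponse/params/param/value/double). Whitespace between tags no longer matters.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ParserExtract {
    // Assumed XML-RPC response layout, with whitespace between the tags.
    public static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<methodResponse>\n"
      + "  <params>\n"
      + "    <param>\n"
      + "      <value> <double>28657</double> </value>\n"
      + "    </param>\n"
      + "  </params>\n"
      + "</methodResponse>";

    /** Returns the text of the double element, located by its path in the tree. */
    public static String extract(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/methodResponse/params/param/value/double",
                              new InputSource(new StringReader(xml))).trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract(XML)); // 28657
    }
}
```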