CS639 – class 04

Join the Google group! See the invitation in your cs.umb.edu email or wherever you forward it.

HW1 due, have set up forum for pa1a, also added more files to dir, including sample output for Scan1. Q’s on pa1a?

First steps for pa1a: (like readme.txt in pa1a)

transfer pa1a dir or pa1a.zip to your PC, say to c:\cs639\pa1a
“ant build” as first check, “ant test00” runs DumpMethods on Grid
In Eclipse, create a Java project, choose the option "Create project from existing source", and browse to your directory.  Make sure you are using Java 1.6.

(You may need to fix the “Default output folder” from pa1a/bin to pa1a/build/classes.)

Also--

Check that the directory structure is right: build/classes/…
In the project’s Properties>Java Build Path>Source, you should see Source folders “pa1a/src”, and at the bottom, under “Default output folder”, “pa1a/build/classes”. In the Package Explorer, you should see both the src and input directories, plus build.xml and README.
As a check, delete the project, keeping files, and recreate it—useful way out of problems.

Note: we are doing this project “the hard way”, from basics, so you can see the nitty-gritty details. You could use the powerful JAXB package to convert Java objects to XML, but then you would have to create objects for each construct, an unnecessary job. A more practical approach is JDK6’s XMLStreamWriter, part of the StAX API (a “streaming parser” on the reading side), which became available in Java after our book was written. You can use it if you want. Several tutorials exist on the web.
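For those curious about the XMLStreamWriter route, here is a minimal sketch. The class name StaxSketch and the element names are made up for illustration and are not the pa1a output format; the point is only the start/characters/end call pattern.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxSketch {
    // Builds <book><title>...</title><price>...</price></book> as a string.
    // Element names are illustrative assumptions, not the pa1a format.
    static String writeBook(String title, String price) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("book");
        w.writeStartElement("title");
        w.writeCharacters(title);      // the writer escapes & and < in text
        w.writeEndElement();           // </title>
        w.writeStartElement("price");
        w.writeCharacters(price);
        w.writeEndElement();           // </price>
        w.writeEndElement();           // </book>
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeBook("Ants & Bees", "10.99"));
    }
}
```

Note that the writer tracks which element is open, so writeEndElement takes no name, and text escaping is handled for us.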

We were talking about DTDs and XML Schemas, specifically about book.dtd and book.xsd of the handout on recursive XML.

We went over the various content models in book.dtd. Note that although an element can have a “content model” in a DTD, that description does not have the power to say that there should be a number represented by the CharData, as in <price>10.99</price>.

XML Schema:

- in XML !

- allows us to build up type definition from parts:

ex: Image Type – how it has attribute “source”

Figure Type – has child element of ImageType.

- allows a Section Type defined with child element – Section Type.

- All field-level types in this example are just strings (common).

- But in general, supports useful types for element content, such as xsd:decimal, unlike DTDs
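To see what a typed element buys us, here is a small sketch using the JDK's javax.xml.validation API. The one-element schema is an assumption made up for this example (it is not book.xsd); it declares price as xsd:decimal, something a DTD cannot express.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class DecimalPriceSketch {
    // Hypothetical one-element schema: <price> must hold an xsd:decimal.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='price' type='xsd:decimal'/>"
      + "</xsd:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator v = sf.newSchema(new StreamSource(new StringReader(XSD))).newValidator();
        try {
            v.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {   // default handler throws on a validity error
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<price>10.99</price>"));   // a decimal
        System.out.println(isValid("<price>cheap</price>"));   // not a decimal
    }
}
```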

Look at book.xsd on handout and see the build-up of types, familiar to programmers.

We can reorder the various type definitions. We can also reorder the lines of the DTD.

Note only one top-level “xsd:element” element under the xsd:schema element, for the root element named book. All the rest fall under this, of various types given by the xsd:complexType elements. Somewhat like a Java class with all its fields, themselves having type definitions.

The type of the book element has no name; it’s an “anonymous type”. We could give it a name if we wanted, with a little more text. But since it shows up only in one place here, it doesn’t need a name.

The schema does not describe exactly the same structures as the DTD. The schema only allows one figure per section, while the DTD allows additional (figure, p) pairs, so a section could have, say, 3 figures as long as each is followed by a p. We could modify the schema to match the DTD this way, but it would require us to use another xsd element type, the <xsd:group> element, to form the (figure, p) group that itself is allowed to repeat.
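A sketch of the repeating (figure, p) group idea, checked with the JDK validator. The schema here is a cut-down assumption, not the handout's book.xsd: the section content model (a title followed by repeated pairs), the group name FigP, and the string types are all made up for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class FigureGroupSketch {
    // Hypothetical schema: a section is a title, then any number of (figure, p) pairs.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:group name='FigP'><xsd:sequence>"
      + "<xsd:element name='figure' type='xsd:string'/>"
      + "<xsd:element name='p' type='xsd:string'/>"
      + "</xsd:sequence></xsd:group>"
      + "<xsd:element name='section'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='title' type='xsd:string'/>"
      + "<xsd:group ref='FigP' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator v = sf.newSchema(new StreamSource(new StringReader(XSD))).newValidator();
        try {
            v.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {   // thrown on the first validity error
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Two figures, each followed by a p.
        System.out.println(isValid("<section><title>t</title>"
            + "<figure>f1</figure><p>a</p><figure>f2</figure><p>b</p></section>"));
        // Two figures in a row: the second pair is incomplete.
        System.out.println(isValid("<section><title>t</title>"
            + "<figure>f1</figure><figure>f2</figure><p>b</p></section>"));
    }
}
```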

We drew a tree of elements to show its structure, described by DTD content models and XML Schema complex-type declarations.

Note the “extra” element declaration in book.dtd:

<!ELEMENT c (#PCDATA)>

where there are no <c>’s in book.xml and no use of c in other element declarations in the DTD. This is harmless, because a declaration only comes into play if there is a <c> element in the document. There’s no requirement that the declarations all be related, or all used. In fact, the same DTD can be used for documents with different root elements, so the following is valid:

<?xml version="1.0"?>
<!DOCTYPE c SYSTEM "book.dtd">
<c> foo </c>

Though this looks stupid, this capability can be used in serious ways. For example, we might have one DTD to describe both requests and response messages, one with root element request and the other with root element response, but common subtrees.
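The request/response idea can be tried out with the JDK's validating DOM parser. In this sketch the declarations are hypothetical (request, response, and payload are made-up names), and they are inlined as an internal DTD subset rather than read from a file, just to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdRootSketch {
    // Hypothetical shared declarations: two possible roots, one common subtree.
    static final String DECLS =
        "<!ELEMENT request (payload)>"
      + "<!ELEMENT response (payload)>"
      + "<!ELEMENT payload (#PCDATA)>";

    // Validates body against DECLS, with doctypeRoot as the DOCTYPE name.
    static boolean isValid(String doctypeRoot, String body) throws Exception {
        String xml = "<!DOCTYPE " + doctypeRoot + " [" + DECLS + "]>" + body;
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setValidating(true);                      // turn on DTD validation
        DocumentBuilder b = f.newDocumentBuilder();
        final boolean[] ok = { true };
        b.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { ok[0] = false; }   // validity error
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        b.parse(new InputSource(new StringReader(xml)));
        return ok[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("request", "<request><payload>x</payload></request>"));
        System.out.println(isValid("response", "<response><payload>y</payload></response>"));
        // DOCTYPE name and actual root element disagree.
        System.out.println(isValid("request", "<response><payload>y</payload></response>"));
    }
}
```

The DOCTYPE line is what picks the root: the same declarations accept either root, but the document's root element has to match the name the DOCTYPE gives.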

Similarly, in the XML schema, we can add a top-level c element:

<xsd:element name="c" type="xsd:string"/>

and then a simple <c>-rooted document is valid:

<?xml version="1.0" encoding="ISO-8859-1"?>
<c xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:noNamespaceSchemaLocation="book.xsd">
foo
</c>

Note that the XML schema linkage does not specify the root element name like the DTD does.

book.xsd is an example of XML schema for structured data, no “mixed content”, i.e. no semi-structured data.

Mixed Content: semi-structured data

Mixed content, or “semi structured” data: not in our book.xml. Need to change it a little. Suppose a <p> element could look like this:

<p> <c> The </c> most <c> important </c> … <web> … </web> </p>

A p element can have text between any number of occurrences of <c> and <web> markup. That’s the DTD idea of mixed content, expressed by:

<!ELEMENT p (#PCDATA|c|web)*>

This allows c and web elements anywhere, in any number, inside a p element. No elements within elements, however. The star is required if you use the |.

From the standard:

[51]	Mixed	::=	'(' S? '#PCDATA' (S? '|' S? Name)* S? ')*'
			| '(' S? '#PCDATA' S? ')'

So we see that the simple <!ELEMENT p (#PCDATA)> is officially “mixed content”, although used all the time for structured data.
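A quick way to try out the mixed-content declaration for p is the JDK's validating parser. This is a sketch under assumptions: c and web are declared here as text-only (the notes give that declaration only for c), and the declarations are inlined as an internal DTD subset.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class MixedDtdSketch {
    // p is mixed content; c and web are assumed to hold text only.
    static final String DECLS =
        "<!ELEMENT p (#PCDATA|c|web)*>"
      + "<!ELEMENT c (#PCDATA)>"
      + "<!ELEMENT web (#PCDATA)>";

    static boolean isValid(String body) throws Exception {
        String xml = "<!DOCTYPE p [" + DECLS + "]>" + body;
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setValidating(true);                      // turn on DTD validation
        DocumentBuilder b = f.newDocumentBuilder();
        final boolean[] ok = { true };
        b.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { ok[0] = false; }   // validity error
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        b.parse(new InputSource(new StringReader(xml)));
        return ok[0];
    }

    public static void main(String[] args) throws Exception {
        // Text and elements interleaved, in any order and number.
        System.out.println(isValid("<p><c>The</c> most <c>important</c> site <web>x</web></p>"));
        System.out.println(isValid("<p><web>x</web> then <c>y</c></p>"));
        // A web element nested inside c: c is declared text-only here.
        System.out.println(isValid("<p><c><web>x</web></c></p>"));
    }
}
```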

XML Schema & Semi-Structured data.

XML Schema has a different definition of mixed content than DTD.

- much more restrictive than DTD;

- can add mixed="true" to the complexType element. All this means is that ordinary text can show up between the elements, which are otherwise constrained as before (without mixed="true").

- Thus we can’t allow markup like arbitrary <c>’s and <web>’s as above

Ex we can do in XML Schema: “form letter” where we want just one name element and one amt element—can handle this easily in XML Schema.

<letter> Dear <name> Joe </name> ,

You have just won <amt> 1000 </amt> dollars.

</letter>

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="letter">
<xsd:complexType mixed="true">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="amt" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>

Of course it’s even easier in a DTD:

<!ELEMENT letter (#PCDATA|name|amt)*>

But this does not specify just one name followed by one amt. All you can specify in a mixed-content DTD content spec is which elements can appear, not their order or number of occurrences.
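To check that the schema version really does pin down one name followed by one amt, here is a sketch using the JDK validator with the letter schema inlined as a string. The class and method names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class MixedLetterSketch {
    // The form-letter schema: mixed text around exactly one name, then one amt.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='letter'><xsd:complexType mixed='true'><xsd:sequence>"
      + "<xsd:element name='name' type='xsd:string'/>"
      + "<xsd:element name='amt' type='xsd:string'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator v = sf.newSchema(new StreamSource(new StringReader(XSD))).newValidator();
        try {
            v.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {   // thrown on the first validity error
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<letter>Dear <name>Joe</name>, you have just won "
            + "<amt>1000</amt> dollars.</letter>"));
        // amt before name: the sequence order is still enforced.
        System.out.println(isValid("<letter>Dear <amt>1000</amt> <name>Joe</name></letter>"));
        // Two name elements: only one is allowed.
        System.out.println(isValid("<letter><name>Joe</name><name>Sue</name>"
            + "<amt>1000</amt></letter>"));
    }
}
```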

Note that Harold warns against use of mixed content in data-oriented apps, at top of pg. 19. It is very useful for XHTML, and other document-oriented apps.

ANY Good XML Content for an element

We can go further in the direction of free-formed XML content, beyond “mixed”--

Elements can have any well-formed XML as contents in DTD or XML Schema.

Again let us free up the contents of the p elements of book.xml, now to any well-formed XML.

DTD: <!ELEMENT p ANY>

XML Schema: set up type for p elements:

<xsd:complexType name="PType">
<xsd:sequence>
<!-- xsd:any is the "any" of XML Schema -->
<xsd:any processContents="skip" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>

Now put

<xsd:element name="p" type="PType"/>

in the Section definition.

Important extensibility capability: we can package up any XML in a document generally constrained by a vocabulary.

The embedded XML could have its own vocabulary (ex. XHTML). You have to use namespaces to differentiate multiple vocabularies (so we won’t pursue this now.)

- See the XML Schema Primer, sec. 5.5, linked under Resources.
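The PType idea can be tried with the JDK validator. In this sketch p is declared as a top-level element so the example stands alone (in the notes it lives inside the Section definition), and the child element names in the test documents are arbitrary.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class AnyContentSketch {
    // PType as in the notes; p is made top-level only for this sketch.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:complexType name='PType'><xsd:sequence>"
      + "<xsd:any processContents='skip' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xsd:sequence></xsd:complexType>"
      + "<xsd:element name='p' type='PType'/>"
      + "</xsd:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator v = sf.newSchema(new StreamSource(new StringReader(XSD))).newValidator();
        try {
            v.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {   // thrown on the first validity error
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Undeclared elements, attributes, and nesting are all skipped over.
        System.out.println(isValid("<p><b>bold</b><x y='1'><z/></x></p>"));
        // Text directly inside p, next to an element.
        System.out.println(isValid("<p>text <b>bold</b></p>"));
    }
}
```

One thing this sketch suggests: as written, PType accepts any child elements but not text directly inside p. Allowing text there as well would take mixed="true" on the complexType.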

Stylesheets: just the idea

XML is used to describe data, not its presentation (to the user).

XML  --(stylesheet)-->  HTML

(holds the data)        (presentation)

Stylesheet:

- how to present data;

- CSS simple, too limited (skip)

- XSL – full power of functional programming language = XSLT + XSL–FO

o XSLT is the transformation language, run by an XSLT processor

o XSL stylesheets say what to do

XSLT and its stylesheets can transform XML to other XML also, so it’s not just for UI

Can JDK do XSLT ? Sure….
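As a taste of XSLT from the JDK, here is a minimal sketch using the javax.xml.transform API. The stylesheet is a made-up one-template example that turns a letter document into a small HTML-like fragment; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    // Hypothetical stylesheet: pull the name out of a letter and wrap it in markup.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/letter'>"
      + "<p>Winner: <b><xsl:value-of select='name'/></b></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        // The stylesheet is itself XML, compiled here into a Transformer.
        Transformer t = tf.newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(
            "<letter>Dear <name>Joe</name>, you have just won <amt>1000</amt> dollars.</letter>"));
    }
}
```

The data (the letter) and the presentation (the stylesheet) stay separate; swapping in a different stylesheet gives a different rendering of the same XML.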

Start on XPath Basics

Read the first 3 pages of Chap. 16 for now.