CS639 – class 1

 

Syllabus

Home PC

JEE Eclipse – IDE      

JDK for Java 6.

ant – related to the command line (see linked setup doc from the class webpage)

 

You will be delivering on UNIX/Linux. You can do tests over here using ant on our systems.

 

For Thursday you should run apply in S/3/158.  I’ll set up a Google group and invite you via your local email. 

 

greeting.xml (the HelloWorld of XML)

 

<?xml version="1.0"?>

<greeting>  <-- start tag                  |

Hello, XML! <-- content of this element    |

            <-- character data             |  all this is an XML fragment, and the greeting element

</greeting> <-- end tag                    |

 

“greeting” is the tag name, chosen by the designer

 

 

Page 9 Example (a fragment):

 

------

<Product>

      <Name> Birdsong clock </Name>             // this is an element, child of Product element

      <Quantity> 12 </Quantity>

      <SKU>244</SKU>

      <Price currency="USD"> 21.95 </Price>  // element

      ...   // see text

      ...

</Product>

------

 

Everything between the two ------ lines is another element: Product, together with its child elements.

 

<Price currency="USD"> is the start tag of the Price element; currency is an attribute of that element.

"USD" is the attribute value for the currency attribute.

Note that the price value is in the element content, and the attribute says what the units mean, a common XML usage pattern.
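
To make the attribute/content split concrete, here is a minimal sketch that reads both parts of the Price element using the DOM parser that ships with the JDK. The class name PriceDemo and the helper parseRoot are made up for this illustration; they are not part of the class materials.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not from the course code.
class PriceDemo {
    // Parses an XML string into a DOM tree and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element price = parseRoot("<Price currency=\"USD\"> 21.95 </Price>");
        String units = price.getAttribute("currency");   // the attribute value
        String amount = price.getTextContent().trim();   // the element content
        System.out.println(amount + " " + units);        // prints: 21.95 USD
    }
}
```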

 

This looks like class Product with fields Name, Quantity etc.

 

But not all Elements look like that! We can have multiple elements of the same name---

 

<Customer>

      <Name> Joe </Name>

      <Phone> 617-999-9999 </Phone>

      <Phone> 781-999-9999 </Phone>

</Customer>
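
Because Phone can repeat, a Java view of Customer would need a list rather than a single field. The sketch below, assuming the DOM parser in the JDK, collects every Phone element into a List. CustomerDemo and its phones method are illustrative names only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not from the course code.
class CustomerDemo {
    // Returns the trimmed text of every Phone element in the document.
    static List<String> phones(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("Phone");
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Customer><Name> Joe </Name>"
            + "<Phone> 617-999-9999 </Phone>"
            + "<Phone> 781-999-9999 </Phone></Customer>";
        System.out.println(phones(xml)); // prints: [617-999-9999, 781-999-9999]
    }
}
```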

 

Well-formedness: strictly nested elements, quotes around attribute values, etc.

 

Is Foo.xml well-formed? A quick check is to browse to it: all the browsers know how to display well-formed XML, and they typically report a parse error otherwise.
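
The same check can be done from Java. This is a rough sketch, assuming only the SAX parser in the JDK: the parser reports any well-formedness violation as a fatal error, which surfaces as a SAXParseException. WellFormedCheck is a made-up name for this illustration and is not the Counter program.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not from the course code.
class WellFormedCheck {
    // Returns true if the parser reads the whole document without a fatal error.
    static boolean isWellFormed(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false; // e.g. bad nesting or an unquoted attribute value
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<greeting>Hello, XML!</greeting>")); // true
        System.out.println(isWellFormed("<a><b></a></b>"));                   // false
        System.out.println(isWellFormed("<Price currency=USD>1</Price>"));    // false
    }
}
```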

 

Validity

You have an XML document and also a second document that describes how the XML should be structured. We are in transition between two ways of describing XML structure.

 

DTDs – old way, still common. Foo.dtd – read in chapter 1 about this.

 

The newer way (XML Schema) is written in XML itself (a DTD is not XML). We will mainly use XML Schema.

To validate means to run a program on the XML file and its DTD or schema. The JDK has XML support in its libraries but no ready-made validation program, so someone has to write a main program that calls the relevant methods in the JDK.  We have such a program, pointed to by Harold's XML Bible, called Counter.

 

Reading :          - chapter 1 of text to pg 50

                        - linked chap 20 of XML Bible to “complex types”

                        - explains how to use Counter.java (from Apache) to validate XML.

In your .cshrc, be sure to set an environment variable "cs639" with value /data/htdocs/cs639, so you can see the class web page directory from UNIX.

Using the JDK validation support via the "Counter" program.

The UNIX directory $cs639/validate has the Counter.java program, in  $cs639/validate/sax/Counter.java

All we need is this main program and the JDK. We can just do "javac Counter.java" to generate Counter.class

We run Counter from the validate directory, because Counter.java has a "package sax" statement.

1. Well-formedness checking

Here's valid_greeting.xml, the basic example XML file:

  

<?xml version="1.0"?>
    <greeting>
        Hello XML!
    </greeting>

 

    cd $cs639/validate

    java sax.Counter valid_greeting.xml               checks for well-formedness


I'm assuming no CLASSPATH defined (the default classpath is . – current directory).  Check for an old CLASSPATH setting with "env|grep CLASS" and clear it out of your .cshrc if you find it. We will be using ant to set up our classpath for Java builds on UNIX as well as command-line builds on Windows.

2. DTD validation.  Look at valid_greeting1.xml to see DOCTYPE pointing to greeting.dtd, saying "that's my DTD"

    <?xml version="1.0"?>
    <!DOCTYPE greeting SYSTEM "greeting.dtd">
    <greeting>
        Hello XML!
    </greeting>

greeting.dtd  is simply:
    <!ELEMENT greeting (#PCDATA)>

 

This says that there is an element with tag name "greeting" whose content is text. PCDATA stands for "parsed character data": text that the parser scans for markup, so a < in it would be treated as the start of a child element's tag. If you want to put in a literal < char, you need to escape it as &lt; so it isn't taken to be the start of a child element.  More on this later.
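
A small sketch of the &lt; escape, assuming the DOM parser in the JDK: the document text contains &lt;, and the parser hands back a plain < in the element content. EscapeDemo and textOf are illustrative names only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not from the course code.
class EscapeDemo {
    // Returns the text content of the root element after parsing.
    static String textOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The escaped form in the file becomes a real < in the parsed text.
        System.out.println(textOf("<greeting>1 &lt; 2</greeting>")); // prints: 1 < 2
    }
}
```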

java sax.Counter -v valid_greeting1.xml             checks for DTD validity

// use -s -v for schema validation

// -v alone gives DTD validation. The default validation is DTD.
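
For a sense of what a DTD-validating main program involves, here is a minimal sketch using the SAX parser in the JDK. It is not the Counter source. Two assumptions to note: the DTD is placed inline in the DOCTYPE (an internal subset) instead of in greeting.dtd so the example is self-contained, and DtdValidationCheck is a made-up name. Validity problems are reported through the handler's error() method, which DefaultHandler ignores unless overridden.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not from the course code.
class DtdValidationCheck {
    // Same declaration as greeting.dtd, but inline (an assumption for this sketch).
    static final String DOCTYPE =
        "<!DOCTYPE greeting [<!ELEMENT greeting (#PCDATA)>]>";

    // Returns true if the document is well-formed and valid against its DTD.
    static boolean isValid(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // treat a validity error as a failure
            }
        };
        try {
            factory.newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(DOCTYPE + "<greeting>Hello XML!</greeting>"));   // true
        System.out.println(isValid(DOCTYPE + "<greeting><b>Hi</b></greeting>"));    // false
    }
}
```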

 

XML Schema validation:

Here is the schema, in greeting.xsd:

 

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="greeting" type="xsd:string"/>
</xsd:schema>

 

Lots of boilerplate, and it is in XML format. Note the element named "xsd:element". It describes the greeting element: it has name "greeting" and type "xsd:string".

 

Here is the XML document with linkage to the schema, in valid_greeting2.xml

 

<?xml version="1.0"?>
<greeting xsi:noNamespaceSchemaLocation="greeting.xsd"                <--extra attribute stuck in greeting's start tag to link to schema
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
Hello XML!
</greeting>

java sax.Counter -v -s valid_greeting2.xml             checks for XML Schema validity
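
Schema validation can also be driven directly from Java through the javax.xml.validation package in the JDK. The sketch below is one way to do it and is not how Counter is written. Assumptions: the schema text from greeting.xsd is embedded as a string instead of being found through xsi:noNamespaceSchemaLocation, and SchemaValidationCheck is a made-up name.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo class, not from the course code.
class SchemaValidationCheck {
    // Same content as greeting.xsd, embedded so the sketch is self-contained.
    static final String XSD =
        "<?xml version=\"1.0\"?>"
        + "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:element name=\"greeting\" type=\"xsd:string\"/>"
        + "</xsd:schema>";

    // Returns true if the document is valid against the greeting schema.
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validity errors are thrown as exceptions.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<greeting>Hello XML!</greeting>"));  // true
        System.out.println(isValid("<farewell>Bye</farewell>"));         // false
    }
}
```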

 

If you want to see the XML support classes in the JDK in use here---

 

java -verbose sax.Counter -s -v valid_greeting.xml       // shows all the classes loaded