CS639 - XML and Semi-structured Data

Spring, 2012


Official Description: A study of the international standard eXtensible Markup Language (XML) and related semi-structured data technologies for application to Web programming. Special attention will be given to combining data from multiple sites and online databases, and to the transformation, display, and extraction of data from XML documents for data exchange, resource discovery, and the building of interactive web applications.

More specifically, for 2012: XML parsing, generation, querying, mostly with Java. Web services including RESTful services, implemented with Java.

Professor: Betty O’Neil (eoneil at cs.umb.edu)
Class meets TuTh 7:00-8:15 in S-3-028, at least temporarily

Office Hours: TuTh 2:45-3:30, 6:15-6:45 in S-3-169

Prerequisites:
Significant Java experience, including use of the Java Collection classes, and one of CS451/651, CS636, or CS437/637 (or other compiler-related or database application experience, or CS420, CS450, or CS430/630).

Syllabus

Textbooks:
1. Mostly for the first part of the term:

Processing XML with Java, by Elliotte Rusty Harold, Addison-Wesley, ISBN 0-201-77186-1. Available free at http://www.cafeconleche.org/books/xmljava/, but worth paying for in hardcopy (1071 pages!)

2. Mostly for the later part of  the term, but has useful intro topics:

REST in Practice, by Jim Webber, Savas Parastatidis, and Ian Robinson, O'Reilly, ISBN 978-0-596-80582-1 (at Amazon)

Recommended books:
XML 1.1 Bible, 3rd ed., by Elliotte Rusty Harold, Wiley, 2004, ISBN 0-7645-4986-3. XML from first principles, i.e., more basic than our text. Five chapters are free online: Chap. 15 on XSL, Chap. 20 on XML Schemas, all of them

Core Java 2, Volume II--Advanced Features, by Cay Horstmann and Gary Cornell, Sun/Prentice Hall, ISBN 0-13-111826-9.  This book has a good chapter on JDBC, plus many other useful Java topics.  Get the latest version (J2SE 5.0).  Volume I is great too, and especially relevant if you are coming from C/C++ because of its little side notes on the differences between Java and C++.

Getting Ready:
Check out your development PC: my old PC, a 3GHz Pentium 4 Windows XP Professional system with 1.5GB of memory, is fine for running the development tools we need. Your system should have at least 1GB of memory, and Windows XP or Windows 7. See software development setup instructions for UNIX and your home PC.

First week (Jan. 24, 26):
Get a UNIX account for cs639 by running apply, even if you already have a UNIX account here. Set up your software environment for UNIX and your home PC as detailed above in Getting Ready.  Read Chap. 1 of the text to pg. 40, plus  Chap 20 of Harold's XML Bible 1.1  to the section heading "Complex Types."

Tues. Jan 24 notes Intro, simple examples of XML, DTD, XML Schemas, validation by Counter.java
Thurs. Jan 26 notes Network basics, HTTP, HTML.
Tues., Jan 31 notes XML basics, handout: Recursive XML Example
Join our new Google group(email): your invitation has been sent to your cs.umb.edu address.
Thurs., Feb. 2 notes XML basics (Chap. 1 of Harold), weather example
Tues., Feb. 7 notes XPath (Chap. 16 of Harold), XML in Latin-1 and UTF-8.
Thurs., Feb. 9 notes UTF-8 output from Java, HTTP in Java, HTTP GET, POST, SOAP
Tues., Feb. 14 notes Intros to SOAP, REST, tomcat, servlets (handout). Note new "Notes on schema testing" for pa1b linked below
Thurs., Feb 16 notes Tomcat basics, intro to XML Namespaces
Tues., Feb. 21 notes XML Namespaces and XML Schemas (handout)
Thurs., Feb. 23 notes Intro to pa2, servlet1 (handout), Start Chap. 5, Reading XML
Tues, Feb. 28 notes Reading XML in Java programs, Chap 5, 6
Thurs., Mar. 1 notes SAX, Ch 6, 7, intro DOM
Tues, Mar 6 notes DOM programming, DOM with namespaces (handout)
Thurs., Mar. 8 notes Finish DOM coverage (Ch. 9, plus Ex. 5.6 and Ex. 10.5 and handout)
Tues., Mar. 20 notes XPath on XML with Namespaces (handout), Oracle XML
Thurs., Mar. 22 notes Midterm Review
Midterm Exam: Tues., Mar. 27. Reading, Practice Midterm Exam, Practice Midterm Solution
Thurs, Mar. 29 notes XML Design, Intro REST
Tues, Apr 3 notes REST: intro to implementation in Java (handout) and Java EE in general
Thurs, Apr 5 notes Java REST clients using Jersey (handout)
Tues., Apr. 10 notes orderService example (handout)
Thurs., Apr 12 notes JAXB (handout)
Tues., Apr 17 notes firstRest2 example, more on orderService, layers (handout)
Thurs., Apr. 19 notes REST services: more advanced features, JAX-RS subresources
Tues, Apr. 24 notes Hypermedia services (handout), vendor-specific media types
Thurs, Apr. 26 notes Using multiple schemas, type extension (handout), start on SOAP Web Services (handout)
Tues., May 1 notes WSDL, Amazon S3 REST/SOAP storage service
Thurs., May 3 notes XML to/from Relational Databases (with XML support)
Tues., May 8 notes Review

Final Exam Thurs, May 17 3pm-6pm, M-2-209 (different from Wiser listing of Fri evening)
New: Practice Final (Solution)

Assignments

hw1 due Thurs., Feb. 2, in class, on paper. Web basics, XML Validation, Intro to ant, etc. hw1 Solution
pa1
pa1a due Thurs Feb. 9 provided project (zip), pa1b due Sat., Feb. 18 Notes on schema testing. Turning non-XML Data into XML pa1a Solution
hw2 due Tues., Feb. 28 HTTP, using Tomcat, installing Tomcat on UNIX, servlet hw2 Solution
pa2 due Tues., Mar. 21 (after spring break) Delivering XML by POX servlet. Starting point: pa1b Solution (zip) pa2 Solution (zip)
hw3 due Tues., Apr. 10, Getting Started with REST, JUnit4 firstRest (zip) (hw3 Solution)
orderService (zip) README (edited 4/7), wadl, xsd, web.xml (edited 4/6) eclipse setup: source window (now deprecated: project facets, JAX-RS Capabilities)
pa3 due Mon, Apr. 23 REST Web Service Clients pa3 Solution (zip)--thanks to Max Ward
New:
pa3 Solution using only POJOs (zip)--thanks to Mohamed Kahin and David Lowery

orderDap (zip) wadl, xsd, Link.xsd Hypermedia orderService Project from Chap. 5 
pa4 due Tuesday, May 8 JAX-RS Server-side development or Client to hypermedia service
pa4 is optional due to shortness of time. Just be sure to understand orderDap, or at least its coverage in Chap. 5

Resources

XML: We are using version 1.0 (5th ed.) Standard: W3C Recommendation, referenced in text, pg. 13 Sun XML tutorial
XML Schema: We are using the second edition, with namespace http://www.w3.org/2001/XMLSchema. W3C primer   Chap 20 of Harold's XML Bible 1.1  XML validator (Counter), with examples: without namespaces with namespaces with namespaces
Survey of XML standards
XPath: We are using version 1.0. Standard: W3C Recommendation (v1.0)  XPath processor from JDK (TestXPath)  XPath tutorial with "lab", interactive XPath processor Harold's 2008 article
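
As a rough illustration of XPath 1.0 evaluation with the JDK's built-in javax.xml.xpath package, here is a minimal sketch. It is not the course's TestXPath program: the class name XPathSketch, the sample document, and the expression are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSketch {
    // Hypothetical sample document, not taken from the course materials.
    static final String SAMPLE =
        "<books>"
      + "<book year=\"2002\"><title>Processing XML with Java</title></book>"
      + "<book year=\"2010\"><title>REST in Practice</title></book>"
      + "</books>";

    /** Evaluates an XPath 1.0 expression and returns the text content of each selected node. */
    static List<String> select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // NODESET asks for all matching nodes, returned as a DOM NodeList.
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            // Covers both a malformed expression and an unparseable document.
            throw new IllegalArgumentException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The predicate compares the year attribute numerically.
        System.out.println(select(SAMPLE, "/books/book[@year > 2005]/title"));
        // prints [REST in Practice]
    }
}
```

This sketch handles only documents without namespaces; namespace-qualified names would additionally need a NamespaceContext set on the XPath object.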

JAXP: Java 6 has version 1.4, based on the Apache Xerces library  JAXP chapter in J2EE 1.4 tutorial at Sun
Java6 JAXP/DOM/SAX compatibility
SAX: Java 6 has version 2.0.2. SAX chapter in the J2EE 1.4 tutorial at Sun (not in JEE5 tutorial at all, or in Java 6 Tutorials either)
DOM: Java 6 supports Level 3 DOM APIs.  DOM chapter in the J2EE 1.4 tutorial at Sun
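
To make the difference between the two JAXP parsing styles concrete, here is a small sketch that reads the same document once with SAX (event callbacks, nothing kept in memory) and once with DOM (a full in-memory tree). The class name JaxpSketch, the method names, and the sample order document are assumptions made up for illustration, not course code. It uses multi-catch, so it assumes Java 7 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class JaxpSketch {
    // Hypothetical sample document, not taken from the course materials.
    static final String SAMPLE =
        "<order id=\"7\"><item>latte</item><item>cookie</item></order>";

    /** SAX: counts start-element events whose local name matches. */
    static int countElementsSax(String xml, final String name) {
        final int[] count = {0}; // mutable holder the anonymous handler can update
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is filled in
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (name.equals(localName)) {
                        count[0]++;
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("SAX parse failed", e);
        }
        return count[0];
    }

    /** DOM: builds the whole tree, then collects the text of matching elements. */
    static List<String> elementTextsDom(String xml, String tagName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tagName); // document order
            List<String> texts = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("DOM parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElementsSax(SAMPLE, "item")); // prints 2
        System.out.println(elementTextsDom(SAMPLE, "item"));  // prints [latte, cookie]
    }
}
```

SAX suits large inputs where only a pass over the events is needed; DOM is more convenient when the program has to navigate or modify the document after parsing.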

HTML, especially links and forms: Read Basic HTML tutorial, specifically through Tables, then tackle  HTML Forms tutorial

URLs: See Ian Graham's tutorial at UToronto for full (absolute) URLs.  Then Web Diner's tutorial for the important idea of relative URLs and their use in HTML.
HTTP: Tutorial, read through section 3, Sample HTTP Exchange; spec (RFC 2616, by Fielding)

Important network tools
wget: command-line tool for HTTP GET, POST  documentation download for PC  To install, unzip and copy *.dll and wget.exe to %CATALINA_HOME%/bin, which is on your Path. wget is available on our UNIX systems too.
tcpmon: shows TCP messages "on the wire" for a certain port. More info. Screenshots

tomcat:  Linux installation info Assigned ports Installing Tomcat on PC
 

Servlets (part of Java EE): We are using version 2.4 servlets (spec), servlet tutorial, esp. the sections "First Servlets", "Processing The Request Form Data" (i.e. the query string from the URL), and "Processing the Request: HTTP Request Headers"  servlet1 example with build.xml  (zip) servlet2 (zip)

Servlet API: Javadoc in JEE API. The corresponding jar is servlet-api.jar, which we use from the tomcat lib area, so you need to install tomcat and set the CATALINA_HOME environment variable before trying to compile a servlet using a supplied build.xml. You'll also need the TOMCAT_URL environment variable for ant testx targets.

Java: We are using Java 6 or 7, aka Java 1.6/1.7: API, language spec, tutorial, Packages tutorial.

Ant: We are using version 1.7.0. Documentation: for the first tutorial, follow "Developing with Ant", then "Hello World with Ant"; for a second tutorial, start over from the Table of Contents, select "Using Ant", then "Writing a Simple Buildfile".

JUnit4: We are using version 4.8, included in eclipse 3.6. Home at sourceforge.net (read JUnit Cookbook, look at FAQ), CS636 Intro handout, Vogel's JUnit4 in eclipse tutorial

Java EE: API intro list of technologies docs

REST in general: Wikipedia article, Fielding's thesis

JAX-RS version 1.1 (Java API for RESTful Web services, part of Java EE, API linked above): spec in Java EE Tutorial

Jersey, the implementation of JAX-RS we're using: Jersey REST: API, Jersey User Guide, Vogel's REST with Java (JAX-RS) using Jersey - Tutorial. Project without Sec. 5: firstRest (zip). Project with REST service of Sec. 5 as well: firstRest2 (zip), form for new Todo, list of Todos by GET to /todos

JAXB (part of Java 1.6 SE, API linked above): home, overview, spec. Tutorials: see Sec. 4 of Vogel's REST with Java (JAX-RS) using Jersey - Tutorial and his JAXB tutorial linked from there. Long tutorial

JAX-WS: for SOAP Web Services spec  implementation