CS639: XML and Semi-structured Data

Spring, 2013


Official Description: A study of the international standard Extensible Markup Language (XML) and related semi-structured data technologies for application to Web programming. Special attention will be given to combining data from multiple sites and online databases, and to the transformation, display, and extraction of data from XML documents for data exchange, resource discovery, and the building of interactive web applications.

More specifically, for 2013: XML parsing, generation, and querying, mostly with Java; Web services, including RESTful services, implemented with Java.

Professor: Betty O’Neil (eoneil at cs.umb.edu)
Class meets TuTh 4:00-5:15 in M-2-419 (changed from MW 4:00)

Office Hours: TuTh 2:30-3:30, 6:15-6:45 in S/3/169

Prerequisites:
Significant Java experience, including use of the Java Collections classes, and one of CS451/651, CS636, or CS437/637 (or other compiler-related or database application experience, or CS420, CS450, or CS430/630).

Syllabus

Textbooks:
1. Mostly for the first part of the term:

Processing XML with Java, by Elliotte Rusty Harold, Addison-Wesley, ISBN 0-201-77186-1. Available free at http://www.cafeconleche.org/books/xmljava/, but worth paying for in hardcopy (1071 pages!).

2. Mostly for the later part of the term, but has useful intro topics:

REST in Practice, by Jim Webber, Savas Parastatidis, and Ian Robinson, O'Reilly, ISBN 978-0-596-80582-1 (at Amazon)

Recommended books:
XML 1.1 Bible, 3rd ed., by Elliotte Rusty Harold, Wiley, 2004, ISBN 0-7645-4986-3. XML from first principles, i.e., more basic than our text. Five chapters are free online: Chap. 15 on XSL, Chap. 20 on XML Schemas, all of them

Core Java 2, Volume II--Advanced Features, by Cay Horstmann and Gary Cornell, Sun/Prentice Hall, ISBN 0-13-111826-9. This book has a good chapter on JDBC, plus many other useful Java topics. Get the latest version (J2SE 5.0). Volume I is great too, and especially relevant if you are coming from C/C++ because of its little side notes on the differences between Java and C++.

Getting Ready:
Check out your development PC: my old PC, a 3GHz Pentium 4 Windows XP Professional system with 1.5GB of memory, runs the development tools we need without trouble. Your system should have at least 1GB of memory, and Windows XP or Windows 7 (or 8? untested). See software development setup instructions for UNIX and your home PC.

First week (Jan. 28, 30):
Get a UNIX account for cs639 by running apply, even if you already have a UNIX account here. Set up your software environment for UNIX and your home PC as detailed above in Getting Ready. Read Chap. 1 of the text to pg. 40, plus Chap. 20 of Harold's XML Bible 1.1 to the section heading "Complex Types."

Mon., Jan. 28 notes Intro, XML Validation
Class switched to TuTh 4:00 in M-2-419
Thurs., Jan. 31 notes Networking Basics
Tues., Feb. 5 notes Reading the XML Standard (linked below)
Thurs., Feb. 7 notes Recursive XML (handout)
Tues., Feb. 12 notes XPath (link for current KBOS data (uses XSL for display), also see KBOS.xml in $cs639/xpath)
Thurs., Feb. 14 notes Character Encodings, start on Chap. 2.
Tues., Feb. 19 notes Intro to SOAP, REST, tomcat
Thurs., Feb. 21 notes Servlets, tomcat, XML Namespaces
Tues., Feb. 26 notes Namespaces and XML Schemas (handout)
Thurs., Feb. 28 notes Servlets, SVG, Global Attributes, start on reading XML.
Tues., Mar. 5 notes XML Parsers, SAX in particular
Thurs., Mar. 7 notes SAX
Tues., Mar. 12 notes DOM (handout)
Thurs., Mar. 14 notes DOM data model (and vs. XPath), validation, etc.
Tues., Mar. 26 notes Advanced DOM/XPath: handling prefixes (handout)
Thurs., Mar. 28 notes Intro to REST
Tues., Apr. 2 Midterm Review
Thurs., Apr. 4 Midterm Exam: Midterm Reading, Practice Midterm Exam, Solution
Tues., Apr. 9 notes firstRest Project (handout)
Thurs., Apr. 11 notes Annotations (handout) (files) firstRest Java clients using Jersey (handout)
Tues., Apr. 16 notes orderService and Chap. 4
Thurs., Apr. 18 notes JAXB (handout)
Tues., Apr. 23 notes GET to Collection URI, hierarchical URI handling (handout, handout)
Thurs., Apr. 25 notes orderService software architecture, examples of REST services
Tues., Apr. 30 notes orderDap project
Thurs., May 2 notes orderDap DAP
Tues., May 7 notes SOAP, XML to/from DBs
Thurs., May 9 Student presentations on REST APIs for Twitter, Facebook, Netflix, and Amazon S3
Tues., May 14 notes Final Review Reading for Final

Final Exam: Thurs., May 23, 6:30-9:30 (we'll try to start at 6:15) in W-1-048
Practice Final, Solution

Assignments

hw1, due Thurs., Feb. 7 in class, on paper: Web basics, XML Validation, Intro to ant, tomcat. hw1 solution
pa1: pa1a due Wed., Feb. 13 in your cs639/pa1a directory, pa1b due Fri., Feb. 22 in cs639/pa1b
pa1a starter project (zip). Notes on validation and schemas for pa1b. Solution (zip)
hw2, due Thurs., Feb. 28 in class, on paper: Install Linux tomcat, XPath, Namespaces. hw2 solution
pa2, due Sunday, Mar. 24 in your cs639/pa2 directory: POX Servlets, Reading XML via XPath and SAX. Solution (zip)
hw3, due Tues., Apr. 16: Getting Started with REST Web Services. hw3 solution
pa3, due Monday, Apr. 29: REST Web Service Clients. orderService (zip) (README) (image of Java Build Path Source Tab) Solution (zip)
New! orderDap (zip), implementation of Chap. 5 Hypermedia Service (you can just read Chap. 5 if you want)

Resources

Using Putty Tunnels to access ports of cs.umb.edu systems.

Last year's CS 639 notes, etc.

XML: We are using version 1.0 (5th ed.). Standard: W3C Recommendation (the "XML Standard"), referenced in the text, pg. 13
XML Schema: We are using the second edition, with namespace http://www.w3.org/2001/XMLSchema. W3C primer, Chap. 20 of Harold's XML Bible 1.1, XML validator (Counter), with examples: without namespaces, with namespaces, with namespaces
Survey of XML standards
XPath: We are using version 1.0. Standard: W3C Recommendation (v1.0), XPath processor from JDK (TestXPath), XPath tutorial with "lab", interactive XPath processor, Harold's 2008 article
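
As a rough illustration of running XPath 1.0 expressions with the XPath processor built into the JDK, here is a minimal sketch. The class name XPathSketch, the sample orders document, and the helper method names are made up for this example; they are not the course's TestXPath program or the KBOS data.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSketch {
    // Hypothetical sample document, not taken from the course files.
    static final String XML =
        "<orders>"
        + "<order id=\"1\"><item>latte</item></order>"
        + "<order id=\"2\"><item>cookie</item></order>"
        + "</orders>";

    // Returns the string value of the first node selected by the expression.
    static String firstString(String xml, String expr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns how many nodes the expression selects.
    static int countNodes(String xml, String expr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstString(XML, "/orders/order[@id='1']/item")); // latte
        System.out.println(countNodes(XML, "//order"));                      // 2
    }
}
```

This sketch uses no namespaces; documents with namespace prefixes need a NamespaceContext set on the XPath object.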

JAXP: Java 6 has version 1.4, based on the Apache Xerces library. JAXP chapter in J2EE 1.4 tutorial at Sun
Java6 JAXP/DOM/SAX compatibility
SAX: Java 6 has version 2.0.2. SAX chapter in the J2EE 1.4 tutorial at Sun (not in JEE5 tutorial at all, or in Java 6 Tutorials either)
DOM: Java 6 supports Level 3 DOM APIs. DOM chapter in the J2EE 1.4 tutorial at Sun
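
To contrast the two JAXP parsing styles listed above, here is a small sketch that counts elements of a given name once with a SAX handler (event callbacks, no tree in memory) and once with a DOM tree. The class name SaxDomSketch, the sample document, and the method names are assumptions for illustration only, not course code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxDomSketch {
    // Hypothetical sample document.
    static final String XML =
        "<orders><order><item>latte</item><item>cookie</item></order></orders>";

    // SAX: the parser calls startElement for each start tag as it streams through the input.
    static int countWithSax(String xml, final String name) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // The default factory is not namespace-aware, so compare the qualified name.
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) { // broad catch keeps the sketch short and Java 6 compatible
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    // DOM: the whole document is parsed into a tree first, then queried.
    static int countWithDom(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(name).getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithSax(XML, "item")); // 2
        System.out.println(countWithDom(XML, "item")); // 2
    }
}
```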

HTML, especially links and forms: Read Basic HTML tutorial, specifically through Tables, then tackle HTML Forms tutorial

URLs: See Ian Graham's tutorial at UToronto for full (absolute) URLs. Then Web Diner's tutorial for the important idea of relative URLs and their use in HTML.
HTTP: Tutorial, read through section 3, Sample HTTP Exchange. Spec (RFC 2616, by Fielding)
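
On the relative URLs mentioned above: a browser resolves each relative reference against the URL of the page that contains it. A minimal sketch of the same resolution using java.net.URI follows; the class name RelativeUrlSketch and the example.com URLs are placeholders, not real course locations.

```java
import java.net.URI;

class RelativeUrlSketch {
    // Resolves a relative reference against an absolute base URL.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/cs639/notes/index.html"; // placeholder base URL
        // Same directory as the base document.
        System.out.println(resolve(base, "hw1.html"));
        // Up one directory, then down into pa1.
        System.out.println(resolve(base, "../pa1/readme.html"));
        // A leading slash starts over from the server root.
        System.out.println(resolve(base, "/images/logo.gif"));
    }
}
```

The three calls print http://www.example.com/cs639/notes/hw1.html, http://www.example.com/cs639/pa1/readme.html, and http://www.example.com/images/logo.gif.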

Important network tools
wget: command-line tool for HTTP GET, POST: documentation, download for PC. To install, unzip and copy *.dll and wget.exe to %CATALINA_HOME%/bin, which is on your Path. wget is available on our UNIX systems too.
tcpmon: shows TCP messages "on the wire" for a certain port. More info. Screenshots

tomcat: Linux installation info Assigned ports (New) Installing Tomcat on PC
 

Servlets (part of Java EE): We are using version 2.4 servlets (spec), servlet tutorial, esp. the sections "First Servlets", "Processing The Request Form Data" (i.e. the query string from the URL), and "Processing the Request: HTTP Request Headers"; servlet1 example with build.xml (zip), servlet2 (zip)

Servlet API: Javadoc in JEE API. The corresponding jar is servlet-api.jar, which we use from the tomcat lib area, so you need to install tomcat and set the CATALINA_HOME environment variable before trying to compile a servlet using a supplied build.xml. Also you'll need the TOMCAT_URL environment variable for ant testx targets.

Java: We are using Java 6 or 7, aka Java 1.6/1.7: API, language spec, tutorial, Packages tutorial.

Ant: We are using version 1.7.0. Documentation: for the first tutorial, follow "Developing with Ant", then "Hello World with Ant". For a second tutorial, start over from the Table of Contents, select "Using Ant", then "Writing a Simple Buildfile".

JUnit4: We are using version 4.8, included in eclipse 3.6. Home at sourceforge.net (read JUnit Cookbook, look at FAQ), CS636 Intro handout, Vogel's JUnit4 in eclipse tutorial

Java EE: API, intro, list of technologies, docs

REST in general: Wikipedia article, Fielding's thesis

JAX-RS version 1.1 (Java API for RESTful Web services, part of Java EE, API linked above): spec in Java EE Tutorial

Jersey, the implementation of JAX-RS we're using: Jersey REST: API, Jersey User Guide, Vogel's REST with Java (JAX-RS) using Jersey - Tutorial. Project without Sec. 5: firstRest (zip). Project with REST service of Sec. 5 as well: firstRest2 (zip), form for new Todo, list of Todos by GET to /todos

JAXB (part of Java 1.6 SE, API linked above): home, overview, spec. Tutorials: see Sec. 4 of Vogel's REST with Java (JAX-RS) using Jersey - Tutorial and his JAXB tutorial linked from there. Long tutorial

JAX-WS: for SOAP Web Services: spec, implementation