CS639 Class 8

HW2 is available, due a week from Tuesday.

Good intro to HTTP headers: HTTP Headers For Dummies.

Try out the Linux tomcat install by this coming Tuesday to leave time for help.

Look at the HelloWorld servlet, which can be found by browsing to http://users2.cs.umb.edu:11600 (with a tunnel if off-site) and following the link to Example Servlets:

import java.io.*;

import javax.servlet.*;

import javax.servlet.http.*;

public class HelloWorld extends HttpServlet {

public void doGet(HttpServletRequest request, HttpServletResponse response)

throws IOException, ServletException

{

response.setContentType("text/html");

PrintWriter out = response.getWriter();

out.println("<html>");

out.println("<head>");

out.println("<title>Hello World!</title>");

out.println("</head>");

out.println("<body>");

out.println("<h1>Hello World!</h1>");

out.println("</body>");

out.println("</html>");

}

}

· The HttpServlet object provides a framework for constructing servlets and provides many services

· In the HelloWorld servlet, the code overrides the doGet method for handling GET requests

· The HttpServletRequest and HttpServletResponse provided as arguments to doGet are important objects, ready to use

· The first thing to do is to set the content type. This is a form of typing for the output message.

· The content type also sets the character encoding used by the response object's Writer: Latin-1 for text/html, UTF-8 for text/xml.

· Here we see text/html, so the Latin-1 character set (ISO 8859-1) is used.
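Why the encoding matters: the Writer has to turn each character into bytes, and the same character can come out as different bytes under different encodings. Here is a minimal sketch using only the standard library (the class and method names are made up for illustration, and no servlet is involved):

```java
import java.nio.charset.StandardCharsets;

// Sketch: the same character becomes different bytes under different encodings.
class CharsetDemo {
    // Number of bytes needed to encode the text in Latin-1 (ISO 8859-1).
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Number of bytes needed to encode the same text in UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // e with an acute accent, written as an escape
        System.out.println("Latin-1 bytes: " + latin1Length(eAcute));
        System.out.println("UTF-8 bytes:   " + utf8Length(eAcute));
    }
}
```

Plain ASCII text such as the Hello World page is byte-for-byte the same in both encodings, so the difference only shows up once accented or non-Latin characters appear in the output.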

Execution:

The user browses to http://users2.cs.umb.edu:11600/examples/servlets/servlet/HelloWorldExample.

The browser will connect to sf08.cs.umb.edu on port 11600 and send the following:

GET /examples/servlets/servlet/HelloWorldExample HTTP/1.1

(header)

and receives back from this servlet:

(header)

<html>

<head>

<title>Hello World!</title>

</head>

<body>

<h1>Hello World!</h1>

</body>

</html>

+ The second example is Request Info. Here doPost calls doGet, which is often done in servlets, as explained below.

import java.io.*;

import javax.servlet.*;

import javax.servlet.http.*;

public class RequestInfo extends HttpServlet {

public void doGet(HttpServletRequest request, HttpServletResponse response)

throws IOException, ServletException

{

response.setContentType("text/html");

PrintWriter out = response.getWriter();

out.println("<html>");

out.println("<head>");

out.println("<title>Request Information Example</title>");

out.println("</head>");

out.println("<body>");

out.println("<h3>Request Information Example</h3>");

out.println("Method: " + request.getMethod());

out.println("Request URI: " + request.getRequestURI());

out.println("Protocol: " + request.getProtocol());

out.println("PathInfo: " + request.getPathInfo());

out.println("Remote Address: " + request.getRemoteAddr());

out.println("</body>");

out.println("</html>");

}

/**

* We are going to perform the same operations for POST requests

* as for GET methods, so this method just sends the request to

* the doGet method.

*/

public void doPost(HttpServletRequest request, HttpServletResponse response)

throws IOException, ServletException

{

doGet(request, response);

}

}

doPost can call doGet because of how request parameters are delivered:

- in a GET, the query string appears after the ? in the URI

- in a POST, the same kind of string is contained in the body of the request

- both are encoded the same way

- the web container parses the string in either case and puts the results in the request object as parameters

- this enables both doGet and doPost to access the parameters in the same way
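To see why one parser can serve both cases, here is a rough sketch of the decoding step the container does for us. It is not Tomcat's code: the class name, the sample URI, and the parameter names are invented, it assumes UTF-8 for the percent escapes, and it keeps only one value per name (the real request object can hold several).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of form-parameter decoding, the same for a GET query string and a POST body.
class FormParams {
    // Splits "name=value&name2=value2" and decodes '+' and %XX escapes.
    static Map<String, String> parse(String encoded) {
        Map<String, String> params = new LinkedHashMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return params;
        }
        for (String pair : encoded.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8"); // assumption: UTF-8 escapes
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Hypothetical request: the same parameters sent two ways.
        String getUri = "/app/search?title=Data+on+the+Web&year=1999";
        String postBody = "title=Data+on+the+Web&year=1999";
        String queryString = getUri.substring(getUri.indexOf('?') + 1);
        System.out.println("GET:  " + parse(queryString));
        System.out.println("POST: " + parse(postBody));
    }
}
```

In a real servlet none of this is needed: request.getParameter("title") returns the decoded value whether the request was a GET or a POST.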

For REST, we’ll want to use doDelete and doPut as well—follow the “JEE API” link on the class web page for details.

Using telnet for HTTP

telnet is the bare-bones way for a user to use a TCP/IP stream connection. You fire telnet up and then what you type goes out on the stream connection, and what comes back is shown on the screen. You can talk directly to various servers.

You don't need a browser to send an HTTP request. You can use telnet:

telnet users2.cs.umb.edu 11600

GET / HTTP/1.0

(blank line to signify the end of the header)

Note: use HTTP/1.0 to avoid having to fill in anything real in the header. Browsers use HTTP/1.1.
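The same bare-bones conversation can be held from a Java program, which shows there is nothing special about telnet here. The following is a sketch only (class and method names are made up): it opens a TCP connection, sends the request line plus the blank line, and prints whatever comes back. Run with no arguments, it just prints the request text, so it can be tried without a server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch: send an HTTP/1.0 request over a plain TCP socket, as telnet would.
class RawHttpGet {
    // The text we would type into telnet: the request line, then a blank line.
    static String buildRequest(String path) {
        return "GET " + path + " HTTP/1.0\r\n\r\n";
    }

    // Opens a connection, sends the request, and returns everything the server sends back.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            StringBuilder reply = new StringBuilder();
            String line;
            // With HTTP/1.0 the server closes the connection when the response is complete.
            while ((line = in.readLine()) != null) {
                reply.append(line).append('\n');
            }
            return reply.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            // No host and port given: just show what would go out on the connection.
            System.out.print(buildRequest("/"));
            return;
        }
        System.out.print(fetch(args[0], Integer.parseInt(args[1]), "/"));
    }
}
```

Usage, assuming the server is reachable: java RawHttpGet.java users2.cs.umb.edu 11600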

Tomcat Site Layout

CATALINA_HOME is an environment variable that specifies the top-level directory for Tomcat.

We can get Tomcat from $cs639/tomcat.zip for home use; directions to come.

The Tomcat directory structure looks like this:

$CATALINA_HOME or %CATALINA_HOME%

bin - for Tomcat itself: we use startup.bat, shutdown.bat, startup.sh, shutdown.sh

conf - contains server.xml, which we'll need to edit to put in the port numbers

logs - log files; look here when in trouble

lib - libraries needed by Tomcat, including the JEE classes we need to build with

webapps - web applications go here

… (ignore for now)

webapps is the root of this Tomcat's website, i.e., of its URI space, just as /data/htdocs is the root of the departmental website running under Apache.

eoneil’s top level directory is /home/eoneil/cs639/tomcat6.0, public in the UNIX filesystem (unlike your own cs639 dir). Under /home/eoneil/cs639/tomcat6.0/webapps/examples/ you will find:

- jsp, servlets directories

- WEB-INF

    - classes - contains .class and .java files

    - web.xml - file containing configuration information for the servlets of this app

    - lib - libraries for this web app

Tomcat uses the URI (like /examples/servlets/servlet/HelloWorldExample) to find the class file it needs to run the servlet

A simpler example: hello world in its own webapp: servlet1

Under /home/eoneil/cs639/tomcat6.0/webapps/servlet1/ you will find:

- META-INF (one file generated by eclipse)

- WEB-INF

    - classes - contains .class files in a package tree (cs639/xml/servlet/HelloWorld.class)

    - lib (empty)

    - web.xml - file containing configuration information for the servlet of this app

Important Config files: we need to know more about these:

conf/server.xml: system-wide config (also context.xml)

webapps/webappname/WEB-INF/web.xml: application-level config

Webapp Deployment

Each webapp has a directory in Tomcat’s webapps directory. In fact, webapp deployment simply involves copying the webapp’s files and directories into its directory in webapps. Tomcat is on the lookout for new or updated webapps, checking once a second, and will start using a newly deployed webapp within seconds. This assumes Tomcat is set up for development, as ours is. Production Tomcat servers can be told not to check so often, to save CPU.
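Since deployment is just a copy, it can be done with cp -r, an IDE, or a few lines of Java. Here is a sketch of the copy step, run against temporary stand-in directories so it is safe to try anywhere. The class and method names are invented; a real deployment would target $CATALINA_HOME/webapps/servlet1 instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Sketch: "deploying" a webapp by copying its directory tree into webapps.
class DeployByCopy {
    // Copies every directory and file under source to the same relative place under target.
    static void copyTree(Path source, Path target) throws IOException {
        try (Stream<Path> paths = Files.walk(source)) {
            paths.forEach(from -> {
                Path to = target.resolve(source.relativize(from));
                try {
                    if (Files.isDirectory(from)) {
                        Files.createDirectories(to);
                    } else {
                        Files.createDirectories(to.getParent());
                        Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Builds a tiny stand-in webapp in a temp directory and copies it into a stand-in webapps.
    static boolean deployDemo() {
        try {
            Path build = Files.createTempDirectory("servlet1-build");
            Path webapps = Files.createTempDirectory("webapps");
            Path webInf = build.resolve("WEB-INF");
            Files.createDirectories(webInf.resolve("classes"));
            Files.write(webInf.resolve("web.xml"),
                "<web-app/>".getBytes(StandardCharsets.UTF_8));

            Path deployed = webapps.resolve("servlet1");
            copyTree(build, deployed);
            return Files.isRegularFile(deployed.resolve("WEB-INF").resolve("web.xml"))
                && Files.isDirectory(deployed.resolve("WEB-INF").resolve("classes"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("deployed layout present: " + deployDemo());
    }
}
```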

Tomcat in execution

+ Tomcat is just a Java program, though as a server program it is long lived

+ Tomcat runs inside the JVM, which becomes the "web container" or "servlet container"

+ when Tomcat receives a GET or POST, it creates a request object and a response object

+ Tomcat then locates the class file for the servlet, loads the class if necessary, creates a new object of that class if one is needed, and creates a new running thread in which the servlet object will execute

+ in the new thread, doGet or doPost is called and fills in the response object, which Tomcat then sends off to the requester

+ when the servlet finishes execution, Tomcat reverts to its idle state; garbage collection will remove unneeded objects

+ Note: the servlet object may remain, to be used again on the next request for this web app

Recall the pictures of the Tomcat JVM from last time: Tomcat lives on, requests come and go, and each request gets a thread running in the JVM.
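One consequence of this picture: if a single servlet object stays around and serves many requests, each on its own thread, then its instance fields are shared by all those threads. The sketch below imitates that situation without Tomcat. HitCounter is an invented stand-in for a servlet and a small thread pool stands in for the request threads; it is an illustration of the idea, not how Tomcat is implemented.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: one long-lived object handling many requests on many threads.
class SharedServletSketch {
    // Stand-in for a servlet object: created once, then called from many threads.
    static class HitCounter {
        // A field is shared by all request threads, so it must be thread-safe.
        private final AtomicInteger hits = new AtomicInteger();

        // Plays the role of doGet: local variables are private to the calling thread.
        String handle(String requestUri) {
            int n = hits.incrementAndGet();
            return "hit " + n + " for " + requestUri;
        }

        int total() {
            return hits.get();
        }
    }

    // Pushes the given number of simulated requests through one shared object.
    static int simulate(int requests, int threads) {
        HitCounter servlet = new HitCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requests; i++) {
            final String uri = "/servlet1/hello?i=" + i;
            pool.execute(() -> servlet.handle(uri));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("requests did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return servlet.total();
    }

    public static void main(String[] args) {
        System.out.println("requests handled: " + simulate(1000, 8));
    }
}
```

With a plain int field and hits++ instead of the AtomicInteger, some increments could be lost when two threads update the field at the same moment. That is the kind of bug to watch for in servlet fields.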

Tomcat examines the URL of each incoming request.

Ex. http://users2.cs.umb.edu:11600/examples/servlets/servlet/HelloWorldExample

Server hostname: users2.cs.umb.edu

Port: 11600

Request URI: /examples/servlets/servlet/HelloWorldExample

Context path, the first part of the request URI: /examples. Tomcat processes it to determine the webapp.

· Tomcat uses the context path to find the servlet's directory under webapps; by default, it is just the directory with that name.

· So in this example, Tomcat's webapps/examples is the directory in which Tomcat looks for HelloWorld's servlet class file

· We haven't yet covered the step of how Tomcat locates the web app's class file in this filesystem area.

· We looked at one request-response cycle, running in one thread

· multiple threads allow more than one request to execute at the same time

· when we are writing servlets, we are writing multithreaded code
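Going back to the URL breakdown earlier in this section, here is a small sketch that pulls the same pieces out of a URL with java.net.URI. The class and helper names are made up, and taking the first path segment as the context path is a simplification of the default case described above, not Tomcat's actual lookup code.

```java
import java.net.URI;

// Sketch: split a request URL into hostname, port, request URI, and context path.
class UrlParts {
    // Simplified rule: the context path is the first segment of the request URI.
    static String contextPath(String requestUri) {
        int secondSlash = requestUri.indexOf('/', 1);
        return secondSlash < 0 ? requestUri : requestUri.substring(0, secondSlash);
    }

    static String describe(String url) {
        URI uri = URI.create(url);
        String requestUri = uri.getRawPath();
        return "host=" + uri.getHost()
            + " port=" + uri.getPort()
            + " requestURI=" + requestUri
            + " contextPath=" + contextPath(requestUri);
    }

    public static void main(String[] args) {
        System.out.println(describe(
            "http://users2.cs.umb.edu:11600/examples/servlets/servlet/HelloWorldExample"));
    }
}
```

Inside a servlet the container has already done this work: request.getRequestURI() and request.getContextPath() return the corresponding pieces.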

Try to install Tomcat on users2 by next class. See the Install guide linked to the class web page under Resources/tomcat.

Ports have been allocated to all students in the class: see "Assigned ports" link from tomcat Resources on class web page. The ports are TCP ports for our Unix/Linux systems. When you use a PC, you are (at least temporarily) in charge of the whole system and can use any port you want, i.e., the default 8080 is fine.

Want to try out a servlet from sources? See the posted servlet1 project, to be covered next week.

We need to use namespaces when dealing with different sets of names, as happens in SOAP and REST.

XML Namespaces: Intro

· Namespaces were not in the first version of XML; they were added later

· They are needed for SOAP and REST, and for other things that use names from multiple sources

· Namespaces are used to disambiguate names

For example, we could use the name "section" when referring to a book or to a course, by using "bk:section" for the former and "cls:section" for the latter

· In such a construction, the prefixes are local abbreviations (aliases) for a longer ID that is unique across the Internet.

· to get this uniqueness, we piggyback on the Internet domain name system

Using the Internet domain name system for unique ids

· All systems on the Internet have names such as "www.cs.umb.edu". Our departmental site owns the "cs.umb.edu" part of the umb.edu domain name. Each name with this ending fragment maps to a unique IP address (on possibly many different physical networks)

· The domain name system is hierarchical. "umb.edu" is controlled by Academic Computing, while "cs.umb.edu" is run by the CS Department (via a trust agreement with Academic Computing)

· No other school can use umb.edu, U of Maryland at Baltimore has to use another (umaryland.edu), even though it is known as UMB locally. Bob Morris claimed “umb” in 1979 for us, making us an early adopter of the new international email system.

· No other department at umb.edu can use “cs.umb.edu”; the administrators of umb.edu see to that.

· In CS, all hosts are given different names, so sf08.cs.umb.edu is a unique host id across the whole Internet.

· Within the users of sf08, different URIs can be made up by using the request URI part of the full URI

· Thus http://users2.cs.umb.edu/myproject is a unique id (URI) for a project or resource across the Internet.

· Java defines namespaces through its package system. It has a well-known convention in package naming based on domain names. For example, our Java games package uses package name edu.umb.cs.games, so even if RPI also has a games package, theirs would be edu.rpi.cs.games. Both could be used in a project without name clashes between .class files and methods, etc.

· C# has namespaces, but I don’t know of any convention in their naming to avoid name clashes.
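The Java convention mentioned above is mechanical enough to write down: reverse the labels of the domain name and append the project name. Here is a small sketch (the helper name is made up) that reproduces the two package names from the example:

```java
// Sketch: derive a Java package name from an Internet domain name.
class PackageNames {
    // Reverses the dot-separated labels of the domain and appends the project name.
    static String packageFor(String domain, String project) {
        String[] labels = domain.split("\\.");
        StringBuilder name = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            name.append(labels[i]).append('.');
        }
        return name.append(project).toString();
    }

    public static void main(String[] args) {
        System.out.println(packageFor("cs.umb.edu", "games"));
        System.out.println(packageFor("cs.rpi.edu", "games"));
    }
}
```

The two results differ only because the domain names differ, which is exactly the piggybacking on the domain name system described above.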

URLs vs. URIs

The URL "http://www.umb.edu/cs639" identifies a particular resource

A URL specifies a location (L in URL)

A URI is a name that is unique across the internet (I for identifier) that is in URL format.

A URI can be a URL, but it does not have to work in a browser, since it is simply a unique id

A URI might not be a URL, for example "http://www.cs.umb.edu/games" to identify Bolker's game project, even though this does not work as a URL.

Note that the term URI is also used in “request URI”, to refer to the site-specific part of a URL, such as /cs639/index.html. This can be confusing.

A resource can have several URLs. For example the following 3 URLs all locate the same resource

- www.umb.edu/cs639

- www.umb.edu/cs639/

- www.umb.edu/cs639/index.html

URIs can't be this loose. Each of the above URLs would be a different URI.

A URI is a specific string in a URL format
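Namespace URIs are compared as plain strings, character by character, so the three URLs above count as three different ids. A quick sketch of that comparison (the class name is invented, and the http:// scheme is added to the three URLs for completeness):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch: URIs used as ids are compared as exact strings.
class UriIdentity {
    // Counts how many distinct ids the given strings represent.
    static int distinctIds(String... uris) {
        Set<String> ids = new LinkedHashSet<>();
        for (String uri : uris) {
            ids.add(uri); // a Set keeps one copy of each exact string
        }
        return ids.size();
    }

    public static void main(String[] args) {
        int count = distinctIds(
            "http://www.umb.edu/cs639",
            "http://www.umb.edu/cs639/",
            "http://www.umb.edu/cs639/index.html");
        System.out.println("one resource, distinct ids: " + count);
    }
}
```

So when a namespace URI is copied from one document to another, it has to be copied exactly, including any trailing slash.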

URIs are used as global ids for a namespace. In a given document we use a prefix as a placeholder for the URI, like a dummy variable

Consider the following XML (no attributes: they work somewhat differently, and we’ll consider them later):

<?xml version="1.0"?>

<bk:book xmlns:bk="http://schemas.cs.umb.edu/cs639/book">

<bk:title>Data on the Web</bk:title>

<bk:author>Serge Abiteboul</bk:author>

<bk:author>Peter Buneman</bk:author>

<bk:author>Dan Suciu</bk:author>

</bk:book>

This is similar to book4.xml in $cs639/validate-ns

The special attribute xmlns:bk="http://schemas.cs.umb.edu/cs639/book" defines a prefix that can be used in this element and its descendant elements.

Here "http://schemas.cs.umb.edu/cs639/book" is a unique id for this namespace, but there is no "namespace document". (Later we will see a possible association with an XML Schema document, but a schema is not required for using a namespace.)

Note that we were able to use "bk" before it was defined, since its definition was in the same element

Names of names: prefix, local name, qualified name

In the tag name "bk:title", "bk" is the prefix and "title" is the local name

A qualified name (or qname for short) is a prefix + a local name
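A namespace-aware parser reports these pieces separately. The sketch below uses the DOM parser that comes with the JDK on a shortened version of the bk: example above. The class and helper names are made up; the DOM calls (getTagName, getPrefix, getLocalName, getNamespaceURI) are standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: ask a namespace-aware DOM parser for the parts of a qualified name.
class QNameParts {
    // Shortened version of the bk: example.
    static final String BOOK_XML =
        "<bk:book xmlns:bk=\"http://schemas.cs.umb.edu/cs639/book\">"
        + "<bk:title>Data on the Web</bk:title>"
        + "</bk:book>";

    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace processing is off by default
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    static String describe(Element element) {
        return "qname=" + element.getTagName()
            + " prefix=" + element.getPrefix()
            + " local=" + element.getLocalName()
            + " namespace=" + element.getNamespaceURI();
    }

    public static void main(String[] args) {
        Element book = parseRoot(BOOK_XML);
        Element title = (Element) book.getFirstChild();
        System.out.println(describe(book));
        System.out.println(describe(title));
    }
}
```

Note the call to setNamespaceAware(true). Without it, the parser treats "bk:book" as one opaque name and the prefix and namespace calls are not useful.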

We see boring repetition of the prefix in a single-namespace document, as above.

To avoid boring repetition, you can use a default namespace, where the namespace is implied at each element, rather than explicitly specified by a prefix.

The following xml file uses a default namespace:

<?xml version="1.0"?>

<book xmlns="http://www.cs.umb.edu/cs639/book">

<author>Serge Abiteboul</author>

<author>Peter Buneman</author>

<author>Dan Suciu</author>

</book>

Notice that there is no colon in the attribute name "xmlns"

Here, all the elements have local names which are now considered local names in the “http://www.cs.umb.edu/cs639/book” namespace, even though there is no prefix.
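To a namespace-aware parser, the default-namespace form and the prefixed form say the same thing: what counts is the pair (namespace URI, local name), not the prefix. Here is a sketch that checks this with the JDK's DOM parser and javax.xml.namespace.QName. The class name and the three small test documents are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: default namespace vs. prefix vs. no namespace at all.
class DefaultNamespaceDemo {
    static final String NS = "http://www.cs.umb.edu/cs639/book";

    static final String DEFAULTED =
        "<book xmlns=\"" + NS + "\"><author>Dan Suciu</author></book>";
    static final String PREFIXED =
        "<bk:book xmlns:bk=\"" + NS + "\"><bk:author>Dan Suciu</bk:author></bk:book>";
    static final String NO_NAMESPACE =
        "<book><author>Dan Suciu</author></book>";

    static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Namespace URI plus local name; the prefix is not part of the identity.
    static QName expandedName(Element element) {
        return new QName(element.getNamespaceURI(), element.getLocalName());
    }

    public static void main(String[] args) {
        for (String xml : new String[] { DEFAULTED, PREFIXED, NO_NAMESPACE }) {
            Element book = root(xml);
            System.out.println("prefix=" + book.getPrefix()
                + " name=" + expandedName(book));
        }
    }
}
```

The first two documents give equal names even though one has no prefix. The third, with no xmlns attribute at all, has a "book" element that is in no namespace and so is a different name.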

Examples directory $cs639/validate-ns

See this directory for examples of XML files and related schemas for cases with namespaces in use.

In particular:

book4.xml: like the bk: example, but using b:

book3.xml: like the default namespace example.

DTDs: Can use if the document is using a default namespace, but if prefixes are in use, those prefixes have to be in the names in the DTD, which is really strange. So we won’t worry about DTDs for such cases.

Natural combo: Namespaces and XML Schemas.

Recall that book.xml has a schema book.xsd, but this schema can only be used with no-namespace XML. When we add a namespace to the XML, we need to edit the XSD to say what namespace is in use for its vocabulary. We looked at $cs639/validate-ns/book1.xsd, which has “targetNamespace” set to “http://www.cs.umb.edu/book”, plus a somewhat mysterious “elementFormDefault” setting. We need to return to this detail.

Also, the form of the linkage from the XML to the XSD changes: see $cs639/validate-ns/book5.xml to see linkage to book1.xsd using “schemaLocation” rather than “noNamespaceSchemaLocation” as we have seen earlier.
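To watch targetNamespace at work, here is a sketch that validates small documents against a schema using the JDK's javax.xml.validation API. Two assumptions to keep in mind: the XSD in the code is a made-up miniature, not the actual book1.xsd, and the schema is handed to the validator directly in code rather than being found through a schemaLocation attribute in the XML.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Sketch: a schema with a targetNamespace only accepts elements in that namespace.
class NamespaceSchemaCheck {
    static final String NS = "http://www.cs.umb.edu/book";

    // Hypothetical miniature schema, written for this example only.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " targetNamespace=\"" + NS + "\""
        + " elementFormDefault=\"qualified\">"
        + "<xs:element name=\"book\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"title\" type=\"xs:string\"/>"
        + "<xs:element name=\"author\" type=\"xs:string\" maxOccurs=\"unbounded\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static Schema compileSchema() {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the example schema itself is broken", e);
        }
    }

    // True if the document validates against the schema, false on a validation error.
    static boolean isValid(String xml) {
        try {
            compileSchema().newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed or not valid
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<title>Data on the Web</title><author>Dan Suciu</author>";
        System.out.println(isValid("<book xmlns=\"" + NS + "\">" + body + "</book>"));
        System.out.println(isValid("<book>" + body + "</book>"));
    }
}
```

The second document has the right element names but no namespace, so the validator finds no declaration for its "book" element and rejects it.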

What characters are allowed in prefixes? Local names? In URIs/URLs? (We covered this earlier, so skipped in class)

A prefix is an “XML name”, under the same rules as an element name or an attribute name.

A URI/URL is under a much looser set of rules, which allows, for example, the / character that is disallowed in an XML name. See Harold pg. 77.

What characters are allowed in an XML name (element name or attribute name)?

From the XML 1.0 spec: here is the formal definition:

NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender

Name ::= (Letter | '_' | ':' ) (NameChar)*

· CombiningChar and Extender are Unicode characters used in non-western-european languages

· Letter is any language's idea of a letter, [A-Za-z] within ASCII or that plus [c0-d6], [d8-f6], [f8-ff] in Latin-1. See Appendix B in the XML 1.0 doc.

Note: don't use colons in XML local names even though it is legal. It's too easy to confuse with the colon in a fully qualified name. The colon in this list just allows XML names like “bk:title”

In practice, then, a local name of an XML element or attribute can start with a letter or underscore, followed by letters, digits, underscores, periods and hyphens. The treatment of underscore as "almost a letter" is like Java identifiers. But in Java, we are not allowed to use hyphens or periods anywhere in a name. Like XML, Java also allows Unicode letters of other languages in its identifiers (the Java language spec has an example identifier, αρετη, written in Greek letters).
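The practical rule can be written as a regular expression. The sketch below covers only the ASCII subset of the rule (names made up for illustration). It deliberately rejects colons, following the advice above, and it does not attempt the full Unicode Letter, CombiningChar and Extender classes from the spec.

```java
import java.util.regex.Pattern;

// Sketch: check the ASCII-only version of the "practical" local-name rule.
class XmlLocalNames {
    // A letter or underscore, then letters, digits, underscores, periods, or hyphens.
    private static final Pattern ASCII_LOCAL_NAME =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_.-]*");

    static boolean isAsciiLocalName(String candidate) {
        return ASCII_LOCAL_NAME.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] candidates = { "title", "_id", "my-name.v2", "2nd", "bk:title", "-x" };
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isAsciiLocalName(candidate));
        }
    }
}
```

Here "my-name.v2" passes even though it would not be a legal Java identifier, while "2nd" and "-x" fail because of their first character and "bk:title" fails because of the colon.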

Is there a namespace document? No!

We will see that we can use XML Schemas for a namespace, but it is possible to use namespaces without schemas. In that case, the set of names in the namespace is just implied by the existence of those names in the XML document that has the namespace identification.

For example, is “animal” a local name in the namespace for books in use above? We don’t really know, because it doesn’t show up in the XML files we’ve seen, but maybe another file has it. We can say that “book” and “title” are local names in this namespace.