CS639 - Class 02
HW1 is available.
PA1 is coming soon.
From last time: Using JDK support to check for DTD validity:
Working in a copy of $cs639/validate:
java sax.Counter -v valid_greeting1.xml checks for DTD validity
// use -s -v for schema validation
// -v alone means DTD validation: DTD is the default validation type.
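The standard JAXP API can run the same kind of DTD check inside a program. This is a minimal sketch, not how sax.Counter is implemented. It assumes an internal DTD subset, so no external .dtd file is needed, and the class name DtdCheck is hypothetical. One gotcha it shows is that a SAX DefaultHandler silently ignores validity errors unless error() is overridden.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdCheck {

    // Parses xml with DTD validation turned on; true only if no error is reported.
    public static boolean isValid(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // request DTD validation (the same idea as -v)
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    // DefaultHandler ignores validity errors by default; rethrow to fail.
                    throw e;
                }
            });
            return true;
        } catch (Exception e) {
            System.out.println("not valid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Internal DTD subset: greeting may contain only character data.
        String dtd = "<!DOCTYPE greeting [<!ELEMENT greeting (#PCDATA)>]>";
        String good = "<?xml version=\"1.0\"?>" + dtd + "<greeting>Hello XML!</greeting>";
        String bad = "<?xml version=\"1.0\"?>" + dtd + "<greeting><b>Hi</b></greeting>";
        System.out.println(isValid(good)); // true
        System.out.println(isValid(bad));  // false (after a "not valid: ..." line)
    }
}
```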
XML Schema validation:
Here is the schema, in greeting.xsd:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="greeting" type="xsd:string"/>
</xsd:schema>
Lots of boilerplate, and in XML format. See the element named "xsd:element": it describes the greeting element, giving it the name "greeting" and the type "xsd:string".
Here is the XML document with linkage to the schema, in valid_greeting2.xml
<?xml version="1.0"?>
<greeting xsi:noNamespaceSchemaLocation="greeting.xsd"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
Hello XML!
</greeting>
The xsi:noNamespaceSchemaLocation attribute is the extra attribute stuck in greeting's start tag to link to the schema.
java sax.Counter -v -s valid_greeting2.xml checks for XML Schema validity
If you want to see the XML support classes in the JDK in use here:
java -verbose sax.Counter -s -v valid_greeting2.xml // shows all the classes loaded
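Inside a program, the javax.xml.validation API does the XML Schema check. This is a sketch with a hypothetical class name, SchemaCheck. The contents of greeting.xsd are inlined as a string and handed to the Validator directly, so the instance strings here leave out the xsi:noNamespaceSchemaLocation hint that sax.Counter relies on.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaCheck {

    // Same content as greeting.xsd, inlined for this sketch.
    public static final String GREETING_XSD =
        "<?xml version=\"1.0\"?>"
        + "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:element name=\"greeting\" type=\"xsd:string\"/>"
        + "</xsd:schema>";

    // Validates xml against the W3C XML Schema text in xsd.
    public static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("bad schema: " + e.getMessage(), e);
        }
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validate() throws on the first error.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            System.out.println("not valid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(GREETING_XSD, "<greeting>Hello XML!</greeting>")); // true
        // xsd:string is a simple type, so a child element is not allowed.
        System.out.println(isValid(GREETING_XSD, "<greeting><b>Hi</b></greeting>"));  // false
    }
}
```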
XML for Moving Data
Basic idea: XML is great for getting important data from point A to point B and back. It is not so great for holding data for heavy-duty analysis: that is where databases shine. Luckily we can whisk data in and out of databases and convert it to XML for transport.
Our basic model for this class is the following, where ∆ stands for XML data:
Program A  <---- ∆ ---->  Program B
We can call this idea POX, for plain old XML.
Note that Program A could be written in Java and Program B in C# or even C++.
Although XML is great for transporting data, it is not so useful for persistent storage of data. Relational databases still hold the dominant place in persistent data storage. That means there are three data representations in today's large programs: in-memory objects, XML, and database tables. We will be considering how to move data between XML and objects. The job of moving data between objects and relational tables is tackled in CS636 and CS637. It is also possible to move data directly between database tables and XML; we may cover this topic briefly.
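To make "moving data between XML and objects" concrete, here is a hand-written DOM sketch of both ends of the pipe. The Person class and the <person>/<name>/<age> element names are hypothetical, and doing the mapping by hand with DOM is just one illustrative way, not the approach the course prescribes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class PoxDemo {

    // Hypothetical in-memory object that Program A wants to send to Program B.
    public static final class Person {
        public final String name;
        public final int age;

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Program A's side: object -> XML text.
    public static String toXml(Person p) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("person");
            Element name = doc.createElement("name");
            name.setTextContent(p.name);
            Element age = doc.createElement("age");
            age.setTextContent(Integer.toString(p.age));
            root.appendChild(name);
            root.appendChild(age);
            doc.appendChild(root);

            // Identity transform: serialize the DOM tree to a string (escaping & and < as needed).
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build XML", e);
        }
    }

    // Program B's side: XML text -> object.
    public static Person fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            String name = root.getElementsByTagName("name").item(0).getTextContent();
            String age = root.getElementsByTagName("age").item(0).getTextContent();
            return new Person(name, Integer.parseInt(age.trim()));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = toXml(new Person("Alice", 30)); // this text is what travels from A to B
        System.out.println(xml);
        Person copy = fromXml(xml);
        System.out.println(copy.name + " " + copy.age); // Alice 30
    }
}
```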
Web Services
Web services give us a framework for this data transfer via XML:
- how to find your partner, set up the connection, ...
- we HAVE to understand the connection between Program A and Program B (this is what we will do in class today).
We will use Tomcat + JAX-WS for SOAP web services, and Tomcat + JAX-RS for RESTful web services.
Note that for RESTful web services, in some cases there is no XML in the request, just an HTTP GET. There is XML in the response, however, except in the cases that use JSON instead of XML.
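A minimal sketch of that REST pattern using only the JDK. The built-in com.sun.net.httpserver.HttpServer stands in for Tomcat + JAX-RS here; that substitution, the /greeting path, and the class name RestGetDemo are assumptions for illustration only. HttpURLConnection plays the client and sends a bare GET with no XML, and the XML comes back in the response body. It requires Java 9+ for readAllBytes.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class RestGetDemo {

    // Server side: a tiny stand-in for a RESTful resource that answers in XML.
    static HttpServer startServer() throws IOException {
        // Port 0 lets the OS pick any free port on 127.0.0.1.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/greeting", exchange -> {
            byte[] body = "<greeting>Hello XML!</greeting>".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/xml");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    // Client side: a plain HTTP GET; nothing goes out in the body, XML comes back.
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Starts the server, does one GET against it, and shuts the server down.
    public static String fetchGreeting() {
        try {
            HttpServer server = startServer();
            try {
                int port = server.getAddress().getPort();
                return get("http://127.0.0.1:" + port + "/greeting");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchGreeting()); // <greeting>Hello XML!</greeting>
    }
}
```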
JSON is the new competition to XML, made popular by Ajax. We should look at it at some point.
Network Basics
TCP/IP gives us a TCP stream connection, which is the data pipe in the above picture. This type of connection was perfected in the '80s.
- It underlies many services, such as remote login, file transfer, web access, and email, with some exceptions: NFS, streaming protocols, and Voice over IP have traditionally used UDP/IP instead. TCP in turn uses IP, the Internet Protocol, which carries individual packets.
TCP provides a two-way data pipe from a process on one system to a process on another system (or on the same system).
Process A, on host X  <==== TCP connection ====>  Process B, on host Y
Process = program in execution, live on the system.
- reliable: no data is corrupted, dropped or reordered.
- flow-controlled.
- unencrypted.
- the connection runs all the way across the Internet, over many different networks.
Internet Addresses for hosts, i.e., systems.
Each host directly connected to the Internet has a unique IP address (32 bits in IPv4, which is gradually being replaced by IPv6, with 128-bit addresses, to allow more IDs). linux2.cs.umb.edu has IP address 158.121.106.237 as seen from outside the departmental firewall (192.168.106.237 from inside, which is not unique). Each dotted part stands for one byte, with value 0-255. You can see that linux2 is almost directly connected to the Internet, since it has its own true IP address and can be contacted from anywhere on the Internet, at least on port 22 for SSH.
Each host also has a name like "linux2.cs.umb.edu" and this is also unique as a name across the Internet. However, a host can have several such names, as this one does: vm72.cs.umb.edu and users2.cs.umb.edu. Each of these names is unique across the Internet. Use the name so you don’t have to worry about where you’re working, since inside the firewall the IP address is different.
There is a mapping between the names and the IP addresses. We use the names, and library software looks them up to get the corresponding IP address. The IP address is what is used in the system calls, the lowest-level calls into the operating system. This name-to-IP mapping is provided by DNS, the Domain Name System, a distributed database served by its own DNS servers.
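Here is a quick check of the "each dotted part is a byte" arithmetic. Packing 158.121.106.237 into one number gives 158*2^24 + 121*2^16 + 106*2^8 + 237 = 2658757357, which fits in 32 unsigned bits. The sketch uses a hypothetical class name, Ipv4Demo. It relies on InetAddress.getByName only checking the format of a literal address, while a name goes through the resolver. What linux2.cs.umb.edu resolves to depends on which side of the firewall you run it from.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class Ipv4Demo {

    // Packs the four bytes of a dotted-quad IPv4 address into one unsigned 32-bit value.
    public static long toUnsigned32(String dottedQuad) {
        byte[] bytes;
        try {
            // For a literal address, getByName only checks the format; no DNS lookup is done.
            bytes = InetAddress.getByName(dottedQuad).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an address: " + dottedQuad, e);
        }
        if (bytes.length != 4) {
            throw new IllegalArgumentException("not IPv4: " + dottedQuad);
        }
        long value = 0;
        for (byte part : bytes) {
            value = (value << 8) | (part & 0xFF); // Java bytes are signed; mask back to 0-255
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toUnsigned32("158.121.106.237")); // 2658757357

        // A name, by contrast, goes through the resolver (DNS or the local hosts file).
        try {
            System.out.println(InetAddress.getByName("linux2.cs.umb.edu").getHostAddress());
        } catch (UnknownHostException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```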
Each system running TCP has an array of TCP "ports". A port number is 16 bits, so there are 64K (65,536) ports.
Some of these ports are assigned to specific services. Ex: port 80 is assigned to HTTP web service, port 21 to FTP, port 22 to SSH, port 23 to telnet, ...
A server (process) can "listen" on a certain port. It has told the OS it is doing so, so when a connection from a client comes in, the OS arranges the active connection from the client to the server.
Ex: the web server listens on port 80; the SSH server listens on port 22.
Server processes are long-lived, so they are there when the clients need them. Clients can come and go.
Only one server process can be listening on a given port on a given host at any point in time. The main web server listens on port 80, the ssh daemon on port 22. We can run other web servers on other ports: my Tomcat is running on port 11600 on users2.cs.umb.edu = linux2. You will get an assigned port for your Tomcat.
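Here, two ServerSocket objects in one JVM stand in for two server processes. Once the first is listening on a port, the OS refuses to bind the second to the same address and port on typical systems, and Java reports a java.net.BindException. This is a sketch with a hypothetical class name, PortBindDemo. The port is whatever free port the OS picks, not 11600.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class PortBindDemo {

    // Returns true if the OS refuses a second listener on a port that is already being listened on.
    public static boolean secondListenerRefused() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket first = new ServerSocket(0, 50, loopback)) { // port 0: OS picks a free port
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port, 50, loopback)) {
                return false; // two listeners on one port: not expected
            } catch (BindException e) {
                System.out.println("port " + port + " already in use: " + e.getMessage());
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(secondListenerRefused());
    }
}
```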
PuTTY tunnels: see “Using Putty Tunnels to access ports of cs.umb.edu systems”, linked to the class web page under Resources.
Because of network security requirements, most of the ports of linux2, etc. are blocked from access from outside the firewall. Inside the firewall, all the ports are available on all the departmental hosts.
Luckily, SSH service includes “tunneling”, which allows us to reach any particular port inside the firewall from outside, once we have set up that particular tunnel.
So for example, we can tunnel from localhost:11600 to users2.cs.umb.edu:11600, with the help of a login on linux1.cs.umb.edu. The login provides a channel from localhost to linux1.cs.umb.edu, and then the ssh server on linux1 can easily create a connection (inside the firewall) to users2 on port 11600.
We will need such a tunnel to access my Tomcat running on users2.cs.umb.edu on port 11600:
Browser accesses localhost:11600
Tunnel accesses users2.cs.umb.edu:11600, the actual tomcat port.
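What the tunnel does for the browser can be imitated, minus SSH, by a plain TCP relay. The relay accepts on a local port and copies bytes both ways to a target host and port. TcpRelay is a hypothetical teaching sketch, not a substitute for an SSH tunnel: it adds no encryption and cannot get through the firewall. In the demo, a local echo server stands in for users2.cs.umb.edu:11600. It requires Java 9+ for transferTo/readAllBytes. For brevity, relayed sockets are only half-closed, not closed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class TcpRelay {

    private static void startDaemon(Runnable task) {
        Thread t = new Thread(task);
        t.setDaemon(true);
        t.start();
    }

    // Copies bytes one way until EOF, then passes the EOF on with a half-close.
    private static void pump(Socket from, Socket to) {
        try {
            from.getInputStream().transferTo(to.getOutputStream());
            to.shutdownOutput();
        } catch (IOException ignored) {
            // one side went away; this direction of the relay just ends
        }
    }

    // Listens on loopback:listenPort (0 = any free port) and forwards each accepted
    // connection to targetHost:targetPort, copying bytes in both directions.
    public static ServerSocket start(int listenPort, String targetHost, int targetPort) throws IOException {
        ServerSocket listener = new ServerSocket(listenPort, 50, InetAddress.getLoopbackAddress());
        startDaemon(() -> {
            while (!listener.isClosed()) {
                try {
                    Socket client = listener.accept();                    // browser side
                    Socket target = new Socket(targetHost, targetPort);   // "real" server side
                    startDaemon(() -> pump(client, target));
                    startDaemon(() -> pump(target, client));
                } catch (IOException e) {
                    return; // sketch: stop relaying on any error, including listener.close()
                }
            }
        });
        return listener;
    }

    // Demo: a one-shot echo server stands in for the real service behind the relay.
    public static String demo(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket echo = new ServerSocket(0, 50, loopback)) {
            startDaemon(() -> {
                try (Socket s = echo.accept()) {
                    byte[] data = s.getInputStream().readAllBytes(); // until the relay half-closes
                    s.getOutputStream().write(data);
                } catch (IOException ignored) {
                }
            });
            try (ServerSocket relay = start(0, loopback.getHostAddress(), echo.getLocalPort());
                 Socket client = new Socket(loopback, relay.getLocalPort())) {
                client.getOutputStream().write(message.getBytes(StandardCharsets.UTF_8));
                client.shutdownOutput(); // done sending
                return new String(client.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("hello through the relay"));
    }
}
```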
A client, the other end of a potential connection, can connect to this port on the server host, thus connecting to the server process. Multiple client processes can run on the same client host, because the client host has plenty of ports and assigns a different port number to each client end.
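This sketch illustrates the last point, using a hypothetical class name, ClientPortsDemo. Two clients on the same host connect to the same server port, and the OS gives each client end its own local port, which keeps the two connections distinct. All port numbers are picked by the OS at run time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ClientPortsDemo {

    // Connects two clients to one server port; returns {serverPort, client1Port, client2Port}.
    public static int[] connectTwoClients() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback); // port 0: OS picks a free port
             Socket c1 = new Socket(loopback, server.getLocalPort()); // client end 1
             Socket c2 = new Socket(loopback, server.getLocalPort()); // client end 2
             Socket s1 = server.accept();                             // server ends of the
             Socket s2 = server.accept()) {                           // two connections
            return new int[] {server.getLocalPort(), c1.getLocalPort(), c2.getLocalPort()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = connectTwoClients();
        System.out.println("server port " + ports[0]
                + ", client ports " + ports[1] + " and " + ports[2]);
    }
}
```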
Basic cycle that happens in a Web Service or plain HTTP or other service:
1. Client connects to port X on host Y.
2. Server is listening to that port, accepts that connection and that makes them connected.
3. The Client now sends a request over the connection.
4. The server sends a response back over the same connection.
(3 and 4 can be repeated over and over)
5. Disconnect.
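Here is the cycle above, sketched with plain java.net sockets over loopback. The line-based "echo:" protocol and the class name RequestResponseCycle are assumptions for illustration. A real service such as HTTP defines its own request and response formats. Comments mark the numbered steps.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RequestResponseCycle {

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoflush, so each println goes out on the connection immediately
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Server side: accept one client and answer each request line until it disconnects.
    private static void serveOne(ServerSocket listener) {
        try (Socket conn = listener.accept();            // 2. accept the connection
             BufferedReader in = reader(conn);
             PrintWriter out = writer(conn)) {
            String request;
            while ((request = in.readLine()) != null) {  // 3. receive a request
                out.println("echo: " + request);         // 4. send the response
            }                                            // null: the client disconnected
        } catch (IOException ignored) {
        }
    }

    // Client side: connect, send each request and read its response, then disconnect.
    public static List<String> run(List<String> requests) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 50, loopback)) { // server listens on port X
            Thread server = new Thread(() -> serveOne(listener));
            server.start();
            List<String> responses = new ArrayList<>();
            try (Socket conn = new Socket(loopback, listener.getLocalPort()); // 1. connect
                 BufferedReader in = reader(conn);
                 PrintWriter out = writer(conn)) {
                for (String request : requests) {        // 3 and 4, repeated
                    out.println(request);
                    responses.add(in.readLine());
                }
            }                                            // 5. disconnect (sockets closed)
            server.join();
            return responses;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("first request", "second request")));
        // [echo: first request, echo: second request]
    }
}
```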
This pattern applies to remote login, file transfer, web access ....