CS639 Class 27 XML and DBs

Amazon S3, cont.

S3 also supports BitTorrent, which is good for popular downloads, but apparently not much help for private files.

Can download a Java little program to access S3, and it was able to see my archived files from DragonDisk, each with key = pathname.

Surprisingly, I was able to make a bucket named simply “bku1” (supposed to be a global name).

Note how the server uses the first part of the hostname to find the bucket, rather than the path in the GET. You use the path in the GET to access objects in a bucket.

Looking at the jars that came with the sample, and the corresponding imports in the sources--

Shows they are using Apache’s HTTP Client and logging, and StAX, and Jackson for JSON.

Also looks like all sources are provided: about 2000 Java files, but this covers all of AWS, not just S3. 116 in S3 area. No StAX here.

XML in DBs and XML Generated from Relational DBs

We have been working with 3 kinds of data: Draw pic with three areas, arrows between representing translation of data between them:

XML: triangles, Java objects: ovals, DB tables: rectangles

All three are considered very different from each other, with an “impedance mismatch” between them, i.e. it’s hard to translate data from one form to another.

We have studied the transition XML data ß> Java objects the most: parsing into our own objects with SAX, parsing into a DOM tree, and back out (via XSL, recall trick), more recently data binding with JAXB and its annotations in the POJOs.

CS636 studies Java objects ß> DB data using JPA-2, Java Persistence Architecture. It uses annotations in the POJOs too.

Web app layers:

· Presentation: handling XML to/from web services

· Service: doing core app programming with POJOs

· Data access: getting data to/from DB

Here we need the POJOs to keep the programmers happy. We can use JPA-2 to turn DB data into POJOs, and JAXB to turn POJOs into XML. It’s even possibly to annotate the same POJO two ways, one for JAXB and one for JPA-2, since each annotation processor only looks for its own annotations.

Data-driven Web Services

But what if we don’t need Java programming for our web services? In some pure data-driven cases, the desired web service data is determined completely by the database data.

Publishing Database Data Directly in XML, with no intervening application

On pages 203-208 of the text there is code that manually produces XML from database tables, but this is ugly

Today, many databases know about XML and can generate XML from relational tables

The SQL/XML standard is supported Oracle and DB2 UDB, among others, but not SqlServer, which has SQLXML, something similar.

SQL/XML was supposed to be part of SQL 2003, but they dropped it, apparently because it wasn’t integrated properly with XQuery yet.

Consider the following table structure

Dept (table name)

-----------------------------------------

dname | location (column names)

-----------------------------------------

Accounting | New York (row data)

-----------------------------------------

Operations | Boston

Easy to set up on our Oracle running on dbs2:

SQL> create table dept(dname char(20), location char(20));

Table created.

SQL> insert into dept values ('Accounting', 'New York');

1 row created.

SQL> insert into dept values ('Operations', 'Boston');

1 row created.

Here is an example SQL/XML query;

select XMLElement ("Department",

XMLElement("DeptName", dname),

XMLElement("Location", location))

from dept

The above SQL statement would generate the following XML:

<DeptName>Accounting</DeptName>

</Department>

<DeptName>Operations</DeptName>

<Location>Boston</Location>

</Department>

Actual output from SQLPlus—truncated, need better settings in SQLPlus to see everything

<Department><DeptName>Accounting </DeptName><Location>New York

<Department><DeptName>Operations </DeptName><Location>Boston

Of course there is more to this—see Oracle docs.

XML stored in a relational database

It is also possible to hold XML *in* the database, specifically in column values, using a new datatype for XML documents. Oracle calls this new datatype XMLType, DB2 just XML. We can have ordinary columns, like varchar and float and int, and then one or more columns that hold whole XML documents.

Then ordinary SQL can’t be used on the XMLType column value, but XPath can.

From notes15--

Another example of XPath use: in databases

Databases can now handle XML along with their ordinary relational data. A very useful way is by defining an XML datatype, so that a single column value can hold a whole XML document. Then a certain order, for example, can have a row in the orders table with relational columns order_id, qty, price, etc. , and also (say in column “req”) the XML document that came in from the web to make the order.

Oracle has good XML support in this way (since v 9.2 I think). Once an XML document is held in a column value, it can be queried by the Oracle SQL function

extractValue(tab.col, ‘xpathexpr’)

where the xpathexpr selects a certain element or attribute node. There are other related functions as well.

For example the following could get the qty from the XML in the req column of orders:

select extractValue(o.req, ‘/order/qty’) from orders o

where order_id = 100;

This kind of extraction of information from the XML can be done to populate relational columns in the database from the incoming XML, while also preserving that original XML.

Using Namespace case

Oracle case: just use an optional third argument for extractValue:

select extractValue(o.req, ‘/ord:order/ord:qty’,’xmlns:ord=”http:/…”’) from orders o

where order_id = 100;

The third argument sets up ord: as the prefix for the xpath expression. Multiple prefixes can be set up by separating them by spaces in the string.

Now to flesh that example out, at least the no-namespace case. The following example is from

Getting into SQL/XML by Tim Quinlan

Available at http://www.oracle.com/technetwork/articles/quinlan-xml-095823.html

1. Create a table with an XML column.

create table invoiceXML_col (

inv_id number primary key,

inv_doc XMLType);

2. or, Create an XML table.

create table invoiceXML_tbl of XMLtype;

We’ll stick to case 1. No reason to give up the option of having ordinary columns as well as XML ones. I’ve converted the examples from case 2 to case 1.

Add XML from a string:

Insert into invoicexml_col values (1, XMLType(‘<Invoice> … </Invoice>’));   --specify XML with string

Or add XML from a file (XMLDIR is a directory alias set up previously, see tutorial)

Insert into invoicexml_col values (1,

XMLType(bfilename('XMLDIR', 'invoicexml.txt'),

nls_charset_id('AL32UTF8') )); --I think this is the default charset, a UTF-8 descendent

Here is the file (no <?xml prolog, but it works)

<Invoice>

    <MailAddressTo id="PA">

        <Person>Joe Smith</Person>

        <Street>10 Apple Tree Lane</Street>

        <City>New York</City>

        <State>NY</State>

        <Zipcode>12345</Zipcode>

    </MailAddressTo>

    <MailAddressFrom id="PA">

        <Person>Ed Jones</Person>

        <Street>11 Cherry Lane</Street>

        <City>Newark</City>

        <State>NJ</State>

        <Zipcode>67890</Zipcode>

    </MailAddressFrom>

    <Details id="2006Sept1to30PA">

        <FromTo>Sept 1, 2006 to Sept 30, 2006</FromTo>

        <Hours>70</Hours>

        <Rate>30</Rate>

        <Taxes>210</Taxes>

        <TotalDue>2310</TotalDue>

        <InvDate>Oct 1, 2006</InvDate>

        <Contractor>Ed Jones</Contractor>

    </Details>

</Invoice>

Using XPath on this data—

select extract(inv_doc, '/Invoice/MailAddressTo') from invoicexml_col;

EXTRACT(INV_DOC,'/INVOICE/MAILADDRESSTO')

<MailAddressTo id="PA"><Person>Joe Smith</Person><Street>10

Apple Tree Lane</Street><City>New York</City><State>NY</Stat

e><Zipcode>12345</Zipcode></MailAddressTo>

Select count(*) from invoicexml_col

where existsNode(

inv_doc, '/Invoice/MailAddressTo[Person="Joe Smith"]') = 1;

COUNT(*)

With XML in the database, you can register XSD for it, that is, in our case associate a schema with invoicexml_col.  This XML has no namespace, but it is possible to work with namespaces too, even multiple schemas and namespaces.

An XMLType column can only hold well-formed XML in each column value: single root element, etc. With a registered schema, it can only hold valid documents.

Oracle docs point out that XML Schema and database schema can work together to keep data conforming to rules.  XML Schemas can’t by themselves enforce unique keys or relational integrity, but the database can.

DB2 and MS SQLServer have similar capabilities.

MySQL: behind the others for XML, but free and wildly popular. Can do boilerplate XML output:

sf08$ mysql -u eoneil1 -D eoneil1db -p --xml -e 'select * from customers'

Enter password:

<?xml version="1.0"?>

<resultset statement="select * from customers

" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <row>

        <field name="cid">c001</field>

        <field name="cname">Tiptop</field>

        <field name="city">Duluth</field>

        <field name="discnt">12</field>

  </row>

  <row>

        <field name="cid">c002</field>

        <field name="cname">Basics</field>

        <field name="city">Dallas</field>

        <field name="discnt">12</field>

  </row>

</resultset>

mysql has an ExtractValue function that works on string data using XPath:

mysql> SELECT ExtractValue('<a><b>cc</b></a>', '/a/b');

+------------------------------------------+

| ExtractValue('<a><b>cc</b></a>', '/a/b') |

+------------------------------------------+

| cc |

+------------------------------------------+

1 row in set (0.00 sec)

Similarly, put this XML in a string column named col1 and

mysql> SELECT ExtractValue(col1, '/a/b') from T;

to see the same result

So you can store XML in string columns and query it with XPath in mysql too.

There is also an UpdateXML function that can replace parts of XML.

Oracle can update XML with its UPDATEXML function: example from Oracle docs--

SELECT warehouse_name,

   EXTRACT(warehouse_spec, '/Warehouse/Docks')

   "Number of Docks"

   FROM warehouses

   WHERE warehouse_name = 'San Francisco';

WAREHOUSE_NAME       Number of Docks

-------------------- --------------------

San Francisco        <Docks>1</Docks>

UPDATE warehouses SET warehouse_spec =

   UPDATEXML(warehouse_spec,

   '/Warehouse/Docks/text()',4)

   WHERE warehouse_name = 'San Francisco';

1 row updated.

SELECT warehouse_name,

   EXTRACT(warehouse_spec, '/Warehouse/Docks')

   "Number of Docks"

   FROM warehouses

   WHERE warehouse_name = 'San Francisco';

WAREHOUSE_NAME       Number of Docks

-------------------- --------------------

San Francisco        <Docks>4</Docks>

Last topic: XML from DBs to Java via JDBC

Java 6 has SQLXML classes to help with newer JDBC (JDBC 4.0) accessing DBs that are storing XML in column values.

Supported in DB2 (DB2 v 9.5 (2007) or later), and MS SqlServer (2005 or later) for several years and more recently in Oracle 11.2 (2009)—we have v 10.1, unfortunately.

Mysql—can store XML in string types and use SQLXML in Java.