CS639 Class 09


Handout: XML Schemas and namespaces

Assigning namespaces to parts of XML

Example 1.7 on page 30 has two default namespaces that operate on different regions of the XML tree

Namespaces have scope: in this example, the Orders ns has scope of everything (all elements) at or below (descendants in the tree) the Orders element. The Address ns has scope at or below the ShipTo element.

When contending namespaces have no prefix (i.e. are default namespaces), the more local namespace is used. If they can be discriminated by prefix, then either can be used if in scope.
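A minimal sketch of this scoping rule in Python, using the standard-library xml.etree.ElementTree. The document here is a made-up stand-in modeled on the description above, not the text of Example 1.7, and the example.org URIs are assumptions. ElementTree reports each element name as {namespace-URI}localname, so we can see which default namespace is in effect for each element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two default namespaces, one nested inside the other.
# The URIs and element content are illustrative only.
DOC = """\
<Order xmlns="http://example.org/Orders/">
  <Customer>Chez Fred</Customer>
  <ShipTo xmlns="http://example.org/Address/">
    <City>Woonsocket</City>
  </ShipTo>
  <Total>17.50</Total>
</Order>
"""


def split_name(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


root = ET.fromstring(DOC)
# iter() walks the tree in document order, root first.
names = [split_name(el.tag) for el in root.iter()]
for uri, local in names:
    print(f"{local:10s} -> {uri}")
```

ShipTo and City pick up the more local Address namespace, while Total, which comes after ShipTo closes, is back in the Orders namespace.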

Attributes. Also note that a default namespace only defaults the namespace for element names, not attribute names (among names). All the attributes in Example 1.7 have no prefix, so they are in no namespace at all (pg. 31). The namespace does not include the whole “vocabulary” of the XML (element and attribute names), but a subset of names important for reference purposes.

Note on “among names”: we’ll see that content typed as QName can also use unprefixed names, which pick up the default namespace.
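A quick check of the attribute rule, again with Python's xml.etree.ElementTree. The document, its URI, and the attribute names id and sku are made up for illustration. The default namespace shows up on the element names but not on the unprefixed attribute names.

```python
import xml.etree.ElementTree as ET

ORDERS = "http://example.org/Orders/"  # hypothetical namespace URI

# Hypothetical document: a default namespace plus unprefixed attributes.
DOC = """\
<Order xmlns="http://example.org/Orders/" id="o-1">
  <Item sku="123"/>
</Order>
"""

root = ET.fromstring(DOC)
item = root.find(f"{{{ORDERS}}}Item")

print(root.tag)     # element name carries the default namespace
print(root.attrib)  # attribute name is a bare local name
print(item.attrib)
```

Note that the xmlns declaration itself is not reported as an attribute; the parser consumes it.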

What about “really plain” XML, without any namespaces, like we have been using?

Then the element names are in no namespace at all, an awkward state if other namespaces are in use. We’ll set up namespaces for our tagnames when we need to work with other namespaces. Our schema documents so far have been using the xsd: prefixes, for names of the XML Schema. At the same time, such a schema describes the application-level elements whose names are outside any namespace.

Namespaces and DTDs

· These can be made to work together but not too many people try. The DTD needs to use qualified names (a:b), and the prefix on the DTD needs to match the prefix on the XML document, an awkward requirement.

· We’ll avoid this combo

· Note that SOAP (basic protocol of web services) disallows DTDs in the message, although you could extract the contents and then use a DTD on it.

Namespaces without XML Schemas

· This works fine. There is no “namespace document”—the namespace is just id’d by its URI. Pretty common in REST: try to find a schema in our REST book! Can use a namespace to say “this XML document format is for company abc”, and never have a DTD or XML Schema for it.

Namespaces and XML Schema

· This is the natural combo for serious work with multiple sources of XML, where you need namespaces to keep things straight, and want validation.

· With namespaces in use, an XML Schema document defines local element names for a certain namespace, the “target namespace”, specified by targetNamespace=”…” in the schema element at the start of the doc.

· Without namespaces in use, the names of a schema are considered outside any namespace, and the schema document has no “target namespace.”

See handout for first example of XML+XMLSchema based on Example 2.21, pg. 103

Note that the NS URI is used twice, once to spec the target NS, and again to spec the default NS for this XML file. This is the most convenient way to do it, avoiding repetitive prefixes for the app tagnames such as getQuote and symbol, and app typenames such as StockSymbol.

Each name=”something” for elements and named types in the schema is putting that name into the namespace.

Here getQuote, symbol, and StockSymbol are put in the NS as local names. The schema says more than this, of course.

(Also, identical names can be put in the NS from different parts of the schema, so a NS is not a pure set of names.)

When we use prefixes for both namespaces in the XML schema, we can see that in each name=xxx, the value xxx is a local name with no prefix. This makes sense because name=xxx introduces a new name into the target NS, and the target NS only, so there’s no way to “cheat” and put a name into another NS.

This is enforced by the schema of schemas (i.e., to be useful, a schema needs to follow this master schema). The type of name’s value is NCName: see Table 2.1, pg. 62. On the other hand, the other uses of the local names (here only one, type=”x:StockSymbol”) come with prefixes, and are typed QName.
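To make the NCName vs. QName distinction concrete, here is a rough Python sketch that resolves the QName in a type= attribute using the prefixes declared in the schema document. The schema text is a simplified stand-in, not the book's Example 2.21 (the simpleType body in particular is an assumption), and resolve_qname is a helper invented for this illustration. It also simplifies scoping by treating every declaration as document-wide, which is fine here because all the xmlns declarations sit on the root.

```python
import io
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Simplified stand-in schema (an assumption, not the book's text).
SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:x="http://namespaces.cafeconleche.org/xmljava/ch2/"
            targetNamespace="http://namespaces.cafeconleche.org/xmljava/ch2/"
            elementFormDefault="qualified">
  <xsd:element name="symbol" type="x:StockSymbol"/>
  <xsd:simpleType name="StockSymbol">
    <xsd:restriction base="xsd:string"/>
  </xsd:simpleType>
</xsd:schema>
"""


def resolve_qname(value, ns_map):
    """Turn 'prefix:local' (or bare 'local') into (namespace URI, local)."""
    prefix, _, local = value.rpartition(":")
    # An unprefixed QName has prefix "", the key used for the default namespace.
    return ns_map.get(prefix), local


ns_map = {}    # prefix -> URI, collected from the xmlns declarations
resolved = {}  # element name (an NCName) -> resolved type name (from a QName)
source = io.BytesIO(SCHEMA.encode("utf-8"))
for event, payload in ET.iterparse(source, events=("start-ns", "start")):
    if event == "start-ns":
        prefix, uri = payload
        ns_map[prefix] = uri
    elif payload.tag == f"{{{XSD_NS}}}element":
        resolved[payload.get("name")] = resolve_qname(payload.get("type"), ns_map)

print(resolved)
```

The name= value needs no resolution because it always lands in the target NS; only the type= value has to be looked up through a prefix.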

The fine print: we are assuming that if a schema is in use, it falls in one of two categories:

It has no targetNamespace attribute. This is the case of a schema with no associated namespace, like pa1 and orders.xsd, pg. 38. The XML documents that can be validated have no namespaces themselves (unless namespace processing is turned off, which is possible with many parsers).
It follows the form of Example 2.21, pg. 103: targetNamespace=…, and also elementFormDefault=”qualified” (overriding the default) and possibly also attributeFormDefault=”unqualified”, agreeing with the default. This is the schema + namespace case we want to study.

Otherwise we would have 3 more cases to study. If we ever bump into one of the other cases (targetNamespace=…, with elementFormDefault=”unqualified” or attributeFormDefault=”qualified”, or both), we can discuss it.

Use of Example 2.21 schema quote.xsd

This schema only has one top-level xsd:element, for getQuote, so this schema can only validate XML docs with a getQuote element at the root, such as quote.xml. The other top-level construct is a type declaration, which does not match directly to XML, but helps with the getQuote declaration. The type name StockSymbol is a local name in the NS.
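The "which roots can this schema validate" question can be sketched in Python by listing the top-level xsd:element declarations. This is not a validator (the standard library has none); it only compares the document's root name against the schema's global elements. The schema text is a simplified stand-in for quote.xsd, and the symbol and price values are placeholders, all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"
NS = "http://namespaces.cafeconleche.org/xmljava/ch2/"

# Simplified stand-in for quote.xsd (an assumption, not the book's text).
SCHEMA = f"""\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="{NS}" targetNamespace="{NS}"
            elementFormDefault="qualified">
  <xsd:element name="getQuote">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="symbol" type="StockSymbol"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:simpleType name="StockSymbol">
    <xsd:restriction base="xsd:string"/>
  </xsd:simpleType>
</xsd:schema>
"""


def allowed_roots(schema_text):
    """Names of the global elements, in ElementTree's {uri}local form."""
    schema = ET.fromstring(schema_text)
    target = schema.get("targetNamespace")
    # findall with a plain tag matches direct children only, i.e. top level.
    names = [el.get("name") for el in schema.findall(XSD + "element")]
    return [f"{{{target}}}{n}" if target else n for n in names]


def root_is_declared(xml_text, schema_text):
    return ET.fromstring(xml_text).tag in allowed_roots(schema_text)


REQUEST = f'<getQuote xmlns="{NS}"><symbol>XYZ</symbol></getQuote>'
RESPONSE = f'<Quote xmlns="{NS}"><Price>1.00</Price></Quote>'

print(allowed_roots(SCHEMA))
print(root_is_declared(REQUEST, SCHEMA), root_is_declared(RESPONSE, SCHEMA))
```

Only getQuote is listed: symbol is declared inside getQuote's type, so it is not a candidate root, and a Quote document in the same namespace has no matching declaration.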

Example 2.21, pg. 103: Case of XML Schema with a target NS.

This example shows how we want to do such schemas, as discussed last class.

Here getQuote, symbol, and StockSymbol are put in the NS. The schema says more than this, of course.

Note new directory of examples, $cs639/validate-ns, for XML Schema with namespaces, including quote*.*.

Note that all the element tagnames of the request XML (see pg. 97, inside the SOAP envelope, quoted below), getQuote and symbol, are described in the schema, and additionally, the type-name StockSymbol is put in the namespace by the schema. Thus the namespace has more names than are actually used in the XML request document.

Request XML from pg. 97, with default N.S:

</getQuote>

Response XML from pg. 97:

<Quote xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/"> <-- same namespace!

</Quote>

The response XML uses element tagnames Quote and Price, and although the XML has the same namespace http://.../ch2 associated with it, the corresponding schema with that target NS (pg 103) does not describe Quote and Price. This is OK. There is no rule that the schema has to declare all the names in the namespace if it is associated with it by targetNS. Additional names are brought in by having the NS on XML docs. It just means that this schema can only be used to validate the request XML, not the response XML.

However, we could expand the schema so that it does cover both request and response, by adding a top-level (child of <schema>) <xsd:element name=”Quote”>, etc. to it. With two top-level <xsd:element> elements in the schema, the same schema can validate XML with root tag <getQuote> or root tag <Quote>. This is how I would do it. The schema then describes the whole XML document interchange, the arrangement between the sender and receiver.

More fine print: if we put elementFormDefault=”unqualified” (or nothing about elementFormDefault, since this is the default) in the schema, then we would not use the prefix on symbol. But we would still need to put it on getQuote, because it’s a “global element”, one defined at top level in the schema (its element node is a direct child of the schema’s schema node). This need to know whether each element is global or not, while writing a conforming XML document, is what makes this setting of elementFormDefault so hard to use. Because it’s the default, however, you may see it in practice. Some authors go to the extreme of suggesting that to avoid this debate, you should make all elements global. See pa1bsoln/JavaSourceUsingRefs.xsd for a schema that makes all elements global. It has the advantage that all subtrees of the original document tree are valid, but that can also be a disadvantage if you want to make sure all documents give the full tree.
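The global-vs-local rule can be written down as a small Python sketch. Given a schema, it reports which namespace each declared element name must be in within a conforming instance document. The schema template is a cut-down assumption (not the book's example), instance_namespaces is a name invented here, and the sketch ignores the per-element form= override and ref= declarations.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"
NS = "http://namespaces.cafeconleche.org/xmljava/ch2/"

# Cut-down hypothetical schema; {form} is filled in below.
SCHEMA_TEMPLATE = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="{ns}" {form}>
  <xsd:element name="getQuote">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="symbol" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def instance_namespaces(schema_text):
    """Map each declared element name to the namespace it must use (or None)."""
    schema = ET.fromstring(schema_text)
    target = schema.get("targetNamespace")
    locals_qualified = schema.get("elementFormDefault", "unqualified") == "qualified"
    global_decls = set(schema.findall(XSD + "element"))  # direct children only
    result = {}
    for decl in schema.iter(XSD + "element"):
        name = decl.get("name")
        if name is None:
            continue  # a ref= declaration; not handled in this sketch
        # Global elements are always qualified; local ones follow the default.
        qualified = decl in global_decls or locals_qualified
        result[name] = target if qualified else None
    return result


with_qualified = instance_namespaces(
    SCHEMA_TEMPLATE.format(ns=NS, form='elementFormDefault="qualified"'))
with_default = instance_namespaces(SCHEMA_TEMPLATE.format(ns=NS, form=""))

print(with_qualified)
print(with_default)
```

With the default setting, getQuote stays in the target NS but symbol drops out of it, which is exactly the bookkeeping the document author is forced to do by hand.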

Namespaces and attributes

There are two kinds of attributes: ones belonging to a certain element (the examples we’ve seen) and “global attributes” defined at top level in the schema. We’ll look at global attributes next time.

Consider attributes belonging to an element now.

We never prefix an attribute name even if we are prefixing its element name, because our schema says attributeFormDefault=”unqualified”. (Actually, we can override this for a particular attribute if we want, but there’s no reason to.)
Unprefixed attribute names are not in any namespaces (pg. 31). They just tag along with their elements.
We do use a prefix when the attribute is in another namespace from the element, because it’s a global attribute.

Also note that a default namespace only defaults the NS for elements. That is OK, since we do not need prefixes for element attributes.

Attribute Example

Example: book4.xml in $cs639/validate-ns:

<?xml version="1.0"?>

<b:book xmlns:b="http://schemas.cs.umb.edu/book"> <-- our URI for book NS

<b:title>Data on the Web</b:title>

<b:image source="csearch.gif"/> <-- no b: on attribute name, because it’s “unqualified”.

</b:book>

The attribute names belonging to elements are in no namespace, but of course the schema knows about them and checks them. They can be thought of as “tagging along” with their element.

From book.xsd in $cs639/validate:

<xsd:complexType name="ImageType">

<xsd:attribute name="source" type="xsd:string"/>

</xsd:complexType>

That’s too simple. Suppose an image had a child element:

<b:image source="csearch.gif">

<b:size> 30 </b:size>

</b:image>

<xsd:complexType name="ImageType">

<xsd:sequence>

<xsd:element name="size" type="xsd:string"/>

</xsd:sequence>

<xsd:attribute name="source" type="xsd:string"/>

</xsd:complexType>

There are attributes that don’t belong to certain elements. See example on pg. 31. Next time.

Other examples in $cs639/validate-ns:

book6.xml has this namespace with linkage to XML Schema via xsi:schemaLocation="http://schemas.cs.umb.edu/book book1.xsd".

book3.xml has default NS

book5.xml has default NS + linkage to XML Schema

Many cases to deal with (XML + DTD or not + schema or not + namespace or not):

No Namespaces: our previous coverage:

XML + DTD

XML + XSD (with no targetNS)

XML+ DTD+XSD: uncommon

With Namespaces: DTDs don’t play well with namespaces, so we only consider XSDs in combo with NSs

XML + NS

XML + NS +XSD (with targetNS = NS)

However, note that a NS doesn’t by itself have a NS document the way the DTD and XSD have docs. It’s just an identifier URI attached to XML to disambiguate names.

XML + NS case: only the XML is a document. The NS is just a construct, holding all the prefixed local names, plus element names if it’s being used as a default namespace.

In the XML+NS+XSD case, the XSD has the NS URI as its target namespace, so the XSD serves as a document for the NS, along with its job to express the XML’s structure. Thus the NS URI bridges between the XML doc and the XSD doc.

Well-known schemas are found by parsers at well-known places

Parsers need help finding application schemas. Need to cover global attributes for this.

Example of a standard namespace in the book – XInclude

- pg 29 (confusing because early)

->this is the XInclude “recommendation”, i.e., standard.

<xi:include href="order_details.xml"/>

</Order>

Here “include” is a name in this XInclude namespace, and if it has a schema, then href should appear as this element’s attribute (and not in any namespace). The include element says where to find an XML doc to include in this one, like “#include in C”.

Can we run this through sax.Counter? No, the SAX parser doesn’t handle XInclude. You need an XInclude tool to turn this into XML with the inclusion done. You can find an XInclude tool written as an XSL app on the Internet. Might be useful someday.

Global Attributes: These can be attached to elements of other namespaces. Look at XLink example, pg. 31. xlink:type and xlink:href are global attributes, belonging to the “http://.../xlink” namespace but added to an element of the “http://.../Address” namespace.

<ShipTo xmlns="http://ns.cafeconleche.org/Address/"

xmlns:xlink="http://www.w3.org/1999/xlink"

xlink:type="simple" xlink:href="mailto:chezfred@yahoo.com"> <--global attributes

<GiftRecipient>Samuel Johnson</GiftRecipient>

<Street>271 Old Homestead Way</Street >

<City>Woonsocket</City> <State>RI</State> <Zip>02895</Zip>

</ShipTo>

If there is no schema in use for the Address NS, this is easily done. Again, you need a tool or framework that understands XLink to really put this to use.
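With no schema involved, any namespace-aware parser will accept this. A short Python sketch (standard-library xml.etree.ElementTree, using the ShipTo document above) shows how the global attributes appear to a program: their names carry the XLink namespace URI, while the element name carries the Address one.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ADDRESS = "http://ns.cafeconleche.org/Address/"

DOC = """\
<ShipTo xmlns="http://ns.cafeconleche.org/Address/"
        xmlns:xlink="http://www.w3.org/1999/xlink"
        xlink:type="simple" xlink:href="mailto:chezfred@yahoo.com">
  <GiftRecipient>Samuel Johnson</GiftRecipient>
  <Street>271 Old Homestead Way</Street>
  <City>Woonsocket</City> <State>RI</State> <Zip>02895</Zip>
</ShipTo>
"""

root = ET.fromstring(DOC)

# Global attributes are looked up by {namespace-URI}localname, not by prefix.
href = root.get(f"{{{XLINK}}}href")
link_type = root.get(f"{{{XLINK}}}type")

print(root.tag)
print(link_type, href)
print(root.get("href"))  # the bare name "href" is a different attribute name
```

The prefix xlink is only a shorthand in the document text; what matters to the program is the namespace URI.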

If there is a schema for the Address NS, it must give permission for the “extra” attributes, or validation will fail. The simplest way is to allow any attribute: See address.xsd in $cs639/validate-ns for this:

<?xml version="1.0"?>

<xsd:schema targetNamespace="http://ns.cafeconleche.org/Address/"

xmlns="http://schemas.cs.umb.edu/book"

xmlns:xsd="http://www.w3.org/2001/XMLSchema"

elementFormDefault="qualified">

<xsd:element name="ShipTo">

<xsd:complexType>

<xsd:sequence>

<xsd:element name="GiftRecipient" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>

<xsd:element name="Street" type="xsd:string"/>

<xsd:element name="City" type="xsd:string"/>

<xsd:element name="State" type="xsd:string"/>

<xsd:element name="Zip" type="xsd:string"/>

</xsd:sequence>

<xsd:anyAttribute/> <---add to schema to allow global attributes (or any others)

</xsd:complexType>

</xsd:element>

</xsd:schema>

As of class time, I had not made this work. With the help of the validation service at http://www.w3.org/2001/03/webdata/xsv, with its somewhat more useful error messages, I have succeeded.

You also must provide the parser with access to the schema that establishes the attribute as a global attribute, namely, the XLink schema. A Google search found it at http://www.loc.gov/standards/mets/xlink.xsd. We tell this important fact to the parser by linkage or other means (here linkage). However, the validation still failed, reporting that “type” was not allowed as an attribute for ShipTo.

Turns out “type” is in fact not a global attribute of the XLink schema but href is, so the final XML that works is:

address.xml in $cs639/validate-ns:

<?xml version="1.0"?>

<ShipTo xmlns="http://ns.cafeconleche.org/Address/"

xmlns:xlink="http://www.w3.org/1999/xlink"

xsi:schemaLocation="http://www.w3.org/1999/xlink http://www.loc.gov/standards/mets/xlink.xsd"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xlink:href="mailto:chezfred@yahoo.com">

<GiftRecipient>Samuel Johnson</GiftRecipient>

<Street>271 Old Homestead Way</Street >

<City>Woonsocket</City> <State>RI</State> <Zip>02895</Zip>

</ShipTo>

java sax.Counter -s -v -schema address.xsd address.xml

address.xml: 598 ms (6 elems, 2 attrs, 0 spaces, 79 chars)

To test it at http://www.w3.org/2001/03/webdata/xsv: enter the URLs for these files in the form:

http://www.cs.umb.edu/cs639/validate-ns/address.xml http://www.cs.umb.edu/cs639/validate-ns/address.xsd

Linkage to Schema from XML: done with global attributes

Recall book2.xml, with its linkage to schema. This is done with the help of the XML Schema instance namespace (XMLSchema-instance), another standard namespace, and its global attribute noNamespaceSchemaLocation:

<?xml version="1.0" encoding="ISO-8859-1"?>

<book xsi:noNamespaceSchemaLocation="book.xsd"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<title>Data on the Web</title> ...

This is an example of a global attribute (noNamespaceSchemaLocation) that doesn’t need to be in the schema, since it’s part of the infrastructure.

Here the xsi prefix for the XML instance NS is set up with the xmlns:xsi=”URI_of_XMLInstance”. By XML instance we mean the XML document itself, rather than the schema. The XML document needs to point to its schema, which it does with the help of the XSI namespace.

Obviously we need a different construct for the XSD linkage for the case of NS+XSD

· Without namespace: xsi:noNamespaceSchemaLocation with value “URL”

· With namespace: xsi:schemaLocation with value “URI URL”, two URL-syntax strings separated by whitespace, the first for the namespace URI and the second for the schema’s URL.

Example of second form: address.xml above.
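The two linkage forms can be read back out of a document with a few lines of Python. This is only a sketch of how a tool might collect the hints, not how Xerces or any particular parser does it; schema_hints is a name made up here, and the two test documents are trimmed versions of book2.xml and address.xml.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_hints(xml_text):
    """Return {namespace URI (or None for no namespace): schema URL}."""
    root = ET.fromstring(xml_text)
    hints = {}
    no_ns = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
    if no_ns is not None:
        hints[None] = no_ns
    # schemaLocation is a whitespace-separated list of "URI URL" pairs.
    tokens = (root.get(f"{{{XSI}}}schemaLocation") or "").split()
    for ns_uri, url in zip(tokens[0::2], tokens[1::2]):
        hints[ns_uri] = url
    return hints


BOOK = """\
<book xsi:noNamespaceSchemaLocation="book.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <title>Data on the Web</title>
</book>
"""

ADDRESS = """\
<ShipTo xmlns="http://ns.cafeconleche.org/Address/"
        xmlns:xlink="http://www.w3.org/1999/xlink"
        xsi:schemaLocation="http://www.w3.org/1999/xlink http://www.loc.gov/standards/mets/xlink.xsd"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xlink:href="mailto:chezfred@yahoo.com"/>
"""

book_hints = schema_hints(BOOK)
address_hints = schema_hints(ADDRESS)
print(book_hints)
print(address_hints)
```

Note that address.xml only links the XLink schema this way; the Address schema itself is supplied on the sax.Counter command line.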