cs636 class 24

CS636 Class 24 Domain Object Design, Internationalization, Web Services

More on Microservices

Although there is little discussion of layering in microservice articles, there is a strong notion of the application's API, the most important layer division in our systems. That's what the "service" part of the name means. And services here mean stateless services, although that is usually assumed, not stated.

Transactions in microservices. The scope of transactions is limited to a single service call, as in our setup, but this one call is accessing only part of the persistent data, so is less powerful. In some cases, data may become inconsistent between the databases, at least for a while.

Eventual Consistency. There is an idea of "eventual consistency" to help out. Changes seen in one microservice may be notified to other microservices, so the other component can eventually fix up its part. Obviously this increases the complexity of the code.

Using Microservices means having a distributed system, with all the complexities that brings: connection issues on every important in-app call, complex outages, as well as only eventual consistency at best.

With Microservices, can't follow the user around by simply using session variables. A session variable is specific to one tomcat (or other app server), and now the app is typically running multiple tomcats, one for each microservice instance. Need a user identity service working with all the incoming requests, and need real authentication as well.

Article on downside of microservices: https://dzone.com/articles/dont-do-microservices-if-you-can

Advanced Domain Object Design

There is a whole theory of this, predating microservices but relevant to their design. The first book to read is Domain Driven Design (or DDD for short, 2004) by Eric Evans. He explains how to identify and define entities and value objects, two kinds of domain objects. Then object graphs that hang together to describe something are called aggregates. Then aggregates can be grouped into a "bounded context" which has a consistent model and language and is usually supported by a certain development team. The idea of bounded context is used for components in the microservice architecture, a later development.

Then read Implementing Domain-Driven Design, IDDD for short, by Vaughn Vernon, 2013, to see the more current thought on system design using DDD. The newer architecture is called Hexagonal, or Ports and Adapters. As in our systems, the domain objects are simple POJOs, unaware of how they are persisted, etc. The DAO code is now relegated to "adapters". On close inspection, this DB adapter for a component is called as needed, so is in effect a lower layer. In architecture diagrams, you usually see a hexagon, representing the various adapters as faces (not necessarily 6 of them, by the way), and inside it, another hexagon around the core, which contains the domain code. The inner hexagon represents the service API, or set of services. Following DDD, there are aggregates and bounded contexts. The transactions are expected to change only one aggregate at a time, so again there can be inconsistency, and events are used to notify other aggregates as necessary to obtain eventual consistency. A transaction may read related aggregates, of course.

Clearly DDD and microservice architecture can be combined, based on decomposition of the whole app into bounded contexts.

Example of Music Project

Aggregates: Product with Tracks, Invoice with items, User, Cart

Rule from IDDD: Preferably, don't use object reference from one aggregate to another. This saves memory in the server as less data is dragged around.

What about Downloads--are they details of Tracks or Users or separate? There are no object refs currently from Track or User to bind them in. So perhaps best to consider them as a separate aggregate. But then the ref from Download to Track breaks another rule of IDDD that refs (even ids) from another aggregate should only point to the root object of an aggregate. So that argues for attaching the Downloads to the Track as details, and replacing the User ref of Download with a user id or email. This also aligns with the use of Downloads in the app: tracking popularity of songs, not activity of a certain user.

What about Cart? It's not persistent in our system, does that matter? Well, although not in the database, it does last for multiple requests, so is long-lived, and thus could easily be in the database. All we need to know is a unique id to store it under, and that can be established by a cookie, as we have seen with tomcat, even before a user has registered.

The cart could be stored in MongoDB for example, as shown in this diagram. In MongoDB, there is typically one "document" under each id, so here it would have the collection of cart items, in a JSON structure (actually a Swagger API doc). There's a whole microservice for carts. It does use ids for items (products), so second lookups are needed to find out product attributes.

Rule from IDDD: change only one (persistent) aggregate in a service call. This involves transaction scope. If you must change two aggregates, use two disjoint transactions, one for each (for example, by calling through the service API of a microservice). Of course this can cause inconsistencies in data that need to be fixed up eventually.

Mutator service calls in music:

processInvoice: only changes an Invoice

addDownload(userid, Track) --changes only Downloads

addItemToCart, etc.--changes only Cart

checkout(Cart, userid)-- saves Invoices and clears Cart, so needs multiple transactions if using multiple databases. Use calls to the Catalog service API to fill in Product info in the new LineItems (product code) and the product price needed to calculate the invoiceTotal before doing the Invoice insert transaction. These might be worked around.

registerUser(String, String, String)-- changes only Users

We have largely been following this rule without trying, in the monolithic case.
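The checkout case above can be sketched in a few lines. This is only an illustration under stated assumptions: in-memory maps stand in for the separate Catalog, Cart, and Invoice databases, and every name here (CheckoutSketch, Invoice, checkout) is hypothetical, not the music project's actual code. The point is the ordering: read-only Catalog lookups first, then one step per aggregate changed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each map stands in for one service's own database.
class CheckoutSketch {
    static final class Invoice {
        final String userId;
        final int totalCents;
        Invoice(String userId, int totalCents) {
            this.userId = userId;
            this.totalCents = totalCents;
        }
    }

    // Catalog service: product code -> price in cents
    static final Map<String, Integer> catalog = new HashMap<>();
    // Cart service: cart id -> product codes (ids only, no Product object refs)
    static final Map<String, List<String>> carts = new HashMap<>();
    // Sales service: invoice id -> Invoice
    static final Map<Integer, Invoice> invoices = new HashMap<>();
    static int nextInvoiceId = 1;

    static int checkout(String cartId, String userId) {
        // Read-only calls to the Catalog API, before any write transaction starts.
        int total = 0;
        for (String code : carts.get(cartId)) {
            total += catalog.get(code);
        }
        // Transaction 1: changes only the Invoice aggregate.
        int invoiceId = nextInvoiceId++;
        invoices.put(invoiceId, new Invoice(userId, total));
        // Transaction 2: changes only the Cart aggregate. If this step fails,
        // the data stays inconsistent until a retry or an event fixes it up.
        carts.get(cartId).clear();
        return invoiceId;
    }
}
```

In a real system each commented step would be a separate database transaction, possibly behind a different service API.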

Caching

Our basic system gets fresh data from the database to create new POJOs on each request. This is made quite fast by the fact that the database has memory buffers holding all its "warm" data, data that has been accessed in the last few minutes or so. However, the round trip to the database server takes some time (under 1ms if local).

The database buffering system can be called the "database cache". It holds data in memory, avoiding disk reads of the table data. The database carefully manages the data so that it always is using the correct current data.

It is important to give the database enough memory to hold all the app's commonly used data. If you have a server with 64GB of memory, you could use 32GB for this buffering system. This is particularly important for Oracle, which is installed typically with way-too-small buffering memory, like 200MB, and strictly lives within this budget. MySQL with the older MyISAM engine relies on OS buffering for table data, so can use the whole memory of the system it's on, sometimes adversely affecting other programs on the same machine. With the InnoDB engine, MySQL has its own buffer pool, whose default size (128MB) is also too small for a busy app.

Immutable domain objects don't really need new copies on each request. Even with a plain JDBC implementation, we could set up our own app cache for these objects, using the "application" scope: the ServletContext has get/setAttribute just like request and session. Such variables last the lifetime of the servlet. But there's no need to do this until performance is a problem. Don't forget KISS.
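A minimal sketch of such an app cache, assuming the immutable objects can be looked up by id. The class name ImmutableCache and the loader function are hypothetical; the loader stands in for whatever DAO call fetches the object from the database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical app-scope cache for immutable domain objects, keyed by id.
class ImmutableCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. a DAO lookup that hits the database

    ImmutableCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached object, calling the loader only on the first request for an id.
    V get(K id) {
        return entries.computeIfAbsent(id, loader);
    }
}
```

In a webapp, one instance could be created at startup and stored with the ServletContext's setAttribute, then fetched with getAttribute in each request. This is safe only because the cached objects never change.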

In an enterprise app, we would more probably be using JPA, and it supplies this kind of caching service for us.

Using JPA's shared cache (instead of the default of providing a private copy of each domain object on each request)

Shared Cache: owned by the EMF (EntityManagerFactory, the object that gives out the EMs (EntityManagers))

One per app server (web case)
One per client (client-server)
Clearly OK to cache immutable objects in shared cache

Mutable objects: if only one application server and the JPA app is the only app accessing the database, OK to cache mutable objects, because the JPA runtime updates the shared cache on committed changes to entities.

Then the cache represents the database state to the extent it knows it. It is smart enough to make a private copy of a domain object and put it in the em when a transaction updates an object gotten from the shared cache.

Can’t cache mutable objects in general because there is no mechanism to notify the cache about changes in the database.

However, when the system gets busy enough to worry about caching, the usual first step is to multiply the app servers, while keeping a single database server (and then the trick of letting the shared cache handle mutable objects stops working). The Java of the app is a heavier load than the database actions it causes, so it is common to have a dozen or more app servers all running against one database.

In some cases, it’s better not to cache immutable objects – if there are millions of them, the shared cache gets bigger & bigger… a performance problem

Another approach: We can load the immutable objects before we start the Tx. – still serializable

end up with shorter Tx, shorter lock periods, better performance

Create em with no (explicit) Tx yet
em.find(entityClass, id) for immutable objects, end up with detached objects
Do Tx: immutable objects will be detached, but usable for their data

Distributed caches – for multiple app servers: use only if database gets overloaded, and database has no other apps. Then the distributed cache is holding the database state for the system

Separate software
Tricky, need consultant
But you’ve got a very busy website at this point, hopefully plenty of money...

Example: Infinispan, which has a Spring Boot starter, but there are many others

Internationalization

We want to be able to display web pages in Chinese, Arabic, Russian, etc., so we need to use Unicode for text. Java uses Unicode for String, so we’re OK there.

Look at http://kermitproject.org/utf8.html to see snippets of many languages, all encoded in UTF-8 and displayed by your browser.

To fit all these language characters in one coding, Unicode code points take up to 21 bits (the largest is 0x10FFFF). Java uses 16-bit chars, so sometimes it needs two chars (a "surrogate pair") to hold one Unicode character. This is rare in most text, though emoji fall in this group. Being a little sloppy, we say “Java uses Unicode for chars”, but it really uses UTF-16, a slightly compressed Unicode. Characters above 0xFFFF need four bytes.
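A short sketch of the difference between Java chars and Unicode characters, using only standard String and Character methods. The class name Utf16Demo and the helper charsNeeded are made up for this illustration.

```java
class Utf16Demo {
    // Number of 16-bit Java chars used to hold one Unicode code point.
    static int charsNeeded(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        String euro = "\u20ac";                               // fits in one 16-bit char
        String clef = new String(Character.toChars(0x1D11E)); // musical G clef, above FFFF
        System.out.println(euro.length());                         // 1
        System.out.println(clef.length());                         // 2 (a surrogate pair)
        System.out.println(clef.codePointCount(0, clef.length())); // 1
    }
}
```

So length() counts 16-bit chars, not characters; codePointCount gives the number of Unicode characters.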

UTF-8 is the more common encoding outside Java. The 8 stands for the 8 bits in each byte used in the representation: one character takes 1-4 bytes (the current standard, RFC 3629, dropped the older 5- and 6-byte forms), but for our use, most characters take only one byte (the ASCII characters). Thus this paragraph could be called ASCII or UTF-8.

Note: HTML5 defaults to UTF-8 encoding, a big improvement over HTML4, which defaulted to encoding "Latin-1" AKA "ISO-8859-1".

In both UTF-8 and Latin-1, ASCII characters qualify as valid characters. It only takes 7 bits to encode an ASCII character, and the 8th bit is 0 to fill one byte. In Latin-1, the 8th bit can be a one, so that 128 more codes are available for European (Latin related) languages (accented chars, etc.; the euro sign is not among them, since it came later). In UTF-8, if the 8th bit is 0, it's an ASCII char, and if not, the byte is part of a multi-byte char.

How this works is pretty impressive. Here is the compression table for Unicode values up to FFFF (see linked doc for values above FFFF)

From the standard http://www.faqs.org/rfcs/rfc3629.html

    Unicode value range         |        UTF-8 octet sequence
     hexadecimal range(bits)    |              (binary)
   -----------------------------|------------------------------------
   0000-007F (000000000xxxxxxx) | 0xxxxxxx
   0080-07FF (00000xxxxxxxxxxx) | 110xxxxx 10xxxxxx
   0800-FFFF (xxxxxxxxxxxxxxxx) | 1110xxxx 10xxxxxx 10xxxxxx

The first range above are the pure ASCII characters, all 7 bits long. Those same 7 bits become the UTF-8 value, along with a leading 0 bit. That's a nice feature: A pure ASCII text qualifies as UTF-8 without change at all (as long as the leading bits are 0).

The second range encodes non-ASCII codes that cover things like accented chars of European languages, special symbols, etc. These take two bytes of UTF-8 to hold their 11 significant bits.

The third range encodes non-ASCII codes of all the languages of the world except some added too late to fit (these are in the extended range not shown above.). Their 16 significant bits are encoded in 3 bytes of UTF-8.

This encoding preserves binary order: Note how the bitstrings on both sides of the table are in bitstring order (like char strings but using bits).

Also, each byte starts off with 2 bits that say whether this is a leading byte or a trailing byte. If a UTF-8 data stream is broken, the reader can figure out how to restart decoding by looking at these header bits and skipping trailing bytes until it reaches a leading byte, and then restart decoding characters.
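The restart rule just described can be sketched as follows. The class and method names (Utf8Resync, isTrailing, nextCharStart) are invented for this example; the only fact used is that a trailing byte always has the bit pattern 10xxxxxx.

```java
import java.nio.charset.StandardCharsets;

class Utf8Resync {
    // A trailing (continuation) byte always has the bit pattern 10xxxxxx.
    static boolean isTrailing(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // Index of the first byte at or after 'from' that starts a character.
    static int nextCharStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && isTrailing(data[i])) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] data = "\u20acA".getBytes(StandardCharsets.UTF_8); // e2 82 ac 41
        // Pretend the stream broke and we picked it up in the middle of the euro sign.
        int start = nextCharStart(data, 1);
        System.out.println(start); // 3
        System.out.println(new String(data, start, data.length - start, StandardCharsets.UTF_8)); // A
    }
}
```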

End of material covered in class.

Example Characters

For example, the euro sign has Unicode 0x20ac (this is 16 bits; add five more binary 0s on the left to make the full 21 bits). Its UTF-8 encoding takes 3 bytes: e2 82 ac. Note that the high bit is on in these bytes, marking non-ASCII. In Java, we can use '\u20ac'.

To compare, ASCII ‘A’ has code 0x41, and its UTF-8 representation is just the same one-byte code 0x41, with high bit off.

More examples, showing bits:

a (0x0061) is coded by the first rule: 0061 = 00000000 0|1100001, UTF-8 code = 01100001 (8 bits)

€ (0x20ac) is coded by the third rule: 20ac = 0010|0000 10|101100, UTF-8 code = 11100010 10000010 10101100 (24 bits)

™ (0x2122) is coded by the third rule: 2122 = 0010|0001 00|100010, UTF-8 code = 11100010 10000100 10100010 (24 bits)
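The three worked examples above can be checked with a hand-written encoder that follows the table row by row. This is a sketch for values up to FFFF only, with hypothetical names (Utf8ByHand, encode, hex); it does not reject the surrogate range D800-DFFF, which a real encoder would. The JDK's own encoder is used as a cross-check.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8ByHand {
    // Encodes one code point in the range 0000-FFFF using the three table rows.
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0xFFFF) {
            throw new IllegalArgumentException("only 0000-FFFF handled in this sketch");
        }
        if (cp <= 0x7F) {            // 0xxxxxxx
            return new byte[] { (byte) cp };
        } else if (cp <= 0x7FF) {    // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else {                     // 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
    }

    // Formats bytes as lowercase hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(0x20ac))); // e2 82 ac
        // Cross-check against the JDK's own encoder.
        System.out.println(Arrays.equals(encode(0x2122),
                "\u2122".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```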

Using UTF-8 in our JSP pages (so they produce HTML in UTF-8 for sure)

If we want our pages to work with older browsers, we need to tell the browser that we are using UTF-8 (soon this may not be necessary as HTML5 takes over). This is done by putting the Content-Type response header in the response that carries the HTML document back to the browser. See Murach, pg. 553 for response headers, including Content-Type. We tell JSP to do this as follows, as you can see in many JSPs in our projects. This will work for HTML4 pages as well as HTML5.

<%@page contentType="text/html; charset=UTF-8"%>

Note: Normally Java outputs using the local char set of the command environment, often not UTF-8, even though it is using Unicode internally. To override this default behavior, you can specify the char encoding in the various output classes (also in input classes), for example

OutputStreamWriter out2 = new OutputStreamWriter(new FileOutputStream("foo.txt"), "UTF-8");
// make a PrintWriter, so we can use println, etc.
PrintWriter out = new PrintWriter(out2);

To find the right character to use, the "character map" program of Windows is very nice. Find it in Windows Accessories in the start menu. Try looking up the euro sign, etc. Try typing 20ac in the “Go to Unicode” box.

Also, see the UTF-8 test page at www.fileformat.info.

To put the euro sign in HTML, you can specify it by its Unicode value as in &#8364; (using decimal value of 0x20ac) or &#x20ac; using the hex value. At least in some environments, you can use &euro; or you can put in the 3 bytes of actual UTF-8. Your editor may switch representations on you.

For easy editing, you need a Unicode-supporting editor. MS Word and post-1998 Windows in general support Unicode (though not in the command line environment, which derives from ancient DOS days). Eclipse supports it, at least for HTML and JSP that declares its char-coding as above, although it doesn’t have a UI to show us what various codes look like (that I’ve found, anyway). For text files in Eclipse, Project Properties>Resource allows setting the encoding.

There’s more work to do for internationalization: Need to translate app’s results too, error messages.

Note: we haven’t covered character sets in use in databases; we’ve been using ASCII there. But we could use UTF-8. Handling multiple natural languages in one app is even more challenging. You need to be able to detect the user’s language (“locale”) and then switch to that language. Too complicated to cover here.

REST Web Services in Java

Web Services : two types : old SOAP and more recent REST, now more important, and simpler.

REST design Principles (assumes HTTP transport)

“Stateless” Client/Server Protocol: Each message contains all the information needed by the receiver to understand and process it. Simplest way. Note that SOAP Web services also follow this design point. So do microservices.
Use a set of uniquely addressable (by URIs) resources: “everything is a resource”, the RESTful way.
Use a set of well-defined operations that can be applied to all resources: use various HTTP verbs, the RESTful way

More advanced : use “hypermedia”: return a document with links to the user, giving the user choices of what to do next

Good book: REST in Practice, by Jim Webber, Savas Parastatidis, and Ian Robinson, O'Reilly, ISBN 978-0-596-80582-1

Can we recycle our webapp code for Web Services?

Our webapp services are stateless in a certain sense, but our apps are allowed to track a user, using session variables in the presentation layer.
Web Services are very much like our service API calls, which are "stateless" in the same sense.

Yes! Our whole architecture works for supporting web services (SOAP or REST) : just stop using sessions. Use the database for all data that lives more than one request cycle.

Example: Pizza Project:

One request: user specifies room, saved in session, in the presentation layer
Another request: user orders a pizza for that room, without saying what room it is. But the service API call (makeOrder) has the room specified, filled by presentation layer code from the session variable.

Turn the project into web services for pizzas: Drop the session variable.

Just specify the room on every request coming in via HTTP
makeOrder is already ready to do it this way: we can use the service layer as is.

Example: Music Project. Here we use a much bigger session variable for the Cart, which belongs to the user.

Turn the project into web services for selling guitars: We can put the cart in the database under an id and return the id, and provide an API for examining and changing the cart contents. There's actually a whole standard protocol for handling carts, part of cXML, discussed in cs637.

But for a more complete example, let's start with a simple CRUD service.

Example: Order Service

A CRUD service:

Create an order
Retrieve the order to check status
Update/replace the order
Delete the order

Maps into HTTP verbs as follows:

Verb URI Use

POST /orders Create new order, response header Location gives new URI

GET /orders/1234 Request state of order 1234

PUT /orders/1234 Update order 1234 in entirety

DELETE /orders/1234 Delete order 1234

From client viewpoint: Order a product:

POST XML or JSON for order to http://topcat.cs.umb.edu/orderService/rest/orders

get back XML or JSON for order with id filled in, say order 22, status = PREPARING

This means this order’s resource is /orderService/rest/orders/22.

Find out the order status:

GET to /orderService/rest/orders/22, to get the current “resource” there, see same old status in the XML response.

GET to /orderService/rest/orders/22, to see if it’s ready, see status=READY in the XML, time to pick it up.

The idea of REST is to use HTTP directly, rather than reducing it to a carrier of SOAP messages. With REST, we use multiple HTTP verbs:

GET for reading data (no changes allowed in server!)
POST for creating new data items
PUT for updating old data items (in whole)
DELETE for deleting old data items

How do we specify the data item to work on?

In REST, we use individual URLs for data items, i.e. “resources”.

GET /orders/1234 read order # 1234 (in XML or JSON)

POST /orders add a new order (server determines new id).

(also PUT and DELETE)

How does the client find out the new id after a POST?

The HTTP Location header in the response gives this.

Example from the Atom Publishing Protocol spec, RFC 5023. The client sends a POST request containing an Atom Entry representation using the URI /edit/:

    POST /edit/ HTTP/1.1
    Host: example.org
    User-Agent: Thingio/1.0
    Authorization: Basic ZGFmZnk6c2VjZXJldA==
    Content-Type: application/atom+xml;type=entry
    Content-Length: nnn
    Slug: First Post

    <?xml version="1.0"?>
    <entry xmlns="http://www.w3.org/2005/Atom">
      <title>Atom-Powered Robots Run Amok</title>
      <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
      <updated>2003-12-13T18:30:02Z</updated>
      <author><name>John Doe</name></author>
      <content>Some text.</content>
    </entry>

The server signals a successful creation with a status code of 201. The response includes a Location header indicating the Member Entry URI of the new Atom Entry saved on the server, and a representation of that Entry in the body of the response.

    HTTP/1.1 201 Created
    Date: Fri, 7 Oct 2005 17:17:11 GMT
    Content-Length: nnn
    Content-Type: application/atom+xml;type=entry;charset="utf-8"
    Location: http://example.org/edit/first-post.atom
    ETag: "c180de84f991g8"  

    <?xml version="1.0"?>
    <entry xmlns="http://www.w3.org/2005/Atom">
      <title>Atom-Powered Robots Run Amok</title>
      <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
      <updated>2003-12-13T18:30:02Z</updated>
      <author><name>John Doe</name></author>
      <content>Some text.</content>
      <link rel="edit"
          href="http://example.org/edit/first-post.atom"/>
    </entry>

So we see that REST uses HTTP directly, including its headers. So we don’t have a “REST protocol” or “REST envelope” in the usual sense, although there are strong conventions on how to use HTTP for REST.

Mystery of the SLUG header. This is a suggestion by the client for a good name for the new resource. The server complied here by naming it .../first-post.atom. Apparently "slug" comes from newspaper lingo for a temporary name for a developing story.

REST is a software architectural style for distributed systems. It was created by Roy Fielding, and described in his widely-read PhD thesis. He got his degree in 2000 after doing a lot of important work on the HTTP and URL specs.

Note the Wikipedia article on REST. See the chart there of HTTP methods and their typical use in REST, originally from Richardson and Ruby, “Restful Web Services”, the first important book with implementations. Since that book, PATCH has been added as an HTTP method to the chart.

Note that REST/HTTP:

uses HTTP verbs for its actions (as detailed in big table)
uses HTTP URLs/URIs for its own resources
uses certain HTTP headers: Content-Type, Accept, Location
uses HTTP/MIME types for content: text/xml, etc., now called Media Types
uses HTTP success/error codes: 200, 201, ... 404, 405, 500, see Chap. 18

Obviously you need to be sure of all these aspects of HTTP to deal intelligently with REST. REST is simple enough to be implemented in almost any language, but let's look at Java cases.

REST in Java

REST service code

We have seen how we can use wildcard URLs in web.xml to send multiple request URIs to a servlet.

Then once there, we can access the URI and interpret it, do the desired action, and send back an appropriate response.
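A minimal sketch of that interpretation step, kept free of the servlet API so it can run on its own. Everything here is hypothetical (OrderDispatcher, Response, the in-memory map of orders); in a servlet, the verb and path would come from request.getMethod() and request.getPathInfo(), and the Response fields would be copied into the HttpServletResponse.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory order resource: maps HTTP verb + URI to a CRUD action.
class OrderDispatcher {
    static final class Response {
        final int status;
        final String location; // value for the Location header, or null
        final String body;     // response document, or null
        Response(int status, String location, String body) {
            this.status = status;
            this.location = location;
            this.body = body;
        }
    }

    private final Map<Integer, String> orders = new HashMap<>();
    private int nextId = 1;

    Response handle(String verb, String path, String body) {
        if (path.equals("/orders")) {
            if (!verb.equals("POST")) return new Response(405, null, null);
            int id = nextId++;                 // server determines the new id
            orders.put(id, body);
            return new Response(201, "/orders/" + id, body);
        }
        if (!path.startsWith("/orders/")) return new Response(404, null, null);
        int id;
        try {
            id = Integer.parseInt(path.substring("/orders/".length()));
        } catch (NumberFormatException e) {
            return new Response(404, null, null);
        }
        if (!orders.containsKey(id)) return new Response(404, null, null);
        switch (verb) {
            case "GET":    return new Response(200, null, orders.get(id));
            case "PUT":    orders.put(id, body); return new Response(200, null, body);
            case "DELETE": orders.remove(id); return new Response(204, null, null);
            default:       return new Response(405, null, null);
        }
    }
}
```

The order documents are treated as opaque strings here; a real service would parse and validate the XML or JSON.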

REST client code

For GET, a primitive client can use Java URL class, url.openStream() gives an InputStream of the result.

For GET and POST, can use java.net.HttpURLConnection. See example, which also shows Apache HTTP.
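A sketch of a POST with HttpURLConnection, reading the Location header from a 201 response. This is not the class example referred to above. The names RestClientSketch, postJson, and startStubServer are hypothetical, and the stub server (built on the JDK's com.sun.net.httpserver package) is only a stand-in so the client can be tried locally: it answers every request to /orders with 201 and a fixed Location.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class RestClientSketch {
    // POSTs a JSON document and returns the Location header of the response.
    static String postJson(String url, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true); // we are sending a request body
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        if (status != 201) {
            throw new IOException("expected 201 Created, got " + status);
        }
        String location = conn.getHeaderField("Location");
        conn.disconnect();
        return location;
    }

    // Stand-in server for local testing only, not a real order service.
    static HttpServer startStubServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/orders", exchange -> {
            exchange.getRequestBody().readAllBytes(); // drain the posted document
            exchange.getResponseHeaders().add("Location", "/orders/22");
            exchange.sendResponseHeaders(201, -1);    // -1 means no response body
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

To try it, start the stub with startStubServer(), build the URL from server.getAddress().getPort(), call postJson, and finish with server.stop(0).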

Want XML/JSON on the wire, so need to use XML/JSON parser, XML generation.

For sizable project, should use a framework

JEE has spec JAX-RS for a server-side REST framework

Jersey provides an implementation of this, with client-side libraries as well. For the server side, it provides a servlet to use in any servlet container (tomcat or Glassfish or whatever) to handle incoming REST requests.

Good tutorial: firstRest at vogella.com. Note you don't need to use gradle: see https://howtodoinjava.com/jersey/jax-rs-jersey-hello-world-example/ for a Maven pom.xml.

Spring Boot makes it easy to support REST web services. See TutorialsPoint tutorial.

@RestController
public class ProductServiceController { 
}

Familiar @RequestMapping:

@RequestMapping(value = "/products")
public ResponseEntity<Object> getProducts() { }

ResponseEntity represents the whole HTTP response: status code, headers, and body.

Just remember that it is straightforward to implement Java web services with the help of JAX-RS or Spring Boot.
Then the client can be written in Javascript, or as a mobile app, or Java.