CS636 Class 23 SQL Injection, More on scaling up

Demo: debugging. We managed to set a breakpoint in a JSP, but had trouble finding the values for request variables--they are well hidden deep inside the request object (itself shown on the Variables display).

Look at musicStoreJPA: note the lack of a service layer. The controller code calls the DAO code directly. Our controllers should call the service layer API, which in turn calls our DAO. Be sure to use our service layer and DAO layer in your project.

From Eclipse help, slightly edited to clarify:

Debugging a servlet running on the local system using Eclipse (checked on Windows)

The debugger enables you to detect and diagnose errors in your application. It allows you to control the execution of your program by setting breakpoints, suspending threads, stepping through the code, and examining the contents of the variables. You can debug a servlet on a server without losing the state of your application.

To debug a servlet on a server:

Web attack by SQL Injection: What it is and how to avoid it

There is a possible flaw in the admin app of music2/3 that should be considered

Login UI takes in username and password

Suppose DAO does: select count(*) from userpass where username='andrea' and password='sesame'

Sounds OK, but is prone to “SQL injection” ploy--

                Adding on to app’s SQL by putting the right text in a user input field

My break-in:

                Username   ' or 'a' = 'a
                Password    ' or 'a' = 'a

                Successful login!

Made query into

  select count(*) from userpass
     where username='' or 'a' = 'a' and password='' or 'a' = 'a'

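The user input is the text ' or 'a' = 'a typed into each field. Since 'a' = 'a' is always true and the final "or" applies to the whole condition, the where clause holds for every row, so the query counts every row in the table, resulting in a successful login.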
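
The break-in can be reproduced without a database by looking at the SQL text alone. The following is a minimal sketch, assuming the DAO builds its query by string concatenation; the class and method names (InjectionDemo, buildLoginQuery) are made up for illustration and are not taken from the music project.

```java
// Sketch only: shows how string concatenation lets user input rewrite the query.
class InjectionDemo {
    // Vulnerable: splices raw user input between the quotes of the SQL text.
    static String buildLoginQuery(String username, String password) {
        return "select count(*) from userpass where username='" + username
                + "' and password='" + password + "'";
    }

    public static void main(String[] args) {
        String attack = "' or 'a' = 'a";
        // Normal input: the quotes enclose the values as intended.
        System.out.println(buildLoginQuery("andrea", "sesame"));
        // Attack input: the user's quotes close the string early and add new SQL.
        System.out.println(buildLoginQuery(attack, attack));
    }
}
```

The second line printed is the rewritten query, with the user's text now acting as part of the where clause rather than as a value.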

Fixes:

or, the most common approach:

In general be wary of using strings from users in SQL!
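
One widely used JDBC defense is a parameterized query using PreparedStatement, where user strings are passed as data and are never parsed as SQL. The following is a minimal sketch under that assumption; the class name SafeLoginDao and method isValidLogin are hypothetical, and the caller is assumed to supply an open Connection to a database holding the userpass table.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch only: a login check that never concatenates user input into SQL.
class SafeLoginDao {
    // The ? placeholders are bound as values by the driver, not spliced into the text.
    static final String LOGIN_SQL =
            "select count(*) from userpass where username = ? and password = ?";

    static boolean isValidLogin(Connection conn, String username, String password)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(LOGIN_SQL)) {
            stmt.setString(1, username);   // quotes in the input stay part of the value
            stmt.setString(2, password);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() && rs.getInt(1) > 0;
            }
        }
    }
}
```

With this version, the input ' or 'a' = 'a is simply compared against the username column as an odd-looking string, so it matches no row.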

Notes on Advanced Web Apps, New Trends, etc.

We have been studying the most basic current JPA setup for a Java web app:

The single database may be very powerful, handling up to 100 TPS (transactions per second) or somewhat higher. That's 360,000 transactions per hour, a huge load. Ref: TPC non-clustered performance data. For high performance, the database server is on a different machine than the application servers.

The single Java executable is not a problem for small or medium-sized sites. All the long-term data is in the database, which usually doesn't need to change for a software upgrade. The only users who could be affected are those with long-running sessions. Web containers are required to maintain session objects across redeployments of the web app (Servlet spec, sec. 10.8). However, objects hanging off the session object may no longer be recognized if their classes were edited (an argument for using just ids, Strings, etc. in session objects).

Scaling up to seriously large apps

There are two directions for scaling up: system size or software complexity. For 1000 TPS, you need to go to machine clusters for the database, and caching.

Code complexity: more code means more developers. For more than 75 developers (one estimate), the monolithic code becomes a problem. One small change by one group out of, say, 8 groups means a rebuild and redeployment of the whole app. At that point, the system needs to be re-architected into multiple executables (components) that cooperate and provide not only services to the world, but also services to each other. For example, Netflix had a monolithic codebase for years, and then broke it up into components with services, known as the "microservice architecture". Here each component has exclusive use of its part of the database data, so can be redeployed with database changes if necessary. The other components can access the data via the services offered by that component.

Microservice architecture vs. our setup

Although there is little discussion of layering in microservice articles, there is a strong notion of the application's API, the most important layer division in our systems. That's what the "service" part of the name means. And services here mean stateless services, although that is usually assumed, not stated. The scope of transactions is limited to a single service call, as in our setup, but this one call is accessing only part of the persistent data, so is less powerful. In some cases, data may become inconsistent, at least for a while. There is an idea of "eventual consistency" to help out.  Changes in one area of data may notify other components about the change, so the other component can eventually fix up its part. Obviously this increases the complexity of the code.
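
The notification idea can be sketched in plain Java. This is only an illustration under stated assumptions: the components (CatalogComponent, ReportComponent), the event class ProductRenamed, and the in-process EventQueue standing in for a real message broker are all hypothetical names, not part of our project or of any microservice framework.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical event: "the catalog changed a product title".
final class ProductRenamed {
    final long productId;
    final String newTitle;
    ProductRenamed(long productId, String newTitle) {
        this.productId = productId;
        this.newTitle = newTitle;
    }
}

// In-process stand-in for a message broker: events are queued, then delivered later.
class EventQueue {
    private final List<Consumer<ProductRenamed>> subscribers = new ArrayList<>();
    private final List<ProductRenamed> pending = new ArrayList<>();

    void subscribe(Consumer<ProductRenamed> subscriber) { subscribers.add(subscriber); }
    void publish(ProductRenamed event) { pending.add(event); }

    void deliverPending() {
        List<ProductRenamed> batch = new ArrayList<>(pending);
        pending.clear();
        for (ProductRenamed event : batch) {
            for (Consumer<ProductRenamed> subscriber : subscribers) {
                subscriber.accept(event);
            }
        }
    }
}

// Owns the product titles; its transaction covers only its own data.
class CatalogComponent {
    private final Map<Long, String> titles = new HashMap<>();
    private final EventQueue events;
    CatalogComponent(EventQueue events) { this.events = events; }

    void renameProduct(long productId, String newTitle) {
        titles.put(productId, newTitle);                          // local change
        events.publish(new ProductRenamed(productId, newTitle));  // notify others
    }
}

// Keeps its own copy of titles, which lags until the event arrives.
class ReportComponent {
    private final Map<Long, String> titleCopies = new HashMap<>();
    ReportComponent(EventQueue events) { events.subscribe(this::onProductRenamed); }

    private void onProductRenamed(ProductRenamed event) {
        titleCopies.put(event.productId, event.newTitle);
    }
    String titleOf(long productId) { return titleCopies.get(productId); }
}

class EventualConsistencyDemo {
    public static void main(String[] args) {
        EventQueue events = new EventQueue();
        CatalogComponent catalog = new CatalogComponent(events);
        ReportComponent report = new ReportComponent(events);

        catalog.renameProduct(1L, "Greatest Hits");
        System.out.println(report.titleOf(1L));   // copy not yet updated
        events.deliverPending();
        System.out.println(report.titleOf(1L));   // copy has caught up
    }
}
```

Between renameProduct and deliverPending the two components disagree, which is the temporary inconsistency described above; the extra event class, queue, and handler are the added code complexity.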

Advanced Domain Object Design

There is a whole theory of this. The first book to read is Domain-Driven Design (or DDD for short, 2004) by Eric Evans. He explains how to identify and define entities and value objects, two kinds of domain objects. Then object graphs that hang together to describe something are called aggregates. Then aggregates can be grouped into a "bounded context" which has a consistent model and language and is usually supported by a certain development team. The idea of bounded context is used for components in the microservice architecture, a later development.
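
The entity/value-object distinction shows up most clearly in how equality is defined. The following is a minimal sketch, assuming a value object compares by its field values and an entity compares by its id; the classes Money and CatalogProduct are hypothetical examples, not classes from the music project.

```java
import java.util.Objects;

// Value object: no identity of its own, immutable, equal when all fields are equal.
final class Money {
    private final long cents;
    private final String currency;

    Money(long cents, String currency) {
        this.cents = cents;
        this.currency = currency;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Money)) return false;
        Money other = (Money) o;
        return cents == other.cents && currency.equals(other.currency);
    }
    @Override public int hashCode() { return Objects.hash(cents, currency); }
}

// Entity: has an identity (the id) that persists while its other fields change.
class CatalogProduct {
    private final long id;
    private String description;
    private Money price;

    CatalogProduct(long id, String description, Money price) {
        this.id = id;
        this.description = description;
        this.price = price;
    }

    void setPrice(Money price) { this.price = price; }
    Money getPrice() { return price; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CatalogProduct)) return false;
        return id == ((CatalogProduct) o).id;   // same id means same entity
    }
    @Override public int hashCode() { return Long.hashCode(id); }
}
```

Two Money objects holding 1295 cents in USD are interchangeable, while two CatalogProduct objects with the same id represent the same product even if one carries an older price.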

Then read Implementing Domain-Driven Design, by Vaughn Vernon, 2013, to see the more current thought on system design using DDD. The newer architecture is called Hexagonal or Ports and Adapters. As in our systems, the domain objects are simple POJOs, unaware of how they are persisted, etc. The DAO code is now relegated to "adapters". On close inspection, this DB adapter for a component is called as needed, so is in effect a lower layer. In architecture diagrams, you usually see a hexagon, representing the various adapters as faces (not necessarily 6 of them, by the way), and inside it, another hexagon around the core, which contains the domain code. The inner hexagon represents the service API, or set of services. Following DDD, there are aggregates and bounded contexts. The transactions are expected to change only one aggregate at a time, so again there can be inconsistency, and events are used to notify other aggregates as necessary to obtain eventual consistency. A transaction may read related aggregates, of course.
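
A small sketch may make the port/adapter split concrete. It assumes the core defines an interface (the port) and the persistence code implements it (the adapter); the names Song, SongRepository, InMemorySongRepository, and SongService are hypothetical, and the in-memory map stands in for a real database adapter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Domain object: a simple POJO, unaware of how it is persisted.
final class Song {
    final long id;
    final String title;
    Song(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

// Port: declared by the core, stating what persistence it needs.
interface SongRepository {
    Optional<Song> findById(long id);
    void save(Song song);
}

// Adapter: one implementation of the port (here a map instead of a database).
class InMemorySongRepository implements SongRepository {
    private final Map<Long, Song> rows = new HashMap<>();

    @Override public Optional<Song> findById(long id) {
        return Optional.ofNullable(rows.get(id));
    }
    @Override public void save(Song song) { rows.put(song.id, song); }
}

// Service API (the inner hexagon): depends only on the port, never on the adapter.
class SongService {
    private final SongRepository repository;
    SongService(SongRepository repository) { this.repository = repository; }

    void addSong(long id, String title) { repository.save(new Song(id, title)); }

    String titleOf(long id) {
        return repository.findById(id).map(s -> s.title).orElse("(unknown)");
    }
}

class HexagonDemo {
    public static void main(String[] args) {
        SongService service = new SongService(new InMemorySongRepository());
        service.addSong(1L, "Whiskey Before Breakfast");
        System.out.println(service.titleOf(1L));
    }
}
```

The service calls the repository as needed, which is why the DB adapter behaves like a lower layer, much as our DAO sits below our service layer.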

Clearly DDD and microservice architecture can be combined, based on decomposition of the whole app into bounded contexts.

Music Project

Aggregates: Product with Tracks, Invoice with items, User

What about Downloads--are they details of Tracks or Users or separate? There are no object refs from Track or User to bind them in. So perhaps best to consider them as a separate aggregate. But then the ref from Download to Track breaks another rule of IDDD that refs (even ids) from another aggregate should only point to the root object of an aggregate. So that argues for attaching the Downloads to the Track as details, and replacing the User ref with a user id. This also aligns with the use of Downloads in the app: tracking popularity of songs, not activity of a certain user.

What about Cart? It's not persistent, does that matter? Yes, the LineItems in the Cart are new objects, so can't be turned into ids yet, since the DB assigns the ids. Those LineItems could contain product ids instead of Product refs and save quite a bit of memory, and this memory is longer term than usual domain objects. Good idea for larger sites.  The User ref could be turned into a user id, with less impact.

Rule from IDDD: Preferably, don't use object reference from one aggregate to another. This saves memory in the server as less data is dragged around. We've been using a ref from LineItem to Product. What changes would be needed to change this to a product id in the LineItem object? 

Use Eclipse to find uses of LineItem.getProduct(). Find one in PresentationUtils.displayCart. Clearly can use the product id to get the product here, using a new getProduct(pid) call. Also a few in Cart, but only to get the id, so easier with the id approach.
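
The change can be sketched as follows. This is a simplified stand-in, not the project's actual code: Product and LineItem here carry only the fields needed for the example, ProductService is a hypothetical home for the getProduct(pid) call backed by a map instead of the DAO, and CartDisplay.displayLine is an assumed stand-in for the relevant part of PresentationUtils.displayCart.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the project's Product class.
class Product {
    private final long id;
    private final String description;
    Product(long id, String description) {
        this.id = id;
        this.description = description;
    }
    long getId() { return id; }
    String getDescription() { return description; }
}

// LineItem now holds a product id instead of a Product reference.
class LineItem {
    private final long productId;
    private final int quantity;
    LineItem(long productId, int quantity) {
        this.productId = productId;
        this.quantity = quantity;
    }
    long getProductId() { return productId; }
    int getQuantity() { return quantity; }
}

// Hypothetical service offering the getProduct(pid) lookup (a map replaces the DAO).
class ProductService {
    private final Map<Long, Product> products = new HashMap<>();
    void addProduct(Product product) { products.put(product.getId(), product); }
    Product getProduct(long pid) { return products.get(pid); }
}

// Display code resolves the id only at the point where product details are needed.
class CartDisplay {
    static String displayLine(LineItem item, ProductService service) {
        Product product = service.getProduct(item.getProductId());
        return item.getQuantity() + " x " + product.getDescription();
    }

    public static void main(String[] args) {
        ProductService service = new ProductService();
        service.addProduct(new Product(7L, "Sample CD"));
        System.out.println(displayLine(new LineItem(7L, 2), service));
    }
}
```

Code that only needed the id, like the uses in Cart, gets simpler: item.getProductId() replaces item.getProduct().getId().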

Similarly, we have Download with refs to Track and User. Find use of getUser() in PresentationUtils.downloadReport.  Also getProduct() there.

Conclusion: Straightforward to replace object ref with id

Advantage: Smaller object graphs coming from DAO finders.

Rule from IDDD: change only one (persistent) aggregate in a service call. This involves transaction scope.

Mutator service calls:

processInvoice: only changes an Invoice

addDownload(userid, Track) --changes only Downloads

addItemToCart, etc.--changes no persistent objects

checkout(Cart, userid)-- changes only Invoices

registerUser(String, String, String)-- changes only Users

We have been following this rule without trying.