Daily Course Notes for CS430/630

Database Management

Betty O'Neil

Notes based on Pat O’Neil’s notes

 

Class 1.

 

Hand out:  Syllabus. Go over Syllabus.  Grader, DBA.  Text. If you don't have a text, sit with someone who does.

 

Prerequisites:  UNDERGRADUATES:  CS240, taking CS310 (must finish).

     Graduate students, to learn C, UNIX, sit in on CS240. If you know C but not our UNIX, a UNIX Guide is on-line on our course Web Page:

     http://www.cs.umb.edu/cs630/ This Web page also has notes

 

Homework:  Read Chapt. 1 (quickly <= 1 hr), Chapt. 2 through Section 2.4.

    Homework due in two+ weeksAssignment on class web page.

     Note: names using cs630 include stuff for cs430. Basically hw1 is problems at end of Chapter 2 that are unsolved in the back of the book. NOTE SOLVED PROBLEMS IN TEXT CAN HELP YOU.

Submitting hw: legible hardcopy.

 

Apply for course CS430/CS630 BY NEXT TIME WITHOUT FAIL. (Even if you already have a UNIX account.)  Show ID if new to school. If already have UNIX Account, this will put you on the right machine, allow you to use Oracle Database System and receive class mail.

 

BUY THE TEXT AND BRING IT TO CLASS NEXT TIME!!!

 

Take Roll.

 

GET BUDDY PHONE #, IF MISS CLASS — CATCH UP.

 

Now, Quickly, through ideas of Chapter 1, what we will learn.  Just try to get the idea for now — helps later when will cover again.

 

DEF.  A database management system, or DBMS, is a program product for keeping computerized records (on disk); but DBMS is quite complex!

 

What we will be learning in this course is underlying concepts you need to make a database work for you:  not how to build one but how to use one that’s already set up.

Compiler or Operating System is also a program, but database more complex.

 

To begin with we will be dealing with the Relational Model, so all our data will look like tables.  E.g., see database, Figures 2.1 & 2.2, pgs. 27, 28.

 

We will be using these tables throughout the course.  Draw tables & two rows on board.

 

Nomenclature:   tables (or relations): CUSTOMERS, AGENTS, PRODUCTS, or ORDERS, columns (define attributes of the data), rows  (or tuples).

 

Go over column names and meaning.  Note aid unique, aname might dupli­cate, month values in ORDERS.  This is not realistic, e.g., anames are too short, would want more columns, time more accurate.

 

Can answer questions about the data with queries.  SQL Example: List agent ID, agent name, and percent commission of all agents in New York.

 

select aid, aname, percent

  from agents where city = 'New York';  /* leave up */

 

Now list the agent ID and agent name of all agents who placed an order in January.  How would you do this? Need to connect TWO TABLES.  In SQL:

 

select a.aid, a.aname from agents a, orders o

    where a.aid = o.aid and o.month = 'jan';

 

 

Relational databases hold data in tables with rows and columns, that is, in strict tabular structure. This is known as “structured data”.

 

Another way of holding data is in XML, which is tree-like rather than tabular, and can hold structured or somewhat unstructured data, such as marked-up text descriptions.

 

Current databases including Oracle can store and query XML data as well as tables. Hybrids are also possible, where relational tables hold XML documents in column values.

 

Next term: CS639 on XML technologies, including web services.

 

Databases in the real world

 

Databases are used in all big businesses for data involving money, and any other data that is important to keep safe, such as customer contacts, inventory, etc.

 

Huge web sites may or may not use relational databases for web content: eBay does, Amazon and Google do not. Google has championed the “MapReduce” alternative model, but it’s more low-level than the relational model. They have so much data that they are willing to work harder on the programming to save computation. See the Wikipedia article on MapReduce if you’re interested.

 

On the other hand moderate-sized web sites almost always use relational databases for dynamic content.

 

This term: CS636 Java web apps, CS637 .NET web apps, both using relational databases.

 

Chapter 2.  RULES of Relational Model.  Tables have followed rules so all commercial prod­ucts are the same, like typewriter keyboards.

 

For years these rules were as in Sect. 2.3.  E.g., no column of a table should have a complex column value (a record), only a simple type (an integer).  Similarly, can't have multiple values for a column of a row. (See pg 5, students table is illegal by this rule.)

 

But there is a slightly more flexible model, known as the Object-Relational model, that does allow complex column values (and collection types as well:  Can have a Set, Multiset, or List of values in a single row column).

 

We will not go into this alternative in great detail, but it is covered in Chapter 4 of the text.

 

 

Relational Algebra.  A non-machine query language that gives a good idea of power of query languages:  make new tables out of old tables by simple operations.  E.g., List aid, aname, percent for agents residing in New York.

 

Rel. Alg.   (AGENTS where city = 'New York) [aid, aname, percent]

 

Gives us simplest possible approach to understanding what SQL is aiming at. Relational Algebra in Chapter 2, SQL in Chapter 3.

 

Chapter 3.  SQL queries.  Turns out to have more power than Rel. Alg.  Also covers how to Insert new row in table, Delete old rows from table, Update column values of existing rows in table.

 

SQL has changed over time, and products implement a common core plus various extensions of the core (still covered by the standard), plus their own extensions beyond the standard. This book attempts to show you what is in the common core and what is more advanced, so you can write portable apps.

 

There are still data applications that don't fit the Relational Model, for example, scientific computations involving huge arrays that evolve: too many data changes!

 

Chapter 4.  Object-Relational SQL (Informix, Oracle).  We will skip object-relational SQL, instead cover procedural lan­guage within the ORACLE database in which one can write programs, PL-SQL.

 

Chapter 5.  C (or Java) programs with embedded SQL statements:  Idea is to present a Menu to naive users so they never have to un­derstand SQL. (SQL of Chapter 3 is called ad-hoc, or interactive SQL). 

 

Reason:  error-prone job to compose SQL, bank tellers and airline reserva­tion clerks don't want to have to do complex stuff like this.  DANGEROUS to let people in a hurry update data using SQL: might corrupt database!

 

To avoid problems of concurrency,  Update transactions needed.

 

Term "User Friendly" is too vague;  there are several different types of DBMS users:

    -End Users:

         -Naive Users                            (Use Menu interface)

         -Casual Users                         (Compose SQL)

    -Application Programmers         (Write Menu applications)

    -Database Administrators          (DBAs)

 

 

Chapter 6.  Some of what DBAs do.  Design database.  Lots of complex commands to construct a database.  Begin with Logical Design.

 

Logical design:  break down all the data into tables so tables have good properties.  E.g., how handle employee with unknown number of depen­dents?  Can get subtle.

 

E.g., relatively simple school application:  Departments contain teachers and are responsible for offering subjects, made up of multiple classes.

 

The subjects have prerequisites.  The classes are offered in given periods in specific rooms.  Students have schedules (each term), drop and add classes, get grades, have library records, tuition payments, health insur­ance, etc.  How break all these up?

 

One basic idea is that entities (real world objects:  students, rooms, even class-periods) each deserve their own table.  They have attributes (student name and id, room number and floor, period length and how grouped:  3/week, 2/week).

 

There are relationships between entities:  above, enrollment is a rela­tionship between student and course.  But in a more complex scheme, need to relate teachers with periods, rooms, and subjects to form a class of­fering, then the class offering and student are related in a student sched­ule and a distinct class schedule.

 

Chapter 7 and following, put off to CS634. How to create databases so as to guarantee certain constraints hold, security, Views (creating virtual tables), etc. Physical design:  how to place tables on disk so minimize ac­cess time;  hard problem, because of trade-offs. Index creation.  Design for shared data.

 

All performance related stuff is put off to CS634.  But note idea of data sharing during update, something we run into in pro­gramming Embedded SQL.  Problem where one person looks at data while another person is changing it.  Sum up two account balances (rows of dif­ferent tables), might get sum that is false if other application has just subtracted from one to transfer money.  Need something clever to handle this, call it a "Transaction".

 

Next time start on Chapter 2 in detail.

 

NEXT time, bring text