Wed., Sept. 4--first class: Intro, System Calls, (Process) Virtual Machine concept

Syllabus

Goals—

Why is this material important to you?

- As a programmer, you need to understand the environment your programs run in. How are your programs protected from other programs running on the same system? If you write an app for a smart phone or PC that handles money, can another program use the account numbers your program got from the user? If not, why not?

- If your boss asks you how hard it would be to port a Windows program to Linux, are you ready to think about this?  What’s the same and what’s different between these two major OSs?

- If you consider a job in embedded systems, and see that it uses a real time OS, is that a show stopper? Or can you think: no problem, I’ll just find out the system API and go from there.

Reading: Tanenbaum, Chap 1, specifically, Sec. 1.1, [1.2 optional history], 1.4, 1.5, 1.6, 1.8

Note: pg. 38 on “PDA” OS’s: i.e., smart phones, add iOS (iPhone OS), Android

Quick history:

1970: UNIX born at AT&T

1979: UNIX installed here for CS courses, etc.

1991: Linux born, reimplementation of system architecture, API

1996: SAPC environment developed here from Linux sources

2003: Android Linux born, offshoot of Linux

2008: first Android phone available, now largest base.

First topic: What is an OS?

Ref: Tanenbaum, p. 2, simplified picture showing how apps request system services as they run on the OS, i.e., the kernel. An app makes a system call to the OS to request a specific service (read something, write something, etc.), and the OS manages the system’s hardware to carry out the request.

-----------
apps     <-- programs running on the system (includes compilers, shells, etc.)

-----------     <-- system calls allow apps to request services: the system API
OS       <-- the kernel, doing the work requested by system calls
-----------     <-- the kernel has to manage the actual “ugly” hardware
hardware
-----------

In fact, the OS provides apps with a virtual machine (process virtual machine) and itself is a program working with the actual hardware.

Reference on virtual machine terminology: Wikipedia Virtual_Machine

The virtual machine is the OS-provided program execution environment, for example, what you have already been using for your C programs in CS341.  The actual hardware was covered in CS341, so you have some background on the two sides.

What about the Java “virtual machine”?  It is a virtual machine in the same sense, that is, it provides an execution environment for programs, in this case just Java programs.  The Java VM sits on top of the OS VM, so there is another layer in the picture for Java programs.  C programs run right on the OS, with the help of the C library.  (You could draw a layer for the C library, but the C/C-lib layer boundary wouldn’t be as strong a division as the others.)

We will concentrate on the C programming environment, since it is so close to the OS.

UNIX/Win32 virtual machine (app execution environment)

Note: Win32 (or WinAPI, a synonym) is the name of the system API for Windows NT/2000/XP/Vista/7/8.  Win32/WinAPI includes 64-bit address support, which is sometimes called, to try to make it clear, "Win32 for 64-bit Windows." A Wikipedia article says the proper name is “Windows API”, or WinAPI for short. But Tanenbaum and a lot of people use Win32, so we will.

The other OS family is UNIX/Linux, grouped as “UNIX”, since Linux is truly a kind of UNIX. UNIX also supports 32-bit or 64-bit addresses. Android belongs to this family.

What’s a non-flat memory?  In a non-flat memory, various pieces, usually called segments, are only separately usable. Windows 3.1 (80s, early 90s) had this system. The programmer had to fix a segment of 64KB before using it. It was horrible for programming sizable programs.  In UNIX or Windows, with flat memory, we can malloc 10 MB of memory (say) at a time and if it succeeds, we are guaranteed one stretch of addresses covering 10 MB of memory, and each byte of it has an address different from any address we are already using.

In our department, we have some old Solaris UNIX machines and a growing number of Linux and Windows XP machines.  The homework will be done on the Solaris system “ulab”, also known as blade57.cs.umb.edu.  All the “blades”, blade01.cs.umb.edu, …, blade77, are also running Solaris.

User Memory Layout, flat address space. I like to draw it horizontally, because I think of it as the floor on which things are built, but the text draws it vertically (pg. 51 for example): Each byte of memory usable by the program running in the virtual machine has a unique address. We can think of all these addresses as a sequence, and there is more structure—code uses the lower addresses and static data somewhat higher addresses, for a simple C program:

                                                                                                                <add cloud up here holding kernel>

                 code      data         C lib DLL       stack
                 ---       ---            ----          <----
        |--------|--------|------------ … -------------------|

Address 0        A1        A2                             Amax       0 < A1 < A2 < Amax

 

First consider a 32-bit system, one with 32-bit addresses, UNIX/Linux/Windows.

0xf = 1111 binary, so 4 binary 1’s for each f in an address.  0xffffffff has 8 f’s, so has 32 bits of 1s.  This is the highest possible 32-bit address.

Thus for a 32-bit system, Amax <= 0xffffffff.  64 bit systems can have higher Amax. More on this case later.

Note: this is how far we got on 9/4/13, which had some interruption due to an apparent projector problem. 

Important powers of 2:  1G = 2^30, 1M = 2^20, 1K = 2^10

So 0xffffffff = 2^32 - 1 = 4G - 1.  Thus the maximum possible 32-bit user address space is 4 G bytes in size, the full 32-bit address space size. Of our example OS’s, only Sun Solaris UNIX provides this maximum possible size.

There can be holes in the available memory for a program, stretches of addresses that cause segmentation faults when referenced.  We still call the memory "flat," because one sequence of memory addresses still can describe the whole thing, and every byte of usable memory has its own unique address.  

Note: malloc is not a system call. It is a C library call implemented by appropriate system calls that request the OS to assign more usable memory to the program.

The OS code, the kernel, is not in this space but off somewhere else—shown in a cloud on the board.  The system call causes execution to jump right out of this user space into the kernel.  In the kernel, the system call implementation code executes to do the service, and then returns to the next user instruction after the system call instruction.

32-bit Solaris UNIX gives user space the entire 32-bit address space, so the Solaris user address space is 4 G bytes in size. Other UNIX implementations provide 3-4G.  32-bit Linux provides 3G. 32-bit Windows provides 2G by default, 3G by special boot command for Advanced Servers.  The size of the user memory space (above 1G) is only relevant for the largest apps, notably huge database systems.  Nice diagrams for Linux and Windows 32 bit systems.

DLL: dynamic-link library, or just dynamic library in UNIX parlance: code that can be called by a program but is not stored in the program’s executable file. Instead, it is brought into user memory at runtime.  Functions are located in the DLL via “dynamic linkage” at runtime.  Once this linkage is done, calls are direct, since the DLL is in user memory.

User Memory Layout for Solaris UNIX (32 bit): 4G user address space (the first 0x10000 bytes are purposely made unavailable to trap null pointer accesses)

                                                                                                                <add cloud up here holding kernel>

                 code      data              C lib DLL    stack
                 ---       ---                -----      <----
        |--------|--------|------------ … -------------------|

Address 0     0x10000    0x20000                       0xffffffff

 

User Memory Layout for Win32 (32 bit): 2G user address space, with Amax = 0x7fffffff, which has the leading bit = 0 and the rest 1s, so it is only half of 0xffffffff

                                                                                                                <add cloud up here holding kernel>

                 code      data            C lib DLL    stack
                 ---       ---               -----      <----
        |--------|--------|------------ … -------------------|

Address 0                                                 0x7fffffff

 

32-bit Linux: Amax = 0xbfffffff, so 3GB of user space. (sf08.cs.umb.edu for example)

64-bit systems: much bigger user space, no longer bottled up in the 32-bit address space.  But the addresses in use are not really “64 bit”, more like 48 bit.

Example: linux1.cs.umb.edu, a 64-bit Linux system you have access to.

Finding out the layout of user memory with a simple experiment you can try.

Create hello.c, a trivial C program

vm22$ gcc hello.c
vm22$ gdb a.out
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/eoneil/444/test/a.out...(no debugging symbols found)...done.
(gdb) b main
Breakpoint 1 at 0x4004b8
(gdb) r
Starting program: /home/eoneil/444/test/a.out

Breakpoint 1, 0x00000000004004b8 in main ()
(gdb) p/x $sp                    <-- print stack pointer in hex
$1 = 0x7fffffffe6d0
(gdb) p/x &main
$2 = 0x4004b4
(gdb) p/x &_end
$3 = 0x601028
(gdb) p/x &printf
$4 = 0x7ffff7a8d6d0
(gdb) q
A debugging session is active.

        Inferior 1 [process 22822] will be killed.

Quit anyway? (y or n) y

From this, we see that the stack grows down from 0x7fff ffff ffff, the code starts at 0x400000, and data starts at 0x600000, and the C DLL is around 0x7ffff7a8d6d0, below the stack but at the high end of user memory.

0x7fff ffff ffff has 15 bits of 1s from 7fff, plus 32 bits of 1s from ffff ffff, for a total of 47 bits in use in user space addresses.  The 32 bits provide 4GB of user space, and the additional 15 bits a factor of 32K (0xffff + 1 is 64K, and 0x7fff + 1 is half of that), so the total user address space size is 32K * 4G = 128T bytes, that is, 128 TB of user space.  That should be enough for anything we might need!  At least for the next 20 years...

Of course this is just user space, not allocated memory. The OS does a “shell game” to put memory where it’s needed under the user space. We’ll study that in more detail under memory management.

Next time: what about Android?