CS444 class 22

CS444 Class 22

Finished with memory management coverage. Skipping Chap 4, on to Chap. 5, already partly covered.

Intro to hw4—go over handout distributed last class

semaphores—new syscalls, but that should be no problem: new code in ulib.s, sysup, sysdown, etc. in kernel

See Tanenbaum pg. 128, discussed in class 12 for definitions of up and down.

Need sem struct in kernel, with sem count and a list of waiting processes or equivalent

—can use intqueue, supplied queue of ints (pids), or use proctab[i].waitcode values. (with a unique waitcode for each semaphore)

sysdown(int semid): decrement semid’s count if it’s positive, sleep if appropriate

sysup(int semid): increment semid’s count as appropriate, call wakeup_one

For testing with testmutex, is not a full test because it uses only one waiter. Prodcons is a better test once testmutex is working.

Semaphore Implementation

To user programs, these provide mutex and process synchronization, but to the kernel, they are software objects with certain properties.

Luckily, we can support semaphores without preemption, so this project is independent of optional part preemptive scheduler. Note: use Tanenbaum’s definition of up and down, pg. 128.

The kernel data structures needed for semaphores: an array of Semaphore structs, one for each possible semaphore, with sem. count and list of processes (pids) waiting on it (using intqueue, provided), or equivalently, a decision to scan proctab[i] looking for the waiters on a certain semaphore when necessary.

With the array, we have a simple way of keeping track of each semaphore: just use the index of its semaphore struct in the array. Thus we have sem #0, sem #1, and so on. We can return these id’s to the user.

Pseudocode for sysup:

Make sure this semaphore has been created, else fail
If there are any waiters on this semaphore, wake one up
Else increment sem’s count (this is the “saved wakeup” mentioned by Tanenbaum)

Psuedocode for sysdown:

Make sure this semaphore has been created, else fail
If sem’s count == 0, call sleep(event)
Else decrement sem’s count.

For example, suppose processes 2 and 3 are waiting on sem #2, and then process 1 does an up on sem 2. The code in sysup can decide that process 3 should unblock and call wakeup_one(3). Then in sysdown, where process 2 and 3 are in sleep calls, process 3 returns from sleep, but process 2 continues to sleep.

Recall from hw3 that it’s fine to call sleep with IF=0, in fact, it’s required. But the kernel mutex only lasts until another process is chosen to run, since that process will restore IF to 1 when it finishes its system call and returns to user (if not sooner.) Thus the call to sleep represents a “mutex hole”, and the global variables need to be in a consistent state at this point.

You can define a new field of PEntry to record a semaphore waiter’s semaphore, for example p_sem. This is only valid if the process is blocked in semaphore wait (need a new waitcode for this). This is the simplest way to know who the waiters are. Alternatively, maintain a queue of waiter-pids in the semaphore struct.

Chap. 5 I/O

Reading: Chap 5 to pg. 332, 339-347, skip DMA, 348-360, Chap 10, Sec. 10.5: 771-773, 775-776

Block vs. char devices (mainly a UNIX idea)

Each device under UNIX has a special file, also known as “device node”. Tan., pg. 734 example is “/dev/lp” for a line printer device. Classically, device nodes were kept in directory /dev. They are not ordinary files, but rather filenames and associated information about a device.

When you display them with “ls –l”, you see a “c” for char device or “b” for block device as the first character of the listing, as you would see a directory marked “d”. For example a line printer would be a char device:

ls –l /dev/lp

crw-rw-rw 1 root ... /dev/lp

On Solaris, the devices have been reorganized into many subdirectories by device type, and with symbolic links to other names, so it’s a bit hard to find the actual device nodes. For example, on ulab, we have /dev/board1, the serial line to mtip system 1. We have to follow two symbolic links before we find the device node:

blade57(6)% ls -l /dev/board1

lrwxrwxrwx 1 root 5 Apr 25 2008 /dev/board1 -> ttyrf

blade57(7)% ls -l /dev/board1

lrwxrwxrwx 1 root 5 Apr 25 2008 /dev/board1 -> ttyrf

blade57(8)% ls -l /dev/ttyrf

lrwxrwxrwx 1 root 30 Sep 19 2002 /dev/ttyrf -> ../devices/pseudo/ptsl@0:ttyrf

blade57(9)% ls -l /dev/../devices/pseudo/ptsl@0:ttyrf

crw-rw-rw- 1 root 26, 47 Dec 3 08:59 /dev/../devices/pseudo/ptsl@0:ttyrf

The c shows that we finally found the device node.

How about a disk? We do “df –l” to see local disks (the disk-free command), and then trace through the symbolic links:

blade57(10)% df -l

/ (/dev/dsk/c0t0d0s0 ): 3308702 blocks 625943 files

/proc (/proc ): 0 blocks 7876 files

/dev/fd (fd ): 0 blocks 0 files

/etc/mnttab (mnttab ): 0 blocks 0 files

/var/run (swap ): 1487776 blocks 53320 files

/tmp (swap ): 1487776 blocks 53320 files

/space (/dev/dsk/c0t0d0s3 ):21629310 blocks 1585108 files

blade57(14)% ls -l /dev/dsk/c0t0d0s3

lrwxrwxrwx 1 root 38 Sep 19 2002 /dev/dsk/c0t0d0s3 -> ../../devices/pci@1f,0/ide@d/dad@0,0:d

blade57(15)% ls -l /dev/dsk/../../devices/pci@1f,0/ide@d/dad@0,0:d

brw-rw-rw- 1 root 136, 3 Jul 9 2004 /dev/dsk/../../devices/pci@1f,0/ide@d/dad@0,0:d

There’s the b for block device. We see that this device node is readable, which exposes the data on this disk to reading—should be protected. Luckily there is no valuable data on this disk, just some old logging files. I’ll report this problem to operator.

Special files can be used like filenames in open. Devices are folded into the filesystem name space, so that open, read, write, close can be used with devices just as if they were files.

For Tan’s simple device node, we could

fd = open(“/dev/lp”, ...)

write(fd, “testing”, 7);

close(fd);

And write the message on the line printer. Since the cp command is doing such an open (possibly through fopen()), the cp command can be used with the device node too:

cp file /dev/lp

This is not recommended for normal use, however, as it “jumps the queue” set up by lpr. You can try it on your home Linux system.

This is part of the device-independent i/o story. We can do the same to a floppy drive, /dev/fd, although it will not be in a filesystem on the floppy disk this way. We are treating the floppy disk as a sequential stream of bytes.

We can even do it with a hard drive, though it will wipe out any filesystem on that partition. The other direction is much safer:

more /dev/hda0

Where /dev/hda0 stands for the first partition on drive hda, the first IDE drive on a Linux system. Of course you need to run as root to do this successfully on a normally protected system.

Windows: for list of device names for disk devices use “mountvol”. This includes floppy and CD drives. Here’s the volume name of my floppy disk:

\\?\Volume{cb15a4be-b302-11d7-a922-806d6172696f}\

Android Notes

Android, being Linux, uses device nodes. For example:

ls –l /dev/binder

crw-rw-rw---- 1 root 10, 53 <date> /dev/binder

This shows the device underlying the Binder IPC service invented for fast IPC in Android. It uses memory mapping tricks to whisk data from one process to another. It’s a char device, shown by the leading “c”, and thus has system calls open, close, read, write, ioctl, …. Of these, open and ioctl are the ones in common use. The messages are all handled by ioctl, the miscellaneous-services call, rather than by read or write, possibly to indicate that you can’t send arbitrary blocks of data through this system.

Other device nodes: /dev/graphics/fb0 for frame buffer, /dev/event0 for keyboard.

Also, you can use remote gdb to debug C programs (“native code”, running on the Linux level). After setting up things with “adb” and starting the appropriate gdb on the development host, you can use the familiar “target remote” command to connect to the Android device over TCP:

(gdb) target remote :5039 to connect to the Android device using TCP port 5039

Note corr. to pg. 340: line 6, “interrupt vector” should be “interrupt vector number”, so that “interrupt vector” can be saved for use as the address of the interrupt handler, held in the interrupt gate on x86, in IDT[nn], where nn is the interrupt vector number.

p. 332 Memory-mapped I/O (not to be confused with memory-mapped files!!)

We are familiar with I/O instructions of the x86, not memory-mapped i/o. For example,

inb %dx, %al with 0x3f8 in dx

This instruction reads one byte from i/o port 0x3f8, the COM1 UART’s receiver register, into CPU register AL, the low 8 bits of EAX.

Compare this to the memory-reading instruction

movb 0x3f8, %al

This instruction reads one byte from memory location 0x3f8 into CPU register AL.

The x86 architecture specifies two spaces, a 32-bit memory space and a 16-bit i/o port space, and it is always clear from the instructions which one is being used.

When the reads or writes go over the bus, the memory address or i/o port number ride on the address bus, which of course has to be big enough for the 32-bit memory addresses, and is half idle in the case of an i/o port number. In addition to the address lines, there is a control line called M/IO# which says which kind of address it is.

Thus it is easy for devices to know that an i/o port # is current on the address bus and they should check further to see if it’s asking for them.

On the other hand we’re using 33 lines (32 address lines plus M/IO#) somewhat inefficiently—we could carve off a little bit of the memory 32-bit space for i/o addresses and drop the M/IO# line. That’s the idea of memory-mapped i/o.

Aside: Memory-mapped i/o was invented by the genius Gordon Bell, who masterminded the PDP11. He also engineered the Unibus backplane, with a standardized bus protocol that allowed new boards to be plugged in. Then UNIX came along with its plug-in device drivers, at no charge, just sign this nondisclosure. The result was dynamite for University researchers and others who wanted to use computers for measurement and control as well as general computing. Our department’s first computer was a PDP11 with 256K of memory and 10M of disk, bought in 1978 for about $30K, soon upgraded with a giant 80M disk. We taught several classes at once on this system, using version 6 and then version 7 of the original UNIX sequence.

In memory-mapped i/o, a certain region of memory addresses are turned into i/o register addresses. Under UNIX/Linux or other sealed-up OSs, this region is available only to the kernel, except possibly for the video memory. I drew a picture of kernel address space with the memory-mapped region in the high addresses. Then a mov instruction to 0xffffffe4 for example, would be putting data in a device register.

Tan ex. “test port_4” in assembler, where port_4 is some particular address in the hardware register region. One nice thing about memory-mapped i/o is that you can use C directly for accessing device registers.

char * printer_status_reg = 0xffff10cc

char * printer_data_reg = 0xffff10ce;

while ( *printer_status_reg != READY) /*loop until ready */

*printer_data_reg = p[i]; /* send one char to printer */

With memory-mapped i/o, you can write a whole device driver in C, except for that little assembler envelope around the interrupt handler. Of course we put little routines around inb and outb, so we can write mostly in C in spite of the need to use special i/o instructions.

The x86 actually does memory-mapped i/o as well as i/o instructions. The PC system architecture reserves the upper part of the first 1M of memory space for this. The video memory is in this area.

p. 336 DMA—can skip coverage in Tan. Used for network interfaces as well as disk-like devices.

Goals of i/o software, pg. 343.

device independent i/o, helped by uniform naming, also known as the filesystem “namespace”, a syntax of string names for files and also devices represented by “special files” like /dev/lp.

synchronous vs. asynchronous—we’ve covered this before:

synchronous output: request output, wait (block) until it’s done
asynch. output: request output, do other things, get notified that output is done
but “output done” can have two meanings, at least for user output using an OS:

Ø actually out on device

Ø fully accepted by OS, on way to device—normal meaning, allowing write-behind

pg 344: Programmed i/o—polling device status, no interrupts.

Note polling loop on pg. 285 is using memory-mapped i/o—be sure you know the x86 equivalent. Also move the semicolon for the empty while body to it own line!

pg. 346: Interrupt-driven i/o—covered in detail already. Note again memory-mapped i/o in example. Don’t try to interpret this code exactly, as it is not exactly right.

pg. 347: I/O using DMA (can skip): a whole block of data gets moved to or from a device and memory, with just one interrupt. As with simpler devices, the data gets copied from the user to a kernel data area before being sent to the device, and is received first in kernel memory, then copied back to user. The kernel blocks the user process as needed in read or write. The interrupt handler calls wakeup.

pg. 351: Device Drivers—one for each kind of device.

pg. 348: Layers pic. Fig. 5-11. Note that “interrupt handlers” are actually parts of device drivers, not a separate kind of software. Each device driver has code called during read/write system calls, and in different functions, the device’s interrupt handler(s). Between them is a shared data buffer, itself in kernel data. Producer-consumer relationship here. In other words, there are two layers here that belong to device drivers, the upper layer system call code and the lower layer interrupt handlers.

pg. 349 Steps in interrupt handler—mostly review, but some comments re real OS’s. More detailed than list on pg. 93.

save regs in assembler envelope, call C int handler
set up context—not necessary for trusted interrupt handlers that simply borrow the environment of the current process (our assumption). But dynamically loaded drivers may not be trusted, and should be “bottled up” (more below)
set up stack—also not necessary for trusted int. handlers—can borrow kernel stack from current process (our assumption).
Ack the PIC. Reenable interrupts if necessary in the PIC (ignore this). Still have IF=0 in Linux here.
Copy regs to proc. entry—not necessary unless doing a process switch, move to process switch list
Run ISR, the interrupt handler of the device driver. In some cases, ISR will make IF=1 for parts of it execution.
Check for another process to run, do process switch if indicated (preemption)
part of process switch
part of process switch
part of process switch
Return to assembler envelope, restore regs saved in 1., do iret. (added)

Steps of process switch—should be separated from interrupt handler description. It’s called from there and many other places. Note that it is called at the end of the interrupt handler, as we are doing in the tick handler for hw4, for the same reason: the main body of the interrupt handler is supposed to finish very fast, to be responsive to its hardware and not prevent the rest of the system from progressing.

Process switch steps: our schedule() in hw3, but add MMU actions for real OS

1. Save old process’s CPU registers in its process table entry, including MMU variables (missing from pg. 349)

2. Setup MMU/TLB to map in new process’s VA space. Also, flush caches, including TLB. (step 8)

3. Load newly chosen process’s CPU registers from its process table entry (step 9)

4. Now the new process is running (no need to explicitly “start it” as listed in Tan’s step 10.

At Tan., pg 353, Device-indep. i/o—implementation

At syscall level with UNIX, we see write(fd, buffer, nbytes) where the first argument can correspond to many different devices or files, specified at open time. fd = open(“/dev/lp”....) for example.

Similarly, in the hw, we have write(dev, buffer, nbytes), where dev can correspond to different devices: 0 and 1 for serial devices, and we can add device 2 for LPR. Then we have in ioconf.c:

struct device devtab[] = {

{0, ttyinit, ttyread, ttywrite, ttycontrol, 0x3f8,(int)&ttytab[0]}, /* TTY0 */

{1, ttyinit, ttyread, ttywrite, ttycontrol, 0x2f8,(int)&ttytab[1]},/* TTY1*/

{2, lpinit, lpread, lpwrite, lpcontrol, ... }

};

This is the core data structure to a device-independent i/o system, the device switch table. Classic UNIX has “cdevsw” for character devices and “bdevsw” for block devices that have similarly tabulated function pointers for device drivers. It allows a new device driver to be added without changing the kernel source except for this one file, plus the additional device driver code.

Linux reimplemented the same basic idea, though uses a hash table rather than an array, allowing expansion at runtime. See pg. 776 for a representation of the Linux char device switch.

Execution of device-independent i/o calls.

1. The user program calls read or write or … with a buffer in user data to supply or receive the data.

2. The device-independent i/o layer contains the “sysread” and “syswrite” or equivalent functions called from the system call dispatch code. In these functions, the appropriate device driver function is looked up in the device switch table and called, causing execution to arrive in the right device driver.

To be more exact, execution arrives in the specific function of the device driver, in “ttyread” or “ttywrite” for a serial device, or “lpread” or “lpwrite” for the LPR device. There it enqueues or dequeues data between the user and kernel data areas, and calls sleep if appropriate, waiting for the actual i/o to be reported by interrupts. In some cases, this code has to “start” the i/o in the device.

3. The interrupt handler handles the actual i/o with the hardware device, and calls wakeup when needed to get the read/write code going again.

4. The hardware is at the bottom.

In the UNIX/Linux case, covered pp. 775-776, the device number used to look up the right row in the device switch table (or hash table) is called the major device number. We can see these two numbers by doing “ls –l” on the special file:

ls –l /dev/lp

crw-rw-rw- 1 root 4, 1 <data> /dev/lp

Here the major device number is 4, so cdevsw[4].write will be accessed by a write to this device, and will be “lp_write” or some similar name in the lp device driver.

pg 355 Buffering. There are so many ways described here it’s confusing. Concentrate on case (c), the most common case, where the kernel has a data buffer, and the user has another one.