CS444 class 24

CS444 Class 23

More on memory management: Kevin Amaral described his “depth bomb” program to severely exercise ulab’s memory management, which we had previously observed to be hardly used. It had had only had 22 revolutions of the clock daemon in almost 3 years of uptime. His program created about 2000 processes, each malloc’ing 4KB of memory and then using it, forcing the OS to provide memory for them. That’s 8GB of memory, but ulab has only about 400MB of physical memory, so it had to do a lot of paging to keep up. It did survive, and after these runs shows 39 revolutions of the clock daemon. I suspect that many malloc’s failed, because ulab has only about 1GB of swap space, and the brk system call used in malloc allocates swap space for the malloc’d memory. But even if only 800MB of mallocs succeeded, that’s still much more than the 400MB of physical memory, so has the same effect.

I assume Kevin checked that no other users were working on ulab at the time he ran the program, since this would be significant “denial of service” to them.

Chap. 5 I/O

Reading: Chap 5 to pg. 332, 339-347, skip DMA, 348-360, Chap 10, Sec. 10.5: 771-773, 775-776

Block vs. char devices (mainly a UNIX idea)

Each device under UNIX has a special file, also known as “device node”. Tan., pg. 734 example is “/dev/lp” for a line printer device. Classically, device nodes were kept in directory /dev. They are not ordinary files, but rather filenames and associated information about a device.

When you display them with “ls –l”, you see a “c” for char device or “b” for block device as the first character of the listing, as you would see a directory marked “d”. For example a line printer would be a char device:

ls –l /dev/lp

crw-rw-rw 1 root ... /dev/lp

On Solaris, the devices have been reorganized into many subdirectories by device type, and with symbolic links to other names, so it’s a bit hard to find the actual device nodes. For example, on ulab, we have /dev/board1, the serial line to mtip system 1. We have to follow two symbolic links before we find the device node:

blade57(6)% ls -l /dev/board1

lrwxrwxrwx 1 root 5 Apr 25 2008 /dev/board1 -> ttyrf

blade57(7)% ls -l /dev/board1

lrwxrwxrwx 1 root 5 Apr 25 2008 /dev/board1 -> ttyrf

blade57(8)% ls -l /dev/ttyrf

lrwxrwxrwx 1 root 30 Sep 19 2002 /dev/ttyrf -> ../devices/pseudo/ptsl@0:ttyrf

blade57(9)% ls -l /dev/../devices/pseudo/ptsl@0:ttyrf

crw-rw-rw- 1 root 26, 47 Dec 3 08:59 /dev/../devices/pseudo/ptsl@0:ttyrf

The c shows that we finally found the device node.

How about a disk? We do “df –l” to see local disks (the disk-free command), and then trace through the symbolic links:

blade57(10)% df -l

/ (/dev/dsk/c0t0d0s0 ): 3308702 blocks 625943 files

/proc (/proc ): 0 blocks 7876 files

/dev/fd (fd ): 0 blocks 0 files

/etc/mnttab (mnttab ): 0 blocks 0 files

/var/run (swap ): 1487776 blocks 53320 files

/tmp (swap ): 1487776 blocks 53320 files

/space (/dev/dsk/c0t0d0s3 ):21629310 blocks 1585108 files

blade57(14)% ls -l /dev/dsk/c0t0d0s3

lrwxrwxrwx 1 root 38 Sep 19 2002 /dev/dsk/c0t0d0s3 -> ../../devices/pci@1f,0/ide@d/dad@0,0:d

blade57(15)% ls -l /dev/dsk/../../devices/pci@1f,0/ide@d/dad@0,0:d

brw-rw-rw- 1 root 136, 3 Jul 9 2004 /dev/dsk/../../devices/pci@1f,0/ide@d/dad@0,0:d

There’s the b for block device. We see that this device node is readable, which exposes the data on this disk to reading—should be protected. Luckily there is no valuable data on this disk, just some old logging files. I’ll report this problem to operator.

Special files can be used like filenames in open. Devices are folded into the filesystem name space, so that open, read, write, close can be used with devices just as if they were files.

For Tan’s simple device node, we could

fd = open(“/dev/lp”, ...)

write(fd, “testing”, 7);

close(fd);

And write the message on the line printer. Since the cp command is doing such an open (possibly through fopen()), the cp command can be used with the device node too:

cp file /dev/lp

This is not recommended for normal use, however, as it “jumps the queue” set up by lpr. You can try it on your home Linux system.

This is part of the device-independent i/o story. We can do the same to a floppy drive, /dev/fd, although it will not be in a filesystem on the floppy disk this way. We are treating the floppy disk as a sequential stream of bytes.

We can even do it with a hard drive, though it will wipe out any filesystem on that partition. The other direction is much safer:

more /dev/hda0

Where /dev/hda0 stands for the first partition on drive hda, the first IDE drive on a Linux system. Of course you need to run as root to do this successfully on a normally protected system.

Windows: for list of device names for disk devices use “mountvol”. This includes floppy and CD drives. Here’s the volume name of my floppy disk:

\\?\Volume{cb15a4be-b302-11d7-a922-806d6172696f}\

We’re at Tan., pg. 339. Quick review of material we already looked at near the start of the term

Note corr. to pg. 340: line 6, “interrupt vector” should be “interrupt vector number”, so that “interrupt vector” can be saved for use as the address of the interrupt handler, held in the interrupt gate on x86, in IDT[nn], where nn is the interrupt vector number.

Precise Interrupts—we’ve covered this already, but reread.

We’ll finish Chap 5 next time.

Timer Device: needed for hw5’s preemptive scheduler (optional part)

Recall $pcex/timetest.c from CS341: shows how to set up periodic timer interrupts.

The periodic interrupts from the timer are called ticks, and the interrupt handler is called the tick handler. The kernel uses it for:

timekeeping, incrementing the kernel-global system time variable
counting down the quantum and causing preemption when it reaches zero
handling periodic housecleaning actions (garbage collection of various kinds)
other time-delayed actions such as network timeouts

Now the second point is crucial to hw5. More exactly, the quantum is the tick count left for this process to use before it gets preempted. We’ll add a new member of PEntry, p_quantum for it.

Then we have (first draft of code) at the end of the tick handler: Here QUANTUM should be 4, for 4 10-ms ticks.

if (--curproc->p_quantum == 0) {

/* preempting, later start over with full quantum */

curproc->p_quantum = QUANTUM;

scheduler(); /* find another process to run */

}

Note that we don’t need to preempt process 0, because it runs as long as no user processes are runnable. But schedule() will choose it again if necessary, so no special test for it is really needed.

Note that this code is part of an interrupt handler, so it runs with IF=0, since we never use sti in our interrupt handlers (Linux does use sti in interrupt handlers, to allow recursive interrupts.) Thus IF = 0 at the call to scheduler(), as it requires.

Note that with this simple re-initialization of p_quantum, when schedule() runs, all the preempted processes have the same p_quantum value, so it isn’t useful to use p_quantum in the decision-making (which process to schedule next.) You can leave scheduler() itself as before.

Question from Vy Nguyen: I thought interrupt handlers were supposed to finish very quickly, but preemption will mean that one interrupt handler execution will not be finished until the process is rescheduled—isn’t that a problem?

Answer: This is a real concern and the safety of allowing a call to scheduler() in an interrupt handler does depend on calling the scheduler at the end of the int. handler, after it has done all its hardware actions and kernel global variable accesses other than schedule itself. At this call-to-schedule point, all that is left of the int. handler execution is a little bit of stack, built on top of the process's stack (kernel stack for real UNIX, combo stack for hw4). This stays on the system while other processes execute, and finally gets popped off when this process is rescheduled. This little bit of stack is passive, not a real problem.

Note that we can’t tell whether a tick happened from user code execution or kernel code execution, unless we add an “in-kernel” flag. So we’re going to be preempting kernel execution (parts that are running with IF=1) as well as user execution. All user code runs with IF=1. That’s OK, we just have to be careful to do cli() where needed.

Note that user-code preemption is the main goal of preemption. The extreme case is an infinite loop in user code. That would hang the whole system without user-code preemption.

Is it bad to preempt kernel execution? No, it’s what Linux and (parts of) Windows does, in order to responsive. Consider process A running a long system call and process B becoming ready because of a disk-done interrupt. If kernel preemption is allowed, B can run before A finishes that system call, a good thing if B is more important than A.

Classic UNIX (pre-90s) disallowed preemption of kernel code, a practice known as “limited preemption.” Linux is now using unlimited preemption, but evolved from limited preemption (up to and including v. 2.4) . Linux 2.6 onwards is “fully preemptive”, meaning that that (a subset of) kernel code can be preempted.

What’s the downside of allowing kernel preemption? More forms of race conditions. Consider the line of code in hw3.soln, incrementing a kernel global variable:

number_of_zombie++;

Suppose this is implemented by two instructions, one to read the variable, and another to write the new value. Then suppose process 1 executed the first instruction and then got preempted. Process 2 executed both instructions, rereading the same value that process 1 saw (old) and writing old + 1. But when process 1 resumes, it also writes old + 1. One increment action got lost. This is called the lost update problem.

Solution? Use kernel mutex. For hw5, make IF=0 for this code. (Or verify that it is only one instruction, since interrupts appear to occur strictly between instructions.)

But before you start worrying too much, note that this problem required a global variable, whereas most of the increments occur to local variables. Local variables have homes on the process stack, and each process has its own private stack. Thus local variables are process-private, and can’t suffer lost updates. It would be good to do an audit of updates to global variables in your hw5 kernel code to check for such problems.

Note that using curproc in an interrupt handler is unusual, not done in i/o (not timer) interrupt handlers. These execute as a “guest” in a “random” process image, and shouldn’t access anything about the current process. But the timer interrupt is being used to time the current process, so this access is just right.

Hw4 Preemption (optional part for implementation, but be sure to understand it)

First, let’s look at the hw3 (no preemption case) output, and then see how it changes for hw4 with preemption. Note that one char takes about 1 ms to be output at 9600 baud, the baudrate of systems 1-10. Systems 11-14 have much slower processors, so not recommended for testing sched.

For uprog[123].c, recall that the processes run like this, where ● denotes the computation loop:

Main 1 runs like this: aaaaaaaaaazzz●AAAAAAAAAA

main 2 runs like this: bbbbbbbbbbb

main 3 runs like this: ●ccccccccccc

Using hw4 provided files: no preemption

Actual output, except kprintf “SHUTTING DOWN”:

*aaaaaa**********cccccccc*ccaaaazz*z*********AAA*AAAAAAAbb*bb*bbbbbb*

We see ten stars for a loop execution, each 10ms, so 100 ms. When a process starts outputting to an empty output queue, it can add 6 chars to it without blocking, so stays CPU-bound for that time. After 6 chars, it blocks, and so do other proceses, so the system becomes i/o bound. Thus the first 6 chars of output in the strings above correspond to CPU-bound time for the system, followed by i/o bound time until the system starts executing a loop again—I’ll mark those output characters:

|-----------| |-------------| i/o bound periods

aaaaaa●cccccccccccaaaazzz●AAAAAAAAAAbbbbbbbbbbb

^ note pause here with no output, as main3 does CPU loop

^main1 does its CPU loop

^proc 3 exit (6 c’s in Q)

^proc 1 exit (3 A’s in Q)

^proc 2 exit (6 b’s in Q)

18 chars output during write-behind, overlapped with other activities

25 chars output during i/o bound time, CPU nearly idle (about 25 ms)

Total runtime: 26 stars, so 260 ms

When the system is i/o bound, every time an output interrupt happens, wakeup unblocks all the user processes (that haven’t yet exited), and they all go promptly back to sleep in ttywrite (one gets to enqueue another char, but this takes very little time). Then process 0 runs for most of the millesecond that the next char takes to output.

Hw4 Execution with preemption

Preemption of process 1 by process 2 is marked by %|(1-2) in the debug log. The tick occurs every 10ms. In the i/o bound case, the tick happens once every 10 chars, or possibly every 9 chars (allowing for some overhead.) We’ve settled on QUANTUM = 4, so at the 4^th tick in a CPU-bound process (for example, process 1 of uprog123), preemption occurs. So a quantum lasts 40 ms. Each loop now run as ●●● (two executions of 40 ms followed by one of 20 ms) instead of ● (one execution of 100 ms)

The following is a run for this case. We can see an effect of preemption from watching the run: instead of the system stopping output, stuck in the computation loop in main1 or main3, it keeps doing output from the other processes while doing that computation. Note smaller dots for smaller CPU bursts.

Actual output, except kprintf “SHUTTING DOWN”: no i/o bound time!

*aaaaaa****aaaazz****z****bbbbbb**ccccc*cc***bbbbcc***AAAAA*AAAAAc*

aaaaaa●aaaazz●z●bbbbbb●ccccc…

^proc 3 starts running CPU loop, with 6 a’s in Q, gets preempted

^proc 1 enq’s 6 chars, blocks, 2 blocks too

^proc 3 continues running CPU loop, gets preempted

^proc 1 enq’s last char, start its loop, gets preempted

^proc 2 runs, enqueues 6 b’s, gets blocked

^proc 3 runs loop again, finishes it, enq’s c’s

No i/o bound time:

18 chars output as write-behind

25 chars output overlapped with CPU-bound processes because of preemption

Total runtime: 24 stars, so 240 ms

Faster because of more i/o overlap with computation.

In the preemptive case, output is allowed during the CPU bound execution. Since the 6 chars take 6 ms to output, and the loop in main1 runs 40 ms before being preempted, there are (about) 34 ms of CPU expended at the spots on the CPU-bound bar, just before the 2 preemptions, and another spot where the loop is finishing up.

What we’re seeing here is that preemption makes the system as a whole more responsive and more efficient, allowing more overlap between computing with the CPU and doing i/o.