CS444 class 24

Thursday, Nov. 30 More on hw5

Semaphore Option

To user programs, these provide mutex and process synchronization, but to the kernel, they are software objects with certain properties.

Luckily, we can support semaphores without preemption, so this project is independent of the other. Note: use Tanenbaum’s definition of up and down, pg. 110.

The kernel data structures needed for semaphores: an array of Semaphore structs, one for each possible semaphore, with sem. count and list of processes (pids) waiting on it (using intqueue, provided), or equivalently, a decision to scan proctab[i] looking for the waiters on a certain semaphore when necessary.

With the array, we have a simple way of keeping track of each semaphore: just use the index of its semaphore struct in the array. Thus we have sem #0, sem #1, and so on. We can return these id’s to the user.

Each semaphore can have its own waitcode, or event number to use with sleep() and wakeup(). Just use the numbers above the other event codes already in use.

We also discussed the challenge of using wakeup, which wakes up all processes waiting on a waitcode, when up should only let one process run. Clearly sysdown needs to have a loop of sleep calls, so that some processes go back to sleep after a wakeup, while one gets to return to the user.

We can have sysup “appoint” the one process that will return. You can define a new field of PEntry to record a semaphore waiter’s status, for example p_semstatus, i.e., this process is OK to return from down, or not. There are other ways to do this too, but make sure that multiple semaphores don’t interfere with each other, and multiple up’s unblock different processes, even in the case that the first one has not yet run.

For example, suppose processes 2 and 3 are waiting on sem #2, and then process 1 does an up on sem 2. The code in sysup can decide that process 3 should unblock and mark proctab[3].p_semstatus = TRUE, proctab[2].p_semstatus = FALSE, and then call wakeup(waitcode-for-sem-2). Then in sysdown, where process 2 and 3 are looping over sleep calls, they both return from sleep, but process 2 goes back to sleep, while process 3 returns to the user.

Pseudocode for up:

determine event number for this semaphore
increment sem count as appropriate
when appropriate based on the count, locate a waiter on this semaphore, and “appoint” it to return from down by setting its sem-status member of PEntry (or whatever you have designed). Make sure the other waiters on this semaphore are not so marked.
call wakeup(event)

Psuedocode for down:

determine event number for this semaphore
decrement sem count as appropriate
if appropriate, do loop over sleep(event) calls while there are waiters for this semaphore and this process is not “appointed” by its sem-status field in PEntry..

Recall from hw3 that it’s fine to call sleep with IF=0, in fact, it’s required. But the kernel mutex only lasts until another process is chosen to run, since that process will restore IF to 1 when it finishes its system call and returns to user (if not sooner.) Thus the call to sleep represents a “mutex hole”, and the global variables need to be in a consistent state at this point.

Timer Device: needed for hw5’s preemptive scheduler

Last time: hardware review on timer.

The periodic interrupts from the timer are called ticks, and the interrupt handler is called the tick handler. The kernel uses it for:

timekeeping, incrementing the kernel-global system time variable
counting down the quantum and causing preemption when it reaches zero
handling periodic housecleaning actions (garbage collection of various kinds)
other time-delayed actions such as network timeouts

Now the second point is crucial to hw5. More exactly, the quantum is the tick count left for this process to use before it gets preempted. We’ll add a new member of PEntry, p_quantum for it.

Then we have (first draft of code) in the tick handler: Here QUANTUM should be 4, for 4 10-ms ticks.

if (--curproc->p_quantum == 0) {

/* preempting, later start over with full quantum */

curproc->p_quantum = QUANTUM;

schedule(); /* find another process to run */

}

Note that we don’t need to preempt process 0, because it runs as long as no user processes are runnable. But schedule() will choose it again if necessary, so no special test for it is really needed.

Note that this code is part of an interrupt handler, so it runs with IF=0, since we never use sti in our interrupt handlers (Linux does use sti in interrupt handlers, to allow recursive interrupts.) Thus IF = 0 at the call to schedule(), as it requires.

Note that with this simple re-initialization of p_quantum, when schedule() runs, all the preempted processes have the same p_quantum value, so it isn’t useful to use p_quantum in the decision-making (which process to schedule next.) You can leave schedule() itself as before.

Note that we can’t tell whether a tick happened from user code execution or kernel code execution, unless we add an “in-kernel” flag. So we’re going to be preempting kernel execution (parts that are running with IF=1) as well as user execution. All user code runs with IF=1.

Note that user-code preemption is the main goal of preemption. The extreme case is an infinite loop in user code. That would hang the whole system without user-code preemption.

Is it bad to preempt kernel execution? No, it’s what Solaris and (parts of) Win2K does, in order to responsive. Consider process A running a long system call and process B becoming ready because of a disk-done interrupt. If kernel preemption is allowed, B can run before A finishes that system call, a good thing if B is more important than A.

Classic UNIX (pre-90s) disallowed preemption of kernel code, a practice known as “limited preemption.” Linux is in-between on this, evolving from limited preemption (up to and including v. 2.4) to unlimited preemption (v. 2.6, just released end of ’03.) (There is no released v. 2.5, because they use odd numbers for development versions.)

What’s the downside of allowing kernel preemption? More forms of race conditions. Consider the line of code in hw3.soln, incrementing a kernel global variable:

number_of_zombie++;

Suppose this is implemented by two instructions, one to read the variable, and another to write the new value. Then suppose process 1 executed the first instruction and then got preempted. Process 2 executed both instructions, rereading the same value that process 1 saw (old) and writing old + 1. But when process 1 resumes, it also writes old + 1. One increment action got lost. This is called the lost update problem.

Solution? Use kernel mutex. For hw5, make IF=0 for this code. (Or verify that it is only one instruction, since interrupts appear to occur strictly between instructions.)

But before you start worrying too much, note that this problem required a global variable, whereas most of the increments occur to local variables. Local variables have homes on the process stack, and each process has its own private stack. Thus local variables are process-private, and can’t suffer lost updates. It would be good to do an audit of updates to global variables in your hw5 kernel code to check for such problems.

Note that using curproc in an interrupt handler is unusual, not done in i/o (not timer) interrupt handlers. These execute as a “guest” in a “random” process image, and shouldn’t access anything about the current process. But the timer interrupt is being used to time the current process, so this access is just right.

Last time we discussed hw5’s uprog123.c, which uses the same user programs as in hw3, and how it runs now and after preemption is added. Recall that we expect a preemption during main1’s computation loop, which runs more than 40 ms.