CS444 class 16

Class 16: handout on proc.h and asmswtch.s

Scheduling and intro to hw3. Tan., pg. 145-163, needed for hw3, coming

We’ve already discussed CPU-bound = compute-bound processes. Similarly i/o-bound. Also preemption and preemptive schedulers.

Scheduling is about sharing resources appropriately among processes. There are 3 major resources to focus on:

CPU
memory
I/O system

Each process wants some fraction of total resource in each direction, and if these fractions (in some direction) add up to more than 100%, then things slow down, even with good scheduling.

Tanenbaum concentrates on CPU scheduling here. Let’s downplay batch and real-time systems and concentrate on Interactive scheduling.

Interactive systems have users inputting commands and expecting fast response. Response Time, pg. 151.

Round-robin—simplest scheme for interactive systems. We’ll use this in hw3.

Quantum = CPU time a process/thread gets before it is preempted, about 50 ms. (pg. 155 20-50 ms)

Often a thread blocks before it reaches this point.

Quantum is set at about 50 ms, as compromise between throughput and delay in responsiveness.

A preemption causes a “process switch” or “context switch”. The second term is too vague, since context just means state. And sometimes it is used for just a mode switch, which happens in a system call. A system call causes execution to go from user mode to kernel mode in the same process. So it’s not a process switch, and in fact is a much lighter-weight action than a process switch. Good thing, since system calls (and mode switches in interrupts) happen much more often.

Process Switch is a mysterious operation. Recall that the CPU is handed over to a process for their 50 ms, then an interrupt occurs (or a system call) and the kernel decides to hand the CPU over to another process. But the kernel is using the CPU to make this decision!

Priority scheduling. Note the UNIX “nice” command, that allows you to run a user process at a lowered priority

Real-time systems—just know what they are.

Thread scheduling: In both Solaris and Win2K, threads are the primary scheduling unit. For each thread, the process it belongs to is known, of course.

Solaris has different scheduling classes, with different policies: SYS, TS (timesharing), RT (real-time), IA (interactive). In a multi-CPU system, you can bind a CPU to a thread. In the RT class, you can even turn off interrupts on that CPU.

Look at proc.h in handout. Very simple process entry, since no “current directory” or “open files” to worry about.

Mainly CPU state, saved at process switch, plus process status, and what the process is waiting for, if waiting (blocked). Also the exit code, just for zombies.

Process switch: from process i to process j: a function implemented in assembly language.

Recall idea of C scratch registers: allowed to be overwritten in any called function. Gives us some leeway here.

Job of asmswtch from process i to process j:

1. Save CPU non-scratch registers in process entry proctab[i] (here *oldpentryp)

1. Load CPU non-scratch registers from process entry proctab[j] (here *newpentryp)

What about EIP, also known as PC? It is changing as this code is running, so doesn’t have a clear value. But what we really need to know is where this function was called from, the “return PC”. So that’s what’s saved here, and used in the return from here.

Look at asmswtch.s in handout: Here is the actual CPU-state switch. It’s tricky to save and restore the CPU registers while using the CPU to do it, but that’s what’s going on here. The ESP points to the whole stack (with kernel stack built on top of user stack), so the swap of ESP represents a swap of program-execution state. EFLAGS is also swapped, with its important IF flag.

I drew a picture of memory in a horizontal line, with the proctab array showing the four PEntry structs, and inside each, the saved-register area. The CPU was represented as an oval with a set of registers inside. We considered a process switch from process 1 to process 2. Asmswtch copies the actual-CPU-registers out of the CPU and into the saved-register area of process 1's PEntry, and then copies the registers from process 2's saved-register area into the actual CPU. These were shown with big fat arrows up and down. The four process stacks were put on the memory line too. The ESP register in the CPU pointed into process 1's stack before the switch, but into process 2's stack after the switch. This way, the whole execution state is switched off, that is, what the current function should return to, etc.

The “saved PC” (i.e. EIP) is actually asmswtch’s return PC. This trick allows us to start a new process with asmswtch as well as switch processes once we’re going. We can put ustart1 in the “saved-PC” spot in a PEntry, and it will be pushed onto the stack in asmswtch and then used in ret, causing a jump to ustart1.

We see that asmswtch takes two arguments, pointers to the old and new PEntry’s. A first try at using it could be asmswtch(&proctab[0],&proctab[1]), to switch from process 0 to process 1, once kernel execution has turned itself into process 0 and set up proctab[1]. But we need good stuff in proctab[1] to load into the CPU.

We can use asmswtch to start a user process as well as switch processes later. Just set up the processes PEntry with:

“saved” ESP = appropriate stack start (more on this soon)

“saved” PC = code’s start address, for example ustart1

“saved” EFLAGS = 0 except IF = 1, so user code runs with interrupts enabled.

“saved” EBP = 0 for clean debugging backtraces

p_status = RUN.

and it will be started by an asmswtch call with second argument pointing to this PEntry. The very first call will be from process 0, the kernel itself, to user process 1, like this, from C code:

asmswtch(p0,p1)

where

p0 = &proctab[0] = pointer to PEntry for process 0

p1 = &proctab[1] = pointer to PEntry for process 1

This will switch from process 0 (the kernel initialization turned process) to process 1 (the first user process). It copies the “old” CPU registers to the p_saved_regs array in proctab[0], and gets new values for them from proctab[1].p_saved_regs.

To get started, code it like this and see it actually work, and later morph it into better general purpose code. Note that you don’t need to explicitly set up saved registers for process 0, because they will just be overwritten in this first call anyway.

Note that each process needs its own stack, so you need to decide on where these are, for example, growing down from 0x380000, 0x390000, 0x3a0000, and the original one at 0x3ffff0. For example:

proctab[1].saved_regs[SAVED_ESP] = 0x380000;

But it’s better to name this constant with #define.

hw3’s scheduler

Scheduler setup, kernel global variables:

proctab[], curproc: global, used by tunix.c and sched.c

Recall our discussion of PEntry structs, one for each process in the proctab array. There are 4 spots in the array, one for process 0 and 3 for the user processes 1, 2, and 3.

process 0 derives from the startup execution, i.e., the kernel initialization in tunix.c morphs itself into a process by filling in proctab[0] and curproc. It is always runnable, so the scheduler can always find a process to run.

Scheduler API (not encapsulated: uses global proctab[], as do other functions in the kernel)

sched.c will contain the scheduler code—here are its major functions.

schedule()—look for a user process to run, else choose process 0, call asmswtch
sleep(event)—block the calling process by changing p_status to BLOCKED, p_waitcode to event, and then calling schedule().
wakeup(event)—loop through proctab, changing all processes blocked on this event to p_status = RUN.

Where are these called from?

sleep—called from ttywrite (to replace the busy wait)

wakeup—called from the tty output interrupt handler, when a new spot in the output queue becomes available.

schedule—called from process 0 code, to get started, and as an “idle loop” to try to hand off the CPU to a user process. Also from sysexit.

These are all critical code, so should run with interrupts off. If you call wakeup only from int handlers, of course it will get IF=0 automatically, since we never turn ints back on in int handlers. You could put checks in to make sure IF=0 ((get_eflags()>>9)&1 gives IF)

Note that wakeup is a “broadcast” action, so multiple processes may wake up for just one char-spot in the tty output queue, for example. Also note that wakeup doesn’t call schedule, so it doesn’t actually get a process running, it just marks it runnable. Sometime later, another process calls sleep, and that calls schedule, and chooses this one to run. Or process 0 calls schedule, and that does the trick.

The only tricky coding in sched.c is schedule(), because it calls asmswtch, a completely weird function. I’d code it like this:

Decision code, end up setting new curproc, but also knowing oldproc, pointers to proctab entries.
asmswtch(oldproc, curproc)
return

You don’t have to do it this way, but be aware that asmswtch does return after the given process is rescheduled, and then it should return from schedule and go on with its business (unless the process is exiting). Don’t let it fall into inappropriate code because you think it never returns!

The calls into the scheduler are straightforward to code except for the sleep call from ttywrite. There we had a busy loop of attempted enqueues into the output queue, and we need to replace it by blocking the process.

Might try:

if (enqueue(...) == FULLQ)
sleep(OUTPUT);
/* can we enqueue now? Not for sure! */

Suppose two processes, A and B, were unblocked by wakeup. If A gets to run first, it will use up the space in the output queue, but B will still be marked RUN, and will be chosen to run a little while later. It will return from sleep, but there will be no space in the output queue for it to use. It needs to call sleep again. So we really need something like:

while (enqueue(...) == FULLQ)
sleep(OUTPUT);

But what about mutex? We know from hw2 that enqueue needs mutex (cli()) to protect it against concurrent modifications to the queue data structure by itself and the dequeue in the interrupt handler, which can run in the middle of an enqueue unless we turn off interrupts.

Is that enough? Suppose we write guarded_enqueue(), which is just enqueue’s body surrounded by cli and sti (or set_eflags back to original value.), and similarly guarded_sleep(). Then we would have:

while (guarded_enqueue(...) == FULLQ)
guarded_sleep(OUTPUT);

But there’s a sneaky problem here called the lost wakeup problem. It was discussed in Tan, pg. 110 in the context of user-level sleep and wakeup calls. Luckily we are working inside the kernel here, so we are allowed to turn off interrupts as needed.

The problem is this. Suppose that just after guarded_enqueue returns with FULLQ, an interrupt happens that causes a dequeue and a call to wakeup. But this process is not yet blocked, so it is not unblocked. Instead, the wakeup does nothing at all to proctab[] (that’s OK in itself, maybe there’s no more output.) Then this code resumes and calls guarded_sleep, and this process may sleep forever (if it’s the last non-zombie process, for example.)

The solution is to put the cli()—sti() mutex region across more code:

cli();
while (enqueue(...) == FULLQ)
sleep(OUTPUT);
sti();

It’s a little scary to call sleep with interrupts off, but in fact they won’t be off long, because the sleep immediately calls schedule, which finds another process to run, and asmswtch swaps the EFLAGS with the IF in it, and the other process resumes execution, typically with IF=0 inside asmswtch, but it soon restores IF to 1 as it returns to user code. So the mutex (IF=0) is only holding for the execution in this process. The important thing for us here is that an interrupt can no longer occur in that bad spot described above. So even though there is a “mutex hole” at the call to sleep, we have what we need here.

More on the mutex hole: Suppose we have a global kernel variable x and we put:

cli();
x = 1;
while (enqueue(...) == FULLQ)
sleep(OUTPUT);
/* is x still = 1 here? Not necessarily, because of the mutex hole at sleep() */
sti();

You can see that the mutex hole allows another process to run in the middle of this sequence, and it could set x to some other value. So the use of cli—sti mutex along with sleep has to be thought out carefully. Basically there are two mutex regions, one before and the other after the call to sleep.