Thurs, Oct. 26

Next time: midterm review.  Next Thursday, Nov. 2: midterm exam

Hw3 intro, continued:  Source Files

All the files are compiled and linked together into one .lnx file, which is downloaded:

Kernel: startup0.s, startup1.c, sysentry.s, tunix.c (with some new code), sched.c (new), asmswtch.s (provided), io.c, ioconf.c, tty.c

User program1: crt01.s, uprog1.c (with main1)

User program2: crt02.s, uprog2.c (with main2)

User program3: crt03.s, uprog3.c (with main3)

User-level library: ulib.s, shared between user programs

Plus debugging support: debuglog.c, which is neither user nor kernel code, and is usable in both places.

Scheduling.   Tan.,  pg. 132-146   

We’ve already discussed CPU-bound = compute-bound processes.  Similarly i/o-bound. Also preemption and preemptive schedulers.

Scheduling is about sharing resources appropriately among processes.  There are 3 major resources to focus on: the CPU, memory, and i/o devices.

Each process wants some fraction of the total resource in each direction, and if these fractions (in some direction) add up to more than 100%, then things slow down, even with good scheduling.  For example, two processes that each want 60% of the CPU add up to 120%, and both will run slower than they want.

Study the pic on pg. 141.  The number of processes in memory is called the “degree of multiprogramming”, and it is an important number for scheduling.  If it is too high, each process gets only a small slice of the CPU.  But that’s not the only thing.  If the OS has squeezed a lot of runnable processes into memory by reducing their memory footprints (by paging), then they are all paging like crazy trying to do their work.

A batch system has the “admissions scheduler” to help control the degree of multiprogramming, but UNIX and Win2K don’t have this facility, and are expected to try to run whatever is started.

The only way out for an overloaded timesharing system (UNIX or Win2K) is to swap out some processes, that is, copy their entire memory image out to disk and back again a little while later.  This is a desperate action and costs a lot of i/o, and is rarely seen today on our big-memory systems.  Processes are swapped out on today’s systems (usually) only when they go idle for a good part of a minute.

Round-robin—see Tan.

Quantum = CPU time a process/thread gets before it is preempted

Often a thread blocks before it reaches this point.

Quantum is set at about 50 ms, as a compromise between throughput and delay in responsiveness.

A preemption causes a “process switch” or “context switch”.  The second term is too vague, since context just means state.  And sometimes it is used for just a mode switch, which happens in a system call.  A system call causes execution to go from user mode to kernel mode in the same process.  So it’s not a process switch, and in fact is a much lighter-weight action than a process switch.  Good thing, since system calls (and mode switches in interrupts) happen much more often.

Priority scheduling.  Note the UNIX “nice” command, which allows you to run a user process at a lowered priority.
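For example, the command is built on the underlying nice() system call, which a long-running user program can also call for itself (ordinary UNIX user code, not hw3 code):

#include <unistd.h>

int main(void)
{
  nice(10);   /* raise our nice value by 10, i.e., lower our priority */
  /* ... long-running computation proceeds at lowered priority ... */
  return 0;
}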

Real-time systems—just know what they are.

Thread scheduling: In both Solaris and Win2K, threads are the primary scheduling unit.  For each thread, the process it belongs to is known, of course.

Solaris has different scheduling classes, with different policies:  SYS, TS (timesharing), RT (real-time), IA (interactive).  In a multi-CPU system, you can bind a thread to a CPU.  In the RT class, you can even turn off interrupts on that CPU.

hw3’s scheduler  

Scheduler setup, kernel global variables:

proctab[], curproc: global, used by tunix.c and sched.c

Recall our discussion of PEntry structs, one for each process in the proctab array.  There are 4 spots in the array, one for process 0 and 3 for the user processes 1, 2, and 3.

Process 0 derives from the startup execution, i.e., the kernel initialization in tunix.c morphs itself into a process by filling in proctab[0] and curproc.  It is always runnable, so the scheduler can always find a process to run.
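A minimal sketch of these globals in C (the status values and field names here are guesses at hw3’s actual conventions, just for illustration):

#define NPROC 4            /* process 0 plus user processes 1, 2, 3 */
#define RUN     1          /* runnable (or running) */
#define BLOCKED 2          /* waiting for some event */
#define ZOMBIE  3          /* exited, awaiting cleanup */

typedef struct {
  int p_status;            /* RUN, BLOCKED, or ZOMBIE */
  int p_waitcode;          /* event a BLOCKED process is waiting for */
  int p_savedSP;           /* saved stack pointer, used by asmswtch */
} PEntry;

PEntry proctab[NPROC];     /* one entry per process */
PEntry *curproc;           /* entry of the currently running process */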

Scheduler API (not encapsulated: uses global proctab[], as do other functions in the kernel)

sched.c will contain the scheduler code—here are its major functions; their prototypes are sketched after the list below.

Where are these called from?

sleep—called from ttywrite (to replace the busy wait)

wakeup—called from the tty output interrupt handler, when a new spot in the output queue becomes available.

schedule—called from process 0 code, to get started, and as an “idle loop” to try to hand off the CPU to a user process.  Also from sysexit.
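Putting that together, sched.c’s interface might look like this (a sketch; the event-code value is an assumption, though the name OUTPUT is used below):

#define OUTPUT 1           /* event code: space in the tty output queue */

void sleep(int event);     /* block curproc until a wakeup(event) */
void wakeup(int event);    /* mark every process blocked on event runnable */
void schedule(void);       /* choose a RUN process and switch to it */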

These are all critical code, so they should run with interrupts off.  If you call wakeup only from int handlers, it will of course get IF=0 automatically, since we never turn ints back on in int handlers.  You could put in checks to make sure IF=0 ((get_eflags()>>9)&1 gives IF).
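For example, a sanity check at the top of sleep or schedule could look like this (kprintf is a hypothetical kernel printf; use whatever debug output hw3 provides):

/* IF is bit 9 of EFLAGS; it should be 0 in this critical code */
if ((get_eflags() >> 9) & 1)
  kprintf("sched: entered with interrupts enabled!\n");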

Note that wakeup is a “broadcast” action, so multiple processes may wake up for just one char-spot in the tty output queue, for example.  Also note that wakeup doesn’t call schedule, so it doesn’t actually get a process running, it just marks it runnable.  Sometime later, another process calls sleep, and that calls schedule, and chooses this one to run.  Or process 0 calls schedule, and that does the trick.
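A minimal sketch of that broadcast behavior, using the placeholder names from above:

void wakeup(int event)
{
  int i;

  for (i = 0; i < NPROC; i++)
    if (proctab[i].p_status == BLOCKED && proctab[i].p_waitcode == event)
      proctab[i].p_status = RUN;   /* just mark runnable; no schedule() call */
}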

The only tricky coding in sched.c is schedule(), because it calls asmswtch, a completely weird function.  I’d code it like this:

  1. Decision code, end up setting new curproc, but also knowing oldproc, pointers to proctab entries.
  2. asmswtch(oldproc, curproc)
  3. return

You don’t have to do it this way, but be aware that asmswtch does return after the given process is rescheduled, and then it should return from schedule and go on with its business (unless the process is exiting).  Don’t let it fall into inappropriate code because you think it never returns! 
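Following that outline, schedule() might look roughly like this (a sketch assuming simple round-robin over proctab, with the placeholder names from above):

void schedule(void)
{
  PEntry *oldproc = curproc;
  int i = curproc - proctab;            /* index of the current process */

  do                                    /* round-robin: find next RUN process */
    i = (i + 1) % NPROC;
  while (proctab[i].p_status != RUN);   /* terminates: process 0 is always RUN */

  curproc = &proctab[i];
  if (curproc != oldproc)
    asmswtch(oldproc, curproc);   /* returns when this process is rescheduled */
  /* then return and go on with this process's business */
}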

The calls into the scheduler are straightforward to code except for the sleep call from ttywrite.  There we had a busy loop of attempted enqueues into the output queue, and we need to replace it by blocking the process.

Might try:

if (enqueue(...) == FULLQ)
       sleep(OUTPUT);
/* can we enqueue now?  Not for sure! */

Suppose two processes, A and B, were unblocked by wakeup.  If A gets to run first, it will use up the space in the output queue, but B will still be marked RUN, and will be chosen to run a little while later.  It will return from sleep, but there will be no space in the output queue for it to use.  It needs to call sleep again.  So we really need something like:

while (enqueue(...) == FULLQ)
       sleep(OUTPUT);

But what about mutex?  We know from hw2 that enqueue needs mutex (cli()) to protect the queue data structure against concurrent modification by enqueue itself and the dequeue in the interrupt handler, which can run in the middle of an enqueue unless we turn off interrupts.

Is that enough?  Suppose we write guarded_enqueue(), which is just enqueue’s body surrounded by cli and sti (or a set_eflags back to the original value), and similarly guarded_sleep().  Then we would have:

while (guarded_enqueue(...) == FULLQ)
       guarded_sleep(OUTPUT);
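For concreteness, guarded_enqueue might be written like this (a sketch; the Queue type and enqueue signature are stand-ins for hw2’s actual ones):

int guarded_enqueue(Queue *q, int ch)
{
  int saved_eflags = get_eflags();   /* remember caller's IF */
  int rv;

  cli();                             /* keep the tty interrupt's dequeue out */
  rv = enqueue(q, ch);
  set_eflags(saved_eflags);          /* restore IF to its previous value */
  return rv;
}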

But there’s a sneaky problem here called the lost wakeup problem.  It was discussed in Tan, pg. 110 in the context of user-level sleep and wakeup calls.  Luckily we are working inside the kernel here, so we are allowed to turn off interrupts as needed.

The problem is this.  Suppose that just after guarded_enqueue returns with FULLQ, an interrupt happens that causes a dequeue and a call to wakeup.  But this process is not yet blocked, so it is not unblocked.  Instead, the wakeup does nothing at all to proctab[] (that’s OK in itself, maybe there’s no more output.)  Then this code resumes and calls guarded_sleep, and this process may sleep forever (if it’s the last non-zombie process, for example.)

The solution is to put the cli()—sti() mutex region across more code:

cli();
while (enqueue(...) == FULLQ)
       sleep(OUTPUT);
sti();

It’s a little scary to call sleep with interrupts off, but in fact they won’t be off long: sleep immediately calls schedule, which finds another process to run, and asmswtch swaps in that process’s saved EFLAGS (with its IF bit), so the other process resumes execution, typically with IF=0 inside asmswtch but restoring IF=1 as it returns to user code.  So the mutex (IF=0) only holds for execution in this process.  The important thing for us here is that an interrupt can no longer occur in the bad spot described above.  So even though there is a “mutex hole” at the call to sleep, we have what we need here.
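For reference, sleep itself can be tiny (same placeholder names as above):

void sleep(int event)
{
  curproc->p_status = BLOCKED;
  curproc->p_waitcode = event;
  schedule();   /* switch away; returns only after wakeup(event) marks this
                   process RUN and schedule() later chooses it to run */
}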

More on the mutex hole: Suppose we have a global kernel variable x and we put:

cli();
x = 1;
while (enqueue(...) == FULLQ)
       sleep(OUTPUT);
/* is x still = 1 here?  Not necessarily, because of the mutex hole at sleep() */
sti();

You can see that the mutex hole allows another process to run in the middle of this sequence, and it could set x to some other value.  So the use of the cli()—sti() mutex along with sleep has to be thought out carefully.  Basically there are two mutex regions, one before and the other after the call to sleep.