Last time:  Basic Kernel Structures for Xinu
Chap 3: All about the q[] array holding the disjoint queues of processes:

·         ready q, clock q, q of waiters for each active semaphore

·         Each process is on at most one of these queues.

·         Note: some processes are not on any queue: the current process, suspended processes, and ones waiting for a message (in receive)
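As a rough sketch (not the actual Xinu source; the names, sizes, and helper functions here are illustrative), the single q[] array can hold many disjoint doubly linked lists because each process's links live in its own slot q[pid]:

```c
#define NPROC 30            /* process slots 0..NPROC-1           */
#define NQENT (NPROC + 4)   /* room for two list head/tail pairs  */
#define EMPTY (-1)

/* One entry per process, plus head/tail entries for each list,
 * loosely modeled on Xinu's q[]. */
struct qent { int qkey, qnext, qprev; };
static struct qent q[NQENT];

/* Create a list: head at index h, tail at h+1.  Returns h. */
static int newqueue_at(int h) {
    q[h].qnext = h + 1;   q[h].qprev = EMPTY;
    q[h + 1].qprev = h;   q[h + 1].qnext = EMPTY;
    return h;
}

/* Insert process pid at the tail of the list whose head is h. */
static void enqueue(int pid, int h) {
    int tail = h + 1, prev = q[tail].qprev;
    q[pid].qnext = tail; q[pid].qprev = prev;
    q[prev].qnext = pid; q[tail].qprev = pid;
}

/* Remove and return the first pid on the list, or EMPTY. */
static int dequeue(int h) {
    int pid = q[h].qnext;
    if (pid == h + 1) return EMPTY;       /* only the tail: empty */
    q[h].qnext = q[pid].qnext;
    q[q[pid].qnext].qprev = h;
    return pid;
}
```

Because a process's qnext/qprev live in its own slot, it can be linked into at most one list at a time, which is exactly the "each process is on at most one queue" invariant above.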

Chap 4: process switch: resched calls ctxsw, in assembler, to do the actual CPU context switch:

·         save of CPU registers to pregs[] of “old” pentry

·         restore of CPU registers from pregs[] of “new” pentry

Note: new just means newly-chosen process. Both processes exist before this happens.

The CPU registers (or CPU context) include the general registers, and EFLAGS (known as “PS” for processor status, in Xinu). ESP is especially important since it points to the whole process stack, the repository of so much execution state for the process.
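The save/restore step can be pictured with a toy model: a pentry holding a pregs[] area, and a ctxsw-like routine that copies the "CPU registers" out of and back into it. This is only a simulation of the idea (the real ctxsw is assembler, and switching ESP also switches the whole stack):

```c
#include <string.h>

#define NREGS 9   /* say, 8 general registers plus EFLAGS ("PS") */

/* Simplified process-table entry: just the saved-register area.
 * (The real pentry has many more fields; this is a sketch.) */
struct pentry { long pregs[NREGS]; };

/* Stand-in for the machine's register file in this simulation. */
static long cpu_regs[NREGS];

/* What ctxsw does, in spirit: save the live CPU context into the
 * old process's pentry, then load the new process's saved context. */
static void ctxsw_sim(struct pentry *old, struct pentry *new_) {
    memcpy(old->pregs, cpu_regs, sizeof cpu_regs);   /* save    */
    memcpy(cpu_regs, new_->pregs, sizeof cpu_regs);  /* restore */
}
```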

resched must be called with interrupts off (IF=0 in x86)

How to block a process:  (similar in Linux): make sure interrupts are off, then:

proctab[currpid].pstate = <waitstate>;

resched();
 

When resched sees that the current process can’t run any more, it will choose another to run.

This time: Kernel mutex for Xinu: IF=0 on x86, also understanding calls to resched().

Note that in between the proctab[currpid].pstate setting and the call to resched in the code above, the system is running in a “transient” funny state where the running process (the one with pid = currpid) has a pstate that is a <waitstate>.  However, since IF=0, no other activity can interrupt this one, and resched will soon put the kernel global variables back in a “consistent state”: specifically, once resched has chosen, the (new) current process will have pstate PRCURR.
How to unblock a process: call ready, again with int’s off.
Note: ready and resched are kernel internal functions, not syscalls, so must not be called from user level code.
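A toy state-machine sketch of blocking and unblocking (illustrative only: the real resched and ready also manage the ready queue and call ctxsw; here we track only pstate values, and all names other than PRCURR/PRREADY/PRWAIT/currpid are made up for the sketch):

```c
#define NPROC 30
enum pstate { PRFREE, PRCURR, PRREADY, PRWAIT };

static enum pstate pstate_tab[NPROC]; /* stand-in for proctab[i].pstate */
static int prio[NPROC];
static int currpid;

/* Toy resched: make the highest-priority PRREADY process current. */
static void resched_sim(void) {
    int best = -1, i;
    if (pstate_tab[currpid] == PRCURR)   /* still runnable: back to ready */
        pstate_tab[currpid] = PRREADY;
    for (i = 0; i < NPROC; i++)
        if (pstate_tab[i] == PRREADY && (best < 0 || prio[i] > prio[best]))
            best = i;
    if (best < 0) return;  /* can't happen in Xinu: the null process is always ready */
    currpid = best;
    pstate_tab[currpid] = PRCURR;
}

/* The blocking pattern from the notes (interrupts assumed off):
 * mark the current process waiting, then let resched pick another. */
static void block_current(void) {
    pstate_tab[currpid] = PRWAIT;
    resched_sim();
}

/* Unblocking: mark the process runnable again and reschedule. */
static void ready_sim(int pid) {
    pstate_tab[pid] = PRREADY;
    resched_sim();
}
```

Note the transient state: between the PRWAIT assignment and resched_sim, no process has PRCURR, which is why this must run with interrupts off.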
The kernel code can call resched (almost) any time to make sure the highest-priority ready process is running.  It is important to know how to read such code:
Example (interrupts are off already here):

int x = 1;  /* local var, on process stack, private to process */

resched();

/* is x == 1 here? */

Answer: yes. Although resched causes other processes to run, when we read kernel code we are following the lifetime of a single process, with its own set of private variables.

A certain process p is running, then some other processes run, then the same old process p is running again.
So all the private variables have the same value before and after the call to resched().  However, kernel global variables may change because the other processes that run may change them.

Kernel Mutex by turning off the interrupt system.

In Xinu, kernel mutex is implemented by turning interrupts off.

·         IF = 0: the current execution will continue uninterrupted until done with whatever it is trying to do, unless it calls resched. We can say it has the kernel mutex until it makes IF=1 or calls resched.

·         IF = 1: interrupts can happen between any two instructions, and the interrupt handler constitutes another kernel activity that may access kernel global variables or hardware registers. We say that code running with IF=1 does not have the kernel mutex.

This simple IF = 0/1 mechanism works for Xinu because it is a uniprocessor OS. If there are multiple CPUs, then 2 or more can be executing simultaneously even without interrupts, so another mechanism is needed for kernel mutex. More on this when we discuss Linux.
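For contrast, a multiprocessor kernel needs an actual lock in shared memory, not just IF=0 on one CPU. A minimal sketch using a C11 atomic test-and-set spinlock (this is not Xinu or Linux code; real kernel locks are more elaborate):

```c
#include <stdatomic.h>

/* One global kernel lock, free when the flag is clear. */
static atomic_flag kmutex = ATOMIC_FLAG_INIT;

/* Spin until we atomically change the flag from clear to set.
 * test_and_set returns the OLD value, so 0 means we got the lock. */
static void kacquire(void) {
    while (atomic_flag_test_and_set(&kmutex))
        ;  /* another CPU holds it: busy-wait */
}

static void krelease(void) {
    atomic_flag_clear(&kmutex);
}
```

The key point is the atomic read-modify-write: two CPUs cannot both see the flag clear and both set it, which IF=0 alone cannot guarantee across CPUs.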

We know that when user code is executing, IF=1 always. For kernel execution, part is done at IF=1 and part at IF=0.

Turning interrupts on and off in Xinu.

In resume, pg. 67, we see:

char ps;   /* spot to save ps, the processor status word */

disable(ps);

...

restore(ps);

ps is a local variable, thus on the process stack and private to the process. For x86, we need more than a byte for EFLAGS. The online x86 Xinu code has

STATWORD ps;   

to accommodate the larger EFLAGS.

Xinu disable(ps) and restore(ps)
disable(ps) saves the current PS/EFLAGS into the local variable ps and then turns off interrupts (for x86, sets IF=0 with the cli instruction).  On the PDP-11, disable is a C macro, which explains why it can fill in ps via disable(ps). (A function call could not change a char argument like ps.)  In x86 Xinu, STATWORD is a one-entry array, so disable(ps) can be a function call.

The important thing to realize is that disable(ps) saves the PS/EFLAGS in the local variable ps and turns off the interrupt system, thus turning on kernel mutex (it might already be on).

restore(ps) uses the saved PS/EFLAGS to set the CPU EFLAGS back to what it was when disable(ps) was called.

Two cases to think about:

1.       resume was called from user code. In this case IF=1 at the call to resume and disable(ps), so the saved ps has IF=1. Then resume runs with IF=0, and restore(ps) restores the EFLAGS back to have IF=1, on the way back to user code.

2.       resume was called from kernel code already running with IF=0. Then the saved ps has IF=0, and restore(ps) restores the EFLAGS back to have IF=0 at the return from resume, as it was before the call. The caller is running at IF=0, expects to stay that way.

There is also the case that resume is called from kernel code running with IF=1, but this is similar to case 1.

You can see that it would not work to simply use cli and sti in OS code, because in case 2 we need IF=0 at the end, not IF=1 as sti would leave it.
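The behavior of both cases can be simulated with a one-flag model of EFLAGS (illustrative only; these are not the real Xinu macros):

```c
typedef int STATWORD[1];   /* one-entry array, as in x86 Xinu */

static int IF = 1;         /* simulated x86 interrupt-enable flag */

/* Save the current "EFLAGS" (just IF here) into ps, then "cli". */
static void disable(STATWORD ps) { ps[0] = IF; IF = 0; }

/* Put "EFLAGS" back the way it was when disable(ps) ran. */
static void restore(STATWORD ps) { IF = ps[0]; }

/* Case 2 from the notes: a helper called with IF already 0 must
 * leave IF=0 on return, preserving its caller's mutex. */
static void helper(void) {
    STATWORD ps;
    disable(ps);   /* saves IF=0 */
    /* ... touch kernel globals ... */
    restore(ps);   /* IF is still 0 afterward */
}
```

A plain sti in helper would instead re-enable interrupts in the middle of the caller's critical section.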

Using disable and restore in kernel code

Also, note that disable and restore appear in the same function in Xinu code, bracketing the actions that need mutex. If you want to use a helper function to do part of the work under mutex, specify (in a comment) that it must be called with the kernel mutex held.

Recall that resched must be called with interrupts off, i.e., under kernel mutex. So it’s common to see the pattern:

char ps;  
disable(ps);
...

resched();
...
restore(ps);
For example, in resume, ready is called, which calls resched, so this sequence is happening there.

How do we think about the execution of this code?  It is easy to go astray here, worrying about all the processes that could run at the call to resched. To understand the action of resume, we want to follow one process doing this action. But we can’t forget the reality that at the “resched” call, a lot of other processes could run.

We consider the following scenario, and go through its details so we know how it works:

1.       process 44 is executing resume, and nothing can interrupt it once it has mutex, all the way to the call to resched, and into resched, and into ctxsw, where bang, it loses the CPU after its CPU context is saved away.

2.       Process 49 has been chosen to run. It previously lost the CPU in ctxsw, so its CPU context is stored in its pentry, and now that context gets restored. It returns from ctxsw to resched, and from resched back to the kernel code it was previously executing (code with a call to resched). Eventually it calls resched again, losing the CPU to process 48. Note this is not a recursive call to resched: it can be in a completely different system call, for example.

3.       Process 48 goes through the same sequence as 49, eventually losing the CPU to process 44, the process of interest.

4.       So eventually, process 44 gets its same old CPU context restored, making it execute like it executed before, and it returns from ctxsw, to resched, to ready, to resume as depicted above.

Let r1 stand for the first part of resched and r2 for the second part (after the call to ctxsw).

Similarly, let c1 stand for the first part of ctxsw and c2 for the second part.

Then a process executes r1, then c1, then loses CPU to another process, which executes c2, then r2, then whatever code called resched, and so on.

Timeline:

---------|r1|c1||c2|r2|------|r1|c1||c2|r2|------|r1|c1||c2|r2|---------

<--proc. 44----><----proc. 49----><----proc. 48----><----proc. 44-->

This is how timesharing works. It is counterintuitive because in ordinary programming, there are no cases like this that abandon a CPU context and its non-empty stack and start using another one.  We let the OS do this for us!
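One way to see this "abandon one stack, adopt another" behavior from ordinary user code is POSIX ucontext, used here as a stand-in for ctxsw (a sketch, assuming a glibc-style ucontext implementation; all names are made up for the demo):

```c
#include <ucontext.h>

static ucontext_t sched_ctx, proc_ctx;
static char proc_stack[64 * 1024];  /* the "process's" private stack */
static int trace[3], nsteps;

static void proc_body(void) {
    int x = 1;                  /* private local, lives on proc_stack */
    trace[nsteps++] = 1;        /* "r1,c1": about to give up the CPU */
    swapcontext(&proc_ctx, &sched_ctx);  /* context saved; CPU lost */
    /* "c2,r2": same old context restored, stack intact */
    trace[nsteps++] = x + 2;    /* x is still 1, so this records 3 */
}

/* Run the demo: start the "process", let "another process" (here,
 * just this function) run in the middle, then resume the first one. */
static int run_demo(void) {
    nsteps = 0;
    getcontext(&proc_ctx);
    proc_ctx.uc_stack.ss_sp   = proc_stack;
    proc_ctx.uc_stack.ss_size = sizeof proc_stack;
    proc_ctx.uc_link          = &sched_ctx;
    makecontext(&proc_ctx, proc_body, 0);

    swapcontext(&sched_ctx, &proc_ctx);  /* runs r1,c1 of the process */
    trace[nsteps++] = 2;                 /* some other activity runs   */
    swapcontext(&sched_ctx, &proc_ctx);  /* its c2,r2: it resumes      */
    return trace[0] * 100 + trace[1] * 10 + trace[2];
}
```

The local x keeps its value across the point where the CPU was lost, just as private variables do across a call to resched.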

Summary

When kernel code calls resched(), it is allowing other processes to run for a while before this one runs again. When we read the code, we should think about a certain process (or certain interrupt handler execution, another kind of kernel activity). We know a lot of other processes can run during a call to resched, so the kernel global variables may change there. But the local variables keep their same values.

Because the kernel global variables can change at the call to resched, the kernel mutex established by the disable(ps) does not hold across the call to resched. Instead, it holds from the disable(ps) to the resched, and again from the return from resched to the restore(ps).