· ready q, clock q, q of waiters for each active semaphore
· Each process is on at most one of these queues.
· Note: some processes are not on any queue: the current process, suspended processes, and ones waiting for a message (in receive).
Chap 4: process switch: resched calls ctxsw, in assembler, to do the actual CPU context switch:
· save of CPU registers to pregs[] of the “old” pentry
· restore of CPU registers from pregs[] of the “new” pentry
Note: “new” just means the newly-chosen process. Both processes exist before this happens.
The CPU registers (or CPU context) include the general registers and EFLAGS (known as “PS”, for processor status, in Xinu). ESP is especially important, since it points to the whole process stack, the repository of so much execution state for the process.
resched must be called with interrupts off (IF=0 on x86).
How to block a process (similar in Linux): make sure interrupts are off, then:

    proctab[currpid].pstate = <waitstate>;
    resched();

When resched sees that the current process can’t run any more, it will choose another to run.
This time: kernel mutex for Xinu: IF=0 on x86; also understanding calls to resched().
Note that in between the proctab[currpid].pstate setting and the call to resched in the code above, the system is in a “transient” funny state where the running process (the one with pid = currpid) has a pstate that is a <waitstate>. However, since IF=0, no other activity can interrupt this one, and it will soon fix up the kernel global variables to be in a “consistent state”: specifically, the current process will have pstate PRCURR.
How to unblock a process: call ready, again with interrupts off. Note: ready and resched are kernel-internal functions, not syscalls, so they must not be called from user-level code.
The kernel code can call resched (almost) any time to make sure the highest-priority process is running. It is important to know how to read such code. Example (interrupts are already off here):

    int x = 1;   /* local var, on process stack, private to process */
    resched();
    /* is x == 1 here? */
Answer: yes. Although resched causes other processes to run, when we read kernel code we are following the lifetime of a single process, with its own set of private variables. A certain process p is running, then some other processes run, then the same old process p is running again. So all the private variables have the same values before and after the call to resched(). However, kernel global variables may change, because the other processes that run may change them.
Kernel Mutex by turning off the interrupt system.
In Xinu, the kernel mutex is implemented by turning interrupts off.
· IF = 0: the current execution will continue uninterrupted until done with whatever it is trying to do, unless it calls resched. We can say it has the kernel mutex until it makes IF=1 or calls resched.
· IF = 1: interrupts can happen between any two instructions, and the interrupt handler constitutes another kernel activity that may access kernel global variables or hardware registers. We say that code running with IF=1 does not have the kernel mutex.
This simple IF = 0/1 mechanism works for Xinu
because it is a uniprocessor OS. If there are multiple CPUs, then 2 or more can
be executing simultaneously even without interrupts, so another mechanism is
needed for kernel mutex. More on this when we discuss Linux.
We know that when user code is executing, IF=1
always. For kernel execution, part is done at IF=1 and part at IF=0.
Turning interrupts on and off in Xinu.
In resume, pg. 67, we see:
    char ps;   /* spot to save the PS, the processor status word */
    disable(ps);
    ...
    restore(ps);
ps is a local variable, thus on the process stack and private to the process. For x86, we need more than a byte for EFLAGS, so the online x86 Xinu code has

    STATWORD ps;

to accommodate the larger EFLAGS.
Xinu disable(ps) and restore(ps)
disable(ps) saves the current PS/EFLAGS into the local variable ps and then turns off interrupts (for x86, sets IF=0 with the cli instruction). This is a C macro for the PDP-11, which explains why it can fill in ps via disable(ps). (A function call could not change a char argument like ps.) In the x86 Xinu, STATWORD is a one-entry array, so disable(ps) can be a function call.
The important thing to realize is that disable(ps) saves the PS/EFLAGS in the local variable ps and turns off the interrupt system, thus turning on the kernel mutex (it might already be on).
restore(ps) uses the saved PS/EFLAGS to set the CPU EFLAGS back to what it was when disable(ps) was called.
Two cases to think about:
1. resume was called from user code. In this case IF=1 at the call to resume and disable(ps), so the saved ps has IF=1. Then resume runs with IF=0, and restore(ps) restores EFLAGS back to have IF=1 on the way back to user code.
2. resume was called from kernel code already running with IF=0. Then the saved ps has IF=0, and restore(ps) restores EFLAGS back to have IF=0 at the return from resume, as it was before the call. The caller is running at IF=0 and expects to stay that way.
There is also the case that resume is called from kernel code running with IF=1, but this is similar to case 1.
You can see that it would not work to simply use cli and sti in OS code, because in case 2 we need IF=0 at the end, not IF=1 as sti would leave it.
Using disable and restore in kernel code
Also, note that disable and restore appear in the same function in Xinu code, bracketing the actions that need the mutex. If you want to use a helper function to do part of the work under the mutex, you specify (say, in a comment) that it must be called with the kernel mutex held.
Recall that resched must be called with interrupts off, i.e., under kernel mutex. So it’s common to see the pattern:
    char ps;
    disable(ps);
    ...
    resched();
    ...
    restore(ps);
For example, in resume, ready is called, which calls resched, so this sequence is happening there.
How do we think about the execution of this code? It is easy to go astray here, worrying about all the processes that could run at the call to resched. To understand the action of resume, we want to follow one process doing this action. But we can’t forget the reality that at the “resched” call, a lot of other processes could run.
We consider the following scenario, and go through its details so we know how it works:
1. Process 44 is executing resume, and nothing can interrupt it once it has the mutex, all the way to the call to resched, and into resched, and into ctxsw, where bang, it loses the CPU after its CPU context is saved away.
2. Process 49 has been chosen to run. It previously lost the CPU in ctxsw, so its CPU context is stored in its pentry, and now gets restored. It returns from ctxsw to resched, and from resched back to the kernel code it was previously executing (the code with a call to resched). Eventually it calls resched again, losing the CPU to process 48. Note this is not a recursive call to resched; it can be in a completely different system call, for example.
3. Process 48 goes through the same sequence as 49, eventually losing the CPU to process 44, the process of interest.
4. So eventually, process 44 gets its same old CPU context restored, making it execute like it executed before, and it returns from ctxsw, to resched, to ready, to resume as depicted above.
Let r1 stand for the first part of resched, r2 for the second part (after the call to ctxsw), and let c1 stand for the first part of ctxsw, c2 for the second part. Then a process executes r1, then c1, then loses the CPU to another process, which executes c2, then r2, then whatever code called resched, and so on.
Timeline:
---------|r1|c1||c2|r2|------|r1|c1||c2|r2|------|r1|c1||c2|r2|---------
<--proc. 44----><------proc. 49----><---proc. 48-------><-----proc. 44-->
This is how timesharing works. It is counterintuitive because in ordinary programming, there are no cases like this that abandon a CPU context and its non-empty stack and start using another one. We let the OS do this for us!
Summary
When kernel code calls resched(), it is allowing other processes to run for a while before this one runs again. When we read the code, we should think about a certain process (or a certain interrupt handler execution, another kind of kernel activity). We know a lot of other processes can run during a call to resched, so the kernel global variables may change there. But the local variables keep their same values.
Because the kernel global variables can change at the call to resched, the kernel mutex established by the disable(ps) does not hold across the call to resched. Instead, it holds from the disable(ps) to the resched, and again from the return from resched to the restore(ps).