CS644 Tues, Apr. 20
Xinu Interrupt Handling, then Linux
We have seen that kernel mutex in Xinu is implemented by simply turning off the interrupt system, or “IF=0” in our shorthand for x86. Interrupts can only happen when IF=1, so when an interrupt happens, no transient state is caught, but instead, the kernel global variables are in a “consistent state”, all settled down.
For example, we can be sure that proctab[currpid].pstate == PRCURR at the start of an interrupt handler, and that no decision based on kernel state is in flight, waiting to drive a soon-to-be-made modification. The kernel is in a settled-down state.
Interrupt handlers start executing with IF=0, and the simple Xinu interrupt handlers just leave IF=0 (unlike Linux), so an interrupt handler itself runs under kernel mutex and can modify kernel global variables. If it calls resched(), as it is allowed to do, the mutex holds up to the resched() call and again after resched() returns, the same as for other kernel code.
Note that we don’t need a disable(ps) or a restore(ps) in interrupt handler code. IF=0 is set by the interrupt cycle of the CPU, and IF=1 is restored at the end by the iret instruction (IF is a bit in EFLAGS). So an interrupt handler looks simple, but has kernel mutex protection just like a system call bracketed by disable(ps) and restore(ps).
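As a concrete example, here is a minimal sketch of such a handler (illustrative names, not verbatim Xinu source; ready() and resched() are the real Xinu calls, though their exact signatures vary across Xinu versions):

/* Sketch of a Xinu-style device interrupt handler.  Note there is no
 * disable()/restore() pair: the CPU's interrupt cycle already set IF=0,
 * so kernel globals can be touched freely, and iret restores IF=1 on exit. */

extern int  waitingpid;                  /* hypothetical: pid blocked waiting for input, or -1 */
extern char read_device_register(void);  /* hypothetical: fetch the byte the device has ready  */
extern void enqueue_input(char c);       /* hypothetical: append to a kernel global buffer     */
extern void ready(int pid);              /* Xinu: move pid to the ready list                   */
extern void resched(void);               /* Xinu: possibly switch to another process           */

void devhandler(void)
{
    enqueue_input(read_device_register());  /* update kernel state under IF=0 */

    if (waitingpid != -1) {       /* some process is blocked waiting for this input */
        ready(waitingpid);        /* unblock it: PRWAIT -> PRREADY                  */
        waitingpid = -1;
        resched();                /* may preempt the interrupted process            */
    }
}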
Stack Handling: an interrupt handler “borrows” a stack (can be true in Linux also)
The Xinu interrupt stack grows on top of the process stack that was current at the time of the interrupt. Recall that the Xinu kernel stack grows on top of the user stack, so one stack per process is all Xinu needs. (Linux has a user and kernel stack for each thread, plus possibly some interrupt stacks on top of the kernel stack). Once the interrupt handler returns, the stack is back to what it was before, so the interrupted process should be unaffected.
Xinu user stack just before an interrupt:
<-----UA-----|
addresses in memory-->
Xinu user stack with interrupt stack on top of it: an interrupt happened during user code execution, and the interrupt handler borrows the process stack
<-I--<-----UA-----|
Xinu process stack after the user code has made a system call: the kernel stack grows on top of the user stack.
<---KA---<-----UA-----|
Xinu process stack for process A doing a syscall (not currently under mutex), and an interrupt happens: the interrupt borrows the process stack.
<--I--<---KA---<-----UA-----|
Linux has separate user and kernel stacks for each thread, so the last diagram would be:
<-----UA-----|          <--I--<---KA---
  user image              kernel memory
Interrupt Handlers in Xinu (and Linux) may not block
An Interrupt Handler can call resched() for preemption, but can’t block. (same in Linux)
An interrupt handler is not allowed to block; it may only call resched(), without first setting pstate to a wait state. It is not itself a process, and the process it has interrupted is unrelated to it in almost all cases. It is borrowing stack space from that process, but that should be all, and only temporarily. So there is no place to save its own CPU state, no pregs[] array of its own.
Some OS’s, including Solaris, use threads for interrupt handlers and thus can allow interrupt handlers to block.
We see that in Xinu and Linux, an interrupt handler is not a thread/process, but a simpler execution unit that is expected to be very short-lived, living off of borrowed stack and also borrowed CPU time in many cases. Since all processes get hit by interrupts at the same rate, it works out OK on the average.
A typical interrupt handler executes when some hardware is “ready”, either with new data or ready to transmit more, or with a tick. This often causes the interrupt handler to unblock some process, a quick call to ready() in Xinu. In the tick handler, the quantum of the current process is decremented, and if zero, resched() is called for preemption.
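Just the preemption part of that tick handler, as a sketch (preempt and QUANTUM are in the spirit of the real Xinu globals, but this is not verbatim source):

#define QUANTUM 10           /* ticks per timeslice (illustrative value)        */

extern int  preempt;         /* hypothetical: ticks left in the current quantum */
extern void resched(void);   /* Xinu: pick the best ready process to run next   */

void clkhandler(void)
{
    if (--preempt <= 0) {    /* the current process has used up its timeslice   */
        preempt = QUANTUM;   /* whoever runs next gets a fresh quantum          */
        resched();           /* preemption: current goes PRCURR -> PRREADY,     */
                             /* not blocked, just set aside                     */
    }
}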
Summary of Xinu so far:
We have now covered a microkernel of Xinu. If we included the clock interrupt handler, we would have a preemptive scheduler and a tiny set of syscalls: create, resume, suspend, kill, wait, signal, send, receive, and so on. Still no way to printf to the console, though; we need device drivers for that. But first we cover the corresponding material for Linux.
On to Linux...
We have already studied Linux system calls, in Love, Chap. 5. Just as in Xinu, immediately after a syscall from process A, the same process is still running, but now in kernel mode. We wrote
--UA---|---KA----|---UA---
time-->
in a timeline, and this holds for Linux too, for a simple system call. If the system call blocks, there needs to be a process switch, marked || here:
--UA---|---KA----||---KB---|---UB---... <process A unblocked> ---KB---||---KA---|---UA---
Note that a process switch is a bigger transition than a syscall or syscall return.
Ignoring interrupts for a moment, we see that a CPU is always executing some process, maybe process 0 if no user process is runnable (except for a transitional moment inside the process switch itself). That’s true for Xinu or Linux. Since interrupts ride on top of some process, there is a current pid defined at all times (except in transition).
Love, Chap. 3 Process Management
struct task_struct: like Xinu’s pentry, this is the process descriptor, or really the thread descriptor, since it is actually per-thread in Linux.
Fig. 3.1: some sample fields of task_struct, idea of the task list
· struct thread_info: lives at bottom of kernel stack, for access by assembly code
· addr_limit: 3G for 32-bit Linux, top of 47-bit address space for x64 Linux
· supervisor_stack: kernel stack bottom
Each thread/process has an 8KB (or other fixed-size) area of kernel memory for its most important memory needs: the process descriptor and its kernel stack. The process descriptor has two parts (for obscure technical reasons), so the memory area looks like this, shown in Fig. 3.2, pg. 26.
|struct thread_info|---<--kernel stack----|-------struct task_struct-------------|
<--------------------8KB, or 4KB, or ...(i.e., not that big)--------------------->
addresses->
This whole thing is in the kernel data area, not in the user image. By a trick, the stack pointer can be used to locate the thread_info, which contains a pointer to the task_struct.
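The trick, roughly as it was done on older 32-bit x86 kernels (a sketch, not current kernel source; it assumes the thread_info/stack area is THREAD_SIZE bytes and aligned on a THREAD_SIZE boundary):

#define THREAD_SIZE 8192        /* size of the per-thread kernel area (8KB here)  */

struct task_struct;             /* the full process descriptor, defined elsewhere */

struct thread_info {
    struct task_struct *task;   /* pointer back to the task_struct */
    /* ... other small fields used by assembly code ... */
};

static inline struct thread_info *current_thread_info(void)
{
    unsigned long sp;
    asm volatile ("movl %%esp, %0" : "=r" (sp));   /* read the kernel stack pointer */
    /* mask off the low bits: this lands on the start of the aligned area,
     * which is exactly where struct thread_info lives */
    return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

/* "current", the kernel's name for the running task, then follows the pointer: */
#define current (current_thread_info()->task)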
Important Linux Process states
· TASK_RUNNING: runnable, i.e., currently running (Xinu PRCURR) or ready to run (Xinu PRREADY).
· TASK_INTERRUPTIBLE/TASK_UNINTERRUPTIBLE: blocked (Xinu PRWAIT, PRRECV)
· TASK_ZOMBIE: terminated task, parent not yet informed about it (nonexistent in Xinu)
Look at Fig. 3.3 Process State diagram, like Xinu Fig 6.1 except no Suspended state.
Fork() and COW (Copy-on-Write): Fork is not as expensive as it sounds
Fork() is implemented by calling clone() on Linux, so it’s only an honorary syscall there. With copy-on-write, the parent’s pages are not copied at fork time; they are shared read-only and copied lazily only when one side writes to a page, which is why fork is not as expensive as it sounds.
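A tiny user-level illustration of fork() and COW (standard POSIX calls; the variable names are just for the example):

/* After fork() there are two processes running this same code.  COW means the
 * child's address space is not physically copied at fork time: pages are
 * shared read-only and copied only when one side writes to them. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int x = 42;                 /* lives on a stack page shared (COW) after fork */
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        exit(1);
    } else if (pid == 0) {      /* child */
        x = 99;                 /* write fault: this page gets copied now        */
        printf("child:  x = %d, pid = %d\n", x, (int)getpid());
    } else {                    /* parent */
        wait(NULL);             /* reap the child                                */
        printf("parent: x = %d, pid = %d\n", x, (int)getpid());  /* still 42     */
    }
    return 0;
}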
Linux Threads—the basic execution unit. Process = set of threads + VA space
Love Chap. 4
Cooperative multitasking vs. preemptive multitasking: both Xinu and Linux are in the latter category, i.e., they use preemption to grab the CPU away from a process/thread at an interrupt when its timeslice is over. The timer interrupt is the one that counts the quantum down to zero, and so is the most common one to cause a preemption.
See Comer, pg. 125, third paragraph, for discussion of preemption in Xinu. See Love, pg. 39, last paragraph, for Linux, although the mechanism (interrupt handler) is not covered here. It is listed in a bullet on pg. 158 as a job for the timer interrupt handler.
I/O Bound vs. CPU-Bound processes: a useful idea
I/O bound: doing syscall after syscall, executing in kernel most of time
CPU-bound: doing user-level computation on and on, gets preempted if others need to run
Timeslice and preemption: read this. Note that on pg. 43 “when a process’s timeslice reaches zero” is detected in the timer interrupt handler.
Runqueues: one per processor, listing the runnable threads/processes. A thread is running on only one processor at a time, though it can migrate now and then to another processor for load balancing.
So each processor owns its own runqueue, hidden away from the bulk of the kernel (see footnote to pg. 44). There is a scheduler API for the rest of the kernel to use.
The runqueue is a kernel variable, part of the kernel state, and thus needs mutex for access. We see mention here of “spinlocks” for it. Also the possibility of deadlock. Read this carefully.
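The general locking pattern looks like this (a sketch with a made-up my_rq structure, not the real runqueue code; the spinlock and list calls are the standard kernel API):

#include <linux/spinlock.h>
#include <linux/list.h>

struct my_rq {                     /* made-up stand-in for a per-CPU runqueue */
    spinlock_t       lock;         /* protects the fields below               */
    struct list_head runnable;     /* list of runnable tasks (illustrative)   */
    unsigned long    nr_running;
};                                 /* assume lock and list were initialized elsewhere */

static void my_rq_add(struct my_rq *rq, struct list_head *task_entry)
{
    unsigned long flags;

    /* the _irqsave variant also turns off local interrupts, so an interrupt
     * handler on this CPU can't try to grab the same lock and deadlock */
    spin_lock_irqsave(&rq->lock, flags);
    list_add_tail(task_entry, &rq->runnable);
    rq->nr_running++;
    spin_unlock_irqrestore(&rq->lock, flags);
}

If two runqueues ever have to be locked at the same time (e.g., when migrating a task between processors), they must always be taken in a fixed order; that is the kind of deadlock concern the reading raises.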
Priority Arrays and O(1) scheduling. This may be out of date. See Wikipedia O(1)_scheduler:
In versions of Linux kernel 2.6 prior to 2.6.23, the scheduler used is an O(1) scheduler by Ingo Molnár. The scheduler used thereafter is the Completely Fair Scheduler, also by Ingo Molnár, which runs in O(log N) time.
So we won’t worry about the details of priority calculations.
Schedule(): This is almost exactly like resched(), except that it takes care of its own mutex needs, unlike resched(), which requires the caller to have already obtained kernel mutex (IF=0). It can be called with the current process in a blocked state or a runnable state, and it chooses the best thread/process to run next.
Skip section “Calculating Priority and Timeslice”. This may be out of date.
Sleeping and Waking Up. Note that “sleeping” and “blocking” are used interchangeably in Linux/UNIX language. Here we encounter “wait queues”, like Xinu semaphore queues but more general.
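The canonical sleep-and-wake pattern on a wait queue looks roughly like this (a sketch; my_wait_queue, condition, and the two functions are made-up names, while the wait-queue calls, schedule(), and wake_up() are the real kernel API):

#include <linux/wait.h>
#include <linux/sched.h>

static DECLARE_WAIT_QUEUE_HEAD(my_wait_queue);
static int condition;                /* the event we sleep for; set by the waker */

static void wait_for_event(void)
{
    DEFINE_WAIT(wait);               /* a wait-queue entry for this thread */

    for (;;) {
        /* join the queue and mark ourselves TASK_INTERRUPTIBLE ("blocked")
         * BEFORE testing the condition, so a wakeup can't slip in between */
        prepare_to_wait(&my_wait_queue, &wait, TASK_INTERRUPTIBLE);
        if (condition || signal_pending(current))
            break;                   /* event happened, or a signal woke us */
        schedule();                  /* like resched(): give up the CPU     */
    }
    finish_wait(&my_wait_queue, &wait);  /* TASK_RUNNING again, off the queue */
}

/* The waker side, often an interrupt handler or another thread: */
static void cause_event(void)
{
    condition = 1;
    wake_up(&my_wait_queue);         /* sleepers go back to TASK_RUNNING */
}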