Tues., Nov. 28

Intro to hw5—go over handout

Note that this hw5 handout doesn't have the "..."s of last time's handout on the hw3 solution.  That's just a difference in log output for the |(0-0) report, that is, the case of process 0 calling the scheduler but getting scheduled again because no user process is ready to run.  The provided hw5 sources print "." for this case, but this handout shows nothing at all in the log for this nearly trivial case.

Challenges of two options:

preemptive scheduler—get ticks working in kernel, call schedule from tick handler as appropriate

Instrumentation: more on this later.

 

semaphores—new syscalls, but that should be no problem: new code in ulib.s, sysup, sysdown, etc. in kernel

See Tanenbaum pg. 110 for definitions of up and down.

Need sem struct in kernel, with sem count and a list of waiting processes or equivalent

—can use intqueue, supplied queue of ints (pids), or use proctab[i].waitcode values. (with a unique waitcode for each semaphore)

sysdown(int semid): decrement semid’s count if it’s positive, sleep if appropriate, but as usual, may be wakened inappropriately, so loop over sleeps is needed. 

sysup(int semid): increment semid’s count as appropriate, call wakeup, which wakes up all waiters on semid’s waitcode

 

The test for "do I need to go back to sleep?" in sysdown is quite tricky.  For testing with testmutex, you can be sloppy because there is at most one waiter.  Prodcons is a better test once testmutex is working.  You could add another waiter to testmutex by extending main1 with code similar to main2 and main3, that is, contesting for the mutex.

 

Timer Device: needed for hw5’s preemptive scheduler, review from CS341

 

The timer device of the PC is a chip (or anyway digital logic) that contains a 16-bit counter and some logic controlling it.  A counter is a register (array of bits) plus some logic to increment and/or decrement the number in the counter.  In this case the counter decrements, so we talk of “downcounts.” 

 

In addition to the counter register, the timer has a “holding register”, also 16 bits, which holds an initial count for the counter.  See Tan, pg. 328.

 

The timer device is connected to a crystal oscillator that puts out a simple square wave at 1.193 MHz, i.e., one cycle every microsecond (actually a little under).  On every cycle of the oscillator, the counter decrements by one.  Eventually the count reaches zero.  At this point, it reloads the initial count from the holding register and starts downcounting again.  In addition, at each zero it sends an IRQ signal out.  This happens over and over.

 

The timer generates a sequence of periodic interrupts, or “ticks”.  Its IRQ line is connected to IR0 of the PIC, so the PIC treats the timer as the highest-priority device in the system, with IRQ 0.  Using the Linux/SAPC initialization of the PIC, the PIC sends in an interrupt with nn = 0x20, and so we need to put the tick handler address in IDT[0x20].

 

The highest possible initial count is 64K (2^16).  64K cyc / (1.193 x 10^6 cyc/sec) = 55 ms, the longest possible period between ticks.  For shorter periods we use smaller initial counts.  For example, for 20ms ticks, we need

 

20 x 10^-3 sec * 1.193 x 10^6 cyc/sec = 23860 cyc

 

i.e., 23860 is the needed initial count in the holding register.  Converting to hex, this is 0x5d34, needing two bytes.  This 16-bit number is output to an 8-bit port in two parts.  Here’s the protocol, at the lowest level:

 

Timer Initialization for 20ms ticks

outb 00 11 010 0 = 0x34 to i/o port 0x43 (control port)

outb 0x34 to i/o port 0x40 (low byte of inicount to count port)

outb 0x5d to i/o port 0x40 (high byte of inicount to count port)

 

See $pcinc/timer.h for details on the control register fields.  See timetest.c in $pcex for decent C code to borrow.  Note the two outb accesses to the same port.  The timer detects the write of its control port and expects the two follow-up writes to its count port.  It catches each 8-bit value and puts it in the right place in its holding register.  The counter keeps downcounting all the while. 

 

You can’t turn the timer off, only change its period.  You can’t turn off its interrupts either.  Luckily, we can mask off its interrupts at the PIC, and that’s how peace is attained normally on the SAPC.  To get interrupts working, we need to set a tick handler address in IDT[0x20] and then unmask the 0 bit in the PIC’s mask register.

 

Back to Memory Management to finish up

 

Quick summary of page replacement algorithms

LRU pretty good, too expensive (LRU-2 even better, even more expensive), FIFO no good, some others

--OS’s use clock algorithms (approx LRU) and/or working set algorithms (for inter-process fairness)

 

Memory Management Reading: Chap 4 to pg. 232 as mentioned before; skip page 233 for sure (or fix it).  You can skip 234-240.  Pick up again at 4.6.5 and read to pg. 249.  Also read 10.4 to pg. 719 on UNIX MM algorithms, and 11.5 to pg. 820 on Win2K algorithms.

 

From hw4, you are now more familiar with page tables, PTs, and page directories, PDs.  But note that in the SAPC environment, these are very static: just one page directory, available at VA 51000, and one page table, available at VA 52000, set up by Tutor (actually, old Linux code).

In the real OS execution, PDs and PTs come and go with the processes.  Each process has a PD, and at least a few PTs, to support its virtual memory: code, data, stack, and DLLs.  In Linux, there's 1 GB of kernel virtual memory that uses the upper one quarter of the PD, plus PTs.

On an x86 processor, the current process has CPU register CR3 pointing to its PD, and thus mapping in all its virtual memory, including the kernel in the case of x86 Linux.  There is typically a group of other processes on the system with their PDs and PTs in memory, but not currently provided a CPU, so not mapped in.

Tan, Sec. 4.7 pp 242-243.  MMU actions at process switch:

 

At process switch, the CR3 register is loaded with the PA of the scheduled process's PD, and that causes a whole new process image to be mapped in.  The caches need to be flushed, both the instruction/data caches and the TLB.  Just after a process switch, cache misses are frequent until the caches again have the important data in them.  This cache-flush action is a big performance cost attached to a process switch.

 

Note that switching between threads of one process does not involve reloading CR3 or flushing caches, and thus is significantly more lightweight.

 

p. 241: Note copy-on-write (COW) trick to speed up fork.

MMU and system security

The MMU is very important to the job of keeping a user process "bottled up" in its virtual machine.  Each address in a user program is tested on every instruction, so the process can't see anything it shouldn't, in particular the kernel code and data.

The MMU causes a page fault or general protection exception for addresses that fail its test, and this causes the kernel to execute and figure out what to do.  Each page is marked U (user) or S (supervisor), which lets the kernel see things that user-level execution can't touch without generating an exception.

Naturally, the instruction to change the CR3 is privileged.  The page tables and page directory are hidden away from the user program in the kernel data area.

 

Page Fault Handling in x86.

 

 

 

What’s a linear address (LA)?  Gets us into segmentation...

 

A little bit about segmentation (optional, not covered in lecture)

The full x86 memory architecture involves segmentation as well as paging, so the address translation goes through two mappings, first by segmentation (VA -> LA) and then by paging (LA -> PA):   VA -> LA -> PA.  We have been trying to ignore the middle step and say VA -> PA by paging.  This is defensible in UNIX or Win2K because both of these OSs utilize “flat address spaces” and trivialized segmentation.  The user image is contained in huge 2G or 3G segments, and the OS in say 1G segments. 

 

With a flat address space, the addressing in user memory is simplified to a sequence of 32-bit addresses from some lower limit to some upper limit, holding both code and data. Addressing in the kernel is similarly simplified.  (Full VAs under real segmentation look like 0018:00010234 for a code address and 0024:00001234 for a data address, etc., code and data live in different segments with no useful relative position.)

 

In the SAPC, the 4M of VA space is mapped by segmentation to the beginning of the last 1G of LA space, starting at 3G = 0xc0000000.  Then paging maps that 4M of LAs to the first 4M of PA space.  Address 0x1234, a VA, is mapped to LA 0xc0001234 and then to PA 0x1234.  When you see an LA, just drop the leading 0xc (the high hex digit), and you'll see the VA or PA.

F06: the following was covered in the next lecture, but was meant to be here--

 

Back to the page fault handler: look at Tan., pg. 243 for steps.

 

  1. MMU causes trap, puts faulting LA in CR2
  2. Assembly ("as") routine saves regs (this is part of the OS)
  3. PF handler (in C) figures out which page not present
  4. OS checks if this page is a valid page of this process (else usually kill process), get page from free list.
  5. Drop this step (the free list handling gets dirty pages written out earlier)
  6. OS locates needed data, often in executable file or swap space, schedules disk i/o, blocks this process  <-- Note blocking in PF
  7. Disk interrupt signals page in, PTE updated, wakeup process.
  8. Faulting instruction needs reexecuting—process is still in kernel after scheduling, back in PF handler, with user PC on bottom of stack where it can be adjusted (backed up) (this is done in the PF handler, not the disk interrupt as Tan seems to say).
  9. The PF handler returns to the as routine
  10. The as routine does the iret, using the backed-up PC, and resumes user execution.
  11. (added)  The user code re-executes the instruction that caused the PF, more successfully this time.

 

Tan., pg. 244.  Instruction backup not such a problem in x86 or Sparc, because there is no auto-incrementation done along with memory access, and there is at most one operand memory address per instruction.

 

Examples of PFs

 

  1. First ref to data page—page contents are in executable file.  PF handler blocks.
  2. First ref to BSS page (uninitialized data)—no blocking, just assign page from free list.
  3. Ref that extends the user stack—same as 2.
  4. First ref to text page (code)—as in 1, or if this program is in use by another process, arrange sharing of code page already in memory.
  5. Reref after pageout to swap space—block while read in from swap space.
  6. Ref to address outside of program image: fails validity test in step 4 above, causes “segmentation violation” in Solaris, usually kills process.
  7. Ref to malloc’d memory (heap): malloc itself only allocates swap space, not real memory, so the memory is added by PFs, like #2.

 

Note that a PF is more like a system call than an interrupt.  They are both exceptions or “traps.”  When the kernel is executing after a trap, it is executing on behalf of the current process, so the process entry and process image are relevant and usable.  No problem in blocking.  An interrupt is quite different.  Interrupt handlers execute as guests of a “random” process.  They normally don’t access process data, only kernel global data relevant to their device.

 

Tan, pg. 710 UNIX memory management.

Discussion of process image regions—don’t forget DLLs too now.

 

pg. 711 pic. showing 2 processes in memory sharing their code pages.  But note that only one of these is “current” in use of the CPU (unless there are multiple CPUs.)  The other one is in memory but not scheduled at the moment.  On the x86, the current one has the CPU register CR3 pointing at its page directory, which makes its whole process image mapped in.  Other CPUs have similar master registers for the top-level of their paging support.

 

Not covered in class F06--so optional

Memory-mapped files.

A region of a file can be mapped into a process image.  Then writes to that part of VA space cause corresponding writes to the file pages.  If two processes map the same region of a file, they end up with shared memory with data that persists in the filesystem.  However, this is not commonly used in applications, partly because the solution is not portable, and partly because there are subtle issues such as exactly when the file writes occur. 

 

Often, the memory-mapping mechanism is used for shared memory, ignoring the file itself.  You can use /dev/zero, the OS-supplied effectively infinite file of zeroes, instead of a real file.  See mmap_nofile.c.

 

Daemons: swapper, page daemon (Solaris pageout, actually a kernel thread)

 

Core map—array with one entry for each physical page of memory used for paging.  (Some memory is used without paging, notably the kernel code under UNIX, and the core kernel code under Win2K.)  This is used to implement the free list, to keep track of where on swap space a page should be written, and other things.

 

pg. 718: two-handed clock algorithm-- we already covered

 

Don’t worry about Linux memory management.

 

pg. 812 Win2K memory management.  Read to pg. 820.