CS444 Class 21 Swapping, Page Faults
Handout: Intro to hw4
Intro to hw4—go over handout. We'll discuss this further next class.
Back to Memory Management.
Note that virtual memory can be much larger than physical memory, but when this is true, and processes are actively using the virtual memory, there is a lot of paging going on.
Stressing memory management: Last year a student wrote a "depth bomb" program to severely exercise ulab's memory management, which we had previously observed to be hardly used: it had had only 22 revolutions of the clock daemon in almost 3 years of uptime. His program created about 2000 processes, each malloc'ing 4MB of memory and then using it, forcing the OS to provide memory for them. That's 8GB of memory, but ulab has only about 400MB of physical memory, so it had to do a lot of paging to keep up. The system did survive, and after these runs showed 39 revolutions of the clock daemon. I suspect that many mallocs failed, because ulab has only about 1GB of swap space, and the brk system call used by malloc allocates swap space for the malloc'd memory. But even if only 800MB of mallocs succeeded, that's still much more than the 400MB of physical memory, so it has the same effect.
I assume he checked that no other users were working on ulab at the time he ran the program, since this would be significant “denial of service” to them.
Swap space or swap area: an area of disk or SSD (solid state disk, aka flash memory) to hold page data. Typically sized at twice physical memory size. This area has no file system: it's just a bunch of page images. Of course the OS has to track which pages are in use, and by what process. Swapping can be done to a file at some cost in performance (pg. 769).
Uses of swap space:
UNIX/Linux, Windows: page-outs of the paging system. When a dirty page is reclaimed, the old version is written to swap space.
Solaris: swaps whole process images to swap space if idle too long (20 min.) or need for memory is high…
Swapping under extreme load on Solaris UNIX
If the clock algorithm is unable to get the number of pages on the free list up even to a low level, it causes swapping to start on actually-active processes, a desperate action. This should never happen on a healthy system. It is discussed at the bottom of pg. 718 and the top of the next page. Linux apparently just uses more and more paging.
In the real OS execution, PDs and PTs (page directories and page tables) come and go with the processes. Each process has a PD, and at least a few PTs, to support its virtual memory: code, data, stack, and DLLs. In Linux, there's 1 GB of kernel virtual memory that uses the upper one quarter of the PD, plus its PTs.
The current process on a certain x86 processor has CPU register CR3 pointing to its PD, and thus mapping in all its virtual memory, including the kernel for x86 Linux. There is typically a group of other processes on the system with their PDs and PTs in memory, but not currently provided a CPU, so not mapped in.
<diagram of physical memory holding PDs and PTs, but one mapped setup pointed to from CR3 in the CPU>
Tan, Sec. 3.6.1 pp 227-230. MMU actions for processes
Four times the OS has paging-related work:
1. Process creation (fork): create page directory PD and at least one PT
2. Execution: exec: map in executable file pages; process switch: make CPU use a different PD
3. Page fault: fix up one page, PTE
4. Process termination (exit or exception): deallocate PD, PTs
At process switch, the CR3 register is loaded with the PA of the PD of the newly scheduled process, and that causes a whole new process image to be mapped in. The caches need to be flushed, both the instruction/data caches and the TLBs. Just after a process switch, cache misses are frequent until the caches again hold the important data. This cache flushing is a big performance cost attached to a process switch.
Note that switching between threads of one process does not involve reloading CR3 or flushing caches, and thus is significantly more lightweight.
MMU and system security
The MMU is very important to the job of keeping a user process "bottled up" in its virtual machine. Each address a user program generates is checked on every instruction, so the process can't see anything it shouldn't, in particular, the kernel code and data.
The MMU causes a page fault or general protection exception for addresses that fail its test, and this causes the kernel to execute and figure out what to do. Each page is marked U or S to allow the kernel to see things the user level execution can't, and generate an exception if a user execution tries to access kernel memory.
Naturally, the instruction to change the CR3 is privileged. The page tables and page directory are hidden away from the user program in the kernel data area.
Page Fault Handling in x86.
Back to the page fault handler: look at Tan., pg. 228 for steps.
Tan., pg. 230. Instruction backup is not such a problem in x86 or Sparc, because there is no auto-incrementation done along with memory access, and there is at most one operand memory address per instruction.
Examples of PFs
Note that a PF is more like a system call than an interrupt. They are both exceptions or “traps.” When the kernel is executing after a trap, it is executing on behalf of the current process, so the process entry and process image are relevant and usable. No problem in blocking. An interrupt is quite different. Interrupt handlers execute as guests of a “random” process. They normally don’t access process data, only kernel global data relevant to their device.
Tan, pg. 758 Linux memory management.
Discussion of process image regions—don’t forget DLLs too now.
pg. 758 pic. showing 2 processes in memory sharing their code pages. But note that only one of these is "current" in use of the CPU (unless there are multiple CPUs). The other one is in memory but not scheduled at the moment. On the x86, the current one has the CPU register CR3 pointing at its page directory, which makes its whole process image mapped in. Other CPUs have similar master registers for the top level of their paging support.
Memory-mapped files. Fig. 10-13, pg. 761
A region of a file can be mapped into a process image. Then writes to that part of VA space cause corresponding writes to the file pages. If two processes map the same region of a file, they end up with shared memory with data that persists in the filesystem. However, this is not commonly used in applications, partly because the solution is not portable, and partly because there are subtle issues such as exactly when the file writes occur.
Often, the memory-mapping mechanism is used for shared memory, ignoring the file itself. You can use /dev/zero, the OS-supplied effectively infinite file of zeroes, instead of a real file. See mmap_nofile.c.
Memory-related system calls.
Most paging actions are done without memory-specific system calls, but malloc does need a system call to get memory assigned to the process. The underlying system call is brk. Memory-mapped files use the mmap system call, covered above.
Finished with memory management coverage. Skipping Chap 4, on to Chap. 5, already partly covered.
Reading: Chap 5 to pg. 332, 339-347, skip DMA, 348-360, Chap 10, Sec. 10.5: 771-773, 775-776
Block vs. char devices (mainly a UNIX idea)
Each device under UNIX has a special file, also known as a "device node". Tan., pg. 734's example is "/dev/lp" for a line printer device. Classically, device nodes were kept in the directory /dev. They are not ordinary files, but rather filenames and associated information about a device.
When you display them with “ls –l”, you see a “c” for char device or “b” for block device as the first character of the listing, as you would see a directory marked “d”. For example a line printer would be a char device:
ls -l /dev/lp
crw-rw-rw- 1 root ... /dev/lp
On Solaris, the devices have been reorganized into many subdirectories by device type, and with symbolic links to other names, so it’s a bit hard to find the actual device nodes. For example, on ulab, we have /dev/board1, the serial line to mtip system 1. We have to follow two symbolic links before we find the device node:
blade57(6)% ls -l /dev/board1
lrwxrwxrwx 1 root 5 Apr 25 2008 /dev/board1 -> ttyrf
blade57(8)% ls -l /dev/ttyrf
lrwxrwxrwx 1 root 30 Sep 19 2002 /dev/ttyrf -> ../devices/pseudo/ptsl@0:ttyrf
blade57(9)% ls -l /dev/../devices/pseudo/ptsl@0:ttyrf
crw-rw-rw- 1 root 26, 47 Dec 3 08:59 /dev/../devices/pseudo/ptsl@0:ttyrf
The c shows that we finally found the device node.