UNIX Process Hierarchies
(no such concept in Windows)
UNIX example: shell
runs mtip. mtip forks, and parent-mtip runs the “keymon” loop, handling
user input, while the child-mtip runs the “linemon” loop, shuttling chars from
the SAPC to the user. Using separate processes for separate types of input works
nicely in a case like this where the actions for the two types are
completely independent: all we have to do for the chars from the
SAPC is put them on the screen, regardless of user input. We want two
processes so that the hanging read from one input source doesn’t prevent us
from reading from the other source.
Resulting process group: mtip is child of shell, itself forks a child--

shell  --waiting for child termination, in waitpid
  |
  mtip running in keymon function   --waiting in read from user (on stdin)
    |
    mtip running in linemon function  --waiting in read from line to SAPC
Why two processes here? Because we have two input sources to attend
to at the same time. With two processes, one can attend to chars coming from
the user and the other to the chars coming from the SAPC. Threads could also be
used here.
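A minimal sketch of this two-process structure (not mtip's actual source; the
descriptor name line_fd, the function name run_tip, and the loop details are
assumptions for illustration, with error checking omitted):

#include <unistd.h>
#include <sys/types.h>

/* line_fd: assumed to be a descriptor already open on the serial line to the SAPC */
void run_tip(int line_fd)
{
    char c;
    pid_t pid = fork();                       /* error checking omitted for brevity */
    if (pid == 0) {
        /* child: "linemon"-style loop, shuttling chars from the SAPC to the screen */
        while (read(line_fd, &c, 1) == 1)     /* blocks until the SAPC sends a char */
            write(STDOUT_FILENO, &c, 1);
        _exit(0);
    }
    /* parent: "keymon"-style loop, shuttling user keystrokes to the SAPC */
    while (read(STDIN_FILENO, &c, 1) == 1)    /* blocks until the user types */
        write(line_fd, &c, 1);
    /* the real keymon also watches for "~q" and then stops the child, as discussed next */
}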
We know that fork created the
child mtip and got it running. How does this application exit cleanly? The user
types “~q” and keymon sees it. As parent, though, it has no special power over
the child to make it exit. But it does know the child’s pid, from fork’s return
value, so it can send the child a kill signal with kill(pid, 9), forcing it to
exit. A child could even kill its parent, knowing its pid (there is a syscall
getppid for that), but that’s not good practice.
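As a sketch of that cleanup (assumed details; the notes only say kill(pid, 9)
is used, and the helper name quit_tip and the waitpid reap are additions for
illustration):

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* child_pid: the value fork() returned in the parent */
void quit_tip(pid_t child_pid)
{
    kill(child_pid, SIGKILL);        /* SIGKILL is signal 9: force the child to exit */
    waitpid(child_pid, 0, 0);        /* reap the child so it doesn't remain a zombie */
}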
Not
covered in class: mtip uses a system call to put stdin into “raw mode” in order
to get each user-typed char as soon as possible, and this means it gets
control-C as an ordinary data character. If it sees two control-C’s in a
row, it exits (and signals the other process to exit too).
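For reference, a hedged sketch of one way to set raw mode with termios (mtip’s
actual system call may differ; the function names tty_raw and tty_restore are
made up for this example):

#include <termios.h>
#include <unistd.h>

static struct termios saved_tty;            /* original settings, restored on exit */

void tty_raw(void)
{
    struct termios raw;
    tcgetattr(STDIN_FILENO, &saved_tty);    /* remember current settings */
    raw = saved_tty;
    cfmakeraw(&raw);                        /* char-at-a-time, no echo, ^C arrives as data */
    tcsetattr(STDIN_FILENO, TCSANOW, &raw);
}

void tty_restore(void)
{
    tcsetattr(STDIN_FILENO, TCSANOW, &saved_tty);
}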
Windows: no
process groups or parent-child relationships, but one process can control
another via its process handle.
Process States
1. Running (actually using the CPU at that instant).
2. Ready (runnable; temporarily stopped to let another process run).
3. Blocked (unable to run until some event happens, often an i/o event).
For example, suppose a
program has a read(...) from the user, and when it’s executed, the user has not
yet entered anything. The read blocks on input; that is, the code in the
kernel for read does a block action on the process, putting it into a waiting,
or Blocked, state. Then the kernel finds another process to run among the
Ready processes, and schedules it, making it Running. Or the kernel goes idle,
just waiting for an interrupt.
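A tiny program showing that situation (an assumed example, not from the notes):
the process sits Blocked in read until the user’s line is available.

#include <unistd.h>

int main(void)
{
    char buf[128];
    /* Blocks here (Running -> Blocked) until input arrives; the device
       interrupt and the unblock described below make the process Ready again. */
    int n = read(STDIN_FILENO, buf, sizeof buf);
    return n < 0;
}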
Fig. 2-2 is the classic
process-state transition diagram for multiprogramming systems. We can
name each arrow with a verb:
Running->Blocked: block (start waiting)
Blocked->Ready:   unblock (stop waiting, becoming ready to run)
Ready->Running:   schedule (scheduler chooses this process to run)
Running->Ready:   preempt (scheduler chooses another process to run, even though this one could use the CPU more)
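As a toy illustration (not from the notes), the three states and the four
transition verbs could be written as a little C state machine; the struct proc
here is a hypothetical, minimal stand-in for a real process table entry:

/* The three states of Fig. 2-2 and the four arrows as functions. */
enum pstate { RUNNING, READY, BLOCKED };

struct proc { int pid; enum pstate state; };            /* hypothetical, minimal PCB */

void block(struct proc *p)    { p->state = BLOCKED; }   /* Running -> Blocked */
void unblock(struct proc *p)  { p->state = READY;   }   /* Blocked -> Ready   */
void schedule(struct proc *p) { p->state = RUNNING; }   /* Ready   -> Running */
void preempt(struct proc *p)  { p->state = READY;   }   /* Running -> Ready   */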
Later, the user finishes the
requested input, and an interrupt handler for the input device runs, and does
an unblock action on the process. The process then enters the set of
Ready processes, and sometime later will be chosen to run.
Preemption occurs when the
CPU is taken away from a process that could continue using it.
Example: Back to Fig. 2-1 we
looked at earlier: four processes want to use the CPU constantly, i.e., they
are “CPU bound”. Each gets to run for a while in turn, for a time known
as the “CPU quantum”, typically 50ms. The point at which one process
loses the CPU in this case is a preemption and causes the process to go from
Running to Ready, while the other goes from Ready to Running.
Question: where are the
interrupts here?
Answer: the interrupts
execute between any two instructions of the code of a process (user or kernel
code) and the resulting execution of the interrupt handler is a kernel-code
execution not part of any process. All interrupt handlers are kernel code
in a modern OS. There is no special execution environment set up for the
interrupt handler like there is for a process (the virtual machine).
Instead, it “borrows” the current memory set-up from the process that it
interrupts, for just the few moments it takes to execute the interrupt handler.
We didn’t do the following sequence in class, but we did a related one with a
single process.
Example of changing process states: Case of one processor

Process A: CPU-bound the whole time (tight loop in user code)
Process B: about to read from user, block, eventually unblock

Timeline showing process lifetimes and also i/o interrupts.

Key:  _____ running (in user or kernel code)
      ----- ready
      ..... blocked

                            char     \n
                            input    input
                            Int      Int
A  _____----________________V________V__----____
B  -----____..........................--____....
        a   b                c        d  e   f

a: preempt of A, schedule of B
b: block of B, schedule of A
c: interrupt for char input, buffered, not yet given to process, so no effect on B
d: interrupt for char input, buffered, and end of line, so provided to process, B unblocked
e: preempt of A, schedule of B, so it reads input line, computes for a little while
f: B blocks on input again, A scheduled
Note how interrupts ride on the currently-running process, running the interrupt handler execution between two instructions of the currently-running process. When process A is interrupted, the interrupt handler runs with process A all available in (user) memory. The char that B is waiting for is delivered, causing an interrupt, while A is running. This is typical—each process is bombarded with interrupts for other processes’ data, and is usually blocked when interrupts for its own data come in.
The interrupt handler is kernel code and uses only kernel data, and purposely ignores the current process image that it is “borrowing”.
Look at the details of the system call mechanism, as in Tanenbaum pg. 51, but
using the (old) Linux syscall linkage we will use in hw2:
The
resulting success or error code is in eax at the end of this sequence.
The
system call instruction is a certain instruction in the CPU instruction
set, designed for use as a system call by the CPU designer, i.e. Intel for
x86. On x86, it’s the int instruction, or sysenter or syscall on newer processors.
Its job is to cause execution to jump out of user execution into the kernel in
a safe way, much like an interrupt causes kernel execution of the interrupt
handler between two instructions in user code (interrupts can also happen in
kernel code).
Recall
slide 6 from the interrupts handout: CPU Interrupt Handling. The Trap cycle is
very similar, and in fact the interrupt cycle is implemented in the CPU by
creating an int instruction out of thin air and putting it in the CPU’s
pipeline.
The
system call instruction causes the CPU to go through its trap cycle, just like
its interrupt cycle, except that nn is obtained from the int instruction
operand (0x80 here), and IF is not changed. Instead of “interrupt
handler”, we say “trap handler”.
The
CPU Trap cycle—execution of int $nn—from user mode with IF=1, the normal
user CPU state (hw2: from kernel mode with IF=1)
Then
after the trap cycle, the new EIP value makes the trap handler execute--
The
trap handler services the system call and ends with a special instruction
“iret” on x86 to restore previously saved state and continue execution just
after the int $nn in user code.
Note
on hw2 vs. real OS
Note that in hw2, all our execution, in user or kernel code, is executed in kernel mode of the CPU. The int 0x80 instruction still traps, but doesn't change the CPU execution mode. In a real OS, the user code executes in user mode on the CPU, disallowing all privileged instructions. The int instruction (or ta on Sparc, etc.) changes the CPU mode from user to kernel, and the iret (or equivalent) reverts it to user mode at the end of the system call. This isolation of the user code in user mode is a powerful methodology for system security. Even "privileged users", such as root on UNIX and Administrator on Windows, are executing code in user mode on the CPU. Only the kernel code is allowed to execute in kernel mode.
As in Tan, pg. 51, consider
read(dev, buf, nbytes) being executed (dev instead of fd for hw2)
Assembler/C argument
passing: In hw2, we need to go
from C to assembler in user code (calling into ulib.s), and from assembler to C
in kernel code (calling the C system call handler from the assembler envelope).
Consider user C code with “write(dev, buf, nbytes)”. Note that this C code is compiled
into assembler code like this:
pushl nbytes
pushl buf
pushl dev
call _write (which pushl’s the return address)
On the x86, the stack grows
to lower addresses, so the nbytes value is at the highest address, buf lower,
dev lower than that, and finally the return address from the call is at
the lowest address, the current top of stack pointed to by esp. Thus we
have this picture, where addresses go from top to bottom:
user stack when execution reaches _write:

esp->   return addr
        first arg   (to be put in ebx for syscall)
        second arg  (to be put in ecx)
        third arg   (to be put in edx)
Now the _write function in
ulib.s needs to pull the 3 args off the stack and put them in ebx, ecx, and
edx, put the syscall number, 4, in eax, and then do “int $0x80”.
We need to preserve register
ebx to coexist with C, but are allowed to use the other three (the C “scratch
registers” eax, ecx and edx) at will. Thus we need to save the caller’s
ebx value at the start (by pushing it on the stack) and restore it at the end
of this function. That push of ebx changes the picture of the stack to
the following:
# user stack after pushl %ebx, needed to preserve %ebx (not a C scratch reg)
#
#   esp->   saved-ebx
#  4(esp)   return addr
#  8(esp)   first arg   (to be put in ebx for syscall)
# 12(esp)   second arg  (to be put in ecx)
# 16(esp)   third arg   (to be put in edx)
_write: pushl %ebx              # save the value of ebx
        movl  8(%esp),%ebx      # first arg in ebx
        movl  12(%esp),%ecx     # second arg in ecx
        movl  16(%esp),%edx     # third arg in edx
        movl  $4,%eax           # syscall # in eax
        int   $0x80             # trap to kernel
        popl  %ebx              # restore the value of ebx
        ret
It should be easy for you to
add _read and _exit following this pattern.