CS444 Class 10

Last time: how syscalls work

One instruction in user code, int $0x80 for hw2, causes execution to trap to the kernel. The kernel carries out the requested action (a write, for example), and when it is done it uses the iret instruction to return execution to the instruction in user code that follows the int instruction.

Each syscall has a syscall number:

From /usr/include/sys/syscall.h:

#define SYS_exit 1

#define SYS_fork 2

#define SYS_read 3

#define SYS_write 4

#define SYS_open 5

#define SYS_close 6

When your program prepares to do a write syscall, the syscall number 4 is put in the eax register before the “int $0x80” instruction is executed. The kernel (in the syscall handler) sees the 4 and calls syswrite in the kernel, which does the needed work. Syswrite returns to the syscall handler, which iret’s back to the user code.

The Linux syscall linkage we will use in hw2:

the syscall # is put in eax
the syscall args are put in ebx (first), ecx (second), and edx (third).
int $0x80 is the syscall instruction, executed with eax, ebx, ecx, and edx set up as above

The resulting success or error code is in eax at the end of this sequence.
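The linkage above can be modeled in plain C to see the data flow. This is only a simulation sketch, not hw2 code: the names sim_regs, sim_int80, sim_syswrite, sim_write and sim_console are hypothetical, the "trap" is an ordinary function call, and the registers are intptr_t so that a pointer fits on a 64-bit host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SYS_write 4   /* syscall number from the list above */

/* Hypothetical stand-in for the four registers used by the linkage. */
struct sim_regs {
    intptr_t eax;   /* syscall number going in, result coming out */
    intptr_t ebx;   /* first argument */
    intptr_t ecx;   /* second argument */
    intptr_t edx;   /* third argument */
};

/* Simulated output device: bytes written are collected here. */
char sim_console[128];
size_t sim_console_len;

/* Plays the role of syswrite in the kernel: does the work, returns a count. */
int sim_syswrite(int dev, const char *buf, int nbytes)
{
    (void)dev;                       /* only one simulated device */
    assert(buf != NULL);
    if (nbytes < 0 || sim_console_len + (size_t)nbytes >= sizeof sim_console)
        return -1;                   /* error code */
    memcpy(sim_console + sim_console_len, buf, (size_t)nbytes);
    sim_console_len += (size_t)nbytes;
    sim_console[sim_console_len] = '\0';
    return nbytes;
}

/* Stand-in for "int $0x80" plus the syscall handler:
   dispatch on eax and leave the result in eax. */
void sim_int80(struct sim_regs *r)
{
    switch (r->eax) {
    case SYS_write:
        r->eax = sim_syswrite((int)r->ebx, (const char *)r->ecx, (int)r->edx);
        break;
    default:
        r->eax = -1;                 /* unknown syscall number */
        break;
    }
}

/* User-side wrapper: load the registers, trap, return what is in eax. */
int sim_write(int dev, const char *buf, int nbytes)
{
    struct sim_regs r;
    r.eax = SYS_write;
    r.ebx = dev;
    r.ecx = (intptr_t)buf;
    r.edx = nbytes;
    sim_int80(&r);
    return (int)r.eax;
}
```

In this model, sim_write(1, "hi\n", 3) returns 3 and leaves "hi\n" in sim_console, while a syscall number the dispatcher does not know comes back as -1 in eax.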

How can we get the needed registers loaded for a syscall?

Answer: use assembler. Need it anyway for the int instruction.

We’ll implement “write” in assembler, and C can call it normally. Also “read”, etc. We’ll put these in a file ulib.s.

The user C code has say write(dev, buf, nbytes) as in testio.c. C pushes args on the stack, then does the call instruction to our assembler routine. So in assembler, we need to pull those args off the stack and put them in the specified registers.

Assembler/C argument passing: In hw2, we need to go from C to assembler in user code (calling into ulib.s), and from assembler to C in kernel code (calling the C system call handler from the assembler envelope).

Consider user C code with “write(dev, buf, nbytes)”. This C code is compiled into assembler code like this:

pushl nbytes

pushl buf

pushl dev

call _write (which pushl’s the return address)

On the x86, the stack grows toward lower addresses, so the nbytes value is at the highest address, buf lower, dev lower than that, and finally the return address from the call is at the lowest address, the current top of stack pointed to by esp. Thus we have this picture, where addresses increase from top to bottom:

user stack when execution reaches _write:

esp-> return addr

first arg (to be put in ebx for syscall)

second arg (to be put in ecx)

third arg (to be put in edx)

Now the _write function in ulib.s needs to pull the 3 args off the stack and put them in ebx, ecx, and edx, put the syscall number, 4, in eax, and then do “int $0x80”.

We need to preserve register ebx to coexist with C, but are allowed to use the other three (the C “scratch registers” eax, ecx and edx) at will. Thus we need to save the caller’s ebx value at the start (by pushing it on the stack) and restore it at the end of this function. That push of ebx changes the picture of the stack to the following:

# user stack after pushl %ebx, needed to preserve %ebx (not a C scratch reg)

# esp-> saved-ebx

# 4(esp) return addr

# 8(esp) first arg (to be put in ebx for syscall)

#12(esp) second arg (to be put in ecx)

#16(esp) third arg (to be put in edx)

_write: pushl %ebx # save the value of ebx

movl 8(%esp),%ebx # first arg in ebx

movl 12(%esp),%ecx # second arg in ecx

movl 16(%esp),%edx # third arg in edx

movl $4,%eax # syscall # in eax

int $0x80 # trap to kernel

popl %ebx # restore the value of ebx

ret
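To check the offsets 8, 12 and 16 used in _write, the pushes can be replayed on a simulated stack. This is a sketch under assumptions: sim_stack, sim_pushl, sim_popl, sim_load and sim_write_entry are hypothetical names, the stack is a small array of 32-bit words, and the return address 0x1000 is made up.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Simulated 32-bit stack; esp is a byte offset into mem. */
struct sim_stack {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;                    /* byte offset of the current top of stack */
};

/* An empty stack has esp just past the highest word. */
void sim_init(struct sim_stack *s)
{
    s->esp = STACK_WORDS * 4;
}

/* pushl: move esp down by 4, then store the value there. */
void sim_pushl(struct sim_stack *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* popl: fetch the value at esp, then move esp up by 4. */
uint32_t sim_popl(struct sim_stack *s)
{
    assert(s->esp + 4 <= STACK_WORDS * 4);
    uint32_t value = s->mem[s->esp / 4];
    s->esp += 4;
    return value;
}

/* movl off(%esp),reg: read a word without moving esp. */
uint32_t sim_load(const struct sim_stack *s, uint32_t off)
{
    assert(off % 4 == 0 && s->esp + off < STACK_WORDS * 4);
    return s->mem[(s->esp + off) / 4];
}

/* Replays the caller's three pushes, the call, and the pushl %ebx at the
   start of _write, then reads the args as _write does.
   out[0..2] receive the values destined for ebx, ecx, edx. */
void sim_write_entry(struct sim_stack *s, uint32_t dev, uint32_t buf,
                     uint32_t nbytes, uint32_t caller_ebx, uint32_t out[3])
{
    sim_pushl(s, nbytes);            /* pushl nbytes */
    sim_pushl(s, buf);               /* pushl buf */
    sim_pushl(s, dev);               /* pushl dev */
    sim_pushl(s, 0x1000);            /* call pushes the (made-up) return address */
    sim_pushl(s, caller_ebx);        /* pushl %ebx */
    out[0] = sim_load(s, 8);         /* movl 8(%esp),%ebx */
    out[1] = sim_load(s, 12);        /* movl 12(%esp),%ecx */
    out[2] = sim_load(s, 16);        /* movl 16(%esp),%edx */
}
```

After sim_write_entry runs, offset 0 holds the saved ebx and offset 4 the return address, matching the picture above; a sim_popl at that point restores the caller's ebx, as popl %ebx does before ret.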

It should be easy for you to add _read and _exit following this pattern.

After the trap cycle, execution arrives at _syscallhand in sysentry.s with the args and syscall # in the registers. We need to provide this info to the C system call handler. Luckily there is an old trick to help us out, using the fact that C passes arguments on the stack for this platform. If we push the register values on the stack and then call syscallc, syscallc will be called the same way it would normally be called by another C fn. Thus all we need to do is provide syscallc with 4 parameters and we can use those names to access the spots on the stack we have created by pushing the 4 saved-registers.

_syscallhand: pushl %edx # third arg first

pushl %ecx

pushl %ebx

pushl %eax # syscall #

call _syscallc

popl %eax # possibly new value for eax

... 3 more popls

iret

As discussed in class, another possibility is a “garbage pop” instead of popl %eax, that is, addl $4, %esp, which just moves the stack pointer like a pop, but doesn’t change the value of eax.

void syscallc(int user_eax, int arg1, int arg2, int arg3)

{

/* temporary code to show values */

kprintf("syscallc: syscall#=%d, arg1=%d, arg2=%x, arg3=%d\n",

user_eax, arg1, arg2, arg3);

…

Note that the hw2 assignment document has a suggested plan of development, i.e., steps to follow, at its end.

There are two ways to get the final syscall return value back to the user. If you leave it in user_eax in C, it will be on the stack, and you do want the popl %eax (in _syscallhand after the call) to put it in %eax. If instead you make it syscallc’s return value, it will already be in %eax when syscallc returns, and you want to use the garbage pop so it stays intact in eax in _syscallhand.
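The two return-value conventions can be compared in a small C model. This is a sketch only: sim_cpu, sim_do_syscall and the sim_syscallhand_a/b and sim_syscallc_a/b functions are hypothetical, the four pushed values are modeled as an int array (frame[0] is the user_eax slot), and the "real work" is a placeholder that just reports arg3 for a write.

```c
#include <assert.h>
#include <stddef.h>

#define SYS_write 4

/* Simulated registers as they stand when _syscallhand starts. */
struct sim_cpu { int eax, ebx, ecx, edx; };

/* Placeholder for the real work: a write "succeeds" with arg3 bytes. */
int sim_do_syscall(int num, int arg1, int arg2, int arg3)
{
    (void)arg1;
    (void)arg2;
    return num == SYS_write ? arg3 : -1;
}

/* Convention A: syscallc leaves the result in the user_eax stack slot. */
void sim_syscallc_a(int frame[4])
{
    frame[0] = sim_do_syscall(frame[0], frame[1], frame[2], frame[3]);
}

void sim_syscallhand_a(struct sim_cpu *cpu)
{
    assert(cpu != NULL);
    int frame[4] = { cpu->eax, cpu->ebx, cpu->ecx, cpu->edx }; /* four pushl's */
    sim_syscallc_a(frame);           /* call _syscallc */
    cpu->eax = frame[0];             /* popl %eax picks up the new value */
    cpu->ebx = frame[1];             /* the 3 remaining popl's */
    cpu->ecx = frame[2];
    cpu->edx = frame[3];
}

/* Convention B: syscallc returns the result, which arrives in eax. */
int sim_syscallc_b(const int frame[4])
{
    return sim_do_syscall(frame[0], frame[1], frame[2], frame[3]);
}

void sim_syscallhand_b(struct sim_cpu *cpu)
{
    assert(cpu != NULL);
    int frame[4] = { cpu->eax, cpu->ebx, cpu->ecx, cpu->edx }; /* four pushl's */
    cpu->eax = sim_syscallc_b(frame); /* C return value is in eax after the call */
    /* addl $4,%esp: skip the user_eax slot without touching eax */
    cpu->ebx = frame[1];             /* the 3 remaining popl's */
    cpu->ecx = frame[2];
    cpu->edx = frame[3];
}
```

Starting from eax=4, ebx=1, edx=7, both handlers finish with 7 in eax; they differ only in whether eax is loaded from the stack slot or left alone after the call.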

Back to Tan. Chap. 2

Process Table Entries

To the kernel, processes are software objects, much as checking accounts are to a banking application. Each process has a process table entry in the process table.

See Tan., pg. 92 for info in the process entry. The “registers” here are better called “saved registers”, because they are copies of the CPU registers at the moment the process loses the CPU because it’s blocking or being preempted. These saved registers save the CPU state for the next moment that the process is scheduled, at which point the values are copied back into the CPU. Once the CPU state is set back, and the address space set up again, the CPU continues execution just where it left off. This is the basic time-sharing mechanism that allows multiple processes all to think they have “the CPU.”
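A process table entry with saved registers might look like the following in C. This is a minimal sketch under assumptions: the field list is a small illustrative subset (Tan.'s table lists many more), the names proc_entry, saved_regs, proctab, sim_cpu_regs, save_cpu_state and restore_cpu_state are hypothetical, and the "CPU" is just a global struct, since real register save and restore has to be done in assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPROC 4

enum proc_state { PROC_UNUSED, PROC_READY, PROC_RUNNING, PROC_BLOCKED };

/* Copies of the CPU registers taken when the process loses the CPU. */
struct saved_regs {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp, esp;
    uint32_t eip;                    /* the PC */
    uint32_t eflags;
};

struct proc_entry {
    int pid;
    enum proc_state state;
    struct saved_regs saved;         /* the "saved registers" */
};

struct proc_entry proctab[NPROC];    /* the process table */
struct saved_regs sim_cpu_regs;      /* stand-in for the real CPU registers */

/* Process loses the CPU (blocking or preempted): copy CPU state into its entry. */
void save_cpu_state(struct proc_entry *p, enum proc_state new_state)
{
    assert(p != NULL);
    p->saved = sim_cpu_regs;
    p->state = new_state;
}

/* Process is scheduled again: copy its saved state back into the CPU. */
void restore_cpu_state(struct proc_entry *p)
{
    assert(p != NULL);
    sim_cpu_regs = p->saved;
    p->state = PROC_RUNNING;
}
```

Saving one entry and restoring another swaps which process the simulated CPU is executing; restoring the first entry later puts back exactly the eip it had when it lost the CPU, which is the "continues just where it left off" property.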

Tan. introduces interrupts at this point, since they are crucial to the understanding of preemption and unblocking. Luckily we have already studied them in the simpler situation of standalone programming. Here are the steps from Tan., pg. 80:

Interrupt Processing steps (Interrupt cycle + interrupt handler execution)

1. CPU stacks PC, etc. (on the kernel stack, see below)
2. CPU loads new PC from int. vector
3. Assembler fn saves registers
4. Assembler fn sets up new stack or borrows a stack
5. C interrupt handler runs, typically reads and buffers input
6. Scheduler (a C fn) decides which process is to run next (decision only, i.e., figure out pid)
7. C fn returns to assembler fn
8. Assembler fn starts up (dispatches) new current process (and saves CPU state for old process in old process entry before loading CPU state from new process entry) (often the new process is the same as the old one, so this is a nop)

Note that steps 1. and 2. are part of the CPU interrupt cycle, and note that the CPU switches to the kernel stack pointer before pushing these items on the stack, so they end up on the kernel stack. Steps 3-8 constitute the interrupt handler.

Steps 4, 6, and 8 are new to us:

Step 4: Actually this is optional. Many OS kernels allow interrupt handlers to execute on the kernel stack that the CPU just used to stack the PC.

Step 6: Scheduler loops through process table entries, looking for highest priority ready/running process. This is preparation for possible preemption.

Step 8: This is the real process switch, where the CPU state is switched, along with the address-space (like a brain transplant). This may or may not happen, since often the same old process is allowed to continue execution.

We can list a step 9 here, where the iret occurs. In the case of a process switch, this will be executed later (if we are following the lifetime of the old process), after the CPU state is restored for this process.
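Steps 6 and 8 can be sketched in C as a decision function and a separate dispatch function. The names sched_entry, sched_pick and sched_dispatch are hypothetical, and two details are assumptions not stated above: a larger priority number means higher priority, and ties go to the lowest table index. The real step 8 also switches the CPU state and address space in assembler; here only the state fields change.

```c
#include <assert.h>

#define NPROC 4

enum sched_state { S_UNUSED, S_READY, S_RUNNING, S_BLOCKED };

struct sched_entry {
    enum sched_state state;
    int priority;                    /* assumption: larger means higher priority */
};

/* Step 6: decision only. Loop through the table looking for the highest
   priority ready/running process; return its index, or -1 if there is none. */
int sched_pick(const struct sched_entry tab[], int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (tab[i].state != S_READY && tab[i].state != S_RUNNING)
            continue;
        if (best < 0 || tab[i].priority > tab[best].priority)
            best = i;                /* strict >, so ties keep the lower index */
    }
    return best;
}

/* Step 8: switch only if the choice differs from the current process.
   Returns the index of the process that runs next. */
int sched_dispatch(struct sched_entry tab[], int current, int chosen)
{
    assert(chosen >= 0);
    if (chosen == current)
        return current;              /* same process continues: nothing to do */
    if (tab[current].state == S_RUNNING)
        tab[current].state = S_READY;   /* old process was preempted */
    tab[chosen].state = S_RUNNING;
    return chosen;
}
```

If the interrupt handler in step 5 unblocks a process whose priority is higher than the current one, sched_pick returns its index and sched_dispatch performs the switch; otherwise the current process is chosen again and step 8 does nothing.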

Next time: each process has a kernel stack. The kernel isn’t a separate program with its own stack. Instead, it separately handles system calls in different processes, so they can operate independently, or nearly so.