CS444 Class 10
Last time: how syscalls work
One instruction in user code (int $0x80 for hw2) causes execution to trap to the kernel. The kernel executes, and when it has finished the requested action (here, the write), it uses the iret instruction to return execution to the user code, at the instruction just after the int instruction.
Each syscall has a syscall number. From /usr/include/sys/syscall.h:
#define SYS_exit 1
#define SYS_fork 2
#define SYS_read 3
#define SYS_write 4
#define SYS_open 5
#define SYS_close 6
When your program prepares to do a write syscall, the syscall number 4 is put in the eax register before the “int $0x80” instruction is executed. The kernel (in the syscall handler) sees the 4 and calls syswrite in the kernel, which does the needed work. Syswrite returns to the syscall handler, which iret’s back to the user code.
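The Linux syscall linkage we will use in hw2:
  eax = syscall number
  ebx = first argument
  ecx = second argument
  edx = third argument
  then int $0x80 traps to the kernel
The resulting success or error code is in eax at the end of this sequence.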
How can we get the needed registers loaded for a syscall? Answer: use assembler. We need it anyway for the int instruction.
We’ll implement “write” in assembler, and C can call it normally. Likewise “read”, etc. We’ll put these in a file ulib.s.
The user C code has, say, write(dev, buf, nbytes), as in testio.c. C pushes the args on the stack, then executes the call instruction to our assembler routine. So in assembler, we need to copy those args from the stack into the specified registers.
Assembler/C argument passing: in hw2, we need to go from C to assembler in user code (calling into ulib.s), and from assembler to C in kernel code (calling the C system call handler from the assembler envelope).
Consider user C code with “write(dev, buf, nbytes)”. This C code is compiled into assembler code like this:
pushl nbytes
pushl buf
pushl dev
call _write        # (call pushes the return address)
On the x86, the stack grows toward lower addresses, so the nbytes value is at the highest address, buf lower, dev lower than that, and finally the return address from the call is at the lowest address, the current top of stack pointed to by esp. Thus we have this picture, where addresses increase from top to bottom:
user stack when execution reaches _write:
esp->     return addr
4(esp)    first arg (to be put in ebx for syscall)
8(esp)    second arg (to be put in ecx)
12(esp)   third arg (to be put in edx)
Now the _write function in ulib.s needs to copy the 3 args from the stack into ebx, ecx, and edx, put the syscall number, 4, in eax, and then do “int $0x80”.
We need to preserve register ebx, because C code expects a called function to leave it unchanged, but we are allowed to use the other three (the C “scratch registers” eax, ecx, and edx) at will. Thus we need to save the caller’s ebx value at the start (by pushing it on the stack) and restore it at the end of this function. That push of ebx changes the picture of the stack to the following:
# user stack after pushl %ebx, needed to preserve %ebx (not a C scratch reg)
# esp->    saved-ebx
# 4(esp)   return addr
# 8(esp)   first arg (to be put in ebx for syscall)
# 12(esp)  second arg (to be put in ecx)
# 16(esp)  third arg (to be put in edx)
_write: pushl %ebx              # save the value of ebx
        movl 8(%esp),%ebx       # first arg in ebx
        movl 12(%esp),%ecx      # second arg in ecx
        movl 16(%esp),%edx      # third arg in edx
        movl $4,%eax            # syscall # in eax
        int $0x80               # trap to kernel
        popl %ebx               # restore the value of ebx
        ret
It should be easy for you to add _read and _exit following this pattern.
After the trap cycle, execution arrives at _syscallhand in sysentry.s with the args and syscall # in the registers. We need to provide this info to the C system call handler. Luckily there is an old trick to help us out, using the fact that C passes arguments on the stack on this platform. If we push the register values on the stack (in reverse order, so that eax, the syscall #, ends up on top) and then call syscallc, syscallc is called the same way it would normally be called by another C function. Thus all we need to do is give syscallc 4 parameters, and we can use those parameter names to access the stack slots we created by pushing the 4 saved registers.
_syscallhand: pushl %edx        # third arg first
              pushl %ecx
              pushl %ebx
              pushl %eax        # syscall #
              call _syscallc
              popl %eax         # possibly new value for eax
              ... 3 more popls
              iret
As discussed in class, another possibility is a “garbage pop” instead of popl %eax, that is, addl $4,%esp, which just moves the stack pointer like a pop but doesn’t change the value of eax.
void syscallc(int user_eax, int arg1, int arg2, int arg3)
{
    /* temporary code to show values */
    kprintf("syscallc: syscall#=%d, arg1=%d, arg2=%x, arg3=%d\n",
            user_eax, arg1, arg2, arg3);
    …
Note that the hw2 assignment
document has a suggested plan of development, i.e., steps to follow, at its
end.
If you leave the final syscall return value in user_eax in C, it will be on the stack, and you do want the popl %eax (in _syscallhand after the call) to put it in %eax. If instead you make it syscallc’s return value, it will already be in %eax, and you want the garbage pop so that it stays intact in eax in _syscallhand.
To the kernel, processes are software objects, much as checking accounts are software objects to a banking application. Each process has an entry in the process table.
See Tan., pg. 92 for the fields of a process table entry. The “registers” there are better called “saved registers”, because they are copies of the CPU registers at the moment the process loses the CPU, either because it blocks or because it is preempted. These saved registers hold the CPU state until the next time the process is scheduled, at which point the values are copied back into the CPU. Once the CPU state is restored and the address space is set up again, the CPU continues execution just where it left off. This is the basic time-sharing mechanism that lets multiple processes each behave as if it has “the CPU.”
Tan. introduces interrupts at this point, since they are crucial to understanding preemption and unblocking. Luckily we have already studied them in the simpler setting of standalone programming. Here are the steps from Tan., pg. 80:
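Interrupt processing steps (interrupt cycle + interrupt handler execution):
1. Hardware stacks program counter, etc.
2. Hardware loads new program counter from interrupt vector.
3. Assembly language procedure saves registers.
4. Assembly language procedure sets up new stack.
5. C interrupt service runs (typically reads and buffers input).
6. Scheduler decides which process is to run next.
7. C procedure returns to the assembly code.
8. Assembly language procedure starts up new current process.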
Note that steps 1 and 2 are part of the CPU interrupt cycle, and that the CPU switches to the kernel stack pointer before pushing these items, so they end up on the kernel stack. Steps 3-8 constitute the interrupt handler.
Steps 4, 6, and 8 are new to us:
Step 4: Actually this step is optional. Many OS kernels let interrupt handlers execute on the kernel stack that the CPU just used to stack the PC.
Step 6: The scheduler loops through the process table entries, looking for the highest-priority ready/running process. This is preparation for possible preemption.
Step 8: This is the real process switch, where the CPU state is switched, along with the address space (like a brain transplant). This may or may not happen, since often the same old process is allowed to continue execution.
We can list a step 9 here,
where the iret occurs. In the case of a process switch, this will be
executed later (if we are following the lifetime of the old process), after the
CPU state is restored for this process.
Next time: each process has a kernel stack. The kernel isn’t a separate program with its own stack. Instead, it handles each process’s system calls separately, on that process’s kernel stack, so the processes can operate independently, or nearly so.