Tues., Oct. 3
Look at details of the system call mechanism, like Tan. pg. 46 but using the Linux syscall linkage we will use in hw2:
The system call instruction is a particular instruction in the CPU instruction set, designed for use as a system call by the CPU designer, i.e. Intel for x86, Sun for Sparc. On x86, it’s the int instruction; on Sparc, it’s the ta (trap always) instruction. Its job is to cause execution to jump out of user execution into the kernel in a safe way, much like an interrupt causes kernel execution of the interrupt handler between two instructions in user code (interrupts can also happen in kernel code).
Recall slide 6 from the interrupts handout: CPU Interrupt Handling, with a few additions here:
The CPU Interrupt cycle--
Then after the interrupt cycle, the new EIP value makes the ISR (int. handler) run--
The ISR/interrupt handler (same thing) services the interrupt, including resetting the interrupt controller, and ends with a special instruction “iret” on x86 to restore previously saved state and resume from the point of interrupt.
The system call instruction causes the CPU to go through its trap cycle, just like its interrupt cycle, except that nn is obtained from the int instruction operand (0x80 here), and IF is not changed. Instead of “interrupt handler”, we say “trap handler”.
The CPU Trap cycle (execution of int $nn)--
Then after the trap cycle, the new EIP value makes the trap handler run--
The trap handler services the system call and ends with a special instruction “iret” on x86 to restore previously saved state and continue execution just after the int $nn in user code.
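The trap cycle and iret can be mimicked with a toy model in C. This is only a sketch under stated assumptions: all names (toy_cpu, toy_int, toy_iret, and so on) are invented for illustration and are not hw2 code, the stack holds whole 32-bit words, and the model covers only the no-mode-change case, where the x86 CPU pushes EFLAGS, CS, and EIP before loading the new EIP. Loading a new CS from the table entry is left out to keep it short.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of "int $nn" and "iret" without a privilege change.
 * Every name here is hypothetical; nothing is taken from hw2. */

#define STACK_WORDS 16
#define IDT_ENTRIES 256

struct toy_cpu {
    uint32_t eip;                 /* address of the next instruction */
    uint32_t cs;                  /* code segment selector */
    uint32_t eflags;              /* flags register, including IF */
    uint32_t stack[STACK_WORDS];  /* tiny stack of 32-bit words */
    int sp;                       /* index of top of stack, grows downward */
    uint32_t idt[IDT_ENTRIES];    /* handler address for each vector nn */
};

void toy_init(struct toy_cpu *cpu)
{
    *cpu = (struct toy_cpu){0};
    cpu->sp = STACK_WORDS;        /* empty stack: sp is one past the end */
}

void toy_push(struct toy_cpu *cpu, uint32_t value)
{
    assert(cpu->sp > 0);          /* no room left would be a model bug */
    cpu->stack[--cpu->sp] = value;
}

uint32_t toy_pop(struct toy_cpu *cpu)
{
    assert(cpu->sp < STACK_WORDS);
    return cpu->stack[cpu->sp++];
}

/* Trap cycle: eip is assumed to already point just past the int
 * instruction. State is saved, then eip comes from table entry nn.
 * eflags (and so IF) is left unchanged, as the notes say for traps. */
void toy_int(struct toy_cpu *cpu, uint8_t nn)
{
    toy_push(cpu, cpu->eflags);
    toy_push(cpu, cpu->cs);
    toy_push(cpu, cpu->eip);
    cpu->eip = cpu->idt[nn];
}

/* iret: restore the saved state in the reverse order of the pushes. */
void toy_iret(struct toy_cpu *cpu)
{
    cpu->eip = toy_pop(cpu);
    cpu->cs = toy_pop(cpu);
    cpu->eflags = toy_pop(cpu);
}
```

Calling toy_int(&cpu, 0x80) with a handler address stored in cpu.idt[0x80] moves eip to the handler, and a following toy_iret(&cpu) brings eip back to the instruction just after the int.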
Note on hw2 vs. real OS
Note that in hw2, all our code, user or kernel, executes in kernel mode of the CPU. The int 0x80 instruction still traps, but doesn't change the CPU execution mode. In a real OS, the user code executes in user mode on the CPU, disallowing all privileged instructions. The int instruction (or ta on Sparc, etc.) changes the CPU mode from user to kernel, and the iret (or equivalent) reverts it to user mode at the end of the system call. This isolation of the user code in user mode is a powerful methodology for system security. Even "privileged users" such as root on UNIX and Administrator on Windows are executing code in user mode on the CPU. Only the kernel code is allowed to execute in kernel mode.
As in Tan, pg. 46, consider read(dev, buf, nbytes) being executed (dev instead of fd for hw2).
Assembler/C argument passing: In hw2, we need to go from C to assembler in user code (calling into ulib.s), and from assembler to C in kernel code (calling the C system call handler from the assembler envelope). Consider user C code with “write(dev, buf, nbytes)”. This C code is compiled into assembler code like this:
        pushl nbytes
        pushl buf
        pushl dev
        call _write             (which pushl’s the return address)
On the x86, the stack grows toward lower addresses, so the nbytes value is at the highest address, buf lower, dev lower than that, and finally the return address from the call is at the lowest address, the current top of stack pointed to by esp. Thus we have this picture, where addresses increase from top to bottom:
user stack when execution reaches _write:
esp-> return addr
first arg (to be put in ebx for syscall)
second arg (to be put in ecx)
third arg (to be put in edx)
Now the _write function in ulib.s needs to fetch the 3 args from the stack and put them in ebx, ecx, and edx, put the syscall number, 4, in eax, and then do “int $0x80”. We need to preserve register ebx to coexist with C, but are allowed to use the other three (the C “scratch registers” eax, ecx and edx) at will. Thus we need to save the caller’s ebx value at the start (by pushing it on the stack) and restore it at the end of this function. That push of ebx changes the picture of the stack to the following:
# user stack after pushl %ebx, needed to preserve %ebx (not a C scratch reg)
# esp->    saved-ebx
#  4(esp)  return addr
#  8(esp)  first arg (to be put in ebx for syscall)
# 12(esp)  second arg (to be put in ecx)
# 16(esp)  third arg (to be put in edx)
_write: pushl %ebx              # save the value of ebx
        movl 8(%esp),%ebx       # first arg in ebx
        movl 12(%esp),%ecx      # second arg in ecx
        movl 16(%esp),%edx      # third arg in edx
        movl $4,%eax            # syscall # in eax
        int $0x80               # trap to kernel
        popl %ebx               # restore the value of ebx
        ret
It should be easy for you to add _read and _exit following this pattern.
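To check the offsets 8, 12, and 16 used in _write, the user stack can be modeled in C as a small byte array with a downward-moving top. This is a sketch for illustration only: toy_stack, pushl, load, and build_write_frame are invented names, and the stack is tiny and byte-addressed purely so the displacements match the assembler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of the user stack as _write sees it. All names are
 * hypothetical and exist only to check the displacements. */

#define STACK_BYTES 64

struct toy_stack {
    uint8_t mem[STACK_BYTES];
    uint32_t esp;               /* byte offset of the current top of stack */
};

void stack_init(struct toy_stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_BYTES;       /* empty: esp is just past the highest byte */
}

/* pushl: move esp down by 4, then store the value at the new top. */
void pushl(struct toy_stack *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, 4);
}

/* Read the 32-bit word at disp(%esp). */
uint32_t load(const struct toy_stack *s, uint32_t disp)
{
    uint32_t value;
    assert(s->esp + disp + 4 <= STACK_BYTES);
    memcpy(&value, &s->mem[s->esp + disp], 4);
    return value;
}

/* Reproduce the pushes in order: the caller pushes the args last-first,
 * the call pushes the return address, and _write pushes ebx. */
void build_write_frame(struct toy_stack *s, uint32_t dev, uint32_t buf,
                       uint32_t nbytes, uint32_t ret_addr,
                       uint32_t caller_ebx)
{
    pushl(s, nbytes);           /* third arg, ends up at 16(%esp) */
    pushl(s, buf);              /* second arg, ends up at 12(%esp) */
    pushl(s, dev);              /* first arg, ends up at 8(%esp) */
    pushl(s, ret_addr);         /* pushed by the call instruction */
    pushl(s, caller_ebx);       /* pushed by _write itself */
}
```

After build_write_frame, load(&s, 8), load(&s, 12), and load(&s, 16) return dev, buf, and nbytes, which is why those three displacements appear in the movl instructions.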
After the trap cycle, execution arrives at _syscallhand in sysentry.s with the args and syscall # in the registers. We need to provide this info to the C system call handler. Luckily there is an old trick to help us out, using the fact that C passes arguments on the stack for this platform. If we push the register values on the stack and then call syscallc, syscallc will be called the same way it would normally be called by another C function. Thus all we need to do is provide syscallc with 4 parameters, and we can use those names to access the spots on the stack we have created by pushing the 4 saved registers.
_syscallhand:
        pushl %edx              # third arg first
        pushl %ecx
        pushl %ebx
        pushl %eax              # syscall #
        call _syscallc
        popl %eax               # possibly new value for eax
        ...                     # 3 more popls
        iret
void syscallc(int user_eax, int arg1, int arg2, int arg3)
{
    /* temporary code to show values */
    kprintf("syscallc: syscall#=%d, arg1=%d, arg2=%x, arg3=%d\n",
            user_eax, arg1, arg2, arg3);
    …
Note that hw2.txt has a suggested plan of development, i.e., steps to follow, at its end.
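Once the temporary kprintf shows the right values, the handler has to choose an action based on the syscall number. One possible shape for that is a switch, sketched below under several assumptions that are not from hw2: the names syscallc_sketch, toy_write, and toy_read are invented, the "device" is just a memory buffer, the result is returned rather than stored back for the envelope's popl %eax, and the parameters are intptr_t instead of int so the sketch also runs on a 64-bit host. The numbers 1, 3, and 4 are the Linux x86 values for exit, read, and write; only the 4 appears in the notes above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Linux x86 (32-bit) syscall numbers. */
enum { SYS_EXIT = 1, SYS_READ = 3, SYS_WRITE = 4 };

/* Hypothetical stand-in for a device: one small memory buffer. */
#define DEV_CAPACITY 64
static char dev_data[DEV_CAPACITY];
static int dev_len = 0;
static int exit_code = -1;      /* records the argument of the last exit */

/* Store up to DEV_CAPACITY bytes; return the number actually stored. */
static int toy_write(int dev, const char *buf, int nbytes)
{
    (void)dev;                  /* only one toy device exists */
    if (nbytes < 0)
        return -1;
    if (nbytes > DEV_CAPACITY)
        nbytes = DEV_CAPACITY;
    memcpy(dev_data, buf, (size_t)nbytes);
    dev_len = nbytes;
    return nbytes;
}

/* Copy back what was last written; return the number of bytes copied. */
static int toy_read(int dev, char *buf, int nbytes)
{
    (void)dev;
    if (nbytes < 0)
        return -1;
    if (nbytes > dev_len)
        nbytes = dev_len;
    memcpy(buf, dev_data, (size_t)nbytes);
    return nbytes;
}

/* Dispatch on the syscall number that the user code placed in eax. */
intptr_t syscallc_sketch(intptr_t user_eax, intptr_t arg1,
                         intptr_t arg2, intptr_t arg3)
{
    switch (user_eax) {
    case SYS_WRITE:
        return toy_write((int)arg1, (const char *)arg2, (int)arg3);
    case SYS_READ:
        return toy_read((int)arg1, (char *)arg2, (int)arg3);
    case SYS_EXIT:
        exit_code = (int)arg1;
        return 0;
    default:
        return -1;              /* unknown syscall number */
    }
}
```

A switch keeps each syscall's argument casts next to its handler call, which is convenient with only three calls. A table of function pointers indexed by syscall number would be an alternative if the list grew.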