CS444 Class 10
Reading: Chap. 2, Sections 2.1 and 2.2: all to pg. 106, skip 2.2.4, read pp. 109-110, skip 2.2.6, 2.2.7, 2.2.8, read pg. 114 to the end of 2.2.
Start reading 2.3.
Last time: how syscalls work
A single instruction in user code, int $0x80 for hw2, causes execution to trap to the kernel. The kernel executes the requested action (a write, say), and when done it uses the iret instruction to return execution to the user-code instruction just after the int instruction.
Each syscall has a syscall number. From /usr/include/sys/syscall.h on ulab:
#define SYS_exit 1
#define SYS_fork 2
#define SYS_read 3
#define SYS_write 4
#define SYS_open 5
#define SYS_close 6
When your program prepares to do a write syscall, the syscall number 4 is put in the eax register before the “int $0x80” instruction is executed. The kernel (in the syscall handler) sees the 4 and calls syswrite in the kernel, which does the needed work. Syswrite returns to the syscall handler, which iret’s back to the user code.
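The dispatch on the syscall number can be sketched in C as follows. This is only an illustration, not the hw2 solution: the function name dispatch and the bodies of sysread, syswrite and sysexit are stand-ins made up for this sketch (they do no real I/O), and all arguments are passed as ints, following the 32-bit register values.

```c
#include <assert.h>

/* Syscall numbers, as listed in syscall.h above. */
#define SYS_exit  1
#define SYS_read  3
#define SYS_write 4

/* Stand-ins for the kernel routines (hypothetical bodies, no real I/O). */
static int sysexit(int exit_code) { return exit_code; }

static int sysread(int dev, int buf, int nbytes)
{
    (void)dev; (void)buf; (void)nbytes;
    return 0;                    /* pretend no input is available */
}

static int syswrite(int dev, int buf, int nbytes)
{
    (void)dev; (void)buf;
    assert(nbytes >= 0);         /* a real kernel would return an error instead */
    return nbytes;               /* pretend every byte was written */
}

/* Choose the kernel routine from the number the user code left in eax. */
int dispatch(int syscall_num, int arg1, int arg2, int arg3)
{
    switch (syscall_num) {
    case SYS_exit:  return sysexit(arg1);
    case SYS_read:  return sysread(arg1, arg2, arg3);
    case SYS_write: return syswrite(arg1, arg2, arg3);
    default:        return -1;   /* unknown syscall number */
    }
}
```

With these stand-ins, dispatch(SYS_write, 1, 0x2000, 5) reports 5 bytes written, and an unlisted number such as 99 yields -1.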
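The Linux syscall linkage we will use in hw2:
    eax: syscall number
    ebx: first syscall argument
    ecx: second syscall argument
    edx: third syscall argument
    int $0x80 traps to the kernel
The resulting success or error code is in eax at the end of this sequence.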
How can we get the needed registers loaded for a syscall?
Answer: use assembler. We need it anyway for the int instruction.
We’ll implement “write” in assembler, and C can call it normally. Also “read”, etc. We’ll put these in a file ulib.s.
The user C code has, say, write(dev, buf, nbytes), as in testio.c. C pushes the args on the stack, then does the call instruction to our assembler routine. So in assembler, we need to pull those args off the stack and put them in the specified registers.
Last time we looked at the details for calling write in ulib.s. The provided ulib.s contains that code. It should be easy for you to add _read and _exit following this pattern.
Horizontal stack pic, after pushl %ebx to save %ebx (address 0 is way to the left):
%esp                   8(%esp)    12(%esp)   16(%esp)
|                      |          |          |
v                      v          v          v
| saved-ebx | ret-addr | arg-1    | arg-2    | arg-3    | older-stack-contents
Note: This topic is covered in Lecture 7 of cs341. Prof. Wilson discusses the full i386-gcc function call linkage, which includes setting up a frame pointer in %ebp and then using the frame pointer to access the arguments by 8(%ebp), 12(%ebp), etc. We could do this, but it’s not required; all we really need to do is preserve %ebp for C by not using it. My example code accessed the arguments by offsetting from the stack pointer %esp after a push of %ebx (not to be confused with %ebp!), that is, 8(%esp), 12(%esp), etc. You are free to set up and use %ebp if you want.
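To check the offsets in the stack picture, here is a small C model of a 32-bit stack that grows toward lower addresses. It is a toy, not real i386 code: the toy_ names are invented for this sketch, and the return address 0x1000 is a made-up value.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* A toy 32-bit stack that grows toward lower addresses, as on the i386. */
struct toy_stack {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;                /* byte offset of the top of stack within mem */
};

static void toy_init(struct toy_stack *s) { s->esp = sizeof s->mem; }

static void toy_push(struct toy_stack *s, uint32_t value)
{
    assert(s->esp >= 4);         /* room for one more word */
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* Read the word at offset(%esp); toy_at(s, 8) models 8(%esp). */
uint32_t toy_at(const struct toy_stack *s, uint32_t offset)
{
    assert(offset % 4 == 0 && s->esp + offset < sizeof s->mem);
    return s->mem[(s->esp + offset) / 4];
}

/* Build the stack as it looks inside our write routine after pushl %ebx. */
void toy_enter_write(struct toy_stack *s, uint32_t dev, uint32_t buf,
                     uint32_t nbytes, uint32_t caller_ebx)
{
    toy_init(s);
    toy_push(s, nbytes);         /* C pushes the last argument first */
    toy_push(s, buf);
    toy_push(s, dev);
    toy_push(s, 0x1000);         /* return address pushed by call (made up) */
    toy_push(s, caller_ebx);     /* pushl %ebx */
}
```

After toy_enter_write, toy_at(&s, 0) is the saved ebx, and toy_at(&s, 8), toy_at(&s, 12) and toy_at(&s, 16) are the three arguments, matching the picture.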
Transitions to the Kernel from user code
System call instruction -> trap cycle -> arrive at IDT[0x80] = _syscallhand, like
COM2 interrupt -> interrupt cycle -> arrive at IDT[0x23] = _irq3inthand
After the trap cycle, execution arrives at _syscallhand in sysentry.s with the args and syscall # in the registers. We need to provide this info to the C system call handler. Luckily there is an old trick to help us out, using the fact that C passes arguments on the stack for this platform. If we push the register values on the stack and then call syscallc, syscallc will be called the same way it would normally be called by another C function. Thus all we need to do is declare syscallc with 4 parameters, and we can use those parameter names to access the spots on the stack we have created by pushing the 4 saved registers.
_syscallhand: pushl %edx      # third arg first
              pushl %ecx
              pushl %ebx
              pushl %eax      # syscall #
              call _syscallc
              popl %eax       # possibly new value for eax
              ... 3 more popls
              iret
Stack pic, when execution reaches syscallc (u short for “user”):
%esp       4(%esp)    8(%esp)    12(%esp)
|          |          |          |
v          v          v          v
| ret-addr | u-eax    | u-ebx    | u-ecx    | older-stack-contents
Syscallc interprets the stack like this, where arg-1 is its first C parameter (user_eax), arg-2 its second, and so on:
%esp       4(%esp)    8(%esp)    12(%esp)
|          |          |          |
v          v          v          v
| ret-addr | arg-1    | arg-2    | arg-3    | older-stack-contents
(or sets up %ebp first and uses it to find these spots on the stack)
We end up being able to access the saved user registers as args:
void syscallc(int user_eax, int arg1, int arg2, int arg3)
{
    /* temporary code to show values */
    kprintf("syscallc: syscall#=%d, arg1=%d, arg2=%x, arg3=%d\n",
            user_eax, arg1, arg2, arg3);
    …
Getting back to the user code, after syscallc returns:
_syscallhand: pushl %edx      # third arg first
              pushl %ecx
              pushl %ebx
              pushl %eax      # syscall #
              call _syscallc
              popl %eax       # possibly new value for eax
              ... 3 more popls
              iret
If you leave the final syscall return value in user_eax in C, then it will be on the stack, and you do want the popl %eax (in _syscallhand after the call) to put it in %eax.
Recall that the C return value for a function is found in %eax. If you instead make the syscall return value syscallc’s return value, it will be in %eax when syscallc returns, and you need to keep it intact through to the iret, not wipe it out with popl %eax.
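The two ways of returning the syscall result, and the mistake to avoid, can be modeled in plain C. This is a simulation only: struct pushed_regs and the handler_ and syscallc_ function names are invented here, the stacked words are passed through an explicit pointer rather than as by-value parameters, and the "syscall" simply reports that all edx bytes were written.

```c
#include <assert.h>
#include <stddef.h>

/* The four words _syscallhand pushes, in the order syscallc sees them. */
struct pushed_regs { int eax, ebx, ecx, edx; };

/* Option 1: the C code leaves the result in the stacked eax slot. */
static void syscallc_store(struct pushed_regs *r)
{
    assert(r != NULL);
    r->eax = r->edx;             /* pretend write wrote all nbytes */
}

/* Option 2: the C code hands the result back as its return value. */
static int syscallc_return(const struct pushed_regs *r)
{
    assert(r != NULL);
    return r->edx;
}

/* Option 1 handler: popl %eax after the call picks up the stored result. */
int handler_with_popl(int eax, int ebx, int ecx, int edx)
{
    struct pushed_regs r = { eax, ebx, ecx, edx };
    syscallc_store(&r);
    eax = r.eax;                 /* models popl %eax */
    return eax;                  /* what user code sees in eax after iret */
}

/* Option 2 handler: keep the C return value, discard the stacked eax. */
int handler_keep_return(int eax, int ebx, int ecx, int edx)
{
    struct pushed_regs r = { eax, ebx, ecx, edx };
    eax = syscallc_return(&r);   /* C return value arrives in eax */
    return eax;                  /* stacked eax is dropped, not popped */
}

/* The mistake: return the value in eax, then wipe it out with popl %eax. */
int handler_buggy(int eax, int ebx, int ecx, int edx)
{
    struct pushed_regs r = { eax, ebx, ecx, edx };
    eax = syscallc_return(&r);
    eax = r.eax;                 /* popl %eax restores the old syscall number */
    return eax;
}
```

For a write of 5 bytes (eax = 4, edx = 5), the first two handlers hand 5 back to the user, while the buggy one hands back the syscall number 4.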
Note that the hw2 assignment document has a suggested plan of development, i.e., steps to follow, at its end.
To the kernel, processes are software objects, much as checking accounts are objects to a banking application. Each process has a process table entry in the process table.
See Tan., pg. 92 for info in the process entry. The “registers” here are better called “saved registers”, because they are copies of the CPU registers at the moment the process loses the CPU because it’s blocking or being preempted. These saved registers save the CPU state for the next moment that the process is scheduled, at which point the values are copied back into the CPU. Once the CPU state is set back, and the address space set up again, the CPU continues execution just where it left off. This is the basic time-sharing mechanism that allows multiple processes all to think they have “the CPU.”
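The save-and-restore idea can be sketched in C with a simulated CPU. The struct layouts and the name switch_process are assumptions made for this sketch; a real process table entry holds much more (see Tan.), and a real switch also changes the address space.

```c
#include <assert.h>

/* Hypothetical subset of the CPU state; a real i386 entry saves more. */
struct cpu_regs { unsigned eax, ebx, ecx, edx, esp, ebp, eip, eflags; };

enum proc_state { READY, RUNNING, BLOCKED };

struct proc_entry {
    int pid;
    enum proc_state state;
    struct cpu_regs saved;       /* CPU registers as of losing the CPU */
};

/* Move the (simulated) CPU from one process to another.
   from_state says why the old process stops: BLOCKED, or READY if preempted. */
void switch_process(struct cpu_regs *cpu, struct proc_entry *from,
                    struct proc_entry *to, enum proc_state from_state)
{
    assert(from != to && from->state == RUNNING);
    from->saved = *cpu;          /* copy CPU registers into the old entry */
    from->state = from_state;
    *cpu = to->saved;            /* copy the new entry's registers into the CPU */
    to->state = RUNNING;
}
```

When the old process is later switched back in, its saved eip and esp are reloaded, so it continues exactly where it left off.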
Tan. introduces interrupts at this point, since they are crucial to the understanding of preemption and unblocking. Luckily we have already studied them in the simpler situation of standalone programming. Here are the steps from Tan., pg. 93:
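Interrupt Processing steps (Interrupt cycle + interrupt handler execution)
1. Hardware stacks the program counter, etc.
2. Hardware loads the new program counter from the interrupt vector.
3. Assembly language procedure saves registers.
4. Assembly language procedure sets up a new stack.
5. C interrupt service routine runs (typically reads and buffers input).
6. Scheduler decides which process is to run next.
7. C procedure returns to the assembly code.
8. Assembly language procedure starts up the new current process.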
Note that steps 1 and 2 are part of the CPU interrupt cycle, and note that the CPU switches to the kernel stack pointer before pushing these items on the stack, so they end up on the kernel stack. Steps 3-8 constitute the interrupt handler.
Steps 4, 6, and 8 are new to us:
Step 4: Actually this is optional. Many OS kernels allow interrupt handlers to execute on the kernel stack that the CPU just used to stack the PC.
Step 6: Scheduler loops through process table entries, looking for the highest priority ready/running process. This is preparation for possible preemption.
Step 8: This is the real process switch, where the CPU state is switched, along with the address space (like a brain transplant). This may or may not happen, since often the same old process is allowed to continue execution.
We can list a step 9 here, where the iret occurs. In the case of a process switch, this will be executed later (if we are following the lifetime of the old process), after the CPU state is restored for this process.
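The scheduler loop of step 6 can be sketched in C as below. The struct sched_entry layout, the rule that a larger number means higher priority, and the tie-break toward the lower index are all assumptions of this sketch, not something taken from Tan. or from hw2.

```c
#include <assert.h>
#include <stddef.h>

enum sched_state { S_READY, S_RUNNING, S_BLOCKED };

/* Hypothetical process table entry holding only what the scheduler needs. */
struct sched_entry {
    int pid;
    enum sched_state state;
    int priority;                /* assumed: larger means higher priority */
};

/* Return the index of the highest-priority ready/running entry,
   or -1 if every process is blocked. Ties go to the lower index. */
int pick_next(const struct sched_entry *table, size_t n)
{
    int best = -1;

    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].state == S_BLOCKED)
            continue;            /* blocked processes cannot run */
        if (best < 0 || table[i].priority > table[best].priority)
            best = (int)i;
    }
    return best;
}
```

If pick_next returns the entry that is already running, step 8 has nothing to switch and the same process simply continues.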
Each process has a kernel stack. The kernel isn’t a separate program with its own stack. Instead, it separately handles system calls in different processes, so they can operate independently, or nearly so.
See handout, add the bouncing ball from the first handout.
First consider a single-threaded, or traditional, UNIX or Windows process. It has one user stack in user space, and one kernel stack in kernel space, as shown on the handout.
The user stack shows up all by itself, because there is only the one user stack in a single-threaded user image. Other processes have their own user images, also with one user stack if they are single-threaded. But the kernel image is shared, so all the kernel stacks must be at various different addresses in the kernel data area.
Just like there has to be a separate place for each process to hold its set of saved registers (in its process table entry), each process also needs its own kernel stack, to work as its execution stack when it is executing in the kernel.
For example, if a process is doing a read syscall, it is executing the kernel code for read, and needs a stack to do this. It could block on user input and give up the CPU, but that whole execution environment held on the stack (and in the saved CPU state in the process table entry) has to be saved for its later use. Another process could run meanwhile and do its own syscall, and then it needs its own kernel stack, separate from that blocked reader’s stack, to support its own kernel execution. When a process is unblocked, it starts again using the stack where it left off when blocked. The same holds if it was preempted.
Since threads can also do system calls, each needs a kernel stack as well.
In Linux, the process/thread table entry and kernel stack are bundled up in one block of memory for each thread. Other OS’s organize the memory differently, but still have both of these for each process/thread.
Sometimes the kernel stack is completely empty, notably when the process is executing user code. Then when it does a system call, the kernel stack starts growing, and later shrinks back to nothing at the system call return.
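One way to picture a per-thread block that bundles the table entry with the kernel stack is the C sketch below. The names, the 8 KB block size, and the placement of the entry at the low end of the block are assumptions for illustration, not the actual Linux definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KSTACK_BLOCK_SIZE 8192   /* assumed size of the per-thread block */

/* Hypothetical, much reduced process/thread table entry. */
struct thread_entry { int pid; int state; uint32_t saved_esp; };

/* One block per thread: the entry sits at the low end, and the kernel
   stack grows down from the high end toward it. */
union thread_block {
    struct thread_entry entry;
    unsigned char bytes[KSTACK_BLOCK_SIZE];
};

/* Stack pointer of an empty kernel stack: just past the high end. */
unsigned char *kstack_top(union thread_block *b)
{
    return b->bytes + KSTACK_BLOCK_SIZE;
}

/* Bytes of kernel stack in use for a given stack pointer. */
size_t kstack_used(const union thread_block *b, const unsigned char *sp)
{
    const unsigned char *top = b->bytes + KSTACK_BLOCK_SIZE;

    assert(sp >= b->bytes + sizeof(struct thread_entry) && sp <= top);
    return (size_t)(top - sp);
}
```

While the thread runs user code, its kernel stack pointer equals kstack_top and kstack_used is 0. During a system call the pointer moves down, and it is back at the top by the system call return.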
The kernel stack (of the currently running process or thread) is also used by interrupt handlers
The kernel stack is also used for interrupt handler execution, for the interrupts that occur while a particular thread is running. As we have talked about already, the interrupts are almost always doing something for another, blocked process/thread. After all, that process is blocked waiting for something to happen, and all the hardware happenings are signaled by interrupts.
So interrupt handlers “borrow” the current process’s kernel stack to do their own execution. When they finish, the kernel stack is back to its previous state, empty if the current process is running at user level or non-empty if it was running in some system call. Note that interrupt handlers are not themselves allowed to block, so their execution is not delayed that way. They can be involved in process switches, as shown by step 8 above, but only just as they are returning, not in the middle of their work, so their changes to system state are complete.
Kernel code types: Interrupt handler code vs. system call code
We are beginning to see that system call code and interrupt handler code are two kinds of kernel code that are handled somewhat differently. System call code is “normal” kernel code, allowed to block as needed. Interrupt handler code is special, not allowed to block, so it runs very fast and completes its work, leaving the stack (almost) the way it found it. The “(almost)” comes from the fact that this process can be preempted at the very end of the interrupt handler, leaving a little runt contribution from the interrupt handler on the top of the stack of the preempted process/thread. But this little stack part is cleared naturally when the preempted process is rescheduled.