CS444 Class 10

Reading: Chap 2, Sections 2.1, 2.2: all to pg. 106, skip 2.2.4, read pp. 109-110, skip 2.2.6, 2.2.7, 2.2.8, read pg. 114 to the end of 2.2.

Start reading 2.3.

 

 

Last time: how syscalls work

One instruction in user code, int $0x80 for hw2, causes execution to trap to the kernel. The kernel carries out the requested action (a write, say), and when done it uses the iret instruction to get execution back to the instruction in user code that follows the int instruction.

 

Each syscall has a syscall number:

 

From /usr/include/sys/syscall.h on ulab:

#define SYS_exit        1

#define SYS_fork        2

#define SYS_read        3

#define SYS_write       4

#define SYS_open        5

#define SYS_close       6

 

When your program prepares to do a write syscall, the syscall number 4 is put in the eax register before the “int $0x80” instruction is executed.  The kernel (in the syscall handler) sees the 4 and calls syswrite in the kernel, which does the needed work. Syswrite returns to the syscall handler, which iret’s back to the user code.
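The dispatch on the syscall number can be sketched in C roughly as follows. This is only an illustration, not the hw2 solution: the names syscall_dispatch, sys_write_stub and sys_exit_stub and their stubbed return values are assumptions made up for the sketch. Only the syscall numbers come from syscall.h above.

```c
#include <assert.h>

/* Syscall numbers as listed in syscall.h above. */
#define SYS_exit  1
#define SYS_write 4

/* Hypothetical stand-in for the kernel's exit routine. */
static int sys_exit_stub(int code, int unused1, int unused2)
{
    (void)unused1;
    (void)unused2;
    return code;            /* a real exit would not return */
}

/* Hypothetical stand-in for syswrite: pretend every byte was written. */
static int sys_write_stub(int dev, int buf, int nbytes)
{
    (void)dev;
    (void)buf;
    assert(nbytes >= 0);
    return nbytes;
}

/* Dispatch on the number the user code left in eax.
 * Returns the syscall result, or -1 for an unknown number. */
int syscall_dispatch(int user_eax, int arg1, int arg2, int arg3)
{
    switch (user_eax) {
    case SYS_exit:
        return sys_exit_stub(arg1, arg2, arg3);
    case SYS_write:
        return sys_write_stub(arg1, arg2, arg3);
    default:
        return -1;
    }
}
```

A switch keeps the sketch short. A table of function pointers indexed by syscall number would do the same job.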

 

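The Linux syscall linkage we will use in hw2:

The syscall number goes in eax, the first argument in ebx, the second in ecx, and the third in edx. Then the int $0x80 instruction is executed.

The resulting success or error code is in eax at the end of this sequence.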

 

How can we get the needed registers loaded for a syscall?

Answer: use assembler.  Need it anyway for the int instruction. 

 

We’ll implement “write” in assembler, and C can call it normally.  Also “read”, etc.  We’ll put these in a file ulib.s.

 

The user C code has say write(dev, buf, nbytes) as in testio.c.  C pushes args on the stack, then does the call instruction to our assembler routine.  So in assembler, we need to pull those args off the stack and put them in the specified registers.

 

Last time we looked at the details for calling write in ulib.s. The provided ulib.s contains that code. It should be easy for you to add _read and _exit following this pattern.

 

Horizontal Stack pic, after pushl %ebx to save %ebx:  A=0 is way to the left.

 

  %esp                  8(%esp)   12(%esp)  16(%esp)
    |                     |         |         |
    v                     v         v         v
| saved-ebx | ret-addr |  arg-1  |  arg-2  |  arg-3  | older-stack-contents
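The offsets in the picture above can be checked with a small C model of a stack that grows toward lower addresses. This is a toy simulation, not code for hw2: the names toy_stack, toy_push, toy_load and toy_enter_write are invented for the illustration, and the stack is just an array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 16

/* Toy model of a 32-bit stack that grows toward lower addresses. */
struct toy_stack {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;               /* byte offset into mem, playing %esp */
};

void toy_init(struct toy_stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_WORDS * 4;   /* empty stack: esp just past the top */
}

void toy_push(struct toy_stack *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4;                /* pushl: decrement esp, then store */
    s->mem[s->esp / 4] = value;
}

/* Read the word at disp(%esp). */
uint32_t toy_load(const struct toy_stack *s, uint32_t disp)
{
    assert((s->esp + disp) / 4 < STACK_WORDS);
    return s->mem[(s->esp + disp) / 4];
}

/* Mimic the C call write(dev, buf, nbytes) followed by pushl %ebx
 * in the assembler routine, and return the stack in that state. */
struct toy_stack toy_enter_write(uint32_t dev, uint32_t buf, uint32_t nbytes,
                                 uint32_t ret_addr, uint32_t saved_ebx)
{
    struct toy_stack s;
    toy_init(&s);
    toy_push(&s, nbytes);       /* C pushes args right to left */
    toy_push(&s, buf);
    toy_push(&s, dev);
    toy_push(&s, ret_addr);     /* pushed by the call instruction */
    toy_push(&s, saved_ebx);    /* pushl %ebx */
    return s;
}
```

After toy_enter_write, toy_load(&s, 8) yields dev, toy_load(&s, 12) yields buf, and toy_load(&s, 16) yields nbytes, matching the offsets in the picture.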

 

 

Note: This topic is covered in Lecture 7 of cs341.  Prof. Wilson discusses the full i386-gcc function call linkage, which includes setting up a frame pointer in %ebp and then using the frame pointer to access the arguments by 8(%ebp), 12(%ebp), etc.  We could do this, but it’s not required: all we really need to do is preserve %ebp for C by not using it.  My example code accessed the arguments by offsetting from the stack pointer %esp after a push of %ebx (not to be confused with %ebp!), that is, 8(%esp), 12(%esp), etc.  You are free to set up and use %ebp if you want.

 

Transitions to the Kernel from user code

 

System call instruction -> trap cycle -> arrive at IDT[0x80] = _syscallhand, like

COM2 interrupt -> interrupt cycle -> arrive at IDT[0x23] = _irq3inthand

 

After the trap cycle, execution arrives at _syscallhand in sysentry.s with the args and syscall # in the registers.  We need to provide this info to the C system call handler.  Luckily there is an old trick to help us out, using the fact that C passes arguments on the stack for this platform.  If we push the register values on the stack and then call syscallc, syscallc will be called the same way it would normally be called by another C fn.  Thus all we need to do is provide syscallc with 4 parameters and we can use those names to access the spots on the stack we have created by pushing the 4 saved-registers.

 

_syscallhand: pushl %edx      # third arg first

      pushl %ecx

      pushl %ebx

      pushl %eax              # syscall #

      call _syscallc

      popl %eax  # possibly new value for eax

      ...   3 more popls  

      iret

 

Stack pic, when execution reaches syscallc: u short for “user”

 

  %esp       4(%esp)  8(%esp)  12(%esp)  16(%esp)
    |          |        |        |         |
    v          v        v        v         v
| ret-addr | u-eax  | u-ebx  | u-ecx  |  u-edx  | older-stack-contents

 

Syscallc interprets the stack like this:

 

  %esp       4(%esp)     8(%esp) 12(%esp) 16(%esp)
    |          |           |       |        |
    v          v           v       v        v
| ret-addr | user_eax |  arg1  |  arg2  |  arg3  | older-stack-contents

 

(or sets up %ebp first and uses it to find these spots on the stack)

 

We end up being able to access the syscall number and the three user argument registers as parameters:

 

void syscallc(int user_eax, int arg1, int arg2, int arg3)
{
    /* temporary code to show values */
    kprintf("syscallc: syscall#=%d, arg1=%d, arg2=%x, arg3=%d\n",
            user_eax, arg1, arg2, arg3);
    /* ... dispatch on user_eax goes here ... */
}

 

Getting back to the user code, after syscallc returns:

 

_syscallhand: pushl %edx      # third arg first

      pushl %ecx

      pushl %ebx

      pushl %eax              # syscall #

      call _syscallc

      popl %eax  # possibly new value for eax

      ...   3 more popls  

      iret

 

There are two ways to get the syscall return value back to the user. If syscallc stores the final return value in its parameter user_eax, then the value is in that slot on the stack, and you do want the popl %eax (in _syscallhand after the call) to load it into %eax.

 

Alternatively, recall that the C return value of a function is found in %eax. If you make the syscall return value the return value of syscallc, it is already in %eax when syscallc returns, and you need to keep it intact through to the iret, not wipe it out with popl %eax.
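The two choices, and the bug to avoid, can be modeled in plain C. This is a simulation under assumptions, not the real handler: the struct pushed_regs stands in for the four stacked registers, the variable cpu_eax plays the role of %eax, and do_syscall is a made-up syscall body that just reports the byte count.

```c
#include <assert.h>

/* The four registers _syscallhand pushes, in stack order (eax on top). */
struct pushed_regs {
    int eax;    /* syscall number on entry */
    int ebx;
    int ecx;
    int edx;
};

/* Hypothetical syscall body: pretend it wrote all nbytes (in edx). */
static int do_syscall(const struct pushed_regs *frame)
{
    return frame->edx;
}

/* Option 1: syscallc stores the result in the stacked eax slot.
 * The handler then runs popl %eax, which loads the result. */
int handler_option1(struct pushed_regs user)
{
    struct pushed_regs frame = user;    /* the four pushl's */
    frame.eax = do_syscall(&frame);     /* syscallc: user_eax = result */
    int cpu_eax = frame.eax;            /* popl %eax */
    return cpu_eax;                     /* what user code sees after iret */
}

/* Option 2: syscallc returns the result, so it arrives in %eax.
 * The handler must discard the stacked eax slot, not pop it. */
int handler_option2(struct pushed_regs user)
{
    struct pushed_regs frame = user;    /* the four pushl's */
    int cpu_eax = do_syscall(&frame);   /* C return value lands in %eax */
    /* skip frame.eax here (for example, adjust %esp by 4) */
    return cpu_eax;
}

/* The bug described above: option 2 followed by popl %eax. */
int handler_buggy(struct pushed_regs user)
{
    struct pushed_regs frame = user;
    int cpu_eax = do_syscall(&frame);
    cpu_eax = frame.eax;                /* popl %eax wipes out the result */
    return cpu_eax;
}
```

With user registers {4, 1, 0x1000, 6}, both options hand 6 back to the user, while the buggy version hands back the syscall number 4.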

 

Note that the hw2 assignment document has a suggested plan of development, i.e., steps to follow, at its end.

 

 

Back to Tan. Chap. 2

 

Process Table Entries

To the kernel, processes are software objects, much as checking accounts are to a banking application.  Each process has a process table entry in the process table.

 

See Tan., pg. 92 for info in the process entry.  The “registers” here are better called “saved registers”, because they are copies of the CPU registers at the moment the process loses the CPU because it’s blocking or being preempted.  These saved registers save the CPU state for the next moment that the process is scheduled, at which point the values are copied back into the CPU.  Once the CPU state is set back, and the address space set up again, the CPU continues execution just where it left off.   This is the basic time-sharing mechanism that allows multiple processes all to think they have “the CPU.”
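The save-and-restore idea can be sketched in C. This is a toy model only: the struct layouts, the four-register cpu_state, and the names proctab, cpu and switch_process are assumptions for illustration, and a real kernel does the register copying in assembler.

```c
#include <assert.h>

#define NPROC 4

/* Hypothetical CPU state; a real i386 entry would hold more registers. */
struct cpu_state {
    unsigned int eip, esp, eax, ebx;
};

enum proc_status { UNUSED, READY, RUNNING, BLOCKED };

/* Hypothetical process table entry. */
struct proc_entry {
    int pid;
    enum proc_status status;
    struct cpu_state saved;     /* the "saved registers" */
};

struct proc_entry proctab[NPROC];
struct cpu_state cpu;           /* stands in for the real CPU registers */

/* Process `from` loses the CPU and process `to` gets it.
 * `why` is BLOCKED if `from` is blocking, READY if it is preempted. */
void switch_process(int from, int to, enum proc_status why)
{
    assert(from >= 0 && from < NPROC);
    assert(to >= 0 && to < NPROC);
    proctab[from].saved = cpu;      /* copy the CPU registers out */
    proctab[from].status = why;
    cpu = proctab[to].saved;        /* copy the saved registers back in */
    proctab[to].status = RUNNING;
}
```

After a switch away and a later switch back, the cpu variable holds exactly the values it had when the process lost the CPU, which is why the process can continue just where it left off.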

 

Tan. introduces interrupts at this point, since they are crucial to the understanding of preemption and unblocking.  Luckily we have already studied them in the simpler situation of standalone programming.  Here are the steps from Tan., pg. 93:

 

Interrupt Processing steps  (Interrupt cycle + interrupt handler execution)

 

  1. CPU stacks PC, etc. <-- on kernel stack, see below
  2. CPU loads new PC from int. vector
  3. Assembler fn saves registers
  4. Assembler fn sets up new stack or borrows a stack
  5. C interrupt handler runs, typically reads and buffers input
  6. Scheduler (a C fn) decides which process is to run next (decision only, i.e., figure out pid)
  7. C fn returns to assembler fn
  8. Assembler fn starts up (dispatches) new current process (and saves CPU state for old process in old process entry before loading CPU state from new process entry) (often the new process is the same as the old one, so this is a no-op)

 

Note that steps 1. and 2. are part of the CPU interrupt cycle, and note that the CPU switches to the kernel stack pointer before pushing these items on the stack, so they end up on the kernel stack.  Steps 3-8 constitute the interrupt handler.

 

Steps 4, 6, and 8 are new to us:

Step 4: Actually this is optional.  Many OS kernels allow interrupt handlers to execute on the kernel stack that the CPU just used to stack the PC.

Step 6: Scheduler loops through process table entries, looking for highest priority ready/running process.  This is preparation for possible preemption.

Step 8: This is the real process switch, where the CPU state is switched, along with the address-space (like a brain transplant).  This may or may not happen, since often the same old process is allowed to continue execution.
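The decision in step 6 can be sketched as a loop over the process table. This is a minimal illustration under assumptions: the entry layout and the name schedule are made up, a larger priority number is taken to mean higher priority, and ties go to the lowest table index.

```c
#include <assert.h>

#define NPROC 4

enum proc_status { UNUSED, READY, RUNNING, BLOCKED };

/* Hypothetical process table entry with just the fields needed here. */
struct proc_entry {
    int pid;
    enum proc_status status;
    int priority;               /* assumption: larger means more urgent */
};

/* Decision only: return the index of the highest priority entry that
 * is READY or RUNNING, or -1 if no process can run. */
int schedule(const struct proc_entry *tab, int n)
{
    int best = -1;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (tab[i].status != READY && tab[i].status != RUNNING)
            continue;           /* blocked or unused: not a candidate */
        if (best < 0 || tab[i].priority > tab[best].priority)
            best = i;
    }
    return best;
}
```

If the index returned belongs to the process that is already running, step 8 has nothing to switch. Otherwise the running process is preempted.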

 

We can list a step 9 here, where the iret occurs.  In the case of a process switch, this will be executed later (if we are following the lifetime of the old process), after the CPU state is restored for this process.

 

Each process has a kernel stack.  The kernel isn’t a separate program with its own stack. Instead, it separately handles system calls in different processes, so they can operate independently, or nearly so.

 

Each process has a kernel stack  (or more generally, each thread has its own stack)

See handout, add the bouncing ball from the first handout.

 

First consider a single-threaded, or traditional, UNIX or Windows process. It has one user stack in user space, and one kernel stack in kernel space, as shown on the handout.

 

The user stack shows up all by itself, because there is only the one user stack in a single-threaded user image.  Other processes have their own user images, also with one user stack if they are single-threaded.  But the kernel image is shared, so all the kernel stacks must be at various different addresses in the kernel data area.

 

Just like there has to be a separate place for each process to hold its set of saved registers (in its process table entry), each process also needs its own kernel stack, to work as its execution stack when it is executing in the kernel.

 

For example, if a process is doing a read syscall, it is executing the kernel code for read, and needs a stack to do this.  It could block on user input, and give up the CPU, but that whole execution environment held on the stack (and in the saved CPU state in the process table entry) has to be saved for its later use.  Another process could run meanwhile and do its own syscall, and then it needs its own kernel stack, separate from that blocked reader’s stack, to support its own kernel execution. When a process is unblocked, it starts again using the stack where it left off when blocked.  Similarly if preempted.

 

Since threads can also do system calls, each needs a kernel stack as well.

 

In Linux, the process/thread table entry and kernel stack are bundled up in one block of memory for each thread.  Other OS’s organize the memory differently, but still have both of these for each process/thread.
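One payoff of bundling the entry and the stack in one aligned block is that the kernel can find the current thread's entry from the stack pointer alone, by masking off the low bits. The sketch below shows the idea only. The 8192-byte block size, the union layout, and the names thread_block, thread_blocks and entry_from_sp are assumptions for illustration, not taken from any particular Linux version.

```c
#include <assert.h>
#include <stdint.h>

#define KSTACK_SIZE 8192    /* assumed block size, must be a power of two */

/* Hypothetical per-thread entry, kept tiny for the sketch. */
struct thread_entry {
    int pid;
};

/* One block per thread: the entry sits at the low end and the kernel
 * stack grows down from the high end toward it. */
union thread_block {
    struct thread_entry entry;
    unsigned char stack[KSTACK_SIZE];
};

/* Each block must start on a KSTACK_SIZE boundary for the mask to work. */
_Alignas(KSTACK_SIZE) union thread_block thread_blocks[2];

/* Given any address inside a thread's kernel stack, find its entry. */
struct thread_entry *entry_from_sp(const void *sp)
{
    uintptr_t base = (uintptr_t)sp & ~(uintptr_t)(KSTACK_SIZE - 1);
    return (struct thread_entry *)base;
}
```

The design choice is a trade: no separate pointer to the current entry is needed, but the stack has a hard size limit, and a stack that grows too deep overwrites the entry.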

 

Sometimes the kernel stack is completely empty, notably when the process is executing user code.  Then when it does a system call, the kernel stack starts growing, and it later shrinks back to nothing at the system call return.

 

The kernel stack (of the currently running process or thread) is also used by interrupt handlers

 

The kernel stack is also used for interrupt handler execution, for the interrupts that occur while a particular thread is running.  As we have talked about already, the interrupts are almost always doing something for another, blocked process/thread.  After all, that process is blocked waiting for something to happen, and all the hardware happenings are signaled by interrupts. 

 

So interrupt handlers “borrow” the current process’s kernel stack to do their own execution.  When they finish, the kernel stack is back to its previous state, empty if the current process is running at user level or non-empty if it was running in some system call.  Note that interrupt handlers are not themselves allowed to block, so their execution is not delayed that way.  They can be involved in process switches, as shown by step 8 above, but only just as they are returning, not in the middle of their work, so their changes to system state are complete. 

 

Kernel code types: Interrupt handler code vs. system call code

We are beginning to see that system call code and interrupt handler code are two kinds of kernel code that are handled somewhat differently. System call code is “normal” kernel code, allowed to block as needed.  Interrupt handler code is special: it is not allowed to block, so it runs quickly to completion, leaving the stack (almost) the way it found it. The “(almost)” comes from the fact that this process can be preempted at the very end of the interrupt handler, leaving a little runt contribution from the interrupt handler on the top of the stack of the preempted process/thread. But this little stack part is cleared naturally when the preempted process is rescheduled.