Tues., Oct. 3

hw2: System Calls

 

Look at details of system call mechanism, like Tan. pg. 46 but using the Linux syscall linkage we will use in hw2:

 

The system call instruction is a particular instruction in the CPU instruction set, designed by the CPU designer (Intel for x86, Sun for Sparc) for making system calls.  On x86, it’s the int instruction; on Sparc, it’s the ta (trap always) instruction.  Its job is to cause execution to jump out of user execution into the kernel in a safe way, much like an interrupt causes kernel execution of the interrupt handler between two instructions in user code (interrupts can also happen in kernel code).

 

Recall slide 6 from the interrupts handout: CPU Interrupt Handling, with a few additions here:

 

The CPU Interrupt cycle--

Then after the interrupt cycle, the new EIP value makes the ISR (int. handler) run--

ISR/interrupt handler (same thing) services the interrupt, including resetting the interrupt controller; ends with a special instruction “iret” on x86 to restore previously saved state and resume from point of interrupt

 

The system call instruction causes the CPU to go through its trap cycle, just like its interrupt cycle, except that nn is obtained from the int instruction operand (0x80 here), and IF is not changed.  Instead of “interrupt handler”, we say “trap handler”.

 

The CPU Trap cycle—execution of int $nn--

Then after the trap cycle, the new EIP value makes the trap handler run--

The trap handler services the system call and ends with a special instruction “iret” on x86 to restore previously saved state and continue execution just after the int $nn in user code.
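
The order of events in the trap cycle and in iret can be mimicked in plain C. This is only a toy model, not hw2 code: the names toy_cpu, toy_init, toy_push, toy_pop, toy_int and toy_iret are made up for illustration, the "stack" is an array, and the saved state is limited to EFLAGS, CS and EIP, which is what x86 pushes when no privilege change is involved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 64
#define IDT_SIZE    256

/* Toy CPU state: only what the trap cycle touches (hypothetical model). */
struct toy_cpu {
    uint32_t eip, cs, eflags;
    uint32_t stack[STACK_WORDS];
    int      sp;                  /* index of top of stack; grows downward */
    uint32_t idt[IDT_SIZE];       /* idt[nn] = handler address for vector nn */
};

void toy_init(struct toy_cpu *c) {
    memset(c, 0, sizeof *c);
    c->sp = STACK_WORDS;          /* empty stack */
}

void toy_push(struct toy_cpu *c, uint32_t v) {
    assert(c->sp > 0);
    c->stack[--c->sp] = v;
}

uint32_t toy_pop(struct toy_cpu *c) {
    assert(c->sp < STACK_WORDS);
    return c->stack[c->sp++];
}

/* Trap cycle for "int $nn"; c->eip is taken to be the address of the int. */
void toy_int(struct toy_cpu *c, uint8_t nn) {
    toy_push(c, c->eflags);
    toy_push(c, c->cs);
    toy_push(c, c->eip + 2);      /* int $nn is 2 bytes: resume just after it */
    c->eip = c->idt[nn];          /* new EIP from the vector; IF is untouched */
}

/* iret: restore the saved state in reverse order of the pushes. */
void toy_iret(struct toy_cpu *c) {
    c->eip    = toy_pop(c);
    c->cs     = toy_pop(c);
    c->eflags = toy_pop(c);
}
```

With idt[0x80] set to a handler address, toy_int(&c, 0x80) leaves three words on the stack and EIP at the handler; toy_iret(&c) then puts EIP back just after the int.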

Note on hw2 vs. real OS

Note that in hw2, all our code, user or kernel, executes in kernel mode of the CPU.  The int 0x80 instruction still traps, but doesn't change the CPU execution mode.  In a real OS, the user code executes in user mode on the CPU, which disallows all privileged instructions.  The int instruction (or ta on Sparc, etc.) changes the CPU mode from user to kernel, and the iret (or equivalent) reverts it to user mode at the end of the system call.  This isolation of the user code in user mode is a powerful mechanism for system security.  Even "privileged users" such as root on UNIX and Administrator on Windows are executing code in user mode on the CPU.  Only the kernel code is allowed to execute in kernel mode.


Steps in system call execution

As in Tan, pg. 46, consider read(dev, buf, nbytes) being executed (dev instead of fd for hw2)

 

  1. User code in C has read(dev, buf, nbytes), calls read in user library (ulib.s in hw2)
  2. The library assembler fn copies the dev, buf, nbytes values off the stack into registers, puts 3 in eax as the syscall number
  3. Trap to kernel by executing int $0x80
  4. CPU trap cycle for int instruction: save CPU state on stack, get new EIP from trap vector in IDT[0x80].  This address is _syscallhand, the kernel’s system call handler in hw2.
  5. _syscallhand is the entry point of the assembler syscall handler: this fn executes, saves registers on the stack, calls the C syscall handler, _syscallc.
  6. syscallc is in tunix.c, the main C code for the kernel in hw2.  It accesses the saved registers via its args (more on this below), determines from the syscall number that it should call sysread.  sysread code = read code from hw1.  sysread executes, does the input, returns to syscallc.
  7. return to _syscallhand, restore registers (including setting eax for return value), do final iret.
  8. iret restores EIP, CS and EFLAGS, returning to user code (and, in a real OS, to user mode), poised to execute the next instruction after int $0x80.
  9. finish the library assembler fn, which ends with “ret”.  ret pops the saved return address into EIP, causing function return.
  10. return to user C code that called read, just after the call to read.

 

Assembler/C argument passing:  In hw2, we need to go from C to assembler in user code (calling into ulib.s), and from assembler to C in kernel code (calling the C system call handler from the assembler envelope).

 

Consider user C code with “write(dev, buf, nbytes)”.  This C code is compiled into assembler code like this:

     pushl nbytes
     pushl buf
     pushl dev
     call _write     (which pushl’s the return address)

 

On the x86, the stack grows toward lower addresses, so the nbytes value is at the highest address, buf just below it, dev below that, and finally the return address pushed by the call is at the lowest address, the current top of stack pointed to by esp.  Thus we have this picture, where addresses increase from top to bottom:

 

user stack when execution reaches _write:

esp->  return addr
       first arg  (to be put in ebx for syscall)
       second arg (to be put in ecx)
       third arg  (to be put in edx)

 

Now the _write function in ulib.s needs to pull the 3 args off the stack and put them in ebx, ecx, and edx, put the syscall number, 4, in eax, and then do “int $0x80”.

 

We need to preserve register ebx to coexist with C, but are allowed to use the other three (the C “scratch registers” eax, ecx and edx) at will.  Thus we need to save the caller’s ebx value at the start (by pushing it on the stack) and restore it at the end of this function.  That push of ebx changes the picture of the stack to the following:

 

# user stack after pushl %ebx, needed to preserve %ebx (not a C scratch reg)
# esp->  saved-ebx
# 4(esp) return addr
# 8(esp) first arg  (to be put in ebx for syscall)
#12(esp) second arg (to be put in ecx)
#16(esp) third arg  (to be put in edx)

_write: pushl %ebx                    # save the value of ebx
        movl 8(%esp),%ebx             # first arg in ebx
        movl 12(%esp),%ecx            # second arg in ecx
        movl 16(%esp),%edx            # third arg in edx
        movl $4,%eax                  # syscall # in eax
        int $0x80                     # trap to kernel
        popl  %ebx                    # restore the value of ebx
        ret

 

It should be easy for you to add _read and _exit following this pattern.
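
As a cross-check on the offsets 8, 12 and 16 used in _write, the stack picture can be modeled in C. This is a sketch for experimenting on any host, not part of hw2: the byte array standing in for the stack, the helper names push32, load32 and model_write_entry, and any sample values passed in are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

static uint8_t  stack_mem[STACK_BYTES];
static uint32_t esp = STACK_BYTES;   /* byte offset of top of stack; grows down */

/* pushl: decrement esp by 4, then store the value there. */
void push32(uint32_t v) {
    assert(esp >= 4);
    esp -= 4;
    memcpy(&stack_mem[esp], &v, sizeof v);
}

/* movl off(%esp),reg: read the 32-bit value at esp + off. */
uint32_t load32(uint32_t off) {
    uint32_t v;
    assert(esp + off + 4 <= STACK_BYTES);
    memcpy(&v, &stack_mem[esp + off], sizeof v);
    return v;
}

/* Replays the pushes that happen before the first movl in _write. */
void model_write_entry(uint32_t dev, uint32_t buf, uint32_t nbytes,
                       uint32_t ret_addr, uint32_t caller_ebx) {
    esp = STACK_BYTES;     /* start from an empty stack */
    push32(nbytes);        /* caller: pushl nbytes */
    push32(buf);           /* caller: pushl buf */
    push32(dev);           /* caller: pushl dev */
    push32(ret_addr);      /* call _write pushes the return address */
    push32(caller_ebx);    /* _write: pushl %ebx */
}
```

After model_write_entry(dev, buf, nbytes, ...), load32(8), load32(12) and load32(16) return dev, buf and nbytes, while load32(0) and load32(4) return the saved ebx and the return address.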

 

F05 Note: the following was covered on Thursday, Oct. 6 (see next file for rest of Oct. 6 class, on Chap. 2)

After the trap cycle, execution arrives at _syscallhand in sysentry.s with the args and syscall # in the registers.  We need to provide this info to the C system call handler.  Luckily there is an old trick to help us out, using the fact that C passes arguments on the stack for this platform.  If we push the register values on the stack and then call syscallc, syscallc will be called the same way it would normally be called by another C fn.  Thus all we need to do is provide syscallc with 4 parameters and we can use those names to access the spots on the stack we have created by pushing the 4 saved-registers.

 

_syscallhand: pushl %edx      # third arg first
      pushl %ecx
      pushl %ebx
      pushl %eax              # syscall #
      call _syscallc
      popl %eax               # possibly new value for eax
      ...   3 more popls
      iret
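
The comment on popl %eax suggests that the C handler can change the saved eax slot, and that the changed value is what the user code sees as the return value. That write-back can be modeled in C. This is a sketch under assumptions: the struct and function names are invented, and the handler is given a pointer to the saved values, because portable C cannot express by-value parameters that alias the slots pushed by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* The four pushed values in memory order: eax was pushed last,
   so it sits at the lowest address (hypothetical layout struct). */
struct saved_regs { uint32_t eax, ebx, ecx, edx; };

/* The live CPU registers, as far as this model cares. */
struct cpu_regs { uint32_t eax, ebx, ecx, edx; };

typedef void (*handler_fn)(struct saved_regs *frame);

/* Model of the assembler envelope: push, call the C handler, pop. */
void envelope_model(struct cpu_regs *cpu, handler_fn handler) {
    struct saved_regs frame;
    frame.edx = cpu->edx;        /* pushl %edx */
    frame.ecx = cpu->ecx;        /* pushl %ecx */
    frame.ebx = cpu->ebx;        /* pushl %ebx */
    frame.eax = cpu->eax;        /* pushl %eax (syscall #) */
    handler(&frame);             /* call into the C handler */
    cpu->eax = frame.eax;        /* popl %eax: picks up any change */
    cpu->ebx = frame.ebx;        /* remaining pops restore the rest */
    cpu->ecx = frame.ecx;
    cpu->edx = frame.edx;
}

/* Example handler: pretends all requested bytes were transferred
   by storing the third argument in the saved eax slot. */
void demo_handler(struct saved_regs *frame) {
    frame->eax = frame->edx;
}
```

Starting with eax = 4 (write) and edx = 5 (nbytes), envelope_model leaves eax = 5 and the other three registers unchanged.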

 

void syscallc(int user_eax, int arg1, int arg2, int arg3)
{
    /* temporary code to show values */
    kprintf("syscallc: syscall#=%d, arg1=%d, arg2=%x, arg3=%d\n",
                  user_eax, arg1, arg2, arg3);
    ...
}
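
Once the temporary kprintf shows the right values, the remaining job of the C handler is to pick the kernel routine from the syscall number. A minimal sketch of that dispatch follows, written to run on an ordinary host rather than in the hw2 kernel. Assumptions: the name dispatch_sketch, the stub bodies of sysread and syswrite, the in-memory fake device, and the -1 error return are all invented here; only the numbers 3 (read) and 4 (write) and the name sysread come from these notes. The arguments are intptr_t so that a pointer fits on a 64-bit host; in hw2 they are 32-bit ints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SYS_READ = 3, SYS_WRITE = 4 };   /* syscall numbers used in the notes */

/* Fake device: writes are appended to out_buf, reads come from in_buf. */
static char       out_buf[128];
static size_t     out_len;
static const char in_buf[] = "hello";
static size_t     in_pos;

/* Stub standing in for the hw1-derived read code. */
int sysread(int dev, char *buf, int nbytes) {
    int n = 0;
    (void)dev;                           /* device number ignored in this stub */
    while (n < nbytes && in_pos < sizeof in_buf - 1)
        buf[n++] = in_buf[in_pos++];
    return n;                            /* bytes actually read */
}

/* Hypothetical counterpart for write. */
int syswrite(int dev, const char *buf, int nbytes) {
    (void)dev;
    if (nbytes < 0 || out_len + (size_t)nbytes > sizeof out_buf)
        return -1;                       /* fake device is full or bad count */
    memcpy(out_buf + out_len, buf, (size_t)nbytes);
    out_len += (size_t)nbytes;
    return nbytes;
}

/* Choose the kernel routine from the syscall number. */
int dispatch_sketch(int user_eax, intptr_t arg1, intptr_t arg2, intptr_t arg3) {
    switch (user_eax) {
    case SYS_READ:
        return sysread((int)arg1, (char *)arg2, (int)arg3);
    case SYS_WRITE:
        return syswrite((int)arg1, (const char *)arg2, (int)arg3);
    default:
        return -1;                       /* unknown syscall number */
    }
}
```

In hw2 the result would not simply be returned like this; it has to end up in the saved eax slot that _syscallhand pops before its iret.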

Note that hw2.txt has a suggested plan of development, i.e., steps to follow, at its end.