CS444 Class 11
Handout: User-kernel
transitions, with stacks shown
Reading: Chap 2, Sections
2.1, 2.2: all to pg. 106, skip 2.2.4, read 109-110, skip 2.2.6, .7, .8. read
114-end of 2.2.
Start reading 2.3.
Hw2 points:
Please use specified
filenames, so our collection scripts will work!
The compile/assemble step produces object files such as prog.opc (prog.o on UNIX), which contain machine code for the program but unresolved addresses for external variables and function entry points.
The final link ("load") step resolves those undefined addresses as it places the modules at various non-overlapping memory locations. The result is the executable file.
That's how one file's code can call another's, even across C and assembler in either direction.
Hw2 System call write example:
user program calls write in
ulib.s
write does the syscall
instruction (int $0x80)
----user-kernel
transition----
syscall.s executes, calls
syscallc
syscallc executes, “dispatches”
(calls) syswrite (old write, renamed)
syswrite calls ttywrite
ttywrite executes as in hw1
Return, return, return, back
to syscall.s
iret
----kernel-user transition----
Back to next instruction
after syscall instruction
Return to user code that
called write
Exit execution:
user code, ulib, syscall, sysentry, syscallc, sysexit, where the system is
brought down.
Final step is getting back to
Tutor. We need to execute the breakpoint instruction, int $3. Easiest way
is to add a tag to the int $3 in startup0.s, like "_finish: int
$3". Needs .globl too. Then from C,
call _finish by doing “finish();” It never
returns.
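A sketch of the two pieces, using the label name from the notes (the surrounding startup0.s code is elided):

```
# in startup0.s: make the breakpoint reachable from C
	.globl _finish
_finish:
	int $3			# breakpoint: back to Tutor, never returns
```

On this toolchain the C call finish() links against the assembler label _finish, since C symbols get a leading underscore.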
See handout, add arrows to show transitions (like a bouncing ball)
First consider a
single-threaded, or traditional, UNIX or Windows process. It has one user stack
in user space, and one kernel stack in kernel space, as shown on the handout.
The user stack shows up all
by itself, because there is only the one user stack in a single-threaded user
image. Other processes have their own
user images, also with one user stack if they are single-threaded. But the kernel image is shared, so all the
kernel stacks must be at various different addresses in the kernel data area.
Just like there has to be a
separate place for each process to hold its set of saved registers (in its
process table entry), each process also needs its own kernel stack, to work as
its execution stack when it is executing in the kernel.
For example, if a process is
doing a read syscall, it is executing the kernel code for read, and needs a
stack to do this. It could block on user input, and give up the CPU, but
that whole execution environment held on the stack (and in the saved CPU state
in the process table entry) has to be saved for its later use. Another
process could run meanwhile and do its own syscall, and then it needs its own
kernel stack, separate from that blocked reader’s stack, to support its own
kernel execution. When a process is unblocked, it starts again using the stack
where it left off when blocked.
Similarly if preempted.
Since threads can also do
system calls, each needs a kernel stack as well.
In Linux, the process/thread table
entry and kernel stack are bundled up in one block of memory for each
thread. Other OS’s organize the memory differently, but still have both
of these for each process/thread.
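A simplified sketch of that Linux layout (the field names and the 8KB size here are illustrative, not the real kernel definitions):

```c
#include <stddef.h>

#define KSTACK_BYTES 8192       /* classic two-page kernel stack */

struct thread_info {            /* hypothetical, heavily simplified */
    int pid;
    int state;
    void *saved_sp;             /* kernel stack pointer while not running */
};

/* One memory block per thread: bookkeeping at the low end, kernel
   stack growing down from the high end of the same block. */
union thread_block {
    struct thread_info info;
    unsigned char stack[KSTACK_BYTES];
};
```

A nice side effect of this bundling: masking the current kernel stack pointer down to the block boundary yields the address of the running thread's bookkeeping.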
Sometimes the kernel stack is completely empty,
notably when the process is executing user code. Then when it does a
system call, the kernel stack starts growing, and later shrinking back to
nothing at the system call return.
The kernel stack is also used
for interrupt handler execution, for the interrupts that occur while a
particular thread is running. As we have talked about already, the
interrupts are almost always doing something for another, blocked
process/thread. After all, that process is blocked waiting for something
to happen, and all the hardware happenings are signaled by interrupts.
So interrupt handlers
“borrow” the current process’s kernel stack to do their own execution.
When they finish, the kernel stack is back to its previous state, empty if the
current process is running at user level or non-empty if it was running in some
system call. Note that interrupt
handlers are not themselves allowed to block, so their execution is not
delayed that way. They can be involved in process switches, as shown by
step 8 above, but only just as they are returning, not in the middle of their
work, so their changes to system state are complete.
Kernel code types: Interrupt handler code vs. system
call code
We are beginning to see that
system call code and interrupt handler code are somewhat differently handled
kinds of kernel code. System call code is “normal” kernel code, allowed to
block as needed. Interrupt handler code
is special, not allowed to block, so it runs very fast and completes its work,
leaving the stack (almost) the way it found it. The “(almost)” comes from the
fact that this process can be preempted at the very end of the interrupt
handler, leaving a little runt contribution from the interrupt handler on the
top of the stack of the preempted process/thread. But this little stack part is
cleared naturally when the preempted process is rescheduled.
So far, we have been mainly
considering single-threaded processes, with one program counter saying where
in the code the CPU is now executing, and one stack. Some programs have
multiple ongoing activities that can benefit from multiple threads.
Example: mtip rewritten
with threads: recall that mtip has two processes, one running keymon
(monitoring the keyboard) and one running linemon (monitoring the serial line).
The more modern approach
would be to use an explicit thread_create to make a second thread for linemon,
leaving the original thread to run keymon:
thread_create(linemon,
...);
Then we end up with two threads, with two stacks and two relevant PCs, one in keymon and one in linemon at each point in time, all in one address space. This is a higher-performance implementation, because it avoids a lot of address-space switches needed between the two processes.
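With POSIX threads standing in for the thread_create above, the sketch looks like this (linemon's body is a placeholder for mtip's real line-monitor loop):

```c
#include <pthread.h>
#include <stddef.h>

/* placeholder for mtip's line monitor: copy serial-line input to screen */
static void *linemon(void *arg) {
    /* ... loop reading the line, writing the screen ... */
    return NULL;
}

/* The original thread creates the second thread, then goes on to run
   the keymon loop itself.  Returns 0 on success, like pthread_create. */
int start_linemon(pthread_t *t) {
    return pthread_create(t, NULL, linemon, NULL);
}
```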
Example: Tan, pg. 98-99
multithreaded web server. Look
at code at end of handout. This is a good use of threads to allow new
requests to come in and be served quickly while one thread takes some time with
a long-winded request. Note that this works fine on a uniprocessor, as
well as on a multiprocessor. Here threads are being used to run
concurrent activities that are not CPU-intensive, so one CPU can do a lot of
simultaneous requests. There are a lot of missing details here—we will
later study semaphores, etc., that can help flesh this out.
This pseudocode is easily
modified to design other servers that handle multiple requests involving
delay. Another example is a database server, which maintains a thread
pool and a dispatcher to assign a query to a certain worker. The worker
then compiles the query in the query optimizer, and then executes the resulting
query plan in the storage engine.
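The dispatcher/worker hand-off can be sketched with a shared request queue (pthreads; the names, queue size, and int-valued requests are invented for illustration, and this is exactly the mutex machinery we will study properly later):

```c
#include <pthread.h>

#define QSIZE 16                /* no overflow check, for brevity */

static int queue[QSIZE];
static int head, tail, count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

/* dispatcher side: drop a request in the queue and wake a worker */
void dispatch(int request) {
    pthread_mutex_lock(&lock);
    queue[tail] = request;
    tail = (tail + 1) % QSIZE;
    count++;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
}

/* worker side: block until a request arrives, then return it */
int get_request(void) {
    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&nonempty, &lock);
    int r = queue[head];
    head = (head + 1) % QSIZE;
    count--;
    pthread_mutex_unlock(&lock);
    return r;
}
```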
Another use for threads is
for parallel programming. In
parallel programming, we are actively using multiple CPUs to do a huge job.
Example: ray-tracing for
movies. Each ray can be separately computed, so this is perfectly
parallelizable. There are millions of rays to compute for each frame, so
there is a huge amount of work. We could use 8-CPU systems and compute 8
times faster. Here we would want to use exactly 8 threads.
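A minimal shape for that kind of parallel loop, assuming each ray is independent (compute_ray is a stand-in for the real per-ray work):

```c
#include <pthread.h>

#define NTHREADS 8
#define NRAYS 800               /* really millions per frame */

static double result[NRAYS];

static double compute_ray(int i) { return i * 0.5; }  /* stand-in work */

struct range { int lo, hi; };

/* each thread computes its own slice of the rays */
static void *worker(void *arg) {
    struct range *r = arg;
    for (int i = r->lo; i < r->hi; i++)
        result[i] = compute_ray(i);
    return NULL;
}

int render_frame(void) {
    pthread_t tids[NTHREADS];
    struct range ranges[NTHREADS];
    int chunk = NRAYS / NTHREADS;
    for (int t = 0; t < NTHREADS; t++) {
        ranges[t].lo = t * chunk;
        ranges[t].hi = (t == NTHREADS - 1) ? NRAYS : (t + 1) * chunk;
        if (pthread_create(&tids[t], NULL, worker, &ranges[t]) != 0)
            return -1;
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tids[t], NULL);
    return 0;
}
```

On an 8-CPU system the eight workers can run truly simultaneously; on a uniprocessor this buys nothing, since the work is CPU-bound.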
Programming with threads—options
Warning: if the activities of
the threads interact, i.e., share data, then there can be critical sections
needing mutex. More on this soon.
P. 99: a competing method for
doing some multi-activity programming: use non-blocking i/o, AKA
asynchronous i/o. Here the i/o system calls return immediately,
having only dropped off an i/o request with the OS. The app has to either
check later on the status, using another syscall, or in UNIX, can be notified
by signals. This is used, for example, for writing disk blocks in a database
server. There may be 50 concurrent block writes going on, so with a
thread approach, we would need 50 threads to handle it. But it can all be
done by one thread using non-blocking i/o. However, there are not a lot
of these examples that I’ve seen; the only place I’ve seen this is in
database server code. Threads are the workhorses for multi-activity
programming in a process. You can ignore non-blocking i/o for this course.
2.2.4 pg 106 User-level
threads: This boils down to
writing a (user-level) scheduler inside one thread to implement multiple
threads within it. However, a user-level scheduler cannot get hooks into
the int handlers, so it can’t do preemption. Also, if one of these
threads blocks, they all block. Today’s OS threads are high enough
performance (thanks to the multithreaded web server’s needs and resulting
vendor competition) that we don’t have to go to this extreme. Let’s
ignore user-level threads in our coverage and assume that all threads we talk
about are kernel-supported.
2.2.5, pg. 109 Kernel threads—i.e. “real” threads, that can be preempted and can do
concurrent system calls. This is important.
When I say thread, I mean kernel thread.
2.2.6, 2.2.7, 2.2.8 pg
110-111: “Scheduler activations”—research topic, can skip. “Pop-up
threads”—skip this
Multi-threaded coding is
tricky. Not only can you easily code your own race conditions,
(bugs related to lack of mutex protection of shared variables) but your
libraries can cause thread-related problems. You need to be sure to use
“thread-safe” libraries. The UNIX man pages have info on this, as
“attributes” of various calls, so “man attributes” explains the categories and
lists examples of non-safe calls.
Examples of
non-thread-safe C lib calls: ctime, rand, strtok. Each of these
uses data that can be shared between threads. There are thread-safe
variants of these called ctime_r, rand_r, and strtok_r, that have additional
arguments for the caller to hand over private memory for the call to use.
Note: we have to dig this far
into the C lib to find non-thread-safe calls. All the common calls are
thread-safe: printf, strcpy, malloc, and so on. The C library is
amazingly well suited to multithreading, given that it was designed well
before threads were introduced.
If you do “man ctime” you
will see both versions:
char *ctime(const time_t
*clock);
char *ctime_r(const
time_t *clock, char *buf, int buflen);
ctime accepts a time_t value
and returns a date-and-time string like “Sun Sep 16 01:03:52 1973\n”.
Question: Where is the memory
buffer holding this string?
--It can’t be on the stack
(here we mean the current thread’s stack), because the ctime code can only put
temporary data on the stack and such data is popped off by the return from
ctime.
--It can’t be malloc’d
because there’s nothing in the man page saying you have to do free(ptr), and
otherwise there would be a memory leak.
So the conclusion is that it
must be in static data owned by the C library. (We know the C library has
other such data, such as the FILE structs for fopen’d files.)
That means ctime composes the
string for you in its own buffer, and passes back the pointer to it for your
access. If two threads use ctime, the internal buffer will get
overwritten, possibly while the first thread is still accessing its copy.
With ctime_r, each thread passes in its own string buffer, so there is no
problem (unless this buffer is the same for both threads, but that’s the
caller’s fault!)
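A sketch of the safe variant in use. (Note: on Linux and in POSIX, ctime_r takes just two arguments, without the buflen shown in the Solaris prototype above; the buffer must hold at least 26 bytes.)

```c
#include <time.h>
#include <stddef.h>

/* Each thread hands ctime_r its own buffer, so nothing is shared.
   Returns 1 on success, 0 on failure. */
int format_time(time_t t, char buf[26]) {
    return ctime_r(&t, buf) != NULL;
}
```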
Again, most C library
functions are fine as originally set up: printf, strcpy, malloc, … You can see
none of these holds internal data past one call. “man attributes” lists
only 10 library calls that are thread-unsafe.
The errno problem—C’s one
and only visible global variable (Tan., p. 114)
Luckily, the C lib mostly
uses an API rather than shared memory variables to do its services. errno
is one of the very few exceptions: an int variable (as
originally defined) that the C library fills in with the error number when a
system call fails. So you can consult errno to find out the error
code for your program’s last failed system call.
Example of use of errno, from a CMU page with other examples:
errno = 0;
fileptr = fopen(filename, "rb");
if (fileptr == NULL) {
    /* An error occurred in fopen(), now it's valid to examine errno */
    if (errno != 0) {
        /* handle error */
    }
}
Obviously two threads doing
concurrent system calls would both try to fill in the same variable if it stays
a simple global variable. The trick that’s actually in use on Solaris is
to define a macro like this in errno.h:
#ifdef BLAHBLAHBLAH
#define errno (*(___errno()))    /* <-- thread-safe way */
#else
extern int errno;                /* <-- old single-threaded way */
#endif
Then when you write if
(errno == EINTR) ..., you’re really calling this ___errno function to get the
errno for your thread. It works even when you write
errno = 0; /* reinitialize errno */
That’s why the underlying
function ___errno() returns a pointer to the thread-private cell for
errno, not the value. This function has to determine the thread id (via a
system call) and then find the errno spot for that thread.
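The mechanism can be sketched in miniature (hypothetical my_errno names, to avoid colliding with the real errno; modern compilers provide __thread storage, so this sketch needs no thread-id system call):

```c
/* one private int cell per thread, courtesy of __thread storage */
static __thread int my_errno_cell;

int *my_errno_ptr(void) {          /* plays the role of ___errno() */
    return &my_errno_cell;
}

/* every use of my_errno becomes a dereference of this thread's cell */
#define my_errno (*(my_errno_ptr()))
```

Both my_errno = 0; and if (my_errno == EINTR) work, because the macro expands to a dereferenced pointer, which is a legal assignment target.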
Note that Tan. discusses
errno on pg. 114, but doesn’t give this practical solution, available on
Solaris UNIX.
Good question: what’s the
story in Linux??