CS444 Class 11
Handout: User-kernel
transitions, with stacks shown
Reading: Chap 2, Sections
2.1, 2.2: all to pg. 106, skip 2.2.4, read 109-110, skip 2.2.6, .7, .8. read
114-end of 2.2.
Start reading 2.3.
Hw2 points:
Please use specified
filenames, so our collection scripts will work!
The compile/assemble step produces object files such as prog.opc (prog.o on UNIX), which contain machine code for the program but unresolved addresses for external variables and function entry points.
The final link ("load") step resolves those undefined addresses as it places the modules at various non-overlapping memory locations. The result is the executable file.
That's how one file's code can call another's, even across C and assembler in either direction.
Hw2 System call write example:
user program calls write in
ulib.s
write does the syscall
instruction (int $0x80)
----user-kernel
transition----
syscall.s executes, calls
syscallc
syscallc executes, “dispatches”
(calls) syswrite (old write, renamed)
syswrite calls ttywrite
ttywrite executes as in hw1
Return, return, return, back
to syscall.s
iret
----kernel-user transition----
Back to next instruction
after syscall instruction
Return to user code that
called write
Exit execution:
user code, ulib, syscall, sysentry, syscallc, sysexit, where the system is
brought down.
Final step is getting back to
Tutor. We need to execute the breakpoint instruction, int $3. Easiest way
is to add a tag to the int $3 in startup0.s, like "_finish: int
$3". Needs .globl too. Then from C,
call _finish by doing “finish();” It never
returns.
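A sketch of the two pieces, using the label name from the notes (the surrounding startup0.s code is elided):

```
# in startup0.s: make the breakpoint reachable from C
	.globl _finish
_finish:
	int $3			# breakpoint: back to Tutor, never returns
```

On this toolchain the C call finish() links against the assembler label _finish, since C symbols get a leading underscore.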
See handout, add arrows to show transitions (like a bouncing ball)
First consider a
single-threaded, or traditional, UNIX or Windows process. It has one user stack
in user space, and one kernel stack in kernel space, as shown on the handout.
The user stack shows up all
by itself, because there is only the one user stack in a single-threaded user
image. Other processes have their own
user images, also with one user stack if they are single-threaded. But the kernel image is shared, so all the
kernel stacks must be at various different addresses in the kernel data area.
Just like there has to be a
separate place for each process to hold its set of saved registers (in its
process table entry), each process also needs its own kernel stack, to work as
its execution stack when it is executing in the kernel.
For example, if a process is
doing a read syscall, it is executing the kernel code for read, and needs a
stack to do this. It could block on user input, and give up the CPU, but
that whole execution environment held on the stack (and in the saved CPU state
in the process table entry) has to be saved for its later use. Another
process could run meanwhile and do its own syscall, and then it needs its own
kernel stack, separate from that blocked reader’s stack, to support its own
kernel execution. When a process is unblocked, it starts again using the stack
where it left off when blocked.
Similarly if preempted.
Since threads can also do
system calls, each needs a kernel stack as well.
In Linux, the process/thread table
entry and kernel stack are bundled up in one block of memory for each
thread. Other OS’s organize the memory differently, but still have both
of these for each process/thread.
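A simplified sketch of that Linux layout (the field names and the 8KB size here are illustrative, not the real kernel definitions):

```c
#include <stddef.h>

#define KSTACK_BYTES 8192       /* classic two-page kernel stack */

struct thread_info {            /* hypothetical, heavily simplified */
    int pid;
    int state;
    void *saved_sp;             /* kernel stack pointer while not running */
};

/* One memory block per thread: bookkeeping at the low end, kernel
   stack growing down from the high end of the same block. */
union thread_block {
    struct thread_info info;
    unsigned char stack[KSTACK_BYTES];
};
```

A nice side effect of this bundling: masking the current kernel stack pointer down to the block boundary yields the address of the running thread's bookkeeping.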
Sometimes the kernel stack is completely empty,
notably when the process is executing user code. Then when it does a
system call, the kernel stack starts growing, and later shrinking back to
nothing at the system call return.
The kernel stack is also used
for interrupt handler execution, for the interrupts that occur while a
particular thread is running. As we have talked about already, the
interrupts are almost always doing something for another, blocked
process/thread. After all, that process is blocked waiting for something
to happen, and all the hardware happenings are signaled by interrupts.
So interrupt handlers
“borrow” the current process’s kernel stack to do their own execution.
When they finish, the kernel stack is back to its previous state, empty if the
current process is running at user level or non-empty if it was running in some
system call. Note that interrupt
handlers are not themselves allowed to block, so their execution is not
delayed that way. They can be involved in process switches, as shown by
step 8 above, but only just as they are returning, not in the middle of their
work, so their changes to system state are complete.
Kernel code types: Interrupt handler code vs. system
call code
We are beginning to see that
system call code and interrupt handler code are somewhat differently handled
kinds of kernel code. System call code is “normal” kernel code, allowed to
block as needed. Interrupt handler code
is special, not allowed to block, so it runs very fast and completes its work,
leaving the stack (almost) the way it found it. The “(almost)” comes from the
fact that this process can be preempted at the very end of the interrupt
handler, leaving a little runt contribution from the interrupt handler on the
top of the stack of the preempted process/thread. But this little stack part is
cleared naturally when the preempted process is rescheduled.
So far, we have been mainly
considering single-threaded processes, with one program counter saying where
in the code the CPU is now executing, and one stack. Some programs have
multiple ongoing activities that can benefit from multiple threads.
Example: mtip rewritten
with threads: recall that mtip has two processes, one running keymon
(monitoring the keyboard) and one running linemon (monitoring the serial line).
The more modern approach
would be to use an explicit thread_create to make a second thread for linemon,
leaving the original thread to run keymon:
thread_create(linemon,
...);
Then we end up with two threads, with two stacks and two relevant PCs, one in keymon and one in linemon at each point in time, all in one address space. This is a higher-performance implementation, because it avoids a lot of address-space switches needed between the two processes.
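With POSIX threads standing in for the thread_create above, the sketch looks like this (linemon's body is a placeholder for mtip's real line-monitor loop):

```c
#include <pthread.h>
#include <stddef.h>

/* placeholder for mtip's line monitor: copy serial-line input to screen */
static void *linemon(void *arg) {
    /* ... loop reading the line, writing the screen ... */
    return NULL;
}

/* The original thread creates the second thread, then goes on to run
   the keymon loop itself.  Returns 0 on success, like pthread_create. */
int start_linemon(pthread_t *t) {
    return pthread_create(t, NULL, linemon, NULL);
}
```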
Example: Tan, pg. 98-99
multithreaded web server. Look
at code at end of handout. This is a good use of threads to allow new
requests to come in and be served quickly while one thread takes some time with
a long-winded request. Note that this works fine on a uniprocessor, as
well as on a multiprocessor. Here threads are being used to run
concurrent activities that are not CPU-intensive, so one CPU can do a lot of
simultaneous requests. There are a lot of missing details here—we will
later study semaphores, etc., that can help flesh this out.
This pseudocode is easily
modified to design other servers that handle multiple requests involving
delay. Another example is a database server, which maintains a thread
pool and a dispatcher to assign a query to a certain worker. The worker
then compiles the query in the query optimizer, and then executes the resulting
query plan in the storage engine.
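The dispatcher/worker hand-off can be sketched with a shared request queue (pthreads; the names, queue size, and int-valued requests are invented for illustration, and this is exactly the mutex machinery we will study properly later):

```c
#include <pthread.h>

#define QSIZE 16                /* no overflow check, for brevity */

static int queue[QSIZE];
static int head, tail, count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

/* dispatcher side: drop a request in the queue and wake a worker */
void dispatch(int request) {
    pthread_mutex_lock(&lock);
    queue[tail] = request;
    tail = (tail + 1) % QSIZE;
    count++;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
}

/* worker side: block until a request arrives, then return it */
int get_request(void) {
    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&nonempty, &lock);
    int r = queue[head];
    head = (head + 1) % QSIZE;
    count--;
    pthread_mutex_unlock(&lock);
    return r;
}
```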
Another use for threads is
for parallel programming. In
parallel programming, we are actively using multiple CPUs to do a huge job.
Example: ray-tracing for
movies. Each ray can be separately computed, so this is perfectly
parallelizable. There are millions of rays to compute for each frame, so
there is a huge amount of work. We could use 8-CPU systems and compute 8
times faster. Here we would want to use exactly 8 threads.
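A minimal shape for that kind of parallel loop, assuming each ray is independent (compute_ray is a stand-in for the real per-ray work):

```c
#include <pthread.h>

#define NTHREADS 8
#define NRAYS 800               /* really millions per frame */

static double result[NRAYS];

static double compute_ray(int i) { return i * 0.5; }  /* stand-in work */

struct range { int lo, hi; };

/* each thread computes its own slice of the rays */
static void *worker(void *arg) {
    struct range *r = arg;
    for (int i = r->lo; i < r->hi; i++)
        result[i] = compute_ray(i);
    return NULL;
}

int render_frame(void) {
    pthread_t tids[NTHREADS];
    struct range ranges[NTHREADS];
    int chunk = NRAYS / NTHREADS;
    for (int t = 0; t < NTHREADS; t++) {
        ranges[t].lo = t * chunk;
        ranges[t].hi = (t == NTHREADS - 1) ? NRAYS : (t + 1) * chunk;
        if (pthread_create(&tids[t], NULL, worker, &ranges[t]) != 0)
            return -1;
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tids[t], NULL);
    return 0;
}
```

On an 8-CPU system the eight workers can run truly simultaneously; on a uniprocessor this buys nothing, since the work is CPU-bound.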
Programming with threads—options
Warning: if the activities of
the threads interact, i.e., share data, then there can be critical sections
needing mutex. More on this soon.
P. 99: a competing method for
doing some multi-activity programming: use non-blocking i/o, AKA
asynchronous i/o. Here the i/o system calls return immediately,
having only dropped off an i/o request with the OS. The app has to either
check later on the status, using another syscall, or in UNIX, can be notified
by signals. This is used, for example, for writing disk blocks in a database
server. There may be 50 concurrent block writes going on, so with a
thread approach, we would need 50 threads to handle it. But it can all be
done by one thread using non-blocking i/o. However, there are not a lot
of these examples that I’ve seen; the only place I’ve seen this is in
database server code. Threads are the workhorses for multi-activity
programming in a process. You can ignore non-blocking i/o for this course.
2.2.4 pg 106 User-level
threads: This boils down to
writing a (user-level) scheduler inside one thread to implement multiple
threads within it. However, a user-level scheduler cannot get hooks into
the int handlers, so it can’t do preemption. Also, if one of these
threads blocks, they all block. Today’s OS threads are high enough
performance (thanks to the multithreaded web server’s needs and resulting
vendor competition) that we don’t have to go to this extreme. Let’s
ignore user-level threads in our coverage and assume that all threads we talk
about are kernel-supported.
2.2.5, pg. 109 Kernel threads—i.e. “real” threads, that can be preempted and can do
concurrent system calls. This is important.
When I say thread, I mean kernel thread.
2.2.6, 2.2.7, 2.2.8 pg
110-111: “Scheduler activations”—research topic, can skip. “Pop-up
threads”—skip this
Multi-threaded coding is
tricky. Not only can you easily code your own race conditions,
(bugs related to lack of mutex protection of shared variables) but your
libraries can cause thread-related problems. You need to be sure to use
“thread-safe” libraries. The UNIX man pages have info on this, as
“attributes” of various calls, so “man attributes” explains the categories and
lists examples of non-safe calls.
Examples of
non-thread-safe C lib calls: ctime, rand, strtok. Each of these
uses data that can be shared between threads. There are thread-safe
variants of these called ctime_r, rand_r, and strtok_r, that have additional
arguments for the caller to hand over private memory for the call to use.
Note: we have to dig this far
into the C lib to find non-thread-safe calls. All the common calls are
thread-safe: printf, strcpy, malloc, and so on. The C library is
amazingly well suited to multithreading, given that it was designed well
before threads were introduced.
If you do “man ctime” you
will see both versions:
char *ctime(const time_t
*clock);
char *ctime_r(const
time_t *clock, char *buf, int buflen);
ctime accepts a time_t value
and returns a date-and-time string like “Sun Sep 16 01:03:52 1973\n”.
Question: Where is the memory
buffer holding this string?
--It can’t be on the stack
(here we mean the current thread’s stack), because the ctime code can only put
temporary data on the stack and such data is popped off by the return from
ctime.
--It can’t be malloc’d
because there’s nothing in the man page saying you have to do free(ptr), and
otherwise there would be a memory leak.
So the conclusion is that it
must be in static data owned by the C library. (We know the C library has
other such data, such as the FILE structs for fopen’d files.)
That means ctime composes the
string for you in its own buffer, and passes back the pointer to it for your
access. If two threads use ctime, the internal buffer will get
overwritten, possibly while the first thread is still accessing its copy.
With ctime_r, each thread passes in its own string buffer, so there is no
problem (unless this buffer is the same for both threads, but that’s the
caller’s fault!)
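A sketch of the safe variant in use. (Note: on Linux and in POSIX, ctime_r takes just two arguments, without the buflen shown in the Solaris prototype above; the buffer must hold at least 26 bytes.)

```c
#include <time.h>
#include <stddef.h>

/* Each thread hands ctime_r its own buffer, so nothing is shared.
   Returns 1 on success, 0 on failure. */
int format_time(time_t t, char buf[26]) {
    return ctime_r(&t, buf) != NULL;
}
```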
Again, most C library
functions are fine as originally set up: printf, strcpy, malloc, … You can see
none of these holds internal data past one call. “man attributes” lists
only 10 library calls that are thread-unsafe.
The errno problem—C’s one
and only visible global variable (Tan., p. 114)
Luckily, the C lib mostly
uses an API rather than shared memory variables to do its services. errno
is one of the very few exceptions: an int variable (as
originally defined) that the C library fills in with the error number when a
system call fails. So you can consult errno to find out the error
code for your program’s last failed system call.
Example of use of errno, from a CMU page with other examples:
errno = 0;
fileptr = fopen(filename, "rb");
if (fileptr == NULL) {
    /* An error occurred in fopen(), now it's valid to examine errno */
    if (errno != 0) {
        /* handle error */
    }
}
Obviously two threads doing
concurrent system calls would both try to fill in the same variable if it stays
a simple global variable. The trick that’s actually in use on Solaris is
to define a macro like this in errno.h:
#ifdef BLAHBLAHBLAH
#define errno (*(___errno()))    /* <-- thread-safe way */
#else
extern int errno;                /* <-- old single-threaded way */
#endif
Then when you write if
(errno == EINTR) ..., you’re really calling this ___errno function to get the
errno for your thread. It works even when you write
errno = 0; /* reinitialize errno */
That’s why the underlying
function ___errno() returns a pointer to the thread-private cell for
errno, not the value. This function has to determine the thread id (via a
system call) and then find the errno spot for that thread.
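The mechanism can be sketched in miniature (hypothetical my_errno names, to avoid colliding with the real errno; modern compilers provide __thread storage, so this sketch needs no thread-id system call):

```c
/* one private int cell per thread, courtesy of __thread storage */
static __thread int my_errno_cell;

int *my_errno_ptr(void) {          /* plays the role of ___errno() */
    return &my_errno_cell;
}

/* every use of my_errno becomes a dereference of this thread's cell */
#define my_errno (*(my_errno_ptr()))
```

Both my_errno = 0; and if (my_errno == EINTR) work, because the macro expands to a dereferenced pointer, which is a legal assignment target.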
Note that Tan. discusses
errno on pg. 114, but doesn’t give this practical solution, available on
Solaris UNIX.
Good question: what’s the
story in Linux??