CS444 Class 12
Handout: User-kernel
transitions, with stacks shown
Reading: Chap 2, Sections
2.1, 2.2: all to pg. 106; skip 2.2.4; read 109-110; skip 2.2.6, 2.2.7, 2.2.8; read
114 to end of 2.2.
Start reading 2.3.
Hw2 points:
The compile/assemble step
produces object files like prog.opc (prog.o on UNIX), which have machine code
for the program but unresolved addresses for external variables and function
entry points.
The final load step resolves
the undefined addresses as it places the modules in various non-overlapping
memory locations. The result is the executable file.
That's how one file's code
can call another's, even from C to assembler or vice versa.
Exit execution path: user code, then ulib's syscall, then sysentry, syscallc,
and finally sysexit, where the system is brought down.
Final step is getting back to
Tutor. We need to execute the breakpoint instruction, int $3. The easiest way
is to add a tag to the int $3 in startup0.s, like "_finish: int $3" (it needs a
.globl too). Then from C, call _finish by doing "finish();". It never returns.
Android Note: Each app
execution on Android gets its own Linux process, and thus its own virtual
machine, and inside that, a Java virtual machine if it’s a Java app. So it’s doubly
sandboxed. At the OS level, the Java system does Linux system calls for all its
system services.
Q: Suppose I’m running an
online banking app, and a downloaded game, back and forth, so they are both
active in memory. Can I be sure the
game, which might be a virus, cannot access the banking app’s memory and get my
bank password or other sensitive info?
A: Yes, each is in its own
process, and thus “bottled up” in its own virtual machine, with no access to
other memory.
Note we have not yet studied everything
relevant here. What if the banking app writes the password to persistent
(flash) memory? Can the virus see that?
No, but how this is prevented is a topic we’ve not yet studied.
See handout, add the bouncing ball from the first handout.
First consider a
single-threaded, or traditional, UNIX or Windows process. It has one user stack
in user space, and one kernel stack in kernel space, as shown on the handout.
The user stack shows up all
by itself, because there is only the one user stack in a single-threaded user
image. Other processes have their own
user images, also with one user stack if they are single-threaded. But the kernel image is shared, so all the
kernel stacks must be at various different addresses in the kernel data area.
Just like there has to be a
separate place for each process to hold its set of saved registers (in its
process table entry), each process also needs its own kernel stack, to work as
its execution stack when it is executing in the kernel.
For example, if a process is
doing a read syscall, it is executing the kernel code for read, and needs a
stack to do this. It could block on user input, and give up the CPU, but
that whole execution environment held on the stack (and in the saved CPU state
in the process table entry) has to be saved for its later use. Another
process could run meanwhile and do its own syscall, and then it needs its own
kernel stack, separate from that blocked reader’s stack, to support its own
kernel execution. When a process is unblocked, it starts again using the stack
where it left off when blocked.
Similarly if preempted.
Since threads can also do
system calls, each needs a kernel stack as well.
In Linux, the process/thread
table entry and kernel stack are bundled up in one block of memory for each
thread. Other OS’s organize the memory differently, but still have both
of these for each process/thread.
Sometimes the kernel stack is completely empty,
notably when the process is executing user code. Then when it does a
system call, the kernel stack starts growing, and later shrinking back to
nothing at the system call return.
The kernel stack is also used
for interrupt handler execution, for the interrupts that occur while a
particular thread is running. As we have talked about already, the
interrupts are almost always doing something for another, blocked
process/thread. After all, that process is blocked waiting for something
to happen, and all the hardware happenings are signaled by interrupts.
So interrupt handlers
“borrow” the current process’s kernel stack to do their own execution.
When they finish, the kernel stack is back to its previous state, empty if the
current process is running at user level or non-empty if it was running in some
system call. Note that interrupt
handlers are not themselves allowed to block, so their execution is not
delayed that way. They can be involved in process switches, as shown by
step 8 above, but only just as they are returning, not in the middle of their
work, so their changes to system state are complete.
Kernel code types: Interrupt handler code vs. system
call code
We are beginning to see that
system call code and interrupt handler code are somewhat differently handled
kinds of kernel code. System call code is “normal” kernel code, allowed to block
as needed. Interrupt handler code is
special, not allowed to block, so it runs very fast and completes its work,
leaving the stack (almost) the way it found it. The “(almost)” comes from the fact
that this process can be preempted at the very end of the interrupt handler,
leaving a little runt contribution from the interrupt handler on the top of the
stack of the preempted process/thread. But this little stack part is cleared
naturally when the preempted process is rescheduled.
So far, we have been mainly
considering single-threaded processes, with one program counter saying
where in the code the CPU is now executing, and one stack. Some programs
have multiple ongoing activities that can benefit from multiple threads.
Example: mtip rewritten
with threads. Recall that mtip
has two processes: one running keymon (watching the keyboard) and one running
linemon (watching the serial line). The more modern approach
would be to use an explicit thread_create to make a second thread for linemon,
leaving the original thread to run keymon:
thread_create(linemon,
...);
Then we end up with two threads, with two stacks and two relevant PCs, one in keymon and one in linemon at each point in time, all in one address space. This is a higher-performance implementation, because it avoids a lot of address-space switches needed between the two processes.
Example: Tan, pg. 98-99
multithreaded web server. Look
at code. This is a good use of threads to allow new requests to come in
and be served quickly while one thread takes some time with a long-winded
request. Note that this works fine on a uniprocessor, as well as on a
multiprocessor. Here threads are being used to run concurrent activities
that are not CPU-intensive, so one CPU can do a lot of simultaneous
requests. There are a lot of missing details here—we will later study
semaphores, etc., that can help flesh this out.
This pseudocode is easily
modified to design other servers that handle multiple requests involving
delay. Another example is a database server, which maintains a thread
pool and a dispatcher to assign a query to a certain worker. The worker
then compiles the query in the query optimizer, and then executes the resulting
query plan in the storage engine.
Another use for threads is
for parallel programming. In
parallel programming, we are actively using multiple CPUs to do a huge job.
Example: ray-tracing for
movies. Each ray can be separately computed, so this is perfectly
parallelizable. There are millions of rays to compute for each frame, so
there is a huge amount of work. We could use 8-CPU systems and compute 8
times faster. Here we would want to use exactly 8 threads.
Programming with threads—options
Warning: if the activities of
the threads interact, i.e., share data, then there can be critical sections
needing mutex. More on this soon.
P. 99: a competing method for
doing some multi-activity programming: use non-blocking i/o, AKA
asynchronous i/o. Here the i/o system calls return immediately,
having only dropped off an i/o request with the OS. The app has to either
check later on the status, using another syscall, or in UNIX, can be notified
by signals. This is used, for example, for writing disk blocks in a database server.
There may be 50 concurrent block writes going on, so with a thread approach, we
would need 50 threads to handle it. But it can all be done by one thread
using non-blocking i/o. However, there are not a lot of these examples
that I've seen. The only place I've seen this is in database server
code. Threads are the work horses for multi-activity programming in a
process. You can ignore non-blocking i/o for this course.
2.2.4 pg 106 User-level
threads: This boils down to
writing a (user-level) scheduler inside one thread to implement multiple
threads within it. However, a user-level scheduler cannot get hooks into
the int handlers, so it can’t do preemption. Also, if one of these
threads blocks, they all block. Today’s OS threads are high enough
performance (thanks to the multithreaded web server’s needs and resulting
vendor competition) that we don’t have to go to this extreme. Let’s
ignore user-level threads in our coverage and assume that all threads we talk
about are kernel-supported.
2.2.5, pg. 109 Kernel threads—i.e. “real” threads, that can be preempted and can do
concurrent system calls. This is important.
When I say thread, I mean kernel thread.
2.2.6, 2.2.7, 2.2.8 pg
110-111: "Scheduler activations"—research topic, can skip. "Pop-up
threads"—skip this too.
Multi-threaded coding is
tricky. Not only can you easily code your own race conditions,
(bugs related to lack of mutex protection of shared variables) but your
libraries can cause thread-related problems. You need to be sure to use
“thread-safe” libraries. The UNIX man pages have info on this, as
“attributes” of various calls, so “man attributes” explains the categories and
lists examples of non-safe calls.
Examples of
non-thread-safe C lib calls: ctime, rand, strtok. Each of these
uses data that can be shared between threads. There are thread-safe
variants of these called ctime_r, rand_r, and strtok_r, that have additional
arguments for the caller to hand over private memory for the call to use.
Note: we have to dig this far
into the C lib to find non-thread-safe calls. All the common calls are
thread-safe: printf, strcpy, malloc, and so on. The C library is
amazingly well designed for multithreading, given that it was designed well
before threads came into common use.
If you do “man ctime” you
will see both versions:
char *ctime(const time_t *clock);
char *ctime_r(const time_t *clock, char *buf, int buflen);
(Those are the Solaris prototypes; the POSIX version of ctime_r has no buflen
argument.) ctime accepts a time_t value
and returns a string like "Wed Jun 30 21:49:08 1993\n".
Question: Where is the memory
buffer holding this string?
--It can’t be on the stack
(here we mean the current thread’s stack), because the ctime code can only put
temporary data on the stack and such data is popped off by the return from
ctime.
--It can’t be malloc’d because
there’s nothing in the man page saying you have to do free(ptr), and otherwise
there would be a memory leak.
So the conclusion is that it
must be in static data owned by the C library. (We know the C library has
other such data, such as the FILE structs for fopen’d files.)
That means ctime composes the
string for you in its own buffer, and passes back the pointer to it for your
access. If two threads use ctime, the internal buffer will get
overwritten, possibly while the first thread is still accessing its copy.
With ctime_r, each thread passes in its own string buffer, so there is no
problem (unless this buffer is the same for both threads, but that’s the
caller’s fault!)
Again, most C library
functions are fine as originally set up: printf, strcpy, malloc, … You can see
none of these holds internal data past one call. “man attributes” lists
only 10 library calls that are thread-unsafe.
The errno problem—C’s one
and only visible global variable (Tan., p. 114)
Luckily, the C lib mostly
uses an API rather than shared memory variables to do its services. errno
is one of the very few exceptions. errno is an int variable (as
originally defined) that gets filled in by the C library at each system call
with the error number. So you can consult errno to find out the error
code for your program’s last system call.
Example of use of errno (from a CMU page with other examples):
errno = 0;
fileptr = fopen(filename, "rb");
if (fileptr == NULL) {
    /* An error occurred in fopen(), now it's valid to examine errno */
    if (errno != 0) {
        /* handle error */
    }
}
Obviously two threads doing
concurrent system calls would both try to fill in the same variable if it stays
a simple global variable. The trick that’s actually in use on Solaris is
to define a macro like this in errno.h:
#ifdef BLAHBLAHBLAH
#define errno (*(___errno()))   /* <-- thread-safe way */
#else
extern int errno;               /* <-- old single-threaded way */
#endif
Then when you write if
(errno == EINTR) ..., you’re really calling this ___errno function to get the
errno for your thread. It works even when you write
errno = 0; /* reinitialize errno */
That’s why the underlying
function __errno() returns a pointer to the thread-private cell for
errno, not the value. This function has to determine the thread id (via a
system call) and then find its errno spot for that thread.
Note that Tan. discusses
errno on pg. 114, but doesn’t give this practical solution, available on
Solaris UNIX.
Good question: what’s the
story in Linux??