Tues, Oct. 10

HW2 points:

The compile/assemble step produces object files like prog.opc (prog.o on UNIX), which contain machine code for the program but unresolved addresses for external variables and function entry points.

The final load step resolves the undefined addresses as it places the modules in various non-overlapping  memory locations.  The result is the executable file.  

That's how one file's code can call another's, even across C to assembler or vice versa.

Exit execution path: user code, ulib, syscall, sysentry, syscallc, sysexit, where the system is brought down.

The final step is getting back to Tutor.  We need to execute the breakpoint instruction, int $3.  The easiest way is to add a label to the int $3 in startup0.s, like "_finish: int $3".

Example: Tan, pg. 88 multithreaded web server.  Look at the code.  This is a good use of threads to allow new requests to come in and be served quickly while one thread takes some time with a long-winded request.  Note that this works fine on a uniprocessor, as well as on a multiprocessor.  Here threads are being used to run concurrent activities that are not CPU-intensive, so one CPU can serve many simultaneous requests.  There are a lot of missing details here—we will later study semaphores, etc., that can help flesh this out.

 

This pseudocode is easily modified to design other servers that handle multiple requests involving delay.  Another example is a database server, which maintains a thread pool and a dispatcher to assign a query to a certain worker.  The worker then compiles the query in the query optimizer, and then executes the resulting query plan in the storage engine.

 

Another use for threads is for parallel programming.  In parallel programming, we are actively using multiple CPUs to do a huge job.

Example: ray-tracing for movies.  Each ray can be separately computed, so this is perfectly parallelizable.  There are millions of rays to compute for each frame, so there is a huge amount of work.  We could use 8-CPU systems and compute 8 times faster.  Here we would want to use exactly 8 threads.

 

Programming with threads—options

 

Warning: if the activities of the threads interact, i.e., share data, then there can be critical sections needing mutex.  More on this soon.

 

P. 89: a competing method for doing some multi-activity programming: use non-blocking i/o, AKA asynchronous i/o.  Here the i/o system calls return immediately, having only dropped off an i/o request with the OS.  The app then has to either check the status later, using another syscall, or in UNIX, can be notified by signals.  This is used, for example, for writing disk blocks in a database server.  There may be 50 concurrent block writes going on, so with a thread approach, we would need 50 threads to handle them.  But it can all be done by one thread using non-blocking i/o.  However, I haven't seen many examples of this; the only place I've seen it is in database server code.  Threads are the workhorses for multi-activity programming in a process.

 

User-level threads:  This boils down to writing a (user-level) scheduler inside one thread to implement multiple threads within it.  However, a user-level scheduler cannot get hooks into the interrupt handlers, so it can’t do preemption.  Also, if one of these threads blocks, they all block.  Today’s OS threads are high enough performance (thanks to the multithreaded web server’s needs and resulting vendor competition) that we don’t have to go to this extreme.  Let’s ignore user-level threads in our coverage and assume that all threads we talk about are kernel-supported.

 

Kernel threads—i.e., “real” threads that can be preempted and can do concurrent system calls.

 

pg 94: “Scheduler activations”—research topic, can skip.

 

pg. 96: “Pop-up threads”—skip this

 

Multi-threaded coding is tricky.  Not only can you easily code your own race conditions (bugs caused by a lack of mutex protection of shared variables), but your libraries can cause thread-related problems.  You need to be sure to use “thread-safe” libraries.  The UNIX man pages have info on this, as “attributes” of various calls, so “man attributes” explains the categories and lists examples of non-safe calls.

 

Examples of non-thread-safe C lib calls:  ctime, rand, strtok.  Each of these uses data that can be shared between threads.  There are thread-safe variants of these, called ctime_r, rand_r, and strtok_r, that take additional arguments for the caller to hand over private memory for the call to use.

 

Note: we have to dig this far into the C lib to find non-thread-safe calls.  All the common calls are thread-safe: printf, strcpy, malloc, and so on.  The C library is amazingly well designed for multithreading given that it was designed well before threads were introduced.

 

If you do “man ctime” you will see both versions:

 

char *ctime(const time_t *clock);

char *ctime_r(const time_t *clock, char *buf, int buflen);

 

ctime accepts a time_t value and returns a string like “Fri Sep 13 00:00:00 1986\n” (example from the man page).  Actually, it returns a pointer to this string.

 

Question: Where is the memory buffer holding this string?

--It can’t be on the stack (here we mean the current thread’s stack), because the ctime code can only put temporary data on the stack and such data is popped off by the return from ctime.

--It can’t be malloc’d because there’s nothing in the man page saying you have to do free(ptr), and otherwise there would be a memory leak.

So the conclusion is that it must be in static data owned by the C library.  (We know the C library has other such data, such as the FILE structs for fopen’d files.)

 

That means ctime composes the string for you in its own buffer, and passes back the pointer to it for your access.  If two threads use ctime, the internal buffer will get overwritten, possibly while the first thread is still accessing its copy.  With ctime_r, each thread passes in its own string buffer, so there is no problem (unless this buffer is the same for both threads, but that’s the caller’s fault!)

 

Again, most C library functions are fine as originally set up: printf, strcpy, malloc, … You can see that none of these holds internal data past one call.  “man attributes” lists only 10 library calls that are thread-unsafe.

 

The errno problem—C’s one and only visible global variable

Luckily, the C lib mostly uses a pure API rather than memory variables to do its services.  (A “pure” API can still have thread-safety problems as discussed above; ctime is not “impure” in this sense, since it does all its work via a function call.)  errno is one of the very few exceptions.  errno is an int variable (as originally defined) that gets filled in by the C library at each system call with the error number.  So you can consult errno to find out the error code for your program’s last system call.

 

Obviously two threads doing concurrent system calls would both try to fill in the same variable if it stays a simple global variable.  The trick that’s actually in use on Solaris is to define a macro like this in errno.h: to be continued...