Mon., Sept. 16: Programming the Hardware: Using I/O Ports

Reading: Tanenbaum

Previously assigned: Chap. 1, specifically Sec. 1.1, [1.2 optional history], 1.4, 1.5, 1.6, 1.8

Now add Chap. 5, 5.1, 5.2, 5.3 I/O systems

News: 64-bit Android coming! Thanks to Nick Rosato for noting this.

C “Objects”, a quick intro/review.

The Queue implementation in C is provided in the queue subdirectory, along with a unit test, testqueue.c. The header queue.h defines the Queue API and the Queue type.

You know how to do objects in Java, what about in C? Recall the Chip object in CS341. Also the Cmd object for tutor.

Example: Rectangle objects

struct rect {

int x1, y1, x2, y2;

};

struct rect *recp;

recp->x1 = 10;

Watch out! There’s no (validly allocated) memory pointed to by recp. This is a garbage pointer, a pointer that points nowhere good. Using it usually causes segmentation violations on UNIX and similar exceptions on Windows. But on the SAPC, all user memory is writable, so the store may quietly succeed and damage some memory.

Normally under UNIX or Windows, we could malloc space for the Rectangle, but we have no malloc in the SAPC library. Anyway, malloc is quite expensive in CPU, and we can easily avoid using it. Just set up whole objects like this:

struct rect rect;

Now we can put

rect.x1 = 10;

As a Java programmer, you’re used to “Rectangle rect;” meaning rect is a ref to a Rectangle, but in C, rect is the whole object, more like int x; in C or Java.

To make a rectangle pointer, we have to use * in the type:

“struct rect *rp;” defines rp to be a struct rect pointer.
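Here is a minimal sketch of the safe pattern: give the rectangle its own memory first, then point at it. The helper name rect_via_pointer is made up for illustration.

```c
struct rect {
    int x1, y1, x2, y2;
};

/* Hypothetical helper: writes through a pointer that refers to real memory
 * and returns the value read back from the object itself. */
int rect_via_pointer(void)
{
    struct rect r = {0, 0, 0, 0};  /* whole object: memory is set up here */
    struct rect *rp = &r;          /* rp points at valid memory */

    rp->x1 = 10;                   /* same effect as r.x1 = 10 */
    return r.x1;
}
```

Unlike the recp example, rp here holds the address of an object that really exists, so the store through it is safe.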

Living without malloc—use C’s ability to place objects in memory directly--

struct rect {

int x1, y1, x2, y2;

};

struct rect r; /* this sets up memory for r “right here” */

r.x1 = 10;

Similarly, we can set up larger objects that contain rectangles:

struct view {

struct rect first, last;

};

struct view v;

v.first.x1 = 10; /* set coord of v’s first rect */

v.first = r; /* set whole rect by struct copy in C */
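A small sketch of what struct copy means in practice: the assignment copies all the members, so the copy is independent of the original afterwards. The helper name copy_then_modify is hypothetical.

```c
struct rect {
    int x1, y1, x2, y2;
};

struct view {
    struct rect first, last;
};

/* Hypothetical helper: copies r into v.first, then changes r.
 * Returns v.first.x1 to show the copy is independent of r. */
int copy_then_modify(void)
{
    struct rect r = {10, 20, 30, 40};
    struct view v;

    v.first = r;     /* struct copy: all four ints are copied */
    v.last = r;
    r.x1 = 99;       /* changing r afterwards does not touch v.first */
    return v.first.x1;
}
```

This is the opposite of Java, where assigning one Rectangle variable to another only copies the reference.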

With typedef we can drop “struct” in the type name:

typedef struct rect {

int x1, y1, x2, y2;

} Rectangle;

struct rect r; /* one way */

Rectangle r1; /*Nice looking, but different from Java! */

struct view {

Rectangle first, last;

};

Note this typedef syntax used for the supplied Queue in hw1.

From testqueue.c:

Queue q1obj,q2obj; /* gives memory to Queue objects */

…

Queue *q1 = &q1obj; /* pointers to queue mem objects */

Queue *q2 = &q2obj; /* Now use ptrs like Java refs */

…

enqueue(q1, c); /* enqueue a char c */

Setting up whole objects like this may seem strange after using Java a long time, but it’s good to really know C too!

There is more we should cover on this topic. We have no encapsulation here, so calling this an “object” is hard to defend to object-oriented people. It’s an object in a practical sense that it brings together related data, and has an API that describes what can be done with the data—see queue.h.
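To make "data plus an API" concrete, here is a rough sketch of a fixed-size character queue in this style. It is NOT the hw1 queue.h: the type DemoQueue, the functions demo_init, demo_enqueue, and demo_dequeue, and the capacity are all invented for illustration, so check the real queue.h for the actual API.

```c
#define DEMO_QMAX 4   /* hypothetical capacity, kept tiny for the demo */

typedef struct demo_queue {
    char buf[DEMO_QMAX];
    int head;    /* index of the oldest char */
    int count;   /* number of chars currently stored */
} DemoQueue;

void demo_init(DemoQueue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Returns 0 on success, -1 if the queue is full. */
int demo_enqueue(DemoQueue *q, char c)
{
    if (q->count == DEMO_QMAX)
        return -1;
    q->buf[(q->head + q->count) % DEMO_QMAX] = c;
    q->count++;
    return 0;
}

/* Returns the oldest char as an int, or -1 if the queue is empty. */
int demo_dequeue(DemoQueue *q)
{
    char c;

    if (q->count == 0)
        return -1;
    c = q->buf[q->head];
    q->head = (q->head + 1) % DEMO_QMAX;
    q->count--;
    return (unsigned char)c;
}
```

As in testqueue.c, the caller would set up the memory with "DemoQueue qobj;" and pass &qobj to each function. Nothing stops the caller from poking at buf directly, which is the missing encapsulation mentioned above.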

Hw1: Dataflow to be handled by Queues

testio.c does a read(ldev, buf, 10), requesting 10 bytes from the user. The read is already implemented with interrupts, but you need to switch it from direct use of rbuf and tbuf to use of Queues from queue.c. Also, it doesn’t wait properly for all the chars requested. Note the busy loop before the read: it wastes time so you have a chance to type a few characters before read executes.

Input: chars arrive in the interrupt handler, where they are each enqueued into the input queue. This capability is called “typeahead”, and allows the user to type input early, before the read call happens.

ttyread dequeues each char, and puts it in the user buffer.

What’s the user buffer? That’s described by the parameters of ttyread, or read itself: “char *buf, int nchars”. The app code (here testio.c) is requesting a bufferful of chars by calling read. The code as provided already delivers some chars to the user buffer.
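The input path can be sketched without any hardware. This is only a simulation under stated assumptions: sketch_rx_handler stands in for the interrupt handler, sketch_read stands in for ttyread, and the little queue is inlined rather than taken from queue.c. All of these names are hypothetical.

```c
#define SKETCH_QMAX 16              /* hypothetical input queue capacity */

char sketch_q[SKETCH_QMAX];         /* hypothetical input queue storage */
int sketch_head = 0;                /* index of the oldest queued char */
int sketch_count = 0;               /* number of chars queued */

/* Stands in for the receive interrupt handler: one call per arriving char.
 * Chars that arrive while the queue is full are dropped. */
void sketch_rx_handler(char c)
{
    if (sketch_count < SKETCH_QMAX) {
        sketch_q[(sketch_head + sketch_count) % SKETCH_QMAX] = c;
        sketch_count++;
    }
}

/* Stands in for ttyread: moves up to nchars queued chars into the user
 * buffer and returns how many were delivered. It does not wait for more. */
int sketch_read(char *buf, int nchars)
{
    int n = 0;

    while (n < nchars && sketch_count > 0) {
        buf[n++] = sketch_q[sketch_head];
        sketch_head = (sketch_head + 1) % SKETCH_QMAX;
        sketch_count--;
    }
    return n;
}
```

Calling sketch_rx_handler twice before sketch_read models typeahead: the chars are already waiting in the queue when the read happens. Like the provided code, this sketch returns early instead of waiting for all nchars.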

Output: chars arrive from the user, in the user buffer, i.e. buf and nchars as arguments.

The code in ttywrite enqueues them in the output queue.

Eventually, your new interrupt code will dequeue them in the interrupt handler. As provided, the code just outputs them from ttywrite. Don’t worry about this part yet—we will cover interrupts soon.

Programming the Hardware: Using I/O Ports

First note why we want to do this for an OS course: recall that the OS receives work to do via system calls from the running app program, and then does that work by controlling all the hardware. So to write an OS, we need to do hardware programming.

Ref: Notes on using C to access hardware registers, Tan, Sec. 1.3.5, 5.1.5

The PC uses I/O ports for most I/O devices, not memory-mapped I/O. It is capable of memory-mapped I/O, however.

First look at $pcex/echo.c. You can build it the same basic way as test.c. Just have $pcex/makefile as well as echo.c in a directory and “make C=echo”. You’ll get echo.lnx to download.

The (32-bit) x86 architecture specifies 32-bit memory addresses and, separately, 16-bit i/o port numbers. A device is assigned a little block of i/o port numbers for communication over the bus. For example, COM2 has 0x2f8-0x2ff, and COM1 has 0x3f8-0x3ff. This is very standard across all PC models and vendors.

x86 (32-bit) CPU Registers

EAX, 32 bits, AX, its low 16 bits, AL, its low 8 bits, and similarly

EBX, BX, BL,

ECX, CX, CL,

EDX, DX, DL,

a few other general registers

ESP, 32-bit stack pointer

EIP, 32-bit instruction pointer, also known as the program counter or PC for short

EFLAGS, control and status

64-bit: RAX, RBX, etc.
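The way AX and AL name the low bits of EAX can be mimicked in C with masks. This is just an illustration of the bit layout, not how the CPU is programmed; the function names low16 and low8 are made up.

```c
#include <stdint.h>

/* Low 16 bits of a 32-bit value, as AX is of EAX. */
uint16_t low16(uint32_t eax)
{
    return (uint16_t)(eax & 0xFFFFu);
}

/* Low 8 bits of a 32-bit value, as AL is of EAX. */
uint8_t low8(uint32_t eax)
{
    return (uint8_t)(eax & 0xFFu);
}
```

For example, if EAX held 0x12345678, then low16 gives 0x5678 (the AX part) and low8 gives 0x78 (the AL part).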

Gnu assembler uses %eax for EAX, %ax, %al, etc. It turns the order of operands around from the original Intel syntax, which is weird, but we’ll use it anyway since it’s the Linux assembler that goes with our software.

For example, the MOV instruction, in Gnu syntax: “ movb (%edx), %al” moves a byte from the address given in %edx to AL, i.e. from the first operand to the second. This same instruction is written in Intel syntax “mov al,[edx]”, i.e., from the second operand to the first.

IN and OUT instructions, the x86 i/o instructions.

In Gnu assembler, for 8-bit i/o, what we’ll be using:

outb %al, %dx

Put 8-bit data to output in %al, i/o port number in %dx (CPU registers)

Execution puts the 8 bits of data in the device register specified by the i/o port number, by sending it over the bus.

inb %dx, %al

Put i/o port number in %dx.

Execution gets the 8 bits of data from the device register specified by the i/o port number, by sending it over the bus, into the CPU register %al.

PC Serial port device (COM1 or COM2, a “UART”)

Each has 8 I/O ports, but luckily we only need to use a few of them.

COM2’s “base port” is 0x2f8. Its 8 ports are 0x2f8, 0x2f9, 0x2fa, ..., 0x2ff. Other devices have other I/O port assignments.

See $pcinc/serial.h for def—

#define COM2_BASE 0x2f8

Each UART has several registers accessible over the bus by various i/o ports:

TX--Transmit register

RX--Receiver register

IER—interrupt enable register

LSR—line status register

others we don’t need to use…

BTW, what’s a register? It’s an array of hardware bits, usually in flip-flops.

You can have a register chip on a breadboard holding bits completely separately from any computer.

The important idea is that the UART has its own registers and holds “state”, i.e. data, separately from the CPU.

The UART’s base port, 0x2f8, is used for both input and output using the in and out instructions:

“in” from 0x2f8 accesses the UART’s receiver register and delivers a byte to %al in the CPU
“out” to 0x2f8 sends a byte from %al to the UART’s transmit register.

So two registers in the UART are in use via the one i/o port. This is a common trick to save on i/o ports. The UART is fully aware of the difference between a read and a write access over the bus (using the R/W bus line), so no confusion arises.
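The one-port, two-register trick can be modeled in plain C. This is a toy simulation, not device code: struct sim_uart and both functions are hypothetical, and the read/write distinction that the real bus carries on its R/W line is modeled here by simply having two different functions.

```c
/* Simulated UART with separate receiver and transmit registers that share
 * one port offset. Every name here is hypothetical. */
struct sim_uart {
    unsigned char rx;   /* last byte received from the line */
    unsigned char tx;   /* last byte handed over for transmission */
};

/* A read access (like "in") at offset 0 selects the receiver register. */
unsigned char sim_read_port0(const struct sim_uart *u)
{
    return u->rx;
}

/* A write access (like "out") at offset 0 selects the transmit register. */
void sim_write_port0(struct sim_uart *u, unsigned char byte)
{
    u->tx = byte;
}
```

Writing a byte does not change what a later read returns, because the two accesses land in different registers.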

Luckily we can avoid programming in assembler by writing a C-callable assembler function once and for all to do each of these instructions. These are in the SAPC support library, prototyped in $pcinc/cpu.h:

ch = inpt(port)

outpt(port, ch)

Here ch is an 8-bit quantity, usually an unsigned char. For example, to output ‘A’ to COM2,

outpt(0x2f8, 'A');

But we don’t want to use wired-in numbers like this in real programs. We could write:

outpt(COM2_BASE, 'A');

This is better but still not perfect. Base addresses are usually accompanied by offsets that say which port of the set is being used:

from serial.h: offsets for a PC serial port device

#define UART_RX 0 /* receiver reg */

#define UART_TX 0 /* transmit reg */

#define UART_IER 1 /* interrupt enable reg*/

#define UART_LSR 5 /* line status reg */

So we end up with

outpt(COM2_BASE + UART_TX, 'A'); /* output 'A' to COM2, using TX reg */

/* using i/o port 0x2f8 + 0 = 0x2f8 */

Similarly, if we know there’s a character ready to be read in from COM2:

ch = inpt (COM2_BASE + UART_RX); /* input char from COM2, using RX */

/* using i/o port 0x2f8 + 0 = 0x2f8 */

Note that this involves communication with two different hardware registers in the serial device. The byte being read or written travels over the bus between the CPU and the device. The device knows when it is selected (by logic sensing the i/o port # on the address bus) and whether it is a read or write over the bus. Thus there is no ambiguity caused by using the same i/o port for both actions.

/* get the current value of the line status register (8 bits) */

stat = inpt (COM2_BASE + UART_LSR);
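The base-plus-offset arithmetic is easy to check by hand or in code. In this sketch the #defines repeat the values quoted in these notes (in real SAPC code you would include serial.h instead), and the helper uart_port is hypothetical.

```c
#define COM1_BASE 0x3f8
#define COM2_BASE 0x2f8
#define UART_RX   0   /* receiver reg */
#define UART_TX   0   /* transmit reg */
#define UART_IER  1   /* interrupt enable reg */
#define UART_LSR  5   /* line status reg */

/* Hypothetical helper: i/o port number for one register of a serial device. */
unsigned int uart_port(unsigned int base, unsigned int offset)
{
    return base + offset;
}
```

So COM2's line status register is at port 0x2f8 + 5 = 0x2fd, and the same offsets work for COM1 starting from 0x3f8.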

We usually want a particular bit from this byte, most commonly the DR bit for data-ready (receiver has a char) or THRE (transmitter can take another char). The mask for DR is #defined in serial.h with name UART_LSR_DR, and the mask for THRE is UART_LSR_THRE. These names come from the Linux serial driver sources.

Numbering bits: we count the rightmost bit as bit 0. The highest bit number in a byte is 7, in a 32-bit int it's 31.

Idea of mask for one bit: just that one bit is on. Here are some examples.

                  hex    binary
Mask for bit 0:   0x01   0000 0001
Mask for bit 1:   0x02   0000 0010
Mask for bit 2:   0x04   0000 0100
...
Mask for bit 7:   0x80   1000 0000

In general, mask for bit i can be written:

(1 << i)

This takes 1, a mask for bit 0, and left shifts it i bits, making it produce a mask for bit i.
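The shift expression can be wrapped in a tiny function to try out a few values. The name bit_mask is made up, and the sketch assumes a 32-bit unsigned int, as on our x86 systems.

```c
/* Mask with only bit i on; i must be in the range 0..31
 * (assumes unsigned int is 32 bits wide). */
unsigned int bit_mask(int i)
{
    return 1u << i;
}
```

For example, bit_mask(5) is 0x20, which is the value serial.h gives for UART_LSR_THRE, so THRE is bit 5 of the line status register.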

From serial.h: masks for status bits:

#define UART_LSR_THRE 0x20 /* Transmit-hold-register empty */

…

#define UART_LSR_DR 0x01 /* Receiver data ready */

We can test the one bit we want in stat by bitwise-anding it with the mask for the bit we want. Thus for the DR bit, we form

stat & UART_LSR_DR

and this is a quantity which is either 0 or has the one bit on, i.e., equals UART_LSR_DR, the one-bit mask.

True and false in C: non-0 and 0

Recall that in traditional C there is no Boolean type (C99 later added _Bool); instead we use ints, with 0 representing FALSE and any non-0 value representing TRUE.

Thus the expression (stat & UART_LSR_DR) is TRUE if the DR bit is on and FALSE otherwise.
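Here is a short sketch of the mask test applied to a status byte that we already have in hand. The mask values are the ones quoted from serial.h above; the function names receiver_ready and transmitter_ready are hypothetical.

```c
#define UART_LSR_THRE 0x20  /* Transmit-hold-register empty */
#define UART_LSR_DR   0x01  /* Receiver data ready */

/* Returns 1 if the DR bit is on in stat, 0 otherwise. */
int receiver_ready(unsigned char stat)
{
    return (stat & UART_LSR_DR) != 0;
}

/* Returns 1 if the THRE bit is on in stat, 0 otherwise. */
int transmitter_ready(unsigned char stat)
{
    return (stat & UART_LSR_THRE) != 0;
}
```

With stat = 0x60, for instance, the DR test is FALSE and the THRE test is TRUE. The "!= 0" is optional in an if or while condition, since any non-0 value already counts as TRUE.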

The upshot is that we can write these basic tests on COM2's status:

inpt(COM2_BASE + UART_LSR) & UART_LSR_DR is TRUE or FALSE depending
on whether the receiver has a char ready

inpt(COM2_BASE + UART_LSR) & UART_LSR_THRE is TRUE or FALSE depending
on whether the transmitter is ready

We can make a loop of inpt's testing the DR bit and thus wait for the UART to get a new character. Or a loop of the THRE inpt's to wait for the transmitter to be ready for another byte. For THRE:

while ((inpt(COM2_BASE + UART_LSR) & UART_LSR_THRE) == 0)

; /* not ready yet, keep trying */

/* here when transmitter is ready, OK to output another character */

outpt(COM2_BASE + UART_TX, ch);
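The wait-then-write pattern can be tried on any machine by faking the device. In this sketch sim_inpt and sim_outpt are hypothetical stand-ins for inpt and outpt, and the simulated transmitter is assumed to stay busy for a fixed number of status reads before THRE turns on. Real hardware timing is of course not modeled.

```c
#define COM2_BASE     0x2f8
#define UART_TX       0
#define UART_LSR      5
#define UART_LSR_THRE 0x20

int sim_busy_polls = 3;          /* status reads left before THRE turns on */
unsigned char sim_last_tx = 0;   /* last byte written to the TX register */
int sim_poll_count = 0;          /* how many times LSR has been read */

/* Simulated inpt: only the LSR port is modeled. */
unsigned char sim_inpt(int port)
{
    if (port == COM2_BASE + UART_LSR) {
        sim_poll_count++;
        if (sim_busy_polls > 0) {
            sim_busy_polls--;
            return 0x00;          /* transmitter still busy */
        }
        return UART_LSR_THRE;     /* transmitter ready */
    }
    return 0;
}

/* Simulated outpt: only the TX port is modeled. */
void sim_outpt(int port, unsigned char ch)
{
    if (port == COM2_BASE + UART_TX)
        sim_last_tx = ch;
}

/* Polling output: loop until THRE is on, then write the char. */
void polled_putc(unsigned char ch)
{
    while ((sim_inpt(COM2_BASE + UART_LSR) & UART_LSR_THRE) == 0)
        ;                         /* not ready yet, keep trying */
    sim_outpt(COM2_BASE + UART_TX, ch);
}
```

With the initial settings, polled_putc('A') reads the status register four times (three busy answers, then one ready answer) before the byte goes out.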

This is like the code of Figure 5-8 in Tanenbaum, pg. 346, except that we use the IN instruction to get the device's status register value instead of memory-mapped i/o.

Idea of “programmed I/O”

When the CPU loops testing a device ready bit, waiting for the device to produce or accept data, we call it “programmed I/O”, or “polling for data”. The loop is called a “busy loop” or a “polling loop.” The other alternative is interrupt-driven I/O. Polling I/O is simpler and is commonly used in programs that don’t utilize an OS for I/O, for example, all ordinary programs on the SAPC. When you write printf(“hi”) in the SAPC environment, printf calls putc in Tutor, which does a polling loop on COM2 (or whatever the console device is.) OS drivers usually use interrupt-driven I/O, so as not to waste CPU in busy loops.

Look at echo.c. There you see a polling loop for input. Surprisingly, there is no polling loop for output, but this is a special case where the output is slowed down so much by the input that the transmitter will always be ready when it’s used.
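For comparison, here is a rough sketch of an echo-style loop run against a simulated device. It is not the actual echo.c: sim_inpt, sim_outpt, echo_chars, and the scripted input string are all hypothetical, and the simulated transmitter is assumed to be always ready, which is the special case described above.

```c
#define COM2_BASE     0x2f8
#define UART_RX       0
#define UART_TX       0
#define UART_LSR      5
#define UART_LSR_DR   0x01
#define UART_LSR_THRE 0x20

const char *sim_input = "ok";  /* chars the simulated line will deliver */
char sim_output[8];            /* chars written to the simulated TX reg */
int sim_out_len = 0;

/* Simulated inpt: LSR reports DR while input remains; RX hands over a char. */
unsigned char sim_inpt(int port)
{
    if (port == COM2_BASE + UART_LSR)
        return (unsigned char)(UART_LSR_THRE | (*sim_input ? UART_LSR_DR : 0));
    if (port == COM2_BASE + UART_RX && *sim_input)
        return (unsigned char)*sim_input++;
    return 0;
}

/* Simulated outpt: records bytes written to the TX port. */
void sim_outpt(int port, unsigned char ch)
{
    if (port == COM2_BASE + UART_TX && sim_out_len < (int)sizeof sim_output - 1)
        sim_output[sim_out_len++] = (char)ch;
}

/* Echo n chars: poll DR before each input, then output without polling.
 * n must not exceed the number of chars the simulated line can deliver,
 * or the polling loop would spin forever. */
void echo_chars(int n)
{
    unsigned char ch;

    while (n-- > 0) {
        while ((sim_inpt(COM2_BASE + UART_LSR) & UART_LSR_DR) == 0)
            ;                                  /* wait for a char */
        ch = sim_inpt(COM2_BASE + UART_RX);    /* read it */
        sim_outpt(COM2_BASE + UART_TX, ch);    /* send it right back */
    }
}
```

Calling echo_chars(2) with the scripted input leaves "ok" in sim_output. On real hardware the same read and write both use port 0x2f8, with the UART sorting out RX from TX.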

We have looked at COM2 here. COM1 is exactly the same device, so just use COM1_BASE instead of COM2_BASE in the examples above. Also read about LPT1 in the linked document above.

Next time: x86 interrupts