CS341 Notes on Using C to access hardware registers


We have been writing portable programs that use the C library for all their i/o, and running them on UNIX and on the SAPC.  Now we want to dig down to the underlying hardware in the SAPC case.  We are not allowed to do this in the UNIX case.  There the OS runs all the hardware for us and provides us with higher level "system calls" (read, write, etc., see Chap. 8 of K&R if interested) to do i/o.


First we need to know some CPU basics.  See S&S, p. 52 for the 32-bit register set.  We see there are 8 32-bit general registers, plus EIP (also called PC, program counter, by many people), which holds the address of the next instruction to be executed by the CPU, and EFLAGS, which holds the control and status bits for the CPU.  Although the 8 registers are called general registers, in fact some have specialized uses.  EAX is called the accumulator and is used for numerical operations as well as just holding things.  ECX is the count register, but also can just hold things.  See how the first four registers have named parts: AX for the low 16 bits of EAX, and AL for the low 8 bits of EAX, for example.  This is where a short and a char would normally be held in this register.
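To make the "named parts" idea concrete, here is a small sketch in plain C.  It does not touch any real CPU register; it only shows, with masks, which bits of a 32-bit value correspond to AX and AL.  The function names low16 and low8 are made up for this illustration.

```c
#include <stdint.h>

/* Low 16 bits of a 32-bit value: the part of EAX that is named AX. */
uint16_t low16(uint32_t reg)
{
    return (uint16_t)(reg & 0xFFFFu);
}

/* Low 8 bits of a 32-bit value: the part of EAX that is named AL. */
uint8_t low8(uint32_t reg)
{
    return (uint8_t)(reg & 0xFFu);
}
```

So if EAX held 0x12345678, AX would be 0x5678 and AL would be 0x78.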


The CPU also has 16-bit-wide segment registers, but we can largely ignore these even though some of them are in fact in use all the time, because they are set up once and for all by the bootup sequence in a very simple way known as a “flat address space.”   If you have never heard of segments, don't worry about this.  Just rest assured that address 1000, as we use it, is referencing the 1000'th byte of physical memory, and so on.


To see the values in the registers using Tutor, use the "rd" command, for register display.


The CPU instruction set has instructions like mov, add, inc (increment), and so on, as we will soon study.  Right now let's consider the i/o instructions.  See S&S, p. 66.  Here are the two instructions for byte i/o:   


in al, dx      (byte input)
out dx, al     (byte output)


In both cases, dx contains a 16-bit port number and al the byte of data coming in from the port or going out through the port.  There are 2**16 = 64K possible port numbers, for 64K different byte-wide ports, to be compared with 2**32 = 4G possible memory locations, all byte-wide as well.  Here byte means 8 bits.  We see the CPU is set up to communicate with two "spaces": a memory space and a port space.  In both cases, a particular computer has only a certain subset of ports in active use, and a certain subset of memory space in active use with real memory under it.  Each device owns a little set of ports, often a block of 4 ports, or 8 ports, or 16 ports.  Two byte-wide ports can be used together as a 16-bit port, or 4 as a 32-bit port, but we won't be actually doing that in this class.  We'll just use byte-wide ports.


You can use Tutor to do the in and out instructions for you:

Tutor> pd 200                 --pd for port-display

0200    32  41  6a  ff  fe ….       

This displays 16 ports: port 0x200, 0x201, 0x202, …, 0x20f, and so performs 16 "in" instructions.


Tutor> ps 200 55              --ps for port-set

This does one out instruction to output the byte 55 to port 0x200.


We cannot be sure what we'll see after this in port 200.  We could be writing to a real device or maybe not.  We are definitely not writing to ordinary memory, so we can't expect to see 55 from port 200 on a pd command (unless we happen to be writing to a read/write device parameter port).  We could be telling the keyboard to stop working and thus lose contact with the system, requiring a reboot.  So it's better to know a little more about what we're doing!


The simplest i/o device, the parallel port.


You have located the DB25 connector for the parallel port on the back of the PCs.  It is known as LPT1 to DOS/Windows, so we'll call it that.  It has 25 pins, numbered 1 to 13 down one side and 14 to 25 down the other, so that 1 and 14 are both at one end.  Pins 2 to 9 are the data output pins that carry 8 bits of data out the port to the "parallel printer" that is the normal device used with this port.  Other pins are used for controlling the printer or getting status info from it, or just providing reference voltages, as explained on p. 630.


Each pin, data or status/control, carries a "TTL signal".  As will be explained in lab, this is a system that uses voltages between 0 and 5V to represent two digital logic levels, "logic 0" at low voltages in this range, and "logic 1" at high voltages in this range.  Thus at a certain point in time, we should be able to determine each bit of the 8 bits being output on pins 2-9, and thus know the byte being output to the printer.
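As a small illustration of reading a byte off the data pins, here is a sketch that reports the logic level each data pin would carry for a given byte.  It assumes the usual wiring in which pin 2 carries bit 0 (the least significant bit) and pin 9 carries bit 7; that ordering is not stated in these notes, so check it against the pinout reference.  The function name data_pin_level is made up for this example.

```c
/* Logic level (0 or 1) on a data pin when `byte` is being output.
 * Assumption: pin 2 carries bit 0, pin 3 carries bit 1, ..., pin 9 carries bit 7.
 * Returns -1 if `pin` is not one of the data pins 2..9. */
int data_pin_level(unsigned char byte, int pin)
{
    if (pin < 2 || pin > 9)
        return -1;
    return (byte >> (pin - 2)) & 1;
}
```

For 'A' (0x41, binary 0100 0001) this gives logic 1 on pins 2 and 8 and logic 0 on the other six data pins.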


In lab 2, you also located chips that are connected to the DB25 connector.  These contain the parallel interface "device", the small amount of digital hardware that runs the port.  This device is very simple.  It can't even be turned on and off.  But still it has the important job of holding bits.  It is connected to the CPU by the "bus", a collection of wires, one for each bit, that connect the CPU and its devices.  When the out instruction is used to send a byte to the device, the instruction is all over in under a microsecond, but the signals on the DB25 are expected to be held for at least milliseconds, and maybe hours.  These bit values are held in a "register" in the device.  A register is just a digital circuit that can hold bits until it is told to hold other bits, and can supply the bit values when asked.


Thus we see that even the simplest device is working on its own most of the time, and just occasionally getting commands from the CPU.  The computer system is like a  workgroup, with a boss occasionally giving commands and the workers doing their jobs in between.


The parallel interface device has 3 i/o ports:


port  0x378: data

        0x379: status

        0x37a: control


To send a byte to the printer, simply do an out instruction with dx containing 0x378 and al containing the byte.  To get the status byte from the printer, do an in instruction with dx containing 0x379, and then look in CPU register al for the result.  To command the device to change its behavior, for example to "enable interrupts", put the appropriate byte in al and do an out to port 0x37a.


We can do these actions from Tutor, by using  "ps 378 41" to send an 'A'  (ASCII 0x41) to the port, and using "pd 379" to look at the status register, for example.  We can see that the status register is read-only by trying to write to it with ps and seeing that the new value did not take effect by using pd.


In truth, there's more to running a parallel printer than just getting the byte out to the pins.  How could the printer tell the difference between one E and two Es in a row?  Or tell when the data stops coming?  The real way that data is output is via a "handshake" mechanism, where the software in the computer puts the byte out and then makes a "strobe" signal via the control port, and then waits until it sees an "acknowledge" signal come back from the printer on the ACK pin, displayed in the status register.  But we'll come back to this later.



Accessing LPT1 from a C program


We can do these actions from C, but we need a little help from assembler because there is no way to get C to generate an in or out instruction (except by using "inline assembler", but that's not C anyway, and has its own complications we will simply avoid by not using it).  So we write C-callable assembler functions that do nothing except the one instruction, in or out, each.  These are prototyped in the header file cpu.h and follow the naming of S&S, p. 169, but not the exact types of operands.  We tend to use chars and 32-bit ints and avoid 16-bit quantities, so instead of two 16-bit arguments for outpt we have one int for the port and one unsigned char for the data going out.  Note that valid port numbers only really use (have bits on in) 16 bits of the int supplied here.


For example, outpt(0x378, 'A');   sends an A character out to the parallel port, and status = inpt(0x379); gets the status register from the PI device.  But we don't like to use these "wired-in" numbers in programming.  We need symbolic names for them.  These are defined in lp.h, as follows:


#define LPT1_BASE 0x378


#define LP_DATA 0

#define LP_STATUS 1

#define LP_CNTRL 2


Here the base port is 0x378 and each of the three LP_ offsets works off this base, so that LPT1_BASE+LP_DATA is the data port, LPT1_BASE+LP_STATUS is the status port, and so on.  This way, we can easily generalize to handle LPT2, LPT3, etc.


The last example, with wired-in numbers replaced, becomes outpt(LPT1_BASE+LP_DATA,'A'); and status = inpt(LPT1_BASE+LP_STATUS);
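Since the real outpt and inpt need the SAPC hardware, here is a sketch that can be compiled anywhere.  It replaces them with simulated stand-ins, sim_outpt and sim_inpt, which are made up for this example and are not the functions from cpu.h.  The simulated device keeps three one-byte registers and ignores writes to the status register, mimicking the read-only behavior described earlier.  The initial status value 0x7f is arbitrary, and the return type of sim_inpt is an assumption.

```c
#define LPT1_BASE 0x378
#define LP_DATA   0
#define LP_STATUS 1
#define LP_CNTRL  2

/* Simulated device registers: data, status, control.
 * The starting status value 0x7f is made up for this sketch. */
static unsigned char sim_regs[3] = { 0x00, 0x7f, 0x00 };

/* Stand-in for outpt: latch the byte, unless the port is read-only. */
void sim_outpt(int port, unsigned char data)
{
    int off = port - LPT1_BASE;
    if (off == LP_DATA || off == LP_CNTRL)
        sim_regs[off] = data;
    /* writes to the status port, or to ports we don't own, are ignored */
}

/* Stand-in for inpt: return the register behind the port. */
unsigned char sim_inpt(int port)
{
    int off = port - LPT1_BASE;
    if (off < 0 || off > LP_CNTRL)
        return 0xff;            /* no device here in this simulation */
    return sim_regs[off];
}

/* Send one character to the data port, then fetch the status byte. */
unsigned char send_and_get_status(unsigned char c)
{
    sim_outpt(LPT1_BASE + LP_DATA, c);
    return sim_inpt(LPT1_BASE + LP_STATUS);
}
```

Note that the calling code never mentions 0x378 or 0x379 directly, so switching to another parallel port would only mean changing the base.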


Now we look at testlp.c from the $pcex directory.  testlp.c "initializes" the device, to make sure it's in a known state, but in fact it will work without this because it's such a simple device.  But this is the normal way to start working with a device, and we see it involves setting the control register in the device via an out instruction to port 0x37a.  Then the program just loops, getting chars from the user and using them in out instructions to the data port.


Serial Ports, or "COM" ports


You have seen the COM1 and COM2 ports on the PCs, with DB9 and DB25 connectors.  These ports provide the same signals: most of the 25 pins of the DB25 are not in use.  For serial ports, only 1 pin is used for output of data, and one for input.  The signal varies in time to provide the bits, in a way we'll study after assembler.  Again we have the same basic picture as in the parallel port case: the serial interface device, or UART, for COM1 is running the COM1 serial port, and talking to the CPU over the bus, and similarly for COM2.  Again the device contains registers that hold data over  time and communicate it on demand to the CPU.  The device is initialized by writing to its control registers, but we'll skip this part for now and depend on Tutor to have done that for us already.  We'll just consider sending data out and receiving data in from the port.


Again there are certain ports assigned to COM1 and others to COM2, and these have definitions in a header file serial.h.  The important defines for now are:


#define COM1_BASE 0x3f8

#define COM2_BASE 0x2f8


#define UART_RX  0

#define UART_TX 0

#define UART_LSR 5


Then COM1_BASE+UART_RX is the data-in port for COM1, and COM1_BASE+UART_TX is the data-out port for COM1, and COM1_BASE+UART_LSR is the line status register for COM1, and similarly for COM2.  These communicate with three 8-bit registers in the UART: the receiver register, the transmit holding register, and the line status register.  But we see that actually only 2 ports in the port space are being used, 0x3f8 and 0x3f8+5 = 0x3fd.  Port 0x3f8 handles both byte input from the UART receiver register and byte output to the UART transmit holding register.


Two UART registers are accessed via one i/o port—how does that work??

This works fine because the UART device knows whether an in or an out is happening, and if it's an in, it supplies the byte from its receiver register, and if it's an out, it accepts the byte and puts it in its transmit holding register.  This does mean that after we've put a byte in the transmit holding register, we can't check it by reading it back, but we wouldn't be able to do that for any determinate length of time anyway—the byte is in flight out through the device.  Similarly, when we input a byte using the in instruction, we are finishing its flight in, and the hardware can put another byte in the register at any time, so we can't expect to be able to check by rereading.
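The idea can be sketched with a tiny simulated UART: one port number, two separate registers behind it, and the direction of the access deciding which register is used.  Everything named sim_ here is made up for the illustration, and the starting receiver contents (0x0d) are arbitrary.

```c
#define COM1_BASE 0x3f8
#define UART_RX   0
#define UART_TX   0

/* Two distinct registers inside the simulated UART. */
static unsigned char sim_rx_reg  = 0x0d;   /* receiver register (arbitrary start value) */
static unsigned char sim_thr_reg = 0x00;   /* transmit holding register */

/* An "out" to the data port lands in the transmit holding register. */
void sim_uart_out(int port, unsigned char data)
{
    if (port == COM1_BASE + UART_TX)
        sim_thr_reg = data;
}

/* An "in" from the same port number reads the receiver register instead. */
unsigned char sim_uart_in(int port)
{
    if (port == COM1_BASE + UART_RX)
        return sim_rx_reg;
    return 0xff;                /* no register behind this port in the sketch */
}

/* Test-only peek at the transmit side; real hardware offers no such read. */
unsigned char sim_uart_peek_thr(void)
{
    return sim_thr_reg;
}
```

After sim_uart_out(COM1_BASE+UART_TX, 'A'), a call to sim_uart_in(COM1_BASE+UART_RX) still returns 0x0d, not 'A': the byte written cannot be read back through the port.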


We can use Tutor to do the ins and outs.  Let's use COM2 since it provides the console for the online SAPCs available via mtip.  Thus these experiments can be done under mtip.


Tutor> ps  2f8  41

                  ATutor >          


The A is output "on top" of the carriage return that should have made the next prompt appear at the left margin.  The line feed has gotten out to bring us down to a lower line, but not the carriage return that normally follows it for printing on a serial device.  Note that all the characters in this example have been output via COM2 by Tutor, and all but the A were carefully handled so as not to trample each other.


Tutor> pd 2f8

02f8   0d …

Here we see the last character received through COM2, the carriage return character that I had just typed at the end of the "pd 2f8" line.  The return key on the keyboard generates a carriage-return <CR> character, ASCII 0x0d, and it is input by the serial port.  When UNIX or the SAPC C library sees the <CR> character, it converts it to '\n' (line-feed, ASCII 0x0a) before it gets to a program, but what we're seeing is what came through the port.  Similarly on output, printf("\n") causes '\n' to go to the C library, and the UNIX or SAPC support code converts the '\n' to <LF><CR> or <CR><LF> on its way to a serial device.
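The two conversions can be sketched as a pair of small functions.  This is only an illustration of the idea, not the actual UNIX or SAPC library code, and the function names are made up.  For output it picks the <CR><LF> order, which is one of the two orders mentioned above.

```c
#include <stddef.h>

/* Input side: a <CR> (0x0d) from the port is handed to the program as '\n'. */
char translate_input(char c)
{
    return (c == '\r') ? '\n' : c;
}

/* Output side: '\n' from the program becomes <CR><LF> for the serial device
 * (this order is an assumption; the other order is also possible).
 * Writes 1 or 2 bytes into out and returns how many were written. */
size_t translate_output(char c, char out[2])
{
    if (c == '\n') {
        out[0] = '\r';
        out[1] = '\n';
        return 2;
    }
    out[0] = c;
    return 1;
}
```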


Now we can look at echo.c.  It calls sys_get_console_dev() to find out whether the user running the program is connected via a serial port.  If the return value is KBMON = 0, the answer is no: the user is connected via the PC keyboard and monitor, not a serial port.  If the return value is COM1 = 1 or COM2 = 2, then yes, this program can work to echo characters back to the user.  Then the variable conport is used to hold the base port for the COM port in use.
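The mapping from console device to base port might look roughly like the sketch below.  This is not the code from echo.c; the helper name console_base_port and the -1 return value for "no serial console" are made up for the illustration.  The device codes and base ports are the ones given in these notes.

```c
#define KBMON 0
#define COM1  1
#define COM2  2

#define COM1_BASE 0x3f8
#define COM2_BASE 0x2f8

/* Map a console device code (as returned by sys_get_console_dev())
 * to the base port of its UART, or -1 if the console is not a serial port. */
int console_base_port(int console_dev)
{
    switch (console_dev) {
    case COM1:
        return COM1_BASE;
    case COM2:
        return COM2_BASE;
    default:            /* KBMON or anything unexpected */
        return -1;
    }
}
```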


We can get a character in from a serial port by using inpt(conport+UART_RX)  [note better programming, using UART_RX even though it's 0].  But this will give us whatever is in the receiver register in the UART, often a "stale" character, one that has already been input.  What we really want is a new character typed by the user, no matter how long we have to wait for it, and humans are incredibly slow by 100 MHz CPU scales.  So we need to keep checking the LSR register's DR (data ready) bit until it goes on, notifying us of an unread character in the receiver register.  We end up with the funny-looking loop


      while ((inpt(conport+UART_LSR)&UART_LSR_DR)==0)
            ;     /* empty loop body: just keep checking */



This is called a busy wait, because we are "burning CPU" while waiting, spinning in this loop.  But it's not a crime here, because we have nothing else to do.  When we finally exit this loop, there is a char ready for input, and so we do an inpt to get it and then an outpt to echo it back to the user.
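Here is a sketch of the whole wait-read-echo sequence that can run without a UART.  The functions fake_inpt and fake_outpt are made-up stand-ins for inpt and outpt: the fake line status register reports "not ready" for the first three polls and then turns on DR, as if a slow human had finally typed 'x'.  The value 0x01 for UART_LSR_DR is an assumption (DR as bit 0 of the LSR, as on 16550-style UARTs); the real definition is in serial.h.

```c
#define COM2_BASE   0x2f8
#define UART_RX     0
#define UART_TX     0
#define UART_LSR    5
#define UART_LSR_DR 0x01        /* assumed value: data-ready bit of the LSR */

static int polls_left = 3;              /* fake "typing delay", counted in LSR polls */
static unsigned char fake_rx = 'x';     /* character the fake user types */
static unsigned char last_sent = 0;     /* last byte written to the data port */

/* Stand-in for inpt, serving a fake COM2. */
unsigned char fake_inpt(int port)
{
    if (port == COM2_BASE + UART_LSR) {
        if (polls_left > 0) {
            polls_left--;
            return 0;                   /* DR off: nothing new yet */
        }
        return UART_LSR_DR;             /* DR on: unread char is waiting */
    }
    if (port == COM2_BASE + UART_RX) {
        polls_left = 3;                 /* reading consumes the character */
        return fake_rx;
    }
    return 0xff;
}

/* Stand-in for outpt: just remember what was sent. */
void fake_outpt(int port, unsigned char data)
{
    if (port == COM2_BASE + UART_TX)
        last_sent = data;
}

/* Wait for one new character, echo it back, and return it. */
unsigned char echo_one(int conport)
{
    unsigned char c;

    while ((fake_inpt(conport + UART_LSR) & UART_LSR_DR) == 0)
        ;                               /* busy wait */
    c = fake_inpt(conport + UART_RX);
    fake_outpt(conport + UART_TX, c);
    return c;
}
```

Calling echo_one(COM2_BASE) spins through three "not ready" polls, then reads 'x' and writes it back out.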