Daily Course Notes for CS240
Computer Architecture 1
Patrick E. O'Neil
Class 19. Administer Quiz 2
Starting Chapter 7. We will be defining the standard library for ANSI C, available on any UNIX system, portable to other OS's (NT).
You will be responsible for having Glass UNIX in class; questions from Glass on Quizzes and Exams.
Full library function specifications are available in K&R Appendix B, Section B.1. You are responsible for all specifications in Appendix B.
Note: all header files for libraries (stdio.h, string.h, ctype.h, etc.) are online in /usr/include. You should cd to this directory and nose around.
Here's an important command you should know: grep. Read about it in the Glass UNIX manual (look it up in the index; the bold page number shows the definition).
%grep -hilnvw pattern {fileName}*
This is a command to search for a pattern in a list of files. It's like the program we wrote to illustrate command line argument use. (The example for that was: find -xn pattern. But use -v instead of -x in grep.)
In grep, look for and print out lines in a file or list of files that contain a match to pattern. Each file encountered is named in output unless -h option is given (in which case fname is not listed in output!).
A list of file names delimited by spaces is what is meant by the argument {fileName}*. We could write:
%grep US studentfile staffile facultyfile (3 files to search, pattern: US)
In the command: grep printf cs240/hw5/*.c, the *.c gets expanded to a list of files by the shell and grep looks at them one by one.
Options: -n means give line numbers, -i means ignore case ("And" matches "and"), -l displays only filenames that contain pattern, -v displays lines that DON'T match pattern, -w means only whole-word matches count.
Use grep to search for function names in header files.
%grep isupper /usr/include/*
Note, if no filenames are specified, grep will search stdin. Common idea in C, useful for "pipes", explained below.
We can use a wildcard specification of a pattern in grep, called "Regular Expressions". See pg 606 of Glass (2nd Edition, see index in 1st). Note that:
%grep U.S. studentfile staffile facultyfile
would use wildcards, since the character period, ".", stands for any single character, so we need to "quote" each period with a "\" sign.
But the "\" sign itself will be interpreted by the shell so the "." will be passed, and "." will continue to be interpreted by grep as "any character".
To get the "\" sign seen by grep, you would need to write:
%grep U\\.S\\. studentfile staffile facultyfile
You need to be VERY defensive using grep with regular expressions. (Read about it in Glass. More on this later.)
AND YOU NEED TO TEST IDEAS. In a fresh directory, create one file with "U.S." and another with "U_S_", and a third with "US" and try:
%grep U.S. * (* here stands for all files in current directory.) Next try:
%grep U\.S\. * And finally, try
%grep U\\.S\\. * (This should work to retrieve "U.S." lines only.)
Note how we would perform grep for either US or U.S. Note on page 606 "*" following a character denotes zero or more occurrences of that character.
The following doesn't work:
%grep U.*S.* * (The * at the end stands for all files, of course.)
Because "." is interpreted as any character. This doesn't work either:
%grep U\\.*S\\.* *
Because * after \\. is interpreted by the shell. And this doesn't work:
%grep U\\.\\*S\\.\\* *
because grep sees "U\.\*S\.\*" and will now be looking for literal characters "U.*S.*" But this works:
%grep U\\.\*S\\.\* *
Because grep sees "U\.*S\.*". See what I mean about being "defensive" in using regular expressions?
You should read Glass to understand why this works. Responsible on Quiz! Note should type "man grep" when you're online and forget some feature.
Chapter 7. Now start going through Chapter 7 of K&R. Follow along. Also, look for MORE DETAILS of everything we cover in Appendix B!
You should now learn and use EVERYTHING YOU CAN in the C Library Function coverage of Appendix B! You'll be expected to know almost ALL.
Section 7.1 Standard Input (stdin) and Standard Output (stdout.) Already know a lot of this. Standard I/O redirection:
%prog <infile >outfile
Idea of a pipe is new: prog1 | prog2. This will execute prog1 and prog2, send stdout from prog1 into stdin for prog2. For example:
%grep S *|grep U
Will perform "grep S *" on all files and pass the matching lines through the pipe as stdin to "grep U", which will then find all lines with both U and S in them somewhere!
Note, specification in Glass, pg. 33: more -f [+lineNumber] {fileName}*
The option -f means don't fold long lines (useful if piping stdout to something else that deals with lines, e.g., wc command); can start at given line number. If no file name specified, will use stdin. See how this is useful?
For example: ls -lt|more This gives a long listing of files in the current directory, most recent first (-t option), & pipes through "more"; more uses stdin, so most recent files won't go off screen. Also: finger name|more
7.2. Formatted output: Printf. Formats and prints out internal values.
int printf(char *format, arg1, arg2, . . .);
Note printf has a VARIABLE LENGTH ARGUMENT LIST (as many as there are % conversions in the format string). We will learn how to do this shortly.
The return from printf is the number of characters printed (haven't used this up to now, but useful if there is error or some limit truncation).
Between the % and the conversion character, there are a number of other characters which may exist. In the order they must be placed, they are:
- (minus sign) left adjust printing of argument
m (number m) minimum field width
. (dot) separates min field width from precision
p (integer p) precision: max chars for string, min digits for int
h or l (letter h or l) h for short int, l for long int
(ORDER of options for %d is: %[-][m][.][p][h|l]d, Note: no embedded spaces!)
E.g., figure out what these would do: %10d, %-10d, %hd, %ed, %10ld, %10.p
Try spending 10 minutes with a program, using different formats.
Learn these with examples of string precision given on pg. 154, bottom.
Also, to print at most max characters from string s (max is int type var or const), use * after % and include the int max as an argument before s:
printf("%.*s", max, s);
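A minimal sketch of the field width, left-adjust, and %.*s options just listed. The function name format_examples is hypothetical, and sprintf is used instead of printf only so the result lands in a string that can be inspected; the caller is assumed to pass a buffer of at least 40 chars.

```c
#include <stdio.h>

/* format_examples: hypothetical helper for this sketch.
   Writes three formatted fields into out (caller supplies >= 40 chars)
   and returns the number of characters written, as sprintf does. */
int format_examples(char *out)
{
    int n = 42;
    int max = 5;                    /* precision passed as an argument */
    const char *s = "hello, world";

    /* %10d  : right-adjusted in a field at least 10 wide
       %-10d : left-adjusted in the same width
       %.*s  : at most max chars of s; max comes just before s */
    return sprintf(out, "[%10d][%-10d][%.*s]", n, n, max, s);
}
```

With these values the buffer holds 42 padded on the left, then 42 padded on the right, then only "hello" from the string.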
Note, it is possible to print out a character string as a format string with no % variables: what we do with printf("hello, world!\n"); could write:
char s[ ] = "hello, world";
printf(s);
But if s might get a % character in it, this is unsafe, since format will then require another argument after s. Better to write out s[ ] as:
printf("%s", s);
Finally, the function sprintf will work same as printf, but write to string.
int sprintf(char *string, char *format, arg1, arg2, . . .);
See sprintf in Appendix B, pg 245 K&R. Note how useful sprintf is!
Recall how we wrote itoa() and itox() functions. No functions like this in C library! Instead use sprintf() to print int into a string, using %d or %x.
Might now want to strcpy the string into a malloc'd area you create. You can start reading ahead now, using any C library function. See Appendix B.3 and B5 for malloc.
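A sketch of the sprintf-then-strcpy idea above. The name int_to_str and its hex flag are made up for illustration; the 32-char work buffer is an assumption that comfortably covers an int on common machines.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* int_to_str: hypothetical stand-in for itoa()/itox().
   Prints n under %d (hex == 0) or %x (hex != 0) into a local buffer,
   then copies the result into a malloc'd area.
   Returns NULL if malloc fails; the caller must free() the result. */
char *int_to_str(int n, int hex)
{
    char buf[32];   /* assumed large enough for any int */
    char *p;

    if (hex)
        sprintf(buf, "%x", (unsigned int) n);  /* %x expects unsigned */
    else
        sprintf(buf, "%d", n);

    p = malloc(strlen(buf) + 1);   /* +1 for the terminating '\0' */
    if (p != NULL)
        strcpy(p, buf);
    return p;
}
```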
7.3 Variable-length argument lists. How to declare a variable argument list such as printf( ) has in a functional prototype with an ellipsis (. . .).
void minprintf(char *fmt, . . .); /* three dots in a row: var arg list */
Here is how this works. Recall that when a function is called, a new stack frame is created for the function execution.
The stack frame holds a memory location to return to from this func, the arguments of the function, and local variables of the function. Leave Up!
Return location |
Argument 1 |
Argument 2 |
. . . |
Argument k |
Local variable 1 |
Local variable 2 |
. . . |
Local variable n |
This representation is not guaranteed for all machines. The va_ package we cover here hides actual representation.
The function can pick off one argument after another, and in the case of variable length argument lists, the k is not set in advance.
In a function with a var-arg list, we start by declaring a variable (say, ap for arg pointer), of type va_list, to refer to the arguments in the list.
void func(int n, . . .) /* Note the "ellipsis", i.e. (. . .) */
{
va_list ap;
Now use macros va_start, va_arg, and va_end to retrieve argument values. These macros exist in stdarg.h library: B7, pg 254 of K&R. Look at it.
Start by initializing ap, using va_start:
va_start(ap, n);
The variable "n" named in va_start is the last named argument before the ellipsis (. . .) in the var-arg list function. There must be at least one such argument: use it to tell how many arguments in list.
E.g., in printf( ), first argument gives string with % conversions for all later arguments.
Now ap points just BEFORE first unnamed arg. Each call to va_arg will advance ap one argument, return value; va_arg must name type of arg:
ival = va_arg(ap, int); /* use this where int argument */
dval = va_arg(ap, double); /* use where double argument */
sval = va_arg(ap, char *); /* use where string argument (ptr) */
But think about how you are going to KNOW the type of each arg!!! This is why format string is normally passed for printf( ): % conversions tell you.
Of course, you could assume that all the arguments for a specific var-arg function are of the same type, say string.
But you MUST know HOW MANY arguments. If overestimate, parsing past Argument k in stack frame, get garbage. To terminate, write: va_end(ap);
Example minprintf, page 156 (PUT ON BOARD). Will have homework on this.
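Before minprintf, here is a smaller sketch of the va_ macros, assuming every unnamed argument is an int and the named argument n gives the count. The function sum_ints is hypothetical, not from K&R.

```c
#include <stdarg.h>

/* sum_ints: hypothetical var-arg function.
   n tells how many int arguments follow; returns their sum. */
int sum_ints(int n, ...)
{
    va_list ap;       /* refers to each unnamed argument in turn */
    int i, total = 0;

    va_start(ap, n);                 /* n is the last named argument */
    for (i = 0; i < n; i++)
        total += va_arg(ap, int);    /* fetch next arg, typed as int */
    va_end(ap);                      /* clean up when done */
    return total;
}
```

A call such as sum_ints(3, 10, 20, 30) walks exactly three arguments; passing a count larger than the real number of arguments would read garbage, as noted above.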
7.4 Formatted input: scanf. This is the opposite of printf. Reads in variables from stdin using conversion format string. See pg. 246.
int scanf(char *format, . . .);
The value returned from scanf( ) is the number of successfully scanned tokens: not successful if can't parse the value brought in from stdin.
In calling scanf, call with any number of arguments, but must call with a POINTER to each variable so that the variable values can be set by scanf!
int age, weight, cnt;
char lname[100];
while(some condition) {
    printf("Input your last name, age, and weight, separated by spaces: ");
    cnt = scanf("%s %d %d", lname, &age, &weight);
    . . .
}
Note: lname is an array, and is already like a pointer to the char string.
Scanf is useful to allow you to read in int or double value AS A NUMBER, instead of a character string, where you have to do your own conversion.
(Of course, scanf() will always see a character sequence in stdin: just does its own conversion to int or double.)
However, scanf is FLAWED, because it ignores '\n' characters. Can get very confusing if user puts too few arguments on some line.
(Prompt) Input your last name, age, and weight, separated by spaces:
(User input) Clinton 50
(No response after carriage return. User tries again, remembers to include weight this time.)
(User input) Clinton 50 300
(scanf will see: Clinton 50 Clinton, since '\n' character from user carriage return is seen as white space separator; so thinks weight is weird value, and will return 2 as the number of successfully scanned tokens.)
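Worst part is we're out of synch: the "Clinton" that failed to convert is still sitting in the input, so the next call to scanf( ) picks up where this one stopped instead of waiting for a fresh line. Can't code defensively with scanf( ): can't count number of tokens parsed ON A LINE, since scanf doesn't care about input lines.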
The best approach is to read a line into an array s[ ] and use "sscanf( )" function to pick apart the arguments in the line just input. This also allows you to try to interpret things in more than one way.
Know that sscanf works on a string if successfully scans all tokens in the string; there's another
But instead of getline( ) to bring in a user line, use library function fgets(), K&R pg. 247. Cover a bit later, Section 7.7. Prefer fgets( ) to gets() since they behave differently, and we must use fgets() for files.
Note that in both scanf and sscanf, if you put special characters in the format string, we MUST see exactly those special characters in user input.
cnt = sscanf(s, "%d/%d/%d", &month, &day, &year); /* s has string */
will expect input like: 07/23/96. If not, cnt returned by sscanf will be less than 3.
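A sketch of the line-at-a-time approach, assuming the line has already been read into a string (normally by fgets). The helper names parse_person and parse_date are hypothetical; each compares the count returned by sscanf with the number of conversions requested.

```c
#include <stdio.h>

/* parse_person: hypothetical helper. line should hold "lname age weight".
   lname must have room for 100 chars. Returns 1 if all three tokens
   were scanned from this one line, 0 otherwise. */
int parse_person(const char *line, char *lname, int *age, int *weight)
{
    /* %99s keeps the name from overflowing a 100-char array */
    return sscanf(line, "%99s %d %d", lname, age, weight) == 3;
}

/* parse_date: hypothetical helper. Expects input like 07/23/96;
   the '/' characters in the format must appear in the input. */
int parse_date(const char *line, int *month, int *day, int *year)
{
    return sscanf(line, "%d/%d/%d", month, day, year) == 3;
}
```

Because each call sees exactly one line, a short line such as "Clinton 50" is reported as a failure right away and the program can prompt again without getting out of step.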
RECALL how we wrote function atoi, axtoi to convert character string s to integer i. There is a function atoi in C library, but no axtoi. Question: How would we do this?
Use sscanf(s, "%d", &i) for atoi, or sscanf(s, "%x", &i) for axtoi.
RULE: Use scanf only for programs needing only ONE input item, usually "quick and dirty" programs with no input checking.
Class 20.
7.5 File Access.
We have had practice reading from stdin and writing to stdout. We can redirect stdin from a file and stdout to a file at the command level.
In this Section, we learn to get program control over reading and writing named files. An example of an application of this is the command "cat".
cat fname1 fname2
This reads from file fname1, then from file fname2, and puts all characters it reads to standard output. Can catenate two files to a third, thus:
cat fname1 fname2 >fname3
Any number of files can be catenated; the Glass UNIX syntax is:
cat -n {FileName}*
The option -n gives line numbers to the output. What do you think happens if no files are named? (Yes. Read from stdin.)
Dealing with named files is surprisingly similar to dealing with stdin and stdout. Start by declaring a special named object, a "file pointer":
FILE *fp; (See Appendix B1.1, pg. 242)
The <stdio.h> header contains a structure definition with typedef name FILE, which contains component variables (buffer, etc.) used in file I/O.
You don't need to know the details of structs to use simple file I/O. Just use primitive functions, such as fopen():
fp = fopen(name, mode)
(Functional prototype: FILE * fopen(char *name, char * mode);)
Here fp is the return value, set to NULL if fopen fails! Now fopen is asked to open a named file (character string "name") in a particular "use mode".
Legal mode values include "r" for read, "w" for write, and "a" for append.
These modes cause the open file to have different behaviors. We can make calls to get a char out of an "r" file with getc(): (Appendix B1.4, pg 247)
c = getc(fp); /* like getchar(): an "r" mode file acts like stdin */
(Functional prototype: int getc(FILE *stream); Return EOF if fails.)
or put a char to an "a" or "w" file:
status = putc(c, fp); /* like putchar: "w" or "a" mode files like stdout */
(Func. prototype: int putc(int c, FILE * stream); Return EOF if error.)
When we fopen a file in "w" or "a" mode, if the file does not already exist, it will be created (as the vi editor creates a file it has never heard of).
If the file does already exist, then "w" mode fopen will destroy the old contents (like the command mv) and "a" mode will append new material to the end of the existing file (like the "save" command in mail).
More modes are given in Appendix B, pg. 242. (Will come to update later.)
When you have finished reading from a file or writing to a file, you should call fclose to close the file.
status = fclose(fp);
(Functional prototype: int fclose(FILE *stream);)
The function fclose( ) returns EOF if any error occurs, and zero otherwise.
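A sketch of how the "w", "a", and "r" modes behave, using only fopen, putc, getc, and fclose. The helper names put_text and count_chars are hypothetical.

```c
#include <stdio.h>

/* put_text: hypothetical helper. Opens path in the given mode
   ("w" or "a"), writes text one char at a time with putc, closes.
   Returns 0 on success, -1 on any error. */
int put_text(const char *text, const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        return -1;
    while (*text != '\0')
        if (putc(*text++, fp) == EOF) {
            fclose(fp);
            return -1;
        }
    return (fclose(fp) == EOF) ? -1 : 0;   /* fclose flushes the buffer */
}

/* count_chars: hypothetical helper. Opens path in "r" mode and counts
   chars with getc until EOF. Returns -1 if the file can't be opened. */
long count_chars(const char *path)
{
    FILE *fp = fopen(path, "r");
    long n = 0;

    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

Writing "abc" in "w" mode and then "de" in "a" mode leaves 5 chars in the file; a further "w" mode write of "xy" destroys the old contents and leaves 2.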
Reading from stdin or writing to stdout, you sit at a particular character in a virtual file, called a "stream", and move only to the right to the next character.
But as we will see, in named files it is possible to "go to the left" to read characters over again (using the function fseek(), App. B1.6, pg 248).
When a C program is started, the operating system opens three files and provides file pointers (FILE *) to them: stdin, stdout, and stderr.
We can now define our old friends getchar and putchar as macros:
#define getchar( ) getc(stdin)
#define putchar(c) putc((c), stdout) <-- see why (c) is in parens?
Other file oriented analogs to input and output functions we've known are:
int fscanf(FILE *fp, char *format, . . .); /* mode of fp must be "r" */
int fprintf(FILE *fp, char *format, . . .); /* mode of fp is "w" or "a" */
Typically, we will use "f" version of primitives, fgets rather than gets.
(But use getc and putc)
(Note, we would still use fgets and sscanf in preference to fscanf.) OK, Now here's the cat program (LEAVE UP).
#include <stdio.h>
/* program to be compiled as "cat" executable (gcc cat.c -o cat) */
int main(int argc, char *argv[ ])
{
    FILE *fp;
    void filecopy(FILE *, FILE *); /* functional prototype */

    if (argc == 1) /* no args: copy standard input */
        filecopy(stdin, stdout); /* from on left, to on right is common */
    else
        while (--argc > 0)
            if ((fp = fopen(*++argv, "r")) == NULL) {
                printf("cat: can't open %s\n", *argv);
                return 1;
            } else {
                filecopy(fp, stdout); /* copy this file to stdout */
                fclose(fp);
            } /* loop through all files named */
    return 0;
}
/* filecopy: copy file ifp to ofp */
void filecopy(FILE *ifp, FILE *ofp)
{
    int c;

    while ((c = getc(ifp)) != EOF)
        putc(c, ofp);
}
Every file open requires resources, and there is a limit on files open at once; good idea to close fp when done (all close at program termination).
FILE structure has buffer for disk data in memory; when putc is called, the char may not get written out to the file right away. THUS THE DATA IS NOT SAFELY ON DISK.
This is important to a database, say. Functions fclose() (and fflush()) will flush buffer to disk file.
Look at Appendix B, pg. 241. This Appendix describes the standard library (ALWAYS LOOK AT the standard headers listed there). B1 contains <stdio.h> stuff.
Class 21.
QUIZ 3 Next class. BRING GLASS UNIX.
7.6 Error Handling
Trying to fopen a file that does not exist is an error, and there are other errors as well: reading or writing a file without appropriate permission.
With the "cat" program just covered (put up again), an error performing fopen will write something to stdout; maybe this was redirected to a file.
%cat fname1 fname2 >fname3
But recall there are three streams opened by the operating system when a program begins execution, stdin, stdout, and stderr.
And stderr usually goes to the screen even if stdout is redirected to a file
prog . . . >outfile (redirect stdout to outfile; destroy old outfile)
prog . . . >&outfile (redirect stdout and stderr to outfile, destroy old)
prog . . . >>outfile (redirect stdout to outfile; append on end)
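Note it is reasonably common to want stdout and stderr in two different files. In the C shell, put the > redirection inside parentheses (a subshell) and the >& outside:
(prog . . . >outfile1) >&outfile2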
Then you can grep for error in outfile2, but in any case outfile1 has no error msgs.
How do we rewrite the "cat" program so that it writes error msgs to stderr? Done on pg 163 of K&R (below). Rewrite fopen() in loop of that program as:
if ((fp = fopen(*++argv, "r")) == NULL) {
    fprintf(stderr, "%s: can't open %s\n", prog, *argv);
    exit(1);
}
Here, the error msg goes to stderr; the variable prog printed out under %s is a char pointer to the name of this compiled program, initialized:
char *prog = argv[0]; /* name invoked for this program */
The exit(int) function terminates program execution when called and returns argument to invoking process (debugger, shell program, fork parent)
Of course, a "return value" from a main program would do this as well, but exit() will terminate execution as if we executed a return from main(), and can be called from any nested function!
A zero returned by a program means no error; non-zero values mean exceptional condition. You can set up conventions as to what values mean, but it's best to keep the values positive.
At the end of the main program on pg. 163, have new statement:
if (ferror(stdout)) {
    fprintf(stderr, "%s: error writing stdout\n", prog);
    exit(2);
}
The function ferror returns non-zero if an error occurred on the stream fp:
int ferror(FILE *fp); /* See B1.7, pg. 248 */
But this doesn't give the actual error number to allow us to tell the user or programmer what problem has occurred. Need Error handling functions.
Error handling functions covered in Appendix B, Section B1.7. (See pg 248) These are errors that can arise from any library calls, not just I/O.
Note there is a problem with program on pg. 163. To handle errors, we should #include <errno.h> (See B1.7). The function ferror() tells us if there is an error indication for a stream (the last one that occurred).
More generally, errno.h contains a macro expression "errno" which can be tested; it is zero if there is no problem and non-zero otherwise.
(Text in B1.7 says errno "may" contain an error number; it will contain one if there has been an error (any error, not just in a stream) unless the error is so serious it has corrupted the error structs.)
We can use the function perror to write out the error msg associated with errno, but we have to test for error right after it occurs to get right one.
Note too that the most recent error that has occurred ON A STREAM may not be the most recent error that occurred ON THE SYSTEM.
Since perror will print out last error on the system, this might be an error that occurred on a different file!
So do test:
if (errno != 0) {
perror(s); exit(2);
}
after each system call. perror will print out an errormsg corresponding to integer in errno, as if by: fprintf(stderr, "%s: %s\n", s, "error message");
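A sketch of testing errno right after the call that failed. The helper name open_or_report is hypothetical. It assumes fopen sets errno on failure, which UNIX systems do but the C standard does not strictly promise, so the code falls back to -1 when errno is still zero.

```c
#include <errno.h>
#include <stdio.h>

/* open_or_report: hypothetical helper. Tries to open path for reading.
   On failure, saves errno at once, prints the matching message with
   perror, and returns the saved value (or -1 if errno was not set).
   Returns 0 if the file could be opened. */
int open_or_report(const char *path)
{
    FILE *fp;
    int saved;

    errno = 0;                  /* clear any stale error first */
    fp = fopen(path, "r");
    if (fp == NULL) {
        saved = errno;          /* save before another call changes it */
        perror(path);           /* message goes to stderr */
        return (saved != 0) ? saved : -1;
    }
    fclose(fp);
    return 0;
}
```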
7.7 Line input and output. The standard C library equivalents to getline and putline are fgets and fputs, only slightly different from getline().
char *fgets(char *line, int maxline, FILE *fp); (like getline from file)
Reads the next input line (including '\n' at the end) from file fp into the char array line; at most maxline-1 chars will be read, then there will be a terminal '\0' added. Returns ptr to line or NULL (means EOF).
For output, the function fputs writes out line to fp. Usually printed out if end with '\n'.
int fputs(char *line, FILE *fp);
It returns EOF if error occurs (disk fills up?), and a non-negative value otherwise. Recall can use perror to print out exact error cause (to stderr, not user screen).
Don't use gets() (stdin) and puts() (stdout). They are confusing in inclusion of newline char. (No maxline in gets.) Always use fgets and fputs.
K&R shows (pg 165) how fgets is written in terms of getc in the standard library that was present on their system. Very simple.
/* fgets: get at most n chars from iop into char array s */
char *fgets(char *s, int n, FILE *iop)
{
    register int c;
    register char *cs;

    cs = s;
    while (--n > 0 && (c = getc(iop)) != EOF)
        if ((*cs++ = c) == '\n')
            break;
    *cs = '\0';
    return (c == EOF && cs == s) ? NULL : s; /* error return if nothing new */
}
7.8. Miscellaneous functions (pg 166).
7.8.1 String operations. Talk through. Note strncpy variant, etc. Anything important missing? See Appendix B3, pg 249-250. Several of interest.
One valuable one: char * strstr(cs, ct). Find first example of char string ct in cs. (Look for string ct "01/05/97" in cs named "movie_times".)
Note you can look for ALL occurrences of ct in cs: After you find a match (ptr to char), advance pointer cs to that ptr + 1.
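A sketch of the "advance to ptr + 1" idea for finding ALL occurrences with strstr. The function name count_occurrences is hypothetical; stepping by one char means overlapping matches are counted too.

```c
#include <string.h>

/* count_occurrences: hypothetical helper. Counts how many times the
   string ct appears in cs, including overlapping matches.
   An empty ct is treated as zero matches. */
int count_occurrences(const char *cs, const char *ct)
{
    int count = 0;
    const char *p = cs;

    if (*ct == '\0')
        return 0;
    while ((p = strstr(p, ct)) != NULL) {
        count++;
        p++;        /* resume the search one char past this match */
    }
    return count;
}
```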
Function char *strtok(s, ct), pg 250, very commonly used indeed. Does the same thing suggested for strstr(), but does it automatically. For example:
char *tok[30]; /* handle up to 30 tokens */
char s[ ] = " , Clinton, 50, 300.25"; /* normally would use fgets */
char ct[ ] = " ,"; /* space and comma are two delimiters */
int count = 1;
tok[count-1] = strtok (s, ct);
if (tok[count-1] == NULL) /* would indicate no tokens in line */
. . .; /* take appropriate action */
/* now calls for subsequent tokens */
while ((tok[count] = strtok(NULL, ct)) != NULL)
count++; /* count is right when fall through */
Following this, we can use sscanf to convert various arguments, e.g., second argument tok[1] under conversion %d.
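The same strtok loop can be wrapped in a function that also guards the size of the token array. This is only a sketch; the name split_tokens and the maxtok parameter are hypothetical. Note that strtok writes '\0' chars into s, so s must be a modifiable array, not a string constant.

```c
#include <string.h>

/* split_tokens: hypothetical helper. Breaks s into tokens separated by
   any of the chars in ct, storing pointers in tok[0..maxtok-1].
   Returns the number of tokens stored. Modifies s in place. */
int split_tokens(char *s, const char *ct, char *tok[], int maxtok)
{
    int count = 0;
    char *t = strtok(s, ct);        /* first call names the string */

    while (t != NULL && count < maxtok) {
        tok[count++] = t;
        t = strtok(NULL, ct);       /* later calls pass NULL */
    }
    return count;
}
```

For the line " , Clinton, 50, 300.25" with delimiters space and comma, this stores three tokens: Clinton, 50, and 300.25.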
The mem... functions are very much like the str... functions, except there is no null terminator. A good C library has very efficient mem... functions.
Note difference between memcpy and memmove (overlap objects check). Clearly the size n can be given by a sizeof(struct . . .) reference.
Section 7.8.2. Char Class testing. isalnum(). See App. B2, list pg 249. Less common but interesting: isxdigit(), iscntrl(), isspace(), ispunct().
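A small sketch using one of the less common tests, isxdigit(). The function name is_hex_string is hypothetical. The cast to unsigned char is there because the ctype functions are only defined for values in the unsigned char range (and EOF).

```c
#include <ctype.h>

/* is_hex_string: hypothetical helper. Returns 1 if s is non-empty and
   every char is a hex digit (0-9, a-f, A-F), 0 otherwise. */
int is_hex_string(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isxdigit((unsigned char) *s))
            return 0;
    return 1;
}
```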
7.8.3 Ungetc. Had something like this in Chapter 4, ungetch(). Only have guarantee can push one char back, but usually enough.
7.8.4 Command Execution. The function system(). See pg 253.
int system(const char *s);
The string s has a system command (pwd, or date, or ls, or ls >fname, or could run a program: prog). Return depends on system. Learn from man.
A very common use is to run a program with parameters in another shell that will do some work I want.
int a, b;
char command[MAXCMD];
sprintf(command, "prog %d %d > prog.out", a, b);
system(command);
In program compiled as prog, get a and b values through argc and argv[ ].
The system call doesn't return string returned from the command, but by writing > prog.out, will create file with this output.
The calling function can then open this file, input the result and parse it (tokenize it).
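A sketch of the build-then-run pattern above, split into two hypothetical helpers so the command string can be checked on its own. The value 100 for MAXCMD is an assumption, and "prog" stands for whatever program you compiled.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAXCMD 100   /* assumed limit on command length */

/* build_command: hypothetical helper. Formats the shell command that
   runs prog with arguments a and b, redirecting stdout to prog.out.
   command must have room for MAXCMD chars. */
int build_command(char *command, int a, int b)
{
    return sprintf(command, "prog %d %d > prog.out", a, b);
}

/* run_prog: hypothetical helper. Builds the command and hands it to
   system(); the meaning of the return value depends on the system. */
int run_prog(int a, int b)
{
    char command[MAXCMD];

    build_command(command, a, b);
    return system(command);
}
```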
7.8.5 Have talked about malloc()/free() before. You need to review it.
7.8.6 Math functions. Need to have #include <math.h>, and use flag for gcc:
gcc source.c -lm
7.8.7 Random functions: rand(), srand(); Look at pg. 252 where covered.
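A sketch of rand() and srand(), assuming a simple die roll is wanted. The name roll_die is hypothetical. Taking rand() modulo 6 is the quick classroom method; it is slightly uneven because RAND_MAX + 1 is not a multiple of 6.

```c
#include <stdlib.h>

/* roll_die: hypothetical helper. Returns a pseudo-random int from 1 to 6.
   Call srand(seed) once beforehand to choose the sequence. */
int roll_die(void)
{
    return rand() % 6 + 1;   /* rand() gives 0..RAND_MAX */
}
```

Calling srand with the same seed gives the same sequence of rolls on a given system, which is handy when debugging.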
Also see bsearch() and qsort() on pg. 253. There is homework on this.
Here is the beginning of the man page on the C Library function qsort. (You should use the man command to get specifications of commands & fns?)
-------------------
man qsort
Reformatting page. Please wait ... done
C Library Functions qsort(3C)
NAME
qsort - quick sort
SYNOPSIS
#include <stdlib.h>
void qsort(void *base, size_t nel, size_t width,
int (*compar) (const void *, const void *));
DESCRIPTION
The qsort() function is an implementation of the quick-sort
algorithm. It sorts a table of data in place. The contents
of the table are sorted in ascending order according to the
user-supplied comparison function.
The base argument points to the element at the base of the
table. The nel argument is the number of elements in the
table. The width argument specifies the size of each ele-
ment in bytes. The compar argument is the name of the com-
parison function, which is called with two arguments that
point to the elements being compared.
The function must return an integer less than, equal to, or
greater than zero to indicate if the first argument is to be
considered less than, equal to, or greater than the second
argument.
The contents of the table are sorted in ascending order
according to the user supplied comparison function.
EXAMPLES
The following program sorts a simple array:
static int intcompare(int *i, int *j)
{
if (*i > *j)
return (1);
if (*i < *j)
return (-1);
return (0);
}
main()
{
int a[10];
int i;
a[0] = 9; a[1] = 8; a[2] = 7; a[3] = 6; a[4] = 5;
a[5] = 4; a[6] = 3; a[7] = 2; a[8] = 1; a[9] = 0;
qsort((char *) a, 10, sizeof(int), intcompare);
for (i=0; i<10; i++) printf(" %d",a[i]);
printf("\n");
}
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
__________________________________
|_ATTRIBUTE_TYPE___ATTRIBUTE_VALUE_ |
| MT-Level | MT-Safe |
|_______________|___________________|
SEE ALSO
sort(1), bsearch(3C), lsearch(3C), string(3C), attributes(5)
NOTES
The comparison function need not compare every byte, so
arbitrary data may be contained in the elements in addition
to the values being compared.
The relative order in the output of two items that compare
as equal is unpredictable.
Here's a more complex example of how to use qsort. Given an array emps[ ] of struct empstr below, you want to sort them in order by lname.
struct empstr {
char lname[12];
char fname[12];
int empno;
} emps[100];
Let's say we fill in the first 5 entries of the emps array (subscripts 0 to 4) and just want to sort the first five elements. The following will work.
qsort(emps, 5, sizeof(struct empstr), cmplname);
emps is a pointer (array name) to the base of the array (qsort doesn't recognize the pointer type, of course, but makes it a (void *) pointer.)
The number of elements to sort is 5. The size of an element is the sizeof expression given.
But there will be a gcc warning that we are "passing arg4 of qsort from incompatible pointer type." The cmplname function we will define is:
int cmplname(struct empstr *p1, struct empstr *p2) {
return(strcmp(p1->lname, p2->lname));
}
Note that the arguments must POINT TO the elements being compared. Then the function provided does the comparison based on these pointers and returns <0, =0, or >0 according to whether the first argument is <, =, or > the second argument.
Note that we could just as well have compared the elements on the basis of their empno, and then we would sort in order by empno.
To make everything cast properly in the call and do away with the warning, we would write instead when we call qsort:
qsort((void *) emps, 5, sizeof(struct empstr),
(int (*) (const void *, const void *))cmplname);
Now you should also know how to perform bsearch( ).
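A sketch of bsearch( ) on the same struct, assuming the array is sorted by lname first (bsearch requires sorted data). The names cmplname_v and find_by_lname are hypothetical. Writing the comparison function with const void * arguments avoids the gcc warning without a cast at the call.

```c
#include <stdlib.h>
#include <string.h>

struct empstr {
    char lname[12];
    char fname[12];
    int empno;
};

/* cmplname_v: hypothetical comparison function with the exact argument
   types that qsort and bsearch expect. */
int cmplname_v(const void *p1, const void *p2)
{
    const struct empstr *e1 = p1;
    const struct empstr *e2 = p2;

    return strcmp(e1->lname, e2->lname);
}

/* find_by_lname: hypothetical helper. Sorts the first n entries of emps
   by lname, then binary-searches for lname.
   Returns a pointer to the matching entry, or NULL if none. */
struct empstr *find_by_lname(struct empstr *emps, size_t n, const char *lname)
{
    struct empstr key;   /* only key.lname is used by the comparison */

    qsort(emps, n, sizeof(struct empstr), cmplname_v);
    strncpy(key.lname, lname, sizeof key.lname - 1);
    key.lname[sizeof key.lname - 1] = '\0';
    return bsearch(&key, emps, n, sizeof(struct empstr), cmplname_v);
}
```

The key passed to bsearch is itself a struct empstr, so the same comparison function works for both the sort and the search.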
Class 22. Administer Quiz 3.
Lecture on UNIX Shells (Created by E. O'Neil)
Ref: Glass, Chap. 3, Chap. 6; There's an OLD and a NEW. Page #s format will be (OLD,NEW)
Look at Glass, (85,77). There we see a nice picture of the relationship between the three most common shells. At UMB we use the C shell, or its descendent, tcsh, the "T shell". tcsh should show up as a circle around the C shell, since it supports all the C shell actions and some more.
Note the "common core" of both shell variations. Many important features are there, and we'll cover them first, following Glass Chap. 3. Then we'll go on to the C shell, Chap. 6.
Figure 3.2 shows subdivision of core features. We've covered redirection "prog>out", wildcards "ls *.c", pipes "ls -lt|more", but not the others.
Processes
Basic to UNIX is the idea of a process. Each process contains a program in execution (it might be stopped, but it is living on the system, anyway).
Each process has own memory space with code, data, and a program stack.
Most programs we use daily are written in C and have "main(int argc, char *argv[])" to start, through which they access the arguments on their own command line. Even if they aren't written in C, they are given such access.
Processes can give birth to other processes using the fork() system call (K&R, Chap 8). Then there is a parent and a child process.
Typically the parent keeps track of the child but not vice versa. A common thing for a parent to do is just wait around until the child finishes some work, and exits (exit system call, Appendix B).
Shells are just programs that provide the command interface you interact with; each shell runs in its own process for you as a user.
Typically have a shell in a process giving a command interface and sometimes a program running under it (e.g, a command), in its own process.
Shell operations (87,79)
The shell is a program that is basically an initialization and then a loop over user commands. The shell interprets a line of user input, does whatever that says, and waits for another user input line.
Shell terminates when user types control-D at the beginning of a new line, or the shell command to exit, typically "exit". (If rlogin, end with exit.)
Some shells disable the control-D option or allow you to customize this point. The "logout" command not only causes this shell to go away but in addition everything else in your UNIX session.
Non built-in shell commands: they are programs.
Consider the "ls" or "lpr" or "vi" commands, or "myprog" --these are all programs, some in system directories, and "myprog" in your own current dir. The shell simply runs the program in a child process, passing the arguments to it via argc/argv.
Built-in Shell Commands. pg (88,79)
echo and cd are built-in shell commands--instead of running a program, the shell program detects these in the input line from the user and does the right action itself.
Note that cd needs to be built-in, to change the current dir in the shell process (doing the action in a program run from the shell would only change it for that process. e.g. in vi, can type !cd subdir, but doesn't "last")
When type "!" before a command, start a lower-level process for a temporary shell.
Shell variables.
Glass postpones variables until later (a bit on (103-105,92-95), much more in Chap. 6 (200-205,183-187)). We introduce them now to use in examples.
A shell variable or local variable is a name with a value held just in the current shell. (Follow this starting on (200,183).) Using the C shell:
% set x = 5
% set hwdir = ~poneil/cs240/hw4 --C shell: ~user expands to path for user
Once these are defined, we can access their values via $name, and this works in any shell.
% echo cking x: $x $hwdir -- NOTE not $(x) as would be in makefile
cking x: 5 /home/poneil/cs240/hw4
% ls $hwdir
assignment
% cd $hwdir
The full syntax and rules are given on pg. (200,103). In particular, see the $name[selector] syntax. We will cover this later.
Note that makefiles have a different parser:
CC= gcc -- later need to type:
$(CC) -g -o calcit -- WE DON'T NEED THE ()'s IN SHELL VARIABLE USE
Environment variables. (103-105,92-95)
An environment variable is a name with a value that gets communicated from the shell to programs running under the shell, including other shells.
Note for example that another shell is created when you type !ls in email!
There are lots of preexisting environment variables. See (103-105,92-95). E.g. TERM is type of your terminal (vt100).
This is set in your .login file (executed when you login). Take a look at your .login file and understand what you have there. You are responsible for this.
To define an env var of your own using the C shell:
% setenv y 10
% setenv printer lw_office
Their values are accessed just the same way as shell vars:
% echo $y $printer
10 lw_office
% lpr -P$printer *.c
Since they stick with you better, you might like to use env vars for "places to save info" while you work.
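On the receiving side, a program reads an environment variable with the ANSI C library function getenv. A minimal sketch, assuming the "printer" variable from the example above; the helper names and the fallback printer "lp" are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of environment variable name, or fallback if it is unset. */
const char *env_or(const char *name, const char *fallback)
{
    const char *value = getenv(name);   /* NULL if name is not in the environment */
    return value != NULL ? value : fallback;
}

/* Hypothetical helper: format "lpr -P<printer> <file>" into out, taking the
 * printer from the environment. Returns 0 on success, -1 if it does not fit. */
int build_lpr_command(const char *file, char *out, size_t outsize)
{
    int n = snprintf(out, outsize, "lpr -P%s %s", env_or("printer", "lp"), file);
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```

After "setenv printer lw_office" in the shell, a program started from that shell would build "lpr -Plw_office a.c" for the file a.c. A plain shell variable (set x = 5) would not be visible to getenv.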
Metacharacters: table (89,80)
All shells agree on a bunch of special marks and what they mean. You need to know them both for their utility and so you can avoid or sidestep them to get the right characters to arrive as arguments to the program you want to run. We'll discuss many of them.
Wildcards (92-93, 83-84)
ls *.? list filenames with anything before a dot, one char after dot.
lpr *.[ch] print files *.c and *.h: any chars, then dot, then either c or h.
lpr *.[a-z][a-z] print files with lowercase double-char suffixes.
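To experiment with these patterns from C, POSIX provides fnmatch, which tests one name against one wildcard pattern. This is only a sketch for trying patterns out (the wrapper name is hypothetical); the shell does its own expansion against the directory contents before the program ever runs.

```c
#include <fnmatch.h>

/* Returns 1 if name matches the shell-style wildcard pattern, 0 otherwise. */
int wildcard_match(const char *pattern, const char *name)
{
    return fnmatch(pattern, name, 0) == 0;   /* fnmatch returns 0 on a match */
}
```

So wildcard_match("*.[ch]", "stdio.h") is 1 and wildcard_match("*.[ch]", "a.o") is 0. With flags of 0, a leading dot in the name gets no special treatment, unlike in the shell.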
Pipes (93-95,84-86). Know this. Ignore the mention here of awk, unless you need a way to print out just one or several columns of a file.
Command substitution: `command` (95,86)
Enclose a command in back-single-quotes, `command`; the shell puts the text of its output into the command line, as if you had typed that output in place of `command`.
A favorite use of this is to remember spots in the filesys where you are now, before you cd elsewhere, so you can get back later.
% pwd <---- print working directory
/nfs/gnu/gcc2.11.5/src/gdb/arch/i386/new
(You, thinking . . . how can I remember this???? AHA!)
% setenv g `pwd` Note: setenv is C-shell specific
% echo $g prove you have it saved in g
/nfs/gnu/gcc2.11.5/src/gdb/arch/i386/new
% cd ~/otherstuff
...later...
% cd $g get back to hard-to-type spot
...or use it without cd'ing there:
% more $g/README display README of that dir
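Inside a C program, a rough analogue of `command` is popen (POSIX, not ANSI C), which runs a command through a shell and lets you read its output as a stream. A minimal sketch; the helper name command_output is hypothetical and it keeps only the first line of output.

```c
#define _POSIX_C_SOURCE 200809L   /* asks strict compilers to declare popen/pclose */
#include <stdio.h>
#include <string.h>

/* Run cmd through the shell and copy the first line of its output into buf,
 * without the trailing newline. Returns 0 on success, -1 on failure. */
int command_output(const char *cmd, char *buf, size_t bufsize)
{
    FILE *fp = popen(cmd, "r");          /* starts a shell running cmd */
    if (fp == NULL)
        return -1;
    char *line = fgets(buf, (int)bufsize, fp);
    int status = pclose(fp);             /* waits for the command to finish */
    if (line == NULL || status == -1)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';      /* strip the newline, if any */
    return 0;
}
```

Calling command_output("pwd", g, sizeof g) leaves the current directory name in g, much as setenv g `pwd` does in the C shell.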
Scripts (100,91)
In theory, you should be able to put any command in a file and then execute it from there--this is true if you're careful enough.
Look at (101,91) for steps. We've added step 1a here, important for C shell users. It ensures that the shell used to interpret the script is the C shell, not the Bourne shell (often the default).
1. put commands in file fname, usually via an editor
1a. First line of file: #!/bin/csh <-- this starts C Shell even if not in one
(See (101,91) Example line 2)
2. % chmod +x fname mark it executable
3. % fname run it
Example: a "printdisc" command, to print discussion.doc on myprinter. Create a file called printdisc, with the following contents:
% cat printdisc
#!/bin/csh
lpr -Pmyprinter discussion.doc
% chmod +x printdisc make it executable
% printdisc use it to print discussion.doc
NOTE CAN LOOP IN SCRIPT! See (216 & following, 198 & following)
Background processes (98,89)
If you have work that requires no user interaction, you can spin off a subprocess that runs concurrently with your main shell, typically doing some long-winded processing like preparing a file for printing.
Another long-winded job is finding something by looking at all the filenames in a directory tree--this is a job for the find program (258,237):
% find . -name a.c -print > find.out
looks for file "a.c" in all the subdirs of ".", i.e., the current dir. With this find, if you leave out the "-print", nothing at all is output (newer versions of find print by default). The "-print" specifies that the filename should be printed out. Here the output is redirected to find.out.
But this can take minutes. To free up the main shell from waiting for this to finish, make the find a background process by simply tacking on an "&":
% find . -name a.c -print > find.out& -- Spin off background process
Now the shell is ready again for your commands. If you later need to kill this background process, use the ps command to find its pid and "kill pid". Use ps again to make sure it's gone.
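The same start-then-kill sequence can be sketched in C, assuming a POSIX system. The function names are hypothetical, and a sleeping child stands in for the long-running find.

```c
#define _POSIX_C_SOURCE 200809L   /* asks strict compilers to declare kill */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start a child that stands in for a long background job.
 * Returns the child's pid to the parent, or -1 if fork failed. */
pid_t start_background_job(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        sleep(60);        /* child: pretend to be busy for a while */
        _exit(0);
    }
    return pid;           /* parent: returns at once, like the shell after "&" */
}

/* Like "kill pid" from the shell: send SIGTERM, then collect the child.
 * Returns 1 if the child was terminated by that signal, 0 if not, -1 on error. */
int kill_job(pid_t pid)
{
    int status;
    if (kill(pid, SIGTERM) < 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM;
}
```

The pid returned by start_background_job is the number you would otherwise look up with ps.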
Note: in example on (99,90) where mailing to yourself, the symbol before "mail" should be a pipe sign "|" (typo in old text, OK in new). Here the output of find is input to mail, so user gets a message with that output.
Quoting (105,95)
As mentioned earlier, there are so many shell metacharacters they often interfere with what you want to do. But you're in charge -- you can tell the shell to ignore the special meaning when you need to.
Strangely enough, single quotes are "stronger" than double quotes in suppressing metachars. They both suppress wildcard expansion (unquoted, *.c expands to a.c b.c), but single quotes suppress $var substitution as well.
Example. To find all the C sources in all subdirs of the current dir, we could do this:
% find . -name *.c -print
would expand to
find . -name a.c b.c -print (assuming a.c and b.c in the current dir)
But *.c has expanded too soon, so find would be given 5 args. It would complain about syntax, to the great mystery of the user. This is the kind of problem that gives UNIX a bad name.
But here is what we want to do: suppress wildcard expansion by the shell:
% find . -name "*.c" -print -- Quotes around *.c protect it from the shell
-- Lets *.c through to find
Single quotes would work here too.
Basic Job Control (107,97)
You should know how to use ps (e.g., when you use & to run in background) and kill (111,101).
On (112,102), the command wait is an advanced command where you need to know how to use fork; but the command sleep (in examples) is useful.
Example. You want to watch a long-running program, with pid 2345.
% ps -l |grep 2345 (-l means long listing; messed up on old book; use man
to figure out what to do)
You write a little script in C-shell, to report on it every 5 secs (after pgm started); while is described on (222,204)
#!/bin/csh
while (1)
date >> watchfile
ps -l|grep 2345 >> watchfile
sleep 5
end
You can run this in the background and kill it when you're done.
Class 23.
Exam 2 soon. Homework 6 is due. Homework 7 due soon. Will cover only a bit of Chapter 8, but you should know parts of Appendix B we've mentioned AND ALSO Appendix A. You must be able to find things in these appendices.
Continuing C Shell Scripts from Glass
Example. An improved print script. Earlier we had:
file printdisc:
#!/bin/csh
lpr -Pmyprinter discussion.doc
Now we would like to make a "print" command that can print any file, so that "print prog.c" does the command "lpr -Pmyprinter prog.c" and so on. Here it is:
file print:
#!/bin/csh
lpr -Pmyprinter $1 (See (105,95).)
The $1 stands for first argument, $2 for second, etc, so "print prog.c" makes $1 = "prog.c".
We can do better: $* stands for all args starting from the first, so we do:
#!/bin/csh
lpr -Pmyprinter $*
as the next version of print. Then "print a.c b.c" prints both files.
You might worry that here we are using a metacharacter "*". Do we need to quote it to prevent it from being wildcard-expanded?
The answer is no: here we're talking to the shell, and we are seeing that the shell uses * for more than one kind of "all-of-these" indicator. It's only when we need to get a * through the shell to another program that we need to quote it.
Note: this kind of command is often made into an alias, instead of a script:
alias print lpr -Pmyprinter (See (209,191).)
will do the job. Then "print a.c b.c" will print both files, etc., because the alias expands in place, replacing the "print" with "lpr -Pmyprinter" and leaving the a.c and b.c in the resulting command.
Typically this alias is put in .cshrc so that it is always available.
E.g. alias ls ls -F
But scripts can do bigger jobs than aliases, so let's return to them. Suppose you want to run a program for several different values of its args. The simplest way is to use an editor to write out all the ways:
#!/bin/csh -- file runprog.csh: "runprog.csh" does 4 runs of myprog
myprog small 10 (two arguments are in argv[ ])
myprog small 20
myprog large 10
myprog large 20
Another way is to pass the args through to runprog.csh:
#!/bin/csh
-- now "runprog.csh small large 10 20" does the same 4 runs
myprog $1 $3
myprog $1 $4
myprog $2 $3
myprog $2 $4
Now you can tune the run as needed, for example doing
runprog.csh small large 15 30
Or you can use looping in C Shell (See 216,198):
#!/bin/csh
foreach size (small large) --using a list (pg. 202) in a foreach (pg. 216)
foreach n (10 20)
myprog $size $n
end
end
This is clearly a good way to go if a larger number of cases are needed.
You can generate the 10, 20, 30, ... sequence using arithmetic in the C Shell. This is a big advantage of the C shell over the Bourne shell: it can do arithmetic but the Bourne shell can't. See the example on pg. (223,204) for how to do this.
Class 24.
Chapter 8. UNIX System Interface: C Library functions & UNIX System calls
The C library functions are uniform across UNIX OS (not NT), written using UNIX system calls; UNIX system calls use NOTHING -- they are at the lowest level.
Can print descriptions using man online. There is no index to the names of the calls, but you can list all file names for functions in each category.
%cd /usr/man/man2 (directory of UNIX system calls files)
%cd /usr/man/man3 (directory of C library functions files)
Then the ls command will give you the names of all the files, which gives you the names of the functions that exist. See next pg for fork.
(Already seen early part of man page for C Library function qsort.)
Fork is a particularly basic "UNIX system call" by which your program creates another process that will run independently from you.
Idea: the shell is a command interpreter living in a process; when you run a program the shell performs a fork to run the program.
The parent process for your running program is the shell; when you fork a process yourself, it has you as parent (the shell as grandparent). (email on own process. When !ls, creates child process to do ls; !cd doesn't STICK.)
After the fork occurs, the child process seems to have EXACTLY the same program environment as the process that created it (memory, data, etc.)
How you tell the difference is the returned value, of type pid_t: it is zero if you're the child and the pid of the child if you're the parent. (SF idea: how do you know if you're the clone? Parents keep track of child processes, not vice-versa.)
cs240. An example man page for UNIX System Call: fork ====
terminus(3)% man fork (There's a new version in man now.)
Reformatting page. Please wait ... done
FORK(2V) SYSTEM CALLS FORK(2V)
NAME
fork - create a new process
SYNOPSIS
int fork()
SYSTEM V SYNOPSIS
pid_t fork()
DESCRIPTION
fork() creates a new process. The new process (child process)
is an exact copy of the calling process except for the following:
+ The child process has a unique process ID. The child process
ID also does not match any active process group ID.
+ The child process has a different parent process ID (the pro-
cess ID of the parent process).
+ The child process has its own copy of the parent's de-
scriptors. These descriptors reference the same underlying
objects, so that, for instance, file pointers in file objects
are shared between the child and the parent, so that an
lseek(2V) on a descriptor in the child process can affect
a subsequent read(2V) or write(2V) by the parent. This de-
scriptor copying is also used by the shell to establish
standard input and output for newly created processes as well
as to set up pipes.
+ The child process has its own copy of the parent's open direc-
tory streams (see directory(3V)). Each open directory stream
in the child process shares directory stream positioning with
the corresponding directory stream of the parent.
+ All semadj values are cleared; see semop(2).
+ The child processes resource utilizations are set to 0; see
getrlimit(2). The it_value and it_interval values for the
ITIMER_REAL timer are reset to 0; see getitimer(2).
+ The child process's values of tms_utime(), tms_stime(),
tms_cutime(), and tms_cstime() (see times(3V)) are set to zero.
+ File locks (see fcntl(2V)) previously set by the parent are not
inherited by the child.
+ Pending alarms (see alarm(3V)) are cleared for the child process.
Sun Release 4.1 Last change: 21 January 1990 1
+ The set of signals pending for the child process is
cleared (see sigvec(2)).
RETURN VALUES
On success, fork() returns 0 to the child process and returns
the process ID of the child process to the parent process. On
failure, fork() returns -1 to the parent process, sets errno to
indicate the error, and no child process is created.
ERRORS
fork() will fail and no child process will be created if one or more
of the following are true:
EAGAIN The system-imposed limit on the total number
of processes under execution would be
exceeded. This limit is determined when the
system is generated.
The system-imposed limit on the total number
of processes under execution by a single user
would be exceeded. This limit is determined
when the system is generated.
ENOMEM There is insufficient swap space for the new process.
SEE ALSO
execve(2V), getitimer(2), getrlimit(2), lseek(2V), read(2V),
semop(2), wait(2V), write(2V)
--EOF: /tmp/man.22130--
END OF MAN PAGE ===============================================
The child will note a value of zero returned from the fork call and go do what the child process is supposed to do, eventually exiting.
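The three possible return values of fork can be sketched in C as follows, assuming a POSIX system. The function name fork_and_collect is hypothetical: the child exits with a chosen code and the parent waits and reports what it saw.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits at once with child_code; the parent waits for it
 * and returns the exit code it observed, or -1 on error. */
int fork_and_collect(int child_code)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;               /* failure: no child was created */
    if (pid == 0)
        _exit(child_code);       /* fork returned 0: we are the child */

    /* fork returned the child's pid: we are the parent */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both processes return from the same fork call; only the returned value tells them apart. The child uses _exit so that it does not flush stdio buffers it copied from the parent.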
Note that the C library function system(char *s) (K&R pg. 167), which can run any UNIX command, creates a new Shell program to run the command.
The name "system" for this function is a confusing one: it is NOT a system call. We refer to it as the system function in the C library.
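A minimal sketch of calling the system function and decoding its result, assuming a POSIX system (the WIFEXITED and WEXITSTATUS macros come from sys/wait.h and are not part of ANSI C). The wrapper name run_command is hypothetical.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Run a command line through system(); return the command's exit status,
 * or -1 if it could not be run or did not exit normally. */
int run_command(const char *cmdline)
{
    int status = system(cmdline);    /* starts a shell to interpret cmdline */
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because a shell interprets the string, metacharacters work here: run_command("ls *.c > files.out") gets wildcard expansion and redirection for free.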
K&R, in Chapter 8 (pg 171, midpage), mentions a header file, syscalls.h, that gives functional prototypes for system call functions.
But in real life there need be NO SUCH HEADER. There are NO FUNCTIONAL PROTOTYPES needed for system calls; the standard is pre-ANSI C where the arguments for the functions are typed AFTER the function name.
The other thing to understand about a system call is that Operating System security is based on not allowing the caller to play any tricks.
When a program performs a system call, the processor performs a TRAP (flow of control stops, as though an error had occurred: divide by zero).
The trap handler in the "UNIX Kernel" looks at the processor stack to see which system call number was invoked; the system call then starts running in a privileged mode: e.g., it can modify memory locations that control your process.
It would be a bad thing to allow general program flow of control to run in this privileged mode, because it might be able to take over the processor, keep other processes from running, destroy the system disk, etc.
All the viruses you hear about on Microsoft DOS or Windows 3.1 occur because the program can do things in privileged mode. Windows 95 OK.
In Chapter 8 of K&R, some example system calls are covered that we will not have time to treat in detail, but here are some of the ideas.
NOTE that we DON'T want you to use these system calls for I/O. They're too low-level! We just want you to see how C Library Functions are implemented using System Calls. DON'T USE IN PROGRAMS ON EXAMS!
Open K&R to Section 8.1: File I/O (pg. 169-70). Note that file descriptors for system calls are simply integers.
Each such file descriptor returned by a call to creat( ) or open( ) is an array subscript into an array of structs that contain info about I/O.
See Section 8.3, pg. 172 for how to open a file (PLACE ON BOARD):
int fd;
int open(char *name, int flags, int perms); /* E.g. functional prototype */
fd = open(name, flags, perms); /* opens a file */
Note: We don't REALLY need functional prototypes for system calls. There is no header file for them because they predate ANSI C.
In the arguments above, "name" is the name of the file to open, "perms" is always zero for the uses of open discussed here, and "flags" is an int, an OR of flag symbolic constants such as these:
O_RDONLY open for reading only
O_WRONLY open for writing only
O_RDWR open for both reading and writing
E.g., see pg. 176, contents of <stdio.h> (supports C Library functions fopen, fgets, fputs, fread, fwrite, getc, putc, etc.). Note enum _flags in the middle of the page. These are not the same as the O_ flags above, because those are for system calls.
On most UNIX systems O_RDONLY is 0, O_WRONLY is 1, and O_RDWR is 2; check fcntl.h on your system. Each _iobuf struct array element records the fd that the system calls use for the open file.
See /usr/include/stdio.h for stdio.h; see /usr/include/sys/fcntl.h (in the sys sub-directory) for the O_ flags.
Instead of open, might want to create a new file and open it for I/O:
int creat(char *name, int perms);
fd = creat(name, perms);
Note that both open and creat return -1 (the same value as EOF) on failure. If a file already exists with this name, that is not an error: creat will discard the prior contents.
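A short sketch of creat discarding prior contents, assuming a POSIX system. The helper names and the permission value 0644 are assumptions for illustration; lseek to the end of the file is used to find its size.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Create (or empty) the file name, write text into it, and close it.
 * Returns the number of bytes written, or -1 on failure. */
long rewrite_file(const char *name, const char *text)
{
    int fd = creat(name, 0644);      /* an existing file is emptied, not an error */
    if (fd == -1)
        return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    close(fd);
    return (n == (ssize_t)len) ? (long)n : -1;
}

/* Return the size of the file name in bytes, or -1 if it cannot be opened. */
long file_size(const char *name)
{
    int fd = open(name, O_RDONLY, 0);
    if (fd == -1)
        return -1;
    long size = (long)lseek(fd, 0L, SEEK_END);   /* offset of the end of file */
    close(fd);
    return size;
}
```

Writing a long string and then a 5-character string to the same name leaves a 5-byte file: the second creat threw the first contents away.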
Back in Section 8.2, pg 170, see system calls to read and write files. The examples given are confusing (neither calls nor functional prototypes).
#define BUFSIZE 1024
int read(int fd, char *buf, int n); /* functional prototypes */
int write(int fd, char *buf, int n);
int x, y, fd, perms = 0;
char buff[BUFSIZE], fname[ ] = ". . .";
fd = open(fname, O_RDONLY, perms); /* fd is -1 if the open failed */
x = read(fd, buff, BUFSIZE); /* x = bytes read; 0 at end of file, -1 on error */
The read system call actually reads out of another buffer (the system buffer for I/O), and brings the required data into process local memory. The request MAY make the system perform a REAL disk I/O (expensive).
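Putting open, creat, read, and write together gives a file copy in the style of the cp example in K&R Section 8.3. This is a sketch assuming a POSIX system; the function names and the 0644 permissions are assumptions, and a short write is simply treated as an error. As noted above, this is for seeing how things work, not for use in your own programs.

```c
#include <fcntl.h>
#include <unistd.h>

#define BUFSIZE 1024

/* Copy everything readable from descriptor from to descriptor to.
 * Returns the number of bytes copied, or -1 on a read or write error. */
long copy_fd(int from, int to)
{
    char buf[BUFSIZE];
    long total = 0;
    ssize_t n;
    while ((n = read(from, buf, sizeof buf)) > 0) {   /* 0 means end of file */
        if (write(to, buf, (size_t)n) != n)
            return -1;
        total += n;
    }
    return n < 0 ? -1 : total;
}

/* Copy file src to file dst using only system calls. */
long copy_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY, 0);
    if (in == -1)
        return -1;
    int out = creat(dst, 0644);
    if (out == -1) {
        close(in);
        return -1;
    }
    long total = copy_fd(in, out);
    close(in);
    close(out);
    return total;
}
```

Each pass through the loop is one read and one write system call, so a larger BUFSIZE means fewer traps into the kernel for the same file.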
Databases use special "raw devices" in the open command to avoid system buffering: instead of fname, use /dev/DISK001 (a named disk device).
Then read( ) will actually read from disk (better have good sized buffer).
Now: How are C Library calls implemented in terms of System Calls?
Note that fopen in the C library uses these system calls: when you try to open a file and it doesn't exist, it really invokes another system call, creat.
See Section 8.5, pg 176, the stdio.h contents (simplified). Note symbolic constants at the top, including NULL, EOF, BUFSIZE, and OPEN_MAX.
Of course stdio.h is included in a user program. What is the maximum number of files a user will be able to open? (Actually depends on the type of machine: defined with #if-#elif-#endif: See /usr/include/stdio.h.)
Read through more of stdio.h, especially the _iobuf struct, typedef name FILE. Recall that for the Library Functions the file pointer is of type FILE *.
Note that the declaration right after that struct definition says that an array of FILE type structs named "_iob" is extern (it is not defined in the header file, naturally, but the user can access it). It has OPEN_MAX entries.
And stdin, stdout, and stderr use the first three entries of this array.
Note that each entry (of struct type _iobuf) contains the fd (the array subscript used by the system calls), plus info on the buffer (1024 bytes): a pointer to the next character in the buffer and a count of how many characters are left.
See how getc(p) is a macro (recall p is of type FILE *):
(--(p)->cnt >= 0 ? (unsigned char) *(p)->ptr++ : _fillbuf(p))
getc( ) brings back one char from the buffer if it still has one; otherwise it calls _fillbuf, which fills the area pointed to by base from disk and returns the first character. See pg 178 for _fillbuf.
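The getc and _fillbuf idea can be sketched with a cut-down struct. This is not the K&R code: the names mybuf, my_init, my_fillbuf, and my_getc are hypothetical, the buffer is a fixed array rather than one obtained with malloc, and the flag handling is left out. The count-then-refill logic is the same.

```c
#include <unistd.h>

#define MYBUFSIZE 1024            /* hypothetical buffer size */

struct mybuf {                    /* cut-down analogue of _iobuf */
    int   fd;                     /* UNIX file descriptor */
    int   cnt;                    /* characters left in buf */
    char *ptr;                    /* next character position */
    char  buf[MYBUFSIZE];
};

void my_init(struct mybuf *p, int fd)
{
    p->fd = fd;
    p->cnt = 0;                   /* empty, so the first my_getc refills */
    p->ptr = p->buf;
}

/* Refill the buffer with one read system call.
 * Returns the first new character, or -1 at end of file or on error. */
int my_fillbuf(struct mybuf *p)
{
    ssize_t n = read(p->fd, p->buf, sizeof p->buf);
    if (n <= 0) {
        p->cnt = 0;
        return -1;
    }
    p->cnt = (int)n - 1;          /* one character is handed back right now */
    p->ptr = p->buf;
    return (unsigned char) *p->ptr++;
}

/* Same shape as the getc macro, written as a function. */
int my_getc(struct mybuf *p)
{
    return --p->cnt >= 0 ? (unsigned char) *p->ptr++ : my_fillbuf(p);
}
```

Most calls to my_getc touch only memory; the read system call happens once per MYBUFSIZE characters. That is the whole point of buffering in the library.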
On page 177, fopen looks through the array _iob with a pointer fp (points to FILE) until fp->flag doesn't have the _READ or _WRITE flag turned on.
This means that an empty slot in the _iob array has been found.
Now if *mode == 'w', we call creat to create a new file. And so on.
On page 178, see function _fillbuf mentioned above to fill a buffer, that is pointed to by "base" in the FILE struct pointed to by argument fp.
Thus we see (halfway down), if no buffer exists yet:
. . . fp->base = (char *) malloc(bufsize) . . .
Then a bit later, the read command fills the buffer with data read from fp->fd, the UNIX file descriptor int.
This section tells you what the C library I/O functions ACTUALLY DO.
It's tremendously enticing for people who always want to get to the root of things, to understand the lowest level details of how C works.
In Section 8.6, we see an example of how system calls would be used to walk directories.
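K&R builds its own opendir and readdir on top of system calls; on current POSIX systems the same names are available ready-made in <dirent.h>. A minimal sketch using those library versions (the function name count_entries is hypothetical):

```c
#include <dirent.h>
#include <string.h>

/* Count the entries in directory dirname, skipping "." and "..".
 * Returns -1 if the directory cannot be opened. */
int count_entries(const char *dirname)
{
    DIR *d = opendir(dirname);
    if (d == NULL)
        return -1;
    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {      /* NULL means no more entries */
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(d);
    return count;
}
```

To walk a whole tree as find does, you would call this kind of loop recursively on each entry that is itself a directory.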
A simple implementation of malloc() and free() is given in Section 8.7. The only system call here is on pg. 188, in the function morecore(): sbrk(), which increments the system's idea of the program's data space. See the man pg.
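A minimal sketch of the sbrk call that morecore() relies on, assuming a system that still provides sbrk (Linux does; some newer systems mark it obsolete). The wrapper name grow_data_space is hypothetical.

```c
#define _DEFAULT_SOURCE           /* asks glibc to declare sbrk */
#include <stdint.h>
#include <unistd.h>

/* Ask the system for nbytes more data space, as morecore() does.
 * Returns a pointer to the start of the new region, or NULL on failure. */
void *grow_data_space(intptr_t nbytes)
{
    void *old_break = sbrk(nbytes);          /* returns the previous end of data space */
    return old_break == (void *) -1 ? NULL : old_break;
}
```

The returned pointer is where the data space used to end, so the nbytes starting there are new memory the program may now use. A real malloc asks for large chunks this way and then hands out small pieces itself.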