CS 451/651 Syllabus
Fall Semester, 2009
University of Massachusetts at Boston
David Levine
From the catalog:
Introduction to compiler organization
and implementation, including formal specifications and algorithms for
lexical and syntactic analysis, internal representation of the source
program, semantic analysis, run-time environment issues and code
generation. Students will write a compiler for a reasonably large
subset of a contemporary language, targeted to a virtual machine.
Expected Background/Preparation:
- Algorithms and Data Structures, particularly trees, also stacks
and
lists;
- Recursion, and basic object-oriented programming (classes,
subclasses,
member functions, virtual/abstract functions);
- Theory of Computation, primarily Regular Expressions (and Finite
Automata), and Context-Free Grammars;
- Some familiarity with machine models at an assembly-language
level: ability to understand a simple RISC machine
- (not required for first month)
- Ability to program in Java.
Syllabus:
- Introduction: role of compiler in transforming
programs. Need to maintain semantics; need precise
definitions of programming language.
- Lexical structure: defined by regular expressions.
Brief experimentation with hand-coded scanner; brief overview of
how the RE description could be mechanically transformed into a
table-driven scanner. Introduction to JFlex scanner tool;
care and feeding.
- Syntactic structure: defined by BNF. (Brief review of
context-free grammars). Importance of syntax in ascribing
meaning, example: 2+3*4. Simple grammars: English phrases,
class expression grammar (from text).
- Parsing: hand-coded recursive descent analysis of simple
grammars. Concept of "FIRST" sets; ideas about how one might
mechanise parser construction. Introduction to LR parsing;
watching the parser in action. Introduction to CUP parser
generator tool.
- Source Language for class/project: simple integer
expression grammar. Assignment statements, variable declarations
for scalars and arrays; integer constants.
- First Internal Representation: design of tree data
structure to represent program (AST); object-oriented
basis. Action routines; building the IR during the parse.
- Semantic Analysis: walking the IR. Type checking:
propagating information up the tree. How would one insert
conversions etc?
- Second Internal Representation: ILOC machine model (from
text), simple RISC structure. Code generation: tree walk to
select instructions. Unlimited number of registers. Linear
structure of IR.
- Execution Model: simple stack, automatic variables.
- Optimization 1: discussion of different kinds of
optimizations, various issues, costs and benefits. Local and
"global" optimization. Value numbering: theory and operation,
illustration of actual implementation within class compiler framework.
- Control Flow: adding "if" and "while" to the
language. Boolean type. Dealing with control flow, forward
and backwards branches.
- Functions: explore adding function definitions and calls to
the
language. Runtime model: stack, activation records,
arguments, return value, local variables, other storage types.
Implementation of factorial.
- Optimization 2: additional topics. Loop optimization,
including hoisting, introduction to software pipelining.
Introduction to data flow equations. SSA representation.
Scheduling. Register allocation with local optimization.
Register allocation with global optimization: graph coloring solutions.
Project:
Students will develop a complete compiler for a small algebraic
language (a subset of Java). They will be able to compile and
execute programs.
This project will be done in Java, using some libraries supplied by the
instructor. It will be created incrementally, with new
capabilities added each week. Students should plan on a weekly
programming assignment: the size and complexity of the weekly
assignment will not be a problem if done on time each week.
Text:
Engineering a
Compiler, by Keith D. Cooper and Linda Torczon.
Morgan Kaufman, 2003.