Notes
Slide Show
Outline
1
Compiler Design
an Overview


2
Overview

  • 1 Introduction
  • 2 Language processors (tombstone diagrams, bootstrapping)
  • 3 Architecture of a compiler
3
Levels of Programming Languages
4
Levels of Programming Languages
  • Some high-level languages:
    • C, C++, Java, Pascal, Ada, Fortran, Cobol, Scheme, Prolog, Smalltalk, ...
  • Some low-level languages:
    • x86 assembly language, PowerPC assembly language, SPARC assembly language, MIPS assembly language, ARM assembly language, ...
5
Levels of Programming Languages
  • What makes a high-level language different from a low-level language?
6
Abstraction
  • A high-level language is more abstract than a low-level language.


  • More abstract? What does that mean?


  • Abstraction: Separate the ‘how’ from the ‘what’.
  • Or what is implemented from how is it implemented.
  • e.g. procedural abstraction = separate ‘what does it do’ from ‘how does it do it’
  • HL languages abstract away from the underlying machine => much more portable
7
Levels of Programming Languages
  • Q: How do the following make a HL language more abstract?
8
Language Processors: What are they?
  • Examples:
    • Editors
    • Translators (e.g. compiler, assembler, disassembler)
    • Interpreters
9
Language Processors: Why do we need them?
10
Programming Language Specification
  • Why?
    • A communication device between people who need to have a common understanding of the PL:
      • language designer, language implementer, user
  • What to specify?
    • Specify what is a ‘well formed’ program
      • syntax
      • contextual constraints (also called static semantics):
        • scoping rules
        • type rules
    • Specify what is the meaning of (well formed) programs
      • semantics (also called runtime semantics)
11
Programming Language Specification
  • Why?
  • What to specify?
  • How to specify ?
    • Formal specification: use some kind of precisely defined formalism
    • Informal specification: description in English.


    • Usually a mix of both (e.g. Java specification)
      • Syntax => formal specification using CFG/BNF
      • Contextual constraints and semantics => informal
12
Syntax Specification
  • Syntax is specified using “Context Free Grammars”:
    • A finite set of terminal symbols
    • A finite set of non-terminal symbols
    • A start symbol
    • A finite set of production rules
  • Usually CFG are written in “Bachus Naur Form” or BNF notation.
  • A production rule in BNF notation is written as:
  • N ::= a     where N is a non terminal
                and a a sequence of terminals and non-terminals
  • N ::= a | b | ...    is an abbreviation for several rules with N
  •                            as left-hand side.
13
Syntax Specification
  • A CFG defines a set of strings. This is called the language of the CFG.


  • Example:
  • Start ::= Letter
  •         | Start Letter
  •         | Start Digit
  • Letter ::= a | b | c | d | ... | z
  • Digit  ::= 0 | 1 | 2 | ... | 9


  • Q: What is the “language” defined by this grammar?
14
Example: Syntax of “Mini-Triangle”
  • Mini-Triangle is a very simple Pascal-like programming language.
  • An example program:
15
Example: Syntax of “Mini-Triangle”
16
Example: Syntax of “Mini-Triangle” (continued)
17
Example: Syntax of “Mini-Triangle” (continued)
18
Syntax Trees
  • A syntax tree or parse tree is an ordered labeled tree such that:
    • a) terminal nodes (leaf nodes) are labeled by terminal symbols
    • b) non-terminal nodes (internal nodes) are labeled by non-terminal symbols.
    • c) each non-terminal node labeled by N has children X1, X2, ... Xn (in this order) such that N := X1 X2 ... Xn is a production.
19
Syntax Trees
  • Example:
20
Concrete and Abstract Syntax
  • The previous grammar specified the concrete syntax of Mini-Triangle.
21
Example: Concrete/Abstract Syntax of Commands
22
Example: Concrete/Abstract Syntax of Commands
23
Example: Concrete Syntax of Expressions
24
Example: Abstract Syntax of Expressions
25
Abstract Syntax Trees
  • Abstract Syntax Tree for:    d:=d+10*n
26
Contextual Constraints
27
Scope Rules
28
Type Rules
29
Semantics
30
Semantics
31
Semantics
32
Semantics
33
Conclusion / Summary
  • This course is about compilers
  • Compilers are language processors
    • translate high-level language into low-level language
    • help bridge the semantic gap
  • Language specification
    • needed for communication between language designers, implementers, and users
    • Three “parts” we will study during this course
      • Syntax of the language: usually formal: Extended BNF
      • Contextual constraints: usually informal: scope rules and type rules (written in English)
      • Semantics: usually informal: descriptions in English
34
Overview
  • Introduction
  • Language processors (tombstone diagrams, bootstrapping)
  • Architecture of a compiler
35
Compilers and other translators
  • Examples:
    • Chinese => English
    • Java => JVM byte codes
    • Scheme => C
    • C => Scheme
    • x86 Assembly Language => x86 binary codes
36
Assemblers versus Compilers
  • An assembler and a compiler both translate a “more high level” language into a “more low-level” one.
37
Assemblers versus Compilers
38
Terminology
39
Tombstone Diagrams
  • What are they?
    • diagrams consist of a set of “puzzle pieces” we can use to reason about language processors and programs
    • different kinds of pieces
    • combination rules (not all diagrams are “well formed”)
40
Tombstone diagrams: Combination rules
41
Compilation
42
Cross compilation
43
Two Stage Compilation
44
Compiling a Compiler
45
Interpreters
46
Interpreters
47
Interpreters
48
Interpreters versus Compilers
  • Compilers typically offer more advantages when
    • programs are deployed in a production setting
    • programs are “repetitive”
    • the instructions of the programming language are complex
  • Interpreters typically are a better choice when
    • we are in a development/testing/debugging stage
    • programs are run once and then discarded
    • the instructions of the language are simple
49
Interpretive Compilers
50
Interpretive Compilers
51
Portable Compilers
52
Portable Compilers
  • In the previous example we have seen that portability is not an “all or nothing” kind of deal.


  • It is useful to talk about a “degree of portability”  as the percentage of code that needs to be re-written when moving to a dissimilar machine.


  • In practice 100% portability is not possible.
53
Example: a “portable” compiler kit
54
Example: a “portable” compiler kit
55
Example: a “portable” compiler kit
56
Bootstrapping
57
Bootstrapping
58
Bootstrapping an Interpretive Compiler to Generate M code
59
Bootstrapping an Interpretive Compiler to Generate M code
60
Bootstrapping an Interpretive Compiler to Generate M code
61
Bootstrapping an Interpretive Compiler to Generate M code
62
Bootstrapping an Interpretive Compiler to Generate M code
63
Full Bootstrap
64
Full Bootstrap
65
Full Bootstrap
66
Full Bootstrap
67
Half Bootstrap
68
Half Bootstrap
69
Half Bootstrap
70
Bootstrapping to Improve Efficiency
71
Bootstrapping to Improve Efficiency
72
Overview
  • 1 Introduction
  • 2 Language processors (tombstone diagrams, bootstrapping)
  • 3 Architecture of a compiler
73
The Major “Phases” of a Compiler
74
Different Phases of a Compiler
  • The different phases can be seen as different transformation steps to transform source code into object code.
  • The different phases correspond roughly to the different parts of the language specification:
  • Syntax analysis <-> Syntax
  • Contextual analysis <-> Contextual constraints
  • Code generation <-> Semantics
75
Example Program
  • We now look at each of the three different phases in a little more detail. We look at each of the steps in transforming an example Triangle program into TAM code.
76
1) Syntax Analysis
77
1) Syntax Analysis --> AST
78
2) Contextual Analysis --> Decorated AST
79
2) Contextual Analysis --> Decorated AST
80
Contextual Analysis
  • Finds scope and type errors.
81
3) Code Generation
  • Assumes that program has been thoroughly checked and is well formed (scope & type rules)
  • Takes into account semantics of the source language as well as the target language.
  • Transforms source program into target code.
82
3) Code Generation
83
Compiler Passes
  • A “pass” is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
  • A pass can correspond to a “phase” but it does not have to!
  • Sometimes a single pass corresponds to several phases that are interleaved in time.
  • What and how many passes a compiler does over the source program is an important design decision.
84
Single Pass Compiler
85
Multi Pass Compiler
86
Example: Single Pass Compilation of ...
87
Compiler Design Issues
88
Language Issues
  • Example Pascal:
  • Pascal was explicitly designed to be easy to implement with a single pass compiler:
    • Every identifier must be declared before its first use.
89
Language Issues
  • Example Pascal:
    • Every identifier must be declared before it is used.
    • How to handle mutual recursion then?
90
Language Issues
  • Example Pascal:
    • Every identifier must be declared before it is used.
    • How to handle mutual recursion then?
91
Example: The Triangle Compiler Driver