|
1
|
|
|
2
|
- 1 Introduction
- 2 Language processors (tombstone diagrams, bootstrapping)
- 3 Architecture of a compiler
|
|
3
|
|
|
4
|
- Some high-level languages:
- C, C++, Java, Pascal, Ada, Fortran, Cobol, Scheme, Prolog, Smalltalk,
...
- Some low-level languages:
- x86 assembly language, PowerPC assembly language, SPARC assembly
language, MIPS assembly language, ARM assembly language, ...
|
|
5
|
- What makes a high-level language different from a low-level language?
|
|
6
|
- A high-level language is more abstract than a low-level language.
- More abstract? What does that mean?
- Abstraction: Separate the ‘how’ from the ‘what’.
- Or what is implemented from how is it implemented.
- e.g. procedural abstraction = separate ‘what does it do’ from ‘how does
it do it’
- HL languages abstract away from the underlying machine => much more
portable
|
|
7
|
- Q: How do the following make a HL language more abstract?
|
|
8
|
- Examples:
- Editors
- Translators (e.g. compiler, assembler, disassembler)
- Interpreters
|
|
9
|
|
|
10
|
- Why?
- A communication device between people who need to have a common
understanding of the PL:
- language designer, language implementer, user
- What to specify?
- Specify what is a ‘well formed’ program
- syntax
- contextual constraints (also called static semantics):
- Specify what is the meaning of (well formed) programs
- semantics (also called runtime semantics)
|
|
11
|
- Why?
- What to specify?
- How to specify ?
- Formal specification: use some kind of precisely defined formalism
- Informal specification: description in English.
- Usually a mix of both (e.g. Java specification)
- Syntax => formal specification using CFG/BNF
- Contextual constraints and semantics => informal
|
|
12
|
- Syntax is specified using “Context Free Grammars”:
- A finite set of terminal symbols
- A finite set of non-terminal symbols
- A start symbol
- A finite set of production rules
- Usually CFG are written in “Bachus Naur Form” or BNF notation.
- A production rule in BNF notation is written as:
- N ::= a where N is a non
terminal
and a a
sequence of terminals and non-terminals
- N ::= a | b | ... is an
abbreviation for several rules with N
- as
left-hand side.
|
|
13
|
- A CFG defines a set of strings. This is called the language of the CFG.
- Example:
- Start ::= Letter
- | Start Letter
- | Start Digit
- Letter ::= a | b | c | d | ... | z
- Digit ::= 0 | 1 | 2 | ... | 9
- Q: What is the “language” defined by this grammar?
|
|
14
|
- Mini-Triangle is a very simple Pascal-like programming language.
- An example program:
|
|
15
|
|
|
16
|
|
|
17
|
|
|
18
|
- A syntax tree or parse tree is an ordered labeled tree such that:
- a) terminal nodes (leaf nodes) are labeled by terminal symbols
- b) non-terminal nodes (internal nodes) are labeled by non-terminal
symbols.
- c) each non-terminal node labeled by N has children X1, X2,
... Xn (in this order) such that N := X1 X2
... Xn is a production.
|
|
19
|
|
|
20
|
- The previous grammar specified the concrete syntax of Mini-Triangle.
|
|
21
|
|
|
22
|
|
|
23
|
|
|
24
|
|
|
25
|
- Abstract Syntax Tree for: d:=d+10*n
|
|
26
|
|
|
27
|
|
|
28
|
|
|
29
|
|
|
30
|
|
|
31
|
|
|
32
|
|
|
33
|
- This course is about compilers
- Compilers are language processors
- translate high-level language into low-level language
- help bridge the semantic gap
- Language specification
- needed for communication between language designers, implementers, and
users
- Three “parts” we will study during this course
- Syntax of the language: usually formal: Extended BNF
- Contextual constraints: usually informal: scope rules and type rules
(written in English)
- Semantics: usually informal: descriptions in English
|
|
34
|
- Introduction
- Language processors (tombstone diagrams, bootstrapping)
- Architecture of a compiler
|
|
35
|
- Examples:
- Chinese => English
- Java => JVM byte codes
- Scheme => C
- C => Scheme
- x86 Assembly Language => x86 binary codes
|
|
36
|
- An assembler and a compiler both translate a “more high level” language
into a “more low-level” one.
|
|
37
|
|
|
38
|
|
|
39
|
- What are they?
- diagrams consist of a set of “puzzle pieces” we can use to reason about
language processors and programs
- different kinds of pieces
- combination rules (not all diagrams are “well formed”)
|
|
40
|
|
|
41
|
|
|
42
|
|
|
43
|
|
|
44
|
|
|
45
|
|
|
46
|
|
|
47
|
|
|
48
|
- Compilers typically offer more advantages when
- programs are deployed in a production setting
- programs are “repetitive”
- the instructions of the programming language are complex
- Interpreters typically are a better choice when
- we are in a development/testing/debugging stage
- programs are run once and then discarded
- the instructions of the language are simple
|
|
49
|
|
|
50
|
|
|
51
|
|
|
52
|
- In the previous example we have seen that portability is not an “all or
nothing” kind of deal.
- It is useful to talk about a “degree of portability” as the percentage of code that needs
to be re-written when moving to a dissimilar machine.
- In practice 100% portability is not possible.
|
|
53
|
|
|
54
|
|
|
55
|
|
|
56
|
|
|
57
|
|
|
58
|
|
|
59
|
|
|
60
|
|
|
61
|
|
|
62
|
|
|
63
|
|
|
64
|
|
|
65
|
|
|
66
|
|
|
67
|
|
|
68
|
|
|
69
|
|
|
70
|
|
|
71
|
|
|
72
|
- 1 Introduction
- 2 Language processors (tombstone diagrams, bootstrapping)
- 3 Architecture of a compiler
|
|
73
|
|
|
74
|
- The different phases can be seen as different transformation steps to
transform source code into object code.
- The different phases correspond roughly to the different parts of the
language specification:
- Syntax analysis <-> Syntax
- Contextual analysis <-> Contextual constraints
- Code generation <-> Semantics
|
|
75
|
- We now look at each of the three different phases in a little more
detail. We look at each of the steps in transforming an example Triangle
program into TAM code.
|
|
76
|
|
|
77
|
|
|
78
|
|
|
79
|
|
|
80
|
- Finds scope and type errors.
|
|
81
|
- Assumes that program has been thoroughly checked and is well formed
(scope & type rules)
- Takes into account semantics of the source language as well as the
target language.
- Transforms source program into target code.
|
|
82
|
|
|
83
|
- A “pass” is a complete traversal of the source program, or a complete
traversal of some internal representation of the source program (such as
an AST).
- A pass can correspond to a “phase” but it does not have to!
- Sometimes a single pass corresponds to several phases that are
interleaved in time.
- What and how many passes a compiler does over the source program is an
important design decision.
|
|
84
|
|
|
85
|
|
|
86
|
|
|
87
|
|
|
88
|
- Example Pascal:
- Pascal was explicitly designed to be easy to implement with a single
pass compiler:
- Every identifier must be declared before its first use.
|
|
89
|
- Example Pascal:
- Every identifier must be declared before it is used.
- How to handle mutual recursion then?
|
|
90
|
- Example Pascal:
- Every identifier must be declared before it is used.
- How to handle mutual recursion then?
|
|
91
|
|