Compilers - Lecture 9

February 16, 2021

Lecture 9

Parser

The Big Picture

  • Language syntax is specified over parts of speech (tokens)
  • Syntax checking matches sequence of tokens against a grammar
  • Here is an example of context free grammar (CFG) G.
1. goal -> expr
2. expr -> expr op term
3.       | term
4. term -> number
5.       | id
6. op   -> +
7.       | -

G in BNF form

Math form:

S = goal
T = { number, id, +, - }
N = { goal, expr, term, op }
P = { 1, 2, 3, 4, 5, 6, 7 }

Scanner - Big Picture

Why study lexical analysis?

  • We want to avoid writing scanners by hand

Goals:

  • To simplify specification & implementation of scanners
  • To understand the underlying techniques and technologies

Regular Expressions

  • Σ = {a, b, c}
    • Alphabet, finite set of symbols

Syntax vs. Semantics

Syntax - Regular Expressions Semantics
ϵ { ϵ }
a { a }
---
r1r2 { w1, w2 | w1 ∈ L(r1), w2 ∈ L(r2) }
r1 | r2 {w | w ∈ L(r1) or w ∈ L(r2) }
r* ∪(i =0, ∞) Li(r)
r2+ ∪(i =1, ∞) Li(r)

Example 1

a(b|c)

a = { a } b = { b } c = { c } b | c = { b, c } a ( b | c ) = { ab, ac }

The Point

Regular expressions can be used to specify words to be translated to parts of speech (tokens) by a lexical analyzer

Using results from automata theory and theory algorithms, we can automatically build recognizers from regular expressions

=> We study REs and associated theory to automate scanner construction!

Example 2

Consider the problem of recognizing ILOC register names Register -> r (0|1|2|...|9) (0|1|2|...|9)*

  • Allows registers of arbitrary number
  • Requires at least one digit

RE corresponds to a recognizer (or DFA)

DFA Operation

  • Start in state s0 & take transitions on each input character
  • DFA accepts a word x if and only if x leaves it in a final state (s2)

So,

  • r17 takes it through s0, s1, s2 and accepts
  • r takes it through s0, s1, and fails
  • a takes it straight to error state se (not shown here)

To be useful, recognizer must turn into code