. Syntax and semantics of programs
. syntax : the symbols used to write a program
. semantics : the actions that occur when a program is executed
. programming language implementation
. syntax -> semantics
. tranform program syntax into machine instructions
that can be executed to cause the correct sequence
of actions to occur
. Interpreter vs Compiler
. Interpreter : 느리다. 컴파일이 따로 필요없어서 간편하다.
source 코드를 바로 읽는 다.
. Compiler : 여러 translation을 해서 복잡하지만 빠르다.
target program으로 먼저 번역한다.
. Typical compiler
1. source program
2. lexical analyzer(lexer, scanner)
http://en.wikipedia.org/wiki/Lexical_analyzer
Regular expression, finite automata, lex
keyword, variable 등을 나눔(lexical token을 구함)
쓸데 없는 blank도 제거
3. syntax analyzer(parsing, parser)
http://en.wikipedia.org/wiki/Syntax_analysis
http://en.wikipedia.org/wiki/Yacc
Context free grammar, BNF, yacc(yet another compiler compiler)
parse tree를 만듬.
terminal node를 모으면 program이 된다.
nonterminal node도 이ㅆ음.
4. semantic analyzer - PL에서 다룸
Abstract syntax tree(AST) : parse tree에서 중요한 것만 남김
augmented parse tree : type와 declare된 장소에 관한 정보를 더함.(annotation)
type system
Annotation : symbol table을 이용해서 각 variable이 어디에 쓰이는 지 type등을 적음.
5. intermediate code generator - OS, architecture에서 다룸
function_start(), computer_expr(), assing() ...
AST를 tree traverse하여 depth first search하면 intermediate code가 된다.
Virtual Machine, 3-address instructions
6. code optimizer - OS, architecture에서 다룸
Common subexpression elimination : 한번만 계산하고 저장해뒀다가 사용
copy propagation : 같은 value 한번만 assign
dead-code elimination : 실행 안되는 코드 제거
loop optimizations : loop 밖으로 최대한 꺼냄
function in-lining : 함수 호출을 줄임, 프로그램 크기가 증가함.
7. code generator
Machine architecture
Register allocation
Optimization
reuse registers efficiently
8. target program - machine code
. Brief look at syntax
. Grammar
e ::= n | e+e | e-e
n ::= d | nd
d ::= 0|1|2|3|4|5|6|7|8|9
. expressions in language
e -> e-e -> e-e+e -> n-n+n -> nd-d+d -> dd-d+d -> .. -> 27-4+3
. Derivation represented by tree
Tree shows parenthesization of expression
. Parsing
. Given expression find tree
. Ambiquity
. Expression 27-4+3 can be parsed two ways
. Problem : 27-(4+3) != (27-4)+3
. Ways to resolve ambiquity
. Precedence(다른 operator일때) - high or low
. Group * befor +
. Parse 3*4+2 as (3*4)+2
. associativity(equal precedence일때) - left or right
. parenthesize operators of equal precedence to left(or right)
. (3-4)+5 : left-associative
. 3-(4+5) : right-associative
. Theoretical foundatinos
. many foundational system
. computability theory
. Program logics
. Lambda calculus
. denotational semantics : PL의 semantics
. operational semantics : PL의 manual 같은 것
. Tyep theory
. consider these methods
. Lambda calculus(syntax, operational semantics)
. computability theory
. denotational semantics
. lambda calculus
. 가장 primitive하지만 모든 것이 표현가능, theoretical한 증명이 쉽다.
. formal system with three parts
. Notation for function expressions
. proff system for qeuations
. calculation rules called reduction
. additional topics in lambda calculus
. mathematical semantics
. type systems
. history
. original intention
. formal theory of substitution
. more successful for computablel functions
. substitution -> sybolic computation
. church/turing thesis
. influenced design of Lisp, ML, other languages
. See boost lambda library for C++ function objects
http://www.boost.org/libs/lambda/index.html
. important part of CS history and foundations
. Why study this now?
. Basic syntactic notions
. Free variables : global variable
. Bound variable : function argument
. functions
. declarations - 선언과 정의
. Calculation rule(매우 간단하다) - 단 1개의 symbolic substitution
. Symbolic evaluation useful for discussing programs
. Used in optimization (in-lining), macro expansion
. correct macro processing requires variable renaming
(Name conflict control)
. Illustrates some ideas about scope of binding
. Lisp originally departed from standard lambda calculus,
returned to the fold through Scheme, Common Lisp
. Lambda expression
. Symbolic computation, pattern matching을 잘해서 reduction
. BNF
. M :: = x (variables)
| M1 M2 (application), M1: function, M2: function의 actual parameter
| lx.M (abstraction), M:function body, x : M의 formal parameter
ex)
lx.x = identity function의 abstraction form
lx.(f (g x))
(lx.x)5
lx.x y == lx.(x y) : 최대한 길게 뒤로 가서 parsing
application, abstration 모두 되므로 ambiguous grammar
. ambiguous grammar
. if there si some string that it can generate in more than one way.
. http://en.wikipedia.org/wiki/Ambiguous_grammar
ex) CFG, A->A+A|A-A|a, is ambiquous.
. expression
x + y
. functions
lx.(x+y)
. application
(lx.(x+y))3 = 3+y
. Higher-order functions
. Given function f, return funcition f.f
. lf. lx.f(f x)
. How does this work?
(lf. lx. f(f x))(ly.y+1)
=> lx.(ly.y+1)((ly.y+1) x)
=> lx.(ly.y+1)(x+1)
=> lx.x+1+1
. Free and Bound Variables
. Bound variable is "placeholder" - 따라서 항상 rename 가능
. Variable x is bound in lx.(x+y)
. function lx.(x+y) is same function as lz.(z+y)
. Name of free(=unbound) variable does matter
. variable y is free in lx.(x+y)
. function lx.(x+y) is not same function as lx.(x+z)
. Occurrence
. y is freee and bound in lx((ly.y+2) x) + y
. scope
. bound variable : 이름 바꾸기 가능(local이므로)
. free variable : 이름 바꾸기 불가능(global이므로)
. lambda calulus는 static scope이다.
. alpha-reduction : varialbe rename
. beta-reduction : substitution
. (lx.e1)e2 -> [e2/x]e1
. e1의 free variable x를 e2로 clghks
. where substitution involves renaming as needed
(name conflict를 피하려면 renmaing해야 한다.)
. reduction
. apply basic computation rule to any subexpression
. repeat
. confluence : 맘대로 evaluation해도 결과는 같다.
. final result (if there is one) is uniquely determined
. Redex : reducable expression
. Substitution
. [N/x]x = N
. [N/x]y = y where y is different from x
. [N/x](M1 M2) = ([N/x]M1 [N/y]M2)
. [N/x](lx.M) = lx.M (헷갈리는 부분, 두 x의 scope가 다르다.)
. [N/x](ly.M) = ly.([N/x]M) where y is not free in N
(free y를 bound y로 만들면 안된다.)
ex.1)
(lx.(ly.(x+y+1))(y+1)
=> (lx.(lz.(x+z+1))(y+1) (rename을 먼저해야 한다.)
=> lz.(y+1+z+1)
!= ly.(y+1+y+1) (free y를 bound y로 만들면 안된다.)
ex.2)
(lx.(lx.x+1))5
=> [5/x](lx.(x+1))
=> lx.(x+1)
혹은
=> (lx.(ly.y+1))5
=> ly.y+1
!= 5+1
. FV(x) = {x}
. FV(M N) = FV(M) U FV(N)
. FV(lx.M) = FV(M) - x
. function with several arguments(argument가 여러개 일때)
. curried form
. http://en.wikipedia.org/wiki/Currying
. ex) f(g,x) = g(x)
. f curry = lg.lx.g x
. f curry g x = g x = g(x) = f(g,x)
. pair notation = curried form
. f=l(g,x).g x
. Recursion and fixed Point
. factorial function
function f(n) { if n = 0 then 1 else n*f(n-1)};
-> 이것은 f의 definition이 아니라 equation이다.
. Equation with recursive definition
f(n) = if n = 0 then 1 else n * f(n-1)
. Get the solution f with above equation, Let
G = lf.ln.if n=0 then 1 else n*f(n-1)
. f is the fixed point of function G, ex f = G(f)
. Fiexed point operator = Y-combinator, domain theory로 증명가능
. http://en.wikipedia.org/wiki/Fixed_point_operator
. fixed point combinator allow the definition of anonymous recursive functions
. a fixed point operator Y
Y = lf.(lx.f(x x))(lx.f(x x))
. Show that Yf = f(Yf)
Yf = (lf.(lx.f(x x))(lx.f(x x)))f
=> (lx.f(x x))(lx.f(x x))
=> f((lx.f(x x)) (lx.f(x x)))
=> f(Yf)
. original motivation for topic
. denotational semantics를 배우는 이유
. Precision
. Use mathematics instead of english
. avoid details of specific machines
. Aim to capture "pure meaning" apart from implementation details
. Basis for program analysis
. Justify program proof methods
. soundness of type system, control flow analysis
. Proof of compiler correctness
. Language comparisons
. Why study this?
. Look at programs in a different way
. Program analysis
. Initialize befor use
. Introduce historical debate: functional vs imperative programming
. program expressiveness : what does this mean?
. theory vs practice : we don't have a doog theoretical understanding of programming language "usefulness"
. Basic Principle of denotational semantics
. compositionality
. The meaning of a compound program must be defined from the meanings of its parts (not the syntax of its parts)
. Exmaples
. P; Q
composition of two functions, state -> state
. letrec f(x) = e1 in e2
meaning of e2 where f denotes function
. Trivial example : binary numbers
. syntax
b ::= 0|1
n ::= b|nb (binary number)
e ::= n|e+e (e: starting symbol)
. semantics value fuction E : exp -> numbers
E[[0]] = 0
E[[1]] = 1
E[[nb]] = 2*E[[n]] + E[[b]]
E[[e1+e2]] = E[[e1]] + E[[e2]]
E : syntatic element를 mathematical(semantic) entity로 mapping
무한한 mapping을 유한하게 describe해보자.(structural induction)
. Second ex : expressions
. syntax
b ::= 0|1|...|9
n ::= b|nb (decimal number)
e ::= s|n|e+e (e: starting symbol)
. semantics
value E : exp -> numbers
state s : vars -> numbers
E[[x]]s = s(x) : variable namve -> value
E[[0]]s = 0
E[[1]]s = 1
E[[nb]]s = 2*E[[n]]s + E[[b]]s
E[[e1+e2]]s = E[[e1]]s + E[[e2]]s
E[[x]]s -> E[[x,s]] : argument가 2개로 늘어난 셈
. x : variable - 어떤 state function s가 name x를 받으면 value를 return
E[[x+z]]는 syntactic element뿐만 아니라 state에 따라 의미가 달라진다.
S0 = {x->4, y->7, z->8}
E[[x+4+y]]S0
= E[[e1+e2]]S0
= E[[e1]]S0 + E[[e2]]S0
= E[[e3+e4]]S0 + E[[e2]]S0
= E[[e3]]S0 + E[[e4]]S0 + E[[e2]]S0
= E[[x]]S0 + E[[4]]S0 + E[[y]]S0
= 4 + 7 + 4
. Semantics of imperative programs
. Syntax
. P ::= x:=3 | if B then P else P | P; P | while B do P
. Semantics
. C : Programs -> (State -> State)
. State = Variables -> Values
would be locations -> values if we wanted to model aliasing
. C: P의 semantic function
. Program : Initial state -> final state
. C[[P]]S0 = S1
. Semantics of Assignment
. C[[x:=e]]
is a function states -> states
. C[[x:=e]]s = s'
where s':variables->values is identical to s except
s'(x) = E[[e]]s gives the value of e in state s
. Semantics of Conditional
C[[if B then P else Q]]
C[[if B then P else Q]]s
= C[[P]]s if E[[B]]s is true
= C[[Q]]s if E[[B]]s is false
. Semantics of iteration
C[[while B do P]]
is a function states -> states
C[[while B do P]] = the function f such that
f(s) = s if E[[B]] is false
f(s) = f(C[[P]](s)) if E[[B]] is true
f is recursively defined(사실 define이 아니라 equation)
C[[P1;P2]]S0 = C[[P2]](C[[P1]]S0)
C[[P1]]S0 = S1
C[[P2]]S1 = S2
. Perspective
. Denotational semantics
. Assign mathematical meanings to programs in a
structured, principled way
. Imperative programs define mathematical functions
. Can write semantics using lambda calculus, extended with operators like
modify:(state x var x value) -> state
. Impact
. Influential theory
. Application via
abstraction interpretation, type theory
C[[x:=e]]S0 = S1
modify(S0, x, e) = S1
. What is a functional language?
. No side effects = pure(LISP)
. OK, we have side effects, but we also have higher-order functions
. No-side-effects language test
= Referential transparency
. within the scope of specific declarations of x1, x2, ..., xn
all occurrences of an expression e containing only
variables x1, x2, ..., xn must have the same value.
. Example languages
. Pure Lisp
atom, eq, car, cdr, cons, lambda, define
. Impure Lisp: rplaca, rplacd
. Common procedural languages are not functional
. Pascal, C, Ada, C++, Java, Modula
. Backus' Turing Award
. John Backus was designer of Fortran, BNF, etc.
. Turing Award in 1977
. Turing Award Lecture
. functional prog better than imperative programming
. Easier to reason about functional programs
. More efficient due to parallelism
. Algebraic laws
. reason about programs
. Optimizing compilers(translate가 쉽다.)
. Reasoning about programs
. To prove a program correct
. Must consider everything a program depends on
. In functional programs
. dependence on any data structure is explicit
. therefore
. easier to reason about function programs
. Do you believe this?
. This thesis must be tested in practice
. Many who prove properties of programs believe this
. Not many people really prove their code correct
. Haskell Quicksort
. Very succinct program(5줄 밖에 안된다.)
. This is the whole thing
. No assignment - just write expression for sorted list
. No array indices, no pointers, no memory management
. Compare : C quicksort 14줄
. Interesting case study
. Abstraction이 잘 될수록 high-level이고 사람을 efficient하게 만든다.
. Haskell이 가장 짧고 C++이 가장 길다.
. Relational Lisp이 가장 시간이 적게 걸렸다.
. Disadvantages of functional prog
. Assign이 없으므로 copy를 너무 자주한다.
. Von Neumann bottleneck = stored program
. Von numann - data의 절반이 instruction
. Mathematician responsible for idea of stored program
. Von numann Bottleneck
. Backus' term for limitation in CPU-memory transfer
. Related to sequentiality of imperative langauges
. Eliminating VN bottleneck
. No side effects
. Evaluate subexpressions independently => parallel하게 계산가능
. Does this work in practice? good idea but
. Too much parallelism
. Little help in allocation of processors to processes
. Scheduling overhead가 크다.
. communication overhead가 크다.
. David shaw promised to build the non-Von
. Effective, easy concurrency is a hard problem
. Functional vs Imperative programs
. Denotational semantics shows
. Every imperative program can be written as a
functional program, using a data structure to
represent machine states
. This is a theoretical result
. I guess "theoretical" means "it's really true"
. What are the practical implications?
. Can we use functional programming languages for practical application?
compilers, graphical user interfaces, network routers.
=> I/O, meta circular evaluation이 중요
. Summary
. parsing
The real program is the disambiquated parse tree
. lambda calculus
Notation for functions free and bound variables
Calculate using substitution, rename to avoid capture
. Pure functional program
May be easier to reason about
Parallelism : eay to find, too much of a good thing
. computability
Halting problem makes some error reporting, optimization impossible
. Formal languages
. language - a set of strings
L = {if, while, for, then, else ...}
. string - a finite sequence of symbols from some alphabet
. alphabet - a finite set of symbols
{a,b,c, ..., z}, {0, 1, ... , 9}
. A generator generates members of the set
. A recognizer recognizes members of the set
. Regular expressions
. symbols, empty string, alternation, concatenation, repetition
. Context-free Grammars
. S : statement
. E : expression
. Language : 우리가 작성한 모든 program의 set
. S -> S;S
. S -> id:=E
. E -> num
. E -> E + E
. terminals - tokens : 최종적으로 program에 나오는 것
. Nonterminal - symbols used in grammar (S, E)
. Start symbol - a nonterminal
. Derivation - generating phrases of language by replacing
nonterminals with r.h.s of productions
. 무한개를 생성할 수 있는 이유 : S와 E가 자기자신으로 정의된다.
. Nested structures require recursion
. BNF(Backus Naur Form)
. 1958년 algol의 syntax define시 사용
. CFG define productions
. Each production consists of a
. left-hand side non-terminal, and a
. right-hand side sequence of terminals and non-terminals
. A grammar also defines a non-terminal start symbol
. BNF ex
. Notation typically used for defining languages
<statement> ::= <statement> ; <statement>
<statement> ::= <identifier> := <expression>
<expression> ::= <number>
<expression> ::= <expression> + <expression>
<statement>, <expression>, <identifier>, <number> : non-terminal(< >로 기술)
;, :=, + : terminal
. Extended BNF
. optioanl syntax : [] : 0 or 1번
. Repeated syntax : {} : RE의 *와 비슷
. syntax diagrams
. Derivations
. Begin with start symbol, replace any nonterminal with the rhs of some production for that nonterminal
. leftmost - pcik leftmost nonterninal to expand
. rightmost - pcik rightmost nonterninal to expand
. ex) leftmost
S -> id:= E
-> id:= E + E
-> id:= E + E + E
-> id:= num + E + E
-> id:= num + num + E
-> id:= num + num + num
. Derivation algorithm
1. Begin with the start symbol nonterminal
2. Replace the non-terminal with the right-hand side of a matching production
3. Choose a non-terminal from the resulting sequence
4. repeat steps 2 and 3 until no non-terminals remain
. This algorithm includes two choices
step2. choice of matching production
step3. choice of non-terminal
. Parse tree
. a tree representaion of a derivation
. connect a symbol to the one from which it was derived
. leaf node = treminal
. left most derivation이면 left child들이 더 풍성하다.
. Ambiquity - syntax에는 문제가 없으나 semantics(evaluation)에서는 문제가 있다.
. a grammar is ambiguous if a phase has two different derivations.
. equivalently, two parse trees
. example (rightmost derivation)
. Problems with ambiquity(= grammar의 므ㅠㅑ벼ㅑ쇼)
. If the shape of the parse tree conveys meaning, then ambiquity in a language's grammar(syntax) translates into ambiquity in the language's meaning (semantics)
. Problematic grammar
<statement> :: <condition> | <assignment> | begin { <statement> } end
해결책
1. grammar를 다시 쓴다.(Unambiquous grammars) (C언어의 방법 - 가까운 곳에 match)
2. endif를 도입
3. then다음에 if가 올 수 없고 반드시 begin-end가 와야 한다.
. Denotational semantics
http://en.wikipedia.org/wiki/Denotational_semantics
formalizeing the semantics of system by constructing mathematical objects with express the semantics of these systems.
. Axiomatic semantics
http://en.wikipedia.org/wiki/Axiomatic_semantics
mathematical logic에 따라 program의 correctness를 따짐.
. Operational semantics
http://en.wikipedia.org/wiki/Operational_semantics
give meaning to computer programs in a mathematically rigorous ywa
. semantics of initialize-before-use
. single assignment
. can bind a value to a name at most once.
. oftern function languages.
. Erlang, Haskell, SISAL