2006년 4월 22일 토요일

[PL]Ch.4 - Fundamentals

. Syntax and semantics of programs
  . syntax : the symbols used to write a program
  . semantics : the actions that occur when a program is executed
  . programming language implementation
  . syntax -> semantics
  . tranform program syntax into machine instructions
     that can be executed to cause the correct sequence
     of actions to occur

. Interpreter vs Compiler
  . Interpreter : 느리다. 컴파일이 따로 필요없어서 간편하다.
  source 코드를 바로 읽는 다.
  . Compiler : 여러 translation을 해서 복잡하지만 빠르다.
  target program으로 먼저 번역한다.

. Typical compiler
  1. source program

  2. lexical analyzer(lexer, scanner)
    http://en.wikipedia.org/wiki/Lexical_analyzer
    Regular expression, finite automata, lex
    keyword, variable 등을 나눔(lexical token을 구함)
    쓸데 없는 blank도 제거

  3. syntax analyzer(parsing, parser)
    http://en.wikipedia.org/wiki/Syntax_analysis
    http://en.wikipedia.org/wiki/Yacc
    Context free grammar, BNF, yacc(yet another compiler compiler)
    parse tree를 만듬.
    terminal node를 모으면 program이 된다.
    nonterminal node도 이ㅆ음.

  4. semantic analyzer - PL에서 다룸
    Abstract syntax tree(AST) : parse tree에서 중요한 것만 남김
    augmented parse tree : type와 declare된 장소에 관한 정보를 더함.(annotation)
    type system
    Annotation : symbol table을 이용해서 각 variable이 어디에 쓰이는 지 type등을 적음.

  5. intermediate code generator - OS, architecture에서 다룸
    function_start(), computer_expr(), assing() ...
    AST를 tree traverse하여 depth first search하면 intermediate code가 된다.
    Virtual Machine, 3-address instructions

  6. code optimizer - OS, architecture에서 다룸
    Common subexpression elimination : 한번만 계산하고 저장해뒀다가 사용
    copy propagation : 같은 value 한번만 assign
    dead-code elimination : 실행 안되는 코드 제거
    loop optimizations : loop 밖으로 최대한 꺼냄
    function in-lining : 함수 호출을 줄임, 프로그램 크기가 증가함.

  7. code generator
    Machine architecture
    Register allocation
    Optimization
    reuse registers efficiently

  8. target program - machine code

. Brief look at syntax
  . Grammar
  e ::= n | e+e | e-e
  n ::= d | nd
  d ::= 0|1|2|3|4|5|6|7|8|9
  . expressions in language
  e -> e-e -> e-e+e -> n-n+n -> nd-d+d -> dd-d+d -> .. -> 27-4+3

. Derivation represented by tree
  Tree shows parenthesization of expression

. Parsing
  . Given expression find tree
  . Ambiquity
  . Expression 27-4+3 can be parsed two ways
  . Problem : 27-(4+3) != (27-4)+3
  . Ways to resolve ambiquity
  . Precedence(다른 operator일때) - high or low
     . Group * befor +
     . Parse 3*4+2 as (3*4)+2
  . associativity(equal precedence일때) - left or right
     . parenthesize operators of equal precedence to left(or right)
     . (3-4)+5 : left-associative
     . 3-(4+5) : right-associative

. Theoretical foundatinos
  . many foundational system
  . computability theory
  . Program logics
  . Lambda calculus
  . denotational semantics : PL의 semantics
  . operational semantics : PL의 manual 같은 것
  . Tyep theory

  . consider these methods
  . Lambda calculus(syntax, operational semantics)
  . computability theory
  . denotational semantics


. lambda calculus
  . 가장 primitive하지만 모든 것이 표현가능, theoretical한 증명이 쉽다.
  . formal system with three parts
  . Notation for function expressions
  . proff system for qeuations
  . calculation rules called reduction
  . additional topics in lambda calculus
  . mathematical semantics
  . type systems

. history
  . original intention
  . formal theory of substitution
  . more successful for computablel functions
  . substitution -> sybolic computation
  . church/turing thesis
  . influenced design of Lisp, ML, other languages
  . See boost lambda library for C++ function objects
     http://www.boost.org/libs/lambda/index.html
  . important part of CS history and foundations

. Why study this now?
  . Basic syntactic notions
  . Free variables : global variable
  . Bound variable : function argument
  . functions
  . declarations - 선언과 정의

  . Calculation rule(매우 간단하다) - 단 1개의 symbolic substitution
  . Symbolic evaluation useful for discussing programs
  . Used in optimization (in-lining), macro expansion
     . correct macro processing requires variable renaming
       (Name conflict control)
  . Illustrates some ideas about scope of binding
     . Lisp originally departed from standard lambda calculus,
       returned to the fold through Scheme, Common Lisp

. Lambda expression
  . Symbolic computation, pattern matching을 잘해서 reduction

. BNF
  . M :: = x (variables)
     | M1 M2 (application), M1: function, M2: function의 actual parameter
     | lx.M (abstraction), M:function body, x : M의 formal parameter
ex)
lx.x = identity function의 abstraction form
lx.(f (g x))
(lx.x)5
lx.x y == lx.(x y) : 최대한 길게 뒤로 가서 parsing
application, abstration 모두 되므로 ambiguous grammar

. ambiguous grammar
  . if there si some string that it can generate in more than one way.
  . http://en.wikipedia.org/wiki/Ambiguous_grammar
  ex) CFG, A->A+A|A-A|a, is ambiquous.

. expression
  x + y

. functions
  lx.(x+y)

. application
  (lx.(x+y))3 = 3+y

. Higher-order functions
  . Given function f, return funcition f.f
  . lf. lx.f(f x)
  . How does this work?
     (lf. lx. f(f x))(ly.y+1)
    => lx.(ly.y+1)((ly.y+1) x)
    => lx.(ly.y+1)(x+1)
    => lx.x+1+1

. Free and Bound Variables
  . Bound variable is "placeholder" - 따라서 항상 rename 가능
  . Variable x is bound in lx.(x+y)
  . function lx.(x+y) is same function as lz.(z+y)
  . Name of free(=unbound) variable does matter
  . variable y is free in lx.(x+y)
  . function lx.(x+y) is not same function as lx.(x+z)
  . Occurrence
  . y is freee and bound in lx((ly.y+2) x) + y
  . scope
  . bound variable : 이름 바꾸기 가능(local이므로)
  . free variable : 이름 바꾸기 불가능(global이므로)
  . lambda calulus는 static scope이다.

. alpha-reduction : varialbe rename

. beta-reduction : substitution
  . (lx.e1)e2 -> [e2/x]e1
  . e1의 free variable x를 e2로 clghks
  . where substitution involves renaming as needed
  (name conflict를 피하려면 renmaing해야 한다.)

. reduction
  . apply basic computation rule to any subexpression
  . repeat

. confluence : 맘대로 evaluation해도 결과는 같다.
  . final result (if there is one) is uniquely determined
. Redex : reducable expression

. Substitution
  . [N/x]x = N
  . [N/x]y = y where y is different from x
  . [N/x](M1 M2) = ([N/x]M1 [N/y]M2)
  . [N/x](lx.M) = lx.M (헷갈리는 부분, 두 x의 scope가 다르다.)
  . [N/x](ly.M) = ly.([N/x]M) where y is not free in N
  (free y를 bound y로 만들면 안된다.)
  ex.1)
  (lx.(ly.(x+y+1))(y+1)
  => (lx.(lz.(x+z+1))(y+1) (rename을 먼저해야 한다.)
  => lz.(y+1+z+1)
  != ly.(y+1+y+1) (free y를 bound y로 만들면 안된다.)

  ex.2)
  (lx.(lx.x+1))5
  => [5/x](lx.(x+1))
  => lx.(x+1)
  혹은
  => (lx.(ly.y+1))5
  => ly.y+1
  != 5+1

  . FV(x) = {x}
  . FV(M N) = FV(M) U FV(N)
  . FV(lx.M) = FV(M) - x

. function with several arguments(argument가 여러개 일때)
  . curried form
  . http://en.wikipedia.org/wiki/Currying
  . ex) f(g,x) = g(x)
  . f curry = lg.lx.g x
  . f curry g x = g x = g(x) = f(g,x)
  . pair notation = curried form
     . f=l(g,x).g x

. Recursion and fixed Point
  . factorial function
  function f(n) { if n = 0 then 1 else n*f(n-1)};
  -> 이것은 f의 definition이 아니라 equation이다.
  . Equation with recursive definition
  f(n) = if n = 0 then 1 else n * f(n-1)
  . Get the solution f with above equation, Let
  G = lf.ln.if n=0 then 1 else n*f(n-1)
  . f is the fixed point of function G, ex f = G(f)

. Fiexed point operator = Y-combinator, domain theory로 증명가능
  . http://en.wikipedia.org/wiki/Fixed_point_operator
  . fixed point combinator allow the definition of anonymous recursive functions
  . a fixed point operator Y
  Y = lf.(lx.f(x x))(lx.f(x x))
  . Show that Yf = f(Yf)
  Yf = (lf.(lx.f(x x))(lx.f(x x)))f
  => (lx.f(x x))(lx.f(x x))
  => f((lx.f(x x)) (lx.f(x x)))
  => f(Yf)

. original motivation for topic
  . denotational semantics를 배우는 이유
  . Precision
  . Use mathematics instead of english
  . avoid details of specific machines
  . Aim to capture "pure meaning" apart from implementation details
  . Basis for program analysis
  . Justify program proof methods
     . soundness of type system, control flow analysis
  . Proof of compiler correctness
  . Language comparisons

. Why study this?
  . Look at programs in a different way
  . Program analysis
  . Initialize befor use
  . Introduce historical debate: functional vs imperative programming
  . program expressiveness : what does this mean?
  . theory vs practice : we don't have a doog theoretical understanding of programming language "usefulness"

. Basic Principle of denotational semantics
  . compositionality
  . The meaning of a compound program must be defined from the meanings of its parts (not the syntax of its parts)
  . Exmaples
     . P; Q
       composition of two functions, state -> state
     . letrec f(x) = e1 in e2
       meaning of e2 where f denotes function

. Trivial example : binary numbers
  . syntax
  b ::= 0|1
  n ::= b|nb (binary number)
  e ::= n|e+e (e: starting symbol)

  . semantics value fuction E : exp -> numbers
  E[[0]] = 0
  E[[1]] = 1
  E[[nb]] = 2*E[[n]] + E[[b]]
  E[[e1+e2]] = E[[e1]] + E[[e2]]

  E : syntatic element를 mathematical(semantic) entity로 mapping
  무한한 mapping을 유한하게 describe해보자.(structural induction)

. Second ex : expressions
 
  . syntax
  b ::= 0|1|...|9
  n ::= b|nb (decimal number)
  e ::= s|n|e+e (e: starting symbol)

  . semantics
  value E : exp -> numbers
  state s : vars -> numbers

  E[[x]]s = s(x) : variable namve -> value
  E[[0]]s = 0
  E[[1]]s = 1
  E[[nb]]s = 2*E[[n]]s + E[[b]]s
  E[[e1+e2]]s = E[[e1]]s + E[[e2]]s

  E[[x]]s -> E[[x,s]] : argument가 2개로 늘어난 셈

  . x : variable - 어떤 state function s가 name x를 받으면 value를 return
  E[[x+z]]는 syntactic element뿐만 아니라 state에 따라 의미가 달라진다.
  S0 = {x->4, y->7, z->8}
  E[[x+4+y]]S0
  = E[[e1+e2]]S0
  = E[[e1]]S0 + E[[e2]]S0
  = E[[e3+e4]]S0 + E[[e2]]S0
  = E[[e3]]S0 + E[[e4]]S0 + E[[e2]]S0
  = E[[x]]S0 + E[[4]]S0 + E[[y]]S0
  = 4 + 7 + 4

. Semantics of imperative programs
  . Syntax
  . P ::= x:=3 | if B then P else P | P; P | while B do P
  . Semantics
  . C : Programs -> (State -> State)
  . State = Variables -> Values
     would be locations -> values if we wanted to model aliasing
. C: P의 semantic function
. Program : Initial state -> final state
. C[[P]]S0 = S1

. Semantics of Assignment
  . C[[x:=e]]
  is a function states -> states
  . C[[x:=e]]s = s'
  where s':variables->values is identical to s except
  s'(x) = E[[e]]s gives the value of e in state s

. Semantics of Conditional
  C[[if B then P else Q]]
  C[[if B then P else Q]]s
  = C[[P]]s if E[[B]]s is true
  = C[[Q]]s if E[[B]]s is false

. Semantics of iteration
  C[[while B do P]]
  is a function states -> states

  C[[while B do P]] = the function f such that
  f(s) = s if E[[B]] is false
  f(s) = f(C[[P]](s)) if E[[B]] is true

  f is recursively defined(사실 define이 아니라 equation)

  C[[P1;P2]]S0 = C[[P2]](C[[P1]]S0)
  C[[P1]]S0 = S1
  C[[P2]]S1 = S2

. Perspective
  . Denotational semantics
  . Assign mathematical meanings to programs in a
     structured, principled way
  . Imperative programs define mathematical functions
  . Can write semantics using lambda calculus, extended with operators like
     modify:(state x var x value) -> state
  . Impact
  . Influential theory
  . Application via
     abstraction interpretation, type theory
  C[[x:=e]]S0 = S1
  modify(S0, x, e) = S1

. What is a functional language?
  . No side effects = pure(LISP)
  . OK, we have side effects, but we also have higher-order functions

. No-side-effects language test
  = Referential transparency
  . within the scope of specific declarations of x1, x2, ..., xn
  all occurrences of an expression e containing only
  variables x1, x2, ..., xn must have the same value.
 
. Example languages
  . Pure Lisp
  atom, eq, car, cdr, cons, lambda, define
  . Impure Lisp: rplaca, rplacd
  . Common procedural languages are not functional
  . Pascal, C, Ada, C++, Java, Modula

. Backus' Turing Award
  . John Backus was designer of Fortran, BNF, etc.
  . Turing Award in 1977
  . Turing Award Lecture
  . functional prog better than imperative programming
  . Easier to reason about functional programs
  . More efficient due to parallelism
  . Algebraic laws
     . reason about programs
     . Optimizing compilers(translate가 쉽다.)

. Reasoning about programs
  . To prove a program correct
  . Must consider everything a program depends on
  . In functional programs
  . dependence on any data structure is explicit
  . therefore
  . easier to reason about function programs
  . Do you believe this?
  . This thesis must be tested in practice
  . Many who prove properties of programs believe this
  . Not many people really prove their code correct

. Haskell Quicksort
  . Very succinct program(5줄 밖에 안된다.)
  . This is the whole thing
  . No assignment - just write expression for sorted list
  . No array indices, no pointers, no memory management

. Compare : C quicksort 14줄

. Interesting case study
  . Abstraction이 잘 될수록 high-level이고 사람을 efficient하게 만든다.
  . Haskell이 가장 짧고 C++이 가장 길다.
  . Relational Lisp이 가장 시간이 적게 걸렸다.

. Disadvantages of functional prog
  . Assign이 없으므로 copy를 너무 자주한다.

. Von Neumann bottleneck = stored program
  . Von numann - data의 절반이 instruction
  . Mathematician responsible for idea of stored program
  . Von numann Bottleneck
  . Backus' term for limitation in CPU-memory transfer
  . Related to sequentiality of imperative langauges

. Eliminating VN bottleneck
  . No side effects
  . Evaluate subexpressions independently => parallel하게 계산가능
  . Does this work in practice? good idea but
  . Too much parallelism
  . Little help in allocation of processors to processes
  . Scheduling overhead가 크다.
  . communication overhead가 크다.
  . David shaw promised to build the non-Von
  . Effective, easy concurrency is a hard problem

. Functional vs Imperative programs
  . Denotational semantics shows
  . Every imperative program can be written as a
     functional program, using a data structure to
     represent machine states
  . This is a theoretical result
  . I guess "theoretical" means "it's really true"
  . What are the practical implications?
  . Can we use functional programming languages for practical application?
     compilers, graphical user interfaces, network routers.
     => I/O, meta circular evaluation이 중요

. Summary
  . parsing
  The real program is the disambiquated parse tree
  . lambda calculus
  Notation for functions free and bound variables
  Calculate using substitution, rename to avoid capture
  . Pure functional program
  May be easier to reason about
  Parallelism : eay to find, too much of a good thing
  . computability
  Halting problem makes some error reporting, optimization impossible

. Formal languages
  . language - a set of strings
  L = {if, while, for, then, else ...}
  . string - a finite sequence of symbols from some alphabet
  . alphabet - a finite set of symbols
  {a,b,c, ..., z}, {0, 1, ... , 9}
  . A generator generates members of the set
  . A recognizer recognizes members of the set

. Regular expressions
  . symbols, empty string, alternation, concatenation, repetition

. Context-free Grammars
  . S : statement
  . E : expression
  . Language : 우리가 작성한 모든 program의 set
  . S -> S;S
  . S -> id:=E
  . E -> num
  . E -> E + E
  . terminals - tokens :  최종적으로 program에 나오는 것
  . Nonterminal - symbols used in grammar (S, E)
  . Start symbol - a nonterminal
  . Derivation - generating phrases of language by replacing
  nonterminals with r.h.s of productions
  . 무한개를 생성할 수 있는 이유 : S와 E가 자기자신으로 정의된다.
  . Nested structures require recursion

. BNF(Backus Naur Form)
  . 1958년 algol의 syntax define시 사용
  . CFG define productions
  . Each production consists of a
  . left-hand side non-terminal, and a
  . right-hand side sequence of terminals and non-terminals
  . A grammar also defines a non-terminal start symbol

. BNF ex
  . Notation typically used for defining languages
  <statement> ::= <statement> ; <statement>
  <statement> ::= <identifier> := <expression>
  <expression> ::= <number>
  <expression> ::= <expression> + <expression>

  <statement>, <expression>, <identifier>, <number> : non-terminal(<   >로 기술)
  ;, :=, + : terminal

. Extended BNF
  . optioanl syntax : [] : 0 or 1번
  . Repeated syntax : {} : RE의 *와 비슷

. syntax diagrams

. Derivations
  . Begin with start symbol, replace any nonterminal with the rhs of some production for that nonterminal
  . leftmost - pcik leftmost nonterninal to expand
  . rightmost - pcik rightmost nonterninal to expand
  . ex) leftmost
  S -> id:= E
  -> id:= E + E
  -> id:= E + E + E
  -> id:= num + E + E
  -> id:= num + num + E
  -> id:= num + num + num

. Derivation algorithm
  1. Begin with the start symbol nonterminal
  2. Replace the non-terminal with the right-hand side of a matching production
  3. Choose a non-terminal from the resulting sequence
  4. repeat steps 2 and 3 until no non-terminals remain

. This algorithm includes two choices
  step2. choice of matching production
  step3. choice of non-terminal

. Parse tree
  . a tree representaion of a derivation
  . connect a symbol to the one from which it was derived
  . leaf node = treminal
  . left most derivation이면 left child들이 더 풍성하다.

. Ambiquity - syntax에는 문제가 없으나 semantics(evaluation)에서는 문제가 있다.
  . a grammar is ambiguous if a phase has two different derivations.
  . equivalently, two parse trees
  . example (rightmost derivation)

. Problems with ambiquity(= grammar의  므ㅠㅑ벼ㅑ쇼)
  . If the shape of the parse tree conveys meaning, then ambiquity in a language's grammar(syntax) translates into ambiquity in the language's meaning (semantics)
  . Problematic grammar
  <statement> :: <condition> | <assignment> | begin { <statement> } end

  해결책
  1. grammar를 다시 쓴다.(Unambiquous grammars) (C언어의 방법 - 가까운 곳에 match)
  2. endif를 도입
  3. then다음에 if가 올 수 없고 반드시 begin-end가 와야 한다.

. Denotational semantics
  http://en.wikipedia.org/wiki/Denotational_semantics
  formalizeing the semantics of system by constructing mathematical objects with express the semantics of these systems.

. Axiomatic semantics
  http://en.wikipedia.org/wiki/Axiomatic_semantics
  mathematical logic에 따라 program의 correctness를 따짐.

. Operational semantics
  http://en.wikipedia.org/wiki/Operational_semantics
  give meaning to computer programs in a mathematically rigorous ywa

. semantics of initialize-before-use

. single assignment
  . can bind a value to a name at most once.
  . oftern function languages.
  . Erlang, Haskell, SISAL

댓글 1개: