Context-Free Grammars and Parse Trees

The derivation of a string from a Context-Free Grammar (CFG) can be visualized as a parse tree. The root of the parse tree is always the start variable. Each branch corresponds to a rule in the CFG.

Derivations and Parse Trees

Consider the following grammar

    \begin{equation*} S \rightarrow SS | (S) | \epsilon \end{equation*}

This grammar can produce the string ()() using the following derivation

    \begin{equation*} $S \implies SS \implies (S)S \implies ()S \implies ()(S) \implies ()()$ \end{equation*}

An equivalent parse-tree for the above derivation is

Rendered by QuickLaTeX.com

The same string ()() can have another derivation

    \begin{equation*} $S \implies SS \implies S(S) \implies (S)(S) \implies ()(S) \implies ()()$ \end{equation*}

However, this derivation generates the same parse-tree as above.

Thus, two different derivations may point to the same parse-tree.

Ambiguity

The above grammar can have two different parse trees below to generate ().

Rendered by QuickLaTeX.com

Rendered by QuickLaTeX.com

A left-most derivation is a derivation in which the left-most variable is replaced in each step.

Every left-most derivation yields a parse-tree. Every parse-tree yields a left-most derivation. These are in one-to-one correspondence.

A CFG is ambiguous if some string in its language has two or more parse-trees (left-most derivations).

Ambiguity is a property of the grammar, not of the language. Some languages are inherently ambiguous. It is impossible to define an unambiguous grammar for these languages.

Chomsky Normal Form

A CFG in Chomsky Normal Form (CNF) will generate a parse-tree which is a binary tree. This is because each rule is of the form A \rightarrow BC or A \rightarrow a, where A, B and C are variables and a is a terminal.

Each internal node of the binary tree will be a variable. The leaves of the binary tree will be a terminal.

Consider a parse-tree to generate a string of length n \geq 1

This parse-tree will have n leaves – one for each symbol in the string. These correspond to A \rightarrow a type rules. A binary tree with n leaves will have n-1 internal nodes. These correspond to A \rightarrow BC type rules.

Thus, in total, there will be 2n - 1 application of rules to generate a string of length n.

If however, n = 0, the string generated is \epsilon. In this case, there will be only one application of the rule S \rightarrow \epsilon

Leave a Reply

Your email address will not be published. Required fields are marked *