Closure Properties of Context-Free Languages

Consider Context-Free Languages (CFLs) L_1 and L_2, and assume that L_1 = L(G_1) for some Context-Free Grammar (CFG) G_1 = (V_1, \Sigma, R_1, S_1) and L_2 = L(G_2) for some CFG G_2 = (V_2, \Sigma, R_2, S_2). Note that both the CFGs share the same alphabet \Sigma

Context-Free Languages are closed under the following operations

Union

Consider the following grammar

    \begin{equation*} G' = (V_1 \cup V_2 \cup \{S\}, \Sigma, R_1 \cup R_2 \cup \{S \rightarrow S_1 | S_2\}, S) \end{equation*}

Then L(G') = L(G_1) \cup L(G_2), and therefore CFLs are closed under union. In essence, we are adding a new start variable S and the new start variable can go to either of S_1 or S_2.

Concatenation

Consider the following grammar

    \begin{equation*} G' = (V_1 \cup V_2 \cup \{S\}, \Sigma, R_1 \cup R_2 \cup \{S \rightarrow S_1S_2\}, S) \end{equation*}

Then L(G') = L(G_1)L(G_2), and therefore CFLs are closed under concatenation. In essence, we are adding a new start variable S and the new start variable goes to S_1S_2.

Kleene Star

Consider the following grammar

    \begin{equation*} G' = (V_1 \cup \{T\}, \Sigma, R_1 \cup \{T \rightarrow S_1T\}, T) \end{equation*}

Then L(G') = L(G_1)^*, and therefore CFLs are closed under star. In essence, we are adding a new start variable and the new start variable T goes to S_1T.

For the following operations, consider a CFL L, and assume that L = L(G) for some CFG G = (V, \Sigma, R, S) in Chomsky Normal Form.

Reversal

When w = w_1 \cdot \cdot \cdot w_n \in \Sigma^*, rev(w) = w_n \cdot \cdot \cdot w_1; and rev(L) = \{rev(w) : w \in L\}.

For example, let L=\{0^n1^n : n \geq 0\}. Then rev(L)=\{1^n0^n : n \geq 0\}.

A derivation of a string in this grammar can be visualized as a binary parse tree, with the left-right traversal of leaves of the tree generating the string w that is derived. A right-left traversal of the leaves of the same tree will generate rev(w). Thus a mirror image of the parse tree will generate the reverse of a string. So the following grammar will generate all strings in rev(L).

Form a CFG G' = (V, \Sigma, R' , S) where R' is determined as follows

  1. S \rightarrow \epsilon is in R' if and only if it is in R.
  2. For A \in V and a \in \Sigma, A \rightarrow a is in R' if and only if it is in R.
  3. For A, B, C \in V , A \rightarrow CB is in R' if and only if A \rightarrow BC is in R.

Then L(G') = rev(L), so rev(L) is context-free.

Prefix

When w = w_1 \cdot \cdot \cdot w_n \in \Sigma^*, pref(w)=\{w_1 \cdot \cdot \cdot w_i :0 \leq i \leq n\}; and pref(L)= \bigcup_{w \in L} pref(w).

For example, let L=\{0^n1^n : n \geq 0\}. Then pref(L)=\{0^n1^m : n \geq m \geq 0\}.

Suppose that G has no useless variables, and form a CFG G' = (V \cup V' , \Sigma, R \cup P, S) where V' = \{A' : A \in V\} and P is determined as follows

  1. S \rightarrow S' is in P.
  2. For A \in V and a \in \Sigma, A' \rightarrow a | \epsilon is in P if and only if A \rightarrow a is in R.
  3. For A, B, C \in V , A' \rightarrow B' | BC' is in P if and only if it A \rightarrow BC in R.

Then L(G') = pref(L), so pref(L) is context-free.

To explain this, the original variables generate what they did before. The primed variables generate prefixes of what the original variables generated. We need to keep both because if we had A \rightarrow BC, B derives x, and C derives y so that A derives w = xy, there may be strings in pref(x)pref(y) that are not in pref(w). In fact pref(w) = pref(xy) = pref(x) \cup x \cdot pref(y).

Subsequence

When w = w_1 \cdot \cdot \cdot w_n \in \Sigma^, subseq(w)=\{w_{i_1} \cdot \cdot \cdot w_{i_l} :0 \leq l \leq n and 1 \leq i_1 < i_2 < \cdot \cdot \cdot < i_l \leq n\}; and subseq(L) = \bigcup_{w \in L} subseq(w).

For example, let L=\{0^n1^n : n \geq 0\}. Then subseq(L)=0^*1^*.

A derivation using this CNF can be visualized as a binary parse tree, with the left-right traversal of leaves of the tree generating the string w = w_1 \cdot \cdot \cdot w_n that is derived. In constructing a subsequence, each w_i may be present or absent. Thus, any number of the n terminals in this string can be replaced by \epsilon to get a subsequence (when the leaves of the parse-tree are traversed left-right) of w. In fact, all possible combinations of doing this will be 2^n. So the following grammar will generate all strings in subseq(L).

Form a CFG G' = (V, \Sigma, R \cup X, S) where X is determined as follows

  1. For A \in V, A \rightarrow \epsilon is in X if and only A \rightarrow a is in R for some a \in \Sigma \cup \{\epsilon\}.

Then L(G') = subseq(L), so subseq(L) is context-free.

Leave a Reply

Your email address will not be published. Required fields are marked *