Visualizing the Pumping Lemma for Regular Languages - Splitting into Substrings

All finite languages are regular. However, languages which accept infinite strings may or may not be regular. The pumping lemma describes a property of all regular languages. The intuition behind the lemma is that a sufficiently long string will have a middle section that can be pumped – repeated any number of times.

The pumping lemma holds true for all strings in a language that is regular. If the lemma does not hold true for some string that is in the language, then the language is not regular. Note that, the pumping lemma cannot be used to prove whether a language is regular. To prove a language is regular, one must define an NFA/DFA or produce a regular expression.

Formal Definition

Let $L$ be a regular language. There exists an integer $p \geq 1$ (called the pumping length), such that for every string $w$ in $L$ of length at least $p$ , $w$ can be written as $w = xyz$ , satisfying the following conditions

$\begin{align*} \vert xy \vert & \leq p \\ \vert y \vert & \geq 1 \\ xy^iz & \in L \quad \forall i \geq 0 \end{align*}$

In essence, a string can be broken down into three parts – $x, y, z$ . The middle substring $y$ can be pumped up indefinitely and yet the string remains in the language.

Visualizing the Substrings

Consider the following string of length greater than $p$ . Each individual symbol is shown as a square block.

$\vert xy \vert \leq p$

This condition defines the span of the $xy$ region. The remaining span makes up $z$ . Notice that $xy$ need not take up all $p$ symbols. However, $xy$ must start from the start of the string. The two spans for $xy$ and $z$ are shown in different colors below. Notice that for any possible split of $xy$ and $z$ the pumping lemma holds true.

$\vert y \vert \geq 1$

This condition defines the span of the $y$ region with the $xy$ substring. For one of the above $xy$ and $z$ splits, all possible $x$ and $y$ substrings are shown below.

The last case is when $x$ is the empty string, $\epsilon$ , and $y$ takes up all the symbols in $xy$ . Notice that for any possible split of $x$ and $y$ the pumping lemma holds true.

$xy^iz \in L \quad \forall i \geq 0$

Consider all possible splits of $w$ into $x$ , $y$ and $z$ by following the above constraints. No matter how $x$ , $y$ and $z$ are chosen, while pumping the $y$ part of the string, the string remains in the language.

$\begin{align*} xz & \in L \\ xyz & \in L \\ xyyz & \in L \\ xyyyz & \in L \\ \vdots \\ xy^iz & \in L \quad \forall i \geq 0 \\ \end{align*}$

Using the Pumping Lemma to prove a Language is not Regular

Here is a general blueprint to use the Pumping Lemma to prove a Language is not Regular. An example problem is solved here.

Choosing a string $w$

Pick a string $w$ that belongs to the language. $w \in L$
$w$ must be expressed in terms of $p$ .
The length of $w$ should be at least $p$ . $|w| \geq p$

Contradicting the Pumping Lemma

Consider all possible splits of $w$ into $x$ , $y$ and $z$ , such that $|xy| \leq p$ .
Consider all possible splits of $xy$ into $x$ and $y$ , such that $|y| \geq 1$ .
For all these cases, prove that, upon pumping, the string $xy^iz$ goes out of the language for some $i \geq 0$ .

Since, all the conditions of the Pumping Lemma are not satisfied, the Language is not Regular.

The Beard Sage

Visualizing the Pumping Lemma for Regular Languages – Splitting into Substrings