Cryptography 4: Constructing Secure Stream Ciphers from PRGs

Posted on: August 13, 2025 | Reading time: 12 minutes | Word count: 2490 words | Author: Dang Duong Minh Nhat

In previous posts, I introduced the concepts of perfectly secure encryption and semantically secure encryption. The problem with perfect security is that achieving it requires the use of very long keys. The notion of semantic security was introduced as a weaker concept that might allow us to build secure schemes using only relatively short keys; however, we have not yet constructed such a scheme. We will now study a type of scheme that can do just that: the stream cipher.

I recommend you read Part 3 and other blogs in the series first to understand the theories for this post. Okay, let’s get started.

Pseudo-random Generators

Recall the One-Time Pad: Here, the key, plaintext, and ciphertext are all $L$-bit strings. However, we want to use a much shorter key. The idea is to use an $\ell$-bit seed $s$ (where $\ell$ is much smaller than $L$) as the encryption key, and then “stretch” this seed into a long $L$-bit string to use as a mask for the plaintext (and to unmask it during decryption).

This string $s$ is stretched by an efficient, deterministic algorithm $G$ that maps $\ell$-bit strings to $L$-bit strings. Thus, the key space for this modified version of the One-Time Pad is $\left\lbrace 0,1\right\rbrace^\ell$, while the plaintext and ciphertext spaces are $\left\lbrace 0,1\right\rbrace^L$.

For $s \in \left\lbrace 0,1\right\rbrace^\ell$ and $m, c \in \left\lbrace 0,1\right\rbrace^L$, we define:

$E(s, m) := G(s) \oplus m \quad \text{and} \quad D(s, c) := G(s) \oplus c.$

This modified version of the One-Time Pad is called a stream cipher, and the function $G$ is called a pseudo-random generator (PRG).

If $\ell < L$, then by Shannon’s Theorem, this stream cipher cannot achieve perfect security; however, if $G$ satisfies an appropriate security property, this scheme can achieve semantic security.

Suppose $s$ is a random $\ell$-bit string and $r$ is a random $L$-bit string. Intuitively, if an attacker cannot efficiently distinguish $G(s)$ from $r$, then they should also be unable to distinguish this stream cipher from the One-Time Pad; furthermore, since the One-Time Pad is semantically secure, this stream cipher should be as well. To make this argument rigorous, we need to formalize the notion that “an attacker cannot efficiently distinguish $G(s)$ from $r$.”

An algorithm used to distinguish a pseudo-random string $G(s)$ from a truly random string $r$ is called a statistical test. It takes a string as input and outputs either $0$ or $1$.

The test is called effective if the probability that it outputs $1$ on a pseudo-random input is significantly different from the probability that it outputs $1$ on a truly random input.

Even a difference of about $1%$ is considered significant; indeed, with just a $1%$ difference, if we obtain a few hundred independent samples (all pseudo-random or all truly random), we can infer with high confidence which type of string we are dealing with. However, if the difference is non-zero but negligible, such as $2^{-100}$, it is not useful.

How to design an effective statistical test:

A basic approach is: given an $L$-bit string, compute some statistic, and then check if this statistic differs significantly from its expected value if the string were truly random.

Example: A simple and easy-to-compute statistic is the number of $1$s in the string, let’s call it $k$.

For a truly random string, we expect $k \approx L/2$.
If the PRG $G$ has a bias towards the bit $0$ or $1$, we can detect this with a statistical test that, for instance:

$$\text{outputs } 1 \ \text{if} \ |k - 0.5L| < 0.01L, \ \text{otherwise outputs } 0.$$

This test would be quite effective if $G$ indeed has a significant bias.

The above test can be enhanced by considering not just individual bits but also pairs of bits.

Divide the $L$-bit string into approximately $L/2$ pairs of bits.
Count the number $k_{00}$ of 00 pairs, the number $k_{01}$ of 01 pairs, the number $k_{10}$ of 10 pairs, and the number $k_{11}$ of 11 pairs.
For a truly random string, each expected value would be around:

$$L/2 \cdot \frac14 = L/8.$$

A natural test would be to check if the distance between each of these values and $L/8$ is smaller than a certain threshold.

Or we could sum the squares of these distances and check if that sum is less than a given threshold – this is the classic $\chi^2$ (chi-squared) test in statistics.

Clearly, this idea can be extended from pairs of bits to any $n$-bit block.

There are many other simple statistics that can be checked. However, such simple tests often do not exploit the deeper mathematical properties of the algorithm $G$ that a sophisticated attacker might leverage to design a specialized statistical test against $G$.

For example: There are PRGs for which the simple tests in the previous two paragraphs are completely ineffective, yet they are entirely predictable if enough output bits are known; that is, given a sufficiently long prefix of $G(s)$, an attacker can compute all the remaining bits of $G(s)$, or even compute the seed $s$.

Thus, the security definition of a PRG will be formalized with the idea that no efficient (and efficiently computable) statistical test exists.

Definition of a Pseudo-random Generator

A pseudo-random generator (PRG) is an efficient, deterministic algorithm $G$ that, when given a seed $s$ as input, computes an output $r$.

The seed $s$ belongs to a finite seed space $\mathcal{S}$, and the output $r$ belongs to a finite output space $\mathcal{R}$. Typically, $\mathcal{S}$ and $\mathcal{R}$ are sets of bit strings of a fixed length (for example, in the discussion above, we had $\mathcal{S} = \left\lbrace 0,1\right\rbrace^\ell$ and $\mathcal{R} = \left\lbrace 0,1\right\rbrace^L$). We say that $G$ is a PRG defined on $(\mathcal{S}, \mathcal{R})$.

The definition of security for a PRG aims to capture the intuition that if $s$ is chosen randomly from $\mathcal{S}$ and $r$ is chosen randomly from $\mathcal{R}$, then no efficient adversary can reliably distinguish between $G(s)$ and $r$: these two cases are computationally indistinguishable.

This definition is described in terms of an attack game.

The PRG Attack Game. For a PRG $G$ defined on $(\mathcal{S}, \mathcal{R})$ and an adversary $\mathcal{A}$, we define two experiments: Experiment 0 and Experiment 1. For $b \in \left\lbrace 0, 1\right\rbrace$, we define:

Experiment b: The challenger computes $r \in \mathcal{R}$ and sends $r$ to the adversary.

If $b = 0$: $s \xleftarrow{R} \mathcal{S}$, $r \leftarrow G(s)$
If $b = 1$: $r \xleftarrow{R} \mathcal{R}$

The Adversary: Upon receiving $r$, the adversary computes and outputs a bit $\hat{b} \in \left\lbrace 0, 1\right\rbrace$.

Let $W_b$ be the event that “the adversary $\mathcal{A}$ outputs $1$ in Experiment b”.

We define the advantage of $\mathcal{A}$ against $G$ as follows:

$$\mathtt{PRG}\text{adv}[\mathcal{A}, G] := \left| \Pr[W_0] - \Pr[W_1] \right|.$$

Definition (Secure PRG). A PRG $G$ is called secure if the value $\mathtt{PRG}\text{adv}[\mathcal{A}, G]$ is negligible for every efficient adversary $\mathcal{A}$.

As mentioned in the post Cryptography 3, the PRG Attack Game can be rewritten as a “bit-guessing” game, where instead of running two separate experiments, the challenger randomly chooses $b \in \left\lbrace 0, 1\right\rbrace$ and then runs Experiment b with the adversary $\mathcal{A}$.

In this game, we measure the bit-guessing advantage of $\mathcal{A}$:

$$\mathtt{PRG}\text{adv}^\ast[\mathcal{A}, G] := \left| \Pr[\hat{b} = b] - \frac12 \right|.$$

The general result applied here is:

$$\mathtt{PRG}\text{adv}[\mathcal{A}, G] = 2 \cdot \mathtt{PRG}\text{adv}^\ast[\mathcal{A}, G]. \tag{1}$$

Stream Ciphers: Encryption with a PRG

Let $G$ be a PRG defined on $(\left\lbrace 0, 1\right\rbrace^\ell, \left\lbrace 0, 1\right\rbrace^L)$; that is, $G$ expands an $\ell$-bit seed into an $L$-bit output. The stream cipher $\mathcal{E} = (E, D)$ constructed from $G$ is defined on $(\left\lbrace 0, 1\right\rbrace^\ell, \left\lbrace 0, 1\right\rbrace^{\le L}, \left\lbrace 0, 1\right\rbrace^{\le L})$; for $s \in \left\lbrace 0, 1\right\rbrace^\ell$ and $m, c \in \left\lbrace 0, 1\right\rbrace^{\le L}$, the encryption and decryption operations are defined as follows: if $|m| = v$, then

$$E(s, m) := G(s)[0 \dots v - 1] \oplus m,$$

and if $|c| = v$, then

$$D(s, c) := G(s)[0 \dots v - 1] \oplus c.$$

As the reader can easily verify, this satisfies the definition of an encryption scheme (in particular, the correctness property).

Note that, for analyzing the semantic security of $\mathcal{E}$, the length associated with a message $m$ in the Semantic Security Attack Game is its natural length $|m|$ in bits. Also, note that if $v$ is much smaller than $L$, then for many practical PRGs, it is possible to compute the first $v$ bits of $G(s)$ much more quickly than computing all $L$ bits of $G(s)$ and then truncating.

The main result of this section is:

Theorem 1. If $G$ is a secure PRG, then the stream cipher $\mathcal{E}$ constructed from $G$ is a semantically secure encryption scheme.

Specifically, for any $\mathtt{SS}$ adversary $\mathcal{A}$ targeting $\mathcal{E}$ as in the Semantic Security Attack Game, there exists a PRG adversary $\mathcal{B}$ targeting $G$ as in the PRG Attack Game, where $\mathcal{B}$ is just a simple wrapper around $\mathcal{A}$, such that

$$\mathtt{SS}\texttt{adv}[\mathcal{A}, \mathcal{E}] = 2 \cdot \mathtt{PRG}\texttt{adv}[\mathcal{B}, G]. \tag{2}$$

Proof. Suppose $\mathcal{A}$ is an efficient adversary targeting $\mathcal{E}$ in the Semantic Security Attack Game. We want to show that $\mathtt{SS}\texttt{adv}[\mathcal{A}, \mathcal{E}]$ is negligible, assuming $G$ is a secure PRG. Instead of proving (2) directly, we will work with the bit-guessing version of the SS attack game by proving:

$$\mathtt{SS}\texttt{adv}^*[\mathcal{A}, \mathcal{E}] = \mathtt{PRG}\texttt{adv}[\mathcal{B}, G] \tag{3}$$

for an efficient adversary $\mathcal{B}$. Then (2) will follow from Theorem 3 in the Cryptography 3 post. Furthermore, under the assumption that $G$ is a secure PRG, the value $\mathtt{PRG}\texttt{adv}[\mathcal{B}, G]$ will be negligible, and therefore $\mathtt{SS}\texttt{adv}[\mathcal{A}, \mathcal{E}]$ will also be negligible (which is what we need to prove).

Next, consider $\mathcal{A}$’s attack on $\mathcal{E}$ in the bit-guessing version of the Semantic Security Attack Game. In this game, $\mathcal{A}$ gives the challenger two messages $m_0, m_1$ of the same length; the challenger randomly chooses a key $s$ and a bit $b$, and encrypts $m_b$ under key $s$, giving the ciphertext $c$ to $\mathcal{A}$; finally, $\mathcal{A}$ outputs a bit $\hat{b}$. The adversary $A$ wins if $\hat{b} = b$.

Let’s call this game Game 0. The challenger’s logic in this game can be written as follows:

Receive $m_0, m_1 \in \left\lbrace 0, 1\right\rbrace^v$ from $\mathcal{A}$, with $v \le L$, then:

$b \xleftarrow{R} \left\lbrace 0, 1\right\rbrace$
$s \xleftarrow{R} \left\lbrace 0, 1\right\rbrace^\ell$, $r \leftarrow G(s)$
$c \leftarrow r[0 \dots v-1] \oplus m_b$
send $c$ to $\mathcal{A}$.

Let $W_0$ be the event $\hat{b} = b$ in Game 0. By definition:

$$\mathtt{SS}\texttt{adv}^*[\mathcal{A}, \mathcal{E}] = | \Pr[W_0] - 1/2 |. \tag{4}$$

Next, we modify the challenger of Game 0 to create Game 1, which is identical to Game 0 except that the challenger uses a truly random string instead of a pseudo-random one. The logic of Game 1 is as follows:

Receive $m_0, m_1 \in \left\lbrace 0, 1\right\rbrace^v$ from $\mathcal{A}$, with $v \le L$, then:

$b \xleftarrow{R} \left\lbrace 0, 1\right\rbrace$
$r \xleftarrow{R} \left\lbrace 0, 1\right\rbrace^L$
$c \leftarrow r[0 \dots v-1] \oplus m_b$
send $c$ to $\mathcal{A}$.

Let $W_1$ be the event $\hat{b} = b$ in Game 1. We have:

$$\Pr[W_1] = 1/2. \tag{5}$$

This is because in Game 1, the adversary is attacking a variable-length one-time pad. The output $\hat{b}$ of $\mathcal{A}$ and the challenger’s secret bit $b$ are independent.

Finally, we need to construct a PRG adversary $\mathcal{B}$ that uses $\mathcal{A}$ as a subroutine, such that:

$$|\Pr[W_0] - \Pr[W_1]| = \mathtt{PRG}\texttt{adv}[\mathcal{B}, G]. \tag{6}$$

$\mathcal{B}$ receives $r \in \left\lbrace 0, 1\right\rbrace^L$ from its PRG challenger and plays the role of the challenger for $\mathcal{A}$:

Receive $m_0, m_1 \in \left\lbrace 0, 1\right\rbrace^v$ from $\mathcal{A}$, with $v \le L$, then:

$b \xleftarrow{R} \left\lbrace 0, 1\right\rbrace$
$c \leftarrow r[0 \dots v-1] \oplus m_b$
send $c$ to $\mathcal{A}$.

When $\mathcal{A}$ outputs $\hat{b}$, $\mathcal{B}$ outputs $\delta(\hat{b}, b)$, where:

$$ \delta(x, y) := \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{if } x \ne y. \end{cases} \tag{7} $$

Let $p_0$ be the probability that $\mathcal{B}$ outputs $1$ when the PRG challenger is running Experiment 0 of the PRG Attack Game, and let $p_1$ be the probability that $\mathcal{B}$ outputs $1$ when the PRG challenger is running Experiment 1 of the PRG Attack Game. By definition,

$$\mathtt{PRG}\texttt{adv}[\mathcal{B}, G] = |p_1 - p_0|.$$

Furthermore, if the PRG challenger is running Experiment 0, then the adversary $\mathcal{A}$ is essentially playing our Game 0, and thus $p_0 = \Pr[W_0]$. Similarly, if the PRG challenger is running Experiment 1, then $A$ is essentially playing our Game 1, and thus $p_1 = \Pr[W_1]$. Therefore, we can immediately obtain equation $(6)$.

Combining $(4), (5)$, and $(6)$ yields $(3)$. ✷

In the theorem above, we have reduced the security of $\mathcal{E}$ to the security of $G$ by showing that: if there exists an efficient $\mathtt{SS}$ adversary $\mathcal{A}$ that attacks $\mathcal{E}$, then there exists an efficient PRG adversary $\mathcal{B}$ that attacks $G$ satisfying:

$$\mathtt{SS}\texttt{adv}[\mathcal{A}, \mathcal{E}] \le 2 \cdot \mathtt{PRG}\texttt{adv}[\mathcal{B}, G]$$

We have shown that if $G$ is secure, then $\mathtt{PRG}\texttt{adv}[\mathcal{B}, G]$ is negligible, so by the inequality above, we conclude that $\mathtt{SS}\texttt{adv}[\mathcal{A}, \mathcal{E}]$ is also negligible. Since this holds for every efficient adversary $\mathcal{A}$, we conclude that $\mathcal{E}$ is semantically secure.

Another way to frame the proof is using the contrapositive. Indeed, if we assume $\mathcal{E}$ is not secure, then there would exist an efficient adversary $\mathcal{A}$ such that $\mathtt{SS}\texttt{adv}[\mathcal{A}, \mathcal{E}]$ is non-negligible, and the reduction (along with the inequality above) gives us an efficient adversary $\mathcal{B}$ such that $\mathtt{PRG}\texttt{adv}[\mathcal{B}, G]$ is also non-negligible. In other words, if we can break $\mathcal{E}$, we can also break $G$. Although logically equivalent, this style of proof has a different “feel” in that we start with an adversary $\mathcal{A}$ that breaks $\mathcal{E}$, and show how to use $\mathcal{A}$ to build a new adversary $\mathcal{B}$ that breaks $G$.

We also note that the proof of the theorem above follows the same basic logical pattern as the analysis of the “Internet roulette” game in Cryptography 3. This is a proof template that will be repeated and extended many times throughout this book. The reader is encouraged to study both of these analyses to ensure they truly understand the process.

In this post, we have journeyed from a theoretical problem—how to achieve secure encryption with a short key—to a concrete and elegant solution: the stream cipher. By replacing the truly random keystream of the One-Time Pad with the output of a Pseudo-random Generator (PRG), we have successfully constructed a semantically secure scheme.

The linchpin of this entire construction is the formal definition and assurance that the PRG’s output is computationally indistinguishable from a truly random string. The proof technique of “reducing” the stream cipher’s security to the PRG’s security is not just a theoretical exercise; it is one of the most fundamental and powerful thinking patterns in modern cryptography. Understanding this method opens the door to analyzing many more complex protocols and schemes that we will explore later.

References

Dan Boneh and Victor Shoup. A Graduate Course in Applied Cryptography. Retrieved from https://toc.cryptobook.us/

Connect with Me

Connect with me on Facebook, LinkedIn, via email at dangduongminhnhat2003@gmail.com, GitHub, or by phone at +84 829 258 815.