Reading: Katz-Lindell Section 3.3, Boneh-Shoup Chapter 3
The nature of randomness has troubled philosophers, scientists, statisticians
and laypeople for many years.[^1] Over the years people have given different
answers to the question of what it means for data to be random, and what
the nature of probability is. The movements of the planets initially looked random
and arbitrary, but then the early astronomers managed to find order and make
some predictions about them. Similarly, we have made great advances in predicting
the weather, and will probably continue to do so.
So, while these days it seems as if the event of whether or not it will rain a week from today
is random, we could imagine that with time we will be able to predict the weather further into the future.
Even the canonical notion of a random experiment, tossing a coin,
turns out to be perhaps less random than you'd think, with about a
51% chance that the second toss will have the same result as the first one.
(Though see also this experiment.)
It is conceivable that at some point someone would discover some function
If a quantity is hard to compute, it might as well be random.
Much of cryptography is about trying to make this intuition more formal, and harnessing it to build secure systems. The basic object we want is the following:
A function
We say that
This definition (as is often the case in cryptography) is a bit long, so you want to take your time parsing it. In particular you should verify that you understand why the condition prgdefeq{.eqref} is the same as saying that for every polynomial
Note that the requirement that
Suppose that
On input
It is not hard to show that if
Conjecture (The PRG conjecture): For every $n$, there exists a pseudorandom generator $G$ mapping $n$ bits to $n+1$ bits.^[The name "The PRG conjecture" is non-standard. In the literature this is known as the conjecture of existence of pseudorandom generators. This is a weaker form of "The Optimal PRG Conjecture" presented in my intro to theoretical CS lecture notes, since the PRG conjecture only posits the existence of pseudorandom generators with arbitrary polynomial blowup, as opposed to the exponential blowup posited in the Optimal PRG Conjecture.]
As was the case for the cipher conjecture, and any other conjecture, there are two natural questions regarding the PRG conjecture: why should we believe it and why should we care. Fortunately, the answer to the first question is simple: it is known that the cipher conjecture implies the PRG conjecture, and hence if we believe the former we should believe the latter. (The proof is highly non-trivial and we may not get to see it in this course.) As for the second question, we will see that the PRG conjecture implies a great number of useful cryptographic tools, including the cipher conjecture (i.e., the two conjectures are in fact equivalent). We start by showing that once we can get to an output that is one bit longer than the input, we can in fact obtain any number of bits.
Suppose that the PRG conjecture is
true. Then for every polynomial
The proof of this theorem is very similar to the length extension theorem for ciphers, and in fact this theorem can be used to give an alternative proof for the former theorem.
The construction is illustrated in lengthextendprgfig{.ref}.
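As a rough illustration of the kind of iteration a length-extension construction uses, here is a minimal Python sketch. It assumes the usual approach of feeding the state back into the generator and emitting one extra bit per call. The names `toy_G`, `extend` and `STATE_BYTES` are hypothetical, and `toy_G` is only a stand-in built from SHA-256 so that the code runs; it is not a proven pseudorandom generator.

```python
import hashlib

STATE_BYTES = 16  # toy state size in bytes (an assumption of this sketch)

def toy_G(state):
    """Hypothetical stand-in for G: state -> (next state, one extra bit)."""
    digest = hashlib.sha256(state).digest()
    # The first 16 bytes become the next state; one bit of a later byte is output.
    return digest[:STATE_BYTES], digest[STATE_BYTES] & 1

def extend(seed, t):
    """Produce t output bits by applying toy_G t times to the evolving state."""
    state, out = seed, []
    for _ in range(t):
        state, bit = toy_G(state)
        out.append(bit)
    return out

bits = extend(b"\x00" * STATE_BYTES, 64)
```

Note that the intermediate states are never output; only the extra bit of each call is.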
We are given a pseudorandom generator
Now suppose otherwise, that there exists some adversary
On input of string
The proof of lengthextendprgthm{.ref} is indicative of many practical constructions of pseudorandom generators. Many operating systems keep track of an initial seed of randomness, and supply a system call `rand` such that every call to `rand` applies a pseudorandom generator
The notion that being random is the same as being "unpredictable" can be formalized as follows.
One can show that a random variable
We now show a connection between our two notions:
If the PRG conjecture is true then so is the cipher conjecture.
It turns out that the converse direction is also true, and hence these two conjectures are equivalent, though we will probably not show the (quite non-trivial) proof of this fact in this course. (We might show some weaker version of this harder direction.)
The construction is actually quite simple. Recall that the one time
pad is a perfectly secure cipher but its only problem was that to encrypt an
and
Just like in the one time pad,
Claim: For every
The claim implies the security of the scheme, since it means that
Proof of claim: Suppose that there was an efficient adversary
for some non-negligible
If the PRG outputs
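To make the idea of replacing the long one-time-pad key with generator output concrete, here is a minimal Python sketch. It assumes that encryption XORs the message with the generator's output on the key. The names `toy_prg`, `encrypt` and `decrypt` are hypothetical, and `toy_prg` uses SHAKE-256 purely as a runnable stand-in for $G$, not as a proven pseudorandom generator.

```python
import hashlib

def toy_prg(key, out_len):
    # Hypothetical stand-in for G: stretch the key to out_len bytes.
    return hashlib.shake_256(key).digest(out_len)

def encrypt(key, message):
    # XOR the message with the stretched key, as in the one time pad.
    pad = toy_prg(key, len(message))
    return bytes(p ^ m for p, m in zip(pad, message))

# XOR is its own inverse, so decryption is the same operation.
decrypt = encrypt

key = b"sixteen byte key"
message = b"a message much longer than the key itself"
ciphertext = encrypt(key, message)
```

As with the one time pad, the same key must not be reused for two messages in this sketch, since the pad would repeat.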
The following is a cute application of pseudorandom generators. Alice and Bob want to toss a fair coin over the phone. They use a pseudorandom generator $G:\{0,1\}^n\rightarrow\{0,1\}^{3n}$.

- Alice will send $z\leftarrow_R\{0,1\}^{3n}$ to Bob.
- Bob picks $s\leftarrow_R\{0,1\}^n$ and with probability $1/2$ sends $G(s)$ (case I) and with probability $1/2$ sends $G(s)\oplus z$ (case II).
- Alice then picks a random $b\leftarrow_R\{0,1\}$ and sends it to Bob.
- Bob reveals what he sent in the previous stage and if it was case I, their output is $b$, and if it was case II, their output is $1-b$.
It can be shown that (assuming the protocol is completed) the output is a random coin, which neither Alice nor Bob can control or predict with more than negligible advantage over half. (Trying to formalize this and prove it is an excellent exercise.)
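The message flow of the protocol can be simulated in a few lines of Python. This is only a sketch with both parties honest: `G` is a hypothetical stand-in based on SHAKE-256 (not a proven pseudorandom generator), lengths are measured in bytes rather than bits, and the final check that Bob's reveal matches what he sent is my assumption about what "reveals" involves.

```python
import hashlib
import secrets

N = 16  # seed length in bytes (toy parameter, an assumption)

def G(s):
    # Hypothetical stand-in for a generator tripling the input length.
    return hashlib.shake_256(s).digest(3 * len(s))

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def coin_toss():
    z = secrets.token_bytes(3 * N)             # Alice -> Bob
    s = secrets.token_bytes(N)                 # Bob's secret seed
    case = secrets.choice(("I", "II"))         # Bob's private choice
    y = G(s) if case == "I" else xor(G(s), z)  # Bob -> Alice
    b = secrets.randbits(1)                    # Alice -> Bob
    # Bob reveals (s, case); Alice recomputes the message and compares.
    expected = G(s) if case == "I" else xor(G(s), z)
    if y != expected:
        raise ValueError("Bob's reveal does not match what he sent")
    return b if case == "I" else 1 - b

tosses = [coin_toss() for _ in range(200)]
```

The simulation says nothing about security against a cheating party; that is what the exercise above asks you to formalize.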
So far we have made the conjectures that objects such as ciphers and
pseudorandom generators exist, without giving any hint as to what they would
actually look like. (Though we have examples such as the Caesar cipher, Vigenère, and Enigma of what secure ciphers don't look like.)
As mentioned above, we do not know how to prove that
any particular function is a pseudorandom generator.
However, there are quite simple candidates (i.e., functions that are conjectured to be secure pseudorandom generators), though care must be taken in constructing them.
We now consider candidates for functions that map
Just to get started, let's show an example of an obviously bogus pseudorandom generator.
We define the "counter pseudorandom generator"
You should really pause here and make sure you see why the "counter pseudorandom generator" is not a secure pseudorandom generator. Show that this is true even if we replace the least significant digit by the
LFSR can be thought of as the "mother" (or maybe more like the sick great-uncle) of all pseudorandom generators.
One of the simplest ways to generate a "randomish" extra digit given an
LFSR's have several good properties- if the function
A more interesting property is that (if the function is selected properly) every two coordinates are independent from one another.
That is, there is some super-polynomial function
There is a more general notion of a linear generator where the new state can be any invertible linear transformation of the previous state. That is, we interpret the state
All these generators are unfortunately insecure due to the great bane of cryptography: the Gaussian elimination algorithm, which students typically encounter in any linear algebra class.^[Despite the name, the algorithm goes at least as far back as the Chinese Jiuzhang Suanshu manuscript, circa 150 B.C.]
There is a polynomial time algorithm to solve
Despite its seeming simplicity and ubiquity, Gaussian elimination (and some generalizations and related algorithms such as Euclid’s extended g.c.d algorithm and the LLL lattice reduction algorithm) has been used time and again to break candidate cryptographic constructions.
In particular, if we look at the first
The above means that it is a bad idea to use a linear checksum as a
pseudorandom generator in a cryptographic application, and in fact in any
adversarial setting (e.g., one shouldn't hope that an attacker would not be able
to reverse engineer the algorithm[^5] that computes the control digit of a credit card
number). However, that does not mean that there are no legitimate cases
where linear generators can be used. In a setting where the application is not adversarial and you have an ability to test if the generator is actually successful, it
might be reasonable to use such insecure non-cryptographic generators.
They tend to be more efficient (though often not by much) and hence are often the default
option in many programming environments, such as the C `rand()` function. (In
fact, the real bottleneck in using cryptographic pseudorandom generators is
often the generation of entropy for their seed, as discussed in the previous
lecture, and not their actual running time.)
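To see how little protection linearity offers, consider the simplest case of a linear congruential generator $x_{i+1} = a x_i + b \pmod m$ that outputs its whole state. The sketch below assumes the modulus is a known prime and that the attacker sees three consecutive outputs; the parameter values and the names `lcg_next` and `recover_lcg` are made up for illustration.

```python
M = 2**31 - 1  # a known prime modulus (an assumption of this sketch)

def lcg_next(x, a, b, m=M):
    return (a * x + b) % m

def recover_lcg(x0, x1, x2, m=M):
    # x1 = a*x0 + b and x2 = a*x1 + b (mod m) give x2 - x1 = a*(x1 - x0).
    d = (x1 - x0) % m
    # Modular inverse via pow needs Python 3.8+ and x1 != x0 (mod m).
    a = (x2 - x1) * pow(d, -1, m) % m
    b = (x1 - a * x0) % m
    return a, b

# The "secret" parameters and starting state are arbitrary illustrative values.
secret_a, secret_b = 48271, 12345
xs = [2024]
for _ in range(3):
    xs.append(lcg_next(xs[-1], secret_a, secret_b))

# The attacker sees xs[0..2], solves for (a, b), and predicts xs[3].
a, b = recover_lcg(xs[0], xs[1], xs[2])
prediction = lcg_next(xs[2], a, b)
```

Two linear equations in two unknowns are enough here; generators with larger linear state need more outputs and a general linear solver, but the principle is the same.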
It is often the case that we want to "fix" a broken cryptographic primitive, such as a pseudorandom generator, to make it secure. At the moment this is still more of an art than a science, but there are some principles that cryptographers have used to try to make this more principled. The main intuition is that there are certain properties of computational problems that make them more amenable to algorithms (i.e., "easier") and when we want to make the problems useful for cryptography (i.e., "hard") we often seek variants that don't possess these properties. The following table illustrates some examples of such properties. (These are not formal statements, but rather are intended to give some intuition.)
| Easy | Hard |
|------|------|
| Continuous | Discrete |
| Convex | Non-convex |
| Linear | Non-linear (degree |
Many cryptographic constructions can be thought of as trying to transform an easy problem into a hard one by moving from the left to the right column of this table.
The discrete logarithm problem is the discrete version of the continuous real logarithm problem. The learning with errors problem can be thought of as the noisy version of the linear equations problem (or the discrete version of least squares minimization). When constructing block ciphers we often have a mixing transformation to ensure that the dependency structure between different bits is global, S-boxes to ensure non-linearity, and many rounds to ensure deep structure and large algebraic degree.
This also works in the other direction. Many algorithmic and machine learning advances work by embedding a discrete problem in a continuous convex one. Some attacks on cryptographic objects can be thought of as trying to recover some of the structure (e.g., by embedding modular arithmetic in the real line or "linearizing" non-linear equations).
One approach that is widely used in implementations of pseudorandom generators is to take a linear generator such as the linear congruential generators described above, and use for the output a "chopped" version of the linear function and drop some of the least significant bits. The operation of dropping these bits is non-linear and hence the attack above does not immediately apply. Nevertheless, it turns out this attack can be generalized to handle this case, and hence even with dropped bits Linear Congruential Generators are completely insecure and should be used (if at all) only in applications such as simulations where there is no adversary. Section 3.7.1 in the Boneh-Shoup book describes one attack against such generators that uses the notion of lattice algorithms that we will encounter later in this course in very different contexts.
Let's now describe some successful (at least per current knowledge) pseudorandom generators:
Here is an extremely simple generator that is nevertheless still secure[^6] as far as we know.
```python
from functools import reduce  # reduce is not a builtin in Python 3

# seed is a list of 40 zero/one values
# output is an integer in the range [0, modulo)
# NOTE: as printed, modulo is 2**24 and the constants are 24-bit values,
# so this listing outputs 24 bits; a 48 bit output would need
# 48-bit constants and modulo = 2**48.
def subset_sum_gen(seed):
    modulo = 0x1000000
    constants = [
        0x3D6EA1, 0x1E2795, 0xC802C6, 0xBF742A, 0x45FF31,
        0x53A9D4, 0x927F9F, 0x70E09D, 0x56F00A, 0x78B494,
        0x9122E7, 0xAFB10C, 0x18C2C8, 0x8FF050, 0x0239A3,
        0x02E4E0, 0x779B76, 0x1C4FC2, 0x7C5150, 0x81E05E,
        0x154647, 0xB80E68, 0xA042E5, 0xE20269, 0xD3B7F3,
        0xCC5FB9, 0x0BFC55, 0x847AE0, 0x8CFDF8, 0xE304B7,
        0x869ACE, 0xB4CDAB, 0xC8E31F, 0x00EDC7, 0xC50541,
        0x0D6DDD, 0x695A2F, 0xA81062, 0x0123CA, 0xC6C5C3 ]
    # return the modular sum of the constants
    # corresponding to ones in the seed
    return reduce(lambda x, y: (x + y) % modulo,
                  map(lambda a, b: a * b, constants, seed))
```
The seed to this generator is an array `seed` of 40 bits, with 40 hardwired constants each 48 bits long (these constants were generated at random, but are fixed once and for all, and are not kept secret and hence are not considered part of the secret random seed).
The output is simply
This generator is loosely motivated by the "subset sum" computational problem, which is NP hard. However, since NP hardness is a worst case notion of complexity, it does not imply security for pseudorandom generators, which requires hardness of an average case variant. To get some intuition for its security, we can work out why (given that it seems to be linear) we cannot break it by simply using Gaussian elimination.
This is an excellent point for you to stop and try to answer this question on your own.
Given the known constants and known output, figuring out the set of potential seeds can be thought of as solving a single equation in 40 variables. However, this equation is clearly underdetermined, and will have a solution regardless of whether the observed value is indeed an output of the generator, or it is chosen uniformly at random.
More concretely, we can use linear-equation solving to compute (given the known constants
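A toy-sized experiment may help build intuition for what actually distinguishes outputs from random strings, and for why brute force works when the parameters are small. The sketch below shrinks the generator to a 10-bit seed and a 16-bit output, with constants drawn from a seeded `random.Random`; all of these choices, and the names `make_subset_sum_gen` and `all_outputs`, are hypothetical and only for illustration.

```python
import random
from itertools import product

def make_subset_sum_gen(n_seed, n_out, rng):
    """Build a toy subset-sum generator with n_seed public random constants."""
    modulo = 1 << n_out
    constants = [rng.randrange(modulo) for _ in range(n_seed)]
    def gen(seed):
        # Modular sum of the constants selected by the ones in the seed.
        return sum(c for c, bit in zip(constants, seed) if bit) % modulo
    return gen

def all_outputs(gen, n_seed):
    # Brute force: enumerate every 0/1 seed (feasible only for tiny n_seed).
    return {gen(seed) for seed in product((0, 1), repeat=n_seed)}

rng = random.Random(0)  # fixed so the experiment is repeatable
gen = make_subset_sum_gen(n_seed=10, n_out=16, rng=rng)
outputs = all_outputs(gen, 10)

y_real = gen([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
# Count how many of 1000 uniformly random 16-bit values are possible outputs.
hits = sum(rng.randrange(1 << 16) in outputs for _ in range(1000))
```

A genuine output always lies in `outputs`, while a uniform 16-bit value does so with probability at most $2^{10}/2^{16}$. The distinguishing information is therefore the restriction of the seed to zero/one values, which a linear solver ignores and which enumeration captures only at a cost exponential in the seed length.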
The following is another example of an extremely simple generator, known as RC4 (this stands for Rivest Cipher 4, as Ron Rivest invented it in 1987), which is still fairly widely used today.
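```python
def RC4(P, i, j):
    # advance the two indices
    i = (i + 1) % 256
    j = (j + P[i]) % 256
    # swap the two selected entries of P
    P[i], P[j] = P[j], P[i]
    # return the new state together with one output byte
    return (P, i, j, P[(P[i] + P[j]) % 256])
```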
The function `RC4` takes as input the current state `P,i,j` of the generator and returns the new state together with a single output byte.
The state of the generator consists of an array P
of 256 bytes, which can be thought of as a permutation of the numbers P
is a completely random permutation and P
from a shorter seed as well.
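To see the state update in action, the following sketch repeats the `RC4` step function and calls it in a loop. The helper `keystream` is a hypothetical name, and the initial permutation is produced with Python's non-cryptographic `random` module purely for illustration; it is not RC4's actual seeding procedure and not a secure way to pick a state.

```python
import random

def RC4(P, i, j):
    i = (i + 1) % 256
    j = (j + P[i]) % 256
    P[i], P[j] = P[j], P[i]
    return (P, i, j, P[(P[i] + P[j]) % 256])

def keystream(P, n):
    """Run n steps from indices i = j = 0 and collect the output bytes."""
    i = j = 0
    out = []
    for _ in range(n):
        P, i, j, byte = RC4(P, i, j)
        out.append(byte)
    return out

# Toy initial state: a shuffled permutation of 0..255 (illustration only).
P = list(range(256))
random.Random(1).shuffle(P)
stream = keystream(P, 16)
```

Each step only swaps two entries, so `P` remains a permutation of the same 256 values throughout.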
RC4 has extremely efficient software implementations and hence has been widely implemented. However, it has several issues with its security. In particular it was shown by Mantin[^7] and Shamir that the second output byte of RC4 is not random, even if the initialization vector was random. This and other issues led to a practical attack on the 802.11b WiFi protocol, see Section 9.9 in Boneh-Shoup. The initial response to those attacks was to suggest dropping the first 1024 bytes of the output, but by now the attacks have been sufficiently extended that RC4 is simply not considered a secure cipher anymore. The ciphers Salsa and ChaCha, designed by Dan Bernstein, have a similar design to RC4, and are considered secure and deployed in several standard protocols such as TLS, SSH and QUIC, see Section 3.6 in Boneh-Shoup.
We now show that, if we don't insist on constructivity of pseudorandom generators, then there exist pseudorandom generators whose output is exponentially larger than the input length.
There is some absolute constant
The proof uses an extremely useful technique known as the "probabilistic method" which is not too hard mathematically but can be confusing at first.^[There is a whole (highly recommended) book by Alon and Spencer devoted to this method.]
The idea is to give a "non constructive" proof of existence of the pseudorandom generator
The above discussion might be rather abstract at this point, but would become clearer after seeing the proof.
Let
Claim I: For every fixed NAND program / Boolean circuit
Before proving Claim I, let us see why it implies prgexist{.ref}.
We can identify a function
For every NAND program / Boolean circuit
To understand this proof it is crucial that you pause here and see how the definition of
Now, the number of programs of size
Hence, to conclude the proof of prgexist{.ref}, it suffices to prove Claim I.
Choosing a random
Footnotes

[^1]: Even lawyers grapple with this question, with a recent example being the debate of whether fantasy football is a game of chance or of skill.

[^2]: In fact such a function must exist in some sense since in the entire history of the world, presumably no sequence of $100$ fair coin tosses has ever repeated.

[^3]: Because we use a small input to grow a large pseudorandom string, the input to a pseudorandom generator is often known as its seed.

[^4]: A ring is a set of elements where addition and multiplication are defined and obey the natural rules of associativity and commutativity (though without necessarily having a multiplicative inverse for every element). For every integer $q$ we define $\Z_q$ (known as the ring of integers modulo $q$) to be the set $\{0,\ldots,q-1\}$ where addition and multiplication are done modulo $q$.

[^5]: That number is obtained by applying an algorithm of Hans Peter Luhn which applies a simple map to each digit of the card and then sums them up modulo 10.

[^6]: Actually modern computers will be able to break this generator via brute force, but if the length and number of the constants were doubled (or perhaps quadrupled) this should be sufficiently secure, though longer to write down.

[^7]: I typically do not include references in these lecture notes, and leave them to the texts, but I make here an exception because Itsik Mantin was a close friend of mine in grad school.