This post is related to the ongoing Polymath 4 project [Blog, Wiki] which looks at the problem of deterministically generating primes of a given size. Many interesting approaches have been discussed. One that I found particularly interesting was Albert Asterias‘s derandomization approach based on Nisan-Wigderson Pseudorandom Generators. I will not explain Asterias’s idea here, but it should be clear how it inspired this post.
As the title suggests, this post is not about the problem of designing pseudorandom generators for the purpose of finding primes. This post is about the completely opposite question:
How hard is it for a pseudorandom generator to avoid producing primes?
I will show that if a pseudorandom generator fails to generate primes in reasonable quantity then this must be partly explained by some specific types of irregularities in the distribution of primes. It is not at all surprising that such irregularities prevent some pseudorandom generators from generating primes. However, it is not immediately clear that all failures lead to such irregularities. A heuristic analysis of these irregularities suggests that it is quite difficult for a reasonable pseudorandom generator to avoid producing primes, but I also argue that these irregularities cannot explain all ways that a pseudorandom generator can fail to produce primes.
The irregularities in question are of two specific types. Let be the binary representation of some
-bit number
For
I define
and
as follows:
is the (possibly negative) excess of
-bit primes whose lower order bits are
versus those
-bit primes whose lower order bits are
is the (possibly negative) excess of
-bit primes whose higher order bits are
versus those
-bit primes whose higher order bits are
Thus measures the difference in the number of
-bit primes in two adjacent intervals of length
and dually
measures the difference in the number of
-bit primes in two “adjacent” arithmetic progressions with step
Since this is unusual in certain contexts, let me warn you that the sign of the difference is important.
Irregularity Lemma. Let be a random variable that takes values in the set
of
-bit integers. Suppose that
is rather bad at generating primes, more precisely suppose that
for some where
is uniformly distributed in
Then
must hold for some
must hold for some
Before I turn to the proof of the Irregularity Lemma, I’ll discuss what the conclusion of the lemma says.
Understanding the Irregularities
The condition is perhaps most striking. The arithmetic progression
contains only one prime, whereas the arithmetic progression
contains
primes. Therefore,
So only when
is even more often than it is odd, which is clearly a bad idea for generating primes. (Though the special case when
is instructive.)
The condition is very different. When
is even, then
and so the even contribution to the expectation is negligible. When
is odd, then
is the excess of the number of
-bit primes that are
versus the number of
-bit primes that are
or vice versa. For large enough
we expect to have roughly the same number of primes in these two congruence classes. Indeed, the Siegel-Walfisz Theorem guarantees that
which means that this type of irregularity cannot possibly occur for large enough
In fact, assuming the Generalized Riemann Hypothesis, we have
for
and so the condition
cannot hold for
When is close enough to
the expectation
can be large enough. However, this situation can still have a very unusual feel to it. The extreme case
is enlightening. Let
be the random variable that differs from
only in the most significant bit (i.e.,
). Then
So a large means that while
is not often prime, the closely related random variable
is prime more often. In general,
for large
means that while
may not be prime, it is close to many primes in the dyadic metric.
The situation with is similar, but has a slightly different feel. Again, when
is close to
the inequality
is unlikely since we expect two intervals of length
to contain roughly the same number of primes. Indeed, assuming the Riemann Hypothesis, we have the bound
which means that
cannot hold for
When is small enough, the expectation
can be large enough. Again, this situation can have a very unusual feel. For example, a large value of
means that
is often prime. In general,
for small
means that while
may not be prime, it is close to many primes (in the usual metric).
When neither the inequality
nor the inequality
is very surprising. However, it is unusual for both inequalities to hold simultaneously. Indeed, arithmetic progressions with step
and intervals of length
are somewhat orthogonal to each other so it would seem that the two types of irregularities would tend to interfere and cancel each other out. Such a
should take a somewhat sparse set of values. This is not very restrictive, so this is likely where most irregularities can be found.
Seeing that many of these irregularities are either impossible, unlikely, or very visible, it is tempting to conclude that any reasonable looking pseudorandom generator will inevitably generate a decent amount of primes. This is overly optimistic as the irregularities are relatively blind to some more global properties that can prevent from generating primes. For example, if
is such that
(In other words, has equally many
bits in even position as it has
bits in odd positions.) Then
is always a multiple of
However, it is not at all clear how to explain this global property in terms of irregularities as described above.
Proof of the Irregularity Lemma
I will only prove the first conclusion that
for some The proof of the second conclusion is symmetric.
Let’s write
where are (usually not independent nor uniform) binary random variables. Similarly, let’s write
where are independent uniform binary random variables.
Consider the hybrid random variables
for (so
and
). Since
there must be an such that
Note that equals
where is the random variable
(So differs from
only in the
-th bit.) Since
is independent of
we have
Thus,
and, from we obtain
On the other hand,
So QED