Commit 8ff3d50e authored by Raymond Hettinger's avatar Raymond Hettinger

SF patch 658251: Install a C implementation of the Mersenne Twister as the

core generator for random.py.
parent 224ee46b
......@@ -8,30 +8,26 @@
This module implements pseudo-random number generators for various
distributions.
For integers, uniform selection from a range.
For sequences, uniform selection of a random element, and a function to
generate a random permutation of a list in-place.
For sequences, uniform selection of a random element, a function to
generate a random permutation of a list in-place, and a function for
random sampling without replacement.
On the real line, there are functions to compute uniform, normal (Gaussian),
lognormal, negative exponential, gamma, and beta distributions.
For generating distribution of angles, the circular uniform and
von Mises distributions are available.
For generating distributions of angles, the von Mises distribution
is available.
Almost all module functions depend on the basic function
\function{random()}, which generates a random float uniformly in
the semi-open range [0.0, 1.0). Python uses the standard Wichmann-Hill
generator, combining three pure multiplicative congruential
generators of modulus 30269, 30307 and 30323. Its period (how many
numbers it generates before repeating the sequence exactly) is
6,953,607,871,644. While of much higher quality than the \function{rand()}
function supplied by most C libraries, the theoretical properties
are much the same as for a single linear congruential generator of
large modulus. It is not suitable for all purposes, and is completely
unsuitable for cryptographic purposes.
The functions in this module are not threadsafe: if you want to call these
functions from multiple threads, you should explicitly serialize the calls.
Else, because no critical sections are implemented internally, calls
from different threads may see the same return values.
the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as
the core generator. It produces 53-bit precision floats and has a
period of 2**19937-1. The underlying implementation in C
is both fast and threadsafe. The Mersenne Twister is one of the most
extensively tested random number generators in existence. However, being
completely deterministic, it is not suitable for all purposes, and is
completely unsuitable for cryptographic purposes.
The functions supplied by this module are actually bound methods of a
hidden instance of the \class{random.Random} class. You can
......@@ -39,58 +35,19 @@ instantiate your own instances of \class{Random} to get generators
that don't share state. This is especially useful for multi-threaded
programs, creating a different instance of \class{Random} for each
thread, and using the \method{jumpahead()} method to ensure that the
generated sequences seen by each thread don't overlap (see example
below).
generated sequences seen by each thread don't overlap.
Class \class{Random} can also be subclassed if you want to use a
different basic generator of your own devising: in that case, override
the \method{random()}, \method{seed()}, \method{getstate()},
\method{setstate()} and \method{jumpahead()} methods.
Here's one way to create threadsafe distinct and non-overlapping generators:
\begin{verbatim}
def create_generators(num, delta, firstseed=None):
"""Return list of num distinct generators.
Each generator has its own unique segment of delta elements
from Random.random()'s full period.
Seed the first generator with optional arg firstseed (default
is None, to seed from current time).
"""
from random import Random
g = Random(firstseed)
result = [g]
for i in range(num - 1):
laststate = g.getstate()
g = Random()
g.setstate(laststate)
g.jumpahead(delta)
result.append(g)
return result
gens = create_generators(10, 1000000)
\end{verbatim}
That creates 10 distinct generators, which can be passed out to 10
distinct threads. The generators don't share state so can be called
safely in parallel. So long as no thread calls its \code{g.random()}
more than a million times (the second argument to
\function{create_generators()}, the sequences seen by each thread will
not overlap. The period of the underlying Wichmann-Hill generator
limits how far this technique can be pushed.
Just for fun, note that since we know the period, \method{jumpahead()}
can also be used to ``move backward in time:''
\begin{verbatim}
>>> g = Random(42) # arbitrary
>>> g.random()
0.25420336316883324
>>> g.jumpahead(6953607871644L - 1) # move *back* one
>>> g.random()
0.25420336316883324
\end{verbatim}
As an example of subclassing, the \module{random} module provides
the \class{WichmannHill} class which implements an alternative generator
in pure Python. The class provides a backward compatible way to
reproduce results from earlier versions of Python which used the
Wichmann-Hill algorithm as the core generator.
\versionchanged[Substituted MersenneTwister for Wichmann-Hill]{2.3}
Bookkeeping functions:
......@@ -104,18 +61,6 @@ Bookkeeping functions:
If \var{x} is not \code{None} or an int or long,
\code{hash(\var{x})} is used instead.
If \var{x} is an int or long, \var{x} is used directly.
Distinct values between 0 and 27814431486575L inclusive are guaranteed
to yield distinct internal states (this guarantee is specific to the
default Wichmann-Hill generator, and may not apply to subclasses
supplying their own basic generator).
\end{funcdesc}
\begin{funcdesc}{whseed}{\optional{x}}
This is obsolete, supplied for bit-level compatibility with versions
of Python prior to 2.1.
See \function{seed} for details. \function{whseed} does not guarantee
that distinct integer arguments yield distinct internal states, and can
yield no more than about 2**24 distinct internal states in all.
\end{funcdesc}
\begin{funcdesc}{getstate}{}
......@@ -134,17 +79,20 @@ Bookkeeping functions:
\end{funcdesc}
\begin{funcdesc}{jumpahead}{n}
Change the internal state to what it would be if \function{random()}
were called \var{n} times, but do so quickly. \var{n} is a
non-negative integer. This is most useful in multi-threaded
Change the internal state to one different from and likely far away from
the current state. \var{n} is a non-negative integer which is used to
scramble the current state vector. This is most useful in multi-threaded
programs, in conjuction with multiple instances of the \class{Random}
class: \method{setstate()} or \method{seed()} can be used to force
all instances into the same internal state, and then
\method{jumpahead()} can be used to force the instances' states as
far apart as you like (up to the period of the generator).
class: \method{setstate()} or \method{seed()} can be used to force all
instances into the same internal state, and then \method{jumpahead()}
can be used to force the instances' states far apart.
\versionadded{2.1}
\versionchanged[Instead of jumping to a specific state, \var{n} steps
ahead, \method{jumpahead(\var{n})} jumps to another state likely to be
separated by many steps.]{2.3}
\end{funcdesc}
Functions for integers:
\begin{funcdesc}{randrange}{\optional{start,} stop\optional{, step}}
......@@ -279,8 +227,32 @@ these equations can be found in any statistics text.
\var{beta} is the shape parameter.
\end{funcdesc}
Alternative Generator
\begin{classdesc}{WichmannHill}{\optional{seed}}
Class that implements the Wichmann-Hill algorithm as the core generator.
Has all of the same methods as \class{Random} plus the \method{whseed}
method described below. Because this class is implemented in pure
Python, it is not threadsafe and may require locks between calls. The
period of the generator is 6,953,607,871,644 which is small enough to
require care that two independent random sequences do not overlap.
\end{classdesc}
\begin{funcdesc}{whseed}{\optional{x}}
This is obsolete, supplied for bit-level compatibility with versions
of Python prior to 2.1.
See \function{seed} for details. \function{whseed} does not guarantee
that distinct integer arguments yield distinct internal states, and can
yield no more than about 2**24 distinct internal states in all.
\end{funcdesc}
\begin{seealso}
\seetext{M. Matsumoto and T. Nishimura, ``Mersenne Twister: A
623-dimensionally equidistributed uniform pseudorandom
number generator'',
\citetitle{ACM Transactions on Modeling and Computer Simulation}
Vol. 8, No. 1, January pp.3-30 1998.}
\seetext{Wichmann, B. A. \& Hill, I. D., ``Algorithm AS 183:
An efficient and portable pseudo-random number generator'',
\citetitle{Applied Statistics} 31 (1982) 188-190.}
......
This diff is collapsed.
from test import test_support
#!/usr/bin/env python
import unittest
import random
import time
from test import test_support
class TestBasicOps(unittest.TestCase):
# Superclass with tests common to all generators.
# Subclasses must arrange for self.gen to retrieve the Random instance
# to be tested.
def randomlist(self, n):
"""Helper function to make a list of random numbers"""
return [self.gen.random() for i in xrange(n)]
def test_autoseed(self):
self.gen.seed()
state1 = self.gen.getstate()
time.sleep(1)
self.gen.seed() # diffent seeds at different times
state2 = self.gen.getstate()
self.assertNotEqual(state1, state2)
def test_saverestore(self):
N = 1000
self.gen.seed()
state = self.gen.getstate()
randseq = self.randomlist(N)
self.gen.setstate(state) # should regenerate the same sequence
self.assertEqual(randseq, self.randomlist(N))
def test_seedargs(self):
for arg in [None, 0, 0L, 1, 1L, -1, -1L, 10**20, -(10**20),
3.14, 1+2j, 'a', tuple('abc')]:
self.gen.seed(arg)
for arg in [range(3), dict(one=1)]:
self.assertRaises(TypeError, self.gen.seed, arg)
def test_jumpahead(self):
self.gen.seed()
state1 = self.gen.getstate()
self.gen.jumpahead(100)
state2 = self.gen.getstate() # s/b distinct from state1
self.assertNotEqual(state1, state2)
self.gen.jumpahead(100)
state3 = self.gen.getstate() # s/b distinct from state2
self.assertNotEqual(state2, state3)
self.assertRaises(TypeError, self.gen.jumpahead) # needs an arg
self.assertRaises(TypeError, self.gen.jumpahead, "ick") # wrong type
self.assertRaises(TypeError, self.gen.jumpahead, 2.3) # wrong type
self.assertRaises(TypeError, self.gen.jumpahead, 2, 3) # too many
def test_sample(self):
# For the entire allowable range of 0 <= k <= N, validate that
# the sample is of the correct length and contains only unique items
N = 100
population = xrange(N)
for k in xrange(N+1):
s = self.gen.sample(population, k)
self.assertEqual(len(s), k)
uniq = dict.fromkeys(s)
self.assertEqual(len(uniq), k)
self.failIf(None in uniq)
def test_gauss(self):
# Ensure that the seed() method initializes all the hidden state. In
# particular, through 2.2.1 it failed to reset a piece of state used
# by (and only by) the .gauss() method.
for seed in 1, 12, 123, 1234, 12345, 123456, 654321:
self.gen.seed(seed)
x1 = self.gen.random()
y1 = self.gen.gauss(0, 1)
self.gen.seed(seed)
x2 = self.gen.random()
y2 = self.gen.gauss(0, 1)
self.assertEqual(x1, x2)
self.assertEqual(y1, y2)
class WichmannHill_TestBasicOps(TestBasicOps):
gen = random.WichmannHill()
def test_strong_jumpahead(self):
# tests that jumpahead(n) semantics correspond to n calls to random()
N = 1000
s = self.gen.getstate()
self.gen.jumpahead(N)
r1 = self.gen.random()
# now do it the slow way
self.gen.setstate(s)
for i in xrange(N):
self.gen.random()
r2 = self.gen.random()
self.assertEqual(r1, r2)
def test_gauss_with_whseed(self):
# Ensure that the seed() method initializes all the hidden state. In
# particular, through 2.2.1 it failed to reset a piece of state used
# by (and only by) the .gauss() method.
for seed in 1, 12, 123, 1234, 12345, 123456, 654321:
self.gen.whseed(seed)
x1 = self.gen.random()
y1 = self.gen.gauss(0, 1)
self.gen.whseed(seed)
x2 = self.gen.random()
y2 = self.gen.gauss(0, 1)
self.assertEqual(x1, x2)
self.assertEqual(y1, y2)
class MersenneTwister_TestBasicOps(TestBasicOps):
gen = random.Random()
def test_referenceImplementation(self):
# Compare the python implementation with results from the original
# code. Create 2000 53-bit precision random floats. Compare only
# the last ten entries to show that the independent implementations
# are tracking. Here is the main() function needed to create the
# list of expected random numbers:
# void main(void){
# int i;
# unsigned long init[4]={61731, 24903, 614, 42143}, length=4;
# init_by_array(init, length);
# for (i=0; i<2000; i++) {
# printf("%.15f ", genrand_res53());
# if (i%5==4) printf("\n");
# }
# }
expected = [0.45839803073713259,
0.86057815201978782,
0.92848331726782152,
0.35932681119782461,
0.081823493762449573,
0.14332226470169329,
0.084297823823520024,
0.53814864671831453,
0.089215024911993401,
0.78486196105372907]
self.gen.seed(61731L + (24903L<<32) + (614L<<64) + (42143L<<96))
actual = self.randomlist(2000)[-10:]
for a, e in zip(actual, expected):
self.assertAlmostEqual(a,e,places=14)
def test_strong_reference_implementation(self):
# Like test_referenceImplementation, but checks for exact bit-level
# equality. This should pass on any box where C double contains
# at least 53 bits of precision (the underlying algorithm suffers
# no rounding errors -- all results are exact).
from math import ldexp
expected = [0x0eab3258d2231fL,
0x1b89db315277a5L,
0x1db622a5518016L,
0x0b7f9af0d575bfL,
0x029e4c4db82240L,
0x04961892f5d673L,
0x02b291598e4589L,
0x11388382c15694L,
0x02dad977c9e1feL,
0x191d96d4d334c6L]
self.gen.seed(61731L + (24903L<<32) + (614L<<64) + (42143L<<96))
actual = self.randomlist(2000)[-10:]
for a, e in zip(actual, expected):
self.assertEqual(long(ldexp(a, 53)), e)
def test_long_seed(self):
# This is most interesting to run in debug mode, just to make sure
# nothing blows up. Under the covers, a dynamically resized array
# is allocated, consuming space proportional to the number of bits
# in the seed. Unfortunately, that's a quadratic-time algorithm,
# so don't make this horribly big.
seed = (1L << (10000 * 8)) - 1 # about 10K bytes
self.gen.seed(seed)
# Ensure that the seed() method initializes all the hidden state. In
# particular, through 2.2.1 it failed to reset a piece of state used by
# (and only by) the .gauss() method.
class TestModule(unittest.TestCase):
def testMagicConstants(self):
self.assertAlmostEqual(random.NV_MAGICCONST, 1.71552776992141)
self.assertAlmostEqual(random.TWOPI, 6.28318530718)
self.assertAlmostEqual(random.LOG4, 1.38629436111989)
self.assertAlmostEqual(random.SG_MAGICCONST, 2.50407739677627)
for seed in 1, 12, 123, 1234, 12345, 123456, 654321:
for seeder in random.seed, random.whseed:
seeder(seed)
x1 = random.random()
y1 = random.gauss(0, 1)
def test__all__(self):
# tests validity but not completeness of the __all__ list
defined = dict.fromkeys(dir(random))
for entry in random.__all__:
self.failUnless(entry in defined)
seeder(seed)
x2 = random.random()
y2 = random.gauss(0, 1)
def test_main():
suite = unittest.TestSuite()
for testclass in (WichmannHill_TestBasicOps,
MersenneTwister_TestBasicOps,
TestModule):
suite.addTest(unittest.makeSuite(testclass))
test_support.run_suite(suite)
test_support.vereq(x1, x2)
test_support.vereq(y1, y2)
if __name__ == "__main__":
test_main()
......@@ -545,6 +545,25 @@ Library
and 'stop' arguments so long as each is in the range of Python's
bounded integers.
- Thanks to Raymond Hettinger, random.random() now uses a new core
generator. The Mersenne Twister algorithm is implemented in C,
threadsafe, faster than the previous generator, has an astronomically
large period (2**19937-1), creates random floats to full 53-bit
precision, and may be the most widely tested random number generator
in existence.
The random.jumpahead(n) method has different semantics for the new
generator. Instead of jumping n steps ahead, it uses n and the
existing state to create a new state. This means that jumpahead()
continues to support multi-threaded code needing generators of
non-overlapping sequences. However, it will break code which relies
on jumpahead moving a specific number of steps forward.
The attributes random.whseed and random.__whseed have no meaning for
the new generator. Code using these attributes should switch to a
new class, random.WichmannHill which is provided for backward
compatibility and to make an alternate generator available.
- New "algorithms" module: heapq, implements a heap queue. Thanks to
Kevin O'Connor for the code and François Pinard for an entertaining
write-up explaining the theory and practical uses of heaps.
......
This diff is collapsed.
......@@ -316,6 +316,8 @@ class PyBuildExt(build_ext):
libraries=math_libs) )
exts.append( Extension('datetime', ['datetimemodule.c'],
libraries=math_libs) )
# random number generator implemented in C
exts.append( Extension("_random", ["_randommodule.c"]) )
# operator.add() and similar goodies
exts.append( Extension('operator', ['operator.c']) )
# Python C API test module
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment