Why randomness? While much interesting work can be done with completely deterministic models, sooner or later you will want to simulate some real-life stochastic phenomenon which occurs in a manner resembling an identifiable statistical distribution, for example (to take the canonical simulation example) the time intervals between customers arriving to join a queue in front of a bank teller. Or perhaps you just want to add some controlled unpredictability to the behavior of your agents.
For these purposes, we need random generators.
A "random generator" is a device or algorithm that puts out "random" numbers. Without going into a long philosophical discussion about the true meaning of randomness, this means that even knowing the whole past output of the device or program does not help us predict the next number to come out.
A device that provides truly random numbers in this sense can be built; it would have to rely on natural processes, such as counting cosmic particles hitting the Earth or digitizing pictures of a lava lamp. Few computer applications actually make use of such truly random devices. Most current cryptography applications that need randomness rely on collecting various `random' (i.e. unpredictable to an outsider) data from your computer, such as the time between your keystrokes.
But this is not really what we want for simulation experiments, because we do want, on occasion, to rerun simulations exactly as they ran before, and that's not possible with a truly random generator. So what we use instead are `pseudo-random' generators, which are algorithms that provide streams of numbers with certain statistical properties. We can make them repeat their output streams by initializing them with a given starting `seed' (an integer value).
Such a generator provides output from a set of possible integers, e.g. all integers between 0 and 2^32-1. It delivers a stream of output which visits all possible integers before it starts repeating itself. The length of the cycle depends, among other things, on the number of bits used to hold the internal state of the generator (its `memory'). The larger the state, the longer the cycle possible. If the cycle is longer than the desired set of output integers, the generator may run through the set of such integers several different ways before exhausting its `meta-cycle'.
(It should be obvious that the output of such a generator is not truly random, since you can perfectly predict the next output given the state and the algorithm employed. A computer executing a pseudorandom algorithm is, after all, a perfectly deterministic little finite state machine!)
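To make the notion of a cycle concrete, here is a minimal sketch in plain C (which also compiles as Objective-C) of a toy generator with only 4 bits of state. The constants and the names `toy_next` and `toy_cycle_length` are illustrative assumptions and not part of this library; the point is only that a deterministic update rule over a finite state must eventually return to where it started.

```c
#include <assert.h>
#include <stdint.h>

/* Toy linear congruential generator with 4 bits of state:
   x' = (5*x + 3) mod 16. The constants are illustrative only. */
uint32_t toy_next(uint32_t x)
{
    return (5u * x + 3u) % 16u;
}

/* Count how many steps it takes to return to the starting state. */
unsigned toy_cycle_length(uint32_t start)
{
    assert(start < 16u);            /* the state must fit in 4 bits */
    uint32_t x = toy_next(start);
    unsigned steps = 1;
    while (x != start) {
        x = toy_next(x);
        steps++;
    }
    return steps;
}
```

With these constants every starting state lies on one cycle that visits all 16 values. Generators used in practice apply the same idea with a far larger state.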
Other things being equal, we'd like a generator that has a single cycle that is as long as possible, that runs fast and uses little memory. (These wishes are of course in conflict with each other!) And the generator should perform `acceptably' in a statistical sense. Readers who wish to pursue what this might mean are referred to the bibliography.
This library implements a large number of different algorithms: some old ones, included mostly for historical interest, and newer, better ones drawn from recent literature. These generators have been subjected to various statistical tests, and the results of these tests are described in the Advanced Usage Guide.
Every Swarm simulation has a predefined random generator of type MT19937 allocated (in random/random.m); its name is `randomGenerator'. This generator can be accessed from anywhere (any object or agent) in your simulation, since `randomGenerator' is a global variable.
You can draw random numbers from this generator to your heart's content and not worry about any statistical problems. The generator has a period close to 2^19937 (more than 10^6001), so there is no danger of running a simulation long enough for the generator to repeat itself. [At one microsecond per call, it would take more than 3 x 10^5987 years to exhaust this generator. For comparison, the age of the Universe is `only' about 1.4 x 10^10 years!]
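The orders of magnitude quoted above can be checked by hand. The sketch below assumes a year of about 3.16 x 10^7 seconds and, for the running time, rounds the period down to 10^6001, so the result is a lower bound.

```latex
\begin{align*}
2^{19937} &= 10^{\,19937 \log_{10} 2} \approx 10^{6001.6} \approx 4.3 \times 10^{6001} \\
\frac{10^{6001}\ \mu\text{s}}{10^{6}\ \mu\text{s/s}} &= 10^{5995}\ \text{s} \\
\frac{10^{5995}\ \text{s}}{3.16 \times 10^{7}\ \text{s/yr}} &\approx 3.2 \times 10^{5987}\ \text{yr}
\end{align*}
```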
You obtain `random' numbers of type unsigned integer this way:
    unsigned int myUnsigned;
    myUnsigned = [randomGenerator getUnsignedSample];
The values returned will be uniformly distributed in the range [0,4294967295] = [0,2^32-1].
Or, if you need floating-point values instead, you can say
    double myDouble;
    myDouble = [randomGenerator getDoubleSample];
The returned values will be uniformly distributed in the range [0.0,1.0), i.e. they may be equal to 0.0 but never 1.0.
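One common way to obtain a floating-point value in [0.0,1.0) from a 32-bit unsigned sample is to divide by 2^32. The helper below, `unit_from_u32`, is a hypothetical name used only for illustration. It is a sketch of the idea in plain C and not necessarily how the generators in this library compute getDoubleSample.

```c
#include <stdint.h>

/* Map a 32-bit unsigned sample to a double in [0.0, 1.0).
   Dividing by 2^32 (rather than 2^32-1) guarantees the result
   can be 0.0 but is never 1.0. */
double unit_from_u32(uint32_t u)
{
    return (double)u / 4294967296.0;
}
```

The smallest input, 0, maps to 0.0, and the largest input, 2^32-1, maps to a value just below 1.0.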
Whenever a random generator is created, its state has to be initialized. What the state is initialized to determines where in its cycle the generator will start. To make life easy for the user, the Swarm generators can be initialized to a predictable and repeatable state using a starting `seed', an integer between 1 and 2^32-1 (0 is not allowed). Every time you initialize a given generator with a particular seed, you should get the same sequence of numbers from it.
You initialize a generator with a specific seed this way:
    unsigned int randomSeed;
    randomSeed = 4532657;
    [randomGenerator setStateFromSeed: randomSeed];
You may do this any time during a simulation, not just at the start.
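The effect of seeding can be illustrated with a small stand-alone sketch in plain C. It uses the well-known Park-Miller `minimal standard' multiplicative generator (multiplier 16807, modulus 2^31-1) purely as an example. The type `pm_gen` and the functions `pm_seed` and `pm_next` are hypothetical names and are not part of this library's API.

```c
#include <assert.h>
#include <stdint.h>

/* State of a Park-Miller "minimal standard" generator (illustrative only). */
typedef struct { uint32_t state; } pm_gen;

/* Put the generator into a repeatable state. As in the text, 0 is not a
   valid seed: a multiplicative generator would stay at 0 forever. */
void pm_seed(pm_gen *g, uint32_t seed)
{
    assert(seed >= 1u && seed <= 2147483646u);
    g->state = seed;
}

/* Advance the state, x' = 16807 * x mod (2^31 - 1), and return it. */
uint32_t pm_next(pm_gen *g)
{
    g->state = (uint32_t)(((uint64_t)g->state * 16807u) % 2147483647u);
    return g->state;
}
```

Calling `pm_seed` with the same value at any point, even in the middle of a run, restarts the same stream of numbers.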
If you start your simulation in the normal way (with ./mysim or ./mysim -b), your generator will be started with the same default starting seed every time. This means it will produce the exact same sequence of numbers for each run, which makes replication easy.
However, if you want to run the same simulation several times but with different starting seeds, so that each run uses a different sequence of random numbers, this is easily accomplished by specifying -s or --varyseed on the command line.
Should you want to create your own random generators, e.g. to give each agent its own source of randomness, you do the following:
    #import <random.h>
    id <SimpleRandomGenerator> myGenerator;
    unsigned int mySeed;
Then, to create a generator and start it with a specific seed value:
    mySeed = 123776;
    myGenerator = [RWC8gen create: [self getZone] setStateFromSeed: mySeed];
Or, to create a generator and start it with the default system seed:
    myGenerator = [PSWBgen createWithDefaults: [self getZone]];
The default system seed is normally a specific, static value, which is the same for each run, unless you run with the --varyseed command line argument, in which case it is a different seed value for each run.
In either case, you can (re-)set the seed at any time during a run:
    mySeed = 345;
    [myGenerator setStateFromSeed: mySeed];
(All the generators except two conform to the `SimpleRandomGenerator' protocol. The two `split' generators that do not, C2LCGX and C4LCGX, are described in the Generator Usage Guide.)
Finally, you can of course create your new generator in whatever memory zone you choose, not just the creating object's own zone:
    myGenerator = [SWB3gen createWithDefaults: globalZone];
(`globalZone' is the only predefined zone; other zones you would need to create yourself. See the Defobj Library to read more about zones.)
For a more detailed description of the methods available from generator objects, see the Generator Usage Guide.
A `distribution' is an object which takes as its input a stream of (uniform) numbers from a random generator, and delivers as its output a stream of numbers that conforms to the desired statistical distribution. Most distribution objects tailor their output on the basis of parameters that you can set, for example Mean and Variance for the Normal distribution.
Each Swarm simulation comes with 3 pre-defined uniform distribution objects (defined in random/random.m):
    id <UniformIntegerDist> uniformIntRand;
    id <UniformUnsignedDist> uniformUnsRand;
    id <UniformDoubleDist> uniformDblRand;
These distribution objects have all been set to draw their random numbers from the predefined random generator `randomGenerator' (discussed above). These objects can be accessed from anywhere in your program.
You may draw (pseudo)random numbers uniformly distributed over a specified range this way:
    int myInteger, imin = -10, imax = 10;
    myInteger = [uniformIntRand getIntegerWithMin: imin withMax: imax];

    unsigned int myUnsigned, umin = 600, umax = 900;
    myUnsigned = [uniformUnsRand getUnsignedWithMin: umin withMax: umax];

    double myDouble, dmin = 0.375, dmax = 0.665;
    myDouble = [uniformDblRand getDoubleWithMin: dmin withMax: dmax];
Should the `min' value that you specify be greater than the `max' value, the distribution will switch them for you. If they are equal, then that value will be the result returned. (Note that if you only need uniform floating-point values between 0.0 and 1.0, you don't need a distribution -- any random generator will give you that. See above.)
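The range handling described above can be sketched in plain C as follows. This is an illustration under stated assumptions, not this library's implementation: `raw_next` is a stand-in source of 32-bit values (Marsaglia's xorshift32), and `uniform_int` is a hypothetical helper. It swaps the bounds when they are given in the wrong order, returns the common value when they are equal, and uses rejection to avoid the small bias that a plain modulo would introduce.

```c
#include <stdint.h>

/* Stand-in source of 32-bit values: Marsaglia's xorshift32.
   The state must be non-zero. */
uint32_t raw_next(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Return an integer uniformly distributed over [lo, hi], inclusive. */
int32_t uniform_int(uint32_t *state, int32_t lo, int32_t hi)
{
    if (lo > hi) {                         /* bounds reversed: swap them */
        int32_t t = lo; lo = hi; hi = t;
    }
    if (lo == hi)
        return lo;                         /* only one possible value */

    /* Size of the range minus one, computed without signed overflow. */
    uint32_t span = (uint32_t)hi - (uint32_t)lo;
    if (span == UINT32_MAX)
        return (int32_t)raw_next(state);   /* the full 32-bit range */

    uint32_t n = span + 1u;
    /* Largest raw value that still belongs to a complete block of n values. */
    uint32_t limit = UINT32_MAX - (UINT32_MAX % n + 1u) % n;
    uint32_t r;
    do {
        r = raw_next(state);               /* redraw values past the limit */
    } while (r > limit);
    return (int32_t)((uint32_t)lo + r % n);
}
```

A call such as `uniform_int(&state, 900, 600)` therefore behaves exactly like `uniform_int(&state, 600, 900)`.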
Each distribution object must be assigned a random generator to draw from when it is created. You may create a new generator for each distribution. Or, you may connect multiple distribution objects to one generator, so that they end up drawing output from the generator in an interleaved fashion. (This is what has been done with the predefined distributions.)
If you create distribution objects using the +createWithDefaults method (see the Distribution Usage Guide), each distribution object is assigned its own, newly created, private random generator. Each distribution class uses a different class of default random generator, just to keep things as statistically independent as possible.
HOWEVER: note that if you do not use the --varyseed command line switch, two different distribution objects of the same class, created with the +createWithDefaults method, will end up with generators of the same class that use the SAME starting seed, and so their output will be the exact same sequences. Their output will then be perfectly correlated, rather than statistically independent, which is what we normally want. Beware!
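The pitfall can be reproduced outside this library with a small sketch in plain C. Two generator states of the same kind (here Marsaglia's xorshift32, chosen only for illustration) are run side by side, first from the same seed and then from different seeds. The names `xs_next` and `count_matches` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One step of Marsaglia's xorshift32; the state must be non-zero. */
uint32_t xs_next(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Run two generators side by side for n draws and count how often
   they produce the same value at the same position. */
unsigned count_matches(uint32_t seed_a, uint32_t seed_b, unsigned n)
{
    assert(seed_a != 0u && seed_b != 0u);
    unsigned matches = 0;
    for (unsigned i = 0; i < n; i++) {
        if (xs_next(&seed_a) == xs_next(&seed_b))
            matches++;
    }
    return matches;
}
```

With equal seeds every one of the n draws coincides, which is the perfectly correlated case described above. With distinct seeds the two streams differ.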
To create your own distribution object, for example a Normal distribution, you would do this:
    #import <random.h>
    id <NormalDist> myNormalDist;
To create this distribution object and connect it to the predefined MT19937 generator:
    myNormalDist = [NormalDist create: [self getZone] setGenerator: randomGenerator];
Or, if you want it to have its own private generator:
    myNormalDist = [NormalDist createWithDefaults: [self getZone]];
Note that in this case, if you want to set the generator's seed you can do it as follows:
    [[myNormalDist getGenerator] setStateFromSeed: 9874321];
Each distribution has its own set of key parameters. You may deal with these parameters in three different ways:
You can assign a set of default parameter values to the object on creation, and draw from the distribution using those parameters. For example:
    #import <random.h>
    id <NormalDist> myNormalDist;
    double sample;

    // Attach to the predefined generator, then set default parameters.
    myNormalDist = [NormalDist create: [self getZone] setGenerator: randomGenerator];
    [myNormalDist setMean: 0.0 setVariance: 2.1];
    sample = [myNormalDist getDoubleSample];   // from N[0.0,2.1]
You may refrain from assigning default parameters, in which case you must specify the (possibly different) desired parameters on each call. For example:
    #import <random.h>
    id <NormalDist> myNormalDist;
    double sample;

    // No defaults are set, so each call carries its own parameters.
    myNormalDist = [NormalDist create: [self getZone] setGenerator: randomGenerator];
    sample = [myNormalDist getSampleWithMean: 0.0 withVariance: 1.3];
You can (re-)set the default parameters at any time, and you may call for a variate with specified parameter values even if different default parameters have been set. But note well: doing so does *not* reset the default parameters. Thus, if you set the default parameters and later make a call with explicit parameters, the next call without parameters again uses the defaults:
    [myNormalDist setMean: 0.0 setVariance: 2.1];
    sample = [myNormalDist getDoubleSample];                           // from N[0.0,2.1]
    sample = [myNormalDist getSampleWithMean: 1.0 withVariance: 3.6];  // from N[1.0,3.6]
    sample = [myNormalDist getDoubleSample];                           // from N[0.0,2.1] again
For a more detailed description of the methods available from distribution objects, see the Distribution Usage Guide.
DO NOT use generators with bad statistical properties. See the Advanced Usage Guide for a discussion of the generators implemented in this library.
DO NOT use generators whose maximum cycle length is too short for the intended application; you don't want your generators to start repeating themselves. Be especially aware of this if you use the PMMLCGgen class of generator; these have good properties but a fairly short cycle. See the Advanced Usage Guide to read more about how to select a generator.
AVOID having generators in your simulation run in `lock-step', producing output that is statistically correlated. This may happen if you have several generators of the same class, all started with the same default seed.
Be aware that even the best generators can have unexpected correlations with particular implementations of some models. As a result, in some cases using a "better" random number generator can result in worse (less correct) model behavior than one could obtain when using a "bad" generator. If you suspect your model may have this kind of problem, you probably should re-run some experiments using a different underlying generator, to make sure the results are (statistically at least) the same. (For examples of this, see the references [Ferrenberg et al 1992] and [Nature 1994].)
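One way to act on this last piece of advice is to compute the same statistic under two unrelated generators and compare the results. The sketch below, in plain C, estimates the mean of uniform draws once with a Park-Miller generator and once with xorshift32. All names and the choice of generators are illustrative assumptions. In a real model the statistic would be a model output and the generators would be two different classes from this library.

```c
#include <assert.h>
#include <stdint.h>

/* Generator A: Park-Miller, x' = 16807 * x mod (2^31 - 1). */
double draw_a(uint32_t *s)
{
    *s = (uint32_t)(((uint64_t)*s * 16807u) % 2147483647u);
    return (double)*s / 2147483647.0;      /* strictly between 0 and 1 */
}

/* Generator B: Marsaglia's xorshift32. */
double draw_b(uint32_t *s)
{
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *s = x;
    return (double)x / 4294967296.0;       /* strictly between 0 and 1 */
}

/* The "experiment": average n draws from the given generator. */
double mean_of_draws(double (*draw)(uint32_t *), uint32_t seed, unsigned n)
{
    assert(seed != 0u && n > 0u);
    double sum = 0.0;
    for (unsigned i = 0; i < n; i++)
        sum += draw(&seed);
    return sum / n;
}

/* Do the two generators agree on the statistic, within a tolerance? */
int results_agree(uint32_t seed, unsigned n, double tol)
{
    double d = mean_of_draws(draw_a, seed, n) - mean_of_draws(draw_b, seed, n);
    if (d < 0.0)
        d = -d;
    return d <= tol;
}
```

If `results_agree` returns 0 for a tolerance that the sample size should comfortably satisfy, that is a hint that the result may depend on the generator and deserves a closer look.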