How random is Rnd

  • Thread starter Thread starter Joseph Atie
  • Start date Start date
J

Joseph Atie

Ive got a piece of code that is supposed to generate a random 9 digit number.
the problem comes when i take the output from this function and use it to
generate some sql to insert records to a mysql db

the 9 digit code is supposed to be my primary key in the mysql db.
unfortunate im getting duplications on my primary key.

ive tried to increase the how random the number is by mutlipling 3
iterations of the rnd fuction (it is randomize each time)

but no matter what i try to make the number more random i still end up with
say 5 duplications over 1000 records. at the duplications

here is the code im using

Public Function RandomNumbers(Lowest As Long, Highest As Long, _
Optional Decimals As Integer)
If IsMissing(Decimals) Or Decimals = 0 Then
Randomize
factor = Rnd
Randomize
factor = factor * Rnd
Randomize
factor = factor * Rnd
RandomNumbers = Int((Highest + 1 - Lowest) * factor + Lowest)
Else
Randomize
factor = Rnd
Randomize
factor = factor * Rnd
Randomize
factor = factor * Rnd
RandomNumbers = Round((Highest - Lowest) * factor + Lowest, Decimals)
Randomize
RandomNumbers = RandomNumbers * Rnd + Lowest
End If
End Function

Any thoughts?
 
Joseph said:
Ive got a piece of code that is supposed to generate a random 9 digit number.
the problem comes when i take the output from this function and use it to
generate some sql to insert records to a mysql db

the 9 digit code is supposed to be my primary key in the mysql db.
unfortunate im getting duplications on my primary key.

ive tried to increase the how random the number is by mutlipling 3
iterations of the rnd fuction (it is randomize each time)

but no matter what i try to make the number more random i still end up with
say 5 duplications over 1000 records. at the duplications

<snip>

Any thoughts?

I'd say it's working as expected. The thing is that Rnd() give you
random results. It does not guarantee that the result won't be unused.
Else, it wouldn't be much of a random generator, no?

The solution I've seen implemented is to create table containing the
full range of possible combinations... in your case, 9 digits numbers,
then do something like this:

SELECT id INTO @newSeed
FROM theSeed
WHERE unUsed = True
ORDER BY Rand()
LIMIT 1;

UPDATE theSeed
SET unUsed = False
WHERE id = @newSeed;

You can wrap this in a stored procedure and do it as a single
transaction (if you're using InnoDB) and call the procedure from Access
to get the new seed.

HTH.
 
i like that

i think ill just use a query to check the value is unique before i write it
to the table, else run again, that will fix my problem nice and easy.

thanks alot

that being said the math says the chance of getting the same number from a 9
digit code in the limited space of 1000 records is very slim, getting 5 or 6
duplicates should be astronomical.
 
Joseph said:
i like that

i think ill just use a query to check the value is unique before i write it
to the table, else run again, that will fix my problem nice and easy.

thanks alot

You're welcome. It's just easier to guarantee that you're working with a
pool of available seeds than selecting one blindly then check if the
seed is available after the fact, not to mention much more efficient.
that being said the math says the chance of getting the same number from a 9
digit code in the limited space of 1000 records is very slim, getting 5 or 6
duplicates should be astronomical.

Well, the trouble with any RNGs is that they're not really random. I
honestly don't know how well designed Rnd() in Access or Rand() in MySQL
is, but I've heard someone recommending against Rnd() because they
weren't quite random enough.

When you think about it, we're trying to get a random number from a
deterministic machine... a feat in itself! :) The RNGs rely on various
mathematical functions that can only approximate random generation.

Whether getting 5-6 out of 1000 often enough is a flaw/bug/limitation, I
do not know.

HTH.
 
You should NOT multiply the random numbers together. Doing so, you highly
bias the result toward 0. Indeed, x*x*x when 0<= x < 1 is probably a
number closer to 0 than 1. As example, the ''mean" could be roughly
approximated as being 0.5 for each 'draw', so if you multiply three draws,
the mean approximatively becomes 0.5 * 0.5 * 0.5 = 0.125, in other words,
roughly half your numbers would fall between 0 and 1/8.


I would also randomize just once.

If you highest-lowest is low over the number of draws, it is 'normal' to get
duplicated values. As example, if you draw 100 times, over values from 1 to
6, you are due to get a lot of dups. The problem is more about the
restricted interval, than about rnd itself.



Vanderghast, Access MVP
 
I'm not familiar with MySQL,

But you should be able to set a Unique index on that field, and then just
trap for the error which will occur if you try to write a duplicate. If the
error occurs, just generate a new value.

I believe this would be quicker than running a query to determine if the
number already exists, but would try it to see which is quicker.
 
Having heard someone saying something is a very weak technical point... and
how, exactly, the claim can be dismissed, or corroborated? on what
technicality? on what aspect? what was trying to accomplish the one who said
that and in what is it similar to what you want to accomplish? In what is it
helping anyone?


Vanderghast, Access MVP
 
vanderghast said:
Having heard someone saying something is a very weak technical point...
and how, exactly, the claim can be dismissed, or corroborated? on what
technicality? on what aspect? what was trying to accomplish the one who
said that and in what is it similar to what you want to accomplish? In
what is it helping anyone?

First, I'm glad that you pointed out about multiplying rnd() biasing the
results. I didn't know that and am glad to know it now.

As for the (unsubstantiated!) claims I made, I suppose that was one time
where it might was better to stay silent. If I had the link to the post
where I saw, I would have had linked it. I'm willing to be shown wrong,
but I've always understood that the bias of RNGs is a general problem;
we can make it better but never as good than if we used specialized
hardware to accept input from outside environment as a seed. In case of
Access's Rnd(), I understand that the default seed is timestamp and does
decently but it will show a bias not because Rnd() is flawed but because
of the machine's nature. If I'm wrong, I'd love to learn more.

My intention was merely to illustrate the general problem of coaxing a
non-deterministic result from a deterministic machine, and while I
probably didn't make it quite clear, the bias will become more evident
as the pool of available candidates become smaller which was why I
believed it made more sense to choose from the available pool, rather
from the all possibilities then testing after the fact to see if it's
not used yet.

Hope it helps clears up my intention for the meandering thought I left
behind.
 
Thanks to supply precisions.

It is true that supplying a pure random number is not trivial (if we add the
constraint that the distribution should be uniform, ie, any number has the
same probability to occur), but the linear congruential method pass many
statistical tests ( See "The art of computer programming", Vol. 2, by Donald
Knuth, at Addison Wesley, Chap. 3 ), while it may also fails specific
requirements such as those required by a Monte Carlo simulation (still
accordingly to Knuth, it is way over my head, or at least, over my
interest).


There is a somehow small interesting theorem that pops out of reading that
section a new time. Assuming we use:


X[n+1] <- ( a * X[n] + c ) modulo m


then, if "m" is a multiple of 2 , then taking "a" so that a modulo 8
= 5
or if "m" is a multiple of 10, then taking "a" so that a modulo 200 =
21

and c = 1 or "a", or not have a common factor with m,

then "the random number generator will produce all m different possible
values of X before it starts to repeat" ( ref already given, section 3.6
ii) )


So, that is a technique to produce "m" numbers without repetition.




Vanderghast, Access MVP
 
vanderghast said:
Thanks to supply precisions.

It is true that supplying a pure random number is not trivial (if we add
the constraint that the distribution should be uniform, ie, any number
has the same probability to occur), but the linear congruential method
pass many statistical tests ( See "The art of computer programming",
Vol. 2, by Donald Knuth, at Addison Wesley, Chap. 3 ), while it may also
fails specific requirements such as those required by a Monte Carlo
simulation (still accordingly to Knuth, it is way over my head, or at
least, over my interest).

Thank you very much for the reference. I've heard the name before.
Probably should try and see if I can find his book somewhere. It also
sounds like I may have had overstated the bias of RNGs. A while back, I
came across a blurb (and we know that everything on internet is true ;)
) that most RNGs are not in fact really random- they're pseudo-random at
best, leaving me with impression that they can produce seemingly random
results, they fall short of the stated goals.

Anyway, do we know whether Access's Rnd() uses linear congruential method?
So, that is a technique to produce "m" numbers without repetition.

Fascinating, indeed! I suppose that considering the OP was using MySQL
server, it should be easy to put this into a stored procedure and get
the appropriate number. The only trouble, though is that if there's
already a pool of used keys, this may not work and in such cases, I'd
just create a table of "available seeds" and randomly choose between
those instead of trying to devise an algorithm that will reliably choose
unused key out of a full range of possibility where some keys within the
range may have been used.
 
I cannot say that Access Rnd uses congruential method, but I am almost sure
it uses a method common to all VBA-flavors out there, including VB6, and
that Microsoft developers know about Donald Knuth's books (it is an 'old'
book... 1969 for the first edition ). Note that the congruential method *is*
a pseudo-random generator, but one which is relatively quite strong.


Vanderghast, Access MVP
 
Randomicity does not imply uniqueness. In true randomicity it would be
perfectly acceptable to generate the same number three times in a row.
Highly unlikely but, perfectly acceptable.

There was a great discussion on the subject on thedailywtf.com.

Anyway, as has been mentioned elswhere in this thread, multiplying
your Rnd()'s actually decreases your randomicity. It looks like you
hit on the right thing to do: generate your number then check the
table to see if it has already been used. If it has, generate a new
number and check again.

Regards,
RD
 
Back
Top