mmmPI wrote: Thu May 22, 2025 12:50 pm
If I understand properly, you're talking only about the last step in the logic, the one where a very large number on the order of magnitude of the seed / 32 bits is turned into one in the range that the user actually wants as the end result, by doing a %N operation on it, if 0 to N is the desired range for the RNG in the user's mind?
This would be the same for both posted setups I think, because that's also the math I used to cap the randomly generated number to a range. That's why I'm curious and don't want to miss a nuance. I don't understand when using %8 or %10 would yield a result that is "not correct" when using %2 would. Could you pick a few numbers to illustrate the modulo bias (for dummies)?
Yes, this is what I meant, and it seems like you understand that part pretty well from your own research now. Usually when generating random numbers on a computer we do think about it as two separate steps like this: getting "some randomness" from somewhere (e.g. our 32-bit output from the LCG), and then using that randomness to pick a specific value from the distribution that we actually want.
The second step isn't usually considered part of the random number generator itself, because the same algorithms can be used with any source of randomness (as you mention here, we can do the same thing with the LFSR or the LCG, it doesn't matter). This step is "just math", but if done incorrectly it can introduce new biases that weren't present in the original randomness, like the modulo bias.
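To make the two steps concrete, here's a minimal Python sketch. The multiplier and increment are just the well-known "Numerical Recipes" constants, picked for illustration - not necessarily what the combinator build in this thread uses:

# Step 1: "some randomness" - a 32-bit LCG (example constants from
# Numerical Recipes, used here purely as an illustration).
def lcg_next(state):
    return (1664525 * state + 1013904223) % 2**32

# Step 2: "just math" - squash the 32-bit value into the range the user wants.
# This is the step where modulo bias can sneak in (see below).
def pick_in_range(random_32bit, n):
    return random_32bit % n

state = 12345                     # the seed
state = lcg_next(state)           # fresh 32-bit randomness
digit = pick_in_range(state, 10)  # a value 0-9, but very slightly biased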
%8 is fine though here, because 8 is a power of two: it's 2^3, so it divides 2^32 evenly and each of the 8 possible results corresponds to exactly the same number of 32-bit inputs. But, yeah, if we want a value between 0 and 9, just using %10 is going to be slightly more likely to generate 0-5 and slightly less likely to generate 6-9, because 2^32 isn't a multiple of 10.
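To pick a few numbers like you asked: the effect is easiest to see with a tiny generator. Pretend we only had a 4-bit value (0-15) instead of 32 bits, then count what %10 does to it - this little Python snippet just does the counting:

from collections import Counter

# Map every possible 4-bit input (0-15) through %10 and count the results.
counts = Counter(x % 10 for x in range(16))
print(counts)
# Counter({0: 2, 1: 2, 2: 2, 3: 2, 4: 2, 5: 2, 6: 1, 7: 1, 8: 1, 9: 1})
# 0-5 each get two inputs, 6-9 only get one: that's the modulo bias.
# With a real 32-bit value the skew from %10 is far smaller, but it's still
# there, because 2^32 is not a multiple of 10. %8 has no skew at all.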
We can get a uniformly distributed random value between 0 and 9 by first taking the value %16, which *does* give us a uniform random distribution between 0 and 15 since 16 is a power of two, and then, if the result is 10-15, throwing it away and starting over. This will usually only take a couple of tries, but it can take longer if we get unlucky, and a variable-time algorithm like this is quite inconvenient to build with Factorio combinators, so unless we *really* care about the result being perfectly uniform we probably wouldn't bother.
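If you wanted to see that "throw it away and start over" idea (rejection sampling) written out, here's a sketch reusing the toy lcg_next() from the earlier snippet - again just an illustration, not how you'd wire it in combinators:

# Keep drawing fresh 32-bit values until the low result lands in 0-9.
def uniform_0_to_9(state):
    while True:
        state = lcg_next(state)
        candidate = state % 16   # uniform over 0-15, since 16 = 2^4
        if candidate <= 9:       # keep 0-9 ...
            return candidate, state
        # ... and reject 10-15, drawing another value instead

value, state = uniform_0_to_9(12345)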
That may be "beyond my current mathematical understanding", but it seems that just testing "frequencies of occurrences" is not the same as "uniformly distributed", in that if you just do x+1 => x, it would "pass" the test and show every digit being chosen the same amount of times on average, like a "fair dice", but it would be "bad randomness" because of the strong correlation between a number and the next one? Was that what I'd missed earlier?
You are correct that just testing how often each value occurs doesn't tell you that it is really random, but if the values *don't* occur equally often then it's definitely *not* a uniform random distribution - having them occur equally often is necessary but not sufficient. Correlation between the values in a supposedly-random sequence, as you say, is another important property, and this is where things get extremely complicated: there's a pretty-much infinite number of possible ways that the values might be correlated with each other, and therefore also a pretty-much infinite number of tests we could try.
Your example here suggests one simple test: we could take the *difference* between consecutive values in the sequence and see if the differences look uniformly distributed. The "x+1 => x" generator would fail this test very quickly, since the difference would always be 1 - we wouldn't need much data to conclude that there's a correlation. But other generators might pass this test and still fail other, more complex tests. Which tests are most useful is a whole complex area of math that's beyond my expertise - I'm just a computer scientist, not a mathematician.
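For what it's worth, that difference test is easy to sketch in a few lines of Python (again reusing the toy lcg_next() from above; the sample size of 1000 is arbitrary):

from collections import Counter

# Count the differences (mod n) between consecutive outputs of a generator.
def difference_counts(step, seed, samples, n=10):
    diffs = Counter()
    state = seed
    prev = None
    for _ in range(samples):
        state = step(state)
        value = state % n
        if prev is not None:
            diffs[(value - prev) % n] += 1
        prev = value
    return diffs

# The "x+1 => x" generator fails immediately: every difference is 1.
print(difference_counts(lambda x: x + 1, seed=0, samples=1000))
# The toy LCG spreads its differences out much more evenly.
print(difference_counts(lcg_next, seed=12345, samples=1000))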
There's also a separate property here that's important for security/cryptography/other cases where you have an adversary, though, which is "given some numbers from the random sequence, how easy is it to predict future values in the sequence?". The LCG discussed here is terrible at this: if we know the specific parameters being used (the multiplier and the increment) then all we need is any one value from the sequence, and we can predict *all* future values perfectly. And since the algorithm here is a common, standard one, and the parameters used are well-known ones, we can easily guess all of this: if we have two values from the sequence, we can very quickly try a bunch of different common PRNG algorithms/parameters and see if any of them do in fact turn the first value into the second value.
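A quick sketch of why that's so easy, using the same example constants as before (hypothetical values, just to show the idea):

A, C, M = 1664525, 1013904223, 2**32   # known multiplier, increment, modulus

def next_state(x):
    return (A * x + C) % M

observed = 87628868                  # any one raw 32-bit output we've seen
prediction = next_state(observed)    # the next output, known with certainty
prediction = next_state(prediction)  # and the one after that, and so on

# If we don't know A and C but have two consecutive raw outputs, we can loop
# over a short list of common (A, C, M) candidates and keep whichever one
# satisfies second == (A * first + C) % M - that identifies the generator.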