Avalanche analysis of the Arc4 CSPRNG

New Message Reply About this list Date view Thread view Subject view Author view

Bill Frantz (frantz@communities.com)
Mon, 02 Nov 1998 13:09:46 -0800


On October 2, 1998, I proposed using the Arc4 algorithm as a Continuously
Seeded PseudoRandom Number Generator (CSPRNG). The major insight in that
proposal was that if you did not initialize the permutation which is the
heart of the internal state of Arc4, and ran the key schedule again, both
the old state and the new seed would effect the subsequent state.

I later received an anonymous communication sent thru a well known
anonymous remailer suggesting that the effects of seeding Arc4 did not have
enough avalanche, and suggesting empirical testing. The suggested test
method was to check that the output of 2 Arc4s differed in 1/2 of their
bits when the seed differed by one bit. This post is the result of that
suggestion.

Since this is a long post, I will start with the conclusions:

(1) Anonymous is indeed correct that a single bit change in a 128 bit key
does not avalanche rapidly. The numbers don't stabilize until 256 bytes
have been withdrawn from the reseeded generator.

(2) There is an avalanche problem with the 256th byte to be withdrawn
because it is processed when the Arc4 I == 0. Its statistics are much
worse than the 255th or 257th bytes. This problem also affects the Arc4
cypher.

(3) Anonymous' suggestion of modifying the key schedule to run from the
current Arc4 I backwards and loop around to I+1 instead of running from 0
to 255 does not improve the avalanche enough to be a fix for the problem.

Roos' weak key attack on the Arc4 cypher effects the early outputs, so it
is important to analyse them separately from later bytes. Since the bulk
of the state is encoded in the permutation of the 256 possible bytes in the
S box, just checking whether 1/2 of the bits are different might not detect
a lack of avalanche.

The tests used random numbers obtained from java.security.SecureRandom
(Java 1.1.6). This generator is based on running SHA1 over a seed and a 64
bit counter. I believe it is sufficiently random for use in Monte Carlo
testing.

The inner loop of the test uses two instances of Arc4, which have been
seeded with a random seed that differs in one randomly selected bit. It
updates two arrays of counters. The index for these arrays is the output
order, with the first byte after reseed being index zero. One array keeps
track of the number of times the output from both instances was the same
(i.e. the Xor was zero). The other array keeps track of the number of bits
that were different.

The outer loop runs this test a number of times. Each time the test is
run, it uses one of the Arc4 instances in it's current state and clones it.
 The major testing parameters are the number of times the outer loop is
run, and the number of bytes extracted from each instance by the inner
loop. For most of the tests, the outer loop was run for 1,000,000 times,
and 1024 bytes were compared from each instance.

The output routines compare the observed number of equal outputs with the
expected number. If there is a greater than 10% discrepancy, the
observation is printed. In any case, the maximum and minimum values
observed are reported as a percentage of the expected value.

In addition, the number of differences in the output bits is compared with
the expected value and printed.

As a reality check on the code, a run was made where the seeds differed by
zero bits. The output had the expected values. Another run was made where
the seeds were two different random numbers. The output looked like:

    Separate random numbers for seedin, normal key schedule
    private static final int LOOP_COUNT = 1000000;
    private static final int BYTES_PER_TEST = 1024;

Making random number generator
Starting tests

expected number of zeroes=3906.25
   skipped 1024
MaxPercentOver=104.6272 MinPercentUnder=94.848
expected bits per nonzero byte=4.0156865 observed=+0.005341053,-0.00450325
totalzeroes=4000625 totalLeading=3963

When the test was run on the version of the Arc4 CSPRNG I described on
October 2, the output showed significant problems with avalanche. (N.B.
The zeroes reported in the output are when the Xor of the two outputs is
zero.)

 LOOP_COUNT = 1000000, BYTES_PER_TEST = 1024;

expected number of zeroes=3906.25
zeroesByOrder[0]=570085, 14594%
zeroesByOrder[1]=455614, 11663%
zeroesByOrder[2]=366720, 9388%
zeroesByOrder[3]=295586, 7567%
zeroesByOrder[4]=238699, 6110%
zeroesByOrder[5]=193725, 4959%
zeroesByOrder[6]=157646, 4035%
 the values continue descending (following are samples)
zeroesByOrder[33]=8940, 228%
zeroesByOrder[34]=9063, 232%

zeroesByOrder[62]=5997, 153%
zeroesByOrder[63]=5935, 151%
zeroesByOrder[64]=5917, 151%
zeroesByOrder[65]=5831, 149%

zeroesByOrder[126]=4321, 110%
zeroesByOrder[127]=4324, 110%
zeroesByOrder[128]=4349, 111%

zeroesByOrder[137]=4303, 110%
   skipped 117 - less than 10% difference from expected
zeroesByOrder[255]=113991, 2918%
   skipped 768 - less than 10% difference from expected

MaxPercentOver=14594.176 MinPercentUnder=95.5648
expected bits per nonzero byte=4.0156865 observed=+0.0037994385,-0.014258385
totalzeroes=7318254 totalLeading=1382696

Reading 256 bytes from each instance as part of the seeding process seems
to have cured most of the avalanche problems.

    LOOP_COUNT = 1000000, BYTES_PER_TEST = 1024;

expected number of zeroes=3906.25
   skipped 1024
MaxPercentOver=105.4976 MinPercentUnder=94.9504

expected bits per nonzero byte=4.0156865 observed=+0.0040750504,-0.004702568

totalzeroes=4010561 totalLeading=4064

Anonymous suggested running the seeding operation from the current value of
Arc4's I down to zero and then looping around back to I+1. I tried it and
it doesn't look much better than running it in the normal direction.

    One bit difference in seedin, key schedule descends from myArc4I
    private static final int LOOP_COUNT = 1000000;
    private static final int BYTES_PER_TEST = 1024;

Making random number generator
Starting tests

expected number of zeroes=3906.25
zeroesByOrder[0]=570204, 14597%
zeroesByOrder[1]=455621, 11663%
zeroesByOrder[2]=366186, 9374%
zeroesByOrder[3]=294396, 7536%
zeroesByOrder[4]=238275, 6099%
zeroesByOrder[5]=193729, 4959%
zeroesByOrder[6]=158027, 4045%

zeroesByOrder[32]=9512, 243%
zeroesByOrder[33]=9060, 231%

zeroesByOrder[62]=5826, 149%
zeroesByOrder[63]=5807, 148%
zeroesByOrder[64]=5875, 150%
zeroesByOrder[65]=5820, 148%

zeroesByOrder[126]=4460, 114%
zeroesByOrder[127]=4377, 112%
zeroesByOrder[128]=4309, 110%

zeroesByOrder[190]=4013, 102%
zeroesByOrder[191]=4077, 104%
   skipped 63
zeroesByOrder[255]=113857, 2914%
   skipped 768
MaxPercentOver=14597.223 MinPercentUnder=95.3856
expected bits per nonzero byte=4.0156865 observed=+0.0035147667,-0.007976055
totalzeroes=7314658 totalLeading=1381841

Future work:

Investigate whether using 20 byte seeds (e.g. from SHA1) makes any
difference in avalanche effect.

Investigate whether using Knuth's algorithm P instead of the Arc4 key
schedule makes any difference.

Investigate avalanche behavior with 256 bytes seeds. This size seed will
result in only one bit difference in the expanded seed used by the key
schedule.

Bill Frantz Electric Communities
Capability Security Guru 10101 De Anza Blvd.
frantz@communities.com Cupertino, CA 95014
408/342-9576 http://www.communities.com


New Message Reply About this list Date view Thread view Subject view Author view

 
All trademarks and copyrights are the property of their respective owners.

Other Directory Sites: SeekWonder | Directory Owners Forum

The following archive was created by hippie-mail 7.98617-22 on Sat Apr 10 1999 - 01:17:17