Eric Young (eay@cryptsoft.com)
Wed, 11 Mar 1998 23:03:55 +1000 (EST)
On Wed, 11 Mar 1998, Svend Olaf Mikkelsen wrote:
> On Tue, 10 Mar 1998 21:48:06 -0800, Alex Alten <Andrade@ix.netcom.com>
> wrote:
> >Interesting.  A while back I looked at the DES key setup costs.  Roughly 80%
> >of the time was spent in key setup as I changed the key for each small payload
> >(about 100 bytes).
> For DES, key setup time can be traded for memory. My guess would be
> that key setup using 32K tables can be as fast as an encryption.
I'll give numbers for my library.  I've compiled everything using gcc under
Linux on a Pentium Pro 200.  The key setup is always in C; the cipher speed is
the time to encrypt 8 bytes in CBC mode.  These numbers are only approximate,
but they give a good indication.
              set_key         C         asm        cost
DES           4.830uS       2.658uS   1.709uS      14.5 bytes
3-DES        14.490uS       7.141uS   4.827uS      16.2 bytes
IDEA          1.507uS enc   2.930uS                 4.1 bytes
             72.787uS dec                         198.7 bytes
RC2          16.532uS       4.451uS                29.7 bytes
RC4          18.467uS       0.521uS   0.355uS     283.5 bytes
RC5-32-12    10.610uS       1.580uS   0.585uS      53.7 bytes
Blowfish    997.606uS       2.116uS   1.042uS    3771.6 bytes
CAST5         5.575uS       2.476uS   1.124uS      18.0 bytes
'cost' is the number of bytes that could have been encrypted during a key
setup.  So as you can see, a Blowfish key setup takes the same amount of time
as encrypting 3.7k bytes.  DES is actually very good for key setup cost.  IDEA
key setup is fast for encryption, but for decryption it is rather ugly.  A
half way value (100 bytes) is probably a reasonable value if there is a
bi-directional channel.
(I've compared C set_key code to C code CBC mode encryption).
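The 'cost' column can be rechecked from the other two: it appears to be the set_key time divided by the C time for one 8-byte block, times 8. Below is a minimal Python sketch of that arithmetic. The timings are copied from the table; the function name `setup_cost_bytes` is made up for illustration, and the IDEA decrypt row is assumed to use the same 2.930uS block time as the encrypt row.

```python
# Timings in microseconds, copied from the table above:
# (set_key time, C CBC time for one 8-byte block).
# Assumption: the IDEA decrypt row reuses the 2.930uS block time.
TIMINGS_US = {
    "DES": (4.830, 2.658),
    "3-DES": (14.490, 7.141),
    "IDEA enc": (1.507, 2.930),
    "IDEA dec": (72.787, 2.930),
    "RC2": (16.532, 4.451),
    "RC4": (18.467, 0.521),
    "RC5-32-12": (10.610, 1.580),
    "Blowfish": (997.606, 2.116),
    "CAST5": (5.575, 2.476),
}

BLOCK_BYTES = 8  # the cipher speed column is for encrypting 8 bytes


def setup_cost_bytes(set_key_us, block_us, block_bytes=BLOCK_BYTES):
    """Bytes that could have been encrypted in the time of one key setup."""
    return set_key_us / block_us * block_bytes


if __name__ == "__main__":
    for name, (set_key_us, block_us) in TIMINGS_US.items():
        cost = setup_cost_bytes(set_key_us, block_us)
        print(f"{name:10s} {cost:8.1f} bytes")
```

The results agree with the table to within rounding of the last digit.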
<gratuitous personal trumpet blowing mode on>
I put the most work into the DES key setup, which uses a 2k table.  When I
implemented DES (many many years ago) I was mostly playing catch up to
implementations like Richard Outerbridge's.  The key setup is where I feel I
contributed something new to the state of DES software.  For the other
ciphers, I put a reasonable amount of effort into optimising the key setup,
and for C code they are quite good.  Normally I put in different versions of
the same code, which helps different CPU/compiler combinations.  Sparcs like
*(a++), x86/gcc likes a[i], a[i+1], a[i+2], a[i+3], i+=4.  I have not done
this yet for RC5.
<gratuitous personal trumpet blowing mode off>
Ignoring aspects like security :-), DES and CAST are the best ciphers for lots
of small messages with different keys.  Most ciphers (except Blowfish
and IDEA decrypting) seem to have approximately the same key setup time; the
problem is that when the ciphers are really fast, the difference between setup
and encrypting is more obvious :-).
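To put rough numbers on the "lots of small messages with different keys" case, here is a Python sketch of the fraction of time spent in key setup when every message gets a fresh key. It assumes the total time is simply one set_key plus one 8-byte time per 8 bytes of payload (rounded up), ignoring padding rules and per-call overhead. The function name and the 100-byte payload (the figure from the quoted message) are only for illustration.

```python
import math

# (set_key time, C time per 8 bytes) in microseconds, from the table above.
# RC4 is a stream cipher; its figure is treated here as a cost per 8 bytes.
TIMINGS_US = {
    "DES": (4.830, 2.658),
    "RC4": (18.467, 0.521),
    "Blowfish": (997.606, 2.116),
    "CAST5": (5.575, 2.476),
}


def setup_fraction(set_key_us, block_us, payload_bytes, block_bytes=8):
    """Fraction of total time spent in key setup for one message."""
    blocks = math.ceil(payload_bytes / block_bytes)
    return set_key_us / (set_key_us + blocks * block_us)


if __name__ == "__main__":
    for name, (set_key_us, block_us) in TIMINGS_US.items():
        frac = setup_fraction(set_key_us, block_us, payload_bytes=100)
        print(f"{name:10s} {100 * frac:5.1f}% of the time in key setup")
```

Under these assumptions a 100-byte message spends roughly 12% of its time in key setup with DES and about 97% with Blowfish.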
For digests this is even more pronounced: digesting lots of small blocks is
very expensive vs a few large blocks.  There are a few more
optimisations I could do for small blocks (the following numbers are for x86
assembler on the above mentioned ppro 200).
The 'numbers' are in 1000s of bytes per second processed.  The 'bytes' is
the size of the block being repeatedly digested.
type               8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5               1926.01k    11470.49k    22459.73k    29396.99k    32549.55k
hmac(md5)          743.07k     5237.76k    14173.78k    24719.70k    31716.69k
sha1              1343.88k     6485.17k    11660.33k    14642.18k    15794.18k
rmd160            1044.77k     5140.44k     8980.74k    10987.86k    11840.17k
rc4              14856.79k    21498.45k    22432.68k    22529.02k    22629.03k
So for 8-byte blocks, the cost of doing a sha1 MAC is about 11 times the cost
of RC4 encrypting the data.  In SSLv3 terms (sha-1 digest 8 bytes, then RC4
encrypt the 28 bytes(?)), the digesting takes 7 times as long as the
encryption.  For TLS, it becomes almost twice as bad, even when using md5 for
the HMAC.
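The 8-byte ratios can be rechecked from the table with a few lines of Python. This is only a sketch: it takes 'k' to mean 1000 bytes per second as stated above, and the helper names `us_per_call` and `slowdown` are made up for illustration. It does not try to model the SSLv3 or TLS record MAC itself.

```python
# Throughput in 1000s of bytes per second, copied from the table above.
SIZES = (8, 64, 256, 1024, 8192)
RATES_K = {
    "md5": (1926.01, 11470.49, 22459.73, 29396.99, 32549.55),
    "hmac(md5)": (743.07, 5237.76, 14173.78, 24719.70, 31716.69),
    "sha1": (1343.88, 6485.17, 11660.33, 14642.18, 15794.18),
    "rmd160": (1044.77, 5140.44, 8980.74, 10987.86, 11840.17),
    "rc4": (14856.79, 21498.45, 22432.68, 22529.02, 22629.03),
}


def us_per_call(name, size):
    """Microseconds to process one block of `size` bytes."""
    rate_bytes_per_s = RATES_K[name][SIZES.index(size)] * 1000.0
    return size / rate_bytes_per_s * 1e6


def slowdown(slow, fast, size):
    """How many times longer `slow` takes than `fast` on `size` bytes."""
    return us_per_call(slow, size) / us_per_call(fast, size)


if __name__ == "__main__":
    print(f"sha1 vs rc4, 8 bytes:      {slowdown('sha1', 'rc4', 8):.1f}x")
    print(f"hmac(md5) vs rc4, 8 bytes: {slowdown('hmac(md5)', 'rc4', 8):.1f}x")
    for name, rates in RATES_K.items():
        # Per-byte cost at 8 bytes relative to 8192 bytes.
        print(f"{name:10s} small-block penalty: {rates[-1] / rates[0]:.1f}x")
```

On these figures sha1 comes out about 11 times slower than RC4 on 8-byte blocks, and hmac(md5) about 20 times slower, which is in line with the "almost twice as bad" remark.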
anyway, enough numbers for today :-)
eric