comparing CPUs running blowfish [Re: fastest blowfish.asm?]

Adam Back (aba@dcs.ex.ac.uk)
Wed, 25 Mar 1998 01:57:40 GMT


This could be read as a story about the vagaries of attempting to tune
assembler language on the various pentiums and clones...

What follows are timings of an AMD K6 MMX, an AMD K5 and an Intel MMX,
all clocked at 166 MHz.

The programs being tested are Eric's blowfish C code in libbf-0.7.2m,
the C code in libbf-0.8.2b, the 586 tuned assembler code in
libbf-0.8.2b, and the 686 (pentium pro) assembler code in libbf-0.8.2b
(tuned? or was it not tuned but just happened to be faster?).
(Code available from [1], compilation script [2].)

Using raw ECB bytes/sec as the performance metric:

                  AMD k6  AMD k5   Intel  Intel    Pentium  Pentium
                  MMX     non-MMX  MMX    non-MMX  pro      II
======================================================================
bfs-072m-gcc      5.8     4.9      4.6
bfs-082b-gcc      3.2     3.3      2.9
bfs-082b-586-asm  5.9     5.8      8.3
bfs-082b-686-asm  4.8     5.9      6.8
======================================================================

Perhaps someone with the appropriate hardware could fill in the empty
columns, clocking their CPU to 166 (or even just adjusting the results;
they adjust fairly linearly with clock speed as it's, I think, all in
primary cache).
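
As a rough sketch of the adjustment meant here, assuming throughput
really does scale linearly with clock speed (which would only hold while
everything stays in primary cache), a figure measured at another clock
rate could be rescaled to 166 MHz as below. The function name
`normalize_to_166` is made up for illustration, and an `awk` is assumed
to be on the path.

```shell
# normalize_to_166 RATE MHZ
# Rescale a throughput figure RATE, measured on a CPU clocked at MHZ,
# to what it would be at 166 MHz under the linear-scaling assumption.
# (Hypothetical helper, not part of libbf.)
normalize_to_166() {
    awk -v rate="$1" -v mhz="$2" \
        'BEGIN { printf "%.1f\n", rate * 166 / mhz }'
}
```

For example, `normalize_to_166 10.0 200` prints 8.3, i.e. a result of
10.0 at 200 MHz would be entered in the table as 8.3.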

The Intel MMX really wins with Eric's VTuned code as compared to the
other CPUs. Interestingly the AMD k6 seems to do generally better on
the C code version. I am left wondering how an AMD k6 would perform
if there were a hand-tuned version targeted at it. (I guess VTune
doesn't know about AMD k6-specific coding tricks.)

None of the CPUs seem to like the gcc (C only) code in libbf-0.8.2b
(at least under linux with plain gcc-2.7.2). The libbf-0.7.2m C code
is performing much faster (roughly 1.5 to 1.8 times on these CPUs),
unless I am doing something dumb, or there is a bug in the timing code
for one of the versions.

The k5 and k6 are different architectures, quite apart from the k6
having MMX instructions and the k5 not, which would explain them being
faster on different tests. It is perhaps similar to comparing a Pentium
Pro with a Pentium: I think the Pentium Pro may perform worse than a
Pentium on pentium-specific code.

Adam

[1] ftp://ftp.psy.uq.oz.au/pub/Crypto/libeay

[2] Untar libbf-0.7.2m inside a directory of that name (it splurges
into the current dir), and put libbf-0.8.2b in a directory called
libbf-0.8.2b (it extracts into `bf', so that directory needs renaming).

Then, to reduce the risk of different compilation flags across the
different makefiles, cut and paste this lot:

mkdir test
cd libbf-0.7.2m
gcc -O3 -fomit-frame-pointer -DBF_PTR2 -m486 -DCPU=586 -c *.c
gcc -o bfspeed bf_cbc.o bf_cfb64.o bf_ecb.o bf_enc.o bf_ofb64.o bf_skey.o bfspeed.o
cp bfspeed ../test/bfs-072m-gcc
cd ..
cd libbf-0.8.2b
gcc -O3 -fomit-frame-pointer -DBF_PTR2 -m486 -DCPU=586 -c *.c
gcc -o bfspeed bf_cfb64.o bf_ecb.o bf_enc.o bf_ofb64.o bf_skey.o bfspeed.o
cp bfspeed ../test/bfs-082b-gcc
cd asm
perl bf-586.pl cpp > bx86unix.cpp
cd ..
gcc -E -DELF asm/bx86unix.cpp | as -o asm/bx86-elf.o
gcc -o bfspeed bf_cfb64.o bf_ecb.o asm/bx86-elf.o bf_ofb64.o bf_skey.o bfspeed.o
cp bfspeed ../test/bfs-082b-586-asm
cd asm
perl bf-686.pl cpp > bx86unix.cpp
cd ..
gcc -E -DELF asm/bx86unix.cpp | as -o asm/bx86-elf.o
gcc -o bfspeed bf_cfb64.o bf_ecb.o asm/bx86-elf.o bf_ofb64.o bf_skey.o bfspeed.o
cp bfspeed ../test/bfs-082b-686-asm
cd ../test
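
Once the script above has finished, the four binaries sit in `test`.
A minimal sketch for running them one after another follows. It assumes
each bfspeed build prints its own timing results when run with no
arguments (not checked here), and `run_all` is a hypothetical helper
name, not something shipped with libbf.

```shell
# run_all [DIR]
# Run every executable named bfs-* in DIR (default: current directory),
# printing a header line with the binary's name before its output.
run_all() {
    dir="${1:-.}"
    for bin in "$dir"/bfs-*; do
        # Skip anything that is not an executable file, including the
        # unexpanded pattern when nothing matches.
        [ -f "$bin" ] && [ -x "$bin" ] || continue
        echo "== $(basename "$bin")"
        "$bin"
    done
}
```

From inside `test`, `run_all .` would then print each binary's results
under its name, which makes it easier to copy them into the table.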



The following archive was created by hippie-mail 7.98617-22 on Fri Aug 21 1998 - 17:16:13 ADT