Re: Optimizing for speed

New Message Reply About this list Date view Thread view Subject view Author view

Marcus Watts (mdw@umich.edu)
Tue, 17 Feb 98 20:09:20 -0500


You wrote:
> Date: Tue, 17 Feb 1998 15:23:19 -0800
> From: Chuck McManis <cmcmanis@freegate.com>
> To: CodherPlunks@toad.com
> Subject: Optimizing for speed
>
> So I've got some algorithms that I'm trying to squeeze more speed
> out of, and while I'm not quite as fanatical as the DES search
> team I do find myself at a loss for tools. My environment is
> VC++5 on a Win95 platform and what I'd like is something like :
>
> long long t1, t2;
> t1 = snap_time(); /* time in nSecs */
> ... do something ...
> t2 = snap_time();
> printf("This run took %ld nS\n", (long)(t2 - t1));
>
> I've yet to find something in my search through various
> manuals...
>
> --Chuck
>

You (probably) won't find the exact function you want, because
what you're asking for is very machine specific, and also because
the exact thing you asked for isn't real useful. There are a *lot* of
nano-seconds in a second, which means the function you want won't
work for anything that takes more than 2 seconds. What some
systems (for instance, AIX) supply is a function, gettimer, that
can be used to obtain a structure that can (in theory) supply
timer resolution in nano-seconds. (In practice, the resolution
depends on the hardware, and clock_getres acn be used to
determine the actual resolution).

*HOWEVER*, there is a second problem -- in many cases, the hardware
simply doesn't permit that much resolution. 1 ns works out to
1000 Mhz, faster than the clock that drives most current computers,
and faster than even most TTL logic. It is more common to find
timers that have, at best, microsecond resolution, and it may
be "complicated" even to get that much resolution. For instance,
the 8253 programmable interval timer used in the IBM AT (and still
present, in some form or another, on most modern pentium systems)
increments at 1.19Mhz, for a resolution of only about 838 ns.
It can also be "expensive" to make use of this much resolution.
For instance, just reading the timer may require wait states (it's
nmos logic, not ttl) and since the timer is byte addressable,
there's math involved to convert that into a useful number.
In fact, so far as I can tell, freebsd doesn't bother to do
any of this, but increments the realtime clock in units
of 10 milliseconds.

Even if the hardware permits that much resolution, and the OS &
library can deliver you the numbers, it may still not be useful.
At the nano-second level, a lot of things are happening in the
machine, that can have a drammatic effect on timing. For instance,
any interrupt processed by the machine is likely to take many
microseconds to process. Some interrupts (clock) will be periodic,
other interrupts (page faults, system calls) will be synchronous with
respect to system activity, yet other interrupts (disk, keyboard,
ethernet) will be asynchronous. There are other factors as well,
which are even harder to take into account, such as cache hits,
instruction pre-fetching, MMU overhead, and other factors, which
make timing down at the nano-second level pretty dicey.

Many Unix systems support "gettimeofday" and "getrusage" which
can be used to get the real time clock, or the CPU clock for
a particular process, in units of microseconds. Again, the actual
resolution is likely to be more on the order of 10 ms.

Instead of doing this kind of timing, it's more common to
run the thing in question many times, then divide the accumulated
time by the number of runs, to get an average resulting performance.
This isn't exactly realistic, because it won't account for
page faults and cache misses, but it should give a good
"best" case number for how fast the thing is.

Now, having said all that, I have to admit I don't know much about the
facilities win95 provides. If you were using a Unix machine, I'd point
you at http://www.cyberspace.org/~mdw/mdwbench.html but you aren't, so
that won't be too helpful. One facility win95 almost certainly
provides is "unsigned long time(unsigned long *)", which should return
the realtime # of seconds (nominally since 1 Jan 1970, but windows
almost certainly uses a different epoch) [time will both return it, and
store it in the indicated pointer if it's not null. Think of it as
an opportunity for creative expression.] A really simple solution
would be to call time, run your function 1000000000 times, then call
time again - thus:
        unsigned long st, en, i;
        st = time(0);
        for (i = 1000000000; --i >= 0; )
                do it
        en = time(0);
        printf ("It took %d ns on avg\n", en-st);
this will return elapsed time, not cpu time. So, make sure you don't
have any background processes eating CPU when you run this. Also,
run it several times, to be sure you get numbers that are close to each
other. You may want to adjust the counter & math -- this function will
take about 4 hrs to run if "it" takes only 20 microseconds.
If you can schedule a timer interrupt and do the math yourself,
it may be easier to time unknown functions.

I saw a nifty book at Borders on Win32 "threads" programming.
I didn't bother to get it, because I don't do much windows programming,
but I think this might contain much useful information for you.

                                -Marcus Watts
                                UM ITD PD&D Umich Systems Group


New Message Reply About this list Date view Thread view Subject view Author view

 
All trademarks and copyrights are the property of their respective owners.

Other Directory Sites: SeekWonder | Directory Owners Forum

The following archive was created by hippie-mail 7.98617-22 on Fri Aug 21 1998 - 17:14:55 ADT