| January 28, 2001
v3.03+ results latest: 28 January 2001 (9) |
A thought-provoking last few days...
One of the greatest concerns with
benchmarking is relevance: does
the result allow comparison of the
relative performance between
systems? Hence the eventual change
to a 0.417 angle range WU just
before Christmas. For those who
don't know, the unit was actually
submitted and tested by Beyond
from a large pool he accumulated
specifically for the purpose. He
contributed greatly
to the discussion on
whether to change the WU and it
seemed a small honour to allow him
to choose it (with a few criteria
that I felt were warranted for
inclusion in a benchmark WU).
Though it is possible to construct
arguments over its absolute
validity (I'd say nit-pick), I
think it was a good choice and the
uptake of benchmarking with it has
been above my expectations - I
wrongly believed that the extra
time involved (a combination of the
greater processing incurred by
v3.03 and having to use a slower
WU - lower angle ranges process
significantly slower than higher
ones) would deter a lot of you.
A little benchmark history…
The next
greatest concern was
security/falsification of results.
In my time with RB
compiling the benchmarks, there
were many weird and wonderful
results submitted. But, you may be
surprised to learn, none that
could be labelled
'malicious' or intentionally
trying to misrepresent the speed a
system processed WU's at. Almost
all 'dodgy' submissions were
correctible errors that a
conversation with the owner sorted
out. I felt quite confident that
the numbers put up on the results
tables were to the best of our
abilities accurate and honest. One
up for the SETI/TLC
community I felt. I think it is
safe to say that by
far the majority of TLC
members put processing accuracy
before processing speed.
Even those who appear to
froth eagerly when trying to
achieve a few seconds gain would
be quite upset if they discovered
their systems were returning
corrupt data. They would alter it
to a more stable, SETI-useful
configuration (we are
talking reducing the clock speed
mainly but there are other
factors) if they knew there was a
problem.
As processing time depended almost
exclusively on the angle range of
a WU there was some merit in the
idea of allowing people to submit
any time as long as it utilised
a 0.417 WU. This was vetoed as no
easy checks could be made on a
submission's accuracy. With the
introduction of the new WU
benchmark it was possible to
tighten up on the security of the
numbers you submitted by asking
for the
result.sah
(a file generated by the SETI
client containing everything
that Berkeley
wanted to know about you and the
data you had extracted from the
WU). At the time of asking I
didn't appreciate how important
this would become, initially only
thinking of it as a useful
double-check. I could scan the
header info for the cpu time, OS
and client version and even
whether the right WU had been used
to run the benchmark. Even better,
you could examine the
spike/pulse/triplet details for
anomalies.
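As a rough illustration of that kind of header double-check, here is a minimal sketch. It does not reproduce the real result.sah layout: the `key=value` header lines, the field names (`version`, `wu_name`, `cpu`, `os`) and every value in the sample are assumptions made up for the example.

```python
def parse_header(text):
    """Collect key=value lines into a dict (assumed layout, not the real format)."""
    header = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip any line that is not a key=value pair
            header[key.strip()] = value.strip()
    return header


def check_header(header, expected_version, expected_wu):
    """Return a list of problems found; an empty list means the header looks fine."""
    problems = []
    if header.get("version") != expected_version:
        problems.append("wrong client version: %r" % header.get("version"))
    if header.get("wu_name") != expected_wu:
        problems.append("wrong work unit: %r" % header.get("wu_name"))
    try:
        if float(header.get("cpu", "")) <= 0:
            problems.append("cpu time is not positive")
    except ValueError:
        problems.append("cpu time missing or unreadable")
    return problems


# Hypothetical sample header; none of these values come from a real file.
sample = "version=3.03\nwu_name=bench_0417\ncpu=18300.5\nos=Windows 98\n"
print(check_header(parse_header(sample), "3.03", "bench_0417"))
```

A check like this only catches the obvious slips (wrong client, wrong WU); it says nothing about whether the signal data itself is correct.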
Unsettling news...
The 3.03
benchmarks started to trickle in
and a few results were obviously
in question because they contained
extra spikes or the values were
ridiculous. About this time along
came Roelof
and the fun started - not
content with my slow old ways he
cobbled together some code to
check the result.sah far
more thoroughly than a mere
eyeballing could achieve. Boy! Did
the results ever give us a shock.
Many of them produced on
highly-overclocked processors
contained errors. If you overclock
there’s a good chance you fall
into this category. Many of you OC
to the point where it locks and
then reduce it a few MHz believing
it to be now ‘stable’. The
debate about whether overclocking,
with all its ramifications, was
acceptable in a scientific
enterprise suddenly loomed up. It
seemed that although machines
completed the benchmark in
seemingly reasonable times the
results showed that errors aplenty
had been generated in the result.sah
file! So actually reaching 100%
completion of a WU is not a satisfactory
measure of your system's
reliability. You cannot be sure
that your machine is producing
kosher results just because it
completes WU's at a close to
average time! Just because you can
play games, burn CD's and run the SETI
client concurrently does
not give any guarantee your system
is error free.
Criteria for accepting bench results.
There have been several 'amusing'
reports on alt.sci.seti and
the TLC
forum of people who have had whole
strings (hundreds even) of 2 or 3
minute completions, great for
stats but an obvious anomaly and a
clear indication that some
component in your box was
F.U.B.A.R. But now we know that
entirely acceptable looking
systems are also capable of
creating bad results. Just
altering the overclock by a few
MHz can push your system into
uncharted, error-producing
territory and you will never know.
As far as the results table was
concerned we both agreed that if
the result.sah contained
any errors it would not be
included as a genuine bench. This
obviously set a clear new standard
for benchmarking. Accurate results
became the instant, absolute priority. This new
knowledge is a bit disturbing as
it makes me wonder fundamentally
about the TLC
benchmarks for earlier client
versions. Good faith
notwithstanding, many could be
erroneous.
History, I guess.
Problems in paradise...
Ignoring the small matter of TLC
benchmarks, there are some serious
implications here... Berkeley
collects all your returned sahs
and compares them for each WU to
cross-check and thereby
authenticate the data (multiple
duplication being an intentional
and necessary statistical
validation) - as soon as two
results (or three or whatever the
requirement is) for a WU are
identical then that becomes the
‘result’ and anything
different can be discarded as in
error. They have a massive
database of results all referenced
by a user id number. A little
analysis would immediately show up
which users were submitting duff
results and which were reliable.
Hard to believe that they haven't
already done data sifting along
these lines! How much do
they know about corrupt results?
And have they decided to let
people keep on downloading WU's to
avoid bad publicity about the link
between 'competition' (driven in
part by overclockers) and wasted
effort? Very recently the sad,
disturbing events concerning the
use of the ‘gti’ hacked client
have surfaced. Its
initial use by a small caucus of
people has led to a number of
names high up the SETI
top users page being deleted
or modified to ‘waiting’ (6th,
9th, 11th, 16th and others at time
of writing). I can only assume
that the Berkeley
crowd are furiously sifting
through the many hundreds of
thousands (perhaps topping a
million) of results they submitted
to discover the extent of the
damage. Remember that if the only
results returned for a WU came
from the hacked client then in
effect the WU was not processed at
all. A monster cross-referencing
job is being done right now to
locate unprocessed
or invalidated WU’s.
Since a significant percentage of
WU’s legitimately do not return
any data in the result.sah
in the first place, the hacked
client's blank results cannot be told
apart from genuine ones and are as bad
as false data. A superficially
embarrassing and unpleasant prank
has actually fundamentally
compromised the SETI
project's results.
The whole distributed arm of the
project has become a multi-headed
monster that though not beyond
control is certainly not firmly on
the rails. It will be sorted out
but the damage is done.
What this means to us…
Are your overclocked results of
any value at all? Are your
normally clocked results of any
value? Only one way to find out at
present... we now have a reliable
test of stability: run the
benchmark and let Roelof
check it with his
latest software. The numbers
involved in the result.sah
data have lots of decimal places.
If you produce identical results
you can be confident that for SETI
purposes your
under/normal/over-clocked system
is running clean, smooth and
producing valid data. Roelof
will reply to your
result submission with a short
email letting you know whether
your benchmark was flawed or okay.
Eventually the acceptable ones
will be added to those already in
the table. Even if you were not
thinking of submitting a benchmark
it now becomes the best tool for
validating your system for SETI
work.
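To show why "identical" is such a strict test, here is a small sketch of comparing a submitted result against a known-good reference. It is not Roelof's software; the line prefixes, the field names and the sample values are assumptions invented for the example. Header lines such as cpu time are skipped because they legitimately differ between machines.

```python
SIGNAL_PREFIXES = ("spike", "pulse", "triplet")  # assumed line prefixes


def signal_lines(text):
    """Keep only the lines describing detected signals, whitespace-normalised."""
    return [" ".join(line.split()) for line in text.splitlines()
            if line.strip().lower().startswith(SIGNAL_PREFIXES)]


def compare_to_reference(candidate, reference):
    """Return (missing, extra): signal lines absent from, or added to, the candidate."""
    cand = signal_lines(candidate)
    ref = signal_lines(reference)
    missing = [line for line in ref if line not in cand]
    extra = [line for line in cand if line not in ref]
    return missing, extra


# Hypothetical file contents; the numbers are made up.
reference = ("cpu=18300.5\n"
             "spike: power=24.118073 bin=1201\n"
             "pulse: power=3.402751 period=0.0123\n")
flawed = ("cpu=17950.2\n"
          "spike: power=24.118074 bin=1201\n"
          "pulse: power=3.402751 period=0.0123\n"
          "spike: power=99.000000 bin=7\n")

missing, extra = compare_to_reference(flawed, reference)
print("okay" if not (missing or extra) else "flawed")
```

With values compared digit for digit, a single wrong last decimal place or one extra spike is enough to mark the benchmark as flawed.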
Final thoughts
The
average number of resendings of WU’s
is probably far higher than
expected – due to corrupt result.sahs
it might take several users to
process a WU before two results
were the same. Is there a list of
users whose results are ignored
routinely due to regular
'unrepeatable' returns? A
significant number of results come
from people like us who pride
ourselves on running kit that is
fast but stable (because we think
we know how to do it properly) -
are we deluding ourselves? Are
many of our overclocked boxes
producing junk?
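A back-of-envelope sketch of the resending question, under loudly stated assumptions: each returned result is corrupt independently with probability `p_corrupt`, a corrupt result never matches any other result, and a WU is settled once two good results are in. None of the rates below are measured figures. Under that model the average number of returns needed is quorum / (1 - p_corrupt), and a small simulation agrees.

```python
import random


def expected_returns(p_corrupt, quorum=2):
    """Average returns needed to collect `quorum` good results (simple model)."""
    return quorum / (1.0 - p_corrupt)


def simulate(p_corrupt, quorum=2, trials=100_000, seed=1):
    """Estimate the same average by brute force, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        good = sent = 0
        while good < quorum:
            sent += 1
            if rng.random() >= p_corrupt:  # this return was a good one
                good += 1
        total += sent
    return total / trials


# Hypothetical corruption rates, chosen only to show the trend.
for p in (0.0, 0.1, 0.2, 0.5):
    print(p, expected_returns(p), round(simulate(p), 2))
```

If one return in five were corrupt, each WU would need 2.5 returns on average instead of 2, which is a quarter more crunching for the same science.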
(This was written
at leisure several days before the
‘hacked’ client surfaced and
has been hastily edited by
Roelof and myself in light of that
fact.)
Max out.
| January 25, 2001
v3.03+ results latest: 24 January 2001 (8) |
Short, sweet & fast (sounds like my life)...
An updated results page from Roelof
is up and we have a confirmed
sub-5 kill from Tim
Cole at a monster 4:39.
This is the fastest reliably
verified benchmark so far. I will
scribble more on that subject
(verification) soon. Also new to
the table are ten more results
including Duron, PII Xeon, PowerPC
G3 and Celeron silicon.
Max out.
| January 23, 2001
v3.03+ results latest: 22 January 2001 (7) |
Ladies and Gentlemen, the Analyst has entered the building...
As you might have
noticed the results table has been
revised extensively by Roelof
(TLC Benchmark Analyst) and
includes a number of extra columns
which should give the
data-devourers and
comparison-kiddies amongst you a
small thrill. For details about CpF
(cycles per flop) you'd be best
off at the SETI
Spy site which has quite
(understatement) a detailed
explanation. Suffice to say it
gives a reasonable estimate of the
relative efficiencies of
processors.
On the benchmarking scene a number
of things stand out from
submissions so far. Looking at the
systems appearing it seems that
everyone and their 'pets'
are running monster hardware. A
longer time to complete a
benchmark WU does not seem to have
put people off as much as I had
expected! One exception to this is
older slower systems (Pentium IIs
and K6-2s for instance). There is
also a dearth of Celerons and
Durons. So if you have a little
time and want a small line in the TLC
table, crunch the benchmark
and submit.
Though the top of the table is
occupied by Gibbo205
at 5:05, the 5:09 result (RogerW)
immediately
underneath gives an indication of
how well Alpha processors can
compete while running at only half the
clock speed of the competition. Not that
I think you are going to find too
many salespeople quoting SETI
benchmarks to prospective business
clients (though admins. might have
it in the back of their minds)! A
happy thought.
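For a rough feel of what CpF says about the Alpha result, here is a sketch that assumes CpF is simply clock cycles spent divided by floating point operations in the WU. SETI Spy's own method may well differ. The flop count and both clock speeds are placeholders, not figures from the table; only the 5:05 and 5:09 times (hours:minutes) come from the text above.

```python
def parse_hmm(text):
    """Convert an 'h:mm' benchmark time such as '5:05' into seconds."""
    hours, minutes = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60


def cycles_per_flop(clock_mhz, cpu_seconds, wu_flops):
    """Assumed definition: total clock cycles spent / flops in the work unit."""
    return clock_mhz * 1e6 * cpu_seconds / wu_flops


ASSUMED_WU_FLOPS = 3.0e12   # placeholder, NOT the real count for the benchmark WU
FAST_CLOCK_MHZ = 1200       # hypothetical clock for the 5:05 system
HALF_CLOCK_MHZ = 600        # hypothetical "half speed" clock for the 5:09 system

fast = cycles_per_flop(FAST_CLOCK_MHZ, parse_hmm("5:05"), ASSUMED_WU_FLOPS)
half = cycles_per_flop(HALF_CLOCK_MHZ, parse_hmm("5:09"), ASSUMED_WU_FLOPS)

# The ratio does not depend on the assumed flop count, only on clock and time.
print("relative CpF (half-clock / fast):", round(half / fast, 3))
```

Whatever the true flop count is, a machine that nearly matches the time at half the clock needs roughly half the cycles per flop, which is why CpF works as a relative efficiency figure.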
At present breaking 6 hours is
impressive and everything under
6:20 has needed 1GHz or more
(excepting the raw power of the
Alpha 21264 of course). But
sub-fives can only be an update
away! Stay tuned.
As a matter of policy the
results table gets priority for
updating over my fulsome verbiage
so I'm including the 'results
latest...' on the date header for
your information - one of those
many small things that needed
implementing. Any technical
points, errors, corrections or
discrepancies results-wise to Roelof,
anything pleasant, helpful, funny
or thoughtful to me (Max),
anything else to Hanser.
Max out.
Hello goodbye, tears and cheers...
RB's
decision to go back to the real
world and family life is a sad
loss to TLC
in general and his mentorship of
my efforts will be very much
missed. He put considerable time,
effort and humour into the site
and helped me get up to speed long
ago when I first volunteered to
help out. So the wheel turns and
his resignation
brings to a close his excellent
contribution to TLC.
I wish luck and fun to you and your
family in whatever direction you
decide to head.
Of course you can never know when
something unexpected (that being
the definition of the word) is
going to pop up to brighten an
otherwise miserable day. Roelof
Engelbrecht contacted zAmboni
and myself shortly after RB
officially announced his return to
more important things: children,
partners, work etc. Roelof
has volunteered to help on the
benchmarking page and I don't
think I can imagine anyone better
to bring some light and
understanding to this corner of
the web. Initially he is going to
oversee benchmarking results
collation but we shall see what
the future brings by way of his
input. For those of you not
familiar with Roelof's
contribution to SETIdom he is the
author of SETI
Spy, speedy squasher of bugs
and keen respondent on
alt.sci.seti. He is always ready
to supply factual advice to forum
devotees, being generally
knowledgeable on SETI
implementation on a wide range of
hardware. If you have ever emailed
him, you already know that his
almost legendary support speed for
SETI
Spy is justified. His
appearance here brings wisdom and
enlightenment in abundance. [Good
enough Roelof or do you need
more?]
Minor but important note to
benchmarkers: the result.sah
is a vital piece of authentication
to include with your submission.
Almost everyone has been gracious
enough to include this small file
and as such I am now making it
mandatory for inclusion of your
results in the table.
Max out.
Backroom action only...
I've put more systems
up on the results page - don't you
just love those sexy olive hues,
I'm an Autumn person myself. Also
there are a couple of minor extras
to the 'benchmark file' and
'submit' pages...so things are
buzzing along in fits and starts
and I'll try and keep the
benchmarks table a little more up
to date. If you really like
hunting for changes take a look
around but there's nothing that
will explode your footwear
(probably a good job). For H.Oda
fans there's been a new WCPUid
out for a few days (thanks Roelof,
I should check more often).
Max out.
Catching-up backwards
A new results table for
v3.03 has started (at last) and
thanks to the contributors so far.
The obvious reluctance of manic
crunchers to benchmark while the
sands are still flowing for v3 is
understandable. Plus of course the
0.417 benchmark takes longer and
there is not a great deal left (if
anything) to discern from such
activities - in the land of WU,
Angle Range is King. For those of
you who wish to 'competitively
benchmark' you can begin all over
again with v3.03, submit your
times and they'll be entered (in
the table). Fun for all the
family.
We (the Royal 'we') are receiving
upgrade messages from Berkeley and
when it becomes mandatory everyone
will be 'even' for a while - who
knows the next quirk of SETI
software progress. As promised
there will be a round up of the
final v3 submissions and words on
the v3.03s so far but I will be
pleased just to put this in place
for tonight...
Just a final thought - being a
very conservative IE user (and
Netscrape at work by imposition
from on high) I decided it was
time to try out Opera 5.01 and
very pleasant it is too, except of
course for the TLC
site! I have roamed a fair few net
nooks and crannies and it has
displayed delightfully. Yet here
on my own patch it produces some
rather ugly formatting quirks.
Could be me and my devotion to
Front Page 2000 (sad) or just some
Operatic non-adherence to standard
HTML tags - I don't know. But to
those of you (and they are
multiplying) using Opera my
apologies and I understand a
little of your angst now!
Max out.
A short emptying of the head to start the year afresh...
It was very good having
a break and now I'm back to the
grind there's nothing too major to
report, but you know the style -
some minor housekeeping activity
here, archiving, posting final v3
old bench WU table (at
last)...worth noting that it loads
and then appears to hang with blue
bar and 'done' message in IE5.5
(well mine anyway) - give it a few
seconds and up it comes,
interesting. Nice (slight
understatement) to see the old
firm making it to number one
(officially), Mike
Ober being too popular
for his own good (and now has a
new host it appears), the Berkeley
machines being a 'little' more
consistent in sending out WU's, in
updating data and being able to
write '2001' for the year gives me
a little thrill every time! Just
in case you feel like taking a
crunching vacation give SUN
a couple of weeks to decide
whether to lie down and accept the
situation or whether they'll find
an extra few thousand boxes to
fight us with. All to the good
methinks, but as Larry
Loen (amongst several)
pointed out, and as many of you have
probably started to grasp, the question is: what
next for TLC?
The forum is awash with questions
but few answers. When 3.03 will
become mandatory is one of the
more practical ones and so far the
Berkeley Boys'n'Girls have been
anything but predictable. Hang in,
enjoy the ride and make your
thoughts known. The last few
unacknowledged v3 benches will get
some words later this week. Any
corrections, errors noticed or
worthy words to me and I'll try
and make good the imperfections in
my little patch of this planet
that I have some control over.
Max out.
|