Preview Hardware freaks with no working knowledge of HTML
Version 3.0 Preview
July 5th brought us the first beta test of the version 3.0 client (dubbed v2.66). The new iteration of the SETI@Home client promised to bring the SETI crunchers of the world three major improvements over previous versions of the client. The two most important improvements deal with additional science that the client will perform on the work unit: version 3.0 will do pulse and triplet searches on the data, in addition to the gaussian signal fitting that has been done in previous versions. These added calculations would significantly increase the processing time of a work unit unless improvements were made to the processing algorithms themselves. Thus we come to the third improvement, a new and improved Fast Fourier Transform (FFT) routine. In previous versions of the S@H client, the one sore spot for many was the inefficient FFT routines, which spawned several conflicting attempts to "patch" and improve them in the version 1.x and 2.x clients. The new, faster FFT promised to help offset the burden of the pulse and triplet searches. But alas, you can't have everything...it was projected that the version 3.0 client would run approximately 25% longer than the current 2.4 client.
The major graphical change in the beta client is at the top left of the client window. This portion of the client window is shown below. Directly underneath the Data Analysis title is where the client shows what it is currently working on. It will state either "Computing Fast Fourier Transform", "Chirping Data", "Searching for Pulses", "Searching for Gaussians" or "Searching for Triplets". The status bar to the right of these messages shows the % completion of the current search. Below this line is where the "Best Results" are shown. The second and third lines show the details of the analysis results, and below this is a graphical representation of the result, shown as a continuous red line. When the client is showing the "Best Gaussian", it shows the raw data as a red line, and the best gaussian fit is a white line overlaid on the raw data. If it shows the best triplet, there are 3 short vertical white lines marking the detected triplet. As the work unit processes, the client alternates between showing the three different results. But there is a possibility that there may not be a gaussian, or a triplet, or a pulse detection.....or any of the above. The client only shows results it has found....and alternates between them. If there are no results found in the data, this area is blank. Finally...underneath the graphical data representation is the overall status bar. This shows the % completion of the entire work unit, and the amount of CPU time used so far on the current work unit.
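To make the "alternates showing them" behavior concrete, here is a minimal sketch of how such a rotating display could work. This is only an illustration, not the actual client code: the function name `display_sequence`, the `found` dictionary and the "Best Pulse" label are assumptions of mine.

```python
from itertools import cycle, islice

# Labels for the three result types. "Best Pulse" is an assumed label;
# the names and order here are illustrative, not taken from the client.
RESULT_TYPES = ("Best Gaussian", "Best Pulse", "Best Triplet")


def display_sequence(found, steps):
    """Return the label shown at each of `steps` display refreshes.

    `found` maps a result label to True once that kind of result has
    been detected. Only detected results are rotated through; if
    nothing has been detected, the area stays blank (empty string).
    """
    shown = [name for name in RESULT_TYPES if found.get(name)]
    if not shown:
        return [""] * steps
    return list(islice(cycle(shown), steps))


# A work unit with a gaussian and a triplet, but no pulse detection:
print(display_sequence({"Best Gaussian": True, "Best Triplet": True}, 4))
```

With no detections at all, every refresh comes back as an empty string, which matches the blank area described above.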
The release of the v2.66 beta was highly anticipated, but that anticipation quickly turned into disappointment for many. While analyzing my first work unit, it became apparent very quickly that this client needed some work. The 2.66 client crawled along at a snail's pace, and the estimated time of completion was way longer than anything processed with the version 2.0 client that I had been using. I let the work unit finish overnight....OK, it didn't finish overnight...it was well into the evening when the client finally finished. The completion time on a PIII 600E overclocked to 944MHz was a whopping 25 hours. This definitely did not fit the "25% longer" that was quoted on the alt.sci.seti newsgroup. While processing of the first work unit was still ongoing, a browse through the alt.sci.seti group showed that many had seen the same problem. Needless to say, that was the first and last work unit that was processed with the v2.66 client. Within the next two days Eric Korpela announced that they had an idea of what was going wrong, and that they were working on a fix. The problem turned out to be the client waiting and idling while the on-screen graphics were updating. Back to the drawing board.
v2.70 Beta
Run times for this new beta on the first 5 or so work units I completed ranged from 3:10 to 6:46. After completing a few more work units, the work unit times fell into two categories. The majority of the run times were in the range of 3:20 - 4:00, and the smaller category was in the 6 - 7 hour range. I will touch on this in a bit. Many of the people on the newsgroups finished their first work units with the beta...and there was much rejoicing. Of course everyone loved the faster times....almost all of them ran faster than the previous version 2.x clients (even faster than the CLI!). Apparently with the new beta, the overhead of drawing the graphics is totally eliminated with the GUI minimized, and from comparing times from some other machines, it appears that the newer client is more cache friendly. Times from CPUs with large caches didn't vary that much from those of an equivalent speed CPU with a smaller cache. It almost seems that work unit times are more dependent on raw CPU speed, and less dependent on CPU cache size and memory bandwidth. But it is early in the game right now, and this needs to be investigated further. After running through some of the "slower" work units, I noticed something different in how they were processed. With the faster work units, the pulse detection seemed to be mostly at the beginning of the run, while later on in the run gaussian searches took over and there were no pulse searches....In all of the "slower" work units, no gaussians were detected in the final analysis, and during the analysis there didn't seem to be any gaussian searches being performed at all. Instead, there were pulse searches throughout the entire run, rather than only at the beginning. I haven't heard back whether this is "normal" or a sign of some glitch in the matrix. Just a half hour ago Eric Korpela replied to one of my posts about slow work units with:
Well he was right....the angle_range on that work unit was 0.023. But he didn't say whether that was expected or not......
bulletin....bulletin....bulletin
Late Breaking News! (Don't you love reading the newsgroups while you are writing?) Eric just chimed in with an explanation. I will let him explain!
To paraphrase....at lower angle ranges, the antenna is "looking" at the same point in the sky longer. This enables them to do a more sensitive search of the area, and this sensitive search takes longer. On the older clients (without the pulse detection) these work units did not contain gaussians, and therefore usually had shorter run times...but now, with the pulse searches, these work units will probably take longer. Is it worth it? I guess it is up to them to decide! I think I am going to end this part of the preview here. I will add things as they become available!
Update (7/23/00)
Welp...It looks like the beta has a working set that fits completely into the Coppermine's 256KB L2 cache, but is still too large to fit into the Celeron's cache. What does this mean? It looks as if the client is no longer memory bandwidth limited on *any* PIII CPU. Memory tweaks will probably show no improvement in client times. But it doesn't stop there! The BX motherboard may not be SETI king anymore; the VIA Apollo Pro boards should now perform as well as the BX boards....but with the added advantage of underclocking the memory and allowing for higher CPU speeds! Athlons, both Classic and T-Bird, will probably perform as well as a PIII, and the Xeon will probably now lose ground to the PIII. The Celeron doesn't fare well...but what about its arch-enemy the Duron? The word is still out on this. The Duron may be the best price/performance solution for version 3.0. But you say "ahem...the Duron only has 64KB of L2 cache...and only 128KB of L1 cache...it should stink also". Not necessarily. The one thing about the Duron is that the L2 cache is
Update (7/30/00)
From the benchmark times, the full speed L2 cache and the 256 bit L2 bus compensate exactly for the extra hit on the slow memory that the CuMine must take compared to the Katmai (1/2 speed L2 cache and 64 bit L2 bus). The Celeron's slower time can be explained by a combination of the smaller L2 cache, the 64 bit L2 bus and the extra hits accessing system memory. Where would this put the T-Bird and the Duron???
I do want to remind you that this is only a guess. Given the above table, how would the T-Bird and Duron perform compared to the CuMine? Both the T-Bird and the Duron would have the advantage of a couple more stages in the very fast L1 cache. But the L2 bus of the T-Bird and Duron is only 64 bit....plus they have the disadvantage of the memory bandwidth of the Athlon chipsets. For sure the Duron would run slower than the CuMine, but would the T-Bird? I would say no. Well, I can actually say for sure(?) that the answer is no with the T-Bird, because I have some benchmarks! I have a couple of times sent to me by Tim Wilkens, who ran the benchmark work unit with the 2.70 beta on both a T-Bird and an Athlon Classic. I also ran the benchmark with the beta on my machine. Here is the meat:
The CuMine is definitely kicking some ass. But there are some caveats here. Did you see how I slipped in a (?) when I said "for sure"? This may not really be that fair of a comparison. Yes, the PIII is kicking some serious ass, but take a look at that FSB setting. I believe that this ass kicking is due more to the PIII's cache running at a significantly higher clock speed, and to its memory bandwidth totally blowing away that of the AMD machines. Because of this you can't really determine the cache dependencies of these different CPUs. A better comparison would be a PIII, T-Bird and Duron at similar multipliers and FSB settings. Ancala has run the benchmark on the beta with a 700MHz Duron and turned in a 5:07. Sometime in the next day or two I will clock my PIII back down to 650 and do a bench at that setting to see how the PIII compares to the Duron. Hey, it isn't exact....but what the hell! I will keep ya informed.
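Since a 650MHz PIII and a 700MHz Duron are not at the same clock, one rough way to line the two results up is to scale a run time to a common clock speed. The sketch below does that for Ancala's 5:07. It ASSUMES run time scales inversely with CPU clock, which ignores exactly the FSB, cache and memory differences discussed above, so treat the output as a ballpark only. The helper names are mine, not from any SETI tool.

```python
def parse_hmm(text):
    """Convert an 'H:MM' run time such as '5:07' into minutes."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def format_hmm(minutes):
    """Format minutes as 'H:MM', rounded to the nearest minute."""
    total = round(minutes)
    return f"{total // 60}:{total % 60:02d}"


def scale_time(minutes, from_mhz, to_mhz):
    """Estimate the run time at another clock speed.

    ASSUMPTION: time is proportional to 1/clock. FSB, cache speed and
    memory bandwidth are ignored, so this is a rough estimate only.
    """
    return minutes * from_mhz / to_mhz


duron_700 = parse_hmm("5:07")                    # Ancala's 700MHz Duron result
estimate_650 = scale_time(duron_700, 700, 650)   # same chip, scaled to 650MHz
print(format_hmm(estimate_650))                  # prints 5:31
```

A PIII time at 650 could then be compared against that 5:31 figure instead of the raw 5:07, keeping in mind that the scaling itself is only a guess.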