v3.0 Discussion

 


Mad props go out to IronBits for hosting the site and all the others who have kept this site going for the past couple of years!

Mindless Ramblings About v3.0
As soon as version 3.0 hit the SETI ftp sites, there was some discussion about benchmarking the new client.  Most of that discussion centered on how appropriate the 'classic' benchmark work unit would be for the new client.  The bench WU, as many know, was a pretty fast WU on the version 2.x client, and would run faster than probably 90+% of the work units people would do on a regular basis.  It ran fast because the client did not do gaussian searches on this work unit and mainly checked for spikes.  That cut the number of calculations needed, and therefore the run time.  A fast bench WU was a good thing, because it made for a quick benchmark....then people could go about their merry way and crunch 'normal' work units for fun and profit.  The problem with this work unit is that it is not representative of the "average" work unit that will come a cruncher's way.  Its angle range is unusually high (6.718), while the vast majority of work units fall in the 0.400 to 0.500 range.  Those work units took significantly longer than the bench WU, but it was never clear to most people (me included) why they took longer to process, or why run times varied from one work unit to the next.

Enter Roelof.  Roelof Engelbrecht is the creator of SetiSpy, and he and Lawrence Kirby are two people who are very familiar with the technical aspects of the S@H clients.  They have offered valuable insight into how the client works and what kind of calculations the CPU needs to do to complete a work unit.  (I highly advise browsing through the alt.sci.seti and sci.astro.seti newsgroups and seeking out their posts.)  Roelof sent in some input about the choice of the bench WU and also included some equations he developed from work unit FLOP calculations by Lawrence Kirby.  These equations relate the angle range to the number of FLOPs needed to finish a work unit.  First I would like to point you to a chart from the SETI@Home FAQ which details what kind of searches the client does depending on the angle range of the work unit.  This chart is shown below:

slewrate angle_range 128K 64K 32K 16K 8K 4K 2K 1K 512 256 128 64 32 16 8
------ -------- 0.075 0.149 0.298 0.596 1.192 2.384 4.768 9.537 19.07 38.15 76.29 152.59 305.18 610.35 1220.70
0.000000 0.000000 --- --T --T --T -PT -PT -PT -PT -PT -PT -PT -PT -PT --T --T
0.001000 0.107374 --- --- --T --T -PT -PT -PT -PT -PT -PT -PT -PT -PT --T --T
0.002000 0.214748 --- --- --- --T -PT -PT -PT -PT -PT -PT -PT -PT -PT -PT --T
0.003000 0.322123 --- --- --- G-T GPT GPT GPT GPT GPT GPT GPT GPT GPT GPT GPT
0.004000 0.429497 --- --- --- G-- GPT GPT GPT GPT GPT GPT GPT GPT GPT GPT GPT
0.005000 0.536871 --- --- --- G-- GPT GPT GPT GPT GPT GPT GPT GPT GPT GPT GPT
0.006000 0.644246 --- --- --- G-- GPT GPT GPT GPT GPT GPT GPT GPT GPT GPT GPT
0.007000 0.751620 --- --- --- G-- GPT GPT GPT GPT GPT GPT GPT GPT GPT GPT GPT
0.008000 0.858994 --- --- --- G-- G-- GPT GPT GPT GPT GPT GPT GPT GPT GPT GPT
0.009000 0.966368 --- --- --- G-- G-- GPT GPT GPT GPT GPT GPT GPT GPT GPT GPT
0.011000 1.181117 --- --- --- --- --- -PT -PT -PT -PT -PT -PT -PT -PT -PT -PT
0.012000 1.288491 --- --- --- --- --- -PT -PT -PT -PT -PT -PT -PT -PT -PT -PT
0.013000 1.395865 --- --- --- --- --- -PT -PT -PT -PT -PT -PT -PT -PT -PT -PT
0.014000 1.503240 --- --- --- --- --- -PT -PT -PT -PT -PT -PT -PT -PT -PT -PT
0.015000 1.610614 --- --- --- --- --- -PT -PT -PT -PT -PT -PT -PT -PT -PT -PT
0.016000 1.717988 --- --- --- --- --- --- -PT -PT -PT -PT -PT -PT -PT -PT -PT
0.017000 1.825362 --- --- --- --- --- --- -PT -PT -PT -PT -PT -PT -PT -PT -PT
0.018000 1.932736 --- --- --- --- --- --- -PT -PT -PT -PT -PT -PT -PT -PT -PT
0.019000 2.040111 --- --- --- --- --- --- -PT -PT -PT -PT -PT -PT -PT -PT -PT
0.020000 2.147485 --- --- --- --- --- --- -PT -PT -PT -PT -PT -PT -PT -PT -PT

(G = Gaussian Search, P = Pulse Search, T = Triplet Search; the header row, highlighted in yellow in the original chart, gives the FFT length)

The thing to note about the chart above is that there are three different sections, which I split up according to color.  The version 3.0 client does different calculations based on the angle range of the work unit.  At low angle ranges (green, the rows at the top of the chart with no G entries) the client does only triplet searches or pulse and triplet searches; at mid angle ranges (orange, the rows containing G entries) it does all three types of search; and at high angle ranges (blue, the rows at the bottom of the chart with no G entries) it does pulse and triplet searches exclusively.  From just a cursory look at the chart, one would expect the mid angle range work units to take longer to process than either of the two ends, because the client has to do gaussian searches in addition to the other two.  You would expect work units in the blue region to process the fastest because they do fewer searches in total than the other two regions.  In terms of the benchmarking work unit, its angle range of 6.718 would put it in the blue region, and in fact it would be off the bottom of this chart!  An average work unit with an angle range between 0.400 and 0.500 would be in the orange region.

Lawrence Kirby took a look at each of these regions and was able to determine the amount of work the client would do (in FLOPs) based on the angle range of the work unit.  There are three different equations, one for each region.  They are:

AR < 0.2255:             FLOPs = 2.40e12 * exp(0.31 * AR)
0.2255 <= AR <= 1.1274:  FLOPs = 2.34e12 * AR^(-0.225)
AR > 1.1274:             FLOPs = 2.26e12 * AR^(-0.0169)
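For anyone who wants to play with these numbers, the three equations can be wrapped in one small function. This is only a sketch: the function and constant names are made up for illustration, and the coefficients are simply copied from the equations above.

```python
import math

# Region boundaries copied from the three equations above.
LOW_LIMIT = 0.2255
HIGH_LIMIT = 1.1274


def flops_for_angle_range(ar: float) -> float:
    """Estimated FLOPs for one v3.0 work unit at angle range `ar`."""
    if ar < 0:
        raise ValueError("angle range cannot be negative")
    if ar < LOW_LIMIT:
        # Low angle range: no gaussian searches.
        return 2.40e12 * math.exp(0.31 * ar)
    if ar <= HIGH_LIMIT:
        # Mid angle range: gaussian, pulse and triplet searches.
        return 2.34e12 * ar ** -0.225
    # High angle range: pulse and triplet searches only.
    return 2.26e12 * ar ** -0.0169


if __name__ == "__main__":
    bench = flops_for_angle_range(6.718)     # the benchmark work unit
    typical = flops_for_angle_range(0.4295)  # a row from the chart above
    print(f"bench WU:   {bench / 1e12:.2f} TeraFLOPs")
    print(f"typical WU: {typical / 1e12:.2f} TeraFLOPs")
    print(f"ratio:      {typical / bench:.2f}")
```

Going by these equations alone, a work unit at an angle range of about 0.43 needs roughly 1.3 times the FLOPs of the bench WU.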

Roelof took these equations and plotted them out to see what FLOPs vs angle range would look like, and that is the graph on the right.  The plot of FLOPs (ok, TeraFLOPs) for the version 3.0 client is in red, and for comparison there is also a plot for the version 2.4 CLI.  As stated before, the mid angle range work units have the largest amount of calculations being done, while the low and high angle ranges have significantly less work to do.  Also included in this graph is a yellow line which shows the distribution of angle ranges for the 200 or so different work units Roelof had processed.  The vast majority of work units had angle ranges between 0.400 and 1.00, while a minority had lower or higher angle ranges.  With the version 2.0 client those low and high angle range work units were a "treat" and were processed significantly faster than the majority of work units.  The same appears to be the case with the version 3.0 client.

But, I am getting away from things a bit here :).  Didn't this discussion start with the merits of the old benchmarking work unit versus getting a new one???   Well, yeah.  I'm going to cut this discussion short here and get to the point, cuz I have other things to touch upon.  Most of the processing time for a work unit is consumed by FFT routines.  This is going to be the major factor in determining the work unit time "range".  The final work unit time is then determined by the number and types of searches being done on the work unit.  I am assuming here that all work units go through the same FFT routines, so in terms of benchmarking a CPU or system, the FFT processing should give a good idea of how the system will crunch a work unit.  The current benchmarking work unit does not do any gaussian searches (with version 3.0).  But it does go through some pulse and triplet searches, and it should give a good enough distinction between different CPUs and the way they handle the client/work unit during crunching.  Therefore, the current work unit should give us the information that we want.  Granted, this isn't optimal....heck, the best case scenario would be to use an "average" work unit in the 0.410 angle range region.....but we have a feeling that this wouldn't tell us anything different from what we can get with the current work unit.  So we are going to keep the same work unit for benchmarking the version 3.0 client as we used for the version 2.x clients.

Angle Ranges vs. WU Times
Hrmmmmm.  You may ask yourself (where is my beautiful house?): now that we have a relationship between the angle range and the amount of work (in TeraFLOPs) done by the client...can't we predict how the client will perform time-wise with this?  Possibly!  Well, the computer's CPU is there for doing calculations, a lot of calculations...and aren't TeraFLOPs a measurement of calculations?  Theoretically, the completion time of a work unit should be directly proportional to the amount of calculations in it.  OK....let's test that out.

Above is a graph showing the same angle range vs TeraFLOP plot (in gray) as in the previous graph.  This time I have included some times and angle ranges of the work units that I have processed so far with the version 3.0 client.  On this graph I have moved the scale of the TeraFLOP plot up a bit so that it is superimposed over the data points for my PIII running at 925MHz.  It is early right now and I don't have *that* many data points on the graph, but there does appear to be a correlation between the number of TeraFLOPs in the work unit and work unit times.  Above the times for the PIII are some Celeron @ 450MHz times.  There is some variability in work unit times at the same angle range for the PIII, but that can be attributed to my daily use of this computer for a variety of things, especially doing the stats for the site here.  I would expect the distribution of the Celeron times to follow the shape of the TeraFLOP plot, but I also expect the range of work unit times to be a bit larger (i.e. the TeraFLOP plot stretched vertically a bit).  There appears to be less variability in the Celeron times, though.  If you look closely, the slowest "data point" of the Celeron is actually three different work unit times at the same angle range, and all three are almost superimposed on each other.  That computer has little to do other than crunch SETI and act as my SETIQueue server.
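Sliding the TeraFLOP plot up until it sits on a machine's data points amounts to picking one scale factor, hours per TeraFLOP, for that machine. A minimal sketch of fitting that factor by least squares follows. The function names are made up for illustration, and the sample points are synthetic (generated from a known factor of 2.0 hours per TeraFLOP), not the PIII or Celeron times from the graph.

```python
import math


def teraflops(ar: float) -> float:
    """Estimated TeraFLOPs for a v3.0 work unit, from the three equations above."""
    if ar < 0.2255:
        return 2.40 * math.exp(0.31 * ar)
    if ar <= 1.1274:
        return 2.34 * ar ** -0.225
    return 2.26 * ar ** -0.0169


def fit_hours_per_teraflop(samples):
    """Least-squares fit through the origin of hours = k * TeraFLOPs.

    `samples` is a list of (angle_range, hours) pairs for one machine.
    """
    work = [teraflops(ar) for ar, _ in samples]
    hours = [h for _, h in samples]
    return sum(w * h for w, h in zip(work, hours)) / sum(w * w for w in work)


def predict_hours(ar: float, k: float) -> float:
    """Predicted run time at angle range `ar`, given the fitted scale k."""
    return k * teraflops(ar)


if __name__ == "__main__":
    # Synthetic points built from k = 2.0; swap in real (angle range, hours) pairs.
    synthetic = [(ar, 2.0 * teraflops(ar)) for ar in (0.12, 0.43, 0.75, 2.5)]
    k = fit_hours_per_teraflop(synthetic)
    print(f"fitted hours per TeraFLOP: {k:.3f}")
    print(f"predicted hours at AR 0.41: {predict_hours(0.41, k):.2f}")
```

With real data, the scatter of the points around `k * teraflops(ar)` would show how well the "time is proportional to FLOPs" idea holds for that machine.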

I have a feeling that with the version 3.0 client, work unit times should scale linearly with CPU clock, and that a relationship can be established between work unit times and angle ranges such that we can predict work unit processing time from the angle range for different types of processor.  Just plug in a variable for the CPU type, the CPU speed, and the angle range, and get the projected work unit time from that equation.  Only time will tell if that will be a reality :).

Version 3.0 vs Version 2.70 beta

In this graph I have taken some data from the beta testing of the SETI clients and put it to some statistical use.  The green data points are run times of different work units processed with the version 2.70 beta client plotted against their angle ranges.  I also took the version 3.0 TeraFLOP graph and superimposed it on the version 2.70 data.  It was well known that with the pre version 2.76 beta clients, very low angle range work units took a very long time to process.  You can see this pretty well in the graph above.  v 2.70 work units with angle ranges > 0.2 seem to fit the TeraFLOP plot for version 3.0 pretty well, don't you think?  Actually, if you extend the middle equation shown above (highlighted in orange) back to an angle range of about 0.02, the work unit times seem to fit that part of the graph pretty well also.  At extremely low angle ranges this falls apart, and I added a flat line extension to sort of show the estimated TeraFLOPs needed for those version 2.70 work units.  I guess the main thing to take from this graph is that the version 2.70 data seems to fit the TeraFLOP graph pretty darn well at angle ranges > 0.20.  Therefore, I believe that the equations for the TeraFLOPs for each work unit will be a good guide to the relative work unit times for the version 3.0 client.  I also threw in the version 3.0 data I have so far, to compare the run times between the version 2.70 and version 3.0 clients.

Run Times vs Gaussians Found

The final thing I want to touch on here is the relationship between what the client finds and the work unit times.  I have always thought (mistakenly) that if the S@H client finds one or more gaussians in a work unit, that work unit would process slower than if it found no gaussians.  That isn't necessarily true.  This misconception came from the fact that I didn't realize how the client processed the different work units.  It seemed to me that the majority of work units without gaussians processed faster than work units that contained gaussians (in version 2.4 CLI).  What was happening with the 2.4 client is that high and low angle range work units didn't even do gaussian searching!  If you look at the very top graph of the TeraFLOP plot for the version 2.4 CLI, the flat portions of the plot are angle ranges in which no gaussian searches were performed.  Based on the TeraFLOP plot, work units at these angle ranges also had less work to do, and therefore had lower run times.  The only way to check this is to take a look at work units that actually did have gaussian searches being done.  In the graph above, I took data from the version 2.70 beta and plotted the run times versus the angle ranges.  The data points are colored based on how many gaussians were found in that work unit.

Honestly, I cannot make a definite correlation between the number of gaussians found and run times.  To show one you have to look at a specific angle range and compare different work units at that angle range.  But there isn't any good correlation to be found.  This really isn't surprising.  The thing that determines the run time is the actual search being done....a work unit shouldn't run faster because it did or didn't find a gaussian, since they all do the search.  If you lose your keys and search for them for 5 minutes, you do the same amount of work whether you find them at the end of those 5 minutes or not.  About the most interesting thing in this graph is that work units with higher angle ranges seem to have more gaussians found than lower angle range work units.....strange eh?
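One way to make the "compare work units at the same angle range" check concrete is to bin work units by angle range and average the run times per gaussian count inside each bin. This is only a sketch: the function name, the 0.05 bin width, and the tuple layout are assumptions, and the rows in the example are invented placeholders rather than SetiSpy data.

```python
from collections import defaultdict
from statistics import mean


def mean_hours_by_gaussians(work_units, bin_width=0.05):
    """Average run time per gaussian count within each angle-range bin.

    `work_units` is an iterable of (angle_range, gaussians_found, hours).
    Returns {bin_start: {gaussians_found: mean_hours}}.
    """
    bins = defaultdict(lambda: defaultdict(list))
    for angle_range, gaussians, hours in work_units:
        bin_index = int(angle_range / bin_width)
        bins[bin_index][gaussians].append(hours)
    return {
        round(bin_index * bin_width, 6): {
            g: mean(times) for g, times in sorted(counts.items())
        }
        for bin_index, counts in sorted(bins.items())
    }


if __name__ == "__main__":
    # Placeholder rows (angle range, gaussians found, hours), NOT real results.
    rows = [
        (0.42, 0, 5.0),
        (0.43, 1, 5.0),
        (0.44, 0, 5.0),
        (0.72, 2, 4.0),
        (0.73, 0, 4.0),
    ]
    for bin_start, by_count in mean_hours_by_gaussians(rows).items():
        print(bin_start, by_count)
```

If run times really are independent of what is found, the averages inside a bin should be about the same regardless of the gaussian count.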

Conclusions
Ok, I'm too tired to come up with any conclusions :).  We do have some equations which give the amount of calculations a work unit will do based on the angle range of that work unit.  We have also shown that there does appear to be a correlation between the run times of the work units and the amount of calculations needed to finish a work unit.  There is a possibility that sometime in the future we may be able to accurately predict the finishing time for a work unit by plugging the angle range, a CPU type variable, and the CPU speed into an equation....and finally, it appears that work unit run times are independent of any significant results that are found within the work unit.

Finally, I would like to thank Roelof and Lawrence Kirby for their hard work in helping explain a lot of the calculations being done in the SETI client processing.  Their work is greatly underappreciated.  Also I want to say that all of the data used in the graphs were pulled using SetiSpy.  It is definitely a great add-on, and if you aren't using it, you should.

Update - About work unit time predictions!
Roelof sent me an email and said there is already a way to estimate work unit times.  There is an equation that relates the TeraFLOPs, computing efficiency (CpF), and Speed (MHz) of a CPU to give the "optimal" WU time.  That equation is:

Topt = 278 * (TeraFLOPs * CpF / MHz)

We have the equations from above (the colored ones way up top) which give the number of TeraFLOPs based on angle_range, and you know your computer speed in MHz, so all you need is the CpF measurement.
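The Topt equation is easy to wrap in a helper. This is a sketch, not SetiSpy's own code: the function name is made up, and the units are an assumption. If CpF means CPU cycles per floating-point operation, then 10^12 FLOPs times CpF, divided by 10^6 cycles per second per MHz and by 3600 seconds per hour, gives a constant of about 277.8, which suggests Topt comes out in hours. The CpF of 3.0 in the example is a placeholder, not a measured value.

```python
def optimal_wu_hours(teraflops: float, cpf: float, mhz: float) -> float:
    """Topt = 278 * (TeraFLOPs * CpF / MHz).

    Assumed units: CpF in CPU cycles per FLOP, result in hours.
    """
    if mhz <= 0:
        raise ValueError("CPU speed in MHz must be positive")
    return 278.0 * teraflops * cpf / mhz


if __name__ == "__main__":
    # 2.83 TeraFLOPs is roughly what the mid-range equation gives near AR 0.43.
    # The CpF value is a placeholder; take the real one from SetiSpy.
    hours = optimal_wu_hours(teraflops=2.83, cpf=3.0, mhz=925)
    print(f"estimated optimal WU time: {hours:.2f} hours")
```

Because MHz sits in the denominator, the estimate scales inversely with clock speed for a fixed CpF: doubling the clock halves the predicted time.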

I guess the best estimate is to get the CpF from SetiSpy, since the numbers that Roelof has on his pages seem to be for the version 2.0 client, and those CpF numbers do not seem to be correct for the version 3.0 client.  I have done a couple of calculations with the CpF that SetiSpy gave me for my computer, and it seems to overestimate the work unit completion time by maybe 15 or so minutes.  We will have to see if there will be any modifications to the numbers after some benchmarking work units get submitted. -zAmboni