By Larry Loen
This is not the official FAQ of the SETI@home project.
This FAQ is independent of SETI@home and not officially affiliated
in any way.
For official information, including a FAQ, see: http://www.setiathome.ssl.berkeley.edu.
That said, the official FAQ file doesn't cover every topic. This FAQ
is written entirely from the point of view of a participant, especially
the enthusiast who wants to make stronger than average contributions to
the project. It also covers topics frequently seen in the Ars Technica
forum and other SETI-related newsgroups on the net.
What is
SETI (the Search for Extraterrestrial Intelligence) and why do I care?
See the main SETI@home
site for a great explanation and pointers to many resources.
My personal explanation is simple: The goal of SETI is to see if we
can successfully identify a signal from an alien world. If we did, it would
be as momentous an occasion as any in human history, right up there with
Columbus' reaching the New World. We'd know, if we learned nothing else,
that once and for all we aren't alone in the universe. Who knows what that
would mean?
And, all anyone has to do to be part of it is contribute spare CPU cycles.
Pretty cool.
If that doesn't make you want to participate, there are plenty of other
worthy projects which will be happy to take your spare computer cycles,
but this is about SETI.
Who is "SETI"?
If you read the sci.astro.seti and alt.sci.seti newsgroups, or the Ars
Technica SETI discussions, you'd think (most of the time) that there was
only one group doing SETI and one possible approach. Participants tend
to say "SETI" and really mean "SETI@home." But, the SETI@home project is
only one particular approach to the search for alien life-forms. There
are many others.
Will SETI
as a project really succeed?
Who knows? You can't find something if you never look. And, SETI@home
has reported having found many interesting signals worth more analysis
beyond what we do for them. All that said, there's a lot going against
it.
Some respected scientists say that the current SETI@home project in
particular is looking in altogether the wrong set of frequencies or not
in enough of the sky. There are serious alternative proposals for "optical
SETI." The SETI@home FAQ itself implies that interstellar distances may
simply be too great to allow us to detect alien radio signals from far
away that aren't explicitly designed for us to discover. Maybe the ET congresses
on other worlds will uniformly refuse to fund such a project that doesn't
benefit their district. If so, we're going to be looking, de facto,
at a much smaller, and closer, set of possible stars.
In short, one has to admit this is a bit of a longshot by any standard,
even if there are plenty of inhabited worlds out there, which itself may
not be so.
There's a famous equation, largely attributed to a scientist named Drake,
which attempted, very roughly, to figure out how many civilizations are
waiting to be found.
The findings are roughly this: Plenty of stars out there, so even if
life is rare, the many stars make up for that. But, whether a given planet
will achieve a stable radio-based civilization, required for SETI to succeed,
is a lot more chancy. If most civilizations nuke or pollute themselves
to oblivion within a century or two of when they achieve radio, odds are
against us hearing their signals. And, that's assuming all life-bearing
planets evolve a critter to produce that first radio.
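For reference, the Drake equation is usually written as a simple product of
seven factors. The symbols below follow the common textbook form; this FAQ
does not try to assign values to any of them.

```latex
N = R_{*} \cdot f_{p} \cdot n_{e} \cdot f_{l} \cdot f_{i} \cdot f_{c} \cdot L
```

Here N is the number of civilizations in our galaxy whose signals we might
detect, R_* is the rate at which stars form, f_p is the fraction of stars
with planets, n_e is the number of planets per system that could support
life, f_l is the fraction of those where life appears, f_i is the fraction
where intelligence evolves, f_c is the fraction that develops detectable
technology such as radio, and L is how long such a civilization keeps
transmitting. The worry about civilizations destroying themselves within a
century or two is a worry that L is small.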
It is a fascinating quirk of human nature that SETI@home participants
have all levels of expectation about the project's ultimate success, including
expecting it to be a wild goose chase. Yet, regardless, they participate.
Most people care passionately about the science, but there are other motivations
to participate than being excited about whether ET will actually be found.
And yet, the possibility is always there.
How do I
participate?
Download and install the SETI@home software from the official web site.
The installation procedure varies but is fairly painless and pretty typical
for the environment. A Unix version looks like any other Unix install (i.e.
a tarball). The Windows version most participants start with is pretty
simple and normal to deal with. It functions as a screen saver. But, as
you'll discover, SETI fanatics configure the Windows version to run all
the time. And, without the screen saver graphics. This is surprisingly
benign for most computer users and lets you run SETI all the time, even
when using the computer normally.
How do I
get the SETI@home software?
The SETI@home license agreement specifically states that you must download
the code from their web site. The software is free, but you are legally
bound to download from them. That license provision is to protect you.
This is powerful software that runs all the time and accesses your disk
and your network frequently. Cyber-vandals and hackers would adore it if
people got in the bad habit of downloading this kind of thing from all
over the net. Ordinary prudence would say "don't accept this kind of code
from any unofficial source."
What preparation
must I do before executing SETI@home for the first time?
Have you got permission to run it? Not just from the boss at work (if
it is your work machine) but from whoever runs the network? Do you know if you can
run SETI-related 3rd party programs (you may be able to run SETI@home and
not the 3rd party code)? Get this all straight ahead of time. Trust us
on this: You don't want to be explaining yourself after the fact. Get the
required permissions up-front.
What happens
when I run the SETI@home code for the first time?
When you start up SETI@home for the first time, you'll need to answer
a few benign questions (the main bit of information that counts is your
e-mail identifier). Then, to join Team Lamb Chop, you'll need to go to
the SETI@home site and join. The instructions on the site are easy to find
and follow. Welcome to the greatest team in SETI!
What is
the SETI "client"?
The SETI@home software is typically called the "client." This comes
from a bit of common computer jargon. Generally, when one computer hands
out work to lots of other computers, the one computer is called the "server"
and the rest are called the "clients." SETI@home and SETI@home participants
use this nomenclature. Thus, the SETI@home software is usually just called
"the client."
What is
the GUI? What is the graphical user interface version of the client all
about?
The Graphical User Interface is the "Windows screen saver" version of
the software. Being more intensely interested in the project, few of us
run this code much for reasons that will become clear soon. But, there's
nothing wrong with starting out with this code and running with it. It
is stable, easy to run, and produces results. We usually call this the
"GUI" or "GUI client."
What is
CLI? What does Command Line Interface mean?
Most people who get enthusiastic about SETI will find the GUI is too
cumbersome. Fortunately, SETI has developed (originally for Unix machines)
a version that runs from a command line. This means, you open up a DOS
Window, move to the correct directory, and type the name of the command
(e.g. setiathome). Basically, in the Windows case, you download it, probably
rename it to something handier, and run it in the directory of your choice.
On many non-Windows and non-Mac machines, this "command line interface"
or "CLI" is all there is. While the Windows CLI download has "winnt" in its
name, it runs on everything from Windows 95 on. To get started, see the Tips
page for an excellent .pdf guide.
What is
a "WU" or "work unit"?
SETI@home divides the project up into "work units." Each work unit (also
called WU by participants) is a sample of some location of the sky at a
known time and a narrow band of radio frequencies. SETI@home gets recorded
tapes from radio telescopes, such as the one at Arecibo, and breaks them
into these discrete and independent units. Eventually, if you install the
SETI client, you end up downloading one of these work units onto your computer
and the client on your computer analyzes it on SETI@home's behalf.
A work unit is about 350,000 bytes of input. When the client completes
its analysis, you'll end up with a result file on the order of 4,000 bytes,
though occasionally larger.
Even on a fast PC, a typical work unit will take many hours to run.
Once it completes a work unit, it will wish to upload the results and download
another work unit.
What's all
this talk about "crunch" and "crunching"?
This is a venerable bit of computer jargon. The "crunch" is simply the
act of processing a work unit in whole or in part. We sometimes call ourselves
or our machines "crunchers" as well.
What is
an "outage"?
This is a real-world project. That means that while the SETI@home site
is surprisingly robust, it is sometimes unavailable for many hours at a
time.
Why does this matter? Well, it takes many hours to crunch a unit,
but the crunch does end. Once it ends, it's time to upload the results.
Your contribution is invisible until the answer is uploaded. If the site
is unavailable, then no upload, at least not for a while. Moreover, your
computer can run out of work and simply wait. Yuck! Fortunately, this isn't
as bad as it sounds. Team Lamb Chop is prepared!
The good news is that the SETI@home site has been available at least
95 per cent of the time, really more like 98 per cent. While this is very
good, objectively, it can be a strain to dedicated crunchers' minds when
an outage happens. Many of us just hate to see our machines stand idle.
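To put those percentages in perspective, here is a small Python sketch that
converts an availability figure into hours of outage per year. The function
name downtime_hours is just an illustrative choice, and the calculation
assumes a plain 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours(availability: float) -> float:
    """Hours per year the site would be unreachable at a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # The two availability figures quoted in the text above.
    for availability in (0.95, 0.98):
        hours = downtime_hours(availability)
        print(f"{availability:.0%} available: roughly {hours:.0f} hours "
              f"({hours / 24:.1f} days) of outage per year")
```

Even at 98 per cent, that adds up to about a week of outage spread over a
year, which is why dedicated crunchers notice it.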
Moreover, the outages tend to come in bunches. In one famous case, a contractor
cut a key cable somewhere on the Berkeley campus. That took three
or four days to recover from.
Even better news is that we have helped develop a technology called
"caching" to work around outages even when they happen. This will be described
later and it even has whole other FAQs devoted to it.
What is
a "brownout"? What is all this talk about "bandwidth restrictions"?
Recently, we've experienced a new and slightly different form of outage.
Instead of the SETI@home site being down, it is merely overloaded. Since
it has been so successful, SETI@home has an upper bound on how much communications
bandwidth it can consume. Sometimes, especially first shift US Pacific
Time, it reaches these limits. When it does, people have more or less difficulty
with both uploading and downloading. Persistence and caching (I promise,
we'll have plenty on caching in due course) are again the solutions.
What is
a "farm"?
Some of us have managed to get more than one machine, even many more
than one machine, running SETI@home. Somehow, running a bunch of machines
came to be known as a "farm" and the term stuck.
What is
"borging" or "assimilating"?
Running (with permission!) any number of machines at work. The idea
here is to convince friends to run SETI@home on their machines, ideally
on your behalf or at least for Team Lamb Chop. See the "Buying and Borrowing"
section for more.
I guess
this means that the client will have to access the Internet from time to
time. How can I get a new work unit when I'm away from my computer?
The simplest way is to run the client as-is. If you start with the easiest
method, running the Windows GUI client, this is how you will operate: The
client will need access to the Internet when it finishes one WU and needs
to download another. You can set it up to automatically download a new
unit (if you're on-line all the time) or have it prompt you for the
new one so as to save on connect charges.
If you pay for Internet access by the minute, even the prompting technique
will become unwieldy. To be sure, the GUI code does a fairly good job of
letting you manage this if you own only a single computer. And, if you
have software that manages a feature called "dial on demand" skillfully,
you can still let SETI@home automatically download new work. The "dial on demand"
feature can ensure that when the SETI@home client needs new work, the computer
will connect itself, download the unit, waste a bit of connect charges
(based on a time limit you specify), and disconnect. For some, this is
satisfactory.
But, many of us try to run SETI on many machines. We try to run them
twenty four hours a day, seven days a week. The facilities built into the
SETI GUI client frustrate most people who try to run many machines or
even one machine operated so intensely. No matter what speed your computer
is, running all the time means it will want a new work unit at 3AM. For
many people, this leads sooner or later to something undesired -- dial
tones in the middle of the night, expensive connect charges and so on.
There are several schemes that enable you to "cache" work units so that
you don't need to be baby sitting your computer all day and all night.
The CLI can be run under the same "continuous connection" rules as the GUI
or it can be used with "caching" to manage this whole upload/download cycle
much better.
What about
firewalls? Can SETI@home work in a company setting?
SETI@home has built-in proxy configuration. You can use ordinary proxy
or socks proxy. See the instructions -- it is easy to set up and use.
What is
"caching" and why do I care?
The short answer is: Caching cancels problems. Those who cache will
see their machines busier, oftener.
Caching is any scheme that lets you have work units set aside ("in your
pocket," as it were) for later use. Reasons to cache include:
- Overcoming outages at the SETI@home site.
- Controlling "connect charges" if you have a dialup connection.
- Dealing with "brownout" conditions at SETI@home.
- Managing a local network of machines (see "farm").
- Moving some specific work unit types to machines that "crunch" them better.
- Obtaining statistics about the work units you processed and tracking progress.
For Windows users, there are many free "3rd party" products created
by SETI@home enthusiasts to manage the mechanics of caching. In addition,
some permit you to keep records of the type and duration of the work units
you run.
See our "Tips Page" for articles relating to
many available caching and monitoring tools.
Most Team Lamb Chop participants use at least one of these facilities
and love them. Typically, one machine is chosen as kind of an intermediate
server. It deals with uploading and downloading to the SETI@home site.
It, in turn, distributes work units to (usually several) other machines.
While these third party tools all seem to run on Windows, some have varying
capability of managing non-Windows machines.
Once you begin running the CLI, you really should give these tools a
chance. There are a few exceptions to this, especially for cases where
you can't have 3rd party software running or can't have or don't want your own
centralized server (which may idle many machines if it fails). A supplementary
FAQ is available to assist with these cases if they apply to you.
How Many
Units Should I Cache?
This is a surprisingly difficult question and getting more difficult
to answer as time goes on. The most obvious answer might be "as many as
you can get." Indeed, for quite a while, this was a good answer.
Factors:
- Many members just can't abide having their machine idle. Even one per
  cent idle time is too much loss. This reality drives much of what is written
  in the forum. Our demands for uptime (at least from SETI@home) are extreme!
- We know that SETI@home sends out a given work unit more than once (largely
  for security reasons).
- We know that the first two results for a given work unit are likely to
  be more important than any additional returns (see "Miscellaneous" for
  more).
- "Brownout" conditions, a more recent phenomenon, make obtaining a lot of
  cached units at least intermittently difficult and may change how and why
  we cache.
Analysis of these factors suggests that a result ought to be returned
within about a week of when it was obtained. The sooner your result
is returned, the more certain your contribution is genuine. Results returned
after a week are still credited to your account. Formally, these "late"
results are used with the earlier results, according to the SETI@home project
designers. In that sense, they contribute. But, there are technical reasons
to doubt that the third and fourth returned answer really contribute. While
late units "count" on your stats, they won't help find ET.
Accordingly, most Team Lamb Choppers have caches of several days to
a week's worth of production. After one has run SETI@home for a while,
one will know how many units this is. Having a week's worth "in hand" reflects
experiences like the time the cable was cut and SETI@home was offline for
many days. It has cost little or nothing, up to now, to have a "deep" cache
of this kind.
Only the new factor of "brownouts" could cause this very popular strategy
to change. Brownouts make filling a cache much more difficult to do. We
may eventually (and reluctantly) have to settle for fewer units in the
cache. The reasons for this are complex and are dealt with elsewhere (see the Tips
page for an appropriate article).
Some Team Lamb Chop members, writing in Ars Technica's forum, may go
so far as to suggest that it is impossible to enjoy the project without
the kinds of "deep" caches we're used to. Don't believe it. The author,
who has been forced by unusual circumstances to run without caching, can
state with confidence that for all the effort involved, the difference
between caching and not caching is only about three per cent a year.
For now, one should get two to seven days' worth of units in the cache
and wait to see if brownouts change our strategy. So far, brownouts are
intermittent, which means one can have the cache depth one wants nearly
all the time.
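As a rough illustration of turning "days of production" into a unit count,
here is a minimal Python sketch. The function name cache_depth and the
example farm (three machines at eight hours per work unit) are made up for
illustration; plug in the run times you actually see on your own machines.

```python
import math


def cache_depth(hours_per_unit: float, machines: int, days: float) -> int:
    """Work units needed to keep `machines` busy for `days` with no downloads.

    Assumes every machine runs 24 hours a day and takes about the same
    time per work unit, which is a simplification.
    """
    if hours_per_unit <= 0:
        raise ValueError("hours_per_unit must be positive")
    if machines < 0 or days < 0:
        raise ValueError("machines and days must not be negative")
    units_per_machine_per_day = 24.0 / hours_per_unit
    # Round up so the cache never runs dry a little early.
    return math.ceil(units_per_machine_per_day * machines * days)


if __name__ == "__main__":
    # Hypothetical farm: 3 machines, each taking 8 hours per work unit.
    for days in (2, 7):
        print(f"{days} days of cache: {cache_depth(8.0, 3, days)} units")
```

For a mixed farm, run the calculation per machine and add up the results
rather than averaging the run times.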
How do I
know if my machine is as fast as it should be?
The Team Lamb Chop site has a standard benchmark. It is a specific work
unit we have set aside just so we can measure our machines. Someone long
ago processed it and returned the result to SETI@home.
What we do is run it in a special mode provided by the SETI client software
just to see how long the work unit processing takes. The result isn't uploaded
nor is a new work unit downloaded. The client itself records the run time
to do the work unit processing, which is far and away the majority of the
time spent. That run time is recorded on this site by many participants.
See: bench/303results.htm
for results for many computers.
Keep in mind that these are often highly tuned machines. See the next
question.
Hey, my
machine is a lot slower than the benchmark results tables show, why?
There are a lot of reasons.
One is that many participants "overclock" their machines. Many of our
participants own machines that they know how to "tweak" in special ways.
Their BIOS allows them to do things ordinary users would never dream of.
This enables them to do two major things: Run the CPU faster than its official
MHz rating and, almost as important, run the main memory of their machine
faster than its official rating. Either can make the machine much faster
than it "looks." These tricks are too varied and arcane to list here. But,
Ars Technica is a great place to learn how to do them when and if you are
ready.
More basic slowdowns come from taking the defaults on the Windows GUI.
If you take the defaults, the screen saver runs all the time when SETI@home
is running. Moreover, SETI@home shuts down when not screen saving. Both
of these really stretch out the amount of time a work unit may take. It
is easy enough to set the Windows screen saver to both run the GUI and
to have the screen "blank" fairly soon after the GUI starts up. Access
the control panel and "display" to do this. This greatly speeds the SETI@home
time. It is also important, for high performance, to set the GUI to the
"run continuously" mode. This is a simple option in the SETI@home program
itself. Right click on the little SETI@home antenna and you'll soon find
this option. Better still, switch to the CLI (command line interface)
and that will, by itself, ensure faster work unit processing.
Another reason is that work units vary in how much processing time they cost. If you
are within 20 percent of the posted benchmark times, you are probably doing
OK. If you want that last 20 percent, or just want help, a posting in the
Ars Technica fora will get you prompt answers.
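The 20 percent rule of thumb is easy to check with a few lines of Python.
This sketch assumes "within 20 percent" means your run time is no more than
20 percent longer than the posted time; the function names and the sample
times are illustrative only.

```python
def benchmark_gap(my_hours: float, posted_hours: float) -> float:
    """Fraction by which my run time exceeds the posted benchmark time.

    A negative value means my machine beat the posted time.
    """
    if posted_hours <= 0:
        raise ValueError("posted_hours must be positive")
    return (my_hours - posted_hours) / posted_hours


def probably_ok(my_hours: float, posted_hours: float,
                tolerance: float = 0.20) -> bool:
    """True if my time is within `tolerance` of the posted time."""
    return benchmark_gap(my_hours, posted_hours) <= tolerance


if __name__ == "__main__":
    # Hypothetical numbers: the table says 10.0 hours, my machine took 11.5.
    gap = benchmark_gap(11.5, 10.0)
    print(f"Slower than posted by {gap:.0%}; OK? {probably_ok(11.5, 10.0)}")
```

Compare against a table entry for the same CPU, clock speed, and client
version, since the posted times are often from highly tuned machines.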
Should I
buy one or more computers to run SETI@home?
As you may have gathered, many of us run more than one machine on SETI@home.
But, it is a major personal watershed to actually buy a machine whose sole
or principal purpose is to run SETI@home.
Doing this is a very personal decision. It is not required for team
membership. That said, some people get the 'bug' very badly and do buy
their own machines (often "stripped down" in various ways so that they
really are just for SETI). However, the history of the 3.03 client (see
"History" later on) reminds us of what could happen and, to a degree, has
happened a couple of times now. If you buy a machine just for this project,
you must be prepared to see arbitrary changes made to the client software.
Most of them will 'devalue' how many work units you will be able to produce.
If you know this and understand this, then you can make an informed decision
about building up a SETI farm. You have been warned. That said, there's
a lot to be learned about building systems on the cheap, running Linux,
and overclocking standard Intel or AMD boxes that come from this.
How does
one "borg" machines at work?
There's a fine art to this. Always remember the other person is doing
you a favor. When I approach someone about running SETI@home on their machine,
I always prominently offer to run SETI@home for the benefit of whoever
has the machine. That is, I offer them the credit on their user ID.
With near universality, they are happy to let me have the credit instead
and are interested in letting me run the project, business conditions
permitting. Two cautions: 1) Get permission. People have been criminally
prosecuted (really) for running this stuff without permission. Also, do
you want to be grousing in the unemployment line about the dumbo who fired
you for running SETI? 2) Don't interfere with real work, ever. You don't
want to talk to your boss about crashing the month end report program so
you could squeeze off a few work units. There's plenty of idle time --
I've left machines alone for weeks until the time was ripe. I'm thousands
of WU richer for it. I have also written bits of operational code
to make my running invisible and painless. There are techniques
to run SETI@home out of the system tray, which can help you get permission
to run on work machines, including when its owner has signed off, but left
the machine running.
Aren't there
other machines than Intel machines involved in this project? How do I get
my system involved?
There are plenty of non-Intel machines involved. The author of this FAQ
runs many non-Intel machines himself. But, it is a fact of life that Intel
CPUs dominate the world in terms of raw CPU count, with the Intel-compatible
AMD rapidly moving into second place, if not already there. So, in terms
of sheer volume, postings on the project (especially for custom-built machines)
will sometimes be so Intel-oriented as to drown out other voices as a matter
of sheer demographics. And, in our group, AMD machines seem to have at
least equal footing with Intel these days. Since they are largely compatible,
a lot of comments for one apply to the other.
But, there is a substantial inventory of other machine types. As always,
the Macintosh crowd has made a good showing. All major Unix boxes are there
in force; Sun had been the leading team in terms of production for most
of the SETI@home period (through March of 2001 when we overtook them for
the number one team spot). Indeed, by now, nearly anything with any market
share has a SETI@home client. Most CPU types have not only their "home"
operating system (e.g. OS/400, Solaris) but also a Linux version available.
Things like BeOS, BSD, and OS/2 are also available if you like those operating
systems, so even the Intel crowd is fully represented by its various OS
alternatives.
Your machine and its operating system probably can be set up for SETI.
See the "text only" download page at the SETI site and look for your combination.
What are
motivations besides sheer science to run SETI@home?
Many of us get very excited about SETI@home as a sheer competition.
For some, this is the entire motivation to participate. Just like fishing
or golf, SETI@home can be done for fun or as a near blood-sport. Some even
admit that SETI is as addicting as either golf or fishing.
The only downside is that some people lose sight of the science. See
"Hacking".
I see references
to "gauntlets". What's that about?
Many members challenge each other as individuals or as organized "subteams"
to short contests to see who can produce the most over some relevant interval,
usually several weeks. Some of us have added resources that we do not always
apply to SETI@home. "Gauntlets" can bring such resources to bear on producing
more SETI@home results for individuals and for the team at large.
Has anyone
cheated and done any hacking of the SETI@home project?
Yes. However, the damage to the project itself so far appears to be
minimal. Most hackers seem motivated by "putting up big work unit completion
numbers," so the known hacks have been crude and easily segregated from
valid results. You may sometimes see arguments about the hacks that have
been done, and whether it has hurt the project. The controversy is: Have
any "hack" results been accepted as valid and if so, how many and what
does it mean?
The SETI@home FAQ admits that it has certain added information that
goes back with each result to help prove that the regular SETI client code
created the result. This has not been perfect, but it also means some work
needs to be done to hack in. In addition, they have also revealed in newsgroups
that they send out each work unit more than once and require "at least"
two results to be returned before a work unit is discarded. Even simple
modelling suggests requiring multiple results by itself is a very powerful
limit on hacks (to say nothing of hardware bugs; we also know some
overclocked machines have turned in incorrect results).
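Here is the kind of simple model meant above, as a Python sketch. It rests
on two assumptions that are ours, not SETI@home's: bogus results turn up
independently at some fixed rate, and a bogus result only slips through if
every copy returned for that work unit is bogus (otherwise it disagrees with
an honest copy and gets noticed). The function name is illustrative.

```python
def undetected_fraction(bogus_rate: float, copies: int) -> float:
    """Fraction of work units where every returned copy is bogus.

    bogus_rate: assumed chance that any single returned result is bad.
    copies:     number of independent results required per work unit.
    """
    if not 0.0 <= bogus_rate <= 1.0:
        raise ValueError("bogus_rate must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return bogus_rate ** copies


if __name__ == "__main__":
    # Hypothetical rate: 1 result in 100 is bad (hacked or hardware error).
    for copies in (1, 2, 3):
        frac = undetected_fraction(0.01, copies)
        print(f"{copies} result(s) required: {frac:.6%} of units at risk")
```

Under these assumptions, going from one required result to two cuts the
exposure from one unit in a hundred to one in ten thousand.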
More fundamentally, any positive results will be reanalyzed by the SETI@home
scientists themselves, which will catch any bogus positive results. Thus,
any hacks that stay hidden would have to report "no interesting signal
found."
Ultimately, the most hackers can expect to do is reduce the number of
work units the project has available to process. When discovered, hacked
results are purged from the SETI statistics, which makes their "big numbers"
rather irrelevant.
Things aren't all sweetness and light, however. The most significant
known hack was self-confessed and of long duration. The hackers weren't
detected before the confession in that case. We've since seen certain "participants"
suddenly disappear from the project statistics. So, it is clear that some
of this is detectable. In a few cases, we helped the SETI@home administrators
find such people.
What's a replay attack and why should I care?
The most recent hack attack was simultaneously less technically challenging
and yet more exciting to participants. Basically SETI@home advertised that it has a check in
case someone tries the simplest hack of all -- simply sending back the same result
multiple times. The SETI@home client will return (and re-return) any result file it sees.
It turns out that if you work things just so, you can replay (return) the same
result to SETI@home over and over again without getting caught.
The reason SETI@home's check doesn't always work is thought to be a bit of economy. There is no doubt
that any replay unit can be easily detected and eliminated. We know this because the
SETI@home administrators have done so for individual cheaters. Since the returns
are of validly crunched units, all this attack really does is boost up someone's
participation statistics. This gets honest participants excited, but at least there's no harm
to the project.
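To show why a replayed result is easy to spot in principle, here is a small
Python sketch. This FAQ has no details on how SETI@home actually does its
check, so everything here is hypothetical: the ReplayFilter class, the idea
of fingerprinting the result file with SHA-256, and the user names are all
made up for illustration.

```python
import hashlib


class ReplayFilter:
    """Gives credit only the first time a user returns a given result file."""

    def __init__(self):
        self._seen = set()  # fingerprints of (user, result) pairs already credited

    def accept(self, user_id: str, result_bytes: bytes) -> bool:
        """Return True for a first-time result, False for a replay."""
        # The zero byte separates the user id from the file contents.
        payload = user_id.encode("utf-8") + b"\0" + result_bytes
        fingerprint = hashlib.sha256(payload).hexdigest()
        if fingerprint in self._seen:
            return False
        self._seen.add(fingerprint)
        return True


if __name__ == "__main__":
    checker = ReplayFilter()
    print(checker.accept("alice", b"result-file-contents"))  # first return
    print(checker.accept("alice", b"result-file-contents"))  # replayed copy
    print(checker.accept("bob", b"result-file-contents"))    # different user
```

The user id is part of the fingerprint because the same work unit goes out
to more than one participant, so two honest users may return matching
results. The cost of a check like this is mostly in storing and looking up
fingerprints for every result ever returned.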
At this writing, most of the "replay" units seem to have been eliminated,
though some of the cheaters may be trying again with different user identifiers. If so,
we can expect these units to be removed whenever the SETI@home administrators feel like it.
Still, it was exciting for a while since the SETI@home administrators were not
"feeling like it." It took press attention for them to take organized action. One
of our own members, fragile, was particularly energetic in uncovering and exposing this so
that action was taken at long last. This action protects the participants'
statistics (i.e. if you're in 540th place, the other 539 ahead of you are honest)
and renders this attack pointless in the extreme. Before SETI@home
took action, some of us were wondering if cheaters were going to dictate
things like who wins the team competition. Now it looks like there may be a few
small scale cheaters left flying below the radar, but any greedy cheaters (are there
any other kind, really?) will eventually get zeroed out.
I notice
a phrase WTK a lot. What's that?
See the "Hacking" question. One of the self-confessed hackers had a name that
began with a K. It has become a sort of swear word. In many forums, WTF
is "what the (expletive)." Substitute the name and you get the idea.
Some of those long duration hackers were members of Team Lamb Chop for
a while. This is a bit of an embarrassment, in fact, but one can't control
who joins a given SETI team. Suffice to say that when they confessed, they
were read the riot act by us. They did have the good grace to leave our
team before confessing, taking the unwanted bogus work units with them,
but since they confessed in our Ars Technica forum, we still feel the sting
of their time with us. Hence, things like WTK.
This project has lived long enough to have a history. This section covers
a lot of this history. If you don't want to spend time on this now, come
back when the references in the forum get too obscure to follow.
I notice
there's a discussion about the "3.03 client" or the "2.4 client." Why has
SETI@home created so many versions of its code?
SETI@home has exceeded its own expectations. This has created both problems
and opportunities. It has created problems in that its sponsoring university
has been forced to put an arbitrary "cap" on how much communications bandwidth
it consumes. It has created opportunities in that one response to the large
number of participants, and the bandwidth problem, has been to add to the
amount of signal processing done in each work unit. This was done when
the current 3.03 client was created. This has resulted in several versions
of the code, each one looking a bit harder than the last for ET, and (except for
the 3.0 version), each taking
longer on the very same machine to calculate a work unit.
But, if
they need to conserve bandwidth, won't they someday have to confront this
directly instead of just adding more and more analysis to the client code?
Unknown. Veteran Team Lamb Chop posters strongly suspect this will be
true. However, the SETI@home team has so far claimed they keep adding new
science analysis to each client. Many prominent Team Lamb Chop participants,
however, strongly suspect that the 3.03 client did not add significant
scientific value over its immediate predecessor, 3.0, which delivered faster performance
for some added science (the only time this happened). When the March 2002
brownout problems began, postings from the project designers in the various
SETI@home newsgroups seem to have tacitly confirmed this long-held belief or
at least said there was nothing left to add.
They now appear to be attacking the bandwidth problem more directly. However,
at this project's scale, the costs of the bandwidth will be significant
and the project may end up with a more-or-less permanent bandwidth limit.
We are preparing ourselves for such a situation, if it occurs. But, SETI@home
has so far sidestepped this issue.
Can someone
explain why there's so much fuss about 3.03?
While it has pretty well died down now, you'll still see comments about
the 3.03 client. There are a few reasons. One is that 3.03 came out in
comparative haste, suggesting that accountants more than scientists drove
the analysis added over 3.0. By contrast, SETI@home had long trumpeted the improved
science and speed of the immediate predecessor, 3.0, which took longer to arrive.
The 3.03 client is much slower than the 3.0 client: anywhere from 60 percent
slower to twice as slow. It is suspected of including a lot of marginal processing
simply to reduce bandwidth requirements at the main site. SETI@home conceded
all but the "marginal" part, both when 3.03 came out and again around March of
2002, when bandwidth apparently hit some internal limits.
Some also don't like 3.03 because it (like its immediate predecessor, 3.0)
was re-coded to be less sensitive to large L2 caches. Earlier clients were
much faster with large L2 caches. Why does this matter? Some serious SETI@home
fans purchase their own machines just for SETI@home. Back then, some
of these users were buying older and fairly inexpensive 400 MHz Xeon
processors with large caches. With the 2.4 client, these gave results
comparable to regular 800 MHz Pentiums, which were hot machines at the time.
When SETI refocussed its emphasis on its then real-world inventory (that
is, Pentium III 256 KB cache Coppermine chips, P IIs of all kinds, and
Celerons), even the 3.0 client came as an unpleasant surprise to these
Xeon owners. The 3.03 only added to the injury. The 400 MHz Xeons performed
like ordinary (and cheaper) 400 MHz Pentiums because SETI reduced its memory
sensitivity. It still cares about memory speed, but it now cares substantially
less than before.
Even without this, the net effect is that the 3.03 client greatly reduced
the value of computers purchased just for SETI@home.
Finally, the 3.03 client is the first to replace the old clients
entirely. This was done by the draconian means of invalidating the original
web address of SETI@home as 3.03 "crossed over." The "forced march" nature
of this upgrade created practical problems for anyone who had managed to
get SETI running on a lot of machines.
What is
the lifetime of a work unit and why does it matter?
The exact details of the "life of a typical work unit" are not fully
known. Some details may be held back simply because they haven't gotten
around to telling anyone. Some may be held back for anti-hacking reasons.
But, a lot of interested, smart people have made some good surmises.
The site www.roving-mouse.com/setiathome
has a pretty good-looking analysis of the probable work flow. A work unit
starts life as part of a long, continuous recorded signal at a radio telescope
(usually, the "big one" at Arecibo). Tapes are sent from Arecibo to the
SETI site in Berkeley. The tapes are "broken up" or "split" into work units
of about 350,000 bytes each.
Once the WU is born, it has a name, is recorded in some data base at
SETI@home, and is part of a fairly large pool (probably about 150,000 work
units minimum) to be handed out to SETI@home clients. If roving mouse has
this right, a typical work unit is currently shipped out between 2.4 and
4 times. This has also been informally admitted by various SETI@home officials
in the fora.
What happens when the work unit results come back is less clear. What
is known is that once two results come in, the work unit can be deleted
(not made available for further crunching). That means that the first two
results for a work unit certainly contribute to the project. SETI officials
have said that every unit returned, no matter how late, will be used in
a resolution procedure to determine whether the WU has anything to do with
hearing an alien signal.
But, there is actually some room to doubt this, at least in practice.
It is certainly clear that the first two results, if agreeing within the
bounds of floating point accuracy, contribute, because it has a very practical
value in eliminating error and fraud.
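As a rough illustration of what "agreeing within the bounds of floating
point accuracy" could mean in practice, here is a minimal Python sketch.
The result fields (power, chirp_rate), the tolerance value, and the function
name are all made up for this example; the real SETI@home comparison
procedure is not known in this detail.

```python
import math

# Hypothetical tolerance; the value the project actually uses is not known.
REL_TOL = 1e-5

def results_agree(a, b, rel_tol=REL_TOL):
    """Return True if two result dicts report the same fields and every
    value matches within a relative tolerance."""
    if a.keys() != b.keys():
        return False
    return all(math.isclose(a[k], b[k], rel_tol=rel_tol) for k in a)

# Made-up results from three holders of the same work unit.
first = {"power": 24.130001, "chirp_rate": -0.4512}
second = {"power": 24.130002, "chirp_rate": -0.4512}
bogus = {"power": 99.0, "chirp_rate": -0.4512}

print(results_agree(first, second))
print(results_agree(first, bogus))
```

Two honest machines differ only by rounding noise, so they pass; a broken
or faked result has to match an independent copy to get through, which is
the practical value of the second result.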
Work units that come in after that may officially participate in some
sort of resolution procedure, but if they are virtually identical to the
first two results, then all those later results really did was waste
SETI@home's time later on.
For enthusiasts, this matters, because if one's strategy for caching
delays the return of results too long, it means that SETI@home will have
handed out each of their work units to at least two others and those others
would have returned theirs long since. This means that one's crunching
could be only for personal statistics. Being one of the first two "holders"
of a given work unit to return data will ensure, under any scheme, that
the work counts the most significantly.
The obvious answer is to make sure results are returned as soon as possible.
Strategies that "dump" caches infrequently put the value of the crunch
somewhat at risk.
The current best guess (informed by a little crude modelling) is that
work units returned within a week (certainly within two or three days)
are highly likely to contribute to the project under any scenario.
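For anyone who wants to play with this kind of crude model, here is one
possible Python sketch. It is not the model behind the guess above. It
assumes that each other holder of the same work unit returns a result
independently, after an exponentially distributed delay; the function name,
the number of other holders, and the mean delay are all placeholder
assumptions to be replaced with your own estimates.

```python
import math

def chance_in_first_two(days_held, other_holders=2, mean_return_days=7.0):
    """Probability that fewer than two of the other holders have returned
    a result by the time we return ours, days_held days after download.

    Assumes independent, exponentially distributed return delays.
    """
    # Chance that any one other holder has already returned by then.
    p = 1.0 - math.exp(-days_held / mean_return_days)
    none_back = (1.0 - p) ** other_holders
    one_back = other_holders * p * (1.0 - p) ** (other_holders - 1)
    return none_back + one_back

# Compare a few cache-holding times under the placeholder assumptions.
for days in (1, 3, 7, 14):
    print(days, round(chance_in_first_two(days), 3))
```

Whatever numbers are plugged in, the shape is the same: the longer a result
sits in a cache, the lower the chance it is one of the first two back.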
What is the VLAR problem?
VLAR means "very low angle range." The angle range is an aspect of the
work unit and affects the content of the data. There is a chart
somewhere on the SETI@home site that describes this in exhaustive detail.
Suffice it to say that what is looked for in a WU varies because some things won't be
detected at certain angle ranges.
Windows users of the client, analyzing this information, noticed that the
VLAR units were taking longer, sometimes very much longer, than one would expect
given the declared analysis that should be taking place.
We still are not certain what the cause is. Some, including this author,
think it is in some of the operating system calls (the
client checkpoints about once a minute).
Others suspect the client itself. That such angle ranges are slower in
Windows is certain. Some of the queuing software has gone so far as
to steer such units to Linux machines, which do not have the slowdown. Newcomers
to SETI@home shouldn't get too excited about this. If it bothers you, find
out how to steer VLAR units to Linux or to your slower Windows machines.
I think
the SETI client should do multi-threading. How come it doesn't handle my
multiple processor machine?
Actually, it can and it will handle multiple processors. There is nothing
magical about multi-threading versus non-multi-threading applications.
The questions on caching have already talked about this. SETI is about
as "pure" a distributed application as there has ever been. A single copy
of the SETI program (using, therefore, only one "thread") will consume
99 per cent of a single CPU in all the environments this author knows of.
Therefore, the way to use multiple CPUs is to create another directory
and start up another copy of the SETI@home client that points to the new
directory. This will still eat up about 99% of both CPUs, which is about
as good as it gets. Multi-threading the SETI client could be done, but
in terms of raw production, it has no advantage whatever over this approach.
The author has successfully operated 24-way multi-processors this way,
with all the CPUs being very effectively utilized. Simple and effective.
Moreover, it is occasionally helpful to be able to start and stop individual
copies of the program, something difficult with multi-threading. Given
that, why build something more complicated than necessary? (PS, this approach
assumes the command line interface client, but anyone running SMPs, especially
larger ones, will want and even require the command line).
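The one-directory-per-copy approach is easy to script. Here is a minimal
Python sketch that creates one working directory per copy and starts the
client in each. The function name, the directory names, and the example
client path are assumptions for illustration, and it assumes that running
the command line client from inside a directory is what makes it "point to"
that directory.

```python
import subprocess
from pathlib import Path

def launch_copies(client_cmd, base_dir, copies):
    """Start `copies` instances of client_cmd, each in its own directory.

    client_cmd is a list such as ["/path/to/client"]; no client flags are
    assumed here.
    """
    procs = []
    for i in range(copies):
        workdir = Path(base_dir) / f"seti{i}"
        workdir.mkdir(parents=True, exist_ok=True)
        # Each copy runs with its own directory as its working directory,
        # so the copies do not share files.
        procs.append(subprocess.Popen(client_cmd, cwd=str(workdir)))
    return procs

# Example for a two-CPU machine (hypothetical paths, adjust to taste):
# procs = launch_copies(["/home/me/seti/setiathome"], "/home/me/seti-work", 2)
# procs[1].terminate()   # stop one copy, leave the other crunching
```

Because each copy is a separate process, stopping or restarting one of them
does not disturb the others.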
|