Blah
I am sure those of you who can't reach this site through www.teamlambchop.com
already know that things haven't gotten any better since the IP change for
the domain. Something isn't configured correctly somewhere on the current
domain name servers. I am working on things on my end to get it
straightened out, and straightened out permanently...
Since I am not listed on TLC whois as the Registrant for the domain,
I am not authorized to change any of the info. Jason hasn't gotten
things straightened out yet, so I want to get it done myself. I have
put in for a change in the domain registry to get a little control of
things. I have also had a couple of offers for use of people's domain
name servers. Hopefully I can get things straightened out quickly.
Wheeeeeee
Hopefully everyone can see this ok. Right now the teamlambchop.com
domain should be pointed at Hagabard's
mirror site....well it isn't a mirror site right now cuz it is the main
and only site :). It has been a tough time for the site in the past
month, and hopefully the problems have been alleviated right now.
FYI, the reason for the change right now is that Jason at Nimbus
Networks is moving (himself and his operations) from Austin, TX to
California. In the meantime, the site will be hosted solely on
Hagabard's server. (Big thanks to Hagabard!) The problems
with the site access mainly centered around the upcoming move for Jason,
and he had switched to hosting the site on his home DSL line. The IP
address apparently wasn't quite static and it got changed a bit in the
middle, and access to the site for many got messed up.
Let's cross our fingers and hope things go smoooooooth.
Cruncher o' the Week
Last week TLC had another Cruncher
of the Week over at S@H. Congrats go out to Ernies for the
big honor!
UnInformix
There have been problems with the S@H servers and stats lookups for
different addons. Over at Berkeley they had upgraded their web
server and voila! It couldn't talk to the other servers. The
problem had to do with the newer version of Solaris they had on the web
server being a bit incompatible with their Informix database. The
solution? Install an older version of Solaris...as Matt
Lebofsky explains:
For the past few weeks we were
having problems with the new web server connecting to our database. This
is why user/team stats pages were being generated by a secondary web
server.
This afternoon we brought down the data server briefly and rebooted the
two web servers to fix this problem. The actual fix required us having
to fall back to an older version of the Solaris operating system. And so
far, so good. We are letting these changes incubate for a day before
calling the operation a success.
While they were working out the problems with the web
server, they had changed the address for the stats and other goodies from
setiathome.ssl.xxx.xxx to iosef.ssl.xxx.xxx. Now that they have
things "fixed" they have changed things back over to the old
address at setiathome.ssl.xxx.xxx. Now all of your addons should
work like they did before the trouble.
State of TLC
Just wanted to show ya a graph I had going in the stats but never put up
on the site. This graph shows the team production and the new work
unit averages per day (averaged over a 7 day period). The thin lines
are the actual production and new work unit values for each day.
Contrary to popular belief....TLC is still going strong! Also, to
satisfy your love for numbers....705 members turned in a result
yesterday...and 975 members have turned in a result in the past
week. Keep up the work, all.
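For anyone curious how a 7 day average like the smoothed lines could be computed, here is a minimal Python sketch. It assumes a trailing window (each day averaged with up to six days before it); the actual spreadsheet method behind the graph isn't described here, and the function name and sample numbers are made up for illustration.

```python
def trailing_average(daily_values, window=7):
    """Average each day's value with the days before it, up to `window` days."""
    averages = []
    for i in range(len(daily_values)):
        # The first few days have fewer than `window` values available,
        # so they are averaged over whatever history exists so far.
        start = max(0, i - window + 1)
        chunk = daily_values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Made-up daily work unit counts, not real TLC numbers.
sample = [7, 7, 7, 7, 7, 7, 7, 14]
print(trailing_average(sample))
```

A single big day (the 14 at the end) only nudges the average, which is why the averaged lines look so much smoother than the thin daily ones.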

Berkeley Getting
Straightened Out?
They have been working overtime getting the server problems straightened
out at Berkeley. A check of the Technical
News Reports will give a summary of what they have been doing in
the past week or so. Here is a summary of what they have been
working on:
June 20, 2001
Another busy morning:
We upgraded our version of Informix on the user
database, mostly motivated by our need to get the new web server able to
handle all the CGI functionality. We were having troubles with this
recently (in case you haven't noticed) because there is a conflict with
newer versions of Solaris and older versions of Informix. Anyway, the
upgrade itself went well.
However, this didn't totally clear up the
problems we were having - we still can't do database lookups on the new
web server, which is why all CGIs are running on an auxiliary machine,
iosef.
As well, after restarting the data server, we
found it maxing out very early, to the point where it was dropping
connections and failing when doing user database lookups. Some people
may have seen erroneous "unknown user" errors. It turns out
the old configuration file for Informix didn't exactly jibe with
the new version. After a bit of research we were able to make Informix
happy and resume normal operations.
At this point we turned the data server back
on, as well as the user lookup CGIs. We are leaving the "View
Recently Completed Results" function off, though, since this needs
to connect to the science database which is currently turned off in
order to allow us to build some desperately needed indexes. Building
indexes will vastly speed up various database queries.
By the way, "View Recently Completed
Results" is what used to be "View Last 10 Results." Why
the change? In order to make the lookups much faster. This new method
restricts the database searches to only the past few weeks, whereas the
old method had to sift through and sort two years' worth of result data.
Users were experiencing many dropped connections from these queries
taking too long to process.
So.. As it stands now we are still working on
1) getting the new web server to contact the user database, and 2)
getting the science database indexes built.
June 19, 2001
1) The troubles that were causing
random power fluctuations (and the one long power outage on June 14th)
have been fixed by the building electrician. This required a two hour
outage this morning. Our data server and web servers were out of contact
during this time. They are all currently back on line. [snip]
3) The data server is still suffering from
fallout of the lengthy science database restore this past month. The
upshot of this is the data server occasionally stops working for short
periods of time (usually about 15-20 minutes). This has been happening
regularly for the past week as we've been diagnosing and fixing this
problem. [snip]
June 14, 2001
We had an unexpected loss of power last
night that resulted in an outage of several hours. The building
electrician has found a fault in a power distribution panel. He has
applied a temporary fix but will be replacing the panel soon, most
likely Monday. This will result in an additional 2 hour outage, which we
will announce on the main page.
June 13, 2001
Today we replaced two of the three RAID cards on the science
database (the third card was replaced a couple of weeks ago). Hopefully
this will reduce, if not eliminate, the problems we have been having
with the science database RAID system.
It seems that not all of the things are fixed yet
though....oops...it's working now...wasn't working 5 minutes ago! The
stats lookup on iosef is working now...it wasn't a couple of min ago
:).
SETI Timer Updated
alistar_b posted on the Distributed Computing newsgroup that he has updated
his cool little page for SETI Timer. It is a page that lets you plug in your processor
type, CPU speed, RAM speed and WU angle range and it spits out the
estimated completion time for the work unit. Give it a shot and see
if your system is up to snuff!
Energy Reminder
Brad_H sent in word as a reminder to
all of us running distributed projects. With the energy crisis in
California and other possible energy crunches with the upcoming hot summer
months, we should try to adhere to some of the S@H requests for Running
Green.
S@H Worm
I saw this on the alt.sci.seti newsgroup a couple of days
ago....apparently the Norton Anti-Virus pages have a listing
for a S@H worm. Apparently this worm is spread using MS
Outlook, and when the worm is executed it downloads the S@H program,
starts the program and sends the results to the virus writer's
account. It also uses Outlook to propagate itself.
TLC Site Info
Many people still cannot access the main
TLC site right now. The server is up. I can access it
here from Michigan, but many others cannot. I am still not sure what
the status of the site is at the moment, but most likely the site access
problems are due to the IP address that the TLC site is using right
now. The IP address (207.8.64.130) is different from both the
primary and secondary IP addresses on the TLC domain whois. I am not
sure if this is a temporary IP address though...it is similar to the
secondary DNS entry. Guess we will have to wait and see in the
meantime. Hagabard's mirror
site is still up and hopefully it can be accessed by all.
There is a second possible cause of the access problems (just
speculation). Recently there has been quite a bit of rain and
flooding in the Houston area, and the TLC site is being served through
Austin, TX. Some of the internet connections in the area may be having
problems due to the flooding, and that may be affecting the site. There are
workarounds for getting to the TLC site that have worked for some, but not
others. One is to first access either the IP address (207.8.64.130)
or www.nimbusnetworks.com
and then try to access the TLC site. Another is to put the IP
address and www.teamlambchop.com
in your OS's HOSTS file. It may be worth a shot.
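A HOSTS file entry is just the IP address followed by the host name on one line. The Python sketch below builds such a line from the address and name mentioned above. The helper name `hosts_entry` is made up for illustration, and where the HOSTS file lives depends on your OS.

```python
import ipaddress


def hosts_entry(ip: str, hostname: str) -> str:
    """Return one HOSTS-file line that maps `hostname` to `ip`."""
    # ip_address() raises ValueError on a malformed address,
    # which catches typos before they go into the file.
    ipaddress.ip_address(ip)
    return f"{ip}\t{hostname}"


# Print the line to add by hand to your HOSTS file.
print(hosts_entry("207.8.64.130", "www.teamlambchop.com"))
```

Remember to take the line back out once the domain is sorted, or you will keep hitting the old address after the site moves.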
News From Berkeley
They are apparently still having problems with the CGI calls on the new
web server...they have redirected CGI calls for stats etc. to a different
machine. This has caused some problems in some addons such as
SetiSpy in trying to view personal stats. Roelof has a workaround
to help get your stats back for right now...you can check that out here.
If you want to check things out on your web browser, instead of pointing
to setiathome.ssl.berkeley.edu for your cgi calls, use iosef.ssl.berkeley.edu.
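If you have a script of your own that pulls stats, the same swap can be done in code. Here is a minimal Python sketch that rewrites only the host part of a URL and leaves the path and query alone. The function name and the example path are hypothetical, not real S@H CGI addresses.

```python
from urllib.parse import urlsplit, urlunsplit

MAIN_HOST = "setiathome.ssl.berkeley.edu"
AUX_HOST = "iosef.ssl.berkeley.edu"


def use_aux_host(url: str) -> str:
    """Point a URL at the auxiliary host if it targets the main host."""
    parts = urlsplit(url)
    if parts.hostname != MAIN_HOST:
        # Leave URLs for any other host untouched.
        return url
    # Replacing netloc wholesale drops any port or login info,
    # which is fine for plain http://host/path URLs like these.
    return urlunsplit(parts._replace(netloc=AUX_HOST))


# Hypothetical CGI path, for illustration only.
print(use_aux_host("http://setiathome.ssl.berkeley.edu/cgi-bin/example?x=1"))
```

Keeping the two host names in constants makes it a one line change to switch back once Berkeley moves the CGIs home again.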
They also updated the technical
news page with some more information:
June 11, 2001
The RAID cards have continued to give
us problems. Fortunately the new disk architecture (with mirroring
across controllers) has prevented downtime. Unfortunately, there has
again been corruption of the index that controls the order that
workunits are sent. We rebuilt the index on Friday. We're considering
turning off write caching on the disk controllers to prevent further
problems of this sort. We're hoping that won't cause too much of a
performance hit.
We've also (finally) found and eliminated
the bug that caused the occasional flood of "duplicate work
unit" messages.
Notice that last line....I am sure that many of you are
rejoicing over that news!
Finally there will be a server outage tomorrow (Wednesday
the 13th) for some more work on their databases:
Tomorrow (June 13) at 10:00am PDT
there will be a data server outage to replace two disk controller cards
on our science database. The outage should last about an hour.
Petition
There has been a petition
floating around concerning what some call a lack of activity over at the
S@H offices... I personally believe that the petition is a bit
misguided. Eric Korpela came through today with a response to the
petition on alt.sci.seti. I will let him tell you about it all:
>It's a respectful petition. I hope it's feasible to implement the
>recommendations ie there's the manpower to do it.
Unfortunately, it isn't yet feasible to do much of what the petition
requests.
You may not have noticed, but we've had a significant amount of problems
and down time in the last month, including restoring both the online
science database and the "master" database from backup.
That has basically put us at a standstill in postprocessing for more
than a month. A couple months ago I would have projected that we'd
be at the point of having click-plots for all of the data analyzed so
far, and would be well on the way to generating maps of double and
triple detections. At this point, I'd say we're still months away
from that point. Because of the server reliability problems we've
been having, we're considering a change to the online server that would
increase reliability at the price of slowing down the post-processing.
In response to the specific points:
1) The question of which of the 2 billion or so signals is
"interesting" depends upon us getting to the point of making
maps of double and triple detections and stellar coincidences. As
the "top 20" pages show, sorting by power or chisquared leads
to a page dominated by entirely uninteresting signals. Every
signal in the top 20 pages is either RFI or a computational error. The
list of potentially interesting signals is currently endless. A
lot of reduction still needs to be done before we can publish a real
list of "interesting" candidates.
2) The science newsletters are a reasonable indication of the status of
the postprocessing. We were at a virtual standstill prior to the
start of this year because our databases were overloaded.
Installation of the "master" database machine early this year
has sped postprocessing by orders of magnitude. Currently the
redundancy checkers and the clickplot generators are running. The
next two steps, frequency correction and zone RFI removal are waiting
until there is a significant clean dataset in the master database.
They should start shortly. I expect that a science newsletter will
be released regarding these steps. Following that there will be an
iteration of repeat finders and more complex RFI rejection. Following
the first iteration, we should be able to produce a list of interesting
candidates that has less than a billion items on it.
3) I'm not sure the word numerous really applies to our security
breaches. To date there has been one instance of a compromise of
the security of our systems (the Alf hack), and one hack that exploited
the RPC the client uses to get user information. I get the
point, however. I will suggest to David that he implement a
password change CGI. Should SETI@home II come to pass there will
also be a client password to prevent unauthorized users from returning
results using your accounts.
4) We've kicked around the idea of milestone recognition for some time,
with the usual idea being small gifts that increase in value with the
number of work units completed. In general we've been too
overwhelmed with daily operations to implement anything. We've
also considered that providing another incentive to cheat might not be
the best thing to do at this point. I find it difficult to believe
that any milestone recognition would be a source of revenue.
Above all, everyone should be aware of how limited a resource the time
of the members of the team here at Berkeley is. Because of other
constraints I've had to reduce my SETI@home time to about 30% (which
tends to be about 20 hours a week). I'm recently back from my
third trip to Korea in the last nine months. Of course my
departure marks the point at which all hell broke loose. I'm about
4 weeks behind in answering SETI@home related email. If any of us
had an extra 8 hours a month to write a newsletter (a much better
estimate than 30 minutes, this email is taking longer than that) we'd do
it. It's not that we aren't appreciative of your collective
efforts. We just don't have the time to express it.
Right now our efforts are dominated by keeping S@H running. Post
processing is the second priority, and that's where signal information
you seek would be generated. The third priority is trying to make
sure that SETI@home will continue. None of these tasks are easy.
All of them consume more man hours than we have available.
Well, it's time for me to get back to work. There's a problem with
one of the science databases that caused the failure of one program last
week. I should be fixing it...
Eric
Lots of good info in there...
Server Woes
Both TLC and S@H have been having server woes in the past week. The
main TLC site is up now, but it is using a different IP address
(207.8.64.130). Some people may not be able to access this IP
address since it is not listed as the primary or secondary address for the
site. But alas, Hagabard's
mirror server is still up :) The problems on Berkeley's side are a
bit different. Here is what they had to say about it:
June 5, 2001
This morning we replaced the old web
server machine with a much better one. All was well until we realized
the new machine couldn't talk to the user and science databases. We are
still working on the problem, and the solution may involve having to
install new versions of database software to resolve conflicts with the
latest version of Solaris. In the meantime, we installed a second web
server to handle most of the basic CGI calls and are making links to
this auxiliary web server. This is a very temporary solution. Team/stats
pages may not get updated with regularity during this time.
June 4, 2001
Over the weekend the web server stopped working. At this point we're
not exactly sure why - most likely the process table filled up. The good
news is we recently received a brand new dual processor Sun Ultra to
replace the current web server. This switch will happen within the week.
Please note that in the process of switching over user/team stats might
get out of sync for a day or so (this will be strictly cosmetic and have
nothing to do with the actual data in our database). As well, we might
shut off user stats lookups for a short time. There will be notices on
the front page as things progress.
The stats were a bit late today for two reasons.
The first has to do with the problems with their web server. My
stats script pulls the data from the main S@H site. The cgi pages
were offline, so my stats script wasn't able to download any current
data. The other problem was my bad VBA coding: with no data to work
with, one line tried to divide by zero. Of
course, Excel coughed up a hairball and stopped the script.
Berkeley set up a different server to handle the cgi pages and I pointed
my script over there to pull stats down later.
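The usual cure for that kind of hairball is to check the divisor before dividing. Here is a minimal sketch of the idea, written in Python rather than VBA just for illustration. The function and variable names are made up, and which numbers the stats script actually divides is an assumption.

```python
from typing import Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when there is nothing to divide by."""
    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical case: the download failed, so both counts came back as zero.
work_units_today = 0
active_members_today = 0
average = safe_ratio(work_units_today, active_members_today)
print("n/a" if average is None else f"{average:.2f}")
```

Returning None instead of 0 keeps a "no data" day from being mistaken for a day where nobody crunched anything.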
Because the stats are pointed to a different server right
now, SetiSpy is not currently updating the user stats. They may switch
the server address back to the previous server in the next day or two...If
they do this before I can catch the changeover, the stats for that day
*may* read 0. I will fix that when I am aware (and home) to update
the stats.
BTW - If you are interested, I updated the weekly
stats tonight :).
Art Bell Falloff?
A couple of days ago Team Art Bell slipped from the overall #5 spot to #6
behind Team MacAddict. The reason for
this dropoff was a bit strange and unexpected. The dropoff was due
to a heck of a lot of team members "leaving" the team...but it
was not a mass defection. There was a hint of what happened on
alt.sci.seti, explained by Rick Bilyeu:
Some of you may have noticed that
Team Art Bell slipped to 3rd place over the weekend, in the club team
standings. At first I was at a loss to understand how we lost 149
members and their 110808 completed work units all at once. Here's what
happened (This is from my response at the Fantastic Forum TAB message
boards.)
I hate to reply to my own post but I've nailed the answer to my own
question, with Frank's guidance. At some point over the last 4 days,
Angus, or someone else who has access, clicked on the Remove Inactive
Members link at the Seti@Home, Team Art Bell, group statistics page.
Here are the instructions of what will occur after the group founder (Or
anyone Angus gave the codes to.) enters his e-mail and access code,
taken directly from that page.
Only the founder can remove members from a team. You can determine which
groups you are the founder of by Clicking here. By removing a member,
you will also remove their credit and CPU time contributions to the
group. Group members may only be removed if they have been inactive for
more than 2 weeks.
So, there you have it. TAB lost over 100,000 completed work units, not
by an individual decision of a member to remove them, but rather the
decision of one individual. If it was Angus, well, he founded the group.
If it was someone else though, like one of the defectors (And I do know
that others had access.) or a hacker from another team, then that person
is one very sorry excuse of a human being.
In the end, I suppose what really
matters is that Berkeley was able to analyze the processed data. TAB will
continue to grow. And loyal members, who believe in the project, will
continue to crunch numbers, in the hope of finding further proof that we
are not alone.
Actually to date, the numbers of lost members and work
units are quite hefty....they lost somewhere around 120,000 work units
from their total, and nearly 1,600 members from their team (over a period
of 3 days). I checked out the mechanism of deleting people from a
team, and the team founder needs to individually select each member for
deletion. I originally thought that maybe it was a slip and someone just hit
an automatic button to delete inactive users, but no...they had to delete
the members individually. (BTW: if a member has been selected for
deletion AND that member has sent in a WU in the past two weeks that
person isn't deleted).
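The removal rule described above boils down to two conditions: the founder selected the member, and the member has been inactive for more than two weeks. Here is a minimal Python sketch of that rule. The function name, member names and dates are all made up, and this is only my reading of the team page, not Berkeley's actual code.

```python
from datetime import date, timedelta

INACTIVITY_WINDOW = timedelta(weeks=2)


def members_actually_removed(selected, last_result_dates, today):
    """Return the selected members who have been inactive for more than two weeks."""
    return [
        name
        for name in selected
        # A member who returned a result within the window is kept,
        # even though the founder selected them for removal.
        if today - last_result_dates[name] > INACTIVITY_WINDOW
    ]


# Made-up members and dates, for illustration only.
last_seen = {"alice": date(2001, 6, 1), "bob": date(2001, 5, 1)}
print(members_actually_removed(["alice", "bob"], last_seen, date(2001, 6, 8)))
```

In the example only bob would go, since alice turned in a result a week earlier.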
I find it kind of strange and questionable that a team
founder would haphazardly kick members off of a team. I sure hope
that most team founders are sensible and wouldn't do that. The only
reason I could see to kick members off a team is for those members who
were inactive and have a total of 0 WU to their account. Let's say
Caesar kicked out all inactive members from TLC...that would be roughly
2500 members or more! I definitely wouldn't like to see that happen!
SETI Germany a Force
Sneaking up on a lot of people has been SETI Germany.
They have been pushing out 4000+ work units for the past couple of
weeks. Don't worry though...TLC's place in the standings is secure
for quite a while, since SETI Germany has less than half the total that
TLC has. A closer look shows that SETI Germany the team, like the
country, is divided. There are two different teams in the top 200
overall. SETI Germany is in 9th place with over 1,400,000 work
units...but there is also a Team SETI Germany
in 61st place with 260,000 WUs. If the two teams combined, they
would move past Intel for the #8 spot and would rival or even surpass TLC
in daily production!
News From Berkeley
There was an update on the S@H
Technical News Page today. Here is a copy of what they had
to say:
June 1, 2001
What a week. We completed the restore
of our online science database last week. However, in so doing, we also
restored a corrupt index that we had recently repaired. So we had to
re-repair it. We also needed to run some update jobs on the DB, as it
was now inconsistent with both of our ondisk workunit and result queues.
We are now running a final check to make sure that everything is
consistent. The server is functioning normally, but the splitters are
temporarily off. This means that the workunit queue is static and this
makes for a small chance that a fast user may get a duplicate workunit.
At the same time, a disk on our offline master
science database crashed. We were able to swap the hardware and quickly
restore this DB. We are now using this database to reject radio
frequency interference (RFI) and look for persistent signals. SETI@home
participants have produced a rich data set! Stay tuned.
We also had a security problem. A malicious
person or persons obtained a number of user email addresses. There was
no server breakin. The perpetrator made use of a hole in our
client/server communications protocol. They obtained around 50,000 email
addresses and posted these on a web site. We see this as a significant
theft of our (and your) data and are pursuing legal action against this
person or persons. If you think you have received email from the
perpetrator, please go here.
We closed the security hole with the side effect that several fields in
the user_info.sah are now blank or zero. We realize that this is a
problem for some very cool third party add-ons and are putting some of
the fields back.
An Analysis of
Benchmarking
It appears that the work of Rat Bastard, Max, Roelof, and hundreds of
people who have posted S@H Benchmarks has spawned some more in-depth
analysis of the times of completion for S@H work units. There is a news
post on Slashdot about the study of work unit completion
times. You can check out their study
here.
TLC Site Status
If you are accessing this from tlc.hagabard.com, then you probably already
know that the main TLC site is still not up. Jason said that the
server should have been back up sometime around noon today, but that
doesn't seem to be the case. Hagabard
has offered the tlc.hagabard.com site for use as a mirror and we are going
to be mirroring the site there. I need to get the rest of the site
loaded up there though...right now I don't have the Benchmarking part of
the site available to me, but I will get in touch with Max and see if he
can upload what he has. It kind of sucks to have a mention on
Slashdot, and not even have the site up on the normal server :/.
Right now I don't know when the main site will be online, but from now on,
you can use tlc.hagabard.com as a mirror for your stats/news pleasure.
Kiddies Running Amok
The script kiddies that caused the mess with the S@H information have been
running around without supervision again. The UFCF crew ran around
yesterday spamming the alt.sci.seti and sci.astro.seti newsgroups with
porn/hate/anti-religious posts. They apparently were also spamming
the S@H user email addresses that they obtained from their scripts a
couple of days ago.
Food Court Online
OK...well not completely. The Ars
Technica Distributed Computing Food Court Page is now
online. Unfortunately Caesar hasn't changed the link on the Ars
front page, but he has changed it on the site's other pages. I need
to remind him of this oversight :)