softwareupdater's People

Contributors

aaaaalbert, choksi81, justincappos, linkleonard, lukpueh, monzum, vladimir-v-diaz

softwareupdater's Issues

software updater restart doesn't wait long enough for new software updater to start

While investigating #756, it appears that when the software updater is restarted, the old software updater doesn't always wait long enough for the new one to start. The wait time is one minute, but the new software updater doesn't signal that it's up and running until after the initialization process (which is largely pointless at the moment: #554). If initialization takes more than a minute, the old software updater writes a stop file for the newly started one, keeps running, and later tries to restart the software updater again.

Here's an example of a new software updater doing the downloads in its initialization process, which end up taking more than 90 seconds in this case:

1258628893.11:PID-2935:[do_rsync] Downloading file vessel.restrictions because it doesn't already exist at download.test/vessel.restrictions
...
1258628986.96:PID-2935:[software_updater_start] There's a stop file. Exiting.

In that same time the following is logged (unfortunately in this case, it was logged to a separate file, see #766):

1258628889.22:PID-31276:[restart_software_updater] Attempting to restart software updater.
1258628949.88:PID-31276:[restart_software_updater] Failed to restart software updater. This instance will continue.

This series of events continuously repeats and, if the initialization is always slow, a successful restart will never happen.

The simple solution would seem to be to increase the restart wait time from 1 minute to something much higher such as 20 minutes (that may sound like a lot, but what if this is a really, really slow system?).

Additionally, addressing #554 would be a good idea especially if the plan ends up being to just eliminate the pointless check (which would really speed things up).

Using writemetainfo.py creates _repy.py files in the current dir...

If you run writemetainfo.py, it will create a bunch of *_repy.py files in the current directory. Instead it probably makes sense to write these to a temp dir (using tempfile.mkdtemp?) and clean it up when done.

See the top of nmmain.py or softwareupdater.py to understand how to set up the repyhelper cache to a different dir.

Be sure to use shutil.rmtree to clean up the cache when finished.

failed to download metainfo traceback

The following tracebacks appear sporadically in the softwareupdater logs of several nodes:

1304334016.57:PID-4243:Traceback (most recent call last):
File "softwareupdater.py", line 167, in safe_download
File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/urllib.py", line 89, in urlretrieve
File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/urllib.py", line 248, in retrieve
File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/socket.py", line 309, in read
timeout: timed out
1304334016.57:PID-4243:[Failed to download http://seattle.cs.washington.edu/couvb/updatesite/0.1/metainfo
1304334016.57:PID-4243:[do_rsync] New metainfo not signed correctly. Not updating.

and

1304376470.34:PID-4243:Traceback (most recent call last):
File "softwareupdater.py", line 167, in safe_download
File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/urllib.py", line 89, in urlretrieve
File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/urllib.py", line 222, in retrieve
File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/urllib.py", line 190, in open
File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/urllib.py", line 325, in open_http
File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/httplib.py", line 856, in endheaders
File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/httplib.py", line 728, in _send_output
File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/httplib.py", line 695, in send
File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/httplib.py", line 679, in connect
IOError: [Errno socket error] timed out
1304376470.34:PID-4243:[Failed to download http://seattle.cs.washington.edu/couvb/updatesite/0.1/metainfo
1304376470.34:PID-4243:Traceback (most recent call last):
File "softwareupdater.py", line 830, in <module>
File "softwareupdater.py", line 753, in main
File "softwareupdater.py", line 251, in do_rsync
File "/home/uw_seattle/seattle/seattle_repy/emulfile.py", line 144, in emulated_open
File "/home/uw_seattle/seattle/seattle_repy/emulfile.py", line 316, in __init__
IOError: [Errno 2] No such file or directory: '/tmp/tmp7g1VKB/metainfo'

The updater should probably just log that the download timed out and not give a traceback for this sort of thing.
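A rough sketch of what "log the timeout, no traceback" could look like. It is written against Python 3's urllib rather than the Python 2.5 urllib in the tracebacks above, and the `log` and `retrieve` parameters are assumptions added so the function can be exercised without a network; it is not the existing safe_download.

```python
import socket
import urllib.error
import urllib.request


def safe_download(url, destination, log, retrieve=urllib.request.urlretrieve):
    """Return True on success; on a network failure log one line, return False."""
    try:
        retrieve(url, destination)
        return True
    except socket.timeout:
        # A timeout is an expected network condition: one line, no traceback.
        log("[safe_download] Timed out downloading " + url)
    except urllib.error.URLError as e:
        # Covers connection errors, including timeouts wrapped by urllib.
        log("[safe_download] Failed to download %s: %s" % (url, e.reason))
    return False
```

A caller that checks the return value can skip opening the metainfo file when the download failed, which would also avoid the follow-on "No such file or directory" traceback shown above.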

Software updater has failures when logging information

For some reason the software updater writes information to stderr. This causes problems because we close the stream.

testbed-mac:v2 gribble$ cat softwareupdater.old
1247510403.0:PID-36232:Traceback (most recent call last):
  File "softwareupdater.py", line 687, in <module>
  File "softwareupdater.py", line 388, in init
  File "softwareupdater.py", line 492, in fresh_software_updater
ValueError: I/O operation on closed file

software updater needs better exception logging

There are multiple calls to servicelogger.log(str(e)) in except blocks of softwareupdater.py. This does not log any traceback information so it is difficult to determine the cause of unexpected errors.
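One way to keep the traceback is to log `traceback.format_exc()` instead of `str(e)`. The sketch below assumes a logger that accepts a single string, as servicelogger.log appears to; `log_exception` and `lookup` are hypothetical names used only for illustration.

```python
import traceback


def log_exception(log, context):
    """Log the exception currently being handled, including its traceback."""
    log("%s\n%s" % (context, traceback.format_exc()))


def lookup(table, key):
    # Stand-in for code that can fail unexpectedly.
    return table[key]
```

Called from inside an except block, for example `log_exception(servicelogger.log, "[do_rsync] Unexpected error")`, this records the exception type, message, and the call stack in one log entry.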

software updater needs to sleep in potentially fast loops

There are two cases in softwareupdater.py where there could potentially be extremely fast looping, either intentionally for small periods of time or unintentionally if an error occurred in just the wrong place. We should add a sleep in both of these cases to minimize the impact of the many loop repetitions.

The first case is in software_updater_start() where there is a loop that repeatedly checks for a file to be deleted.

The second case is in the try/except which wraps main(). If an exception happened early enough in main() and the exception was recurring, it could cause fast looping.

Adding small periods of sleep is only meant to minimize the impact on clients in these cases. Were these situations to actually occur for an extended period of time, there would be a critical bug causing it.
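The two cases could be sketched roughly as follows. Function names, the half-second poll interval, the one-second retry delay, and the `max_iterations` cap (present only so the loop can be tested) are all assumptions, not values taken from softwareupdater.py.

```python
import os
import time


def wait_until_deleted(path, timeout, poll_interval=0.5,
                       sleep=time.sleep, clock=time.time):
    """Poll until path no longer exists; return False if timeout elapses."""
    deadline = clock() + timeout
    while os.path.exists(path):
        if clock() >= deadline:
            return False
        sleep(poll_interval)  # avoid spinning while the file is still there
    return True


def run_with_retry_delay(main, log, retry_delay=1.0, sleep=time.sleep,
                         max_iterations=None):
    """Call main() repeatedly, pausing after each failure."""
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        try:
            main()
        except Exception as e:
            log("main() failed: %r" % (e,))
            sleep(retry_delay)  # a recurring early exception cannot spin the CPU
```

Even a short sleep bounds the loop rate, so a recurring failure costs a few iterations per second instead of a full CPU core.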

software updater code's dry run on startup is useless as currently implemented

The software updater has an init() method that, among other things, claims to make sure everything works before the new software updater signals the previous one (the one that started it) that it is taking over, so that the old software updater can exit. However, the part of init() that does the run-through of an update is wrapped in a try/except that passes on all exceptions, so it could never cause the newly started software updater to exit.

I'll probably remove the code related to calling do_rsync() from the init() method rather than change the code to cause the software updater to exit if the rsync fails. There could be an argument for making it exit, but I think if we wanted to do that we'd want to be careful about the types of exceptions that cause it to exit.

uncaught exception in software updater...

We see this traceback. This isn't a critical error and seems to be an isolated case.

1249379434.79:PID-10887:Traceback (most recent call last):
  File "softwareupdater.py", line 74, in safe_download
  File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/urllib.py", line 89, in urlretrieve
  File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/urllib.py", line 248, in retrieve
  File "/vservers/.vref/planetlab-f8-i386/usr/lib/python2.5/socket.py", line 309, in read
timeout: timed out

Software updater CPU use at 100%

We had a bug report from Justin P. Rohrer today that said the following:

I've been running Seattle on my mac laptop since GEC 4 with no issues, until today when the softwareupdater.py script started using 100% of my CPU. If I kill it it just restarts after a few minutes and does the same thing. Do you know what could be causing this? Coincidentally I was also pushing Seattle out to the GpENI nodes today via planetlab slices, but I didn't touch the installation on my machine.

This is a critical problem which we need to address.

software updater needs more detailed logging

The software updater needs to log more details about its actions. Currently it will be hard to debug issues such as #522 because really only exceptions are being logged and this won't provide much information if there is a bug that does not result in an exception.

Softwareupdater doesn't seem to delete file, when the update deletes it.

I recently did a push on the betabox where I had added a lib file. After the push, the softwareupdater downloaded the new file. After a while, however, I pushed another version in which I had deleted the lib file in order to roll back. The newly added file was not deleted on the node even though it no longer existed in the updatesite. (That is, the new lib file did not exist in /var/www/updatesite, where all the files are stored.)

Log in softwareupdater after the first push with the new lib file.

1284067262.14:PID-29183:[do_rsync] Downloading file ShimStackInterface.repy because it doesn't already exist at ./ShimStackInterface.repy
1284067266.91:PID-29183:[do_rsync] Downloading file nmmain.py because the hash changed.
1284067268.11:PID-29183:[do_rsync] Downloading file nmclient.repy because the hash changed.
1284067278.16:PID-29183:[do_rsync] Downloading file sockettimeout.repy because the hash changed.
1284067280.31:PID-29183:[do_rsync] Downloading file shims.log because it doesn't already exist at ./shims.log
1284067284.75:PID-29183:[do_rsync] Downloading file nmadvertise.py because the hash changed.
1284067286.73:PID-29183:[do_rsync] Updating files: ['metainfo', 'ShimStackInterface.repy', 'nmmain.py', 'nmclient.repy', 'sockettimeout.repy', 'shims.log', 'nmadvertise.py']
1284067286.75:PID-29183:[restart_client] Stopping the nodemanager.
1284067286.75:PID-29183:[restart_client] Starting the nodemanager.

The new lib file that was added was ShimStackInterface.repy

After I undid the change and ran another update, here is the log in the softwareupdater log:

1284074652.02:PID-29183:[do_rsync] Downloading file nmmain.py because the hash changed.
1284074653.39:PID-29183:[do_rsync] Downloading file nmclient.repy because the hash changed.
1284074662.11:PID-29183:[do_rsync] Downloading file sockettimeout.repy because the hash changed.
1284074666.32:PID-29183:[do_rsync] Downloading file nmadvertise.py because the hash changed.
1284074671.2:PID-29183:[do_rsync] Updating files: ['metainfo', 'nmmain.py', 'nmclient.repy', 'sockettimeout.repy', 'nmadvertise.py']
1284074671.21:PID-29183:[restart_client] Obtained the lock 'seattlenodemanager', it wasn't running.
1284074671.21:PID-29183:[restart_client] Starting the nodemanager.

As can be seen the ShimStackInterface.repy file is not mentioned at all in the second push in the softwareupdater log. When I checked the node manually to see if the file still existed or if it was deleted, I found that the file was still there. I am not sure if this is the desired effect or not.
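If deleting such files turns out to be the desired behavior, one possible approach is to compare the file names listed by the previous metainfo with those listed by the new one. This is only a sketch: `files_removed_from_update` is a hypothetical helper, it takes plain lists of names, and it does not parse the real metainfo format.

```python
def files_removed_from_update(previous_names, current_names):
    """Return names listed by the previous metainfo but not by the new one."""
    return sorted(set(previous_names) - set(current_names))


# Illustrative inputs only; real lists would come from the two metainfo files.
previous = ["nmmain.py", "nmclient.repy", "ShimStackInterface.repy"]
current = ["nmmain.py", "nmclient.repy"]
stale = files_removed_from_update(previous, current)
```

Whether files in `stale` should really be removed from the node is a policy question, since a file could also have been created locally rather than by an update.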

software updater has no way to do OS-specific updates

The software updater provides the same updates to all OSs. This means, for example, that in order to push out a new version of the python interpreter for Windows, we'd have to push the various Windows python interpreter files to installations on every OS.

software updater will stop trying to restart under some conditions

The software updater recognizes when it needs to restart and sets the restartme variable that is local to main(). The problem is that some exceptions can be passed up from main to the loop that exists in global scope in the if __name__ == '__main__': block. That loop that's in the global scope catches all exceptions and calls main() again.

The problem is that, in the unfortunate event of such an exception occurring between the time of recognizing that a restart needs to be done and a successful restart happening, the software updater forgets that it wanted to restart itself because the restartme variable was local to main.

The solution would seem to be to make restartme global.

Note to self: don't forget the global keyword when fixing this.

Android softwareupdater process restarts very frequently

I noticed while debugging that the Android softwareupdater process restarts a lot. Looking into the softwareupdater cache folder in the file system, I found a few translated *_repy.py files with 0 length. The process ran normally after I deleted these files.

I'm not sure whether or not this is common, as the Seattle version that was running on my device was very old.
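The manual workaround could be automated along these lines. This is a sketch under the assumption that zero-length translations are always safe to delete and will be regenerated; `remove_empty_translations` is a hypothetical helper, and the cache directory path has to be supplied by the caller.

```python
import glob
import os


def remove_empty_translations(cache_dir):
    """Delete zero-length *_repy.py files in cache_dir; return removed paths."""
    removed = []
    for path in sorted(glob.glob(os.path.join(cache_dir, "*_repy.py"))):
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            os.remove(path)
            removed.append(path)
    return removed
```

This treats the symptom only; it does not explain how the empty files were written in the first place.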

software updater test broken: nodemanager restart fails due to hardcoded length in test script

The change of length of the version string in r2414 broke part of the software updater tests (but not the software updater itself). The tests are relying on the version string being an expected length.

The file softwareupdater/test/test_updater.py needs to be changed to allow nmmain.py to have a version string of any length.
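One length-independent way to rewrite the version line is a regular expression that matches the whole quoted string. This is a sketch that assumes nmmain.py contains a single assignment of the form `version = "..."`, as in the syntax errors in the output below; `set_version` is a hypothetical helper, not the code in the test script.

```python
import re

# Matches an assignment like: version = "anything without quotes"
_VERSION_LINE = re.compile(r'^(\s*version\s*=\s*)"[^"\n]*"', re.MULTILINE)


def set_version(source, new_version):
    """Return source with the first version assignment rewritten."""
    def _replace(match):
        # Keep the original left-hand side, swap only the quoted value.
        return '%s"%s"' % (match.group(1), new_version)

    new_source, count = _VERSION_LINE.subn(_replace, source, count=1)
    if count != 1:
        raise ValueError("no version assignment found")
    return new_source
```

Because the whole quoted value is replaced, a shorter new version cannot leave trailing characters from the old one behind.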

Current test output below.

~/workspace/seattle-trunk/tests/updater1$ python test_updater_local.py 
Writing initial metainfo...                                                         
Copying files to /tmp/tmplveiq3/noup folder...                                      
Copying files to wronghash directory                                                
Changing nmmain...                                                                  
Writing updated nmmain.py metainfo...                                               
Warning, 'nmmain.py' has only a hash or file size change but not both (how odd).    
Copying files to /tmp/tmplveiq3/updatenmmain folder...                              
Copying files to corruptmeta folder...                                              
Writing badly signed metainfo                                                       
Copying files to /tmp/tmplveiq3/badkeysig folder...                                 
Changing softwareupdater                                                            
Writing updated softwareupdater.py metainfo                                         
Warning, 'softwareupdater.py' has only a hash or file size change but not both (how odd).
Copying files to /tmp/tmplveiq3/updater folder...                                        
Changing nmmain...                                                                       
Writing metainfo with new valid key                                                      
Warning, 'nmmain.py' has only a hash or file size change but not both (how odd).         
Copying files to /tmp/tmplveiq3/updater_new folder...                                    
Copying back files from noup folder...                                                   
Generating key...                                                                        
Writing config file...                                                                   
Writing vessel dictionary...                                                             
listening for connection on:  128.208.4.16                                               
/tmp/tmplveiq3/noup/                                                                     
Test type: -x URL: http://128.208.4.16:12345/    [ PASS ]                                


listening for connection on:  128.208.4.16
/tmp/tmplveiq3/wronghash/                 
Test type: -e URL: http://128.208.4.16:12345/    [ PASS ]


listening for connection on:  128.208.4.16
/tmp/tmplveiq3/badkeysig/                 
Test type: -x URL: http://128.208.4.16:12345/    [ PASS ]


listening for connection on:  128.208.4.16
/tmp/tmplveiq3/corruptmeta/               
Test type: -e URL: http://128.208.4.16:12345/    [ PASS ]


listening for connection on:  128.208.4.16
/tmp/tmplveiq3/updatenmmain/              
Test type: -u URL: http://128.208.4.16:12345/    [ PASS ]


listening for connection on:  128.208.4.16
/tmp/tmplveiq3/updater/                   
Test type: -u URL: http://128.208.4.16:12345/    [ PASS ]


listening for connection on:  128.208.4.16
/tmp/tmplveiq3/updater_new/               
Test type: -u URL: http://128.208.4.16:12345/    [ PASS ]


listening for connection on:  128.208.4.16
Initial ps out:                           
justin   30292 29994  0 11:07 pts/3    00:00:00 python softwareupdater.py

  File "nmmain.py", line 117
    version = "0.2a"ng"     
                     ^      
SyntaxError: invalid syntax 
Old softwareupdater returned correctly
After ps out:                         
justin   30584     1 44 11:08 pts/3    00:00:01 python softwareupdater.py 0.4863502199

softwareupdater restart success!
listening for connection on:  128.208.4.16
Waiting 2 minutes for the second update to happen
  File "nmmain.py", line 117                     
    version = "0.5a"ng"                          
                     ^                           
SyntaxError: invalid syntax                      
Second update a success!

Unit Test Failure After Building Component

The unit tests fail on a newly-built softwareupdater component (using the buildscripts):

$ python initialize.py 
Checking out repo from https://github.com/SeattleTestbed/seattlelib_v2 ...
Done!
Checking out repo from https://github.com/SeattleTestbed/portability ...
Done!
Checking out repo from https://github.com/SeattleTestbed/repy_v2 ...
Done!
Checking out repo from https://github.com/SeattleTestbed/common ...
Done!
Checking out repo from https://github.com/SeattleTestbed/nodemanager ...
Done!
$ git build.py -t
git: 'build.py' is not a git command. See 'git --help'.
$ python build.py -t
Building into /home/vlad/projects/seattletestbed/softwareupdater/RUNNABLE
Done building!
$ cd ../RUNNABLE/
$ python utf.py -a
Testing module: softwareupdaters
    Running: ut_softwareupdaters_testupdaterlocal.py            [ FAIL ]
--------------------------------------------------------------------------------
Standard error :
..............................Produced..............................
Traceback (most recent call last):
  File "ut_softwareupdaters_testupdaterlocal.py", line 295, in main
Exception: [do_rsync] Unable to update ntp time. Not updating.
Test type: -u URL: http://128.238.64.165:12345/    [ FAIL ]
nmmain.py was supposed to be updated, but was not included in the updatedlist
metainfo was supposed to be updated, but was not included in the updatedlist
nmmain.py should have been updated, but wasn't
metainfo should have been updated, but wasn't



..............................Expected..............................
None
--------------------------------------------------------------------------------

software updater tests fail on freebsd

The software updater tests run by the continuous build fail on freebsd.

Initial ps out:

Failure to start initially
Old softwareupdater returned correctly
After ps out:

New updater failed to start!

It seems likely this is caused by the use of 'ps -ef', which on bsd doesn't work as intended. The current test uses 'ps -aww' for Darwin. I'm inclined to change it to 'ps aux' for linux/bsd and when testbed-mac is online again see if 'ps aux' works fine there, as well.
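The platform switch could be kept in one small table. The sketch below encodes exactly the choices described above ('ps -aww' for Darwin, 'ps aux' elsewhere); whether 'ps aux' also behaves well on Darwin is still untested, and `ps_command` and `count_matching` are hypothetical helper names.

```python
import platform

# Per-platform overrides; every other system falls back to "ps aux".
_PS_ARGS = {"Darwin": ["ps", "-aww"]}
_DEFAULT_PS_ARGS = ["ps", "aux"]


def ps_command(system=None):
    """Return the ps argument list for the given (or current) platform."""
    if system is None:
        system = platform.system()
    return list(_PS_ARGS.get(system, _DEFAULT_PS_ARGS))


def count_matching(ps_output, needle):
    """Count lines of ps output that mention needle."""
    return sum(1 for line in ps_output.splitlines() if needle in line)
```

The argument list can be passed to `subprocess.check_output(ps_command())`, and the decoded output to `count_matching` to check whether a softwareupdater.py process is running.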

software updater tests fail on testbed-mac

The software updater tests have been failing when I run them on testbed-mac. It looks like I didn't keep any logs from those runs, but the error was "First software updater never died." raised from line 325 of test_updater_local.py.

Software updater crash on several nodes

Several of our nodes (117 at time of writing) are stuck on 0.1.1b and 0.1.1c. Inspection of some of these nodes revealed that the software updater was actually not running.

We found this error message in cronlog.txt on the nodes that we inspected:

Full debugging traceback:
  "/home/uw_seattle/seattle/seattle_repy/emulcomm.py", line 694, in run
  "softwareupdater.repyhelpercache/ntp_time_repy.py", line 164, in _time_decode_NTP_packet
  "softwareupdater.repyhelpercache/ntp_time_repy.py", line 157, in _time_convert_timestamp_to_float

-------------------------------

### Do the conversion / decoding for NTP.   More details about the
### format of NTP are at RFC 2030 (http://www.ietf.org/rfc/rfc2030.txt)

# this unpacks the data from the packet and changes it to a float
def _time_convert_timestamp_to_float(timestamp):
  integerpart = (ord(timestamp[0])<<24) + (ord(timestamp[1])<<16) + (ord(timestamp[2])<<8) + (ord(timestamp[3]))
  floatpart = (ord(timestamp[4])<<24) + (ord(timestamp[5])<<16) + (ord(timestamp[6])<<8) + (ord(timestamp[7]))
  return integerpart + floatpart / float(2**32)


def _time_decode_NTP_packet(ip, port, mess, ch):
  # I got a time response packet.   Remember it and notify that I got it.
  mycontext['ntp_time_received_times'].append(_time_convert_timestamp_to_float(mess[40:48]))
  mycontext['ntp_time_got_time'] = True

softwareupdater should default to HTTPS

The current update URL in the softwareupdater uses the HTTP protocol to contact the update server for every single download. The server answers with a "302 Found" status code and a redirect to the HTTPS version of the URL. From there, the client can download the actual file (such as metainfo and whatever else is to be updated). urllib handles the redirection from HTTP to HTTPS transparently.

This doubles the number of requests on all involved machines for no good reason, and effectively doubles the round-trip delay of every download for every client. We should make the softwareurl use HTTPS instead.

(Note: Ignore that the software URL points to an awkward non-NYU site. It is manually adapted for every clearinghouse setup, see wiki:Local/VersionDeployment.)
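The fix itself is just a change to the configured URL. For setups where the URL is adapted by hand, a small guard like the following could catch a plain-HTTP value; this is a sketch in Python 3, `prefer_https` is a hypothetical helper, and it assumes the server offers the same paths over HTTPS.

```python
from urllib.parse import urlsplit, urlunsplit


def prefer_https(url):
    """Rewrite http:// URLs without an explicit port to https://."""
    parts = urlsplit(url)
    # Leave non-HTTP URLs alone, and also URLs with an explicit port
    # (such as local test servers), where HTTPS may not be available.
    if parts.scheme != "http" or parts.port is not None:
        return url
    return urlunsplit(parts._replace(scheme="https"))
```

Applied once at startup, this removes the extra redirect round trip from every download without touching URLs that point at local test servers.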

software updater logging error "need more than 1 value to unpack"

After updating to 0.1j (at least, there is currently no indication that it happened before the update), some software updaters are logging the error "need more than 1 value to unpack" to the file softwareupdater.old (so, through the servicelogger).

This is likely coming from logging a str(e) for a "ValueError: need more than 1 value to unpack" exception (see #525). Such an exception can be caused in python through the following:

c = [0]
a, b = c

There may be other cases, but that is the only way I'm aware of and so is the likely culprit (that is, unpacking a list using that syntax when there is only one item in the list on the right and multiple names on the left).
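The snippet above can be turned into a small demonstration of why logging only `str(e)` is hard to interpret. `unpack_pair` and `describe` are hypothetical names; the quoted message text is the Python 2 wording, and newer interpreters word it differently, so the sketch does not depend on the exact text.

```python
def unpack_pair(values):
    """Unpack exactly two items; raises ValueError for any other length."""
    a, b = values
    return a, b


def describe(exc):
    """Return the bare message and a rendering that includes the type name."""
    return str(exc), "%s: %s" % (type(exc).__name__, exc)
```

Calling `unpack_pair([0])` raises the ValueError; the first element returned by `describe` is all that a `str(e)` log line contains, while the second at least names the exception type.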

Exception raised for software updater test.

On both ubuntu 8.10 and testbed-mac, when attempting to run the software updater test locally as described in https://seattle.cs.washington.edu/wiki/UpdaterUnitTests, I performed the following steps:

python preparetest.py -t ../seattle_test
cd ../seattle_test
cp -r ../trunk/assignments/webserver/* .
cp -r ../trunk/softwareupdater/test/* .
python test_updater_local.py

On ubuntu I got:

Writing initial metainfo...
Copying files to /tmp/tmpwVdj3s/noup folder...
Copying files to wronghash directory
Changing nmmain.py to version: 0.999a
Writing updated nmmain.py metainfo...
Copying files to /tmp/tmpwVdj3s/updatenmmain folder...
Copying files to corruptmeta folder...
Writing badly signed metainfo
Copying files to /tmp/tmpwVdj3s/badkeysig folder...
Changing softwareupdater
Writing updated softwareupdater.py metainfo
Warning, 'softwareupdater.py' has only a hash or file size change but not both (how odd).
Copying files to /tmp/tmpwVdj3s/updater folder...
Changing nmmain.py to version: 1234
Writing metainfo with new valid key
Copying files to /tmp/tmpwVdj3s/updater_new folder...
Copying back files from noup folder...
Generating key...
Writing config file...
Writing vessel dictionary...
listening for connection on:  192.168.1.101
Traceback (most recent call last):
  File "test_updater_local.py", line 252, in main
  File "test_updater_local.py", line 125, in runRsyncTest
  File "test_updater_local.py", line 225, in run_webserver
Exception: Webserver exitted with code -15

On testbed-mac I got:

Writing initial metainfo...
Copying files to /tmp/tmp9tURmB/noup folder...
Copying files to wronghash directory
Changing nmmain.py to version: 0.999a
Writing updated nmmain.py metainfo...
Copying files to /tmp/tmp9tURmB/updatenmmain folder...
Copying files to corruptmeta folder...
Writing badly signed metainfo
Copying files to /tmp/tmp9tURmB/badkeysig folder...
Changing softwareupdater
Writing updated softwareupdater.py metainfo
Warning, 'softwareupdater.py' has only a hash or file size change but not both (how odd).
Copying files to /tmp/tmp9tURmB/updater folder...
Changing nmmain.py to version: 1234
Writing metainfo with new valid key
Copying files to /tmp/tmp9tURmB/updater_new folder...
Copying back files from noup folder...
Generating key...
Writing config file...
Writing vessel dictionary...
Traceback (most recent call last):
  File "test_updater_local.py", line 252, in main
  File "test_updater_local.py", line 125, in runRsyncTest
  File "test_updater_local.py", line 225, in run_webserver
Exception: Webserver exitted with code 98

Note: we've tested r3050 and it's ready to push once this is fixed

cleanup and improve software updater tests

The software updater tests need to be cleaned up, made easier to run, and made possible to automate (at least some of them should be possible to automate on some OSs).

The tests need to make errors clearer: they currently report success when only parts of the test succeeded and other parts failed, e.g. #531. At a minimum, if it is too difficult to state success with absolute certainty, the output of the tests should make clear that apparent success does not mean actual success and provide, in the output, the information the human running the tests needs in order to determine whether the test was truly successful.

software updater "remote location" tests broken

The software updater tests that use a remote location appear to be broken. I think this is just a case of bad or possibly non-deterministic tests rather than the software updater being broken, as I manually tested the software updater quite a bit yesterday.

I'm not considering this a dup of #532 because it is more than just cleanup: the tests don't actually seem to work. More specifically, some of the individual tests in the beginning pass, but failure seems to happen when trying to restart the software updater.

Here's my output following the instructions at UpdaterUnitTests (I'm not sure I'd consider these unit tests):

/tmp/remotetestfiles/mytestdir$ python test_runupdate.py http://seattle.cs.washington.edu/jsamuel/updatertests/9/
Test type: -x URL: http://seattle.cs.washington.edu/jsamuel/updatertests/9/noup/    [ PASS ]


Test type: -e URL: http://seattle.cs.washington.edu/jsamuel/updatertests/9/wronghash/    [ PASS ]


Test type: -x URL: http://seattle.cs.washington.edu/jsamuel/updatertests/9/badkeysig/    [ PASS ]


Test type: -e URL: http://seattle.cs.washington.edu/jsamuel/updatertests/9/corruptmeta/    [ PASS ]


Test type: -u URL: http://seattle.cs.washington.edu/jsamuel/updatertests/9/updatenmmain/    [ PASS ]


Test type: -u URL: http://seattle.cs.washington.edu/jsamuel/updatertests/9/updater/    [ PASS ]


Test type: -u URL: http://seattle.cs.washington.edu/jsamuel/updatertests/9/updater_new/    [ PASS ]


Initial ps out:
justin   13687 13481  0 16:39 pts/5    00:00:00 python softwareupdater.py

Return code is: None
Wrong return code! None
Second update failed to happen within 2 minutes

Webserver broken or incompatible with the software updater tests.

Running test_updater_local.py on OSX results in the following:

Writing initial metainfo...
Copying files to /var/folders/OF/OFxw4o5RGmKUU50to4TMKU+++TI/-Tmp-/tmpJ4xD5W/noup folder...
Copying files to wronghash directory
Changing nmmain...
Writing updated nmmain.py metainfo...
Warning, 'nmmain.py' has only a hash or file size change but not both (how odd).
Copying files to /var/folders/OF/OFxw4o5RGmKUU50to4TMKU+++TI/-Tmp-/tmpJ4xD5W/updatenmmain folder...
Copying files to corruptmeta folder...
Writing badly signed metainfo
Copying files to /var/folders/OF/OFxw4o5RGmKUU50to4TMKU+++TI/-Tmp-/tmpJ4xD5W/badkeysig folder...
Changing softwareupdater
Writing updated softwareupdater.py metainfo
Warning, 'softwareupdater.py' has only a hash or file size change but not both (how odd).
Copying files to /var/folders/OF/OFxw4o5RGmKUU50to4TMKU+++TI/-Tmp-/tmpJ4xD5W/updater folder...
Changing nmmain...
Writing metainfo with new valid key
Warning, 'nmmain.py' has only a hash or file size change but not both (how odd).
Copying files to /var/folders/OF/OFxw4o5RGmKUU50to4TMKU+++TI/-Tmp-/tmpJ4xD5W/updater_new folder...
Copying back files from noup folder...
Generating key...
Writing config file...
Writing vessel dictionary...
listening for connection on: 192.168.1.127
client timed out
[the line above appears 60 times in a row; repeats omitted]
/var/folders/OF/OFxw4o5RGmKUU50to4TMKU+++TI/-Tmp-/tmpJ4xD5W/noup/
client timed out
Test type: -x URL: http://192.168.1.127:12345/ [ FAIL ]
Unexpected Rsync Exception :(
Traceback (most recent call last):
File "test_rsync.py", line 107, in test_rsync
File "/Users/adadgar/Projects/seattle/trunk/soft/softwareupdater.py", line 1795, in do_rsync
IOError: [Errno 2] No such file or directory: '/var/folders/OF/OFxw4o5RGmKUU50to4TMKU+++TI/-Tmp-/tmpzF-PlC/metainfo'

client timed out
client timed out
listening for connection on: 192.168.1.127
client timed out
[the line above appears 59 times in a row; repeats omitted]
Traceback (most recent call last):
File "test_updater_local.py", line 379, in <module>
main()
File "test_updater_local.py", line 254, in main
runRsyncTest('-e', tmpserver + '/wronghash/')
File "test_updater_local.py", line 124, in runRsyncTest
webserver = run_webserver(updatefolder)
File "test_updater_local.py", line 224, in run_webserver
raise Exception('Webserver exitted with code ' + str(webserver.poll()))
Exception: Webserver exitted with code 4

Also the error message should be changed from "client timed out" to something which actually describes the problem.

IOError Tracebacks

During the development of my log analysis program I found that all of the beta nodes were logging some exception tracebacks. This specific category of log entries (IOErrors) has been seen about 80 times on all beta nodes since July 23rd, which was when the beta nodes were reinstalled. The nodes are using version 0.1r-beta-r3519.

The IOErrors fall into two sub-categories: 'No such file or directory' errors and various socket errors, each seen about 40 times. One log entry of each kind is attached below:

1280576739.02:PID-31937:Traceback (most recent call last):
File "softwareupdater.py", line 830, in
File "softwareupdater.py", line 753, in main
File "softwareupdater.py", line 251, in do_rsync
File "/home/uw_seattle/seattle/seattle_repy/emulfile.py", line 144, in emulated_open
File "/home/uw_seattle/seattle/seattle_repy/emulfile.py", line 316, in __init__
IOError: [Errno 2] No such file or directory: '/tmp/tmpzFAASg/metainfo'

1280590621.56:PID-9848:Traceback (most recent call last):
File "softwareupdater.py", line 167, in safe_download
File "/vservers/.vref/onelab-f8-i386/usr/lib/python2.5/urllib.py", line 89, in urlretrieve
File "/vservers/.vref/onelab-f8-i386/usr/lib/python2.5/urllib.py", line 222, in retrieve
File "/vservers/.vref/onelab-f8-i386/usr/lib/python2.5/urllib.py", line 190, in open
File "/vservers/.vref/onelab-f8-i386/usr/lib/python2.5/urllib.py", line 325, in open_http
File "/vservers/.vref/onelab-f8-i386/usr/lib/python2.5/httplib.py", line 856, in endheaders
File "/vservers/.vref/onelab-f8-i386/usr/lib/python2.5/httplib.py", line 728, in _send_output
File "/vservers/.vref/onelab-f8-i386/usr/lib/python2.5/httplib.py", line 695, in send
File "/vservers/.vref/onelab-f8-i386/usr/lib/python2.5/httplib.py", line 679, in connect
IOError: [Errno socket error] timed out

It seems that the 'No such file or directory' error typically occurs right after a socket timeout error.

Most of the socket errors indicate a timeout, but a few end with "IOError: [Errno socket error] (-3, 'Temporary failure in name resolution')" or "IOError: [Errno socket error] (113, 'No route to host')". These also come from safe_download at line 167 of softwareupdater.py.

The IP addresses of the nodes in question (i.e., all beta nodes) are:
192.26.179.68, 210.123.39.168, 192.33.90.196, 140.192.249.203, 133.9.81.166, 139.80.206.132, 200.0.206.203, 137.148.16.10, 128.112.139.28, 140.109.17.180 and 131.193.34.21
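
Since the 'No such file or directory' error tends to follow a socket timeout, one possible guard is to confirm that the download actually produced the metainfo file before opening it. The following is only a sketch: `fetch_metainfo` and the `download` callable are hypothetical names, not taken from softwareupdater.py, and the base URL is assumed to end in '/'.

```python
import os


def fetch_metainfo(download, url, destdir):
    """Return the path of a freshly downloaded metainfo file, or None.

    `download(url, path)` is a hypothetical stand-in for the updater's
    real download routine; it may raise IOError on socket errors.
    """
    path = os.path.join(destdir, "metainfo")
    try:
        download(url + "metainfo", path)
    except IOError:
        # Timeouts, name resolution failures, and unreachable hosts land here.
        return None
    # A download that failed without raising leaves no file behind.
    if not os.path.isfile(path):
        return None
    return path
```

A caller would skip the update check for this round when `None` is returned, instead of failing later with a traceback.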

Add user-friendly error message to `generatekeys.py`

While generatekeys.py isn't the most frequently used script in our repos, it should nevertheless print meaningful information when called with wrong or too few arguments. Let the usage() functions in other code inspire you.

albert$ python generatekeys.py 
Traceback (most recent call last):
  File "generatekeys.py", line 14, in <module>
    pubfn = sys.argv[1]+'.publickey'
IndexError: list index out of range
albert$ python generatekeys.py keyname abcdef
Traceback (most recent call last):
  File "generatekeys.py", line 18, in <module>
    keylength = int(sys.argv[2])
ValueError: invalid literal for int() with base 10: 'abcdef'
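
One way to replace both tracebacks with a readable message is to validate the arguments up front. This is a minimal sketch, assuming both the key name and the key length are required; `parse_args`, `usage`, and `main` are illustrative names, and the real script may treat the length differently.

```python
import sys


def usage():
    return "Usage: python generatekeys.py <keyname> <keylength>"


def parse_args(argv):
    """Return (keyname, keylength), or raise ValueError with a readable message."""
    if len(argv) != 3:
        raise ValueError(usage())
    keyname = argv[1]
    try:
        keylength = int(argv[2])
    except ValueError:
        raise ValueError("keylength must be an integer, got %r\n%s"
                         % (argv[2], usage()))
    if keylength <= 0:
        raise ValueError("keylength must be positive, got %d\n%s"
                         % (keylength, usage()))
    return keyname, keylength


def main(argv):
    """Return a process exit code instead of letting a traceback escape."""
    try:
        keyname, keylength = parse_args(argv)
    except ValueError as err:
        sys.stderr.write(str(err) + "\n")
        return 1
    # Key generation itself would go here; it is omitted from this sketch.
    return 0
```

The script's entry point would then call `sys.exit(main(sys.argv))`.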

software updater cannot perform update if servicelogger breaks

The software updater uses the servicelogger for logging messages. If the servicelogger breaks (that is, if it reliably throws an exception whenever it's initialized or when it's used), the software updater will not be able to run.

All usages of the servicelogger should be wrapped in try/except blocks and the software updater should default to printing to stderr if the servicelogger fails. This way nodes with a broken servicelogger can still update (e.g. to fix the servicelogger).

Note: there are no known cases of this having been a problem.
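
The fallback described above could be sketched as a small wrapper. `safe_log` is a hypothetical name, and `log_func` stands in for whichever servicelogger call the updater uses; nothing here is taken from the actual code.

```python
import sys


def safe_log(message, log_func=None):
    """Log via log_func; fall back to stderr if it is missing or fails.

    Returns True if log_func handled the message, False if stderr was used.
    """
    if log_func is not None:
        try:
            log_func(message)
            return True
        except Exception:
            # Deliberately broad: a broken logger must never stop an update.
            pass
    sys.stderr.write(str(message) + "\n")
    return False
```

Initialization of the servicelogger would need the same treatment, so that a failure there leaves `log_func` unset rather than aborting the updater.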

Softwareupdater test needs to be migrated to repyV2

The old softwareupdater test needs to be properly migrated to repyV2. The old test ran against a preprocessed webserver that used registerhttpcallback. This is no longer viable: the new webserver should use httpserver and should not be preprocessed.

generatekeys.py gives incorrect key values

When generatekeys.py is run over an existing key, it produces incorrect key files. Below is the observed behavior, as also reported by albert:

$ python generatekeys.py abc 4096
Generating key files called 'abc.publickey' and 'abc.privatekey' of length 4096.
This may take a moment...
Success!
$ wc abc*
0 3 2469 abc.privatekey
0 2 1239 abc.publickey
0 5 3708 total
$ openssl md5 abc*
MD5(abc.privatekey)= a510da7973bce6e114a7277ab1d7c122
MD5(abc.publickey)= f700ed9253d5f4f54d5054c89d39fe9e

Okay, that's the 4096-bit key's stats. Recreate using the same key name but shorter (!) length:

$ python generatekeys.py abc 1024
Generating key files called 'abc.publickey' and 'abc.privatekey' of length 1024.
This may take a moment...
Success!
$ wc abc*
0 5 2469 abc.privatekey
0 2 1239 abc.publickey
0 7 3708 total
$ openssl md5 abc*
MD5(abc.privatekey)= f2d4f2f15071705b11a1efa926f8ff94
MD5(abc.publickey)= 1c25c843d8dd171e6f3280650bfecc3c

Ouch, file size didn't change, but contents changed. This must not happen.

For comparison, a proper 1024-bit key has much smaller files:

$ python generatekeys.py def 1024
Generating key files called 'def.publickey' and 'def.privatekey' of length 1024.
This may take a moment...
Success!
$ wc def*
0 3 619 def.privatekey
0 2 314 def.publickey
0 5 933 total
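
The report does not identify the cause. If it turns out that the existing file is not truncated before the new key is written (an unconfirmed assumption), one way to rule out stale bytes is to write to a temporary file and then swap it into place. `write_file_fresh` is a hypothetical helper, not part of generatekeys.py, and the sketch targets Python 3.

```python
import os
import tempfile


def write_file_fresh(path, data):
    """Write `data` to `path` so that no bytes of an older file survive."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmppath = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmpfile:
            tmpfile.write(data)
        # Replace the old file in one step; readers see old or new, never a mix.
        os.replace(tmppath, path)
    except Exception:
        if os.path.exists(tmppath):
            os.remove(tmppath)
        raise
```

With this approach, regenerating a shorter key over a longer one yields a file whose size matches the new key.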

lots of software updater requests to the webserver in a small time period.

We are seeing lots of requests by the software updater in a short time period. For example, the requests below are all within a 2 second period. This is almost certainly an indication of other, yet to be discovered problems.

128.208.1.186 2009:11:10:01
128.208.4.30 2009:11:10:01
128.208.1.214 2009:11:10:01
128.208.1.225 2009:11:10:01
128.208.4.178 2009:11:10:01
128.208.1.246 2009:11:10:01
128.208.1.235 2009:11:10:01
128.208.1.158 2009:11:10:01
128.208.1.150 2009:11:10:01
128.208.1.185 2009:11:10:01
128.208.1.169 2009:11:10:01
128.208.1.217 2009:11:10:01
128.208.1.239 2009:11:10:01
128.208.1.167 2009:11:10:01
128.208.1.115 2009:11:10:01
128.208.1.183 2009:11:10:01
128.208.1.152 2009:11:10:01
128.208.1.240 2009:11:10:01
128.208.1.131 2009:11:10:01
128.208.1.108 2009:11:10:01
128.208.1.166 2009:11:10:01
128.208.1.153 2009:11:10:01
128.208.1.117 2009:11:10:01
128.208.1.232 2009:11:10:01
128.208.1.161 2009:11:10:01
128.208.1.121 2009:11:10:01
128.208.1.247 2009:11:10:01
128.208.1.168 2009:11:10:01
128.208.1.156 2009:11:10:01
128.208.1.222 2009:11:10:01
128.208.1.179 2009:11:10:01
128.208.1.199 2009:11:10:01
128.208.1.234 2009:11:10:02
128.208.6.165 2009:11:10:02
128.208.1.249 2009:11:10:02
128.208.1.157 2009:11:10:02
128.208.1.130 2009:11:10:02
128.208.1.241 2009:11:10:02
128.208.1.238 2009:11:10:02
128.208.1.159 2009:11:10:02
128.208.1.135 2009:11:10:02
128.208.1.231 2009:11:10:02
128.208.1.224 2009:11:10:02
128.208.1.221 2009:11:10:02
128.208.1.132 2009:11:10:02
128.208.1.163 2009:11:10:02
128.208.1.114 2009:11:10:02
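
To quantify bursts like this across a whole log, the requests can be tallied per timestamp. This is a small sketch that assumes the two-column "IP timestamp" layout shown above; `requests_per_timestamp` is an illustrative name.

```python
from collections import Counter


def requests_per_timestamp(lines):
    """Count requests per timestamp from 'IP timestamp' lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            # Skip blank or malformed lines rather than guessing.
            continue
        counts[fields[1]] += 1
    return counts
```

Sorting the result by count makes the busiest seconds stand out, which may help correlate bursts with other events.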

Adding dist and nodemanager repos to softwareupdater

Currently, the softwareupdater repository contains the following files:

-rw-r--r-- 1 clienttuf clienttuf 31189 Apr 26 23:20 softwareupdater.py
drwxr-xr-x 2 clienttuf clienttuf 4096 Apr 26 23:20 scripts
-rw-r--r-- 1 clienttuf clienttuf 654 Apr 26 23:20 generatekeys.py
-rw-r--r-- 1 clienttuf clienttuf 4543 Apr 26 23:20 writemetainfo.py
drwxr-xr-x 2 clienttuf clienttuf 4096 Apr 26 23:20 test

Can we add the dist and nodemanager repositories? The softwareupdater uses preparetest.py from dist and runonce.py from nodemanager to create a local softwareupdater repository.

[Newcomer] Integration test for softwareupdater

We should monitor the state of the software updater to make sure that updates are actually being done. When the Apache HTTP -> HTTPS redirects were set up, the softwareupdater requests were redirected to trac, and this went undetected for two months.

This can perhaps be done by downloading the Linux installer on Blackbox and running softwareupdater.py directly, without installing Seattle, so as not to interfere with other integration tests.

generated repy files in updater metainfo cause a nodemanager restart on each update check

The premature push of 0.1j included files generated by repyhelper (as shown in #518). These *_repy.py files, because they get regenerated and thus modified on the client, appear to be new updates every time the software updater checks for updates. As a result, they are downloaded by the software updater and the nodemanager is restarted because files have changed.

This update check and restart would normally happen about every half an hour. However, the /home/couvb/public_html/updatesite/0.1 directory on seattle.cs had already been renamed, preventing updates from non-updated clients until the prematurely pushed 0.1j is fully triaged. Thus, this is not affecting current clients, whether updated or not.

If 0.1j turns out to be fine and is made available for update again, the current metainfo, which lists the *_repy.py files, would need to be recreated to exclude them. The better solution may be to treat this as an incentive to go straight to a 0.1k release once #518 is fixed.
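
Excluding the generated files when the metainfo is built could look roughly like the filter below. This is a sketch under assumptions: `files_for_metainfo` is a hypothetical helper, and how writemetainfo.py actually collects file names is not shown in this report.

```python
import fnmatch

# Files generated by repyhelper on the client; they must not be listed.
GENERATED_PATTERNS = ["*_repy.py"]


def files_for_metainfo(filenames, exclude_patterns=GENERATED_PATTERNS):
    """Return the file names that should be listed in the metainfo."""
    kept = []
    for name in filenames:
        if any(fnmatch.fnmatch(name, pattern) for pattern in exclude_patterns):
            continue
        kept.append(name)
    return kept
```

Because the generated files never appear in the metainfo, their regeneration on the client can no longer look like a pending update.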

ValueError Traceback in software updater

My log analysis script just found a beta node that has been logging a lot of ValueError Tracebacks recently. Below is a copy of one of the errors:

1281731565.43:PID-3786:Traceback (most recent call last):
File "softwareupdater.py", line 830, in
File "softwareupdater.py", line 753, in main
File "softwareupdater.py", line 277, in do_rsync
File "softwareupdater.repyhelpercache/signeddata_repy.py", line 438, in signeddata_shouldtrust
File "softwareupdater.repyhelpercache/signeddata_repy.py", line 359, in signeddata_split
ValueError: need more than 1 value to unpack

I didn't look into why the code is set up the way it is, but the offending line appears to split what it expects to be a large string with "!" delimiters into 6 parts (using rsplit). The error message shows that the split produced only one part, so the string contains fewer "!" characters than expected.

The node that I am seeing this on has an IP of 160.193.163.106

For some reason, this node was also not successfully upgraded to 0.1s-beta-r4015; it is still running version 0.1r-beta-r3519.
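
A defensive version of such a split would check the number of fields before unpacking, so that malformed data yields a descriptive error. The field count of 6 and the "!" delimiter come from the description above; `split_fields` is a hypothetical helper, not the actual signeddata code.

```python
def split_fields(data, expected_fields=6, delimiter="!"):
    """Split `data` from the right into exactly `expected_fields` parts."""
    parts = data.rsplit(delimiter, expected_fields - 1)
    if len(parts) != expected_fields:
        raise ValueError(
            "expected %d %r-delimited fields, found %d"
            % (expected_fields, delimiter, len(parts)))
    return parts
```

The caller can then catch the ValueError and log that the downloaded data is malformed, which is easier to diagnose than an unpacking failure.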

Software updater URL must end in '/'

The software updater gives misleading error messages if you give it a bad URL. You get:

[do_rsync] New metainfo not signed correctly. Not updating.

We observed this error because the softwareurl in softwareupdater.py lacked a trailing '/'. We should fix this.
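
One simple fix is to normalize the URL before use. This is a sketch, assuming file names are appended directly to the base URL; `normalize_base_url` is an illustrative name.

```python
def normalize_base_url(url):
    """Return `url` with exactly one trailing '/' appended if it is missing."""
    if not url.endswith("/"):
        return url + "/"
    return url
```

Alternatively, the updater could reject a URL without a trailing '/' at startup with an explicit error, rather than reporting a signature problem later.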

software updater tests fail on mac when run with preparetest -t

The software updater unit tests fail on Mac when the directory is prepared with preparetest -t. This is likely partially the result of timing, but even with a large timeout, the problem still occurs.

See the comments in ticket #400 for more information.

software updater metainfo not signed correctly

From looking at the software updater logs, it looks like all the nodes have been logging "[do_rsync] New metainfo not signed correctly. Not updating" since 2/26/2011. Additionally, we have a number of nodes running an old version (0.1s), and those nodes seem to have been installed more recently than that date.

softwareupdater.py crashes due to a missing filename needed many subprocesses deep

The traceback is:

Traceback (most recent call last):
File "softwareupdater.py", line 1711, in
servicelogger.init("softwareupdater")
File "/home/zack/Research/testInstall/seattle_repy/servicelogger.py", line 807, in init
servicevessel = get_servicevessel()
File "/home/zack/Research/testInstall/seattle_repy/servicelogger.py", line 764, in get_servicevessel
vesseldict = persist.restore_object("vesseldict")
File "/home/zack/Research/testInstall/seattle_repy/persist.py", line 124, in restore_object
raise ValueError, "Filename '"+filename+"' missing."
ValueError: Filename 'vesseldict' missing.

Race to read vesseldict...

I don't think the persist module is safe for multiple processes to use concurrently. When doing start_seattle.sh I sometimes see:

Traceback (most recent call last):
  File "softwareupdater.py", line 69, in safe_servicelogger_init
  File "/Users/justincappos/Desktop/seattle_repy/servicelogger.py", line 2574, in init
  File "/Users/justincappos/Desktop/seattle_repy/servicelogger.py", line 2527, in get_servicevessel
  File "/Users/justincappos/Desktop/seattle_repy/persist.py", line 187, in restore_object
OSError: [Errno 2] No such file or directory: './vesseldict.tmp'

Perhaps the solution is to copy the file before restoring? The software updater will only need to read the file.
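
The copy-before-read idea might be sketched as follows. `read_snapshot` is a hypothetical helper and `reader` stands in for whatever deserializes the file; this does not reflect how the persist module works internally. Note that the copy itself can still race with a writer, so this narrows the window rather than closing it.

```python
import os
import shutil
import tempfile


def read_snapshot(path, reader):
    """Copy `path` to a private temporary file and run `reader` on the copy."""
    fd, snapshot = tempfile.mkstemp()
    os.close(fd)
    try:
        shutil.copyfile(path, snapshot)
        # The reader only touches the private copy, never the shared file.
        return reader(snapshot)
    finally:
        os.remove(snapshot)
```

Because the reader works on a private copy, it never creates or removes temporary files next to the shared vesseldict.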

Improve output for softwareupdater tests

When the tests fail, it is often not obvious why from the test outputs. They need to be improved such that when an error does happen, detailed information is given.
