File Conveyor is a daemon written in Python to detect, process and sync files. In particular, it's designed to sync files to CDNs. Amazon S3 and Rackspace Cloud Files, as well as any Origin Pull or (S)FTP Push CDN, are supported. Originally written for my bachelor thesis at Hasselt University in Belgium.

Home Page: https://wimleers.com/fileconveyor

License: The Unlicense


fileconveyor's Introduction

Description
-----------
File Conveyor is designed to discover new, changed and deleted files via the
operating system's built-in file system monitor. After discovering the files,
they can optionally be processed by a chain of processors – you can easily
write new ones yourself. After files have been processed, they can also
optionally be transported to a server.

Discovery happens through inotify on Linux (with kernel >= 2.6.13), through
FSEvents on Mac OS X (>= 10.5) and through polling on other operating systems.

Processors are simple Python scripts that can change the file's base name (it
is impossible to change the path) and apply any sort of processing to the
file's contents. Examples are image optimization and video transcoding.

Transporters are simple threaded abstractions around Django storage systems.

For a detailed description of the innards of file conveyor, see my bachelor
thesis text (find it via http://wimleers.com/tags/bachelor-thesis).

This application was written as part of the bachelor thesis [1] of Wim Leers
at Hasselt University [2].


[1] http://wimleers.com/tags/bachelor-thesis
[2] http://uhasselt.be/


<BLINK>IMPORTANT WARNING</BLINK>
--------------------------------
I've attempted to provide a solid enough README to get you started, but I'm
well aware that it isn't superb. As this is just a bachelor thesis, time was
fairly limited: I opted to create a solid basis instead of an extremely
rigorously documented piece of software. If you cannot find the answer in the
README.txt, INSTALL.txt or API.txt files, then please look at my bachelor
thesis text instead. If none of those is sufficient, please contact me.


Upgrading
---------
If you're upgrading from a previous version of File Conveyor, please run
upgrade.py.



==============================================================================
| The basics                                                                 |
==============================================================================

Configuring File Conveyor
-------------------------
The sample configuration file (config.sample.xml) should be self explanatory.
Copy this file to config.xml, which is the file File Conveyor will look for,
and edit it to suit your needs.
For a detailed description, see my bachelor thesis text (look for the
"Configuration file design" section).

Each rule consists of 3 components:
- filter
- processorChain
- destinations

A rule can also be configured to delete source files after they have been
synced to the destination(s).

The filter and processorChain components are optional. You must have at least
one destination.
If you want to use File Conveyor to process files locally, i.e. without
transporting them to a server, then use the Symlink or Copy transporter (see
below).
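As an illustration, a minimal config.xml might look like the (hypothetical)
sketch below. All names, paths and credentials are placeholders; see
config.sample.xml for the authoritative list of elements and attributes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- Sources: directories to monitor. -->
  <sources ignoredDirs="CVS:.svn">
    <source name="mysite" scanPath="/var/www/mysite" />
  </sources>
  <!-- Servers: destinations files can be transported to. -->
  <servers>
    <server name="my ftp server" transporter="ftp">
      <host>ftp.example.com</host>
      <username>user</username>
      <password>pass</password>
      <url>http://static.example.com/</url>
    </server>
  </servers>
  <!-- Rules: filter + processorChain + destinations. -->
  <rules>
    <rule for="mysite" label="Images">
      <filter>
        <extensions>gif:png:jpg</extensions>
      </filter>
      <processorChain>
        <processor name="image_optimizer.KeepFilename" />
      </processorChain>
      <destinations>
        <destination server="my ftp server" path="images" />
      </destinations>
    </rule>
  </rules>
</config>
```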


Starting File Conveyor
----------------------
File Conveyor must be started by starting its arbitrator (which links
everything together; it controls the file system monitor, the processor
chains, the transporters and so on). You can start the arbitrator like this:
  python /path/to/fileconveyor/arbitrator.py


Stopping File Conveyor
----------------------
File Conveyor listens for standard signals to know when it should stop, just
as the Apache HTTP server does. Send it the TERM signal to terminate it:
  kill -TERM `cat ~/.fileconveyor.pid`

You can configure File Conveyor to store the PID file in the more typical
/var/run location on *nix:
* You can change the PID_FILE setting in settings.py to 
/var/run/fileconveyor.pid. However, this requires File Conveyor to be run with
root permissions (/var/run requires root permissions).
* Alternatively, you can create a new directory in /var/run which then no
longer requires root permissions. This can be achieved through these commands:
 1. sudo mkdir /var/run/fileconveyor
 2. sudo chown fileconveyor-user /var/run/fileconveyor
 3. sudo chmod 700 /var/run/fileconveyor
Then, you can change the PID_FILE setting in settings.py to
/var/run/fileconveyor/fileconveyor.pid, and you won't need to run File
Conveyor with root permissions anymore.


File Conveyor's behavior
------------------------
Upon startup, File Conveyor starts the file system monitor and then performs a
"manual" scan to detect changes since the last time it ran. If you've got a
lot of files, this may take a while.

Just for fun, type the following while File Conveyor is syncing:
  killall -9 python
Now File Conveyor is dead. Upon starting it again, you should see something like:
  2009-05-17 03:52:13,454 - Arbitrator                - WARNING  - Setup: initialized 'pipeline' persistent queue, contains 2259 items.
  2009-05-17 03:52:13,455 - Arbitrator                - WARNING  - Setup: initialized 'files_in_pipeline' persistent list, contains 47 items.
  2009-05-17 03:52:13,455 - Arbitrator                - WARNING  - Setup: initialized 'failed_files' persistent list, contains 0 items.
  2009-05-17 03:52:13,671 - Arbitrator                - WARNING  - Setup: moved 47 items from the 'files_in_pipeline' persistent list into the 'pipeline' persistent queue.
  2009-05-17 03:52:13,672 - Arbitrator                - WARNING  - Setup: moved 0 items from the 'failed_files' persistent list into the 'pipeline' persistent queue.
As you can see, 47 items were still in the pipeline when File Conveyor was
killed. They're now simply added to the pipeline queue again and they will be
processed once again.
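The recovery step in that log can be sketched as follows. This is a minimal
illustration, not File Conveyor's actual code: items are pickled into SQLite
tables, and on startup everything that was still "in the pipeline" when the
process died is simply re-queued.

```python
import pickle
import sqlite3

def init_table(conn, table):
    # One table per persistent data structure, each storing pickled items.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS %s "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, item BLOB)" % table
    )

def enqueue(conn, table, item):
    conn.execute("INSERT INTO %s (item) VALUES (?)" % table,
                 (pickle.dumps(item),))

def recover(conn):
    # Move every item from files_in_pipeline back into pipeline, as the
    # "Setup: moved N items" log lines above describe.
    rows = conn.execute("SELECT item FROM files_in_pipeline").fetchall()
    for (blob,) in rows:
        conn.execute("INSERT INTO pipeline (item) VALUES (?)", (blob,))
    conn.execute("DELETE FROM files_in_pipeline")
    return len(rows)
```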


The initial sync
----------------
To get a feel for File Conveyor's speed, you may want to run it in the console
and look at its output.


Verifying the synced files
--------------------------
Running the verify.py script will open the synced files database and verify
that each synced file actually exists.
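Conceptually, such a verification pass is a loop over the synced files
database (see "Understanding synced_files.db" below for its schema). The
sketch below is an illustration of the idea, not the actual verify.py:

```python
import os
import sqlite3

def verify(db_path):
    # Return the input_file paths recorded in synced_files that no longer
    # exist on disk.
    conn = sqlite3.connect(db_path)
    missing = []
    for (input_file,) in conn.execute("SELECT input_file FROM synced_files"):
        if not os.path.exists(input_file):
            missing.append(input_file)
    conn.close()
    return missing
```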




==============================================================================
| Processors                                                                 |
==============================================================================

Addressing processors
---------------------

You can address a specific processor by first specifying its processor module
and then the exact processor name (which is its class name):
- unique_filename.MD5
- image_optimizer.KeepMetadata
- yui_compressor.YUICompressor
- link_updater.CSSURLUpdater

But, it works with third-party processors too! Just make sure the third-party
package is in the Python path and then you can just use this in config.xml:
- MyProcessorPackage.SomeProcessorClass
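Resolving such a dotted address boils down to a dynamic import plus an
attribute lookup. A minimal sketch of the idea (File Conveyor's actual loader
may differ):

```python
import importlib

def resolve_processor(address):
    # "module.ClassName" -> the class object, e.g.
    # "unique_filename.MD5" -> the MD5 processor class, provided the module
    # is importable from the Python path.
    module_name, class_name = address.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```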


Processor module: filename
--------------------------
Available processors:
1) SpacesToUnderscores
   Changes a filename; replaces spaces by underscores. E.g.:
     this is a test.txt --> this_is_a_test.txt
2) SpacesToDashes
   Changes a filename; replaces spaces by dashes. E.g.:
     this is a test.txt --> this-is-a-test.txt


Processor module: unique_filename
---------------------------------
Available processors:
1) Mtime
   Changes a filename based on the file's mtime. E.g.:
     logo.gif --> logo_1240668971.gif
2) MD5
   Changes a filename based on the file's MD5 hash. E.g.:
     logo.gif --> logo_2f0342a2b9aaf48f9e75aa7ed1d58c48.gif
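The two renaming schemes can be sketched as below. This is an illustration of
the naming patterns shown above, not the real processors' code; the exact
separator and digest handling are assumptions.

```python
import hashlib
import os

def mtime_filename(path):
    # logo.gif -> logo_1240668971.gif (suffix is the file's mtime).
    root, ext = os.path.splitext(os.path.basename(path))
    return "%s_%d%s" % (root, int(os.path.getmtime(path)), ext)

def md5_filename(path):
    # logo.gif -> logo_<md5 of contents>.gif
    root, ext = os.path.splitext(os.path.basename(path))
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return "%s_%s%s" % (root, digest, ext)
```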


Processor module: image_optimizer
---------------------------------
It's important to note that all metadata is stripped from JPEG images, as that
is the most effective way to reduce the image size. However, this might also
strip copyright information, which can have legal consequences. Choose one of
the "keep metadata" classes if you want to avoid this.
When optimizing GIF images, they are converted to the PNG format, which also
changes their filename.

Available processors:
1) Max
   optimizes image files losslessly (GIF, PNG, JPEG, animated GIF)
2) KeepMetadata
   same as Max, but keeps JPEG metadata
3) KeepFilename
   same as Max, but keeps the original filename (no GIF optimization)
4) KeepMetadataAndFilename
   same as Max, but keeps JPEG metadata and the original filename (no GIF
   optimization)


Processor module: yui_compressor
--------------------------------
Warning: this processor is CPU-intensive! Since you typically don't get new
CSS and JS files all the time, it's still fine to use this. But the initial
sync may cause a lot of CSS and JS files to be processed and thereby cause a
lot of load!

Available processors:
1) YUICompressor
   Compresses .css and .js files with the YUI Compressor


Processor module: google_closure_compiler
-----------------------------------------
Warning: this processor is CPU-intensive! Since you typically don't get new
JS files all the time, it's still fine to use this. But the initial sync may
cause a lot of JS files to be processed and thereby cause a lot of load!

Available processors:
1) GoogleClosureCompiler
   Compresses .js files with the Google Closure Compiler


Processor module: link_updater
------------------------------
Warning: this processor is CPU-intensive! Since you typically don't get new
CSS files all the time, it's still fine to use this. But the initial sync may
cause a lot of CSS files to be processed and thereby cause a lot of load! Note
that this processor will skip a CSS file if not all of the files referenced
from it have been synced to the CDN yet, which means the CSS files may need to
be parsed over and over again until those referenced files have been synced.

This processor is a prime candidate for optimization. It uses the cssutils
Python module, which validates every CSS property. This is an enormous
slowdown: on a 2.66 GHz Core 2 Duo, it causes 100% CPU usage every time it
runs. This module also seems to suffer from rather massive memory leaks:
memory usage can easily top 30 MB on Mac OS X, whereas it would never go over
17 MB without this processor.

This processor will replace all URLs in CSS files with references to their
counterparts on the CDN. There are a couple of important gotchas to use this
processor module:
 - absolute URLs (http://, https://) are ignored; only relative URLs are
   processed
 - if a referenced file doesn't exist, its URL will remain unchanged
 - if one of the referenced images or fonts is changed and therefore resynced,
   and if it is configured to have a unique filename, the CDN URL referenced
   from the updated CSS file will no longer be valid. Therefore, when you
   update an image file or font file that is referenced by CSS files, you
   should modify the CSS files as well. Just modifying the mtime (by using the
   touch command) is sufficient.
 - it requires the referenced files to be synced to the same server the CSS
   file is being synced to. This implies that all the referenced files must
   also be synced to that server, or the CSS file will never get synced!
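The substitution rule itself can be illustrated with a small sketch. Note the
real CSSURLUpdater parses CSS with cssutils and consults the synced files
database; this regex-based version, with an assumed in-memory lookup table,
only demonstrates the behaviour described in the gotchas above:

```python
import re

URL_RE = re.compile(r"url\(\s*['\"]?([^'\")]+)['\"]?\s*\)")

def update_urls(css, mapping):
    # mapping: relative URL -> CDN URL (hypothetical lookup table).
    def repl(match):
        url = match.group(1)
        if url.startswith(("http://", "https://")):
            return match.group(0)  # absolute URLs are ignored
        if url not in mapping:
            return match.group(0)  # not-yet-synced files keep their URL
        return "url(%s)" % mapping[url]
    return URL_RE.sub(repl, css)
```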

Available processors:
1) CSSURLUpdater
   Replaces URLs in .css files with their counterparts on the CDN




==============================================================================
| Transporters                                                               |
==============================================================================

Addressing transporters
-----------------------

You can address a specific transporter by only specifying its module:
- cf
- ftp
- cloudfiles
- s3
- sftp
- symlink_or_copy

But, it works with third-party transporters too! Just make sure the
third-party package is in the Python path and then you can just use this in
config.xml:
- MyTransporterPackage


Transporter: FTP (ftp)
----------------------
Value to enter: "ftp".

Available settings:
- host
- username
- password
- url
- port
- path
- key


Transporter: SFTP (sftp)
------------------------
Value to enter: "sftp".

Available settings:
- host
- username
- password
- url
- port
- path


Transporter: Amazon S3
----------------------
Value to enter: "s3".

Available settings:
- access_key_id
- secret_access_key
- bucket_name
- bucket_prefix

Using more than 4 concurrent connections doesn't yield a significant speedup.


Transporter: Amazon CloudFront
------------------------------
Value to enter: "cf".

Available settings:
- access_key_id
- secret_access_key
- bucket_name
- bucket_prefix
- distro_domain_name



Transporter: Rackspace Cloud Files
----------------------------------
Value to enter: "cloudfiles".

Available settings:
- username
- api_key
- container


Transporter: Symlink or Copy
----------------------------
Value to enter: "symlink_or_copy".

Available settings:
- location
- url
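The idea behind this transporter is simple and can be sketched as follows
(an illustration of the concept, not the transporter's actual code): prefer a
symlink into the target location, and fall back to copying, e.g. on file
systems that don't support symlinks.

```python
import os
import shutil

def symlink_or_copy(src, dst):
    # Ensure the target directory exists.
    d = os.path.dirname(dst)
    if d:
        os.makedirs(d, exist_ok=True)
    try:
        if os.path.lexists(dst):
            os.remove(dst)  # replace a previously "transported" file
        os.symlink(src, dst)
        return "symlink"
    except (OSError, NotImplementedError):
        shutil.copy2(src, dst)  # fallback: plain copy with metadata
        return "copy"
```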


Transporter: Amazon CloudFront - Creating a CloudFront distribution
-------------------------------------------------------------------
You can either use the S3Fox Firefox add-on to create a distribution or use
the included Python function to do so. In the latter case, do the following:

>>> import sys
>>> sys.path.append('/path/to/fileconveyor/transporters')
>>> sys.path.append('/path/to/fileconveyor/dependencies')
>>> from transporter_cf import create_distribution
>>> create_distribution("access_key_id", "secret_access_key", "bucketname.s3.amazonaws.com")
Created distribution
    - domain name: dqz4yxndo4z5z.cloudfront.net
    - origin: bucketname.s3.amazonaws.com
    - status: InProgress
    - comment: 
    - id: E3FERS845MCNLE

    Over the next few minutes, the distribution will become active. This
    function will keep running until that happens.
    ............................
    The distribution has been deployed!




==============================================================================
| The advanced stuff                                                         |
==============================================================================

Constants in Arbitrator.py
--------------------------
The following constants can be tweaked to change where File Conveyor stores
its files, or to change its behavior.

RESTART_AFTER_UNHANDLED_EXCEPTION = True
  Whether File Conveyor should restart itself after it encountered an
  unhandled exception (i.e., a bug).
RESTART_INTERVAL = 10
  After how much time File Conveyor should restart itself, after it has
  encountered an unhandled exception. Thus, this setting only has an effect
  when RESTART_AFTER_UNHANDLED_EXCEPTION == True.
LOG_FILE = './fileconveyor.log'
  The log file.
PERSISTENT_DATA_DB = './persistent_data.db'
  Where to store persistent data (pipeline queue, 'files in pipeline' list and
  'failed files' list).
SYNCED_FILES_DB = './synced_files.db'
  Where to store the input_file, transported_file_basename, url and server for
  each synced file.
WORKING_DIR = '/tmp/fileconveyor'
  The working directory.
MAX_FILES_IN_PIPELINE = 50
  The maximum number of files in the pipeline. Should be high enough in order
  to prevent transporters from idling too long.
MAX_SIMULTANEOUS_PROCESSORCHAINS = 1
  The maximum number of processor chains that may be executed simultaneously.
  If you've got CPU intensive processors and if you're running File Conveyor
  on the web server, you'll want to keep this very low, probably at 1.
MAX_SIMULTANEOUS_TRANSPORTERS = 10
  The maximum number of transporters that may be running simultaneously. This
  effectively caps the number of simultaneous connections. It can also be used
  to have some -- although limited -- control on the throughput consumed by
  the transporters.
MAX_TRANSPORTER_QUEUE_SIZE = 1
  The maximum number of files queued per transporter. It's recommended to keep
  this low enough to ensure files are not unnecessarily waiting. If you set
  this too high, no new transporters will be spawned, because all files will
  be queued on the existing transporters. Setting this to 0 can only be
  recommended in environments with a continuous stream of files that need
  syncing. The default of 1 ensures each transporter idles as little as
  possible.
QUEUE_PROCESS_BATCH_SIZE = 20
  The number of files that will be processed when processing one of the many
  queues. Setting this too low will cause overhead. Setting this too high will
  cause delays for files that are ready to be processed or transported. See
  the "Pipeline design pattern" section in my bachelor thesis text.
CALLBACKS_CONSOLE_OUTPUT = False
  Controls whether output will be generated for each callback. (There are
  callbacks for the file system monitor, processor chains and transporters.)
CONSOLE_LOGGER_LEVEL = logging.WARNING
  Controls the output level of the logging to the console. For a full list of
  possibilities, see http://docs.python.org/release/2.6/library/logging.html#logging-levels.
FILE_LOGGER_LEVEL = logging.DEBUG
  Controls the output level of the logging to the log file. For a full list of
  possibilities, see http://docs.python.org/release/2.6/library/logging.html#logging-levels.
RETRY_INTERVAL = 30
  Sets the interval in which the 'failed files' list is appended to the
  pipeline queue, to retry to sync these failed files.


Understanding persistent_data.db
--------------------------------
We'll go through this by using a sample database I created. You should be able
to reproduce similar output on your persistent_data.db file using the exact
same commands.
Access the database using the SQLite console application:
  $ sqlite3 persistent_data.db
  SQLite version 3.6.11
  Enter ".help" for instructions
  Enter SQL statements terminated with a ";"
  sqlite>

As you can see, there are three tables in the database, one for every
persistent data structure:
  sqlite> .table
  failed_files_list  pipeline_list      pipeline_queue

Simple count queries show how many items there are in each persistent data
structure. In this case for example, there are 2560 files waiting to enter the
pipeline, 50 were in the pipeline at the time of stopping File Conveyor (these
will be added to the queue again once we restart File Conveyor) and 0 files
are in the list of failed files. Files end up in there when their processor
chain or (one of) their transporters fails.
  sqlite> SELECT COUNT(*) FROM pipeline_queue;
  2560
  sqlite> SELECT COUNT(*) FROM pipeline_list;
  50
  sqlite> SELECT COUNT(*) FROM failed_files_list;
  0

You can also look at the database schemas of these tables:
  sqlite> .schema pipeline_queue
  CREATE TABLE pipeline_queue(id INTEGER PRIMARY KEY AUTOINCREMENT, item pickle);
  sqlite> .schema pipeline_list
  CREATE TABLE pipeline_list(id INTEGER PRIMARY KEY AUTOINCREMENT, item pickle);
  sqlite> .schema failed_files_list
  CREATE TABLE failed_files_list(id INTEGER PRIMARY KEY AUTOINCREMENT, item pickle);

As you can see, the three tables have identical schemas. The type of the
stored item is 'pickle', which means that you can store any Python object in
there as long as it can be "pickled", i.e. converted to a (byte) string
representation. "Serialization" is the term PHP developers have given to this,
although pickling is much more advanced.
The Python object stored in there is the same for all three tables: a tuple of
the filename (as a string) and the event (as an integer). The event is one of
FSMonitor.CREATED, FSMonitor.MODIFIED, FSMonitor.DELETED.
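The 'pickle' column type itself can be wired up with the standard sqlite3
converter mechanism; the sketch below illustrates that pattern (File
Conveyor's own persistence layer may wire it up differently):

```python
import pickle
import sqlite3

# Declared column type 'pickle' -> unpickle values automatically on SELECT.
sqlite3.register_converter("pickle", pickle.loads)

def connect(path):
    return sqlite3.connect(path, detect_types=sqlite3.PARSE_DECLTYPES)

def put(conn, item):
    # Store any picklable Python object, e.g. a (filename, event) tuple.
    conn.execute("INSERT INTO pipeline_queue (item) VALUES (?)",
                 (pickle.dumps(item),))
```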

This file is what tracks the current state of File Conveyor. Thanks to this file,
it is possible for File Conveyor to crash and not lose any data.
Deleting this file would cause File Conveyor to lose all of its current work.
Only new (as in: after the file was deleted) changes in the file system would
be picked up. Changes that still had to be synced, would be forgotten.


Understanding fsmonitor.db
--------------------------
This database has a single table: pathscanner (which is inherited from the
pathscanner module around which the fsmonitor module is built). Its schema is:

  sqlite> .schema pathscanner
  CREATE TABLE pathscanner(path text, filename text, mtime integer);

This file is what tracks the current state of the directory tree associated
with each source. When an operating system's file system monitor is used, this
database will be updated through its callbacks. When no such file system
monitor is available, it will be updated through polling.
Deleting this file would cause File Conveyor to have to sync all files again.


Understanding synced_files.db
-----------------------------
We'll go through this by using a sample database I created. You should be able
to reproduce similar output on your synced_files.db file using the exact
same commands.
Access the database using the SQLite console application:
  $ sqlite3 synced_files.db 
  SQLite version 3.6.11
  Enter ".help" for instructions
  Enter SQL statements terminated with a ";"
  sqlite>
  
As you can see, there's only one table: synced_files.
  sqlite> .table
  synced_files

Let's look at the schema. There are 4 fields: input_file,
transported_file_basename, url and server. input_file is the full path.
transported_file_basename is the base name of the file that was transported to
the server. This is stored because the filename might have been altered by the
processors that have been applied to it, but the path cannot change. I use
this to delete the previous version of a file if a file has been modified. The
url field is of course the URL to retrieve the file from the server. Finally,
the server field contains the name you've assigned to the server in the
configuration file. Each file may be synced to multiple servers and this
allows you to check if a file has been synchronized to a specific server.
  sqlite> .schema synced_files
  CREATE TABLE synced_files(input_file text, transported_file_basename text, url text, server text);

We can again use simple count queries to learn more about the synced files. As
you can see, 845 files have been synced, of which 602 have been synced to the
server named "origin pull cdn" and 243 to the server named "ftp push cdn".
  sqlite> SELECT COUNT(*) FROM synced_files;
  845
  sqlite> SELECT COUNT(*) FROM synced_files WHERE server="origin pull cdn";
  602
  sqlite> SELECT COUNT(*) FROM synced_files WHERE server="ftp push cdn";
  243


License
-------
This application is dual-licensed under the GPL and the UNLICENSE.
  
Due to the dependencies that were initially included within File Conveyor,
which were all subject to GPL-compatible licenses, it made sense to initially
release the source code under the GPL.
Then, it was decided the UNLICENSE was a better fit.


Author
------
Wim Leers ~ http://wimleers.com/

This application was written as part of the bachelor thesis of Wim Leers at
Hasselt University.

fileconveyor's People

Contributors

andyshinn, benoitbryon, btubbs, chrisivens, davidseth, jacobsingh, niekoost,
octplane, wimleers


fileconveyor's Issues

Sync files created/updated before install

Hello Wim,

I've set up fileconveyor and it is currently syncing files to S3. However, it
only synchronizes files that have been created or updated after installing
fileconveyor.

How do I sync files that were created/updated before installing fileconveyor?

Thanks in advance for your help,

KH

Using S3 + Fileconveyor to store files and Cloudfront for Urls with CNAME

Hello,

As requested, I am posting this issue here.

I am currently synchronizing my files to S3 using Fileconveyor and need to use a subdomain with Cloudfront.

For Cloudfront I am using CNAME with http://images.example.com instead of the default distribution URL.

What or where do I need to change this so that the URLs change from

http://example-images.s3.amazonaws.com to http://images.example.com

Thanks in advance for your help,

KH

Mapping fileconveyor to several static assets

Question: How can you achieve this concept:

  • all CSS assets mapped to [say] static0.domain.com
  • all JS assets mapped to [say] static1.domain.com
  • all images mapped to [say] static2.domain.com
  • all Flash mapped to [say] static3.domain.com

I tried several combinations that seemed logical (reading the code, docs and
thesis), but got no good result for parallel asset division.

The notes given do not speak clearly of the inner code design or methods, so I
have resorted to hours of guess-and-trial work to find methods that work.

I additionally note that CSS assets which need to be re-queued for processing
ultimately miss out on compression via the yuicompressor processor: my testing
indicates that all CSS assets that survive fileconveyor's re-queue process are
eventually finalized as uncompressed assets.

Please advise - as time permits.

Automatic start of fileconveyor at boot

I am trying to figure out how to run the daemon at (re)boot of (Ubuntu) Linux.

I put this line in /etc/rc.local:

/usr/bin/python /mnt/data-store/fileconveyor/code/arbitrator.py

After reboot, I can see the python script running, but /admin/reports/status claims "The daemon is currently not running."

When I start the daemon via terminal (python /mnt/data-store/fileconveyor/code/arbitrator.py) then it works OK.

Could you please advise the best way of running fileconveyor at (Ubuntu) Linux boot?

Thanks.

Newly created files that are immediately deleted may cause File Conveyor to crash

We have svn on a cron doing updates every 1 minute on a directory. That directory is successfully synced to amazon by fileconveyor, however, we keep seeing the following in the logs every few minutes. It borks fileconveyor completely requiring a cron to constantly keep restarting fileconveyor.

It seems that for whatever reason, fsmonitor isn't adhering to the ignoredDirs, however, those svn files are not being pushed to amazon.

<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- Sources -->
  <sources ignoredDirs="CVS:.svn">
    <source name="interface" scanPath="/pathtointerface"/>

Traceback (most recent call last):
  File "/pythonlocation/ActivePython-2.5.5.7-linux-x86_64/INSTALLDIR/lib/python2.5/threading.py", line 488, in __bootstrap_inner
    self.run()
  File "build/bdist.linux-x86_64/egg/pyinotify.py", line 1415, in run
    self.loop()
  File "build/bdist.linux-x86_64/egg/pyinotify.py", line 1401, in loop
    self.process_events()
  File "build/bdist.linux-x86_64/egg/pyinotify.py", line 1185, in process_events
    watch_.proc_fun(revent)  # user processings
  File "build/bdist.linux-x86_64/egg/pyinotify.py", line 831, in __call__
    return _ProcessEvent.__call__(self, event)
  File "build/bdist.linux-x86_64/egg/pyinotify.py", line 562, in __call__
    return meth(event)
  File "/opt/fileconveyor/code/fsmonitor_inotify.py", line 164, in process_IN_MODIFY
    FSMonitor.trigger_event(self.fsmonitor_ref, event.path, event.pathname, FSMonitor.MODIFIED)
  File "/opt/fileconveyor/code/fsmonitor.py", line 142, in trigger_event
    self.callback(monitored_path, event_path, event)
  File "/opt/fileconveyor/code/arbitrator.py", line 880, in fsmonitor_callback
    if stat.S_ISDIR(os.stat(event_path)[stat.ST_MODE]):
OSError: [Errno 2] No such file or directory: '/somdir/.svn/tmp/entries'

New setting: deletion delay

When generated HTML is cached by the system that generates it, it may reference files on the CDN. Those files may have been updated and if they get assigned a unique filename by File Conveyor, the previous version should be deleted by File Conveyor. However, this deletion should be postponable for a certain amount of time (i.e. the maximum time that generated HTML may be cached), to prevent invalid resource URLs.

Files duplicating on source server

Hi Wim,
Regarding the issue that I opened on the other site (files that are duplicated
on my source server):

I followed your instructions and used the config file that you gave me. When I start the daemon, it did transfer files to my CDN server, and I have no issues there (at least, not yet). But the duplicates are only on the source server itself.

This is my current config.xml:

/home/drupal/public_html http://www.source-server.web/
<server name="ftp push cdn" transporter="ftp" maxConnections="5">
  <host>ftp.cdn-server.com</host>
  <username>username</username>
  <password>password</password>
  <url>http://www.cdn-server.com/</url>
</server>

Arbitrator not finding transporters when Django is installed

I am having trouble with the transporters being reported as not found.

2010-09-30 16:11:10,046 - Arbitrator - ERROR - The Transporter module 'transporters.transporter_cf' could not be found.
Consult the log file for details

The log shows:

2010-09-30 15:58:13,585 - Arbitrator - WARNING - Arbitrator is initializing.
2010-09-30 15:58:13,585 - Arbitrator - INFO - Loading config file.
2010-09-30 15:58:13,587 - Arbitrator.Config - INFO - Parsing sources.
2010-09-30 15:58:13,587 - Arbitrator.Config - INFO - Parsing servers.
2010-09-30 15:58:13,587 - Arbitrator.Config - INFO - Parsing rules.
2010-09-30 15:58:13,587 - Arbitrator - WARNING - Loaded config file.
2010-09-30 15:58:13,971 - Arbitrator - ERROR - The Transporter module 'transporters.transporter_cf' could not be found.

I am on Dreamhost, if that helps or hinders. I have a Python virtualenv running 2.5.2 and have installed pyinotify 0.9.0. .pyc files are being generated for the transporters I have tried: ftp and cf.

Any insight or ideas on things to check would be most welcome.

How to run daemon on debian4 (libc2.3.6 but 2.4 required)

It's a followup of http://drupal.org/node/613374

In fact this is only a problem in old debian installs. Now debian stable (5) has glibc2.7-1 but previous debian stable (4) had glibc 2.3.6.

I am chatting with dreamhost tech support checking whether it is possible to have some chroot jail for the daemon, but it seems setting up a chroot jail needs root perms.

I suppose this case will become more common: a website hosted on some stable Debian which is not updated to the next stable release, following the philosophy "if it works, don't touch it".

As you can see in http://trac.dbzteam.org/pyinotify/wiki/InstallPyinotify it needs at least Libc >= 2.4 so debian 4 can't use it

Performance and CPU usage

Hello. I have been using this daemon now with the Drupal CDN module for a few months on a test site. Right now, the stats on the Drupal reports page show 4019 files synced to Cloudfront, and 17995 files currently being synced.

One of my current challenges is system performance. For whatever reason, the arbitrator.py file starts to use huge amounts of CPU time - on the order of 11%-16% of the total CPU on the machine as determined by running PS and looking at the %CPU column. This generally happens within 1-2 minutes of starting the daemon. Needless to say this kills my overall system performance as I am serving both the Drupal DB and web from the same box.

I'm not terribly familiar with debugging Python. Any ideas as to what you might want to see to help diagnose the issue? Thanks in advance.
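One low-effort way to see where the CPU time goes is Python's built-in cProfile module. A minimal, self-contained sketch (the `busy` function below is only a stand-in for whatever the arbitrator's main loop is doing):

```python
import cProfile
import io
import pstats

def busy():
    # Stand-in workload; in practice you would profile arbitrator.py itself.
    total = 0
    for i in range(100000):
        total += i
    return total

# Profile the workload and capture the statistics.
profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

# Report the most expensive calls by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

For the real daemon, running `python -m cProfile -o arbitrator.prof arbitrator.py` for a few minutes and loading the dump with `pstats` would show whether the time is going into polling, SQLite access, or the processor chain.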

Getting Syncing Working

I've managed to get File Conveyor set up on CentOS 5 with Drupal CDN integration working in advanced mode, to the extent that it summarises correctly in the debug output:

CDN integration statistics for node

* Total number of files on this page: 16.
* Number of files available on CDNs: 0 (0% coverage).
* Total time it took to look up the CDN URLs for these files: 2.57 ms, or 0.161 ms on average per file.
* The files that are not (yet?) synchronized to the CDN:
     1. themes/garland/logo.png

However, files are being added to the sync queue, yet when I run the daemon in the console it reports that 0 files need to be synced.

What steps am I missing?

Config file is:

<!-- Sources -->
<sources ignoredDirs="CVS:.svn">
<source name="drupal" scanPath="/var/www/vhosts/default/httpdocs" documentRoot="/" basePath="/" />

</sources>

<!-- Servers -->
<servers>
<server name="Rackspace Cloud" transporter="mosso">
<username>XXXXX</username>
<api_key>XXXXX</api_key>
<container>XXXXX</container>
</server>
</servers>

<!-- Rules -->
<rules>
<rule for="drupal" label="CSS, JS, images and Flash">
<filter>
<paths>misc:profiles:modules:themes:sites/all:sites/default</paths>
<extensions>ico:js:css:gif:png:jpg:jpeg:svg:swf</extensions>
</filter>
<processorChain>
<processor name="image_optimizer.KeepFilename" />
<processor name="yui_compressor.YUICompressor" />
<processor name="google_closure_compiler.GoogleClosureCompiler" />
<processor name="link_updater.CSSURLUpdater" />
<processor name="unique_filename.Mtime" />
</processorChain>
<destinations>
<destination server="Rackspace Cloud" path="static" />
</destinations>
</rule>
</rules>

Any light that can be shed on this would be massively appreciated.

Ctrl-C signal to stop

This command can take several minutes to complete, and when restarting I sometimes get a "DatabaseError: database disk image is malformed" error from SQLite. Any ideas?
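A malformed SQLite image can at least be detected before restarting the daemon. A small sketch (the actual database filename depends on where your File Conveyor instance stores its persistent queues; `":memory:"` below just keeps the example self-contained):

```python
import sqlite3

# Point this at File Conveyor's persistent data file on your system;
# ":memory:" is only used here so the sketch runs anywhere.
db_path = ":memory:"

conn = sqlite3.connect(db_path)
# "ok" means the database passes SQLite's own consistency check;
# anything else is a list of corruption findings.
status = conn.execute("PRAGMA integrity_check").fetchone()[0]
print(status)
conn.close()
```

If the check fails, dumping the database with the sqlite3 CLI (`.dump`) and reimporting into a fresh file sometimes recovers the queues. Letting the daemon shut down cleanly after a single Ctrl-C, rather than interrupting it repeatedly, reduces the chance of corruption in the first place.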

Feedback

It would be very nice if the arbitrator could be more verbose: it sits there for minutes taking 1% CPU, and to me it looks like it's doing nothing.

Setting up File Conveyor on Mac OSX, issues with Macports

I had spent quite a bit of time trying to get File Conveyor to work on a Mac using a MacPorts build of Python 2.5.*. After installing all the dependencies, the conveyor wouldn't fully start up, but there were no errors explaining why it wasn't completing. The MacPorts Py25 was giving me crazy numbers for my Mac version: importing platform and checking platform.mac_ver() returned a tuple with:

('4294967306.4294967302.4294967300', ('', '', ''), '')

The conveyor was stopping after trying to select the FS monitor by checking my Mac version. After removing the Darwin check lines in fsmonitor.py, I ran the arbitrator and found out FSEvents wasn't installed, and isn't available for anything but Py26. I ended up reverting to Apple Python 2.6 (python_select python26-apple) and trying again. I already had the majority of the dependencies installed, including the egg, and it wound up working perfectly.
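For reference, the numbers in that broken tuple are suspicious in a telling way: 4294967306, 4294967302 and 4294967300 are 2**32 + 10, + 6 and + 4, i.e. "10.6.4" with a 32-/64-bit overflow added. A sketch of the kind of Darwin sanity check the FS monitor selection performs (a simplified guess at the logic, not the exact code in fsmonitor.py):

```python
import platform

version_string = platform.mac_ver()[0]  # "" on non-Mac systems
if version_string:
    parts = version_string.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    # A sane value looks like "10.6.4"; the broken MacPorts Python reported
    # 4294967306.4294967302.4294967300, i.e. 2**32 + 10 / + 6 / + 4, which
    # points at a 32-/64-bit overflow in the version lookup.
    fsevents_candidate = major == 10 and minor >= 5
else:
    fsevents_candidate = False
print("FSEvents-capable Mac:", fsevents_candidate)
```

With the overflowed values, any such major/minor comparison fails, so the daemon never selects FSEvents and stalls exactly as described.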

Transporter not finding file

from sets import Set, ImmutableSet
2011-03-18 17:14:56,868 - Arbitrator - WARNING - Arbitrator is initializing.
2011-03-18 17:14:56,911 - Arbitrator - WARNING - Loaded config file.
2011-03-18 17:14:57,774 - Arbitrator - WARNING - Created 'symlink_or_copy' transporter for the 'bhaskar-laptop' server.
2011-03-18 17:14:57,774 - Arbitrator - WARNING - Server connection tests succesful!
2011-03-18 17:14:57,775 - Arbitrator - WARNING - Setup: created transporter pool for the 'bhaskar-laptop' server.
2011-03-18 17:14:57,780 - Arbitrator - WARNING - Setup: initialized 'pipeline' persistent queue, contains 0 items.
2011-03-18 17:14:57,782 - Arbitrator - WARNING - Setup: initialized 'files_in_pipeline' persistent list, contains 16 items.
2011-03-18 17:14:57,785 - Arbitrator - WARNING - Setup: initialized 'failed_files' persistent list, contains 169 items.
2011-03-18 17:14:57,842 - Arbitrator - WARNING - Setup: moved 16 items from the 'files_in_pipeline' persistent list into the 'pipeline' persistent queue.
2011-03-18 17:14:57,931 - Arbitrator - WARNING - Moved 20 items from the 'failed_files' persistent list into the 'pipeline' persistent queue.
2011-03-18 17:14:57,939 - Arbitrator - WARNING - Setup: connected to the synced files DB. Contains metadata for 18 previously synced files.
2011-03-18 17:14:58,166 - Arbitrator - WARNING - Setup: initialized FSMonitor.
2011-03-18 17:14:58,189 - Arbitrator - WARNING - Fully up and running now.
2011-03-18 17:14:59,285 - Arbitrator - WARNING - Created 'symlink_or_copy' transporter for the 'bhaskar-laptop' server.
2011-03-18 17:14:59,692 - Arbitrator - WARNING - Created 'symlink_or_copy' transporter for the 'bhaskar-laptop' server.
2011-03-18 17:14:59,788 - Arbitrator.Transporter - ERROR - The transporter 'SYMLINK_OR_COPY' has failed while transporting the file '/tmp/daemon/var/www/drupal/themes/pushbutton/tabs-option-hover-rtl.png' (action: 1). Error: '[Errno 2] No such file or directory: '/tmp/daemon/var/www/drupal/themes/pushbutton/tabs-option-hover-rtl.png''.
2011-03-18 17:14:59,789 - Arbitrator.Transporter - ERROR - The transporter 'SYMLINK_OR_COPY' has failed while transporting the file '/tmp/daemon/var/www/drupal/themes/pushbutton/arrow-prev-hover.png' (action: 1). Error: '[Errno 2] No such file or directory: '/tmp/daemon/var/www/drupal/themes/pushbutton/arrow-prev-hover.png''.
2011-03-18 17:14:59,904 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/drupal/misc/collapse.js'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/drupal/misc/collapse.js.tmp'.
2011-03-18 17:14:59,922 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/drupal/themes/pushbutton/tabs-option-hover-rtl.png'. Retrying later.
2011-03-18 17:14:59,927 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/drupal/themes/pushbutton/arrow-prev-hover.png'. Retrying later.
2011-03-18 17:14:59,931 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/drupal/misc/collapse.js'. Retrying later.
2011-03-18 17:15:00,137 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/drupal/misc/tableselect.js'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/drupal/misc/tableselect.js.tmp'.
2011-03-18 17:15:00,195 - Arbitrator.Transporter - ERROR - The transporter 'SYMLINK_OR_COPY' has failed while transporting the file '/tmp/daemon/var/www/drupal/themes/pushbutton/tabs-option-off.png' (action: 1). Error: '[Errno 2] No such file or directory: '/tmp/daemon/var/www/drupal/themes/pushbutton/tabs-option-off.png''.
2011-03-18 17:15:00,290 - Arbitrator.Transporter - ERROR - The transporter 'SYMLINK_OR_COPY' has failed while transporting the file '/tmp/daemon/var/www/drupal/misc/watchdog-error.png' (action: 1). Error: '[Errno 2] No such file or directory: '/tmp/daemon/var/www/drupal/misc/watchdog-error.png''.
2011-03-18 17:15:00,374 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/drupal/modules/locale/locale.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/drupal/modules/locale/locale.css.tmp'.
2011-03-18 17:15:00,381 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/drupal/misc/tableselect.js'. Retrying later.
2011-03-18 17:15:00,385 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/drupal/themes/pushbutton/tabs-option-off.png'. Retrying later.
2011-03-18 17:15:00,392 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/drupal/misc/watchdog-error.png'. Retrying later.
2011-03-18 17:15:00,400 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/drupal/modules/locale/locale.css'. Retrying later.
2011-03-18 17:15:00,809 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/drupal/modules/block/block.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/drupal/modules/block/block.css.tmp'.
^C2011-03-18 17:15:00,954 - Arbitrator - WARNING - Signaling to stop.
2011-03-18 17:15:01,005 - Arbitrator - WARNING - Stopping.
2011-03-18 17:15:01,141 - Arbitrator - WARNING - Stopped FSMonitor.
^C2011-03-18 17:15:01,698 - Arbitrator - WARNING - Stopped transporters for the 'bhaskar-laptop' server.
2011-03-18 17:15:01,698 - Arbitrator - WARNING - 'pipeline' persistent queue contains 0 items.
2011-03-18 17:15:01,699 - Arbitrator - WARNING - 'files_in_pipeline' persistent list contains 29 items.
2011-03-18 17:15:01,699 - Arbitrator - WARNING - 'failed_files' persistent list contains 156 items.
2011-03-18 17:15:01,700 - Arbitrator - WARNING - synced files DB contains metadata for 18 synced files.
2011-03-18 17:15:01,703 - Arbitrator - WARNING - Signaling to stop.

Here is my config.xml (most of the markup was lost when pasting; only these values survived):

/var/www/daemontest
http://bhaskar-laptop/daemontest
misc:profiles:modules:themes:sites/all:sites/default
ico:js:css:gif:png:jpg:jpeg:svg:swf
modules:misc
flv:mov:avi:wmv
1000000
mov:avi:mkv
./([a-zA-Z- ])+720([a-zA-Z-_ ])_.[a-zA-Z]{3}$

Kindly help.

Cheers!
Bhaskar

processsor 'link_updater.CSSURLUpdater' has failed while processing the file

  • The processsor 'link_updater.CSSURLUpdater' has failed while processing the file '/www/sites/test.com/httpdocs/components/com_multinotify/assets/css/multinotify.css'. Exception class: <type 'exceptions.ValueError'>. Message: invalid literal for int() with base 16: 'rr'.

It seems that it can't process my CSS file...

The multinotify.css file is:

div.mnSubscriptionContainer {margin:2px 0;padding:2px 8px;background:#eee;border:1px solid #ccc;min-height:63px;}
div.mnSubscriptionContainer h3 {margin-top:2px;}
div.mnSubscriptionContainer .description {padding:0;}
div.mnSubscriptionContainer .subscribers {float:right;width:auto;height:40px;background:#fff;border:1px solid #ccc;margin:0;padding:17px 4px 8px 4px;font-size:14px;}
div.mnSubscriptionContainer .subscribers span {display:block;width:100%;text-align:center;}
div.mnSubscriptionContainer .subscribers span.number {margin:1px auto;font-weight:bold;}
span.number.xs {font-size:10px;}
span.number.s {font-size:20px;}
span.number.l {font-size:25px;}
span.number.xl {font-size:30px;}
div.mnSubscriptionContainer .multinotifyFormContainer {}
form.mnSubscriptionForm {}
.mnSusbcriptionFormLog {display:block;color:#4c8bc1;font-weight:bold; margin-top:4px; padding:8px 0 8px 30px}
.notice { background:url(/components/com_multinotify/assets/images/info.png) no-repeat scroll 0 50% transparent;border-bottom:2px solid; border-top:2px solid;}
form.mnSubscriptionForm label {float:none;margin-right:10px;margin-top:8px;}
ul.mnUnauthorized {background:url(/components/com_multinotify/assets/images/warning.png) no-repeat scroll 0 50% transparent; display:block;color:#e83030;border-bottom:2px solid; border-top:2px solid; margin:25px 0 0; padding:8px}
li.mnUnauthorized { list-style:none; padding-left:28px;font-weight:bold; }
span.mnUnauthorized {display:block;color:#red;font-weight:bold; text-align:center; border-bottom:2px solid; border-top:2px solid; margin-top:4px; padding:8px}
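The likely culprit is the last rule above: `color:#red` is not valid CSS ("red" is a keyword, so the `#` makes it look like 3-digit hex shorthand). cssutils appears to expand each shorthand digit into a pair and parse the pairs as base-16 numbers, which reproduces the exact reported error (a hedged guess at the mechanism, but the message matches):

```python
# "#red" treated as 3-digit hex shorthand expands to the pairs
# "rr", "ee", "dd"; parsing the first pair raises the reported error.
try:
    int("rr", 16)
except ValueError as exc:
    message = str(exc)
print(message)  # invalid literal for int() with base 16: 'rr'
```

Changing `color:#red` to `color:red` (or a real hex value) should let CSSURLUpdater process the file.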

No such file or directory error for CSS files.

2010-02-15 15:32:53,113 - Arbitrator.ProcessorChain -
ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/themes/bluemarine/style.css'.
Exception class: <type 'exceptions.IOError'>.
Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/themes/bluemarine/style.css.tmp'.
2010-02-15 15:32:54,197 - Arbitrator.ProcessorChain -
ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/themes/bluemarine/style-rtl.css'.
Exception class: <type 'exceptions.IOError'>.
Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/themes/bluemarine/style-rtl.css.tmp'.

Here is the rules part of my config.xml (the markup was lost when pasting; only the filter values survived):

misc:profiles:modules:themes:sites
ico:js:css:gif:png:jpg:jpeg:svg:otf:ttf:swf
CVS:.svn

Using S3Fox I see that PNG and GIF files are being uploaded from the base theme directory, just no CSS files at the moment. When I go to /tmp/daemon/var/www/html/themes/bluemarine, there are no files listed.
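The YUICompressor processor presumably shells out to Java (YUI Compressor is a Java tool) and writes its output next to the temp copy as a .tmp file; if the Java invocation fails, that file is never created and the IOError above is what surfaces later. A first diagnostic is therefore to confirm Java works at all on the box (a hedged check, not File Conveyor's own code):

```python
import subprocess

# If java is missing or broken, YUICompressor can never produce its .tmp
# output file, and every CSS/JS file fails exactly as in the log above.
try:
    result = subprocess.run(
        ["java", "-version"],
        capture_output=True,
        text=True,
    )
    java_available = result.returncode == 0
except OSError:
    java_available = False
print("java available:", java_available)
```

If Java is fine, the next thing to verify is that the daemon's user can create /tmp/daemon/... directories, since the .tmp path mirrors the source tree under there.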

Can't authenticate to cloud files

Hey,
I currently have my config.xml set up like so, to try and use Rackspace Cloud Files (part of the markup was lost when pasting; the surviving values are the username, API key and container):

brightlemon
<api_key>***</api_key>
kinetick

But when I run arbitrator all I see is this:
/var/fileconveyor/filter.py:10: DeprecationWarning: the sets module is deprecated
from sets import Set, ImmutableSet
2011-03-10 13:47:03,703 - Arbitrator - WARNING - Arbitrator is initializing.
2011-03-10 13:47:03,707 - Arbitrator - WARNING - Loaded config file.
2011-03-10 13:47:04,650 - Arbitrator - ERROR - Could not start transporter 'mosso'. Error: 'Authentication failed'.
2011-03-10 13:47:04,650 - Arbitrator - ERROR - Server connection tests: could not connect with 1 servers.
<class '__main__.ServerConnectionTestError'> Consult the log file for details.

I've checked the username and password and they're definitely correct. I tried changing the auth URL in consts.py in the cloudfiles library to the URL suggested for the UK Cloud Files servers. I've tried updating the cloudfiles library to the latest version. Nothing seems to work. Is there some crucial step I'm missing?

Sync with django-storages or provide installation instructions

django-storages has had quite an overhaul over the last few months. We should update the copy that's included with File Conveyor to the latest django-storages. Alternatively, we should stop bundling it with File Conveyor and add installation instructions for django-storages instead. After all, django-storages has been updated to work with easy_install, so installation should no longer be an issue.

config.xml settings for File Conveyor on CentOS and Drupal multi-sites

Hi Wim,

I'm testing out File Conveyor with the Drupal CDN module.

I've set up the File Conveyor daemon on our CentOS VPS with FTP push to a CacheFly CDN account. When I run the arbitrator, the daemon does not appear to be picking up any of the files of the Drupal multi-site source I specified in the config.xml file, which leads me to think I have not configured this correctly.

Our drupal core is located at: /home/devmojah/domains/mojahflow/
Multi-sites are here: /home/devmojah/domains/mojahflow/sites/sitename.com

Excerpt from config.xml

<sources ignoredDirs="CVS:.svn">
<source name="MojahFlow" scanPath="/home/devmojah/domains/mojahflow/sites/flow.mojahmedia.net" documentRoot="/home/devmojah/domains/mojahflow/sites/flow.mojahmedia.net" basepath="/home/devmojah/domains/mojahflow/sites/flow.mojahmedia.net/" />
</sources>

Below is the terminal output. No files are sent to the CDN account.

[root@host code]# python arbitrator.py -v
2011-02-16 14:12:47,743 - Arbitrator - WARNING - Arbitrator is initializing.
2011-02-16 14:12:47,746 - Arbitrator - WARNING - Loaded config file.
2011-02-16 14:12:48,007 - Arbitrator - WARNING - Created 'ftp' transporter for the 'ftp push cdn' server.
2011-02-16 14:12:48,161 - Arbitrator - WARNING - Server connection tests succesful!
2011-02-16 14:12:48,186 - Arbitrator - WARNING - Setup: created transporter pool for the 'ftp push cdn' server.
2011-02-16 14:12:48,187 - Arbitrator - WARNING - Setup: initialized 'pipeline' persistent queue, contains 0 items.
2011-02-16 14:12:48,187 - Arbitrator - WARNING - Setup: initialized 'files_in_pipeline' persistent list, contains 0 items.
2011-02-16 14:12:48,188 - Arbitrator - WARNING - Setup: initialized 'failed_files' persistent list, contains 0 items.
2011-02-16 14:12:48,188 - Arbitrator - WARNING - Setup: moved 0 items from the 'files_in_pipeline' persistent list into the 'pipeline' persistent queue.
2011-02-16 14:12:48,189 - Arbitrator - WARNING - Setup: connected to the synced files DB. Contains metadata for 0 previously synced files.
2011-02-16 14:12:48,224 - Arbitrator - WARNING - Setup: initialized FSMonitor.
2011-02-16 14:12:48,224 - Arbitrator - WARNING - Fully up and running now

FileConveyor corrupt

Hi, I am attempting to download and install File Conveyor. When I unpack the File Conveyor files I get this error message from WinRAR: ! C:\Users\Chris\AppData\Local\Temp\Rar$DI00.383\wimleers-fileconveyor-c472b96.tar-1: The archive is corrupt

Is there an error in the file, or is it safe? I can't wait to get started :).

NameError: name 'ProcessEvent' is not defined

File "/var/www/sites/all/modules/cdn/wimleers-fileconveyor-cfe98cc/code/fsmonitor_inotify.py", line 143, in
class FSMonitorInotifyProcessEvent(ProcessEvent):
NameError: name 'ProcessEvent' is not defined

The above error appears when I try to run arbitrator.py.
Does anyone know what the problem could be?
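fsmonitor_inotify.py presumably guards its pyinotify import, so when pyinotify is missing or broken, the name ProcessEvent is never defined and the class statement at line 143 fails with exactly this NameError. A quick check, runnable on any system:

```python
# If this import fails, install pyinotify into the same Python that runs
# arbitrator.py (pyinotify needs Linux with kernel >= 2.6.13 and glibc >= 2.4).
try:
    from pyinotify import ProcessEvent
    pyinotify_ok = True
except ImportError:
    ProcessEvent = None
    pyinotify_ok = False
print("pyinotify importable:", pyinotify_ok)
```

If the import works in a plain interpreter but the daemon still fails, the daemon is probably running under a different Python (system Python vs. virtualenv) than the one pyinotify was installed into.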

The Processor module 'link_updater' could not be found

I'm so close to getting everything working (CDN for Drupal to Cloud Files on Rackspace). The files are in sync, but I'm getting this error in the log:
2011-03-21 18:09:15,074 - Arbitrator - ERROR - The Processor module
'link_updater' could not be found.

I installed the egg:

[root@drupal www]# easy_install cssutils-0.9.7-py2.6.egg

and

[root@drupal processors]# ls link_update*
link_updater.py link_updater.pyc

are both there from the install tar.

I have the following in my config.xml (the markup was lost when pasting; only these values survived):

misc:profiles:modules:themes:sites/all:sites/default
ico:js:css:gif:png:jpg:jpeg:svg:swf

Segmentation fault

If I have

<processor name="yui_compressor.YUICompressor" />

or

<processor name="link_updater.CSSURLUpdater" />

enabled, I get a segfault when it tries to process CSS files.

[root@tdpl01 www]# python /opt/fileconveyor/code/arbitrator.py
2010-02-16 11:02:48,141 - Arbitrator - WARNING - Arbitrator is initializing.
2010-02-16 11:02:48,145 - Arbitrator - WARNING - Loaded config file.
2010-02-16 11:02:49,671 - Arbitrator - WARNING - Created 's3' transporter for the 'amazon' server.
2010-02-16 11:02:49,671 - Arbitrator - WARNING - Server connection tests succesful!
2010-02-16 11:02:49,672 - Arbitrator - WARNING - Setup: created transporter pool for the 'amazon' server.
2010-02-16 11:02:49,675 - Arbitrator - WARNING - Setup: initialized 'pipeline' persistent queue, contains 3159 items.
2010-02-16 11:02:49,675 - Arbitrator - WARNING - Setup: initialized 'files_in_pipeline' persistent list, contains 0 items.
2010-02-16 11:02:49,676 - Arbitrator - WARNING - Setup: initialized 'failed_files' persistent list, contains 0 items.
2010-02-16 11:02:49,677 - Arbitrator - WARNING - Setup: moved 0 items from the 'files_in_pipeline' persistent list into the 'pipeline' persistent queue.
2010-02-16 11:02:49,677 - Arbitrator - WARNING - Setup: connected to the synced files DB. Contains metadata for 0 previously synced files.
2010-02-16 11:02:49,703 - Arbitrator - WARNING - Setup: initialized FSMonitor.
2010-02-16 11:02:49,705 - Arbitrator - WARNING - Fully up and running now.
2010-02-16 11:03:03,681 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/localnet.datasphere.com/files/beta.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/localnet.datasphere.com/files/beta.css.tmp'.
2010-02-16 11:03:06,056 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/localnet.datasphere.com/files/beta.css'. Retrying later.
2010-02-16 11:03:06,133 - Arbitrator - WARNING - Moved 1 items from the 'failed_files' persistent list into the 'pipeline' persistent queue.
2010-02-16 11:03:10,770 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/localnet.datasphere.com/files/css/css_d8467684070712aa24fb41710f86a722.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/localnet.datasphere.com/files/css/css_d8467684070712aa24fb41710f86a722.css.tmp'.
2010-02-16 11:03:13,154 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/localnet.datasphere.com/files/css/css_92b682927d04e250fb8faf33cd33f05b.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/localnet.datasphere.com/files/css/css_92b682927d04e250fb8faf33cd33f05b.css.tmp'.
2010-02-16 11:03:13,235 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/localnet.datasphere.com/files/css/css_d8467684070712aa24fb41710f86a722.css'. Retrying later.
2010-02-16 11:03:15,749 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/localnet.datasphere.com/files/css/css_27835072877dab98249ec2d132f020c6.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/localnet.datasphere.com/files/css/css_27835072877dab98249ec2d132f020c6.css.tmp'.
2010-02-16 11:03:15,826 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/localnet.datasphere.com/files/css/css_92b682927d04e250fb8faf33cd33f05b.css'. Retrying later.
2010-02-16 11:03:15,906 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/localnet.datasphere.com/files/css/css_27835072877dab98249ec2d132f020c6.css'. Retrying later.
2010-02-16 11:03:18,181 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/localnet.datasphere.com/files/css/css_601f2b0f178b0af6c5ae4b4bd0383ae9.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/localnet.datasphere.com/files/css/css_601f2b0f178b0af6c5ae4b4bd0383ae9.css.tmp'.
2010-02-16 11:03:20,416 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/localnet.datasphere.com/files/css/css_7f3fffb78cf8bca305dca7978cbe72b3.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/localnet.datasphere.com/files/css/css_7f3fffb78cf8bca305dca7978cbe72b3.css.tmp'.
2010-02-16 11:03:20,502 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/localnet.datasphere.com/files/css/css_601f2b0f178b0af6c5ae4b4bd0383ae9.css'. Retrying later.
2010-02-16 11:03:20,574 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/localnet.datasphere.com/files/css/css_7f3fffb78cf8bca305dca7978cbe72b3.css'. Retrying later.
2010-02-16 11:03:23,077 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/localnet.datasphere.com/files/css/css_31e5cfe772696cf907fddd05135f23dd.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/localnet.datasphere.com/files/css/css_31e5cfe772696cf907fddd05135f23dd.css.tmp'.
2010-02-16 11:03:25,340 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/localnet.datasphere.com/files/css/css_044c0d076b3ab539204a46131b7ff073.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/localnet.datasphere.com/files/css/css_044c0d076b3ab539204a46131b7ff073.css.tmp'.
2010-02-16 11:03:25,455 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/localnet.datasphere.com/files/css/css_31e5cfe772696cf907fddd05135f23dd.css'. Retrying later.
2010-02-16 11:03:27,994 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/localnet.datasphere.com/files/css/css_9a54202b27a34bf0cb077678c58f63ba.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/localnet.datasphere.com/files/css/css_9a54202b27a34bf0cb077678c58f63ba.css.tmp'.
2010-02-16 11:03:28,076 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/localnet.datasphere.com/files/css/css_044c0d076b3ab539204a46131b7ff073.css'. Retrying later.
2010-02-16 11:03:30,381 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/all/themes/hyperlocal_base/hyperlocal_base.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/all/themes/hyperlocal_base/hyperlocal_base.css.tmp'.
2010-02-16 11:03:30,464 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/localnet.datasphere.com/files/css/css_9a54202b27a34bf0cb077678c58f63ba.css'. Retrying later.
2010-02-16 11:03:30,552 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/all/themes/hyperlocal_base/hyperlocal_base.css'. Retrying later.
2010-02-16 11:03:32,730 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/all/themes/hyperlocal_base/layout.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/all/themes/hyperlocal_base/layout.css.tmp'.
2010-02-16 11:03:34,565 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/all/themes/hyperlocal_base/print.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/all/themes/hyperlocal_base/print.css.tmp'.
2010-02-16 11:03:34,651 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/all/themes/hyperlocal_base/layout.css'. Retrying later.
2010-02-16 11:03:34,733 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/all/themes/hyperlocal_base/print.css'. Retrying later.
2010-02-16 11:03:36,618 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/all/themes/hyperlocal_base/hyperlocal_base-fresh.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/all/themes/hyperlocal_base/hyperlocal_base-fresh.css.tmp'.
2010-02-16 11:03:37,386 - Arbitrator - WARNING - Moved 11 items from the 'failed_files' persistent list into the 'pipeline' persistent queue.
2010-02-16 11:03:38,792 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/all/themes/hyperlocal_base/html-elements.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/all/themes/hyperlocal_base/html-elements.css.tmp'.
2010-02-16 11:03:38,874 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/all/themes/hyperlocal_base/hyperlocal_base-fresh.css'. Retrying later.
2010-02-16 11:03:39,878 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/all/themes/hyperlocal_base/drupal6-reference.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/all/themes/hyperlocal_base/drupal6-reference.css.tmp'.
2010-02-16 11:03:39,961 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/all/themes/hyperlocal_base/html-elements.css'. Retrying later.
2010-02-16 11:03:40,033 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/all/themes/hyperlocal_base/drupal6-reference.css'. Retrying later.
2010-02-16 11:03:41,189 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/all/themes/zen/zen_classic/layout-garland.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/all/themes/zen/zen_classic/layout-garland.css.tmp'.
2010-02-16 11:03:42,183 - Arbitrator.ProcessorChain - ERROR - The processsor 'yui_compressor.YUICompressor' has failed while processing the file '/var/www/html/sites/all/themes/zen/zen_classic/ie.css'. Exception class: <type 'exceptions.IOError'>. Message: [Errno 2] No such file or directory: '/tmp/daemon/var/www/html/sites/all/themes/zen/zen_classic/ie.css.tmp'.
Segmentation fault

[root@tdpl01 www]# python /opt/fileconveyor/code/arbitrator.py
2010-02-16 11:03:51,765 - Arbitrator - WARNING - Arbitrator is initializing.
2010-02-16 11:03:51,769 - Arbitrator - WARNING - Loaded config file.
2010-02-16 11:03:53,338 - Arbitrator - WARNING - Created 's3' transporter for the 'amazon' server.
2010-02-16 11:03:53,338 - Arbitrator - WARNING - Server connection tests succesful!
2010-02-16 11:03:53,339 - Arbitrator - WARNING - Setup: created transporter pool for the 'amazon' server.
2010-02-16 11:03:53,341 - Arbitrator - WARNING - Setup: initialized 'pipeline' persistent queue, contains 2746 items.
2010-02-16 11:03:53,342 - Arbitrator - WARNING - Setup: initialized 'files_in_pipeline' persistent list, contains 43 items.
2010-02-16 11:03:53,343 - Arbitrator - WARNING - Setup: initialized 'failed_files' persistent list, contains 4 items.
2010-02-16 11:03:56,145 - Arbitrator - WARNING - Setup: moved 43 items from the 'files_in_pipeline' persistent list into the 'pipeline' persistent queue.
2010-02-16 11:03:56,413 - Arbitrator - WARNING - Moved 4 items from the 'failed_files' persistent list into the 'pipeline' persistent queue.
2010-02-16 11:03:56,414 - Arbitrator - WARNING - Setup: connected to the synced files DB. Contains metadata for 0 previously synced files.
2010-02-16 11:03:56,439 - Arbitrator - WARNING - Setup: initialized FSMonitor.
2010-02-16 11:03:56,443 - Arbitrator - WARNING - Fully up and running now.
2010-02-16 11:04:01,656 - Arbitrator - WARNING - Created 's3' transporter for the 'amazon' server.
2010-02-16 11:04:04,096 - Arbitrator - WARNING - Synced: '/var/www/html/sites/all/themes/corkboard/corkboard-fresh.css'.
2010-02-16 11:04:04,180 - Arbitrator - WARNING - Synced: '/var/www/html/sites/all/themes/corkboard/html-elements.css'.
2010-02-16 11:04:07,056 - Arbitrator.ProcessorChain - ERROR - The processsor 'link_updater.CSSURLUpdater' has failed while processing the file '/var/www/html/sites/all/themes/corkboard/blocks.css'. Exception class: <class 'sqlite3.OperationalError'>. Message: no such table: synced_files.
2010-02-16 11:04:08,815 - Arbitrator - WARNING - Synced: '/var/www/html/sites/all/themes/corkboard/layout.css'.
2010-02-16 11:04:08,899 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/all/themes/corkboard/blocks.css'. Retrying later.
2010-02-16 11:04:15,077 - Arbitrator - WARNING - Synced: '/var/www/html/sites/all/themes/corkboard/drupal6-reference.css'.
2010-02-16 11:04:15,433 - Arbitrator.ProcessorChain - ERROR - The processsor 'link_updater.CSSURLUpdater' has failed while processing the file '/var/www/html/sites/all/themes/corkboard/corkboard.css'. Exception class: <class 'sqlite3.OperationalError'>. Message: no such table: synced_files.
2010-02-16 11:04:17,893 - Arbitrator - WARNING - Synced: '/var/www/html/sites/all/themes/corkboard/sidebars.css'.
2010-02-16 11:04:17,972 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/all/themes/corkboard/corkboard.css'. Retrying later.
2010-02-16 11:04:22,676 - Arbitrator.ProcessorChain - ERROR - The processsor 'link_updater.CSSURLUpdater' has failed while processing the file '/var/www/html/sites/all/themes/bulletin_board/bulletin_board.css'. Exception class: <class 'sqlite3.OperationalError'>. Message: no such table: synced_files.
2010-02-16 11:04:24,819 - Arbitrator - WARNING - Synced: '/var/www/html/sites/all/themes/corkboard/print.css'.
2010-02-16 11:04:24,887 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/all/themes/bulletin_board/bulletin_board.css'. Retrying later.
2010-02-16 11:04:27,594 - Arbitrator - WARNING - Moved 3 items from the 'failed_files' persistent list into the 'pipeline' persistent queue.
2010-02-16 11:04:29,974 - Arbitrator - WARNING - Synced: '/var/www/html/sites/all/themes/bulletin_board/views.css'.
2010-02-16 11:04:32,385 - Arbitrator - WARNING - Synced: '/var/www/html/sites/all/themes/bulletin_board/bulletin_board-fresh.css'.
Segmentation fault

S3/CF don't work with European endpoints

Hi, I'm getting an error message about not being able to connect when using the s3 or cf transporters. The log is below. Any ideas on what might be causing this? At first I thought it might be a network issue, but I'm able to access the S3 bucket using the same settings from the same machine when using a different script...

Any help is appreciated.

2010-12-14 15:16:05,137 - Arbitrator - WARNING - Arbitrator is initializing.
2010-12-14 15:16:05,137 - Arbitrator - INFO - Loading config file.
2010-12-14 15:16:05,139 - Arbitrator.Config - INFO - Parsing sources.
2010-12-14 15:16:05,139 - Arbitrator.Config - INFO - Parsing servers.
2010-12-14 15:16:05,139 - Arbitrator.Config - INFO - Parsing rules.
2010-12-14 15:16:05,140 - Arbitrator - WARNING - Loaded config file.
2010-12-14 15:17:08,871 - Arbitrator - ERROR - Could not start transporter 'cf'. Error: '[Errno -2] Name or service not known'.
2010-12-14 15:17:08,872 - Arbitrator - ERROR - Server connection tests: could not connect with 1 servers.
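The '[Errno -2] Name or service not known' message comes from the resolver, not from S3/Cloud Files authentication: the endpoint hostname never resolved, so no connection was attempted. A minimal check (hostnames here are placeholders, not fileconveyor settings) to confirm whether the endpoint resolves from the affected machine:

```python
import socket

def can_resolve(host):
    # '[Errno -2] Name or service not known' is socket.gaierror EAI_NONAME:
    # DNS lookup of the hostname failed before any connection was attempted.
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False

# Hypothetical usage; substitute the endpoint host from your transporter's log,
# e.g. can_resolve("yourbucket.s3.amazonaws.com")
```

If the bucket's regional endpoint resolves here but the transporter still fails, the transporter is likely building a different (e.g. non-European) endpoint name internally.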

pathscanner.py error "You must not use 8-bit bytestrings ..."

At one point while running arbitrator.py, the process halts with the error message below. I googled for "You must not use 8-bit bytestrings" but cannot figure out what's wrong. Running Python 2.6.5.

Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/mnt/data-store/fileconveyor/code/fsmonitor_inotify.py", line 107, in run
self.__process_queues()
File "/mnt/data-store/fileconveyor/code/fsmonitor_inotify.py", line 132, in __process_queues
self.__add_dir(path, event_mask)
File "/mnt/data-store/fileconveyor/code/fsmonitor_inotify.py", line 58, in __add_dir
FSMonitor.generate_missed_events(self, path, event_mask)
File "/mnt/data-store/fileconveyor/code/fsmonitor.py", line 121, in generate_missed_events
for event_path, result in self.pathscanner.scan_tree(path):
File "/mnt/data-store/fileconveyor/code/pathscanner.py", line 236, in scan_tree
for subpath, subresult in self.scan_tree(os.path.join(path, filename)):
File "/mnt/data-store/fileconveyor/code/pathscanner.py", line 236, in scan_tree
for subpath, subresult in self.scan_tree(os.path.join(path, filename)):
File "/mnt/data-store/fileconveyor/code/pathscanner.py", line 236, in scan_tree
for subpath, subresult in self.scan_tree(os.path.join(path, filename)):
File "/mnt/data-store/fileconveyor/code/pathscanner.py", line 236, in scan_tree
for subpath, subresult in self.scan_tree(os.path.join(path, filename)):
File "/mnt/data-store/fileconveyor/code/pathscanner.py", line 236, in scan_tree
for subpath, subresult in self.scan_tree(os.path.join(path, filename)):
File "/mnt/data-store/fileconveyor/code/pathscanner.py", line 223, in scan_tree
result = self.scan(path)
File "/mnt/data-store/fileconveyor/code/pathscanner.py", line 198, in scan
self.add_files(files)
File "/mnt/data-store/fileconveyor/code/pathscanner.py", line 126, in add_files
self.dbcur.execute("INSERT INTO %s VALUES(?, ?, ?)" % (self.table), row)
ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
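This sqlite3 error means a raw byte path (e.g. a filename with non-ASCII characters) reached the database layer. A minimal sketch of the decode-before-insert workaround; the table layout below is illustrative, not PathScanner's actual schema:

```python
import sqlite3

def to_text(path, encoding="utf-8"):
    # Decode byte paths so sqlite3 only ever receives unicode text.
    # "replace" avoids crashing outright on undecodable filenames.
    if isinstance(path, bytes):
        return path.decode(encoding, "replace")
    return path

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE files (path TEXT, mtime INTEGER, is_dir INTEGER)")
# A UTF-8-encoded byte path, as a filesystem scan on Linux might yield it.
cur.execute("INSERT INTO files VALUES (?, ?, ?)",
            (to_text(b"/srv/caf\xc3\xa9/logo.png"), 1266316800, 0))
```

On Python 2 the escape hatch the error message itself suggests is `conn.text_factory = str`, which tells sqlite3 to accept 8-bit byte strings as-is; decoding to unicode is the cleaner fix.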

link_updater.py code fault with Python >=2.6

The processor /fileconveyor/code/processors/link_updater.py needs a change to work correctly with Python 2.6 and later:

Line 39 of link_updater.py needs to change from:

parser = CSSParser(log=None, loglevel=logging.critical)

to:

parser = CSSParser(log=None, loglevel=logging.CRITICAL)

I have tested this with and without the above change.
It works fine with the correct level constant, but errors without the change!
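The root cause: logging.critical is a module-level function, while logging.CRITICAL is the integer level constant that a loglevel argument expects. A quick illustration:

```python
import logging

# logging.CRITICAL is the numeric severity level...
assert isinstance(logging.CRITICAL, int)

# ...while logging.critical is a function for emitting a log record.
assert callable(logging.critical)

# Passing the function where an integer level is expected breaks level
# comparisons inside the logging machinery, hence the failure in CSSParser.
```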

/tmp/daemon consumes 100% of available disk space when files are updated frequently

I was surprised when /tmp/daemon filled up my disk partition (causing mysql to stop responding). This happened because my client was uploading new videos to a folder being synced by fileconveyor. I configured fileconveyor with unique_filename.Mtime, so every time the FTP daemon wrote new data to the file, fileconveyor made a copy in /tmp/daemon.

A quick fix is to add a note to the README: don't upload directly to sync folders. A better approach might be to have a delay between change notification and kicking off the processor.
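The suggested delay could be sketched as a "settle" check: only hand a file to the processor chain once its mtime has been stable for a quiet period. This is a sketch of the idea, not fileconveyor's actual API; the threshold is an arbitrary example:

```python
import os
import time

SETTLE_SECONDS = 5  # illustrative quiet period, not a fileconveyor setting

def is_settled(path, settle=SETTLE_SECONDS, now=None):
    # Process a file only once its mtime has been stable for `settle`
    # seconds; a file still being written via FTP keeps resetting the clock,
    # so no intermediate copies pile up in /tmp/daemon.
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) >= settle
```

A discovered-but-unsettled file would simply be re-queued instead of copied, so only the final upload is processed.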

sqlite error: no such table: synced_files

Now that I have Java installed I can run link_updater. The problem is that it's throwing errors.

[root@tdpl01 www]# python /opt/fileconveyor/code/arbitrator.py
2010-02-16 14:26:36,712 - Arbitrator - WARNING - Arbitrator is initializing.
2010-02-16 14:26:36,715 - Arbitrator - WARNING - Loaded config file.
2010-02-16 14:26:37,949 - Arbitrator - WARNING - Created 's3' transporter for the 'amazon' server.
2010-02-16 14:26:37,949 - Arbitrator - WARNING - Server connection tests succesful!
2010-02-16 14:26:37,949 - Arbitrator - WARNING - Setup: created transporter pool for the 'amazon' server.
2010-02-16 14:26:38,018 - Arbitrator - WARNING - Setup: initialized 'pipeline' persistent queue, contains 0 items.
2010-02-16 14:26:38,078 - Arbitrator - WARNING - Setup: initialized 'files_in_pipeline' persistent list, contains 0 items.
2010-02-16 14:26:38,139 - Arbitrator - WARNING - Setup: initialized 'failed_files' persistent list, contains 0 items.
2010-02-16 14:26:38,139 - Arbitrator - WARNING - Setup: moved 0 items from the 'files_in_pipeline' persistent list into the 'pipeline' persistent queue.
2010-02-16 14:26:38,231 - Arbitrator - WARNING - Setup: connected to the synced files DB. Contains metadata for 0 previously synced files.
2010-02-16 14:26:38,257 - Arbitrator - WARNING - Setup: initialized FSMonitor.
2010-02-16 14:26:38,260 - Arbitrator - WARNING - Fully up and running now.
2010-02-16 14:30:28,838 - Arbitrator - WARNING - Created 's3' transporter for the 'amazon' server.
2010-02-16 14:30:52,716 - Arbitrator - WARNING - Synced: '/var/www/html/sites/localnet.datasphere.com/files/beta.css'.
2010-02-16 14:37:08,326 - Arbitrator.ProcessorChain - ERROR - The processsor 'link_updater.CSSURLUpdater' has failed while processing the file '/var/www/html/sites/localnet.datasphere.com/files/css/css_d8467684070712aa24fb41710f86a722.css'. Exception class: <class 'sqlite3.OperationalError'>. Message: no such table: synced_files.
2010-02-16 14:42:59,142 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/localnet.datasphere.com/files/css/css_d8467684070712aa24fb41710f86a722.css'. Retrying later.
2010-02-16 14:42:59,373 - Arbitrator - WARNING - Moved 1 items from the 'failed_files' persistent list into the 'pipeline' persistent queue.
2010-02-16 14:47:24,782 - Arbitrator.ProcessorChain - ERROR - The processsor 'link_updater.CSSURLUpdater' has failed while processing the file '/var/www/html/sites/localnet.datasphere.com/files/css/css_27835072877dab98249ec2d132f020c6.css'. Exception class: <class 'sqlite3.OperationalError'>. Message: no such table: synced_files.
2010-02-16 14:52:34,148 - Arbitrator - WARNING - Synced: '/var/www/html/sites/localnet.datasphere.com/files/css/css_92b682927d04e250fb8faf33cd33f05b.css'.
2010-02-16 14:52:34,572 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/localnet.datasphere.com/files/css/css_27835072877dab98249ec2d132f020c6.css'. Retrying later.
2010-02-16 14:52:34,918 - Arbitrator - WARNING - Moved 1 items from the 'failed_files' persistent list into the 'pipeline' persistent queue.
2010-02-16 14:52:41,477 - Arbitrator.ProcessorChain - ERROR - The processsor 'link_updater.CSSURLUpdater' has failed while processing the file '/var/www/html/sites/localnet.datasphere.com/files/css/css_601f2b0f178b0af6c5ae4b4bd0383ae9.css'. Exception class: <class 'sqlite3.OperationalError'>. Message: no such table: synced_files.
2010-02-16 14:58:22,648 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/html/sites/localnet.datasphere.com/files/css/css_601f2b0f178b0af6c5ae4b4bd0383ae9.css'. Retrying later.
2010-02-16 14:58:22,774 - Arbitrator - WARNING - Moved 1 items from the 'failed_files' persistent list into the 'pipeline' persistent queue.
2010-02-16 15:00:02,554 - Arbitrator.ProcessorChain - ERROR - The processsor 'link_updater.CSSURLUpdater' has failed while processing the file '/var/www/html/sites/localnet.datasphere.com/files/css/css_31e5cfe772696cf907fddd05135f23dd.css'. Exception class: <class 'sqlite3.OperationalError'>. Message: no such table: synced_files.

excluding path

Is it possible to exclude a specific path from the scan? Could I use <pattern /> for this, or what is pattern for?
Thanks ..
Marc

Warning when daemon starts

When I'm starting the daemon with:

$  python /opt/fileconveyor/code/arbitrator.py

I get an error/warning message right at the beginning of the console output:

/opt/fileconveyor/code/filter.py:10: DeprecationWarning: the sets module is deprecated
  from sets import Set, ImmutableSet

Installation Issue

Hi there.

I've had an issue since I set up fileconveyor to use with the Drupal CDN integration module.

I followed Mike's instructions for installing on CentOS; however, once that is done, you need to start the daemon. I couldn't work out which one to run, so I tried a couple of the different Python scripts, and fsmonitor gives me the error:
"Using class <class 'fsmonitor_inotify.FSMonitorInotify'>
[Pyinotify ERROR] add_watch: cannot watch /Users/wimleers/Downloads (WD=-1)"

Config.xml has been set up, and to be sure I copied it to the fileconveyor folder and removed the config, since this directory is referenced in the sample. I still get the same error.

INSTALL.txt is a little vague when it comes to running fileconveyor and setting up the synced files database. I'm sure it makes sense to some people, but I can't work it out...

Thanks!

How to empty/reset the fileconveyor database

Hi,

While configuring your CDN module with fileconveyor on sites, I often need to restart with a clean slate. I usually kill the process (kill -9 pid) and change the CDN folder name in config.xml (to start with an empty one, as I am not sure whether orphaned files on the CDN are ever deleted by fileconveyor).

However, when I start arbitrator.py again, the status page (/admin/reports/status) still contains the old setup information.
This is certainly coming from fileconveyor's sqlite database, and I don't know how to clear it quickly, so I delete all files in fileconveyor's code directory and upload them again from my local machine.

  1. How can I avoid this and just remove/empty the sqlite db?

  2. Ultimately, it would be lovely to have a button for this on the CDN settings page in Drupal - what do you think?

Thanks!
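For question 1, a minimal sketch of a reset. The filenames below are the state databases other reports in this tracker mention (persistent_data.db, synced_files.db); verify them against your own install, and only run this while the daemon is stopped:

```python
import os

def reset_daemon_state(code_dir):
    # Delete fileconveyor's SQLite state databases so the next arbitrator.py
    # run starts from a clean slate. Assumed filenames; check your install.
    removed = []
    for name in ("persistent_data.db", "synced_files.db"):
        path = os.path.join(code_dir, name)
        if os.path.exists(path):
            os.remove(path)
            removed.append(name)
    return removed
```

Note this does not delete orphaned files already uploaded to the CDN; it only clears the daemon's local bookkeeping.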

AttributeError: 'Event' object has no attribute 'path'


Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "build/bdist.linux-i686/egg/pyinotify.py", line 1348, in run
    self.loop()
  File "build/bdist.linux-i686/egg/pyinotify.py", line 1334, in loop
    self.process_events()
  File "build/bdist.linux-i686/egg/pyinotify.py", line 1128, in process_events
    self._default_proc_fun(revent)
  File "build/bdist.linux-i686/egg/pyinotify.py", line 810, in __call__
    return _ProcessEvent.__call__(self, event)
  File "build/bdist.linux-i686/egg/pyinotify.py", line 544, in __call__
    return meth(event)
  File "/var/www/ef/fileconveyor/code/fsmonitor_inotify.py", line 177, in process_IN_Q_OVERFLOW
    if FSMonitor.is_in_ignored_directory(self.fsmonitor_ref, event.path):
AttributeError: 'Event' object has no attribute 'path'

Any ideas why Event has no path?
Thanks

Store daemon.pid in standard location

daemon.pid gets created in your current working directory. Drupal looks for it in the same path as synced_files.db. daemon.pid should be written to the same dir as synced_files.db.
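A sketch of the proposed behavior (the function name is illustrative, not fileconveyor's API): derive the pid file location from the synced files database path, so both always live in the same directory.

```python
import os

def pid_file_path(synced_files_db):
    # Proposed fix: write daemon.pid beside synced_files.db rather than in
    # the current working directory, so Drupal can always find both files.
    return os.path.join(os.path.dirname(synced_files_db), "daemon.pid")
```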

ftp / mosso cloudfiles config file

Hi, I am really interested in getting fileconveyor to work and would like a little assistance with syncing of files. I believe my config.xml file may have an error in it, because the daemon's output is kind of funky:

from sets import Set, ImmutableSet
2010-11-12 17:29:51,438 - Arbitrator - WARNING - Arbitrator is initializing.
2010-11-12 17:29:51,454 - Arbitrator - WARNING - Loaded config file.
2010-11-12 17:29:51,733 - Arbitrator - WARNING - Created 'mosso' transporter for the 'blog push cloudfiles' server.
Traceback (most recent call last):
  File "arbitrator.py", line 988, in <module>
    arbitrator = Arbitrator(os.path.join(sys.path[0], "config.xml"))
  File "arbitrator.py", line 158, in __init__
    transporter = self.__create_transporter(server)
  File "arbitrator.py", line 835, in __create_transporter
    transporter = transporter_class(settings, self.transporter_callback, self.transporter_error_callback, "Arbitrator")
  File "/opt/fileconveyor/code/transporters/transporter_ftp.py", line 17, in __init__
    Transporter.__init__(self, settings, callback, error_callback, parent_logger)
  File "/opt/fileconveyor/code/transporters/transporter.py", line 57, in __init__
    self.validate_settings()
  File "/opt/fileconveyor/code/transporters/transporter.py", line 135, in validate_settings
    raise MissingSettingError
transporters.transporter.MissingSettingError

here is the config file that I am using

flaunt.com root ******_ flauntmg **_*** flaunt2 /var/www/sites/default/files mp4 10000000

Let me know if I can add anything that would make this easier to understand, or if this makes any sense at all.

thanks

Could not connect to persistent data database

Hi,

I've been trying for 2 days to get this working with S3 on Mercury Pressflow 6.17.

In config.xml, I specified access_key_id, secret_access_key and bucket_name (not distro_domain_name, because then arbitrator threw an error); I commented out the processorChain (it threw errors and isn't needed right now anyway), etc.

arbitrator.py seems to run fine and syncs the files; I can see them in the S3 bucket (in folder 'static', because I left the default destination path="static", but I guess that's no harm).

But there are no files served from the CDN and the status page says:

CDN integration Enabled – advanced mode
CDN integration is enabled for all users.
* Could not connect to persistent data database.
* The synced files database exists.
* The synced files database is readable.
* The daemon is currently running.
CDN integration — Drupal core patch Applied
CDN integration — ImageCache module patch Not or incompletely applied.

Note I did not bother about the ImageCache patch since I am not using that module. I first want to see .css and other files served from the CDN.

fileconveyor lives in /home/wimleers-fileconveyor/ (is that the wrong location?). I did not know what to set in /admin/settings/cdn/advanced (I found no documentation about it), but then I found that there is a synced_files.db in the install dir, so I set the path to /home/wimleers-fileconveyor/code/synced_files.db, and after saving it there was a confirmation: "The synced files database was found and can be opened for reading."

Please, please help.

(Note there is an issue about this problem opened also at http://drupal.org/node/845936 but I know you don't want to deal with fileconveyor issues over there.)

Arbitrator keeps rejecting transporters

This is the error

ERROR - The Transporter module 'transporters.transporter_ftp' could not be found.

Could it be caused by some problem in the configuration? When I check the transporters directory I see the ftp module has a .pyc file, so I understand it has been found and byte-compiled.

Exception in thread Error

Hello,
I am getting the following error after fileconveyor runs and syncs a couple of files. There is enough disk space; synced_files.db is currently at 1.4 MB and persistent_data.db is at 1.38 MB...

Exception in thread Thread-1:
Traceback (most recent call last):
File "/opt/python2.5/lib/python2.5/threading.py", line 486, in __bootstrap_inner
self.run()
File "/opt/fileconveyor/code/arbitrator.py", line 284, in run
self.__process_discover_queue()
File "/opt/fileconveyor/code/arbitrator.py", line 340, in __process_discover_queue
self.pipeline_queue.put((input_file, event))
File "/opt/fileconveyor/code/persistent_queue.py", line 69, in put
self.dbcur.execute("INSERT INTO %s (item) VALUES(?)" % (self.table), (cPickle.dumps(item), ))
OperationalError: database or disk is full

Is there a specific way to keep fileconveyor running? I am currently starting it from the command line as root, but it looks like once I disconnect from the terminal, fileconveyor stops running.

Thanks in advance for your help,

KH

Path correction in db

Hello & thanks for this great work !
I have a little annoying problem:
I wrongly configured config.xml; when I realized my mistake, I corrected it and relaunched fileconveyor. But since 20 GB have already been transferred, I don't want to re-upload everything.
cssurlupdater keeps trying to upload:

A double slash (//) instead of a single slash seems to cause the problem:
/var/www/pressflow/sites/all/themes/courrier/css//images/arrow_btn.png
but
/var/www/pressflow/sites/all/themes/courrier/css/images/arrow_btn.png exists on S3.
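The two paths differ only in the doubled separator, which is exactly the kind of artifact os.path.normpath removes; normalizing paths before they are recorded or compared would make the two spellings identical (a sketch of the idea, not fileconveyor's current behavior):

```python
import os.path

recorded = "/var/www/pressflow/sites/all/themes/courrier/css//images/arrow_btn.png"
actual = "/var/www/pressflow/sites/all/themes/courrier/css/images/arrow_btn.png"

# normpath collapses duplicate separators (and "." / ".." segments),
# so the db entry and the file on S3 would compare equal.
assert os.path.normpath(recorded) == actual
```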

How could I erase/modify/update that entry ?
++
Marc

Problem on daemon startup - exception in inotify?

Hi,

I am in the process of installing the daemon as part of the CDN Integration module for Drupal. Although I believe I have everything configured correctly and have installed the requisite dependencies, I cannot get the daemon to start on Ubuntu 8.04. It works on my Mac, but unfortunately that is not my server. Here is the output:

dev:~/bin/cdn/daemon> python ./arbitrator.py
2009-12-29 15:30:01,271 - Arbitrator                - WARNING  - Arbitrator is initializing.
2009-12-29 15:30:01,274 - Arbitrator                - WARNING  - Loaded config file.
2009-12-29 15:30:01,276 - Arbitrator                - WARNING  - Created 'symlink_or_copy' transporter for the 'origin pull cdn' server.
2009-12-29 15:30:01,302 - Arbitrator                - WARNING  - Created 'ftp' transporter for the 'ftp push cdn' server.
2009-12-29 15:30:01,302 - Arbitrator                - WARNING  - Server connection tests succesful!
2009-12-29 15:30:01,303 - Arbitrator                - WARNING  - Setup: created transporter pool for the 'origin pull cdn' server.
2009-12-29 15:30:01,303 - Arbitrator                - WARNING  - Setup: created transporter pool for the 'ftp push cdn' server.
2009-12-29 15:30:01,304 - Arbitrator                - WARNING  - Setup: initialized 'pipeline' persistent queue, contains 0 items.
2009-12-29 15:30:01,305 - Arbitrator                - WARNING  - Setup: initialized 'files_in_pipeline' persistent list, contains 0 items.
2009-12-29 15:30:01,305 - Arbitrator                - WARNING  - Setup: initialized 'failed_files' persistent list, contains 0 items.
2009-12-29 15:30:01,306 - Arbitrator                - WARNING  - Setup: moved 0 items from the 'files_in_pipeline' persistent list into the 'pipeline' persistent queue.
2009-12-29 15:30:01,306 - Arbitrator                - WARNING  - Setup: connected to the synced files DB. Contains metadata for 0 previously synced files.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "./arbitrator.py", line 276, in run
    self.__setup()
  File "./arbitrator.py", line 259, in __setup
    fsmonitor_class = get_fsmonitor()
  File "/home/hasarrived/bin/cdn/daemon/fsmonitor.py", line 203, in get_fsmonitor
    return __get_class_reference("fsmonitor_inotify", "FSMonitorInotify")
  File "/home/hasarrived/bin/cdn/daemon/fsmonitor.py", line 191, in __get_class_reference
    module = __import__(modulename, globals(), locals(), [classname])
  File "/home/hasarrived/bin/cdn/daemon/fsmonitor_inotify.py", line 24, in <module>
    class FSMonitorInotify(FSMonitor):
  File "/home/hasarrived/bin/cdn/daemon/fsmonitor_inotify.py", line 29, in FSMonitorInotify
    FSMonitor.CREATED             : pyinotify.IN_CREATE,
AttributeError: 'module' object has no attribute 'IN_CREATE'

It looks like it can't find some sort of inotify library or something. I do not know Python (yet), so I can't debug.
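A small diagnostic sketch that distinguishes "pyinotify missing" from "pyinotify too old or shadowed by a stray local module", which are the usual causes of a missing IN_CREATE attribute:

```python
def diagnose_pyinotify():
    # An AttributeError on pyinotify.IN_CREATE usually means an outdated
    # pyinotify package, or a stray local pyinotify.py shadowing the real one.
    try:
        import pyinotify
    except ImportError:
        return "pyinotify is not installed"
    if not hasattr(pyinotify, "IN_CREATE"):
        return "pyinotify at %s lacks IN_CREATE (too old or shadowed)" % pyinotify.__file__
    return "pyinotify looks usable"
```

Running this in the daemon's directory shows which copy of pyinotify Python actually imports.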

I did read the README.txts, API.txt and INSTALL.txts, and have spent pretty much all day on trial and error, but made no progress.

Any ideas?

Thanks,

Mike

Rackspace Cloud Files storage changes

I was using the code from mosso.py directly from fileconveyor and it was incredibly slow to sync. Something along the lines of 8 hours for 30 files! And I had a few hundred to go. Along with that, I was getting 200-400% CPU usage for python.

Anyway, I took a look at Rich Leland's original work at http://bitbucket.org/richleland/django-cumulus/src/tip/cumulus/storage.py and made some changes to your code based on his updates.

Now it is syncing very fast (50 files per minute, 20 MB average file size). CPU usage is now at 28% during sync.

Not sure where to post my updated file, but let me know.

Thanks for your hard work.

Cheers,

David

link_updater.CSSURLUpdater error

I get this error while fileconveyor is running.

Arbitrator.ProcessorChain - ERROR - The processsor 'link_updater.CSSURLUpdater' has failed while processing the file '/var/www/injoys/sites/default/files/css/css_4bee6decddcc0803d936840e3577390f.css'. Exception class: <type 'exceptions.TypeError'>. Message: 'NoneType' object is unsubscriptable.

Just before, it was complaining that images listed in the CSS file weren't synced.
Now the images are synced, but the file couldn't be rewritten.
Thanks for the help, and sorry for my English.

"have been synced" vs "are currently being synced"

Hi,

This is what my status page says:

* The synced files database exists.
* The synced files database is readable.
* 19702 files have been synced to the s3 server.
* The daemon is currently running.
* 0 files are waiting to be synced.
* 42978 files are currently being synced.

Could you please explain the difference between "files currently being synced" and "files have been synced", and why the "currently being synced" count can possibly be higher than the "have been synced" count?

Thanks!

Matching Urls: the Daemon and Drupal Advanced Mode CDN Module

Hi Wim, thanks for the great system. I have the daemon configured and running, doing its job transporting the files to Amazon and inspecting the file activity to see if anything new is uploaded. However, although all the files have successfully been copied to CloudFront via the S3 bucket, the Drupal CDN integration module isn't reporting any matching files from Amazon.

When I switch to basic mode, it says 100% match. But when I'm in advanced mode, it doesn't show anything. Now I'm trying to figure out what the issue could be; the database has at least a megabyte of info in synced_files.db, and the advanced Drupal module settings say it's readable.

Here's my config file. (On my server's copy, to get basic mode to work, I commented out the unique filename processor and restarted the daemon with a fresh db so that the file names exactly match.)

Thanks,
Joseph

P.S. The editor below removes config tags.

bucket name Access Key Here Secret Access Key Here something.cloudfront.net misc:profiles:modules:themes:sites/all:sites/default:modules/ ico:js:css:gif:png:jpg:jpeg:svg:swf:pdf:rtf:doc:docx
