mobius3's Introduction

mobius3

Continuously and asynchronously sync a local folder to an S3 bucket. This is a Python application, suitable for situations where

  • FUSE cannot be used, such as in AWS Fargate;
  • high performance local access is more important than synchronous saving to S3;
  • there can be frequent modifications to the same file monitored by a single client;
  • there are infrequent concurrent modifications to the same file from different clients;
  • local files can be changed by any program;
  • there are at most ~10k files to sync;
  • changes in the S3 bucket may be performed directly i.e. not using mobius3.

These properties make mobius3 similar to a Dropbox or Google Drive client. Under the hood, inotify is used and so only Linux is supported.

Early version. Please consider enabling versioning on the S3 bucket to avoid data loss.

Installation

pip install mobius3

Usage

mobius3 can be used as a standalone command-line application

mobius3 /local/folder remote-bucket https://{}.s3-eu-west-2.amazonaws.com/ eu-west-2 --prefix folder/

or from Docker

docker build -t mobius:latest .
docker run --rm -it \
    -v /local/folder:/home/mobius3/data \
    -e AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY \
    mobius:latest \
    mobius3 \
        /home/mobius3/data \
        remote-bucket \
        https://{}.s3-eu-west-2.amazonaws.com/ \
        eu-west-2 \
        --prefix my-prefix/

or from asyncio Python

import asyncio

from mobius3 import Syncer

async def main():
    start, stop = Syncer('/local/folder', 'remote-bucket', 'https://{}.s3-eu-west-2.amazonaws.com/', 'eu-west-2', prefix='folder/')

    # Will copy the contents of the bucket to the local folder,
    # raise exceptions on error, and then continue to sync in the background
    await start()

    # Will complete any remaining uploads
    await stop()

# await is only valid inside a coroutine, so run main() on an event loop
asyncio.run(main())

In the cases above AWS credentials are taken from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. To use ECS-provided credentials / IAM Roles, you can pass --credentials-source ecs-container-endpoint as a command line option. In an ECS task definition, this would look something like the below

{
    "command": [
        "mobius3",
        "/home/mobius3/data",
        "remote-bucket",
        "https://{}.s3-eu-west-2.amazonaws.com/",
        "eu-west-2",
        "--prefix", "my-prefix/",
        "--credentials-source", "ecs-container-endpoint"
    ]
}

If using mobius3 to sync data in a volume accessed by multiple containers, you may have to create your own Dockerfile that runs mobius3 under a user with the same ID as the users in the other containers.

Under the hood and limitations

Uploads to S3

Uploads to S3 are initiated when a file is closed.

Downloads from S3

A simple polling mechanism is used to check for changes in S3, so mobius3 may not perform well with a large number of files/objects. If a file has been updated or deleted by a local process, it will not be updated by a poll to S3 until 120 seconds after the completion of its upload to S3. This is a best-effort attempt to mitigate the possibility of older versions overwriting newer ones due to the eventual consistency model of S3.
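The 120-second rule can be pictured as a small predicate evaluated for each path during a poll. This is only a sketch: the function name, the constant, and the use of a monotonic clock are assumptions for illustration and are not taken from the mobius3 source. It also covers only uploads that have already completed, not ones still in progress.

```python
import time

# Assumption: the 120 second window described above. The names below are
# hypothetical and are not part of the mobius3 API.
PROTECTION_WINDOW_SECONDS = 120.0


def should_apply_remote_change(upload_completed_at, now=None):
    """Return True if a change seen while polling S3 may overwrite the local file.

    upload_completed_at is the monotonic time at which the most recent local
    change to this path finished uploading, or None if there has been none.
    """
    if upload_completed_at is None:
        # No recent local change, so the polled version can be applied
        return True
    if now is None:
        now = time.monotonic()
    # Inside the window the local version is assumed to be the newer one
    return now - upload_completed_at >= PROTECTION_WINDOW_SECONDS
```

For example, `should_apply_remote_change(100.0, now=150.0)` returns `False`, while `should_apply_remote_change(100.0, now=220.0)` returns `True`.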

Renaming files and folders

Renaming a file or folder maps to no atomic operation in S3, and there is no explicit conflict resolution, so conflicts are resolved by S3 itself: the last write wins. This means that with concurrent modifications or deletions to the same file(s) or folder(s) by different clients, data can be lost and the directory layout may get corrupted.
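To see why a rename is not atomic, it may help to list the individual requests a folder rename could expand into. The sketch below assumes each object is written under its new key with a PUT and then removed from its old key with a DELETE. That decomposition and the function name are illustrative assumptions, not a description of what mobius3 actually sends.

```python
def rename_operations(keys, old_prefix, new_prefix):
    """Plan the per-object requests for renaming old_prefix to new_prefix.

    Hypothetical helper: returns a list of (method, key) tuples. Each tuple is
    a separate request, so another client can observe or modify the bucket
    between any two of them.
    """
    operations = []
    for key in sorted(keys):
        if key.startswith(old_prefix):
            new_key = new_prefix + key[len(old_prefix):]
            operations.append(('PUT', new_key))
            operations.append(('DELETE', key))
    return operations
```

Renaming `docs/` to `archive/` in a bucket holding `docs/a`, `docs/b` and `other` yields four separate requests, and a concurrent client writing to `docs/a` between the PUT and the DELETE would lose its change.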

Responding to concurrent file modifications

Mid-upload, a file could be modified by a local process, in which case a corrupt file could be uploaded to S3. To mitigate this, mobius3 uses the following algorithm for each upload.

  • An IN_CLOSE_WRITE event is received for a file, and we start the upload.
  • Just before the end of the upload, the final bytes of the file are read from disk.
  • A dummy "flush" file is written to the relevant directory.
  • Wait for the IN_CREATE event for this file. This ensures that any events since the final bytes were read have also been received.
  • If we received an IN_MODIFY event for the file, the file has been modified, and we do not upload the final bytes. Since IN_MODIFY was received, once the file is closed we will receive an IN_CLOSE_WRITE, and we re-upload the file. If no such event is received, we complete the upload.
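The decision in the last step can be sketched as a pure function over the events received since the upload started. The event names come from inotify, but the function, its arguments, and the representation of events as tuples are hypothetical. The real implementation waits on a live event stream rather than a list.

```python
def can_complete_upload(events, path, flush_path):
    """Decide whether the final bytes of an upload may be sent.

    events is the ordered list of (event_name, event_path) tuples received
    since the upload of path started. flush_path is the dummy "flush" file
    written after the final bytes were read from disk.
    """
    for name, event_path in events:
        if name == 'IN_MODIFY' and event_path == path:
            # Modified mid-upload: abandon, and rely on the later
            # IN_CLOSE_WRITE to trigger a fresh upload
            return False
        if name == 'IN_CREATE' and event_path == flush_path:
            # Every event queued before the flush file has now been seen
            return True
    raise RuntimeError('The IN_CREATE event for the flush file has not arrived')
```

Events that arrive after the flush file's IN_CREATE do not affect the current upload. They belong to a later modification, which triggers its own upload once the file is closed.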

An alternative to the above would be to use a filesystem locking mechanism. However

  • other processes may not respect advisory locking;
  • the filesystem may not support mandatory locking;
  • we don't want to prevent other processes from progressing due to locking the file on upload: this would partially remove the benefits of the asynchronous nature of the syncing.

Keeping HTTP requests for the same file ordered

Multiple concurrent requests to S3 are supported. However, this presents the possibility of additional race conditions: requests started in a given order may not be received by S3 in that order. This means that newer versions of files can be overwritten by older ones. The guarantee from S3 that "latest time stamp wins" for concurrent PUTs to the same key does not offer protection from this.

To prevent this, a FIFO mutex is used around each file during PUT and DELETE of any key.
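One way to picture a per-file FIFO mutex is a dictionary of asyncio.Lock objects keyed by path. asyncio.Lock is documented as fair, so waiters acquire it in the order they started waiting. The class and function names below are hypothetical, the HTTP request is replaced by a sleep, and mobius3's own mutex may well be implemented differently.

```python
import asyncio
from collections import defaultdict


class KeyedLocks:
    """Hypothetical helper: one fair (FIFO) asyncio.Lock per key."""

    def __init__(self):
        self._locks = defaultdict(asyncio.Lock)

    def lock(self, key):
        return self._locks[key]


async def request(locks, key, label, delay, log):
    # Requests for the same key run strictly one after another, in the
    # order in which they asked for the lock
    async with locks.lock(key):
        await asyncio.sleep(delay)  # stand-in for the HTTP round trip
        log.append(label)


async def main():
    locks = KeyedLocks()
    log = []
    # The first request is the slowest, yet it still completes first
    await asyncio.gather(
        request(locks, 'a.txt', 'PUT v1', 0.02, log),
        request(locks, 'a.txt', 'PUT v2', 0.0, log),
        request(locks, 'a.txt', 'DELETE', 0.0, log),
    )
    return log
```

Running `asyncio.run(main())` returns `['PUT v1', 'PUT v2', 'DELETE']`. Requests for different keys use different locks and so can still proceed concurrently. A long-running process would also need to discard locks for keys that are no longer in use, which this sketch omits.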

Objects with the same key as a directory

S3 is a key-value store and not a filesystem: there is no perfect mapping of all possible keys to a directory structure, e.g. it can store objects with keys a and a/b, but a filesystem can't have files with paths /a and /a/b. In such a case mobius3 will usually treat a as a file and ignore a/b. However, if a/b is created while mobius3 is running and synced locally, then a will not be created locally.
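The usual case, where the shorter key is kept as a file and the keys beneath it are ignored, can be sketched as a check over a bucket listing. The function name is hypothetical, and the sketch models only that usual case, not the ordering-dependent behaviour described for keys created while mobius3 is running.

```python
def ignored_keys(keys):
    """Return the keys that cannot be written locally because another key
    already occupies one of their parent paths as a file.

    Hypothetical helper for illustration; not part of the mobius3 API.
    """
    key_set = set(keys)
    ignored = set()
    for key in key_set:
        parts = key.split('/')
        # Check every proper parent path of the key, shortest first
        for i in range(1, len(parts)):
            if '/'.join(parts[:i]) in key_set:
                ignored.add(key)
                break
    return sorted(ignored)
```

For a bucket containing `a`, `a/b` and `c/d`, `ignored_keys(['a', 'a/b', 'c/d'])` returns `['a/b']`: `a` is kept as a file, and `c/d` is unaffected because no object has the key `c`.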


Some of the above behaviours may change in future versions.

Running tests

docker-compose build && \
docker-compose run --rm test python3 setup.py test

mobius3's People

Contributors

abbas123456, dependabot[bot], michalc, niross

mobius3's Issues

Upload or Download only?

This is more of a question than an issue: does mobius3 support upload-only or download-only operation, rather than keeping everything in sync?

In my case, I am running SFTP within Fargate ECS and just want to upload the files and forget about them.

How to use mobius3

Hi, is there any documentation for this tool on how to use it? I tried to run the Python code mentioned in the README.md. It gives me the following error

SyntaxError: 'await' outside function

And when I try to use it from the command line, it gives me the following error

    raise KeyError(key) from None
KeyError: 'AWS_ACCESS_KEY_ID'

Where do I need to enter the key? Please provide some documentation on how to use this tool and how to enable synchronization with this folder.

Creating a new file raises a KeyError

If I run mobius3 with an existing bucket and then go to the synced folder and run touch foo, the file is uploaded correctly, but raises this error :

s3sync:event,f7997d07] Exception during <function Syncer.<locals>.schedule_upload_meta.<locals>.function at 0x7ff8750350e0>
Traceback (most recent call last):
  File "/home/yamrzou/.pyenv/versions/mainenv/lib64/python3.7/site-packages/mobius3.py", line 755, in process_jobs
    await job()
  File "/home/yamrzou/.pyenv/versions/mainenv/lib64/python3.7/site-packages/mobius3.py", line 678, in function
    await upload_meta(logger, path, version_current, version_original)
  File "/home/yamrzou/.pyenv/versions/mainenv/lib64/python3.7/site-packages/mobius3.py", line 884, in upload_meta
    on_done=set_meta,
  File "/home/yamrzou/.pyenv/versions/mainenv/lib64/python3.7/site-packages/mobius3.py", line 943, in locked_request
    if not cont():
  File "/home/yamrzou/.pyenv/versions/mainenv/lib64/python3.7/site-packages/mobius3.py", line 878, in <lambda>
    cont=lambda: meta[path] != data,
KeyError: PurePosixPath('/local/folder/foo')

I suppose it's due to the fact that set_meta is only called [on_done], but I can't fix it as it's not clear to me what cont does.

Changes made while the system is offline are not synced

When my system is not connected to the internet, or when mobius3 is simply not running, and I make changes to the folder that is synced with my S3 bucket, mobius3 does not sync those local changes to the bucket once I reconnect or start the program again. Instead, it overwrites my local changes with the data in the S3 bucket. Is this a bug, or is this feature not available? Please help.

Sync state

Is there an easy way to know (e.g. from log entry) whether mobius3 is in an idle state where there is no pending or ongoing work? I.e. to check whether mobius3 thinks everything is sync'd ?

Fails on OsX

mobius3 fails to run on OsX because of wrong imports:

Traceback (most recent call last):
  File "/Users/username/developer_tools/file-ingestion/main.py", line 3, in <module>
    from mobius3 import Syncer
  File "/Users/username/developer_tools/file-ingestion/.venv/lib/python3.9/site-packages/mobius3.py", line 56, in <module>
    libc = ctypes.CDLL('libc.so.6', use_errno=True)
  File "/usr/local/Cellar/python@3.9/3.9.1_6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: dlopen(libc.so.6, 6): image not found

This should probably just be changed so it imports .dylib instead of .so if the platform is darwin.
e.g. https://stackoverflow.com/a/35675786
