python-librsync's Introduction

A SmartFile Open Source project. Read more about how SmartFile uses and contributes to Open Source software.

Introduction

A ctypes wrapper for librsync. Provides signature(), delta(), and patch() functions.

There are three steps necessary to synchronize a file. Two steps are performed on the destination file and one on the source.

  1. Generate a signature for the destination file.
  2. Generate a delta for the source file (using the signature).
  3. Patch the destination file using the generated delta.

Usually, these steps involve remote systems. Here is an example of synchronizing two local files.

import librsync

# dst is the destination file, src is the source file, and synced is
# where we will write the synchronized copy. The with statement closes
# all three files when the block ends.
with open('Resume-v1.0.pdf', 'rb') as dst, \
        open('Resume-v1.2.pdf', 'rb') as src, \
        open('Resume-latest.pdf', 'wb') as synced:

    # Step 1: prepare signature of the destination file
    signature = librsync.signature(dst)

    # Step 2: prepare a delta of the source file
    delta = librsync.delta(src, signature)

    # Step 3: synchronize the files.
    # In many cases, you would overwrite the destination with the result of
    # synchronization. However, by default a new file is created.
    librsync.patch(dst, delta, synced)
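
The comment in step 3 mentions overwriting the destination. A minimal sketch of one way to do that, assuming the patched result is available as a readable binary file object positioned at its start. `overwrite_with` is a hypothetical helper, not part of python-librsync, and `os.replace` requires Python 3.3 or newer:

```python
import os
import shutil
import tempfile


def overwrite_with(dst_path, synced):
    """Replace the file at dst_path with the contents of the file object synced."""
    directory = os.path.dirname(os.path.abspath(dst_path))
    # Write to a temporary file in the same directory so that the final
    # rename stays on one filesystem.
    with tempfile.NamedTemporaryFile(dir=directory, delete=False) as tmp:
        shutil.copyfileobj(synced, tmp)
        tmp_path = tmp.name
    # Swap the finished copy into place in a single step.
    os.replace(tmp_path, dst_path)
```

The destination file should be closed before the helper is called; on Windows a file that is still open generally cannot be replaced.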

Extending

This wrapper exposes only the most common operations that librsync provides. It is not meant to be a full wrapper, but it should cover most use cases, and it is easy to extend. Information about librsync is available in its manual, which is linked below (I wish I had found this BEFORE writing this wrapper!).

http://rproxy.samba.org/doxygen/librsync/refman.pdf

python-librsync's People

Contributors

btimby, zulupro

python-librsync's Issues

redundant read in patch/read_cb

In the patch function, the read_cb callback reads the same file position repeatedly.
This leads to poor performance on big files.

How to reproduce:
To see the problem, add a line to the read_cb function. See:
https://github.com/smartfile/python-librsync/blob/master/librsync/__init__.py#L221

Add the following line before the f.seek(pos) line:
print("pos: %s length: %s" % (pos, length))

The code should now look like this:

...
def read_cb(opaque, pos, length, buff):
        # Log every request so that repeated reads become visible.
        print("pos: %s length: %s" % (pos, length))
        f.seek(pos)
        block = f.read(length)
....

Now save this test case as a file and execute it:

import librsync, os

# Create the data files only once.
file_1 = '1mb_a.dat'
file_2 = '1mb_b.dat'
file_new = '1mb_c.dat'

if not os.path.exists(file_1):
    # os.urandom does not block the way reading /dev/random can.
    with open(file_1, 'wb') as dst:
        dst.write(os.urandom(1000000))

if not os.path.exists(file_2):
    with open(file_1, 'rb') as src:
        cnt_a = bytearray(src.read())

    # Change one byte of the content of file_1. bytearray items are
    # integers, so compare and assign with ord().
    if cnt_a[10] != ord('A'):
        cnt_a[10] = ord('A')
    else:
        cnt_a[10] = ord('B')

    # Store the changed content as file_2.
    with open(file_2, 'wb') as dst:
        dst.write(bytes(cnt_a))

# Now create the signature and the delta.
dst = open(file_1, 'rb')
src = open(file_2, 'rb')

synced = open(file_new, 'wb')
signature = librsync.signature(dst)
delta = librsync.delta(src, signature)

# Step 3: synchronize the files.
librsync.patch(dst, delta, synced)

The output is:
python redundant-read.py
pos: 0 length: 2048
pos: 0 length: 65536
pos: 0 length: 131072
pos: 0 length: 196608
pos: 0 length: 262144
pos: 0 length: 327680
pos: 0 length: 393216
pos: 0 length: 458752
pos: 0 length: 524288
pos: 0 length: 589824
pos: 0 length: 655360
pos: 0 length: 720896
pos: 0 length: 786432
pos: 0 length: 851968
pos: 0 length: 917504
pos: 0 length: 983040

Here, the read_cb callback reads the file from position 0 again and again, and the requested length grows by 65536 bytes on each call.
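
The cost of these repeated reads can be estimated from the output above. A rough calculation, assuming every logged request was read in full from the 1,000,000-byte test file:

```python
# Lengths requested by read_cb in the logged output, all starting at pos 0:
# one request of 2048 bytes, then 65536, 131072, ... up to 983040.
lengths = [2048] + [65536 * k for k in range(1, 16)]

total_read = sum(lengths)
file_size = 1000000  # size of 1mb_a.dat in the test script

print(total_read)                               # 7866368
print(round(total_read / float(file_size), 1))  # 7.9
```

Under that assumption, the logged requests add up to almost eight times the size of the file, although only one byte differs between the two inputs.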

Pypi version is seriously broken

I just pip-installed it and quickly found a serious bug:

Python 3.4.3 (default, Sep 14 2015, 17:11:46) 
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import io, librsync
>>> for i in range(50, 70):
...     s = b'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'*i
...     d = b'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'*i + b'2'
...     src = io.BytesIO(s)
...     dst = io.BytesIO(d)
...     signature = librsync.signature(dst)
...     delta = librsync.delta(src, signature)
...     synced = librsync.patch(dst, delta)
...     _= src.seek(0)
...     src.read() == synced.read()
... 
True
True
True
True
True
True
True
True
True
False
False
False
False
False
False
False
False
False
False
False

I tried with current master and everything seems OK, so it may be worth publishing a new release on PyPI :)
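
The session above amounts to a round-trip check: after patching, the result should equal the source. A sketch of that check as a reusable function. `roundtrip_ok` is a hypothetical name, and `rs` is assumed to be any object that exposes signature(), delta() and patch() with the call shapes used in the session (for example the librsync module itself):

```python
import io


def roundtrip_ok(rs, old, new):
    """Return True if patching `old` with a delta built from `new` yields `new`.

    `old` and `new` are bytes; `rs` supplies signature(), delta() and patch().
    """
    dst = io.BytesIO(old)   # destination: the content to be updated
    src = io.BytesIO(new)   # source: the content we want to end up with
    signature = rs.signature(dst)
    delta = rs.delta(src, signature)
    synced = rs.patch(dst, delta)
    # Rewind defensively in case the result is not positioned at its start.
    synced.seek(0)
    return synced.read() == new
```

With the wrapper installed, a call such as `roundtrip_ok(librsync, b'a' * 100 + b'2', b'a' * 100)` has the same shape as one iteration of the loop in the report.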

filesize limit of 4MB

Trying to work with files bigger than 4MB results in misleading errors.

Changing the size in tests.py to (1024**2 * 5) results in:

File "python-librsync/librsync/__init__.py", line 126, in _execute
  block = f.read(RS_JOB_BLOCKSIZE)
File "/usr/lib64/python2.7/tempfile.py", line 581, in read
  return self._file.read(*args)

IOError: File not open for reading

AttributeError: 'unicode' object has no attribute 'read'

I received the following error when attempting to use sync:

Traceback (most recent call last):
  File "test.py", line 11, in <module>
    main()
  File "test.py", line 8, in main
    sync.download('/home/travis/smartfile-sync/test.txt', '/test.txt')
  File "/home/travis/virtualenv/local/lib/python2.7/site-packages/smartfile/sync.py", line 117, in download
    self.sync(RemoteFile(remote, self.api), LocalFile(local))
  File "/home/travis/virtualenv/local/lib/python2.7/site-packages/smartfile/sync.py", line 103, in sync
    return dst.patch(src.delta(dst.signature(block_size=self.block_size)))
  File "/home/travis/virtualenv/local/lib/python2.7/site-packages/smartfile/sync.py", line 47, in patch
    r = librsync.patch(reference, delta, output)
  File "/home/travis/virtualenv/local/lib/python2.7/site-packages/librsync/__init__.py", line 117, in wrapper
    return f(*args, **kwargs)
  File "/home/travis/virtualenv/local/lib/python2.7/site-packages/librsync/__init__.py", line 229, in patch
    _execute(job, d, o)
  File "/home/travis/virtualenv/local/lib/python2.7/site-packages/librsync/__init__.py", line 130, in _execute
    block = f.read(RS_JOB_BLOCKSIZE)
AttributeError: 'unicode' object has no attribute 'read'
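
The last frame shows that the object whose read() was called inside _execute was a text string rather than a file-like object. A minimal sketch of a check that a caller could run before calling patch(), so that the mistake surfaces with a clearer message. `require_readable` is a hypothetical helper, not part of python-librsync:

```python
import io


def require_readable(**streams):
    """Raise TypeError early if any named argument lacks a read() method."""
    for name, stream in streams.items():
        if not callable(getattr(stream, "read", None)):
            raise TypeError(
                "%s must be a file-like object opened in binary mode, got %s"
                % (name, type(stream).__name__)
            )


# Raw bytes can be wrapped in io.BytesIO so that they expose read().
delta = io.BytesIO(b"raw delta bytes")
require_readable(delta=delta)  # passes silently
```

If the delta arrives as a string or as bytes (for example from a network response), wrapping the bytes in `io.BytesIO` as shown gives the wrapper something it can read from.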

Behavior when comparing and patching two identical files?

If two files have the same content, the following procedure fails: after it runs, diff reports that mytest1.txt and mytest3.txt differ. Is this the intended behavior, or am I missing something?

#!/usr/bin/python

import librsync

# Write two files with identical content (bytes, since the mode is binary).
with open("mytest1.txt", "wb") as f:
    f.write(b"hello world\n")

with open("mytest2.txt", "wb") as f:
    f.write(b"hello world\n")

with open("mytest1.txt", "rb") as src, open("mytest2.txt", "rb") as dest, \
    open("mytest3.txt", "wb") as patched:

    signature = librsync.signature(dest)
    delta = librsync.delta(src, signature)
    librsync.patch(dest, delta, patched)
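
The comparison that the report performs with diff can also be done from Python. A small sketch using the standard library's filecmp module; `same_content` is a hypothetical helper, and the file names in the self-check are throwaway examples:

```python
import filecmp
import os
import tempfile


def same_content(path_a, path_b):
    """Return True if both files hold exactly the same bytes."""
    # shallow=False forces a byte-by-byte comparison instead of relying
    # only on os.stat() signatures (type, size, modification time).
    return filecmp.cmp(path_a, path_b, shallow=False)


# Self-check with two throwaway files that have identical content.
workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "a.txt")
second = os.path.join(workdir, "b.txt")
for path in (first, second):
    with open(path, "wb") as handle:
        handle.write(b"hello world\n")

print(same_content(first, second))  # True
```

Run after the script above, `same_content("mytest1.txt", "mytest3.txt")` would be expected to return True if the patch step reproduced the source correctly.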

"Unexplained problem" while patching

I have two files:

OldFile.txt with content:

Text.

NewFile.txt with content:

New text.
Text.

Then I run the example provided in the README:

import librsync

dst = open('OldFile.txt', 'rb')
src = open('NewFile.txt', 'rb')
synced = open('Synced.txt', 'wb')

signature = librsync.signature(dst)
delta = librsync.delta(src, signature)
librsync.patch(dst, delta, synced)

And get the following error:

Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 314, in 'calling callback function'
TypeError: read_cb() takes exactly 3 arguments (4 given)
python: ERROR: (rs_job_complete) patch job failed: unexplained problem
Traceback (most recent call last):
  File "RsyncTest.py", line 12, in <module>
    librsync.patch(dst, delta, synced)
  File "/usr/local/lib/python2.7/dist-packages/librsync/__init__.py", line 113, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/librsync/__init__.py", line 228, in patch
    _execute(job, d, o)
  File "/usr/local/lib/python2.7/dist-packages/librsync/__init__.py", line 141, in _execute
    raise LibrsyncError(r)
librsync.LibrsyncError: unexplained problem

I tried syncing many different files and all went well, but this one case just won't work.

Does this work recursively?

Can I recursively rsync a directory using python-librsync? Will it accept a list of file names or directories to rsync (for example, the way rsync's --include-from=FILE or --files-from=FILE options do)?

Thanks! :)
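
The introduction lists only signature(), delta() and patch(), each used on individual file objects, so a caller would presumably have to walk the directory tree itself. A sketch under that assumption; `iter_file_pairs` is a hypothetical helper, not part of python-librsync, and the "backup" destination root is only an example:

```python
import os
import tempfile


def iter_file_pairs(src_root, dst_root):
    """Yield (src_path, dst_path) for every file found under src_root."""
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Path of the current directory relative to the source root.
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in sorted(filenames):
            src_path = os.path.join(dirpath, name)
            dst_path = os.path.normpath(os.path.join(dst_root, rel_dir, name))
            yield src_path, dst_path


# Demo on a throwaway tree containing a.txt and sub/b.txt.
src_root = tempfile.mkdtemp()
os.makedirs(os.path.join(src_root, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(src_root, rel), "wb") as handle:
        handle.write(b"data")

pairs = sorted(iter_file_pairs(src_root, "backup"))
for src_path, dst_path in pairs:
    print(src_path, "->", dst_path)
```

Each pair could then be opened and passed through the three steps shown in the introduction, one file at a time.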

ImportError: Could not load librsync, make sure it is installed

Traceback (most recent call last):
  File "C:\Users\Devin Lucaschu\AppData\Local\Programs\Python\Python311\Lib\site-packages\librsync\__init__.py", line 20, in <module>
    librsync = ctypes.cdll.librsync
               ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Devin Lucaschu\AppData\Local\Programs\Python\Python311\Lib\ctypes\__init__.py", line 446, in __getattr__
    dll = self._dlltype(name)
          ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Devin Lucaschu\AppData\Local\Programs\Python\Python311\Lib\ctypes\__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: Could not find module 'librsync' (or one of its dependencies). Try using the full path with constructor syntax.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "a:\User\Devin Lucaschu\Download\tex.py", line 2, in <module>
    import librsync
  File "C:\Users\Devin Lucaschu\AppData\Local\Programs\Python\Python311\Lib\site-packages\librsync\__init__.py", line 22, in <module>
    raise ImportError('Could not load librsync, make sure it is installed')
ImportError: Could not load librsync, make sure it is installed
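
The first traceback shows the wrapper loading the shared library through ctypes.cdll.librsync, so installing the Python package alone is not enough: the native librsync library also has to be somewhere ctypes can find it. A minimal sketch for checking that from Python with the standard ctypes.util.find_library. The candidate names tried here are assumptions, and `locate_librsync` is a hypothetical helper:

```python
import ctypes.util


def locate_librsync(candidates=("rsync", "librsync")):
    """Return the first library name or path that ctypes can locate, else None."""
    for name in candidates:
        found = ctypes.util.find_library(name)
        if found:
            return found
    return None


location = locate_librsync()
if location is None:
    print("librsync shared library not found on this system")
else:
    print("librsync found: %s" % location)
```

A result of None suggests that the native library is missing or lies outside the directories that the platform searches.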
