ruplicity

Rust library to explore duplicity backups.

API documentation

Installation

Add the corresponding entry to your Cargo.toml dependencies:

[dependencies]
ruplicity = "0.2"

and add extern crate ruplicity to your crate root.

Motivations

Why did I choose to implement a duplicity backup reader in Rust? What are the differences from duplicity?

  1. Performance. Listing the files in a 195 GB backup on an external hard drive takes 9.1 seconds on my laptop with ruplicity, versus 166 seconds with duplicity starting from an empty cache. Duplicity's time drops to 33 seconds once the backup signatures are cached on the hard drive. Even then ruplicity is more than three times faster, and I believe the time can still be improved a lot.
  2. Provide an easy-to-use library for building features such as a command line utility and a FUSE filesystem that mounts a backup directly in your file system (something not easily implementable within duplicity).

This library does not aim to replace duplicity: it does not provide actual backup / restore functionality, and it does not have the many backends duplicity has. However, feel free to contribute if you need them.

Example

This example opens a backup stored in a local directory and prints the files present in each backup snapshot.

extern crate ruplicity;

use ruplicity::Backup;
use ruplicity::backend::local::LocalBackend;
use ruplicity::timefmt::TimeDisplay;

fn main() {
    // use the local backend to open a path in the file system containing a backup
    let backend = LocalBackend::new("tests/backups/single_vol");
    let backup = Backup::new(backend).unwrap();
    for snapshot in backup.snapshots().unwrap() {
        println!("Snapshot {}", snapshot.time().into_local_display());
        println!("{}", snapshot.entries().unwrap());
    }
}

Check out the documentation for advanced usage and more examples.

Contributing

Contributions are welcome! There are lots of features still to be implemented. The most important are:

  • improve the code; I need some feedback from experienced Rustaceans here :);
  • improve performance (since there is always room for a boost);
  • implement new features, such as reading backup file contents, and new backends (e.g. Dropbox, Azure, FTP, etc.), like duplicity has;
  • support encrypted backups; this will become more feasible once a Rust GPG library is available.

License

This crate is distributed under the MIT license. See LICENSE for details.

And for those who are wondering: can you use this license even though the duplicity project is licensed under the GNU GPL v2? Yes, because this project does not use a single line of duplicity's code, and I wanted a permissive license to ease the use of the crate.

ruplicity's Issues

Support for non-UTF8 paths

On Windows, non-UTF-8 paths are not supported. However, they can appear in the manifest. If this happens, the backup becomes unreadable on Windows, so it's better to support them under the hood and to change the Entry interface, providing:

  • an Option<Path> for the Entry path,
  • and an &[u8] for the path bytes.
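
A minimal sketch of what such an interface could look like. The struct layout, the constructor and the choice to return `None` whenever the bytes are not valid UTF-8 are assumptions made for illustration, not the crate's actual API.

```rust
use std::path::Path;

/// Hypothetical entry that stores its path as raw bytes.
pub struct Entry {
    path_bytes: Vec<u8>,
}

impl Entry {
    pub fn new(path_bytes: Vec<u8>) -> Self {
        Entry { path_bytes }
    }

    /// The raw path bytes, always available on every platform.
    pub fn path_bytes(&self) -> &[u8] {
        &self.path_bytes
    }

    /// The path, if the bytes are valid UTF-8 (assumption: `None` otherwise).
    pub fn path(&self) -> Option<&Path> {
        std::str::from_utf8(&self.path_bytes)
            .ok()
            .map(|s| Path::new(s))
    }
}
```

With this shape, callers that only need to display or compare paths can fall back to `path_bytes()` when `path()` returns `None`.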

Add a command line tool

Build a command line tool similar to duplicity. The parameters don't have to be the same as duplicity's. The focus should be on clarity.

  • add command line handler from kbknapp/clap-rs;
  • support list collections;
  • support list files.

Add logs

Library users should be able to get at least warnings.

Volume numbers should be of `usize` type

Currently BackupSet::volume_path takes an i32. It should be changed to usize. Moreover, the underlying storage should be a Vec and not a HashMap<i32, Path>.

Rename mod `time_utils` into `timefmt`

The utils suffix is very ugly and means nothing. Will "unuseful" modules ever exist? They are all supposed to be useful or to contain utilities somehow...

Fix tests under Windows

  • Local time zone is not settable from the test, so simply remove the test
  • Size hint does not work for the unprintable path in the backup
  • Non-UTF-8 paths are not supported on Windows, so fix tests accordingly

Incomplete signature chain on multi chain backups

When a backup with multiple chains is provided, examining an incremental snapshot from any chain other than the first raises the "The signature chain is incomplete" error.

For example with this backup:

Backup chain
Start time:            Aug 11  2015
End time:              Sep 25  2015
Number of backup sets:            3
Number of volumes:             1971
Backup sets (index, type, time, num volumes):
     0 Full         Aug 11  2015  1929
     1 Incremental  Sep 20  2015    28
     2 Incremental  Sep 25  2015    14

Backup chain
Start time:            Dec 06  2015
End time:              Jan 08 22:53
Number of backup sets:            5
Number of volumes:             1990
Backup sets (index, type, time, num volumes):
     3 Full         Dec 06  2015  1985
     4 Incremental  Dec 12  2015     1
     5 Incremental  Dec 25  2015     2
     6 Incremental  Jan 05 22:10     1
     7 Incremental  Jan 08 22:53     1

Listing the files in snapshot number 4 raises that error.

Better date time formatting

The current pretty format for timestamps is RFC-822Z, which is not the most readable. I suggest using the same format as the Unix ls command: if the date falls in the current year, print month, day and time; otherwise print month, day and year.

Example:

drwxr-xr-x 2 dev dev 4096 Jul 27  2002 old_dir
-rw-rw-r-- 1 dev dev 3016 Nov 14 16:18 new_file
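
The rule above could be sketched as follows. The `DateTime` struct and the `format_ls_style` function are hypothetical stand-ins for whatever time type the crate uses; only the "same year shows the time, other years show the year" rule comes from the proposal.

```rust
const MONTHS: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Hypothetical broken-down timestamp (month is 1-based).
pub struct DateTime {
    pub year: i32,
    pub month: u32,
    pub day: u32,
    pub hour: u32,
    pub minute: u32,
}

/// Formats `t` like `ls -l`: time of day for the current year, year otherwise.
/// Returns `None` if the month is outside 1..=12.
pub fn format_ls_style(t: &DateTime, current_year: i32) -> Option<String> {
    let month = MONTHS.get((t.month as usize).checked_sub(1)?)?;
    Some(if t.year == current_year {
        format!("{} {:>2} {:02}:{:02}", month, t.day, t.hour, t.minute)
    } else {
        // The year is right-aligned in five columns so both variants line up.
        format!("{} {:>2} {:>5}", month, t.day, t.year)
    })
}
```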

Optimize performances

In a large backup (104190 files), some performance issues were found when getting the file list. See the attached flame graph.

87.89% of the execution time is spent inside the compute_size_hint function, which computes the file size by using the Read::count method.

Consider the execution time with compute_size_hint enabled:

real    0m36.682s
user    0m36.534s
sys     0m0.176s

And with that computation disabled:

real    0m10.099s
user    0m9.929s
sys     0m0.176s

Use enums when possible

  • in the file_naming module, merge FileType and FileName into a single enum. This change saves memory and improves readability;
  • consider the same change in the collections module, for BackupSet.

Consider implementing IntoIterator for Snapshots

Instead of implementing Iterator directly, implement IntoIterator. This is less misleading, since currently one can call as_snapshots while iterating, which is not unsafe but is at least strange. This way there is a clear separation between the as_xxx methods and the iterators.
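
A minimal sketch of the pattern, using hypothetical `Snapshot` and `Snapshots` types backed by a `Vec` (the crate's real types are likely different). The collection itself is no longer an iterator; iteration goes through `IntoIterator` by reference or by value.

```rust
/// Hypothetical snapshot type, for illustration only.
pub struct Snapshot {
    pub index: usize,
}

/// Hypothetical collection of snapshots.
pub struct Snapshots {
    items: Vec<Snapshot>,
}

impl Snapshots {
    pub fn new(items: Vec<Snapshot>) -> Self {
        Snapshots { items }
    }
}

// Borrowing iteration: `for s in &snapshots { ... }`.
impl<'a> IntoIterator for &'a Snapshots {
    type Item = &'a Snapshot;
    type IntoIter = std::slice::Iter<'a, Snapshot>;

    fn into_iter(self) -> Self::IntoIter {
        self.items.iter()
    }
}

// Consuming iteration: `for s in snapshots { ... }`.
impl IntoIterator for Snapshots {
    type Item = Snapshot;
    type IntoIter = std::vec::IntoIter<Snapshot>;

    fn into_iter(self) -> Self::IntoIter {
        self.items.into_iter()
    }
}
```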

Format permissions

The current Display for backup files does not consider the setuid, setgid and sticky bits.

On Unix they are displayed as:

$ touch file
$ chmod 7700 file
$ ls -la file
-rws--S--T 1 dev dev    0 Nov 27 09:52 file

See Unix permissions calculator to do some experiments.
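
A sketch of how the nine permission characters could be computed from the mode bits, following the `ls` convention shown above: `s`/`t` when the execute bit is also set, `S`/`T` when it is not. The function name `format_mode` is an assumption, and the leading file-type character is left out.

```rust
/// Formats the lower 12 mode bits as `ls -l` does, without the file-type character.
pub fn format_mode(mode: u32) -> String {
    // (read, write, execute, special bit, char if executable, char if not executable)
    let classes = [
        (0o400, 0o200, 0o100, 0o4000, 's', 'S'), // owner + setuid
        (0o040, 0o020, 0o010, 0o2000, 's', 'S'), // group + setgid
        (0o004, 0o002, 0o001, 0o1000, 't', 'T'), // other + sticky
    ];
    let mut out = String::with_capacity(9);
    for &(r, w, x, special, lower, upper) in classes.iter() {
        out.push(if mode & r != 0 { 'r' } else { '-' });
        out.push(if mode & w != 0 { 'w' } else { '-' });
        out.push(match (mode & x != 0, mode & special != 0) {
            (true, true) => lower,
            (false, true) => upper,
            (true, false) => 'x',
            (false, false) => '-',
        });
    }
    out
}
```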

Update tar crate

The bleeding edge version uses:

  • Entries instead of Files;
  • EntriesMut instead of FilesMut;
  • Entry instead of File.

Also add support for entry type checks such as is_directory, is_file, is_soft_link, etc.

See tar-rs docs.

  • Port to new API;
  • Check if entry type is correctly exposed;
  • Use a published version and not the git repo.

Add type of backup entry info

Currently we do not provide the type of entry for a BackupFile. It could be a directory, a file, a link, etc.

For this reason, we also need to rename "File" to "Entry".

Truncated file names

When displaying the file list, a path that is too long is truncated.

Example:

rw-rw-r--  michele  michele  Aug 14  2013  home/michele/Documenti/Development/Progetti/Meta cloud/Reference/duplicati/BuildTools/WixI

But the original path is:

home/michele/Documenti/Development/Progetti/Meta cloud/Reference/duplicati/BuildTools/WixIncludeMake/Program.cs

Fix date time formatting

There is currently no difference between formatting a timestamp in UTC and in the local time zone.
There is probably a bug in the time_utils module.

Use saturating_sub

There are a bunch of if a > b { a - b } else { 0 } occurrences. These can easily be replaced with a.saturating_sub(b) if the variables are unsigned.
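
For illustration, the two forms side by side on `usize` values. The function names are made up for this sketch; `saturating_sub` is the standard library method on the integer types.

```rust
/// The hand-written pattern found in the sources.
pub fn diff_manual(a: usize, b: usize) -> usize {
    if a > b { a - b } else { 0 }
}

/// The equivalent call: subtraction that clamps at zero instead of overflowing.
pub fn diff_saturating(a: usize, b: usize) -> usize {
    a.saturating_sub(b)
}
```

Both functions return the same value for every pair of unsigned inputs, so the replacement is purely a readability change.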

Revise signature::SnapshotEntries iterator

Instead of implementing Iterator for SnapshotEntries, it's better to implement IntoIterator for SnapshotEntries, &SnapshotEntries and &mut SnapshotEntries. This allows adding these methods to SnapshotEntries independently:

  • entry(&self, id: EntryId) -> Entry;
  • into_display(self) -> SnapshotEntriesDisplay.

Consider removing support for non-UTF8 paths

Working with byte arrays is a big pain, especially in the manifest parsing. It may be better to drop support for non-UTF-8 paths. We are not going to see many of them in real-world backups.

Get backup file size from signature

Understand the signature file format to get the original file size.

See mksum.c in the librsync project.

Duplicity uses md4 type signatures, since the header starts with 0x72730136.

Suggested interface: fn size_hint(&self) -> (usize, usize).
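
As a starting point, checking the magic number at the start of a signature could look like this. It is a sketch under the assumption that the four magic bytes are stored in big-endian (network) order; the constant and function names are invented here.

```rust
/// Magic number of an md4 signature, as quoted above.
pub const MD4_SIG_MAGIC: u32 = 0x7273_0136;

/// Returns true if `header` starts with the md4 signature magic.
/// Assumption: the magic is encoded as four big-endian bytes.
pub fn has_md4_magic(header: &[u8]) -> bool {
    match header.get(..4) {
        Some(b) => u32::from_be_bytes([b[0], b[1], b[2], b[3]]) == MD4_SIG_MAGIC,
        None => false, // fewer than four bytes available
    }
}
```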

Fix formatting

  • Struct definitions first, impl after;
  • Public first, private after;
  • Use free functions instead of static ones if self is not used;
  • Move static functions into free functions;
  • Run rustfmt on the sources.

Read files from backup

Extract the restore functionality from duplicity, to implement the read of a file snapshot in a backup.

  • Parse manifest files
  • Integrate librsync-rs crate
  • Implement a cache trait to allow reusing extracted file chunks
  • Use fnv hashing for the cache
  • Documentation

Fix signature files for incremental snapshots

Tests are now failing because the files reported by incremental signatures are equal to those of the parent full signature.

The test in question exposes the problem, because in the first incremental signature there are undetected modifications with respect to the full snapshot:

  • new files;
  • deleted files;
  • changed last modified times.

Documentation

Add documentation to all public structs and methods.

  • Use #![deny(missing_docs)] lint at crate level to help documenting everything
  • Add examples for all common methods
  • Add examples in README
  • Add license notes in README
  • Add motivations and related crates in README

Improve readability of BlockId and EntryId

Currently we use pub type BlockId = (EntryId, usize) and pub type EntryId = (usize, u8), which are pretty unreadable. For example, to get the snapshot number out of a BlockId we have to do (id.0).1. It would be better to wrap them in new types and provide accessor methods.
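
A sketch of the newtype approach. Only the tuple shapes and the fact that the `u8` is the snapshot number come from the description above; the field and accessor names (`index`, `snapshot`, `entry`, `block`) are guesses for illustration.

```rust
/// Identifies an entry; replaces the `(usize, u8)` tuple.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct EntryId {
    index: usize, // assumed meaning of the first tuple field
    snapshot: u8, // snapshot number, the second tuple field
}

impl EntryId {
    pub fn new(index: usize, snapshot: u8) -> Self {
        EntryId { index, snapshot }
    }
    pub fn index(&self) -> usize {
        self.index
    }
    pub fn snapshot(&self) -> u8 {
        self.snapshot
    }
}

/// Identifies a block inside an entry; replaces the `(EntryId, usize)` tuple.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BlockId {
    entry: EntryId,
    block: usize, // assumed meaning of the second tuple field
}

impl BlockId {
    pub fn new(entry: EntryId, block: usize) -> Self {
        BlockId { entry, block }
    }
    pub fn entry(&self) -> EntryId {
        self.entry
    }
    pub fn block(&self) -> usize {
        self.block
    }
}
```

The lookup `(id.0).1` then becomes `id.entry().snapshot()`.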

Add error module

Instead of using io::Error everywhere, create a new crate-wide error type that covers all the possible errors. A nice option is to use the quick-error crate.
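
A hand-written sketch of such an error type, using only the standard library (quick-error would generate similar boilerplate). The variant set is hypothetical; `IncompleteSignatureChain` simply reuses the message from the multi-chain issue above.

```rust
use std::error::Error as StdError;
use std::fmt;
use std::io;

/// Hypothetical crate-wide error type.
#[derive(Debug)]
pub enum Error {
    /// An underlying I/O failure.
    Io(io::Error),
    /// The signature chain of a snapshot could not be reconstructed.
    IncompleteSignatureChain,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Io(e) => write!(f, "I/O error: {}", e),
            Error::IncompleteSignatureChain => {
                write!(f, "the signature chain is incomplete")
            }
        }
    }
}

impl StdError for Error {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Error::Io(e) => Some(e),
            Error::IncompleteSignatureChain => None,
        }
    }
}

// Lets `?` convert io::Error into the crate error automatically.
impl From<io::Error> for Error {
    fn from(e: io::Error) -> Self {
        Error::Io(e)
    }
}
```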
