
dedupsqlfs's People

Contributors

disconnect3d, sergey-dryabzhinsky


dedupsqlfs's Issues

Snapshot cleanup by plan removes first (oldest) yearly snapshot

The goal of the plan is to keep the first (oldest) snapshot in each interval.
So if the plan is 2y,6m,8w,14d, the retained set must contain:

  • at least 2 yearly snapshots, plus the oldest one in the 2-year interval
  • at least 6 monthly snapshots, plus the oldest one in the 6-month interval
  • at least 8 weekly snapshots, plus the oldest one in the 8-week interval
  • at least 14 daily snapshots, plus the oldest one in the 14-day interval

All of these sets can intersect with each other.

The timeline needs to be recalculated to keep the right number of snapshots.
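A minimal sketch of that recalculation, assuming snapshots are identified by creation timestamp (the function name and signature are illustrative, not dedupsqlfs code):

```python
# For one plan entry (e.g. "2y"): keep the newest `count` snapshots in the
# window, plus the oldest snapshot inside the window, so the first yearly
# snapshot is never removed.
from datetime import datetime, timedelta

def select_keep(snapshots, window, count, now=None):
    now = now or datetime.now()
    in_window = [s for s in snapshots if now - s <= window]
    keep = set(sorted(in_window, reverse=True)[:count])  # newest `count`
    if in_window:
        keep.add(min(in_window))  # always keep the oldest in the interval
    return keep

# Monthly snapshots through 2017; a "keep 2 within 1 year" entry keeps the
# two newest plus the oldest one still inside the year.
snaps = [datetime(2017, m, 1) for m in range(1, 13)]
kept = select_keep(snaps, timedelta(days=365), 2, now=datetime(2017, 12, 15))
```

The full plan would union the keep-sets of all entries (2y, 6m, 8w, 14d), which is why the sets may intersect.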

Fix truncated down files data

When a file is truncated down, its blocks/index entries are not deleted and the block data is not zeroed.
Indexes are flushed only during defragmentation.
Because the block data is not truncated and block sizes are not adjusted, subvolume stats are incorrect (sparse sizes).

We need to detect inode size changes. If the file may have been truncated, recalculate the block count and the last block's size, and truncate the data to the new size.
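The check can be sketched as a pure helper (names assumed, not the project's real API):

```python
# Given the old and new inode sizes, decide which blocks to drop and how
# long the surviving last block must be after a truncate-down.
def truncate_plan(old_size, new_size, block_size):
    """Return (first_block_to_drop, last_block_index, last_block_length),
    or None when the file did not shrink."""
    if new_size >= old_size:
        return None                      # grow or no-op: nothing to delete
    if new_size == 0:
        return (0, None, 0)              # drop every block
    last_block = (new_size - 1) // block_size
    last_len = new_size - last_block * block_size
    return (last_block + 1, last_block, last_len)

# 300 KiB file truncated to 100 KiB with 128 KiB blocks: drop blocks from
# index 1 on, and trim block 0 to 102400 bytes.
plan = truncate_plan(300 * 1024, 100 * 1024, 128 * 1024)
```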

Block size per inode

Add the possibility to configure the block size per inode, except for directories and special files.

Add command interface via socket

To invoke snapshotting, vacuum, and other commands while the FS is mounted.
The trick is to avoid conflicts between threads, FUSE operations, and SQLite.

Try to:

  • start an RPC socket server in a separate thread
  • use a pipe/queue in the main thread to receive commands from the RPC server
  • trigger queue processing from a FUSE operation such as file creation or a stat change
  • use the socket in the do command
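The steps above can be sketched as follows (socket path and command names are made up; the point is that only the main thread ever touches SQLite):

```python
import queue
import socket
import threading

commands = queue.Queue()

def rpc_server(path="/tmp/dedupsqlfs-cmd.sock"):
    # Background thread: accept a command over a Unix socket and only
    # enqueue it; never call into FUSE or SQLite from here.
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    while True:
        conn, _ = srv.accept()
        with conn:
            cmd = conn.recv(4096).decode().strip()
            commands.put(cmd)
            conn.sendall(b"queued\n")

def drain_commands():
    # Called from a FUSE callback (e.g. setattr) in the main thread.
    handled = []
    while not commands.empty():
        cmd = commands.get_nowait()
        handled.append(cmd)  # dispatch to snapshot/vacuum handlers here
    return handled
```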

Update Zstd to 1.0.0

Use latest stable version.

This version can be built without legacy format support.

Probably others too: 0.6.1, 0.4.7.
v0.3.6 must remain with legacy support.

Gzip snapshots

Compress the tree, inode, and block-index tables with gzip after snapshot creation.
This can reduce the size of snapshotted data by up to 75%.

This needs to be supported by the defragment and table open/close code.
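A sketch of the pack/unpack side, assuming each table lives in its own file (the .gz naming convention is an assumption):

```python
import gzip
import os
import shutil

def pack_table(path):
    # Run after snapshot creation: gzip the table file, drop the original.
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)

def unpack_table(path):
    # Needed by the defragment and table open/close code before access.
    with gzip.open(path + ".gz", "rb") as src, open(path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path + ".gz")
```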

Extract snapshot

Extract all snapshot data into another copy of dedupsqlfs, as a new filesystem.

Rehash action

Change the hashing algorithm for the FS and recalculate all hashes with it.

Support sparse files

There are two ways to handle sparse areas in files:

  1. If a block is full of zeroes, don't hash, compress, or write it.
    Still write the block size to the DB.
  2. If a block has many zeroes at its end, at least 1024 bytes or 10% of the block, then hash, compress, and write only the valuable bytes.
    Write the "real" block size, including the zeroes, to the DB.
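The two strategies can be combined into one classifier (the thresholds come from point 2 above; the names are illustrative):

```python
def classify_block(block, block_size):
    # Strategy 1: an all-zero block stores nothing, only its size.
    if block.count(0) == len(block):
        return ("all-zero", b"")
    # Strategy 2: a long zero tail (>= 1024 bytes or >= 10% of the block)
    # stores only the valuable bytes; the "real" size goes to the DB.
    stripped = block.rstrip(b"\x00")
    tail = len(block) - len(stripped)
    if tail >= 1024 or tail >= block_size // 10:
        return ("tail-zero", stripped)
    return ("plain", block)
```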

Recompress on the fly

  • add an "isDeprecated" flag to compressors
  • add a "--recompress-on-fly" option to the mount action
  • check the compression type of every block read and written
    • if it is deprecated, mark the block as "to write / update", compress it again, and save it
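A hedged sketch of the read/write hook (zlib stands in for the real compressors; the method names and row fields are invented):

```python
import zlib

DEPRECATED = {"deprecated-zlib1"}  # made-up method names for illustration

def maybe_recompress(row):
    # On block read or write: if its method is deprecated, re-encode the
    # data and mark the block "to write / update".
    if row["method"] in DEPRECATED:
        raw = zlib.decompress(row["data"])
        row.update(data=zlib.compress(raw, 9), method="zlib9", dirty=True)
    return row
```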

Wrong table names in files per subvolume

It was designed to change only the file name used for table storage, not the table name inside that file, so the file can be copied back.

But sometimes things get worse: indexes may be doubled after making a snapshot.

Try to use multi-threaded compression for fast methods

For example, lz4, zstd, and lzo can be really fast,
while multi-process compression wastes more time on inter-process communication.
We need to test whether a multi-threaded version would be faster.

Add:

  • a multi-threaded compression tool class
  • a new option to switch between the multi-threaded and multi-process versions
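A quick sketch of the thread-pool variant (zlib stands in for lz4/zstd here; what matters is that these C extensions release the GIL while compressing, so threads can actually run in parallel):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_blocks_mt(blocks, workers=4):
    # Compress blocks in parallel threads; no inter-process copies needed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: zlib.compress(b, 6), blocks))
```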

Add support for cython

Some parts/modules of DdSF can be compiled with Cython:

  • dedupsqlfs/lib/cache
  • dedupsqlfs/fuse

This can be used during packaging.

Recompression action

Create a "recompression" action in the "do" app to "remove" an unneeded compression algorithm from the FS. Don't forget to NULL all subvolume stats after that.

Make three modules for quicklz

The QuickLZ compression level is chosen at compilation time.
What needs to be done:

  1. Keep the "old" module version for compatibility
  2. Build three versions with different compilation options: quicklzf(ast), quicklzm(edium) & quicklzb(est)

Cleanup snapshots by some plan

What this means:

  • remove old snapshots
  • keep only those selected by certain periods

Like --cleanup-old-snapshots-by-plan=14d:8w=1:18m=1:3y, so it will keep:

  • the 14 latest daily snapshots (distance <= 1 day between them)
  • 8 weekly: Monday, or any day of the week if no other exists (distance <= 1 week between them)
  • 18 monthly: the 1st day, or any day if no other exists (distance <= 1 month between them)
  • 3 yearly: one per year (distance <= 1 year between them)
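The plan string could be parsed roughly like this (the grammar is assumed from the example above: colon-separated entries, units d/w/m/y, and an optional "=1" flag):

```python
import re

def parse_plan(plan):
    # "14d:8w=1:18m=1:3y" -> [(14,'d',0), (8,'w',1), (18,'m',1), (3,'y',0)]
    entries = []
    for part in plan.split(":"):
        m = re.fullmatch(r"(\d+)([dwmy])(?:=(\d+))?", part)
        if not m:
            raise ValueError("bad plan entry: %r" % part)
        count, unit, flag = m.groups()
        entries.append((int(count), unit, int(flag) if flag else 0))
    return entries
```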

Caught exception in read(): 'NoneType' object is not subscriptable

2017-05-21 05:05:17,751 - DedupFS - ERROR - Traceback (most recent call last):
  File "/opt/rusoft/dedupsqlfs/dedupsqlfs/fuse/operations.py", line 765, in read
    data = self.__get_block_data_by_offset(fh, offset, size)
  File "/opt/rusoft/dedupsqlfs/dedupsqlfs/fuse/operations.py", line 1480, in __get_block_data_by_offset
    block = self.__get_block_from_cache(inode, n + first_block_number)
  File "/opt/rusoft/dedupsqlfs/dedupsqlfs/fuse/operations.py", line 1399, in __get_block_from_cache
    self.getLogger().debug("-- db size: %s" % len(item["data"]))
TypeError: 'NoneType' object is not subscriptable
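The likely fix, sketched outside the real class (method and cache names are simplified from the traceback): check for None before subscripting the cache entry.

```python
def get_block_from_cache(cache, inode, block_number, read_block):
    # cache: dict keyed by (inode, block_number); read_block re-reads the
    # block from the database when the cache entry is missing or empty.
    item = cache.get((inode, block_number))
    if item is None or item.get("data") is None:
        item = {"data": read_block(inode, block_number)}
        cache[(inode, block_number)] = item
    return item
```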

Start and stop mysqld server separately from main commands

Make mysqld server startup and stop separate commands or options for do/mkfs commands.
Goal:

  • start server and make fs
  • mount fs and connect to started server
  • operate with files, sync backups, etc.
  • umount fs
  • make snapshot
  • print statistics
  • stop server

It can save about 10-30 seconds on each operation by not starting and stopping the server every time.

Add timer-thread to push fs events and cache flush-expire

During some very long rsync runs the filesystem consumes memory and eventually stalls.

To gradually drop caches we need to touch the root of the filesystem to generate an event (setattr).
That will trigger the cache cleanup procedures.
Just create a thread that calls os.utime(mountPoint, None) every second,
until FUSE destroy.
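That thread is small enough to sketch directly (only start_toucher and the stop event are invented names):

```python
import os
import threading

def start_toucher(mount_point, stop_event, interval=1.0):
    # Touch the mount point every `interval` seconds so FUSE delivers a
    # setattr event that drives the cache flush/expire procedures.
    def loop():
        while not stop_event.wait(interval):  # True once stop_event is set
            os.utime(mount_point, None)
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t

# On FUSE destroy: stop_event.set(); thread.join()
```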

Setuptools module deprecated

We should use the distutils module distributed with Python.

The LZ4 & ZSTD modules are affected.

The problem is that the setuptools module does not exist in old distros like Debian Wheezy and Ubuntu Lucid.

Store snapshot statistics in subvol table

And update the statistics only if the subvolume FS was modified.

  1. Store statistics only for snapshots (readonly=True)
  2. Recalculate on umount only if modified
  3. Store in the table only if the subvolume was modified, is a snapshot, and has no stats saved yet; do this when stats are requested and on snapshot creation

The current root subvolume can gather stats on umount, behind an option, and only if modified.

This speeds up snapshot statistics output and subvolume list stats.

Stats:

  1. apparent size
  2. unique size
  3. sparse size
  4. dedup size
  5. compressed size
  6. comp. uniq. size
  7. compression type stats
    As JSON, in one blob field.
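The blob could look like this (the field names are assumptions, mirroring the list above):

```python
import json

stats = {
    "apparent_size": 0,
    "unique_size": 0,
    "sparse_size": 0,
    "dedup_size": 0,
    "compressed_size": 0,
    "compressed_unique_size": 0,
    "compression_types": {"zstd": 0, "lz4": 0},  # per-method counters
}
# One blob field in the subvol table, refreshed only when modified.
blob = json.dumps(stats).encode()
```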

Add special option for decompress tryout variants

Add --decompress-tryout=method1:method2,method3:method4,....
Enable it instead of --decompress-try-all to speed up decompression on errors.

Behaviour:

  • if method1 fails, try method2
  • if the whole group fails, move on to the next group (method3:method4)
  • if every group fails, raise an error
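A sketch of that behaviour (groups are the comma-separated parts of the option, alternatives the colon-separated ones; bz2/zlib stand in for real methods):

```python
import bz2
import zlib

METHODS = {"zlib": zlib.decompress, "bz2": bz2.decompress}

def decompress_tryout(data, groups):
    # groups example: [["method1", "method2"], ["method3", "method4"]]
    for group in groups:
        for name in group:
            try:
                return METHODS[name](data)
            except Exception:
                continue  # this method failed, try the next one
    raise ValueError("no tryout method could decompress the block")
```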
