lakshmipathi / dduper
Fast block-level out-of-band BTRFS deduplication tool.
License: GNU General Public License v2.0
Just tried it out and got lots of messages like:
btrfs inspect-internal: unknown token 'dump-csum'
There should be text somewhere in the README on how to apply a patch or get a version of btrfs-progs that supports the dump-csum command.
I have 7GB of data on disk, yet --analyze reports that it can dedupe 22GB. Please explain how this magic happens? :D
: 0m :
8192 : /mnt/fn_abcd_50m_200m:/mnt/fn_cdcdcd_50m_300m : 278528
8192 : /mnt/fn_abcd_50m_200m:/mnt/fn_pqsrt_50m_250m : 0
8192 : /mnt/fn_abcd_50m_200m:/mnt/fn_pqsrt_100m_500m : 0
8192 : /mnt/fn_abac_50m_200m:/mnt/fn_cdcdcd_50m_300m : 139264
8192 : /mnt/fn_abac_50m_200m:/mnt/fn_pqsrt_50m_250m : 0
8192 : /mnt/fn_abac_50m_200m:/mnt/fn_pqsrt_100m_500m : 0
8192 : /mnt/fn_cdcdcd_50m_300m:/mnt/fn_pqsrt_50m_250m : 0
8192 : /mnt/fn_cdcdcd_50m_300m:/mnt/fn_pqsrt_100m_500m : 0
8192 : /mnt/fn_pqsrt_50m_250m:/mnt/fn_pqsrt_100m_500m : 491520
================================================================================
dduper:23117824KB of duplicate data found with chunk size:8192KB
Hi!
I have tested dduper on a Raspberry Pi 3B+ with Raspberry OS.
Unfortunately, the installation does not succeed, because the following error occurs:
Building wheels for collected packages: pysqlite3
Building wheel for pysqlite3 (setup.py) ... done
WARNING: Legacy build of wheel for 'pysqlite3' created no files.
Command arguments: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-5pjwwdhd/pysqlite3_1d66649f2505475fa84dcb126b8dd6d9/setup.py'"'"'; __file__='"'"'/tmp/pip-install-5pjwwdhd/pysqlite3_1d66649f2505475fa84dcb126b8dd6d9/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-g4vittxi
Command output: [use --verbose to show]
Running setup.py clean for pysqlite3
Failed to build pysqlite3
The rest could be installed without problems according to the log.
I have tested the same installation method on a normal x64 system with Debian 10, there everything worked great and the tool runs wonderfully.
This code seems to use python2 and beautifultable, but the one installed via pip does not work:
(venv) root@box:~/dduper# ./dduper --device /dev/mapper/something --dir /mnt/something
Traceback (most recent call last):
File "./dduper", line 28, in <module>
from beautifultable import BeautifulTable
File "/root/dduper/venv/lib/python2.7/site-packages/beautifultable/__init__.py", line 5, in <module>
from .beautifultable import ( # noqa F401
File "/root/dduper/venv/lib/python2.7/site-packages/beautifultable/beautifultable.py", line 36, in <module>
from .utils import (
File "/root/dduper/venv/lib/python2.7/site-packages/beautifultable/utils.py", line 39
def ensure_type(value, *types, varname="value"):
^
SyntaxError: invalid syntax
Sadly, this makes this tool unusable on my end.
Provide a one-liner installation via PyPI: pip3 install dduper
The Docker image laks/dduper created via the Dockerfile is 287MB. Reduce this image, preferably to around 100MB. Hint: docker history laks/dduper
I tried running sudo dduper -Dm on my backups and it outputs an endless list of skipped files. Maybe it could be a bit less verbose? E.g. just output a counter of skipped files, or put them in a log file?
Also: these kinds of messages should probably be printed to stderr rather than stdout.
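A minimal sketch of what this request could look like, not dduper's actual code: skipped files are counted and written to a log, and a single summary line goes to stderr. The class name SkipReporter and its methods are hypothetical.

```python
import io
import sys


class SkipReporter:
    """Collects skipped files quietly instead of printing one line per file."""

    def __init__(self, log_file=None, stream=None):
        self.count = 0
        self.log_file = log_file  # any writable text file object, or None
        # Diagnostics go to stderr by default, never to stdout.
        self.stream = sys.stderr if stream is None else stream

    def skip(self, path, reason):
        """Record one skipped file; nothing is printed to the terminal."""
        self.count += 1
        if self.log_file is not None:
            self.log_file.write(f"{path}: {reason}\n")

    def summary(self):
        """Print a single counter line once the run is finished."""
        print(f"skipped {self.count} file(s)", file=self.stream)


# Usage: io.StringIO stands in for a real log file here.
log = io.StringIO()
reporter = SkipReporter(log_file=log)
reporter.skip("/mnt/backup/a.txt", "file size < 4kb")
reporter.skip("/mnt/backup/b.txt", "file size < 4kb")
reporter.summary()
```

With this shape, stdout stays free for the dedupe summary while the per-file detail is still available in the log.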
Is it possible to dedupe read only subvolumes? Couldn't find an option to dedupe read only subvolumes.
I have a bunch of read only snapshots but I can't use dduper on them.
I just discovered dduper today and was trying to set it up, but ran into the following issue while cut/pasting the INSTALL.md steps to apply the btrfs-progs patch.
08:23:58 evil@H510 ~/src/dduper/btrfs-progs» patch -p1 < ../patch/btrfs-progs-v5.6.1/0001-Print-csum-for-a-given-file-on-stdout.patch
patching file Makefile
Hunk #1 FAILED at 158.
1 out of 1 hunk FAILED -- saving rejects to file Makefile.rej
patching file cmds/commands.h
patching file cmds/inspect-dump-csum.c
patching file cmds/inspect.c
Hunk #1 succeeded at 667 (offset -3 lines).
Makefile.rej:
08:28:14 evil@H510 ~/src/dduper/btrfs-progs» cat Makefile.rej
--- Makefile
+++ Makefile
@@ -158,7 +158,8 @@ cmds_objects = cmds/subvolume.o cmds/filesystem.o cmds/device.o cmds/scrub.o \
cmds/rescue-super-recover.o \
cmds/property.o cmds/filesystem-usage.o cmds/inspect-dump-tree.o \
cmds/inspect-dump-super.o cmds/inspect-tree-stats.o cmds/filesystem-du.o \
- mkfs/common.o check/mode-common.o check/mode-lowmem.o
+ mkfs/common.o check/mode-common.o check/mode-lowmem.o \
+ cmds/inspect-dump-csum.o
libbtrfs_objects = send-stream.o send-utils.o kernel-lib/rbtree.o btrfs-list.o \
kernel-lib/radix-tree.o extent-cache.o extent_io.o \
crypto/crc32c.o common/messages.o \
Looking at the Makefile, it looks like more cmds_objects have been added since the original patch. I manually added cmds/inspect-dump-csum.o to my Makefile to work around this, but thought you'd want to know so that you can update the patch.
Right now dduper works only with crc32. Add support for other checksum types such as xxhash, blake2, and sha256.
Examine and fix https://gitlab.com/giis/dduper/-/jobs/720569752 failure.
I'm trying to make a folder of highly redundant data. 128k chunk size barely makes a difference but using smaller chunks on duperemove made a significant difference.
Running dduper on a subvolume doesn't seem to work. Both directories have the same two files. Both files are canceled dd copies of my boot drive.
Output from subvolume:
[bluemond@BlueQ dduper]$ sudo python2 ./dduper --device /dev/sda1 --dir /btrfs/subvol/ddtest/ --dry-run
Prefect match : /btrfs/subvol/ddtest/sbd.img /btrfs/subvol/ddtest/sbd.img2
Summary
blk_size : 4KB chunksize : 8192KB
/btrfs/subvol/ddtest/sbd.img has 0 chunks
/btrfs/subvol/ddtest/sbd.img2 has 0 chunks
Matched chunks: 0
Unmatched chunks: 0
Total size(KB) available for dedupe: 0
dduper took 32.3749928474 seconds
[bluemond@BlueQ dduper]$ sudo python2 ./dduper --device /dev/sda1 --dir /btrfs/subvol/ddtest/
Prefect match : /btrfs/subvol/ddtest/sbd.img /btrfs/subvol/ddtest/sbd.img2
************************
Dedupe completed for /btrfs/subvol/ddtest/sbd.img:/btrfs/subvol/ddtest/sbd.img2
Summary
blk_size : 4KB chunksize : 8192KB
/btrfs/subvol/ddtest/sbd.img has 0 chunks
/btrfs/subvol/ddtest/sbd.img2 has 0 chunks
Matched chunks: 0
Unmatched chunks: 0
Total size(KB) deduped: 0
dduper took 32.7617127895 seconds
Output from rootvolume:
[bluemond@BlueQ dduper]$ sudo python2 ./dduper --device /dev/sda1 --dir /btrfs/ddtest/ --dry-run
Summary
blk_size : 4KB chunksize : 32KB
/btrfs/ddtest/sbd.img has 184064 chunks
/btrfs/ddtest/sbd.img2 has 84480 chunks
Matched chunks: 32066
Unmatched chunks: 52414
Total size(KB) available for dedupe: 1026112
dduper took 36.9195628166 seconds
[bluemond@BlueQ dduper]$ sudo python2 ./dduper --device /dev/sda1 --dir /btrfs/ddtest/
************************
Dedupe completed for /btrfs/ddtest/sbd.img:/btrfs/ddtest/sbd.img2
Summary
blk_size : 4KB chunksize : 32KB
/btrfs/ddtest/sbd.img has 184064 chunks
/btrfs/ddtest/sbd.img2 has 84480 chunks
Matched chunks: 32066
Unmatched chunks: 52414
Total size(KB) deduped: 0
dduper took 204.889986038 seconds
Also I'm not sure why the total size deduped is 0 on the actual dedupe...
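To illustrate the chunk-size remark at the start of this report, here is a toy sketch. It assumes chunks are compared at aligned offsets by the checksums of their 4KB blocks; that is an assumption for illustration and not confirmed to be how dduper matches chunks. The function names are hypothetical.

```python
def chunk_keys(block_csums, blocks_per_chunk):
    """Group per-block checksums into aligned, full-size chunks."""
    last_start = len(block_csums) - blocks_per_chunk
    return [
        tuple(block_csums[i:i + blocks_per_chunk])
        for i in range(0, last_start + 1, blocks_per_chunk)
    ]


def matched_chunks(src_csums, dst_csums, blocks_per_chunk):
    """Count chunks of dst whose block checksums all appear as one chunk of src."""
    src_set = set(chunk_keys(src_csums, blocks_per_chunk))
    return sum(1 for key in chunk_keys(dst_csums, blocks_per_chunk) if key in src_set)


# Two 8-block files that differ in a single block (integers stand in for checksums).
src = [1, 2, 3, 4, 5, 6, 7, 8]
dst = [1, 2, 9, 4, 5, 6, 7, 8]
for blocks in (1, 2, 4, 8):
    print(blocks * 4, "KB chunks:", matched_chunks(src, dst, blocks), "matched")
```

Under this assumption a single differing block invalidates its whole chunk, so larger chunks find less to dedupe in data that is only partially redundant.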
I am using blake2 as csum on a 6-drive raid5 data raid1 meta array.
Need to verify how dduper behaves for sparse files.
Instead of building docker image manually and pushing it to docker hub, include it as part of CI.
Running sudo dduper --device /dev/nvme0n1p5 --dir / --recurse
fails due to
Dedupe completed for /core:/app/docker/kafka1/data/__confluent.support.metrics-0/00000000000000000000.log
Summary
blk_size : 4KB chunksize : 128KB
/core has 28 chunks
/app/docker/kafka1/data/__confluent.support.metrics-0/00000000000000000000.log has 1 chunks
Matched chunks: 0
Unmatched chunks: 1
Total size(KB) deduped: 0
Traceback (most recent call last):
File "/usr/sbin/dduper", line 594, in <module>
main(results)
File "/usr/sbin/dduper", line 465, in main
dedupe_dir(results.dir_path, results.dry_run, results.recurse)
File "/usr/sbin/dduper", line 456, in dedupe_dir
dedupe_files(file_list, dry_run)
File "/usr/sbin/dduper", line 410, in dedupe_files
ret = do_dedupe(src_file, dst_file, dry_run)
File "/usr/sbin/dduper", line 225, in do_dedupe
assert len(out2) != 0
AssertionError
Adding --verbose provides no additional information. dmesg contains:
[ 1361.934038] btrfs.static[9668]: segfault at ffffffffb7bd9228 ip 00000000005228d4 sp 00007fff7e417848 error 5 in btrfs.static[401000+189000]
[ 1361.934043] Code: 0e 88 0f c3 c5 fa 6f 06 c5 fa 6f 4c 16 f0 c5 fa 7f 07 c5 fa 7f 4c 17 f0 c3 48 8b 4c 16 f8 48 8b 36 48 89 4c 17 f8 48 89 37 c3 <8b> 4c 16 fc 8b 36 89 4c 17 fc 89 37 c3 0f b7 4c 16 fe 0f b7 36 66
experienced with v0.04-9-g78155b6 on Ubuntu 20.04 with Linux 5.8.0-43-generic
Resolve the following:
$ pycodestyle dduper
dduper:67:5: E265 block comment should start with '# '
dduper:67:80: E501 line too long (88 > 79 characters)
dduper:147:80: E501 line too long (89 > 79 characters)
dduper:154:1: E302 expected 2 blank lines, found 1
dduper:155:5: E266 too many leading '#' for block comment
dduper:159:22: E262 inline comment should start with '# '
dduper:162:23: E262 inline comment should start with '# '
dduper:240:80: E501 line too long (132 > 79 characters)
dduper:258:80: E501 line too long (96 > 79 characters)
dduper:292:5: E303 too many blank lines (2)
dduper:293:1: E101 indentation contains mixed spaces and tabs
dduper:293:1: W191 indentation contains tabs
dduper:294:1: W191 indentation contains tabs
dduper:294:2: E101 indentation contains mixed spaces and tabs
dduper:295:1: W191 indentation contains tabs
dduper:295:2: E101 indentation contains mixed spaces and tabs
dduper:296:1: W191 indentation contains tabs
dduper:297:1: W191 indentation contains tabs
dduper:297:2: E101 indentation contains mixed spaces and tabs
dduper:298:1: W191 indentation contains tabs
dduper:299:1: W191 indentation contains tabs
dduper:299:2: E101 indentation contains mixed spaces and tabs
dduper:300:1: W191 indentation contains tabs
dduper:301:1: W191 indentation contains tabs
dduper:301:2: E101 indentation contains mixed spaces and tabs
dduper:302:1: W191 indentation contains tabs
dduper:303:1: W191 indentation contains tabs
dduper:303:2: E101 indentation contains mixed spaces and tabs
dduper:304:1: W191 indentation contains tabs
dduper:304:2: E101 indentation contains mixed spaces and tabs
dduper:306:1: E101 indentation contains mixed spaces and tabs
dduper:314:80: E501 line too long (88 > 79 characters)
dduper:319:5: E303 too many blank lines (2)
dduper:321:12: E111 indentation is not a multiple of four
dduper:322:14: W291 trailing whitespace
dduper:323:12: E111 indentation is not a multiple of four
dduper:324:40: E231 missing whitespace after ','
dduper:326:66: E228 missing whitespace around modulo operator
dduper:326:76: E231 missing whitespace after ','
dduper:326:80: E501 line too long (86 > 79 characters)
dduper:369:1: E101 indentation contains mixed spaces and tabs
dduper:369:1: W191 indentation contains tabs
dduper:370:1: E101 indentation contains mixed spaces and tabs
dduper:373:80: E501 line too long (112 > 79 characters)
dduper:376:13: E265 block comment should start with '# '
dduper:377:1: E101 indentation contains mixed spaces and tabs
dduper:377:1: W191 indentation contains tabs
dduper:377:3: E265 block comment should start with '# '
dduper:377:80: E501 line too long (82 > 79 characters)
dduper:378:1: E101 indentation contains mixed spaces and tabs
dduper:394:30: W291 trailing whitespace
dduper:506:80: E501 line too long (88 > 79 characters)
dduper:509:5: E303 too many blank lines (2)
dduper:520:20: W291 trailing whitespace
dduper:527:22: E231 missing whitespace after ','
dduper:527:26: E231 missing whitespace after ','
dduper:527:31: E231 missing whitespace after ','
dduper:527:36: E231 missing whitespace after ','
dduper:527:41: E231 missing whitespace after ','
dduper:531:35: E261 at least two spaces before inline comment
dduper:531:36: E262 inline comment should start with '# '
dduper:532:14: E231 missing whitespace after ','
dduper:539:1: E101 indentation contains mixed spaces and tabs
dduper:539:1: W191 indentation contains tabs
dduper:540:1: E101 indentation contains mixed spaces and tabs
dduper:542:16: E111 indentation is not a multiple of four
dduper:543:16: E111 indentation is not a multiple of four
dduper:544:5: E101 indentation contains mixed spaces and tabs
dduper:544:5: W191 indentation contains tabs
dduper:544:13: E111 indentation is not a multiple of four
dduper:544:14: E231 missing whitespace after ','
dduper:545:16: E111 indentation is not a multiple of four
dduper:546:16: E111 indentation is not a multiple of four
dduper:547:16: E111 indentation is not a multiple of four
dduper:548:16: E111 indentation is not a multiple of four
dduper:549:2: E101 indentation contains mixed spaces and tabs
dduper:549:2: W191 indentation contains tabs
dduper:550:80: E501 line too long (98 > 79 characters)
dduper:553:27: E225 missing whitespace around operator
dduper:553:49: E225 missing whitespace around operator
dduper:553:60: E703 statement ends with a semicolon
cmds/inspect-dump-csum.c: In function ‘btrfs_lookup_extent’:
cmds/inspect-dump-csum.c:166:53: error: ‘struct btrfs_fs_info’ has no member named ‘csum_root’; did you mean ‘fs_root’?
166 | u16 csum_size = btrfs_super_csum_size(info->csum_root->fs_info->super_copy);
...
My BTRFS setup is RAID10, so I don't have a single device tied to my BTRFS array. I wasn't sure what I am supposed to pass in this case. Do I just pass in any device that is part of the array for the file/folder I am trying to dedupe?
For example, if I want to dedupe everything in '/mnt/ddimages', and that is part of a BTRFS pool that is comprised of 4 disks:
/dev/sda
/dev/sdb
/dev/sdc
/dev/sdd
Would I use the command:
dduper --device /dev/sda --dir /mnt/ddimages
Will that work as expected? Or will it somehow only dedupe the files that BTRFS has stored on /dev/sda? Do I then need to run the command 4 times, once for each device?
I guess my question is why do we need to specify --device at all? Isn't that something that can be determined based on the mount that holds the file/folders specified?
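A rough sketch of the lookup this question suggests: pick the mount-table entry with the longest mount point that contains the path. The function name device_for_path is hypothetical, and the input is assumed to be text in /proc/mounts format. For a multi-device BTRFS filesystem the mount table lists only one member device, so this does not answer whether that one device is enough for --device.

```python
import os


def device_for_path(path, mounts_text):
    """Return (device, mountpoint, fstype) for the mount that contains path.

    mounts_text is assumed to be in /proc/mounts format. Mount points with
    spaces are escaped there (as \\040) and are not handled in this sketch.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mountpoint, fstype = fields[0], fields[1], fields[2]
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            # Longest mount point wins; a later entry replaces an equal one.
            if best is None or len(mountpoint) >= len(best[1]):
                best = (device, mountpoint, fstype)
    return best


# Usage with a made-up mount table.
sample = "/dev/sdb1 / ext4 rw 0 0\n/dev/sda /mnt/ddimages btrfs rw 0 0\n"
print(device_for_path("/mnt/ddimages/disk.img", sample))
```

On a real system the text could come from reading /proc/mounts.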
I run a site.
When the user uploads a file, the site needs to tell, instantly, if it is a duplicate of an existing file, since the user won't stay on the upload page to wait for a full HDD scan.
When the site admin deletes a file (say it turned out that the file has illegal content), the administrator wishes to delete every copy of that file. However, he knows that if he only deletes one path, there might be other duplicate paths pointing to the same blocks.
Basically what I ask for is something like
$ dduper --file new_upload.mp4 --device /dev/sda3
which returns immediately whether or not a duplicate of new_upload.mp4 exists in /dev/sda3. Bonus if new_upload.mp4 doesn't have to be inside /dev/sda3.
Thanks!
Linux ≥ 5.5 and btrfs-progs ≥ 5.4 finally bring support for checksum algorithms that are stronger than CRC32C. xxHash, SHA256, and BLAKE2 are supported with kernel+btrfs-progs newer than these.
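One practical consequence of the newer algorithms is that checksums are no longer 4 bytes each. A small sketch, assuming the checksums are available as one raw byte string of fixed-size entries; how dduper actually parses the dump-csum output is not shown here, and split_csums is a hypothetical helper. The sizes are the digest widths btrfs uses for each algorithm.

```python
# Checksum size in bytes per algorithm name.
CSUM_SIZE = {
    "crc32c": 4,   # 32 bit
    "xxhash": 8,   # 64 bit
    "sha256": 32,  # 256 bit
    "blake2": 32,  # 256 bit
}


def split_csums(raw, csum_type):
    """Split a raw byte string of back-to-back checksums into one entry per block."""
    size = CSUM_SIZE[csum_type]
    if len(raw) % size != 0:
        raise ValueError(f"{len(raw)} bytes is not a multiple of {size}")
    return [raw[i:i + size] for i in range(0, len(raw), size)]


# Usage: 8 bytes hold two crc32c checksums but only one xxhash checksum.
data = bytes(range(8))
print(len(split_csums(data, "crc32c")), len(split_csums(data, "xxhash")))
```

Code that hard-codes a 4-byte entry would misread every other algorithm, which is one reason crc32-only support needs explicit work to extend.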
Right now dduper has a minimal test script to check basic functionality; see ci/gitlab/*.sh. Enhance it to add RAID tests.
[lizelive@fedora ~]$ sudo podman run --rm -it --device /dev/sda2 --privileged -v /var/home/lizelive/.local/share/containers/storage/:/mnt docker.io/laks/dduper dduper --device /dev/sda2 --dir /mnt --analyze --recurse
Traceback (most recent call last):
File "/usr/sbin/dduper", line 575, in <module>
main(results)
File "/usr/sbin/dduper", line 465, in main
dedupe_dir(results.dir_path, results.dry_run, results.recurse)
File "/usr/sbin/dduper", line 456, in dedupe_dir
dedupe_files(file_list, dry_run)
File "/usr/sbin/dduper", line 410, in dedupe_files
ret = do_dedupe(src_file, dst_file, dry_run)
File "/usr/sbin/dduper", line 224, in do_dedupe
assert len(out1) != 0
AssertionError
what is the correct way to dedupe?
related to #48
I installed dduper on my Ubuntu 21.04 system, which has a BTRFS file system on /dev/sda1 mounted at /data. I'm trying to play around with it on a single directory, but I keep getting permission denied errors.
dduper -p /dev/sda1 --dir /data/G --recurse --dry-run
Tells me ..
ERROR: cannot open '/dev/sda1': Permission denied
unable to open /dev/sda1
Adding sudo in front of dduper doesn't work either.
Any ideas?
Hello,
thanks for dduper!
I have run it over a directory recursively:
dduper --device /dev/sda1 --dir /srv/dev-disk-by-label-DataPool1/Video/ -r --dry-run
Perfect match : /srv/dev-disk-by-label-DataPool1/Video/plugin.video.vdr.recordings_0.2.4.zip /srv/dev-disk-by-label-DataPool1/Video/VDR/unsortiert/Topspione_der_Geschichte/2016-11-04.20.13.23-0.rec/00055.ts
Summary
blk_size : 4KB chunksize : 128KB
/srv/dev-disk-by-label-DataPool1/Video/plugin.video.vdr.recordings_0.2.4.zip has 0 chunks
/srv/dev-disk-by-label-DataPool1/Video/VDR/unsortiert/Topspione_der_Geschichte/2016-11-04.20.13.23-0.rec/00055.ts has 0 chunks
Matched chunks: 0
Unmatched chunks: 0
Total size(KB) available for dedupe: 0
Perfect match : /srv/dev-disk-by-label-DataPool1/Video/plugin.video.vdr.recordings_0.2.4.zip /srv/dev-disk-by-label-DataPool1/Video/VDR/unsortiert/Topspione_der_Geschichte/2016-11-04.20.13.23-0.rec/00039.ts
Summary
blk_size : 4KB chunksize : 128KB
/srv/dev-disk-by-label-DataPool1/Video/plugin.video.vdr.recordings_0.2.4.zip has 0 chunks
/srv/dev-disk-by-label-DataPool1/Video/VDR/unsortiert/Topspione_der_Geschichte/2016-11-04.20.13.23-0.rec/00039.ts has 0 chunks
What I find odd is that plugin.video.vdr.recordings_0.2.4.zip seems to match every single ts file (https://fileinfo.com/extension/ts).
I can imagine that every ts file contains a certain bit pattern... but for that pattern to be in a zip file as well?
Greetings,
Hendrik
When recursing into a directory of 6.2GB with 767 files in it, I thought the insane fast one would be quick: since the csum is already computed, this shouldn't take more than a minute on a modern computer. Instead, the process has been running for 30 minutes now and has already shown 2020 non-matching results.
Looks like the code spends a lot of time on sqlite lookups for various things, perhaps sqlite indexes could speed things up a bit even in normal mode?
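A sketch of the index idea, using an in-memory database. The table name filehash and the filename column come from the `SELECT * FROM filehash WHERE filename = ?` query visible in a traceback elsewhere on this page; the second column and the index name are assumptions, and whether dduper's real schema already has such an index is not confirmed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only `filehash` and `filename` are taken from the traceback.
conn.execute("CREATE TABLE filehash (filename TEXT, hashes TEXT)")
# The index lets lookups by filename avoid a full table scan.
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_filehash_filename ON filehash (filename)"
)


def lookup_plan(connection):
    """Return SQLite's query plan text for the lookup by filename."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM filehash WHERE filename = ?", ("x",)
    ).fetchall()
    # The last column of each row is the human-readable plan detail.
    return " ".join(str(row[-1]) for row in rows)


print(lookup_plan(conn))
```

If the printed plan names the index, the lookup is a search rather than a scan, which matters when the same query runs once per file pair.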
Using dduper-git or dduper-bin on Arch, I'm running into the following error. I'm not sure what other information to give, but if you need more, let me know. I'm sure I just missed a setup step, or something similar.
Traceback (most recent call last):
File "/usr/bin/dduper", line 576, in <module>
main(results)
File "/usr/bin/dduper", line 466, in main
dedupe_dir(results.dir_path, results.dry_run, results.recurse)
File "/usr/bin/dduper", line 457, in dedupe_dir
dedupe_files(file_list, dry_run)
File "/usr/bin/dduper", line 411, in dedupe_files
ret = do_dedupe(src_file, dst_file, dry_run)
File "/usr/bin/dduper", line 225, in do_dedupe
assert len(out1) != 0
AssertionError
It seems like the file system has to be mounted in order for dduper to work. Is it safe to use the file system while dduper is running? What do you mean by "offline"?
I'm trying to run dduper from Docker with
sudo docker run -it --device /dev/sdf -v /media/data:/media/data laks/dduper dduper --device /dev/sdf --dir /media/data/media/Ixus --recurse --analyze
/media/data is the mountpoint for /dev/sdf, which has a btrfs filesystem.
The output for all files is more or less like:
[Analyzing] /media/data/media/Ixus/103___12/IMG_0431.JPG:/media/data/media/Ixus/144___05/IMG_3098.JPG bad tree block 21676032, bytenr mismatch, want=21676032, have=0
ERROR: cannot read chunk root
unable to open /dev/sdf
bad tree block 21676032, bytenr mismatch, want=21676032, have=0
ERROR: cannot read chunk root
unable to open /dev/sdf
Perfect match : /media/data/media/Ixus/103___12/IMG_0431.JPG /media/data/media/Ixus/144___05/IMG_3099.JPG
The volume is fine and healthy. All files can be read.
I assume there's something going wrong with accessing /dev/sdf from within the container.
Any ideas?
Add a simple .travisci.yml to run pycodestyle and other basic sanity tests.
Hi, I'm trying to run your amazing tool on a Synology with BTRFS, using your Docker image as described in install.md, but I'm seeing errors.
/dev/mapper/cachedev_0 is the device on which Synology mounts the BTRFS filesystem (checked with the mount command).
Maybe this is because the Synology has NVMe caching.
parent transid verify failed on 7939913383936 wanted 7154693 found 7154700
parent transid verify failed on 7939913383936 wanted 7154693 found 7154700
parent transid verify failed on 7939913383936 wanted 7154693 found 7154700
Ignoring transid failure
leaf parent key incorrect 7939913383936
ERROR: failed to read block groups: Operation not permitted
unable to open /dev/mapper/cachedev_0
(venv) root@box:~/dduper# ./dduper --device /dev/mapper/something --dir /mnt/something
Traceback (most recent call last):
File "./dduper", line 28, in <module>
from beautifultable import BeautifulTable
File "/root/dev/dduper/venv/lib/python2.7/site-packages/beautifultable/__init__.py", line 5, in <module>
from .beautifultable import ( # noqa F401
File "/root/dev/dduper/venv/lib/python2.7/site-packages/beautifultable/beautifultable.py", line 34, in <module>
from . import enums
File "/root/dev/dduper/venv/lib/python2.7/site-packages/beautifultable/enums.py", line 2, in <module>
import enum
ImportError: No module named enum
pip install enum34 fixed the issue on my end.
It's a nasty hurdle to have to build btrfs-progs yourself.
Multiple files are supported with --files, but only one directory using --dir. This is quite limiting when deduplicating multiple directories.
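A sketch of how a parser could accept several directories, assuming dduper's options are defined with argparse. The destination name dir_path matches the results.dir_path seen in the tracebacks on this page; everything else, including build_parser, is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser whose --dir option takes one or more directories."""
    parser = argparse.ArgumentParser(prog="dduper-sketch")
    parser.add_argument(
        "--dir",
        dest="dir_path",
        nargs="+",      # one or more paths after a single --dir
        default=[],
        help="one or more directories to dedupe",
    )
    return parser


# Usage: both directories end up in one list.
results = build_parser().parse_args(["--dir", "/mnt/a", "/mnt/b"])
for directory in results.dir_path:
    print("would scan", directory)
```

The rest of the tool would then need to build one file list across all given directories instead of calling the directory scan once.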
Thank you for creating and maintaining dduper.
I noticed that the --perfect_match_only option was merged in #54. I think it would be beneficial if this option were explained, for example in the https://github.com/Lakshmipathi/dduper/blob/master/README.md file.
When starting dduper on docker from openmediavault I get the following error when accessing any file:
user@host:~$ sudo docker run -it --device /dev/sdf -v /media/data:/media/data -u root laks/dduper bash
root@0dad41b09a40:/dduper# dduper --device /dev/sdf --dir /media/data/media/Ixus/ -r -a
bad tree block 912588800, bytenr mismatch, want=912588800, have=0
Couldn't read tree root
unable to open /dev/sdf
...
root@0dad41b09a40:/dduper# ls -la /media/data/media/Ixus/
total 0
d---r-x--- 1 root root 1112 Dec 1 08:10 .
drwxr-xr-x 1 root root 428 Dec 4 14:53 ..
d---r-x--- 1 root root 3330 Dec 1 07:48 103___12
...
I assume it's related to user rights or some wrong parameters. Any idea?
In case you missed it, btrfs-progs 5.13 added commands to dump csums.
kdave/btrfs-progs@9f6c055
do_btrfs_dump_csum fails if the btrfs inspect-internal dump-csum command is not implemented (and this is still not in the main BTRFS implementation)...
It would be good to have a fall-back method to calculate this when it fails.
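Before any fall-back can kick in, the failure has to be detected. A minimal sketch, not dduper's code: a wrapper that turns the "unknown token" message, a non-zero exit, or empty output into explicit exceptions instead of a bare AssertionError. The caller supplies the full command line, so no argument order is assumed; run_dump_csum and DumpCsumUnavailable are hypothetical names.

```python
import subprocess


class DumpCsumUnavailable(RuntimeError):
    """Raised when the btrfs binary does not know the dump-csum subcommand."""


def run_dump_csum(argv, run=subprocess.run):
    """Run a dump-csum command line and return its stdout, or raise a clear error.

    `run` is injectable so the function can be exercised without btrfs installed.
    """
    proc = run(argv, capture_output=True, text=True)
    if "unknown token" in proc.stderr:
        raise DumpCsumUnavailable(
            "this btrfs binary lacks dump-csum; a patched btrfs-progs is needed"
        )
    if proc.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with status {proc.returncode}: {proc.stderr.strip()}"
        )
    if not proc.stdout.strip():
        raise RuntimeError("dump-csum printed no checksums")
    return proc.stdout
```

A caller could catch DumpCsumUnavailable and switch to whatever fall-back is chosen, while the other errors point at the device or file instead.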
Backed up an old Windows disk onto a BTRFS backed network share. Now dduper throws an exception on one of the filenames.
ls gives the filename as:
'Finland.J'$'\344''rvenp'$'\344\344''-Elisa.xml'
Traceback (most recent call last):
File "/usr/sbin/dduper", line 535, in <module>
main(results)
File "/usr/sbin/dduper", line 426, in main
dedupe_dir(results.dir_path, results.dry_run, results.recurse)
File "/usr/sbin/dduper", line 409, in dedupe_dir
if validate_file(fn) is True:
File "/usr/sbin/dduper", line 399, in validate_file
file size < 4kb ")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce4' in position 146: surrogates not allowed
Using the docker image.
sudo docker run -it --device /dev/sdc -v /media/backup/:/mnt laks/dduper dduper --device /dev/sda1 --dir /mnt --analyze --recurse
The UnicodeEncodeError reported by @plattrap in issue #15 has been marked "resolved" long ago, but I've just started using dduper yesterday (pre-built binary 0.04) and already get that:
# dduper --device /dev/sdd2 --recurse --dir /btrfstestdir
File "/usr/sbin/dduper", line 734, in <module>
main(results)
File "/usr/sbin/dduper", line 595, in main
dedupe_dir(results.dir_path, results.dry_run, results.recurse)
File "/usr/sbin/dduper", line 571, in dedupe_dir
populate_records(file_list)
File "/usr/sbin/dduper", line 549, in populate_records
btrfs_dump_csum(fn)
File "/usr/sbin/dduper", line 269, in btrfs_dump_csum
out, ret = check_btrfs_file_exists(filename)
File "/usr/sbin/dduper", line 252, in check_btrfs_file_exists
cursor.execute("SELECT * FROM filehash WHERE filename = ?",(filename,))
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 58-59: surrogates not allowed
(I'm not sure if it's the same core issue as in #15, so I'm filing this one separately. Please feel free to move it there and/or mark it as a duplicate.)
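Both UnicodeEncodeError reports involve filenames whose bytes are not valid UTF-8; Python represents such bytes as lone surrogates like '\udce4', which cannot be encoded strictly. A sketch of two helpers, assuming the names reach the program in that surrogate-escaped form: one for printing and one for use as a database key. The helper names are hypothetical and this is not dduper's actual fix.

```python
def printable_name(filename):
    """Return a version of filename that can always be printed as UTF-8."""
    # Recover the original bytes, then replace the undecodable ones with U+FFFD.
    raw = filename.encode("utf-8", errors="surrogateescape")
    return raw.decode("utf-8", errors="replace")


def db_key(filename):
    """Return the original filename bytes, suitable as a BLOB parameter in sqlite3."""
    return filename.encode("utf-8", errors="surrogateescape")


def from_db_key(key):
    """Turn stored bytes back into the str form the os module can open."""
    return key.decode("utf-8", errors="surrogateescape")


# The filename from the first report, as Python sees it.
name = "Finland.J\udce4rvenp\udce4\udce4-Elisa.xml"
print(printable_name(name))
```

Storing the bytes keeps the exact name for later file operations, while the printable form is only for messages.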
What License are you publishing this under?
Or, rather, it applies, but compilation fails with an error about too many arguments to open_ctree_fs_info.
Update TESTS.md with results from different RAID setups like raid0,raid1,raid5,raid10.
$ dduper --device /dev/mapper/vg-root --dir /nix/store/*llvm*-lib --recurse
[...]
Dedupe completed for /nix/store/1xcwdxx002a70ml4h1k0byciidbsnx2n-llvm-8.0.1-lib/lib/libLLVM-8.so:/nix/store/hpa2wxp7cjxgb5bn44wnhb4aig65s1kg-llvm-8.0.1-lib/lib/libLLVM.so
Summary
blk_size : 4KB chunksize : 128KB
/nix/store/1xcwdxx002a70ml4h1k0byciidbsnx2n-llvm-8.0.1-lib/lib/libLLVM-8.so has 646 chunks
/nix/store/hpa2wxp7cjxgb5bn44wnhb4aig65s1kg-llvm-8.0.1-lib/lib/libLLVM.so has 643 chunks
Matched chunks: 1
Unmatched chunks: 642
Total size(KB) deduped: 128
************************
error([Errno 22] Invalid argument)
Traceback (most recent call last):
File "/nix/store/ldmj09d7pfyircf1j34m8rhpy0qxlj2l-dduper-v0.04/bin//dduper", line 594, in <module>
main(results)
File "/nix/store/ldmj09d7pfyircf1j34m8rhpy0qxlj2l-dduper-v0.04/bin//dduper", line 465, in main
dedupe_dir(results.dir_path, results.dry_run, results.recurse)
File "/nix/store/ldmj09d7pfyircf1j34m8rhpy0qxlj2l-dduper-v0.04/bin//dduper", line 456, in dedupe_dir
dedupe_files(file_list, dry_run)
File "/nix/store/ldmj09d7pfyircf1j34m8rhpy0qxlj2l-dduper-v0.04/bin//dduper", line 410, in dedupe_files
ret = do_dedupe(src_file, dst_file, dry_run)
File "/nix/store/ldmj09d7pfyircf1j34m8rhpy0qxlj2l-dduper-v0.04/bin//dduper", line 281, in do_dedupe
bytes_deduped,status = ioctl_fideduperange(src_fd, s)
TypeError: cannot unpack non-iterable NoneType object
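The TypeError suggests that the dedupe call returned None after printing the Errno 22 message, and the caller then tried to unpack it. A defensive sketch under that assumption; call_dedupe is a hypothetical wrapper and the real ioctl_fideduperange may behave differently.

```python
def call_dedupe(dedupe_fn, *args):
    """Call dedupe_fn and always return a (bytes_deduped, status) pair."""
    try:
        result = dedupe_fn(*args)
    except OSError as exc:
        # Report the errno (e.g. 22 for "Invalid argument") instead of crashing.
        return 0, exc.errno
    if result is None:
        # The callee handled an error itself and returned nothing.
        return 0, None
    return result


# Usage with stand-in functions for the three possible outcomes.
def returns_none(fd, request):
    return None


def raises_einval(fd, request):
    raise OSError(22, "Invalid argument")


def succeeds(fd, request):
    return 131072, 0


print(call_dedupe(returns_none, 3, b""))
print(call_dedupe(raises_einval, 3, b""))
print(call_dedupe(succeeds, 3, b""))
```

With a pair always coming back, one failed file pair can be counted and skipped instead of aborting the whole recursive run.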
strace -o /mnt/logs.txt dduper --device /dev/mapper/cachedev_0 --dir /mnt/Exchange --analyze --recurse
Traceback (most recent call last):
File "/usr/sbin/dduper", line 575, in <module>
main(results)
File "/usr/sbin/dduper", line 465, in main
dedupe_dir(results.dir_path, results.dry_run, results.recurse)
File "/usr/sbin/dduper", line 456, in dedupe_dir
dedupe_files(file_list, dry_run)
File "/usr/sbin/dduper", line 410, in dedupe_files
ret = do_dedupe(src_file, dst_file, dry_run)
File "/usr/sbin/dduper", line 224, in do_dedupe
assert len(out1) != 0
AssertionError
Strace log attached.
Running code from today's repo Gen-15-2021 (dduper 0.04)
btrfs_lookup_csums search failed.icrosoft.MicrosoftOfficeHub_8wekyb3d8bbwe/AC/Microsoft/CLR_v4.0/ngen.log
Error: btrfs_lookup_csumextent buffer leak: start 87928078336 len 16384
extent buffer leak: start 55155425280 len 16384
extent buffer leak: start 305545216 len 16384
Traceback (most recent call last):
File "/usr/local/bin/dduper", line 575, in <module>
main(results)
File "/usr/local/bin/dduper", line 465, in main
dedupe_dir(results.dir_path, results.dry_run, results.recurse)
File "/usr/local/bin/dduper", line 456, in dedupe_dir
dedupe_files(file_list, dry_run)
File "/usr/local/bin/dduper", line 410, in dedupe_files
ret = do_dedupe(src_file, dst_file, dry_run)
File "/usr/local/bin/dduper", line 225, in do_dedupe
assert len(out2) != 0
AssertionError
It ran half a day analyzing a path then crashed. Command line is
dduper --device /dev/sdq --dir /mypath/backup/ --analyze --recurse
Not sure which repo to report/ask this in, sorry.
I've tried the prebuilt btrfs.static and kdave/btrfs-progs.git#v5.6.1 with 0001-Print-csum-for-a-given-file-on-stdout.patch built from source. I'm pretty sure I have CRC32 csums (mount says Btrfs loaded, crc32c=crc32c-intel), but btrfs inspect-internal dump-csum just pauses and exits (code 0) without printing anything. No kernel/syslog messages occur while dump-csum is running. I've tried several files and all three devices in the set.
Any ideas as to how I diagnose this?