Comments (14)

hiliev commented on July 28, 2024

The sector size is hardcoded in the call to RaidzDevice._map_alloc() in zio.py. _map_alloc() converts a contiguous pool block into a set of sector-aligned blocks on each disk. This should be the only place where the sector size plays any role and replacing 9 with 12 in the call should do the trick.

The value of unit_shift (= ashift) should really be obtained from the value of ashift in the vdev_tree list in the pool label. I'm currently too busy to implement it, but PRs are welcome.
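
A minimal sketch of why the shift matters, assuming the usual RAID-Z column arithmetic. It is not the actual `_map_alloc()` from zio.py: `raidz_columns` and the `label` dictionary are hypothetical stand-ins, and parity rotation and skip sectors are left out.

```python
# Hypothetical stand-in for a parsed pool label; only the ashift lookup matters.
label = {"vdev_tree": {"ashift": 12}}
unit_shift = label["vdev_tree"]["ashift"]


def raidz_columns(offset, psize, unit_shift, dcols, nparity=1):
    """Return (disk, disk_offset, size) for each column of one pool block.

    The first `nparity` entries are parity columns, the rest hold data.
    """
    sector = 1 << unit_shift
    b = offset >> unit_shift            # starting sector within the vdev
    s = -(-psize // sector)             # data sectors, rounded up
    f = b % dcols                       # disk that holds the first column
    o = (b // dcols) << unit_shift      # byte offset on that disk
    data_cols = dcols - nparity
    q, r = divmod(s, data_cols)         # full rows and leftover data sectors
    bc = 0 if r == 0 else r + nparity   # columns that get one extra sector
    acols = bc if q == 0 else dcols     # columns actually used
    cols = []
    for c in range(acols):
        disk, disk_off = f + c, o
        if disk >= dcols:               # wrap around to the next row
            disk -= dcols
            disk_off += sector
        size = (q + 1 if c < bc else q) << unit_shift
        cols.append((disk, disk_off, size))
    return cols


# A 0x1000-byte block at vdev offset 0x8000 on a 3-disk raidz1:
print(raidz_columns(0x8000, 0x1000, unit_shift, dcols=3))  # ashift = 12
print(raidz_columns(0x8000, 0x1000, 9, dcols=3))           # ashift = 9
```

With ashift 12 the block takes one parity and one data sector of 4096 bytes each; with ashift 9 the same block is spread over three columns of 2048 bytes, so a reader using the wrong shift fetches the wrong bytes from the wrong disks.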

from py-zfs-rescue.

eiselekd commented on July 28, 2024

I'm pushing my work-in-progress changes to https://github.com/eiselekd/py-zfs-rescue, which is quite messy right now because I'm trying to recover from digital death. I'll create a clean patch once I have recovered. The main changes so far: UBArray also uses ashift 12; _map_alloc uses ashift 12 and also needs the asize from the DVA as input; I ripped the LZ4 code from the ZFS source and added it as a Python DLL in zfs/python-lz4-0.7.0%2Bdfsg/src/python-lz4zfs.c.
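
Assuming UBArray is the uberblock ring in the vdev label, the reason it depends on ashift can be sketched as follows. The ring is 128 KiB and each slot is one sector, but never smaller than 1 KiB. `uberblock_layout` is a hypothetical helper, and the upper cap of 8 KiB reflects newer OpenZFS behaviour as far as I know.

```python
UBERBLOCK_RING_SIZE = 128 * 1024   # size of the uberblock array in each label
MIN_UBERBLOCK_SHIFT = 10           # slots are at least 1 KiB
MAX_UBERBLOCK_SHIFT = 13           # cap applied by newer OpenZFS (assumption)


def uberblock_layout(ashift):
    """Return (slot_size, slot_count) of the uberblock ring for a given ashift."""
    shift = min(max(ashift, MIN_UBERBLOCK_SHIFT), MAX_UBERBLOCK_SHIFT)
    slot_size = 1 << shift
    return slot_size, UBERBLOCK_RING_SIZE // slot_size


for ashift in (9, 12):
    print(ashift, uberblock_layout(ashift))
```
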
I have gotten far enough that I can see my target dataset:

[+]  0 uncompressed bytes
[+] Dataset 42
[+]  dnode [DSL dataset] 1B 1L/16384 blkptr[0]=empty bonus[320]=[ds_dir_obj=39 ds_prev_snap_obj=18 ds_prev_snap_txg=1 ds_prev_next_obj=0 ds_snapnames_zapobj=43 ds_num_children=0 ds_creation_time=1493841428 ds_creation_txg=25 ds_deadlist_obj=44 ds_used_bytes=2078356136912 ds_compressed_bytes=2044477967872 ds_uncompressed_bytes=2389135736320 ds_unique_bytes=2078356136912 ds_fsid_guid=30636525258139123 ds_guid=7520248390229983303 ds_restoring=4 ds_bp=<[L0 DMU objset] 800L/800P DVA[0]=<0:2d000008000:2200> DVA[1]=<0:3f000000000:2200> DVA[2]=<0:0:200> birth=108373 fletcher4 off LE contiguous fill=7460273>]
[+]  creation timestamp 1493841428
[+]  creation txg 25
[+]  2389135736320 uncompressed bytes
[+] Dataset 21

However, recovery still fails when ddss.analyse() is called on dataset 42.
The program tries to fetch the master_dnode with dnode_id 1 and traverses some kind of tree
for block ID 0. The block tree has 7 levels (quite a lot?).

ObjectSet._dnodes_per_block == 32
BlockTree._level == 7
BlockTree._blocks_per_level == 128

The traversal indices are [0,0,0,0,0,0].
Walking down the tree, levels 0-3 are fine, but on
reaching level 4 the data seems corrupt and decompression stops working.
I'm trying to figure out what is wrong.

Has the structure of the block tree perhaps changed in ZFS on Linux, or are there
features I need to take into consideration that could cause the failure?

dnode = DNode(data=block_data[dnid*512:(dnid+1)*512])
in ObjectSet also seems to depend on ashift. There
is an ObjectSet._dnodes_per_block calculation involved. Also,
BlockTree._blocks_per_level, which creates the traversal indexes, depends on
the sector size loaded...

hiliev commented on July 28, 2024

In my experience, object sets always have 7-level block pointer trees. This is logical as the number of objects in the set can grow very large and there should be enough growth potential for the dnode list. If the block tree breaks at a certain level, the block at the corresponding level may be corrupt. Try adding dva=1 to the DataSet constructor here. This will read from the second copy of the object set. This is another feature missing from the code - if reading a block fails, it should retry the other DVAs in the bptr if any.
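
The missing retry could look roughly like this. It is only a sketch: `read_block_any_dva` is a hypothetical wrapper, and the only thing taken from the thread is that the reader is called as `read_block(bp, dva=...)`.

```python
def read_block_any_dva(read_block, bptr, dva_count=3):
    """Try each DVA copy in turn and return the first block that reads cleanly."""
    errors = []
    for dva in range(dva_count):
        try:
            return read_block(bptr, dva=dva)
        except Exception as exc:  # a real reader would catch narrower errors
            errors.append("DVA[%d]: %s" % (dva, exc))
    raise IOError("all copies failed: " + "; ".join(errors))


# Usage with a fake reader whose first copy is damaged.
def fake_read_block(bptr, dva=0):
    if dva == 0:
        raise ValueError("checksum mismatch")
    return b"block from copy %d" % dva


print(read_block_any_dva(fake_read_block, bptr=None))
```

A real implementation would also skip DVAs that are empty in the block pointer instead of attempting to read them.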

512 in the call to DNode is not the sector size but the size of the dnode structure. This is also why computing ObjectSet._dnodes_per_block involves division by 512. I should make that a constant to prevent confusion. BlockTree._blocks_per_level does not depend on the sector size directly. The value is simply the size of the block that holds the root of the block tree, as specified in its bptr, divided by the size of the bptr structure.
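
A small sketch of that arithmetic, using numbers reported elsewhere in this thread (16384-byte dnode blocks, 128 block pointers per indirect block). `locate_dnode` is a hypothetical helper, not a function from py-zfs-rescue, and it assumes one index per indirect level, that is, `levels - 1` indices.

```python
DNODE_SIZE = 512   # size of one dnode structure, not the sector size


def locate_dnode(dnode_id, data_block_size, blocks_per_level, levels):
    """Return (tree indices from the root down, byte offset inside the L0 block)."""
    dnodes_per_block = data_block_size // DNODE_SIZE
    block_id, slot = divmod(dnode_id, dnodes_per_block)
    indices = []
    for _ in range(levels - 1):
        block_id, idx = divmod(block_id, blocks_per_level)
        indices.append(idx)
    indices.reverse()                 # root level first
    return indices, slot * DNODE_SIZE


# dnode 1 in a 7-level tree, and dnode 34 in a 6-level tree
print(locate_dnode(1, 16384, 128, levels=7))
print(locate_dnode(34, 16384, 128, levels=6))
```

None of these values changes with ashift; only the dnode size, the data block size and the indirect block size enter the calculation.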

eiselekd commented on July 28, 2024

I'm testing it on a testpool that I created via
zpool create datapool -f -o ashift=12 -O -O compression=lz4 -O normalization=formD raidz /dev/loop0 /dev/loop1 /dev/loop2 and there the traversal also fails.

I open dataset 54 in the testpool:

dsid=54
ddss = Dataset(pool_dev, datasets[dsid], dva=1) 
ddss.analyse()
...
master_dnode = self[1] 
...
z = zap_factory(self._vdev, master_dnode)
...
self._rootdir_id = z["ROOT"]
rootdir_dnode = self[self._rootdir_id]

self._rootdir_id is 34 and self._dnodes_per_block is 32,
so the indices become [0, 0, 0, 0, 1] (the test pool's tree has 6 levels). The tree is already
populated; the bp that ObjectSet._get_dnode() gets from the self._blocktree[..] lookup
is fine and can be read via block_data = self._vdev.read_block(bp, dva=dva). Decompressing block_data raises no error: the logical size is 16384 and decompression yields 16384 bytes. But then there is this code afterwards:

        block_data = self._vdev.read_block(bp, dva=dva)
        ...
        dnid = dnode_id % self._dnodes_per_block
        dnode = DNode(data=block_data[dnid*512:(dnid+1)*512])

dnode_id is 34 and self._dnodes_per_block is 32, so dnid is 2. However, the
"dnode" sliced from slot 2 of block_data seems corrupt:

DNode: [ZFS directory] 1B 1L/131072 blkptr[0]=<[L0 ZFS directory] 40000L/7c0200P DVA[0]=<16778017:
 14523f29ab00a00000:200/grid=28> DVA[1]=<134912:a1a9fe00:20200200/grid=5> DVA[2]=<0:0:200> 
 birth=4 invalid unk_143 LE contiguous fill=0> bonus[168]

After that it fails...

The decompressed blockdata begins with:

00000000: 2d11 0103 0000 000b 0100 0000 0000 0000  -...............
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 003e 2103 0001 0061 80d1 b49b e407  ...>!....a......
00000050: 0b00 0f02 001b 1923 2f00 8952 4547 4953  .......#/..REGIS
00000060: 5452 5915 000f 0200 0a19 241e 0079 4c41  TRY.......$..yLA
00000070: ff01 0082 8f00 2d80 594f 5554 5314 000f  ......-.YOUTS...
00000080: 0200 ff47 5000 0000 0000 0000 0000 0000  ...GP...........
00000090: 0400 0000 0000 0000 0000 0000 0000 0000  ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000100: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000110: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000130: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000140: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000150: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000160: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000170: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000200: 1611 0103 0000 000b 0100 0000 0000 0000  ................
00000210: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000220: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000230: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000240: 0000 0017 2103 0001 0041 8063 3748 0900  ....!....A.c7H..
00000250: 0f02 00ff d950 0000 0000 0000 0000 0000  .....P..........
00000260: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000270: ff01 0034 8f00 1680 0000 0000 0000 0000  ...4............
00000280: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000290: 0400 0000 0000 0000 0000 0000 0000 0000  ................
000002a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000300: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000310: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000320: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000330: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000340: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000350: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000360: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000370: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000380: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000390: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000003a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000003b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000003c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000003d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000003e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000003f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000400: 1411 0101 2c00 000b 0100 a800 0000 0000  ....,...........
00000410: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000420: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000430: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000440: 0000 001c 2103 0001 0050 80d5 941f 290a  ....!....P....).
00000450: 0010 1005 000f 0200 ffd4 5000 0000 0000  ..........P.....
00000460: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000470: ff01 003e 8f00 1480 0000 0000 0000 0000  ...>............
00000480: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000490: 0400 0000 0000 0000 0000 0000 0000 0000  ................
000004a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000004b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000004c0: 5a50 2f00 0204 1800 ed41 0000 0000 0000  ZP/......A......
000004d0: 0200 0000 0000 0000 0400 0000 0000 0000  ................
000004e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000004f0: 2200 0000 0000 0000 4401 0000 0804 0000  ".......D.......
00000500: 2274 b55a 0000 0000 86ce 5d1b 0000 0000  "t.Z......].....
00000510: 2274 b55a 0000 0000 86ce 5d1b 0000 0000  "t.Z......].....
00000520: 2274 b55a 0000 0000 86ce 5d1b 0000 0000  "t.Z......].....
00000530: 2274 b55a 0000 0000 86ce 5d1b 0000 0000  "t.Z......].....
00000540: 0200 0000 0000 0000 0300 0000 0000 0000  ................
00000550: 0000 0010 bf01 1e00 0000 4020 a900 1200  ..........@ ....
00000560: 0000 0040 a900 1200 0000 0000 0000 0000  ...@............
00000570: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000580: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000590: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000005a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000005b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000005c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000005d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000005e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000005f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000600: 2e11 0103 0000 000b 0300 0000 0000 0000  ................
00000610: 0000 0000 0000 0000 a02a 0000 0000 0000  .........*......
00000620: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000630: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000640: 1000 0000 0000 0000 1000 0000 0000 0000  ................
00000650: 1000 0000 0000 0000 1000 0200 0000 0000  ................
00000660: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000670: 0200 0200 0207 2e80 0000 0000 0000 0000  ................
00000680: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000690: 0400 0000 0000 0000 0100 0000 0000 0000  ................
000006a0: 8ca7 705d 1100 0000 cf86 b1d8 890d 0000  ..p]............
000006b0: a274 769d eabe 0600 d234 a5d4 206f 8e02  .tv......4.. o..
000006c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000006d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000006e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000006f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000700: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000710: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000720: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000730: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000740: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000750: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000760: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000770: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000780: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000790: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000007a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000007b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000007c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000007d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000007e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000007f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000800: 2f11 0103 0000 000b 2000 0000 0000 0000  /....... .......
00000810: 0100 0000 0000 0000 4055 0000 0000 0000  ........@U......
00000820: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000830: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000840: 1000 0000 0000 0000 0000 0000 0000 0000  ................
00000850: 1000 0000 0000 0000 0000 0200 0000 0000  ................
00000860: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000870: 1f00 0700 0f07 2f80 0000 0000 0000 0000  ....../.........
00000880: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000890: 0400 0000 0000 0000 0100 0000 0000 0000  ................
000008a0: a1d3 0324 1200 0000 b3ad 6248 4247 0000  ...$......bHBG..
000008b0: cbe4 7631 6c1a 8c00 4688 387d ecb8 d4b7  ..v1l...F.8}....
000008c0: 1000 0000 0000 0000 2000 0000 0000 0000  ........ .......
000008d0: 1000 0000 0000 0000 2000 0200 0000 0000  ........ .......
000008e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000008f0: 1f00 0700 0f07 2f80 0000 0000 0000 0000  ....../.........
00000900: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000910: 0400 0000 0000 0000 0100 0000 0000 0000  ................
00000920: 315e 1254 7400 0000 a148 f3b4 c515 0100  1^.Tt....H......
00000930: d93b 6055 77c3 7a01 8ad8 d89c db2c e978  .;`Uw.z......,.x
00000940: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000950: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000960: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000970: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000980: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000990: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000009a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000009b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000009c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000009d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000009e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000009f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000a00: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000a10: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000a20: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000a30: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000a40: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000a50: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000a60: 0000 0000 0000 0000 0000 0000 0000 0000  ................
....

Decoded, it is:

0:[SA master node] 1B 1L/131072 blkptr[0]=<[L0 SA master node] 40000L/1040200P DVA[0]=<16778017:fc93769a300c20000:200/grid=62> DVA[1]=<588847872:a6928e8aa512005e00:1e001800/grid=2> DVA[2]=<134912:8298f2003c48321400:b2a4aa00/grid=21> birth=4 invalid unk_143 LE contiguous fill=0> blkptr[1]=empty blkptr[2]=empty
1:[ZFS delete queue] 1B 1L/131072 blkptr[0]=<[L0 ZFS delete queue] 40000L/680200P DVA[0]=<16778017:12906ec700820000:200/grid=23> DVA[1]=<20697:0:42000/grid=255> DVA[2]=<0:0:200> birth=4 invalid unk_143 LE contiguous fill=0> blkptr[1]=empty blkptr[2]=empty
2:[ZFS directory] 1B 1L/131072 blkptr[0]=<[L0 ZFS directory] 40000L/7c0200P DVA[0]=<16778017:14523f29ab00a00000:200/grid=28> DVA[1]=<134912:a1a9fe00:20200200/grid=5> DVA[2]=<0:0:200> birth=4 invalid unk_143 LE contiguous fill=0> bonus[168]
3:[SA attr registration] 1B 1L/131072 blkptr[0]=<[L0 SA attr registration] 600L/600P DVA[0]=<0:2000:2200> DVA[1]=<0:4002000:2200> DVA[2]=<0:0:200> birth=4 fletcher4 off LE contiguous fill=1> blkptr[1]=empty blkptr[2]=empty
4:[SA attr layouts] 2B 1L/131072 blkptr[0]=empty blkptr[1]=<[L0 SA attr layouts] 4000L/1000P DVA[0]=<0:4000:2200> DVA[1]=<0:4004000:2200> DVA[2]=<0:0:200> birth=4 fletcher4 lz4 LE contiguous fill=1> blkptr[2]=empty
5:<unallocated dnode>
6:<unallocated dnode>
7:<unallocated dnode>

Index 2 seems to have a different format or to be overwritten by junk, yet this is the test pool that I just created from scratch... Maybe the dnode layout in ZFS on Linux is different?

I guess I need to instrument the zfs kernel module to see what the real driver does
when it traverses the blocktree of the testpool ...

eiselekd commented on July 28, 2024

Here is a zdb output of the dataset traversed above (dataset 54):

Dataset datapool [ZPL], ID 54, cr_txg 1, 128K, 6 objects, rootbp DVA[0]=<0:18000:2000> DVA[1]=<0:4018000:2000> [L0 DMU objset] fletcher4 uncompressed unencrypted LE contiguous unique double size=800L/800P birth=4L/4P fill=6 cksum=984d0cf61:95d650da66c:61f58dc380bbf:32b72cbd7096a4e

    Deadlist: 0 (0/0 comp)

    mintxg 0 -> obj 57: object 57, 0 blkptrs, 0


    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
         0    6   128K    16K  74.5K     512    32K    9.38  DMU dnode (K=inherit) (Z=inherit)
	dnode flags: USED_BYTES 
	dnode maxblkid: 1
Indirect blocks:
               0 L5      0:14000:2000 0:4014000:2000 20000L/1000P F=6 B=4/4
               0  L4     0:12000:2000 0:4012000:2000 20000L/1000P F=6 B=4/4
               0   L3    0:10000:2000 0:4010000:2000 20000L/1000P F=6 B=4/4
               0    L2   0:e000:2000 0:400e000:2000 20000L/1000P F=6 B=4/4
               0     L1  0:c000:2000 0:400c000:2000 20000L/1000P F=6 B=4/4
               0      L0 0:8000:2000 0:4008000:2000 4000L/1000P F=1 B=4/4
            4000      L0 0:a000:2000 0:400a000:2000 4000L/1000P F=5 B=4/4

		segment [0000000000000200, 0000000000000400) size   512
		segment [0000000000004000, 0000000000004a00) size 2.50K

And here is object ID 34, which py-zfs-rescue reaches by traversing the indirection tree:

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
        34    1   128K    512      0     512    512  100.00  ZFS directory (K=inherit) (Z=inherit)
                                               168   bonus  System attributes
	dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
	dnode maxblkid: 0
	path	/
	uid     0
	gid     0
	atime	Fri Mar 23 22:39:46 2018
	mtime	Fri Mar 23 22:39:46 2018
	ctime	Fri Mar 23 22:39:46 2018
	crtime	Fri Mar 23 22:39:46 2018
	gen	4
	mode	40755
	size	2
	parent	34
	links	2
	pflags	40800000144
	microzap: 512 bytes, 0 entries

Indirect blocks:
               0 L0 EMBEDDED et=0 200L/20P B=4

I verified that py-zfs-rescue traverses:

 0:14000:2000
 0:12000:2000
 0:10000:2000
 0:e000:2000
 0:c000:2000
 0:a000:2000

And the data seems to be right... the path is at 0x4c0 "/".

hiliev commented on July 28, 2024

The root directory appears to be a Micro ZAP embedded in the dnode itself. This is not supported by py-zfs-rescue. Only embedded files are supported. I knew it was possible to have embedded micro ZAPs, but none of the datasets within the failed pool I worked with while developing the code had such a directory dnode. The zap_factory in zap.py should be modified to extract the embedded data from the dnode (if blkptr[0].fill == 0) and use it to parse a MicroZap.

Extracting the embedded data is tricky. It is not contiguous and overlaps with the DVA section of the block pointer array, which is also why the content of blkptr[0] is junk. I can't find the ZFS documentation right now and would rather advise you to look at the source code of the Linux kernel to see which parts of the dnode structure contain embedded data.

hiliev commented on July 28, 2024

It is also the case that the dnode structure has changed in later ZFS versions and some fields have a different meaning. The current code is based on version 10, which is pretty close (but not identical) to the original ZFS specification. I used this draft.

hiliev commented on July 28, 2024

One more thing: the bonus type for this directory dnode is 0x2c, which is not understood by py-zfs-rescue. Its content doesn't match the layout of the znode_phys_t defined in the draft. I looked into the hexdump and created the following representation by hand:

00000400:
14               dn_type
11               dn_indblkshift
01               dn_nlevels
01               dn_nblkptr
2c               dn_bonustype
00               dn_checksum
00               dn_compress
0b               dn_pad
01 00            dn_datablkszsec
a8 00            dn_bonuslen
00 00 00 00      dn_pad2
00000410:
0000000000000000 dn_maxblkid
0000000000000000 dn_secphys
00000420:
0000000000000000 dn_pad3[0]
0000000000000000 dn_pad3[1]
00000430:
0000000000000000 dn_pad3[2]
0000000000000000 dn_pad3[3]

00000440: dn_blkptr[0]
                                         True Meaning vs.   blkptr_t
0000 001c 2103 0001 0050 80d5 941f 290a  payload            DVA[0]
00000450:
0010 1005 000f 0200 ffd4 5000 0000 0000  payload            DVA[1]
00000460:
0000 0000 0000 0000 0000 0000 0000 0000  payload            DVA[2]
00000470:
ff01 003e 8f00 1480
                                         byte order = 1     endian = 1
                                         dedup = 0
                                         encryption = 0
                                                 lvl = 0
                                                 type = 0x14
                                         etype = DATA       chksum = 0
                                         embedded = 1
                                         comp = LZ4         comp = invalid (0x8f)
                                         psize = 31         psize = 15872
                                         lsize = 511        lsize = 511
0000 0000 0000 0000                      payload            padding
00000480:
0000 0000 0000 0000                      payload            padding
0000 0000 0000 0000                      payload            padding
00000490:
0400 0000 0000 0000                              birth_txg
0000 0000 0000 0000                      payload            fill count
000004a0:
0000 0000 0000 0000                      payload            checksum[0]
0000 0000 0000 0000                      payload            checksum[1]
000004b0:
0000 0000 0000 0000                      payload            checksum[2]
0000 0000 0000 0000                      payload            checksum[3]

000004c0: dn_bonus[], 0xa8 bytes, type 0x2c
                                         True Meaning  vs.  znode_phys_t
5a50 2f00 0204 1800                      sa_hdr_phys_t      zp_atime[0]
ed41 0000 0000 0000                      file mode          zp_atime[1]
000004d0:
0200 0000 0000 0000                      size or links      zp_mtime[0]
0400 0000 0000 0000                      gen                zp_mtime[1]
000004e0:
0000 0000 0000 0000                      uid                zp_ctime[0]
0000 0000 0000 0000                      gid                zp_ctime[1]
000004f0:
2200 0000 0000 0000                      parent             zp_crtime[0]
4401 0000 0804 0000                      pflags             zp_crtime[1]
00000500:
2274 b55a 0000 0000 86ce 5d1b 0000 0000  atime              zp_gen
00000510:
2274 b55a 0000 0000 86ce 5d1b 0000 0000  mtime              zp_mode
00000520:
2274 b55a 0000 0000 86ce 5d1b 0000 0000  ctime              zp_size
00000530:
2274 b55a 0000 0000 86ce 5d1b 0000 0000  crtime             zp_parent
00000540:
0200 0000 0000 0000                      ??                 zp_links
0300 0000 0000 0000                      ??                 zp_xattr
00000550:
0000 0010 bf01 1e00                      ??                 zp_rdev
0000 4020 a900 1200                      ??                 zp_flags
00000560:
0000 0040 a900 1200                      ??                 zp_uid

00000568: 0000 0000 0000 0000
00000570: 0000 0000 0000 0000 0000 0000 0000 0000
00000580: 0000 0000 0000 0000 0000 0000 0000 0000
00000590: 0000 0000 0000 0000 0000 0000 0000 0000
000005a0: 0000 0000 0000 0000 0000 0000 0000 0000
000005b0: 0000 0000 0000 0000 0000 0000 0000 0000
000005c0: 0000 0000 0000 0000 0000 0000 0000 0000
000005d0: 0000 0000 0000 0000 0000 0000 0000 0000
000005e0: 0000 0000 0000 0000 0000 0000 0000 0000
000005f0: 0000 0000 0000 0000 0000 0000 0000 0000

Wherever there are two columns with field meanings, the left one is the true one (derived from the output of zdb and the kernel source), while the right one is how py-zfs-rescue interprets it. Most notable are two things:

  • Embedded data is stored in the block pointer itself and can be compressed (as is the case here using LZ4). Old ZFS versions have no idea of data embedding at all. Only GenericDevice.read_block() needs to be changed in order to universally support embedded data. The rest should just work.
  • The bonus data structure is of type 0x2c and differs significantly from the Znode structure in the specs. A new BonusData class has to be added to dnode.py that properly handles that bonus type.
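
As a rough sketch of what such a class would have to parse, the following decodes the first 80 bytes of the bonus buffer from the hexdump at 0x4c0. The header bit layout (10-bit layout number, 6-bit header size in units of 8 bytes) is my understanding of the SA header format; the attribute order is simply the hand-derived left column of the table above, not something looked up from the SA attr layouts object, and `decode_sa_bonus` is a hypothetical name.

```python
import struct

SA_MAGIC = 0x2F505A   # "ZP/" in little-endian byte order

# First 80 bytes of dn_bonus[] as shown in the hexdump at 0x4c0.
bonus = bytes.fromhex(
    "5a50 2f00 0204 1800 ed41 0000 0000 0000 "
    "0200 0000 0000 0000 0400 0000 0000 0000 "
    "0000 0000 0000 0000 0000 0000 0000 0000 "
    "2200 0000 0000 0000 4401 0000 0804 0000 "
    "2274 b55a 0000 0000 86ce 5d1b 0000 0000 "
)


def decode_sa_bonus(buf):
    """Decode the SA header and the leading fixed-size attributes."""
    magic, layout_info, _len0 = struct.unpack_from("<IHH", buf, 0)
    if magic != SA_MAGIC:
        raise ValueError("not a system-attribute bonus buffer")
    hdr_size = ((layout_info >> 10) & 0x3F) * 8
    # Attribute order assumed from the hand decode in this thread.
    names = ("mode", "size", "gen", "uid", "gid", "parent", "pflags")
    info = dict(zip(names, struct.unpack_from("<7Q", buf, hdr_size)))
    info["atime"] = struct.unpack_from("<2Q", buf, hdr_size + 56)  # (sec, nsec)
    info.update(magic=magic, layout=layout_info & 0x3FF, hdr_size=hdr_size)
    return info


info = decode_sa_bonus(bonus)
print(oct(info["mode"]), info["parent"], info["gen"], hex(info["pflags"]))
```

The decoded mode, parent, gen and pflags can be compared against the zdb output for object 34 earlier in the thread.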

Note: You don't have to hack your own LZ4 decompressor. Just use python-lz4, skip the first 4 bytes of the compressed data, and pass the original size stored in lsize of the blkptr:

>>> d = b'\x00\x00\x00\x1c!\x03\x00\x01\x00P\x80\xd5\x94\x1f)\n\x00\x10\x10\x05\x00\x0f\x02\x00\xff\xd4P\x00\x00\x00\x00\x00'
>>> import lz4
>>> lz4.decompress(d[4:], 512)
b'\x03\x00\x00\x00\x00\x00\x00\x80\xd5\x94\x1f)\x00\x00\x00\x00\x10\x00\x00...'

Note that for embedded blkptr_t both lsize and psize are stored biased and the actual value is 1+value in blkptr. See the format here.
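
A minimal sketch of pulling the payload out of an embedded block pointer, using the 128 bytes at 0x440 from the hexdump. The bit positions are my reading of the embedded blkptr format (lsize in bits 0-24, psize in bits 25-31, compression in bits 32-38, embedded flag in bit 39), and `decode_embedded_blkptr` is a hypothetical helper rather than part of py-zfs-rescue. zdb reports this pointer as 200L/20P, which the decoded sizes should reproduce.

```python
import struct

# dn_blkptr[0] of the directory dnode, bytes 0x440-0x4bf of the hexdump.
blkptr = bytes.fromhex(
    "0000 001c 2103 0001 0050 80d5 941f 290a "
    "0010 1005 000f 0200 ffd4 5000 0000 0000 "
    "0000 0000 0000 0000 0000 0000 0000 0000 "
    "ff01 003e 8f00 1480 0000 0000 0000 0000 "
    "0000 0000 0000 0000 0000 0000 0000 0000 "
    "0400 0000 0000 0000 0000 0000 0000 0000 "
    "0000 0000 0000 0000 0000 0000 0000 0000 "
    "0000 0000 0000 0000 0000 0000 0000 0000 "
)


def decode_embedded_blkptr(bp):
    """Return (lsize, psize, compression, payload) of an embedded blkptr."""
    prop = struct.unpack_from("<Q", bp, 48)[0]   # word 6 holds the properties
    if not (prop >> 39) & 1:
        raise ValueError("block pointer has no embedded data")
    lsize = (prop & 0x1FFFFFF) + 1               # stored biased by one
    psize = ((prop >> 25) & 0x7F) + 1            # stored biased by one
    comp = (prop >> 32) & 0x7F
    # Payload skips word 6 (properties) and word 10 (birth txg).
    payload = bp[0:48] + bp[56:80] + bp[88:128]
    return lsize, psize, comp, payload[:psize]


lsize, psize, comp, payload = decode_embedded_blkptr(blkptr)
print(lsize, psize, comp, payload.hex())
```

The returned payload is the same 32-byte string `d` used in the interpreter session above, including its 4-byte length prefix.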

hiliev commented on July 28, 2024

I would like to separate the two issues at hand here. You seem to have fixed the first one with the hard-coded ashift - please clean up the code and create a PR that references and closes this issue. If possible, please decouple it from the support for LZ4.

The second one concerns the support for block pointers with embedded data and the new physical object structure (bonus data type 0x2c). As you can see, manually decoding the dnode and decompressing the embedded data results in an empty Micro ZAP (recognisable by the 0x8000000000000003 marker at the beginning of the decompressed data) that the ZAP factory should be able to handle. I'll create a separate issue for that and let's continue the discussion there.
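
For illustration, recognising and walking a Micro ZAP could be sketched like this. The layout used (a 64-byte header followed by 64-byte entries holding a value, a collision differentiator, padding and a 50-byte name) is my understanding of the on-disk format. `parse_microzap` is a hypothetical function, and the sample block takes the 17 bytes shown in the decompressed output and assumes the elided remainder is zero, consistent with zdb reporting 0 entries.

```python
import struct

ZBT_MICRO = 0x8000000000000003
MZAP_HEADER_LEN = 64
MZAP_ENT_LEN = 64
MZAP_NAME_OFFSET = 14     # value (8) + cd (4) + pad (2)


def parse_microzap(data):
    """Return (salt, {name: value}) for a Micro ZAP block."""
    block_type, salt = struct.unpack_from("<QQ", data, 0)
    if block_type != ZBT_MICRO:
        raise ValueError("not a Micro ZAP block")
    entries = {}
    for off in range(MZAP_HEADER_LEN, len(data) - MZAP_ENT_LEN + 1, MZAP_ENT_LEN):
        value = struct.unpack_from("<Q", data, off)[0]
        raw = data[off + MZAP_NAME_OFFSET:off + MZAP_ENT_LEN]
        name = raw.split(b"\0", 1)[0]
        if name:                       # unused slots have an empty name
            entries[name.decode("utf-8", "replace")] = value
    return salt, entries


# Decompressed embedded block; trailing zeros are an assumption (see above).
block = bytes.fromhex("0300000000000080 d5941f2900000000 10").ljust(512, b"\0")
print(parse_microzap(block))
```
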

eiselekd commented on July 28, 2024

I'll create a PR; first I have to re-fork to get rid of the noise in my repo.

eiselekd commented on July 28, 2024

I pushed the ashift patch. The LZ4 decompression is a bit more complicated.
I think the LZ4 used by ZFS is not compatible with the one you get when
installing the python-lz4 package on Ubuntu (which is github.com/steeve/python-lz4).
First, it expects a "LE dest-size" header while ZFS uses a "BE src-size" header; second, even
if I supply the "dest-size" header, it complains about corrupt input data.
I'll try to rewrite the LZ4 C code in Python; I think that is simplest. Hang on, it will take a while, and
I will submit another PR.
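
A pure-Python decoder for this framing can be sketched as below. It is not the code from the PR, just a minimal LZ4 block decoder plus the 4-byte big-endian compressed-length prefix described above; the function names are hypothetical, and the sample input is the 32-byte embedded payload quoted earlier in the thread.

```python
def _read_length(src, i, base):
    """Extend a 4-bit length field that is saturated at 15."""
    if base == 15:
        while True:
            extra = src[i]
            i += 1
            base += extra
            if extra != 255:
                break
    return base, i


def lz4_block_decompress(src, max_size):
    """Decode one raw LZ4 block (no frame header)."""
    out = bytearray()
    i, n = 0, len(src)
    while i < n:
        token = src[i]
        i += 1
        lit_len, i = _read_length(src, i, token >> 4)
        out += src[i:i + lit_len]
        i += lit_len
        if i >= n:                      # the last sequence has literals only
            break
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("corrupt LZ4 stream: bad match offset")
        match_len, i = _read_length(src, i, token & 0x0F)
        start = len(out) - offset
        for k in range(match_len + 4):  # byte-wise copy handles overlaps
            out.append(out[start + k])
    if len(out) > max_size:
        raise ValueError("output larger than expected")
    return bytes(out)


def zfs_lz4_decompress(buf, lsize):
    """Strip the big-endian compressed-size prefix, then decode the block."""
    clen = int.from_bytes(buf[:4], "big")
    if clen > len(buf) - 4:
        raise ValueError("length prefix exceeds buffer")
    return lz4_block_decompress(buf[4:4 + clen], lsize)


d = bytes.fromhex("0000001c2103000100 5080d5941f290a"
                  "00101005000f0200ffd4500000000000".replace(" ", ""))
out = zfs_lz4_decompress(d, 512)
print(len(out), out[:17].hex())
```

Slicing by the length prefix also discards any padding that follows the compressed stream in a sector-aligned buffer.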

hiliev commented on July 28, 2024

Perhaps you missed the comment in my previous post. You already have the decompressed data size - it is in the lsize field of the block pointer. You only need to remove the first four bytes of the compressed data.

eiselekd commented on July 28, 2024

I saw your comment, but there may be several python-lz4 implementations. The one that comes with apt-get install python-lz4 doesn't have the lz4.decompress(d[4:], 512) signature: see https://github.com/steeve/python-lz4/blob/8ac9cf9df8fb8d51f40a3065fa538f8df1c8a62a/src/python-lz4.c#L96
But even if I prepare the expected header, it fails with "Input corrupt error...". So I'll implement it by hand. It will take a while.

eiselekd commented on July 28, 2024

I also added a PR for the Python-only LZ4 decompression.
