Has a reference implementation for erasure correction or error correction in codex32 s

Source code for error correction?,about blockstreamresearch/codex32

Comments (26)

apoelstra commented on August 24, 2024 1

Afraid not. Though if you want to correct 1 or 2 errors a simple way to approach it would be to just generate residues for every possible error pattern and then do a lookup table based on that.

My plan for this is to continue an ongoing project to rewrite rust-bech32 so that it will work with arbitrary BCH codes; then this library will depend on that for all the coding theory stuff, and only implement the Lagrange polynomial stuff. And we can implement error correction there, so we'll get it simultaneously for codex32, bech32, the descriptor checksum, the checksum used by Liquid/Elements addresses, etc.

from codex32.

apoelstra commented on August 24, 2024 1

I'm thinking there are no guarantees at all because Bech32 had a weakness to insertion/deletion errors.

Yeah, that's the right intuition. There are no mathematically guaranteed-detectable errors. But in practice for I'd expect that you usually could, and for one or two it'd be fairly cheap to try. The reason is that you know the expected length of your string (in the common case that you are using 128/256/512-bit seeds and not something in between, which I think we technically support but nobody would ever do), so you know how many insertions/deletions have happened, on net.

If you have a 47-character string but were expecting a 48 character string, you can just try inserting a blank in every possible position, running error correction (a variant where you can specify known error locations, which makes it able to succeed with up to 8 missing tiles rather than just 4), and seeing if it succeeds. With overwhelming probability, but not a guarantee, if you actually have a small random set of errors that include insertions/deletions, you'll find the corrected string, and only the corrected string.

Like with bech32, there are non-random "structured" errors that will cause error correction to yield a bad result. But like bech32m, and unlike bech32, these "structured" errors cannot be found by a single fat-finger.

Adding to this the rules for seeds and the rules for addresses are a bit different. With addresses if you are unsure at all, you should throw the address away and re-request it, because there is no harm in throwing it away and catastrophic harm (burned coins) if you send coins to a bad address. With seeds, during generation, the same story applies. But during recovery it's kinda the opposite...if you are uncertain, it's easy to check (just try to derive a known address that you received coins at before, or any address with a balance, you won't find one that's not yours), and if you throw your result away there may be nobody to "re-request it" from.

So in summary, error correction implementations:

During initial wallet setup, behave like bech32: you can highlight substitution errors, but don't correct anything.
During recovery, then do whatever you can, including grinding for insertions/deletions, to find the seed. Then show the user the set of corrections and make them re-enter the corrected string (in the hope that they'll make a new corrected backup) before using it.

Let me bug some people on IRC before making RFC-style SHOULD NOT/SHOULD/MUST/MUST NOT/MAY proclamations. We may want to update the BIP with this sort of text.

from codex32.

roconnor-blockstream commented on August 24, 2024 1

Some further comments I have: BIP-32 defines a seed as a "byte sequence S of a chosen length (between 128 and 512 bits; 256 bits is advised) from a (P)RNG.", that is a number of bytes between 16 and 64. Because of this, I wanted our standard to be able to support everything that BIP-32 supports. That said, (and maybe we should make this clear in our spec) I strongly advise only using 16, 32, and 64 byte secrets (and between those choices strongly preferring 8 with a robust source of entropy when doing any hand computations). To the point of perhaps suggesting that BIP-32 be amended to require using only 16, 32, or 64 byte secrets. I'm not aware of any system doing otherwise anyways.

This way, inserting and removing just a few characters is detectable. That said, the error correction procedure is only designed to explicitly correct substitution errors.

(Ninja edited)

from codex32.

roconnor-blockstream commented on August 24, 2024 1

Since the master seed is thrown away in many implementations, it might make more sense to hash the master private key instead when generating an identifier in a deterministic manner.

from codex32.

roconnor-blockstream commented on August 24, 2024 1

But yes, the constants in the syndrome need to be replaced with codex32 specific constants by someone who knows what the constants mean and how to compute them.

from codex32.

apoelstra commented on August 24, 2024 1

Nice! When we implement error correction in rust-bech32 we'll have some numbers to beat.

from codex32.

BenWestgate commented on August 24, 2024 1

Found two omissions in the BIP for error correction implementation recommendations:

We do not specify how an implementation should implement error correction. However, we recommend that:

If a string with 8 or fewer erasures can have those erasures filled in to make a valid codex32 string, then the implementation suggests such a string as a correction.
If a string consisting of valid bech32 characters in the proper case can be made valid by substituting 4 or fewer characters, then the implementation suggests such a string as a correction.

It is missing the burst erasure correction case:

If a string with 13 or fewer consecutive erasures can have those erasures filled in to make a valid codex32 string, then the implementation suggests such a string as a correction.

It is also missing the combination case:

If a string can be made valid by a combination of substituting characters, filling non-contiguous erasures and contiguous erasures, then the implementation suggests such a string as a correction.

floor((1 - (contiguous_erasures / 13 + non_contiguous_erasures / 8)) * 4) = remaining_error_correction_capacity

Correctable Combination Examples:

4 erasures, 2 substitution errors
11 contiguous erasures and 1 non-contiguous erasure
9 contiguous erasures and 2 non-contiguous erasures
6 contiguous erasures, 2 non-contiguous erasures, 1 substitution error
4 contiguous erasures, 1 non-contiguous erasure, 2 substitution errors
3 contiguous erasures, 3 substitution errors

from codex32.

apoelstra commented on August 24, 2024 1

Even with the backup in memory there's a little bit of ambiguity between insertion/deletion errors and substitutions, but this sounds like a great idea to me. Assuming you think there's a deletion error, I'd show the offending 4-group box is red and with only 3 characters in it. Simililarly if there's an insertion, show the box in red with all 5 characters in it. No need to flag any other boxes as red. (I think this is what you're describing.)

from codex32.

BenWestgate commented on August 24, 2024

The specs guarantee correction of up to 4 substitution errors. Is there any relationship to how many insertion or deletion errors can be safely corrected?

I'm thinking there are no guarantees at all because Bech32 had a weakness to insertion/deletion errors.

Bech32 has an unexpected sipa/bech32#51: whenever the final character is a 'p', inserting or deleting any number of 'q' characters immediately preceding it does not invalidate the checksum.

Does that mean error correction implementations:

SHOULD NOT attempt to correct any random insertions and deletions
SHOULD attempt to correct them but fewer, say 2 instead of 4
MAY attempt to correct 4 random subsitutions, insertions or deletions and rely on the user's confirmation of the corrected string to protect against accepting a wrong but valid codex32 string?

from codex32.

BenWestgate commented on August 24, 2024

With my naive implementation, I could only correct a single edit in real-time. 2 edits takes a few seconds to check every possible error pattern.

As it is common for my testers to make more than a single error during recovery, I wanted to try to modity this bech32_ecc.js because I remember it was much faster to find 2 error locations:
https://github.com/sipa/bech32/blob/master/ecc/javascript/bech32_ecc.js

But I don't know how to set some of the constants to be codex32 appropriate like:

var GF1024_EXP
var GF1024_LOG

and possibly some in
function syndrome (residue)
function locate_errors (residue, length)

Is this something that is too difficult for me to do in a day without advanced mathematical knowledge even if I know the codex32 GENERATOR and MS32_CONST?

from codex32.

BenWestgate commented on August 24, 2024

But during recovery it's kinda the opposite...if you are uncertain, it's easy to check (just try to derive a known address that you received coins at before, or any address with a balance, you won't find one that's not yours), and if you throw your result away there may be nobody to "re-request it" from.

Related to this, I set the identifier characters to hash160(seed)[:4] and the payload padding bits of the codex32 secret to double_sha256(seed)[0] (like truncating the WIF serialization) so have an additional 22/24-bits of error detection for 128/256-bit seeds.

This can't be hand calculated and breaks plausible deniability of a "decoy" share but improves the chances of detecting errors from wrong but valid strings, even without a known address.

Is this a good idea for electronic implementations of codex32 backup generators?

Asking the user to choose a 4 character bech32 identifier seemed like too much work for them.

from codex32.

apoelstra commented on August 24, 2024

Is this something that is too difficult for me to do in a day without advanced mathematical knowledge even if I know the codex32 GENERATOR and MS32_CONST?

I'm not sure. You can see the different constants all laid out here or by checking the BIP (I think they're in slightly different formats in the BIP vs that source code, so not sure which version you need). But where you'll run into trouble is that the codex32 checksum is 65 bits meaning that it won't fit into a single 64-bit word. Bech32 has a 30-bit checksum, so implementations have a habit of using 32-bit words to implement them ... and then a "just change the constants" conversion runs into trouble if you don't have any uint128_t type in your language.

But in principle it should work.

Related to this, I set the identifier characters to hash160(seed)[:4] and the payload padding bits of the codex32 secret to double_sha256(seed)[0] (like truncating the WIF serialization) so have an additional 22/24-bits of error detection for 128/256-bit seeds.

This is a pretty neat idea. Yeah, I think it's worth suggesting that people do this, if they are using an electronic means of generation and they don't otherwise want to override the name.

Asking the user to choose a 4 character bech32 identifier seemed like too much work for them.

This is fair, but it is important that they be able to distinguish different seeds, so that e.g. if they have a shares from multiple friends, they don't get them mixed up. Using psuedorandom characters might make that harder.

from codex32.

BenWestgate commented on August 24, 2024

Since the master seed is thrown away in many implementations, it might make more sense to hash the master private key instead when generating an identifier in a deterministic manner.

That's true, unless something like #55 gets done.

And if you know the master private key, what use is knowing which codex32 set corresponds to it? You can already spend... Unless trying to create a new codex32 backup for an existing master seed with a higher threshold, but the assumption usually made is codex32 backups are "set in stone" (perhaps literally) and may not be able to be destroyed.

A more useful option may be the master pubkey hash: [81fefeef/44'/0'/0'] so the recovery codes can be matched to their watch-only wallet as well. "To recover, find your recovery codes beginning with 'ms1_81fe'... and import them to your external signer" (i'd need to use bech32, this is a quick example)

Trying to make those 20-bits do as much work as possible with following options:

master seed hash - hash160(seed)
master extended private key hash - double_sha256(extended_private_masterkey)
master public key hash - hash160(sha256(public_masterkey))

The best choice also depends where plausible deniability, if any, should be: User selects, every level is deniable. Seed hash, every level is deniable except after a threshold of shares is compromised. Xprv hash deniable link until spending is compromised. Master pubkey hash no deniability between a watch-only wallet and its recovery shares.

Legacy Bitcoin Core wallets calling getwalletinfo display:
"hdseedid": "1b50a849a48ddabb9763bd12c11942233a2d21f2"
which is hash160(hdseed) and was my original deterministic identifier to keep it consistent.
Descriptor wallets seem to discard this if they even generate a seed at all. But calling listdescriptors true displays:
"desc": "...(xprv.../<purpose>'/0'/0'/.../*)#...",
which is the extended private masterkey in each descriptor but not in an identifier safe format, except the last 4-bytes of the xprv serialization which is the double SHA-256 checksum. So a base58 decoder is required instead of hex in option 1 and 3.
Alternatively, calling listdescriptors displays:
"desc": "...([f4617790/<purpose>'/0'/0']xpub.../.../*)#..."
which is the hash160(sha256(public_masterkey)[:4] prepended to each child extended public key, also known as the master public key fingerprint.

The hex to bech32 conversion should be preferred. There's pros and cons to 1 vs 3.

Pros to hash160(seed):

Keeps compromised descriptor wallets and k-1 compromised shares plausibly deniable from each other.
Detects errors from wrong but valid codex32 strings (malicious tampering of k-1 shares).

Cons:

Pseudo-random characters without any reference stored make shares easier to mix up
Requires entering master seed on a trusted device to detect wallet tampering
Can't detect errors from by hand computation of private or public masterkeys
Worse manual error correction as identifier is not stored in the watch-only descriptor

Pros to hash160(sha256(public_masterkey):

Allows watch-only wallets to give recovery instructions with a suggested identifier to look for.
Requires work to grind a valid identifier to tamper a 'private keys enabled' wallet
Detects errors from wrong but valid codex32 strings (malicious tampering of k-1 shares)
Detects errors in any future by hand computation of public or private master keys
Better error correction as a watch-only wallet can provide the correct identifier

Cons:

May not want a compromised share's identifier to reveal it likely belongs to a 1,000 BTC watch-only wallet or vice versa.
20-bits isn't much work to make an xprv's master pubkey fingerprint match the codex32 identifier
Requires entering xprv on a trusted device to derive the fingerprint to detect wallet tampering
If #55 implemented, only one wallet gets an intelligible connection to the identifier

from codex32.

roconnor-blockstream commented on August 24, 2024

And if you know the master private key, what use is knowing which codex32 set corresponds to it?

I was imagining the scenario: Hey I found these shares in my closet that I forgot about. What are they for again? Are they for my existing wallet I'm using or for something else?

from codex32.

apoelstra commented on August 24, 2024

This is a good list of pros and cons. I'd also add that if #55 were somehow implemented, we'd have a hand-computed RNG, and maybe using that in place of sha2 would be nicer. (Though it would pretty-much force us to use hash(seed) since public_masterkey can't be determined by hand.)

I'd also note that any of these choices achieve the "detect errors from wrong but valid codex32 strings" benefit, though all of them require recovering the seed to do the check.

Then as for what the best thing to recommend, my intuition is that the better UX outweighs the concerns about deniability, and that hash160(public_masterkey) is the better way to go. It lets wallets provide the most useful instructions to users without needing to access (or even being able to access) secret key data. Remember that even the "public masterkey" is still an xpub and never appears on-chain, so the privacy loss is somewhat limited. It's not like visitors can find your shares, see the ID, then go home and scan the blockchain to discover how many coins you (or your friends) have.

My feeling is that people who want deniability are free to make up their own key IDs, or to copy key IDs from other peoples' xpubs they somehow come across, or whatever they want to do.

from codex32.

BenWestgate commented on August 24, 2024

I was imagining the scenario: Hey I found these shares in my closet that I forgot about. What are they for again? Are they for my existing wallet I'm using or for something else?

With option 3 this question could be answered with the master pubkey fingerprint from your watch-only wallet.

Do you see appreciable value in keeping the association between shares and an existing wallet a secret without the pin or wallet passphrase?

Option 1 would not be helpful to this situation without a full seed restore & wallet import so add this to its list of cons.

Though it would pretty-much force us to use hash(seed) since public_masterkey can't be determined by hand.

A hand-computed hash(seed) identifier would be better if one is implemented as it offer's option 1's pro of detecting k-1 maliciously tampered yet valid shares without electronics.

Remember that even the "public masterkey" is still an xpub and never appears on-chain, so the privacy loss is somewhat limited. It's not like visitors can find your shares, see the ID, then go home and scan the blockchain to discover how many coins you (or your friends) have.

Right, it requires the compromise of the watch-only wallet AND shares to give thieves useful information. And they still could be misled as long as implementations allow creating codex32 backups with chosen identifiers. (mine does not but it's not hard to add.)

My feeling is that people who want deniability are free to make up their own key IDs, or to copy key IDs from other peoples' xpubs they somehow come across, or whatever they want to do.

I agree, the perfect deniability of choosing an identifier is better than the limited deniability of hash(seed) since it allows for decoy shares, seeds and wallets.

So there is no reason to use 1 over "choose an identifier" unless such a hand-computable hash is implemented. Then that could still be a special case of "choose an identifier" that people doing hand computations wanting to detecting malicious tampering of k-1 shares may choose.

Regarding plausible deniability between watch-only wallets and hash160(public_masterkey) identified shares, it is easy to put fake master pubkey fingerprints in descriptors, if desired, to break the relationship and you can plausibly claim you did this, even when you did not. While option 2 has no way to plausibly deny shares when their identifier in the wallet, compromised spending, is possible.

from codex32.

BenWestgate commented on August 24, 2024

But where you'll run into trouble is that the codex32 checksum is 65 bits meaning that it won't fit into a single 64-bit word. Bech32 has a 30-bit checksum, so implementations have a habit of using 32-bit words to implement them ... and then a "just change the constants" conversion runs into trouble if you don't have any uint128_t type in your language.

I converted bech32_ecc.js it to python and added the codex32 constants provided from lib.rs. It successfully locates the error positions in bech32 addresses! But returns {'error': 'Invalid checksum', 'data_pattern': None} for codex32 strings, valid or invalid, which means residue != 0: which means polymod(hrpExpand(hrp) + data) ^ const != 0 and I am using the proper codex32 CONST and GENERATOR.

I think the issue lies in the def syndrome(residue): function, I notice all the values it it is bitwise xoring with are smaller than the maximum value for the 30-bit checksum.

So I believe def syndrome(residue) needs to bitwise xor with longer 65-bit numbers when encoding == encodings["CODEX32"]: but where do I come up with these numbers?

But in principle it should work.

I'd like to get it to work. I'm highly motivated because it makes codex32 secret recovery faster and much more confidence inspiring.

delete codeblock for readability

from codex32.

BenWestgate commented on August 24, 2024

besides the syndrome(residue) function appearing to be 30-bit specific, I also need to use ms32_polymod in the codex32 case not the one in the code above as the bitshifts are 6 character / 30-bit specific

def ms32_polymod(values):
    GEN = [
        0x19dc500ce73fde210,
        0x1bfae00def77fe529,
        0x1fbd920fffe7bee52,
        0x1739640bdeee3fdad,
        0x07729a039cfc75f5a,
    ]
    residue = 0x23181b3
    for v in values:
        b = (residue >> 60)
        residue = (residue & 0x0fffffffffffffff) << 5 ^ v
        for i in range(5):
            residue ^= GEN[i] if ((b >> i) & 1) else 0
    return residue

from codex32.

roconnor-blockstream commented on August 24, 2024

residue = 0x23181b3

Be aware that in the Codex32 spec, we precomputed the hrp and folded it into that constant. If you are going to compute the hrp yourself you'd start the residue at 1 instead (and then after processing the hrp, you should end up at 0x23181b3).

from codex32.

BenWestgate commented on August 24, 2024

CHARSET = 'qpzry9x8gf2tvdw0s3jn54khce6mua7l'

GF1024_EXP = [
  1, 32, 311, 139, 206, 553, 934, 180, 537, 145, 910, 131, 462, 373, 652,
  927, 675, 840, 938, 308, 235, 958, 948, 756, 1007, 979, 356, 172, 281,
  124, 240, 222, 41, 23, 736, 367, 460, 309, 203, 649, 831, 570, 454, 117,
  464, 693, 392, 1010, 115, 272, 348, 667, 383, 972, 644, 671, 511, 610,
  129, 398, 818, 922, 515, 977, 292, 747, 15, 480, 386, 690, 360, 300,
  1003, 851, 202, 681, 520, 689, 264, 604, 630, 513, 913, 867, 1021, 403,
  146, 1006, 1011, 83, 39, 471, 597, 854, 106, 560, 134, 366, 492, 2, 64,
  583, 278, 412, 370, 620, 321, 315, 267, 572, 262, 924, 707, 56, 567, 102,
  944, 628, 577, 470, 629, 609, 225, 766, 687, 712, 344, 539, 209, 457,
  405, 82, 7, 224, 734, 920, 579, 406, 50, 887, 381, 908, 195, 905, 99,
  784, 749, 207, 521, 657, 63, 727, 696, 40, 55, 983, 484, 258, 796, 877,
  573, 294, 683, 584, 246, 30, 960, 772, 109, 720, 600, 758, 943, 404, 114,
  304, 107, 528, 433, 485, 290, 555, 998, 755, 783, 269, 764, 751, 143, 78,
  903, 419, 933, 212, 361, 268, 732, 984, 4, 128, 430, 517, 785, 717, 504,
  642, 607, 534, 369, 524, 561, 166, 89, 359, 204, 617, 481, 418, 901, 483,
  482, 450, 245, 126, 176, 665, 319, 395, 914, 771, 141, 14, 448, 181, 569,
  422, 773, 77, 999, 723, 568, 390, 562, 198, 809, 250, 414, 306, 43, 87,
  167, 121, 80, 71, 679, 968, 516, 817, 1018, 371, 588, 118, 432, 453, 21,
  672, 808, 218, 169, 441, 229, 638, 769, 205, 585, 214, 297, 843, 970,
  580, 374, 748, 239, 830, 538, 241, 254, 286, 156, 558, 838, 618, 385,
  722, 536, 177, 697, 8, 256, 860, 298, 811, 186, 985, 36, 439, 293, 715,
  312, 363, 332, 155, 718, 408, 498, 962, 836, 554, 966, 964, 900, 451,
  213, 329, 59, 599, 790, 557, 806, 282, 28, 896, 323, 379, 844, 810, 154,
  750, 175, 377, 780, 365, 396, 882, 477, 789, 589, 86, 135, 334, 219, 137,
  142, 110, 688, 296, 875, 765, 719, 440, 197, 841, 906, 3, 96, 880, 413,
  338, 859, 458, 501, 802, 410, 434, 389, 594, 950, 692, 424, 709, 248,
  478, 885, 317, 459, 469, 533, 273, 380, 940, 500, 770, 173, 313, 331,
  123, 16, 512, 945, 596, 886, 349, 699, 72, 839, 586, 182, 601, 726, 664,
  287, 188, 793, 973, 676, 936, 372, 684, 680, 552, 902, 387, 658, 95, 423,
  805, 378, 876, 541, 17, 544, 646, 735, 952, 884, 285, 252, 350, 731, 824,
  730, 792, 1005, 915, 803, 442, 133, 270, 668, 415, 274, 284, 220, 105,
  592, 1014, 243, 190, 857, 394, 946, 564, 6, 192, 1001, 787, 653, 959,
  916, 963, 868, 797, 845, 778, 429, 613, 97, 848, 170, 473, 917, 995, 595,
  918, 899, 291, 523, 721, 632, 961, 804, 346, 603, 662, 223, 9, 288, 619,
  417, 997, 659, 127, 144, 942, 436, 325, 443, 165, 57, 535, 337, 827, 698,
  104, 624, 705, 120, 112, 368, 556, 774, 45, 151, 846, 874, 733, 1016,
  307, 11, 352, 44, 183, 633, 993, 531, 465, 661, 191, 889, 189, 825, 762,
  559, 870, 861, 266, 540, 49, 791, 525, 529, 401, 210, 425, 741, 463, 341,
  955, 788, 621, 353, 12, 384, 754, 815, 58, 631, 545, 678, 1000, 819, 954,
  820, 858, 490, 194, 937, 340, 923, 547, 742, 431, 549, 550, 582, 310,
  171, 505, 674, 872, 669, 447, 37, 407, 18, 576, 502, 834, 746, 47, 215,
  265, 636, 833, 650, 863, 330, 91, 295, 651, 895, 125, 208, 489, 162, 217,
  201, 713, 376, 812, 90, 263, 956, 1012, 179, 761, 591, 22, 704, 88, 327,
  507, 738, 303, 907, 35, 343, 1019, 339, 891, 253, 382, 1004, 947, 532,
  305, 75, 807, 314, 299, 779, 397, 850, 234, 926, 643, 639, 801, 506, 706,
  24, 768, 237, 894, 93, 487, 354, 108, 752, 879, 637, 865, 957, 980, 388,
  626, 641, 575, 358, 236, 862, 362, 364, 428, 581, 342, 987, 100, 1008,
  51, 855, 74, 775, 13, 416, 965, 932, 244, 94, 391, 530, 497, 930, 52,
  951, 660, 159, 590, 54, 1015, 211, 393, 978, 324, 411, 402, 178, 729,
  888, 157, 526, 625, 737, 335, 251, 446, 5, 160, 153, 654, 991, 228, 606,
  566, 70, 647, 767, 655, 1023, 467, 725, 760, 623, 289, 587, 150, 878,
  605, 598, 822, 794, 941, 468, 565, 38, 503, 866, 989, 164, 25, 800, 474,
  1013, 147, 974, 708, 216, 233, 1022, 499, 994, 627, 673, 776, 493, 34,
  375, 716, 472, 949, 724, 728, 856, 426, 645, 703, 200, 745, 79, 935, 148,
  814, 26, 832, 682, 616, 449, 149, 782, 301, 971, 612, 65, 615, 33, 279,
  444, 69, 743, 399, 786, 685, 648, 799, 781, 333, 187, 1017, 275, 316,
  491, 226, 670, 479, 853, 10, 320, 283, 60, 695, 456, 437, 357, 140, 46,
  247, 62, 759, 911, 163, 249, 510, 578, 438, 261, 1020, 435, 421, 869,
  829, 634, 897, 355, 76, 967, 996, 691, 328, 27, 864, 925, 739, 271, 700,
  168, 409, 466, 757, 975, 740, 495, 98, 816, 986, 68, 711, 184, 921, 611,
  161, 185, 953, 852, 42, 119, 400, 242, 158, 622, 257, 892, 29, 928, 116,
  496, 898, 259, 828, 602, 694, 488, 130, 494, 66, 519, 849, 138, 238, 798,
  813, 122, 48, 823, 826, 666, 351, 763, 527, 593, 982, 452, 53, 919, 931,
  20, 640, 543, 81, 103, 912, 835, 714, 280, 92, 455, 85, 231, 574, 326,
  475, 981, 420, 837, 522, 753, 847, 842, 1002, 883, 509, 546, 710, 152,
  686, 744, 111, 656, 31, 992, 563, 230, 542, 113, 336, 795, 909, 227, 702,
  232, 990, 196, 873, 701, 136, 174, 345, 571, 486, 322, 347, 635, 929, 84,
  199, 777, 461, 277, 508, 514, 1009, 19, 608, 193, 969, 548, 518, 881,
  445, 101, 976, 260, 988, 132, 302, 939, 276, 476, 821, 890, 221, 73, 871,
  893, 61, 663, 255, 318, 427, 677, 904, 67, 551, 614
]

GF1024_LOG = [
  -1, 0, 99, 363, 198, 726, 462, 132, 297, 495, 825, 528, 561, 693, 231,
  66, 396, 429, 594, 990, 924, 264, 627, 33, 660, 759, 792, 858, 330, 891,
  165, 957, 1, 804, 775, 635, 304, 592, 754, 90, 153, 32, 883, 248, 530,
  521, 834, 599, 911, 547, 138, 689, 703, 921, 708, 154, 113, 508, 565,
  324, 828, 1013, 836, 150, 100, 802, 903, 1020, 874, 807, 734, 253, 403,
  1010, 691, 646, 853, 237, 189, 788, 252, 927, 131, 89, 982, 935, 347,
  249, 629, 212, 620, 607, 933, 664, 698, 423, 364, 476, 871, 144, 687,
  998, 115, 928, 513, 453, 94, 176, 667, 168, 353, 955, 517, 962, 174, 48,
  893, 43, 261, 884, 516, 251, 910, 395, 29, 611, 223, 501, 199, 58, 901,
  11, 1002, 446, 96, 348, 973, 351, 906, 3, 833, 230, 352, 188, 502, 9, 86,
  763, 790, 797, 745, 522, 952, 728, 336, 311, 288, 719, 887, 706, 727,
  879, 614, 839, 758, 507, 211, 250, 864, 268, 478, 586, 27, 392, 974, 338,
  224, 295, 716, 624, 7, 233, 406, 531, 876, 880, 302, 816, 411, 539, 457,
  537, 463, 992, 575, 142, 970, 360, 243, 983, 786, 616, 74, 38, 214, 273,
  4, 147, 612, 128, 552, 710, 193, 322, 275, 600, 766, 615, 267, 350, 452,
  1009, 31, 494, 133, 122, 821, 966, 731, 270, 960, 936, 968, 767, 653, 20,
  679, 662, 907, 282, 30, 285, 886, 456, 697, 222, 164, 835, 380, 840, 245,
  724, 436, 640, 286, 1015, 298, 889, 157, 896, 1000, 844, 110, 621, 78,
  601, 545, 108, 195, 185, 447, 862, 49, 387, 450, 818, 1005, 986, 102,
  805, 932, 28, 329, 827, 451, 435, 287, 410, 496, 743, 180, 485, 64, 306,
  161, 608, 355, 276, 300, 649, 71, 799, 1003, 633, 175, 645, 247, 527, 19,
  37, 585, 2, 308, 393, 648, 107, 819, 383, 1016, 226, 826, 106, 978, 332,
  713, 505, 938, 630, 857, 323, 606, 394, 310, 815, 349, 723, 963, 510,
  367, 638, 577, 556, 685, 636, 126, 975, 491, 979, 50, 401, 437, 915, 529,
  560, 666, 852, 26, 832, 678, 213, 70, 194, 681, 309, 682, 341, 97, 35,
  518, 208, 104, 259, 416, 13, 280, 776, 618, 339, 426, 333, 388, 140, 641,
  52, 562, 292, 68, 421, 674, 374, 241, 699, 46, 711, 459, 227, 342, 651,
  59, 809, 885, 551, 715, 85, 173, 130, 137, 593, 313, 865, 372, 714, 103,
  366, 246, 449, 694, 498, 217, 191, 941, 847, 235, 424, 378, 553, 783,
  1017, 683, 474, 200, 581, 262, 178, 373, 846, 504, 831, 843, 305, 359,
  269, 445, 506, 806, 997, 725, 591, 232, 796, 221, 321, 920, 263, 42, 934,
  830, 129, 369, 384, 36, 985, 12, 555, 44, 535, 866, 739, 752, 385, 119,
  91, 778, 479, 761, 939, 1006, 344, 381, 823, 67, 216, 220, 219, 156, 179,
  977, 665, 900, 613, 574, 820, 98, 774, 902, 870, 894, 701, 314, 769, 390,
  370, 596, 755, 204, 587, 658, 631, 987, 949, 841, 56, 397, 81, 988, 62,
  256, 201, 995, 904, 76, 148, 943, 486, 209, 549, 720, 917, 177, 550, 700,
  534, 644, 386, 207, 509, 294, 8, 284, 127, 546, 428, 961, 926, 430, 567,
  950, 579, 994, 582, 583, 1021, 419, 5, 317, 181, 519, 327, 289, 542, 95,
  210, 242, 959, 461, 753, 733, 114, 240, 234, 41, 976, 109, 160, 937, 677,
  595, 118, 842, 136, 279, 684, 584, 101, 163, 274, 405, 744, 260, 346,
  707, 626, 454, 918, 375, 482, 399, 92, 748, 325, 170, 407, 898, 492, 79,
  747, 732, 206, 991, 121, 57, 878, 801, 475, 1022, 803, 795, 215, 291,
  497, 105, 559, 888, 742, 514, 721, 675, 771, 117, 120, 80, 566, 488, 532,
  850, 980, 602, 670, 271, 656, 925, 676, 205, 655, 54, 784, 431, 735, 812,
  39, 604, 609, 14, 466, 729, 737, 956, 149, 422, 500, 705, 536, 493, 1014,
  409, 225, 914, 51, 448, 590, 822, 55, 265, 772, 588, 16, 414, 1018, 568,
  254, 418, 75, 794, 162, 417, 811, 953, 124, 354, 77, 69, 856, 377, 45,
  899, 829, 152, 296, 512, 402, 863, 972, 967, 785, 628, 515, 659, 112,
  765, 379, 951, 875, 125, 617, 931, 307, 777, 203, 312, 358, 169, 487,
  293, 239, 780, 740, 408, 151, 781, 717, 440, 438, 196, 525, 134, 432, 34,
  722, 632, 861, 869, 554, 580, 808, 954, 787, 598, 65, 281, 146, 337, 187,
  668, 944, 563, 183, 23, 867, 171, 837, 741, 625, 541, 916, 186, 357, 123,
  736, 661, 272, 391, 229, 167, 236, 520, 692, 773, 984, 473, 650, 340,
  814, 798, 184, 145, 202, 810, 465, 558, 345, 326, 548, 441, 412, 750,
  964, 158, 471, 908, 813, 760, 657, 371, 444, 490, 425, 328, 647, 266,
  244, 335, 301, 619, 909, 791, 564, 872, 257, 60, 570, 572, 1007, 749,
  912, 439, 540, 913, 511, 897, 849, 283, 40, 793, 603, 597, 930, 316, 942,
  290, 404, 17, 361, 946, 277, 334, 472, 523, 945, 477, 905, 652, 73, 882,
  824, 93, 690, 782, 458, 573, 368, 299, 544, 680, 605, 859, 671, 756, 83,
  470, 848, 543, 1011, 589, 971, 524, 356, 427, 159, 746, 669, 365, 996,
  343, 948, 434, 382, 400, 139, 718, 538, 1008, 639, 890, 1012, 663, 610,
  331, 851, 895, 484, 320, 218, 420, 190, 1019, 143, 362, 634, 141, 965,
  10, 838, 929, 82, 228, 443, 468, 480, 483, 922, 135, 877, 61, 578, 111,
  860, 654, 15, 892, 981, 702, 923, 696, 192, 6, 789, 415, 576, 18, 1004,
  389, 751, 503, 172, 116, 398, 460, 643, 22, 779, 376, 704, 433, 881, 571,
  557, 622, 672, 21, 467, 166, 489, 315, 469, 319, 695, 318, 854, 255, 993,
  278, 800, 53, 413, 764, 868, 999, 63, 712, 25, 673, 940, 919, 155, 197,
  303, 873, 686, 1001, 757, 969, 730, 958, 533, 770, 481, 855, 499, 182,
  238, 569, 464, 947, 72, 642, 442, 87, 24, 688, 989, 47, 88, 623, 762,
  455, 709, 526, 817, 258, 637, 845, 84, 768, 738
]

encodings = {
    "BECH32": "bech32",
    "BECH32M": "bech32m",
    "CODEX32": "codex32",
}


def get_encoding_const(encoding):
    global CHECKSUM_LENGTH
    global GENERATOR
    if encoding == encodings["BECH32"]:
        CHECKSUM_LENGTH = 6
        GENERATOR = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
        return 1
    elif encoding == encodings["BECH32M"]:
        CHECKSUM_LENGTH = 6
        GENERATOR = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
        return 0x2bc830a3
    elif encoding == encodings["CODEX32"]:
        CHECKSUM_LENGTH = 13
        GENERATOR = [
            0x19dc500ce73fde210,
            0x1bfae00def77fe529,
            0x1fbd920fffe7bee52,
            0x1739640bdeee3fdad,
            0x07729a039cfc75f5a,
        ]
        return 0x10ce0795c2fd1e62a
    else:
        return None

__all__ = ['check', 'encodings']


def syndrome(residue):
    low = residue & 0x1f
    return low ^ (low << 10) ^ (low << 20) \
        ^ (0x3d0195bd if (residue >> 5) & 1 else 0) \
        ^ (0x2a932b53 if (residue >> 6) & 1 else 0) \
        ^ (0x072653af if (residue >> 7) & 1 else 0) \
        ^ (0x0cdc067e if (residue >> 8) & 1 else 0) \
        ^ (0x19ac89f5 if (residue >> 9) & 1 else 0) \
        ^ (0x15922369 if (residue >> 10) & 1 else 0) \
        ^ (0x29b443f2 if (residue >> 11) & 1 else 0) \
        ^ (0x03f826ed if (residue >> 12) & 1 else 0) \
        ^ (0x0574c8fa if (residue >> 13) & 1 else 0) \
        ^ (0x087935dd if (residue >> 14) & 1 else 0) \
        ^ (0x2c6dcf4f if (residue >> 15) & 1 else 0) \
        ^ (0x0acfbfbe if (residue >> 16) & 1 else 0) \
        ^ (0x158bfa75 if (residue >> 17) & 1 else 0) \
        ^ (0x2993d5e3 if (residue >> 18) & 1 else 0) \
        ^ (0x03b70fc6 if (residue >> 19) & 1 else 0) \
        ^ (0x051e8fd6 if (residue >> 20) & 1 else 0) \
        ^ (0x08b99aa5 if (residue >> 21) & 1 else 0) \
        ^ (0x1167b06a if (residue >> 22) & 1 else 0) \
        ^ (0x205f60d4 if (residue >> 23) & 1 else 0) \
        ^ (0x12aae581 if (residue >> 24) & 1 else 0) \
        ^ (0x04296874 if (residue >> 25) & 1 else 0) \
        ^ (0x0846f4c1 if (residue >> 26) & 1 else 0) \
        ^ (0x108d4d82 if (residue >> 27) & 1 else 0) \
        ^ (0x210ebf04 if (residue >> 28) & 1 else 0) \
        ^ (0x1299fb28 if (residue >> 29) & 1 else 0)


def locate_errors(residue, length):
    if residue == 0:
        return []

    syn = syndrome(residue)
    s0 = syn & 0x3FF
    s1 = (syn >> 10) & 0x3FF
    s2 = syn >> 20
    l_s0 = GF1024_LOG[s0]
    l_s1 = GF1024_LOG[s1]
    l_s2 = GF1024_LOG[s2]

    if l_s0 != -1 and l_s1 != -1 and l_s2 != -1 and (2 * l_s1 - l_s2 - l_s0 + 2046) % 1023 == 0:
        p1 = (l_s1 - l_s0 + 1023) % 1023
        if p1 >= length:
            return []
        l_e1 = l_s0 + (1023 - 997) * p1
        if l_e1 % 33:
            return []
        return [p1]

    for p1 in range(length):
        s2_s1p1 = s2 ^ (0 if s1 == 0 else GF1024_EXP[(l_s1 + p1) % 1023])
        if s2_s1p1 == 0:
            continue
        s1_s0p1 = s1 ^ (0 if s0 == 0 else GF1024_EXP[(l_s0 + p1) % 1023])
        if s1_s0p1 == 0:
            continue
        l_s1_s0p1 = GF1024_LOG[s1_s0p1]
        p2 = (GF1024_LOG[s2_s1p1] - l_s1_s0p1 + 1023) % 1023
        if p2 >= length or p1 == p2:
            continue
        s1_s0p2 = s1 ^ (0 if s0 == 0 else GF1024_EXP[(l_s0 + p2) % 1023])
        if s1_s0p2 == 0:
            continue
        inv_p1_p2 = 1023 - GF1024_LOG[GF1024_EXP[p1] ^ GF1024_EXP[p2]]
        l_e2 = l_s1_s0p1 + inv_p1_p2 + (1023 - 997) * p2
        if l_e2 % 33:
            continue
        l_e1 = GF1024_LOG[s1_s0p2] + inv_p1_p2 + (1023 - 997) * p1
        if l_e1 % 33:
            continue
        if p1 < p2:
            return [p1, p2]
        else:
            return [p2, p1]
    return []


def polymod(values):
    chk = 1
    for v in values:
        top = chk >> 5*CHECKSUM_LENGTH-5
        poly_div = 0x1ffffff if CHECKSUM_LENGTH == 6 else 0x0fffffffffffffff
        chk = (chk & poly_div) << 5 ^ v
        for i in range(5):
            chk ^= GENERATOR[i] if ((top >> i) & 1) else 0
    return chk


def hrpExpand(hrp):
    ret = []
    for p in range(len(hrp)):
        ret.append(ord(hrp[p]) >> 5)
    ret.append(0)
    for p in range(len(hrp)):
        ret.append(ord(hrp[p]) & 31)
    return ret


def check(bechString, validHrp, encoding):
    if len(bechString) > 90:
        return {"error": "Too long", "pos": list(range(90, len(bechString)))}
    const = get_encoding_const(encoding)
    has_lower = False
    has_upper = False
    for p in range(len(bechString)):
        if ord(bechString[p]) < 33 or ord(bechString[p]) > 126:
            return {"error": "Invalid character", "pos": [p]}
        if 'a' <= bechString[p] <= 'z':
            has_lower = True
            if has_upper:
                return {"error": "Mixed case", "pos": [p]}
        if 'A' <= bechString[p] <= 'Z':
            has_upper = True
            if has_lower:
                return {"error": "Mixed case", "pos": [p]}

    bechString = bechString.lower()
    pos = bechString.rfind('1')
    if pos == -1:
        return {"error": "Missing separator '1'", "pos": None}
    if pos < 1 or pos + CHECKSUM_LENGTH >= len(bechString):
        return {"error": "Separator '1' at invalid position", "pos": [pos]}

    hrp = bechString[:pos]
    data = []
    for p in range(pos + 1, len(bechString)):
        d = CHARSET.find(bechString[p])
        if d == -1:
            return {"error": "Invalid character", "pos": [p]}
        data.append(d)

    if hrp not in validHrp:
        return {"error": "Unknown part before the separator '1'", "pos": list(range(0, len(hrp)))}

    residue = polymod(hrpExpand(hrp) + data) ^ const
    if residue != 0:
        epos = locate_errors(residue, len(data))
        if len(epos) == 0:
            return {"error": "Invalid checksum", "data_pattern": None}

        pattern = []
        for pos in range(len(data)):
            if len(data) - 1 - pos in epos:
                pattern.append(-1)
            else:
                pattern.append(data[pos])

        return {"error": "Invalid checksum", "data_pattern": pattern}

    return {"error": None, "hrp": hrp, "data": data[:-CHECKSUM_LENGTH]}

This is working to correct 2 substitution errors (and no more than 2) instantly in codex32 strings. But Only when they are in the hrp or checksum

from codex32.

BenWestgate commented on August 24, 2024

But yes, the constants in the syndrome need to be replaced with codex32 specific constants by someone who knows what the constants mean and how to compute them.

I have hunch the number of syndrome constants has something to how many 1's are in this polymod constant:
poly_div = 0x1ffffff if CHECKSUM_LENGTH == 6 else 0x0fffffffffffffff
There's 27 values being ^ to return syndrome(residue) for bech32 and 26 1's in 0x1ffffff.

So most likely I'll need 59 values for ms32_syndrome(residue) for codex32 as it's poly_div has 58 1's

from codex32.

BenWestgate commented on August 24, 2024

Here is a naive brute force error correction implementation that assumes only 48-character strings are valid and then corrects any distance 2 edits, and some distance 3, 4 and 5 edits when they involve multiple inserted characters. The corrections arrive in reasonable time, ranging between instant for single corrections, 1.5 seconds or 10 seconds for two corrections and max 26 seconds for a couple triple edit corrections on my i5-1135G7.

https://github.com/BenWestgate/Bails/blob/master/lib/python3.9/site-packages/codex32/naive_ms32_ecc.py

I optimized it to look for faster correcting and more probable errors before slower and less likely ones. And to speed up a bit when a valid_identifier is known, It might help to use multiple cores to get/check candidates faster but I'm new to that sort of programming.

One thing I didn't see mentioned earlier was correcting insertion errors is twice as fast as substitutions or deletions.

Correcting 5 insertion errors is faster than 2 deletion or 2 substitution. I did not see any invalid correction suggestions. The GUI that uses this library asks for confirmation, although I still must implement highlighting of the corrections during recovery.

from codex32.

BenWestgate commented on August 24, 2024

I should have used a lookup table for correcting substitutions and erasures (also deletions), as you suggested. It would spare the CPU some work and may perform better. Insertion errors still need brute force.

from codex32.

BenWestgate commented on August 24, 2024

During initial wallet setup, behave like bech32: you can highlight substitution errors, but don't correct anything.

In my implementation, I display the uppercase codex32 string with space every 4th character for legibility.

After display, the confirm written string entry dialog has 12 four character boxes horizontally. The typing flows box to box as if it were one field.

Currently, if there is a substitution error in a full box, the text in that box turns red, and if a full box contains the correct characters, it is disabled from editing.

One UX issue this causes is an early insertion or deletion error will cause every full box afterward to turn red due to the shift, which is non-intuitive because only the 1st red box has the typo.

Is there any issue with highlighting red the groups of 4 with insertion and deletion errors as well as substitutions and not highlighting error free but shifted boxes?

(Ofc, they won't lock until they are correct AND "typo-free")

I know the checksum can't guarantee detecting these errors but for the backup creation phase the codex32 string is still in memory so every kind of diff can be detected and highlighted to help them correct their handwriting faster.

from codex32.

BenWestgate commented on August 24, 2024

I don't have room for 5 monospace characters (or the huge amount of insertions someone could potentially make, in one box.)

If the user skips a character I can put a space, ?, or _ where their specific deletion(s) are detected, make the box red, that will keep the shift of remaining characters correct once difflib has enough data to tell the delete apart from a substitution and re-align them.

I think insertions will still have to shift the following characters. But it'll still be an improvement to highlight red the box(s) with the insertion(s) rather than every shifted box.

I can use orange for insertions and red for substitution and deletion errors (because I convert them into substitution errors as they type by filling in a marker character)

from codex32.

BenWestgate commented on August 24, 2024

The other option is automatically removing 1-2 insertions because the error correction in recovery can easily delete 5+ inserted characters.

If I highlight the individual insertions orange, the user can fix the string by selecting & deleting orange characters. Which doesn't need reference to the written copy so is unsafe. So whole groups of 4 with an insert need to be orange. But that's only 4 possible characters to try deleting.

So maybe I don't color distinguish between inserts and substitutions so the user always will need to check their paper to correct a red group.

It's true a blank in a delete location could be manually brute forced but not quicker than glancing at the paper copy.

Thanks for the idea to highlight deletes by placing less than 4 characters in the box.

I've had a dozen+ people alpha test a version which gave zero feedback as to error locations when confirming a "new codex32 backup".

There were few complaints because even without error location feedback it is still easier than bip39 to write and confirm.

But if I observe these testers they spend the most time correcting typos (or their handwriting) in this part so it's still the largest UI pain point.

I wish we could OCR their handwriting with the camera... I don't have the time or resources to implement it.

Then manually error correct that with highlighting would follow up if the camera read it wrong.
Turning the whole confirmation time per share from potentially minutes to seconds.

from codex32.

Source code for error correction? about codex32 HOT 26 OPEN

Comments (26)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent