Git Product home page Git Product logo

ppkrauss / sizedbigint Goto Github PK

View Code? Open in Web Editor NEW
0.0 1.0 0.0 1.29 MB

Sized Natural is an alternatve representation for bit strings, using native BigInt as Natural. It is useful to store and interchange hashes, labels and hierarchical indexes.

Home Page: https://ppkrauss.github.io/SizedBigInt

JavaScript 42.02% SQLPL 0.96% PLpgSQL 11.65% HTML 38.79% CSS 6.59%
hierarchical-indexes natural-numbers quaternary hexadecimal

sizedbigint's Introduction

Sized Naturals

Sized Naturals are like Natural numbers, but for hierarchical labeling, where 0 and 00 are distinct entities.

Brief presentation

(summarizing the contents of the PDF - "Sized Naturals as foundation for hierarchical labelingand extend hexadecimals for any bit string size").

Sometimes we need natural numbers, but a kind of number where 0 is not equal to 00.

Sized BigInt's are arbitrary-precision integers (BigInt) with defined number of bits, to represent hashes, labels, encodings, hierarchical indexes or any other that need to differenciate 0 and 00, preserving all other numeric interpretations, like order (002>001) and the freedom to translate its positional notation to some especific radix (e.g. binary to quaternary or hexadecimal).

Basic examples

The following examples can be mathematically described as a finite sets of pseudo-numeric representations. Limiting examples in 8 bits:

  • Samples of base2 representations: X1 = {0, 1}   X2 = {0, 00, 01, 1, 10, 11}   X3 = ...
    X8 = {0, 00, 000, 000,..., 00000000, 00000001, ..., 11111111}.

  • The same set X8 without some (non-compatible) items, expressed in quaternary (base4):
    Y8= {0, 00, 000, 0000, 0001, 0002, 0003, 001, 0010, 0011, ..., 3333}.

  • Ordering. The order in ordinary mathematical sets is arbitrary, but to group or list elements we can adopt some order. The main ordering options for typical SizedBigInts are the lexicographic order, to enhance "same prefix" grouping or hierarchy; and the pseudo-numeric order. Using the bit-length as default criterium and numeric order as optional.

Here a set of elements illustrated with different representations, listed by lexicographic order of the binary representation:

                    Representation   
    (size,value)   BitString                   Base4
    (1,0)	    0                        ?
    (2,0)	    00                       0
    (3,0)	    000                      ?
    (4,0)	    0000                     00
    (5,0)	    00000                    ?
    (6,0)	    000000                   000
    (7,0)	    0000000                  ?
    (8,0)	    00000000                 0000
    (8,1)	    00000001                 0001
    (7,1)	    0000001                  ?
    (8,2)	    00000010                 0002
    (8,3)	    00000011                 0003
    (6,1)	    000001                   001
    (7,2)	    0000010                  ?
    (8,4)	    00000100                 0010
    (8,5)	    00000101                 0011
    ...            ...                      ...

Formal definition

Each SizedBigInt is an element of a set. The formal definition of this set is the mathematical reference-concept for implementations.

As showed in the table above, we can represent elements of a set X as ordered pairs, (l,n) of bit‑length l and numeric value n, a Natural number. Supposing a minimum bit-length function, minBL(), the set Xk is a SizedBigInt set constrained by k, the maximum number of bits:

where

Representations

Natural numbers can be expressed with positional notation, using the rule of "remove leading zeros". The rule is used in any base (radix) representation.

The SizedBigInt's are like BigInt's without the rule of remove leading zeros, and the SizedBigInt must be the same in any base representation. This last condiction is a problem: as we see in the table, there are no base4 representation for 0, because each digit in base4 need 2 bits.

How to convert base2 one-digit numbers 0 and 1 to base4?

The solution proposed here is to use a fake (optional) final digit that represent these values. To avoid cofusion with hexadecimal letters we can start with G to represent 0 and H to represent 1. It was named "hierarchical digit" (hDigit) because the ordinary base4 digits use two bits. The resulted notation is base4h. For base16 e neeed more hDigits, but the logic is the same.

Base4h

Listing some bit strings and its base4h representations.

    (size,value)    BitString                   Base4h
    (1,0)	    0                        G
    (2,0)	    00                       0
    (3,0)	    000                      0G
    (4,0)	    0000                     00
    (5,0)	    00000                    00G
    (6,0)	    000000                   000
    (7,0)	    0000000                  000G
    (8,0)	    00000000                 0000
    (8,1)	    00000001                 0001
    (7,1)	    0000001                  000H
    (8,2)	    00000010                 0002
    (8,3)	    00000011                 0003
    (6,1)	    000001                   001
    (7,2)	    0000010                  001G
    (8,4)	    00000100                 0010
    (8,5)	    00000101                 0011
    ...             ...                      ...
    (7,127)         1111111                  333H
    (8,254)         11111110                 3332
    (8,255)         11111111                 3333

Base4h numbers are strings with usual base4 pattern and the halfDigit as optional suffix. This syntax rule can be expressed by a regular expression:

/^([0123]*)([GH])?$/

To translate from binary, only values with odd number of bits will be translate the last bit as halfDigit. The complete translation table, from binary to base4 representations, is:

{ "0":"G", "1":"H", "00":"0", "01":"1", "10":"2", "11":"3" }

Base16h

Extending the hexadecimal representation, in a similar way to the previous one used for base4h: the last digit as a fake-digit that can represent all these incompatible values — so using the halphDigit values G and H for 1-bit values, and including more values for 2 bits (4 values) and 3 bits (8 values). The total is 2+4+8=14 values, they can be represented by the letters G to T.

The name of this new representation is Base16h, because it is the ordinary Base16 "plus an optional hDigit", and is used to represent hierarchical bit strings. Its string pattern is:

/^([0-9a-f]*)([G-T])?$/

   TABLE-3

(size,value)  	BitString     Base16h
(1,0)		0       	G
(2,0)		00      	I
(3,0)		000     	M
(4,0)		0000    	0
(5,0)		00000   	0G
(6,0)		000000  	0I
(7,0)		0000000 	0M
(8,0)		00000000	00
(8,1)		00000001	01
(7,1)		0000001 	0N
(8,2)		00000010	02
(8,3)		00000011	03
(6,1)		000001  	0J
...
(6,63) 		111111  	fL
(7,126)		1111110 	fS
(8,252)		11111100	fc
(8,253)		11111101	fd
(7,127)		1111111 	fT
(8,254)		11111110	fe
(8,255)		11111111	ff

To translate from a binary string with b bits, there are b % 4 last bits to be translated as special digits. Splitting the value as binary prefix (part[0]) and suffix (part[1] with 1, 2 or 3 last bits),

let part = strbin.match(/^((?:[01]{4,4})*)([01]*)$/)

the prefix will be translated to usual hexadecimal number, and the suffix, when exists, translated by this complete "bits to haslDigit" map:

{
 "0":"G","1":"H",
 "00":"I","01":"J","10":"K","11":"L",
 "000":"M","001":"N","010":"O","011":"P","100":"Q","101":"R","110":"S","111":"T"
}

   The textual contents, images and datasets of this git repository are dedicated to the public domain.
  

Algorithms and software source-code licensed as Apache-2.0.

sizedbigint's People

Contributors

ppkrauss avatar

Watchers

 avatar

sizedbigint's Issues

Translation and complements for SQL and PL/pgSQL

PostgreSQL offers native bit string datatype, the bit varying, and int8 (also called "bigint", not be confused with arbitrary-pecision Javascript's BigInt), that are the main tools and casts for SizedNaturals when using with indexes (geocodes, hashes, etc.) with maximum of 62 bits — there is a loss by signal and "hidden bit strategy".

For PostgreSQL the arbitrary-precision number is the datatype numeric, that is the real analog of Javascript's BigInt, and can be used for implementations focusing on generic SizedNaturals.

replace bigint_log2() by optimezed function

Change naive implementation, that is there only for asserts,

  static bigint_log2(n) { 
     const C1 = BigInt(1)
     const C2 = BigInt(2)
     for(var count=0, n=n; n>C1; count++)  n = n/C2
     return count
  }

By an optimized one... Perhaps, adaptating this,

  static bigint_log2_pe(value, base) {
    // ret.p=quotient, ret.e=log_base(value)
    // example: integerLogarithm(256n, 2n) returns { p: 256n, e: 8 }
    const C0 = BigInt(0)
    const C1 = BigInt(1)
    const C2 = BigInt(2)
    if (base  <= value) {
        var tmp = SizedBigInt.bigint_log2_pe(value, base*base);
        var p = tmp.p;
        var e = tmp.e;
        var t = p*base;
        return (t <= value) ? { p: t, e: e * C2 + C1 } : { p: p, e: e * C2 };
    }
    return { p: C1, e: C0 };
}

See also how to use WebAssembler, to use
log-base-2-instruction as here or here...

change Base16h alphabet and minor corrections

The complementar alphabet of Base8h and Base16h can be optimized excluding vowels (I,O,U) and symbols that may be easily confused with each other (I and W), and excluing X because is used as hexadecimal convertion prefix (e.g. x012).

Other minor tasks:

  • introduce base2h as distinct from base2, because only base2h accepts 0 is different tham of 00.
    • Control base2 to transform any alternative padding zeros to canonic zero.
  • rename: change useHDigit flag to isHier. It setted for any hierarchical base, even without hDigit, like base2h.
  • rename base32pt to base32nvu.
  • enhance kx_trConfig() method: include all logic of cache generation for if (r.isHierar); correct r.regex_b2; check (r.base==2); check (r.base>64); when isHierar check r.alphabet.length<(r.base*2-2).
  • remove SizedBigInt.kx_tr['4h-to-2'], etc. initializations, not need it, is by demand.

Move demo and benchmarks to example folder of src_js

The src_js folder can need focus on class source-code, move all to src_js/examples. Check also all other old issues that can be closed, and README's.

Other issues:

  • add integer division
  • release as v1.0.0 (!)

Change software licence from CC0 to Apache2

Stop being CC0 or MIT to be Apache2. Justifications:

  • CC0 or MIT?   Does it almost sound like asking "CC0 or CC-BY"? The answer is so much, but in general when we are unknown, CC-BY helps against opportunistic possession, since CC0 does not have to quote the source, it is forgotten and then resurges with small modifications as if 100% originated from another author ... When on the other hand the algorithm is popular or a larger company, there is no such risk because the "footprints" of the original author are not undone over time.

  • MIT or Apache?   As this article recalls, MIT and Apache2 are pretty much the same thing, but while MIT allows "hidden patents" creating legal uncertainty, the Apache license does not allow. This was evident, for example, in the controversy of July 2016, with ReactJS giving a gap to Facebook patents.


(português) Deixar de ser CC0 ou MIT para ser Apache2. Justificativas:

  • CC0 ou MIT?   Equivale praticamente a perguntar "CC0 ou CC-BY"? A resposta é tanto faz, mas em geral quando somos desconhecidos, o CC-BY ajuda contra a posse oportunista, visto que CC0 não obriga citar a fonte, ela fica esquecida e em seguida ressurge com pequenas modificações como se 100% originada de outro autor... Quando por outro lado o algoritmo é popular ou de uma empresa maior, não há esse risco pois as "pegadas" do autor original não são desfeitas com o tempo.

  • MIT ou Apache?   Conforme lembra este artigo, MIT e Apache2 são praticamente a mesma coisa, mas enquanto MIT permite "patentes escondidas" gerando insegurança jurídica, a licença Apache não permit. Isso ficou evidente, por exemplo, na controversia de julio de 2016, com o ReactJS dando brecha para patentes do Facebook.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.