Comments (3)

simonratner commented on August 15, 2024

Thanks, Jason.

The problem with your input is that it is a string containing a Unicode character. This behaviour is pretty much by design: base32k.encodeBytes works on bytes, not strings, packing each run of 15 bits from consecutive input bytes into a single UTF-16 character (hence the name: 2^15 = 32768).
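To make the bit-packing idea concrete, here is a minimal sketch that groups the bits of a byte string into 15-bit values. It is not base32k's actual implementation: the name `packBytesTo15Bit`, the most-significant-bit-first order and the zero-padding of the last group are assumptions, and it returns plain numbers instead of mapping them onto UTF-16 characters (the real library has to pick a character range for that, which this sketch does not attempt). It also shows why Unicode input is a problem: a char code above 255 does not fit in the 8 bits each input character is supposed to contribute, so the sketch rejects it rather than silently truncating it.

```javascript
// Hypothetical sketch, NOT base32k's code: pack 8-bit chars into 15-bit groups.
function packBytesTo15Bit(byteString) {
  const groups = [];
  let acc = 0;   // pending bits, most significant first
  let nbits = 0; // how many bits in acc are still pending
  for (let i = 0; i < byteString.length; i++) {
    const code = byteString.charCodeAt(i);
    if (code > 0xff) {
      // A character like "\u2013" needs 16 bits; it cannot be a single input byte.
      throw new RangeError("char code " + code + " at index " + i + " is not a byte");
    }
    acc = (acc << 8) | code;
    nbits += 8;
    if (nbits >= 15) {
      nbits -= 15;
      groups.push((acc >>> nbits) & 0x7fff); // emit the top 15 pending bits
      acc &= (1 << nbits) - 1;               // keep the 0-7 bits not yet emitted
    }
  }
  if (nbits > 0) {
    groups.push((acc << (15 - nbits)) & 0x7fff); // zero-pad the final group
  }
  return groups;
}
```

For example, `packBytesTo15Bit("\xff\xff")` returns `[32767, 16384]`: fifteen one-bits, then the sixteenth one-bit followed by fourteen padding zeros. Fifteen input bytes (120 bits) fill exactly eight groups.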

Input to base32k.encodeBytes should be binary data, represented as a string of bytes only for convenience since javascript doesn't have a better binary data type. Alternatively, you can also represent your binary data as 4-byte integers and use base32k.encode instead. Unicode string input to base32k.encodeBytes is an error and will break it; I mention this briefly at the bottom of the README when talking about stringified JSON, but perhaps I should make it more explicit.
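If your binary data starts out as a byte string, something along these lines could regroup it into 4-byte integers. This is a hypothetical helper, not part of base32k, and the thread does not say whether base32k.encode expects signed or unsigned values or which byte order it assumes; this sketch picks unsigned, big-endian, and zero-pads the final word, so the original byte length has to be kept separately. Check the README before relying on any of those choices.

```javascript
// Hypothetical helper, not part of base32k: group a byte string into
// unsigned 32-bit integers, big-endian, zero-padding the final word.
function byteStringToWords(byteString) {
  const words = [];
  for (let i = 0; i < byteString.length; i += 4) {
    let word = 0;
    for (let j = 0; j < 4; j++) {
      const code = i + j < byteString.length ? byteString.charCodeAt(i + j) : 0;
      if (code > 0xff) {
        throw new RangeError("char code " + code + " at index " + (i + j) + " is not a byte");
      }
      word = (word << 8) | code; // bitwise ops yield a signed 32-bit result
    }
    words.push(word >>> 0); // reinterpret as unsigned
  }
  return words;
}
```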

You can't save any space on storing Unicode strings using this method. If your input is Unicode, storing it as a plain javascript string is already as efficient as it will get, since there is no wasted space in the internal representation.
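The arithmetic behind this: a JavaScript string stores each character as a 16-bit UTF-16 code unit, and base32k puts 15 bits into each output character. Treating a string of m such characters as 2m bytes of binary data therefore gives

```latex
\text{output length} = \left\lceil \frac{8 \cdot 2m}{15} \right\rceil = \left\lceil m + \frac{m}{15} \right\rceil \ge m + 1 \qquad (m \ge 1)
```

characters, which is longer than the input. The saving only exists for byte-only input: n bytes stored one per character take n characters, but only ⌈8n/15⌉ once packed, a little over half.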

simonratner commented on August 15, 2024

By the way, in the case where you may have small Unicode fragments within a larger ascii-only string and still want to save some storage space on the large portion, you can work around this by escaping all Unicode characters within your JSON input. The JSON parser will happily unescape them for you. For example:

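var json = '"\\u2013"'; // JSON text "\u2013" (8 ASCII chars); 8211 == 0x2013, EN DASH
var decoded = base32k.decodeBytes(base32k.encodeBytes(json)); // assumes base32k is already loaded
console.log(JSON.parse(decoded) === String.fromCharCode(8211)); // JSON.parse turns the escape back into U+2013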

The space saving will of course depend on the ratio of Unicode to non-Unicode characters, since you are now using 6 bytes to represent each Unicode character that would otherwise only occupy 2 bytes.
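The trade-off can be estimated. If the JSON text is L characters long and k of them need escaping, escaping turns each of those k characters into 6 (`\uXXXX`), so the text becomes L + 5k bytes and packs into ⌈8(L + 5k)/15⌉ characters. That beats storing the L characters as-is only when 8(L + 5k) < 15L, i.e. 40k < 7L, so fewer than about 17.5% of the characters may need escaping. The function below is a hypothetical sketch of that estimate; it ignores compression and assumes base32k adds no length header or other framing.

```javascript
// Rough size estimate in UTF-16 code units (2 bytes each).
// Assumption: base32k adds no framing or length header.
function storageEstimate(totalChars, escapedChars) {
  const plain = totalChars;                             // JSON string stored as-is
  const escapedLength = totalChars + 5 * escapedChars;  // each \uXXXX escape is 6 chars, not 1
  const packed = Math.ceil((8 * escapedLength) / 15);   // 8 bits in per char, 15 bits out per char
  return { plain: plain, packed: packed };
}
```

For a 1000-character string, 10 escaped characters give `{ plain: 1000, packed: 560 }`, 175 is exactly break-even at 1000 each, and 200 gives `packed: 1067`, which is worse than storing the string directly.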

junosuarez commented on August 15, 2024

Cool, thanks for the explanation - this makes quite a bit more sense now. I verified this with

var i = 0, a; // after the loop, i is the first char code that fails the round trip
while (a = String.fromCharCode(i), base32k.decodeBytes(base32k.encodeBytes(a)) === a) { i++; }
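The loop above never terminates if every character survives, because String.fromCharCode reduces its argument modulo 2^16 and so starts over at char code 0. A bounded variant stops after the last UTF-16 code unit. It takes the round trip as a function, so with base32k loaded you could pass `s => base32k.decodeBytes(base32k.encodeBytes(s))`; the names `firstFailingCharCode` and `lowByteRoundTrip` are hypothetical, and the latter is only a stand-in that keeps the low 8 bits of each char code, mimicking a byte-only codec. It is not base32k.

```javascript
// Returns the first char code whose one-character string does not survive
// roundTrip, or -1 if all 65536 UTF-16 code units survive.
function firstFailingCharCode(roundTrip) {
  for (let i = 0; i <= 0xffff; i++) {
    const a = String.fromCharCode(i);
    if (roundTrip(a) !== a) {
      return i;
    }
  }
  return -1;
}

// Stand-in for a byte-only codec (NOT base32k): drops everything above 8 bits.
function lowByteRoundTrip(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    out += String.fromCharCode(s.charCodeAt(i) & 0xff);
  }
  return out;
}
```

With the stand-in, `firstFailingCharCode(lowByteRoundTrip)` returns 256, and an identity function returns -1 instead of hanging.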

I'm using this to store JSON data in localStorage, combined with a compression algorithm that returns a string. The trick in this case was to escape every character above char code 255 in my initial JSON.stringify output as a \uXXXX sequence.
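The escaping step could look like the hypothetical helper below (not part of base32k). It replaces every UTF-16 code unit above a threshold with a JSON `\uXXXX` escape; a threshold of 255 matches the approach described here, while 127 escapes all non-ASCII characters. This is safe on JSON text whose whitespace is plain ASCII, which is what JSON.stringify produces unless you pass it an unusual indent string, because non-ASCII characters can then only appear inside string literals, where `\uXXXX` is valid. Characters outside the Basic Multilingual Plane are escaped as two surrogate halves, which JSON.parse joins back together.

```javascript
// Hypothetical helper: make JSON text byte-only by escaping code units above threshold.
function escapeAboveForJson(jsonText, threshold) {
  let out = "";
  for (let i = 0; i < jsonText.length; i++) {
    const code = jsonText.charCodeAt(i);
    if (code > threshold) {
      out += "\\u" + code.toString(16).padStart(4, "0"); // e.g. U+2013 -> \u2013
    } else {
      out += jsonText[i];
    }
  }
  return out;
}
```

Usage: `escapeAboveForJson(JSON.stringify(data), 255)` yields text in which every char code is at most 255, and `JSON.parse` on it gives back the same data.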
