First of all, great job with this! I noticed it breaking on a few characters in some J

Breaking on some characters about base32k HOT 3 CLOSED

simonratner commented on August 15, 2024

Breaking on some characters

from base32k.

Comments (3)

simonratner commented on August 15, 2024

Thanks, Jason.

The problem with your input is that it is a string containing a Unicode character, that is, a character whose code is above 255. This behaviour is pretty much by design: base32k.encodeBytes works on bytes, not strings, by packing sequences of 15 bits from consecutive input bytes into a single UTF-16 character (hence the name, 2¹⁵ = 32768).
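To make the 15-bit packing idea concrete, here is a minimal sketch. It is not base32k's actual code: the names pack15 and unpack15, the OFFSET constant, and the choice to pass the byte length explicitly to the decoder are all assumptions made for illustration.

```javascript
// Hypothetical sketch of 15-bit packing; NOT base32k's real implementation or alphabet.
const OFFSET = 0x4000; // assumed base code point: 0x4000..0xBFFF stays clear of surrogates

// Pack an array of bytes into a string, 15 bits per UTF-16 code unit.
function pack15(bytes) {
  let out = "";
  let acc = 0;   // bit accumulator
  let nbits = 0; // number of valid bits currently held in acc
  for (const b of bytes) {
    acc = (acc << 8) | (b & 0xff);
    nbits += 8;
    if (nbits >= 15) {
      nbits -= 15;
      out += String.fromCharCode(OFFSET + ((acc >> nbits) & 0x7fff));
      acc &= (1 << nbits) - 1; // keep only the bits not yet emitted
    }
  }
  if (nbits > 0) {
    // Left-align the leftover bits in one final, zero-padded chunk.
    out += String.fromCharCode(OFFSET + ((acc << (15 - nbits)) & 0x7fff));
  }
  return out;
}

// Reverse of pack15; the caller supplies the original byte length (an assumption of this sketch).
function unpack15(text, byteLength) {
  const bytes = new Uint8Array(byteLength);
  let acc = 0, nbits = 0, n = 0;
  for (let i = 0; i < text.length && n < byteLength; i++) {
    acc = (acc << 15) | (text.charCodeAt(i) - OFFSET);
    nbits += 15;
    while (nbits >= 8 && n < byteLength) {
      nbits -= 8;
      bytes[n++] = (acc >> nbits) & 0xff;
      acc &= (1 << nbits) - 1;
    }
  }
  return bytes;
}

console.log(pack15(new Uint8Array(15)).length); // 8 (120 bits / 15 bits per character)
```

Each output character carries 15 bits of payload in a 16-bit code unit, which is where the space saving over one byte per character comes from.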

Input to base32k.encodeBytes should be binary data, represented as a string of bytes only for convenience, since JavaScript doesn't have a better binary data type. Alternatively, you can represent your binary data as 4-byte integers and use base32k.encode instead. Unicode string input to base32k.encodeBytes is an error and will break it; I mention this briefly at the bottom of the README when talking about stringified JSON, but perhaps I should make it more explicit.
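One way to catch this kind of input early is to check that every character fits in a byte before encoding. The helper below is a hypothetical sketch (isByteString is not part of base32k) and assumes that "a string of bytes" means every char code is at most 255.

```javascript
// Hypothetical guard, not part of base32k: true only if every
// UTF-16 code unit in the string fits in a single byte.
function isByteString(s) {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

console.log(isByteString("plain ascii")); // true
console.log(isByteString("\u2013"));      // false (char code 8211)
```

Calling a check like this before base32k.encodeBytes turns a silent corruption into an explicit error at the call site.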

You can't save any space on storing Unicode strings using this method. If your input is Unicode, storing it as a plain JavaScript string is already as efficient as it will get, since there is no wasted space in the internal representation.


simonratner commented on August 15, 2024

By the way, if you have small Unicode fragments within a larger ASCII-only string and still want to save some storage space on the large portion, you can work around this by escaping all Unicode characters within your JSON input. The JSON parser will happily unescape them for you. For example:

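// The string literal below is the 8-character JSON text "\u2013" (an escaped en dash).
const json = '"\\u2013"'; // 8211 == 0x2013
// Every character in json is ASCII, so it is safe to pass to encodeBytes.
const decoded = base32k.decodeBytes(base32k.encodeBytes(json));
// JSON.parse turns the escape back into the real character; expected to log true.
console.log(JSON.parse(decoded) === String.fromCharCode(8211));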

The space saving will of course depend on the ratio of Unicode to non-Unicode characters, since you are now using 6 bytes to represent each Unicode character that would otherwise only occupy 2 bytes.
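The trade-off can be estimated with a little arithmetic. The sketch below counts storage in UTF-16 code units and assumes the encoder adds no header or padding beyond rounding up to a whole character, and that every escaped character is a single \uXXXX escape; both function names are made up for illustration.

```javascript
// Cost of storing the text as a plain string: one code unit per character.
function plainUnits(asciiCount, unicodeCount) {
  return asciiCount + unicodeCount;
}

// Cost after escaping (6 bytes per Unicode character) and packing
// 15 bits into each code unit. Assumes zero encoder overhead.
function packedUnits(asciiCount, unicodeCount) {
  const bytes = asciiCount + 6 * unicodeCount;
  return Math.ceil((bytes * 8) / 15);
}

console.log(plainUnits(1000, 0), packedUnits(1000, 0));     // 1000 534
console.log(plainUnits(1000, 100), packedUnits(1000, 100)); // 1100 854
console.log(plainUnits(1000, 300), packedUnits(1000, 300)); // 1300 1494
```

Under these assumptions, packing wins while 8(a + 6u)/15 < a + u, which works out to u/a < 7/33: roughly one Unicode character for every five ASCII characters is the break-even point.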


junosuarez commented on August 15, 2024

Cool, thanks for the explanation - this makes quite a bit more sense now. I verified this with

i = 0; while(a = String.fromCharCode(i), base32k.decodeBytes(base32k.encodeBytes(a)) === a) { i++ }

I'm using this to store JSON data in localStorage, combined with a compression algorithm that returns a string. The trick in this case was to encode my initial JSON.stringified data using \uXXXX escapes for anything over charcode 255.
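For anyone wanting to do the same, the escaping step might look roughly like the sketch below. The function name stringifyByteSafe is hypothetical, and this is one possible way to do it rather than the code actually used in this thread.

```javascript
// Hypothetical helper: JSON.stringify, then escape every code unit above 255
// as \uXXXX. In JSON.stringify output such characters can only occur inside
// string literals, where \uXXXX escapes are valid JSON.
function stringifyByteSafe(value) {
  return JSON.stringify(value).replace(/[\u0100-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const safe = stringifyByteSafe({ dash: "\u2013" });
console.log(safe);                  // {"dash":"\u2013"}
console.log(JSON.parse(safe).dash); // the original en dash character
```

Surrogate pairs are escaped one half at a time, and JSON.parse joins them back into the original character, so no special handling is needed on the decoding side.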
