Port of https://github.com/datproject/rabin to AssemblyScript
hugomrdias / rabin-wasm
Rabin fingerprinting implemented in WASM
RangeError: WebAssembly.Compile is disallowed on the main thread, if the buffer size is larger than 4KB. Use WebAssembly.compile, or compile on a worker thread.
at instantiateSync (/home/travis/build/ipfs/js-ipfs-unixfs/node_modules/@assemblyscript/loader/index.js:306:11)
at loadWebAssembly (/home/travis/build/ipfs/js-ipfs-unixfs/node_modules/rabin-wasm/dist/rabin-wasm.node.js:10:10)
at create (/home/travis/build/ipfs/js-ipfs-unixfs/node_modules/rabin-wasm/src/index.js:5:28)
at rabin (/home/travis/build/ipfs/js-ipfs-unixfs/packages/ipfs-unixfs-importer/src/chunker/rabin.js:67:19)
at rabin.next (<anonymous>)
at rabinChunker (/home/travis/build/ipfs/js-ipfs-unixfs/packages/ipfs-unixfs-importer/src/chunker/rabin.js:51:20)
at rabinChunker.next (<anonymous>)
at all (/home/travis/build/ipfs/js-ipfs-unixfs/node_modules/it-all/index.js:12:20)
at Context.<anonymous> (/home/travis/build/ipfs/js-ipfs-unixfs/packages/ipfs-unixfs-importer/test/chunker-rabin.spec.js:30:26)
ipfs-uodules/mocha/lib/runner.js:486:14)
at Immediate.<anonymous> (/home/travis/build/ipfs/js-ipfs-unixfs/node_modules/mocha/lib/runner.js:572:5)
at processImmediate (internal/timers.js:456:21)
bring back tests with as-pect
asp
new docs https://docs.assemblyscript.org/quick-start
Hey @hugomrdias I've been trying to define a new chunker interface which looks something like this:
/**
 * Chunker API can be used to slice up the file content according
 * to specific logic. It is designed with the following goals in mind:
 *
 * 1. Stateless - All the state management is handled by the consumer, meaning
 *    it is the consumer's responsibility to slice from the buffer to get a new slice.
 *
 * 2. Effect free - Since the chunker does not read from the underlying source,
 *    the consumer is free to perform multiple calls while moving the buffer offset,
 *    or it could read more bytes and perform reads afterwards.
 *
 * 3. Doesn't manage resources - The chunker does not manage any resources, which
 *    guarantees that it cannot use more memory than desired by the consumer.
 */
export interface Chunker<T extends Readonly<unknown>> {
  /**
   * Context used by the chunker. It usually represents chunker
   * configuration like max and min chunk size. Usually the chunker implementation
   * library will provide a utility function to initialize a context.
   */
  readonly context: T
  /**
   * Chunker takes a `context:T` object, a `buffer` containing the bytes to be
   * chunked and an `ended` flag that tells it whether more bytes could be made
   * available in follow-up calls. Chunker is supposed to return a non-negative
   * integer giving the number of bytes (from the start of the buffer)
   * that make up the next chunk. A returned `0` signifies that the
   * buffer contains no valid chunks. Returning negative numbers is not allowed.
   *
   * **Note:** Chunker MAY return `0` even if `ended && buffer.byteLength > 0`;
   * it is the consumer's responsibility to handle the remaining bytes, even though
   * they do not form a chunk.
   */
  cut(context:T, buffer:Uint8Array, ended:boolean):number
}
However, the implementation here seems to eagerly collect all chunks, as opposed to providing an API to do it step by step:
Lines 153 to 165 in f0cf7ce
Also, as far as I can tell, the original implementation did not do that, which makes me wonder if there was a specific reason the API diverged here.
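To make the step-by-step idea concrete, here is a minimal sketch of how a consumer might drive such a `cut` function. It is not rabin-wasm code: `FixedSizeContext`, `fixedSize` and `chunkAll` are hypothetical names, and the toy chunker simply cuts every 4 bytes instead of using a Rabin fingerprint. The interface is restated in simplified form so the snippet is self-contained.

```typescript
// Simplified restatement of the proposed interface.
interface Chunker<T> {
  readonly context: T
  cut(context: T, buffer: Uint8Array, ended: boolean): number
}

// Hypothetical context for a toy fixed-size chunker (not a Rabin chunker).
interface FixedSizeContext {
  readonly size: number
}

const fixedSize: Chunker<FixedSizeContext> = {
  context: { size: 4 },
  // Returns the byte length of the next chunk, or 0 if no full chunk is buffered.
  cut(context, buffer, _ended) {
    return buffer.byteLength >= context.size ? context.size : 0
  }
}

// The consumer owns all state: it buffers leftover bytes and decides when to cut.
function chunkAll<T>(chunker: Chunker<T>, inputs: Uint8Array[]): Uint8Array[] {
  const chunks: Uint8Array[] = []
  let pending: Uint8Array = new Uint8Array(0)
  for (let i = 0; i < inputs.length; i++) {
    // Append the newly read bytes to whatever was left from the previous step.
    const joined = new Uint8Array(pending.byteLength + inputs[i].byteLength)
    joined.set(pending, 0)
    joined.set(inputs[i], pending.byteLength)
    pending = joined

    const ended = i === inputs.length - 1
    let size = chunker.cut(chunker.context, pending, ended)
    while (size > 0) {
      chunks.push(pending.subarray(0, size))
      pending = pending.subarray(size)
      size = chunker.cut(chunker.context, pending, ended)
    }
  }
  // Bytes that never formed a chunk are the consumer's responsibility.
  if (pending.byteLength > 0) chunks.push(pending)
  return chunks
}
```

Because the chunker holds no state and no buffers, the consumer decides how much memory to use and could just as well yield each chunk lazily instead of collecting them into an array.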
Some people don't have git installed on their system
assemblyscript "github:assemblyscript/assemblyscript#v0.6"
I'm just wondering why you use this version:
export function mod(x: u64, p: u64): u64 {
  while (degree(x) >= degree(p)) {
    var shift = degree(x) - degree(p);
    x = x ^ (p << shift);
  }
  return x;
}
instead of this more efficient version?
export function mod(x: u64, p: u64): u64 {
  let shift: i32;
  let dp = degree(p);
  while ((shift = degree(x) - dp) >= 0) {
    x ^= p << shift;
  }
  return x;
}
Is it for security reasons?
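For comparison, the two loops can be sketched side by side in plain TypeScript. This is only an illustration under stated assumptions: it works on 32-bit unsigned values rather than `u64`, and `degree`, `modRecompute` and `modHoisted` are hypothetical stand-ins, not the library's own functions. Both variants perform the same reduction over GF(2); the second only avoids recomputing `degree(p)` on every iteration.

```typescript
// Degree of a GF(2) polynomial stored in the low 32 bits; -1 for the zero polynomial.
// Hypothetical helper, assumed to behave like the library's `degree`.
function degree(x: number): number {
  let d = -1
  for (let v = x >>> 0; v !== 0; v = v >>> 1) d++
  return d
}

// Variant 1: recomputes degree(p) on every pass through the loop.
function modRecompute(x: number, p: number): number {
  if (p === 0) throw new RangeError("modulus polynomial must be non-zero")
  x = x >>> 0
  while (degree(x) >= degree(p)) {
    const shift = degree(x) - degree(p)
    // XOR is subtraction in GF(2); >>> 0 keeps the value unsigned.
    x = (x ^ (p << shift)) >>> 0
  }
  return x
}

// Variant 2: computes degree(p) once and reuses the shift as the loop condition.
function modHoisted(x: number, p: number): number {
  if (p === 0) throw new RangeError("modulus polynomial must be non-zero")
  const dp = degree(p)
  x = x >>> 0
  let shift = degree(x) - dp
  while (shift >= 0) {
    x = (x ^ (p << shift)) >>> 0
    shift = degree(x) - dp
  }
  return x
}
```

For example, reducing `0b1101101` modulo `0b1011` gives `0b100` with either variant. In both, every iteration clears the current top bit of `x`, so the loop runs at most `degree(x) - degree(p) + 1` times.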
It looks like the browser support is there, but I'm getting errors trying to run any tests with aegir,
and this module doesn't have any tests of its own. I get the same error trying to run the tests from the js-ipfs
importer that relies on it.
Test Browser
START:
ℹ 「wdm」:
ℹ 「wdm」: Compiled successfully.
ℹ 「wdm」: Compiling...
「wdm」:
ℹ 「wdm」: Compiled with warnings.
✖ single chunk vs multi-chunk
Finished in 0.019 secs / 0.012 secs @ 03:35:58 GMT+0000 (Coordinated Universal Time)
SUMMARY:
✔ 0 tests completed
✖ 1 test failed
FAILED TESTS:
✖ single chunk vs multi-chunk
HeadlessChrome 77.0.3833 (Linux 0.0.0)
ReferenceError: compiled is not defined
at create (/root/rabin-generator/node_modules/rabin-wasm/src/index.js:8:1 <- node_modules/aegir/src/config/karma-entry.js:21802:12)
Command failed: karma start /root/rabin-generator/node_modules/aegir/src/config/karma.conf.js --files-custom --log-level error
@hugomrdias @achingbrain I have been working on a Rust (re)implementation of the Rabin chunker that is compatible with the current Go implementation: https://github.com/Gozala/rabin-wasm/
Some shallow profiling suggests that most of the time is spent copying bytes into WASM memory, which is perhaps unsurprising. In addition, there is the downside of async initialization.
This got me wondering why a WASM implementation was chosen over pure JS, and whether a perf comparison was used in making that decision. Without any data to support this, I am inclined to think that a pure JS implementation would likely perform better, as it just needs to do shift and XOR operations, both of which seem to be supported on BigInt.