cld3-asm's Introduction

cld3-asm

cld3-asm is an isomorphic JavaScript binding to Google's Compact Language Detector v3 (cld3), based on a WebAssembly build of cld3. This module aims to provide a thin, lightweight interface to cld3 without requiring native modules.

Install

npm install cld3-asm

Usage

Loading module asynchronously

cld3-asm relies on the wasm binary of cld3, which needs to be initialized first.

import { loadModule } from 'cld3-asm';

const cldFactory = await loadModule();

loadModule loads the wasm binary, initializes it, and returns a factory function for creating instances of the cld3 language identifier.

loadModule({ timeout?: number }): Promise<CldFactory>

It accepts an optional timeout that limits how long to wait for the wasm binary to compile and load.

Creating language identifier

create(minBytes?: number, maxBytes?: number): LanguageIdentifier

LanguageIdentifier exposes a minimal interface to cld3's NNetLanguageIdentifier.

  • findLanguage(text: string): Readonly<LanguageResult> : Finds the most likely language for the given text.
  • findMostFrequentLanguages(text: string, numLangs: number): Array<Readonly<LanguageResult>> : Splits the input text into spans based on script, predicts a language for each span, and returns an array of the top numLangs most frequent languages.
  • dispose(): void : Destroys the current language identifier instance. Note that a created instance is not destroyed automatically.
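
Because instances are not destroyed automatically, one way to avoid leaks is to pair every create() with dispose() in a try/finally. The sketch below assumes only the signatures listed above. The interface declarations are local stand-ins rather than the package's exported types, the LanguageResult field names follow the sample result quoted in the issue reports on this page, and withIdentifier is a hypothetical helper that is not part of cld3-asm.

```typescript
// Local stand-ins mirroring the documented signatures (assumption: the real
// package exports equivalent types).
interface LanguageResult {
  language: string;
  probability: number;
  is_reliable: boolean;
  proportion: number;
}

interface LanguageIdentifier {
  findLanguage(text: string): Readonly<LanguageResult>;
  findMostFrequentLanguages(text: string, numLangs: number): Array<Readonly<LanguageResult>>;
  dispose(): void;
}

interface CldFactory {
  create(minBytes?: number, maxBytes?: number): LanguageIdentifier;
}

// Hypothetical helper: creates an identifier, runs a synchronous callback
// with it, and always disposes the identifier, even if the callback throws.
function withIdentifier<T>(
  factory: CldFactory,
  fn: (identifier: LanguageIdentifier) => T
): T {
  const identifier = factory.create();
  try {
    return fn(identifier);
  } finally {
    identifier.dispose();
  }
}
```

With a real factory this might be used as `withIdentifier(await loadModule(), id => id.findLanguage('Hello world'))`. The callback should be synchronous: if it returns a promise, dispose() runs before that promise settles.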

There are simple examples for each environment. In each example directory, run npm install && npm start.

Building / Testing

A few npm scripts are provided for building and testing the code.

  • build: Transpiles code to ES5 CommonJS in dist.
  • test: Runs both the cld and cld3-asm tests. Does not require a build beforehand.
  • test:cld: Runs integration tests against the actual cld3 wasm binary, using cld's test cases.
  • test:cld3-asm: Runs unit tests against the cld3-asm interface.
  • lint: Runs lint over the whole codebase.
  • lint:staged: Runs lint only on staged changes. This is executed automatically by the precommit hook.
  • commit: Commit wizard for writing commit messages.

License

cld3-asm's People

Contributors

greenkeeper[bot], greenkeeperio-bot, kwonoj

cld3-asm's Issues

Maybe replace binaryendpoint to fn

Do not use binaryendpoint; instead, allow overriding locatefile as needed for the wasm binary. This would be more bundler-config agnostic (e.g. file-loader hashes).

is there any option to remove detectable language?

Is there any option to remove languages from the detectable set?
I simply tested 'test', and it gave the result below.
{language: "de", probability: 0.6367550492286682, is_reliable: false, proportion: 1}
If I could narrow the detectable languages, it might give a better result.

An in-range update of prettier is breaking the build 🚨

The devDependency prettier was updated from 1.14.3 to 1.15.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

prettier is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/appveyor/branch: AppVeyor build failed (Details).
  • continuous-integration/travis-ci/push: The Travis CI build passed (Details).

Release Notes for Prettier 1.15: HTML, Vue, Angular and MDX Support

🔗 Release Notes

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

An in-range update of @types/node is breaking the build 🚨

The devDependency @types/node was updated from 10.11.0 to 10.11.1.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

@types/node is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/appveyor/branch: AppVeyor build failed (Details).
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

graceful loading failure

Check the behavior when fetch fails in the Electron renderer, or when the asm.js fallback fails to load: the footprint is too verbose.

Inaccurate results

Hi
I was randomly testing some text messages trying to detect the language using your assembly and got these results:

Like super duper sketchy
Language is: da
Probability: 0.9992335438728333

living in music loving art
Language is: no
Probability: 0.996842622756958

AMERICAN DIABETES ASSOCIATION ALERT DAY
Language is: hu
Probability: 0.26049503684043884

great late brunch in Lox five ways
Language is: fy
Probability: 0.878024160861969

Actually, all of these are detected by cld3 as English, and I don't know why it reported incorrect results.

An in-range update of conventional-changelog-cli is breaking the build 🚨

The devDependency conventional-changelog-cli was updated from 2.0.5 to 2.0.7.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

conventional-changelog-cli is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/appveyor/branch: Waiting for AppVeyor build to complete (Details).
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).

An in-range update of lint-staged is breaking the build 🚨

The devDependency lint-staged was updated from 8.0.2 to 8.0.3.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

lint-staged is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/appveyor/branch: AppVeyor build failed (Details).
  • continuous-integration/travis-ci/push: The Travis CI build passed (Details).

Release Notes for v8.0.3

8.0.3 (2018-10-30)

Bug Fixes

Commits

The new version differs by 1 commits.

  • 225a904 fix: Allow to use lint-staged on CI (#523)

See the full diff

Unhandled rejection shut down my service

Hi,
First of all, many thanks for this amazing repo.

TL;DR

When using cld3-asm in a Node service, any unhandled rejection shuts my service down without emitting a beforeExit, exit, SIGINT, or SIGTERM event that would let me handle those cases properly.

How to reproduce?

Open terminal:

mkdir cld3-asm-issue
cd cld3-asm-issue
nvm use 12
npm init --y
npm i express cld3-asm
touch index.js

// index.js
const cld3 = require('cld3-asm');
const app = require('express')();

process.on(
    'SIGTERM',
    () => process.stdout.write('SIGTERM\n')
);

process.on(
    'beforeExit',
    () => process.stdout.write('beforeExit\n')
);

process.on(
    'exit',
    (status) => process.stdout.write(`exit: ${status}\n`)
);

cld3.loadModule()
    .then(() => {
        app.listen(
            5432,
            () => console.log('listening...')
        );
    })
    .then(() => {
        setTimeout(() => Promise.reject(), 2000);
    })

Launch the service:

node index.js

Output:

listening...

/cld3-asm-issue/node_modules/cld3-asm/dist/cjs/lib/node/cld3.js:8
var Module=typeof Module!=="undefined"?Module:{};var moduleOverrides={};var key;for(key in Module){if(Module.hasOwnProperty(key)){moduleOverrides[key]=Module[key]}}Module["arguments"]=[];Module["thisProgram"]="./this.program";Module["quit"]=function(status,toThrow){throw toThrow};Module["preRun"]=[];Module["postRun"]=[];var ENVIRONMENT_IS_WEB=false;var ENVIRONMENT_IS_WORKER=false;var ENVIRONMENT_IS_NODE=true;if(Module["ENVIRONMENT"]){throw new Error("Module.ENVIRONMENT has been deprecated. To force the environment, use the ENVIRONMENT compile-time option (for example, -s ENVIRONMENT=web or -s ENVIRONMENT=node)")}var scriptDirectory="";function locateFile(path){if(Module["locateFile"]){return Module["locateFile"](path,scriptDirectory)}else{return scriptDirectory+path}}if(ENVIRONMENT_IS_NODE){if(!(typeof process==="object"&&typeof require==="function"))throw new Error("not compiled for this environment (did you b
abort() at Error
    at jsStackTrace (/cld3-asm-issue/node_modules/cld3-asm/dist/cjs/lib/node/cld3.js:8:11112)
    at stackTrace (/cld3-asm-issue/node_modules/cld3-asm/dist/cjs/lib/node/cld3.js:8:11283)
    at process.abort (/cld3-asm-issue/node_modules/cld3-asm/dist/cjs/lib/node/cld3.js:8:1058443)
    at process.emit (events.js:215:7)
    at processPromiseRejections (internal/process/promises.js:201:33)
    at processTicksAndRejections (internal/process/task_queues.js:94:32)

And my service died without telling anyone.

Is there a way to change this behavior? Maybe I missed some config. Is it mandatory for you to re-throw the error you catch via the unhandledRejection event?
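
One possible workaround, sketched here as an assumption rather than a confirmed fix: the stack trace suggests the abort is triggered by a listener that the generated cld3.js registers for unhandledRejection, so the listeners present before loadModule() can be recorded and any added afterwards removed. ListenerHost, snapshotListeners, and removeListenersAddedSince are hypothetical names. The interface covers only the two EventEmitter methods the sketch needs, which Node's process object provides.

```typescript
// Minimal subset of Node's EventEmitter API used by this sketch.
interface ListenerHost {
  listeners(event: string): Function[];
  removeListener(event: string, listener: (...args: any[]) => void): unknown;
}

// Record the listeners currently registered for an event.
function snapshotListeners(host: ListenerHost, event: string): Function[] {
  return host.listeners(event).slice();
}

// Remove every listener for the event that was not in the snapshot,
// returning how many were removed.
function removeListenersAddedSince(
  host: ListenerHost,
  event: string,
  before: Function[]
): number {
  let removed = 0;
  // Iterate over a copy so removal cannot disturb the loop.
  for (const listener of host.listeners(event).slice()) {
    if (before.indexOf(listener) === -1) {
      host.removeListener(event, listener as (...args: any[]) => void);
      removed++;
    }
  }
  return removed;
}
```

In a Node service the host would be process and the event 'unhandledRejection': take the snapshot before calling loadModule() and call removeListenersAddedSince once it resolves. Whether the module keeps working correctly without that listener is untested here, and the service should register its own unhandledRejection handler afterwards.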

findMostFrequentLanguages gives incorrect results

I tried your lib, and "findLanguage" seems to work fine, but when combining languages and then using "findMostFrequentLanguages", it finds only one language in a couple of cases.

const test = require("tape");
const { loadModule } = require("cld3-asm");

test("findMostFrequentLanguages", async t => {
  t.plan(7);
  const cldFactory = await loadModule();
  const identifier = cldFactory.create(0, 100);

  const textEN = "This piece of text is in English.";
  const textBG = "Този текст е на Български.";
  const textFI = "Tämä teksti on suomea.";
  const textSV = "Den här texten är på Svenska.";

  const testEN = identifier.findLanguage(textEN);
  t.equal(testEN.language, "en"); // ok

  const testBG = identifier.findLanguage(textBG);
  t.equal(testBG.language, "bg"); // ok

  const testFI = identifier.findLanguage(textFI);
  t.equal(testFI.language, "fi"); // ok

  const testSV = identifier.findLanguage(textSV);
  t.equal(testSV.language, "sv"); // ok

  const testEN_BG = identifier.findMostFrequentLanguages(
    `${textEN} ${textBG}`,
    3
  );
  t.deepEqual(testEN_BG.map(lang => lang.language), ["bg", "en"]); // ok

  const testEN_FI = identifier.findMostFrequentLanguages(
    `${textEN} ${textFI}`,
    3
  );
  t.deepEqual(testEN_FI.map(lang => lang.language), ["fi", "en"]); // not ok, just ["fi"]

  const testEN_SV = identifier.findMostFrequentLanguages(
    `${textEN} ${textSV}`,
    3
  );
  t.deepEqual(testEN_SV.map(lang => lang.language), ["sv", "en"]); // not ok, just ["sv"]
});

environment detection error on webworker

I'm trying to run this module as a Cloudflare Worker. However, I get "Error: environment detection error". I think this is because Cloudflare Workers don't implement "importScripts".

I'm quite new to workers and WebAssembly, so I'm not sure whether this is just a configuration problem on my end. However, from what I read in the webpack-produced code, there was a check: "Module.ENVIRONMENT has been deprecated. To force the environment, use the ENVIRONMENT compile-time option (for example, -s ENVIRONMENT=web or -s ENVIRONMENT=node)". I also found a commit related to environment detection.

I created an example repo for reproducing the issue.

Just run

  • npm install
  • npm run dev-build
  • npm run dev-server (in another terminal)
  • with postman or curl query localhost:3000
  • produces 500 error and in the dev-build terminal shows "Error: environment detection error"

todo

export LanguageIdentifier

Prepare 1.0.0 release with breaking changes

Two breaking changes are planned:

  • deprecate non-native wasm target: dropping asm.js binaries.
  • introduce SINGLE_FILE from emscripten, remove wasm binary lookup interfaces.

Both will allow simplifying the interfaces in general, as well as reducing module size.

This is blocked until upstream emscripten releases a new version that contains the SINGLE_FILE option.
