Lunr languages

This project provides a collection of language stemmers and stopword lists for the Lunr JavaScript library (which currently only supports English).

The following languages are available:

  • German
  • French
  • Spanish
  • Italian
  • Japanese
  • Dutch
  • Danish
  • Portuguese
  • Finnish
  • Romanian
  • Hungarian
  • Russian
  • Norwegian

How to use

Lunr-languages supports AMD and CommonJS. Check out the examples below:

In a web browser

The following example is for the German language (de).

Add the following JS files to the page:

<script src="lunr.stemmer.support.js"></script>
<script src="lunr.de.js"></script>

Then use the language when initializing lunr:

var idx = lunr(function () {
    // use the language (de)
    this.use(lunr.de);
    // then, the normal lunr index initialization
    this.field('title', { boost: 10 });
    this.field('body');
});

That's it. Just add the documents and you're done. Searches will use the stemmer and stopword list of the language you selected.
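
A minimal sketch of adding documents and searching. It assumes a lunr version in which documents are added after the index is created (idx.add) and idx.search returns an array of { ref, score } objects. The helper name buildGermanIndex and the sample documents are made up for illustration. lunr is passed in as a parameter so the helper does not depend on how the library was loaded.

```javascript
// Hypothetical helper: builds a German index and adds documents to it.
// Assumes lunr.de is already registered and that the index exposes
// idx.add() after creation (an assumption about the lunr version in use).
function buildGermanIndex(lunr, docs) {
    var idx = lunr(function () {
        // use the language (de)
        this.use(lunr.de);
        this.field('title', { boost: 10 });
        this.field('body');
    });
    docs.forEach(function (doc) {
        idx.add(doc); // each document carries an `id` plus the indexed fields
    });
    return idx;
}

// Made-up sample documents
var sampleDocs = [
    { id: 1, title: 'Deutschland', body: 'An Deutschland grenzen neun Nachbarländer.' },
    { id: 2, title: 'Berlin', body: 'Berlin ist die Hauptstadt von Deutschland.' }
];

// Usage, once lunr and the German files are loaded:
// var idx = buildGermanIndex(lunr, sampleDocs);
// var results = idx.search('Hauptstadt');
```

If your lunr version builds immutable indexes, the documents would instead have to be added inside the configuration function.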

In a web browser, with RequireJS

Add require.js to the page:

<script src="lib/require.js"></script>

Then use the language when initializing lunr:

require(['lib/lunr.js', '../lunr.stemmer.support.js', '../lunr.de.js'], function(lunr, stemmerSupport, de) {
    // since the stemmerSupport and de add keys on the lunr object, we'll pass it as reference to them
    // in the end, we will only need lunr.
    stemmerSupport(lunr); // adds lunr.stemmerSupport
    de(lunr); // adds lunr.de key

    // at this point, lunr can be used
    var idx = lunr(function () {
        // use the language (de)
        this.use(lunr.de);
        // then, the normal lunr index initialization
        this.field('title', { boost: 10 });
        this.field('body');
    });
});

With node.js

var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.de.js')(lunr);

var idx = lunr(function () {
    // use the language (de)
    this.use(lunr.de);
    // then, the normal lunr index initialization
    this.field('title', { boost: 10 });
    this.field('body');
});
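
When several languages are needed, the repeated require calls can be wrapped in a small helper. This is only a sketch: registerLanguages is a hypothetical name, and it assumes the files sit next to the calling script and follow the lunr.<locale>.js naming used in the example above. The loader is passed in so that relative paths resolve from the caller's location.

```javascript
// Hypothetical helper: registers the stemmer support and each language
// on the given lunr object. `load` is a require-like function.
function registerLanguages(lunr, codes, load) {
    // the stemmer support has to be registered before the languages
    load('./lunr.stemmer.support.js')(lunr);
    codes.forEach(function (code) {
        load('./lunr.' + code + '.js')(lunr); // adds lunr.<code>
    });
    return lunr;
}

// Usage in node.js:
// var lunr = registerLanguages(require('./lib/lunr.js'), ['de', 'fr'], require);
```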

Indexing multi-language content

If your documents are written in more than one language, you can enable multi-language indexing. This ensures that every word is properly trimmed and stemmed, every stopword is removed, and no words are lost (indexing in just one language would remove words written in any other language).

var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.ru.js')(lunr);
require('./lunr.multi.js')(lunr);

var idx = lunr(function () {
    this.use(lunr.multiLanguage('en', 'ru'));
    // then, the normal lunr index initialization
    // ...
});

You can combine any number of supported languages this way. The corresponding lunr language scripts must be loaded (English is built in).
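
To illustrate how words can get lost with a single-language setup, here is a toy sketch that does not use lunr at all. It assumes the single-language pipeline trims tokens with an ASCII-only \W pattern, which is an assumption for illustration. The function names and character ranges are likewise made up and are not the ones shipped with lunr-languages.

```javascript
// Toy ASCII-only trimmer: strips leading and trailing characters
// that are not in [A-Za-z0-9_].
function asciiTrimmer(token) {
    return token.replace(/^\W+/, '').replace(/\W+$/, '');
}

// Toy trimmer factory that accepts a custom set of word characters,
// so several alphabets can be treated as part of a word.
function makeTrimmer(wordChars) {
    var leading = new RegExp('^[^' + wordChars + ']+');
    var trailing = new RegExp('[^' + wordChars + ']+$');
    return function (token) {
        return token.replace(leading, '').replace(trailing, '');
    };
}

// Latin letters, digits, underscore, plus the basic Cyrillic block
var enRuTrimmer = makeTrimmer('A-Za-z0-9_\\u0400-\\u04FF');

console.log(JSON.stringify(asciiTrimmer('привет!')));  // ""
console.log(JSON.stringify(enRuTrimmer('привет!')));   // "привет"
console.log(JSON.stringify(enRuTrimmer('"hello",')));  // "hello"
```

The ASCII-only trimmer reduces the Russian word to an empty string, while the combined trimmer keeps both the Russian and the English word.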

If you serialize the index and load it in another script, you'll have to initialize the multi-language support in that script, too, like this:

lunr.multiLanguage('en', 'ru');
var idx = lunr.Index.load(serializedIndex);
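
A slightly fuller sketch of the loading side, assuming the index was saved as JSON text (for example with JSON.stringify(idx)) and that the language scripts and lunr.multi.js have already been loaded in this script. The helper name loadMultiLanguageIndex and the file name in the usage comment are hypothetical.

```javascript
// Hypothetical helper: re-initializes multi-language support, then
// restores the index from its JSON text.
function loadMultiLanguageIndex(lunr, languages, json) {
    // same call as lunr.multiLanguage('en', 'ru'), with the codes in an array
    lunr.multiLanguage.apply(lunr, languages);
    return lunr.Index.load(JSON.parse(json));
}

// Usage in node.js (file name is made up):
// var json = require('fs').readFileSync('index.json', 'utf8');
// var idx = loadMultiLanguageIndex(lunr, ['en', 'ru'], json);
```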

Building your own files

The lunr.<locale>.js files are the result of a build process that concatenates a stemmer and a stop word list and adds the functionality needed to make them lunr.js-compatible. Should you decide to make mass modifications (add stopwords, change stemming rules, reorganize the code) and build a new set of files, follow these steps:

  • git clone --recursive https://github.com/MihaiValentin/lunr-languages.git (make sure you use the --recursive flag to also clone the repos needed to build lunr-languages)
  • cd path/to/lunr-languages
  • npm install to install the dependencies needed for building
  • change the build/*.template files
  • run node build/build.js to generate the lunr.<locale>.js files (and the minified versions as well) and the lunr.stemmer.support.js file

Technical details & Credits

I created this project by compiling and wrapping stemmers together with stop words from various sources so they can be used directly with Lunr.
