Dictionary Tools

This repository contains tools for compiling and deploying dictionaries for LanguageTool.

Maintainer

The owner, maintainer, and main developer of this repository is @p-goulart. Any shell and Perl components may be better explained by @jaumeortola, though.

Setup

Python dependencies

This is set up as a Poetry project, so you must have Poetry installed and ready to go.

Make sure you are using a virtual environment and then:

poetry install --with test,dev
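If you want to confirm that the active interpreter is inside a virtual environment before installing, the standard-library convention is that `sys.prefix` differs from `sys.base_prefix` inside a `venv`/`virtualenv` environment, which should also hold for Poetry-managed environments. The helper below is a hypothetical sketch, not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Hypothetical helper: True when running inside a venv/virtualenv.

    Inside a virtual environment, sys.prefix points at the environment,
    while sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"OK: using virtual environment at {sys.prefix}")
    else:
        print("Warning: not inside a virtual environment")
```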

System dependencies

In addition to the Python dependencies, you will also need to have Hunspell binaries installed on your system.

The most important one is unmunch. Check if it's installed:

which unmunch
# should return a path to a bin directory, like
# /opt/homebrew/bin/unmunch
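To run the same check from Python (for example, at the start of a build script), `shutil.which` performs essentially the same `PATH` lookup as the shell `which` command. This is a minimal sketch; the helper names `find_unmunch` and `require_unmunch` are hypothetical and not part of this repository.

```python
import shutil
from typing import Optional


def find_unmunch(binary: str = "unmunch") -> Optional[str]:
    """Hypothetical helper: full path to `binary` if it is on PATH, else None."""
    return shutil.which(binary)


def require_unmunch() -> str:
    """Return the unmunch path, or raise a readable error if it is missing."""
    path = find_unmunch()
    if path is None:
        raise FileNotFoundError(
            "unmunch not found on PATH; compile and install Hunspell first"
        )
    return path
```

Calling `require_unmunch()` early fails fast with a clear message instead of a less obvious error later when the binary is invoked.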

If it's not installed, you may need to compile Hunspell from source. Clone the Hunspell repository and then, from inside it, run the following steps, which should work on Ubuntu:

# install a bunch of dependencies needed for compilation
sudo apt-get install autoconf automake autopoint libtool
autoreconf -vfi
./configure
make
sudo make install
sudo ldconfig

LT dependencies

The scripts here also depend on the LanguageTool Java codebase (for word tokenisation).

Make sure you have LT cloned locally, and export the following environment variable in your shell configuration:

export LT_HOME=/path/to/languagetool

If LT_HOME is not set, the code in this project defaults to ../languagetool, i.e. a languagetool directory one level up from wherever this repository is cloned (next to this repository's checkout).
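The environment-variable fallback can be sketched as follows. This is an illustration only, not the project's actual code: the helper name `resolve_lt_home` and its `repo_root` parameter are hypothetical, and the sketch assumes the default is resolved relative to the directory where this repository is cloned.

```python
import os
from pathlib import Path


def resolve_lt_home(repo_root: Path) -> Path:
    """Hypothetical helper: return the LanguageTool checkout to use.

    Uses LT_HOME when it is set and non-empty; otherwise falls back to a
    `languagetool` directory next to repo_root, i.e. ../languagetool.
    """
    env_value = os.environ.get("LT_HOME")
    if env_value:
        return Path(env_value)
    # resolve() makes relative roots such as Path(".") have a real parent
    return repo_root.resolve().parent / "languagetool"
```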

Usage

This repository is meant to be used as a submodule of language-specific repositories, such as the Portuguese repository.

โš ๏ธ Note that the name of this repository is in kebab-case, but Python modules should be imported in snake_case. Therefore, when importing this as a submodule, make sure to set the path to dict_tools, which uses the underscore. If you don't do this, you may fail to import it as a module.

build_tagger_dicts.py

This script compiles source files into a binary dictionary used by the LT POS tagger, word tokeniser, and synthesiser.

You can check the usage parameters by invoking it with --help:

poetry run python scripts/build_tagger_dicts.py --help

build_spelling_dicts.py

This script takes the Hunspell files and helper files as input and produces binary files for the Morfologik speller.

You can check the usage parameters by invoking it with --help:

poetry run python scripts/build_spelling_dicts.py --help
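If you need to call either build script from other Python code rather than from the shell, a small wrapper around `subprocess` is one option. This is a hypothetical sketch: the helper names are illustrative, it assumes it runs from the repository root inside the Poetry environment (so `sys.executable` is the same interpreter `poetry run python` would use), and it only passes `--help`, since the scripts' other parameters are documented by that output rather than here. It also assumes the scripts exit with status 0 after printing help, as `argparse` does.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# The two build scripts described in this README.
BUILD_SCRIPTS = ("build_tagger_dicts.py", "build_spelling_dicts.py")


def build_command(script: str, *args: str) -> List[str]:
    """Hypothetical helper: command line for running scripts/<script> with args."""
    if script not in BUILD_SCRIPTS:
        raise ValueError(f"unknown build script: {script}")
    return [sys.executable, str(Path("scripts") / script), *args]


def show_help(script: str) -> str:
    """Run the script with --help and return its usage text."""
    result = subprocess.run(
        build_command(script, "--help"),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `print(show_help("build_tagger_dicts.py"))` prints the same usage text as the shell command above.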

Contributors

@danielnaber, @p-goulart