php-spellchecker's Introduction

PHP-Spellchecker

Check misspellings from any text source with the most popular PHP spellchecker.


About

PHP-Spellchecker is a spellchecker abstraction library for PHP. It provides a unified interface for many different spellcheckers, so you can swap one for another without extensive rewrites.

Using PHP-Spellchecker can eliminate vendor lock-in, reduce technical debt, and improve the testability of your code.
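
As a rough illustration of the testability point, the sketch below codes against a small interface and swaps in a fake during tests. SpellcheckerLike, FakeSpellchecker and countMisspellings() are hypothetical names invented for this example; they are not part of PHP-Spellchecker's API.

```php
<?php
// Hypothetical stand-in for a spellchecker abstraction; not the library's real interface.
interface SpellcheckerLike
{
    /** @return iterable<string> misspelled words found in $text */
    public function check(string $text): iterable;
}

// A fake implementation for tests: flags any word present in a fixed list.
final class FakeSpellchecker implements SpellcheckerLike
{
    /** @var string[] */
    private array $knownMisspellings;

    public function __construct(array $knownMisspellings)
    {
        $this->knownMisspellings = $knownMisspellings;
    }

    public function check(string $text): iterable
    {
        foreach (preg_split('/\s+/', trim($text)) as $word) {
            if (in_array($word, $this->knownMisspellings, true)) {
                yield $word;
            }
        }
    }
}

// Application code depends only on the interface, so the backend can be swapped.
function countMisspellings(SpellcheckerLike $checker, string $text): int
{
    $count = 0;
    foreach ($checker->check($text) as $word) {
        $count++;
    }

    return $count;
}

echo countMisspellings(new FakeSpellchecker(['mispell']), 'a mispell here'), PHP_EOL; // 1
```

Because the calling code only knows the interface, a test can run without any spellchecker binary installed.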

Features

PHP-Spellchecker is a welcoming project for new contributors.

Want to make your first open source contribution? Check the roadmap, pick one task, open an issue and we'll help you go through it 🤓🚀

Install

Via Composer

$ composer require tigitz/php-spellchecker

Usage

Check out the documentation and examples

Using the spellchecker directly

You can check misspellings directly from a PhpSpellcheck\Spellchecker class and process them on your own.

<?php
use PhpSpellcheck\Spellchecker\Aspell;
use PhpSpellcheck\Utils\CommandLine;

// if aspell is installed on your local machine with its default setup
$aspell = Aspell::create();

// or if you want to use binaries from Docker
$aspell = new Aspell(new CommandLine(['docker', 'run', '--rm', '-i', 'starefossen/aspell']));

// check() takes the text, the languages to check against, and a free-form context array
$misspellings = $aspell->check('mispell', ['en_US'], ['from_example']);
foreach ($misspellings as $misspelling) {
    $misspelling->getWord(); // 'mispell'
    $misspelling->getLineNumber(); // 1
    $misspelling->getOffset(); // 0
    $misspelling->getSuggestions(); // ['misspell', ...]
    $misspelling->getContext(); // ['from_example']
}

Using the MisspellingFinder orchestrator

You can also use an opinionated MisspellingFinder class to orchestrate your spellchecking flow:

Following the well-known Unix philosophy:

Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

<?php
// My custom text processor that replaces "_" by " "
$customTextProcessor = new class implements TextProcessorInterface
{
    public function process(TextInterface $text): TextInterface
    {
        $contentProcessed = str_replace('_', ' ', $text->getContent());

        return $text->replaceContent($contentProcessed);
    }
};

$misspellingFinder = new MisspellingFinder(
    Aspell::create(), // Creates aspell spellchecker pointing to "aspell" as its binary path
    new EchoHandler(), // Handles all the misspellings found by echoing their information
    $customTextProcessor
);

// using a string
$misspellingFinder->find('It\'s_a_mispelling', ['en_US']);
// word: mispelling | line: 1 | offset: 7 | suggestions: mi spelling,mi-spelling,misspelling | context: []

// using a TextSource
$inMemoryTextProvider = new class implements SourceInterface
{
    public function toTexts(array $context): iterable
    {
        yield new Text('my_mispell', ['from_source_interface']);
        // t() is a shortcut for new Text()
        yield t('my_other_mispell', ['from_named_constructor']);
    }
};

$misspellingFinder->find($inMemoryTextProvider, ['en_US']);
// word: mispell | line: 1 | offset: 3 | suggestions: mi spell,mi-spell,misspell,... | context: ["from_source_interface"]
// word: mispell | line: 1 | offset: 9 | suggestions: mi spell,mi-spell,misspell,... | context: ["from_named_constructor"]

Roadmap

The project is still in its initial phase, requiring more real-life usage to stabilize its final 1.0.0 API.

Global

  • Add a CLI that could do something like vendor/bin/php-spellchecker "misspell" LanguageTools EchoHandler --lang=en_US
  • Add an asynchronous mechanism to spellcheckers.
  • Make some computed misspelling properties optional to improve performance for certain use cases (e.g., lines and offset in LanguageTools).
  • Add a language mapper to manage different representations across spellcheckers.
  • Evaluate strtok instead of explode to parse lines of text, for performance.
  • Evaluate MutableMisspelling for performance comparison.
  • Wrap Webmozart/Assert library exceptions to throw PHP-Spellchecker custom exceptions instead.
  • Improve the Makefile.
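
For the strtok versus explode item above, a minimal comparison in plain PHP may help frame the evaluation. It is only a sketch of the two standard functions, not the library's parsing code, and the helper names are made up for the example. One behavioural difference matters here: strtok() skips empty tokens, so blank lines disappear and line numbers can drift.

```php
<?php
// Collect lines with explode(): empty lines are kept,
// so array index + 1 matches the line number in the source text.
function linesWithExplode(string $text): array
{
    return explode("\n", $text);
}

// Collect lines with strtok(): tokens are produced one at a time,
// but consecutive delimiters are collapsed, so empty lines are skipped.
function linesWithStrtok(string $text): array
{
    $lines = [];
    $line = strtok($text, "\n");
    while ($line !== false) {
        $lines[] = $line;
        $line = strtok("\n");
    }

    return $lines;
}

$text = "first\n\nthird";
var_dump(count(linesWithExplode($text))); // int(3)
var_dump(count(linesWithStrtok($text)));  // int(2)
```

Any switch to strtok() would therefore need another way to keep track of skipped blank lines when reporting a misspelling's line number.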

Sources

  • Make a SourceInterface class that's able to have an effect on the used spellchecker configuration.
  • League/Flysystem source.
  • Symfony/Finder source.

Text processors

  • Markdown - Find a way to keep the original offset and line of words after stripping.
  • Add PHPDoc processor.
  • Add HTML Processor (inspiration).
  • Add XLIFF Processor (inspiration).
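
One possible approach to the offset problem mentioned in the Markdown item is to blank out markup with spaces of the same length instead of deleting it. The sketch below does this for inline code spans only; maskInlineCode() is a hypothetical helper, not something the project ships, and it works on byte offsets.

```php
<?php
// Replace inline code spans with spaces of the same byte length,
// so every remaining word keeps its original byte offset and line.
function maskInlineCode(string $markdown): string
{
    return preg_replace_callback(
        '/`[^`\n]*`/',
        static function (array $match): string {
            return str_repeat(' ', strlen($match[0]));
        },
        $markdown
    ) ?? $markdown; // fall back to the input if the regex engine fails
}

$source = 'Run `composer instal` then chek the output';
$masked = maskInlineCode($source);

var_dump(strlen($masked) === strlen($source));                 // bool(true)
var_dump(strpos($masked, 'chek') === strpos($source, 'chek')); // bool(true)
```

The same idea could extend to other markup, as long as each replacement preserves length and newlines.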

Spell checkers

Handlers

  • MonologHandler
  • ChainedHandler
  • HTMLReportHandler
  • XmlReportHandler
  • JSONReportHandler
  • ConsoleTableHandler
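
To make the ChainedHandler idea more concrete, here is a minimal sketch under stated assumptions: HandlerLike is a local stand-in for a handler contract, and ChainedHandlerSketch and CollectingHandler are hypothetical classes written for this example. The point it shows is that misspellings may arrive as a generator, which can only be iterated once, so a chain has to materialize them before fanning out.

```php
<?php
// Hypothetical stand-in for a misspelling handler contract (not the library's interface).
interface HandlerLike
{
    public function handle(iterable $misspellings): void;
}

// Forwards the same misspellings to several handlers, in order.
final class ChainedHandlerSketch implements HandlerLike
{
    /** @var HandlerLike[] */
    private array $handlers;

    public function __construct(HandlerLike ...$handlers)
    {
        $this->handlers = $handlers;
    }

    public function handle(iterable $misspellings): void
    {
        // A generator can only be consumed once, so turn it into an array first.
        $list = is_array($misspellings) ? $misspellings : iterator_to_array($misspellings, false);

        foreach ($this->handlers as $handler) {
            $handler->handle($list);
        }
    }
}

// Minimal handler that records what it received, used here to show the chaining.
final class CollectingHandler implements HandlerLike
{
    public array $seen = [];

    public function handle(iterable $misspellings): void
    {
        foreach ($misspellings as $misspelling) {
            $this->seen[] = $misspelling;
        }
    }
}

$first = new CollectingHandler();
$second = new CollectingHandler();
$words = (static function () {
    yield 'mispell';
    yield 'chek';
})();

(new ChainedHandlerSketch($first, $second))->handle($words);
var_dump($first->seen === $second->seen); // bool(true)
```

Materializing trades memory for simplicity; a streaming variant would need each handler to accept items one at a time.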

Tests

  • Add or improve tests with different text encoding.
  • Refactor duplicate Dockerfile content between PHP images.

Versioning

We follow SemVer v2.0.0.

There are still many design decisions that should be tested against real-world usage before thinking about a stable v1.0.0 release:

  • Are TextInterface and MisspellingInterface really useful?
  • Is using generators the right way to go?
  • Should all the contributed spellcheckers be maintained by the package itself?
  • How to design an intuitive CLI given the needed flexibility of usage?
  • Is the "context" array passed through all the layers the right design to handle data sharing?

Testing

Spell checkers come in many different forms, from HTTP APIs to command line tools. PHP-Spellchecker wants to ensure real-world usage is OK, so it contains integration tests. To run these, all spellcheckers need to be available during test execution.

The most convenient way to do this is to use Docker, which avoids polluting your local machine.

Docker

Requires docker and docker-compose to be installed (tested on Linux).

$ make build # build container images
$ make setup # start spellcheckers container
$ make tests-dox

You can also specify the PHP version, the dependency version target, and whether you want coverage.

$ PHP_VERSION=8.2 DEPS=LOWEST WITH_COVERAGE="true" make tests-dox

Run make help to list all available tasks.

Environment variables

If the spellcheckers' execution paths differ from their default values (e.g., docker exec -ti myispell instead of ispell), you can override the paths used in tests by redefining environment variables in the PHPUnit config file.
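
The sketch below shows one way an environment variable could be turned into a command, with a fallback to the default binary. The variable name EXAMPLE_ISPELL_BINARY_PATH and the helper commandFromEnv() are hypothetical; the names the test suite actually reads are the ones defined in the PHPUnit config file.

```php
<?php
// Read a command from an environment variable, falling back to a default binary.
// The value is split on whitespace, so "docker exec -ti myispell" becomes four arguments.
// Note: this naive split does not handle quoted arguments containing spaces.
function commandFromEnv(string $variable, string $default): array
{
    $value = getenv($variable);
    if ($value === false || trim($value) === '') {
        return [$default];
    }

    return preg_split('/\s+/', trim($value));
}

putenv('EXAMPLE_ISPELL_BINARY_PATH=docker exec -ti myispell'); // hypothetical variable name
print_r(commandFromEnv('EXAMPLE_ISPELL_BINARY_PATH', 'ispell'));
print_r(commandFromEnv('EXAMPLE_UNSET_VARIABLE', 'ispell'));
```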

Contributing

Please see CONTRIBUTING.

Credits

License

The MIT License (MIT). Please see license file for more information.

Logo: Elements taken for the final rendering are Designed by rawpixel.com / Freepik.

php-spellchecker's People

Contributors

calumchamberlain, dali-rajab, dependabot-preview[bot], jacksleight, krsriq, renovate-bot, renovate[bot], sarahdayan, sgigou, spekulatius, szepeviktor, tigitz

php-spellchecker's Issues

Process with command "'hunspell' '-a' '-d' 'en_US'" has failed running with exit code 1(General error)

Upon trying the first example in the docs, "Using the spellchecker directly", I get the following error:
Process with command "'hunspell' '-a' '-d' 'en_US'" has failed running with exit code 1(General error)

$hunspell = Hunspell::create();

// en_US hunspell dictionary is available
$misspellings = $hunspell->check('mispell', ['en_US'], ['from_example']);
foreach ($misspellings as $misspelling) {
    $misspelling->getWord(); // 'mispell'
    $misspelling->getLineNumber(); // 1
    $misspelling->getOffset(); // 0
    $misspelling->getSuggestions(); // ['misspell', ...]
    $misspelling->getContext(); // ['from_example']
}

When checking the installed dictionaries from Laravel I get the following error, which is strange because it works from the command line (see the last code block for hunspell -D):
Process with command "'hunspell' '-D'" has failed running with exit code 1(General error)

$hunspell = Hunspell::create();

$hunspell->getSupportedLanguages();

I tried first with Aspell and got the same error.

Do you have any pointers as to where the issue might be coming from?

Linux Mint 20
PHP 7.4
Laravel 6.20.16
I found that both Aspell and Hunspell were already installed on Linux mint.

$ aspell -v
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.8)
$ hunspell -v
@(#) International Ispell Version 3.2.06 (but really Hunspell 1.7.0)
$ hunspell -D
SEARCH PATH:
.::/usr/share/hunspell:/usr/share/myspell:/usr/share/myspell/dicts:/Library/Spelling:/home/USERNAME/.openoffice.org/3/user/wordbook:/home/USERNAME/.openoffice.org2/user/wordbook:/home/USERNAME/.openoffice.org2.0/user/wordbook:/home/USERNAME/Library/Spelling:/opt/openoffice.org/basis3.0/share/dict/ooo:/usr/lib/openoffice.org/basis3.0/share/dict/ooo:/opt/openoffice.org2.4/share/dict/ooo:/usr/lib/openoffice.org2.4/share/dict/ooo:/opt/openoffice.org2.3/share/dict/ooo:/usr/lib/openoffice.org2.3/share/dict/ooo:/opt/openoffice.org2.2/share/dict/ooo:/usr/lib/openoffice.org2.2/share/dict/ooo:/opt/openoffice.org2.1/share/dict/ooo:/usr/lib/openoffice.org2.1/share/dict/ooo:/opt/openoffice.org2.0/share/dict/ooo:/usr/lib/openoffice.org2.0/share/dict/ooo
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/hunspell/en_CA
/usr/share/hunspell/en_ZA
/usr/share/hunspell/es_US
/usr/share/hunspell/es_SV
/usr/share/hunspell/en_AU
/usr/share/hunspell/es_VE
/usr/share/hunspell/es_NI
/usr/share/hunspell/pt_PT
/usr/share/hunspell/pt_BR
/usr/share/hunspell/es_BO
/usr/share/hunspell/es_PA
/usr/share/hunspell/es_PY
/usr/share/hunspell/es_HN
/usr/share/hunspell/en_GB
/usr/share/hunspell/es_CU
/usr/share/hunspell/de_DE
/usr/share/hunspell/de_CH_frami
/usr/share/hunspell/es_CO
/usr/share/hunspell/fr_CH
/usr/share/hunspell/fr
/usr/share/hunspell/de_DE_frami
/usr/share/hunspell/de_AT_frami
/usr/share/hunspell/es_PR
/usr/share/hunspell/es_AR
/usr/share/hunspell/de_AT
/usr/share/hunspell/es_UY
/usr/share/hunspell/en_US
/usr/share/hunspell/fr_FR
/usr/share/hunspell/fr_CA
/usr/share/hunspell/it_CH
/usr/share/hunspell/de_CH
/usr/share/hunspell/ru_RU
/usr/share/hunspell/es_ES
/usr/share/hunspell/es_DO
/usr/share/hunspell/fr_LU
/usr/share/hunspell/es_GT
/usr/share/hunspell/es_CL
/usr/share/hunspell/es_MX
/usr/share/hunspell/fr_BE
/usr/share/hunspell/fr_MC
/usr/share/hunspell/es_CR
/usr/share/hunspell/it_IT
/usr/share/hunspell/es_EC
/usr/share/hunspell/es_PE
/usr/share/myspell/dicts/fr_CH
/usr/share/myspell/dicts/hyph_ru_RU
/usr/share/myspell/dicts/fr_FR
/usr/share/myspell/dicts/fr_CA
/usr/share/myspell/dicts/fr_LU
/usr/share/myspell/dicts/fr_BE
/usr/share/myspell/dicts/fr_MC

Dependency Dashboard

This issue provides visibility into Renovate updates and their statuses.

Open

These updates have all been created already.

t() function in src/Text/functions.php prevents use with Drupal

Drupal has a default built-in function t(). As soon as I composer require tigitz/php-spellchecker, the site stops working because we have two definitions of the function t().

Detailed description

It looks like this is just a shortcut for new Text() and is used only a few times, and only in examples and tests.

Context

Why is this change important to you? How would you use it?
I would like to use this library in a Drupal project.

How can it benefit other users?
Others could too...

Possible implementation

Drop it - its only purpose is to save 7 key strokes. Or namespace it.
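
The namespacing option can be sketched as follows. Example\Spellcheck and its Text class are hypothetical names used only for this illustration, and the global t() here is a stand-in for a framework helper such as Drupal's. The sketch shows that a namespaced function with the same short name does not collide with the global one.

```php
<?php
namespace {
    // Stand-in for a framework helper that already owns the global name t().
    function t(string $message): string
    {
        return 'translated: ' . $message;
    }
}

namespace Example\Spellcheck {
    // Hypothetical namespace and class, used only for this sketch.
    final class Text
    {
        public string $content;
        public array $context;

        public function __construct(string $content, array $context = [])
        {
            $this->content = $content;
            $this->context = $context;
        }
    }

    // Same short name, but it lives in a namespace, so it cannot collide with \t().
    function t(string $content, array $context = []): Text
    {
        return new Text($content, $context);
    }
}

namespace {
    use function Example\Spellcheck\t as text;

    echo t('hello'), PHP_EOL;                  // global helper
    echo text('my_mispell')->content, PHP_EOL; // namespaced shortcut
}
```

Callers who want the shortcut import it explicitly with use function, while projects that define their own global t() are unaffected.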

Hunspell class doesn't work in web

Detailed description

It does not work outside of the CLI.
From the CLI, both hunspell someword and a Laravel artisan command using Hunspell::create() work fine, but in a controller (HTTP layer) it always returns an empty array. Why?

Your environment

Ubuntu 20.04
PHP 8.0.6
Laravel 8.50

Code sample

use PhpSpellcheck\Spellchecker\Hunspell;

$spell = Hunspell::create();

$misspellings = $spell->check($this->argument('phrase'), ['ru'], ['from_example']);
$arr = [];
foreach ($misspellings as $misspelling) {
    $arr[$misspelling->getWord()] = $misspelling->getSuggestions();
}

How to fix that? Thanks!

Is it supporting RTL languages or abjad characters?

Hi, I just started testing this library for the Kurdish language, but I found that it does not support RTL languages. I also tested PHP pspell on its own, without this library, and it did not work either. I know there are no dictionary files for Kurdish, so I tried custom dictionaries, but none of them seem to accept abjad characters in the first place. Is there any explanation for that?
Thanks

Utf8 issue not fixed in master

I am having issues with Aspell incorrectly reading UTF-8 strings.

You fixed this with this commit

Unfortunately, composer is not pulling this commit into the main package.

Is there any way I can pull this working fix into my project?

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: undefined. Note: this is a nested preset so please contact the preset author if you are unable to fix it yourself.

Possible error in hunspell.php

Hello, I think there is an error in hunspell.php at line 68: we need to read the error output, but we get an exception instead. I've modified the code to:

public function getSupportedLanguages(): iterable
    {
        $languages = [];
        $cmd = $this->binaryPath->addArg('-D');
        $process = new Process($cmd->getArgs());
        $process->run();
        $output = explode(PHP_EOL, $process->getErrorOutput());

The error occurs when using the multi spellchecker with Hunspell and LanguageTools.
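
For context, the snippet above reads the hunspell -D listing from the process's error output and then has to turn it into language names. A minimal parsing sketch, based only on the hunspell -D output format quoted earlier on this page, could look like this. parseHunspellDictionaries() is a hypothetical helper, not the library's implementation, and it assumes Unix-style dictionary paths.

```php
<?php
// Extract dictionary names from `hunspell -D` output.
function parseHunspellDictionaries(string $output): array
{
    $languages = [];
    $inList = false;

    foreach (preg_split('/\R/', $output) as $line) {
        $line = trim($line);
        if (strpos($line, 'AVAILABLE DICTIONARIES') === 0) {
            $inList = true; // dictionary paths follow this header
            continue;
        }
        if (!$inList || $line === '' || $line[0] !== '/') {
            continue;
        }
        $name = basename($line);
        if (strpos($name, 'hyph_') === 0) {
            continue; // hyphenation files are not spelling dictionaries
        }
        $languages[$name] = true; // array keys deduplicate repeated names
    }

    $names = array_keys($languages);
    sort($names);

    return $names;
}

$sample = <<<TXT
SEARCH PATH:
.::/usr/share/hunspell:/usr/share/myspell
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/hunspell/en_US
/usr/share/hunspell/fr_FR
/usr/share/myspell/dicts/hyph_ru_RU
/usr/share/myspell/dicts/fr_FR
TXT;

print_r(parseHunspellDictionaries($sample));
```

Splitting on \R rather than PHP_EOL keeps the parser independent of the line endings used by the platform that produced the output.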

file_get_contents error with Portuguese and German

Hello, I'm getting an error when I try to process text in Portuguese and German.

ErrorException
file_get_contents(http://localhost:8011/v2/check): Failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error

Here is my code; it works with Spanish, English, etc.

if ($from === 'pt') {
    $from = 'pt-PT';
}
if ($from === 'de') {
    $from = 'de-DE';
}
// LanguageTools expects language formatted with a dash `en-US`
$misspellings = $spellchecker->check($text, [$from], ['from_example']);
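
The manual remapping above is the kind of work a language mapper would centralize. The sketch below is a hypothetical helper, not part of the library: it converts underscore-style codes to dash-style ones and gives bare language codes a default region, using the pt and de defaults from the snippet. It is not presented as a fix for the 500 error itself.

```php
<?php
// Hypothetical mapper: turns a short or underscore-style code into a dash-style one.
function toDashedLanguageCode(
    string $code,
    array $defaults = ['pt' => 'pt-PT', 'de' => 'de-DE']
): string {
    $code = str_replace('_', '-', trim($code));

    // Bare language codes get a default region; anything else is passed through.
    return $defaults[strtolower($code)] ?? $code;
}

echo toDashedLanguageCode('pt'), PHP_EOL;    // pt-PT
echo toDashedLanguageCode('en_US'), PHP_EOL; // en-US
echo toDashedLanguageCode('es'), PHP_EOL;    // es
```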
