tbroadley / spellchecker-cli



A command-line tool for spellchecking files.

License: MIT License

JavaScript 4.23% TypeScript 95.62% Shell 0.15%

spellchecker-cli's Introduction

Spellchecker CLI

A command-line tool for spellchecking files, built on top of retext and remark.

Use Case

You can help contributors to your open-source software project catch spelling mistakes in documentation by running Spellchecker CLI as a pre-commit or pre-push Git hook or as part of your continuous integration process.

Features

Run on any plain text file, with special handling for Markdown syntax
Check for spelling mistakes, repeated words, and/or correct usage of "a" and "an"
Check spelling using an American, British, Canadian, Australian, or South African English dictionary
Specify a custom dictionary of project-specific terms to be combined with the dictionary for the specified language
Generate the custom dictionary automatically based on misspellings found in the checked files

Installation

If you want to use Spellchecker CLI in a GitHub Actions workflow, try:

https://github.com/marketplace/actions/spellchecker-cli for a simple, customizable wrapper around Spellchecker CLI
https://github.com/marketplace/actions/spellchecker-cli-summary for a full-featured Spellchecker CLI workflow that can leave comments and status checks on your PRs

If you want to use Spellchecker CLI as a command-line tool on your own computer, you can install it globally:

npm install --global spellchecker-cli

# or

yarn global add spellchecker-cli

If you want to run Spellchecker CLI in a Git hook or in a CI environment, it's better to add it as a development dependency of your application:

npm install --save-dev spellchecker-cli

# or

yarn add --dev spellchecker-cli

If you want to run Spellchecker CLI as a pre-commit hook using the pre-commit framework, add the following entry to the repos list in your .pre-commit-config.yaml:

- repo: https://github.com/tbroadley/spellchecker-cli
  rev: v6.2.0
  hooks:
    - id: spellchecker-cli
      name: spellcheck
      language_version: 18.19.1
      types: [markdown]
      stages: # optional: if you want to specify stages to run the hook on
        - '' # see https://pre-commit.com/#confining-hooks-to-run-at-certain-stages

Usage

Run Spellchecker CLI using the command spellchecker. This command takes the following options:

-f, --files <file|glob> <file|glob>...   A list of files or globs to spellcheck.
-l, --language <language>                The language of the files. The default language is en-US. The following
                                         languages are supported: en-AU, en-CA, en-GB, en-US, en-ZA, vi.
-d, --dictionaries <file> <file>...      Files to combine into a personal dictionary.
--generate-dictionary <file>             Write a personal dictionary that contains all found misspellings. Optionally,
                                         provide a filepath for the dictionary. The default filepath is
                                         dictionary.txt.
--no-gitignore                           Don't respect ignore files (.gitignore, .ignore, etc.).
-i, --ignore <regex> <regex>...          Spelling mistakes that match any of these regexes (after being wrapped with ^
                                         and $) will be ignored.
-p, --plugins <name> <name>...           A list of retext plugins to use. The default is "spell indefinite-article
                                         repeated-words syntax-mentions syntax-urls". The following plugins are
                                         supported: spell, indefinite-article, repeated-words, syntax-mentions,
                                         syntax-urls, frontmatter.
--no-suggestions                         Do not print suggested replacements for misspelled words. This option will
                                         improve Spellchecker's runtime when many errors are detected.
-q, --quiet                              Do not output anything for files that contain no spelling mistakes.
--frontmatter-keys <key> <key>...        A list of frontmatter keys whose values should be spellchecked. By default,
                                         no values are spellchecked. Only valid when the `frontmatter` plugin is used.
--reports <file> <file>...               A list of report files to generate. The type of report is based on the
                                         extension of the file. (Supported: .junit.xml and .json)
--config <path>                          A path to a config file.
-h, --help                               Print this help screen.

If you've installed Spellchecker CLI globally, you can simply run spellchecker to invoke the tool. If you used the --save-dev flag, run ./node_modules/.bin/spellchecker from the root directory of your project (or just spellchecker inside an NPM script).

Configuration files

Spellchecker CLI can also read configuration from a JSON or YAML file. By default, it will try to read .spellcheckerrc.yaml, .spellcheckerrc.yml, .spellcheckerrc.json, or .spellcheckerrc.jsonc in the root directory of your project (as determined by pkg-dir). If pkg-dir can't find the project's root directory (e.g. if you've installed Spellchecker CLI globally), it'll look in process.cwd() instead. You can also specify a different path using the --config command line argument.

You can specify any command line option in a config file. Just make sure to use camelCase option names in the config file, e.g. frontmatterKeys instead of frontmatter-keys.

Command line arguments will override any configuration read from a file.
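The two rules above (camelCase names in config files, command line arguments taking precedence) can be illustrated with a small merge. This is a hypothetical sketch: toCamelCase and mergeOptions are illustrative names, not Spellchecker CLI's internal code, and the real option parsing may differ.

```typescript
// Hypothetical sketch: combining config-file options with command line options.
// Function names are illustrative; this is not Spellchecker CLI's source.

type Options = Record<string, unknown>;

// Convert a kebab-case CLI option name (e.g. "frontmatter-keys") to the
// camelCase key used in config files (e.g. "frontmatterKeys").
function toCamelCase(optionName: string): string {
  return optionName.replace(/-([a-z])/g, (_match: string, letter: string) => letter.toUpperCase());
}

// Later sources win: values given on the command line override the config file.
function mergeOptions(fileConfig: Options, cliArgs: Options): Options {
  const normalizedCli: Options = {};
  for (const [name, value] of Object.entries(cliArgs)) {
    normalizedCli[toCamelCase(name)] = value;
  }
  return { ...fileConfig, ...normalizedCli };
}

const fromFile: Options = { language: "en-GB", frontmatterKeys: ["title"] };
const fromCli: Options = { language: "en-US", "frontmatter-keys": ["description"] };
console.log(mergeOptions(fromFile, fromCli));
```

Here the file's language and frontmatterKeys are both replaced by the command line values, while any file-only options would be kept.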

Globs

Spellchecker CLI uses globby, which is based on glob, to parse globs. The tool passes the provided list of globs directly to globby. This means that you can, for instance, use ! to negate a glob:

spellchecker --files '**/*.md' '!test/**/*.md' 'test/README.md'

See the node-glob documentation for a full description of glob syntax.

Plugins

The following retext plugins are supported:

retext-spell: check spelling
retext-indefinite-article: check that "a" and "an" are used correctly
retext-repeated-words: check for repeated words
retext-syntax-mentions: ignore GitHub mentions (e.g. @tbroadley) when spellchecking
retext-syntax-urls: ignore URL-like values (e.g. README.md, https://example.com) when spellchecking

The following remark plugins are supported:

remark-frontmatter: parse frontmatter for spellchecking (see Frontmatter)

When using the --plugins command-line option, make sure to remove retext- or remark- from the beginning of the plugin name. For example, to use only retext-spell and retext-indefinite-article, run:

spellchecker --files <glob> --plugins spell indefinite-article
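The naming rule can be summarized as a mapping from the short names accepted by --plugins back to package names. This is a hypothetical helper for illustration only; per the lists above, frontmatter is the only remark plugin and the others are retext plugins.

```typescript
// Hypothetical helper: map a short --plugins name to its npm package name.
// Based on the README's plugin lists: "frontmatter" is a remark plugin,
// every other supported plugin is a retext plugin.
const REMARK_PLUGINS = new Set<string>(["frontmatter"]);

function packageNameFor(shortName: string): string {
  const prefix = REMARK_PLUGINS.has(shortName) ? "remark-" : "retext-";
  return prefix + shortName;
}

console.log(["spell", "indefinite-article", "frontmatter"].map(packageNameFor));
```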

Personal dictionaries

Each line in a personal dictionary is treated as a regular expression. You could use this feature to ignore words with a common form but too many possible instances to be included in a personal dictionary. For instance, you could use the regular expression [0-9a-f]{7} to match Git short SHAs.

These regular expressions are case-sensitive. If you want to ignore both the capitalized and uncapitalized version of a word, you should include both versions in the dictionary.

Each regex will be wrapped with ^ and $ before mistakes are tested against it. For example, if "ize" is included in the dictionary, "optimize" and other words that contain "ize" will not be ignored. To match "optimize", you could use the regular expression [A-Za-z]+ize.
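A minimal sketch of the anchoring and case-sensitivity rules, assuming each entry is compiled roughly as new RegExp("^" + entry + "$"), as the README describes. The helper names are hypothetical. This sketch also wraps the entry in a non-capturing group so that an alternation such as "foo|bar" is anchored as a whole; whether Spellchecker CLI does the same is an assumption.

```typescript
// Minimal sketch of how anchored dictionary entries match single words.
// Assumes the ^...$ wrapping described in the README; not the tool's actual source.

function compileEntry(entry: string): RegExp {
  // The non-capturing group keeps alternations like "foo|bar" fully anchored.
  return new RegExp(`^(?:${entry})$`);
}

function isIgnored(word: string, entries: string[]): boolean {
  return entries.map(compileEntry).some((re) => re.test(word));
}

console.log(isIgnored("optimize", ["ize"]));          // "ize" only matches the whole word "ize"
console.log(isIgnored("optimize", ["[A-Za-z]+ize"])); // matches any word ending in "ize"
console.log(isIgnored("3f2a9c1", ["[0-9a-f]{7}"]));   // Git short SHA
console.log(isIgnored("Namespace", ["namespace"]));   // matching is case-sensitive
```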

A personal dictionary should either be a plaintext file or a JavaScript file. Since Spellchecker CLI uses ES modules, you should use the extension .cjs if you want to use CommonJS module syntax:

// dictionary.cjs
module.exports = ['foo', /^bazz?/];

Otherwise, use ES module syntax:

// dictionary.js or dictionary.mjs
export default ['foo', /^bazz?/];

Note that this feature can't ignore multi-word sections of a document; it only applies to individual words, or patterns matching individual words, that you don't want spellchecked.

Generating a personal dictionary

This option is useful for adding Spellchecker CLI to an existing open-source software project with a lot of documentation. Instead of fixing every spelling mistake in one pull request, contributors can gradually remove misspellings from the generated dictionary. It's also helpful to generate a personal dictionary and then remove the actual misspellings from it, leaving behind only project-specific terms.
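Because every dictionary line is treated as a regex, a generated dictionary has to account for characters such as "." that have special meaning. The following is a hypothetical sketch of building one from a list of found misspellings: it deduplicates, sorts, and escapes each word so that it only matches itself. Whether Spellchecker CLI escapes or sorts its generated entries is an assumption, not something stated here.

```typescript
// Hypothetical sketch: build personal dictionary contents from found misspellings.
// Escaping makes each line match only the literal word when read back as a regex.

function escapeRegex(word: string): string {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function buildDictionary(misspellings: string[]): string {
  const unique = Array.from(new Set(misspellings)); // drop duplicates
  unique.sort();                                    // stable, reviewable ordering
  return unique.map(escapeRegex).join("\n") + "\n";
}

console.log(buildDictionary(["foo", "e.g", "foo", "bar"]));
```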

Built-in dictionaries

The dictionaries subfolder contains base.txt, a basic dictionary of general software terms. It also contains starter dictionaries for Next.js and React projects created using create-next-app and create-react-app respectively. You can provide them as arguments to the spellchecker command:

spellchecker --dictionaries node_modules/spellchecker-cli/dictionaries/nextjs.txt --files ...

Ignore regexes

Each word passed to spellchecker through the --ignore flag will be treated as if it were part of a personal dictionary. These words will be converted into regexes wrapped with ^ and $. During spellchecking, words that match one of these regexes will be ignored.

spellchecker --files README.md --ignore "ize"

In this case, only the literal word "ize" will be ignored, not words that contain it, like "optimize". To match optimize, you could use the regular expression [A-Za-z]+ize.

Note that this feature can't ignore multi-word sections of a document; it only applies to individual words, or patterns matching individual words, that you don't want spellchecked.

Gitignore integration

By default, spellchecker-cli does not spellcheck files that are ignored by .gitignore (or other ignore) files. This reduces the number of files that need to be processed, but occasionally this is undesired. To disable this behavior, pass the --no-gitignore flag.

Markdown

Spellchecker CLI ignores the contents of inline code blocks and tables in Markdown files (i.e. files with the extension .md, .markdown or .mdx, ignoring capitalization).

Frontmatter

Spellchecker CLI can parse Markdown frontmatter when the frontmatter plugin is used. The --frontmatter-keys option specifies a list of top-level keys to extract from the frontmatter; other top-level keys are ignored. This is useful for spellchecking only certain parts of the frontmatter. The frontmatter plugin supports both YAML and TOML, so the Markdown files you check can use either format, or a mix of both.
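The key filtering can be pictured as picking values out of the already-parsed frontmatter. This is a hypothetical sketch: it assumes the YAML or TOML has already been parsed into a plain object, the function name is illustrative, and it only collects string values (how the tool treats arrays or nested values is not covered here).

```typescript
// Hypothetical sketch: select the text of listed top-level frontmatter keys.
// Assumes frontmatter was already parsed (YAML or TOML) into a plain object.

type Frontmatter = Record<string, unknown>;

function textToCheck(frontmatter: Frontmatter, keys: string[]): string[] {
  const texts: string[] = [];
  for (const key of keys) {
    const value = frontmatter[key];
    // Only string values are collected in this sketch; other types are skipped.
    if (typeof value === "string") {
      texts.push(value);
    }
  }
  return texts;
}

const parsed: Frontmatter = { title: "Helo world", date: "2022-01-01", draft: true };
console.log(textToCheck(parsed, ["title"]));
```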

Exclude blocks

If you want to exclude whole blocks in a Markdown file from spellchecking, this could be achieved by using the HTML inline comments  and . Everything between these comments will be removed before proceeding with the spellcheck.

Reports

Reports can be generated showing all the issues found in a way that can be read by automation tools or CI/CD systems.

For example, Jenkins and GitLab CI/CD read JUnit reports, so generating one lets those tools display the spellchecking results in their user interface. This makes it possible to integrate spellchecking into a CI/CD pipeline before deploying static site documentation.

The report type is determined by the file's extension. For example, a file name ending in .json produces a JSON report.

List of Report types:

JSON: ending in .json
JUnit: ending in .junit.xml (the whole file name may also be just junit.xml)
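The selection rule above amounts to a check on the end of the file name. A minimal sketch follows; ReportType and reportTypeFor are hypothetical names, and the tool's own handling of unsupported extensions is not described here.

```typescript
// Minimal sketch: choose a report format from the output file name.
// Names are hypothetical; the rules follow the list above.
type ReportType = "junit" | "json";

function reportTypeFor(fileName: string): ReportType | undefined {
  if (fileName.endsWith("junit.xml")) return "junit"; // "report.junit.xml" or just "junit.xml"
  if (fileName.endsWith(".json")) return "json";
  return undefined; // unsupported extension
}

console.log(reportTypeFor("reports/spelling.junit.xml"), reportTypeFor("spelling.json"));
```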

Development

Run yarn install to install dependencies. Then, run npx ts-node index.ts to run Spellchecker CLI. You can also run yarn spellchecker to run Spellchecker CLI against its own documentation, yarn lint to lint the JavaScript source files, and yarn test to run the test suite.


spellchecker-cli's Issues

Add support for more configuration file formats

As a follow-up to #71, allow for configuration files in formats besides YAML. I think it'd make sense to add support for a separate JSON file, and for config in package.json. We could then solve #67 by allowing you to specify your project's dictionary in the configuration file.

Upgrade from 4.8.1 to 4.10.0 fails on non-windows machines

Dependabot opened a PR to perform this upgrade. However, the CI checks running on GitHub Actions execute the following script (via yarn), which fails:

 spellchecker -f [!CHANGELOG]*.md ./**/*.md -d .spelling -l en-GB
/usr/bin/env: ‘node\r’: No such file or directory

Executing the same command on windows works fine:

$ spellchecker -f [!CHANGELOG]*.md ./**/*.md -d .spelling -l en-GB
Spellchecking 2 files...

README.md: no issues found
./test/README.md: no issues found
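The error message itself points at the cause: /usr/bin/env was asked to run a program literally named "node\r", which happens when the script's shebang line ends with a Windows-style CRLF line ending. The hypothetical helpers below detect that condition and convert a script's contents to LF line endings; they are illustrative and not part of Spellchecker CLI.

```typescript
// Hypothetical helpers: detect a shebang line ending in CRLF, and normalize to LF.
// With "#!/usr/bin/env node\r\n", env receives "node\r" as the program name.

function hasCrlfShebang(contents: string): boolean {
  const firstLineEnd = contents.indexOf("\n");
  const firstLine = firstLineEnd === -1 ? contents : contents.slice(0, firstLineEnd);
  return firstLine.startsWith("#!") && firstLine.endsWith("\r");
}

function toLf(contents: string): string {
  return contents.replace(/\r\n/g, "\n");
}

const script = "#!/usr/bin/env node\r\nconsole.log('hi');\r\n";
console.log(hasCrlfShebang(script), hasCrlfShebang(toLf(script)));
```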

HTML Entity

The line numbers are incorrect when using HTML entities such as `&nbsp;`.

test.md

| github&#x2011;token | Token to use to authorize. | ${{&nbsp;github.token&nbsp;}} |
| file&#x2011;json | JSON file containing the list of files to check. | ${{&nbsp;file-json&nbsp;}} |
| files&#x2011;changed | List of files to check. | ${{&nbsp;files-changed&nbsp;}} |

spellchecker test.md --reports test.json

test.json

[
    {
        "data": {},
        "messages": [
            {
                "message": "`github‑token` is misspelt",
                "name": "test.md:1:3-1:15",
                "reason": "`github‑token` is misspelt",
                "line": 1,
                "column": 3,
                "location": {
                    "start": {
                        "line": 1,
                        "column": 3,
                        "offset": 2
                    },
                    "end": {
                        "line": 1,
                        "column": 15,
                        "offset": 14
                    }
                },
                "source": "retext-spell",
                "ruleId": "github-token",
                "file": "test.md",
                "fatal": false,
                "actual": "github‑token",
                "expected": [],
                "url": "https://github.com/retextjs/retext-spell#readme"
            },
            {
                "message": "`file‑json` is misspelt",
                "name": "test.md:1:72-1:81",
                "reason": "`file‑json` is misspelt",
                "line": 1,
                "column": 72,
                "location": {
                    "start": {
                        "line": 1,
                        "column": 72,
                        "offset": 71
                    },
                    "end": {
                        "line": 1,
                        "column": 81,
                        "offset": 80
                    }
                },
                "source": "retext-spell",
                "ruleId": "file-json",
                "file": "test.md",
                "fatal": false,
                "actual": "file‑json",
                "expected": [],
                "url": "https://github.com/retextjs/retext-spell#readme"
            },
            {
                "message": "`JSON` is misspelt; did you mean `JASON`, `JON`, `SON`?",
                "name": "test.md:1:84-2:2",
                "reason": "`JSON` is misspelt; did you mean `JASON`, `JON`, `SON`?",
                "line": 1,
                "column": 84,
                "location": {
                    "start": {
                        "line": 1,
                        "column": 84,
                        "offset": 83
                    },
                    "end": {
                        "line": 2,
                        "column": 2,
                        "offset": 87
                    }
                },
                "source": "retext-spell",
                "ruleId": "json",
                "file": "test.md",
                "fatal": false,
                "actual": "JSON",
                "expected": [
                    "JASON",
                    "JON",
                    "SON"
                ],
                "url": "https://github.com/retextjs/retext-spell#readme"
            },
            {
                "message": "`file-json` is misspelt",
                "name": "test.md:2:53-2:62",
                "reason": "`file-json` is misspelt",
                "line": 2,
                "column": 53,
                "location": {
                    "start": {
                        "line": 2,
                        "column": 53,
                        "offset": 138
                    },
                    "end": {
                        "line": 2,
                        "column": 62,
                        "offset": 147
                    }
                },
                "source": "retext-spell",
                "ruleId": "file-json",
                "file": "test.md",
                "fatal": false,
                "actual": "file-json",
                "expected": [],
                "url": "https://github.com/retextjs/retext-spell#readme"
            }
        ],
        "history": [
            "test.md"
        ],
        "cwd": "C:\\Users\\auste\\source\\spellchecker-cli-action-summary",
        "contents": "| github&#x2011;token | Token to use to authorize. | ${{&nbsp;github.token&nbsp;}} |\r\n| file&#x2011;json | JSON file containing the list of files to check. | ${{&nbsp;file-json&nbsp;}} |\r\n| files&#x2011;changed | List of files to check. | ${{&nbsp;files-changed&nbsp;}} |",
        "value": "| github‑token | Token to use to authorize. | ${{ github.token }} |\r\n| file‑json | JSON file containing the list of files to check. | ${{ file-json }} |\r\n| files‑changed | List of files to check. | ${{ files-changed }} |\n"
    }
]

Result	Word	Line Start	Line End	Line Start Expected	Line End Expected
✅	github‑token	1	1	1	1
❌	file‑json	1	1	2	2
❌	JSON	1	2	2	2
✅	file-json	2	2	2	2

Accept programmatically generated dictionaries

The dictionary would be a JavaScript file that exported a list of words to whitelist.

Thinks an empty space " " is misspelt on some occasions

It thinks the space before a string like :telephone: is a mistake. I use those strings to indicate emoji in Jekyll's jemoji package. When it prints the line and column number, the column number actually lines up with the first letter (at least in VS Code) for all the cases included in my screenshot.

This doesn't happen all the time, I have way more than the 5 emojis seen in my screenshot. So there is something special going on for these in particular.

retext-indefinite-article suggests "an" for "1:00" and "1:1"

"TypeError: Cannot read properties of undefined (reading 'filename')" when globally installed on Ubuntu

Hi, spellchecker-cli v5 doesn't work anymore on a library/node:lts-alpine container:

/ # npm install -g spellchecker-cli
/ # spellchecker --help
/usr/local/lib/node_modules/spellchecker-cli/node_modules/app-root-path/lib/resolve.js:111
		appRootPath = path.dirname(requireFunction.main.filename);
		                                                ^

TypeError: Cannot read properties of undefined (reading 'filename')
    at resolve (/usr/local/lib/node_modules/spellchecker-cli/node_modules/app-root-path/lib/resolve.js:111:51)
    at module.exports (/usr/local/lib/node_modules/spellchecker-cli/node_modules/app-root-path/lib/app-root-path.js:6:20)
    at Object.<anonymous> (/usr/local/lib/node_modules/spellchecker-cli/node_modules/app-root-path/index.js:4:18)
    at Module._compile (node:internal/modules/cjs/loader:1105:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1159:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/translators:170:29)
    at ModuleJob.run (node:internal/modules/esm/module_job:198:25)
    at async Promise.all (index 0)

Seems like it doesn't really spellcheck

Trying to use it in local repo with command:

spellchecker --generate-dictionary --files '**/*.tsx' --ignore '[a-z]+((d)|([A-Z0-9][a-z0-9]+))*([A-Z])?' '([A-Z][a-z0-9]+)((d)|([A-Z0-9][a-z0-9]+))*([A-Z])?' '[^a-zA-Z0-9-]+' '([\\w.]+)' '([\\w-]+)' '([\\w&]+)'

For example, I do an obvious mistake in one of the files - saccessfully, but as a result it prints only: no issues found. Am I doing it wrong?
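A possible explanation, assuming each --ignore pattern is wrapped with ^ and $ as the README describes: the first two patterns in the command above match every all-lowercase word and every capitalized word, so genuine typos such as "saccessfully" are ignored as well. The sketch below checks only those two patterns; ignoredBy is a hypothetical helper, and the extra non-capturing group is this sketch's own choice.

```typescript
// Sketch: which of the first two --ignore patterns from the command match a word,
// assuming the ^...$ wrapping described in the README.

const patterns = [
  "[a-z]+((d)|([A-Z0-9][a-z0-9]+))*([A-Z])?",
  "([A-Z][a-z0-9]+)((d)|([A-Z0-9][a-z0-9]+))*([A-Z])?",
];

function ignoredBy(word: string): string[] {
  return patterns.filter((p) => new RegExp(`^(?:${p})$`).test(word));
}

for (const word of ["saccessfully", "Saccessfully"]) {
  console.log(word, "is ignored by", ignoredBy(word).length, "pattern(s)");
}
```

Narrowing these patterns (for example, requiring at least one internal capital letter for camelCase identifiers) would let ordinary lowercase words reach the spellchecker again.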

GitHub Action - TypeError: Cannot read properties of undefined (reading 'filename')

I'm trying to run spellchecker-cli in a GitHub Action that runs a script in my repo to spellcheck all of my markdown files. After the 5.0.0 update, my GH Action no longer works. Do you have any idea on how I can fix it?

spellcheck.yaml

---
name: spellcheck
on:  # yamllint disable-line rule:truthy
  workflow_dispatch:
  pull_request:
  push:
    branches:
      - main

jobs:
  spellcheck:
    name: spellcheck
    runs-on: ubuntu-20.04

    steps:
      - name: Checkout repo
        uses: actions/checkout@v3
      - name: Setup npm
        uses: actions/setup-node@v3
      - name: Setup spellchecker
        run: npm install --location=global spellchecker-cli
      - run: npm list --global spellchecker-cli
      - name: Run spellcheck
        run: ./scripts/spellcheck.sh

Failed GH Action Output

Run ./scripts/spellcheck.sh
  ./scripts/spellcheck.sh
  shell: /usr/bin/bash -e {0}
/usr/local/lib/node_modules/spellchecker-cli/node_modules/app-root-path/lib/resolve.js:111
		appRootPath = path.dirname(requireFunction.main.filename);
		                                                ^
TypeError: Cannot read properties of undefined (reading 'filename')
    at resolve (/usr/local/lib/node_modules/spellchecker-cli/node_modules/app-root-path/lib/resolve.js:111:51)
    at module.exports (/usr/local/lib/node_modules/spellchecker-cli/node_modules/app-root-path/lib/app-root-path.js:6:20)
    at Object.<anonymous> (/usr/local/lib/node_modules/spellchecker-cli/node_modules/app-root-path/index.js:4:18)
    at Module._compile (node:internal/modules/cjs/loader:1105:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1159:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/translators:170:29)
    at ModuleJob.run (node:internal/modules/esm/module_job:198:25)
    at async Promise.all (index 0)
Error: Process completed with exit code 1.

Working GH Action Output

Run ./scripts/spellcheck.sh
  ./scripts/spellcheck.sh
  shell: /usr/bin/bash -e {0}
Spellchecking 346 files...
./cook/asian/Coconut Curried Vegetables with Rice.cook: no issues found
./cook/asian/Cool and Spicy Noodle Salad.cook: no issues found
./cook/asian/Crispy Peanut Tofu & Cauliflower Rice Stir-Fry.cook: no issues found
./cook/asian/Everyday Chinese Vegetable Stir-Fry.cook: no issues found
./cook/asian/Noodle-Free Pad Thai.cook: no issues found
./cook/beverages/Iced Chai Latte.cook: no issues found
./cook/beverages/Old Irish Coffee.cook: no issues found
./cook/beverages/Olde Tyme Lemonade.cook: no issues found
./cook/breads/Best-Ever Banana Bread.cook: no issues found
./cook/breads/Bread Machine French Bread.cook: no issues found
...

spellcheck.sh

#!/bin/bash
set -e
set -o pipefail
shopt -s globstar
shopt -s dotglob nullglob

a=$(git rev-parse --show-toplevel)
cd "${a}"
spellchecker -d dictionary.txt -f {"./cook/**/*.cook","./docs/**/*.md"}

It works locally on my machine with version 5.0.0.

npm list --global spellchecker-cli
/home/linuxbrew/.linuxbrew/lib
└── spellchecker-cli@5.0.0

Ability to mark specific words as incorrect/inappropriate

Firstly, this spell checker is fantastic! One thing I would love to see would be the ability to flag words as incorrect in a personal dictionary.

For example - I would like to flag 'Seperate' as being incorrect (it is marked as correct when I use the en-GB dictionary, as is the word 'Separate')

Another usage - I would like to flag specific words (rude words perhaps) which shouldn't appear in any of my documentation.

I was thinking this could be done as some sort of 'exclusion dictionary', where all words or expressions in the exclusion dictionary are removed from the list of valid words (the combination of the main dictionary and the custom dictionary) and are therefore treated as spelling mistakes.
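The proposed behavior can be sketched as a set difference: a word is accepted only if the combined dictionaries accept it and it is not on the exclusion list. This is a hypothetical sketch of the feature request, using plain words instead of regexes for simplicity; none of these names exist in Spellchecker CLI.

```typescript
// Hypothetical sketch of an "exclusion dictionary": excluded words are reported
// even when the base or personal dictionary would otherwise accept them.

function isAccepted(
  word: string,
  accepted: ReadonlySet<string>,
  excluded: ReadonlySet<string>,
): boolean {
  return accepted.has(word) && !excluded.has(word);
}

const accepted = new Set(["Separate", "Seperate", "documentation"]);
const excluded = new Set(["Seperate"]);
console.log(isAccepted("Separate", accepted, excluded), isAccepted("Seperate", accepted, excluded));
```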

Regex to avoid all words between {{ and }}

I would like to know how can all words between {{ and }} be ignored by the spellchecker.

I tried with
[A-Za-z]+}}

but it doesn't seem to use }} or {{ for some reason.

How can this be fixed?

Allow adding regex flags to entries in non-JS personal dictionaries

Tried to do that, no luck so far.

Support spellchecking JSDoc comments

Regular emoji throwing spellcheck warnings

This library is great!

Noticed that regular emoji are throwing warnings. For example:

./src/shared/en/aws/guides-custom-dns.md
     63:9-63:10  warning  `️` is misspelt  retext-spell  retext-spell
      80:4-80:5  warning  `️` is misspelt  retext-spell  retext-spell
  107:42-107:43  warning  `️` is misspelt  retext-spell  retext-spell

The lines in question:

63: > 🤷🏽‍♀️ DNS propagation can take time: have patience!
80: > ⛳️ Tip: These instructions will serve your app's production environment...
107: ...grab a cup of coffee or tea ☕️ – it can take a few minutes while...

I may take a crack at a fix (I would first try implementing remark-emoji). Would that kind of thing be accepted as a PR?
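The flagged "word" in these warnings renders as nothing. A likely culprit is U+FE0F (VARIATION SELECTOR-16), which follows many emoji: "⛳️" is U+26F3 followed by U+FE0F, and "☕️" is U+2615 followed by U+FE0F. The hypothetical helpers below show the code points and strip the selector as a pre-processing step; they are an illustration, not part of Spellchecker CLI.

```typescript
// Hypothetical sketch: inspect code points and remove U+FE0F before spellchecking.

const VARIATION_SELECTOR_16 = /\uFE0F/g;

function codePoints(text: string): string[] {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
  );
}

function stripVariationSelectors(text: string): string {
  return text.replace(VARIATION_SELECTOR_16, "");
}

console.log(codePoints("\u26F3\uFE0F")); // [ 'U+26F3', 'U+FE0F' ]
```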

Avoid spellchecking superscript numbers

When I generated a dictionary to see what was being flagged, the following character was added at the bottom of the list: ⁹.

As it is still a 9, I'd prefer not to see it spellchecked, along with the rest of the numbers.

This should give you the list of characters to look for.

https://www.htmlsymbols.xyz/miscellaneous-symbols/subscript-and-superscript/superscript-numbers
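Since each personal dictionary line is treated as a regex, one possible workaround is a single line that matches runs of superscript digits: [⁰¹²³⁴-⁹]+. The sketch below checks that pattern, assuming the ^ and $ wrapping described in the README; isSuperscriptNumber is a hypothetical name.

```typescript
// Sketch: a dictionary-style pattern for superscript digits, tested with ^...$ wrapping.
// ⁰ U+2070, ¹ U+00B9, ² U+00B2, ³ U+00B3, ⁴..⁹ U+2074..U+2079.
const SUPERSCRIPT_DIGITS = "[\u2070\u00B9\u00B2\u00B3\u2074-\u2079]+";

function isSuperscriptNumber(word: string): boolean {
  return new RegExp(`^(?:${SUPERSCRIPT_DIGITS})$`).test(word);
}

console.log(isSuperscriptNumber("\u2079"), isSuperscriptNumber("9"));
```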

Help ignoring html tags like <li> <lo> <ul>

Hi folks,
Is it possible to ignore all html tags. I have tried various syntax but no luck.

spellchecker --files '**/*.xml' -i ".*<.*>.*"
spellchecker --files '**/*.xml' -i <.*>
spellchecker --files '**/*.xml' -i '<.*>'
spellchecker --files '**/*.xml' -i '\<.*\>'
spellchecker --files '**/*.xml' -i '/<.*>/'

Still getting below results:

        5:6-5:8  warning  `ul` is misspelt; did you mean `kl`, `ml`, `ult`, `UL`, `bl`, `cl`, `fl`, `l`, `ll`, `pl`, `U`, `Uh`, `Um`, `Up`, `Ur`, `Us`, `Ut`, `Al`, `IL`, `JUL`, `LU`, `Tl`, `UK`, `UN`, `URL`, `UV`, `XL`?  retext-spell  retext-spell
        6:8-6:10  warning  `li` is misspelt; did you mean `lii`, `mi`, `oi`, `lee`, `Li`, `bi`, `hi`, `i`, `ii`, `lib`, `lid`, `lie`, `lip`, `liq`, `lit`, `lix`, `pi`, `ti`, `vi`, `xi`, `L`, `La`, `Lb`, `Le`, `Lg`, `Lin`, `Liz`, `Ll`, `Ln`, `Lo`, `Lr`, `Ls`, `Lt`, `Lu`, `lei`, `lvi`, `lxi`, `AI`, `ALI`, `Ci`, `Di`, `ELI`, `GI`, `IL`, `LC`, `LP`, `Ni`, `RI`, `Si`, `WI`?  retext-spell  retext-spell

Thanks
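The warnings show that the flagged words are the bare tokens `ul` and `li`, without angle brackets, so an ignore pattern has to match those words themselves. Assuming the ^ and $ wrapping described in the README, none of the patterns tried above matches them, while a simple alternation does. The sketch below checks this; matchesWord is a hypothetical helper, and ignoring these tokens would also ignore them wherever they appear as plain words.

```typescript
// Sketch: test the tried --ignore patterns against the flagged tokens,
// assuming each pattern is wrapped as ^(?:pattern)$.

const tried = [".*<.*>.*", "<.*>", "\\<.*\\>", "/<.*>/"];
const candidate = "ul|ol|li";

function matchesWord(pattern: string, word: string): boolean {
  return new RegExp(`^(?:${pattern})$`).test(word);
}

for (const pattern of [...tried, candidate]) {
  console.log(pattern, "->", ["ul", "li"].map((w) => matchesWord(pattern, w)));
}
```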

When installed globally, Spellchecker CLI will read config from .spellcheckerrc.yml in spellchecker-cli's root directory

Test that this actually happens

Accept multiple dictionary files

Configuration file for command line options

I was wondering if there was a file in which I could keep all command line options, similar to the way Mocha does it. This could be useful in cases where there are a bunch of files to be checked, words to be ignored, and more. Kinda like:

{
  "files": ["src/index.md", "src/test.md"],
  "ignore": ["yeet"],
  "language": "en-GB"
}

This way you wouldn't have to keep remembering the files to be checked and other options.

By the way, this tool is super useful! Thanks for making it.

Add option for personal dictionary to be in `package.json`

I have too many files in my repository's root. For this reason, I would like to be able to move my personal dictionary to package.json. Like this:

{
  "name": "myProject",
  "dictionary": [
    "word1",
    "word2",
    "word3",
    ...
  ]
}

Specify path to write generated dictionary to

This would look like:

$ spellchecker --files *.md --generate-dictionary docs/dictionary.txt

Treat user-generated dictionaries as lists of regexes

Also, when using the --generate-dictionary option, include regexes passed using the --ignore option in the generated dictionary.

Make sure this doesn't seriously impact performance.

Support more retext plugins

This issue involves thinking of a way to allow configuration of these plugins.

Ignore URLs

If I have a url like the following

https://kubernetes.github.io/ingress-nginx/user-guide/tls/#automated-certificate-management-with-kube-lego

and run the cli tool with:

spellchecker --files '**/*.md' --plugins spell --ignore "http.*"

I get warnings like:

warning  `kubernetes.github.io` is misspelt
warning  `ingress-nginx` is misspelt

Avoid regexes being surrounded with ^ & $

In the documentation, it says that any regexes are automatically surrounded with ^ & $. However, I need to use the following regex /(\[source,[a-z]+\])(\n\.{4}[\s\S]*?\.{4})/g, as it is, to match the two source blocks in the following AsciiDoc snippet.

Is this possible?

Limit access to push and tag events:

[source,console]
....
$ vault kv put secret/docker \
  username=octocat \
  password=correct-horse-battery-staple \
  x-drone-events=push,tag
....

You can combine annotations to limit by repository and event:

[source,console]
....
$ vault kv put secret/docker \
  username=octocat \
  password=correct-horse-battery-staple \
  x-drone-events=push,tag \
  x-drone-repos=octocat/*,spaceghost/*
....
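Because --ignore patterns and dictionary entries are tested against single words, a multi-line pattern like this one can't be used there. One hypothetical workaround is to strip such blocks in a pre-processing step before spellchecking. The sketch applies the regex from the question; stripSourceBlocks is an illustrative name, not a Spellchecker CLI feature.

```typescript
// Hypothetical pre-processing step: remove AsciiDoc [source,...] blocks delimited by "....".
// Uses the regex from the question; not a built-in Spellchecker CLI option.

const SOURCE_BLOCK = /(\[source,[a-z]+\])(\n\.{4}[\s\S]*?\.{4})/g;

function stripSourceBlocks(text: string): string {
  return text.replace(SOURCE_BLOCK, "");
}

const doc = [
  "Limit access to push and tag events:",
  "",
  "[source,console]",
  "....",
  "$ vault kv put secret/docker \\",
  "  username=octocat \\",
  "....",
  "",
  "You can combine annotations to limit by repository and event:",
].join("\n");

console.log(stripSourceBlocks(doc));
```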

Add "dev words" support

Words like npm and Vercel should always be allowed (both included in the default template for many new Next.js projects)

Add more technology words

Hello, I'd like to formally add a whole bunch of technology words such as "JSON", "NoSQL", "Steganography", and "RDBMS", in addition to SQL keywords like "VARCHAR" and "SUBSTR". I'd also like to add "https" to react.txt, as it is the more secure version of the protocol.

I'd like to start by adding a file called sql.txt to the dictionaries folder featuring words dealing with databases and their functions. And add some more common programming related terms to the main dictionary.txt.

Comparison with github-spellcheck-cli

Hi there! This project seems similar to github-spellcheck-cli. The key difference I found is that spellchecker-cli is meant for running in a local repository and github-spellcheck-cli is for contributing corrections to any repository. I also found that they use different technologies under the hood. Have you considered merging the two projects or making github-spellcheck-cli do the GitHub part and use spellchecker-cli for the rest?

Thanks, Madhav.

Option for capitalized words in Markdown headers

Hey,

I am wondering, is there an option of some sorts that would prevent Capitalized Words in Markdown headers being reported? For instance,

# Using Namespaces

as "namespaces" isn't a common word, I am adding it to a dictionary

namespaces?

However, the spell checker will still report the header above as an issue, so I need to change the dictionary to

[Nn]amespaces?

which would allow "Namespace" to be used in the middle of a sentence, which is unintended. Is there a straightforward way for that kind of thing?

Cheers!
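A minimal sketch of how a heading-aware check could work, assuming (as the entries above suggest) that each dictionary line is treated as a regular expression matched against the whole word. `compileDictionary` and `isKnownWord` are hypothetical names, not part of spellchecker-cli, and the `inHeading` flag is an assumed input rather than an existing option:

```typescript
// Hypothetical helpers, not spellchecker-cli's API.
// Assumption: each dictionary line is a regex that must match the entire word.
function compileDictionary(lines: string[]): RegExp[] {
  return lines
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => new RegExp("^(?:" + line + ")$"));
}

// In a heading, also try the word with its first letter lowercased, so the entry
// "namespaces?" accepts "Namespaces" in "# Using Namespaces" but not mid-sentence.
function isKnownWord(word: string, dictionary: RegExp[], inHeading: boolean): boolean {
  const candidates = [word];
  if (inHeading && word.length > 0) {
    candidates.push(word.charAt(0).toLowerCase() + word.slice(1));
  }
  return candidates.some((candidate) => dictionary.some((re) => re.test(candidate)));
}

const dictionary = compileDictionary(["namespaces?"]);
console.log(isKnownWord("Namespaces", dictionary, true)); // true (in a heading)
console.log(isKnownWord("Namespaces", dictionary, false)); // false (mid-sentence)
```

With this approach the dictionary keeps the lowercase entry `namespaces?`, and only the heading context relaxes the first letter.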

Automate the process of updating CHANGELOG.md

It's a bit annoying to manually add a new section (with a link to GitHub) to CHANGELOG.md as part of the release process.

This could be automated using https://github.com/ianfixes/keepachangelog_manager_gem. I'd need to install Ruby, but that's probably OK.
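As a rough alternative to installing Ruby, here is a TypeScript sketch of the release step, assuming the changelog follows Keep a Changelog conventions with an `## [Unreleased]` heading and that GitHub tags are named `v<version>`. `releaseChangelog` is a hypothetical helper; it does not update an existing `[Unreleased]` compare link:

```typescript
// Hypothetical release helper; assumes Keep a Changelog formatting and tags named "v<version>".
function releaseChangelog(
  changelog: string,
  version: string,
  previousVersion: string,
  date: string,
  repoUrl: string
): string {
  const unreleased = "## [Unreleased]";
  if (changelog.indexOf(unreleased) === -1) {
    throw new Error("No '## [Unreleased]' section found");
  }
  // Keep an empty Unreleased section and start the versioned section right below it,
  // so the entries that were under Unreleased now belong to the new version.
  const withSection = changelog.replace(
    unreleased,
    unreleased + "\n\n## [" + version + "] - " + date
  );
  // Append a link definition pointing at the GitHub compare view for this release.
  const link =
    "[" + version + "]: " + repoUrl + "/compare/v" + previousVersion + "...v" + version;
  return withSection.replace(/\s*$/, "") + "\n\n" + link + "\n";
}
```

The date and versions are passed in explicitly so a release script can supply them from `package.json` and the current date.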

Compatible VSCode extensions

I was wondering: are there any VSCode spell-checking extensions that can share their dictionary with this CLI? It would be awesome to have consistent spell checking between VSCode and build scripts. I have found older tools like remark-lint, but it no longer seems to be maintained, and I do not see how it would handle a regex-based dictionary.

All plugins running even if told otherwise

I might be making a really stupid error here, but anyway... If I specify only two specific plugins when I run it, retext-repeated-words still runs and reports errors, but I don't want it to. I've set it to run with:
node_modules/.bin/spellchecker --files '20*/*/*.md' -l 'en-GB' --plugins spell syntax-urls --no-suggestions -d dictionary.txt -q

I also tried putting the config into a spellcheckerrc.json/.spellcheckerrc.yml file but got the same results. I saw that enabling all plugins by default was introduced in #6, but I haven't seen anyone else report this issue.

Gear emoji is seen as misspelt

When a Markdown file contains a gear emoji, spellchecker flags the emoji, but it should probably be skipped.

Steps to reproduce the issue

echo ":gear:" > test.md 
spellchecker test.md
Spellchecking 1 file...

test.md
  1:2-1:3  warning  `️` is misspelt; did you mean `a`, `b`, `c`, `d`, `e`, `f`, `g`, `h`, `i`, `j`, `k`, `l`, `m`, `n`, `o`, `p`, `q`, `r`, `s`, `t`, `u`, `v`, `w`, `x`, `y`, `z`?  retext-spell  retext-spell

⚠ 1 warning

Expected behavior

echo ":gear:" > test.md 
spellchecker test.md 
Spellchecking 1 file...

test.md: no issues found

spellchecker version: v4.11.0

uname -a
Linux amd 5.15.0-33-generic #34-Ubuntu SMP Wed May 18 13:34:26 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
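The flagged token in the output above appears to be U+FE0F (VARIATION SELECTOR-16), the invisible character that follows many emoji such as the gear (U+2699); that is an inference from the output, not a confirmed diagnosis. A minimal sketch of a filter that skips tokens made only of emoji characters before they reach the spell checker, with `isEmojiOnly` as a hypothetical helper:

```typescript
// Hypothetical helper: true if the token consists only of emoji pictographs,
// variation selectors (U+FE0E/U+FE0F) and zero-width joiners (U+200D).
// Built with the RegExp constructor because \p{...} escapes require the "u" flag.
const EMOJI_ONLY = new RegExp("^[\\p{Extended_Pictographic}\\uFE0E\\uFE0F\\u200D]+$", "u");

function isEmojiOnly(token: string): boolean {
  return EMOJI_ONLY.test(token);
}

console.log(isEmojiOnly("\uFE0F")); // true: a lone variation selector
console.log(isEmojiOnly("\u2699\uFE0F")); // true: gear emoji
console.log(isEmojiOnly("gear")); // false
```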

Add library of common dictionaries

Suggested in #84.

spellchecker-cli could have a suite of default dictionary files for projects based on different kinds of boilerplates (Next.js starter template, create-react-app boilerplate, etc). Then people could pass the path to the dictionary when invoking spellchecker:

spellchecker --dictionaries node_modules/spellchecker-cli/dictionaries/next-js.txt --files ...

It's verbose, but I think it's a good place to start. We could add some syntactic sugar on top, so that it isn't necessary to type out node_modules/spellchecker-cli/dictionaries.

Also, I don't know how people who have installed spellchecker-cli globally would know the path to these dictionaries. The syntactic sugar I mentioned above could help with this situation. In any case, I believe that people primarily install the package as a dev dependency.
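One possible shape for that syntactic sugar, as a sketch: a bare name such as `next-js` resolves to a file in the package's `dictionaries` folder, while anything that looks like a path passes through unchanged. `resolveDictionary` is hypothetical, and the package root is passed in; the CLI could compute it from its own install location (for example via `__dirname`), which would also cover global installs:

```typescript
// Hypothetical resolver for bundled dictionaries; not an existing spellchecker-cli option.
// Bare names ("next-js") map to <packageRoot>/dictionaries/<name>.txt.
// Anything containing a path separator or ending in ".txt" is treated as a path.
function resolveDictionary(spec: string, packageRoot: string): string {
  const looksLikePath = /[\\/]/.test(spec) || /\.txt$/.test(spec);
  if (looksLikePath) {
    return spec;
  }
  const root = packageRoot.replace(/[\\/]+$/, "");
  return root + "/dictionaries/" + spec + ".txt";
}

console.log(resolveDictionary("next-js", "/app/node_modules/spellchecker-cli"));
// /app/node_modules/spellchecker-cli/dictionaries/next-js.txt
```

Usage would then shrink to something like `spellchecker --dictionaries next-js --files ...`, while existing path arguments keep working.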

Handle Markdown YAML frontmatter

https://jekyllrb.com/docs/frontmatter/

Idea:

--ignore-frontmatter flag to avoid spellchecking the frontmatter entirely
--frontmatter-keys flag to spellcheck only values corresponding to certain keys in the frontmatter
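A minimal sketch of both ideas, assuming Jekyll-style frontmatter delimited by `---` lines at the very top of the file. `splitFrontmatter` and `frontmatterValues` are hypothetical; the key lookup only handles simple one-line `key: value` pairs, and a real implementation would use a YAML parser:

```typescript
interface SplitDocument {
  frontmatter: string | null; // raw YAML between the delimiters, or null if absent
  body: string; // Markdown after the closing delimiter
}

// Splits a leading "---" ... "---" block from the Markdown body.
function splitFrontmatter(source: string): SplitDocument {
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(source);
  if (!match) {
    return { frontmatter: null, body: source };
  }
  return { frontmatter: match[1], body: source.slice(match[0].length) };
}

// Naive stand-in for --frontmatter-keys: values of the given top-level keys,
// with surrounding quotes removed. Nested YAML and multi-line values are not handled.
function frontmatterValues(frontmatter: string, keys: string[]): string[] {
  const values: string[] = [];
  for (const line of frontmatter.split(/\r?\n/)) {
    const m = /^([A-Za-z0-9_-]+):\s*(.*)$/.exec(line);
    if (m && keys.indexOf(m[1]) !== -1 && m[2].length > 0) {
      values.push(m[2].replace(/^["']|["']$/g, ""));
    }
  }
  return values;
}
```

With `--ignore-frontmatter`, only `body` would be spellchecked; with `--frontmatter-keys title`, the returned values would be checked as well.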

Upgrade to latest version of Remark and related packages

This is a bit of a challenge because the latest version of Remark is only published as an ES module, not as a CommonJS module. https://github.com/remarkjs/remark/releases/tag/14.0.0

And spellchecker-cli uses CommonJS.

Maybe the solution is to change tsconfig.json to compile the TypeScript down to ES modules, and then tell Node to use ES modules by setting the type field in package.json?

It seems that Node 14 is the oldest version to target, and it has stable support for ES modules. But it may still be worth bumping this project's major version for safety.

Allow specification of a list of regexes, ignoring mistakes that match one of the regexes

e.g. [0-9a-f]{7} could be added to the dictionary to whitelist short Git SHAs.
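A minimal sketch of the filtering step, assuming each pattern must match a whole word (so a 12-character hex string is not hidden by a 7-character pattern). `compileIgnorePatterns` and `filterMistakes` are hypothetical names:

```typescript
// Hypothetical helpers. Anchoring with ^(?:...)$ makes each pattern match whole words only.
function compileIgnorePatterns(patterns: string[]): RegExp[] {
  return patterns.map((pattern) => new RegExp("^(?:" + pattern + ")$"));
}

// Drops reported words that match any ignore pattern.
function filterMistakes(words: string[], ignore: RegExp[]): string[] {
  return words.filter((word) => !ignore.some((re) => re.test(word)));
}

const shortSha = compileIgnorePatterns(["[0-9a-f]{7}"]);
console.log(filterMistakes(["a1b2c3d", "teh", "deadbeefcafe"], shortSha));
// [ 'teh', 'deadbeefcafe' ]
```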

Allow storing the dictionary behind an HTTP URL or in S3-compatible storage

The reason for this issue is that I work on several projects and I have a dictionary for each one.

But these dictionaries are very similar to each other, and I need to add the same words to all of them.

I think that if the dictionary were stored at an HTTP GET/POST URL, I could update a single dictionary and reuse it for as long as needed.

Another option is S3-compatible storage, like an AWS S3, DigitalOcean Spaces, or MinIO bucket.

Right now I can't work on this because I don't have free time... when I have some free time, I'll start on it if it's not done yet.
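A sketch of how the CLI could accept a URL wherever it accepts a dictionary path, assuming Node 18 or later (where `fetch` is global). S3-compatible services such as AWS S3, DigitalOcean Spaces, or MinIO can serve objects over plain HTTPS (public objects or presigned URLs), so this would cover them without an S3 SDK. `parseDictionary`, `isRemote`, and `loadDictionary` are hypothetical; local reads are delegated to a caller-supplied function to keep the sketch dependency-free:

```typescript
// One dictionary entry per line; blank lines are ignored.
function parseDictionary(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

function isRemote(source: string): boolean {
  return /^https?:\/\//i.test(source);
}

// Hypothetical loader: remote sources are fetched with HTTP GET (global fetch, Node 18+);
// local paths go through the caller-supplied reader.
function loadDictionary(
  source: string,
  readLocal: (path: string) => Promise<string>
): Promise<string[]> {
  const text: Promise<string> = isRemote(source)
    ? fetch(source).then((response) => {
        if (!response.ok) {
          throw new Error("Failed to fetch " + source + ": HTTP " + response.status);
        }
        return response.text();
      })
    : readLocal(source);
  return text.then(parseDictionary);
}
```

A shared dictionary could then be passed as, for example, `--dictionaries https://example.com/shared.txt local.txt`, with the entries from both sources merged.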

Add option to only spellcheck parts of hyphenated words, not the entire word

For example, check space, infix, and ops when presented with space-infix-ops, but don't try to spell-check space-infix-ops.

Originally, this issue was about spell-checking the parts of the hyphenated word in addition to the hyphenated words. Updated based on the comments below.
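A minimal sketch of the requested behaviour: split on hyphens and check only the parts. `partsToCheck` and `misspelledParts` are hypothetical, and the word list stands in for the real dictionary lookup:

```typescript
// Hypothetical helper: the parts of a hyphenated token that should be spellchecked.
// The full token ("space-infix-ops") is never checked as a single word.
function partsToCheck(token: string): string[] {
  return token.split("-").filter((part) => part.length > 0);
}

// Returns the parts that are not in the (illustrative) list of known words.
function misspelledParts(token: string, knownWords: string[]): string[] {
  return partsToCheck(token).filter((part) => knownWords.indexOf(part.toLowerCase()) === -1);
}

console.log(partsToCheck("space-infix-ops")); // [ 'space', 'infix', 'ops' ]
console.log(misspelledParts("space-infx-ops", ["space", "infix", "ops"])); // [ 'infx' ]
```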

Test fails on Windows(?)

Quiet flag isn't quiet

$ spellchecker -qf 'somefile'

Spellchecking 1 files...

That is not my definition of quiet. I submit that the tool should not print anything, including the whitespace, if -q is specified and the file has no misspellings.

Whether or not it prints all the extra lines and status if there ARE misspellings isn't so much of a concern for me.

Here's my use case:

nodemon --watch . --ext md --exec "find -iname '*.md' -and -not -path '*/node_modules/*' | parallel spellchecker --no-gitignore -qd .vscode/spellright.dict -f '{}'"

This prints out a TON of lines that say "Spellchecking 1 files..." with lots of whitespace.

Why am I using parallel instead of globbing? It's lots, lots faster.

14 seconds across 45 files with parallel: time find -iname '*.md' -and -not -path '*/node_modules/*' | parallel spellchecker --no-gitignore -qd .vscode/spellright.dict -f '{}'
22 seconds across 45 files with xargs: time find -iname '*.md' -and -not -path '*/node_modules/*' -print0 | xargs - spellchecker --no-gitignore -qd .vscode/spellright.dict -f '{}'
55 seconds across 5 files with glob: time spellchecker -qd .vscode/spellright.dict -f '**/*.md'. Yes, this is slower with fewer files than either of the above, and it misses files because they are gitignored. Adding the --no-gitignore flag increases the file count to 27000. Ain't got time for that. :P
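A sketch of the behaviour requested here, in which `-q` suppresses everything for clean files while misspellings are still printed. `renderReport` and `FileResult` are hypothetical names, not spellchecker-cli's reporter; returning lines instead of printing keeps the logic testable:

```typescript
// Hypothetical reporter, not spellchecker-cli's actual output code.
interface FileResult {
  path: string;
  messages: string[];
}

// With quiet = true, files without issues produce no output at all, and the
// "Spellchecking N files..." banner is suppressed.
function renderReport(results: FileResult[], quiet: boolean): string[] {
  const lines: string[] = [];
  if (!quiet) {
    const noun = results.length === 1 ? "file" : "files";
    lines.push("Spellchecking " + results.length + " " + noun + "...");
  }
  for (const result of results) {
    if (result.messages.length === 0) {
      if (!quiet) {
        lines.push(result.path + ": no issues found");
      }
      continue;
    }
    // Files with misspellings are always reported, even in quiet mode.
    lines.push(result.path);
    for (const message of result.messages) {
      lines.push("  " + message);
    }
  }
  return lines;
}

console.log(renderReport([{ path: "README.md", messages: [] }], true)); // []
```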

Spellcheck files ignored by `.gitignore`

Hello,

I'm interested in spellchecking files that are generated by my build process. However, I ran into an issue because those files are listed in my .gitignore, and the spellchecker command respects those settings.

I've submitted a PR (#57) that adds an option to disable that.

Thanks,
Declan

Repeated words check flags consecutive images with the same alt text

Hypothetical repro case:

![word](https://example.com/a.png)
![word](https://example.com/b.png)

We probably wouldn't expect spellchecker-cli to fail on this, because the alt texts of the two images aren't actually consecutive words. But I believe it does fail.
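A sketch of one way to avoid the false positive, assuming the check runs over a paragraph's inline nodes: image nodes reset the "previous word", so the alt texts of adjacent images are never compared with each other or with surrounding text. The `Inline` type is a simplified, hypothetical stand-in for mdast nodes, and alt text is not checked for repeats at all here:

```typescript
// Simplified, hypothetical inline nodes (real mdast nodes carry more fields).
type Inline = { type: "text"; value: string } | { type: "image"; alt: string };

// Reports whitespace-separated words that repeat the previous word (case-insensitive).
// An image acts as a boundary, so words are only compared within a run of text.
function repeatedWords(inlines: Inline[]): string[] {
  const repeats: string[] = [];
  let previous: string | null = null;
  for (const node of inlines) {
    if (node.type === "image") {
      previous = null; // the image breaks the sequence of words
      continue;
    }
    const words = node.value.split(/\s+/).filter((w) => w.length > 0);
    for (const word of words) {
      const lower = word.toLowerCase();
      if (previous !== null && lower === previous) {
        repeats.push(word);
      }
      previous = lower;
    }
  }
  return repeats;
}
```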

"Too many misspellings" error even if all misspelled patterns are listed in `--ignore`

api/data/series/query.md
  206:8-206:13  warning  Too many misspellings; no further spell suggestions are given  overflow      retext-spell

This behavior is probably caused by retext-spell's `max` setting, which defaults to 30, combined with the fact that the --ignore filtering takes place after the retext-spell plugin finishes its work.
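A sketch of the ordering the issue suggests, with ignore patterns applied before the cap so that ignored words never count toward it. It assumes, as the issue describes, a cap on unique misspelled words like retext-spell's `max` option; `applyIgnoreThenCap` is a hypothetical helper and does not reproduce retext-spell's internals:

```typescript
interface CapResult {
  reported: string[]; // unique misspelled words that still get suggestions
  overflow: boolean; // true if more unique words remained than `max` allows
}

// Hypothetical: filter with the ignore patterns first, then deduplicate and cap.
function applyIgnoreThenCap(misspelled: string[], ignore: RegExp[], max: number): CapResult {
  const unique: string[] = [];
  for (const word of misspelled) {
    const ignored = ignore.some((re) => re.test(word));
    if (!ignored && unique.indexOf(word) === -1) {
      unique.push(word);
    }
  }
  return { reported: unique.slice(0, max), overflow: unique.length > max };
}
```

With this order, a file with 40 ignored identifiers and one real typo reports just the typo instead of the overflow warning.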

Generated dictionary sometimes includes word twice, once followed by a period

For example, here.
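A sketch of a post-processing step that would collapse such duplicates: strip trailing sentence punctuation, drop empty entries, and keep each word once. `normalizeGeneratedDictionary` is hypothetical; note that it would also turn an abbreviation such as `etc.` into `etc`, which may or may not be desired:

```typescript
// Hypothetical normalisation for a generated dictionary.
function normalizeGeneratedDictionary(words: string[]): string[] {
  const seen: { [word: string]: boolean } = {};
  const result: string[] = [];
  for (const raw of words) {
    // "Vercel." and "Vercel" both become "Vercel".
    const word = raw.trim().replace(/[.,;:!?]+$/, "");
    // hasOwnProperty avoids false hits on inherited keys like "constructor".
    if (word.length === 0 || Object.prototype.hasOwnProperty.call(seen, word)) {
      continue;
    }
    seen[word] = true;
    result.push(word);
  }
  return result.sort();
}
```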

Upgrade to 5.0.0 breaks on non-Windows machines

It looks to be exactly the same issue as #86:

$ ./node_modules/.bin/spellchecker
env: node\r: No such file or directory
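The error `env: node\r: No such file or directory` means the first line of the installed bin script ends in CRLF, so `env` looks for a program literally named `node\r`. A likely cause (an assumption, not confirmed here) is that the package was built or published from a checkout with Windows line endings. A sketch of a prepublish check and fix, with `hasCrlfShebang` and `toLf` as hypothetical helpers:

```typescript
// Hypothetical prepublish helpers for a bin script such as the one behind node_modules/.bin/spellchecker.
// True if the file starts with a shebang line that ends in CRLF.
function hasCrlfShebang(source: string): boolean {
  return /^#![^\n]*\r\n/.test(source);
}

// Converts all CRLF line endings to LF.
function toLf(source: string): string {
  return source.replace(/\r\n/g, "\n");
}

const broken = "#!/usr/bin/env node\r\nrequire('./index');\r\n";
console.log(hasCrlfShebang(broken)); // true
console.log(hasCrlfShebang(toLf(broken))); // false
```

Running such a check in a `prepublishOnly` script would fail the release before a broken bin script reaches npm.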

Ignore the contents of Markdown blockquotes

> none of this should be spell checked
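A sketch of the idea on a Markdown syntax tree: drop `blockquote` nodes before the text is collected for checking. `MdNode` and `withoutBlockquotes` are hypothetical simplifications; in a remark pipeline the same filtering would happen on the real mdast tree:

```typescript
// Simplified, hypothetical stand-in for mdast nodes (mdast names the node type "blockquote").
interface MdNode {
  type: string;
  value?: string;
  children?: MdNode[];
}

// Returns a copy of the tree with every blockquote subtree removed, so nothing
// inside "> ..." is passed on to the spell checker.
function withoutBlockquotes(node: MdNode): MdNode {
  if (!node.children) {
    return node;
  }
  return {
    type: node.type,
    value: node.value,
    children: node.children
      .filter((child) => child.type !== "blockquote")
      .map(withoutBlockquotes),
  };
}

const tree: MdNode = {
  type: "root",
  children: [
    { type: "paragraph", children: [{ type: "text", value: "Checked text" }] },
    {
      type: "blockquote",
      children: [
        { type: "paragraph", children: [{ type: "text", value: "none of this should be spell checked" }] },
      ],
    },
  ],
};
console.log(withoutBlockquotes(tree).children!.length); // 1
```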

How do I use --reports exactly?

I added --reports myreport.txt at the end, hoping to get a summary of a spellcheck I ran, but I could not find the file in the same directory. Can someone give me an example of how to use this option?

Add Vietnamese support

Excuse me, I've found a bug in spellchecker-cli. I'm using Mac OS X 10.11.6 and here is my command:

(spellchecker --plugins spell repeated-words syntax-urls --dictionaries dictionary/dictionary.txt dictionary/science.txt --files 'docs/data/**/*.md' && echo Spellcheck passed.) || (echo Spellcheck failed! Please review and fix errors/add words to dictionary as needed. && exit 1)

As you can see in the picture below, spellchecker-cli warned me about a word that is in dictionary.txt. This seems to happen only with words whose spelling is similar to another English word.
I don't know if you can fix this issue.

Deprecation warning for date-format

npm WARN deprecated [email protected]: 0.x is no longer supported. Please upgrade to 4.x.

It seems to come from https://www.npmjs.com/package/junit-report-builder: spellchecker-cli uses version 1.3.3, while the latest is 3.0.0.

See tbroadley/spellchecker-cli-action#4