spellr's Introduction

Spellr

Spell check your source code for fun and occasionally finding bugs

This is inspired by https://github.com/myint/scspell, and uses wordlists from SCOWL and MDN.

What makes a spell checker a source code spell checker?

  1. It tokenizes CamelCase, snake_case, and kebab-case and checks each part as an independent word, including CAMELCase with acronyms.
  2. It skips URLs.
  3. It skips things that heuristically look like base64 or hex strings rather than words. This uses a Bayesian classifier and is not magic. Find the balance of false-positive to false-negative that works for you with the key_heuristic_weight configuration option.
  4. It comes with some wordlists for built-in commands in some common programming languages, and recognizes hashbangs.
  5. You can configure whether you want US, AU, CA, or GB English (or all of them).
  6. It checks directories recursively, obeying .gitignore
  7. It's easy to add terms to wordlists
  8. It's easy to integrate with CI pipelines
  9. It's very configurable
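
To illustrate the first point, here is a minimal Ruby sketch of splitting an identifier into separately checkable words. This is not spellr's actual tokenizer: split_identifier is a hypothetical helper, and the regular expression is only one possible way to handle acronyms.

```ruby
# Hypothetical helper for illustration; not part of spellr's API.
def split_identifier(identifier)
  identifier
    .split(/[_\-\s]+/) # snake_case and kebab-case boundaries
    .flat_map { |part| part.scan(/[A-Z]+(?![a-z])|[A-Z]?[a-z]+/) } # CamelCase and acronym boundaries
    .map(&:downcase)
end

split_identifier("CAMELCase")  # => ["camel", "case"]
split_identifier("snake_case") # => ["snake", "case"]
split_identifier("kebab-case") # => ["kebab", "case"]
```

Each resulting word could then be looked up in a wordlist on its own, which is why a typo inside a long method name can still be caught.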

A brief aside on "correct" spelling.

There's no correct way to spell anything. You can't trust dictionaries, they only react to the way everyone else uses words. Any agreement about certain spellings is a collective hallucination, and is a terrible proxy for attention or intelligence or education or value. Those who get to declare what "correct" spelling is, or even what counts as a real word, tend to be those groups that have more social power and it's (sometimes unconsciously) used as a way to maintain that power.

However, in a programming context spelling things consistently is useful, where method definitions must match method calls, and comments about these are clearer when also matching. It also makes grepping easier, not that you'd find the word 'grepping' in most dictionaries.

Installation

This is tested against ruby 2.5-3.0.

With Bundler

Add this line to your application's Gemfile:

gem 'spellr', require: false

Then execute:

$ bundle install

With Rubygems

$ gem install spellr

With Docker

Execute this command instead of spellr. This is otherwise identical to using the gem version:

$ docker run -it -v $PWD:/app robotdana/spellr

Usage

The main way to interact with spellr is through the executable.

$ spellr # will run the spell checker
$ spellr --interactive # will run the spell checker, interactively
$ spellr --wordlist # will output all words that fail the spell checker in spellr wordlist format
$ spellr --quiet # will suppress all output
$ spellr --autocorrect # for if you're feeling lucky

To check a single file or subset of files, just add paths or globs:

$ spellr --interactive path/to/my/file.txt and/another/file.sh
$ spellr --wordlist '*.rb' '*_test.js'

There are some support commands available:

$ spellr --dry-run # list files that will be checked
$ spellr --version # for the current version
$ spellr --help # for the list of flags available

First run

Feel free to just spellr --interactive and go, but I prefer this process when first adding spellr to a large project.

$ spellr --dry-run

Look at the list of files. Are there some that shouldn't be checked (generated files, etc.)? Files matched by .gitignore and some binary file extensions are already skipped by default.

Add any additional files to ignore to a .spellr.yml file in your project root directory.

excludes:
  - ignore
  - /generated
  - "!files"
  - in/*
  - .gitignore
  - "*.format"

Then output the existing words that fail the default dictionaries.

$ spellr --wordlist > .spellr-wordlists/english.txt

Open .spellr-wordlists/english.txt and remove those lines that look like typos or mistakes, leaving the file in ASCII order.
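
If hand-editing leaves the file out of order, a small script can restore it. The sketch below assumes a valid wordlist is one word per line, unique, sorted byte-wise, and ending in a newline; normalize_wordlist is a hypothetical helper, not a spellr command.

```ruby
# Hypothetical helper: tidy a wordlist after manual edits.
def normalize_wordlist(text)
  words = text.lines.map(&:strip).reject(&:empty?)
  # String#<=> compares bytes, so sort yields ASCII order for ASCII words.
  words.uniq.sort.map { |word| "#{word}\n" }.join
end

path = ".spellr-wordlists/english.txt"
File.write(path, normalize_wordlist(File.read(path))) if File.exist?(path)
```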

Now it's time to run the interactive spell checker

$ spellr --interactive

Interactive spell checking

To start an interactive spell checking session:

$ spellr --interactive

You'll be shown each word that's not found in a dictionary, its location (path:line:column), along with suggestions, and a prompt.

file.rb:1:0 notaword
Did you mean: [1] notwork, [2] nonword
[a]dd, [r]eplace, [s]kip, [h]elp, [^C] to exit: [ ]

Type h for a list of what each letter command does:

[1]...[2] Replace notaword with the numbered suggestion
[a] Add notaword to a word list
[r] Replace notaword
[R] Replace this and all future instances of notaword
[s] Skip notaword
[S] Skip this and all future instances of notaword
[h] Show this help
[ctrl] + [C] Exit spellr

What do you want to do? [ ]

If you type a numeral the word will be replaced with that numbered suggestion

file.txt:1:0 notaword
Did you mean: [1] notwork, [2] nonword
[a]dd, [r]eplace, [s]kip, [h]elp, [^C] to exit: [2]
Replaced notaword with nonword

If you type r or R you'll be shown a prompt prefilled with the original word, ready for correcting:

file.txt:1:0 notaword
Did you mean: [1] notwork, [2] nonword
[a]dd, [r]eplace, [s]kip, [h]elp, [^C] to exit: [r]

  [^C] to go back
  Replace notaword with: notaword

To submit your choice and continue with the spell checking, press Enter. Your replacement word will be immediately spellchecked. To instead go back, press Ctrl-C once (pressing it twice will exit the spell checking).

Lowercase r will correct this particular use of the word, uppercase R will also correct all future uses of that word.


If you instead type s or S it will skip this word and continue with the spell checking.

Lowercase s will skip this particular use of the word, uppercase S will also skip future uses of the word.


If you instead type a you'll be shown a list of possible wordlists to add to. This list is based on the file path, and is configurable in .spellr.yml.

file.txt:1:0 notaword
Did you mean: [1] notwork, [2] nonword
[a]dd, [r]eplace, [s]kip, [h]elp, [^C] to exit: [a]

  [e] english
  [^C] to go back
  Add notaword to which wordlist? [ ]

Type e to add this word to the english wordlist and continue on through the spell checking. To instead go back to the prompt press Ctrl-C once (pressing it twice will exit the spell checking).

Disabling the tokenizer

If the tokenizer finds a word you don't want to add to the wordlist (perhaps it's an intentional example of a typo, or a non-word string not excluded by the heuristic) then add any kind of comment containing spellr:disable-line to the line.

open('mispeled_filename.txt') # spellr:disable-line

You can also disable multiple lines, by surrounding the offending code with spellr:disable and spellr:enable

# spellr:disable
it "Test typo of the: teh" do
  fill_in(field, with: "teh")
end
# spellr:enable

If your language supports inline comments you can also surround with spellr:disable and spellr:enable in the same line:

<span><!-- spellr:disable -->nonsenseword<!-- spellr:enable --></span>
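
The three forms above can be pictured as a filter that runs before tokenizing. The following is a minimal sketch under that assumption, not spellr's real implementation; checkable_text is a hypothetical name, and the real tokenizer may treat edge cases differently.

```ruby
# Hypothetical sketch: return, per line, only the text that would still be checked.
def checkable_text(source)
  enabled = true
  source.each_line.map do |line|
    # A disable-line comment hides the whole line.
    next "" if line.include?("spellr:disable-line")

    kept = +""
    rest = line.chomp
    until rest.empty?
      # Look for whichever marker would flip the current state.
      marker = enabled ? "spellr:disable" : "spellr:enable"
      before, found, rest = rest.partition(marker)
      kept << before if enabled
      enabled = !enabled unless found.empty?
    end
    kept
  end
end
```

The enabled flag persists across lines, so the same loop covers both the multi-line and the inline form.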

Configuration

Spellr's configuration is a .spellr.yml file in your project root. This is combined with the gem defaults defined here. There are top-level keys and per-language keys.

word_minimum_length: 3 # any words shorter than this will be ignored
key_minimum_length: 6 # any strings shorter than this won't be considered non-word strings
key_heuristic_weight: 5 # higher values mean strings are more likely to be considered words or non-words by the classifier.
excludes:
  - ignore
  - "!files"
  - in/*
  - .gitignore
  - "*.format"
includes:
  - limit to
  - "files*"
  - in/*
  - .gitignore-esque
  - "*.format"

The includes format is documented here.
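
As a rough picture of combining a project's .spellr.yml with defaults, the sketch below overlays the project's top-level keys on a default hash. The DEFAULTS values are taken from the example above purely for illustration, and load_config is a hypothetical helper; how spellr itself merges nested keys such as excludes may differ.

```ruby
require "yaml"

# Assumed defaults for illustration only; not read from the gem.
DEFAULTS = {
  "word_minimum_length" => 3,
  "key_minimum_length" => 6,
  "key_heuristic_weight" => 5
}.freeze

# Hypothetical helper: project settings win over defaults, key by key.
def load_config(yaml_text)
  DEFAULTS.merge(YAML.safe_load(yaml_text) || {})
end

config = load_config("word_minimum_length: 4\n")
config["word_minimum_length"] # => 4
config["key_minimum_length"]  # => 6
```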

Also within this file are language definitions:

languages:
  english: # this must match exactly the name of the file in .spellr-wordlists/
    locale: # US, AU, CA, or GB
      - US
      - AU
  ruby:
    includes:
      - patterns*
      - "*_here.rb"
      - limit-which-files
      - the/wordlist/**/*
      - /applies_to/
    key: r # this is the letter used to choose this wordlist when using `spellr --interactive`.
    hashbangs:
      - ruby # if the file has no extension and the hashbang/shebang contains ruby
             # this file will match even if it doesn't otherwise match the includes pattern.
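
The hashbang rule described in the comment above could be sketched like this. hashbang_matches? is a hypothetical helper that follows the wording of the comment (no extension, and the hashbang line contains the configured name); it is not spellr's own matching code.

```ruby
# Hypothetical helper: does an extensionless file's first line name one of the configured hashbangs?
def hashbang_matches?(path, first_line, hashbangs)
  return false unless File.extname(path).empty?
  return false unless first_line.start_with?("#!")

  hashbangs.any? { |name| first_line.include?(name) }
end

hashbang_matches?("bin/console", "#!/usr/bin/env ruby\n", ["ruby"]) # => true
hashbang_matches?("script.py", "#!/usr/bin/env ruby\n", ["ruby"])   # => false
```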

If you want a file to have a file-specific wordlist, e.g. for terms specific to logstash:

languages:
  logstash: # this can be anything
    includes:
      - path/to/logstash/file

Rake

Create or open a file named Rakefile in the root of your project, and add the following lines:

# Rakefile
require 'spellr/rake_task'
Spellr::RakeTask.generate_task

This will add the rake spellr task. To provide arguments like the CLI, use square brackets (ensure you escape or quote the [] if you're using zsh): rake 'spellr[--interactive]'

To provide default CLI arguments, the first argument is the task name, and subsequent arguments are the CLI arguments.

# Rakefile
require 'spellr/rake_task'
Spellr::RakeTask.generate_task(:spellr_quiet, '--quiet')

task default: :spellr_quiet

Or, to have rake spellr run in interactive mode unless the CI env variable is set:

# Rakefile
require 'spellr/rake_task'
spellr_arguments = ENV['CI'] ? [] : ['--interactive']
Spellr::RakeTask.generate_task(:spellr, *spellr_arguments)

task default: :spellr

Travis

To have this run automatically on Travis, add :spellr to the default rake task.

# Rakefile
require 'spellr/rake_task'
Spellr::RakeTask.generate_task

task default: :spellr

Or, if you already have a :default task, add :spellr to the array.

require 'spellr/rake_task'
Spellr::RakeTask.generate_task

task default: [:spec, :spellr]

or etc.

Also follow the Travis documentation to have Travis run rake:

# .travis.yml
sudo: false
language: ruby
cache: bundler
rvm:
  - 3.0
before_install: gem install bundler

Ignoring the configured patterns

Sometimes you'll want to spell check something that would usually be ignored, e.g. .git/COMMIT_EDITMSG even though spellr ignores the .git directory.

For this you can use the --suppress-file-rules command line argument.

$ spellr --suppress-file-rules .git/COMMIT_EDITMSG

Note: This still ignores files outside of the current directory

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests.

To install this gem onto your local machine, run bundle exec rake install.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/robotdana/spellr.

License

The gem is available as open source under the terms of the MIT License. Wordlists packaged with this gem have their own licenses; see them at https://github.com/robotdana/spellr/tree/main/wordlists

spellr's Issues

Allow file-specific wordlists

I often come across words that are very specific to a single file, and the probability of that same word being used elsewhere in the codebase legitimately is extremely low (low enough that it's more likely to be the result of a typo).

In this situation, it would be great to be able to have wordlists for specific files.

consistent numbers to wordlists

show all wordlists when choosing a wordlist to add to, but with those that don't match the file greyed out.

Because sometimes I'll be in 'add add add' mode and add something to the wrong wordlist.

Ignore tsx.snap / js.snap / etc. files

Salutations, @robotdana big fan.

My company is using this wonderful gem of a gem in our codebase but I have encountered an issue where it's detecting words in snap files. Could you take a look, please?

How to integrate spellr with gitlab ci

I have been using pronto and pronto-spell in gitlab ci pipelines for spell checks/comments.

But I would very much like to use spellr instead. (or a pronto-spellr)

Point 8 says:
It's easy to integrate with CI pipelines.

Are there some undocumented instructions of sorts somewhere for this?

Or is this only gonna be possible after the STDIN functionality has been added?

Also, how would one go about adding a custom output formatter?

Cheers,
Khalil

create new wordlists/languages while interactive

it would be nice to dynamically create new languages to add words to, associating the current path, or adding the current path to a language that doesn't currently apply to it.

I don't yet know how i want to do this, as i don't want to dynamically write to the .yml and remove comments.

Possibly just append a new key to the file interactive_session_[timestamp]: then merge all these in at the end, and someone tidy can manually merge their yaml file together if they wish.

or create a new .spellr-[timestamp].yml that we glob and merge. (I think i like this better)

For now, you can ctrl-C, then modify .spellr.yml, then spellr -i again.

css wordlist

nowrap and hsl hsla rgb rgba rem, i think most other things are english or short

prune command

clear words from your wordlists that aren't needed

shell args

literally a space then -\w+ then space or newlines
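
One possible reading of that description as a Ruby regular expression, offered as a sketch rather than a proposed implementation; SHELL_ARG and shell_args are hypothetical names.

```ruby
# A space, then a dash and word characters, then whitespace or end of input.
SHELL_ARG = /(?<= )-\w+(?=\s|\z)/

# Hypothetical helper: list the tokens that look like short shell flags.
def shell_args(line)
  line.scan(SHELL_ARG)
end

shell_args("grep -rn foo -i") # => ["-rn", "-i"]
shell_args("kebab-case word") # => []
```

As written, the description does not cover long options such as --quiet, because the character after the first dash is not a word character.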

Validation for dictionaries

Hi, my team has encountered unexpected, incorrect behaviour when custom dictionaries are misconfigured (incorrect order, case, incorrect newlines), so I've made a function to find these errors.

The code can be wrapped in a new spellr command to validate dictionaries.
Or it can optionally be called when spell checking (for setups with small dictionaries).
You can also spec test bundled dictionaries like this, if there is no validation currently.

def find_duplicates(array)
    prev = nil
    indexes = []
    array.each_with_index do |curr, index|
        indexes.append(index) if prev == curr
        prev = curr
    end
    indexes
end

def find_not_ascending(array)
    prev = nil
    indexes = []
    array.each_with_index do |curr, index|
        indexes.append(index) if prev && curr < prev
        prev = curr
    end
    indexes
end

def find_not_lowercase(array)
    indexes = []
    array.each_with_index do |curr, index|
        indexes.append(index) unless curr == curr.downcase
    end
    indexes
end

error_type = "error"

errors = []
Dir.glob('.spellr-wordlists/*.txt').each do |file|
    next unless File.file? file
    contents = File.readlines(file)

    find_not_ascending(contents).each do |index|
        errors.append "#{file}:#{index+1}: #{error_type}: words must be ordered ascending"
    end

    find_duplicates(contents).each do |index|
        errors.append "#{file}:#{index+1}: #{error_type}: duplicate word"
    end

    find_not_lowercase(contents).each do |index|
        errors.append "#{file}:#{index+1}: #{error_type}: words must be lowercase"
    end

    if contents.count > 0
        if contents.first.length == 1
            errors.append "#{file}:1: #{error_type}: first line must not be empty"
        end

        unless contents.last.end_with? "\n"
            errors.append "#{file}:#{contents.count}: #{error_type}: must have newline at the end of file"
        end
    end
end

# Report every problem found, one per line.
puts errors

Support spellchecking stdin

Checking the contents of stdin would mean that it could be connected into things like git hooks so that commits could be spellchecked.

tweaks to the UI of interactive

possibilities:

[a,s,S,r,R,e] or [?] for help

reduce options: (either allowing R and S as secret options or removing them)

[a]dd, [r]eplace, [s]kip, [?]help

Single line disable

Would it be possible to implement logic similar to how eslint disables rules?
eg.

# spellr-disable-next-line
KEY = 'SOME-KEY-NOT-IGNORED-HEURISTICALLY'.freeze
KEY = 'SOME-KEY-NOT-IGNORED-HEURISTICALLY'.freeze # spellr-disable-line

Thanks @alexrogers

Ignore tokens matching character ranges

Hi!
Is it possible to add an option to ignore character ranges for tokens?
If the whole token matches one ignored character set then it will be skipped. This will still prevent mixed languages in a word but will ignore languages with different character sets.

We (unfortunately) write some comments and strings in Russian and it triggers a Spellr warning almost every time.
Simple dictionary checking doesn't work well with languages that have many cases (e.g. Russian, Hindi) because you have to add all cases for each word to validate properly, and I was unable to find such dictionaries.
