
huacnlee / autocorrect

Stars 829 · Watchers 6 · Forks 27 · Size 7.17 MB

A linter and formatter that helps you improve copywriting by correcting spaces, words, and punctuation between CJK (Chinese, Japanese, Korean) and English.

Home Page: https://huacnlee.github.io/autocorrect

License: MIT License

Languages: Rust 78.01%, Makefile 0.77%, Shell 0.60%, Dockerfile 0.03%, HTML 1.72%, JavaScript 5.22%, TypeScript 6.74%, SCSS 2.53%, Python 0.54%, Ruby 1.72%, Java 2.12%
Topics: rust, formatter, linter, copywriting, autocorrect, spellcheck, webassembly

autocorrect's Introduction


AutoCorrect


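🎯 AutoCorrect aims to provide a standardized solution for proofreading copy that can be applied in all kinds of scenarios (e.g. writing books, documentation, publishing content, project source code...), so that users can easily produce and proofread standardized, professional copy.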

AutoCorrect is a linter and formatter that helps you improve copywriting by correcting spaces, words, and punctuation between CJK (Chinese, Japanese, Korean) and English.

Like ESLint, RuboCop, and gofmt, AutoCorrect checks source code and outputs a colorized diff with suggested corrections. You can integrate it into CI (GitLab CI, GitHub Actions, Travis CI, ...) to check the text in your source code. It recognizes the file type from the file name and finds the strings and comments to check.

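AutoCorrect is written in Rust. It "automatically corrects" or "checks and suggests" fixes for text that mixes CJK (Chinese, Japanese, Korean) and English: it adds the correct spaces, corrects words, and tries to fix punctuation automatically in a safe way. Its lint mode conveniently detects problematic copy in a project, helping enforce a consistent standard.

This approach first appeared in the Ruby China project in 2013, and its rules have been refined over time. Its accuracy is now high (with very few exceptions), so you can rely on it to assist with automatic correction.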

autocorrect lint output

Features

  • Add spacing between CJK (Chinese, Japanese, Korean) and English words.
  • Convert punctuation to full-width next to CJK text.
  • Convert punctuation to half-width in English content.
  • (Experimental) Spellcheck and correct words with your dictionary.
  • Lint checking with diff or JSON output, so you can integrate it anywhere (GitLab CI, GitHub Actions, VS Code, Vim, Emacs...).
  • Respects .gitignore and .autocorrectignore for files you want to ignore.
  • Supports more than 28 file types (Markdown, JSON, YAML, JavaScript, HTML ...), using an AST parser to check only strings and comments.
  • LSP server: autocorrect-lsp
  • Cross-platform for Linux, macOS, Windows, and WebAssembly, with native SDKs for programming (Node.js, JavaScript in the browser, Ruby, Python, Java).
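The space-word idea from the first feature can be sketched in a few lines of JavaScript. This is a minimal illustration, not AutoCorrect's Rust implementation: it assumes "CJK" means the Unicode Han, Hiragana, Katakana, and Hangul scripts and that "English words" means ASCII letters and digits, and the helper name addCjkSpacing is hypothetical.

```javascript
// Minimal sketch of a "space-word" style rule (not AutoCorrect's implementation).
// Assumption: CJK = Han, Hiragana, Katakana, Hangul; "English" = ASCII letters/digits.
const CJK = "\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\p{Script=Hangul}";

// A CJK character immediately followed by an ASCII letter or digit.
const cjkThenWord = new RegExp(`([${CJK}])([A-Za-z0-9])`, "gu");
// An ASCII letter or digit immediately followed by a CJK character.
const wordThenCjk = new RegExp(`([A-Za-z0-9])([${CJK}])`, "gu");

// Hypothetical helper: insert a single space at every CJK/English boundary.
function addCjkSpacing(text) {
  return text.replace(cjkThenWord, "$1 $2").replace(wordThenCjk, "$1 $2");
}

console.log(addCjkSpacing("你好Hello世界")); // 你好 Hello 世界
console.log(addCjkSpacing("hello世界")); // hello 世界
```

Running it on already-spaced text changes nothing, because neither pattern matches once a space separates the scripts.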

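Typical use cases

  • Writing books, documentation, news, and other published content in Markdown, AsciiDoc, HTML, and similar formats, keeping the copy standardized and professional (examples: the MDN project, 少数派).
  • Integrating with CI environments such as GitLab CI, GitHub Actions, and Travis CI for automated project checks.
  • Integrating with static site generators such as Docusaurus, Hexo, Hugo, Jekyll, and Gatsby to format content automatically at build time.
  • Integrating into applications through a language SDK, formatting content when it is stored or output to improve site quality (e.g. Ruby China, V2EX, Longbridge).
  • As a plugin for VS Code and IntelliJ Platform IDEs (supported) or Vim and Emacs (planned), checking copy (linter & formatter) and showing hints (annotator, diagnostic) from LintResult.
  • As a Chrome, Safari, or other browser extension built on WebAssembly, working on any website (planned).
  • Integrating into WYSIWYG editors such as ProseMirror, CKEditor, Slate, Draft.js, Tiptap, Monaco Editor, and CodeMirror.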

Installation

Install on macOS

You can install it via Homebrew:

$ brew install autocorrect

Install on Windows

You can install it via Scoop:

$ scoop install autocorrect

Or, on Unix-like systems, install it with this script:

$ curl -sSL https://git.io/JcGER | sh

After that, the autocorrect command is available:

$ autocorrect -V
AutoCorrect 2.4.0

Or install it via NPM:

$ yarn add autocorrect-node
$ yarn autocorrect -V

Upgrade

Since: 1.9.0

AutoCorrect can upgrade itself with the autocorrect update command:

$ autocorrect update

NOTE: This command asks for your password, because it installs the binary into the /usr/local/bin directory.

Usage

Use in CLI

$ autocorrect text.txt
你好 Hello 世界

$ echo "hello世界" | autocorrect --stdin
hello 世界

$ autocorrect --fix text.txt
$ autocorrect --fix zh-CN.yml
$ autocorrect --fix

Lint

$ autocorrect --lint --format json text.txt

$ autocorrect --lint text.txt
Error: 1, Warning: 0

text.txt:1:3
-你好Hello世界
+你好 Hello 世界
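To show how a location like text.txt:1:3 can be derived, here is a hypothetical sketch of a line-based linter: it formats each line and, when the result differs, reports the 1-based line number and the column of the first changed character, followed by the -/+ lines. The helpers lintText and renderDiff and the single spacing rule are assumptions for illustration; AutoCorrect's real lint uses file-type parsers and many more rules.

```javascript
// Hypothetical sketch of a line-based linter whose output is shaped like
// `autocorrect --lint`. Only one toy rule (CJK/English spacing) is applied.
const CJK = "\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\p{Script=Hangul}";
const rules = [
  new RegExp(`([${CJK}])([A-Za-z0-9])`, "gu"),
  new RegExp(`([A-Za-z0-9])([${CJK}])`, "gu"),
];
const format = (s) => rules.reduce((acc, re) => acc.replace(re, "$1 $2"), s);

// Returns one result per changed line: 1-based line and column of the first change.
function lintText(filename, text) {
  const results = [];
  text.split("\n").forEach((line, i) => {
    const fixed = format(line);
    if (fixed === line) return;
    const a = Array.from(line); // compare by code point, not UTF-16 unit
    const b = Array.from(fixed);
    let col = 0;
    while (col < a.length && a[col] === b[col]) col++;
    results.push({ filename, line: i + 1, col: col + 1, before: line, after: fixed });
  });
  return results;
}

// Renders results as "file:line:col" followed by -before / +after lines.
function renderDiff(results) {
  return results
    .map((r) => `${r.filename}:${r.line}:${r.col}\n-${r.before}\n+${r.after}`)
    .join("\n\n");
}

console.log(renderDiff(lintText("text.txt", "你好Hello世界")));
```

For the sample input, the first difference is the inserted space before "Hello", at the third character, which gives column 3.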

You can also lint multiple files:

$ autocorrect --lint

To lint all changed files in Git:

$ git diff --name-only | xargs autocorrect --lint

Use in NPM

Since: 2.7.0

AutoCorrect is published to NPM with CLI support. To use it in a frontend or Node.js project, install the autocorrect-node package; you don't need to install the AutoCorrect binary.

$ cd your-project
$ yarn add autocorrect-node

Now you can run the yarn autocorrect command in your project; it works the same as the autocorrect command.

$ yarn autocorrect -h

More docs: autocorrect-node/README.md

Configuration

Default config: .autocorrect.default

$ autocorrect init
AutoCorrect init config: .autocorrectrc

NOTE: If the download fails, try again with the autocorrect init --local command.

Now the .autocorrectrc file has been created.

.autocorrectrc can be written in YAML or JSON format.

Config file example:

# yaml-language-server: $schema=https://huacnlee.github.io/autocorrect/schema.json
# Config rules
rules:
  # Auto add spacing between CJK (Chinese, Japanese, Korean) and English words.
  # 0 - off, 1 - error, 2 - warning
  space-word: 1
  # Add space around some punctuation.
  space-punctuation: 1
  # Add space around brackets (), [] when next to CJK.
  space-bracket: 1
  # Add space around backticks `` when next to CJK.
  space-backticks: 1
  # Add space around dash `-`.
  space-dash: 0
  # Convert punctuation to fullwidth near CJK.
  fullwidth: 1
  # Remove space next to fullwidth characters.
  no-space-fullwidth: 1
  # Convert fullwidth alphanumeric characters to halfwidth.
  halfwidth-word: 1
  # Convert fullwidth punctuation to halfwidth in English.
  halfwidth-punctuation: 1
  # Spellcheck
  spellcheck: 2
# Enable or disable in a specific context
context:
  # Enable or disable formatting code blocks in Markdown, AsciiDoc, etc.
  codeblock: 1
textRules:
  # Config special rules for some texts
  # For example, to make "Hello你好" only a warning and to ignore "Hi你好":
  # "Hello你好": 2
  # "Hi你好": 0
fileTypes:
  # Configure file associations; your config takes priority over the defaults.
  # "rb": ruby
  # "Rakefile": ruby
  # "*.js": javascript
  # ".mdx": markdown
spellcheck:
  # Correct words (case insensitive) for spellcheck
  words:
    - GitHub
    - App Store
    # This converts "appstore" into "App Store"
    - AppStore = App Store
    - Git
    - Node.js
    - nodejs = Node.js
    - VIM
    - DNS
    - HTTP
    - SSL
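The spellcheck.words syntax above (a plain word, or wrong = Right) could be interpreted as in the following sketch. It assumes a plain entry maps any casing of the word to the listed spelling and that matching happens on whole ASCII words; buildDictionary and spellcheck are hypothetical names, and AutoCorrect's actual matching rules may differ.

```javascript
// Hypothetical sketch of a spellcheck dictionary built from `spellcheck.words`.
// "Word" maps any casing of Word to Word; "wrong = Right" maps wrong to Right.
function buildDictionary(words) {
  const dict = new Map();
  for (const entry of words) {
    const [from, to] = entry.includes("=")
      ? entry.split("=").map((s) => s.trim())
      : [entry.trim(), entry.trim()];
    dict.set(from.toLowerCase(), to);
  }
  return dict;
}

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replaces whole-word, case-insensitive matches. "Whole word" here only means
// "not touching another ASCII letter or digit", which is a simplification.
function spellcheck(text, dict) {
  let out = text;
  for (const [from, to] of dict) {
    const re = new RegExp(`(?<![A-Za-z0-9])${escapeRegExp(from)}(?![A-Za-z0-9])`, "gi");
    out = out.replace(re, to);
  }
  return out;
}

const dict = buildDictionary(["GitHub", "App Store", "AppStore = App Store", "Git", "Node.js", "nodejs = Node.js"]);
console.log(spellcheck("I use github and appstore with nodejs", dict));
// I use GitHub and App Store with Node.js
```

This naive version would also rewrite words inside URLs (for example a path ending in -linux), a false positive that a real implementation has to guard against.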

Ignore option

Since: 2.2.0

Use this when you want certain words or texts to be ignored when formatting or linting.

The textRules config can help with this.

For example, we want:

  • Hello世界 - To just give a warning.
  • Hi你好 - To ignore.

You can configure:

textRules:
  Hello世界: 2
  Hi你好: 0

After that, AutoCorrect will follow your textRules to process.
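textRules reuses the severity levels from rules (0 off, 1 error, 2 warning). Below is a minimal sketch of the lookup. It assumes a rule applies whenever the text contains the configured key, that the first matching key wins, and that unmatched text falls back to error. These assumptions and the name severityFor are illustrative and not taken from AutoCorrect's source.

```javascript
// Hypothetical sketch of resolving a severity from `textRules`.
// Assumptions: a rule matches when the text contains its key, the first
// matching key wins, and unmatched text uses the default severity (1 = error).
const SEVERITY = { 0: "off", 1: "error", 2: "warning" };

function severityFor(text, textRules, defaultLevel = 1) {
  for (const [key, level] of Object.entries(textRules)) {
    if (text.includes(key)) return SEVERITY[level];
  }
  return SEVERITY[defaultLevel];
}

const textRules = { "Hello世界": 2, "Hi你好": 0 };
console.log(severityFor("Hello世界", textRules)); // warning
console.log(severityFor("Hi你好", textRules)); // off
console.log(severityFor("你好World", textRules)); // error
```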

Ignore files

Use .autocorrectignore to ignore files

Sometimes you may want to ignore specific files that you don't want to check.

By default, files matched by .gitignore rules are ignored.

You can also use .autocorrectignore to ignore other files; it uses the same format as .gitignore.
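Because .autocorrectignore follows the .gitignore format, a simplified matcher can illustrate how the patterns apply. This sketch handles comments, *, ?, ! negation, and slash-free patterns that match a file's base name at any depth, with the last matching pattern winning. It deliberately omits ** and directory-only trailing / patterns, so treat it as an approximation of gitignore semantics and not as the matcher AutoCorrect uses; parseIgnore and isIgnored are hypothetical names.

```javascript
// Simplified sketch of .gitignore-style matching, as used by .autocorrectignore.
// Supported: comments, blank lines, `*`, `?`, `!` negation, a leading `/`, and
// slash-free patterns matching the base name at any depth. Not supported: `**`,
// trailing `/` (directory-only) patterns, and backslash escapes.
function globToRegExp(glob) {
  let source = "";
  for (const ch of glob) {
    if (ch === "*") source += "[^/]*";
    else if (ch === "?") source += "[^/]";
    else source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  }
  return new RegExp(`^${source}$`);
}

function parseIgnore(content) {
  return content
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l !== "" && !l.startsWith("#"))
    .map((l) => {
      const negate = l.startsWith("!");
      let pattern = negate ? l.slice(1) : l;
      const anchored = pattern.includes("/"); // match against the full path
      if (pattern.startsWith("/")) pattern = pattern.slice(1);
      return { negate, anchored, re: globToRegExp(pattern) };
    });
}

// Paths are relative and slash-separated; the last matching rule wins.
function isIgnored(path, rules) {
  const base = path.split("/").pop();
  let ignored = false;
  for (const rule of rules) {
    if (rule.re.test(rule.anchored ? path : base)) ignored = !rule.negate;
  }
  return ignored;
}

const rules = parseIgnore("# generated files\n*.min.js\ndist/*\n!dist/keep.js");
console.log(isIgnored("src/app.min.js", rules)); // true
console.log(isIgnored("dist/keep.js", rules)); // false
```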

Disable by inline comment

To disable checking for specific lines in a file, write a comment containing autocorrect-disable. When AutoCorrect finds a comment that includes it, checking is temporarily disabled.

Then use autocorrect-enable to turn it back on.

For example, in JavaScript:

function hello() {
  // autocorrect-disable
  console.log("现在这行开始autocorrect会暂时禁用");
  console.log("这行也是disable的状态");
  // autocorrect-enable
  let a = "现在起autocorrect回到了启用的状态";
}

The output will be:

function hello() {
  // autocorrect-disable
  console.log("现在这行开始autocorrect会暂时禁用");
  console.log("这行也是disable的状态");
  // autocorrect-enable
  let a = "现在起 autocorrect 回到了启用的状态";
}
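The disable/enable behaviour above can be modelled as an on/off state carried from line to line. In this sketch (formatWithToggles is a hypothetical name), any line containing autocorrect-disable switches formatting off and any line containing autocorrect-enable switches it back on. Unlike AutoCorrect, it does not check that the marker sits inside a comment, and it formats whole lines rather than only strings and comments.

```javascript
// Minimal sketch of comment-driven disabling (not AutoCorrect's implementation).
// `format` is a toy CJK/English spacing rule applied to whole lines.
const CJK = "\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\p{Script=Hangul}";
const format = (s) =>
  s
    .replace(new RegExp(`([${CJK}])([A-Za-z0-9])`, "gu"), "$1 $2")
    .replace(new RegExp(`([A-Za-z0-9])([${CJK}])`, "gu"), "$1 $2");

function formatWithToggles(source) {
  let enabled = true;
  return source
    .split("\n")
    .map((line) => {
      if (line.includes("autocorrect-disable")) {
        enabled = false; // lines after this marker are left untouched
        return line;
      }
      if (line.includes("autocorrect-enable")) {
        enabled = true; // formatting resumes after this marker
        return line;
      }
      return enabled ? format(line) : line;
    })
    .join("\n");
}
```

Feeding it the JavaScript example above reproduces the shown output: only the let a = ... line gains spaces.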

Disable some rules

Since: 2.0

You can use autocorrect-disable <rule> in a comment to disable some rules.

For rule names, see Configuration.

function hello() {
  // autocorrect-disable space-word
  console.log("现在这行开始autocorrect会暂时禁用.");
  // autocorrect-disable fullwidth, space-word
  console.log("这行也是disable的状态.");
  // autocorrect-enable
  let a = "现在起autocorrect回到了启用的状态.";
}

The output will be:

function hello() {
  // autocorrect-disable space-word
  console.log("现在这行开始autocorrect会暂时禁用。");
  // autocorrect-disable fullwidth, space-word
  console.log("这行也是disable的状态.");
  // autocorrect-enable
  let a = "现在起 autocorrect 回到了启用的状态。";
}
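Rule-specific disabling can be modelled as a set of disabled rule names carried from line to line. The sketch below assumes that each autocorrect-disable comment replaces the current set (a bare autocorrect-disable disables every rule) and that autocorrect-enable clears it. Its two rules are toy versions of space-word and fullwidth (the latter only turns a . after a CJK character into 。), so it illustrates the bookkeeping and is not AutoCorrect's rule engine; formatWithRuleToggles is a hypothetical name.

```javascript
// Sketch of per-rule disabling via comments (not AutoCorrect's rule engine).
// Assumptions: each `autocorrect-disable a, b` comment replaces the set of
// disabled rules, a bare `autocorrect-disable` disables all rules, and
// `autocorrect-enable` re-enables everything. Both rules are toy versions.
const CJK = "\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\p{Script=Hangul}";
const RULES = {
  "space-word": (s) =>
    s
      .replace(new RegExp(`([${CJK}])([A-Za-z0-9])`, "gu"), "$1 $2")
      .replace(new RegExp(`([A-Za-z0-9])([${CJK}])`, "gu"), "$1 $2"),
  // Toy fullwidth rule: only turns a "." right after a CJK character into "。".
  fullwidth: (s) => s.replace(new RegExp(`([${CJK}])\\.`, "gu"), "$1。"),
};

function formatWithRuleToggles(source) {
  let disabled = new Set();
  return source
    .split("\n")
    .map((line) => {
      const off = line.match(/autocorrect-disable\b(.*)$/);
      if (off) {
        const names = off[1].split(",").map((n) => n.trim()).filter(Boolean);
        disabled = new Set(names.length > 0 ? names : Object.keys(RULES));
        return line;
      }
      if (line.includes("autocorrect-enable")) {
        disabled = new Set();
        return line;
      }
      // Apply every rule that is not currently disabled, in declaration order.
      return Object.entries(RULES).reduce(
        (acc, [name, rule]) => (disabled.has(name) ? acc : rule(acc)),
        line
      );
    })
    .join("\n");
}
```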

VS Code Extension

Install Extension

https://marketplace.visualstudio.com/items?itemName=huacnlee.autocorrect

AutoCorrect provides a Visual Studio Code extension. Once installed, it fully integrates AutoCorrect with Visual Studio Code, providing format-on-save and correction hints.

As shown below:

AutoCorrect for VS Code Extension

Intellij Platform Plugin

AutoCorrect for Intellij Platform Plugin

https://github.com/huacnlee/autocorrect-idea-plugin

GitHub Action

https://github.com/huacnlee/autocorrect-action

Add the following steps to your .github/workflows/ci.yml:

steps:
  - name: Check source code
    uses: actions/checkout@v3

  - name: AutoCorrect
    uses: huacnlee/autocorrect-action@main

GitLab CI

Add the following to your .gitlab-ci.yml to run the check with the huacnlee/autocorrect Docker image:

autocorrect:
  stage: build
  image: huacnlee/autocorrect:latest
  script:
    - autocorrect --lint
  # Enable allow_failure if you want.
  # allow_failure: true

Work with ReviewDog

Since: 2.8.0

AutoCorrect can work with reviewdog in CI/CD. reviewdog posts comments on your PR with AutoCorrect's suggested changes, which the PR author can then easily accept.

Use the --format rdjson option to output lint results in a format reviewdog supports:

autocorrect --lint --format rdjson | reviewdog -f=rdjson -reporter=github-pr-review

huacnlee/autocorrect-action can help you set this up in GitHub Actions.
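For reference, rdjson is reviewdog's Reviewdog Diagnostic Format encoded as JSON. In practice you should pipe AutoCorrect's own --format rdjson output as shown above; the sketch below only illustrates the shape of that format. The input fields (path, line, before, after, severity) are an assumed, simplified lint result, and the suggestion replaces the whole line, with columns counted as 1-based UTF-8 byte offsets as the rdjson format specifies.

```javascript
// Hypothetical sketch: convert simplified lint results into reviewdog's rdjson.
// The input shape { path, line, before, after, severity } is an assumption for
// illustration; use `autocorrect --lint --format rdjson` for real output.
const byteLength = (s) => new TextEncoder().encode(s).length;

function toRdjson(results) {
  return {
    source: { name: "autocorrect" },
    diagnostics: results.map((r) => {
      // rdjson columns are 1-based UTF-8 byte offsets. This range covers the
      // whole line and ends one column past its last byte (assumed exclusive).
      const range = {
        start: { line: r.line, column: 1 },
        end: { line: r.line, column: byteLength(r.before) + 1 },
      };
      return {
        message: `Suggested: ${r.after}`,
        location: { path: r.path, range },
        // AutoCorrect severities: 1 = error, 2 = warning.
        severity: r.severity === 2 ? "WARNING" : "ERROR",
        suggestions: [{ range, text: r.after }],
      };
    }),
  };
}

const rd = toRdjson([
  { path: "text.txt", line: 1, before: "你好Hello世界", after: "你好 Hello 世界", severity: 1 },
]);
console.log(JSON.stringify(rd));
```

Each CJK character here takes 3 bytes in UTF-8, so the 9-character line is 17 bytes long and its range ends at column 18.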

Use for programming

AutoCorrect can be used from many programming languages through its native SDKs (Node.js, JavaScript in the browser, Ruby, Python, Java).

Benchmark

MacBook Pro (13-inch, M1, 2020)

Use make bench to run benchmark tests.

See autocorrect/src/benches/example.rs for details.

format_050              time:   [8.2420 µs 8.2657 µs 8.2937 µs]
format_100              time:   [14.199 µs 14.246 µs 14.298 µs]
format_400              time:   [40.511 µs 41.923 µs 43.798 µs]
format_html             time:   [204.94 µs 208.61 µs 214.07 µs]
halfwidth_english       time:   [2.4983 µs 2.5541 µs 2.6293 µs]
format_json             time:   [54.037 µs 57.023 µs 61.821 µs]
format_javascript       time:   [102.81 µs 104.41 µs 106.92 µs]
format_json_2k          time:   [8.7609 ms 8.9099 ms 9.1201 ms]
format_jupyter          time:   [81.765 µs 83.038 µs 85.321 µs]
format_markdown         time:   [879.27 µs 894.86 µs 918.30 µs]

spellcheck_50           time:   [1.6012 µs 1.6122 µs 1.6306 µs]
spellcheck_100          time:   [3.0968 µs 3.1696 µs 3.2653 µs]
spellcheck_400          time:   [10.136 µs 10.478 µs 10.898 µs]

lint_markdown           time:   [937.57 µs 942.59 µs 949.15 µs]
lint_json               time:   [59.174 µs 60.302 µs 61.763 µs]
lint_html               time:   [238.03 µs 241.38 µs 245.77 µs]
lint_javascript         time:   [111.64 µs 113.05 µs 114.82 µs]
lint_yaml               time:   [348.56 µs 350.11 µs 352.80 µs]
lint_to_json            time:   [941.25 µs 948.95 µs 958.26 µs]
lint_to_diff            time:   [1.0573 ms 1.0823 ms 1.1134 ms]
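Each benchmark line above uses the Criterion.rs output format, where the bracketed triple is the lower bound, point estimate, and upper bound of a confidence interval for the time per iteration. The helper below (parseBenchLine, a hypothetical name) extracts those three values and normalises them to microseconds. This makes rows reported in different units, such as format_json_2k (ms) and format_050 (µs), easier to compare.

```javascript
// Sketch: parse a Criterion.rs-style result line into microseconds.
// The bracketed triple is [lower bound, point estimate, upper bound].
const UNIT_TO_US = {
  ns: 1e-3,
  "\u00b5s": 1, // micro sign
  "\u03bcs": 1, // Greek small letter mu
  us: 1,
  ms: 1e3,
  s: 1e6,
};

function parseBenchLine(line) {
  const m = line.match(/^(\S+)\s+time:\s+\[([^\]]+)\]/);
  if (!m) return null;
  const parts = m[2].trim().split(/\s+/); // e.g. ["8.2420", "µs", "8.2657", "µs", ...]
  if (parts.length !== 6) return null;
  const values = [];
  for (let i = 0; i < parts.length; i += 2) {
    const factor = UNIT_TO_US[parts[i + 1]];
    if (factor === undefined) return null; // unknown unit
    values.push(Number(parts[i]) * factor);
  }
  const [low, estimate, high] = values;
  return { name: m[1], low, estimate, high };
}

const r = parseBenchLine("format_json_2k          time:   [8.7609 ms 8.9099 ms 9.1201 ms]");
console.log(r.name, r.estimate.toFixed(1), "\u00b5s"); // format_json_2k 8909.9 µs
```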

Real world benchmark

Run against the MDN Translated Content project, which has about 30K files:

~/work/translated-content $ autocorrect --fix
AutoCorrect spend time: 8402.538ms

Other Extensions

Other implementations from the community.

User cases

License

This project is licensed under the MIT license.

autocorrect's People

Contributors

dependabot[bot], dunky-z, exhades, fulgari, huacnlee, kwanhur, messense, peterdavehello, tombener, yangtzech


autocorrect's Issues

YAML lint misses a case

responses:
  "400":
    description: "查询失败,请求参数错误."

Bad case: . -> 。 leaves a space after it

Raw

引进给变量, 转换为机器代码. 这意味着任何变量命名的概念都会被删除

To

- 引进给变量, 转换为机器代码。 这意味着任何变量命名的概念都会被删除
+ 引进给变量,转换为机器代码。这意味着任何变量命名的概念都会被删除

Bad case: Han in URL

Raw:

wiki/网页浏览器列表#基於WebKit排版引擎
https://zh.wikipedia.org/wiki/网页浏览器列表#基於WebKit排版引擎
- wiki/网页浏览器列表#基於 WebKit 排版引擎
+ wiki/网页浏览器列表#基於WebKit排版引擎
- https://zh.wikipedia.org/wiki/网页浏览器列表#基於 WebKit 排版引擎
+ https://zh.wikipedia.org/wiki/网页浏览器列表#基於WebKit排版引擎
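One way to avoid this class of bug is to mask URLs before applying the spacing rules and restore them afterwards. The sketch below shows that technique under simple assumptions: a URL is a scheme:// prefix followed by non-whitespace, and only a toy CJK/English spacing rule runs. It is not how AutoCorrect handles URLs, and a bare path such as wiki/网页浏览器列表#基於WebKit排版引擎 has no scheme, so it would still need separate handling; formatKeepingUrls is a hypothetical name.

```javascript
// Hypothetical sketch: mask URLs, apply the spacing rule, then restore them.
// Assumption: a URL is "scheme://" followed by non-whitespace characters.
const CJK = "\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\p{Script=Hangul}";
const spaceWord = (s) =>
  s
    .replace(new RegExp(`([${CJK}])([A-Za-z0-9])`, "gu"), "$1 $2")
    .replace(new RegExp(`([A-Za-z0-9])([${CJK}])`, "gu"), "$1 $2");

const URL_RE = /[a-z][a-z0-9+.-]*:\/\/\S+/gi;

function formatKeepingUrls(text) {
  const urls = [];
  // Replace each URL with a NUL-delimited index; NUL is neither CJK nor an
  // ASCII letter or digit, so the spacing rule never fires next to it.
  const masked = text.replace(URL_RE, (url) => {
    urls.push(url);
    return `\u0000${urls.length - 1}\u0000`;
  });
  return spaceWord(masked).replace(/\u0000(\d+)\u0000/g, (_, i) => urls[Number(i)]);
}

console.log(formatKeepingUrls("你好Hello世界 https://example.com/中文abc"));
// 你好 Hello 世界 https://example.com/中文abc
```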

spellcheck: replacement in codeblock without '```'

Hello, Markdown does support this code block style (for example, indenting lines with 4 spaces after a non-indented line).

Test word replacement in a code block like this:

    // this is a 'echo' command line demo to print a string: linux
    $ echo linux

    // another demo
    $ cmd-linux -g linux

But after running autocorrect, the word 'linux' is replaced with 'Linux':

    // this is a 'echo' command line demo to print a string: linux
    $ echo Linux

    // another demo
    $ cmd-linux -g Linux

If the code block above is wrapped with '```', it works correctly.

There may be two ways to solve this:

  1. Lines starting with '$' or '#' are command lines (though users sometimes don't explicitly add a '$' or '#' prompt) and should not be touched; otherwise, the demo commands will differ from what was originally intended.
  2. Detect this code block style and treat it like a code block wrapped with '```'.
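Option 2 relies on recognising indented code blocks. In CommonMark, an indented code block cannot interrupt a paragraph, so in plain prose a line indented by four or more spaces only starts a code block at the start of the document or after a blank line. The sketch below applies that simplified rule and returns the indices of such lines, so a spellchecker could skip them. indentedCodeLines is a hypothetical helper; it ignores tabs, list-item nesting, and fenced blocks.

```javascript
// Simplified sketch of CommonMark-style indented code block detection.
// Rule used: an indented code block cannot interrupt a paragraph, so a line
// indented by 4+ spaces starts one only at the start of the document or after
// a blank line. Tabs, list-item nesting, and fenced blocks are ignored.
function indentedCodeLines(markdown) {
  const lines = markdown.split("\n");
  const isBlank = (l) => l.trim() === "";
  const isIndented = (l) => l.startsWith("    ") && !isBlank(l);
  const code = new Set();
  let inCode = false;
  lines.forEach((line, i) => {
    if (isIndented(line)) {
      if (inCode || i === 0 || isBlank(lines[i - 1])) {
        inCode = true;
        code.add(i);
      }
    } else if (!isBlank(line)) {
      inCode = false; // a non-indented, non-blank line closes the block
    }
  });
  return code; // 0-based indices of lines inside indented code blocks
}
```

Applied to the example in this issue, the four indented lines are reported as code while the surrounding prose is not.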

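Can .autocorrectrc be configured in VS Code?

The extension description mentions support for a .autocorrectrc config file but doesn't explain how to configure it. Following the example, I added a .autocorrectrc file in the same directory as the document to format, but it has no effect. How do I configure .autocorrectrc in VS Code? Thanks!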

Support pre-commit hook

pre-commit supports custom hooks, so I hope autocorrect can support it.

For example, add a .pre-commit-hooks.yaml to this repo; users could then reference it in their local .pre-commit-config.yaml.

  • .pre-commit-hooks.yaml
-   id: autocorrect-lint
    name: autocorrect lint
    description: "Checks CJK files' copywriting"
    entry: autocorrect --lint
    language: rust
-   id: autocorrect-fix
    name: autocorrect fix
    description: "Fixes CJK files' copywriting"
    entry: autocorrect --fix
    language: rust
  • .pre-commit-config.yaml
- repo: https://github.com/huacnlee/autocorrect
  rev: v1.5.7
  hooks:
    - id: autocorrect-lint
    - id: autocorrect-fix

VS Code extension: Format document command error

The command "AutoCorrect: Format document" fails with an error (Error: ENOENT: no such file or directory, stat 'd:\Github\hugo.autocorrectrc').


Reinstalling the extension did not fix the problem.

[JS] Error when running wasm

Line 170 of node_modules/@huacnlee/autocorrect/autocorrect_bg.js, wasm.__wbindgen_add_to_stack_pointer(16);, throws an error in Chrome and the code cannot run:

Uncaught (in promise) TypeError: wasm.__wbindgen_add_to_stack_pointer is not a function

Code to reproduce:

    let [value, setValue] = React.useState(`# Hello你好!`);
    let Out1;
    autocorrect.then((autocorrect) => {
        Out1 = autocorrect.format(value); // throws here
    });

Bad case: <code>

- 关键字和<code>Promise</code>构造器创建它的对象
+ 关键字和 <code>Promise</code> 构造器创建它的对象

Bad case: Avoid add space near the link in Markdown file

它指向一个[示例](#示例)
它无需[握手](https://zh.wikipedia.org/wiki/握手_(技术))或改进、完善现有条。
- 它指向一个 [示例](#示例)
+ 它指向一个[示例](#示例)
- 它无需[握手](https://zh.wikipedia.org/wiki/握手_(技术)) 或改进、完善现有条。
+ 它无需[握手](https://zh.wikipedia.org/wiki/握手_(技术))或改进、完善现有条。

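Very long text (200,000 characters) fails to finish

Hello, thanks for sharing this. It works very well on short articles, with high speed and accuracy, but I ran into a problem when correcting long articles.

With 200,000 characters on a 10th-gen Core i5 using 8 threads, it ran for about 24 hours and still had not finished.

  • Is there a character limit?
  • What is the recommended maximum number of characters?

Thanks!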

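Extension runtime error; formatting has no effect

Extensions -> Runtime Status -> Uncaught Errors (1)

path should be a path.relative()d string, but got "............\Programs\VSCode\data\user-data\User\settings.json"

I'm using the portable version of VS Code, which stores its data and settings under the VS Code directory (the path shown in the error); the error may be related to this.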


Markdown support

It works well most of the time. Thanks for maintaining it.

But the VS Code extension doesn't seem to support linting Markdown files?


Also:

I tried enabling the extension in VS Code:

"autocorrect.enable": true,
"autocorrect.formatOnSave": true,

Then create a new Markdown file, type N~2~为..., and save. It gets converted to N~2～为... (the second ~ becomes a full-width character), which is probably unintended.

Temporarily disabling AutoCorrect and saving:

<!-- autocorrect: false -->
N~2~为...
<!-- autocorrect: true -->

The temporary disable has no effect; the text is still corrected.

spellcheck: replacement happen in the url alias map

Original:

这里是 [链接][1]。
...

[1]: https://example.com/xxx/yyy/zzz-linux

After running AutoCorrect:

这里是 [链接][1]。
...

[1]: https://example.com/xxx/yyy/zzz-Linux

The word linux has been wrongly replaced by Linux; this should not happen.

Need space around links.

Based href

Maybe

- 这是[link链接](https://google.com/a/b/url不处理)测试

should be like

- 这是 [link链接](https://google.com/a/b/url不处理) 测试


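Chinese full-width quotation marks

There should be no need to add a space before Chinese full-width quotation marks, but currently a space is added automatically. I'm not sure whether this is expected. For example:

你好“世界“

is formatted as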

你好 “世界”

Wikilinks in Markdown should not have spaces added

A [[wikilink]] is a jump keyword and should not have spaces added; otherwise, the link no longer works.

- [[2021年十大漏洞利用]]
+ [[2021 年十大漏洞利用]]
- [[从研究者视角看漏洞研究之2010年代]]
+ [[从研究者视角看漏洞研究之 2010 年代]]

C++ not supported?

It seems C++ source files are not checked currently?

Example file: hello.cc

// 你好world

Run autocorrect:

$ ./autocorrect hello.cc
(no output)

Rename .cc to .txt:

$ mv hello.cc hello.txt
$ ./autocorrect hello.txt
// 你好 world

Bad case: String in URL

- slug: Web/JavaScript/Reference/Operators/async 允许声明一个函数为一个包含异步操作的函数
+ slug: Web/JavaScript/Reference/Operators/async允许声明一个函数为一个包含异步操作的函数

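[Bug] Lint navigation fails

AutoCorrect shows correction suggestions in the VS Code Problems panel, but clicking a suggestion does not jump to the location; it reports that it cannot open...: Unable to resolve resource...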

