Comments (13)
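The current plan is:
- Set up a separate repository for each dictionary source to handle its maintenance and releases, then provide a consolidated list of download links in this repository
- Provide configuration options so users can easily swap dictionaries or combine several of them
The purpose of this issue is to collect reliable dictionary sources. A reliable source must satisfy the following:
- No copyright problems (it must explicitly state that free use, processing, and redistribution are allowed)
- Reliable content (for example, a dictionary reposted on some forum, with no traceable original author and no way to maintain it, does not qualify)
- Easy to process with a program (for example, PDFs and web pages do not qualify)
Everyone is welcome to share suitable dictionaries here.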
from rime-easy-en.
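The default dictionary comes from https://github.com/skywind3000/ECDICT
Generation tool: https://github.com/BlindingDark/rime_easy_eng_dict
By default, the tool exports words that either have a frequency value or are at most 9 characters long, and it excludes words containing digits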
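The export rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the actual code of rime_easy_eng_dict. It assumes the input is a CSV with a `word` column and two frequency columns named `bnc` and `frq` (empty or 0 when unknown); the function names and the sample data are hypothetical.

```python
import csv
import io

MAX_LEN = 9  # length cutoff quoted in the comment above


def should_export(row):
    """Return True if a row passes the export rule as described in the thread."""
    word = row["word"].strip()
    # Words containing digits are always excluded.
    if not word or any(ch.isdigit() for ch in word):
        return False
    # Assumption: "bnc" and "frq" hold frequency ranks, empty or 0 when unknown.
    has_freq = any(int(row.get(col) or 0) > 0 for col in ("bnc", "frq"))
    return has_freq or len(word) <= MAX_LEN


def filter_rows(csv_text):
    """Return the words from CSV text that pass should_export, in input order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["word"] for row in reader if should_export(row)]


# Hypothetical sample data, not taken from ECDICT.
SAMPLE = """word,bnc,frq
apple,1500,1200
internationalization,0,0
mp3,0,0
zyzzyva,,
counterrevolutionary,9000,0
"""

if __name__ == "__main__":
    print(filter_rows(SAMPLE))
```

In this sample, `internationalization` is dropped because it is long and has no frequency, and `mp3` is dropped because it contains a digit.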
from rime-easy-en.
Word frequency corpora
- Global English
- Notes on reading the frequency markers in English learner's dictionaries
- Dictionary of word-sense proportions
Learn these words first
New General Service List
- New General Service List
- General Service List - Wikipedia
- New General Service List - Wikipedia
- NGSL by Frequency | Quizlet
COCA
- COCA Word frequency: based on 450 million word COCA corpus
- Corpus of Contemporary American English (COCA)
- Word frequency: based on 450 million word COCA corpus
- COCA Frequency list: frequency rankings for 60,000 words
BNC - British National Corpus
- [bnc] British National Corpus
- BNCweb registration - Main
- BNC database and word frequency lists
- British National Corpus (BYU-BNC)
- Paul Nation Vocabulary Lists based on BNC
- Companion Website for: Word Frequencies in Written and Spoken English
- BNC 15000 - Sustainable English
Oxford words list
- Oxford 3000 and 5000 | OxfordLearnersDictionaries.com
- About Word Lists at Oxford Learner's Dictionaries | OxfordLearnersDictionaries.com
Longman Communication 3000 and 9000
- Longman Communication 3000
- sapbmw/Longman-Communication-3000: Longman Communication 3000 word list, English Words List - Learn English Words
- Longman Communication 9000 from LDOCE6 - PDAWIKI
- Longman Communication 9000
google 10000
Longman Defining Vocabulary
- Longman Defining Vocabulary/alphabetically - FrathWiki
- The Longman Defining Vocabulary
- Longman Defining Vocabulary - FrathWiki
Macmillan Dictionary Common English Words
Famous Freq Lists - Lextutor.ca
- VOA Special English Word Book
- Wikipedia:Lists of common misspellings - Wikipedia
- Hyper Collocation — dictionary based on arXiv repository
- Media Language Corpus (MLC)
- Open American National Corpus | Open Data for Language Research and Education
- Richard Kennaway's Constructed Languages List
- Mills Basic Vocabulary - FrathWiki
- Basic English - Wikipedia
- jjzz/ZZ-WordFreq: words frequency top100k from BNC/ANC/COCA, dsl format, for goldendict
- [Updated 2016.4.26] 170,000-word BNC+ANC+COCA frequency dictionary
- Update: Word Frequency of 170,000 Words
- Just The Word: collocation frequency
from rime-easy-en.
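@VimWei Please try to post dictionary sources that are free of copyright doubts and that a program can process. For example, we probably have no right to directly use study materials from universities, and some of these are PDFs, wiki links, and the like, none of which are easy to process. Could you put together a filtered list?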
from rime-easy-en.
Specialized word lists: https://pinyin.sogou.com/dict/
Specialized dictionaries: https://www.pdawiki.com/forum/forum.php?mod=viewthread&tid=42412
from rime-easy-en.
Specialized word lists: https://pinyin.sogou.com/dict/
Specialized dictionaries: https://www.pdawiki.com/forum/forum.php?mod=viewthread&tid=42412
- No copyright clearance
- Cannot be processed by a program
from rime-easy-en.
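@VimWei Please try to post dictionary sources that are free of copyright doubts and that a program can process. For example, we probably have no right to directly use study materials from universities, and some of these are PDFs, wiki links, and the like, none of which are easy to process. Could you put together a filtered list?
Right, the list above is simply an export of my browser bookmarks and has not been sorted out at all. Still, there is no need for us to turn every available resource into a ready-made dictionary.
Suggestion: offer one or two representative dictionaries as examples, provide a mechanism for using custom dictionaries plus a tutorial on how to build one, and leave the rest for users to work out themselves.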
from rime-easy-en.
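a mechanism for using custom dictionaries plus a tutorial on how to build one
Ordinary users have neither the skills nor the energy for that.
Dictionaries are meant for end users, and the purpose of this issue is to collect reliable dictionary sources. A reliable source must satisfy the following:
- No copyright problems (it must explicitly state that free use, processing, and redistribution are allowed)
- Reliable content (for example, a dictionary reposted on some forum, with no traceable original author and no way to maintain it, does not qualify)
- Easy to process with a program (for example, PDFs and web pages do not qualify)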
from rime-easy-en.
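Purely open-source material free of copyright restrictions is indeed very scarce, and it does not work well in practice either.
Better to ignore the material above; it only serves to illustrate what a word-frequency corpus is and what a specialized word list is.
PS: People who use Rime probably enjoy tinkering... they can hardly be called ordinary users... I downloaded it once, gave up, and only picked it up again in the last couple of days...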
from rime-easy-en.
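Purely open-source material free of copyright restrictions is indeed very scarce
ECDICT is the most reliable open-source dictionary I have been able to find; we can build a few trimmed and patched variants around it.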
from rime-easy-en.
wiktionary: https://en.wiktionary.org/
Wiktionary is a wiki, which means that you can edit it, and all the content is dual-licensed under both the Creative Commons Attribution-ShareAlike 3.0 Unported License and the GNU Free Documentation License.
Although the original is web-based, there are already quite a few mdx dictionaries built from it, so conversion should be fairly easy.
from rime-easy-en.
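Rime is a very good input method program, but it has some significant shortcomings. One of them is building and curating dictionaries.
There are two ways to make a dictionary more efficient: the words I need are in it, and the words I don't need are not. Focusing on only one of them, for example simply enlarging the dictionary, does not improve efficiency.
At present Rime seems unable to delete words that ship in an existing dictionary (the advertised ctr+del, shift+del, and ctr+k shortcuts can delete user-created words but not words from the dictionary, and even lowering the weight of such words is hard; it occasionally works, but very unreliably). Some words are never used by typical users, and removing them from the dictionary would make input more efficient. Could the program be modified so that any word or phrase can be deleted?
Solving the deletion problem would also let users strip out private words and then share their personal dictionaries, from which a good dictionary could be distilled.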
from rime-easy-en.
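Could the program be modified
That is outside the scope of this issue. Please send that feedback to the Rime project.
That said, you can edit the dictionary directly with a text editor.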
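For longer removal lists, the same manual edit can be scripted. The sketch below assumes the usual Rime `*.dict.yaml` layout: a YAML header that ends with a line containing only `...`, followed by tab-separated entries whose first field is the word. The function name and the sample dictionary are hypothetical, and Rime will most likely need to be redeployed before the change takes effect.

```python
def remove_entries(dict_text, unwanted):
    """Return dict_text with entries whose first tab-separated field is in `unwanted` removed."""
    unwanted = set(unwanted)
    out = []
    in_body = False
    for line in dict_text.splitlines():
        if not in_body:
            # Copy the YAML header unchanged; "..." marks its end.
            out.append(line)
            if line.strip() == "...":
                in_body = True
            continue
        word = line.split("\t", 1)[0]
        # Keep blank lines and comments; drop only matching entries.
        if line and not line.startswith("#") and word in unwanted:
            continue
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample dictionary in the assumed layout.
SAMPLE = (
    "# hypothetical dictionary\n"
    "---\n"
    "name: example_en\n"
    'version: "0.1"\n'
    "sort: by_weight\n"
    "...\n"
    "\n"
    "apple\tapple\t100\n"
    "buts\tbuts\t5\n"
    "vs\tvs\t3\n"
)

if __name__ == "__main__":
    print(remove_entries(SAMPLE, {"buts", "vs"}), end="")
```

Working on a copy of the dictionary file and comparing the result with the original before replacing it is a reasonable precaution.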
from rime-easy-en.