Comments (2)
I think this is just a case sensitivity issue.
$ grep -i '^ned' words.txt
NED
Neda
NEDC
Nedda
nedder
Neddy
Neddie
neddies
Neddra
Nederland
Nederlands
Nedi
Nedra
Nedrah
Nedry
Nedrow
Nedrud
While it would be nice for these files to be perfectly formatted, this is a good reminder to clean your data before doing calculations.
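For anyone doing the lookup in Python rather than with grep, here is a minimal sketch of the same case-insensitive prefix match. The helper name `find_prefix` and the small sample list are made up for illustration; in practice you would read the lines of words.txt instead.

```python
def find_prefix(words, prefix):
    """Return the words that start with `prefix`, ignoring case.

    The original spelling and order of each word are preserved.
    """
    p = prefix.casefold()
    return [w for w in words if w.casefold().startswith(p)]


# Hypothetical sample standing in for the contents of words.txt.
sample = ["NED", "Neda", "nedder", "need", "Nederland"]

print(find_prefix(sample, "ned"))
# ['NED', 'Neda', 'nedder', 'Nederland']
```

Normalizing only at comparison time keeps the original capitalization available, which matters if proper nouns such as "Nederland" need to stay distinguishable later.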
from english-words.
This problem does exist, however. I found 25 words that are in words_alpha.txt but missing from words.txt, using these Python 3 commands (pasted here for reference):
> import requests
> r = requests.get('https://raw.githubusercontent.com/dwyl/english-words/words.txt')
> r.status_code
200
> w = set(r.text.lower().split())
> len(w)
466546
> r = requests.get('https://raw.githubusercontent.com/dwyl/english-words/words_alpha.txt')
> r.status_code
200
> wa = set(r.text.lower().split())
> len(wa)
370103
> missing = wa - w
> len(missing)
25
> missing
{'preinferredpreinferring', 'stegnosisstegnotic', 'tangantangan', 'false', 'sturdiersturdies', 'peroxidicperoxiding', 'gynecicgynecidal', 'coevolvedcoevolves', 'preobtrudingpreobtrusion', 'kestrelkestrels', 'aliyahaliyahs', 'coracoprocoracoid', 'cylindrocylindric', 'killeekillee', 'antinganting', 'epigonousepigons', 'snailfishessnailflower', 'outwardsoutwarred', 'regeneratoryregeneratress', 'cryptocurrency', 'quadriquadric', 'subsultorysubsultus', 'brigantinebrigantines', 'caducecaducean', 'hypophypophysism'}
Note that there is a separate problem here: several of these entries look like two words that were somehow merged together (e.g. "kestrelkestrels"). Even so, not every word in words_alpha.txt is in words.txt (e.g. "false").
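One way to separate the merged entries from the genuinely missing words is to test whether an entry splits into two words that both already exist in the other list. This is only a sketch: `split_merged` is a hypothetical helper, and the tiny `vocab` and `missing` sets below stand in for the full `w` and `missing` sets from the session above.

```python
def split_merged(word, vocab):
    """Return every (left, right) split of `word` where both halves are in `vocab`."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in vocab and word[i:] in vocab
    ]


# Hypothetical stand-ins: `vocab` plays the role of the words.txt set,
# `missing` the role of the set difference computed earlier.
vocab = {"kestrel", "kestrels", "aliyah", "aliyahs"}
missing = {"kestrelkestrels", "aliyahaliyahs", "false"}

for entry in sorted(missing):
    splits = split_merged(entry, vocab)
    if splits:
        print(entry, "looks merged:", splits)
    else:
        print(entry, "looks genuinely missing")
```

With a full word list, very short entries (single letters, two-letter words) can produce spurious splits, so it may be worth requiring a minimum length for each half.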