Comments (14)
Also is "giggish" actually a word?
Yes it is: https://www.wordnik.com/words/giggish
from english-words.
@NadavB I suspect the apostrophes may have been removed by mistake. Feel free to re-add them. PR gladly accepted.
How?
@NadavB As far as I can tell, it looks like the process would be ad hoc for this project. Open words.txt in a text editor, use a regex to find pairs of lines like (.*nt$).*n't$ and delete the lines that look bad. Then, remove copies of the deleted lines from words_alpha, update the corresponding zip files (why?), and submit a pull request.
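If a text editor is awkward for this, the same search can be sketched in a few lines of Python. This is only a rough illustration, not anything from the repository: the function name find_apostrophe_stripped and the sample list are made up, and it assumes the bad entries are words ending in "nt" whose "n't" form is also present in the list.

```python
def find_apostrophe_stripped(words):
    """Return words ending in "nt" whose "n't" form is also in the list.

    Hypothetical helper: it only flags candidates for manual review,
    it does not decide which entries are actually wrong.
    """
    present = set(words)
    suspects = []
    for word in words:
        if word.endswith("nt"):
            # Rebuild the contraction, e.g. "arent" -> "aren't".
            contraction = word[:-2] + "n't"
            if contraction in present:
                suspects.append(word)
    return suspects


# Made-up sample data, not taken from words.txt.
sample = ["arent", "aren't", "ant", "cant", "can't", "plant"]
print(find_apostrophe_stripped(sample))
```

Note that legitimate words can be flagged too ("cant" is a real word), so the output still needs a manual look before anything is deleted.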
But where is the current code that generated words_alpha.txt from words.txt so we can modify it?
I don't think it was ever committed. What I see in the history is that someone just added a words_alpha file, and other people modified it directly.
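Since the original script is not available, anyone wanting a reproducible step would have to write a new one. A minimal sketch follows, assuming (this is a guess from the file name, not something confirmed in this thread) that words_alpha is meant to hold only the entries of words.txt made up entirely of ASCII letters. The function name alpha_only and the sample list are hypothetical.

```python
def alpha_only(words):
    """Keep only entries consisting entirely of ASCII letters.

    Assumption: this is the intended rule behind words_alpha; the
    actual rule used by the original authors is unknown.
    """
    return [w for w in words if w.isascii() and w.isalpha()]


# Made-up sample data, not taken from words.txt.
sample = ["aren't", "2", "giggish", "co-op", "aaa"]
print(alpha_only(sample))
```

A filter like this only removes entries with digits or punctuation. It would not catch nonwords such as "aaa", which need a better source list rather than a character filter.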
Also is "giggish" actually a word?
@LameLemon I couldn't find a definition for "giggish," and it looks like it came from the original infochimps dataset. You can probably remove it.
To address the original issue of "are strange words a bug," I think we should say no and close the thread. The underlying reason for the presence of nonwords is the choice of data sources. More carefully curated corpora either cost more or have fewer words.
@dbrakman so can you commit it please? Otherwise people can't contribute to it...
@NadavB I understand why it should be committed, but I don't have that script. I didn't make these lists.
Ahh, I understand. So if one of the authors sees this thread, please commit it, thanks...
@dbrakman It won't help. The word "aren" is found in words.txt as well. So unless someone shows how words.txt was extracted from the corpus, I don't think this repository is usable at all.
'aaa' isn't a word either
H