Russian Words Clusters offers a way to cluster Russian words by several criteria (a common stem, or the proximity of vowels or consonants).
License: GNU General Public License v3.0
Python 98.56%
Makefile 1.44%
russianwordsclusters's Issues
Group formed today:
видеться/у-
обижать/обидеть
предвидеть
But we want:
видеться/у-
предвидеть
обижать/обидеть
Possible solutions:
- either do not recognize the relation, which excludes обижать/обидеть (see the TODO comment in the first case of the compare function)
- or have 2 relations: a Relation.STEM for видеться/у- and предвидеть, and a new Relation.lowerSTEM for обижать/обидеть and предвидеть
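The second option could look roughly like the sketch below. Only the names Relation.STEM and Relation.lowerSTEM come from this issue; the numeric ranks, the hand-written RELATIONS table, and the helpers strongest_relation and order_group are hypothetical and are not the repository's actual compare function.

```python
from enum import Enum


class Relation(Enum):
    # Member names come from the issue; the numeric ranks are an assumption.
    STEM = 2
    lowerSTEM = 1


# Hand-written relations for illustration only (hypothetical data).
RELATIONS = {
    frozenset({"видеться/у-", "предвидеть"}): Relation.STEM,
    frozenset({"обижать/обидеть", "предвидеть"}): Relation.lowerSTEM,
}


def strongest_relation(word, group, relations):
    """Return the rank of the strongest relation between word and any other group member."""
    best = 0
    for other in group:
        if other == word:
            continue
        rel = relations.get(frozenset({word, other}))
        if rel is not None:
            best = max(best, rel.value)
    return best


def order_group(group, relations):
    """Put STEM-related words first and lowerSTEM-related words after them.

    sorted() is stable, so words with equal rank keep their input order.
    """
    return sorted(group, key=lambda w: -strongest_relation(w, group, relations))


group = ["видеться/у-", "обижать/обидеть", "предвидеть"]
ordered = order_group(group, RELATIONS)
```

With this ranking, обижать/обидеть stays in the group but is listed after the two words linked by Relation.STEM, which matches the wanted order.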
POC with different levels of scoring:
- 1 for words having a shared root. Ex: делиться with разделять / выделяться...
- 0.9 for words with consonant or vowel transformation
Objective: generate clusters with a shared root as the priority, then letter transformations
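A minimal sketch of this two-level scoring, under loud assumptions: the ROOTS table is hand-written for illustration, and a "transformation" is approximated as two roots of equal length that differ in exactly one letter (for example друг / друж). The function names score and ranked_pairs are hypothetical, not part of the repository.

```python
from itertools import combinations

# Hypothetical word -> root table; a real implementation would derive the roots.
ROOTS = {
    "делиться": "дел",
    "разделять": "дел",
    "выделяться": "дел",
    "друг": "друг",
    "дружить": "друж",
}


def differs_by_one_letter(a, b):
    """Assumed stand-in for a consonant or vowel transformation between two roots."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def score(w1, w2, roots=ROOTS):
    """1.0 for a shared root, 0.9 for a one-letter transformation, else 0.0."""
    r1, r2 = roots[w1], roots[w2]
    if r1 == r2:
        return 1.0
    if differs_by_one_letter(r1, r2):
        return 0.9
    return 0.0


def ranked_pairs(words, roots=ROOTS):
    """Return related pairs, shared-root pairs first, then transformations."""
    pairs = [(score(a, b, roots), a, b) for a, b in combinations(words, 2)]
    return sorted((p for p in pairs if p[0] > 0), key=lambda p: -p[0])


pairs = ranked_pairs(list(ROOTS))
```

Sorting by descending score gives shared-root pairs priority, so a clustering step that consumes the pairs in this order would merge them before any pair related only by a letter transformation.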
- clean repo
- pip
- README
- rebase