The Oxford Dictionary defines wordhoard as a supply of words or a lexicon. Wordhoard is a Python 3 module that can be used to obtain antonyms, synonyms and definitions for words.
This Python module was spawned from a Stack Overflow bountied question. That question forced me to look into the best practices for obtaining comprehensive lists of synonyms for a given word. During my research, I developed the repository synonym discovery and aggregation and decided to create wordhoard.
Textual analysis is a broad term for various research methodologies used to qualitatively describe, interpret and understand text data. These methodologies are mainly used in academic research to analyze content related to media and communication studies, popular culture, sociology, and philosophy. Textual analysis allows these researchers to quickly obtain relevant insights from unstructured data. All types of information can be gleaned from textual data, especially from social media posts or news articles. Some of this information includes the overall concept of the subtext, symbolism within the text, assumptions being made and potential relative value to a subject (e.g. data science). In some cases it is possible to deduce the relative historical and cultural context of a body of text using analysis techniques coupled with knowledge from different disciplines, like linguistics and semiotics.
Word frequency is the technique used in textual analysis to measure the frequency of a specific word or word grouping within unstructured data. Measuring the number of word occurrences in a corpus allows a researcher to garner interesting insights about the text. A subset of word frequency is the correlation between a given word and that word's relationship to its antonyms and synonyms within the specific corpus being analyzed. Knowing these relationships is critical to improving word frequencies and topic modeling.
Wordhoard was designed to help researchers performing textual analysis build more comprehensive lists of antonyms and synonyms.
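To make the link between word frequency and synonyms concrete, here is a minimal sketch that counts words in a text while folding each synonym into a single head word. The function name grouped_frequency, the sample text and the synonym group are all hypothetical and are not part of wordhoard; in practice the synonym list could come from one of the wordhoard queries shown later. The tokenizer is deliberately simple and only matches single-word synonyms.

```python
from collections import Counter
import re


def grouped_frequency(text, synonym_groups):
    """Count word occurrences, folding each synonym into its head word."""
    # Map every synonym (lowercased) to the head word it is counted under.
    lookup = {}
    for head, synonyms in synonym_groups.items():
        lookup[head.lower()] = head
        for synonym in synonyms:
            lookup[synonym.lower()] = head
    # Simple tokenizer: runs of letters and apostrophes, lowercased.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(lookup.get(token, token) for token in tokens)


# Hypothetical sample data for illustration only.
sample_text = "My mom called. Her mum and my mother talk daily."
sample_groups = {"mother": ["mom", "mum", "mama"]}

frequencies = grouped_frequency(sample_text, sample_groups)
print(frequencies["mother"])  # 3: 'mom', 'mum' and 'mother' are counted together
```

Without the synonym group, 'mother' would be counted once; with it, the three related words contribute to the same total.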
Install the distribution via pip:
pip3 install wordhoard
from wordhoard import antonyms
results = antonyms.query_synonym_com('mother')
print(results)
['father', 'male parent', 'child', 'descendant', 'follower']
from wordhoard import antonyms
antonyms_01 = antonyms.query_synonym_com('mother')
antonyms_02 = antonyms.query_thesaurus_com('mother')
antonyms_03 = antonyms.query_thesaurus_plus('mother')
antonyms_results = sorted(set([y for x in [antonyms_01, antonyms_02, antonyms_03] for y in x]))
print(antonyms_results)
['abort', 'begetter', 'brush', 'brush aside', 'brush off', 'child', 'dad', 'daughter', 'descendant', 'effect', 'end', 'father', 'follower',
'forget', 'ignore', 'lose', 'male parent', 'miscarry', 'neglect', 'offspring', 'overlook', 'result', 'slight']
from wordhoard import synonyms
results = synonyms.query_synonym_com('mother')
print(results)
['female parent', 'ma', 'mama', 'mamma', 'mammy', 'mater', 'mom', 'momma', 'mommy', 'mother-in-law', 'mum', 'mummy', 'para I', 'parent',
'primipara', 'puerpera', 'quadripara', 'quintipara', 'supermom', 'surrogate mother']
from wordhoard import synonyms
synonym_01 = synonyms.query_collins_dictionary_synonym('mother')
synonym_02 = synonyms.query_synonym_com('mother')
synonym_03 = synonyms.query_thesaurus_com('mother')
synonym_04 = synonyms.query_thesaurus_plus('mother')
synonym_results = sorted(set([y for x in [synonym_01, synonym_02, synonym_03, synonym_04] for y in x]))
print(synonym_results)
['ancestor', 'antecedent', 'architect', 'author', 'begetter', 'beginning', 'child-bearer', 'creator', 'dam', 'female parent', 'forebearer', 'forefather', 'foster mother', 'founder', 'fount', 'fountain', 'fountainhead', 'genesis', 'inspiration', 'inventor', 'lady', 'ma', 'maker', 'mam', 'mama', 'mamma', 'mammy', 'mater', 'materfamilias', 'matriarch', 'mom', 'momma', 'mommy', 'mother-in-law', 'mum', 'mummy', 'nurse', 'old lady', 'old woman', 'origin', 'originator', 'para I', 'parent', 'predecessor', 'primipara', 'procreator', 'producer', 'progenitor', 'provenience', 'puerpera', 'quadripara', 'quintipara', 'sire', 'source', 'spring', 'start', 'stimulus', 'supermom', 'surrogate mother', 'wellspring']
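The aggregation pattern used in the examples above (flatten, deduplicate, sort) can be wrapped in a small helper. This is a sketch, not part of wordhoard: the name merge_results is hypothetical, and the guard against None or empty results is an assumption about how a source might respond when it has no match.

```python
def merge_results(*result_lists):
    """Combine several result lists into one sorted list without duplicates."""
    merged = set()
    for results in result_lists:
        # Assumption: a source may return None or an empty list when it has no match.
        if results:
            merged.update(results)
    return sorted(merged)


# Stand-in lists are used here instead of live queries.
combined = merge_results(['ma', 'mom'], None, ['mom', 'parent'], [])
print(combined)  # ['ma', 'mom', 'parent']
```

With the real module, the call would look like merge_results(synonym_01, synonym_02, synonym_03, synonym_04).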
from wordhoard import synonyms
list_of_words = ['mother', 'daughter', 'father', 'son']
synonyms_results = {}
for word in list_of_words:
    results = synonyms.query_synonym_com(word)
    synonyms_results[word] = results
for key, value in synonyms_results.items():
    print(key, value)
mother ['female parent', 'ma', 'mama', 'mamma', 'mammy', 'mater', 'mom', 'momma', 'mommy', 'mother-in-law', 'mum', 'mummy', 'para I', 'parent', 'primipara', 'puerpera', 'quadripara', 'quintipara', 'supermom', 'surrogate mother']
daughter ['female offspring', 'girl', "mother's daughter"]
father ['begetter', 'dad', 'dada', 'daddy', 'father-in-law', 'male parent', 'old man', 'pa', 'papa', 'pappa', 'parent', 'pater', 'pop']
son ['Jnr', 'Jr', 'Junior', 'boy', 'male offspring', "mama's boy", "mamma's boy", 'man-child', "mother's boy"]
from wordhoard import dictionary
results = dictionary.query_synonym_com('mother')
print(results)
['a woman who has given birth to a child (also used as a term of address to your mother)']
from wordhoard import dictionary
definition_01 = dictionary.query_collins_dictionary_synonym('mother')
definition_02 = dictionary.query_synonym_com('mother')
definition_03 = dictionary.query_thesaurus_com('mother')
definition_results = [y for x in [definition_01, definition_02, definition_03] for y in x]
print(definition_results)
["a person's own mother", 'a woman who has given birth to a child (also used as a term of address to your mother)', 'female person who has borne children']
A more advanced example is provided in the example script nlp synonym use case.
wordhoard uses an in-memory dictionary cache, which helps prevent redundant queries to an individual resource for the same word. This application also uses Python logging to both the terminal and to the logfile wordhoard_error.yaml.
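The idea behind an in-memory dictionary cache can be sketched as follows. This is an illustration of the general technique only and does not reproduce wordhoard's internal code: the names _cache, cached_query and fake_query are hypothetical, and fake_query stands in for a real network lookup.

```python
# Hypothetical module-level cache keyed by (source name, word).
_cache = {}


def cached_query(source_name, word, query_function):
    """Return cached results for a word, calling query_function only on a miss."""
    key = (source_name, word.lower())
    if key not in _cache:
        _cache[key] = query_function(word)
    return _cache[key]


# Stand-in for a real query; it records each call so cache hits are visible.
calls = []


def fake_query(word):
    calls.append(word)
    return [word + '-synonym']


first = cached_query('example_source', 'mother', fake_query)
second = cached_query('example_source', 'mother', fake_query)
print(first, len(calls))  # ['mother-synonym'] 1
```

The second lookup is served from the dictionary, so the underlying source is contacted only once per word.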
This package is designed to query these online sources for antonyms, synonyms and definitions:
- collinsdictionary.com
- wordnet.princeton.edu
- synonym.com
- thesaurus.com
- thesaurus.plus
This package has these dependencies:
- BeautifulSoup
- lxml
- requests
- urllib3
The MIT License (MIT). Please see License File for more information.