
wordhoard's Introduction

Overview


Primary Use Case

Textual analysis is a broad term for various research methodologies used to qualitatively describe, interpret and understand text data. These methodologies are mainly used in academic research to analyze content related to media and communication studies, popular culture, sociology, and philosophy. Textual analysis allows these researchers to quickly obtain relevant insights from unstructured data. All types of information can be gleaned from textual data, especially from social media posts or news articles. Some of this information includes the overall concept of the subtext, symbolism within the text, assumptions being made and potential relative value to a subject (e.g. data science). In some cases it is possible to deduce the relative historical and cultural context of a body of text using analysis techniques coupled with knowledge from different disciplines, like linguistics and semiotics.

Word frequency is the technique used in textual analysis to measure the frequency of a specific word or word grouping within unstructured data. Measuring the number of word occurrences in a corpus allows a researcher to garner interesting insights about the text. A subset of word frequency analysis is the correlation between a given word and that word's relationship to antonyms and synonyms within the specific corpus being analyzed. Knowing these relationships is critical to improving word frequencies and topic modeling.

WordHoard was designed to assist researchers performing textual analysis to build more comprehensive lists of antonyms, synonyms, hypernyms, hyponyms and homophones.

Installation

Install the distribution via pip:

pip3 install wordhoard

General Package Utilization

Please reference the WordHoard Documentation for package usage guidance and parameters.

Sources

This package is currently designed to query these online sources for antonyms, synonyms, hypernyms, hyponyms and definitions:

  1. classicthesaurus.com
  2. collinsdictionary.com
  3. merriam-webster.com
  4. synonym.com
  5. thesaurus.com
  6. wordhippo.com
  7. wordnet.princeton.edu

Dependencies

This package has these core dependencies:

  1. backoff
  2. beautifulsoup4
  3. deckar01-ratelimit
  4. deepl
  5. lxml
  6. requests
  7. urllib3

Additional details on this package's dependencies can be found here.

Development Roadmap

If you would like to contribute to the WordHoard project please read the contributing guidelines.

Items currently under development:

  • Expanding the list of hypernyms, hyponyms and homophones
  • Adding part-of-speech filters in queries

Issues

This repository is actively maintained. Feel free to open any issues related to bugs, coding errors, broken links or enhancements.

You can also contact me at John Bumgarner with any issues or enhancement requests.

Sponsorship

If you would like to contribute financially to the development and maintenance of the WordHoard project please read the sponsorship information.

License

The MIT License (MIT). Please see License File for more information.

Author

Copyright (c) 2020 John Bumgarner

wordhoard's People

Contributors

gorluxor, johnbumgarner


wordhoard's Issues

find_definitions can return antonyms

Find definitions will sometimes return antonyms because the antonym cache is checked by mistake.

    Returns
    ----------
    :return: list of definitions
    :rtype: list
    """
    valid_word = self._validate_word()
    if valid_word:
        check_cache = caching.cache_antonyms(self._word)
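The effect of routing definition lookups through the antonym cache can be illustrated with a minimal sketch. The cache structures and helper names here are hypothetical stand-ins, not wordhoard's actual internals:

```python
# Minimal illustration of the cache-routing bug described above.
# The cache dictionaries and function names are hypothetical.
antonym_cache = {"mother": ["father"]}
definition_cache = {"mother": ["a female parent"]}

def find_definitions_buggy(word):
    # Bug: definitions are looked up in the antonym cache,
    # so a cached word returns its antonyms.
    return antonym_cache.get(word)

def find_definitions_fixed(word):
    # Fix: consult the definition cache instead.
    return definition_cache.get(word)
```

Once a word's antonyms have been cached, the buggy path returns them in place of definitions, which matches the behavior reported in the issues below.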

TypeError: object of type 'NoneType' has no len()

Hi John,

Thanks for putting wordhoard together! I keep running into the following error, though not all the time:
TypeError: object of type 'NoneType' has no len()
(details below)

What I've been doing is feeding wordhoard a file with a list of unique words, in order to find antonyms, and then writing the dictionary object out to a second file.

Here's where it works correctly:

antonyms were found for the word: bassoon [Small note -- should be 'No' antonyms I think]
Please verify that the word is spelled correctly.
None

antonyms were found for the word: bassist [Again, I think it should be 'No' antonyms]
Please verify that the word is spelled correctly.

I tried writing code like this:

        # Check if the result is not None before processing
        if wordhoard_antonym_results is not None:
            with open(output_filename, 'a') as outfile:
                outfile.write(json.dumps(wordhoard_antonym_results) + '\n')

But that didn't work, I suspect because the failure is happening in code earlier than when my code runs.
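The traceback shows request_html.py calling sys.exit(1) when retries are exhausted, which raises SystemExit before any None check in calling code can run. One workaround, sketched here with a stand-in function rather than wordhoard's real API, is to catch SystemExit around each lookup:

```python
import sys

def find_antonyms_stub(word):
    # Stand-in for a wordhoard lookup; the real code calls
    # sys.exit(1) from request_html.py when retries are exhausted.
    sys.exit(1)

def safe_find_antonyms(word):
    # Wrap the call so one failed lookup does not kill the whole run.
    try:
        return find_antonyms_stub(word)
    except SystemExit:
        return None
```

Catching SystemExit is unusual but is the only way to survive a library that exits the interpreter on a failed request; a cleaner fix would be for the library to raise a regular exception instead.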

At any rate, here is the error, which for some reason is now occurring all the time. Not sure if this is because my IP address is being blocked, or what:

mown
INFO:wordhoard.antonyms:Thesaurus.com had no antonym reference for the word mown
ERROR:wordhoard.utilities.request_html:A RequestException has occurred when requesting https://www.wordhippo.com/what-is/the-opposite-of/mown.html
ERROR:wordhoard.utilities.request_html: File "/usr/local/lib/python3.9/dist-packages/wordhoard/utilities/request_html.py", line 102, in get_website_html
response = self._requests_retry_session().get(self._url_to_scrape,
File "/usr/local/lib/python3.9/dist-packages/requests/sessions.py", line 600, in get
return self.request("GET", url, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/requests/sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.9/dist-packages/requests/sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/requests/adapters.py", line 556, in send
raise RetryError(e, request=request)

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

A RequestException has occurred.
Please review the WordHoard logs for additional information.
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/requests/adapters.py", line 489, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 878, in urlopen
return self.urlopen(
File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 878, in urlopen
return self.urlopen(
File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 878, in urlopen
return self.urlopen(
[Previous line repeated 2 more times]
File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 868, in urlopen
retries = retries.increment(method, url, response=response, _pool=self)
File "/usr/local/lib/python3.9/dist-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.wordhippo.com', port=443): Max retries exceeded with url: /what-is/the-opposite-of/mown.html (Caused by ResponseError('too many 503 error responses'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/wordhoard/utilities/request_html.py", line 102, in get_website_html
response = self._requests_retry_session().get(self._url_to_scrape,
File "/usr/local/lib/python3.9/dist-packages/requests/sessions.py", line 600, in get
return self.request("GET", url, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/requests/sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.9/dist-packages/requests/sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/requests/adapters.py", line 556, in send
raise RetryError(e, request=request)
requests.exceptions.RetryError: HTTPSConnectionPool(host='www.wordhippo.com', port=443): Max retries exceeded with url: /what-is/the-opposite-of/mown.html (Caused by ResponseError('too many 503 error responses'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/IPython/core/interactiveshell.py", line 3553, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 16, in <cell line: 11>
wordhoard_antonym_results = Antonyms(search_string=line.strip(), output_format='dictionary').find_antonyms()
File "/usr/local/lib/python3.9/dist-packages/backoff/_sync.py", line 105, in retry
ret = target(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/ratelimit/decorators.py", line 147, in wrapper
return func(*args, **kargs)
File "/usr/local/lib/python3.9/dist-packages/wordhoard/antonyms.py", line 227, in find_antonyms
query_results = self._run_query_tasks_in_parallel()
File "/usr/local/lib/python3.9/dist-packages/wordhoard/antonyms.py", line 180, in _run_query_tasks_in_parallel
finished_tasks.append(finished_task.result())
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.9/dist-packages/wordhoard/antonyms.py", line 353, in _query_wordhippo
response = self._request_http_response(f'https://www.wordhippo.com/what-is/the-opposite-of/{self._word}.html')
File "/usr/local/lib/python3.9/dist-packages/wordhoard/antonyms.py", line 151, in _request_http_response
response = Query(url).get_website_html()
File "/usr/local/lib/python3.9/dist-packages/wordhoard/utilities/request_html.py", line 254, in get_website_html
sys.exit(1)
SystemExit: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/IPython/core/ultratb.py", line 1101, in get_records
return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset)
File "/usr/local/lib/python3.9/dist-packages/IPython/core/ultratb.py", line 248, in wrapped
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/IPython/core/ultratb.py", line 281, in _fixed_getinnerframes
records = fix_frame_records_filenames(inspect.getinnerframes(etb, context))
File "/usr/lib/python3.9/inspect.py", line 1543, in getinnerframes
frameinfo = (tb.tb_frame,) + getframeinfo(tb, context)
AttributeError: 'tuple' object has no attribute 'tb_frame'

MaxRetryError Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
488 if not chunked:
--> 489 resp = conn.urlopen(
490 method=request.method,

30 frames
MaxRetryError: HTTPSConnectionPool(host='www.wordhippo.com', port=443): Max retries exceeded with url: /what-is/the-opposite-of/mown.html (Caused by ResponseError('too many 503 error responses'))

During handling of the above exception, another exception occurred:

RetryError Traceback (most recent call last)
RetryError: HTTPSConnectionPool(host='www.wordhippo.com', port=443): Max retries exceeded with url: /what-is/the-opposite-of/mown.html (Caused by ResponseError('too many 503 error responses'))

During handling of the above exception, another exception occurred:

SystemExit Traceback (most recent call last)
[... skipping hidden 1 frame]

SystemExit: 1

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
[... skipping hidden 1 frame]

/usr/local/lib/python3.9/dist-packages/IPython/core/ultratb.py in find_recursion(etype, value, records)
380 # first frame (from in to out) that looks different.
381 if not is_recursion_error(etype, value, records):
--> 382 return len(records), 0
383
384 # Select filename, lineno, func_name to track frames with

TypeError: object of type 'NoneType' has no len()

Getting the definition of a word returns its antonyms instead

from wordhoard import Definitions

word = "mother"
definition = Definitions(search_string=word).find_definitions()
print(definition)

OUTPUT: ['father']

EXPECTED OUTPUT: ['a female parent']

wordhoard_error.yaml

2022-02-25 10:14:50:wordhoard.utilities.basic_soup:ERROR: Response Status Code: 403
2022-02-25 10:14:50:wordhoard.utilities.basic_soup:ERROR: HTTP 403 is an HTTP status code meaning access to the requested resource is forbidden.
2022-02-25 10:14:50:wordhoard.utilities.basic_soup:ERROR: Associated URL: https://www.collinsdictionary.com/dictionary/english-thesaurus/mother
2022-02-25 10:20:03:wordhoard.utilities.basic_soup:ERROR: Response Status Code: 403
2022-02-25 10:20:03:wordhoard.utilities.basic_soup:ERROR: HTTP 403 is an HTTP status code meaning access to the requested resource is forbidden.
2022-02-25 10:20:03:wordhoard.utilities.basic_soup:ERROR: Associated URL: https://www.collinsdictionary.com/dictionary/english-thesaurus/mother
2022-02-25 10:20:53:wordhoard.utilities.basic_soup:ERROR: Response Status Code: 403
2022-02-25 10:20:53:wordhoard.utilities.basic_soup:ERROR: HTTP 403 is an HTTP status code meaning access to the requested resource is forbidden.
2022-02-25 10:20:53:wordhoard.utilities.basic_soup:ERROR: Associated URL: https://www.collinsdictionary.com/dictionary/english-thesaurus/mother
2022-02-25 10:22:55:wordhoard.utilities.basic_soup:ERROR: Response Status Code: 403
2022-02-25 10:22:55:wordhoard.utilities.basic_soup:ERROR: HTTP 403 is an HTTP status code meaning access to the requested resource is forbidden.
2022-02-25 10:22:55:wordhoard.utilities.basic_soup:ERROR: Associated URL: https://www.collinsdictionary.com/dictionary/english-thesaurus/mother
2022-02-25 10:23:00:wordhoard.utilities.basic_soup:ERROR: Response Status Code: 403
2022-02-25 10:23:00:wordhoard.utilities.basic_soup:ERROR: HTTP 403 is an HTTP status code meaning access to the requested resource is forbidden.

Questionable antonyms for the word mother

When using WordHoard I noticed questionable antonyms, coming from an unknown source, for the word mother:

from wordhoard import Antonyms

antonym = Antonyms('mother')
antonym_results = antonym.find_antonyms()
print(antonym_results)
['descendant', 'disassemble', 'dissuade', 'effect', 'end', 'father', 'male parent', 'result']

This output produces some questionable antonyms that have no relationship to the word mother.

Overwhelming number of synonyms

Hello,

Thank you for publishing this package. It is a highly beneficial resource.

When searching for synonyms, I noticed an unexpected behavior (bug).
For the word "good", the function find_synonyms() returns a list of 104 unique words. Among them are words that are not synonyms of "good", for example "bully", "cracking", "bad", "boss", "hard", "spanking", and a couple of additional words that I am not sure about. The behavior repeats with other words as well.

I am unsure if there is a specific website that enriches the synonyms with such words or if it is a bug in the crawling process. A possible solution may be to allow the selection of the websites on which the crawling process takes place.

I would highly recommend this option since I am unsure about the legitimacy of the other sources except for "merriam-webster" and "wordnet".

To date, I have decided to take the synonyms directly from "wordnet", as I cannot guarantee they are actually synonyms.
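Until per-source selection is available, one workaround in the spirit of this report is to post-filter the returned list against a trusted vocabulary (e.g. synonyms pulled separately from WordNet). The function and sample data below are hypothetical, not part of wordhoard:

```python
def filter_synonyms(candidates, trusted):
    """Keep only candidate synonyms that a trusted source (e.g. WordNet)
    also lists. Both inputs are plain iterables of strings."""
    trusted_set = {w.lower() for w in trusted}
    return sorted(w for w in candidates if w.lower() in trusted_set)

# Hypothetical data: candidates from find_synonyms(), trusted from WordNet.
candidates = ["fine", "bully", "cracking", "bad", "respectable"]
wordnet_synonyms = ["fine", "respectable", "sound"]
print(filter_synonyms(candidates, wordnet_synonyms))  # ['fine', 'respectable']
```

This trades recall for precision: anything the trusted source does not know is dropped, which matches the reporter's decision to rely on WordNet alone.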

Ratelimit module missing

Version 1.5.0 uses deckar01-ratelimit. In the local development environment everything works fine, but when the recent version was pushed to PyPI, an error was raised when that version was used locally.

That error is:

from ratelimit import limits, RateLimitException
ModuleNotFoundError: No module named 'ratelimit'

An issue was opened with the owner of deckar01-ratelimit to determine how to solve this problem.

There is no ETA for a fix at the moment, so pin the previous release (pip3 install wordhoard==1.4.9) until one is issued.

Why not just use deep-translator?

Hi, I stumbled across this library by chance and I can't help but notice that the code (or at least a good part of it) under wordhoard/utilities/deep_translator.py is clearly copy-pasted from the deep-translator library. Even the name of the file is the name of the library, and most of the code is exactly the same as in deep-translator, down to variable names. Why not just use the library? It is open source anyway.

Like, I understand that you added some extra stuff to it, but it would have been cool to just use the library as it is (as a dependency) or for example to give credit in your README or docs.

I think as open source developers, it would be really great if we support each other and the bare minimum in this case would be just to mention or give credits to someone for their work, because most of the time it is taken for granted.

Congratulations on your library/work. Keep up the good work.
IMHO, it would be cool if you just used deep-translator as a third-party lib, because you would get the latest updates, so more features, definitely bug fixes and, most importantly, support.

wordhoard.synonyms producing errors

An error is being written in the wordhoard_error.yaml log file when the following code is executed:

from wordhoard.synonyms import Synonyms

synonyms = Synonyms('bad').find_synonyms()
print(synonyms)

This is the error:

2021-08-15 09:38:29:wordhoard.synonyms:ERROR: An IndexError occurred in the following code segment:
2021-08-15 09:38:29:wordhoard.synonyms:ERROR: File "/Users/unknownPython_Projects/scratch_pad_testing/venv/lib/python3.9/site-packages/wordhoard/synonyms.py", line 194, in _query_synonym_com
synonyms_list = find_synonyms[2].lstrip().replace('synonyms:', '').split(',')
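The IndexError suggests that synonym.com's markup sometimes yields fewer than three segments before the failing line indexes `find_synonyms[2]`. A guarded version of that parsing step, sketched here outside wordhoard's actual code, avoids the crash by checking the length first:

```python
def parse_synonym_segments(segments):
    # Guarded version of the failing line in _query_synonym_com:
    # only index segments[2] when it actually exists.
    if len(segments) > 2:
        return [s.strip() for s in
                segments[2].lstrip().replace('synonyms:', '').split(',')]
    # Page layout changed or word not found: return no synonyms.
    return []
```

Returning an empty list lets the caller fall through to the other sources instead of logging an IndexError.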

Option to disable sorting of output?

I've noticed that the output of synonyms is sorted alphabetically, which defeats the purpose of sorting by relevance. Is there an option to disable this?

Consider allowing for design-time and runtime configurability of sources

No configuration appears to be available that would allow consumers to specify which source datasets to use. Antonyms, for instance, uses _query_synonym_com, _query_thesaurus_com and _query_thesaurus_plus, and it would be helpful to be able to suppress each one in configuration or at runtime. Additionally, the ability to leverage a local override set would be helpful in some cases.

Amazing!

Hehey, this is amazing. Is there a chance to get all words from the database in the style
hypernym: hyponyms?

Missing PYPI_description.md from github

wordhoard/setup.py

Lines 3 to 4 in 1e54f45

with open("PYPI_description.md", "r") as fh:
long_description = fh.read()

Produces an error when installing with

pip install -e .

Because the file (PYPI_description.md) does not exist in the repository.
A fix without adding the file would be to comment that block out and use
long_description = "temp description" instead.
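The suggested fallback can also be written so the file is used when present, as sketched below. This is an illustrative patch for setup.py, not the project's actual fix:

```python
# Sketch of a fallback for setup.py: read PYPI_description.md when it
# exists, otherwise substitute a placeholder description.
import os

if os.path.exists("PYPI_description.md"):
    with open("PYPI_description.md", "r") as fh:
        long_description = fh.read()
else:
    long_description = "temp description"
```

With this guard, `pip install -e .` succeeds from a clean checkout even though the file is only generated for PyPI releases.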

Synonyms returning the same result after switching to a different source

Demo code:

synonym = Synonyms(
    search_string="glaring",
    sources=[
        "synonym.com",
    ],
    rate_limit_timeout_period=1,
)
synonym_results = synonym.find_synonyms()
print(synonym_results)

synonym = Synonyms(
    search_string="glaring",
    sources=[
        "merriam-webster",
    ],
    rate_limit_timeout_period=1,
)
synonym_results = synonym.find_synonyms()
print(synonym_results)

Both calls return the same result.

Questionable behavior of find_synonyms()

With the following code in generic_utils.py:

from wordhoard import Synonyms
def synonym(word):
    syn = Synonyms(word)
    syn_res = syn.find_synonyms()
    return syn_res

Run from a terminal with a clean state:

>>> import generic_utils as gu
>>> gu.synonym('mother')
['ma', 'mom', 'mum', 'dam', 'mama', 'mater', 'mommy', 'mummy', 'mamma', 'mammy', 'momma', 'parent', 'para i', 'supermom', 
'puerpera', 'old lady', 'old woman', 'primipara', 'quadripara', 'quintipara', 'birth mother', 'mother-in-law',
'foster mother', 'female parent', 'surrogate mother', 'biological mother']
>>> gu.synonym('mother')
['noun']

env info with wordhoard==1.5.3 and python 3.10.10:

backoff==2.2.1
beautifulsoup4==4.12.2
certifi==2022.12.7
charset-normalizer==3.1.0
cloudscraper==1.2.71
deckar01-ratelimit==3.0.2
deepl==1.14.0
idna==3.4
lxml==4.9.2
pyparsing==3.0.9
requests==2.28.2
requests-toolbelt==1.0.0
soupsieve==2.4.1
urllib3==1.26.15

This seems to be some error with caching; once I was able to get an error message, but I'm not 100% sure that this is it.

ERROR:wordhoard.synonyms:A KeyError occurred in the following code segment:
ERROR:wordhoard.synonyms:  File "/<path>/.conda/envs/sam/lib/python3.10/site-packages/wordhoard/synonyms.py", line 571, in _query_thesaurus_com
    self._update_cache(part_of_speech_category, synonyms_list)
  File "/<path>/.conda/envs/sam/lib/python3.10/site-packages/wordhoard/synonyms.py", line 134, in _update_cache
    caching.insert_word_cache_synonyms(self._word, pos_category, synonyms)
  File "/<path>/.conda/envs/sam/lib/python3.10/site-packages/wordhoard/utilities/caching.py", line 65, in insert_word_cache_synonyms
    temporary_dict_synonyms[word][pos_category] += deduplicated_values

I was able to fix the behaviour by disabling caching entirely, changing the line

check_cache = self._check_cache()

to

check_cache = [False]
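The KeyError at caching.py line 65 is consistent with `+=` being applied to a part-of-speech key that has not been created yet. The structure below is an assumed reduction of the cache, not wordhoard's actual code, but it reproduces the failure mode and shows a setdefault-based fix:

```python
# Minimal reproduction of the suspected cache bug and a fix.
# The cache layout here is an assumption based on the traceback.
cache = {}

def insert_buggy(word, pos, values):
    cache.setdefault(word, {})
    cache[word][pos] += values  # KeyError when pos is new for this word

def insert_fixed(word, pos, values):
    # setdefault creates the missing list before extending it.
    cache.setdefault(word, {}).setdefault(pos, []).extend(values)
```

If this is the cause, it would also explain the `['noun']` result above: a partially written cache entry is returned on the second lookup instead of the synonym list.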
