This module can be used to replace keywords in sentences or extract keywords from sentences.
$ pip install flashtext
- Extract keywords
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> keyword_processor.add_keyword('Big Apple', 'New York')
>>> keyword_processor.add_keyword('Bay Area')
>>> keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
>>> keywords_found
['New York', 'Bay Area']
- Replace keywords
>>> keyword_processor.add_keyword('New Delhi', 'NCR region')
>>> new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
>>> new_sentence
'I love New York and NCR region.'
- Case Sensitive example
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor(case_sensitive=True)
>>> keyword_processor.add_keyword('Big Apple', 'New York')
>>> keyword_processor.add_keyword('Bay Area')
>>> keywords_found = keyword_processor.extract_keywords('I love big Apple and Bay Area.')
>>> keywords_found
['Bay Area']
- No clean name for Keywords
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> keyword_processor.add_keyword('Big Apple')
>>> keyword_processor.add_keyword('Bay Area')
>>> keywords_found = keyword_processor.extract_keywords('I love big Apple and Bay Area.')
>>> keywords_found
['Big Apple', 'Bay Area']
Documentation can be found at FlashText Read the Docs.
$ git clone https://github.com/vi3k6i5/flashtext
$ cd flashtext
$ pip install pytest
$ python setup.py test
FlashText uses a custom algorithm based on the Aho-Corasick algorithm and a trie dictionary.
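To make the trie idea concrete, here is a minimal sketch of trie-based keyword extraction. It is not FlashText's actual implementation: the names `build_trie`, `extract`, `WORD_CHARS`, and the `_END` sentinel are hypothetical, word boundaries are assumed to be anything other than ASCII letters, digits, and underscore, and matching is assumed to be case-insensitive. It also omits the failure links of a full Aho-Corasick automaton and simply scans from each word start.

```python
import string

# Assumption for this sketch: these characters form words; anything else is a boundary.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")
_END = "_end_"  # hypothetical sentinel key marking the end of a keyword


def build_trie(keywords):
    """Build a character trie from a {keyword: clean_name} mapping."""
    root = {}
    for keyword, clean_name in keywords.items():
        node = root
        for ch in keyword.lower():
            node = node.setdefault(ch, {})
        node[_END] = clean_name
    return root


def extract(trie, sentence):
    """Return the clean names of keywords found, scanning left to right."""
    text = sentence.lower()
    found = []
    i, n = 0, len(text)
    while i < n:
        # Only start a match at the beginning of a word.
        if i > 0 and text[i - 1] in WORD_CHARS:
            i += 1
            continue
        node, j = trie, i
        match_name, match_end = None, i
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            # Accept a match only if it ends on a word boundary.
            if _END in node and (j == n or text[j] not in WORD_CHARS):
                match_name, match_end = node[_END], j
        if match_name is not None:
            found.append(match_name)
            i = match_end  # resume after the longest match
        else:
            i += 1
    return found


trie = build_trie({"Big Apple": "New York", "Bay Area": "Bay Area"})
print(extract(trie, "I love Big Apple and Bay Area."))
```

In this sketch the work per character depends on walking the trie, not on looping over every keyword, which is the intuition behind searching for many keywords in a single pass.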
Performing the same keyword search and replacement with regular expressions takes far longer:
Docs count | # Keywords | Regex | flashtext |
---|---|---|---|
1.5 million | 2K | 16 hours | Not measured |
2.5 million | 10K | 15 days | 15 mins |
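For comparison, a regex-based replacement might look like the sketch below. This is an assumed baseline written with the standard `re` module, not the code used to produce the timings above; the helper names `build_pattern` and `replace_with_regex` are hypothetical. All keywords are joined into one alternation, so the pattern grows with the keyword list, which is a plausible reason such an approach slows down as keywords are added.

```python
import re


def build_pattern(keywords):
    """Compile one case-insensitive alternation over all keywords."""
    # Longest keywords first so longer alternatives win over shorter prefixes.
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def replace_with_regex(pattern, mapping, sentence):
    """Replace each matched keyword with its clean name from mapping."""
    lowered = {k.lower(): v for k, v in mapping.items()}
    return pattern.sub(lambda m: lowered[m.group(0).lower()], sentence)


mapping = {"Big Apple": "New York", "New Delhi": "NCR region"}
pattern = build_pattern(mapping)
print(replace_with_regex(pattern, mapping, "I love Big Apple and new delhi."))
```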
The idea for this library came from the following StackOverflow question.
- Issue Tracker: https://github.com/vi3k6i5/flashtext/issues
- Source Code: https://github.com/vi3k6i5/flashtext/
The project is licensed under the MIT license.