
emot's Introduction

Description of the emot:3.1 library

Emot is a Python library for extracting emojis and emoticons from text (a string). All emojis and emoticons are taken from reliable sources, which are listed in the Sources section.

The motto of the Emot 3.1 release is: a high-performance detection library for data science, especially for large-scale text datasets.

Emot uses dynamic pattern generation: every time you create an object, it generates a pattern from the database (emo_unicode.py). You can add, delete, or modify entries in that file inside the library to produce a different pattern.
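To make the idea concrete, here is a minimal sketch of building one pattern from a dictionary. It is not the library's actual code: the EMOTICONS dictionary is a hypothetical three-entry stand-in for emo_unicode.py (the entries are copied from the example output further down), and build_pattern is a name invented for this illustration.

```python
import re

# Hypothetical miniature database; the real one lives in emo_unicode.py.
EMOTICONS = {
    ":-)": "Happy face smiley",
    ":-(": "Frown, sad, angry or pouting",
    ":-)))": "Very very Happy face or smiley",
}

def build_pattern(database):
    """Compile one alternation pattern from the keys of a dictionary."""
    # Longest keys first, so ":-)))" wins over its prefix ":-)".
    keys = sorted(database, key=len, reverse=True)
    return re.compile("|".join(re.escape(key) for key in keys))

pattern = build_pattern(EMOTICONS)
text = "I love python :-) :-( :-)))"
# Collect each match together with its [start, end] location.
found = [(m.group(), [m.start(), m.end()]) for m in pattern.finditer(text)]
print(found)
```

Because the pattern is rebuilt from whatever the dictionary contains, adding an entry such as "<3" and calling build_pattern again is enough for the new emoticon to be detected in this sketch.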

Version 3.0 adds more options, such as bulk processing. It is useful when you have a long list of sentences or words and want to use multiprocessing to speed up the work.

This means you can dynamically create a pattern for emojis or emoticons and run it across several processes to get the most out of multiple cores.

Again, I am thankful for all the support and help from the community around the world. I hope this library helps and makes your life easier.

Compatibility

Version 3.0 supports only Python 3.x.

Python 2.X is no longer supported.

Working

The Emot library takes a string or a list of strings as input and returns a dictionary (or a list of dictionaries for the bulk functions).

There is one class, named emot, containing four functions.

emot.emoji

  • Input: one argument, the text to scan: string
  • Output: a dictionary with 4 keys: dict
    • value = list of emojis
    • location = list of [start, end] locations, one per emoji
    • mean = list of meanings
    • flag = True/False. True means the library found something; False means it found nothing.

emot.emoticons

  • Input: one argument, the text to scan: string
  • Output: a dictionary with 4 keys: dict
    • value = list of emoticons
    • location = list of [start, end] locations, one per emoticon
    • mean = list of meanings
    • flag = True/False. True means the library found something; False means it found nothing.

emot.bulk_emoji

  • Input: two arguments, a list of strings and the CPU core pool size: list[], int
    • By default, the pool size is half of the total available cores: multiprocessing.cpu_count()/2
  • Output: a list of dictionaries, each with 4 keys: list of dict
    • value = list of emojis
    • location = list of [start, end] locations, one per emoji
    • mean = list of meanings
    • flag = True/False. True means the library found something; False means it found nothing.

emot.bulk_emoticons

  • Input: two arguments, a list of strings and the CPU core pool size: list[], int
    • By default, the pool size is half of the total available cores: multiprocessing.cpu_count()/2
  • Output: a list of dictionaries, each with 4 keys: list of dict
    • value = list of emoticons
    • location = list of [start, end] locations, one per emoticon
    • mean = list of meanings
    • flag = True/False. True means the library found something; False means it found nothing.
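The bulk functions above spread a list of strings over a pool of worker processes. The following is a rough sketch of that idea using only the standard library; it is an assumption about the general approach, not emot's implementation. PATTERN, find_emoticons, default_pool_size and bulk_find are all hypothetical names, and the sketch uses integer division because a worker count has to be a whole number.

```python
import multiprocessing
import re

# Hypothetical stand-in for the emoticon pattern; not emot's real database.
PATTERN = re.compile(r":-\)|:-\(")

def find_emoticons(text):
    """Return the emoticons found in a single string."""
    return PATTERN.findall(text)

def default_pool_size():
    """Half of the available cores, as an integer of at least 1."""
    return max(1, multiprocessing.cpu_count() // 2)

def bulk_find(texts, pool_size=None):
    """Run find_emoticons over a list of strings in a worker pool."""
    with multiprocessing.Pool(pool_size or default_pool_size()) as pool:
        # map() keeps the results in the same order as the input list.
        return pool.map(find_emoticons, texts)
```

When calling bulk_find from a script, place the call under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. For the input ["fine :-)", "bad :-(", "nothing"] the sketch returns one list per string, with an empty list for the last one.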

Example

>>> import emot
>>> emot_obj = emot.core.emot()
>>> text = "I love python ☮ 🙂 ❤ :-) :-( :-)))"
>>> emot_obj.emoji(text)
{'value': ['☮', '🙂', '❤'], 'location': [[14, 15], [16, 17], [18, 19]],
 'mean': [':peace_symbol:', ':slightly_smiling_face:', ':red_heart:'], 'flag': True}
>>> emot_obj.emoticons(text)
{'value': [':-)', ':-(', ':-)))'], 'location': [[20, 23], [24, 27], [28, 33]],
 'mean': ['Happy face smiley', 'Frown, sad, angry or pouting', 'Very very Happy face or smiley'], 'flag': True}

Running bulk emoji and emoticon detection over a list of strings, for when multiple processing cores are available:

>>> import emot
>>> emot_obj = emot.core.emot()
>>> bulk_test = ["I love python ☮ 🙂 ❤ :-) :-( :-)))",
...              "I love python 🙂 ❤ :-) :-( :-)))",
...              "I love python ☮ ❤ :-) :-( :-)))",
...              "I love python ☮ 🙂 :-( :-)))"]
>>>
>>> emot_obj.bulk_emoji(bulk_test)
[{'value': ['☮', '🙂', '❤'], 'location': [[14, 15], [16, 17], [18, 19]],
  'mean': [':peace_symbol:', ':slightly_smiling_face:', ':red_heart:'], 'flag': True},
 {'value': ['🙂', '❤'], 'location': [[14, 15], [16, 17]],
  'mean': [':slightly_smiling_face:', ':red_heart:'], 'flag': True},
 {'value': ['☮', '❤'], 'location': [[14, 15], [16, 17]],
  'mean': [':peace_symbol:', ':red_heart:'], 'flag': True},
 {'value': ['☮', '🙂'], 'location': [[14, 15], [16, 17]],
  'mean': [':peace_symbol:', ':slightly_smiling_face:'], 'flag': True}]
>>>
>>> emot_obj.bulk_emoticons(bulk_test)
[{'value': [':-)', ':-(', ':-)))'], 'location': [[20, 23], [24, 27], [28, 33]],
  'mean': ['Happy face smiley', 'Frown, sad, angry or pouting', 'Very very Happy face or smiley'], 'flag': True},
 {'value': [':-)', ':-(', ':-)))'], 'location': [[18, 21], [22, 25], [26, 31]],
  'mean': ['Happy face smiley', 'Frown, sad, angry or pouting', 'Very very Happy face or smiley'], 'flag': True},
 {'value': [':-)', ':-(', ':-)))'], 'location': [[18, 21], [22, 25], [26, 31]],
  'mean': ['Happy face smiley', 'Frown, sad, angry or pouting', 'Very very Happy face or smiley'], 'flag': True},
 {'value': [':-(', ':-)))'], 'location': [[18, 21], [22, 27]],
  'mean': ['Frown, sad, angry or pouting', 'Very very Happy face or smiley'], 'flag': True}]

Installation

Via pip:

$ pip install emot --upgrade

From master branch:

$ git clone https://github.com/NeelShah18/emot.git
$ cd emot
$ python setup.py install

Developing

$ git clone https://github.com/NeelShah18/emot.git
$ cd emot

Sources

Emoji Cheat Sheet

Official unicode list

official emoticons list

Authors

Neel Shah / @NeelShah18

Shubham Rohilla / @kakashubham

emot's People

Contributors

abhiaj, avesh-singh, chamathpali, jdhauswirth, neelshah18, oliph, ryanwellsr, shubhamrohilla05, tarikaltuncu

emot's Issues

Hi, I have a question.

code:
content = "Q: what's your favorite song? :D A: I like too many songs to have a favorite"
ans = emot.emoticons(content)

Input:
Q: what's your favorite song? :D A: I like too many songs to have a favorite
Output:
[{'flag': False}]
I think this output is wrong.

Originally posted by @martin8408 in #11 (comment)

Getting error while using the latest version

Hi,
I am facing an issue while upgrading to the latest emot version.
This is my import statement:
from emot.emo_unicode import UNICODE_EMO, EMOTICONS

Error:
ImportError: cannot import name 'UNICODE_EMO' from 'emot.emo_unicode' (/usr/local/lib/python3.7/dist-packages/emot/emo_unicode.py)

Please help

Missing 🤨 and 🦟 emojis

Describe the bug
The emot dictionary is missing two emojis, 🤨 and 🦟 (U+1F928 and U+1F99F, respectively).

To Reproduce
Steps to reproduce the behavior:

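from emot.emo_unicode import UNICODE_EMO  # import path as quoted in the report above

def convert_emojis(text):
    # Replace each known emoji with its underscore-joined name.
    for emoji in UNICODE_EMO:
        text = text.replace(emoji, "_".join(UNICODE_EMO[emoji].replace(",", "").replace(":", "").split()))
    return text

convert_emojis('Hello 🤨')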

Expected behavior
'Hello :face_with_raised_eyebrow: '

Additional context
The same example applies to the mosquito emoji.

Reversed emoticons

When an emoticon is "reversed", the library fails to recognise it, while its two characters are detected separately. Examples:

Example          Expected   Detected 1   Detected 2
sad ):           ):         )            :
long drives /:   /:         :
feel g$$$$d:)    :)         d:           )

The regex for the "(?_?)" emoticon is wrong; it just matches ")"

I'm not sure this library is maintained, seeing as the last release is from 2018, but anyway:

The regex currently used is "\(?_?\)". The question marks are not escaped, so "(?_?)" is never matched; in addition, this causes a bare ")" to be matched.

It just needs to be changed to "\(\?_\?\)".

Note that a question mark needs to be escaped because in a regex it means that the preceding character is optional.
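The difference between the two patterns can be checked directly with Python's re module. This is only a demonstration of the two regexes quoted in the report; the variable names broken and fixed are chosen for the illustration.

```python
import re

# Pattern from the report: "(" and "_" are optional, only ")" is required.
broken = re.compile(r"\(?_?\)")
# Proposed fix: every character of "(?_?)" is matched literally.
fixed = re.compile(r"\(\?_\?\)")

print(broken.search("(?_?)").group())      # only the trailing ")" is matched
print(broken.search("done)") is not None)  # a bare ")" is a false positive
print(fixed.search("(?_?)").group())       # the whole emoticon is matched
print(fixed.search("done)"))               # None: no false positive

# re.escape produces the escaped form automatically.
print(re.escape("(?_?)"))
```

Building patterns with re.escape instead of hand-written escapes avoids this class of mistake for any emoticon that contains regex metacharacters.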

Inconsistent output types obtained

Describe the bug
Using emot (version 2.1) gives inconsistent output types: a list in some cases and a dictionary in others.

To Reproduce
Steps to reproduce the behavior:

emoticon_details_1 = emot.emoticons("The weather is ☁️ :), we might need to carry our ☂️ :(")
emoticon_details_2 = emot.emoticons("Cooool :)")
print(type(emoticon_details_1))
print(type(emoticon_details_2))

Expected behavior
The output type should be a dictionary in all cases.
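Until the return type is consistent, callers can normalize it themselves. The helper below is a hypothetical workaround, not part of emot: normalize_result is an invented name, and it assumes the only two shapes are the four-key dictionary and the [{'flag': False}] list shown in these reports.

```python
def normalize_result(result):
    """Return a four-key dictionary whatever shape the call produced."""
    empty = {"value": [], "location": [], "mean": [], "flag": False}
    # Some releases are reported to return [{'flag': False}] when nothing matches.
    if isinstance(result, list):
        result = result[0] if result else {}
    # Keys present in the result override the empty defaults.
    return {**empty, **result}

print(normalize_result([{"flag": False}]))
print(normalize_result({"value": [":)"], "location": [[7, 9]],
                        "mean": ["Happy face or smiley"], "flag": True}))
```

With this wrapper, downstream code can always index result["value"] without first checking the type.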

Inconsistent Results

Describe the bug
For very similar strings I am getting different results.

To Reproduce
a = "I love python :)"

{'value': [':)'], 'location': [[14, 16]], 'mean': ['Happy face or smiley'], 'flag': True}

a = "This is :)"

{'value': [':)'], 'location': [[9, 11]], 'mean': ['Happy face or smiley'], 'flag': True}

a = "This story is great :)"

{'value': [':)'], 'location': [[20, 22]], 'mean': ['Happy face or smiley'], 'flag': True}

a = "This story is good :)"

[{'flag': False}]

Expected behavior
I was expecting a True flag for all of them.

Thank you NeelShah for this wonderful lib.

Only letter emojis

The module extracts 'oo' as an emoticon in words like 'school' and 'loop', and 'xp' in words like 'express' and 'experience'.
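One possible mitigation, sketched here as an assumption rather than a fix adopted by the maintainers, is to forbid word characters next to an emoticon's letter or digit edges. The EMOTICONS list is a hypothetical subset, and guarded and find_emoticons are invented names.

```python
import re

# Hypothetical subset of emoticons, including letter-based ones from the reports.
EMOTICONS = ["D:", "oo", "xp", ":)"]

def guarded(emoticon):
    """Escape an emoticon and forbid word characters next to its alphanumeric edges."""
    pattern = re.escape(emoticon)
    if emoticon[0].isalnum():
        pattern = r"(?<!\w)" + pattern  # "D:" must not follow the "I" of "ID:"
    if emoticon[-1].isalnum():
        pattern = pattern + r"(?!\w)"   # "xp" must not be followed by the "r" of "express"
    return pattern

PATTERN = re.compile("|".join(guarded(e) for e in sorted(EMOTICONS, key=len, reverse=True)))

def find_emoticons(text):
    """Return every guarded emoticon match in the text."""
    return PATTERN.findall(text)

print(find_emoticons("school loop express experience"))  # []
print(find_emoticons("Ticket ID: 99"))                    # []
print(find_emoticons("bluetooth:)"))                      # [':)']
```

Punctuation-only emoticons such as ":)" get no guard, so they are still found when glued to a word, while letter-based ones are matched only when they stand apart from surrounding letters and digits.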

Does not return the complete collection for some words and characters

Hi, when I use the method to find an emoticon in a word or sentence, it gives results like these:

emot.emoticons(text)

{'value': [':-)'], 'location': [[16, 19]], 'mean': ['Happy face smiley'], 'flag': True}

{'value': [], 'location': [], 'mean': [], 'flag': False} "Emoticon not found"

but when I use some words like:
good, tooth, book, poor, bluetooth, situations:1.), d807...wrongly, loud). , expected!!!!, )setup, (v1.15g), experienced

it gives a list containing a single dictionary that skips the other keys:
[{'flag': False}]

Example:

import emot

emoticon = emot.emoticons('bluetooth:)')

print( "Collection->" + str( emoticon) )

------------------------ shell--------------------------------
Collection->[{'flag': False}]

Please use the MIT or Apache 2.0 licence

Instead of GPL, so people can use this library in production environments.

By the way, parts of your library were copied by a Kaggle notebook (this is how I was directed to this page). But this notebook is not licensed under GPL3 (which it should be, since any reuse of GPL code in another program forces that program to be licensed under GPL3 as well). Instead they use Apache 2.0, so strictly speaking they already violate your licence terms. So please change to MIT or Apache 2.0 if you want people to be able to use your library freely, without restrictions.

Greetings :)

PS: By the way good job (Y)

Heart emoticon, i.e. <3, is not found

Describe the bug
The heart emoticon <3, which is listed on Wikipedia, is not found by the emoticons method.

To Reproduce

import emot

emot_obj = emot.core.emot()
test = "<3 :D some text <3 :D"
emot_obj.emoticons(test)
# {'value': [':D', ':D'], 'location': [[3, 5], [19, 21]], 'mean': ['Laughing, big grin or laugh with glasses', 'Laughing, big grin or laugh with glasses'], 'flag': True}
emot_obj.emoticons(test)["value"]
# [':D', ':D']

Expected behavior
In the example above, emot_obj.emoticons(test) should also find <3.

Desktop (please complete the following information):

  • OS: Windows 10
  • emot version: 3.1

Additional context

  • A similar bug was already reported: #6
  • The broken heart emoticon </3 is also not found

Workaround

  • Edit emot\emo_unicode.py by adding u"<3":"Heart" in the EMOTICONS_EMO dictionary.

standardize the variables

Hi Guys,
Please standardize the variables.
For emojis, your format is as below:
'☮' : ":Peace_symbol:" : has ":" twice, and the words are separated by "_"
But for emoticons, the format is as below:
':-)' : 'Happy face smiley' : does not have any redundant characters.

Please use the emoticons format throughout.

Emoticon identified where it should not have been

Describe the bug
The emoticon D: is found in "Ticket ID: 99" where it should not have been.

To Reproduce
Steps to reproduce the behavior:

import emot
emot_extractor = emot.core.emot() 

text = """Ticket ID: 99"""
emoticons = emot_extractor.emoticons(text)

# {'value': ['D:'], 'location': [[8, 10]], 'mean': ['Sadness'], 'flag': True}

Expected behavior
Nothing should be found

Desktop (please complete the following information):

  • OS: ubuntu 18.04
  • Version emot=2.1 and emot=3.1
