Comments (10)

PaprikaSteiger commented on September 13, 2024

@NeelShah18
Hey
I'm running into the same error.
Code throwing the error:

import emot

def replace_emoticon(line):
    emoticons = emot.emoticons(line)
    try:
        # Raises TypeError when emot returns a list such as [{'flag': False}] instead of a dict
        values = emoticons["value"]
        while len(values) != 0:
            value = values.pop(0)
            emoji = render_emoji(value)  # other function; replaces emoticon with emoji if possible
            if emoji is not None:
                line = line.replace(value, emoji)
    except TypeError:
        print("emoji error")
        print(line)
        input()  # pause so the failing line can be inspected
    return line

The input lines have the shape: id \t sentiment \t text (German)
Example:
375830740166246400 \t neutral \t Experte befürchtet höhere Benzinpreise wegen Meldestelle URL

As stated, emot.emoticons(text) returns only [{'flag': False}].

The same happens if you pass only the text part, even if you copy it into an editor and pass it to emot directly in the Python interpreter:

emot.emoticons("neutral Experte befürchtet höhere Benzinpreise wegen Meldestelle URL")
[{'flag': False}]

I'm using Python 3.6 and everything is encoded as UTF-8, so encoding should not be the problem.
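The TypeError caught in the function above is consistent with this return value: a list holding one dict cannot be indexed with "value". A minimal caller-side guard might look like the sketch below. It assumes only the two result shapes mentioned in this thread (a dict with a "value" list on a hit, and [{'flag': False}] on a miss); emoticon_values is a hypothetical helper, not part of emot.

```python
def emoticon_values(result):
    """Return the matched emoticons, or an empty list when nothing was found.

    Assumed shapes (taken from this thread, not from the emot docs):
    a dict containing a "value" list on a hit, and a list such as
    [{'flag': False}] on a miss.
    """
    if isinstance(result, dict):
        # Copy the list so the caller can pop from it without mutating the result.
        return list(result.get("value", []))
    return []


print(emoticon_values({"value": [":)", ":D"], "flag": True}))
print(emoticon_values([{"flag": False}]))
```

With such a guard, the loop in replace_emoticon would simply run zero times for lines without emoticons instead of raising.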

from emot.

NeelShah18 commented on September 13, 2024

Thank you for reporting the issue. Could you please provide the code to reproduce the error?

NeelShah18 commented on September 13, 2024

@PaprikaSteiger Yes, that's true. I just checked the Unicode library and "neutral" is not in the list. I believe unicode_library is a little old and I need to add more data to it. The only reason it cannot detect "neutral" is that it is not in the list. If you can tell me which emoticons are most commonly used in German, I can add them.

At present, I am working on speeding up this library for large-scale analysis and am also planning to add more data there.

To solve this issue, you can add the keyword to the dictionaries in https://github.com/NeelShah18/emot/blob/master/emot/emo_unicode.py and it will be detected automatically. One is for emoticons and one is for emoji.

NeelShah18 commented on September 13, 2024

@ViajeroHerrante This is the same issue for you as well. If you can point me to the most common sources of emoticons, I can add them to the dictionary. I am also working on optimizing the library for large-scale, real-time analysis. Please suggest good sources of emoticons and emoji. Thank you.

I really appreciate your interest in the library. I am working on solving that problem as well.

PaprikaSteiger commented on September 13, 2024

@NeelShah18
The tweet collection I used can be found here:
https://github.com/WladimirSidorenko/CGSA/blob/master/data/SB10k/corpus_v1.0.cgsa.tsv
The tweets contain some emoticons (like -.- o0 :S), which I personally use as well, that are not in the dictionary. I guess you can just feed in the text of the tweets and see which emoticons the library finds. As a control, I used this regex to detect some suspicious patterns:

r"(^.*?)([[\]^`'?´¨}$£ö=)(/&%*\"+¦@#¬|¢{\-_:;,.<>\\]{3,7})((?:\s.*$)|$)" 
# The "ö" was used in the linked Twitter corpus, though I don't think it is used that often in emoticons.

In the end, I chose a different approach for my own task. I now use a JSON file that maps emoticons to their corresponding emoji, in an attempt to standardize emoticons. Something like that would be a nice feature, too. I could provide you with the small JSON file I started.
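As a rough illustration of that approach, the sketch below loads an emoticon-to-emoji mapping from JSON and replaces only emoticons that stand alone between whitespace. The three mapping entries and the function names are made up for this example; the actual file mentioned in this thread is not reproduced here.

```python
import json
import re

# Hypothetical mapping for illustration; not the file discussed in the thread.
MAPPING_JSON = '{":)": "🙂", ":D": "😀", "-.-": "😑"}'


def build_pattern(mapping):
    """Compile one regex that matches any emoticon key as a standalone token."""
    # Longest keys first, so a longer emoticon wins over a shorter prefix of it.
    keys = sorted(mapping, key=len, reverse=True)
    alternatives = "|".join(re.escape(key) for key in keys)
    # (?<!\S) and (?!\S) require whitespace or a string edge on both sides.
    return re.compile(r"(?<!\S)(?:" + alternatives + r")(?!\S)")


def standardize(text, mapping):
    """Replace every standalone emoticon in text with its mapped emoji."""
    pattern = build_pattern(mapping)
    return pattern.sub(lambda match: mapping[match.group(0)], text)


mapping = json.loads(MAPPING_JSON)
print(standardize("Schon wieder Stau -.- aber egal :D", mapping))
```

Requiring whitespace on both sides is a design choice that trades recall for precision: an emoticon glued to a word or to punctuation is left untouched, but character sequences inside ordinary words or URLs are not replaced by accident.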

NeelShah18 commented on September 13, 2024

@PaprikaSteiger Yes, that would be a great help.

ViajeroHerrante commented on September 13, 2024

Hey, what's up! Sorry for my absence and for not answering; I was too busy and forgot about it. Below are all the words or character sequences that cause the error. Best regards, and I hope you are well. :D

good
tooth
fooled
book
poor
bluetooth
too
looks
cool
)
d807
expect
bluetooths
:
-good
expectations
oozes
experienced
tool
look
bluetoooth
good7
room
expected
experience
poorly
sooner
inexpensive
boost
wood
looking
expensive
tools
waterproof
took
shooters
loop
reboots.overall
smoothly
explain
good.4
soon
supertooth
indoors
smoother
wooden
floor
boot
booking
good..
loops
looses
loose
hook

KevinTeukengBecho commented on September 13, 2024

@NeelShah18 I think the issue is in how the exception is handled here and here: instead of using append, we can use __entities["flag"] = False, or do what was done here.
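To illustrate the idea of returning the same shape on a miss as on a hit, here is a small self-contained sketch. It is not the library's code: find_emoticons, the two-entry EMOTICONS table, and the "mean" and "location" keys are assumptions made for this example.

```python
import re

# Hypothetical two-entry table; the library's real dictionary is much larger.
EMOTICONS = {":)": "Happy face", ":(": "Sad face"}


def find_emoticons(text):
    """Return a dict with the same keys whether or not anything matched."""
    entities = {"value": [], "mean": [], "location": [], "flag": False}
    pattern = re.compile("|".join(re.escape(e) for e in EMOTICONS))
    for match in pattern.finditer(text):
        entities["value"].append(match.group(0))
        entities["mean"].append(EMOTICONS[match.group(0)])
        entities["location"].append([match.start(), match.end()])
    # The flag is set on the dict itself instead of appending a new dict to a list.
    entities["flag"] = bool(entities["value"])
    return entities


print(find_emoticons("nothing to see here"))
print(find_emoticons("hi :)"))
```

A caller can then always write result["value"] and check result["flag"], without needing to catch a TypeError.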

PaprikaSteiger commented on September 13, 2024

@PaprikaSteiger yes, that will be great help.

Sorry for my delay as well. Here is the emoticon-to-emoji dictionary that I ended up working with.
The original is an emoticon-to-emoji map for JavaScript: https://github.com/banyan/emoji-emoticon-to-unicode [06.10.2020]

I changed the formatting and added some other emoticons.

I hope it helps.

All the best
paprikasteiger_emoticon_emoji.zip

NeelShah18 commented on September 13, 2024

@PaprikaSteiger, @KevinTeukengBecho, @ViajeroHerrante Yes, I investigated this and decided to move to a new type of detection to avoid the issue. We also have a new emoticon dataset and a new template that makes it easy to add new emoji or emoticons.

For now, I have pushed some changes to the new branch "version_3.0".

I will try to finish it this week and publish the new version.

I hope this helps. Sorry for the late reply.
