emojitrack-feeder's People

Contributors

mroth

emojitrack-feeder's Issues

inclusion of analysis of tweet text to track/surface emoji meanings

Last summer, Instagram devs did some initial analysis of emojis on Instagram - http://instagram-engineering.tumblr.com/post/117889701472/emojineering-part-1-machine-learning-for-emoji

The techniques in that article might serve as a nice starting place for exploring emoji meanings, perhaps to be incorporated on the server side with the "ensmallened" tweet text.

For example, independent of its original purpose and definition, an emoji's meaning can shift depending on how people actually use it. The parking symbol P is one instance I've noticed while looking at new trends for previously rarely used emoji.

investigate migration to Lua script execution on server

A Redis script is transactional by definition, so everything you can do with a Redis transaction, you can also do with a script, and usually the script will be both simpler and faster.
http://redis.io/topics/transactions

Look into replacing the multiple Redis calls per transaction with a single script EVAL. This will make development slightly more complex but could radically decrease the number of Redis writes needed from the client side (and since most client libs seem to top out in the 20K ops/sec range, we could really use this).
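As a rough sketch of what that could look like from the Ruby side: the script below bundles several writes into one EVAL. The key names, the channel name, the list length, and the exact set of commands are illustrative assumptions, not the project's actual schema, and `redis` is assumed to be a client that responds to `eval(script, keys:, argv:)` the way redis-rb clients do.

```ruby
# Hypothetical Lua script: key names, channel name, and commands are
# illustrative assumptions, not the project's real schema.
UPDATE_SCRIPT = <<~LUA
  -- KEYS[1]: sorted set of scores, KEYS[2]: recent-tweets list
  -- ARGV[1]: emoji codepoint id, ARGV[2]: compact tweet JSON
  redis.call('ZINCRBY', KEYS[1], 1, ARGV[1])
  redis.call('LPUSH', KEYS[2], ARGV[2])
  redis.call('LTRIM', KEYS[2], 0, 9)
  redis.call('PUBLISH', 'stream.score_updates', ARGV[1])
  return 1
LUA

# Sends all four writes to the server as a single round trip.
# `redis` is assumed to behave like a redis-rb client.
def record_emoji(redis, codepoint, tweet_json)
  redis.eval(
    UPDATE_SCRIPT,
    keys: ["scores", "tweets:#{codepoint}"],
    argv: [codepoint, tweet_json]
  )
end
```

Because the script runs atomically on the server, the client no longer needs to wrap these commands in MULTI/EXEC.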

potentially move multi-emoji tweet updates logic to Lua script

As per some stats I just ran, the average tweet that contains emoji contains 1.4 emoji characters.

Currently our event loop does a single Redis EVALSHA for each emoji. It would be possible to modify the script to take an array of affected emoji codepoints and do the forEach logic on the Redis server side. The tinyjson blob is the same for every emoji in the tweet, so it would only need to be sent once, sparing outgoing bandwidth.

We already cut outgoing Redis traffic almost 10x via #5. This could yield roughly another 30% reduction, which is not nearly as dramatic, but I'm cataloging it here as a potential future improvement!
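A minimal sketch of the batched variant, under the same caveats: the key and channel names are hypothetical, and `redis` is assumed to respond to `evalsha(sha, keys:, argv:)` as redis-rb clients do. The small helper at the end shows where a figure near 30% can come from: with 1.4 emoji per tweet on average, going from 1.4 calls per tweet to 1 call removes 1 - 1/1.4 of the calls.

```ruby
# Hypothetical script: the tweet JSON is sent once, followed by every
# emoji codepoint id it contains; the loop runs on the Redis server.
MULTI_UPDATE_SCRIPT = <<~LUA
  -- KEYS[1]: sorted set of scores
  -- ARGV[1]: compact tweet JSON, ARGV[2..n]: emoji codepoint ids
  local tweet = ARGV[1]
  for i = 2, #ARGV do
    local id = ARGV[i]
    redis.call('ZINCRBY', KEYS[1], 1, id)
    redis.call('PUBLISH', 'stream.tweet_updates.' .. id, tweet)
  end
  return #ARGV - 1
LUA

# One EVALSHA per tweet instead of one per emoji.
# `sha` would come from loading the script once at startup,
# e.g. redis.script(:load, MULTI_UPDATE_SCRIPT) with redis-rb.
def record_tweet(redis, sha, codepoints, tweet_json)
  return 0 if codepoints.empty?
  redis.evalsha(sha, keys: ["scores"], argv: [tweet_json, *codepoints])
end

# Fraction of calls saved when one call covers all emoji in a tweet.
def call_reduction(avg_emoji_per_tweet)
  1.0 - 1.0 / avg_emoji_per_tweet
end

puts format("%.1f%% fewer calls", call_reduction(1.4) * 100)
```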

Look at performance of JSON versus Oj

Oj is definitely faster, but how much faster in our actual use case? This is useful to know because we would not be able to use it on JRuby, since Oj is a C extension.
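One way to get a first number is a small timing harness like the sketch below. The sample payload is a made-up stand-in for a trimmed-down tweet, so real measurements should use payloads captured from the actual stream. Oj is loaded optionally so the same script still runs where the gem is unavailable.

```ruby
require "benchmark"
require "json"

begin
  require "oj" # C extension; not expected to load on JRuby
  OJ_AVAILABLE = true
rescue LoadError
  OJ_AVAILABLE = false
end

# Returns the seconds spent serializing `payload` `iterations` times,
# keyed by library name. Oj is only measured when it could be loaded.
def compare_serializers(payload, iterations)
  results = {
    json: Benchmark.realtime { iterations.times { JSON.generate(payload) } }
  }
  if OJ_AVAILABLE
    results[:oj] = Benchmark.realtime do
      iterations.times { Oj.dump(payload, mode: :compat) }
    end
  end
  results
end

# Hypothetical payload; replace with real ensmallened tweets.
sample = { "id" => "1", "text" => "hello", "screen_name" => "someone" }
compare_serializers(sample, 10_000).each do |name, seconds|
  puts format("%-5s %.4fs", name, seconds)
end
```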

Emoji bigrams

I don't know yet whether this would be interesting, but I was wondering if you had considered storing emoji bigrams. It seems like more than one emoji usually appears in a tweet anyway.
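To make the idea concrete, here is a minimal sketch of counting bigrams, assuming each tweet has already been reduced to an ordered list of emoji ids (the ids below are placeholders). A bigram here means two emoji that appear next to each other in that list; other definitions, such as any co-occurring pair, are equally possible.

```ruby
# Counts adjacent emoji pairs (bigrams) across a collection of tweets.
# Each tweet is an ordered array of emoji ids; tweets with fewer than
# two emoji contribute nothing.
def emoji_bigram_counts(tweets)
  tweets.each_with_object(Hash.new(0)) do |emoji, counts|
    emoji.each_cons(2) { |pair| counts[pair] += 1 }
  end
end

tweets = [%w[A B C], %w[A B], %w[C]]
emoji_bigram_counts(tweets).each do |pair, count|
  puts "#{pair.join('+')}: #{count}"
end
```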

update twitter gem version

Currently only being used as a Tweet entity provider to TweetStream (although we may eventually change that via #6).

This should be done along with the deprecation of kiosk mode in #7, since we will be using the gem in fewer places by then. Note: we will need to run some tests to make sure nothing has changed, and DEFINITELY profile performance to guard against any regressions.

consider passing along `lang` variable per tweet

This would add a small amount of size to our ensmallen()-ed tweets, but people seem to be very curious about language distribution, so this is a first step towards making sure that information makes it into Redis.

Note this is different from actually aggregating based on language locally, which would be a separate issue if we decide to do it.
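A sketch of the size cost, using a hypothetical stand-in for the ensmallen step (the real field selection may differ; only `lang` is taken from the issue). The input is assumed to be a tweet parsed into a Hash with string keys.

```ruby
require "json"

# Hypothetical stand-in for ensmallen(): keeps a few fields and passes
# `lang` through when the tweet has one.
def ensmallen(tweet)
  {
    "id" => tweet["id_str"],
    "text" => tweet["text"],
    "screen_name" => tweet.dig("user", "screen_name"),
    "lang" => tweet["lang"]
  }.compact
end

tweet = {
  "id_str" => "1",
  "text" => "hi",
  "user" => { "screen_name" => "someone" },
  "lang" => "en"
}
with_lang    = JSON.generate(ensmallen(tweet))
without_lang = JSON.generate(ensmallen(tweet.reject { |k, _| k == "lang" }))
puts "extra bytes per tweet: #{with_lang.bytesize - without_lang.bytesize}"
```

For a two-letter code the added `,"lang":"en"` fragment is about a dozen bytes per tweet.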

Timestamps when updates occur?

I am not sure of the impact, except that it could probably take up a lot of space quickly, but it would be nice to have timestamps logged as part of the Redis updates to facilitate trend analysis.
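One possible way to keep the space cost bounded, offered only as an assumption and not something the project does: instead of logging a timestamp per update, increment a counter per emoji per time bucket. The key scheme and the one-minute bucket size below are hypothetical, and `redis` is assumed to respond to `incr(key)` as redis-rb clients do.

```ruby
# Hypothetical key scheme: one counter per emoji per time bucket.
# The bucket is identified by the Unix time at which it starts.
def bucket_key(codepoint, time, bucket_seconds = 60)
  seconds = time.to_i
  bucket_start = seconds - (seconds % bucket_seconds)
  "counts:#{codepoint}:#{bucket_start}"
end

# Increments the counter for the bucket containing `time`.
def record_with_timestamp(redis, codepoint, time = Time.now)
  redis.incr(bucket_key(codepoint, time))
end

puts bucket_key("1F602", Time.at(125))
```

Storage then grows with the number of buckets rather than the number of updates, at the price of losing per-event precision.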
