
politic-bots's People

Contributors

cdparra, jammily, joausaga, josueibarra95, lauraachon, marcemmad

politic-bots's Issues

Improve the search function

Improve the search function so that it saves more information about tweets by using the flags object. A proposed structure for the flags is the following:

"flags": {
    "partidos": {
        "anr": 2,
        "plra": 0,
        "pq": 0
    },
    "movimiento": {
        "hc": 7,
        "ca": 2
    },
    "candidatura": {
        "sp": 3,
        "ma": 1
    }
}

The construction of the flags object should be dynamic so that we can support different key values.
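A minimal sketch of building the flags object dynamically, assuming a hypothetical `keyword_groups` configuration that maps each category (e.g. "partidos") to its keys and the single-token terms that count as a mention of each key. Because categories and keys come from that configuration, adding a new party or movement would not require a code change. The function name `build_flags` is illustrative, not part of the repo.

```python
import re
from collections import Counter


def build_flags(text, keyword_groups):
    """Count keyword hits per category and return a nested flags dict.

    keyword_groups (hypothetical config format):
        {category: {key: [term, term, ...]}}
    Terms are assumed to be single tokens such as '#anr' or 'honorcolorado'.
    """
    # Tokenize, keeping a leading '#' or '@' so '#anr' and 'anr' stay distinct.
    tokens = Counter(re.findall(r'[#@]?\w+', text.lower()))
    return {
        category: {
            key: sum(tokens[term.lower()] for term in terms)
            for key, terms in keys.items()
        }
        for category, keys in keyword_groups.items()
    }


# Example configuration (hypothetical values).
groups = {
    'partidos': {'anr': ['#anr', 'anr'], 'plra': ['#plra']},
    'movimiento': {'hc': ['#hc', 'honorcolorado']},
}
print(build_flags('Vamos #ANR y #HC con HonorColorado', groups))
```

Multi-word terms would need phrase matching on the raw text instead of token counts.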

Extend the add_tweet function to avoid duplicates

The add_tweet function of the db_manager module doesn't check whether the tweet already exists in the database before saving it. We need to extend the function so that it checks, before saving, whether the tweet is already stored.
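One way to sketch this, assuming tweets are identified by Twitter's `id_str` field and `collection` is a pymongo collection: an upsert with `$setOnInsert` inserts the document only when no tweet with that id exists, doing the check and the write in a single atomic operation instead of a separate find followed by an insert. This is a hypothetical variant of add_tweet, not the repo's current code.

```python
def add_tweet(collection, tweet):
    """Store `tweet` unless a document with the same id_str already exists.

    `collection` is assumed to be a pymongo Collection. Returns True when
    the tweet was inserted and False when it was already in the database.
    """
    result = collection.update_one(
        {'id_str': tweet['id_str']},   # assumed unique tweet identifier
        {'$setOnInsert': tweet},       # only applied when a new doc is created
        upsert=True,
    )
    # upserted_id is None when an existing document matched the filter.
    return result.upserted_id is not None
```

Creating a unique index, e.g. `collection.create_index('id_str', unique=True)`, would additionally make the database itself reject duplicates.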

Improve CSV parser

Improve the CSV parser so that it can process CSV files in different formats.
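A possible starting point, using the standard library's `csv.Sniffer` to detect the delimiter (comma, semicolon, tab, or pipe are assumed as candidates) and falling back to plain comma-separated values when detection fails. The function name `read_csv_any` is hypothetical; the sniffer is heuristic, so unusual files may still need an explicit format.

```python
import csv
import io


def read_csv_any(text, candidates=',;\t|'):
    """Parse CSV text whose delimiter is not known in advance into a list of dicts."""
    # Sniff on whole lines only, so a cut-off last line doesn't skew detection.
    sample = '\n'.join(text.splitlines()[:50])
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    except csv.Error:
        dialect = csv.excel  # fall back to standard comma-separated values
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))


print(read_csv_any('keyword;partido\n#anr;anr\n#plra;plra\n'))
```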

Improve documentation of the repo

  1. Remove explanation of the heuristics since they are already in the wiki
  2. Explain the structure of the repo
  3. Explain how to configure and run the code end to end (i.e., collect tweets, detect bots, etc.)

Cursor not found after processing ~3500 records

Logs:

INFO:root:Identifying 3322/67792 tweets (relevant)
INFO:root:Marking 0 RTS...

==> politic-bots-error.log <==
Traceback (most recent call last):
  File "run.py", line 92, in <module>
    run_task()
  File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "run.py", line 77, in run_task
    analyze_tweet_relevance()
  File "run.py", line 50, in analyze_tweet_relevance
    te.identify_relevant_tweets()
  File "/home/participa/politic-bots-generales/politic-bots/src/utils/data_wrangler.py", line 103, in identify_relevant_tweets
    for doc in search_res: 
  File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/pymongo/cursor.py", line 1176, in next
    if len(self.__data) or self._refresh():
  File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/pymongo/cursor.py", line 1110, in _refresh
    self.__send_message(g)
  File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/pymongo/cursor.py", line 974, in __send_message
    helpers._check_command_response(first)
  File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/pymongo/helpers.py", line 143, in _check_command_response
    raise CursorNotFound(errmsg, code, response)
pymongo.errors.CursorNotFound: cursor id 28124978271 not found

Possible solutions

  • Reduce the batch size so that each getMore request keeps the cursor alive.
  • Disable the cursor timeout.
  • Retry when the cursor expires.

Not recommended, but possible:

  • Query the results in batches manually.
  • Fetch all the documents before the cursor expires.
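The first two options can be combined in pymongo through the `no_cursor_timeout` keyword of `find` and the `Cursor.batch_size` method (the second source above notes that the older `timeout` argument no longer exists). A minimal sketch, assuming `collection` is a pymongo collection; the helper name `iterate_without_timeout` is hypothetical and would replace the plain `for doc in search_res` loop in identify_relevant_tweets.

```python
def iterate_without_timeout(collection, query, batch_size=100):
    """Yield the documents matching `query` from a pymongo collection.

    no_cursor_timeout=True asks the server not to reap the cursor after its
    inactivity timeout, and a small batch_size makes the driver issue a
    getMore more often while the documents are being processed.
    """
    cursor = collection.find(query, no_cursor_timeout=True).batch_size(batch_size)
    try:
        for doc in cursor:
            yield doc
    finally:
        # The server never cleans up cursors opened with no_cursor_timeout,
        # so close it explicitly, also when iteration stops early.
        cursor.close()
```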

Sources:
https://stackoverflow.com/questions/38735225/typeerror-init-got-an-unexpected-keyword-argument-timeout-pymongo
https://stackoverflow.com/questions/44248108/mongodb-error-getmore-command-failed-cursor-not-found/44250410#44250410
https://stackoverflow.com/questions/50760889/pymongo-errors-cursornotfound-cursor-id-not-found-at-server

Replace api.search with tweepy.Cursor

We found that tweepy.Cursor (http://docs.tweepy.org/en/v3.5.0/cursor_tutorial.html?highlight=tweepy.cursor#introduction) is the recommended way to search for and iterate over tweets. Cursor handles pagination automatically, but we still need to control the rate limit manually (see http://docs.tweepy.org/en/v3.5.0/code_snippet.html?highlight=tweepy.cursor).
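The code-snippets page linked above handles the rate limit with a `limit_handled` generator that sleeps for the 15-minute window and then asks the cursor for the next item again. The variant below is adapted from that pattern: the rate-limit exception class is passed in (in tweepy 3.x this would be `tweepy.RateLimitError`), and `sleep` is a parameter only to make it testable. It relies on the tweepy iterator retrying the failed page on the next call, as the docs' version does. In tweepy 3.x, creating the API with `tweepy.API(auth, wait_on_rate_limit=True)` is a built-in alternative.

```python
import time


def limit_handled(items, rate_limit_error, wait_seconds=15 * 60, sleep=time.sleep):
    """Yield items from a Cursor iterator, pausing whenever a rate limit is hit.

    `items` is expected to be e.g. tweepy.Cursor(api.search, ...).items().
    """
    while True:
        try:
            item = next(items)
        except StopIteration:
            return  # no more results
        except rate_limit_error:
            # Wait for the rate-limit window to reset, then retry the same request.
            sleep(wait_seconds)
            continue
        yield item
```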

Example:
tweepy.Cursor(api.search, q='#hashtag', count=5, tweet_mode='extended', include_entities=True)

Make sure to include the include_entities parameter so that we can iterate over the hashtags and mentioned user accounts without depending on having the full text of the tweet.
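With include_entities enabled, each tweet in the Twitter API v1.1 JSON carries an `entities` object whose `hashtags` and `user_mentions` lists can be read directly. A small sketch; the helper name is hypothetical, and with tweepy the dict would be `status._json`.

```python
def hashtags_and_mentions(tweet):
    """Return (hashtags, mentioned screen names) from a v1.1 tweet dict's entities."""
    entities = tweet.get('entities', {})
    hashtags = [h['text'] for h in entities.get('hashtags', [])]
    mentions = [m['screen_name'] for m in entities.get('user_mentions', [])]
    return hashtags, mentions


# Usage with tweepy (not executed here):
#   for status in tweepy.Cursor(api.search, q='#hashtag', tweet_mode='extended',
#                               include_entities=True).items(100):
#       tags, users = hashtags_and_mentions(status._json)
```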

Another relevant pointer
https://twittercommunity.com/t/retrieve-full-tweet-when-truncated-non-retweet/75542/17

Implement a fake-handle heuristic

Implement a heuristic that identifies user names that contain a random string of numbers (and letters) or that are similar to the user names of public figures, for example @maritoabdo (real) vs. @marioabdojunior (fake).
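A rough sketch of both signals, using a regular expression for long digit runs and `difflib.SequenceMatcher` for name similarity. The threshold of 0.7, the four-digit run length and the function names are hypothetical and would need tuning against labelled accounts; the list of public handles is assumed to be maintained separately.

```python
import re
from difflib import SequenceMatcher

DIGIT_RUN = re.compile(r'\d{4,}')   # hypothetical: 4+ consecutive digits
SIMILARITY_THRESHOLD = 0.7          # hypothetical, needs tuning


def has_long_digit_run(handle):
    """Flag handles such as 'juan83749201' that contain a long run of digits."""
    return DIGIT_RUN.search(handle) is not None


def impersonated_account(handle, public_handles, threshold=SIMILARITY_THRESHOLD):
    """Return the public handle that `handle` most resembles, or None."""
    h = handle.lower().lstrip('@')
    best, best_score = None, 0.0
    for public in public_handles:
        p = public.lower().lstrip('@')
        if h == p:
            return None  # this is the genuine account
        score = SequenceMatcher(None, h, p).ratio()
        if score >= threshold and score > best_score:
            best, best_score = public, score
    return best


def is_suspicious_handle(handle, public_handles):
    return has_long_digit_run(handle) or impersonated_account(handle, public_handles) is not None


print(impersonated_account('@marioabdojunior', ['@maritoabdo']))
```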

Clean DB

Remove "anr" from the "paraguayelige" keyword.

AttributeError: 'NoneType' object has no attribute 'items' in `get_movement_user`

Traceback (most recent call last):
  File "run.py", line 92, in <module>
    run_task()
  File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "run.py", line 81, in run_task
    create_db_users()
  File "run.py", line 62, in create_db_users
    na.create_users_db()
  File "/home/participa/politic-bots-generales/politic-bots/src/analyzer/network_analysis.py", line 168, in create_users_db
    user_movements = self.__dbm_tweets.get_movement_user(user['screen_name'])
  File "/home/participa/politic-bots-generales/politic-bots/src/utils/db_manager.py", line 410, in get_movement_user
    for movement, flag in user_doc['movimiento'].items():
AttributeError: 'NoneType' object has no attribute 'items'
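The traceback shows that `user_doc['movimiento']` is None for some users (had `user_doc` itself been None, the error would be "not subscriptable" instead). A defensive sketch of the loop's input, assuming a user document with no movement data should simply yield no movements; the helper name `movement_flags` is hypothetical.

```python
def movement_flags(user_doc):
    """Return the (movement, flag) pairs of a user document.

    Tolerates a missing user document and a missing or null 'movimiento' field,
    which is what raised AttributeError in get_movement_user.
    """
    if not user_doc:
        return []
    movements = user_doc.get('movimiento') or {}
    return list(movements.items())


# Inside get_movement_user the loop could then read:
#   for movement, flag in movement_flags(user_doc):
#       ...
```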
