participapy / politic-bots
Tools and algorithms to analyze Paraguayan Tweets in times of elections
Home Page: https://www.participa.org.py
License: GNU General Public License v3.0
Filter out tweets that use the hashtags of interest but aren't necessarily related to the Paraguayan political and electoral context
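A minimal sketch of such a filter, assuming relevance can be approximated by requiring at least one electoral-context term in addition to a tracked hashtag. The hashtag set, the term list, and the `is_relevant` name are hypothetical and would need tuning on labelled tweets.

```python
import re

# Hypothetical configuration: tracked hashtags and terms that indicate the
# Paraguayan electoral context. Both sets are illustrative only.
TRACKED_HASHTAGS = {"#paraguayelige"}
CONTEXT_TERMS = {"anr", "plra", "tsje", "candidato", "candidata", "elecciones", "internas"}

# Words, optionally prefixed by # or @ (str patterns are Unicode-aware).
WORD_RE = re.compile(r"[#@]?\w+")


def is_relevant(text, tracked=TRACKED_HASHTAGS, context=CONTEXT_TERMS, min_context=1):
    """Keep a tweet only if it uses a tracked hashtag AND contains at least
    `min_context` distinct context terms that are not themselves hashtags."""
    tokens = [t.lower() for t in WORD_RE.findall(text)]
    uses_hashtag = any(t in tracked for t in tokens)
    context_hits = {t for t in tokens if not t.startswith("#") and t in context}
    return uses_hashtag and len(context_hits) >= min_context
```

Hashtags are excluded from the context count so that a tweet cannot pass the filter on hashtags alone.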
Currently, the code uses the Civic CrowdAnalytics API to compute the sentiment of tweets. It should be changed to use the Python library cca-core developed by Marcelo.
Use the save_tweets function of the db_manager module to insert the tweets into the database
Improve the search function so that it saves more information about each tweet using the flags object. The proposed structure for the flags is the following:
"flags": {
    "partidos": {
        "anr": 2,
        "plra": 0,
        "pq": 0
    },
    "movimiento": {
        "hc": 7,
        "ca": 2
    },
    "candidatura": {
        "sp": 3,
        "ma": 1
    }
}
The flags object should be built dynamically so that different keys can be supported.
It can be similar to https://github.com/ParticipaPY/politic-bots/blob/master/bots_politicos_r1.ipynb
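A sketch of dynamic flag construction, assuming a keyword map from category to flag key to trigger terms. The map below, including its terms, is illustrative and not the project's actual configuration. Because categories and keys are read from the map, supporting a new party, movement, or candidacy only requires editing `KEYWORDS`.

```python
import re

# Illustrative keyword map: category -> flag key -> terms that trigger the flag.
# build_flags does not hard-code any category or key.
KEYWORDS = {
    "partidos": {
        "anr": ["anr", "colorado"],
        "plra": ["plra", "liberal"],
        "pq": ["pq", "patria querida"],
    },
    "movimiento": {
        "hc": ["honor colorado"],
        "ca": ["colorado añetete"],
    },
    "candidatura": {
        "sp": ["santiago peña"],
        "ma": ["mario abdo"],
    },
}


def count_term(term, text):
    """Count whole-word (or whole-phrase) occurrences of term in text."""
    return len(re.findall(r"\b" + re.escape(term) + r"\b", text))


def build_flags(text, keywords=KEYWORDS):
    """Return a flags dict with the same shape as the proposed structure."""
    text = text.lower()
    return {
        category: {key: sum(count_term(t, text) for t in terms) for key, terms in keys.items()}
        for category, keys in keywords.items()
    }
```

Terms may overlap on purpose: "colorado añetete" raises both the `ca` movement flag and the `anr` party flag.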
The add_tweet function of the db_manager module doesn't check whether the tweet already exists in the database before saving it. We need to extend the function so that it checks whether the tweet exists before saving.
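One way to add the check, sketched as a standalone helper that assumes tweets are stored with Twitter's `id_str` field and that `collection` is a pymongo `Collection`. The helper name is hypothetical and is not the project's `add_tweet`. Check-then-insert is not atomic; a unique index such as `collection.create_index("id_str", unique=True)` would let MongoDB itself reject duplicates.

```python
def add_tweet_if_new(collection, tweet):
    """Insert `tweet` only if no document with the same id_str exists.

    `collection` is assumed to behave like a pymongo Collection
    (find_one / insert_one). Returns True if the tweet was inserted.
    """
    if collection.find_one({"id_str": tweet["id_str"]}) is not None:
        return False  # already stored, skip
    collection.insert_one(tweet)
    return True
```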
Improve the CSV parser so that it can process CSV files of different formats
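A sketch using the standard library's `csv.Sniffer`, assuming the differing formats are mainly different delimiters and the presence or absence of a header row. `read_any_csv` is a hypothetical name. `Sniffer` is heuristic and raises `csv.Error` when it cannot determine the delimiter, so unusual files may still need an explicit dialect.

```python
import csv
import io


def read_any_csv(text):
    """Parse CSV text whose delimiter and header layout are not known in advance.

    Guesses the dialect and whether the first row is a header from a sample,
    then returns a list of dicts keyed by column name.
    """
    sample = text[:4096]
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample, delimiters=",;\t|")
    has_header = sniffer.has_header(sample)
    rows = list(csv.reader(io.StringIO(text), dialect))
    if not rows:
        return []
    if has_header:
        header, body = rows[0], rows[1:]
    else:
        # No header detected: generate positional column names.
        header = ["col%d" % i for i in range(len(rows[0]))]
        body = rows
    return [dict(zip(header, row)) for row in body]
```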
Check and possibly refactor the code of the heuristic fake promoter. The algorithm needs a config file that is currently missing.
Temporary solution for Monday's searches
Logs:
INFO:root:Identifying 3322/67792 tweets (relevant)
INFO:root:Marking 0 RTS...
==> politic-bots-error.log <==
Traceback (most recent call last):
File "run.py", line 92, in <module>
run_task()
File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "run.py", line 77, in run_task
analyze_tweet_relevance()
File "run.py", line 50, in analyze_tweet_relevance
te.identify_relevant_tweets()
File "/home/participa/politic-bots-generales/politic-bots/src/utils/data_wrangler.py", line 103, in identify_relevant_tweets
for doc in search_res:
File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/pymongo/cursor.py", line 1176, in next
if len(self.__data) or self._refresh():
File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/pymongo/cursor.py", line 1110, in _refresh
self.__send_message(g)
File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/pymongo/cursor.py", line 974, in __send_message
helpers._check_command_response(first)
File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/pymongo/helpers.py", line 143, in _check_command_response
raise CursorNotFound(errmsg, code, response)
pymongo.errors.CursorNotFound: cursor id 28124978271 not found
Possible solutions
Not recommended, but possible:
Sources:
https://stackoverflow.com/questions/38735225/typeerror-init-got-an-unexpected-keyword-argument-timeout-pymongo
https://stackoverflow.com/questions/44248108/mongodb-error-getmore-command-failed-cursor-not-found/44250410#44250410
https://stackoverflow.com/questions/50760889/pymongo-errors-cursornotfound-cursor-id-not-found-at-server
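The linked threads discuss passing `no_cursor_timeout=True` (which then requires closing the cursor explicitly) or a smaller `batch_size` to `find()`. A different option, sketched below under the assumption that documents can be processed in `_id` order, is to read in short batches so that no server-side cursor stays idle while tweets are being processed. `iter_in_batches` is a hypothetical helper, not part of pymongo or the project.

```python
def iter_in_batches(collection, query, batch_size=1000):
    """Yield documents matching `query` in _id order, one short query per batch.

    Each batch is fully read into memory before its documents are yielded, so
    no cursor stays open on the server during processing. Assumes `collection`
    behaves like a pymongo Collection and that `query` does not already
    constrain `_id`.
    """
    last_id = None
    while True:
        batch_query = dict(query)
        if last_id is not None:
            batch_query["_id"] = {"$gt": last_id}
        batch = list(collection.find(batch_query, sort=[("_id", 1)], limit=batch_size))
        for doc in batch:
            yield doc
        if len(batch) < batch_size:
            return  # last (possibly empty) batch
        last_id = batch[-1]["_id"]
```

It could replace direct iteration over a long-running cursor such as `search_res` in `identify_relevant_tweets`, although the query used there is not shown here.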
Change the current DB name to "internas2017".
Check and possibly refactor the code of the heuristic fake promoter. The algorithm uses a config file that is currently missing from the code.
We found that tweepy.Cursor (http://docs.tweepy.org/en/v3.5.0/cursor_tutorial.html?highlight=tweepy.cursor#introduction) is the recommended way to search and iterate over tweets. Cursor handles pagination automatically, but we still need to control the rate limit manually (see http://docs.tweepy.org/en/v3.5.0/code_snippet.html?highlight=tweepy.cursor)
Example:
tweepy.Cursor(api.search, q='#hashtag', count=5, tweet_mode='extended', include_entities=True)
Make sure to include the include_entities parameter so that we can iterate over the hashtags and mentioned user accounts without depending on having the full text of the tweet
Another relevant pointer
https://twittercommunity.com/t/retrieve-full-tweet-when-truncated-non-retweet/75542/17
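A sketch of reading the text and entities from the raw tweet JSON (available in tweepy as `status._json`), assuming `tweet_mode='extended'`, so that the text is in `full_text`; for retweets the untruncated text is in the nested `retweeted_status` object. The function names are hypothetical.

```python
def tweet_full_text(tweet):
    """Return the untruncated text of a tweet dict.

    For retweets the complete text lives in the original tweet under
    'retweeted_status'; otherwise prefer 'full_text' and fall back to 'text'.
    """
    source = tweet.get("retweeted_status", tweet)
    return source.get("full_text", source.get("text", ""))


def tweet_entities(tweet):
    """Return (hashtags, mentions) from the tweet's entities, lowercased."""
    entities = tweet.get("entities", {})
    hashtags = [h["text"].lower() for h in entities.get("hashtags", [])]
    mentions = [m["screen_name"].lower() for m in entities.get("user_mentions", [])]
    return hashtags, mentions
```

For example, each `status` yielded by `tweepy.Cursor(api.search, q='#hashtag', tweet_mode='extended', include_entities=True).items()` could be passed as `tweet_entities(status._json)`.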
Currently, the wiki documents only the heuristic fake handlers and fake promoters, but it should also document the rest of the heuristics as well as the heuristics still to be implemented; see here for a description of the heuristics
Implement a heuristic that identifies usernames containing a random string of numbers (and letters) or resembling usernames used by public figures, for example @maritoabdo (genuine) vs. @marioabdojunior (fake)
For more information, see Spot a Bot: Identifying Automation and Disinformation on Social Media
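A minimal sketch of both checks, assuming that a long run of digits signals a randomly generated handle and that `difflib.SequenceMatcher` similarity against a list of known public-figure handles approximates impersonation. The thresholds and function names are hypothetical; with these values, `@marioabdojunior` scores about 0.72 against `@maritoabdo`. Random letter strings are not covered by this sketch.

```python
import re
from difflib import SequenceMatcher

# Hypothetical thresholds; they would need tuning on labelled accounts.
DIGIT_RUN = re.compile(r"\d{4,}")
SIMILARITY_THRESHOLD = 0.7


def has_random_digits(screen_name):
    """Flag names containing a long run of digits, e.g. 'juan84629173'."""
    return DIGIT_RUN.search(screen_name) is not None


def impersonated_account(screen_name, public_figures, threshold=SIMILARITY_THRESHOLD):
    """Return the public-figure handle that `screen_name` closely resembles
    without being identical to it, or None."""
    name = screen_name.lower().lstrip("@")
    best, best_ratio = None, 0.0
    for handle in public_figures:
        official = handle.lower().lstrip("@")
        if name == official:
            return None  # this is the genuine account
        ratio = SequenceMatcher(None, name, official).ratio()
        if ratio > best_ratio:
            best, best_ratio = handle, ratio
    return best if best_ratio >= threshold else None
```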
Run a second round of the analysis using data collected in the last three weeks
Use the document provided by Laura to improve the documentation of the heuristic fake handlers. In particular, an explanation of the constants used in the algorithm is missing.
Implement a heuristic that identifies accounts used mainly to promote fake accounts or bots. For more information, see Spot a Bot: Identifying Automation and Disinformation on Social Media
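One possible scoring rule, sketched under the assumption that a promoter can be recognised by the share of its retweets and mentions that point to accounts already flagged as fake or bots (for instance by the other heuristics). The function name, the minimum-activity cut-off, and any decision threshold applied to the score are hypothetical.

```python
def promoter_score(interactions, suspicious_accounts, min_interactions=10):
    """Share of a user's retweets/mentions that target suspicious accounts.

    `interactions` lists the screen names the user retweeted or mentioned;
    `suspicious_accounts` holds screen names already flagged as fake or bots.
    Users with too little activity get 0.0 to avoid noisy scores.
    Returns a value in [0, 1].
    """
    if len(interactions) < min_interactions:
        return 0.0
    suspicious = {s.lower() for s in suspicious_accounts}
    hits = sum(1 for name in interactions if name.lower() in suspicious)
    return hits / len(interactions)
```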
Remove duplicate tweets
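A sketch of selecting the duplicates to remove, assuming duplicates share the same `id_str` and that the first copy returned by the query should be kept; the helper name is hypothetical. The documents could come from pymongo's `collection.find({}, {"id_str": 1})`, and the resulting list could be passed to `collection.delete_many({"_id": {"$in": ids}})`. For very large collections the set of seen ids grows with the number of distinct tweets.

```python
def duplicate_ids_to_delete(docs, key="id_str"):
    """Return the Mongo '_id' of every copy after the first for each tweet id."""
    seen = set()
    to_delete = []
    for doc in docs:
        if doc[key] in seen:
            to_delete.append(doc["_id"])  # a later copy of an already-seen tweet
        else:
            seen.add(doc[key])
    return to_delete
```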
Erase "anr" from "paraguayelige" keyword.
Traceback (most recent call last):
File "run.py", line 92, in <module>
run_task()
File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/participa/politic-bots-generales/politic-bots/env/lib/python3.5/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "run.py", line 81, in run_task
create_db_users()
File "run.py", line 62, in create_db_users
na.create_users_db()
File "/home/participa/politic-bots-generales/politic-bots/src/analyzer/network_analysis.py", line 168, in create_users_db
user_movements = self.__dbm_tweets.get_movement_user(user['screen_name'])
File "/home/participa/politic-bots-generales/politic-bots/src/utils/db_manager.py", line 410, in get_movement_user
for movement, flag in user_doc['movimiento'].items():
AttributeError: 'NoneType' object has no attribute 'items'
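The traceback shows that, for at least one user, `user_doc['movimiento']` is `None` when `get_movement_user` iterates over it. A minimal guard, sketched as a hypothetical helper that assumes the caller only needs the `(movement, flag)` pairs, is to treat a missing document, a missing key, or a `None` value as an empty dict.

```python
def movements_of(user_doc):
    """Return the 'movimiento' flags of a document as a dict.

    A missing document, a missing key, or a None value all yield {}, so the
    caller can iterate with .items() without raising AttributeError.
    """
    if not user_doc:
        return {}
    return dict(user_doc.get("movimiento") or {})
```

The failing loop would then read `for movement, flag in movements_of(user_doc).items():`.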
To solve this issue we need to:
Currently, the bot detector only returns the probability that each user is a bot, but these probabilities should also be saved in the database as part of the user's record
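A sketch of persisting the scores, assuming user records are keyed by `screen_name` (as in the traceback above) and that `collection` is a pymongo `Collection`; the `bot_probability` field name and the helper are hypothetical. For many users, pymongo's `bulk_write` with `UpdateOne` operations would reduce round trips.

```python
def save_bot_probabilities(collection, probabilities, field="bot_probability"):
    """Store each user's bot probability in their existing user record.

    `probabilities` maps screen_name -> probability. Users without a record
    are skipped (no upsert). Returns the number of records that matched.
    """
    matched = 0
    for screen_name, prob in probabilities.items():
        result = collection.update_one(
            {"screen_name": screen_name},
            {"$set": {field: float(prob)}},
        )
        matched += result.matched_count
    return matched
```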