
osometweet's Introduction


Introduction

The OSoMeTweet project intends to provide a set of tools to help researchers work with Twitter's V2 API.

The Wiki includes detailed documentation of how to use all methods. We will also use the wiki to store knowledge gathered by those building this package.

Installation

Install the PyPI version

pip install osometweet

Warning 1: The package is still in development, so not all endpoints are included and those which are included may not be 100% robust. Please see the list of issues for known problems.

Warning 2: We will try to keep the interface of the package consistent, but there may be drastic changes in the future.

Use the newest features & local development

The PyPI version may be behind the GitHub version. To ensure that you are using the latest features and functionalities, you can install the GitHub version locally.

To do so, clone this project, go to the source directory, and run pip install -e .

If you want to do this with git, it should look something like the following, run from your command line:

git clone https://github.com/osome-iu/osometweet.git
cd ./osometweet
pip install -e .

Requirements

python>=3.5
requests>=2.24.0
requests_oauthlib>=1.3.0

Tests

Go to the tests directory and run:

python tests.py

Note: you will need to have the following environment variables set in order for the tests to work properly.

  • TWITTER_API_KEY
  • TWITTER_API_KEY_SECRET
  • TWITTER_ACCESS_TOKEN
  • TWITTER_ACCESS_TOKEN_SECRET
  • TWITTER_BEARER_TOKEN

If you're not sure what these are, check out this page to learn how Twitter authentication works.

How to seek help and contribute

OSoMeTweet will be a community project and your help is welcome!

See How to contribute to the OsoMeTweet package for more details on how to contribute.

Quick start

Here is an example of how to use our package to pull user information:

import osometweet

# Initialize the OSoMeTweet object
bearer_token = "YOUR_TWITTER_BEARER_TOKEN"
oauth2 = osometweet.OAuth2(bearer_token=bearer_token)
ot = osometweet.OsomeTweet(oauth2)

# Set some test IDs (these are Twitter's own accounts)
ids2find = ["2244994945", "6253282"]

# Call the function with these ids as input
response = ot.user_lookup_ids(user_ids=ids2find)
print(response["data"])

which returns a list of dictionaries, where each dictionary contains the requested information for an individual user.

[
    {'id': '2244994945', 'name': 'Twitter Dev', 'username': 'TwitterDev'},
    {'id': '6253282', 'name': 'Twitter API', 'username': 'TwitterAPI'}
]

Learn how to use the package

Documentation on how to use all package methods is located in the Wiki.

Start here before using the example scripts!

Learn about Twitter V2

We have documented (and will continue to document) information about Twitter's V2 API that we deem valuable.

Example Scripts

We offer example scripts for working with different endpoints. We recommend that you read and understand the methods by reading the relevant package Wiki pages prior to using these scripts.

osometweet's People

Contributors

christorreslugo, mr-devs, yang3kc


osometweet's Issues

Get rate-limit-manager to log print statements in separate script log?

I have a script that I am writing that uses the temp-mix branch's full search functionality to pull retweets of a list of users. You can see that script here; however, it's not really necessary for this question.

Basically, this is a long-running script and I'd like to record in my script's log the output of the rate_limit_manager.py print statements. What is the best way to do this? Do I just need to change these print statements to logger.info() lines, or is it more involved than that?
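One lightweight approach (a sketch, assuming the package's print statements are switched to a module-level logger; wait_message is a hypothetical stand-in for the real print calls):

# In rate_limit_manager.py (sketch): log through a module-level logger.
import logging

logger = logging.getLogger(__name__)  # e.g. "osometweet.rate_limit_manager"

def wait_message(seconds):
    # Replaces: print(f"Rate limit reached. Sleeping for {seconds} seconds...")
    logger.info("Rate limit reached. Sleeping for %s seconds...", seconds)

# In your long-running script, a single basicConfig call then routes the
# package's messages (and your own) into the script's log file.
logging.basicConfig(
    filename="my_script.log",
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)

So yes: switching the prints to logger.info() calls is essentially all it takes, provided the consuming script configures a handler.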

Implement OAuth 1.0a

Currently, osometweet only supports authentication through OAuth 2.0 bearer token. This form of authentication only allows for read-access. Performing any action that requires write-access, such as BotSlayer's send DM functionality, or accessing non-public fields, such as private engagement metrics, will be rejected by the API.

To solve this issue, we have to implement OAuth 1.0a. Ideally, the library will check whether the requested fields are allowed under the instance's authentication prior to calling the API. e.g. raise an error if authenticated using OAuth 2.0 and trying to post a tweet.
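For reference, a minimal sketch of OAuth 1.0a user-context signing with requests_oauthlib (already a dependency of this package); the endpoint and ID below are just examples:

import requests
from requests_oauthlib import OAuth1

auth = OAuth1(
    "YOUR_TWITTER_API_KEY",
    "YOUR_TWITTER_API_KEY_SECRET",
    "YOUR_TWITTER_ACCESS_TOKEN",
    "YOUR_TWITTER_ACCESS_TOKEN_SECRET",
)

# Each request is signed individually; write-access endpoints and non-public
# fields require this user-context form of authentication.
response = requests.get(
    "https://api.twitter.com/2/users",
    params={"ids": "2244994945"},
    auth=auth,
)
print(response.json())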

Utility function for initializing osometweet API

Very often it happens that I just want to make a single call to a function to see what it returns, etc.

As a result, I find myself typing out the below preamble before doing anything with the API all the time and it has become my least favorite thing about the package. 👼

# Initialize the OSoMeTweet object
bearer_token = os.environ.get("TWITTER_BEARER_TOKEN")
oauth2 = osometweet.OAuth2(bearer_token=bearer_token)
ot = osometweet.OsomeTweet(oauth2)

A simple solution is to just set up a utility function that does this for us by drawing on the user's environment variables. I am imagining one for both authorization contexts.

  • App (bearer token) context
    • Wiki example
  • User context
    • Wiki example

I imagine a loading function, as well as an initializing function for each context. Here is a rough example for the App context...

import os

import osometweet

def load_bearer_token(env_key: str = "TWITTER_BEARER_TOKEN") -> str:
    """
    Load Twitter Keys from Local Environment.

    Parameters:
    -----------
    - env_key (str) : The name of the environment variable for your Twitter bearer token. (default = "TWITTER_BEARER_TOKEN")
    """
    # Set Twitter tokens/keys.
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    print("Loading bearer token...")
    bearer_token = os.environ.get(env_key, None)

    if bearer_token is None:
        raise Exception(
            f"No environment variable named `{env_key}`! "
            "Make sure to set this from your terminal via:\n\n"
            f"\t --> export {env_key}='<your_twitter_bearer_token>' "
        )

    return bearer_token

def initialize_osometweet(
    env_key: str = "TWITTER_BEARER_TOKEN",
    manage_rate_limits: bool = True
) -> osometweet.api.OsomeTweet:
    """
    Return an authorized osometweet API object
    from which we can make API calls.

    Parameters:
    ----------
    - env_key (str) : The name of the environment variable for your Twitter bearer token. (default = "TWITTER_BEARER_TOKEN")
    - manage_rate_limits (bool) : Whether osometweet should handle rate limits automatically. (default = True)
    """
    print("Initializing osometweet with OAuth 2.0 authentication...")

    bearer_token = load_bearer_token(env_key)

    oauth2 = osometweet.OAuth2(
        bearer_token=bearer_token,
        manage_rate_limits=manage_rate_limits
    )
    return osometweet.OsomeTweet(oauth2)

This approach allows users a bunch of freedom (see the usage sketch after this list), like:

  • Calling the loading functions on their own and initializing the standard way, if they want
  • Using whatever environment variables the user has set
  • Controlling the rate limiting option from the initializing function, like the standard approach
  • If the user set it up so that their environment variable matches the default, they can simply call initialize_osometweet() with no input and get an osometweet.api.OsomeTweet object right away
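With defaults that match the user's environment variable, usage then collapses to:

# Uses the initialize_osometweet() defined above; assumes TWITTER_BEARER_TOKEN
# is set in the environment.
ot = initialize_osometweet()
response = ot.user_lookup_ids(user_ids=["2244994945"])
print(response["data"])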

Bug: Rate limit manager incorrectly throws an error for HTTP status 201

Despite not being present on Twitter's list of HTTP errors, it seems that creating streaming rules returns the 201 (Created) success status code (details).

If the rate_limit_manager is enabled, the manager will throw an error even though we had no problems.

We should be able to edit the below line...

if response.status_code != 200:

... to match either of these codes with something like...

if response.status_code not in [200, 201]:

Example script: Scrape a list of users' FULL tweet timelines

Using the full archive search, we should be able to get a user's entire list of tweets. I need to build this script for the superspreaders project anyway, so I figured I would just add it to this repo.

The idea here is that the script should take in a file with one user ID per line and output a single file of tweets per user.
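A rough skeleton of the script (the search method name and parameters below are hypothetical; the real full-archive search lives on the temp-mix branch and its signature may differ):

import json

import osometweet

oauth2 = osometweet.OAuth2(bearer_token="YOUR_TWITTER_BEARER_TOKEN")
ot = osometweet.OsomeTweet(oauth2)

# One user ID per line in the input file
with open("user_ids.txt") as f:
    user_ids = [line.strip() for line in f if line.strip()]

for uid in user_ids:
    # Hypothetical call: pull the user's full timeline via full-archive search
    response = ot.search(query=f"from:{uid}", full_archive_search=True)
    # One output file of tweets per user
    with open(f"tweets_{uid}.json", "w") as out:
        json.dump(response, out)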

Method for controlling the authorization type used?

I copied and pasted the below from the documentation that I just created a PR for (#14)...

As you can see in the second method below, if we want to manually control which authorization type we use when making queries, we have to play with the OsomeTweet._use_bearer_token field (or parameter, or whatever it's called). That being said, you can see in the WARNING (at the bottom of the message) that you can set this to whatever you want and, if you don't set it correctly, it will break the class.

Would it be valuable to create a method which controls this - ensuring the user only passes in True or False? I am happy to do this, just wanted to make sure you guys agreed this was needed.

Let me know whenever. Thanks.


Controlling which authorization type you'd like to use

As mentioned above, osometweet defaults to OAuth 2.0 Bearer Token authorization. If you'd like to use OAuth 1.0a authorization you can do that in two ways.

  1. Don't provide the OsomeTweet class your bearer_token
    • osometweet needs your bearer_token for OAuth 2.0 Bearer Token authorization. Thus, if you do not provide this token, osometweet will look for your bearer_token - not find it - and then only use your user context Twitter keys/tokens (i.e. api_key, api_key_secret, access_token, access_token_secret) from then on. For example, simply initialize the OsomeTweet class like this...
from osometweet.api import OsomeTweet

api_key = "YOUR_TWITTER_API_KEY"
api_key_secret = "YOUR_TWITTER_API_KEY_SECRET"
access_token = "YOUR_TWITTER_ACCESS_TOKEN"
access_token_secret = "YOUR_TWITTER_ACCESS_TOKEN_SECRET"

ot = OsomeTweet(
    api_key=api_key,
    api_key_secret=api_key_secret,
    access_token=access_token,
    access_token_secret=access_token_secret
)
  2. Manually
    • Perhaps you have a more complicated script and you'd like to switch which authorization osometweet uses for different methods. You can manually do this by controlling what osometweet does with OsomeTweet._use_bearer_token (boolean). For example:
from osometweet.api import OsomeTweet

api_key = "YOUR_TWITTER_API_KEY"
api_key_secret = "YOUR_TWITTER_API_KEY_SECRET"
access_token = "YOUR_TWITTER_ACCESS_TOKEN"
access_token_secret = "YOUR_TWITTER_ACCESS_TOKEN_SECRET"
bearer_token = "YOUR_TWITTER_BEARER_TOKEN"

ot = OsomeTweet(
    api_key=api_key,
    api_key_secret=api_key_secret,
    access_token=access_token,
    access_token_secret=access_token_secret,
    bearer_token=bearer_token
)

# The below line tells osometweet to NOT use the bearer_token even though it has been provided
ot._use_bearer_token = False  # <-------

# You can then switch it back with...
ot._use_bearer_token = True

WARNING: OsomeTweet._use_bearer_token can only be set to the boolean values True or False - any other value will break the class.
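A sketch of such a guard method (the name is only a suggestion):

def set_use_bearer_token(self, use_bearer_token: bool) -> None:
    """Safely toggle between OAuth 2.0 (bearer token) and OAuth 1.0a
    (user context) authorization."""
    if not isinstance(use_bearer_token, bool):
        raise ValueError("`use_bearer_token` must be True or False")
    self._use_bearer_token = use_bearer_token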

Improve streaming

According to Twitter's page about consuming streaming data, we want to be able to do the below (a minimal reconnection sketch follows this list):

  • Establish an HTTPS streaming connection to the filter stream endpoint.
  • Asynchronously send POST requests to the filter stream rules endpoint to add and delete rules from the stream.
  • Handle low data volumes – Maintain the streaming connection, detecting Tweet objects and keep-alive signals
  • Handle high data volumes – de-couple stream ingestion from additional processing using asynchronous processes, and ensure client-side buffers are flushed regularly.
  • Manage volume consumption tracking on the client-side.
  • Detect stream disconnections, evaluate and reconnect to the stream automatically.
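A minimal reconnection sketch using plain requests (the endpoint URL, timeout, and backoff policy are assumptions, not the package's current implementation):

import json
import time

import requests

def consume_stream(bearer_token, url="https://api.twitter.com/2/tweets/search/stream"):
    headers = {"Authorization": f"Bearer {bearer_token}"}
    backoff = 1
    while True:
        try:
            with requests.get(url, headers=headers, stream=True, timeout=90) as resp:
                resp.raise_for_status()
                backoff = 1  # connected; reset the backoff clock
                for line in resp.iter_lines():
                    if not line:
                        continue  # blank lines are keep-alive signals
                    yield json.loads(line)  # a Tweet object
        except (requests.ConnectionError, requests.Timeout, requests.HTTPError):
            # Disconnection detected: wait, then reconnect automatically
            time.sleep(backoff)
            backoff = min(backoff * 2, 60)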

Function that removes all filtered streaming rules

Since it seems like filtered streaming rules persist unless the user removes them (see #85 for an issue seeking to confirm this is true), a simple function that removes all existing filter rules seems like it would be useful.

Something like the below

def remove_all_filter_rules(ot):
    current_rules = ot.get_filtered_stream_rule()
    print(f'The current filtered stream rules are:\n{current_rules}\n')

    # Get all streaming rule ids and then remove them
    all_rules = [rule["id"] for rule in current_rules.get("data", [])]
    if not all_rules:
        print("No filtered stream rules to remove.")
        return
    delete_rule = {'delete': {'ids': all_rules}}
    response = ot.set_filtered_stream_rule(rules=delete_rule)
    print(f"API response from deleting rules:\n{response}\n")

Samples Stream Endpoint

Please claim if you think you can get to this.

  • API functionality
  • Add documentation
  • Add tests

Generalize make_request

Currently, OAuth{1,2}'s make_request uses requests.get() to contact the API. The filtered stream endpoint requires the use of POST to establish the rules that the stream will be filtered on. Other endpoints in the future might require POST or other methods. To get around this issue, I'll be changing make_request to use requests.request(method={"GET", "POST", etc.}) and will update all the endpoints to pass the corresponding method to make_request.
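A sketch of the generalized dispatch (auth is omitted; the payloads are only illustrations):

import requests

def make_request(url: str, method: str = "GET", **kwargs) -> requests.Response:
    """Dispatch on the HTTP verb via requests.request()."""
    return requests.request(method, url, **kwargs)

# GET for lookups...
resp = make_request("https://api.twitter.com/2/tweets", params={"ids": "SOME_TWEET_ID"})

# ...and POST for setting filtered-stream rules
resp = make_request(
    "https://api.twitter.com/2/tweets/search/stream/rules",
    method="POST",
    json={"add": [{"value": "cat has:images"}]},
)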

Handle Endpoint Argument Options

I originally suggested adding **kwargs for all methods in #23 because I thought that there would be too many options. Looking at the endpoints, I actually think it's easier to just include all options explicitly.

Below is a list of endpoint methods and their available options.

Tweet Lookup

  • tweet_lookup
    • ids
    • expansions
      • attachments.poll_ids
      • attachments.media_keys
      • author_id
      • entities.mentions.username
      • geo.place_id
      • in_reply_to_user_id
      • referenced_tweets.id
      • referenced_tweets.id.author_id
    • media.fields
    • place.fields
    • poll.fields
    • tweet.fields
    • user.fields
  • Update Doc strings
  • Update Tests

User Lookup IDs

  • user_lookup_ids
    • ids
    • expansions
      • pinned_tweet_id
    • tweet.fields
    • user.fields
  • Update Doc strings
  • Update Tests

User Lookup Usernames

  • user_lookup_usernames
    • usernames
    • expansions
      • pinned_tweet_id
    • tweet.fields
    • user.fields
  • Update Doc strings
  • Update Tests

Get Followers

  • get_followers
    • id
    • expansions
      • pinned_tweet_id
    • max_results
    • pagination_token
    • tweet.fields
    • user.fields
  • Update Doc strings
  • Update Tests

Get Following

  • get_following
    • id
    • expansions
      • pinned_tweet_id
    • max_results
    • pagination_token
    • tweet.fields
    • user.fields
  • Update Doc strings
  • Update Tests

Remove

  • set_user_fields
  • set_tweets_fields
  • set_poll_fields
  • set_place_fields
  • set_media_fields

Adding more tests

API:

  • tweet lookup
  • user lookup ids and usernames
  • get_followers
  • get_following

fields/expansions:

  • expansions
  • tweet fields
  • user fields
  • media fields
  • poll fields
  • place fields

utils:

  • utils.pause_until
  • utils.chunker

Add example scripts

Create scripts that can be used to gather data using the available endpoints.

Here are the endpoints available in osometweet at the moment:

  • tweet_lookup
  • user_lookup_ids
  • user_lookup_usernames
  • get_tweet_timeline
  • get_mentions_timeline
  • get_followers
  • get_following
  • search
  • sampled_stream
  • filtered_stream

Please feel free to claim one of these scripts if you're interested! 😄

See the examples/ folder for what is available.


Note: The original comment for this issue has been completely edited from its initial form because it was originally opened as a question/discussion about whether we should do this. Now that we've decided to make this standard practice, I wanted to clean it up.

Automatic linting check

In an effort to tighten up the last bit of code I'd like to take the lead on going through everything with pylint. Right now, we get a pretty bad score almost entirely due to long lines in the docstrings (my bad 😉 - I've made some changes in my editor which should keep this from happening in the future) but there are some other small things that can be addressed as well.

If no one has any objections, I am going to go through all of the code and tighten things up based on pylint's recommendations. (BTW, please let me know if you do have an issue with this for some reason and/or you recommend something else!)

However, I was wondering... is there a way to build this type of check into our tests pre-merge? If this is a lot of work, I'm not sure it will be worth it but, if it's easy, I think it could be nice to set up. For example, I think there is a way to get GitHub to block merges unless it passes a pylint check, however, I don't know what is involved to get this done.

Please let me know your thoughts when you can.

Streaming parameters for OAuth1a _make_one_request

The OAuth1a _make_one_request internal method includes the stream and json parameters. (See here)

I have two questions:

  1. It seems like the json parameter is not used, which I assume is a bug; however, whether to include it on the return line or delete it as a parameter depends on the next question...
  2. It seems like, for V2, we cannot access the streaming endpoints using OAuth1a authorization (you can find this info by checking out my earlier comment about this). Do we think that we should just remove stream and json?
    • If so, perhaps it's a good idea to include some sort of catch that throws an error if you try to access the streaming endpoints with OAuth1a? Too much?

get_all_avail_fields() not working for user_lookup_ids()

Here is the code and error I got:

>>> import osometweet
>>> import os
>>> bearer_token = os.environ.get("TWITTER_BEARER_TOKEN")
>>> all_fields = osometweet.fields.get_all_avail_fields()
>>> expansions = osometweet.UserExpansions()
>>> oauth2 = osometweet.OAuth2(bearer_token=bearer_token)
>>> ot = osometweet.OsomeTweet(oauth2)
>>> ot.user_lookup_ids('12', fields=all_fields, expansions=expansions)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/matthewdeverna/Documents/GitHub/osometweet/osometweet/api.py", line 216, in user_lookup_ids
    return self._user_lookup(user_ids, "id", fields=fields, expansions=expansions)
  File "/Users/matthewdeverna/Documents/GitHub/osometweet/osometweet/api.py", line 289, in _user_lookup
    "Invalid parameter type: `query` must be" "either a list or tuple."
ValueError: Invalid parameter type: `query` must beeither a list or tuple.

Additive object fields?

So I was playing around with the user_lookup_ids() method and realized that our data retrieval methods sort of clash with the set_X_fields() methods.

For example...

# Get a list of all user_fields
>>> u_fields = o_utils.ObjectFields.return_user_fields()
>>> u_fields
['created_at', 'description', 'entities', 'id', 'location', 'name', 'pinned_tweet_id', 'profile_image_url', 'protected', 'public_metrics', 'url', 'username', 'verified', 'withheld']
# Set these fields 
>>> ot.set_user_fields(",".join(u_fields))
# Try and pull user_objects only passing the fields ["created_at","username"]
>>> response = ot.user_lookup_ids(user_ids = ids2find, user_fields = ["created_at","username"])
# This still returns all of the user fields that I set before.
>>> response["data"]
[{'created_at': '2013-12-14T04:35:55.000Z', 'url': 'https://t.co/3ZX3TNiZCY', 'id': '2244994945', 'verified': True, 'description': 'The voice of the #TwitterDev team and your official source for updates, news, and events, related to the #TwitterAPI.', 'location': '127.0.0.1', 'username': 'TwitterDev', 'entities': {'url': {'urls': [{'start': 0, 'end': 23, 'url': 'https://t.co/3ZX3TNiZCY', 'expanded_url': 'https://developer.twitter.com/en/community', 'display_url': 'developer.twitter.com/en/community'}]}, 'description': {'hashtags': [{'start': 17, 'end': 28, 'tag': 'TwitterDev'}, {'start': 105, 'end': 116, 'tag': 'TwitterAPI'}]}}, 'public_metrics': {'followers_count': 514823, 'following_count': 2035, 'tweet_count': 3648, 'listed_count': 1682}, 'pinned_tweet_id': '1293593516040269825', 'protected': False, 'profile_image_url': 'https://pbs.twimg.com/profile_images/1283786620521652229/lEODkLTh_normal.jpg', 'name': 'Twitter Dev'}, {'created_at': '2007-05-23T06:01:13.000Z', 'url': 'https://t.co/8IkCzCDr19', 'id': '6253282', 'verified': True, 'description': 'Tweets about changes and service issues. Follow @TwitterDev\xa0for more.', 'username': 'TwitterAPI', 'entities': {'url': {'urls': [{'start': 0, 'end': 23, 'url': 'https://t.co/8IkCzCDr19', 'expanded_url': 'https://developer.twitter.com', 'display_url': 'developer.twitter.com'}]}, 'description': {'mentions': [{'start': 48, 'end': 59, 'username': 'TwitterDev'}]}}, 'public_metrics': {'followers_count': 6056217, 'following_count': 39, 'tweet_count': 3693, 'listed_count': 12267}, 'pinned_tweet_id': '1293595870563381249', 'protected': False, 'profile_image_url': 'https://pbs.twimg.com/profile_images/942858479592554497/BbazLO9L_normal.jpg', 'name': 'Twitter API'}]

So, basically, whenever we are setting or updating the user_fields they simply add onto one another. I am not sure that this is the best approach. My expectation as a user is for whatever I have passed into a data retrieval method to override any previously set object fields.

Having said the above, I think this highlights the question of whether we want to keep these object field setting-methods (i.e. set_user_fields()) separate from the actual data call method (i.e. user_lookup_ids()) or not.

In my opinion, it makes more sense to convert these set_x_fields methods to internal methods which we can use under the hood to update the parameters. Then, we can imply which fields are retrievable from a specific endpoint based on the available parameters within its respective method by simply including only what is available. For example, place fields are available if you're pulling tweet timelines, but they're not available when pulling user account information. So we would include "place" as a parameter for the tweet timelines endpoint but not for the user lookup methods.

Hopefully this change wouldn't be too much trouble because I think we can just do the following (rough sketch after this list):

  1. Add an underscore in front of the existing "set methods"
  2. Add the appropriate parameters to each "data retrieval" method (assuming they would each take a list)
  3. Use the now-internal "set methods" within the "data retrieval" methods
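A rough skeleton of what steps 1-3 might look like (hypothetical, not the real class):

class OsomeTweet:
    """Skeleton only; illustrates the proposed flow."""

    def __init__(self):
        self._params = {}

    def _set_user_fields(self, user_fields):
        # Step 1: now internal. Overrides, rather than appends to,
        # any previously set fields.
        self._params["user.fields"] = ",".join(user_fields)

    def user_lookup_ids(self, user_ids, user_fields=None):
        # Steps 2 and 3: the data retrieval method takes a list directly
        # and calls the internal set method under the hood.
        if user_fields is not None:
            self._set_user_fields(user_fields)
        # ... the actual API call would go here ...
        return self._params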

What do you guys think?

Functions for data cleaning

I think it would be good to build out functions for cleaning the data that osometweet returns.

I was watching one of the Twitter Twitch sessions about doing this and they shared this module from twarc which can be used to partially flatten the data JSON to create a data frame for a csv file. I don't think we can directly take this and work it into our package, however, I thought it could be good to reference.

I'm thinking that this would require its own module, maybe called wrangle or data? Open to suggestions. I'm thinking of functions like the following (a rough flatten sketch comes after this list):

  • wrangle.extract() which takes a raw tweet dict as input and returns specific components, e.g. "text", "urls", "hashtags", etc.
  • wrangle.flatten() which takes a single raw tweet dict and returns it in row form
  • etc.
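A rough sketch of what wrangle.flatten() might look like (dotted-key flattening; all names are suggestions):

def flatten(tweet: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Collapse a nested tweet dict into one flat "row" for a CSV."""
    row = {}
    for key, value in tweet.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            row.update(flatten(value, new_key, sep=sep))
        else:
            row[new_key] = value
    return row

# e.g. {"id": "123", "public_metrics": {"retweet_count": 5}}
# becomes {"id": "123", "public_metrics.retweet_count": 5}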

Add an "everything" option to set_user_fields and the like

For me, it would be common to fetch all the fields for tweets and users. So maybe we can add an option for this. Seems we can modify utils.ObjectFields for this.

I'm not sure whether to do this at the method level or the class level.
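One possible shape at the class level (a rough sketch; the method name is a suggestion and the tweet-field list is abbreviated):

class ObjectFields:
    user_fields = [
        "created_at", "description", "entities", "id", "location", "name",
        "pinned_tweet_id", "profile_image_url", "protected", "public_metrics",
        "url", "username", "verified", "withheld",
    ]
    tweet_fields = ["author_id", "created_at", "public_metrics", "text"]  # abbreviated

    @classmethod
    def return_everything(cls):
        """Return every available field across all object types."""
        return cls.user_fields + cls.tweet_fields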

Add User Lookup Method

Here is something that worked well for pulling user-account data; it handles rate limits as well as time-dependent errors:

from datetime import datetime as dt

import pause  # third-party package used to sleep until a given time

def connect_to_endpoint(oauth, params):
    """Downloads data from Twitter based on the `oauth` object passed and the
    `params` created with the `create_params()` function.

    If time-dependent errors (429, 500, 503) are returned, wait until the
    rate limit resets (or for a short fixed interval) and then try again.
    """
    
    switch = True
    
    while switch:
        response = oauth.get("https://api.twitter.com/2/users", params=params)
        
        # Get number of requests left with our tokens
        remaining_requests = int(response.headers["x-rate-limit-remaining"])
        
        # If that number is one, we get the reset-time and wait until then, plus 15 seconds.
        # This is caught below as well, however, we want to program defensively, if possible.
        if remaining_requests == 1:
            buffer_wait_time = 15 
            resume_time = dt.fromtimestamp( int(response.headers["x-rate-limit-reset"]) + buffer_wait_time )
            print(f"Waiting on Twitter.\n\tResume Time: {resume_time}")
            pause.until(resume_time)

        """To be safe, we check explicitly for these TIME DEPENDENT errors.
        That is, these errors can be solved simply by waiting a little while 
        and pinging Twitter again. So that's what we do."""
        if response.status_code != 200:

            # Too many requests error
            if response.status_code == 429:
                buffer_wait_time = 15 
                resume_time = dt.fromtimestamp( int(response.headers["x-rate-limit-reset"]) + buffer_wait_time )
                print(f"Waiting on Twitter.\n\tResume Time: {resume_time}")
                pause.until(resume_time)

            # Twitter internal server error
            elif response.status_code == 500:
                # Twitter needs a break, so we wait 30 seconds
                resume_time = dt.now().timestamp() + 30
                print(f"Waiting on Twitter.\n\tResume Time: {resume_time}")
                pause.until(resume_time)

            # Twitter service unavailable error
            elif response.status_code == 503:
                # Twitter needs a break, so we wait 30 seconds
                resume_time = dt.now().timestamp() + 30
                print(f"Waiting on Twitter.\n\tResume Time: {resume_time}")
                pause.until(resume_time)

            # Any other error is not time dependent, so we've done
            # something wrong and should exit
            else:
                raise Exception(
                    "Request returned an error: {} {}".format(
                        response.status_code, response.text
                    )
                )
        
        # If we get a 200 response, lets exit the function and return the response.json
        if response.ok:
            return response.json()

@ChrisTorresLugo this would require that I touch both #2 and #1. I don't need to touch #1; however, this endpoint is much more effective with the user-based OAuth1 method.

pip install -e?

@yangkcatiu you mentioned in the user-lookup branch README.md ...

Local development

Clone this project, go to the source directory, type the following command to install the package locally:

pip install -e ./

Can you help me understand what this does exactly? The description in the pip documentation isn't very helpful.

From other Googling, it seems like this allows me to import the version of the package that is present in the source folder on my local machine, whereas, if I don't do this, running import osometweet would give me whatever the latest version of the package was when it was pushed to PyPI. I just tried this out and it seems like that is at least one difference; is there anything else that this does?

Handle rate limits

The headers returned by each API call contain the following fields:

  • x-rate-limit-limit: the rate limit ceiling for that given endpoint
  • x-rate-limit-remaining: the number of requests left for the 15-minute window
  • x-rate-limit-reset: the remaining window before the rate limit resets, in UTC epoch seconds

These fields are documented here. Method calls should parse these fields to determine when to sleep the process and for how long. Performing additional requests when the limit has been reached risks account suspension.
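A minimal sketch of what a method could do with these headers after each call:

import time

import requests

def sleep_if_rate_limited(response: requests.Response) -> None:
    """Sketch: sleep through the 15-minute window when no requests remain."""
    remaining = int(response.headers.get("x-rate-limit-remaining", 1))
    if remaining == 0:
        reset_at = int(response.headers["x-rate-limit-reset"])  # UTC epoch seconds
        wait = max(reset_at - time.time(), 0) + 1  # one-second buffer
        print(f"Rate limit reached; sleeping {wait:.0f} seconds.")
        time.sleep(wait)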

Add FIB score functionality

Would be great to build the below into the package somewhere...

def get_fib_scores(rt_counts):
    """
    Calculate the FIB-index.
    
    INPUT:
    - rt_counts (list) : list of retweet count values for all 
    retweets sent by a single user.
    
    OUTPUT:
    - fib_position (int) : super-spreader index where the retweet
    count value (of a sorted list of retweets) is greater than or
    equal to the position in that list.
    """

    rt_counts.sort()

    # The "[::-1]" makes this for-loop iterate from the largest possible
    # position down to 1, so the first match is the maximum index.
    for fib_position in range(1, len(rt_counts) + 1)[::-1]:
        if rt_counts[-fib_position] >= fib_position:
            return fib_position
    
    # If the above criteria is never met, we return the fib_position as zero.
    fib_position = 0
    return fib_position

For anyone else seeing this, this is equivalent to the h-index algorithm - the above is the adapted version for the superspreaders project.

Here is a link to an out-of-the-box script that can take data directly from Moe's tavern.

  • Add function (maybe to utils.py?)
  • Remake the above linked-to script, but for twitter data from V2.

Doc-string format

Is there a reason to utilize this docstring format...

:param [str, list, tuple] tids
:returns requests.models.Response
:raises Exception, ValueError

vs. this one?

Parameters:
    - tids [str, list, tuple] - twitter id numbers
Returns:
    - requests.models.Response
Raises:
    - Exception, ValueError

The latter is, IMO, much more readable but I've seen the first a lot and I wasn't sure if there are any functional differences?

Normalize terms

Let's use variable names that are consistent with the newest twitter docs:

  • API_KEY
  • API_KEY_SECRET
  • BEARER_TOKEN
  • ACCESS_TOKEN
  • ACCESS_TOKEN_SECRET

Include **kwargs options in doc strings

If you haven't realized yet, I'm a bit anal about documentation 😄

I'd like to include docstrings like the ones I included in the timelines methods (#41) - i.e. include details on the kwargs (optional arguments) - for each method.

However, they're a bit long; does this bother anyone? I find myself using the docstrings constantly and get frustrated when docstrings have no information, so it would personally make me quite happy to include this information - mostly because it is impossible for me to remember everything.

I will take the lead on this if you guys don't mind.

'everything' arg should accept only type = boolean

Right now the everything arg (in all methods) will accept arbitrary strings instead of a True or False boolean because any non-empty string evaluates as True.

For example:

ot.user_lookup_ids(list_tweet_ids[:100], 'blah')

is the same as

ot.user_lookup_ids(list_tweet_ids[:100], everything=True)

since 'blah' is a non-empty string and therefore evaluates as True.

Solution

Add a check like the below

if not isinstance(everything, bool):
    raise ValueError("`everything` must be True or False")

Include an "everything" option within each method option

For example, if we want to use the tweet_lookup() method.

You could do something like the following:

resp = self.ot.tweet_lookup(
    tids=random_tweet_id,
    everything=True
)

And this would be equivalent to including all expansions, media.fields, place.fields, poll.fields, tweet.fields, and user.fields.

Another option would be to include an `all` option for each data object; however, since we have the return_objectfield() methods, I think this might be overkill. It's really easy to just call t_fields = o_utils.return_tweet_fields() and then place t_fields in the ot.tweet_lookup() call. As a result, I think it's best to build out the first option.


Methods

  • tweet_lookup
  • user_lookup_ids
  • user_lookup_usernames
  • get_followers
  • get_following

Unauthorized 401 Issue

When the bearer token is incorrect or expired, _user_lookup() in api.py does not handle the 401 status code.
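A sketch of an explicit check (its placement inside _user_lookup() and the message wording are assumptions):

import requests

def raise_for_unauthorized(response: requests.Response) -> None:
    """Sketch: surface an informative error instead of failing downstream."""
    if response.status_code == 401:
        raise Exception(
            "401 Unauthorized: the bearer token is invalid or expired. "
            f"Twitter's response: {response.text}"
        )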

Document the existing methods and utils

To make it easier for other people to use (e.g., other group members), let's put some documentation in the readme file.

  • create a new documentation branch
  • document the initialization of OsomeTweet class, especially about the two different oauth methods
  • document tweet lookup
  • document user lookup ids and usernames
  • document utils.pause_until
  • document utils.chunker

Develop friends/followers methods

I will probably need to figure this out for the retweet cascade reconstruction part of the superspreaders project so I will have to look into this anyway. If anyone else wants to dig into this, please feel free to assign yourself and go for it because I probably won't get to this for a couple of weeks.

I liked the structural approach that Kaicheng took in his edits to the user lookup methods - having one base method and two user-facing methods (one for IDs and one for usernames) that feed into the base method - so I will try and do the same thing.

Please add any other thoughts you may have.


  • Base Method
  • Friends Method
  • Followers Method
  • Update README.md

Lists as input for the set_x_fields methods?

Right now, all of the setting methods (set_place_fields, set_polls_fields, etc.) take a string. What do we think about changing this to take a list of strings? We can handle the "stringification" within the method easily with a join. This seems more straightforward and less likely to lead to errors, IMO.

For example, we are assuming that people are going to build the url string correctly, but maybe it's best to do this for them to ensure it's correct?
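A sketch of how one of the set methods could accept either form (the internal self._params attribute is an assumption):

def set_place_fields(self, place_fields) -> None:
    # Accept a list/tuple and handle the "stringification" here
    if isinstance(place_fields, (list, tuple)):
        place_fields = ",".join(place_fields)
    self._params["place.fields"] = place_fields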

If you guys are okay with this, I am happy to make these changes and can take care of #12 at the same time.

Update documentation

Update documentation to reflect #44.

  • Tweet lookup
  • Timelines
  • User lookup
  • Followers lookup
  • Following lookup
  • Fields & expansions
  • Example scripts
