
osometweet's Introduction


Introduction

The OSoMeTweet project intends to provide a set of tools to help researchers work with Twitter's V2 API.

The Wiki includes detailed documentation of how to use all methods. We will also use the wiki to store knowledge gathered by those building this package.

Installation

Install the PyPI version

pip install osometweet

Warning 1: The package is still in development, so not all endpoints are included and those which are included may not be 100% robust. Please see the list of issues for known problems.

Warning 2: We will try to keep the interface of the package consistent, but there may be drastic changes in the future.

Use the newest features & local development

The PyPI version may be behind the GitHub version. To ensure that you are using the latest features and functionalities, you can install the GitHub version locally.

To do so, clone this project, go to the source directory, and run pip install -e .

If you want to do this with git, it should look something like the following, run from your command line:

git clone https://github.com/osome-iu/osometweet.git
cd ./osometweet
pip install -e .

Requirements

python>=3.5
requests>=2.24.0
requests_oauthlib>=1.3.0

Tests

Go to the tests directory and run:

python tests.py

Note: you will need to have the following environment variables set in order for the tests to work properly.

  • TWITTER_API_KEY
  • TWITTER_API_KEY_SECRET
  • TWITTER_ACCESS_TOKEN
  • TWITTER_ACCESS_TOKEN_SECRET
  • TWITTER_BEARER_TOKEN

If you're not sure what these are, check out this page to learn how Twitter authentication works.

How to seek help and contribute

OSoMeTweet will be a community project and your help is welcome!

See How to contribute to the OsoMeTweet package for more details on how to contribute.

Quick start

Here is an example of how to use our package to pull user information:

import osometweet

# Initialize the OSoMeTweet object
bearer_token = "YOUR_TWITTER_BEARER_TOKEN"
oauth2 = osometweet.OAuth2(bearer_token=bearer_token)
ot = osometweet.OsomeTweet(oauth2)

# Set some test IDs (these are Twitter's own accounts)
ids2find = ["2244994945", "6253282"]

# Call the function with these ids as input
response = ot.user_lookup_ids(user_ids=ids2find)
print(response["data"])

which returns a list of dictionaries, where each dictionary contains the requested information for an individual user.

[
    {'id': '2244994945', 'name': 'Twitter Dev', 'username': 'TwitterDev'},
    {'id': '6253282', 'name': 'Twitter API', 'username': 'TwitterAPI'}
]

Learn how to use the package

Documentation on how to use all package methods is located in the Wiki.

Start here before using the example scripts!

Learn about Twitter V2

We have documented (and will continue to document) information about Twitter's V2 API that we deem valuable.

Example Scripts

We offer example scripts for working with different endpoints. We recommend that you read and understand the methods by reading the relevant package Wiki pages prior to using these scripts.

osometweet's People

Contributors

christorreslugo, mr-devs, yang3kc


osometweet's Issues

Get rate-limit-manager to log print statements in separate script log?

I have a script that I am writing that uses the temp-mix branch's full search functionality to pull retweets of a list of users. You can see that script here; however, it's not really necessary for this question.

Basically, this is a long-running script and I'd like to record in my script's log the output of the rate_limit_manager.py print statements. What is the best way to do this? Do I just need to change these print statements to logger.info() lines, or is it more involved than that?
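One lightweight approach (a sketch, assuming the package's print statements are switched to a module-level logger; wait_message is a hypothetical stand-in for the real print calls):

# In rate_limit_manager.py (sketch): log through a module-level logger.
import logging

logger = logging.getLogger(__name__)  # e.g. "osometweet.rate_limit_manager"

def wait_message(seconds):
    # Replaces: print(f"Rate limit reached. Sleeping for {seconds} seconds...")
    logger.info("Rate limit reached. Sleeping for %s seconds...", seconds)

# In your long-running script, a single basicConfig call then routes the
# package's messages (and your own) into the script's log file.
logging.basicConfig(
    filename="my_script.log",
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)

So yes: switching the prints to logger.info() calls is essentially all it takes, provided the consuming script configures a handler.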

Implement OAuth 1.0a

Currently, osometweet only supports authentication through OAuth 2.0 bearer token. This form of authentication only allows for read-access. Performing any action that requires write-access, such as BotSlayer's send DM functionality, or accessing non-public fields, such as private engagement metrics, will be rejected by the API.

To solve this issue, we have to implement OAuth 1.0a. Ideally, the library will check whether the requested fields are allowed under the instance's authentication prior to calling the API. e.g. raise an error if authenticated using OAuth 2.0 and trying to post a tweet.
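For reference, a minimal sketch of OAuth 1.0a user-context signing with requests_oauthlib (already a dependency of this package); the endpoint and ID below are just examples:

import requests
from requests_oauthlib import OAuth1

auth = OAuth1(
    "YOUR_TWITTER_API_KEY",
    "YOUR_TWITTER_API_KEY_SECRET",
    "YOUR_TWITTER_ACCESS_TOKEN",
    "YOUR_TWITTER_ACCESS_TOKEN_SECRET",
)

# Each request is signed individually; write-access endpoints and non-public
# fields require this user-context form of authentication.
response = requests.get(
    "https://api.twitter.com/2/users",
    params={"ids": "2244994945"},
    auth=auth,
)
print(response.json())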

Utility function for initializing osometweet API

Very often it happens that I just want to make a single call to a function to see what it returns, etc.

As a result, I find myself typing out the below preamble before doing anything with the API all the time and it has become my least favorite thing about the package. 👼

# Initialize the OSoMeTweet object
bearer_token = os.environ.get("TWITTER_BEARER_TOKEN")
oauth2 = osometweet.OAuth2(bearer_token=bearer_token)
ot = osometweet.OsomeTweet(oauth2)

A simple solution is to just set up a utility function that does this for us by drawing on the user's environment variables. I am imagining one for both authorization contexts.

  • App (bearer token) context
    • Wiki example
  • User context
    • Wiki example

I imagine a loading function, as well as an initializing function for each context. Here is a rough example for the App context...

import os

import osometweet

def load_bearer_token(env_key: str = "TWITTER_BEARER_TOKEN") -> str:
    """
    Load Twitter Keys from Local Environment.

    Parameters:
    -----------
    - env_key (str) : The name of the environment variable for your Twitter bearer token. (default = "TWITTER_BEARER_TOKEN")
    """
    # Set Twitter tokens/keys.
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    print("Loading bearer token...")
    bearer_token = os.environ.get(env_key, None)

    if bearer_token is None:
        raise Exception(
            f"No environment variable named `{env_key}`! "
            "Make sure to set this from your terminal via:\n\n"
            f"\t --> export {env_key}='<your_twitter_bearer_token>' "
        )

    return bearer_token

def initialize_osometweet(
    env_key: str = "TWITTER_BEARER_TOKEN",
    manage_rate_limits: bool = True
) -> osometweet.api.OsomeTweet:
    """
    Return an authorized osometweet API object
    from which we can make API calls.

    Parameters:
    ----------
    - env_key (str) : The name of the environment variable for your Twitter bearer token. (default = "TWITTER_BEARER_TOKEN")
    - manage_rate_limits (bool) : Whether osometweet should handle rate limits automatically. (default = True)
    """
    print("Initializing osometweet with OAuth 2.0 authentication...")

    bearer_token = load_bearer_token(env_key)

    oauth2 = osometweet.OAuth2(
        bearer_token=bearer_token,
        manage_rate_limits=manage_rate_limits
    )
    return osometweet.OsomeTweet(oauth2)

This approach allows users a bunch of freedom (see the usage sketch after this list), like:

  • Calling the loading functions on their own and initializing the standard way, if they want
  • Using whatever environment variables the user has set
  • Controlling the rate limiting option from the initializing function, like the standard approach
  • If the user set it up so that their environment variable matches the default, they can simply call initialize_osometweet() with no input and get an osometweet.api.OsomeTweet object right away
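With defaults that match the user's environment variable, usage then collapses to:

# Uses the initialize_osometweet() defined above; assumes TWITTER_BEARER_TOKEN
# is set in the environment.
ot = initialize_osometweet()
response = ot.user_lookup_ids(user_ids=["2244994945"])
print(response["data"])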

Bug: Rate limit manager incorrectly throws an error for HTTP status 201

Despite not being present on Twitter's list of HTTP errors, it seems that creating streaming rules returns the 201 (Created) success status code (details).

If the rate_limit_manager is enabled, the manager will throw an error even though we had no problems.

We should be able to edit the below line...

if response.status_code != 200:

... to match either of these codes with something like...

if response.status_code not in [200, 201]:

Example script: Scrape a list of users' FULL tweet timelines

Using the full archive search, we should be able to get a user's entire list of tweets. I need to build this script for the superspreaders project anyway, so I figured I would just add it to this repo.

The idea here is that the script should take in a file with one user ID per line and output a single file of tweets per user.
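A rough skeleton of the script (the search method name and parameters below are hypothetical; the real full-archive search lives on the temp-mix branch and its signature may differ):

import json

import osometweet

oauth2 = osometweet.OAuth2(bearer_token="YOUR_TWITTER_BEARER_TOKEN")
ot = osometweet.OsomeTweet(oauth2)

# One user ID per line in the input file
with open("user_ids.txt") as f:
    user_ids = [line.strip() for line in f if line.strip()]

for uid in user_ids:
    # Hypothetical call: pull the user's full timeline via full-archive search
    response = ot.search(query=f"from:{uid}", full_archive_search=True)
    # One output file of tweets per user
    with open(f"tweets_{uid}.json", "w") as out:
        json.dump(response, out)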

Method for controlling the authorization type used?

I copied and pasted the below from the documentation that I just created a PR for (#14)...

As you can see in the second method below, if we want to manually control which authorization type we use when making queries, we have to play with the OsomeTweet._use_bearer_token field (or parameter, or whatever it's called). That being said, you can see in the WARNING (at the bottom of the message) that you can set this to whatever you want and, if you don't set it correctly, it will break the class.

Would it be valuable to create a method which controls this - ensuring the user only passes in True or False? I am happy to do this, just wanted to make sure you guys agreed this was needed.

Let me know whenever. Thanks.


Controlling which authorization type you'd like to use

As mentioned above, osometweet defaults to OAuth 2.0 Bearer Token authorization. If you'd like to use OAuth 1.0a authorization you can do that in two ways.

  1. Don't provide the OsomeTweet class your bearer_token
    • osometweet needs your bearer_token for OAuth 2.0 Bearer Token authorization. Thus, if you do not provide this token, osometweet will look for your bearer_token - not find it - and then only use your user context Twitter keys/tokens (i.e. api_key, api_key_secret, access_token, access_token_secret) from then on. For example, simply initialize the OsomeTweet class like this...
from osometweet.api import OsomeTweet

api_key = "YOUR_TWITTER_API_KEY"
api_key_secret = "YOUR_TWITTER_API_KEY_SECRET"
access_token = "YOUR_TWITTER_ACCESS_TOKEN"
access_token_secret = "YOUR_TWITTER_ACCESS_TOKEN_SECRET"

ot = OsomeTweet(
    api_key=api_key,
    api_key_secret=api_key_secret,
    access_token=access_token,
    access_token_secret=access_token_secret
)
  2. Manually
    • Perhaps you have a more complicated script and you'd like to switch which authorization osometweet uses for different methods. You can manually do this by controlling what osometweet does with OsomeTweet._use_bearer_token (boolean). For example:
from osometweet.api import OsomeTweet

api_key = "YOUR_TWITTER_API_KEY"
api_key_secret = "YOUR_TWITTER_API_KEY_SECRET"
access_token = "YOUR_TWITTER_ACCESS_TOKEN"
access_token_secret = "YOUR_TWITTER_ACCESS_TOKEN_SECRET"
bearer_token = "YOUR_TWITTER_BEARER_TOKEN"

ot = OsomeTweet(
    api_key=api_key,
    api_key_secret=api_key_secret,
    access_token=access_token,
    access_token_secret=access_token_secret,
    bearer_token=bearer_token
)

# The below line tells osometweet to NOT use the bearer_token even though it has been provided
ot._use_bearer_token = False  # <-------

# You can then switch it back with...
ot._use_bearer_token = True

WARNING: OsomeTweet._use_bearer_token can only be set to the boolean values True or False - any other value will break the class.
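A sketch of such a guard method (the name is only a suggestion):

def set_use_bearer_token(self, use_bearer_token: bool) -> None:
    """Safely toggle between OAuth 2.0 (bearer token) and OAuth 1.0a
    (user context) authorization."""
    if not isinstance(use_bearer_token, bool):
        raise ValueError("`use_bearer_token` must be True or False")
    self._use_bearer_token = use_bearer_token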

Improve streaming

According to Twitter's page about consuming streaming data, we want to be able to do the below (a minimal reconnection sketch follows this list):

  • Establish an HTTPS streaming connection to the filter stream endpoint.
  • Asynchronously send POST requests to the filter stream rules endpoint to add and delete rules from the stream.
  • Handle low data volumes – Maintain the streaming connection, detecting Tweet objects and keep-alive signals
  • Handle high data volumes – de-couple stream ingestion from additional processing using asynchronous processes, and ensure client-side buffers are flushed regularly.
  • Manage volume consumption tracking on the client-side.
  • Detect stream disconnections, evaluate and reconnect to the stream automatically.
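A minimal reconnection sketch using plain requests (the endpoint URL, timeout, and backoff policy are assumptions, not the package's current implementation):

import json
import time

import requests

def consume_stream(bearer_token, url="https://api.twitter.com/2/tweets/search/stream"):
    headers = {"Authorization": f"Bearer {bearer_token}"}
    backoff = 1
    while True:
        try:
            with requests.get(url, headers=headers, stream=True, timeout=90) as resp:
                resp.raise_for_status()
                backoff = 1  # connected; reset the backoff clock
                for line in resp.iter_lines():
                    if not line:
                        continue  # blank lines are keep-alive signals
                    yield json.loads(line)  # a Tweet object
        except (requests.ConnectionError, requests.Timeout, requests.HTTPError):
            # Disconnection detected: wait, then reconnect automatically
            time.sleep(backoff)
            backoff = min(backoff * 2, 60)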

Function that removes all filtered streaming rules

Since it seems like filtered streaming rules persist unless the user removes them (see #85 for an issue seeking to confirm this is true), a simple function that removes all existing filter rules seems like it would be useful.

Something like the below

def remove_all_filter_rules(ot):
    current_rules = ot.get_filtered_stream_rule()
    print(f'The current filtered stream rules are:\n{current_rules}\n')

    # Get all streaming rule ids and then remove them
    all_rules = [rule["id"] for rule in current_rules.get("data", [])]
    if not all_rules:
        print("No filtered stream rules to remove.")
        return
    delete_rule = {'delete': {'ids': all_rules}}
    response = ot.set_filtered_stream_rule(rules=delete_rule)
    print(f"API response from deleting rules:\n{response}\n")

Samples Stream Endpoint

Please claim if you think you can get to this.

  • API functionality
  • Add documentation
  • Add tests

Generalize make_request

Currently, OAuth{1,2}'s make_request uses requests.get() to contact the API. The filtered stream endpoint requires the use of POST to establish the rules that the stream will be filtered on. Other endpoints in the future might require POST or other methods. To get around this issue, I'll be changing make_request to use requests.request(method={"GET", "POST", etc.}) and will update all the endpoints to pass the corresponding method to make_request.
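A sketch of the generalized dispatch (auth is omitted; the payloads are only illustrations):

import requests

def make_request(url: str, method: str = "GET", **kwargs) -> requests.Response:
    """Dispatch on the HTTP verb via requests.request()."""
    return requests.request(method, url, **kwargs)

# GET for lookups...
resp = make_request("https://api.twitter.com/2/tweets", params={"ids": "SOME_TWEET_ID"})

# ...and POST for setting filtered-stream rules
resp = make_request(
    "https://api.twitter.com/2/tweets/search/stream/rules",
    method="POST",
    json={"add": [{"value": "cat has:images"}]},
)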

Handle Endpoint Argument Options

I originally suggested adding **kwargs for all methods in #23 because I thought that there would be too many options. Looking at the endpoints, I actually think it's easier to just include all options explicitly.

Below is a list of endpoint methods and their available options.

Tweet Lookup

  • tweet_lookup
    • ids
    • expansions
      • attachments.poll_ids
      • attachments.media_keys
      • author_id
      • entities.mentions.username
      • geo.place_id
      • in_reply_to_user_id
      • referenced_tweets.id
      • referenced_tweets.id.author_id
    • media.fields
    • place.fields
    • poll.fields
    • tweet.fields
    • user.fields
  • Update Doc strings
  • Update Tests

User Lookup IDs

  • user_lookup_ids
    • ids
    • expansions
      • pinned_tweet_id
    • tweet.fields
    • user.fields
  • Update Doc strings
  • Update Tests

User Lookup Usernames

  • user_lookup_usernames
    • usernames
    • expansions
      • pinned_tweet_id
    • tweet.fields
    • user.fields
  • Update Doc strings
  • Update Tests

Get Followers

  • get_followers
    • id
    • expansions
      • pinned_tweet_id
    • max_results
    • pagination_token
    • tweet.fields
    • user.fields
  • Update Doc strings
  • Update Tests

Get Following

  • get_following
    • id
    • expansions
      • pinned_tweet_id
    • max_results
    • pagination_token
    • tweet.fields
    • user.fields
  • Update Doc strings
  • Update Tests

Remove

  • set_user_fields
  • set_tweets_fields
  • set_poll_fields
  • set_place_fields
  • set_media_fields

Adding more tests

API:

  • tweet lookup
  • user lookup ids and usernames
  • get_followers
  • get_following

fields/expansions:

  • expansions
  • tweet fields
  • user fields
  • media fields
  • poll fields
  • place fields

utils:

  • utils.pause_until
  • utils.chunker

Add example scripts

Create scripts that can be used to gather data using the available endpoints.

Here are the endpoints available in osometweet at the moment:

  • tweet_lookup
  • user_lookup_ids
  • user_lookup_usernames
  • get_tweet_timeline
  • get_mentions_timeline
  • get_followers
  • get_following
  • search
  • sampled_stream
  • filtered_stream

Please feel free to claim one of these scripts if you're interested! 😄

See the examples/ folder for what is available.


Note: The original comment for this issue has been completely edited from its initial form because it was originally opened as a question/discussion about whether we should do this. Now that we've decided to make this standard practice, I wanted to clean it up.

Automatic linting check

In an effort to tighten up the last bit of code I'd like to take the lead on going through everything with pylint. Right now, we get a pretty bad score almost entirely due to long lines in the docstrings (my bad 😉 - I've made some changes in my editor which should keep this from happening in the future) but there are some other small things that can be addressed as well.

If no one has any objections, I am going to go through all of the code and tighten things up based on pylint's recommendations. (BTW, please let me know if you do have an issue with this for some reason and/or you recommend something else!)

However, I was wondering... is there a way to build this type of check into our tests pre-merge? If this is a lot of work, I'm not sure it will be worth it but, if it's easy, I think it could be nice to set up. For example, I think there is a way to get GitHub to block merges unless it passes a pylint check, however, I don't know what is involved to get this done.

Please let me know your thoughts when you can.

Streaming parameters for OAuth1a _make_one_request

The OAuth1a _make_one_request internal method includes the stream and json parameters. (See here)

I have two questions:

  1. It seems like the json parameter is not used, which I assume is a bug; however, whether to include it on the return line or delete it as a parameter depends on the next question...
  2. It seems like, for V2, we cannot access the streaming endpoints using OAuth1a authorization (you can find this info by checking out my earlier comment about this). Do we think that we should just remove stream and json?
    • If so, perhaps it's a good idea to include some sort of catch that throws an error if you try to access the streaming endpoints with OAuth1a? Too much?

get_all_avail_fields() not working for user_lookup_ids()

Here is the code and error I got:

>>> import osometweet
>>> import os
>>> bearer_token = os.environ.get("TWITTER_BEARER_TOKEN")
>>> all_fields = osometweet.fields.get_all_avail_fields()
>>> expansions = osometweet.UserExpansions()
>>> oauth2 = osometweet.OAuth2(bearer_token=bearer_token)
>>> ot = osometweet.OsomeTweet(oauth2)
>>> ot.user_lookup_ids('12', fields=all_fields, expansions=expansions)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/matthewdeverna/Documents/GitHub/osometweet/osometweet/api.py", line 216, in user_lookup_ids
    return self._user_lookup(user_ids, "id", fields=fields, expansions=expansions)
  File "/Users/matthewdeverna/Documents/GitHub/osometweet/osometweet/api.py", line 289, in _user_lookup
    "Invalid parameter type: `query` must be" "either a list or tuple."
ValueError: Invalid parameter type: `query` must beeither a list or tuple.

Additive object fields?

So I was playing around with the user_lookup_ids() method and realized that our data retrieval methods sort of clash with the set_X_fields() methods.

For example...

# Get a list of all user_fields
>>> u_fields = o_utils.ObjectFields.return_user_fields()
>>> u_fields
['created_at', 'description', 'entities', 'id', 'location', 'name', 'pinned_tweet_id', 'profile_image_url', 'protected', 'public_metrics', 'url', 'username', 'verified', 'withheld']
# Set these fields 
>>> ot.set_user_fields(",".join(u_fields))
# Try and pull user_objects only passing the fields ["created_at","username"]
>>> response = ot.user_lookup_ids(user_ids = ids2find, user_fields = ["created_at","username"])
# This still returns all of the user fields that I set before.
>>> response["data"]
[{'created_at': '2013-12-14T04:35:55.000Z', 'url': 'https://t.co/3ZX3TNiZCY', 'id': '2244994945', 'verified': True, 'description': 'The voice of the #TwitterDev team and your official source for updates, news, and events, related to the #TwitterAPI.', 'location': '127.0.0.1', 'username': 'TwitterDev', 'entities': {'url': {'urls': [{'start': 0, 'end': 23, 'url': 'https://t.co/3ZX3TNiZCY', 'expanded_url': 'https://developer.twitter.com/en/community', 'display_url': 'developer.twitter.com/en/community'}]}, 'description': {'hashtags': [{'start': 17, 'end': 28, 'tag': 'TwitterDev'}, {'start': 105, 'end': 116, 'tag': 'TwitterAPI'}]}}, 'public_metrics': {'followers_count': 514823, 'following_count': 2035, 'tweet_count': 3648, 'listed_count': 1682}, 'pinned_tweet_id': '1293593516040269825', 'protected': False, 'profile_image_url': 'https://pbs.twimg.com/profile_images/1283786620521652229/lEODkLTh_normal.jpg', 'name': 'Twitter Dev'}, {'created_at': '2007-05-23T06:01:13.000Z', 'url': 'https://t.co/8IkCzCDr19', 'id': '6253282', 'verified': True, 'description': 'Tweets about changes and service issues. Follow @TwitterDev\xa0for more.', 'username': 'TwitterAPI', 'entities': {'url': {'urls': [{'start': 0, 'end': 23, 'url': 'https://t.co/8IkCzCDr19', 'expanded_url': 'https://developer.twitter.com', 'display_url': 'developer.twitter.com'}]}, 'description': {'mentions': [{'start': 48, 'end': 59, 'username': 'TwitterDev'}]}}, 'public_metrics': {'followers_count': 6056217, 'following_count': 39, 'tweet_count': 3693, 'listed_count': 12267}, 'pinned_tweet_id': '1293595870563381249', 'protected': False, 'profile_image_url': 'https://pbs.twimg.com/profile_images/942858479592554497/BbazLO9L_normal.jpg', 'name': 'Twitter API'}]

So, basically, whenever we are setting or updating the user_fields they simply add onto one another. I am not sure that this is the best approach. My expectation as a user is for whatever I have passed into a data retrieval method to override any previously set object fields.

Having said the above, I think this highlights the question of whether we want to keep these object field setting-methods (i.e. set_user_fields()) separate from the actual data call method (i.e. user_lookup_ids()) or not.

In my opinion, it makes more sense to convert these set_x_fields methods to internal methods which we can use under the hood to update the parameters. Then, we can imply which fields are retrievable from a specific endpoint based on the available parameters within its respective method by simply including only what is available. For example, place fields are available if you're pulling tweet timelines, but they're not available when pulling user account information. So we would include "place" as a parameter for the tweet timelines endpoint but not for the user lookup methods.

Hopefully this change wouldn't be too much trouble because I think we can just do the following (rough sketch after this list):

  1. Add an underscore in front of the existing "set methods"
  2. Add the appropriate parameters to each "data retrieval" method (assuming they would each take a list)
  3. Use the now-internal "set methods" within the "data retrieval" methods
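A rough skeleton of what steps 1-3 might look like (hypothetical, not the real class):

class OsomeTweet:
    """Skeleton only; illustrates the proposed flow."""

    def __init__(self):
        self._params = {}

    def _set_user_fields(self, user_fields):
        # Step 1: now internal. Overrides, rather than appends to,
        # any previously set fields.
        self._params["user.fields"] = ",".join(user_fields)

    def user_lookup_ids(self, user_ids, user_fields=None):
        # Steps 2 and 3: the data retrieval method takes a list directly
        # and calls the internal set method under the hood.
        if user_fields is not None:
            self._set_user_fields(user_fields)
        # ... the actual API call would go here ...
        return self._params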

What do you guys think?

Functions for data cleaning

I think it would be good to build out functions for cleaning the data that osometweet returns.

I was watching one of the Twitter Twitch sessions about doing this and they shared this module from twarc which can be used to partially flatten the data JSON to create a data frame for a csv file. I don't think we can directly take this and work it into our package, however, I thought it could be good to reference.

I'm thinking that this would require its own module, maybe called wrangle or data? Open to suggestions. I'm thinking of functions like the following (a rough flatten sketch comes after this list):

  • wrangle.extract() which takes a raw tweet dict as input and returns specific components, e.g. "text", "urls", "hashtags", etc.
  • wrangle.flatten() which takes a single raw tweet dict and returns it in row form
  • etc.
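A rough sketch of what wrangle.flatten() might look like (dotted-key flattening; all names are suggestions):

def flatten(tweet: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Collapse a nested tweet dict into one flat "row" for a CSV."""
    row = {}
    for key, value in tweet.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            row.update(flatten(value, new_key, sep=sep))
        else:
            row[new_key] = value
    return row

# e.g. {"id": "123", "public_metrics": {"retweet_count": 5}}
# becomes {"id": "123", "public_metrics.retweet_count": 5}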

Add an "everything" option to set_user_fields and the like

For me, it would be common to fetch all the fields for tweets and users. So maybe we can add an option for this. Seems we can modify utils.ObjectFields for this.

I'm not sure whether to do this at the method level or the class level.
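One possible shape at the class level (a rough sketch; the method name is a suggestion and the tweet-field list is abbreviated):

class ObjectFields:
    user_fields = [
        "created_at", "description", "entities", "id", "location", "name",
        "pinned_tweet_id", "profile_image_url", "protected", "public_metrics",
        "url", "username", "verified", "withheld",
    ]
    tweet_fields = ["author_id", "created_at", "public_metrics", "text"]  # abbreviated

    @classmethod
    def return_everything(cls):
        """Return every available field across all object types."""
        return cls.user_fields + cls.tweet_fields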

Add User Lookup Method

Here is something that worked well for pulling user-account data; it handles rate limits as well as time-dependent errors:

from datetime import datetime as dt

import pause  # third-party package used to sleep until a given time

def connect_to_endpoint(oauth, params):
    """Downloads data from Twitter based on the `oauth` object passed and the
    `params` created with the `create_params()` function.

    If time-dependent errors (429, 500, 503) are returned, wait until the
    rate limit resets (or for a short fixed interval) and then try again.
    """
    
    switch = True
    
    while switch:
        response = oauth.get("https://api.twitter.com/2/users", params=params)
        
        # Get number of requests left with our tokens
        remaining_requests = int(response.headers["x-rate-limit-remaining"])
        
        # If that number is one, we get the reset-time and wait until then, plus 15 seconds.
        # This is caught below as well, however, we want to program defensively, if possible.
        if remaining_requests == 1:
            buffer_wait_time = 15 
            resume_time = dt.fromtimestamp( int(response.headers["x-rate-limit-reset"]) + buffer_wait_time )
            print(f"Waiting on Twitter.\n\tResume Time: {resume_time}")
            pause.until(resume_time)

        """To be safe, we check explicitly for these TIME DEPENDENT errors.
        That is, these errors can be solved simply by waiting a little while 
        and pinging Twitter again. So that's what we do."""
        if response.status_code != 200:

            # Too many requests error
            if response.status_code == 429:
                buffer_wait_time = 15 
                resume_time = dt.fromtimestamp( int(response.headers["x-rate-limit-reset"]) + buffer_wait_time )
                print(f"Waiting on Twitter.\n\tResume Time: {resume_time}")
                pause.until(resume_time)

            # Twitter internal server error
            elif response.status_code == 500:
                # Twitter needs a break, so we wait 30 seconds
                resume_time = dt.now().timestamp() + 30
                print(f"Waiting on Twitter.\n\tResume Time: {resume_time}")
                pause.until(resume_time)

            # Twitter service unavailable error
            elif response.status_code == 503:
                # Twitter needs a break, so we wait 30 seconds
                resume_time = dt.now().timestamp() + 30
                print(f"Waiting on Twitter.\n\tResume Time: {resume_time}")
                pause.until(resume_time)

            # Any other error is not time dependent, so we've done
            # something wrong and should exit
            else:
                raise Exception(
                    "Request returned an error: {} {}".format(
                        response.status_code, response.text
                    )
                )
        
        # If we get a 200 response, lets exit the function and return the response.json
        if response.ok:
            return response.json()

@ChrisTorresLugo this would require that I touch both #2 and #1. I don't need to touch #1; however, this endpoint is much more effective with the user-based OAuth1 method.

pip install -e?

@yangkcatiu you mentioned in the user-lookup branch README.md ...

Local development

Clone this project, go to the source directory, type the following command to install the package locally:

pip install -e ./

Can you help me understand what this does exactly? The description in the pip documentation isn't very helpful.

From other Googling, it seems like this allows me to import the version of the package that is present in the source folder on my local machine, whereas, if I don't do this, running import osometweet would give me whatever the latest version of the package was when it was pushed to PyPI. I just tried this out and it seems like that is at least one difference; is there anything else that this does?

Handle rate limits

The headers returned by each API call contain the following fields:

  • x-rate-limit-limit: the rate limit ceiling for that given endpoint
  • x-rate-limit-remaining: the number of requests left for the 15-minute window
  • x-rate-limit-reset: the remaining window before the rate limit resets, in UTC epoch seconds

These fields are documented here. Method calls should parse these fields to determine when to sleep the process and for how long. Performing additional requests when the limit has been reached risks account suspension.
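A minimal sketch of what a method could do with these headers after each call:

import time

import requests

def sleep_if_rate_limited(response: requests.Response) -> None:
    """Sketch: sleep through the 15-minute window when no requests remain."""
    remaining = int(response.headers.get("x-rate-limit-remaining", 1))
    if remaining == 0:
        reset_at = int(response.headers["x-rate-limit-reset"])  # UTC epoch seconds
        wait = max(reset_at - time.time(), 0) + 1  # one-second buffer
        print(f"Rate limit reached; sleeping {wait:.0f} seconds.")
        time.sleep(wait)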

Add FIB score functionality

Would be great to build the below into the package somewhere...

def get_fib_scores(rt_counts):
    """
    Calculate the FIB-index.
    
    INPUT:
    - rt_counts (list) : list of retweet count values for all 
    retweets sent by a single user.
    
    OUTPUT:
    - fib_position (int) : super-spreader index where the retweet
    count value (of a sorted list of retweets) is greater than or
    equal to the position in that list.
    """

    rt_counts.sort()

    # The "[::-1]" makes this for-loop iterate from the largest possible
    # position down to 1, so the first match is the maximum index.
    for fib_position in range(1, len(rt_counts) + 1)[::-1]:
        if rt_counts[-fib_position] >= fib_position:
            return fib_position
    
    # If the above criteria is never met, we return the fib_position as zero.
    fib_position = 0
    return fib_position

For anyone else seeing this, this is equivalent to the h-index algorithm - the above is the adapted version for the superspreaders project.

Here is a link to an out-of-the-box script that can take data directly from Moe's tavern.

  • Add function (maybe to utils.py?)
  • Remake the above linked-to script, but for twitter data from V2.

Doc-string format

Is there a reason to utilize this docstring format...

:param [str, list, tuple] tids
:returns requests.models.Response
:raises Exception, ValueError

vs. this one?

Parameters:
    - tids [str, list, tuple] - twitter id numbers
Returns:
    - requests.models.Response
Raises:
    - Exception, ValueError

The latter is, IMO, much more readable but I've seen the first a lot and I wasn't sure if there are any functional differences?

Normalize terms

Let's use variable names that are consistent with the newest twitter docs:

  • API_KEY
  • API_KEY_SECRET
  • BEARER_TOKEN
  • ACCESS_TOKEN
  • ACCESS_TOKEN_SECRET

Include **kwargs options in doc strings

If you haven't realized yet, I'm a bit anal about documentation 😄

I'd like to include docstrings like the ones I included in the timelines methods (#41) - i.e. include details on the kwargs (optional arguments) - for each method.

However, they're a bit long; does this bother anyone? I find myself using the docstrings constantly and get frustrated when docstrings have no information, so it would personally make me quite happy to include this information - mostly because it is impossible for me to remember everything.

I will take the lead on this if you guys don't mind.

'everything' arg should accept only type = boolean

Right now the everything arg (in all methods) will accept arbitrary strings instead of a True or False boolean because any non-empty string evaluates as True.

For example:

ot.user_lookup_ids(list_tweet_ids[:100], 'blah')

is the same as

ot.user_lookup_ids(list_tweet_ids[:100], everything=True)

since 'blah' is a non-empty string and therefore evaluates as True.

Solution

Add a check like the below

if not isinstance(everything, bool):
    raise ValueError("`everything` must be True or False")

Include an "everything" option within each method option

For example, if we want to use the tweet_lookup() method.

You could do something like the following:

resp = self.ot.tweet_lookup(
    tids=random_tweet_id,
    everything=True
)

And this would be equivalent to including all expansions, media.fields, place.fields, poll.fields, tweet.fields, and user.fields.

Another option would be to include an `all` option for each data object; however, since we have the return_objectfield() methods, I think this might be overkill. It's really easy to just call t_fields = o_utils.return_tweet_fields() and then place t_fields in the ot.tweet_lookup() call. As a result, I think it's best to build out the first option.


Methods

  • tweet_lookup
  • user_lookup_ids
  • user_lookup_usernames
  • get_followers
  • get_following

Unauthorized 401 Issue

When the bearer token is incorrect or expired, _user_lookup() in api.py does not handle the 401 status code.
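A sketch of an explicit check (its placement inside _user_lookup() and the message wording are assumptions):

import requests

def raise_for_unauthorized(response: requests.Response) -> None:
    """Sketch: surface an informative error instead of failing downstream."""
    if response.status_code == 401:
        raise Exception(
            "401 Unauthorized: the bearer token is invalid or expired. "
            f"Twitter's response: {response.text}"
        )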

Document the existing methods and utils

To make it easier for other people to use (e.g., other group members), let's put some documentation in the readme file.

  • create a new documentation branch
  • document the initialization of OsomeTweet class, especially about the two different oauth methods
  • document tweet lookup
  • document user lookup ids and usernames
  • document utils.pause_until
  • document utils.chunker

Develop friends/followers methods

I will probably need to figure this out for the retweet cascade reconstruction part of the superspreaders project so I will have to look into this anyway. If anyone else wants to dig into this, please feel free to assign yourself and go for it because I probably won't get to this for a couple of weeks.

I liked the structural approach that Kaicheng took in his edits to the user lookup methods - having one base method and two user-facing methods (one for IDs and one for usernames) that feed into the base method - so I will try and do the same thing.

Please add any other thoughts you may have.


  • Base Method
  • Friends Method
  • Followers Method
  • Update README.md

Lists as input for the set_x_fields methods?

Right now, all of the setting methods (set_place_fields, set_polls_fields, etc.) take a string. What do we think about changing this to take a list of strings? We can handle the "stringification" within the method easily with a join. This seems more straightforward and less likely to lead to errors, IMO.

For example, we are assuming that people are going to build the url string correctly, but maybe it's best to do this for them to ensure it's correct?
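A sketch of how one of the set methods could accept either form (the internal self._params attribute is an assumption):

def set_place_fields(self, place_fields) -> None:
    # Accept a list/tuple and handle the "stringification" here
    if isinstance(place_fields, (list, tuple)):
        place_fields = ",".join(place_fields)
    self._params["place.fields"] = place_fields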

If you guys are okay with this, I am happy to make these changes and can take care of #12 at the same time.

Update documentation

Update documentation to reflect #44.

  • Tweet lookup
  • Timelines
  • User lookup
  • Followers lookup
  • Following lookup
  • Fields & expansions
  • Example scripts
