stanfordio / gogettr

Public API client for GETTR, a "non-bias [sic] social network," designed for data archival and analysis.

License: Apache License 2.0

Python 100.00%

gogettr's Issues

Hashtags don't really make sense

Hi,

The hashtags don't really make sense, as they are full sentences starting with a hashtag. Strangely, the mobile app behaves the same way before you type a key. Those hashtags won't work because of the spaces. Which API are you using? I'd like to be able to provide a letter and get the top hashtags for that letter.

GETTR post id does not consistently increase

Just curious if anyone else has noticed that the GETTR post id does not consistently increase in a user's timeline. I pulled the most recent 20 posts from a user using the following code:

from gogettr import PublicClient

client = PublicClient()
posts = client.user_activity(username="elisestefanik", max=20, type="posts")
for post in posts:
    print(post["_id"])

Here are the results. The oldest post is at the bottom.

puyfio07ae
puyl23a25b
puya6rd15c
puy99p209c
puy13dd6f9
puwxq28b5b
puwool7718
puwlwue81d
pux00k1c3a
pux3rocc56
pux0r51e80
puwhjz99e2
puwz4v8eff
puwvl67d70
puw00a95db
puwqhtd4ed
puwyd00b1b
puwvfme687
put13g39eb
pusdlv7b92

I assumed that the post id was a base-36 value that always increases over time, but if you start from the bottom of the results and move forward in time, you will see the id go from puw___ to pux___ and then back to puw___. Huh? Within the puw posts, it goes from puwv___ to puwy___ to puwq___.
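To check this observation mechanically, here is a minimal sketch that assumes, as the reporter did, that ids are base-36 strings. All twenty ids are ten characters of 0-9/a-z, so plain string comparison (what `until > id` does) orders them exactly like their base-36 values; the decreases show up under either reading. `base36` and `find_inversions` are illustrative helpers, not part of gogettr.

```python
# Post ids from the output above, newest first (as returned).
IDS_NEWEST_FIRST = """
puyfio07ae puyl23a25b puya6rd15c puy99p209c puy13dd6f9
puwxq28b5b puwool7718 puwlwue81d pux00k1c3a pux3rocc56
pux0r51e80 puwhjz99e2 puwz4v8eff puwvl67d70 puw00a95db
puwqhtd4ed puwyd00b1b puwvfme687 put13g39eb pusdlv7b92
""".split()


def base36(post_id: str) -> int:
    """Read an id as a base-36 number (assumption: digits 0-9, then a-z)."""
    return int(post_id, 36)


def find_inversions(ids_oldest_first):
    """Return (older, newer) pairs where the newer id is numerically smaller."""
    return [
        (older, newer)
        for older, newer in zip(ids_oldest_first, ids_oldest_first[1:])
        if base36(newer) < base36(older)
    ]


inversions = find_inversions(IDS_NEWEST_FIRST[::-1])  # walk forward in time
for older, newer in inversions:
    print(f"{older} -> {newer}")
```

Run against the twenty ids above, it flags six of the nineteen steps as decreases, including the pux → puw jump described here.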

This seems to present a problem when using the 'until' parameter. My use case involves keeping track of the most recent post id retrieved each day and using it the next day to make sure I only grab new posts. This requires a value that consistently increases. Because the value bounces around, it's very likely to miss some posts, since this line in user_activity.py:
if until is not None and until > id:
assumes that new posts always have a higher id.

Each post has a 'udate' available in the dictionary at post['udate'] which is the time in milliseconds since the epoch, UTC. This seems to consistently increase for each post. Maybe a parameter 'until_time' could replace or be an alternative to 'until'?
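A minimal sketch of the `until_time` idea, done client-side: instead of comparing ids, stop at the first post whose `udate` is at or before a saved cutoff. `posts_newer_than` is hypothetical (not an existing gogettr parameter or function), and the sketch relies on the observation above that `udate` increases with posting time, plus the assumption that the timeline arrives newest first. The `udate` values in the demo are illustrative, not real data.

```python
from itertools import takewhile
from typing import Iterable, Iterator


def posts_newer_than(posts: Iterable[dict], cutoff_ms: int) -> Iterator[dict]:
    """Yield posts until the first one whose 'udate' is at or before cutoff_ms.

    Assumes posts arrive newest first and 'udate' (ms since the epoch)
    grows with posting time. takewhile stops consuming the source at the
    cutoff, so a generator is not read past that point.
    """
    return takewhile(lambda post: int(post["udate"]) > cutoff_ms, posts)


# Illustrative posts, newest first: ids out of order, udate in order.
# An id-based cutoff of "puwvl67d70" would treat "puwhjz99e2" as older.
feed = [
    {"_id": "puwhjz99e2", "udate": 3000},
    {"_id": "puwz4v8eff", "udate": 2000},
    {"_id": "puwvl67d70", "udate": 1000},
]
saved_cutoff = 1500  # newest udate seen on the previous run (illustrative)
new_posts = list(posts_newer_than(feed, saved_cutoff))

# Persist the newest udate seen so the next run starts from there.
next_cutoff = max((int(p["udate"]) for p in new_posts), default=saved_cutoff)
print([p["_id"] for p in new_posts], next_cutoff)
```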

Can I POST to Gettr? Or is it just GET requests for the time being?

I'd like to contribute a Python library for making POST requests, if one does not already exist.

user_activity retrieves <500 posts, retrieving more requires logging in

Gettr seems to have changed its public API, and now only allows ~500 posts to be retrieved without logging in. This is also apparent when looking at a user's timeline in a web browser: scrolling to the bottom of a user timeline when you're not logged in shows ~500 user posts and says "END" when you reach the limit, while scrolling to the bottom of the timeline when you ARE logged in shows all user posts (see attached image).

Adding an X-App-Auth header that contains the logged-in user's username and a generated token lets you retrieve all of a user's posts, e.g.:

import json

import requests

# url and params are whatever the client already sends for this request;
# replace the placeholders with values from a logged-in session.
HEADERS = {
    'X-App-Auth': json.dumps({
        'user': '$MY_USERNAME',
        'token': '$MY_TOKEN'}),
}
resp = requests.get(
    url,
    params=params,
    timeout=10,
    headers=HEADERS,
)

Is implementing a login flow within the scope of this project? I could potentially take a crack at implementing it.

Might be related to issue #21

errors

Hi, I am trying to get hashtags, but the command isn't working. Is the script still working? I am getting errors such as the one below. The only commands that work for me are the ones for user info and pulling all posts, and the latter gives another error (copied below). Thanks again!

Traceback (most recent call last):
  File "c:\users\ahmed\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\ahmed\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Ahmed\anaconda3\Scripts\gogettr.exe\__main__.py", line 7, in <module>
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\cli.py", line 41, in user
    for post in client.user_activity(username, max=max, until=until, type=type):
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\capabilities\user_activity.py", line 38, in pull
    for data in self.client.get_paginated(
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\api.py", line 93, in get_paginated
    data = self.get(*args, **kwargs)
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\api.py", line 77, in get
    raise GettrApiError(errors[-1]) # Throw with most recent error
gogettr.errors.GettrApiError

++++++++
Pull all posts error

  File "c:\users\ahmed\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\ahmed\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Ahmed\anaconda3\Scripts\gogettr.exe\__main__.py", line 7, in <module>
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\cli.py", line 82, in all
    for post in client.all(
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\capabilities\all.py", line 46, in pull
    result = futures.popleft().result()
  File "c:\users\ahmed\anaconda3\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "c:\users\ahmed\anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
  File "c:\users\ahmed\anaconda3\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\capabilities\all.py", line 108, in _pull_post
    if data["data"]["txt"] == "Content Not Found":
KeyError: 'txt'

Getting a specific post's comments

I'm working on this feature on my branch.

Feedback is welcome; maybe it would be better to do the iteration with a "while content is available" loop.
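A minimal sketch of the "while content is available" variant: keep requesting pages and stop on the first empty one. `fetch_page(offset, max)` is a hypothetical stand-in for one request to a comments endpoint; the actual endpoint and its parameters are not shown in this issue.

```python
from typing import Callable, Iterator, List


def iterate_until_empty(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 20,
) -> Iterator[dict]:
    """Yield items page by page until the endpoint returns an empty page.

    fetch_page(offset, max) is a hypothetical stand-in for one API request.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:  # no more content available
            return
        yield from page
        offset += len(page)  # advance by what was actually returned
```

Advancing the offset by the number of items actually returned, rather than by the requested page size, avoids skipping items when the server sends back shorter pages than requested.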

saving file+hashtags

Hi,
Thanks for the update! I have Windows 10, and I am running Python from the shell. The script is fetching users' posts now, but is there a way to save the data to a file, e.g. JSON or CSV? I tried adding filename.json at the end of the command, but it didn't work. Then I used --o filename.json, which also didn't work.

Regarding hashtag search, it does not seem to work because of the command argument. I tried alternative formats, e.g. gogettr hashtags [#selfie], gogettr hashtags [selfie], gogettr hashtags #selfie, gogettr hashtags [OPTIONS] #selfie, etc., but I always get the same error: Error: Got unexpected extra arguments ([OPTIONS].
Thanks!
Ahmed
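One option when working from Python rather than the shell is to write the post dictionaries out yourself. A minimal sketch using the standard `csv` module; `write_posts_csv` and the column list are illustrative choices (the `_id`, `udate`, and `txt` keys are the ones mentioned elsewhere in these issues), not part of gogettr.

```python
import csv
import io
from typing import IO, Iterable

# Columns to keep; posts may carry many more keys, which are ignored.
FIELDS = ["_id", "udate", "txt"]


def write_posts_csv(posts: Iterable[dict], out: IO[str]) -> None:
    """Write post dicts as CSV rows; missing keys become empty cells."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    for post in posts:
        writer.writerow(post)


# Demo with an in-memory buffer. For a real file, open it with
# open("posts.csv", "w", newline="", encoding="utf-8"), as the csv docs advise.
buf = io.StringIO()
write_posts_csv([{"_id": "p1", "udate": 1, "txt": "hi", "extra": 0}, {"_id": "p2"}], buf)
print(buf.getvalue())
```

The dictionaries yielded by `client.user_activity(...)` could be passed in directly as the `posts` argument.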

Gettr search returns 500 error

For the past 5 days, I have been getting the following response to every search query:

request:

 GET /u/posts/srch/phrase with params {'max': 20, 'q': 'openai', 'sort': 'recent', 'offset': 0}

response:

{"_t":"xresp","rc":"ERR","error":{"_t":"xerr","code":"E_API_ERROR","emsg":"An error occurred","args":[]}}'}

Please let me know if you're experiencing the same thing or what I'm doing wrong.
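The body of this response is an error envelope rather than data. A minimal sketch of surfacing its code and message so failures like this are easier to report; `api_error` is a hypothetical helper, the field names come from the response quoted above, and treating any `rc` other than "ERR" as success is an assumption. The trailing `'}` from the quoted output is dropped because it is not valid JSON.

```python
import json
from typing import Optional


def api_error(body: str) -> Optional[str]:
    """Return "CODE: message" if the envelope reports an error, else None.

    Field names (rc, error.code, error.emsg) follow the quoted response;
    treating every rc other than "ERR" as success is an assumption.
    """
    payload = json.loads(body)
    if payload.get("rc") != "ERR":
        return None
    err = payload.get("error") or {}
    return f'{err.get("code", "?")}: {err.get("emsg", "")}'


# The response from the issue, minus the trailing "'}".
body = '{"_t":"xresp","rc":"ERR","error":{"_t":"xerr","code":"E_API_ERROR","emsg":"An error occurred","args":[]}}'
print(api_error(body))  # E_API_ERROR: An error occurred
```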

KeyError: 'txt'

Similar to issue #13, I get the following KeyError with the "all" command (plain or with --rev) after the first scraped post:

    if data["data"]["txt"] == "Content Not Found":
KeyError: 'txt'

If it's unclear how to resolve the error, reviewing this PHP-based API client on GitHub may be useful, as I'm able to scrape with it without any issues.
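One way to make the check at all.py line 108 tolerate payloads without a "txt" key is to read it with `dict.get`. A minimal sketch, not the maintainers' fix; `is_content_not_found` is a hypothetical helper name, and how the caller should treat such posts is left open.

```python
def is_content_not_found(data: dict) -> bool:
    """True if the payload's text is the "Content Not Found" placeholder.

    Uses .get() at both levels, so a payload with no "txt" (or no "data")
    returns False instead of raising KeyError.
    """
    inner = data.get("data") or {}
    return inner.get("txt") == "Content Not Found"
```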

Unrelated: The quoted description contained in this project's "About" and README.md, i.e. '"non-bias [sic] social network,"' attempts to emphasize an ephemeral grammatical error in the App Store description that no longer exists, highlighting the project maintainers' condescension. This reflects poorly on StanfordIO and Stanford University.

Issues Using All API for Posts Greater than p7b5gh

When I use the all API with posts greater than p7b5gh, I run into issues where large numbers of indices seem to be missing.

e.g., running the following command:

gogettr all --max 1000 --first p7b5gh

Returns a single post.

I tried a couple of much larger ids (copied from another issue) and got a similar result. I did the same thing in module mode, and still no luck. I want to be respectful of their API and don't want to brute-force until I see more posts, but I am not sure how else to collect sets of posts for a given time period, or the next n posts after a specific _id.

Before that index, I don't seem to run into quite as many issues, though there are definitely gaps in the returned indices.

Do you have any recommendations for using all with larger indices, or should I switch to scraping posts for specific users, rather than specific points in time? Am I missing something? Do the indices change to a different base or something? Is this just a weird coincidence that I am reading into too much?

Thank you for taking a look, this tool is super helpful!

Improve Logging with named logger

First of all, thanks for providing this nice API!

I would like to use it in a small project, but the INFO logs are quite noisy.
Could you use a named logger instead of the root logger? For example, by setting the logger in the api module like this:
logger = logging.getLogger(__name__)
and then calling logger.info("awesome log") instead of logging.info(...)

This would enable users of your library to override the log level by using:
logging.getLogger("gogettr").setLevel(logging.WARNING)
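A minimal sketch of this request, with two assumptions spelled out: the module logger is named "gogettr.api" (what `logging.getLogger(__name__)` would yield inside gogettr/api.py), and `fetch_page` is a hypothetical stand-in for a library call. The application-side `setLevel` line is the one proposed above.

```python
import logging

# Library side: inside gogettr/api.py, logging.getLogger(__name__) would give
# "gogettr.api"; the name is spelled out so this sketch runs as one file.
logger = logging.getLogger("gogettr.api")


def fetch_page() -> None:
    """Hypothetical library call that logs at two levels."""
    logger.info("requesting page")
    logger.warning("rate limited")


class ListHandler(logging.Handler):
    """Collects messages so we can see which records got through."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# Application side: verbose root logging, but gogettr limited to WARNING.
handler = ListHandler()
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)
logging.getLogger("gogettr").setLevel(logging.WARNING)

fetch_page()
print(handler.messages)  # ['rate limited']
```

Because a logger with no level of its own inherits the level of its nearest configured ancestor, one `setLevel` call on "gogettr" covers every module logger named `gogettr.*`.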

Search returning a 500 error

Searches that worked in the past no longer work.

Using parameters:
{'max': 20, 'q': 'openai', 'sort': 'recent', 'offset': 0}

I get:

{"_t":"xresp","rc":"ERR","error":{"_t":"xerr","code":"E_API_ERROR","emsg":"An error occurred","args":[]}}'}

Retrieving all followers/following for an account

Hey!

I have been trying to use the API to get the followers of some specific accounts, and I was wondering whether GoGettr is able to get the whole list of followers for any account. I've been running some tests with a few accounts, and I usually get only a fraction of an account's followers (for example, I get 1035 followers for @sloan4america, whereas he's supposed to have about 26k followers). I looked at the gogettr.log file and noticed that the followers are retrieved in batches of 500 using the offset parameter. However, each batch generally seems to contain only 20 followers. It may be that I am not using it correctly, in which case please feel free to correct me.

Btw thanks again for all your hard work to develop this API, it is really an invaluable resource!
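The numbers in this report fit a simple hypothesis: if the offset advances by the requested batch size of 500 while each response carries only about 20 followers, then roughly 20 of every 500 followers are seen, and 26,000 × 20 / 500 = 1,040, close to the 1,035 observed. This is an inference from the log, not a confirmed reading of gogettr's code. A minimal sketch of the arithmetic (`items_seen` is a hypothetical helper):

```python
def items_seen(total: int, stride: int, page_len: int) -> int:
    """Count items collected when the offset advances by `stride` per request
    but each response carries at most `page_len` items."""
    return sum(min(page_len, total - offset) for offset in range(0, total, stride))


print(items_seen(26_000, stride=500, page_len=20))  # 1040
print(items_seen(26_000, stride=20, page_len=20))   # 26000
```

Under this hypothesis, advancing the offset by the number of items actually returned would cover the whole list.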

Output format / issues

The CLI client writes one JSON object per line to stdout, which might not be ideal for parsing in other programs. Ideally the software would let users choose the output format, for instance TSV/CSV instead of JSON, or write directly to a SQLite database or another file.

Also, the JSON objects could be output with a comma at the end to make parsing in other programs easier.

What do you all think? I could work on it and submit draft PRs.
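On the parsing point, newline-delimited JSON (one object per line, often called JSON Lines) can already be read one line at a time with a standard JSON parser, whereas a line ending in a comma is no longer valid JSON on its own. A minimal sketch of a consumer; `parse_json_lines` is illustrative, and `sample` stands in for the CLI's stdout (for example `sys.stdin` when piping).

```python
import json
from typing import Iterable, List


def parse_json_lines(lines: Iterable[str]) -> List[dict]:
    """Parse newline-delimited JSON (one object per line), skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


# Stand-in for piped CLI output; with a real pipe, pass sys.stdin instead.
sample = ['{"_id": "p1"}\n', '{"_id": "p2"}\n', "\n"]
records = parse_json_lines(sample)
as_array = json.dumps(records)  # a single JSON array, if a consumer needs one
print(as_array)
```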

pypi install not working

➜  ~ pip install gogettr
ERROR: Could not find a version that satisfies the requirement gogettr
ERROR: No matching distribution found for gogettr

Update to support Gettr v2 API

@milesmcc do you have any plans to maintain this repo?
