
medium_stats's Introduction

Hi 👋, I'm Oliver Tosky!


medium_stats's People

Contributors

daveflynn, otosky


medium_stats's Issues

Fetch more than 50 stories (get_all_story_overview)

This is a fantastic library, thanks for building it!

I've got a publication with more than 50 articles and would like to fetch stats for all of them. However, it seems it isn't currently possible to fetch more than 50 due to Medium's pagination - see your comment here:

# TODO: need to figure out how pagination works after limit exceeded

I am happy to test this using the publication I manage, if this helps.
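One way to resolve that TODO is a cursor loop. The sketch below is a minimal, hedged illustration of cursor-based paging: the `items`/`next_cursor` field names and the `fetch_page` signature are assumptions for demonstration, not Medium's actual (undocumented) response shape. A fake paginated source stands in for the real endpoint.

```python
# Hedged sketch of cursor-based pagination. The field names "items" and
# "next_cursor" are illustrative placeholders, not Medium's real API.
def fetch_all(fetch_page, limit=50):
    items, cursor = [], None
    while True:
        page = fetch_page(limit=limit, cursor=cursor)
        items.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:  # no further pages
            break
    return items

# Fake paginated source with 120 "stories", served in pages of `limit`:
STORIES = list(range(120))

def fake_page(limit, cursor):
    start = cursor or 0
    chunk = STORIES[start:start + limit]
    nxt = start + limit if start + limit < len(STORIES) else None
    return {"items": chunk, "next_cursor": nxt}

print(len(fetch_all(fake_page)))  # 120
```

The same loop shape should apply once the real cursor parameter Medium expects after the 50-item limit is identified.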

Unable to scrape any stats: 404

Issue

Until a week ago scraping publication stats worked.
Suddenly, last week, it stopped working.

Command:

medium-stats scrape_publication -u <username> -s <pubname> --output_dir . --sid "<mySID>" --uid "<myUID>" --all

The error:

Traceback (most recent call last):
  File "/Users/dave/data-projects/marketing-pipeline/venv/bin/medium-stats", line 8, in <module>
    sys.exit(main())
  File "/Users/dave/data-projects/marketing-pipeline/venv/lib/python3.10/site-packages/medium_stats/__main__.py", line 220, in main
    data = sg.get_all_story_overview()
  File "/Users/dave/data-projects/marketing-pipeline/venv/lib/python3.10/site-packages/medium_stats/scraper.py", line 294, in get_all_story_overview
    data = self._decode_json(response)
  File "/Users/dave/data-projects/marketing-pipeline/venv/lib/python3.10/site-packages/medium_stats/scraper.py", line 146, in _decode_json
    return json.loads(cleaned)["payload"]
  File "/Users/dave/.pyenv/versions/3.10.9/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/Users/dave/.pyenv/versions/3.10.9/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/dave/.pyenv/versions/3.10.9/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

JSON is expected, but not returned.

Expected result

medium_stats would output the stats to ./stats_export/<publication>

Debugging steps

  • Changed cookie
  • Tried with VPN on/off
  • Dumped the response from the server, and it seems to be a 404 page (though I can load the publication stats page directly)


Anyone else running into issues, got a workaround? Or is Medium updating its stats pages?
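Whatever the server-side cause, the failure mode could be made clearer: checking the HTTP status before decoding would surface "HTTP 404" instead of a bare `JSONDecodeError`. A small defensive sketch, assuming Medium still prepends its anti-hijacking `])}while(1);</x>` prefix to JSON responses (the function signature here is illustrative, not the library's `_decode_json`):

```python
import json

def decode_payload(status_code, text):
    # Fail loudly on non-200 responses (e.g. a 404 HTML page) instead
    # of letting json.loads choke on the first character.
    if status_code != 200:
        raise RuntimeError(f"HTTP {status_code}: {text[:80]!r}")
    # Strip Medium's JSON-hijacking guard prefix before parsing.
    cleaned = text.replace("])}while(1);</x>", "", 1)
    return json.loads(cleaned)["payload"]

print(decode_payload(200, '])}while(1);</x>{"payload": {"value": 1}}'))
```

With a check like this, the traceback above would have reported the 404 directly.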

Adding story topics from story settings page

It would be awesome to get the story topics for a specific story, as it makes it easier to see which topics were getting the most traction during a given time. It would then help any user focus on content around the topics that get the most hits.

Story topics can be fetched from: https://medium.com/p/<POST_ID>/settings

I would be glad to contribute towards this issue. I would really appreciate it if you could point me in the right direction.
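As a starting point, the extraction side could be a small HTML-parsing pass over the settings page. The markup below is a made-up stand-in (the real page may well embed topics in a JSON state blob instead of visible tags), so treat this as a shape sketch only:

```python
from html.parser import HTMLParser

# Hypothetical markup; the real settings page's structure is unverified.
SAMPLE = '<a class="topic">Python</a><a class="topic">Data Science</a>'

class TopicParser(HTMLParser):
    """Collect the text of every <a class="topic"> element."""
    def __init__(self):
        super().__init__()
        self.topics, self._in_topic = [], False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and ("class", "topic") in attrs:
            self._in_topic = True

    def handle_data(self, data):
        if self._in_topic:
            self.topics.append(data)
            self._in_topic = False

p = TopicParser()
p.feed(SAMPLE)
print(p.topics)  # ['Python', 'Data Science']
```

The first real step would be inspecting the settings page's actual payload (authenticated) to see where the topics live.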

Add aliases in medium_creds.ini file

I think it's common for people to be part of multiple publications. Right now, you have to make a separate section for each and copy the same sid/uid values, which would all have to be updated when they change.

It would be great to have a field like aliases to list the names of publications the values in a section also apply to.
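A sketch of how the lookup could work with `configparser`, assuming a comma-separated `aliases` key (the proposed field, not one the library currently reads):

```python
import configparser

# Hypothetical medium_creds.ini content with the proposed "aliases" key.
SAMPLE = """
[my-publication]
sid = abc123
uid = def456
aliases = second-pub, third-pub
"""

def resolve_section(config, name):
    # Return the section whose name or aliases list matches `name`.
    for section in config.sections():
        aliases = [a.strip() for a in config[section].get("aliases", "").split(",")]
        if name == section or name in aliases:
            return config[section]
    raise KeyError(name)

config = configparser.ConfigParser()
config.read_string(SAMPLE)
print(resolve_section(config, "second-pub")["sid"])  # abc123
```

This keeps a single sid/uid pair per section, so rotating credentials only means editing one place.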

doesn't seem to install via pip?

I'm sure I'm just missing something. I'm fairly new to python.

I'm running pip install medium_stats (and also medium-stats) but when I run the sample script included in the readme.md (with my info) it errors with:

File "<directory path>/medium_stats.py", line 3, in <module>
    from medium_stats.scraper import StatGrabberUser
ModuleNotFoundError: No module named 'medium_stats.scraper'; 'medium_stats' is not a package

This is happening on 2 different systems where other python scripts are running fine.
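The "'medium_stats' is not a package" wording, raised from a script itself named `medium_stats.py`, usually means the local file is shadowing the installed package: Python finds `medium_stats.py` in the script's directory before the site-packages copy. The snippet below reproduces that shadowing in isolation (no installed package needed):

```python
import os
import subprocess
import sys
import tempfile

# An empty local medium_stats.py wins over any installed package,
# because the current directory leads sys.path for "python -c".
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "medium_stats.py"), "w").close()
    out = subprocess.run(
        [sys.executable, "-c",
         "import medium_stats; print(hasattr(medium_stats, '__path__'))"],
        cwd=d, capture_output=True, text=True,
    )
    print(out.stdout.strip())  # False: a plain module, not a package
```

If that is the cause here, renaming the local script (e.g. to `fetch_stats.py`) should let the real package import cleanly.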

scrape_publication seems to require full URL

Scraping a publication under a custom URL works fine, but a regular publication under medium.com seems to require the full URL (at least including the domain).

So this works:
scrape_publication -u medium.com/name-of-publication
while this doesn't:
scrape_publication -u name-of-publication

The error is socket.gaierror: [Errno 8] nodename nor servname provided, or not known, which to me suggests that the script doesn't prepend the medium.com/ bit.
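The fix could be a small normalization step before the request: prepend `medium.com/` when the handle has no domain part. This is an assumed behavior sketch, not the library's current code:

```python
def normalize_publication_url(handle):
    # Accept bare slugs, medium.com/<slug>, and full https:// URLs.
    handle = handle.removeprefix("https://").removeprefix("http://")
    if "." not in handle.split("/", 1)[0]:  # first segment has no domain
        handle = "medium.com/" + handle
    return "https://" + handle

print(normalize_publication_url("name-of-publication"))
# https://medium.com/name-of-publication
```

With this in place, both invocations in the report above would resolve to the same host.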

Canonical name of publications for credentials matching, etc.

When scraping a publication, it seems that the credentials file is matched against the URL the user specifies, which may or may not include the medium.com/ part. There should be a canonical name for a publication (IMHO, just the part of the URL after https://medium.com/) that is used everywhere. That way, even if I specified the full URL, it would still match the credentials based on the canonical name.
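A canonicalization helper along these lines could back both the scraper and the credentials lookup. It is a sketch of the proposed behavior, not the library's current logic; the custom-domain branch is an assumption about how such publications should be keyed:

```python
from urllib.parse import urlparse

def canonical_name(handle):
    # Reduce any accepted spelling to one canonical key:
    # the slug after medium.com/, or the domain for custom-domain pubs.
    if "//" not in handle:
        handle = "https://" + handle
    parsed = urlparse(handle)
    path = parsed.path.strip("/")
    return path or parsed.netloc

print(canonical_name("https://medium.com/name-of-publication/"))
# name-of-publication
```

Matching credentials sections against this key would make `medium.com/foo`, `https://medium.com/foo`, and `foo` interchangeable.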
