
medium_stats's Introduction

Hi 👋, I'm Oliver Tosky!


medium_stats's People

Contributors

daveflynn, otosky


medium_stats's Issues

Fetch more than 50 stories (get_all_story_overview)

This is a fantastic library, thanks for building it!

I've got a publication with more than 50 articles and would like to fetch stats for all of them. However, it seems it isn't currently possible to fetch more than 50 due to Medium's pagination - see your comment here:

# TODO: need to figure out how pagination works after limit exceeded

I am happy to test this using the publication I manage, if this helps.
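One way to resolve that TODO is a cursor loop. The sketch below is a minimal, hedged illustration of cursor-based paging: the `items`/`next_cursor` field names and the `fetch_page` signature are assumptions for demonstration, not Medium's actual (undocumented) response shape. A fake paginated source stands in for the real endpoint.

```python
# Hedged sketch of cursor-based pagination. The field names "items" and
# "next_cursor" are illustrative placeholders, not Medium's real API.
def fetch_all(fetch_page, limit=50):
    items, cursor = [], None
    while True:
        page = fetch_page(limit=limit, cursor=cursor)
        items.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:  # no further pages
            break
    return items

# Fake paginated source with 120 "stories", served in pages of `limit`:
STORIES = list(range(120))

def fake_page(limit, cursor):
    start = cursor or 0
    chunk = STORIES[start:start + limit]
    nxt = start + limit if start + limit < len(STORIES) else None
    return {"items": chunk, "next_cursor": nxt}

print(len(fetch_all(fake_page)))  # 120
```

The same loop shape should apply once the real cursor parameter Medium expects after the 50-item limit is identified.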

Unable to scrape any stats: 404

Issue

Until a week ago scraping publication stats worked.
Suddenly, last week, it stopped working.

Command:

medium-stats scrape_publication -u <username> -s <pubname> --output_dir . --sid "<mySID>" --uid "<myUID>" --all

The error:

Traceback (most recent call last):
  File "/Users/dave/data-projects/marketing-pipeline/venv/bin/medium-stats", line 8, in <module>
    sys.exit(main())
  File "/Users/dave/data-projects/marketing-pipeline/venv/lib/python3.10/site-packages/medium_stats/__main__.py", line 220, in main
    data = sg.get_all_story_overview()
  File "/Users/dave/data-projects/marketing-pipeline/venv/lib/python3.10/site-packages/medium_stats/scraper.py", line 294, in get_all_story_overview
    data = self._decode_json(response)
  File "/Users/dave/data-projects/marketing-pipeline/venv/lib/python3.10/site-packages/medium_stats/scraper.py", line 146, in _decode_json
    return json.loads(cleaned)["payload"]
  File "/Users/dave/.pyenv/versions/3.10.9/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/Users/dave/.pyenv/versions/3.10.9/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/dave/.pyenv/versions/3.10.9/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

JSON is expected, but not returned.

Expected result

medium_stats would output the stats to ./stats_export/<publication>

Debugging steps

  • Changed cookie
  • Tried with VPN on/off
  • Dumped the response from the server, and it seems to be a 404 page (though I can load the publication stats page directly)


Anyone else running into issues, got a workaround? Or is Medium updating its stats pages?
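Whatever the server-side cause, the failure mode could be made clearer: checking the HTTP status before decoding would surface "HTTP 404" instead of a bare `JSONDecodeError`. A small defensive sketch, assuming Medium still prepends its anti-hijacking `])}while(1);</x>` prefix to JSON responses (the function signature here is illustrative, not the library's `_decode_json`):

```python
import json

def decode_payload(status_code, text):
    # Fail loudly on non-200 responses (e.g. a 404 HTML page) instead
    # of letting json.loads choke on the first character.
    if status_code != 200:
        raise RuntimeError(f"HTTP {status_code}: {text[:80]!r}")
    # Strip Medium's JSON-hijacking guard prefix before parsing.
    cleaned = text.replace("])}while(1);</x>", "", 1)
    return json.loads(cleaned)["payload"]

print(decode_payload(200, '])}while(1);</x>{"payload": {"value": 1}}'))
```

With a check like this, the traceback above would have reported the 404 directly.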

Adding story topics from story settings page

It would be awesome to get the story topics for a specific story, as it makes it easier to see which topics were getting the most traction during a given time. It would then help any user focus on content around the topics that get the most hits.

Story topics can be fetched from: https://medium.com/p/<POST_ID>/settings

I would be glad to contribute towards this issue. I would really appreciate it if you could point me in the right direction.
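As a starting point, the extraction side could be a small HTML-parsing pass over the settings page. The markup below is a made-up stand-in (the real page may well embed topics in a JSON state blob instead of visible tags), so treat this as a shape sketch only:

```python
from html.parser import HTMLParser

# Hypothetical markup; the real settings page's structure is unverified.
SAMPLE = '<a class="topic">Python</a><a class="topic">Data Science</a>'

class TopicParser(HTMLParser):
    """Collect the text of every <a class="topic"> element."""
    def __init__(self):
        super().__init__()
        self.topics, self._in_topic = [], False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and ("class", "topic") in attrs:
            self._in_topic = True

    def handle_data(self, data):
        if self._in_topic:
            self.topics.append(data)
            self._in_topic = False

p = TopicParser()
p.feed(SAMPLE)
print(p.topics)  # ['Python', 'Data Science']
```

The first real step would be inspecting the settings page's actual payload (authenticated) to see where the topics live.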

Add aliases in medium_creds.ini file

I think it's common for people to be part of multiple publications. Right now, you have to make a separate section for each and copy the same sid/uid values, which would all have to be updated when they change.

It would be great to have a field like aliases to list the names of publications the values in a section also apply to.
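A sketch of how the lookup could work with `configparser`, assuming a comma-separated `aliases` key (the proposed field, not one the library currently reads):

```python
import configparser

# Hypothetical medium_creds.ini content with the proposed "aliases" key.
SAMPLE = """
[my-publication]
sid = abc123
uid = def456
aliases = second-pub, third-pub
"""

def resolve_section(config, name):
    # Return the section whose name or aliases list matches `name`.
    for section in config.sections():
        aliases = [a.strip() for a in config[section].get("aliases", "").split(",")]
        if name == section or name in aliases:
            return config[section]
    raise KeyError(name)

config = configparser.ConfigParser()
config.read_string(SAMPLE)
print(resolve_section(config, "second-pub")["sid"])  # abc123
```

This keeps a single sid/uid pair per section, so rotating credentials only means editing one place.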

doesn't seem to install via pip?

I'm sure I'm just missing something. I'm fairly new to python.

I'm running pip install medium_stats (and also medium-stats) but when I run the sample script included in the readme.md (with my info) it errors with:

File "<directory path>/medium_stats.py", line 3, in <module>
    from medium_stats.scraper import StatGrabberUser
ModuleNotFoundError: No module named 'medium_stats.scraper'; 'medium_stats' is not a package

This is happening on 2 different systems where other python scripts are running fine.
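The "'medium_stats' is not a package" wording, raised from a script itself named `medium_stats.py`, usually means the local file is shadowing the installed package: Python finds `medium_stats.py` in the script's directory before the site-packages copy. The snippet below reproduces that shadowing in isolation (no installed package needed):

```python
import os
import subprocess
import sys
import tempfile

# An empty local medium_stats.py wins over any installed package,
# because the current directory leads sys.path for "python -c".
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "medium_stats.py"), "w").close()
    out = subprocess.run(
        [sys.executable, "-c",
         "import medium_stats; print(hasattr(medium_stats, '__path__'))"],
        cwd=d, capture_output=True, text=True,
    )
    print(out.stdout.strip())  # False: a plain module, not a package
```

If that is the cause here, renaming the local script (e.g. to `fetch_stats.py`) should let the real package import cleanly.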

scrape_publication seems to require full URL

Scraping a publication under a custom URL works fine, but a regular publication under medium.com seems to require the full URL (at least including the domain).

So this works:
scrape_publication -u medium.com/name-of-publication
while this doesn't:
scrape_publication -u name-of-publication

The error is socket.gaierror: [Errno 8] nodename nor servname provided, or not known, which to me suggests that the script doesn't prepend the medium.com/ bit.
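The fix could be a small normalization step before the request: prepend `medium.com/` when the handle has no domain part. This is an assumed behavior sketch, not the library's current code:

```python
def normalize_publication_url(handle):
    # Accept bare slugs, medium.com/<slug>, and full https:// URLs.
    handle = handle.removeprefix("https://").removeprefix("http://")
    if "." not in handle.split("/", 1)[0]:  # first segment has no domain
        handle = "medium.com/" + handle
    return "https://" + handle

print(normalize_publication_url("name-of-publication"))
# https://medium.com/name-of-publication
```

With this in place, both invocations in the report above would resolve to the same host.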

Canonical name of publications for credentials matching, etc.

When scraping a publication, it seems that the credentials file is matched against the URL the user specifies, which may or may not include the medium.com/ part. There should be a canonical name for a publication (IMHO, just the part of the URL after https://medium.com/) that is used everywhere. That way, even if I specified the full URL, it would still match the credentials based on the canonical name.
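A canonicalization helper along these lines could back both the scraper and the credentials lookup. It is a sketch of the proposed behavior, not the library's current logic; the custom-domain branch is an assumption about how such publications should be keyed:

```python
from urllib.parse import urlparse

def canonical_name(handle):
    # Reduce any accepted spelling to one canonical key:
    # the slug after medium.com/, or the domain for custom-domain pubs.
    if "//" not in handle:
        handle = "https://" + handle
    parsed = urlparse(handle)
    path = parsed.path.strip("/")
    return path or parsed.netloc

print(canonical_name("https://medium.com/name-of-publication/"))
# name-of-publication
```

Matching credentials sections against this key would make `medium.com/foo`, `https://medium.com/foo`, and `foo` interchangeable.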
