
python-mwviews's People

Contributors

geohci, guyrosin, halfak, hall1467, milimetric, nettrom

python-mwviews's Issues

Pagination not implemented for project_views

Hi, I was trying to wrap my head around the Wikimedia REST API when I found this repo. It looks like there is a pagination problem, at least with the project_views method. Steps to reproduce:

p = PageviewsClient(user_agent="<[email protected]> Foo Bar")
a = p.project_views(['it.wikipedia'], granularity='hourly', start='2016010100', end='2022060700')

len(a) # 56,377 items
# However, most of those are empty. The library inserts every date in the range,
# but the data coming back from the API does not cover the whole time span.

# E.g.:
a[datetime.datetime(2021, 2, 11, 15, 0)] #  {'it.wikipedia': None}

Data for such dates is definitely present in the API. The problem is that this endpoint returns at most 5k items per call, along with an undocumented pagination token (see here for more). There should probably be a way to pass that token back to the API to get the next page (I am still trying to figure out how).
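Until the pagination token is understood, one possible workaround is to split the requested range into windows of at most 5,000 hours and query each window separately. This is only a sketch: hourly_chunks is a hypothetical helper, not part of mwviews, and it assumes the 5k limit counts hourly items with both endpoints included.

```python
from datetime import datetime, timedelta

def hourly_chunks(start, end, max_items=5000):
    """Yield (chunk_start, chunk_end) pairs covering start..end inclusive,
    each spanning at most max_items hourly timestamps.

    Hypothetical helper; the 5,000 default comes from the limit reported above.
    """
    step = timedelta(hours=max_items - 1)
    cursor = start
    while cursor <= end:
        chunk_end = min(cursor + step, end)
        yield cursor, chunk_end
        cursor = chunk_end + timedelta(hours=1)

chunks = list(hourly_chunks(datetime(2016, 1, 1, 0), datetime(2022, 6, 7, 0)))

# Each window could then be requested on its own, for example:
# for s, e in chunks:
#     part = p.project_views(['it.wikipedia'], granularity='hourly',
#                            start=s.strftime('%Y%m%d%H'), end=e.strftime('%Y%m%d%H'))
```

The per-window results would still need to be merged into one dict by the caller.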

Enforce high-volume access acceptable usage patterns

The API documentation requests/requires that high-volume users:

  • Don't perform more than 500 requests/s to this API
  • Set a unique User-Agent header that allows us to contact you quickly. Email addresses or URLs of contact pages work well.

I would like to not worry about following this policy when I use this tool.
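A minimal sketch of what client-side enforcement of the request rate could look like. Throttle is a hypothetical class, not something mwviews provides; the clock and sleep parameters exist only so the behaviour can be checked without actually waiting.

```python
import time

class Throttle:
    """Hypothetical helper: spaces calls at least 1/max_per_second apart."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may proceed

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval

throttle = Throttle(max_per_second=500)
throttle.wait()  # would be called before each HTTP request
```

With parallel requests, a single shared instance would also need a lock around wait(); that is left out here to keep the sketch short.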

Is there a way to search by QID?

I have a CSV with a list of Wikidata QIDs but, frustratingly, no more direct information about their Wikipedia pages. Does mwviews have a setting to search by QID?

Thank you!
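As far as this issue shows, mwviews works with page titles, so one option is to resolve QIDs to titles first through the Wikidata API's wbgetentities action. The sketch below only builds the request URL and parses a response; both function names are hypothetical, and fetching the URL is left to whatever HTTP client you already use.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def sitelink_query_url(qids, site="enwiki"):
    """Build a wbgetentities URL asking only for the sitelink on one wiki."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(qids),
        "props": "sitelinks",
        "sitefilter": site,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)

def titles_from_response(payload, site="enwiki"):
    """Map each QID in a decoded JSON response to its page title (or None)."""
    titles = {}
    for qid, entity in payload.get("entities", {}).items():
        link = entity.get("sitelinks", {}).get(site)
        titles[qid] = link["title"] if link else None
    return titles

url = sitelink_query_url(["Q42"])
```

The resulting titles could then be passed to article_views for the matching project, such as 'en.wikipedia' for the enwiki site.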

Keys in data structure returned by article_views should match the title strings passed in

Example:

>>> p.article_views('en.wikipedia', ['Bug report'], granularity='monthly', start='20220101', end='20220228')
defaultdict(<class 'dict'>, {datetime.datetime(2022, 1, 1, 0, 0): {'Bug_report': 262}, datetime.datetime(2022, 2, 1, 0, 0): {'Bug_report': 266}})

It's somewhat confusing that I can't use the titles I passed to the function ('Bug report') to index the data I get back. Instead I need to reverse-engineer the munging applied by the mwviews client.
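Until the client does this itself, the result can be re-keyed on the caller's side. A minimal sketch, assuming the only munging is the space-to-underscore replacement visible in the output above; reindex_by_requested_titles is a hypothetical helper.

```python
import datetime

def reindex_by_requested_titles(views, requested_titles):
    """Return a copy of `views` keyed by the titles exactly as they were passed in.

    Assumption: the client only replaces spaces with underscores.
    """
    by_normalized = {t.replace(" ", "_"): t for t in requested_titles}
    return {
        timestamp: {by_normalized.get(key, key): count for key, count in counts.items()}
        for timestamp, counts in views.items()
    }

# Data copied from the example above.
raw = {
    datetime.datetime(2022, 1, 1, 0, 0): {'Bug_report': 262},
    datetime.datetime(2022, 2, 1, 0, 0): {'Bug_report': 266},
}
fixed = reindex_by_requested_titles(raw, ['Bug report'])
```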

key meanings

For a single article, article_views gives me a dict with items like this:

datetime.datetime(2016, 10, 23, 0, 0): {
    'D': 1066,
    'N': 942,
    '_': None,
    'a': 6,
    'e': 2,
    'i': 7,
    'l': 8,
    'm': 24,
    'n': 8,
    'o': 1,
    'p': 9,
    't': 8,
    'y': 4
}

What is the meaning of these keys?

Instantiating PageviewsClient(parallelism=1) gives the same result.
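One possible explanation, which the report does not confirm: single-character keys like these are what you would see if a title were passed as a plain string rather than a list, so that it gets iterated character by character. The sketch below shows that effect with an arbitrary title and a hypothetical guard, as_title_list, that wraps a lone string in a list.

```python
def as_title_list(articles):
    """Hypothetical guard: wrap a single title string in a list."""
    return [articles] if isinstance(articles, str) else list(articles)

# Iterating a string yields its characters, which would become the dict keys.
char_keys = sorted(set("Bug_report"))

titles = as_title_list("Bug_report")
```

If this is the cause, calling article_views with ['Some title'] instead of 'Some title' should give a single key per timestamp.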

pip install broken?

(Apologies if this is just a quirk of my local setup, but I noticed there was a recent change to the way the project is packaged, and thought that might be related to the issue I'm having.)

I ran pip install mwviews and it appeared to install successfully. But if I try to run something like python -m mwviews, I get No module named mwviews.

pip show mwviews shows the package as being installed. Its location is given as: Location: /home/colin/.local/lib/python3.6/site-packages (as expected)

However, my site-packages does not have a mwviews subdirectory. And if I try to use pip to uninstall mwviews, the list of files to be removed looks like...

  /home/colin/.local/bin/mwviews
  /home/colin/.local/lib/python3.6/site-packages/api/__init__.py
  /home/colin/.local/lib/python3.6/site-packages/api/__pycache__/__init__.cpython-36.pyc
  /home/colin/.local/lib/python3.6/site-packages/api/__pycache__/pageviews.cpython-36.pyc
  /home/colin/.local/lib/python3.6/site-packages/api/pageviews.py
  /home/colin/.local/lib/python3.6/site-packages/mwviews-0.2.0.dist-info/INSTALLER
  /home/colin/.local/lib/python3.6/site-packages/mwviews-0.2.0.dist-info/LICENSE.txt
  /home/colin/.local/lib/python3.6/site-packages/mwviews-0.2.0.dist-info/METADATA
  /home/colin/.local/lib/python3.6/site-packages/mwviews-0.2.0.dist-info/RECORD
  /home/colin/.local/lib/python3.6/site-packages/mwviews-0.2.0.dist-info/WHEEL
  /home/colin/.local/lib/python3.6/site-packages/mwviews-0.2.0.dist-info/entry_points.txt
  /home/colin/.local/lib/python3.6/site-packages/mwviews-0.2.0.dist-info/top_level.txt
  /home/colin/.local/lib/python3.6/site-packages/utilities/__init__.py
  /home/colin/.local/lib/python3.6/site-packages/utilities/__pycache__/__init__.cpython-36.pyc
  /home/colin/.local/lib/python3.6/site-packages/utilities/__pycache__/aggregate.cpython-36.pyc
  /home/colin/.local/lib/python3.6/site-packages/utilities/__pycache__/fetch_global_namespaces.cpython-36.pyc
  /home/colin/.local/lib/python3.6/site-packages/utilities/__pycache__/util.cpython-36.pyc
  /home/colin/.local/lib/python3.6/site-packages/utilities/aggregate.py
  /home/colin/.local/lib/python3.6/site-packages/utilities/fetch_global_namespaces.py
  /home/colin/.local/lib/python3.6/site-packages/utilities/util.py

In other words, it looks to me like the contents of the package are getting exploded into the top-level site-packages directory, rather than into their own proper subdirectory. (In fact, it turns out that as a workaround I can do something like from api import PageviewsClient rather than from mwviews.api import PageviewsClient.)
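A quick way to confirm which top-level names are actually importable is to ask the import system directly. This is only a diagnostic sketch using the standard library; find_top_level is a hypothetical helper, and the names checked are the ones from the uninstall listing above.

```python
import importlib.util

def find_top_level(names):
    """Return {name: file path or None} for each top-level module name."""
    found = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        found[name] = spec.origin if spec is not None else None
    return found

report = find_top_level(["mwviews", "api", "utilities"])
for name, origin in report.items():
    print(name, "->", origin)
```

On an affected install, mwviews would be expected to map to None while api and utilities resolve to files under site-packages.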

Querying a page title with a question mark

When I query daily statistics for a page whose title contains a question mark, like p.article_views(u'nl.wikipedia',['Wie is de mol?'],start=startday,end=endday), the result is zero page views.

Other special characters seem to fail as well.

Speciaal:Zoeken does work, but 'Speciaal:MyPage/zeusmodepreferences.js' does not.
This page shows up in todaystoplist = p.top_articles('nl.wikipedia' , limit=100), but p.article_views('nl.wikipedia' ,['Speciaal:MyPage/zeusmodepreferences.js'],start=startday,end=endday) fails
on line 99 in article_views with KeyError: u'\xe9'.
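A likely ingredient here is URL encoding: characters such as ?, / and : have special meaning in a URL, so a title placed in a request path generally needs to be percent-encoded. The sketch below shows one way to do that with the standard library. encode_title is a hypothetical helper, and whether mwviews encodes titles this way internally is not something this report establishes.

```python
from urllib.parse import quote

def encode_title(title):
    """Percent-encode a page title for use as a single URL path segment.

    Spaces become underscores first; safe="" makes quote() encode "/" too.
    """
    return quote(title.replace(" ", "_"), safe="")

encoded = [
    encode_title("Wie is de mol?"),
    encode_title("Speciaal:MyPage/zeusmodepreferences.js"),
]
```

Non-ASCII characters such as é are encoded as their UTF-8 bytes by default.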

Implement "mwviews aggregate"

Aggregates page view counts from hourly page view files into a single pageview file

Usage:
    aggregate (-h|--help)
    aggregate <hour-file>... 
              [--namespaces=<path>]
              [--projects=<prefixes>]

Options:
    -h --help               Prints this documentation
    <hour-file>             Path to a pageviews hourly file to process
    --namespaces=<path>     Path of a file produced by
                            `fetch_global_namespaces` for processing 
                            namespace prefixes (e.g. "Talk:...")
    --projects=<prefixes>   A "|" separated list of project prefixes that 
                            should be included in the output 
                            (e.g. "en|en.mw")
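A minimal sketch of the core aggregation step. It assumes each line of an hourly file has the space-separated form "project title count bytes", which this issue does not specify, and it leaves out namespace handling. aggregate_hours is a hypothetical function name.

```python
from collections import Counter

def aggregate_hours(hour_files, projects=None):
    """Sum view counts per (project, title) across several hourly inputs.

    hour_files: iterable of iterables of lines (e.g. open file objects).
    projects:   optional set of project prefixes to keep, e.g. {"en", "en.mw"}.
    Assumed line format: "project title count bytes".
    """
    totals = Counter()
    for lines in hour_files:
        for line in lines:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            project, title, count = parts[0], parts[1], parts[2]
            if projects is not None and project not in projects:
                continue
            totals[(project, title)] += int(count)
    return totals

hour_1 = ["en Main_Page 10 1234\n", "nl Hoofdpagina 3 456\n"]
hour_2 = ["en Main_Page 5 789\n"]
totals = aggregate_hours([hour_1, hour_2], projects={"en"})
```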

Implement `fetch_global_namespaces`

Fetches a JSON file containing information about all namespace 
names and aliases across all wikis using action=sitematrix and 
action=query&meta=siteinfo.  This file is used later by 
`aggregate` to parse page names into (namespace, title) 
pairs. 

Usage:
    fetch_global_namespaces (-h|--help)
    fetch_global_namespaces <api-host>

Options:
    -h --help   Prints this documentation
    <api-host>  URL for the MediaWiki host to query for 
                action=sitematrix
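To illustrate how the fetched names might later be used, here is a sketch of splitting a page name into a (namespace, title) pair. The in-memory set of lowercased namespace names is an assumption about what would be loaded from the JSON file, and split_namespace is a hypothetical helper.

```python
def split_namespace(page_name, namespace_names):
    """Split "Prefix:Title" into (namespace, title).

    namespace_names: set of known namespace names and aliases, lowercased.
    Pages whose prefix is not a known namespace get "" as their namespace.
    """
    prefix, sep, rest = page_name.partition(":")
    if sep and prefix.replace("_", " ").lower() in namespace_names:
        return prefix, rest
    return "", page_name

known = {"talk", "user talk", "speciaal"}
pairs = [
    split_namespace("Talk:Bug_report", known),
    split_namespace("User_talk:Example", known),
    split_namespace("2001: A Space Odyssey", known),
]
```

The last example shows why the namespace list is needed: a colon alone does not mean the text before it is a namespace.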
