
pageviews's Introduction

pageviews

An API client library for Wikimedia traffic data.

Author: Os Keyes
License: MIT
Status: Stable


pageviews provides access to the new Wikimedia RESTful API for pageview data. It lets you retrieve per-article, per-project, and top-1000 pageview counts covering a wide range of dates, with filtering by user class and platform.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Installation

For the stable release:

install.packages("pageviews")

For the developer release:

devtools::install_github("ironholds/pageviews")
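
Once installed, a basic call looks something like this (a minimal sketch; the article and date range are arbitrary):

library(pageviews)

# Daily views of a single article over October 2015
r_views <- article_pageviews(project = "en.wikipedia",
                             article = "R (programming language)",
                             start = "2015100100", end = "2015103100")
head(r_views)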

pageviews's People

Contributors

ironholds, jlewis91, rcarto, sheriferson


pageviews's Issues

`pageview_timestamps()` leads to Bad Request (HTTP 400) error

Using pageview_timestamps() for the start or end parameters of article_pageviews() leads to a Bad Request (HTTP 400).

> ghviews <- article_pageviews(article = 'GitHub',
                               end = pageview_timestamps(Sys.Date()))

Error in pv_query(parameters, ...) : Bad Request (HTTP 400).

If you replace pageview_timestamps(Sys.Date()) with '2016090500' it works fine. It seems pageview_timestamps() returns 201609050100, which is two digits longer than the value that works.

Am I missing something or doing something wrong, or is this a problem that needs fixing? (either due to a bug or a change in the API).

I know you said you weren't actively developing right now, but if I fixed the function and sent a pull request, would it be accepted and pushed to the release version? No pressure; I'm just asking so I can decide whether to put time and effort into fixing it.
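
In the meantime, a possible workaround (a sketch, assuming the API wants the ten-digit YYYYMMDDHH format) is to build the timestamp by hand:

# Build a ten-digit YYYYMMDDHH timestamp directly; the trailing "00" is the hour
ts <- format(Sys.Date(), "%Y%m%d00")
ghviews <- article_pageviews(article = "GitHub", end = ts)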

Add function for querying all linked language versions of an article page.

It would be great for a function call like this to work:

#' @param linked_pages one or more other projects from which to pull linked
#'   articles in other languages. "all" by default; only works for the
#'   wikipedia domain.
article_pageviews(project = "de.wikipedia",
                  article = "R_(Programmiersprache)",
                  start = pageview_timestamps(as.Date("2015-09-01")),
                  end = pageview_timestamps(as.Date("2015-09-30")),
                  linked_pages = "all")

See here for an example function to implement this feature: https://github.com/petermeissner/wikipediatrend/blob/master/R/wp_linked_pages.R
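
For reference, the linked titles themselves can be pulled from the MediaWiki langlinks API. Below is a rough sketch of how linked_pages might be resolved internally; get_linked_titles is a hypothetical helper, not part of pageviews:

library(jsonlite)

get_linked_titles <- function(project, article) {
  url <- paste0("https://", project, ".org/w/api.php",
                "?action=query&format=json&prop=langlinks&lllimit=500",
                "&titles=", URLencode(article, reserved = TRUE))
  page <- fromJSON(url)$query$pages[[1]]
  links <- page$langlinks
  # Each entry pairs a language code ("lang") with the linked title ("*")
  setNames(links[["*"]], links$lang)
}

get_linked_titles("de.wikipedia", "R_(Programmiersprache)")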

start, end of article_pageviews

Hi. First of all, thanks for the package. I'm not sure if I'm doing something wrong, but I'm running into an error when specifying the start and end dates of article_pageviews.

For example, when trying to download a period starting 1 January 2014 and ending 1 April 2014:

article_pageviews(project = "en.wikipedia",
                  article = "R (programming language)",
                  platform = "all", user_type = "all",
                  start = "2014010100", end = "2014040100",
                  reformat = TRUE)

Error in pv_query(parameters, ...) : client error: (404) Not Found

It runs smoothly with different parameters (for example, start = "2014100100", end = "2015100200"). However, I don't understand why it works there but not with the values above.

Thanks for your help.
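
One plausible explanation (an assumption worth checking against the API documentation): the per-article endpoint only has data from 2015-07-01 onward, so a range that falls entirely before that date returns a 404, while a range that overlaps it succeeds:

# Works, under the assumption that data availability begins 2015-07-01
article_pageviews(project = "en.wikipedia",
                  article = "R (programming language)",
                  start = "2015070100", end = "2015100100")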

special characters and URI encoding

Hi. I only just figured out that the error being thrown at me has to do with encoding issues. After a quick search I found that the API requires URI-encoded article titles.

URI <- function(x) {
  x <- gsub("ü", "%C3%BC", x)
  x <- gsub("Ü", "%C3%9C", x)
  x <- gsub("ä", "%C3%A4", x)
  x <- gsub("Ä", "%C3%84", x)
  x <- gsub("ö", "%C3%B6", x)
  x <- gsub("Ö", "%C3%96", x)
  x <- gsub("é", "%C3%A9", x)
  x <- gsub("É", "%C3%89", x)
  x <- gsub("è", "%C3%A8", x)
  x <- gsub("È", "%C3%88", x)
  x <- gsub("ê", "%C3%AA", x)
  x <- gsub("Ê", "%C3%8A", x)
  x <- gsub("ë", "%C3%AB", x)
  x <- gsub("Ë", "%C3%8B", x)
  return(x)
}

I'm sure there are more. I'm not certain, but it might be useful to include a similar function in the package, or at least to mention the issue in the documentation.
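
For what it's worth, base R already covers the general case: URLencode() percent-encodes each byte of a UTF-8 string. Older versions of R handled multibyte characters inconsistently, so treat this as a sketch:

# "Ü" -> "%C3%9C", the same encoding the hand-rolled function produces
utils::URLencode("Überlingen", reserved = TRUE)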

Adding granularity="monthly" to article_pageviews

It looks from your code as though the data can be accessed at this granularity, but you don't offer it as an option in the package. Presumably, if I were after a few months of data for an article, this would be quicker than fetching every day and then aggregating.
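
Hypothetically, if the package exposed the API's granularity lever, the call might look like this; granularity = "monthly" is the value the REST API itself uses, but the argument name here is an assumption:

# Sketch only: granularity was not an exposed option at the time of this issue
article_pageviews(project = "en.wikipedia",
                  article = "R (programming language)",
                  start = "2015070100", end = "2016070100",
                  granularity = "monthly")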

Month = "all" / Day = "all" does not work for the top_articles function.

Example:

top_articles(project = "en.wikipedia", year = "2015", month = "all", day = "all")
Error: The pageview data available does not cover the range you requested

I've tested this in the Wikipedia API explorer and it appears to be an issue on their side.
Just wanted to flag this in case anyone runs into the same issue.
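
Until that is fixed upstream, one workaround sketch is to loop over the months explicitly (assuming month accepts two-digit strings, as in the API path):

# Fetch each month of 2015 separately instead of relying on month = "all"
top_2015 <- lapply(sprintf("%02d", 1:12), function(m) {
  top_articles(project = "en.wikipedia", year = "2015", month = m, day = "all")
})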

article entry in article_pageviews

I'm trying to get pageviews for a list of Wikipedia article titles. Beyond the limitation of how each entry is written, I've been having problems with articles that have a Wikipedia page and are spelled correctly, yet the function still fails. This has happened with several entries, for example "Timbavati Private Nature Reserve".

views <- article_pageviews(project = "en.wikipedia",
                           article = "Timbavati Private Nature Reserve",
                           user_type = "user",
                           start = "2015070100",
                           end = "2020080100",
                           granularity = "daily")

This code did not work. My original list has 227 Wikipedia article titles; most of them run correctly (in a for loop), but some of them (and it is often hard to identify which) do not.
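
To at least pin down which titles fail, each call can be wrapped in tryCatch(); in this sketch, titles stands in for the list of 227 article names:

results <- list()
failed <- character(0)
for (title in titles) {
  results[[title]] <- tryCatch(
    article_pageviews(project = "en.wikipedia", article = title,
                      user_type = "user", start = "2015070100",
                      end = "2020080100", granularity = "daily"),
    error = function(e) {
      failed <<- c(failed, title)  # record the failing title and keep going
      NULL
    }
  )
}
failed  # the titles that errored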

Wikimedia functions highly fragmented across 4+ packages

What are your thoughts about the value to be had in merging together the set of related packages mentioned in this post: http://jlewis91.github.io/Wikipedia-Pageviews/ ?

It feels like there is a ton of overlap in functionality and lots of potential for using this set of related functions in complementary ways: both from a maintenance perspective (all the packages are basically API wrappers, so feature creep ought to be kept to a minimum), and because a user interested in one facet of the Wikipedia data could then easily go on to try out another.

Furthermore, such a high degree of fragmentation limits visibility and works against coding best practices (not all the packages have tests, are set up for CI, etc.). I just want to gauge your level of support (as the author of three separate Wikimedia packages) before forking a bunch of stuff and pushing this forward.
