
rose's Introduction

Rose

Analyse all kinds of data for a TV series. Available as a webapp at rose-tv.herokuapp.com.

THUNDERWOMAN!

Rose (of Two and a Half Men) is a highly intelligent, deceiving and manipulative woman. At the beginning of the series she was nothing more than one of Charlie's one-night stands, but she quickly turned into his stalker. She has an obsessive nature and both loves and resents Charlie.

Rose (this repository) aims to be something similar. For a given TV series, it scrapes the following:

  • U.S. viewers (in millions)
  • IMDB ratings

Why

Two and a Half Men is one of the few shows available on Indian English channels, and I had watched a few episodes of it during my JEE days. I recently had the urge to finish the series. One observation everyone would make is that as the series progressed, the last seasons really took a hit. The series finale was the worst, hitting the lowest rating the series had ever seen (IMDB 4.3).

I wanted to see if there was any pattern here. Due to the lack of proper existing tools and GraphTV going down, I had to take matters into my own hands.

Results

The results are rendered via Google Sheets charts, because they're interactive. Click an image to open its interactive chart, since I couldn't embed Google charts in iframes.

The first chart plots views for each episode across seasons. The second chart plots views per episode and average season views.

TV views

IMDB

The dataset is available here for viewing.

Observations

Charlie Sheen was one of the male leads for the first 8 seasons, after which he was replaced by Ashton Kutcher. The script writing became horrible, and some correlation in the data was expected.

The data concurs. Looking at the number of views, S11 and S12 took a big hit. S09E01 saw a change in the lead, hence the spike in views. A spike in views was expected at every season finale, but that was not necessarily true here. Looking at IMDB ratings, which mostly reflect the scriptwriting, the show became really bad from Season 9 onwards, so Ashton wasn't really to blame.

Usage

# install dependencies
> pip3 install -r requirements.txt

> python3 scrape_views.py -h

optional arguments:
  -h, --help            show this help message and exit
  -i, --imdb            Display only imdb ratings
  -w, --wiki            Display only wikipedia US TV viewers
  -s SHOW, --show SHOW  Provide show name
  -b, --bar             Display bar chart or not
  -a, --avg             Display averaged chart or not
  -e EPI, --epi EPI     Provide Episode name
  -c, --cast            Displays Cast of the show

# Plot averaged IMDB ratings for a show
> python3 scrape_views.py -i -a -s 'Two and a half men'

# Fetch information for a single episode
> python3 scrape_views.py -s 'Two and a half men' -e 'S03E06'

# Fetch information for the star cast of the show
> python3 scrape_views.py -c -s 'Two and a half men'

More imdb plots are available in GALLERY.md.

Running the webapp

The code for the webapp can be found in the webapp/ folder. Further details are available there.

# Option 1: Use procfile
> heroku local web

# Option 2: Run via flask directly
> cd webapp
> python3 flaskwebapp.py

ToDos

  • Scrape number of seasons from wikipedia page.
  • Make show urls dynamic. (Search wikipedia page and imdb page only by knowing show name)
  • Print details of episode / movie playing in default video player, akin to x-ray from Amazon Prime.
  • Tests! (Works with other series, but corner cases still need to be detected)

Contributors can have a look at the text_logs folder to get a sample of the download response. Before contributing, please check out CONTRIBUTING.MD.

License

The MIT License (MIT) 2018 - Kaustubh Hiware. Have a look at the LICENSE for more details.

rose's People

Contributors

agarwalrounak, harsh5557, kaustubhhiware, kunalnpandey, s-ankur


rose's Issues

Fixing wiki's US viewers

As pointed out by @Harsh5557, as of Dec 12 2018, the wiki flag is broken. This is probably due to a change in formatting on Wikipedia's side, and needs to be fixed urgently.

This is a high priority issue. It is relatively easy: the variables need to be printed out, and it seems to be a 2-3 hour task once the code and the relevant Wikipedia page are understood.

Recreation:

$ python scrape_views.py -w -a -s "Two and a half men"
Detected wiki link: https://en.wikipedia.org/wiki/List_of_Two_and_a_Half_Men_episodes
Number of seasons 12
Season  1
		Episode 1
Traceback (most recent call last):
  File "/home/kaustubh/GitHub/rose/scrape_views.py", line 302, in <module>
    views, average = wikiscrape(wikiurl)
  File "/home/kaustubh/GitHub/rose/scrape_views.py", line 87, in wikiscrape
    numviews = float(filter(None, re.findall(floating_point, views))[0])
TypeError: 'filter' object is not subscriptable
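
The TypeError in the traceback comes from Python 3 itself: there, filter returns a lazy iterator, so indexing it with [0] fails regardless of what the page contains. Below is a minimal sketch of one possible fix. The real floating_point regex in scrape_views.py is not shown in this issue, so FLOATING_POINT and first_float are hypothetical stand-ins, and the sample string is made up for illustration.

```python
import re

# Hypothetical pattern; the actual `floating_point` regex in scrape_views.py
# is not shown in the issue, so this is only an assumption.
FLOATING_POINT = r"\d+(?:\.\d+)?"


def first_float(text):
    """Return the first number found in `text` as a float, or None if there is none."""
    # Build a real list so that indexing works on Python 3.
    matches = [m for m in re.findall(FLOATING_POINT, text) if m]
    return float(matches[0]) if matches else None


# Made-up cell content: a viewer count followed by a footnote marker.
print(first_float("15.31[2]"))
```

Returning None for cells without a number lets the caller decide how to treat missing viewer counts instead of crashing mid-scrape.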

(Webapp) : Minimum rating specified, interactive plots

Once the webapp is completed and a plot is shown for any show, we should have a bar / input field that takes a value, for instance an IMDB rating, and highlights the episodes rated at or above the specified value.

Could proceed with displaying the plot first, and then highlighting the bars where value >= specified. Visualization to be discussed before sending PR.
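
As a starting point for that discussion, here is a minimal sketch of the highlighting step only. The function name bar_colors, the colour names and the sample ratings are all hypothetical. It returns one colour per episode, which a plotting library that accepts per-bar colours could then use.

```python
def bar_colors(ratings, minimum, base="lightgray", highlight="tomato"):
    """Return one colour per rating, highlighting values >= `minimum`."""
    return [highlight if rating >= minimum else base for rating in ratings]


# Made-up ratings, purely for illustration.
sample_ratings = [8.1, 7.4, 8.0, 6.9]
print(bar_colors(sample_ratings, 8.0))
```

Keeping the comparison in a small pure function makes it easy to test separately from whatever charting code the webapp ends up using.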

Pointed out by Shalini Mukhopadhyay

Fetch episode specific data

It should be possible to fetch information for only one episode. That is, add another parameter, call it -e or --episode, which fetches information for that particular episode only.

Usage: python scrape_views.py -s 'Two and a half men' -e 'S03E02'

If the contributor wants, they could also handle this case (not a priority) :

python scrape_views.py -e -s 'Two and a half men S03E02'

The expected output is IMDB rating, US views if available and a link to the episode.

Pointed out by Shalini Mukhopadhyay

Get information of currently playing episode

Related to #3.

Say you're watching an episode on VLC or some other video player. The work here would be a new script, call it rose_listen.py.

Fetches information of currently playing episode

This is possible via playerctl on Linux. Let's focus on VLC for the time being, due to its simplicity.
Here's a sample output:

> playerctl metadata # when video is playing
{'mpris:trackid': <objectpath '/org/videolan/vlc/playlist/4'>, 'xesam:url': <'file:///run/media/kaustubh/Windows/Music/FILE_NAME.mp4'>, 'vlc:time': <uint32 202>, 'mpris:length': <int64 202896000>, 'vlc:length': <int64 202896>, 'vlc:publisher': <5>}

> playerctl metadata xesam:url
file:///run/media/kaustubh/Windows/Music/FILE_NAME.mp4
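
A minimal sketch of how rose_listen.py might read this value from Python, assuming playerctl is installed and a player is running. The helper names url_to_filename and current_filename are hypothetical. Only the URL parsing is exercised below, since the playerctl call needs a live player.

```python
import subprocess
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def url_to_filename(url):
    """Turn a file:// URL reported by the player into a bare file name."""
    # unquote() restores characters such as spaces that are percent-encoded in URLs.
    path = unquote(urlparse(url).path)
    return PurePosixPath(path).name


def current_filename():
    """Ask playerctl for the URL of the playing media (needs playerctl and a running player)."""
    result = subprocess.run(
        ["playerctl", "metadata", "xesam:url"],
        capture_output=True, text=True, check=True,
    )
    return url_to_filename(result.stdout.strip())


print(url_to_filename("file:///run/media/kaustubh/Windows/Music/FILE_NAME.mp4"))
```

Splitting the subprocess call from the parsing keeps the parsing testable on machines without playerctl.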

Getting episode information

Assume the file name contains the show title and an episode tag such as "S05E12" somewhere. Assume the simplest case for initial development.

What information should be fetched?

  • Cast
  • IMDB rating
  • Quotes ? Information is available on IMDB page. See this.

Quotes fetching not a priority.

Webapp : Minor UI improvements

  • A smooth switcher to toggle between imdb and wiki.
  • (Not going with this) Checkboxes for averages and barchart in the same line, along with the above switcher and submit button.
  • Tabulated display of ratings / views when displaying results
  • Interactive plots (separate issue: #5)
  • A spinning loader from when the user hits the submit button until the scraping results are obtained.

Star cast of a show

One can also ask about the star cast of a show, so why not modify the code a little to fetch those details?

Add A CONTRIBUTING.MD

To avoid problems like #15 with new contributors, I think we should add a CONTRIBUTING.MD, add something to the README, or maybe add a pull request template.

Handle complex file names in rose_listen

Pointed out in #18

Currently, rose_listen.py assumes that the file name of the currently playing episode is simple.

It needs to be extended to cover at least the following cases:

  • Doctor.Who.S05e12.extraneoustext.mkv
  • Attack On Titan - S03E01 - Smoke Signal (720p) (x265) (E-Subs) [Baba_Bhayanak].mkv
  • Bojack Horseman - S01E01 [WEBRip 720p] [Daiki_Aomine].mkv
  • Brooklyn.Nine-Nine.S01E01.1080p.WEB-DL.DD5.1.H.264-NTb.mkv
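
One way to approach the cases above is to locate the SxxEyy tag with a case-insensitive regex and treat everything before it as the title. This is a minimal sketch, not the parser used in rose_listen.py. The names EPISODE_TAG and parse_episode are hypothetical, and it assumes the title always comes before the tag.

```python
import re

# Matches tags such as S05E12 or S05e12 (one or two digits each).
EPISODE_TAG = re.compile(r"[Ss](\d{1,2})[Ee](\d{1,2})")


def parse_episode(filename):
    """Return (title, season, episode) or None if no SxxEyy tag is found."""
    match = EPISODE_TAG.search(filename)
    if match is None:
        return None
    # Dots and underscores act as word separators; hyphens inside a title
    # (e.g. Nine-Nine) are kept, while a trailing " - " is stripped.
    title = re.sub(r"[._]+", " ", filename[:match.start()]).strip(" -")
    return title, int(match.group(1)), int(match.group(2))


for name in [
    "Doctor.Who.S05e12.extraneoustext.mkv",
    "Attack On Titan - S03E01 - Smoke Signal (720p) (x265) (E-Subs) [Baba_Bhayanak].mkv",
    "Bojack Horseman - S01E01 [WEBRip 720p] [Daiki_Aomine].mkv",
    "Brooklyn.Nine-Nine.S01E01.1080p.WEB-DL.DD5.1.H.264-NTb.mkv",
]:
    print(parse_episode(name))
```

Everything after the tag (resolution, codec, release group) is ignored, which is what makes the extra text in these names harmless.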

Subtitles of current episode

Work to start after #7.

Operates under the umbrella of rose_listen.py --subs

This would provide the user with three functionalities:
i) Download the subtitles of the currently playing episode and save them in the current location.
python rose_listen.py --subs --dl

ii) Display the subtitles in a terminal, appending as time goes on, like a running commentary.
python rose_listen.py --subs
Say the script is started when 5 minutes of the episode have elapsed. The displayed subtitles should be synced with the video. This means both time and speed (if the video is being played at 1.5x, the subtitles should also be displayed at 1.5x).

iii) Language. Since we would already be fetching the subtitles, why not fetch them in the language the user wants? This would be a great tool for people wanting to learn other languages.
python rose_listen.py --subs --lang en
en corresponds to English, the default. If --lang is not provided, the script should handle that on its own. Language codes can be found here.
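
The syncing requirement in (ii) comes down to a small piece of arithmetic: the wall-clock wait before a cue is the remaining video time divided by the playback rate. Here is a minimal sketch, assuming cues are (start, end, text) tuples in seconds and that the current position and rate can be read from the player. The names seconds_until_cue and pending_cues and the sample cues are hypothetical.

```python
def seconds_until_cue(cue_start, position, rate=1.0):
    """Wall-clock seconds to wait before showing a cue that starts at `cue_start`."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Video time left until the cue, scaled by how fast the video is playing.
    return max(0.0, (cue_start - position) / rate)


def pending_cues(cues, position):
    """Keep only the cues that have not finished yet at `position`."""
    return [cue for cue in cues if cue[1] > position]


# Made-up cues; the script starts 300 s into the episode at 1.5x speed.
cues = [(10.0, 12.0, "Hello"), (306.0, 308.0, "Five minutes in")]
for start, end, text in pending_cues(cues, 300.0):
    print(text, seconds_until_cue(start, 300.0, rate=1.5))
```

In the example the cue at 306 s is 6 s of video away, which is a 4 s wait at 1.5x.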

both bar plot and average are not handled in 1 command

Can we add functionality so that we can run something like
python3 scrape_views.py -i -a -b suits
As a first-time user, that was my first command, and I think many people would prefer to have both functionalities in the same command. Right now, after executing this command, the program gave the result of -b and became unresponsive afterward.
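
A minimal sketch of how both flags could be honoured in one run, using the flag names from the help output. The helper charts_to_draw and the "line" fallback are hypothetical, and this does not reflect how scrape_views.py is actually structured. The idea is to collect every requested chart first and only then draw them, rather than returning after the first one.

```python
import argparse


def build_parser():
    """Rebuild a subset of the CLI from the documented help text (a sketch only)."""
    parser = argparse.ArgumentParser(description="Plot data for a TV series")
    parser.add_argument("-i", "--imdb", action="store_true", help="Display only imdb ratings")
    parser.add_argument("-s", "--show", help="Provide show name")
    parser.add_argument("-b", "--bar", action="store_true", help="Display bar chart or not")
    parser.add_argument("-a", "--avg", action="store_true", help="Display averaged chart or not")
    return parser


def charts_to_draw(args):
    """Return every chart kind requested; the 'line' default is an assumption."""
    charts = []
    if args.bar:
        charts.append("bar")
    if args.avg:
        charts.append("average")
    return charts or ["line"]


args = build_parser().parse_args(["-i", "-a", "-b", "-s", "suits"])
print(charts_to_draw(args))
```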

Implementing a webapp

Currently, we only have a script that runs on your system. It would be great if the implementation could be moved to a webapp, based on Flask or any lightweight framework.

This is of highest priority.
