ISU Figure Skating Score Sheets as Structured Data

At the end of each competition it oversees, the International Skating Union releases a PDF containing all scores given for each performance. That report is known as a "Protocol," and an example can be found here. The code in this repository downloads a series of protocol PDFs, and then extracts structured data from the scoring sheets they contain.

Currently, the data in this repository includes every major international competition from October 2016 through December 2017. You can find a list of those 17 competitions below.

Competitions Included

2016–17 season:

  • ISU GP 2016 Progressive Skate America (Oct. 20-23, 2016)
  • ISU GP 2016 Skate Canada International (Oct. 27-30, 2016)
  • ISU GP Rostelecom Cup 2016 (Nov. 4-6, 2016)
  • ISU GP Trophee de France 2016 (Nov. 11-13, 2016)
  • ISU GP Audi Cup of China 2016 (Nov. 17-20, 2016)
  • ISU GP NHK Trophy 2016 (Nov. 25-27, 2016)
  • ISU Grand Prix of Figure Skating Final 2016 (Dec. 8-11, 2016)
  • ISU European Figure Skating Championships 2017 (Jan. 23-29, 2017)
  • ISU Four Continents Championships 2017 (Feb. 14-19, 2017)
  • ISU World Figure Skating Championships 2017 (Mar. 27 - Apr. 2, 2017)

2017–18 season:

  • ISU GP Rostelecom Cup 2017 (Oct. 20-22, 2017)
  • ISU GP 2017 Skate Canada International (Oct. 27-29, 2017)
  • ISU GP Audi Cup of China 2017 (Nov. 3-5, 2017)
  • ISU GP NHK Trophy 2017 (Nov. 10-12, 2017)
  • ISU GP Internationaux de France de Patinage 2017 (Nov. 17-19, 2017)
  • ISU GP 2017 Bridgestone Skate America (Nov. 24-26, 2017)
  • Grand Prix Final 2017 Senior and Junior (Dec. 7-10, 2017)

Data

The structured data in this repository is available in two formats: JSON and CSV.

CSV Structure

The CSV-formatted data is split up into four files:

  • programs.csv: One row for each program at each competition, e.g., the "ICE DANCE FREE DANCE" at the "Grand Prix Final 2017 Senior and Junior". Each row includes a reference to the source PDF.

  • performances.csv: One row for each skater/team, for each program.

  • judged-aspects.csv: One row for each "executed element" and "program component", for each performance at each competition.

  • judge-scores.csv: One row for each judge, for each judged aspect, for each performance at each competition.

Data Dictionary

  • programs.csv:

    • competition: The name of the competition, e.g., "ISU European Figure Skating Championships 2017".
    • program: The name of the program, e.g., "LADIES SHORT PROGRAM".
    • pdf: The filename of the corresponding Protocol PDF.
  • performances.csv:

    • performance_id: An ID unique to each performance in a program of a competition. Autogenerated for the CSV files.
    • competition: The name of the competition, e.g., "ISU European Figure Skating Championships 2017".
    • program: The name of the program, e.g., "LADIES SHORT PROGRAM".
    • name: The name(s) of the skater(s).
    • nation: The home country of the skater(s).
    • rank: Final place in the program.
    • starting_number: The order in which the skaters skated.
    • total_segment_score: The total score for the program.
    • total_element_score: The total score of all elements in the program.
    • total_component_score: The total score of all components in the program.
    • total_deductions: The total deductions given by the technical panel for the performance.
  • judged-aspects.csv:

    • aspect_id: An ID unique to each element or component during a skater's performance. Autogenerated for the CSV files.
    • performance_id: See above.
    • section: The type of aspect; either element or component.
    • aspect_num: The positional order of the aspect within the performance and section.
    • aspect_desc: Shorthand notation for the aspect. For instance, a double lutz would be marked 2Lz.
    • info_flag: A marking by the technical panel, such as "<" for an under-rotated jump.
    • credit_flag: An "X" in this column means that the skater received "credit for highlight distribution" for that element, which increases the base value.
    • base_value: The base number of points for the performed element.
    • factor: The amount by which the component score is multiplied to calculate its final value.
    • goe: The overall translated Grade of Execution (GOE) given by the judging panel.
    • scores_of_panel: The judging panel's total score for the aspect.
  • judge-scores.csv:

    • aspect_id: See above.
    • judge: The identifier assigned to the judge, e.g., "J1".
    • score: The GOE (for elements) or score (for components) awarded by the judge for the aspect.
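
The four files link together through the competition/program names, performance_id, and aspect_id. As a minimal sketch of how the per-judge rows could be joined back to the skater and element they describe, the following uses only Python's standard csv module. The function names (read_rows, scores_by_performance) are hypothetical and not part of this repository, and the sketch assumes that every aspect_id and performance_id in a file has a match in its parent file and that an empty score cell means "no score recorded".

```python
import csv


def read_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def scores_by_performance(performances, aspects, judge_scores):
    """Attach each judge's score to its aspect and performance.

    Joins judge-scores -> judged-aspects on aspect_id, then
    judged-aspects -> performances on performance_id.
    All arguments are lists of dicts, e.g. as returned by read_rows().
    """
    perf_by_id = {p["performance_id"]: p for p in performances}
    aspect_by_id = {a["aspect_id"]: a for a in aspects}

    joined = []
    for js in judge_scores:
        aspect = aspect_by_id[js["aspect_id"]]
        perf = perf_by_id[aspect["performance_id"]]
        # Assumption: an empty cell means the score is missing.
        score = float(js["score"]) if js["score"] != "" else None
        joined.append({
            "competition": perf["competition"],
            "program": perf["program"],
            "name": perf["name"],
            "section": aspect["section"],
            "aspect_desc": aspect["aspect_desc"],
            "judge": js["judge"],
            "score": score,
        })
    return joined
```

To use it, pass the results of read_rows() for performances.csv, judged-aspects.csv, and judge-scores.csv, adjusting the paths to wherever the CSV files sit in your copy of the repository.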

Downloading the PDFs

This repository does not contain the PDFs themselves.

You can, however, find a list of the URLs of each PDF in the scripts/urls.txt file.

To automate the process of downloading the PDFs, download or clone this repository to your computer, navigate to the repository's root directory, and run sh scripts/download_pdfs.sh.
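
If you would rather not use the shell script, a rough Python equivalent is sketched below. It is not a port of scripts/download_pdfs.sh. It assumes that scripts/urls.txt lists one URL per line, and the helper names and the destination directory passed to download_all are hypothetical choices made for illustration.

```python
import os
import urllib.request
from urllib.parse import urlparse


def read_urls(lines):
    """Return the non-empty, stripped lines (assumes one URL per line)."""
    return [line.strip() for line in lines if line.strip()]


def local_name(url):
    """Use the last path segment of the URL as the local filename."""
    return os.path.basename(urlparse(url).path)


def download_all(urls, dest_dir):
    """Download each URL into dest_dir, skipping files already present."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        target = os.path.join(dest_dir, local_name(url))
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)
```

A typical call would open scripts/urls.txt, pass the file object to read_urls, and hand the result to download_all along with a directory of your choosing. The extraction scripts may expect the PDFs in a specific location, so check the Makefile before relying on this.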

Extracting the Data Yourself

If you'd like to re-run the data-extraction scripts yourself, do the following:

  • Download or clone this repository to your computer
  • Navigate to the repository's root directory
  • Download the PDFs, per the instructions above
  • Ensure that you have Python 3 installed
  • Install the required libraries (ideally in a Python 3 virtual environment) by running pip3 install pandas==1.2.4; pip3 install -e git+https://github.com/jsvine/[email protected]#egg=pdfplumber
  • Run make reproduce

That last step will clear all previously extracted data and re-run the PDF-to-JSON and JSON-to-CSV extractions.

That process will overwrite the data/parsing-log.txt file, which contains a transcript of each page that has been parsed, and whether the parser found any score sheets on that particular page.

Licensing

All code in this repository is available under the MIT License. All data files are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Questions / Feedback

Contact Jeremy Singer-Vine at [email protected] and John Templon at [email protected].

Looking for more from BuzzFeed News? Click here for a list of our open-sourced projects, data, and code.
