fifa18-even-more-player-data

A more complete FIFA 18 player dataset

Forked from here, but this project no longer bears much resemblance to the original.

This repo contains both the dataset and the code used to scrape sofifa.com.

See the dataset on Kaggle here

Why fork the original project?

  1. I wanted more fields. This dataset contains extra fields such as International Reputation, traits and specialities.
  2. Efficiency:
    • The original project runs entirely in a Jupyter notebook. I've rebuilt the crawler as a Python package.
    • The original project scrapes synchronously, which I imagine takes several hours.
  3. I've stored the data in .feather format alongside .csv; this is more convenient for Python and R users.
  4. Column names now contain only letters, numbers and underscores. This is safer for most analysis tools.
  5. Cleanliness: I've pre-cleaned much of the data. Most of this work involved converting strings to numeric values, sometimes with extra leg-work to handle inconsistent units.
  6. Fun
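The cleaning steps in points 4 and 5 can be illustrated with a minimal Python sketch. This is not the repo's actual code: the function names are hypothetical, and the input formats (euro amounts with K/M suffixes, heights given either in centimetres or in feet and inches) are assumptions about what the scraped strings look like.

```python
import re

# Hypothetical multipliers for the assumed K/M suffixes on euro amounts
_MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def clean_column_name(name: str) -> str:
    """Keep only letters, digits and underscores in a column name."""
    # Replace each run of disallowed characters with a single underscore
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", name)
    # Trim any underscores left over at either end
    return cleaned.strip("_")


def parse_money(text: str) -> float:
    """Convert strings like '€95.5M' or '€500K' to euros (assumed format)."""
    match = re.fullmatch(r"€?\s*([0-9]*\.?[0-9]+)\s*([KM]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised amount: {text!r}")
    number, suffix = match.groups()
    # An empty suffix means the amount is already in euros
    return float(number) * _MULTIPLIERS.get(suffix, 1)


def parse_height_cm(text: str) -> float:
    """Normalise heights written as '180cm' or 5'9" to centimetres (assumed formats)."""
    text = text.strip()
    metric = re.fullmatch(r"(\d+(?:\.\d+)?)\s*cm", text)
    if metric:
        return float(metric.group(1))
    imperial = re.fullmatch(r"(\d+)'\s*(\d+)\"?", text)
    if imperial:
        feet, inches = map(int, imperial.groups())
        # 1 inch = 2.54 cm; round to one decimal place
        return round((feet * 12 + inches) * 2.54, 1)
    raise ValueError(f"unrecognised height: {text!r}")


if __name__ == "__main__":
    print(clean_column_name("International Reputation"))  # International_Reputation
    print(parse_money("€95.5M"))  # 95500000.0
    print(parse_height_cm("5'9\""))  # 175.3
```

Raising on unrecognised strings, rather than silently returning a default, makes any new unit format on the site surface immediately during a scrape.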

Future improvements:

  • There are still a few fields that could be added if anyone wants them, such as contract expiry date.
  • Scrapy might speed things up
  • Tests
  • Building an archive: EA updates its player data regularly, and it would be useful to compare old and new versions. The data goes all the way back to FIFA 2007, and scraping it shouldn't be too hard, although a one-off run covering every past version would probably take a few days.

Contributors: kevinheavey, 4m4n5