The Indian Premier League (IPL) is a professional Twenty20 cricket league contested by ten teams based in ten Indian cities.
The dataset comes from Kaggle: https://www.kaggle.com/ramjidoolla/ipl-data-set
The notebooks are on Kaggle here: https://www.kaggle.com/ramjidoolla/ipl-data-set/code?datasetId=872854&sortBy=dateRun&tab=profile
The goal is to predict the winner of upcoming matches from past records and the two teams playing.
- Cleaning: Nulls are handled, and the values are inspected to find the best way to deal with each null.
- EDA: Draw insights from the data to find the best features for prediction.
- From 2008 to 2019, not every team played in every season; some teams played just one season and some missed just two seasons.
- Some rivals have played many matches against each other, while others meet rarely.
- Every city has its own champion team, the one with the highest winning percentage there.
- Teams vary in their winning rates against other teams, which can be a good feature for determining how hard a match is.
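
The head-to-head and per-city observations above can be turned into numbers with a few lines of plain Python. This is only a minimal sketch, not the code from the notebooks: the records are toy data, and the field names `team1`, `team2`, `winner` and `city` are assumptions about how the match table is laid out.

```python
from collections import defaultdict

# Toy records with made-up teams and cities; the field names are assumed.
matches = [
    {"team1": "A", "team2": "B", "winner": "A", "city": "X"},
    {"team1": "B", "team2": "A", "winner": "A", "city": "X"},
    {"team1": "A", "team2": "B", "winner": "B", "city": "Y"},
    {"team1": "A", "team2": "C", "winner": "C", "city": "Y"},
    {"team1": "B", "team2": "C", "winner": "B", "city": "Y"},
]


def head_to_head(matches):
    """Return {(team, opponent): (wins, played)} for every ordered pair."""
    stats = defaultdict(lambda: [0, 0])
    for m in matches:
        for team, opp in ((m["team1"], m["team2"]), (m["team2"], m["team1"])):
            stats[(team, opp)][1] += 1          # one more match played
            if m["winner"] == team:
                stats[(team, opp)][0] += 1      # one more win for `team`
    return {pair: tuple(v) for pair, v in stats.items()}


def city_champion(matches):
    """Return {city: team with the highest win rate in that city}."""
    wins, played = defaultdict(int), defaultdict(int)
    for m in matches:
        for team in (m["team1"], m["team2"]):
            played[(m["city"], team)] += 1
        wins[(m["city"], m["winner"])] += 1
    best = {}
    for (city, team), n in played.items():
        rate = wins[(city, team)] / n
        if city not in best or rate > best[city][1]:
            best[city] = (team, rate)
    return {city: team for city, (team, _) in best.items()}


h2h = head_to_head(matches)
win_rate = {pair: wins / played for pair, (wins, played) in h2h.items()}
champions = city_champion(matches)
```

The `played` count in `h2h` also shows how often two rivals meet, so rarely seen pairs can be treated with more caution than frequent ones.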
The most important features are time-dependent, because this is time-series data. The strongest ones are the pair of rival teams combined with the country.
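
Because the features depend on time, each one should be computed only from matches played before the match being predicted, otherwise the result leaks into the feature. The sketch below illustrates that idea for the head-to-head rate. It is an assumption about how such a feature could be built, not the notebooks' implementation; the helper `prior_h2h_rate`, the field names, the ISO date strings and the 0.5 default for a first meeting are all hypothetical.

```python
from collections import defaultdict


def prior_h2h_rate(matches, default=0.5):
    """For each match in date order, return team1's win rate against team2
    computed from earlier matches only (`default` if they never met)."""
    wins = defaultdict(int)     # (winner, loser) -> wins so far
    played = defaultdict(int)   # unordered pair  -> matches so far
    features = []
    for m in sorted(matches, key=lambda m: m["date"]):
        t1, t2 = m["team1"], m["team2"]
        pair = frozenset((t1, t2))
        n = played[pair]
        # Read the feature BEFORE updating the counters with this match.
        features.append(wins[(t1, t2)] / n if n else default)
        played[pair] += 1
        if m["winner"] == t1:
            wins[(t1, t2)] += 1
        elif m["winner"] == t2:
            wins[(t2, t1)] += 1
    return features


# Toy records; ISO-formatted dates sort correctly as strings.
history = [
    {"date": "2008-04-18", "team1": "A", "team2": "B", "winner": "A"},
    {"date": "2008-04-25", "team1": "B", "team2": "A", "winner": "A"},
    {"date": "2009-04-20", "team1": "A", "team2": "B", "winner": "B"},
    {"date": "2009-05-01", "team1": "A", "team2": "B", "winner": "A"},
]
rates = prior_h2h_rate(history)
```

The returned list follows the date-sorted order of the matches, so it lines up with the input only when the input is already chronological.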
The data is small, so AutoML was used to develop a baseline model on the original features and a second model on the new features, and the results were compared.