
bulldozer_competition's Introduction

Regression on Bulldozers

UPDATE: This was our first simulated Kaggle competition at Zipfian (3 weeks in)

Welcome to the first weekly challenge, a summary mini-project that cements the information you drank from this week's information fire hose.
You are in a group of 3, and your challenge is to predict the sale price of a particular piece of heavy equipment at auction, based on its usage, equipment type, and configuration. The data is sourced from auction result postings and includes information on usage and equipment configurations.

The key fields in train.csv are:

  • SalesID: the unique identifier of the sale
  • MachineID: the unique identifier of a machine. A machine can be sold multiple times
  • saleprice: what the machine sold for at auction (only provided in train.csv)
  • saledate: the date of the sale

There are several fields towards the end of the file on the different options a machine can have. The descriptions all start with "machine configuration" in the data dictionary. Some product types do not have a particular option, so all the records for that option variable will be null for that product type. Also, some sources do not provide good option and/or hours data.
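To see which option fields are structurally empty for a product type, one approach is to compute the share of missing values per group. The sketch below uses only the standard library; the column names `ProductGroup` and `Blade_Type` and the toy rows are made up for illustration and are not taken from the data dictionary.

```python
from collections import defaultdict

def null_rate_by_group(rows, group_col, value_col):
    """Return {group: fraction of rows where value_col is missing}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [missing, total]
    for row in rows:
        tally = counts[row[group_col]]
        tally[1] += 1
        if row[value_col] in (None, ""):
            tally[0] += 1
    return {group: missing / total for group, (missing, total) in counts.items()}

# Toy rows with hypothetical column names.
toy_rows = [
    {"ProductGroup": "A", "Blade_Type": "PAT"},
    {"ProductGroup": "A", "Blade_Type": ""},
    {"ProductGroup": "B", "Blade_Type": ""},
    {"ProductGroup": "B", "Blade_Type": ""},
]
rates = null_rate_by_group(toy_rows, "ProductGroup", "Blade_Type")
```

A rate of 1.0 for a group suggests the option does not apply to that product type, while a rate between 0 and 1 points to a source with patchy option data.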

Bonus points: The machine_appendix.csv file contains the correct year manufactured for a given machine along with the make, model, and product class details. There is one machine id for every machine in all the competition datasets (training, evaluation, etc.).
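Because the appendix has one row per machine, its columns can be attached to each sale by looking up MachineID. Here is a minimal sketch with the standard `csv` module. The in-memory CSV text stands in for the real files, the prices are invented, and `YearMade` is an assumed column name, so check the actual headers first.

```python
import csv
import io

# Tiny stand-ins for train.csv and machine_appendix.csv (invented values).
# MachineID comes from the challenge text; YearMade is an assumed column name.
TRAIN_CSV = """SalesID,MachineID,saleprice
1,100,66000
2,101,57000
3,100,61000
"""
APPENDIX_CSV = """MachineID,YearMade
100,2004
101,1996
"""

def attach_appendix(train_rows, appendix_rows, key="MachineID"):
    """Left-join appendix columns onto each sale row by machine id."""
    by_machine = {row[key]: row for row in appendix_rows}
    merged = []
    for sale in train_rows:
        extra = by_machine.get(sale[key], {})  # no match: keep the sale as-is
        merged.append({**sale, **extra})
    return merged

train_rows = list(csv.DictReader(io.StringIO(TRAIN_CSV)))
appendix_rows = list(csv.DictReader(io.StringIO(APPENDIX_CSV)))
merged_rows = attach_appendix(train_rows, appendix_rows)
```

Machine 100 is sold twice in the toy data, and both sales pick up the same appendix row.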

Evaluation

We are holding out 10% of the data. The winning team will be the one with the lowest prediction error on at least 50% of the test data. The evaluation metric for this challenge is the RMSLE (root mean squared log error) between the actual and predicted auction prices. You will present your approach and results today at 6pm as a 5-minute talk. Prepare slides.
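For checking scores locally, RMSLE can be sketched in a few lines. This version assumes the common log(1 + x) convention, which the challenge text does not spell out, so confirm it against the official scorer.

```python
import math

def rmsle(actual, predicted):
    """Root mean squared log error, using log(1 + x) on both sides."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))
```

Since the metric works on logs, it penalizes relative errors: being off by a factor of two costs the same on a cheap machine as on an expensive one. Fitting the model to the log of the price is a common way to line the training loss up with this metric.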

Tools:

Before you dive into regression, algorithms, and testing, talk to your teammates and devise a strategy for analysing the data. Work effectively so that you can communicate your findings in a presentation. Use any of the tools we learnt this week (here are some suggestions):

Use EDA techniques:

  • Visualize the data set and understand your variables.
  • Look for the categorical and continuous regressors.
  • Use faceting or stratification to identify collinearity.

Use the big guns:

  • Linear regression
  • Ridge regression
  • Lasso regression
  • Gradient descent
  • Logistic and Logit regression
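Two of the tools above combine naturally: ridge regression can be fitted by gradient descent. The following is a rough pure-Python sketch, not a tuned implementation. The function name, learning rate, and iteration count are illustrative choices, and in practice features should be scaled before using a fixed step size.

```python
def fit_ridge_gd(X, y, alpha=0.0, lr=0.05, n_iter=2000):
    """Minimise mean squared error + alpha * ||w||^2 by batch gradient descent.

    X is a list of feature rows, y a list of targets. The intercept b is
    not penalised. alpha=0 gives ordinary least squares.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        # Residual of the current prediction for every row.
        residuals = [
            sum(wj * xj for wj, xj in zip(w, row)) + b - yi
            for row, yi in zip(X, y)
        ]
        grad_w = [
            (2.0 / n) * sum(r * row[j] for r, row in zip(residuals, X))
            + 2.0 * alpha * w[j]
            for j in range(d)
        ]
        grad_b = (2.0 / n) * sum(residuals)
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [1.0, 3.0, 5.0, 7.0]
w_ols, b_ols = fit_ridge_gd(X_toy, y_toy, alpha=0.0)
w_ridge, b_ridge = fit_ridge_gd(X_toy, y_toy, alpha=1.0)
```

With `alpha=0` the fit recovers the slope and intercept of the toy line; with `alpha=1.0` the slope is pulled towards zero, which is the shrinkage that makes ridge useful when regressors are correlated.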

Remove biases in data using:

  • Detecting and reducing multicollinearity
  • Heteroscedasticity
  • Influence and leverage points, and outliers.
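A quick first check for multicollinearity is to flag pairs of numeric columns with a high absolute Pearson correlation. This sketch only catches pairwise relationships (a column that is a combination of several others needs something like variance inflation factors), and the 0.9 threshold and the column names are arbitrary choices for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / (spread_x * spread_y)

def collinear_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for column pairs with |r| >= threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

# Toy columns: hours_x2 is an exact multiple of hours.
toy_columns = {
    "hours": [1, 2, 3, 4],
    "hours_x2": [2, 4, 6, 8],
    "noise": [1, -1, -1, 1],
}
flagged = collinear_pairs(toy_columns)
```

For a flagged pair, dropping one of the two columns or switching to a regularised model such as ridge are both reasonable responses.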

Test your predictions:

  • Use cross-validation (for example, k-fold) to test for overfitting
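The index bookkeeping behind k-fold cross-validation is simple enough to sketch by hand. The helper below is a hypothetical illustration that splits row indices into contiguous folds without shuffling; shuffle the rows first if the file is sorted in a meaningful order.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices), one pair per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train_idx, test_idx))
        start = stop
    return folds

folds = k_fold_indices(10, 3)
```

Fit the model on each `train_idx`, score it on the matching `test_idx`, and compare the average held-out score with the training score. A large gap between the two is the sign of overfitting.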

Good Luck!

bulldozer_competition's People

Contributors

amentch, jonathandinu, nyghtowl
