
competitive-data-science's Introduction

Materials for "How to Win a Data Science Competition: Learn from Top Kagglers" course

This repository contains the programming assignment notebooks for the ML course on competitive data science.

competitive-data-science's People

Contributors

aguschin, aksub99, dmitryulyanov, geffy, iamkissg, mahendrakariya, maxbeaudoin, piyushver, vishalshar


competitive-data-science's Issues

run on Google Colab

Hi,
Has anyone got this running on Google Colab?

We should need two things: the grader package and the data files.
-Anton

A bug in generating lag features

There is a bug in the week 4 programming assignment notebook, in the part that generates lag features:
'After creating a grid, we can calculate some features. We will use lags from [1, 2, 3, 4, 5, 12] months ago.'

The lag features are correct only for target_lag_{} (target_lag_1, target_lag_2, target_lag_3, ...) and incorrect for all other lag features.

I documented the bug and the fix here. Fixing it boosted my leaderboard (LB) score considerably.
https://gist.github.com/anhquan0412/330494b051f74eacad3917f43e3ba43a
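
For readers who want to check their own lag columns, here is a minimal sketch of one common way to build lags with pandas: shift date_block_num forward and merge back onto the grid. This is neither the notebook's code nor the fix from the gist. The column target_shop, the helper add_lags, and all values are made up for illustration.

```python
import pandas as pd

# Toy monthly grid; column names follow the competition data, values are made up.
grid = pd.DataFrame({
    "shop_id":        [0, 0, 0, 0],
    "item_id":        [5, 5, 5, 5],
    "date_block_num": [0, 1, 2, 3],
    "target":         [10, 20, 30, 40],
    "target_shop":    [100, 200, 300, 400],
})

index_cols = ["shop_id", "item_id", "date_block_num"]
cols_to_lag = ["target", "target_shop"]


def add_lags(df, lags):
    """Hypothetical helper: add <col>_lag_<k> for every column in cols_to_lag."""
    out = df.copy()
    for lag in lags:
        shifted = df[index_cols + cols_to_lag].copy()
        # A value observed in month m becomes the lag feature of month m + lag.
        shifted["date_block_num"] += lag
        shifted = shifted.rename(columns={c: f"{c}_lag_{lag}" for c in cols_to_lag})
        out = out.merge(shifted, on=index_cols, how="left")
    # Months with no history get 0 instead of NaN.
    return out.fillna(0)


lagged = add_lags(grid, [1, 2])
print(lagged[["date_block_num", "target_lag_1", "target_shop_lag_1", "target_shop_lag_2"]])
```

Comparing a non-target column such as target_shop_lag_1 against the original values one month earlier is a quick way to see whether every lagged column lines up, not just target_lag_{}.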

data files missing for Reading Materials

The 3 EDA files:
EDA_video2.ipynb
EDA_video3_screencast.ipynb
EDA_Springleaf_screencast.ipynb

refer to data files that cannot be found. I would like to run the notebooks locally to try different parameters.

KNN sanity check

I think there is an issue in the sanity check:

test_knn_feats = NNF.predict(X_test[:50])

print ('Deviation from ground thruth features: %f' % np.abs(test_knn_feats - true_knn_feats_first50[44:45]).sum())

Shouldn't it be:

test_knn_feats = NNF.predict(X_test[44:45])

print ('Deviation from ground thruth features: %f' % np.abs(test_knn_feats - true_knn_feats_first50[44:45]).sum())

Otherwise we are comparing the wrong rows.
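
A small sketch of why the slice matters, using made-up arrays in place of the notebook's data (the array names and shapes below are assumptions, and NNF itself is not used). Subtracting a (1, d) array from a (50, d) array does not fail; NumPy broadcasts the single row against all 50 rows, so the sum picks up 49 unrelated rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 50 test rows with 4 KNN features each.
true_feats_first50 = rng.normal(size=(50, 4))

# Pretend the feature extractor reproduces the reference features exactly.
pred_all_50 = true_feats_first50.copy()         # like predict(X_test[:50])
pred_row_44 = true_feats_first50[44:45].copy()  # like predict(X_test[44:45])

# (50, 4) - (1, 4): row 44 is broadcast against every predicted row.
dev_broadcast = np.abs(pred_all_50 - true_feats_first50[44:45]).sum()

# (1, 4) - (1, 4): only row 44 is compared with its own reference.
dev_single = np.abs(pred_row_44 - true_feats_first50[44:45]).sum()

print(dev_broadcast > 0, dev_single == 0.0)
```

Even with perfect predictions, the broadcast version reports a non-zero deviation, while the single-row version reports zero.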

Grader not instantiated

In compute_KNN_features, the grader is not instantiated in the last cell of the notebook, so the assignment cannot be submitted.

A working fix was attached as a screenshot (Annotation 2020-05-01 234000).

Mislabeling of section in compute_KNN_features

Hi,

In compute_KNN_features (week 4 honours assignment), inside get_features_for_one, the task description says:

"2. Same label streak: the largest number N, such that N nearest neighbours have the same label."

I find this wording very misleading. Read literally, it asks for the longest run of neighbours with the same label anywhere in the array. I would reword it as:

"2. Same label streak: the largest number N, such that the first N nearest neighbours have the same label."
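
The two readings can give different numbers. Here is a minimal sketch, assuming the neighbour labels are already sorted from nearest to farthest and the list is non-empty; the function names and the example labels are hypothetical, not taken from the notebook.

```python
import numpy as np


def leading_streak(labels):
    """Run length of identical labels starting at the nearest neighbour."""
    labels = np.asarray(labels)
    differs = np.flatnonzero(labels != labels[0])
    # If no label differs from the first one, the streak covers the whole array.
    return int(differs[0]) if differs.size else int(labels.size)


def longest_streak(labels):
    """Longest run of identical labels anywhere in the array (the literal reading)."""
    best = run = 1
    for prev, cur in zip(labels, labels[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


neigh_labels = [1, 1, 0, 0, 0, 0, 2]
print(leading_streak(neigh_labels))  # 2
print(longest_streak(neigh_labels))  # 4
```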

I hope you can understand my point.

Thanks for the great course!

Alessandro

TypeError: argument of type 'NoneType' is not iterable

Whenever I try to submit the assignment, I get the error TypeError: argument of type 'NoneType' is not iterable in line 60 of grader.py.

I think the code in line 58 should be if response.status_code == 201: and not if request.status_code == 201:

Please check this; I am unable to submit my assignment because of this issue.
Attaching a screenshot.


Observation in the final exercise of week-1 task (Pandas Basics)

Hi. The instructions for the final task of PandasBasics say:

What was the variance of the number of sold items per day sequence for the shop with shop_id = 25 in December, 2014? Do not count the items, that were sold but returned back later.

The code I used to filter the rows according to the given conditions is:

transactions[(transactions.shop_id==25) & (transactions.date.dt.year==2014) & \
                              (transactions.date.dt.month==12) & (transactions.item_cnt_day>0.0)]

I included (transactions.item_cnt_day>0.0) so that only rows without returns are considered, and then took the variance of the result. This does not give the accepted answer. When I omit the condition, my answer is accepted. Could you please tell me where I am going wrong?
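
One possible explanation, sketched on made-up data: if returns are recorded as rows with a negative item_cnt_day, then keeping them and summing per day already nets returned items out of each day's total, whereas dropping them counts the returned items as sold. That returns are encoded this way, and that the grader expects the unbiased variance, are assumptions here, not something confirmed in this thread.

```python
import pandas as pd

# Made-up transactions; column names follow the assignment's `transactions` frame.
transactions = pd.DataFrame({
    "date": pd.to_datetime(["2014-12-01", "2014-12-01", "2014-12-02",
                            "2014-12-02", "2014-12-03", "2014-11-30"]),
    "shop_id": [25, 25, 25, 25, 25, 25],
    "item_cnt_day": [3.0, -1.0, 5.0, 1.0, 2.0, 9.0],
})

mask = (
    (transactions.shop_id == 25)
    & (transactions.date.dt.year == 2014)
    & (transactions.date.dt.month == 12)
)
dec = transactions[mask]

# Reading A: keep return rows, so each day's total is net of returned items.
net_per_day = dec.groupby("date")["item_cnt_day"].sum()

# Reading B: drop return rows before summing (the filter from the question).
gross_per_day = dec[dec.item_cnt_day > 0].groupby("date")["item_cnt_day"].sum()

# Series.var uses ddof=1 by default, i.e. the unbiased sample variance.
print(net_per_day.var(), gross_per_day.var())
```

The two readings produce different per-day sequences and therefore different variances, which would be consistent with the answer being accepted only when the filter is omitted.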

