justmarkham / scikit-learn-videos
Jupyter notebooks from the scikit-learn video series
Home Page: https://courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn
Running this:
y_test.value_counts()
I get this error:
AttributeError                            Traceback (most recent call last)
      1 # examine the class distribution of the testing set (using a Pandas Series method)
----> 2 y_test.value_counts()
AttributeError: 'numpy.ndarray' object has no attribute 'value_counts'
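This error occurs when y_test is a NumPy array rather than a pandas Series (for example, when the labels passed to train_test_split were an array), because value_counts is a Series method. A minimal sketch of a workaround, wrapping the array in a Series (the array values here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Stand-in for y_test; in the notebook it comes from train_test_split
y_test = np.array([0, 1, 1, 0, 1])

# Wrap the ndarray in a pandas Series to get access to value_counts()
counts = pd.Series(y_test).value_counts()
print(counts)
```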
Hi! I just noticed that all the links to IPython notebooks return a 400 error when clicked.
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data'
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
pima = pd.read_csv(url, header=None, names=col_names)
Temporary solution (N.B.: the comment='#' argument to read_csv is important):
url = 'https://gist.githubusercontent.com/ktisha/c21e73a1bd1700294ef790c56c8aec1f/raw/819b69b5736821ccee93d05b51de0510bea00294/pima-indians-diabetes.csv'
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
pima = pd.read_csv(url, header=None, names=col_names, comment='#')
Hi!
I wanted to start off by saying that your tutorials and videos are really great! so clear and simple!
I've been working on a binary classification problem for my school with scikit-learn, and I have been scratching my head over how it displays the confusion matrix. For instance, I get this output:
[ [30 5]
[2 42] ]
I noticed from the classification report that scikit-learn outputs the negative class first by default. This leads me to understand that the first row is the negative class and the second is the positive class. However, I don't understand how to interpret what each number stands for in terms of TP, FP, TN, and FN.
TN (30) FN (5)
FP (2)  TP (42)
Is this a correct representation of the matrix above?
Thanks a bunch!
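For reference, scikit-learn's confusion_matrix puts true labels on the rows and predicted labels on the columns, so for binary labels [0, 1] the layout is [[TN, FP], [FN, TP]], and ravel() unpacks the four counts. A minimal sketch with made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Toy true/predicted labels, made up for illustration
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)
```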
When running these notebooks on Jupyter 3.2.x or 4.2.x, I get the following error:
Failed to start the kernel
The 'None' kernel is not available. Please pick another suitable kernel instead, or install that kernel.
Note that the kernel runs fine when I create a new notebook, so the problem seems to be an incompatibility between these notebooks and Jupyter.
Here is my local Environment:
The version of the notebook server is 4.2.1 and is running on:
Python 3.5.1 |Anaconda 4.1.0 (64-bit)| (default, Jun 15 2016, 15:32:45)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
Current Kernel Information:
unable to contact kernel
The version of the notebook server is 3.2.0-8b0eef4 and is running on:
Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 17:02:03)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
Can someone let me know what the issue is?
Cheers

When I tried installing with Anaconda and using it to detect cars in a video with a Haar Cascade Classifier, I got this warning:
/Users/mohan/anaconda3/lib/python3.5/site-packages/skvideo/__init__.py:356: UserWarning: avconv/avprobe not found in path:
  warnings.warn("avconv/avprobe not found in path: " + str(path), UserWarning)
Hi, I used KNN for my research but I don't know how to display the accuracy of the results.
This is my code:
# In[38]:
from sklearn import neighbors, metrics
from sklearn.neighbors import NearestNeighbors
import numpy as np
import pandas as pd
import sys
import json
import math
data = pd.read_excel('dataset.xlsx')
data = data.to_numpy()  # as_matrix() is deprecated and removed in newer pandas
# In[40]:
knn = neighbors.KNeighborsClassifier(n_neighbors=5)
# In[41]:
X = data[:, :-3]
Y = data[:, -1:]
Y = np.zeros(len(Y))
for i in range(len(Y)):
    if data[i, 5] == 0:
        Y[i] = 0
    elif data[i, 5] == 1:
        Y[i] = 1
    elif data[i, 5] == 2:
        Y[i] = 2
    elif data[i, 5] == 3:
        Y[i] = 3
    elif data[i, 5] == 4:
        Y[i] = 4
# In[42]:
knn.fit(X, Y)
# In[49]:
result = knn.predict([[220.4, 6.39,1855]])
print(result)
# result = knn.predict(X)
# print(metrics.accuracy_score(Y[2000], result))
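To report accuracy, one common approach is to hold out a test set, fit on the remainder, and score the held-out predictions with accuracy_score. A minimal sketch using synthetic data in place of dataset.xlsx (which isn't available here), so the shapes and class labels are assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

rng = np.random.RandomState(0)
X = rng.rand(200, 3) * [300, 10, 2000]   # synthetic features, stand-in for the real data
y = rng.randint(0, 5, size=200)          # synthetic labels 0-4

# Hold out 25% of the rows, fit on the rest, and score on the held-out data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
acc = accuracy_score(y_test, knn.predict(X_test))
print(acc)
```

Note that accuracy_score compares two arrays of the same length (true labels vs. predicted labels), which is why the commented-out accuracy_score(Y[2000], result) above would not work.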
Hi.
I have a question about scikit-learn-videos/07_cross_validation.ipynb. The classification accuracy usually has several digits after the decimal, e.g. 0.966666666667. If I multiply this value by the total number of observations, i.e. 25, I get 24.1666666667. What does this mean? That 24.1666666667 observations were classified correctly? Shouldn't it be a whole number, such as 24?
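One way to see why the product need not be a whole number: in k-fold cross-validation the reported score is the mean of k per-fold accuracies, and while each fold's accuracy times that fold's size is an integer count of correct predictions, the mean of those ratios generally is not. A minimal sketch with illustrative fold scores (assumed values, not taken from the notebook):

```python
# Suppose 10-fold cross-validation produced these per-fold accuracies
# (illustrative values; each fold holds 15 observations here)
fold_scores = [15/15, 14/15, 15/15, 15/15, 13/15, 14/15, 14/15, 15/15, 15/15, 15/15]

mean_score = sum(fold_scores) / len(fold_scores)
# Within each fold, accuracy * 15 is a whole number of correct predictions,
# but the mean of the ten ratios need not multiply out to an integer.
print(mean_score)
print(mean_score * 15)  # not necessarily a whole number
```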
Hello!
First of all, thank you so much for this series and all the resources you mentioned with it. I started out with machine learning a few months ago, and after reading and searching online I was still not able to grasp the core of machine learning. Your videos made it really simple and easy to understand, and most of my confusion cleared up! I hope you keep making these videos.
Anyway, can you please make or refer me to a video tutorial or a good resource on Self-Organizing Maps in scikit-learn? It would be a great help!
Thank you again!
The line
print('{:^9} {} {:^25}'.format(iteration, data[0], data[1]))
raises a TypeError. Converting the last argument to a string:
print('{:^9} {} {:^25}'.format(iteration, data[0], str(data[1])))
solves the problem.