rasbt / python-machine-learning-book
The "Python Machine Learning (1st edition)" book code repository and info resource
License: MIT License
When I try to read back the classifier on page 254, I get the following error. I have followed the book the whole way and things have worked fine until now. Any idea what has gone wrong?
I'm using IPython 4.2.0.
AttributeError Traceback (most recent call last)
<ipython-input-4-f050da95a5cf> in <module>()
----> 1 import codecs, os;__pyfile = codecs.open('''/var/folders/yh/mm1bdmx9073_b15lw69b2qmh0000gn/T/py71220g7y''', encoding='''utf-8''');__code = __pyfile.read().encode('''utf-8''');__pyfile.close();os.remove('''/var/folders/yh/mm1bdmx9073_b15lw69b2qmh0000gn/T/py71220g7y''');exec(compile(__code, '''/Users/henke/Documents/code/python/python-ml/movieclassifier/main.py''', 'exec'));
/Users/henke/Documents/code/python/python-ml/movieclassifier/main.py in <module>()
4 from vectorizer import vect
5
----> 6 clf = pickle.load(open(os.path.join('pkl_objects', 'classifier.pkl'), 'rb'))
7
8 import numpy as np
AttributeError: Can't get attribute 'tokenizer' on <module '__main__'>
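A likely cause (a hedged note, not an official fix): pickle stores custom functions by name and module path only, so the unpickling session must be able to resolve __main__.tokenizer before the classifier is loaded. Binding the function in the running script first usually resolves it; a minimal sketch, assuming the book's vectorizer.py defines tokenizer:

import os
import pickle

# The pickled classifier references the tokenizer function by name,
# so bind it in this module's namespace before unpickling.
from vectorizer import tokenizer

clf = pickle.load(open(os.path.join('pkl_objects', 'classifier.pkl'), 'rb'))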
In chapter 6, the Breast Cancer Wisconsin dataset is not available now.
Maybe it is a broken link.
currently
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data', header=None)
should be
df = pd.read_csv('http://mlr.cs.umass.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data', header=None)
I'm sorry if I'm wrong.
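A defensive sketch (my suggestion, not the book's code): try the mirrors in order, so the notebook keeps working if one host is down. Both URLs are taken from this issue.

import pandas as pd

# Candidate mirrors for the Breast Cancer Wisconsin (Diagnostic) dataset;
# fall through to the next URL if one is unreachable.
urls = [
    'https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data',
    'http://mlr.cs.umass.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data',
]
df = None
for url in urls:
    try:
        df = pd.read_csv(url, header=None)
        break
    except Exception:
        continue  # host unavailable; try the next mirror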
Hi,
The link to the live example application (http://raschkas.pythonanywhere.com/) is not working.
There's a "Coming soon" message, as if the page did not exist.
I am trying to run gs_lr_tfidf.fit(X_train, y_train) and I get an AttributeError.
Running in a Jupyter notebook, Python 3.5.
https://github.com/stevekwon211/Hello-Kaggle
It is a Kaggle guide document for anyone who is new to Kaggle!
Opening the first chapter file ch01.ipynb results in the following error:
"Unreadable Notebook: /home/antonio/libro-machine-learning/ch01.ipynb NotJSONError("Notebook does not appear to be JSON: '\n\n\n\n\n\n\n<html lang...")"
Python version: 3.7 from the Anaconda distribution.
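The NotJSONError and the '<html lang...' prefix suggest the saved file is the rendered GitHub HTML page rather than the notebook itself (my reading of the error, not a confirmed diagnosis). Downloading the raw file usually fixes it; a minimal sketch, assuming the notebook still lives at this path on master:

import urllib.request

# Fetch the raw JSON notebook, not the rendered GitHub page.
url = ('https://raw.githubusercontent.com/rasbt/'
       'python-machine-learning-book/master/code/ch01/ch01.ipynb')
urllib.request.urlretrieve(url, 'ch01.ipynb')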
In this file, the code loads the names as
labels_path = os.path.join(path,
                           '%s-labels-idx1-ubyte' % kind)
images_path = os.path.join(path,
                           '%s-images-idx3-ubyte' % kind)
However, the linked .gz files have names with a period, not a hyphen. It should be
labels_path = os.path.join(path,
                           '%s-labels.idx1-ubyte' % kind)
images_path = os.path.join(path,
                           '%s-images.idx3-ubyte' % kind)
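A tolerant sketch (my workaround, not the book's code) that checks for both naming conventions, since different mirrors of the MNIST files have shipped with either separator:

import os

def resolve_mnist_path(path, kind, stem):
    # stem is e.g. 'labels-idx1-ubyte' or 'images-idx3-ubyte';
    # try the hyphen variant first, then the period variant.
    for name in ('%s-%s' % (kind, stem),
                 '%s-%s' % (kind, stem.replace('-idx', '.idx'))):
        candidate = os.path.join(path, name)
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError('no %s file for kind=%r in %r' % (stem, kind, path))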
Hi,
I am trying to run the following code from the book (Chapter 13) in a Jupyter notebook with everything updated. However, every time, Python crashes and the kernel restarts. Everything is fine up to this point. Any thoughts?
P.S. Using 32-bit and GPU; tried dmatrix, no luck.
import numpy as np
import theano
import theano.tensor as T

x = T.dmatrix(name='x')
x_sum = T.sum(x, axis=0)
calc_sum = theano.function(inputs=[x], outputs=x_sum)
ary = [[1, 2, 3], [1, 2, 3]]
print('column sum:', calc_sum(ary))
Just bought this book and I can't find the source code for the examples. I bought it on Amazon and went to the Packtpub page as suggested in the book, but even the zip I downloaded from them is only a mirror of this repository: just images, no code for the examples in the book. It's really annoying to have to type every single example by hand.
In chapter 2 you have some code for a simple perceptron model.
On page 27, you describe the code: the net_input method "simply calculates the vector product wᵀx".
However, there is more than a simple vector product in the code:
def net_input(self, X):
    """Calculate net input"""
    return np.dot(X, self.w_[1:]) + self.w_[0]
In addition to the dot product, there is an addition. The text does not mention anything about what this + self.w_[0] term is.
Can you (or anyone) explain why it's there?
thanks,
-trevor
Hi, I am extremely new to Python, though I understand how to write basic commands.
I got the code files for the book, but I am not able to understand how to use them for learning.
All of them seem to be in text format.
How can I use them as code, making a new file that contains just the code instead of all the text?
I just wanted to see how the code runs, but I can't understand what this code is or how to extract the parts I want without having to remove all the quotation marks, \n characters, and other formatting elements.
Thanks.
Hi,
I had an issue installing Keras on a Windows 10 64-bit machine: the steps described in ch13 did not work for me. I have posted a step-by-step solution in this blog post:
install keras on windows 10 x64 bit machine
@rasbt: feel free to add it to the notes of the labs in GitHub.
Thanks.
Wonderful book; learning a ton! Question: in the first chapter, you explain the three types of learning (supervised, unsupervised, and reinforcement). Usually the third is not covered, so I searched your text for other material on RL but found none. A future chapter in the next edition? A future book? Among your other resources, are there links about RL in a scikit-learn style? I love Karpathy's blog post "Pong from Pixels".
Hello! Thank you for this amazing gift to everyone!
My issue is with Chapter 9's movie_classifier_with_update via python app.py.
I am able to enter my sample review and get the predicted class label and probability. The issue arises when I click "Correct"/"Incorrect" for the classification.
It is almost assuredly due to the issue of versions of Python (3.5 needed) and Sklearn (0.19 needed) as indicated here: https://www.pythonanywhere.com/forums/topic/11716/
It'd be nice to keep this current though and I will send a PR if I ever figure out how to update it for Python 3.6 and Sklearn 0.20!
I wanted to run your code that compares TensorFlow with scikit-learn, but it no longer works.
https://github.com/rasbt/python-machine-learning-book/blob/master/faq/tensorflow-vs-scikitlearn.md
In addition, your mlxtend package no longer has tf_classifier and consequently no TfSoftMaxRegression.
Would you have an updated resource by any chance?
In the "General Questions" section of the FAQ, under "How do Data Scientists perform model selection? Is it different from Kaggle?", the web link is broken.
Thank you for the beautiful book.
Via the sample size n of the bootstrap sample, we control the bias-variance tradeoff of the random forest. By choosing a larger value for n, we decrease the randomness and thus the forest is more likely to overfit. On the other hand, we can reduce the degree of overfitting by choosing smaller values for n at the expense of the model performance.
To me this implies that I should choose sample size n, that is smaller than N (original training set size).
In most implementations, including the RandomForestClassifier implementation in scikit-learn, the sample size of the bootstrap sample is chosen to be equal to the number of samples in the original training set, which usually provides a good bias-variance tradeoff
But the above got me confused: if we choose n = N, aren't we overfitting unless the algorithm is bootstrapping aggressively, repeating values many times over?
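A quick numeric check (my illustration, not from the book): sampling n = N with replacement still leaves roughly 36.8% of the original training set out of each bootstrap sample, which is where the randomness comes from even when n = N.

import numpy as np

rng = np.random.RandomState(0)
N = 10000
# Draw a bootstrap sample of size n = N with replacement.
sample = rng.choice(N, size=N, replace=True)
unique_frac = np.unique(sample).size / N
print('unique fraction: %.3f' % unique_frac)   # ~0.632, matching 1 - 1/e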
I'm new to the site; if you could send me in the right direction, it would be greatly appreciated.
Hi,
I was trying out one of the examples in Chapter 2, under the title "Implementing an adaptive linear neuron in Python" (link to notebook).
The problem is that when I plot the decision boundaries, the whole area is shown red.
When I change output = self.activation(X) to output = self.predict(X) inside the fit function, the problem seems to go away.
Is there an issue with the code, or is the code correct and I made some other mistake while implementing?
Thanks
Sohaib
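For reference (a sketch of my understanding, not an official answer): in the book's AdalineGD the activation is the identity function, so fit is meant to work on continuous outputs; replacing it with predict (which thresholds to -1/1) changes the learning rule into a perceptron-style update. The relevant methods, as I read the Chapter 2 code (assumes numpy imported as np and the rest of the class as in the book):

def activation(self, X):
    """Compute linear activation: the identity of the net input."""
    return self.net_input(X)

def predict(self, X):
    """Return class label after thresholding at 0.0."""
    return np.where(self.activation(X) >= 0.0, 1, -1)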
Just a note in case it's helpful to anyone else - I seemed to be getting 100% accuracy with the on-line sentiment analysis classifier (pages 246-246), but it turned out to be because the code used to shuffle the dataset before exporting it to CSV on page 235 hadn't worked.
In the version of pandas I'm using (0.23.4), it looks like df.index.values is needed in order to get the indexes of a DataFrame as a list. So, this:
df = df.reindex(np.random.permutation(df.index))
now needs to be this:
df = df.reindex(np.random.permutation(df.index.values))
Hope that helps someone!
Regarding your remark:
[2015-10-20] Good news! I just heard back from the publisher; all the typos and errors which are listed below will be fixed by next week.
I bought the ebook yesterday (O'Reilly, not PACKT) and found some errors. Up to now, they are in the errata (v2) but not yet fixed in my fresh copy. Can you say something about the current state? Are the updates for immediate PACKT customers only?
Edit: Interestingly, my copy passes the test on page viii (so I have Classifiers there), but not, for example, the one regarding the inverted 'y' variants (with and without caret) on page 22; the errors on p. 23 are also still present.
I am working on a finite element code in Python. It was originally for the diffusion equation, but I want to modify it for the wave equation and include a Ricker source term. Adding the source term produces an error. Below are the code and the error.
from IPython import display
from matplotlib.tri import Triangulation, LinearTriInterpolator
import numpy
import numpy as np
import pylab

# NPOINTS, L, H, updateMatrix, points and analytical are defined
# elsewhere in the full script.
deltat = 0.001
numIterations = 30
mass = numpy.zeros((NPOINTS, NPOINTS))
stiffness = numpy.zeros((NPOINTS, NPOINTS))
phi = numpy.zeros((NPOINTS,))
phi_old = numpy.zeros((NPOINTS,))
f0 = 5     # center frequency of the Ricker wavelet
q0 = 100   # maximum amplitude of the Ricker wavelet
t = np.arange(0, numIterations, deltat)            # time vector
tau = np.pi * f0 * (t - 1.5 / f0)
q = q0 * (1.0 - 2.0 * tau**2.0) * np.exp(-tau**2)  # Ricker wavelet
xi = np.linspace(0, L, 200)
yi = np.linspace(0, H, 200)
Xi, Yi = np.meshgrid(xi, yi)
updateMatrix(mass, stiffness, phi)
mat = mass/deltat + stiffness
triang = Triangulation(points[:, 0], points[:, 1])
for iteration in range(1, numIterations + 1):
    phi_old = phi
    rhs = numpy.dot(mass/deltat, phi_old)
    rhs = rhs + q
    phi = numpy.linalg.solve(mat, rhs)
    interpolator = LinearTriInterpolator(triang, phi)
    zi = interpolator(Xi, Yi)
fig1 = pylab.figure(1)
pylab.imshow(zi)
fig2 = pylab.figure(2)
xanal, yanal = analytical(numIterations*deltat)
pylab.plot(xanal, yanal, "-")
pylab.plot(Xi[100, :], zi[100, :])
fig2.savefig("comparison.png", format="PNG")
ValueError Traceback (most recent call last)
in
29
30 rhs = numpy.dot(mass/deltat, phi_old)
---> 31 rhs = rhs + q
32 phi = numpy.linalg.solve(mat,rhs)
33 interpolator = LinearTriInterpolator(triang, phi)
ValueError: operands could not be broadcast together with shapes (200,) (30000,)
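The shapes in the error line up with q being the full time vector (30000 samples at deltat = 0.001) while rhs has one entry per mesh node. A hedged sketch of one way to reconcile them (my guess at the intent; source_node is a hypothetical index of the node where the Ricker source acts):

# Sample the wavelet once per time step and inject it at a single node,
# so the source contribution has the same shape as rhs (NPOINTS,).
n_steps_per_iter = len(t) // numIterations
source = numpy.zeros((NPOINTS,))
source_node = 0  # hypothetical: index of the mesh node carrying the source
for iteration in range(1, numIterations + 1):
    phi_old = phi
    rhs = numpy.dot(mass / deltat, phi_old)
    source[source_node] = q[(iteration - 1) * n_steps_per_iter]
    rhs = rhs + source
    phi = numpy.linalg.solve(mat, rhs)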
I'm now learning machine learning using the Japanese translation of this book, and when I run this program, I always get stuck on the part using sklearn.svm.
When the program executes gs = gs.fit(X_train, y_train), it keeps showing the previous two graphs over and over. I don't know the reason; could you tell me what the cause might be?
My PC's specs:
Windows 10, Python 3.6.5, scikit-learn 0.19.1
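One possibility (an assumption on my part, not a confirmed diagnosis): on Windows, GridSearchCV with n_jobs=-1 uses process-based parallelism, and each worker re-imports the main script; any top-level plotting code then runs again in every worker, which looks like the same figures appearing repeatedly. Guarding the script entry point avoids the re-execution:

if __name__ == '__main__':
    # Only the parent process runs the search (and any plotting);
    # spawned workers re-import this file without executing this block.
    gs = gs.fit(X_train, y_train)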
First things first: I absolutely like how you motivate, introduce and implement the relevant concepts in your book.
I think there is a problem with the Rosenblatt perceptron learning description (evaluation) as presented in the figure on page 30 of the book. The errors counted in the variable errors are the number of updates performed in one epoch. However, this number does not represent the number of misclassifications after each epoch. For instance, if you use your standard options but train for only one iteration, there will be two updates ("2 errors" according to your terminology); however, all items will be classified as -1 (Setosa), so there are 50 misclassifications and this classifier's error rate is actually 50%.
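To count true misclassifications per epoch rather than updates, one could evaluate the model on the full training set after each pass (a minimal sketch, assuming the Perceptron class from Chapter 2 with its fit loop and predict method):

# Inside fit, after the weight updates of each epoch:
# record how many samples the current weights actually misclassify.
misclassified = int((self.predict(X) != y).sum())
self.errors_.append(misclassified)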
Hi,
First of all, thanks for your nice book, Python Machine Learning.
I began reading it just now, and I am wondering one thing about the implementation of AdalineSGD mentioned in the book:
def fit(self, X, y):
    self._initialize_weights(X.shape[1])
    self.cost_ = []
    for i in range(self.n_iter):
        if self.shuffle:
            X, y = self._shuffle(X, y)
        cost = []
        for xi, target in zip(X, y):
            cost.append(self._update_weights(xi, target))
        avg_cost = sum(cost) / len(y)
        self.cost_.append(avg_cost)
    return self

def _update_weights(self, xi, target):
    """Apply Adaline learning rule to update the weights"""
    output = self.net_input(xi)
    error = (target - output)
    self.w_[1:] += self.eta * xi.dot(error)
    self.w_[0] += self.eta * error
    cost = 0.5 * error**2
    return cost
I think the way self.w_[1:] is updated in AdalineSGD is in fact the same as in the batch AdalineGD implementation, just written differently:
output = self.activation(X)
errors = (y - output)
self.w_[1:] += self.eta * X.T.dot(errors)
IMO, self.eta * X.T.dot(errors) operates on the entire matrix X in AdalineGD, whereas AdalineSGD operates row by row via the for loop (for xi, target in zip(X, y)) over the same X. It doesn't reflect the essential difference between AdalineGD and AdalineSGD that you mention in the book.
Hello,
I think the function zero_init_weight is missing.
I searched the github site but did not find it.
Maybe this is another version of the softmax regressor, and it is missing here?
Best Regards, Thomas
In the perceptron part of the code, I see:
for xi, target in zip(X, y):
    update = self.eta * (target - self.predict(xi))
    self.w_[1:] += update * xi
    self.w_[0] += update
In the SGD part I see something similar, except that every time before the new gradient points are calculated, the data is shuffled:
X, y = self._shuffle(X, y)
for xi, target in zip(X, y):
    cost.append(self._update_weights(xi, target))

def _update_weights(self, xi, target):
    """Apply Adaline learning rule to update the weights"""
    output = self.net_input(xi)
    error = (target - output)
    self.w_[1:] += self.eta * xi.dot(error)
    self.w_[0] += self.eta * error
I do not see any difference between the two except for the shuffling part, and that one uses a binary value while the other uses a real value (SGD). Did I misunderstand how fundamentally the weights are calculated for SGD versus the simple perceptron model? Of course, if there were a mini-batch implementation, the code would look a lot more like the adaptive linear neuron. But since you are taking sample by sample, they are implemented similarly?
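For contrast (a minimal sketch, not from the book): the batch AdalineGD update uses the errors of all samples at once, once per epoch, whereas the SGD variant above updates the weights after every single sample, so the weight vector already changes within an epoch. The perceptron rule differs again in that its error is computed from the thresholded prediction rather than the continuous net input:

# Batch gradient descent: one update per epoch from all samples.
output = self.net_input(X)            # continuous outputs, shape (n_samples,)
errors = y - output
self.w_[1:] += self.eta * X.T.dot(errors)
self.w_[0] += self.eta * errors.sum()

# Perceptron rule: the error uses the thresholded label, so it is 0 for
# correctly classified samples and the update only fires on mistakes.
update = self.eta * (target - self.predict(xi))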
In chapter 2, where the iris data is plotted on a scatterplot,
# extract sepal length and petal length
X = df.iloc[0:100, [0, 2]].values

# plot data
plt.scatter(X[:50, 0], X[:50, 1],
            color='red', marker='o', label='setosa')
plt.scatter(X[50:100, 0], X[50:100, 1],
            color='blue', marker='x', label='versicolor')
it is simply assumed that the first 50 rows belong to the label setosa and the next 50 to versicolor. The scatterplot should be generated using the labels (which are in the 5th column of the dataset), as in the sketch below.
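A label-driven sketch (my suggestion, not the book's code), masking on the species column instead of relying on row order:

import matplotlib.pyplot as plt

labels = df.iloc[0:100, 4].values            # species names in the 5th column
X = df.iloc[0:100, [0, 2]].values

for species, color, marker in [('Iris-setosa', 'red', 'o'),
                               ('Iris-versicolor', 'blue', 'x')]:
    mask = labels == species                 # boolean mask, order-independent
    plt.scatter(X[mask, 0], X[mask, 1],
                color=color, marker=marker, label=species)
plt.legend(loc='upper left')
plt.show()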
Just a heads up on this -- I checked my O'Reilly account, and they did not yet have the updated version.
I'll post here once it appears.
So you have this:
X, y = make_moons(n_samples=100, random_state=123)
alphas, lambdas = rbf_kernel_pca(X, gamma=15, n_components=1)
Then you take a sample from X:
x_new = X[25]
And then find the projection for the new sample from:
x_reproj = project_x(x_new, X,
... gamma=15, alphas=alphas, lambdas=lambdas)
But x_new was already part of the alphas and lambdas created using X. In other words, X already contained x_new when rbf_kernel_pca was applied. So should I be surprised that the projected value of x_new coincides exactly in the plots? I would have thought it might have been better to exclude x_new when deriving the alpha and lambda values and then apply project_x. Thoughts?
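A hold-out variant along those lines (my illustration; rbf_kernel_pca and project_x are the functions defined in this chapter):

import numpy as np
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=100, random_state=123)
x_new = X[25]
X_rest = np.delete(X, 25, axis=0)   # fit without the point we will project

alphas, lambdas = rbf_kernel_pca(X_rest, gamma=15, n_components=1)
x_reproj = project_x(x_new, X_rest, gamma=15, alphas=alphas, lambdas=lambdas)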
Sebastian,
I've been collecting my own data and have applied the plot_decision_regions function several times to my data, but I am running into a problem with this new data. The problem occurs here:
# plot class samples
for idx, cl in enumerate(np.unique(y)):
    plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                alpha=0.8, c=cmap(idx),
                marker=markers[idx], label=cl)
My enumerated object is: [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
So 5 classifications, hot encoded.
From what I understand, this loop passes over my X_train_pca data five times and uses the boolean comparison y == cl to plot all my data points in five different colors as it runs through the markers and colormap.
Upon running, I get the warning:
FutureWarning: in the future, boolean array-likes will be handled as a boolean array index
plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
The really weird part is the values in the array X[y == cl, 0]. They now look like: [-0.4277726 -0.4277726 -0.44362509 ..., -0.4277726 -0.4277726 -0.4277726 ] with shape (9784,), which is the original length of my X_train_pca data. (I believe it should be closer to about a fifth, since most of my data is similar in length, and I checked np.shape after the loop ran.)
To give a visual, my data looks like this. When it should be separated into colors with a spread looking like this.
I can't really think through the problem any more, probably due to a misunderstanding of what this FutureWarning is trying to tell me. I am wondering if you have any ideas as to what might cause this behavior.
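A hedged guess at the cause: the FutureWarning fires when the mask passed to fancy indexing is not a genuine boolean array (for example, an object-dtype array of bools); older numpy then treats it as integer indices, which would explain getting back all 9784 rows instead of one class's worth. Converting the mask explicitly is a cheap check (my sketch, not a confirmed fix):

import numpy as np

mask = np.asarray(y == cl, dtype=bool)   # force a genuine boolean mask
plt.scatter(x=X[mask, 0], y=X[mask, 1],
            alpha=0.8, c=cmap(idx),
            marker=markers[idx], label=cl)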
Dear all: I am extracting features from a wav file using PLP (Python 3.6, Anaconda Spyder). After execution, I am facing an error on this line:
File "C:\ProgramData\Anaconda3\lib\site-packages\sidekit\frontend\features.py", line 399, in power_spectrum
ahan = framed[start:stop, :] * window
ValueError: operands could not be broadcast together with shapes (400,2) (400,)
#!usr/bin/python
import numpy.matlib
import scipy
import wave  # needed for wave.open below
from scipy.fftpack.realtransforms import dct
from sidekit.frontend.vad import pre_emphasis
from sidekit.frontend.io import *
from sidekit.frontend.normfeat import *
from sidekit.frontend.features import *
import scipy.io.wavfile as wav
import numpy as np

def readWavFile(wav):
    # given a path from the keyboard to read a .wav file
    # wav = raw_input('Give me the path of the .wav file you want to read: ')
    inputWav = 'C:/Speech_Processing/2-Speech_Signal_Processing_and_Classification-master/feature_extraction_techniques' + wav
    return inputWav

# read the .wav file (signal file) and extract the information we need
def initialize(inputWav):
    rate, signal = wav.read(readWavFile(inputWav))  # rate: sampling frequency
    sig = wave.open(readWavFile(inputWav))
    # signal is the numpy 2D array with the data of the .wav file
    # len(signal) is the number of samples
    sampwidth = sig.getsampwidth()
    print('The sample rate of the audio is: ', rate)
    print('Sampwidth: ', sampwidth)
    return signal, rate

def PLP():
    folder = input('Give the name of the folder that you want to read data: ')
    amount = input('Give the number of samples in the specific folder: ')
    for x in range(1, int(amount) + 1):
        wav = '/' + folder + '/' + str(x) + '.wav'
        print(wav)
        # inputWav = readWavFile(wav)
        signal, rate = initialize(wav)
        # returns PLP coefficients for every frame
        plp_features = plp(signal, rasta=True)
        meanFeatures(plp_features[0])

# compute the mean features for one .wav file
# (take the features for every frame and average them over the sample)
def meanFeatures(plp_features):
    # make a numpy array with length the number of plp features
    mean_features = np.zeros(len(plp_features[0]))
    # for one input, sum all frames for a specific feature
    # and divide by the number of frames
    for x in range(len(plp_features)):
        for y in range(len(plp_features[x])):
            mean_features[y] += plp_features[x][y]
    mean_features = (mean_features / len(plp_features))
    print(mean_features)

def main():
    PLP()

main()
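The (400,2) vs (400,) shapes suggest the wav file is stereo: sidekit's framing keeps both channels while the analysis window is one-dimensional. A hedged workaround (my sketch) is to mix down to mono before calling plp:

# If the file is stereo, signal has shape (n_samples, 2);
# average the channels (or take one) to get a mono signal.
if signal.ndim == 2:
    signal = signal.mean(axis=1)
plp_features = plp(signal, rasta=True)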
On page 500 (second edition: September 2017) there is a figure illustrating Full, Same and Valid padding and how the pixel patches map to the feature maps.
The feature map in the valid-padding example is only 2x2, although it specifies a 5x5 pixel input, a 3x3 filter, and a stride of 1. The feature map should be of size 3x3, as the calculation below shows.
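Using the standard output-size formula with input size n, filter size m, padding p, and stride s:

o = floor((n + 2p - m) / s) + 1 = floor((5 + 0 - 3) / 1) + 1 = 3

so valid padding (p = 0) on a 5x5 input with a 3x3 filter and stride 1 indeed yields a 3x3 feature map.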
I understood the concept of complete linkage; however, in the example you provided, I did not understand the values in the table with the columns 'row label 1', 'row label 2', etc.
There is an "Additional Note (1)" section which says: "If all the weights are initialized to 0, only the scale of the weight vector, not the direction."
There seems to be some meaning missing from that sentence. I was wondering if you could correct it, please. Thank you very much!
def tokenizer(text):
    return text.split()

from nltk.stem.porter import PorterStemmer
porter = PorterStemmer()

def tokenizer_porter(text):
    return [porter.stem(word) for word in text.split()]

from nltk.corpus import stopwords
stop = stopwords.words('english')

X_train = df.loc[:25000, 'review'].values
y_train = df.loc[:25000, 'sentiment'].values
X_test = df.loc[25000:, 'review'].values
y_test = df.loc[25000:, 'sentiment'].values

from distutils.version import LooseVersion as Version
from sklearn import __version__ as sklearn_version
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer

if Version(sklearn_version) < '0.18':
    from sklearn.grid_search import GridSearchCV
else:
    from sklearn.model_selection import GridSearchCV

tfidf = TfidfVectorizer(strip_accents=None,
                        lowercase=False,
                        preprocessor=None)

param_grid = [{'vect__ngram_range': [(1, 1)],
               'vect__stop_words': [stop, None],
               'vect__tokenizer': [tokenizer, tokenizer_porter],
               'clf__penalty': ['l1', 'l2'],
               'clf__C': [1.0, 10.0, 100.0]},
              {'vect__ngram_range': [(1, 1)],
               'vect__stop_words': [stop, None],
               'vect__tokenizer': [tokenizer, tokenizer_porter],
               'vect__use_idf': [False],
               'vect__norm': [None],
               'clf__penalty': ['l1', 'l2'],
               'clf__C': [1.0, 10.0, 100.0]},
              ]

lr_tfidf = Pipeline([('vect', tfidf),
                     ('clf', LogisticRegression(random_state=0))])

gs_lr_tfidf = GridSearchCV(lr_tfidf, param_grid,
                           scoring='accuracy',
                           cv=5,
                           verbose=1,
                           n_jobs=-1)

gs_lr_tfidf.fit(X_train, y_train)
Hi,
I get an error: "can't get attribute tokenizer_porter".
What do you think the problem is?
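This usually appears when GridSearchCV's worker processes try to unpickle the custom tokenizer functions and cannot find them by module path, for example when the functions live only in a notebook's __main__. Two hedged workarounds (my suggestions, not from the book): run the search single-threaded, or move the tokenizers into an importable module.

# Workaround 1: avoid multiprocessing so nothing needs to be pickled.
gs_lr_tfidf = GridSearchCV(lr_tfidf, param_grid,
                           scoring='accuracy', cv=5,
                           verbose=1, n_jobs=1)

# Workaround 2 (hypothetical module name): put tokenizer and
# tokenizer_porter into tokenizers.py and import them, so worker
# processes can resolve them by module path.
# from tokenizers import tokenizer, tokenizer_porter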
When I run this script in a Python notebook:
https://github.com/rasbt/python-machine-learning-book/blob/master/code/optional-py-scripts/ch02.py
the last line (ada.partial_fit(X_std[0, :], y[0])) gives the error:
<__main__.AdalineSGD at 0x10a89fac8>
Can the iris.data file be added back into the repo on master?
Here's the last version I believe:
https://github.com/rasbt/python-machine-learning-book/blob/194e34f245abb97f53d0e72166ab6785d01a1e94/code/datasets/iris/iris.data
Thanks again!
Dear sir,
I am trying to study machine learning through your book "Python Machine Learning", and it is a very nice book!
I can't understand how to set up param_grid.
I tried to get information from sklearn, but it just says "dict or list of dictionaries",
and even the sample just writes "param_grid=....".
So, about param_grid: how do I set it up?
I am sorry, my English is a little weak!
I hope I have managed to convey my question; thank you very much!
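For anyone with the same question, a minimal sketch (my example, not from the book): param_grid is a dict, or a list of dicts, mapping estimator parameter names to the candidate values GridSearchCV should try.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Each dict is one grid; keys are parameter names of the estimator,
# values are the lists of candidates to try.
param_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1.0, 10.0]},
    {'kernel': ['rbf'],    'C': [0.1, 1.0, 10.0], 'gamma': [0.01, 0.1]},
]
gs = GridSearchCV(SVC(), param_grid, scoring='accuracy', cv=5)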
I am getting this error at the np.dot call for the Iris dataset. Can you explain the solution?
Following is the traceback:
Traceback (most recent call last):
File "Perceptron.py", line 61, in
ppn.train(x, y)
File "Perceptron.py", line 24, in train
update = self.eta * (target - self.predict(xi))
File "Perceptron.py", line 35, in predict
return np.where(self.net_input(X) >= 0.0, 1, -1)
File "Perceptron.py", line 32, in net_input
return np.dot(X, self.w_[1:]) + self.w_[0]
Hello,
I was trying to execute the code:
%matplotlib inline
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
from mlxtend.evaluate import plot_decision_regions
iris = load_iris()
y, X = iris.target, iris.data[:, [0, 2]] # only use 2 features
lr = LogisticRegression(C=100.0,
                        class_weight=None,
                        dual=False,
                        fit_intercept=True,
                        intercept_scaling=1,
                        max_iter=100,
                        multi_class='multinomial',
                        n_jobs=1,
                        penalty='l2',
                        random_state=1,
                        solver='newton-cg',
                        tol=0.0001,
                        verbose=0,
                        warm_start=False)
lr.fit(X, y)
plot_decision_regions(X=X, y=y, clf=lr, legend=2)
plt.xlabel('sepal length')
plt.ylabel('petal length')
plt.show()
but it returned following error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-2-9b78ac9a656a> in <module>()
3 from sklearn.datasets import load_iris
4 import matplotlib.pyplot as plt
----> 5 from mlxtend.evaluate import plot_decision_regions
6
7 iris = load_iris()
ImportError: cannot import name 'plot_decision_regions'
I installed the mlxtend package. What am I doing wrong? Could you help me? Thanks in advance!
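If a recent mlxtend version is installed, the likely fix (to the best of my knowledge; check your installed version's docs) is that plot_decision_regions moved out of mlxtend.evaluate into the plotting subpackage:

# In newer mlxtend releases the function lives here:
from mlxtend.plotting import plot_decision_regions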