random-forests / tutorials

License: Apache License 2.0

Jupyter Notebook 96.17% Python 3.83%

tutorials's Issues

Use Pandas Dataframe instead of a python 2D Matrix

A pandas DataFrame would be more convenient for this program than a plain Python 2D list.
Please consider translating the code to use a pandas DataFrame, or I could do that for you.

A suggestion on is_numeric function

Hi, Josh Gordon
Thanks for sharing this; it helped me get a much better understanding of decision trees.
I have a suggestion for the function 'is_numeric'.
In your example there is no bool column in the training data, so 'is_numeric' works fine. But if I add a bool column to the dataset, is_numeric(True) returns True, because bool is a subclass of int in Python.
So I suggest changing the function to the following, so that bool values are not treated as numeric:
def is_numeric(value): return type(value) in (float, int)
Thanks~
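
To make the difference concrete, here is a minimal sketch comparing the two checks. The isinstance-based version is an assumption about what the original function looks like; the strict version is the one suggested in this issue. Both function names are made up for the comparison.

```python
def is_numeric_isinstance(value):
    # Assumed shape of the original check: bool passes because bool subclasses int.
    return isinstance(value, int) or isinstance(value, float)


def is_numeric_strict(value):
    # The check suggested in this issue: an exact type match excludes bool.
    return type(value) in (float, int)


for sample in (3, 2.5, True, "Red"):
    print(repr(sample), is_numeric_isinstance(sample), is_numeric_strict(sample))
```

Note that the exact-type check also rejects any other subclass of int or float, which may or may not be what you want.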

LinearClassifier does not have weights_

In the latest version of TensorFlow (1.3.0), the LinearClassifier object does not have a weights_ member. Instead, the weights have to be retrieved as follows:

weights = classifier.get_variable_value("linear//weight")

In ep7 classifier.fit() raises "IndexError: invalid index to scalar variable"

Full disclosure: I'm not using the Docker image, but working in my own environment on a Mac (10.12.2) with Python 2.7 (via Homebrew) and Tensorflow 0.12.1.

I'm going through the code for Episode 7, where I get an "IndexError: invalid index to scalar variable" on the line classifier.fit(data, labels, batch_size=100, steps=1000). Here's my code in full (exactly the same as the tutorial):

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
learn = tf.contrib.learn

mnist = learn.datasets.load_dataset('mnist')
data = mnist.train.images
labels = np.asarray(mnist.train.labels, dtype=np.int32)
test_data = mnist.test.images
test_labels = np.asarray(mnist.test.labels, dtype=np.int32)

max_examples = 10000
data = data[:max_examples]
labels = labels[max_examples]

feature_columns = learn.infer_real_valued_columns_from_input(data)
classifier = learn.LinearClassifier(feature_columns=feature_columns, n_classes=10)
classifier.fit(data, labels, batch_size=100, steps=1000)

Here's the complete error:

Traceback (most recent call last):
  File "/Users/mbaytas/Dropbox/works-code/ml-studies/google-recipes/ep7-mnist/ep7.py", line 38, in <module>
    classifier.fit(data, labels, batch_size=100, steps=1000)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/linear.py", line 446, in fit
    max_steps=max_steps)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 191, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 355, in fit
    max_steps=max_steps)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 733, in _train_model
    max_steps=max_steps)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/graph_actions.py", line 300, in _monitored_train
    _, loss = super_sess.run([train_op, loss_op], feed_fn() if feed_fn else
  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_io/data_feeder.py", line 407, in _feed_dict_fn
    out[i] = _access(self._y, sample)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_io/data_feeder.py", line 208, in _access
    return data[iloc]
IndexError: invalid index to scalar variable.
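
One possible cause, judging only from the code quoted above: the line labels = labels[max_examples] uses an integer index rather than a slice, so labels becomes a single NumPy scalar instead of an array of labels. The sketch below reproduces that behaviour with plain NumPy; the arange data is a made-up stand-in for the MNIST labels, and no TensorFlow code is involved.

```python
import numpy as np

labels = np.arange(20, dtype=np.int32)  # made-up stand-in for mnist.train.labels
max_examples = 10

single = labels[max_examples]    # integer index: one NumPy scalar
subset = labels[:max_examples]   # slice: the first max_examples labels

print(np.ndim(single), subset.shape)

error_text = ""
try:
    single[0]  # indexing a scalar, much as a per-sample lookup would
except IndexError as err:
    error_text = str(err)
print("IndexError:", error_text)
```

If that is indeed the cause, slicing with labels[:max_examples] would keep labels aligned with data[:max_examples].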

Printing the prediction in tensor flow not working

I am following your seventh episode and trying to print the prediction using TensorFlow, as shown:

print ("Predicted %d, Label: %d" % (classifier.predict(test_data[0]), test_labels[0]))

But I am getting the following error:

print ("Predicted %d, Label: %d" % (classifier.predict(test_data[0]), test_labels[0]))
TypeError: %d format: a number is required, not generator

How can I fix it?
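
The error message says that predict() returned a generator rather than a number. A minimal sketch of one way to handle that is to pull the first item out before formatting. Here fake_predict is a hypothetical stand-in that only mimics the generator behaviour; it is not TensorFlow API, and the data is made up.

```python
def fake_predict(samples):
    # Hypothetical stand-in: yields one class id per sample, like a lazy predict().
    return (7 for _ in samples)


test_data = [[0.0] * 784, [1.0] * 784]  # made-up inputs
test_labels = [7, 2]                    # made-up labels

predictions = fake_predict(test_data[0:1])  # a generator, not a number
first = next(iter(predictions))             # take the first prediction
message = "Predicted %d, Label: %d" % (first, test_labels[0])
print(message)
```

Wrapping the result in list(...) and indexing it would work the same way when more than one prediction is needed.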

The number of Columns

Hi,
There is a mistake in the calculation of the number of columns in the IPython notebook.
'n_features = len(rows[0]) - 1  # number of columns' gives the number of samples minus 1, not the number of columns.
It should be replaced with:
n_features = rows.shape[1]
to get the number of columns.
The code in the example works because the number of rows and the number of columns are not too far apart.
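
A quick way to check what each expression counts is to run it on a small dataset. This sketch assumes rows is a plain Python list of lists with the label in the last position; the sample rows are illustrative. A plain list has no shape attribute, so rows.shape[1] would only apply if rows were a NumPy array.

```python
rows = [
    ["Green", 3, "Apple"],
    ["Yellow", 3, "Apple"],
    ["Red", 1, "Grape"],
    ["Red", 1, "Grape"],
    ["Yellow", 3, "Lemon"],
]

n_samples = len(rows)          # number of rows (samples)
n_columns = len(rows[0])       # length of the first row: columns including the label
n_features = len(rows[0]) - 1  # columns excluding the label

print(n_samples, n_columns, n_features)
```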

TypeError: 'int' object is not subscriptable

The code works fine!
However, I am a newbie and tried to make editing the training data a bit easier, so I created a file training_data.data containing:
Green,3,Apple
Yellow,3,Apple
Red,1,Grape
Red,1,Grape
Yellow,3,Lemon
and then imported it with:
import pandas as pd
training_data = pd.read_csv('training_data.data', header=-1)

and now I get the error TypeError: 'int' object is not subscriptable.
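
A likely cause, stated here as an assumption: the tutorial code expects a list of rows, while iterating over a DataFrame yields its column labels, which are integers when the file has no header, so indexing into one of them fails. The sketch below loads the same data into a list of lists with the standard csv module. The in-memory string stands in for the file, and parse_cell is a hypothetical helper.

```python
import csv
import io

# Stand-in for the contents of training_data.data from this issue.
raw = "Green,3,Apple\nYellow,3,Apple\nRed,1,Grape\nRed,1,Grape\nYellow,3,Lemon\n"


def parse_cell(text):
    # Turn digit-only cells into int so numeric checks keep working.
    return int(text) if text.isdigit() else text


training_data = [
    [parse_cell(cell) for cell in row]
    for row in csv.reader(io.StringIO(raw))
]
print(training_data[0])
```

If you prefer to stay with pandas, converting the DataFrame with training_data.values.tolist() should give a similar list of lists.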

InvalidArgumentError (see above for traceback): tensor_name = linear//weight

print ("Predicted %d, Label: %d" % (classifier.predict(test_data[0]), test_labels[0]))

The following error occurred:
InvalidArgumentError (see above for traceback): tensor_name = linear//weight; shape in shape_and_slice spec [1,10] does not match the shape stored in checkpoint: [784,10]
[[Node: save/RestoreV2_1 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]

However, classifier.evaluate(test_data[0:1,:], test_labels[0:1]) works:
{'accuracy': 1.0, 'global_step': 1000, 'loss': 0.010729363}
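
The shape mismatch may be related to how the single sample is selected: test_data[0] drops the sample axis, while test_data[0:1] keeps it, which matches the call that works above. A small NumPy sketch of the difference, with a zero array standing in for the MNIST test images:

```python
import numpy as np

test_data = np.zeros((5, 784), dtype=np.float32)  # stand-in for MNIST test images

one_row = test_data[0]      # shape (784,): the sample axis is dropped
one_batch = test_data[0:1]  # shape (1, 784): still a batch of one sample

print(one_row.shape, one_batch.shape)
```

Passing the sliced form to predict() is worth trying, though whether it resolves this particular error is not confirmed here.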

Error in decision tree code

Thanks for the tutorial, it is great; I am waiting for the random forest video.
I ran the code and got different impurity values than the expected ones.
When I removed **2, the values matched.
But in the later sections I then got a different best question, so I am not sure.
I think the correct choice is to remove **2.
Thanks,
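
For reference, the standard definition of Gini impurity is 1 minus the sum of the squared class probabilities. The sketch below is an independent implementation of that definition, not the tutorial's code; the function name and the sample labels are illustrative.

```python
from collections import Counter


def gini(labels):
    # Gini impurity: 1 minus the sum of squared class probabilities.
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


pure = gini(["Apple", "Apple"])   # one class only
mixed = gini(["Apple", "Grape"])  # two equally likely classes
print(pure, mixed)
```

Without the square, the class probabilities always sum to 1, so the result would be 0 for any set of labels. That may explain why the later sections pick a different best question once **2 is removed.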