What steps will reproduce the problem?
Tried to use learning.NearestNeighborLearner (with k=2) on the Sex Classification
dataset from the Wikipedia article on Naive Bayes classifiers:
http://en.wikipedia.org/wiki/Naive_Bayes_classifier#Sex_Classification
What is the expected output? What do you see instead?
The program fails to run because of bugs in the implementation of NearestNeighborLearner.
What version of the product are you using?
Bug exists in r30
Please provide any additional information below.
Here's my sample code:
import learning
examples = [[6, 180, 12, 'male'], [5.92, 190, 11, 'male'], [5.58, 170, 12, 'male'],
            [5, 100, 6, 'female'], [5.5, 150, 8, 'female'], [5.42, 130, 7, 'female'],
            [5.75, 150, 9, 'female']]
ds = learning.DataSet(examples)
nnl = learning.NearestNeighborLearner(2)
nnl.train(ds)
print nnl.predict([5.1,105,6.3])
I would expect this to print 'female'.
I believe the following fixes should work:
old learning.py, lines 217-231:
        else:
            ## Maintain a sorted list of (distance, example) pairs.
            ## For very large k, a PriorityQueue would be better
            best = []
            for e in examples:
                d = self.distance(e, example)
                if len(best) < k:
                    e.append((d, e))
                elif d < best[-1][0]:
                    best[-1] = (d, e)
                best.sort()
            return mode([e[self.dataset.target] for (d, e) in best])

    def distance(self, e1, e2):
        return mean_boolean_error(e1, e2)
new learning.py:
        else:
            ## Maintain a sorted list of (distance, example) pairs.
            ## For very large k, a PriorityQueue would be better
            best = []
            for e in self.dataset.examples:
                d = self.distance(e, example)
                if len(best) < self.k:
                    best.append((d, e))
                elif d < best[-1][0]:
                    best[-1] = (d, e)
                best.sort()
            return mode([e[self.dataset.target] for (d, e) in best])

    def distance(self, e1, e2):
        return mean_error(e1, e2)
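To check the expected answer independently of learning.py, here is a minimal standalone sketch of the same k-nearest-neighbor vote. It is not the library's code: knn_predict and the local mean_error are hypothetical stand-ins written for this report, and heapq.nsmallest replaces the hand-maintained sorted list (in the spirit of the PriorityQueue comment above).

```python
import heapq
from collections import Counter

def mean_error(xs, ys):
    """Mean absolute difference (assumed stand-in for the library helper).
    zip() stops at the shorter sequence, so the trailing class label
    on a training example is ignored."""
    pairs = list(zip(xs, ys))
    return sum(abs(x - y) for x, y in pairs) / float(len(pairs))

def knn_predict(examples, query, k, target=-1):
    """Return the most common target value among the k nearest examples."""
    # Select the k examples with the smallest distance to the query.
    nearest = heapq.nsmallest(k, examples, key=lambda e: mean_error(e, query))
    votes = Counter(e[target] for e in nearest)
    return votes.most_common(1)[0][0]

examples = [[6, 180, 12, 'male'], [5.92, 190, 11, 'male'], [5.58, 170, 12, 'male'],
            [5, 100, 6, 'female'], [5.5, 150, 8, 'female'], [5.42, 130, 7, 'female'],
            [5.75, 150, 9, 'female']]

print(knn_predict(examples, [5.1, 105, 6.3], k=2))  # prints: female
```

With k=2 the two nearest examples are [5, 100, 6] and [5.42, 130, 7], both 'female', which matches the result I expected from the learner.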
Specifically:
1) changed 'examples' to self.dataset.examples, and 'k' to self.k.
2) changed e.append((d, e)) to best.append((d, e)), so the pair is added to
the list of best candidates instead of to the example itself.
3) and I could be wrong, but I believe you wanted mean_error, not
mean_boolean_error, in your distance function.
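To illustrate point 3, the sketch below compares the two distance functions on this dataset. Both helpers are my own stand-ins, assuming mean_boolean_error is the fraction of positions that differ and mean_error is the mean absolute difference; the library's definitions may differ in detail.

```python
def mean_error(xs, ys):
    # Assumed definition: mean absolute difference over paired positions.
    pairs = list(zip(xs, ys))
    return sum(abs(x - y) for x, y in pairs) / float(len(pairs))

def mean_boolean_error(xs, ys):
    # Assumed definition: fraction of paired positions whose values differ.
    pairs = list(zip(xs, ys))
    return sum(x != y for x, y in pairs) / float(len(pairs))

query = [5.1, 105, 6.3]
examples = [[6, 180, 12, 'male'], [5.92, 190, 11, 'male'], [5.58, 170, 12, 'male'],
            [5, 100, 6, 'female'], [5.5, 150, 8, 'female'], [5.42, 130, 7, 'female'],
            [5.75, 150, 9, 'female']]

boolean_distances = [mean_boolean_error(e, query) for e in examples]
absolute_distances = [mean_error(e, query) for e in examples]

print(boolean_distances)   # every entry is 1.0
print(absolute_distances)  # smallest entry belongs to [5, 100, 6, 'female']
```

Because no feature value in the query exactly equals any training value, the boolean distance is 1.0 for every example, so it cannot rank neighbors on continuous data. The absolute-difference version does separate them.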
For the gender classification example, it seems to work great. Thanks!