Hello, I'm trying to convert a simple project of silent (little n

<a href="https://colab.research.google.com/drive/12N3_oB2VLdNiDalHsJrhRPNj-cAqQ9D0?usp

porting an example from tensorflow about yggdrasil-decision-forests HOT 4 CLOSED

prashant-saxena commented on May 18, 2024

porting an example from tensorflow


Comments (4)

rstz commented on May 18, 2024

Hi, you can train directly on multi-dimensional numpy data as explained in the documentation: https://ydf.readthedocs.io/en/latest/tutorial/multidimensional_feature

The very short version, using random data, is:

import numpy as np
import ydf

num_examples = 10000
num_rows = 20  # Length of each example's feature vector.
train_data = np.random.uniform(size=(num_examples, num_rows))
train_label = np.random.randint(0, 2, size=(num_examples))

# The 2-D array is passed as a single multi-dimensional feature.
train_ds = {"features": train_data, "label": train_label}
model = ydf.GradientBoostedTreesLearner(label="label").train(train_ds)

test_data = {"features": np.random.uniform(size=(1, num_rows))}
model.predict(test_data)
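
A note on reading the output: for a binary classification label, the values returned by `model.predict` are, as far as I understand the YDF documentation, probabilities of the positive class rather than class labels, so values between 0.0 and 1.0 are expected. A minimal sketch of turning such values into labels follows. The `probabilities` array is a hypothetical stand-in for the real output, and the 0.5 threshold is an assumption, not something stated in this thread.

```python
import numpy as np

# Hypothetical stand-in for the array returned by model.predict(test_data).
probabilities = np.array([0.08, 0.51, 0.93, 0.47])

# Assumed convention: each value is the probability of the positive class.
threshold = 0.5
predicted_labels = (probabilities >= threshold).astype(int)

print(predicted_labels)
```

With random training labels, as in the example above, these probabilities will hover around 0.5 and the thresholded labels carry no information.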


prashant-saxena commented on May 18, 2024

Hi,
Thanks for the tip.
I tried what you suggested, but the predictions look like random values between 0.0 and 1.0, which is not useful at all.


prashant-saxena commented on May 18, 2024

OK, here is the test. Extract the files (train.npy, test.npy) from the attached zip file:

import numpy as np
import ydf

train_data = np.load('train.npy')
# Note: these labels are drawn at random, independently of train_data.
train_label = np.random.randint(0, 2, size=(train_data.shape[0]))

print(train_data.shape)

train_ds = {"features": train_data, "label": train_label}
model = ydf.GradientBoostedTreesLearner(label="label").train(train_ds)
test_data = {"features": np.load('test.npy')}

predictions = model.predict(test_data)
print(predictions)

For the same data, TensorFlow's predictions are 99% correct, but ydf's predictions look random to me. Am I missing something here?
ydf.zip


achoum commented on May 18, 2024

This notebook shows how to train a model on this dataset and make predictions with a Random Forest and a Gradient Boosted Trees model. The notebook also runs a cross-validation to evaluate the quality of predictions on this small dataset.

The model self-evaluation (model.describe(); out-of-bag accuracy of 53%) and the cross-validation (learner.cross_validation(train_ds); accuracy=50%, AUC=51%) both show that the input features are virtually uncorrelated with the labels.
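
To make the AUC figure concrete, here is a small NumPy-only sketch of what an AUC near 50% means. It is not how YDF computes its metrics; `roc_auc` is a hypothetical helper written for illustration, and the scores are synthetic. It estimates AUC as the fraction of (positive, negative) pairs in which the positive example gets the higher score.

```python
import numpy as np


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Compare every positive score with every negative score; ties count as half.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)


rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=2000)

# Scores unrelated to the labels, and scores that partly follow the labels.
random_scores = rng.uniform(size=2000)
informative_scores = labels + rng.normal(scale=0.5, size=2000)

print(f"AUC, scores unrelated to labels: {roc_auc(labels, random_scores):.2f}")
print(f"AUC, scores related to labels:   {roc_auc(labels, informative_scores):.2f}")
```

Scores that ignore the labels land close to 0.5, which is the regime the cross-validation result above is in.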

You mention that "TensorFlow's predictions are 99% correct". Are you sure you are using the same dataset? If so, are you sure you are not evaluating on the training dataset?
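
The second question matters because a model can score almost perfectly on the data it was trained on even when the labels are pure noise. The sketch below illustrates this with a swapped-in technique, a 1-nearest-neighbour memorizer written in NumPy, rather than YDF or TensorFlow. The data, the split size, and the helper `predict_1nn` are all hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: the features carry no information about the labels.
num_examples, num_features = 400, 20
features = rng.uniform(size=(num_examples, num_features))
labels = rng.integers(0, 2, size=num_examples)

# Hold out the last quarter of the examples for evaluation.
split = 300
train_x, train_y = features[:split], labels[:split]
test_x, test_y = features[split:], labels[split:]


def predict_1nn(query_x, ref_x, ref_y):
    """Give each query row the label of its nearest reference row."""
    # Squared Euclidean distance between every query row and every reference row.
    dists = ((query_x[:, None, :] - ref_x[None, :, :]) ** 2).sum(axis=2)
    return ref_y[dists.argmin(axis=1)]


# Evaluating on the training rows: every row finds itself and recalls its own label.
train_accuracy = (predict_1nn(train_x, train_x, train_y) == train_y).mean()
# Evaluating on held-out rows: nothing was learned, so this is close to chance.
test_accuracy = (predict_1nn(test_x, train_x, train_y) == test_y).mean()

print(f"accuracy on training data: {train_accuracy:.2f}")
print(f"accuracy on held-out data: {test_accuracy:.2f}")
```

If a reported 99% drops to roughly 50% once the evaluation uses held-out examples, the high figure was measuring memorization and not predictive quality.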

