smartcorelib / smartcore

A comprehensive library for machine learning and numerical computing. The library provides a set of tools for linear algebra, numerical computing, optimization, and enables a generic, powerful yet still efficient approach to machine learning.

Home Page: https://smartcorelib.org/

License: Apache License 2.0

Rust 100.00%

classification clustering machine-learning machine-learning-algorithms model-selection regression rust rust-lang scientific-computing statistical-learning statistical-models

smartcore's Issues

Implement hierarchical clustering

Motivation: why do we need hierarchical clustering when we already have k-means?

Vocabulary:

divisive clustering: ...
agglomerative clustering: average, weighted, median, centroid, Ward

Sub-tasks:

pick one or a minimal set of metrics-distances
pick one or a minimal set of linkage strategies
pick one or more algorithms (SLINK for single-linkage and CLINK for complete-linkage clustering)

Visualisations: (?)

Other implementations:

hierarchical clustering in SciPy
hierarchical clustering in sklearn

Implement Recursive Feature Elimination (RFE) algorithm

Implement RFE. Match Scikit Learn's implementation.

These are parameters we should support:

estimator
n_features_to_select
step
verbose

Parallelize Random Forest training and evaluation

Right now neither random_forest_classifier nor random_forest_regressor trains or evaluates trees in parallel. This should be easy to change, since each tree in the ensemble is independent of the others.

We also need a new parameter n_jobs (similar to the parameter with the same name in Scikit Learn) that will control the level of parallelization.
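Because the trees are independent, the work can be split across threads without any shared mutable state. The following is a minimal sketch using only the standard library; `train_tree` is a hypothetical stand-in for fitting one tree, and `train_forest` and its signature are assumptions, not smartcore's API.

```rust
use std::thread;

/// Hypothetical stand-in for fitting one tree; here a "tree" is just a number.
fn train_tree(tree_index: usize) -> u64 {
    (tree_index as u64).wrapping_mul(2_654_435_761)
}

/// Trains `n_trees` independent trees on at most `n_jobs` worker threads.
fn train_forest(n_trees: usize, n_jobs: usize) -> Vec<u64> {
    let n_jobs = n_jobs.clamp(1, n_trees.max(1));
    let indices: Vec<usize> = (0..n_trees).collect();
    // Ceiling division so that every tree index lands in exactly one chunk.
    let chunk_size = ((n_trees + n_jobs - 1) / n_jobs).max(1);

    thread::scope(|scope| {
        let handles: Vec<_> = indices
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|&i| train_tree(i)).collect::<Vec<u64>>())
            })
            .collect();
        // Joining in spawn order keeps the trees in their original order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect::<Vec<u64>>()
    })
}
```

Calling `train_forest(100, 4)` would use four workers; the result does not depend on `n_jobs`, which makes the parallel path easy to test against the sequential one.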

OPTICS algorithm

Implement OPTICS clustering algorithm.
Make sure the algorithm's performance is on par with Scikit Learn's implementation.

fill out the Apache license

The LICENSE file doesn't have a year & copyright holder filled in: https://github.com/smartcorelib/smartcore/blob/development/LICENSE#L189

This can cause an issue with a legal review of dependencies, because it can be interpreted to mean that the package wasn't "really" put under that license.

Thanks!

Implement a new method that predicts probabilities, where it makes sense

Classification algorithms usually offer a way to quantify certainty of a prediction. In Scikit learn a method that returns probability estimates for all classes is called predict_proba.

We need a similar method in SmartCore. One way to do it is to define a new trait, Classifier, with a function predict_proba, and implement this trait for every algorithm where predicting probabilities makes sense.
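One possible shape for such a trait is sketched below. The trait layout, the use of `Vec<Vec<f64>>` instead of smartcore's matrix types, and the `ToyLogistic` model are all assumptions made for illustration.

```rust
/// Hypothetical trait; the name comes from the proposal, the signature is illustrative.
trait Classifier {
    /// Returns one row per sample with one probability per class.
    fn predict_proba(&self, x: &[Vec<f64>]) -> Vec<Vec<f64>>;

    /// Default prediction: the index of the most probable class (first one wins ties).
    fn predict(&self, x: &[Vec<f64>]) -> Vec<usize> {
        self.predict_proba(x)
            .iter()
            .map(|row| {
                let mut best = (0, f64::NEG_INFINITY);
                for (class, &p) in row.iter().enumerate() {
                    if p > best.1 {
                        best = (class, p);
                    }
                }
                best.0
            })
            .collect()
    }
}

/// Toy binary model: P(class 1) = sigmoid(w . x + b).
struct ToyLogistic {
    weights: Vec<f64>,
    bias: f64,
}

impl Classifier for ToyLogistic {
    fn predict_proba(&self, x: &[Vec<f64>]) -> Vec<Vec<f64>> {
        x.iter()
            .map(|row| {
                let z = row.iter().zip(&self.weights).map(|(a, w)| a * w).sum::<f64>() + self.bias;
                let p1 = 1.0 / (1.0 + (-z).exp());
                vec![1.0 - p1, p1]
            })
            .collect()
    }
}
```

Giving `predict` a default implementation in terms of `predict_proba` means an algorithm only has to supply the probabilities.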

Document features in Cargo.toml

We should add a description to each feature defined in smartcore's Cargo.toml file. The Cargo.toml file of the serde project could serve as an example.

We should also document the "serde" feature, which is currently not present in the Cargo.toml file.

Not sure if we can use this.

Feature request: time series functionality

Hello, I hope this finds you well.

Please could you implement machine learning time series functionality within SmartCore? I would like to work with machine learning-based time series forecasting, classification, regression, etc in Rust. I just thought it would be useful and interesting to me for a Rust machine learning library to implement the type of time series functionality in existing Python libraries like sktime, Prophet, and Tensorflow, and that it may be the case for others as well.

There are obviously additional considerations that would take time to address and implement like making sure data are stationary and deseasonised, windowing and framing, etc. It may be useful or necessary to implement models that perform well with time series data, such as LSTM neural networks, classical models, like ARIMA, to use as a baseline against which to compare machine learning models, additional relevant evaluation metrics, etc.

Thank you for your time and consideration.

Implement `BaseMatrix` for `ndarray::ArrayView2`

Why is this limited to owned arrays here? I'd rather not need to make a copy each time I train a model.

smartcore/src/linalg/ndarray_bindings.rs

Lines 185 to 187 in 521dab4

impl<T: RealNumber + ScalarOperand + AddAssign + SubAssign + MulAssign + DivAssign + Sum>
    BaseMatrix<T> for ArrayBase<OwnedRepr<T>, Ix2>
{

Add new solver that supports L1 penalty to Logistic Regression

We need to implement a new solver that supports an L1 penalty for Logistic Regression. A good candidate is the SAGA solver. An alternative would be the coordinate descent algorithm that is widely used in LIBLINEAR.
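Proximal solvers such as SAGA handle the L1 term through its proximal operator, the soft-thresholding function, which is what allows coefficients to become exactly zero. A minimal sketch of that building block follows; the function name is an assumption, and this is not a full solver.

```rust
/// Soft-thresholding operator: sign(z) * max(|z| - gamma, 0).
/// Shrinks `z` towards zero by `gamma` and clips small values to exactly zero.
fn soft_threshold(z: f64, gamma: f64) -> f64 {
    if z > gamma {
        z - gamma
    } else if z < -gamma {
        z + gamma
    } else {
        0.0
    }
}
```

In a proximal gradient step, each coefficient would be passed through `soft_threshold` with `gamma` equal to the step size times the L1 penalty strength.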

Question: how to train models with multiple targets

Hi there!

Is it possible to train models with multiple outputs?

Regards,

Refactor linalg module

I want to use this issue to share a heads-up on a big refactoring that I plan for the linalg module.

Over the last couple of months I have repeatedly run into limitations and shortcomings imposed by the current design of BaseVector and BaseMatrix. To mention a few:

It is not possible to define an instance of BaseMatrix that holds string or integer values.
BaseMatrix is not designed to hold values of multiple types.
Some algorithms, e.g. RandomForest, do not use most of the methods defined in BaseMatrix and BaseVector. Some preprocessing methods that we plan for the future, like LabelEncoder, will not need the linear algebra routines defined for both traits.
Some basic operations, like getting a row or a column, perform an unnecessary copy. This problem stems from the fact that neither trait provides views or iterators that let a developer access the internal structure of the data.
All operations are defined as functions. While this is not a big deal, it leads to clumsy-looking code. It would be nicer to use more of the traits defined in std::ops.

As a result, I'd like to see how we can use Rust's type system to design a better container for data that solves all these shortcomings.

I am open to any suggestions you have. Feel free to post your ideas here.
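As one illustration of two of the points above (borrowing views instead of copies, and operators from std::ops instead of named functions), here is a small sketch. The `Matrix` type is hypothetical and far simpler than anything the refactoring would need.

```rust
use std::ops::Add;

/// Hypothetical row-major matrix used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
struct Matrix {
    rows: usize,
    cols: usize,
    values: Vec<f64>,
}

impl Matrix {
    /// Borrowing view of row `i`: no copy is made.
    fn row(&self, i: usize) -> &[f64] {
        &self.values[i * self.cols..(i + 1) * self.cols]
    }
}

/// Element-wise addition on references, so `&a + &b` does not consume its operands.
impl Add for &Matrix {
    type Output = Matrix;

    fn add(self, other: Self) -> Matrix {
        assert_eq!((self.rows, self.cols), (other.rows, other.cols), "shape mismatch");
        Matrix {
            rows: self.rows,
            cols: self.cols,
            values: self.values.iter().zip(&other.values).map(|(a, b)| a + b).collect(),
        }
    }
}
```

With this in place `let c = &a + &b;` reads like the mathematics, and `a.row(1)` hands out a slice into the existing storage.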

Support for rust_decimal

I want to deploy smartcore without using floating-point numbers. I'll try to change the math::num::RealNumber trait to support rust_decimal.

Do you think something like impl RealNumber for Decimal is possible, or would a completely new trait be necessary?

Implement SGDClassifier

SGDClassifier is one of the few algorithms that can be used for incremental learning and it would be a great addition to SmartCore.

This is an open-ended problem. I do not have many requirements for the optimization method or the API, other than that the new algorithm should implement the SupervisedEstimator and Predictor interfaces.

SVM

Implement Support vector machine (SVM) classifier and regressor.

The theory behind SVM is described in the book "An Introduction to Statistical Learning" by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani.

This paper (and its references) describes one approach to implementing the sequential minimal optimization (SMO) algorithm, which can be used to speed up SVR training, while the LaSVM library (and paper) describes a fast implementation of another algorithm for SVC.

SmartCore should support at least these 3 kernels:

linear
polynomial
rbf

Implement a generic read_csv method

In many cases data analysis starts with loading a dataset into memory. Some datasets come as CSV files. We need a new default function read_csv that is defined on the BaseMatrix trait.

This story is not fully defined, and many details should be discussed before work on the implementation starts. For example, I am not sure what parameters (if any) this function should take. Some ideas can be borrowed from the similar function in Pandas.

Add features to KFold cross-validation

Following the sklearn interface:

other KFold implementations: 'GroupKFold', 'LeaveOneGroupOut', 'LeaveOneOut', 'LeavePGroupsOut', 'LeavePOut', etc.
consider usage of derive-builder for defaults
...

No implementation of Display for Dataset

Hi!

First of all: Awesome project!

I found myself wanting to look at a dataset, and implemented this:

fn display_dataset<X: Copy + std::fmt::Debug, Y: Copy + std::fmt::Debug>(dataset: &Dataset<X, Y>) {
    struct Target<Y> {
        name: String,
        value: Y
    }
    struct Feature<X> {
        name: String,
        value: X
    }
    struct DataPoint<X, Y> {
        labels: Vec<Target<Y>>,
        features: Vec<Feature<X>>
    }
    impl <X: Copy + std::fmt::Debug, Y: Copy + std::fmt::Debug>std::fmt::Display for DataPoint<X, Y> {
        fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
            // Write strictly the first element into the supplied output
            // stream: `f`. Returns `fmt::Result` which indicates whether the
            // operation succeeded or failed. Note that `write!` uses syntax which
            // is very similar to `println!`.
            write!(
                f, "{} : {}",
                self.labels.iter().map(|target| format!("{}:{:?}", target.name, target.value)).collect::<String>(),
                self.features.iter().map(|feature| format!("{}:{:?}", feature.name, feature.value)).collect::<String>()
            )
        }
    }
    println!("{}", dataset.description);
    let mut datapoints = Vec::new();
    for sample_index in 0..dataset.num_samples {
        let mut features = Vec::new();
        for feature_index in 0..dataset.feature_names.len() {
            features.push(Feature{
                name: dataset.feature_names[feature_index].to_owned(),
                value: dataset.data[sample_index*dataset.num_features+feature_index]
            });
        }
        let mut targets = Vec::new();
        for target_index in 0..dataset.target_names.len() {
            targets.push(Target{
                name: dataset.target_names[target_index].to_owned(),
                value: dataset.target[sample_index*dataset.target_names.len()+target_index]
            });
        }
        datapoints.push(DataPoint {
            labels: targets,
            features
        })
    }
    for point in datapoints {
        println!("{}", point);
    }
}

Any appetite for a souped-up version of this in a PR?

Simple k-fold cross validation

K-fold cross validation (CV) is a preferred way to evaluate the performance of a statistical model. CV is better than just splitting a dataset into training and test sets because we use as many data samples for validation as we can get from a single dataset, thus improving the estimate of the out-of-sample error.

SmartCore does not have a method for CV, and this is a shame, because any good ML framework must have one.

I think we could start with a simple replica of Scikit's sklearn.model_selection.KFold. Later on we can add a replica of StratifiedKFold.

If you are not familiar with CV I would start by reading about it here and here. Next I would look at Scikit's implementation and design a function or a class that does the same for SmartCore.

We do not have to reproduce the KFold class exactly. One way to do it is to write an iterator that yields K pairs of (train, test) sets. It might also be helpful to see how the train/test split is implemented in SmartCore.
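The iterator idea can be sketched as follows. The `KFold` struct here is hypothetical, works on sample indices rather than on matrices, and does no shuffling; fold sizes follow the usual convention that the first `n_samples % n_splits` folds get one extra sample.

```rust
/// Hypothetical k-fold splitter over the sample indices `0..n_samples`.
struct KFold {
    n_samples: usize,
    n_splits: usize,
    current: usize,
    start: usize,
}

impl KFold {
    fn new(n_samples: usize, n_splits: usize) -> Self {
        assert!(n_splits >= 2 && n_splits <= n_samples, "need 2 <= n_splits <= n_samples");
        KFold { n_samples, n_splits, current: 0, start: 0 }
    }
}

impl Iterator for KFold {
    /// (train indices, test indices)
    type Item = (Vec<usize>, Vec<usize>);

    fn next(&mut self) -> Option<Self::Item> {
        if self.current >= self.n_splits {
            return None;
        }
        // The first `n_samples % n_splits` folds get one extra sample.
        let extra = usize::from(self.current < self.n_samples % self.n_splits);
        let fold_size = self.n_samples / self.n_splits + extra;
        let (start, end) = (self.start, self.start + fold_size);

        let test: Vec<usize> = (start..end).collect();
        let train: Vec<usize> = (0..start).chain(end..self.n_samples).collect();

        self.current += 1;
        self.start = end;
        Some((train, test))
    }
}
```

A caller would write `for (train, test) in KFold::new(n, 5) { ... }` and use the index vectors to select rows from the data.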

Add new `make_moons` data generator

The make_moons data generator will be useful for comparing different clustering algorithms. Implementation details and parameters can be found in Scikit Learn. We need an exact copy of this function.
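A noise-free sketch of the geometry is shown below: two interleaving half circles, with the second one mirrored and shifted. Shuffling, Gaussian noise and seeding, which the Scikit Learn function also offers, are deliberately left out, and the return type is an assumption.

```rust
use std::f64::consts::PI;

/// Evenly spaced angles in [0, pi], endpoints included.
fn linspace_0_pi(n: usize) -> Vec<f64> {
    match n {
        0 => Vec::new(),
        1 => vec![0.0],
        _ => (0..n).map(|i| PI * i as f64 / (n - 1) as f64).collect(),
    }
}

/// Noise-free sketch of make_moons: returns (points, labels).
fn make_moons(n_samples: usize) -> (Vec<[f64; 2]>, Vec<usize>) {
    let n_outer = n_samples / 2;
    let n_inner = n_samples - n_outer;

    let mut points = Vec::with_capacity(n_samples);
    let mut labels = Vec::with_capacity(n_samples);

    // First moon: upper half of the unit circle, label 0.
    for t in linspace_0_pi(n_outer) {
        points.push([t.cos(), t.sin()]);
        labels.push(0);
    }
    // Second moon: mirrored half circle shifted by (1, 0.5), label 1.
    for t in linspace_0_pi(n_inner) {
        points.push([1.0 - t.cos(), 1.0 - t.sin() - 0.5]);
        labels.push(1);
    }
    (points, labels)
}
```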

Add L1 and Elastic Net regularization terms to the Logistic Regression

We need additional parameters to support L1 and Elastic Net (optional) regularization of Logistic Regression parameters.

mod.rs deserialize_data uses architecture specific usize which errors on 32bit linux/windows

pub(crate) fn deserialize_data(
    bytes: &[u8],
) -> Result<(Vec<f32>, Vec<f32>, usize, usize), io::Error> {
    // read the same file back into a Vec of bytes
    let (num_samples, num_features) = {
        let mut buffer = [0u8; 8];
        buffer.copy_from_slice(&bytes[0..8]);
        let num_features = usize::from_le_bytes(buffer); // This line does not compile on 32bit systems. Change to u64
        buffer.copy_from_slice(&bytes[8..16]);
        let num_samples = usize::from_le_bytes(buffer);
        (num_samples, num_features)
    };
  ...
}
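One way to make the header parsing independent of the target's pointer width is to read fixed-width u64 values and convert them to usize explicitly. The sketch below covers only the header; `read_header` is a hypothetical helper name, and the field order follows the snippet above.

```rust
use std::convert::TryFrom;
use std::io;

/// Reads the two little-endian u64 header fields and converts them to usize.
/// Returns (num_samples, num_features).
fn read_header(bytes: &[u8]) -> Result<(usize, usize), io::Error> {
    if bytes.len() < 16 {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "header is shorter than 16 bytes",
        ));
    }
    let read_u64 = |range: std::ops::Range<usize>| -> Result<usize, io::Error> {
        let mut buffer = [0u8; 8];
        buffer.copy_from_slice(&bytes[range]);
        // Fails instead of truncating if the value does not fit into a 32-bit usize.
        usize::try_from(u64::from_le_bytes(buffer))
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
    };
    let num_features = read_u64(0..8)?;
    let num_samples = read_u64(8..16)?;
    Ok((num_samples, num_features))
}
```

Because the buffer is always 8 bytes and the conversion is explicit, the same code compiles on 32-bit and 64-bit targets.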

Add multiclass classification strategies

We should have at least these 2 meta-classifiers:

one-vs-the-rest / one-vs-all
one-vs-one

For more information on both strategies look at this page https://scikit-learn.org/stable/modules/multiclass.html#multiclass-classification

Request to add further summary stats, like Python OLS to regression

Adding f_stat(), adjusted_r2(), and t() would be nice to complete the set of traits for regression analysis
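As an example of what one of these could look like, adjusted R² follows directly from R², the number of samples n and the number of predictors p: 1 - (1 - R²)(n - 1)/(n - p - 1). The free functions below are a sketch under that formula; their names and slice-based signatures are assumptions.

```rust
/// Coefficient of determination: 1 - SS_res / SS_tot.
fn r2(y_true: &[f64], y_pred: &[f64]) -> f64 {
    let mean = y_true.iter().sum::<f64>() / y_true.len() as f64;
    let ss_res: f64 = y_true.iter().zip(y_pred).map(|(y, p)| (y - p).powi(2)).sum();
    let ss_tot: f64 = y_true.iter().map(|y| (y - mean).powi(2)).sum();
    1.0 - ss_res / ss_tot
}

/// Adjusted R^2 for a model with an intercept and `n_features` predictors.
fn adjusted_r2(y_true: &[f64], y_pred: &[f64], n_features: usize) -> f64 {
    let n = y_true.len() as f64;
    let p = n_features as f64;
    1.0 - (1.0 - r2(y_true, y_pred)) * (n - 1.0) / (n - p - 1.0)
}
```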

Thoughts on moving the linalg and math abstractions into a standalone crate?

Hello Smartcore team,

I'm pretty new to the rust language, and have been working on a personal machine learning project to try to learn the language a bit better. In the process of looking for good code examples, I came upon this library, and am impressed with the organization and level of abstraction present in some modules. Particularly, I keep thinking about how useful it would be to develop my personal project against the interface provided by the n-dimensional array/vector/real number abstractions present in the linalg and math modules of this crate.

What are your thoughts on making those modules, linalg and math, part of a standalone crate that Smartcore then depends on? I feel like as those abstractions continue to be refined (e.g. #108), they could become an invaluable part of the ML/AI ecosystem in rust.

If this isn't something you're all interested in, no worries! Just a thought and figured I'd put it out there.

Cheers, from an aspiring rust developer!
-Sean

Create changelog

With a changelog, tools like Dependabot can report API and dependency changes. We could also create an UPDATE.md where we provide guidance on how to use new features and migrate away from old ones.

We can use this for the previous releases and add an Unreleased section with the changes that are on top of the latest release.

I propose to use the keepachangelog format.

Test silently raise error

I don't know if this is expected, but when running RUST_BACKTRACE=1 cargo test -- --nocapture, one of the tests prints this backtrace even though the suite succeeds:

test svm::tests::linear_kernel ... ok
test model_selection::tests::run_kfold_return_test_mask_simple ... ok
test svm::tests::rbf_kernel ... ok
test svm::svc::tests::svc_fit_predict_rbf ... ok
test optimization::first_order::gradient_descent::tests::gradient_descent ... ok
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.44/src/backtrace/libunwind.rs:86
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.44/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:78
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1069
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1427
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:198
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:218
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:511
  11: std::panicking::begin_panic
             at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/libstd/panicking.rs:438
  12: <smartcore::math::distance::minkowski::Minkowski as smartcore::math::distance::Distance<alloc::vec::Vec<T>,T>>::distance
             at src/math/distance/minkowski.rs:43
  13: smartcore::math::distance::minkowski::tests::minkowski_distance_negative_p
             at src/math/distance/minkowski.rs:82
  14: smartcore::math::distance::minkowski::tests::minkowski_distance_negative_p::{{closure}}
             at src/math/distance/minkowski.rs:76
  15: core::ops::function::FnOnce::call_once
             at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/libcore/ops/function.rs:232
  16: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/liballoc/boxed.rs:1017
  17: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/libstd/panic.rs:318
  18: std::panicking::try::do_call
             at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/libstd/panicking.rs:331
  19: std::panicking::try
             at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/libstd/panicking.rs:274
  20: std::panic::catch_unwind
             at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/libstd/panic.rs:394
  21: test::run_test_in_process
             at src/libtest/lib.rs:542
  22: test::run_test::run_test_inner::{{closure}}
             at src/libtest/lib.rs:451

Implement StandardScaler

StandardScaler standardizes features by removing the mean and scaling to unit variance. Implementation details and parameters can be found in Scikit Learn.

The algorithm should be implemented as a struct that implements Transformer. The struct should belong to a new module, preprocessing.
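The core computation can be sketched as follows. This version works on `Vec<Vec<f64>>` rather than smartcore matrices, does not implement any Transformer trait, and uses the population standard deviation (dividing by n); constant columns are given a scale of 1 so they pass through unscaled.

```rust
/// Hypothetical scaler over row-major samples; not smartcore's actual API.
struct StandardScaler {
    means: Vec<f64>,
    stds: Vec<f64>,
}

impl StandardScaler {
    /// Learns the per-column mean and population standard deviation.
    fn fit(x: &[Vec<f64>]) -> Self {
        let n = x.len() as f64;
        let n_cols = x.first().map_or(0, |row| row.len());
        let means: Vec<f64> = (0..n_cols)
            .map(|j| x.iter().map(|row| row[j]).sum::<f64>() / n)
            .collect();
        let stds: Vec<f64> = (0..n_cols)
            .map(|j| {
                let var = x.iter().map(|row| (row[j] - means[j]).powi(2)).sum::<f64>() / n;
                // Constant columns get a scale of 1 so they are left unscaled.
                if var == 0.0 { 1.0 } else { var.sqrt() }
            })
            .collect();
        StandardScaler { means, stds }
    }

    /// Returns a standardized copy of `x`.
    fn transform(&self, x: &[Vec<f64>]) -> Vec<Vec<f64>> {
        x.iter()
            .map(|row| {
                row.iter()
                    .zip(self.means.iter().zip(&self.stds))
                    .map(|(v, (m, s))| (v - m) / s)
                    .collect()
            })
            .collect()
    }
}
```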

What is planned for v0.2.0?

This is a list of algorithms that are planned for v0.2.0. Feel free to let me know if there is any particular algorithm that you would like to work on or include in the upcoming release.

Clustering
- DBScan
- Hierarchical Clustering(@Mec-iS )
Dimensionality reduction
- SVD
Supervised learning
- SVM

On top of new algorithms we plan these improvements:

k-fold cross-validation(@VolodymyrOrlov )
new predict_proba method that predicts probabilities, where it makes sense
Ridge, Lasso, ElasticNet for Logistic and Linear regression

Optional Features (If we have time and spare hands)

Integration with Polars
Method that prints a model summary, similar to R's summary function (https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/summary)
Naive Bayes @morenol

Next release is planned for the end of this year. Help needed :)

Naive Bayes (NB) Classifier

Implement Base NB classifier that doesn't make any assumptions about the underlying distribution of x.

https://scikit-learn.org/stable/modules/naive_bayes.html

We need something like this (pseudocode):

trait NBDistribution:
    
    // Fit distribution to some continuous or discrete data
    def fit(x: Matrix<T>) -> NBDistribution
    
    // prior of class k 
    def prior(k) -> T

    // conditional probability of feature j given class k
    def conditional_probability(k, j)-> T

class BaseNaiveBayes:
    
    // "Fits" NB. This method validates and remembers parameters
    def fit(distribution: NBDistribution)
    
    // Calculates likelihood of labels using stored probabilities and X. Returns vector with estimated labels
    def predict(x: Matrix<T>) -> Vector<T>

Once we have BaseNaiveBayes, we can implement Gaussian Naive Bayes, Multinomial Naive Bayes and Bernoulli Naive Bayes as concrete implementations of the NBDistribution trait.
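The pseudocode above could translate into Rust roughly as follows. This is a sketch under several assumptions: the distribution's own fitting step is omitted, `conditional_probability` takes the observed feature value as an extra argument, `n_classes` is added so the classifier can iterate over classes, and predictions are class indices. `ToyBernoulli` has hand-set parameters and exists only to exercise the generic code.

```rust
/// Hypothetical Rust rendering of the pseudocode's distribution interface.
trait NBDistribution {
    /// Number of classes the distribution was fitted on.
    fn n_classes(&self) -> usize;
    /// Prior probability of class `k`.
    fn prior(&self, k: usize) -> f64;
    /// Probability (or density) of `value` for feature `j` given class `k`.
    fn conditional_probability(&self, k: usize, j: usize, value: f64) -> f64;
}

/// Generic classifier that only talks to the distribution through the trait.
struct BaseNaiveBayes<D: NBDistribution> {
    distribution: D,
}

impl<D: NBDistribution> BaseNaiveBayes<D> {
    /// "Fits" the classifier by storing an already fitted distribution.
    fn fit(distribution: D) -> Self {
        BaseNaiveBayes { distribution }
    }

    /// Picks, for each row, the class with the highest log posterior score.
    fn predict(&self, x: &[Vec<f64>]) -> Vec<usize> {
        x.iter()
            .map(|row| {
                let mut best = (0, f64::NEG_INFINITY);
                for k in 0..self.distribution.n_classes() {
                    let mut score = self.distribution.prior(k).ln();
                    for (j, &value) in row.iter().enumerate() {
                        score += self.distribution.conditional_probability(k, j, value).ln();
                    }
                    if score > best.1 {
                        best = (k, score);
                    }
                }
                best.0
            })
            .collect()
    }
}

/// Toy Bernoulli-style distribution with hand-set parameters, for demonstration.
struct ToyBernoulli {
    priors: Vec<f64>,
    /// p_one[k][j] = P(feature j == 1 | class k)
    p_one: Vec<Vec<f64>>,
}

impl NBDistribution for ToyBernoulli {
    fn n_classes(&self) -> usize {
        self.priors.len()
    }
    fn prior(&self, k: usize) -> f64 {
        self.priors[k]
    }
    fn conditional_probability(&self, k: usize, j: usize, value: f64) -> f64 {
        let p = self.p_one[k][j];
        if value == 1.0 { p } else { 1.0 - p }
    }
}
```

Working with sums of logarithms rather than products of probabilities avoids underflow when there are many features.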

Add getters to the structs in order to check internal state

Add getters to the structs implemented in smartcore. Please make sure that we only add getters in the places where there is a corresponding getter in Scikit Learn.

Add cargo clippy checks

Fix cargo clippy warnings and add to the CI something that checks for that.

There is a cargo clippy --fix command that can solve some of the warnings, that could be used at the beginning.

The instructions to run cargo clippy are:

rustup component add clippy # to install it
cargo clippy

incremental learning

Hey, thanks for sharing this project!
Is incremental learning supported by smartcore, i.e. something like partial_fit from scikit-learn?

Implement OneHotEncoder

Implement OneHotEncoder, make logic similar to Scikit Learn's

The new encoder should belong to new module preprocessing and produce results on par with Scikit Learn

Add l2 regularization penalty to the Logistic Regression

We need additional parameters to support L2 regularization of Logistic Regression parameters

question in smartcore/src/svm/svr.rs?

in smartcore/src/svm/svr.rs
line 311: gmin: T::max_value(),
line 312: gmax: T::min_value(),

it looks like the gmin and gmax values are reversed.

it should be like this:
line 311: gmin: T::min_value(),
line 312: gmax: T::max_value(),?
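For context on why such an initialization can be intentional: when two fields track a running minimum and maximum, a common idiom is to start the minimum at the largest representable value and the maximum at the smallest, so that the first observed value replaces both. Whether gmin and gmax in svr.rs are used this way is an assumption here, since the surrounding code is not shown. A minimal sketch of the idiom, using f64 instead of the generic T:

```rust
/// Returns (min, max) of a slice using the "inverted sentinel" initialization.
fn min_max(values: &[f64]) -> (f64, f64) {
    // Start from the opposite extremes so that the first value replaces both.
    let mut gmin = f64::MAX;
    let mut gmax = f64::MIN;
    for &v in values {
        if v < gmin {
            gmin = v;
        }
        if v > gmax {
            gmax = v;
        }
    }
    (gmin, gmax)
}
```

With the initial values swapped, no comparison would ever succeed and both fields would keep their starting values.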

Allow setting seed for `RandomForestClassifier` and `Regressor`

To make them reproducible. This would include passing an RNG to RandomForestClassifier::<T>::sample_with_replacement:

smartcore/src/ensemble/random_forest_classifier.rs

Lines 223 to 224 in 521dab4

for _ in 0..parameters.n_trees {
    let samples = RandomForestClassifier::<T>::sample_with_replacement(&yi, k);

to be used instead of rand::thread_rng():

smartcore/src/ensemble/random_forest_classifier.rs

Lines 307 to 308 in 521dab4

fn sample_with_replacement(y: &[usize], num_classes: usize) -> Vec<usize> {
    let mut rng = rand::thread_rng();

to subsample rows for each tree.

The same RNG would also need to be passed to DecisionTreeClassifier::fit_weak_learner:

smartcore/src/ensemble/random_forest_classifier.rs

Line 235 in 521dab4

 let tree = DecisionTreeClassifier::fit_weak_learner(x, y, samples, mtry, params)?; 

and then to

smartcore/src/tree/decision_tree_classifier.rs

Line 387 in 521dab4

if tree.find_best_cutoff(&mut visitor, mtry) {

to be used in this shuffle:

smartcore/src/tree/decision_tree_classifier.rs

Lines 484 to 486 in 521dab4

if mtry < n_attr {
    variables.shuffle(&mut rand::thread_rng());
}

Same for the RandomForestRegressor.

I can set something up and open a PR.
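The general pattern of threading one seeded generator through every sampling call can be sketched with the standard library alone. Everything below is illustrative: `SplitMix64` stands in for a seedable generator from the rand crate, and this `sample_with_replacement` is a simplified bootstrap over row counts, not the class-aware function referenced above.

```rust
/// Minimal deterministic generator (SplitMix64), a stand-in for a seedable RNG.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Simplified bootstrap: returns how many times each of `n` rows was drawn.
fn sample_with_replacement(n: usize, rng: &mut SplitMix64) -> Vec<usize> {
    let mut counts = vec![0; n];
    for _ in 0..n {
        // The modulo introduces a slight bias, which is acceptable for a sketch.
        let index = (rng.next_u64() % n as u64) as usize;
        counts[index] += 1;
    }
    counts
}

/// Draws one bootstrap sample per tree from a single seeded generator.
fn bootstrap_all(n_rows: usize, n_trees: usize, seed: u64) -> Vec<Vec<usize>> {
    let mut rng = SplitMix64::new(seed);
    (0..n_trees)
        .map(|_| sample_with_replacement(n_rows, &mut rng))
        .collect()
}
```

Because the generator is passed by mutable reference, the same seed always produces the same sequence of bootstrap samples across all trees.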

Add code coverage to CI

See https://github.com/xd009642/tarpaulin

Release schedule

Is there any release planned for smartcore? I have a crate changeforest that depends on the latest commits implementing seeded & oob random forests and would like to publish this to crates.io. A release of smartcore=0.3.0 would help a lot.

Ridge Regression

Implement Ridge Regression that is similar in functionality to Scikit's Ridge.

We are looking to support the following parameters:

alpha
solver

The implementation should support at least Cholesky decomposition as a solver.

https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Ridge_Regression.pdf
https://www.cs.ubc.ca/~schmidtm/Courses/540-F14/leastSquares.pdf
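For orientation, the Cholesky route solves the normal equations (XᵀX + αI) w = Xᵀy by factorizing the left-hand matrix as L Lᵀ and then doing two triangular solves. The sketch below uses plain `Vec<Vec<f64>>`, fits no intercept, and is not tuned for performance; the function name and signature are assumptions.

```rust
/// Solves (X^T X + alpha * I) w = X^T y via Cholesky; no intercept is fitted.
/// Returns None if there is no data or the system matrix is not positive definite.
fn ridge_cholesky(x: &[Vec<f64>], y: &[f64], alpha: f64) -> Option<Vec<f64>> {
    let p = x.first()?.len();

    // Build A = X^T X + alpha * I and b = X^T y.
    let mut a = vec![vec![0.0; p]; p];
    let mut b = vec![0.0; p];
    for (row, &target) in x.iter().zip(y) {
        for i in 0..p {
            b[i] += row[i] * target;
            for j in 0..p {
                a[i][j] += row[i] * row[j];
            }
        }
    }
    for i in 0..p {
        a[i][i] += alpha;
    }

    // Cholesky factorization A = L L^T with L lower triangular.
    let mut l = vec![vec![0.0; p]; p];
    for i in 0..p {
        for j in 0..=i {
            let sum: f64 = (0..j).map(|k| l[i][k] * l[j][k]).sum();
            if i == j {
                let d = a[i][i] - sum;
                if d <= 0.0 {
                    return None;
                }
                l[i][j] = d.sqrt();
            } else {
                l[i][j] = (a[i][j] - sum) / l[j][j];
            }
        }
    }

    // Forward substitution: L z = b.
    let mut z = vec![0.0; p];
    for i in 0..p {
        let sum: f64 = (0..i).map(|k| l[i][k] * z[k]).sum();
        z[i] = (b[i] - sum) / l[i][i];
    }
    // Back substitution: L^T w = z.
    let mut w = vec![0.0; p];
    for i in (0..p).rev() {
        let sum: f64 = (i + 1..p).map(|k| l[k][i] * w[k]).sum();
        w[i] = (z[i] - sum) / l[i][i];
    }
    Some(w)
}
```

For any alpha > 0 the matrix XᵀX + αI is positive definite, so the factorization succeeds even when the columns of X are collinear.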

GPU backend with Emu

Hi,

This looks neat! I was curious about how hard it would be to implement a GPU-accelerated backend with Emu. Would it amount to implementing the API in linalg?

LASSO regression

Implement LASSO Regression that is similar in functionality to Scikit's Lasso.

We are looking to support the following parameters:

alpha
normalize
tol
maxIter

This paper describes an optimization method that is comparable to coordinate descent in solving large problems with modest accuracy, but is able to solve them with high accuracy with relatively small additional computational cost.

This method comes with a code that can be found here

Add OOB predictions to random forests

I am using in-sample out-of-bag (OOB) predictions to estimate the KL-divergence between samples. In general, OOB predictions are an efficient alternative to CV to estimate out of sample prediction performance and can be used for tuning.

Getting OOB predictions requires storing the samples used to build each tree (i.e. samples here). This could be made optional. We can then add up predictions only for the samples that were OOB for a particular tree, keeping track of the number of trees for which a particular sample was OOB.

I could work on a PR, but might need some help with details and guidance on what you think the API should be.
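The bookkeeping described above (a running sum of predictions and a count of trees per sample) can be sketched as follows for the regression case. The function name, the boolean in-bag mask and the `Option` return for samples that were never out-of-bag are assumptions about a possible API.

```rust
/// Averages per-tree predictions over the trees for which a sample was out-of-bag.
/// `in_bag[t][i]` is true if sample `i` was used to train tree `t`;
/// `predictions[t][i]` is tree `t`'s prediction for sample `i`.
/// Samples that were never out-of-bag yield `None`.
fn oob_predictions(in_bag: &[Vec<bool>], predictions: &[Vec<f64>]) -> Vec<Option<f64>> {
    let n_samples = in_bag.first().map_or(0, |flags| flags.len());
    let mut sums = vec![0.0; n_samples];
    let mut counts = vec![0usize; n_samples];

    for (flags, tree_predictions) in in_bag.iter().zip(predictions) {
        for i in 0..n_samples {
            if !flags[i] {
                sums[i] += tree_predictions[i];
                counts[i] += 1;
            }
        }
    }

    sums.iter()
        .zip(&counts)
        .map(|(&sum, &count)| if count > 0 { Some(sum / count as f64) } else { None })
        .collect()
}
```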

Rust machine learning group

I can't find your email address, so I'm opening an issue here. There is a (not yet official) machine learning group for Rust. At the moment we are trying to implement the most popular algorithms, and find a common interface for the learning process. Would be awesome if you say hello and share your project here https://rust-ml.zulipchat.com/

Make SerDe optional

SmartCore depends on serde and serde_derive. Let's put these libraries behind a feature flag, serde (the name is not important).
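On the Rust side, the usual way to do this is a conditional derive with cfg_attr, with serde declared as an optional dependency in Cargo.toml. The `LinearModel` struct below is a hypothetical example type; when the feature is disabled the attribute expands to nothing and the code compiles without serde.

```rust
/// Hypothetical model struct. The serde derives exist only when the crate is
/// built with the `serde` feature enabled (e.g. `cargo build --features serde`),
/// which assumes serde is listed as an optional dependency with its `derive` feature.
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[derive(Debug, Clone, PartialEq)]
struct LinearModel {
    coefficients: Vec<f64>,
    intercept: f64,
}

impl LinearModel {
    /// Plain dot product plus intercept; independent of the serde feature.
    fn predict(&self, x: &[f64]) -> f64 {
        self.coefficients.iter().zip(x).map(|(c, v)| c * v).sum::<f64>() + self.intercept
    }
}
```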

missing implementations to serialize models

I'm trying to serialize a simple LinearRegression<f64, DenseMatrix<f64>> model, which fails on the current development branch:

the trait `serde::ser::Serialize` is not implemented for `smartcore::linear::linear_regression::LinearRegression<f64, smartcore::linalg::naive::dense_matrix::DenseMatrix<f64>>`

This also applies to f32 and occurs for both serializing and deserializing. It works on the published version 0.2.0.

let model_binary = bincode::serialize(&model).expect("Can not serialize the model");
...

...
bincode::deserialize(&buf).expect("Can not deserialize the model");
...

info

[dependencies]
wasm-bindgen = "=0.2.63"
smartcore = {git = "https://github.com/smartcorelib/smartcore", branch="development"}
serde = "1.0.125"
bincode = "1.3.3"
ssvm-wasi-helper = "=0.1.0"

[Ethics] interpretability, accessibility, integration

I am writing down some general concepts here as references or notes that could be helpful to the community.
I suggest reading this paper for an overview of current and possible scenarios for ML applications.

Are interpretability principles something that should be embedded in software libraries or workflows? If yes, how can an API discourage software developers from taking shortcuts to black boxes and nudge them towards interpretability-by-design when writing code? Is this desirable?

A point missing from the paper, in my opinion, is data quality and dataset growth. There should be an "evolutionary" perspective on model performance as the dataset grows over time, i.e. which characteristics newly added data should have to improve global performance, and how to monitor this (a data-centric MLOps approach).

Implement OrdinalEncoder

Implement OrdinalEncoder, make logic similar to Scikit Learn's

The new encoder should belong to new module preprocessing and produce results on par with Scikit Learn

datasets - deserialize_data mismatched types error

Hi, first of all thanks for your amazing work on bringing ML to Rust!

While compiling a very simple function to train and predict a linear regression model, I encountered an error from the datasets module:

#[wasm_bindgen]
pub fn basic_prediction() -> f64 {
    let x = DenseMatrix::from_2d_array(&[
        &[234.289, 235.6, 159.0, 107.608, 1947., 60.323],
        &[259.426, 232.5, 145.6, 108.632, 1948., 61.122],
        &[258.054, 368.2, 161.6, 109.773, 1949., 60.171],
        &[284.599, 335.1, 165.0, 110.929, 1950., 61.187],
        &[328.975, 209.9, 309.9, 112.075, 1951., 63.221],
        &[346.999, 193.2, 359.4, 113.270, 1952., 63.639],
        &[365.385, 187.0, 354.7, 115.094, 1953., 64.989],
        &[363.112, 357.8, 335.0, 116.219, 1954., 63.761],
        &[397.469, 290.4, 304.8, 117.388, 1955., 66.019],
        &[419.180, 282.2, 285.7, 118.734, 1956., 67.857],
        &[442.769, 293.6, 279.8, 120.445, 1957., 68.169],
        &[444.546, 468.1, 263.7, 121.950, 1958., 66.513],
        &[482.704, 381.3, 255.2, 123.366, 1959., 68.655],
        &[502.601, 393.1, 251.4, 125.368, 1960., 69.564],
        &[518.173, 480.6, 257.2, 127.852, 1961., 69.331],
        &[554.894, 400.7, 282.7, 130.081, 1962., 70.551],
    ]);

    let y: Vec<f64> = vec![
        83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0, 101.2, 104.6, 108.4, 110.8, 112.6, 114.2,
        115.7, 116.9,
    ];
    let (x_train, x_test, y_train, y_test) = train_test_split(&x, &y, 0.2, true);
    let y_hat_lr = LinearRegression::fit(&x_train, &y_train, Default::default())
    .and_then(|lr| lr.predict(&x_test)).unwrap();
    let mse = mean_squared_error(&y_test, &y_hat_lr);

    return mse;
}

direnc@direnc-VirtualBox:~/workspace/nodejs-rust$ wasm-pack build --target nodejs
[INFO]: Checking for the Wasm target...
[INFO]: Compiling to Wasm...
   Compiling smartcore v0.2.0
error[E0308]: mismatched types
  --> /home/direnc/.cargo/registry/src/github.com-1ecc6299db9ec823/smartcore-0.2.0/src/dataset/mod.rs:88:49
   |
88 |         let num_features = usize::from_le_bytes(buffer);
   |                                                 ^^^^^^ expected an array with a fixed size of 4 elements, found one with 8 elements

error[E0308]: mismatched types
  --> /home/direnc/.cargo/registry/src/github.com-1ecc6299db9ec823/smartcore-0.2.0/src/dataset/mod.rs:90:48
   |
90 |         let num_samples = usize::from_le_bytes(buffer);
   |                                                ^^^^^^ expected an array with a fixed size of 4 elements, found one with 8 elements

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0308`.
error: could not compile `smartcore`

To learn more, run the command again with --verbose.
Error: Compiling your crate to WebAssembly failed
Caused by: failed to execute `cargo build`: exited with exit code: 101
  full command: "cargo" "build" "--lib" "--release" "--target" "wasm32-unknown-unknown"

The error occurs in mod.rs inside the deserialize_data function. When changing the buffers from 8 to 4 as shown below, the code builds, but throws a RuntimeError somewhere.

...
 let mut buffer = [0u8; 4];
 buffer.copy_from_slice(&bytes[0..4]);
...

const nodejsrust = require('nodejs-rust')
console.log(nodejsrust.basic_prediction())

wasm://wasm/0003e286:1
RuntimeError: unreachable
    at <anonymous>:wasm-function[56]:0xb419
    at <anonymous>:wasm-function[73]:0xbf1e
    at <anonymous>:wasm-function[128]:0xd073
    at <anonymous>:wasm-function[117]:0xce92
    at <anonymous>:wasm-function[124]:0xcfd6
    at <anonymous>:wasm-function[101]:0xca19
    at <anonymous>:wasm-function[20]:0x89ea
    at <anonymous>:wasm-function[9]:0x68ac
    at <anonymous>:wasm-function[15]:0x7cad
    at basic_prediction (<anonymous>:wasm-function[189]:0xd537)

EDIT1: It was supposed to be fixed with #88, but when applying the changes locally, the RuntimeError still occurs.
EDIT2:
It seems to be an issue with the wasm bindings. I tried disabling the dataset feature by specifying smartcore = {version = "0.2.0", default-features = false}; the function runs via main, but it still panics with the RuntimeError.

Additional Info

I'm compiling to WASM with wasm-pack for nodejs.

[dependencies]
wasm-bindgen = "0.2.63"
smartcore = "0.2.0"

Ubuntu 20.04.2 LTS
cargo 1.51.0 (43b129a20 2021-03-16)
wasm-pack 0.9.1

empty initialized model

I want to have a static variable which holds a model, but I want it to be lazy-loaded. This way, I only have to initialize it once and reuse it during runtime. The final model has been serialized in a file from which it can be loaded.

I'm thinking of something like this:

static mut model: LinearRegression<f64, DenseMatrix<f64>> = None;

fn init(model_path: str) {
    unsafe {
        model = {
            let mut buf: Vec<u8> = Vec::new();
            File::open(&model_path)
                .and_then(|mut f| f.read_to_end(&mut buf))
                .expect("Can not load model");
            bincode::deserialize(&buf).expect("Can not deserialize the model")
        }
    }
}

But it is not possible to initialize a LinearRegression<f64, DenseMatrix<f64>> with None. Is there another easy way to initialize a "default" or "empty" model? I thought of a constructor-like API:

LinearRegression::new()

which constructs an empty model without any parameters.

Additional info

I think in python's scikit-learn this can also be achieved with sklearn.linear_model.LinearRegression().

I tried an alternative by wrapping the model type with an Option<> like this:

static mut model: Option<LinearRegression<f64, DenseMatrix<f64>>> = None;

but this brings in other challenges and feels a little hacky.

Thanks in Advance!
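One way to get a lazily initialized global without `static mut` or an explicit `Option` is std::sync::OnceLock (available since Rust 1.70). The sketch below uses a hypothetical `Model` struct and a placeholder loader instead of a real deserialized LinearRegression; the load counter exists only to make the laziness observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a deserialized model.
#[derive(Debug)]
struct Model {
    coefficients: Vec<f64>,
}

/// Counts how often the loader ran, only to make the laziness observable.
static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Empty until first use; no `unsafe` or `static mut` is needed.
static MODEL: OnceLock<Model> = OnceLock::new();

/// Placeholder loader; real code would read the file and deserialize it here.
fn load_model() -> Model {
    LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
    Model { coefficients: vec![0.5, -1.25] }
}

/// Returns the shared model, loading it on the first call only.
fn model() -> &'static Model {
    MODEL.get_or_init(load_model)
}
```

Every caller goes through `model()`, so the first call pays the loading cost and later calls return the same reference.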
