stack level too deep

I am using with a 250MB csv file as training data, and I am getting

rumale/tree/gradient_tree_regressor.rb:117:in `apply': stack level too deep (SystemStackError)

during the fit

Bundler 2.2.11
Platforms ruby, x86_64-linux

I tried setting ulimit -s unlimited:

> ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127649
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65535
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

but still no luck.
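One plausible cause, offered as an assumption rather than a diagnosis of Rumale's internals, is that `apply` descends each tree recursively, so a very deep tree grown from a large dataset exhausts the stack. Raising `ulimit -s` does not necessarily help, because Ruby also enforces its own VM stack limit. The sketch below shows the general technique of walking a binary decision tree with a loop instead of recursion. `Node`, `apply_iterative`, and `build_chain` are hypothetical names for illustration, not Rumale classes.

```ruby
# Hypothetical node layout: internal nodes hold a split, leaves hold a leaf id.
Node = Struct.new(:feature_id, :threshold, :left, :right, :leaf_id, keyword_init: true) do
  def leaf?
    !leaf_id.nil?
  end
end

# Returns the id of the leaf that +sample+ falls into. The descent is a loop,
# so the depth of the tree does not consume call-stack frames.
def apply_iterative(root, sample)
  node = root
  until node.leaf?
    node = sample[node.feature_id] <= node.threshold ? node.left : node.right
  end
  node.leaf_id
end

# Builds a deliberately deep, chain-shaped tree for demonstration: the node at
# depth i sends samples with x[0] <= i to leaf i and everything else further down.
def build_chain(depth)
  root = Node.new(leaf_id: depth) # deepest leaf
  (depth - 1).downto(0) do |i|
    root = Node.new(feature_id: 0, threshold: i, left: Node.new(leaf_id: i), right: root)
  end
  root
end

tree = build_chain(100_000)
puts apply_iterative(tree, [2.5]) # prints 3
```

Limiting tree depth at training time (for example through a maximum-depth hyperparameter, if the estimator offers one) is the other usual way to keep traversal shallow.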


How to use your own datasets (labels & samples) from Ruby Array ?

Samples and labels are not Ruby Arrays but NArrays.

  • samples are Numo::DFloat object.
  • labels are Numo::Int32 object.

You can create NArray from Ruby Array.

require 'numo/narray'

samples = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2]]
samples = Numo::DFloat.cast(samples)
# samples = Numo::DFloat[*samples]

labels = [0, 0, 1]
labels = Numo::Int32.cast(labels)
# labels = Numo::Int32[*labels]


Add option to store training summary

I've noticed that people complain about scikit-learn lacking an option to generate a training summary. Is this feature already present in Rumale? If not, can we look into it? I would love to contribute, but would need some pointers to get started.

Online versions of algorithms

First, thank you so much for providing this gem. It is awesome to have such an easy API to work with in Ruby.

I noticed that there are no incremental learning (online) versions of the algorithms. For example, sklearn's SGDClassifier supports partial_fit. Are you planning to implement something like this?
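For reference, the idea behind `partial_fit` can be sketched in plain Ruby: the model keeps its weights between calls, and each call runs one stochastic gradient descent pass over only the new mini-batch. The class below is a hypothetical binary logistic regression written for illustration; it is not part of Rumale and not a proposal for its API.

```ruby
# Hypothetical online binary logistic regression trained with plain SGD.
class OnlineLogisticRegression
  attr_reader :weights, :bias

  def initialize(n_features, learning_rate: 0.1)
    @weights = Array.new(n_features, 0.0)
    @bias = 0.0
    @learning_rate = learning_rate
  end

  # Updates the model with one mini-batch. It can be called repeatedly as new
  # data arrives; earlier samples are never revisited.
  def partial_fit(xs, ys)
    xs.zip(ys).each do |x, y|
      error = probability(x) - y # gradient of the log-loss w.r.t. the linear score
      x.each_with_index { |xi, j| @weights[j] -= @learning_rate * error * xi }
      @bias -= @learning_rate * error
    end
    self
  end

  def probability(x)
    score = x.zip(@weights).sum { |xi, w| xi * w } + @bias
    1.0 / (1.0 + Math.exp(-score))
  end

  def predict(x)
    probability(x) >= 0.5 ? 1 : 0
  end
end

model = OnlineLogisticRegression.new(1, learning_rate: 0.5)
# Feed the data in small batches, as if it were arriving over time.
20.times do
  model.partial_fit([[-2.0], [-1.0]], [0, 0])
  model.partial_fit([[1.0], [2.0]], [1, 1])
end
puts model.predict([-1.5])
puts model.predict([1.5])
```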

feature request: SNN clustering

I'd like to add a feature request for SNN Clustering (Shared Nearest Neighbors) and possibly also HDBSCAN.

The advantage of SNN over DBSCAN is that it can identify clusters with different densities, and, unlike k-means, it does not need to be given a fixed number of clusters. Its way of measuring similarity between items (by shared neighbors) also has advantages over Euclidean distance when working with a high number of dimensions.
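To make the SNN idea concrete, here is a minimal plain-Ruby sketch of the shared-nearest-neighbor similarity that SNN clustering builds on. It assumes Euclidean distance for the initial k-NN search and uses one common definition: the similarity of two points is the number of neighbors their k-NN lists share. The function names are hypothetical, and the clustering step itself (turning this similarity graph into clusters) is not shown.

```ruby
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |ai, bi| (ai - bi)**2 })
end

# Indices of the k nearest neighbors of each point, excluding the point itself.
def knn_lists(points, k)
  points.each_with_index.map do |p, i|
    (0...points.size).reject { |j| j == i }.min_by(k) { |j| euclidean(p, points[j]) }
  end
end

# SNN similarity matrix: entry [i][j] is the number of shared k-NN neighbors.
def snn_similarity(points, k)
  lists = knn_lists(points, k)
  lists.map { |li| lists.map { |lj| (li & lj).size } }
end

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
sim = snn_similarity(points, 2)
sim.each { |row| p row }
```

Because the similarity depends only on neighbor ranks, not on raw distances, it is less sensitive to differences in density between clusters.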

HDBSCAN is an interesting improvement over DBSCAN, as it only requires one hyperparameter and reports a probability of assignment to a cluster (which can be used to optimize the minPts hyperparameter). The Towards Data Science article "How to cluster in High Dimensions" gives an interesting overview, including possible advantages of HDBSCAN and SNN (in the SNN-cliq variant).

As far as I know, neither algorithm has a Ruby implementation yet.

PS: thanks for all your work on Rumale so far, it's greatly appreciated.



require 'rumale'
require 'numo/narray'
require 'numo/gnuplot'
require 'numo/linalg/autoloader'

include Numo

x = DFloat[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]].transpose
y = DFloat[[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]].transpose

e = true), y)

y_ = e.predict(x)

Numo.gnuplot do
  set term: :svg
  set out: 'out.svg'
  set key: :bottom
  plot [x, y, pt: 6, t: 'Data'],
       [x, y_, w: :lines, lw: 3, t: 'Regression']
end



e = true, solver: 'svd')



Add a social media preview


I'm following your work on Rumale and would suggest adding the logo to the social media preview, which is now available for project repositories. This ensures that links to Rumale's repository show the logo rather than your avatar on GitHub.

To do so, please go to "Settings->Social media preview" in this repository.

Thank you again for your work on this topic, it's very useful!

Cross validation: best score

At present, cross validation only allows the best MSE to be used to evaluate a model. It may be good to:

These could also be presented as examples for people to adapt, rather than as additions to the core library since they can then be adapted to specific situations.
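As a possible starting point for such examples, the sketch below scores each fold with several common regression metrics (MSE, MAE, and R²) instead of keeping only the best MSE. It is plain Ruby with hypothetical function names; it does not use Rumale's own evaluation classes, and the choice of metrics is an assumption, not the list the original proposal had in mind.

```ruby
def mean_squared_error(y_true, y_pred)
  y_true.zip(y_pred).sum { |t, p| (t - p)**2 } / y_true.size.to_f
end

def mean_absolute_error(y_true, y_pred)
  y_true.zip(y_pred).sum { |t, p| (t - p).abs } / y_true.size.to_f
end

# Coefficient of determination: 1 - SS_res / SS_tot.
def r2_score(y_true, y_pred)
  mean = y_true.sum / y_true.size.to_f
  ss_res = y_true.zip(y_pred).sum { |t, p| (t - p)**2 }
  ss_tot = y_true.sum { |t| (t - mean)**2 }
  1.0 - ss_res / ss_tot
end

METRICS = {
  mse: method(:mean_squared_error),
  mae: method(:mean_absolute_error),
  r2: method(:r2_score)
}.freeze

# Scores every fold with every metric, given [y_true, y_pred] pairs per fold.
def score_folds(folds)
  folds.map { |y_true, y_pred| METRICS.transform_values { |metric| metric.call(y_true, y_pred) } }
end

p score_folds([[[3, -0.5, 2, 7], [2.5, 0.0, 2, 8]]])
```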

OneHotEncoder blows up for large values

Here's a quick demonstration, adapted from the spec:

  it 'does not murder us' do
    x = Numo::Int32[[0, 0, 5999999], [1, 1, 0], [0, 2, 1], [1, 0, 2]]
    y = Numo::Int32[[0, 1, 1]]
    expect( eq(Numo::Int32[[1, 0, 0, 1, 0, 0, 1, 0, 0]])

This returns a vector of length 60_000_000 or so. My system actually shows 1.5 TB(!) of RAM being consumed, although luckily almost all of it is virtual. It is, however, impossible to work with such values.

I was expecting the transformation to interpret the given values as categorical, as the docs mention and as is the usual practice as far as I can tell.
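The behavior the reporter expected can be sketched as follows: the encoder learns the distinct values of each column and assigns them consecutive indices, so the output width depends on the number of categories rather than on the largest value. This is a hypothetical plain-Ruby class, not Rumale's `OneHotEncoder`; with the spec's data it produces the 9-column encoding the test expects.

```ruby
# Hypothetical categorical one-hot encoder: 5_999_999 costs one output column,
# not six million.
class CategoricalOneHot
  def fit(rows)
    # For each column, map every distinct value (sorted) to a local index.
    @categories = rows.transpose.map { |col| col.uniq.sort.each_with_index.to_h }
    self
  end

  def transform(rows)
    rows.map do |row|
      row.each_with_index.flat_map do |value, j|
        encoded = Array.new(@categories[j].size, 0)
        index = @categories[j].fetch(value) do
          raise ArgumentError, "unknown category #{value} in column #{j}"
        end
        encoded[index] = 1
        encoded
      end
    end
  end
end

x = [[0, 0, 5_999_999], [1, 1, 0], [0, 2, 1], [1, 0, 2]]
encoder = CategoricalOneHot.new.fit(x)
p encoder.transform([[0, 1, 1]]) # prints [[1, 0, 0, 1, 0, 0, 1, 0, 0]]
```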

Required Ruby Version is unclear

I tried installing with Ruby 2.3, but I ran into this error while building rumale-tree-0.27:

extconf.rb:26:in `<main>': undefined method `match?' for "aarch64-linux":String (NoMethodError)

That suggests Ruby 2.4 is the minimum version, since String#match? was added in Ruby 2.4. doesn't provide much information either. Can you help clarify which version of Ruby you are targeting?

Thank you!
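One common way to make the minimum version explicit is the `required_ruby_version` field of the gemspec, which lets RubyGems refuse installation on an unsupported Ruby up front instead of failing inside `extconf.rb`. The fragment below is a hypothetical sketch: the gem name is a placeholder, and `>= 2.4` is only inferred from the `String#match?` error above, not taken from the project.

```ruby
require 'rubygems'

# Hypothetical gemspec: 'example-gem' is a placeholder name, and '>= 2.4' is an
# assumption inferred from String#match? (added in Ruby 2.4).
spec = Gem::Specification.new do |s|
  s.name = 'example-gem'
  s.version = '0.0.1'
  s.summary = 'Placeholder gem used to illustrate required_ruby_version'
  s.authors = ['Example Author']
  s.required_ruby_version = '>= 2.4'
end

# RubyGems checks this requirement against the running Ruby at install time.
puts spec.required_ruby_version.satisfied_by?(Gem::Version.new('2.3.8')) # prints false
```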

When loading dataset from libsvm file, determine size of line

In the libsvm-format MNIST file, zero values are omitted, so the detected length of a line (the number of features) can be smaller than the real one.
My suggestion is to add an option to specify the number of features per line.

2.7.1 :004 > x, y = Rumale::Dataset.load_libsvm_file("mnist", zero_based: true)

2.7.1 :005 > 
2.7.1 :006 > y
[5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8, 6, 9, 4, 0, 9, 1, 1, ...] 
2.7.1 :007 > x

My desired size of x is [60000,784].

If anyone knows how I can do this, I'd be happy to hear about it.
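A minimal sketch of the requested option, assuming the goal is simply a dense matrix of a known width: because libsvm files omit zero entries, the width should come from the caller (`n_features:`) instead of the largest index found in the file. `parse_libsvm_lines` is a hypothetical plain-Ruby helper, not `Rumale::Dataset.load_libsvm_file`, and it assumes integer class labels.

```ruby
# Parses libsvm-format lines ("label index:value index:value ...") into a dense
# matrix whose width is fixed by n_features, so trailing all-zero features
# that the file omits are still represented.
def parse_libsvm_lines(lines, n_features:, zero_based: false)
  labels = []
  rows = lines.map do |line|
    label, *pairs = line.split
    labels << label.to_i # assumes integer class labels
    row = Array.new(n_features, 0.0)
    pairs.each do |pair|
      index, value = pair.split(':')
      col = zero_based ? index.to_i : index.to_i - 1
      raise ArgumentError, "feature index #{index} out of range" unless col.between?(0, n_features - 1)
      row[col] = value.to_f
    end
    row
  end
  [rows, labels]
end

rows, labels = parse_libsvm_lines(['5 0:0.5 3:1.0', '0 1:0.25'], n_features: 784, zero_based: true)
puts "#{rows.size} x #{rows.first.size}" # prints 2 x 784
```

For the MNIST file this would be called with `n_features: 784` and `zero_based: true`, matching the call above.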


Hey @yoshoku, this library is great! Ruby could really use a comprehensive ML library.

The source code is well-commented, but I can't find online documentation for it. I think it would really help users if there were documentation similar to scikit-learn's (e.g. its linear regression example). I had no idea how much you could do with the library until I dove into the source code.

FP-Growth algorithm

Could I try adding FP-Growth, an algorithm for association rule mining, to this repo?

Method similar to train_test_split of sklearn

@yoshoku - Thank you for this awesome gem.

I tried to convert an example simple linear regression example from Python to Ruby. Here is my Python example and Ruby example.

In sklearn, train_test_split is an easy way to split a dataset into training and test sets.

# Import Dataset
import pandas as pd

dataset = pd.read_csv('salary_data.csv')
X = dataset.iloc[:, :-1].values # Take all rows and all columns except the last one
y = dataset.iloc[:, 1].values # Take all rows of the column with index 1

# Split dataset into Training and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)

However, I couldn't find something similar in rumale. So I implemented like this:

require 'daru'
require 'numo/narray'
require 'rumale'

# Import Dataset
df = Daru::DataFrame.from_csv('salary_data.csv')

# Convert dataset to Numo::NArray dataset
data = Numo::DFloat.cast df['YearsExperience', 'Salary'].to_a[0].map { |row| row.values }

# Split dataset into Training and Test set
x_train, y_train, x_test, y_test = nil 0.3, n_splits: 1, random_seed: 1).split(data).each do |train, test|
  x_train, y_train = data[train, true][true, 0..-2], data[train, true][true, 1..-1]
  x_test, y_test = data[test, true][true, 0..-2], data[test, true][true, 1..-1]
end

If I am not missing an easy way provided within rumale to split dataset, I think it would be great to have a method similar to train_test_split in rumale.
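For comparison, a `train_test_split`-style helper is short to write in plain Ruby: shuffle the row indices with a seeded random generator, take the first `ceil(n * test_size)` of them as the test set, and keep the rest for training. The function below is a hypothetical sketch operating on Ruby Arrays (with Numo arrays, rows would be selected with the same index lists, as in `data[train, true]` above); it is not an existing Rumale method.

```ruby
# Hypothetical helper mirroring sklearn's return order:
# x_train, x_test, y_train, y_test.
def train_test_split(x, y, test_size: 0.25, random_seed: nil)
  raise ArgumentError, 'x and y must have the same length' unless x.size == y.size

  rng = random_seed.nil? ? Random.new : Random.new(random_seed)
  indices = (0...x.size).to_a.shuffle(random: rng) # reproducible when seeded
  n_test = (x.size * test_size).ceil
  test_idx = indices.first(n_test)
  train_idx = indices.drop(n_test)
  [x.values_at(*train_idx), x.values_at(*test_idx),
   y.values_at(*train_idx), y.values_at(*test_idx)]
end

x = (1..20).map { |i| [i.to_f] } # one feature per sample
y = (1..20).map { |i| i * 2.0 }
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size: 0.25, random_seed: 1)
puts "#{x_train.size} train / #{x_test.size} test" # prints 15 train / 5 test
```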

Feature request: t-SNE

New algorithm request
I would like t-SNE to be prioritized over neural networks.

  • I want to visualize the open-access cancer mRNA data from the TCGA (The Cancer Genome Atlas).

Unable to install gem under rvm on jruby-

Running gem install rumale fails with the following error:

Building native extensions. This could take a while...
ERROR:  Error installing rumale:
        ERROR: Failed to build gem native extension.

    current directory: /Users/kietdv/.rvm/gems/jruby-
/Users/kietdv/.rvm/rubies/jruby- -r ./siteconf20200222-38612-w78zw1.rb extconf.rb
checking for stdbool.h... RuntimeError: The compiler failed to generate an executable file.
You have to install development tools first.

                 try_do at /Users/kietdv/.rvm/rubies/jruby-
                try_cpp at /Users/kietdv/.rvm/rubies/jruby-
   block in have_header at /Users/kietdv/.rvm/rubies/jruby-
  block in checking_for at /Users/kietdv/.rvm/rubies/jruby-
      block in postpone at /Users/kietdv/.rvm/rubies/jruby-
                   open at /Users/kietdv/.rvm/rubies/jruby-
      block in postpone at /Users/kietdv/.rvm/rubies/jruby-
                   open at /Users/kietdv/.rvm/rubies/jruby-
               postpone at /Users/kietdv/.rvm/rubies/jruby-
           checking_for at /Users/kietdv/.rvm/rubies/jruby-
            have_header at /Users/kietdv/.rvm/rubies/jruby-
                 <main> at extconf.rb:60
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers.  Check the mkmf.log file for more details.  You may
need configuration options.

Provided configuration options:

To see why this extension failed to compile, please check the mkmf.log which can be found here:


extconf failed, exit code 1

Gem files will remain installed in /Users/kietdv/.rvm/gems/jruby- for inspection.
Results logged to /Users/kietdv/.rvm/gems/jruby-

fit_bias: false is the default parameter for LinearRegression. Too difficult to find out?


I have a question and a suggestion.
Would you consider changing the default value of fit_bias to true?

In the Rumale::LinearModel::LinearRegression class, the default value of fit_bias is false.

In my understanding, this means that the regression line will always go through (0, 0) unless you set fit_bias: true.

This is difficult for starters, I think.

Users new to Rumale may not understand why the regression line doesn't fit the data. They can't tell whether Rumale is buggy or they are missing an option until they search the documentation, and some may give up before looking at it.

In sklearn, the equivalent parameter, fit_intercept, defaults to True.
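The practical effect of the default can be checked with closed-form simple linear regression on the same data as the plotting snippet earlier (x = 1..10, y = 10..1). With an intercept, least squares recovers slope -1 and intercept 11; forced through the origin (slope = Σxy / Σx²), the fitted slope even has the wrong sign. The functions are a hypothetical plain-Ruby illustration, not Rumale's solver.

```ruby
# Ordinary least squares for one feature, with an intercept term.
def fit_with_intercept(xs, ys)
  n = xs.size.to_f
  x_mean = xs.sum / n
  y_mean = ys.sum / n
  cov = xs.zip(ys).sum { |x, y| (x - x_mean) * (y - y_mean) }
  var = xs.sum { |x| (x - x_mean)**2 }
  slope = cov / var
  [slope, y_mean - slope * x_mean]
end

# Without an intercept the line is forced through (0, 0): slope = Σxy / Σx².
def fit_through_origin(xs, ys)
  xs.zip(ys).sum { |x, y| x * y }.to_f / xs.sum { |x| x * x }
end

xs = (1..10).to_a
ys = xs.map { |x| 11 - x } # 10, 9, ..., 1
p fit_with_intercept(xs, ys) # prints [-1.0, 11.0]
p fit_through_origin(xs, ys) # 220 / 385, about 0.571
```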




  1. I want to speed up label prediction.
  2. I want to speed up machine learning (training).



├── data
│   └── sample.txt (sample data)
├── model
│   └── sample_model.dat (trained model generated after training)
├── sample_gen_model.rb (program that builds the trained model)
└── sample_predict.rb (program for label prediction)



0: Sports, 1: Weather, 2: Science


  # Text to predict
  text = "Sports physical activities with competitive or recreational elements"





# main
if __FILE__ == $0

  # Load the trained model
  model = Marshal.load(File.binread("./model/sample_model.dat"))

  text = [
    "Sports physical activities with competitive or recreational elements",
    "Athletic endeavors for exercise, fun, or competition",
    "Sporting activities that promote physical fitness and skill",
    "Physical games or contests for entertainment purposes",
    "Weather atmospheric conditions at a specific location",
    "Climate patterns that affect daily conditions",
    "Temperature, precipitation, wind and other meteorological factors",
    "Science: systematic study of the natural world",
    "Observation, experimentation, and analysis of phenomena",
    "Evidence-based exploration of the physical universe"
  ]

  # Normalization (??)
  normalizer =
  new_samples = normalizer.fit_transform(get_predict_featurevector(text))

  # Prediction
  puts model.predict(new_samples).to_a
end


$ ruby sample_predict.rb
time for label prediction: 0.268448s    // I want the same execution speed as when predicting a single sentence


Compatibility Inquiry: rumale 0.28.1 with Ruby 3.3.0


I'm seeking confirmation regarding the compatibility of the rumale 0.28.1 with the recently released Ruby 3.3.0 (December 25, 2023).

While reviewing the gem's documentation, I found [ruby: [ '3.0', '3.1', '3.2', '3.3' ]]. Does this mean it is compatible with 3.3.0?

Given your expertise in this area, could you offer any insights or leads on this matter? Any information you can provide would be greatly appreciated.

Thank you for your time and assistance!

Typo in project description

Just a minor nitpick...

"Rumale is a machine learninig library in Ruby" should probably read "learning" instead of "learninig".

Thanks for creating and sharing this project.

