catboost / catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Home Page: https://catboost.ai

License: Apache License 2.0

Python 33.80% Makefile 0.01% C++ 29.95% R 0.03% C 30.39% CSS 0.01% JavaScript 1.49% Assembly 3.32% Shell 0.01% Java 0.02% CMake 0.20% Cuda 0.73% TeX 0.01% Dockerfile 0.01% M4 0.01% Rust 0.01% Roff 0.02% C# 0.01% HTML 0.01% Ragel 0.01%
machine-learning decision-trees gradient-boosting gbm gbdt python r kaggle gpu-computing catboost

catboost's Introduction

Website | Documentation | Tutorials | Installation | Release Notes

CatBoost is a machine learning method based on gradient boosting over decision trees.

Main advantages of CatBoost:

Get Started and Documentation

All CatBoost documentation is available here.

Install CatBoost by following the guide for the

Next you may want to investigate:

If you cannot open the documentation in your browser, try adding yastatic.net and yastat.net to the list of allowed domains in Privacy Badger.

CatBoost models in production

If you want to evaluate a CatBoost model in your application, read the model API documentation.

Questions and bug reports

Help to Make CatBoost Better

  • Check out open problems and help wanted issues to see what can be improved, or open an issue if you want something.
  • Add your stories and experience to Awesome CatBoost.
  • To contribute to CatBoost, first read the CLA text and state in your pull request that you agree to the terms of the CLA. More information can be found in CONTRIBUTING.md.
  • Instructions for contributors can be found here.

News

The latest news is published on Twitter.

Reference Paper

Anna Veronika Dorogush, Andrey Gulin, Gleb Gusev, Nikita Kazeev, Liudmila Ostroumova Prokhorenkova, Aleksandr Vorobev "Fighting biases with dynamic boosting". arXiv:1706.09516, 2017.

Anna Veronika Dorogush, Vasily Ershov, Andrey Gulin "CatBoost: gradient boosting with categorical features support". Workshop on ML Systems at NIPS 2017.

License

© YANDEX LLC, 2017-2024. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.

catboost's People

Contributors

alexander-somov, andrey-khropov, arcadia-devtools, artpaul, borman, dbakshee, ek-ak, evgueni-petrov-aka-espetrov, exprmntr, frazenshtein, georgthegreat, halyavin, iaz1607, kizill, lyzhinivan, mityada, nemo-cpt, nikitxskv, noxoomo, orivej, pg83, robot-piglet, robot-yandex-devtools-repo, shadchin, slon, smertnik3sh, snermolaev, tatakir, vestnik, yandex-contrib-robot

catboost's Issues

Windows: Compiler issue while installing in R; errors in 'ymake_conf.py'

Windows 10 Home Single Language
R-version: 3.4.1 -- 'Single Candle'
Platform: x86_64-w64-mingw32/x64 (64-bit)
Python: 3.5.2
Visual Studio 2017 Community Edition

> devtools::install()
Installing catboost
"C:/PROGRA~1/R/R-34~1.1/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD  \
  INSTALL "C:/Users/Mrugank/catboost/catboost/R-package"  \
  --library="C:/Users/Mrugank/Documents/R/win-library/3.4" --install-tests 

* installing *source* package 'catboost' ...
** libs
  running 'src/Makefile.win' ...
/cygdrive/c/Users/Mrugank/catboost/catboost/R-package/src/../../../ya.bat make -r -o ../../..
2017-07-19T15:39:16.270000 [MainThread] Info: Attention! Using system user-defined compiler: cl.exe (check CC and CXX env vars).

Config was not generated due to errors in C:\Users\Mrugank\catboost\build\ymake_conf.py
ERROR:root:In toolchain theyknow, detected system cl compiler from VS version 15.0
Traceback (most recent call last):
  File "C:\Users\Mrugank\catboost\build\ymake_conf.py", line 2290, in <module>
    main()
  File "C:\Users\Mrugank\catboost\build\ymake_conf.py", line 2277, in main
    tc_params["platform"]["target"]["arch"])
  File "C:\Users\Mrugank\catboost\build\ymake_conf.py", line 757, in print_full_build_type
    tc = Toolchain(toolchain_name)
  File "C:\Users\Mrugank\catboost\build\ymake_conf.py", line 660, in __init__
    self.toolchain_by_cxx_compiler_file_name()
  File "C:\Users\Mrugank\catboost\build\ymake_conf.py", line 698, in toolchain_by_cxx_compiler_file_name
    cxx_info[3](self)
  File "C:\Users\Mrugank\catboost\build\ymake_conf.py", line 706, in detect_msvc_
    detect_msvc(self)
  File "C:\Users\Mrugank\catboost\build\ymake_conf.py", line 1590, in detect_msvc
    vc_key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, r"SOFTWARE\Wow6432Node\Microsoft\VisualStudio\SxS\Vc7")
WindowsError: [Error 2] The system cannot find the file specified

make: *** [libcatboostr.dll] Error 1
Warning: running command 'make --no-print-directory -f "Makefile.win"' had status 2
ERROR: compilation failed for package 'catboost'
* removing 'C:/Users/Mrugank/Documents/R/win-library/3.4/catboost'
Error: Command failed (1)

If I build and then install, I get the following error:

> devtools::build()
"C:/PROGRA~1/R/R-34~1.1/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD  \
  build "C:\Users\Mrugank\catboost\catboost\R-package" --no-resave-data --no-manual 

* checking for file 'C:\Users\Mrugank\catboost\catboost\R-package/DESCRIPTION' ... OK
* preparing 'catboost':
* checking DESCRIPTION meta-information ... OK
* cleaning src
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building 'catboost_0.1.1.2.tar.gz'

[1] "C:/Users/Mrugank/catboost/catboost/catboost_0.1.1.2.tar.gz"
> setwd("C:/Users/Mrugank/catboost/catboost")
> install.packages("catboost_0.1.1.2.tar.gz")
Warning in install.packages :
  cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/src/contrib/PACKAGES.rds': HTTP status was '404 Not Found'
Installing package into ‘C:/Users/Mrugank/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘catboost_0.1.1.2.tar.gz’ is not available (for R version 3.4.1)

Wrong slash in .vcproj files

For example, the build output:
1>$B/contrib/libs/coreml/ArrayFeatureExtractor.pb.cc
1>'" \catboost\catboost\msvs\Debug/contrib/tools/protoc/protoc.exe"' is not recognized as an internal or external command,
1>operable program or batch file.
1>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 9009.
1>Done building project "contrib-libs-coreml.vcxproj" -- FAILED.

contrib-libs-coreml.vcproj
...
"$(SolutionDir)$(Configuration)/contrib/tools/protoc/protoc.exe" "-I=./" "-I=$(SolutionDir)../" "-I=$(SolutionDir)$(Configuration)" "-I=$(SolutionDir)../contrib/libs/protobuf" "--cpp_out=$(SolutionDir)$(Configuration)/" "--cpp_styleguide_out=$(SolutionDir)$(Configuration)/" "--plugin=protoc-gen-cpp_styleguide=$(SolutionDir)$(Configuration)/contrib/tools/protoc/plugins/cpp_styleguide/cpp_styleguide.exe" "contrib/libs/coreml/ArrayFeatureExtractor.proto"
...
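
The issue title attributes the failure to forward slashes in the generated command. Assuming that is the cause, a minimal Python sketch of the separator normalization looks like this. The helper name `to_windows_separators` is hypothetical and is not part of the CatBoost build scripts.

```python
from pathlib import PureWindowsPath


def to_windows_separators(path: str) -> str:
    """Return the path with backslashes as separators (illustrative helper)."""
    # PureWindowsPath only manipulates the string; it never touches the
    # file system, so this runs on any operating system.
    return str(PureWindowsPath(path))


# Tool path taken from the contrib-libs-coreml.vcproj fragment above.
tool = "$(SolutionDir)$(Configuration)/contrib/tools/protoc/protoc.exe"
fixed = to_windows_separators(tool)
print(fixed)
```

Only the separators change; the MSBuild macros such as `$(SolutionDir)` pass through untouched.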

handling of unseen categorical values

How are categorical values handled if they were not seen during training?
Suppose there are 100 features, 50 of them categorical, and about 20 of those contain unseen values: is it possible to still produce a worse but usable prediction from the remaining features?
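
One generic way to keep predictions usable is to encode each category with a smoothed target statistic, so that an unseen category falls back to a prior value. The sketch below illustrates that idea only; it is not confirmed by this issue as CatBoost's actual behaviour, and the function names and the `prior` default are assumptions made for illustration.

```python
from collections import defaultdict


def fit_target_stats(categories, targets):
    """Collect per-category target sums and counts from training data."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return dict(sums), dict(counts)


def encode(category, sums, counts, prior=0.5):
    """Smoothed target statistic: (sum + prior) / (count + 1).

    A category that never appeared in training has sum 0 and count 0,
    so its encoding is exactly the prior.
    """
    return (sums.get(category, 0.0) + prior) / (counts.get(category, 0) + 1)


sums, counts = fit_target_stats(["a", "a", "b"], [1, 0, 1])
print(encode("a", sums, counts))       # seen category
print(encode("unseen", sums, counts))  # falls back to the prior
```

With this kind of encoding, the model still receives a numeric value for the unseen category, and the remaining features contribute as usual.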

Does the R package need Python?

install_github('catboost/catboost', subdir = 'catboost/R-package')
Downloading GitHub repo catboost/catboost@master
from URL https://api.github.com/repos/catboost/catboost/zipball/master
Installing catboost
"C://R-34~1.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL
"C:/
/AppData/Local/Temp/Rtmp8g9Ngq/devtools81cc29ef78b3/catboost-catboost-37378d2/catboost/R-package"
--library="C:/Users/kogayaa/Desktop/R-3.4.0/library" --install-tests

* installing *source* package 'catboost' ...
** libs
  running 'src/Makefile.win' ...
/cygdrive/c/***/AppData/Local/Temp/Rtmp8g9Ngq/devtools81cc29ef78b3/catboost-catboost-37378d2/catboost/R-package/src/../../../ya.bat make -r -o ../../..
[ya.bat] Error: Python not found
make: *** [libcatboostr.dll] Error 1
Warning: running command 'make --no-print-directory -f "Makefile.win"' had status 2
ERROR: compilation failed for package 'catboost'
* removing 'C:/Users/kogayaa/Desktop/R-3.4.0/library/catboost'
Error: Command failed (1)

[R Ubuntu]: catboost.caret

System: Ubuntu 14.04 LTS
Processor: Intel® Core™ i3-6100 CPU @ 3.70GHz × 4
Graphics: Intel® HD Graphics 530 (Skylake GT2)
OS type: 64-bit
R: 3.4.1
Python: 3.4.3

Issue with using the catboost package through the caret package.
Simplified example from https://tech.yandex.com/catboost/doc/dg/concepts/r-usages-examples-docpage/

library(caret)
library(titanic)
library(catboost)

set.seed(12345)
data <- as.data.frame(as.matrix(titanic_train), stringsAsFactors = TRUE)

drop_columns = c("PassengerId", "Survived", "Name", "Ticket", "Cabin")
x <- data[, !(names(data) %in% drop_columns)]
y <- data[, c("Survived")]

report <- train(x, as.factor(make.names(y)), method = catboost.caret)

Output:

Something is wrong; all the Accuracy metric values are missing:
Accuracy Kappa
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :12 NA's :12
Error: Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.

Invalid MAE value

The MAE value reported by CatBoost is half of the actual value.

E.g.

from catboost import CatBoostRegressor, Pool
from sklearn import metrics

# x_train, y_train, x_valid, y_valid and cat_indexes are defined elsewhere
train_pool = Pool(x_train, y_train, cat_features=cat_indexes)
valid_pool = Pool(x_valid, y_valid, cat_features=cat_indexes)
test_pool = Pool(x_valid, cat_features=cat_indexes)

model = CatBoostRegressor(iterations=10, depth=6, learning_rate=0.03, loss_function='MAE', verbose=True)
model.fit(train_pool, eval_set=valid_pool, verbose=True)
# Predict on the same validation features and compare with sklearn's MAE
y_pred = model.predict(test_pool)
sk_err = metrics.mean_absolute_error(y_valid, y_pred)
print("sklearn error:", sk_err)

Output is

bestTest = 0.0346194358
bestIteration = 9

sklearn error: 0.0692388715917

CatBoost version: 0.1.1.5
Python: 3.5.2
OS: OS X 10.12.5
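
The factor of two can be checked directly from the two numbers in the output above. The sketch below also includes a plain-Python reference definition of MAE for comparison; the helper is illustrative and does not reproduce CatBoost's or scikit-learn's implementation.

```python
def mean_absolute_error(y_true, y_pred):
    """Reference MAE: the mean of |y_true - y_pred| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Values copied from the reported output.
best_test = 0.0346194358   # bestTest printed by CatBoost
sk_err = 0.0692388715917   # sklearn mean_absolute_error

ratio = sk_err / best_test
print(round(ratio, 6))
```

A ratio this close to 2 points to a constant scaling factor in the reported metric rather than a difference in the predictions.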

Works in the command line, but not in IPython

import numpy as np
from catboost import CatBoostClassifier

# initialize data
train_data = np.random.randint(0, 100, size=(100, 10))
train_label = np.random.randint(0, 2, size=(100))
test_data = np.random.randint(0, 100, size=(50, 10))

# specify the training parameters
model = CatBoostClassifier(iterations=2, depth=2, learning_rate=1, loss_function='Logloss', verbose=True)

# train the model
model.fit(train_data, train_label, cat_features=[0,2,5], verbose=True)

# make the prediction using the resulting model
preds_class = model.predict(test_data)
preds_proba = model.predict_proba(test_data)
print("class = ", preds_class)
print("proba = ", preds_proba)

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-e29c96272905> in <module>()
      1 import numpy as np
----> 2 from catboost import CatBoostClassifier
      3 # initialize data
      4 train_data = np.random.randint(0, 100, size=(100, 10))
      5 train_label = np.random.randint(0, 2, size=(100))

ImportError: cannot import name 'CatBoostClassifier'

I installed CatBoost from source as described in the instructions.
Python 3.4.2
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

When I type "python" and then press Tab, I see several versions:
python
python python2.7-config python3.4m python3.5m python-html2text
python2 python2-config python3.4m-config python3.5m-config
python2.7 python3.4 python3.5 python-config
My OS is CentOS 7.

pip installation?

Hi all,

It would be nice to be able to install catboost with pip.

Paddy

no errors for unknown arguments to catboostclassifier

If I call CatBoostClassifier(dfdscdsc=10) in Python, I get no indication that the argument is invalid. This is annoying: for example, if I misspell an argument, it is silently ignored. It would be nice to get an error or some other indication that an unknown argument is present.
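
The requested behaviour can be sketched as a keyword check against a list of known names. This is a hypothetical illustration, not CatBoost code: `KNOWN_PARAMS` is an assumed, deliberately incomplete list built from parameters that appear elsewhere on this page, and `check_params` is an invented helper.

```python
import difflib

# Assumption: a small, incomplete set of parameter names for illustration.
KNOWN_PARAMS = ("iterations", "depth", "learning_rate", "loss_function", "verbose")


def check_params(**params):
    """Raise TypeError for any keyword that is not a known parameter."""
    problems = []
    for name in params:
        if name not in KNOWN_PARAMS:
            # Suggest the closest known name when one is similar enough.
            close = difflib.get_close_matches(name, KNOWN_PARAMS, n=1)
            hint = f" (did you mean '{close[0]}'?)" if close else ""
            problems.append(f"unknown parameter '{name}'{hint}")
    if problems:
        raise TypeError("; ".join(problems))
    return params


print(check_params(iterations=10, depth=6))
try:
    check_params(dfdscdsc=10)
except TypeError as exc:
    print("rejected:", exc)
```

Failing fast with a suggestion catches misspellings before a long training run starts.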

Add early stopping

At the moment you set the number of iterations, and you can keep the best model (use_best_model parameter).

However, there is no early stopping for the number of iterations. I should be able to stop training after a given number of iterations without a loss/score improvement on the validation set.

Please add early stopping.
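
The requested rule, stopping once the validation loss has not improved for a given number of iterations, can be sketched in plain Python. The function name and the `patience` parameter are assumptions for illustration and do not refer to an existing CatBoost option.

```python
def best_iteration_with_patience(val_losses, patience):
    """Return (best_iteration, stop_iteration) for a sequence of validation losses.

    Training stops at the first iteration that is `patience` steps past the
    best iteration seen so far; otherwise it runs to the end of the sequence.
    """
    best_loss = float("inf")
    best_iter = -1
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_iter = loss, i
        elif i - best_iter >= patience:
            return best_iter, i
    return best_iter, len(val_losses) - 1


losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
print(best_iteration_with_patience(losses, patience=3))
```

In this example, training stops before reaching the final, lower loss, which shows the usual trade-off: a small patience saves time but can miss late improvements.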

catboost widget issues

I got the following error while running the kaggle_titanic_catboost_demo.ipynb tutorial (cell id 11, "CatBoostClassifier - Model Plot using Widget"):

_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

I am using a Mac with Python 3.5.3 :: Continuum Analytics, Inc.
Working fix:

changed - line 81, catboost/widget/ipythonwidget.py
-            with open(meta_tsv, 'rb') as meta_in:
+            with open(meta_tsv, 'r') as meta_in:

changed - line 101, catboost/widget/ipythonwidget.py
-                with open(file_path, 'rb') as f:
+                with open(file_path, 'r') as f:

Please check and close this issue.
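
The behaviour behind this fix can be reproduced with the standard library alone: in Python 3, csv.reader needs a file opened in text mode. The sketch below uses a temporary file with a hypothetical name, not the widget's real meta file.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "meta.tsv")  # hypothetical file name
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerow(["name", "value"])

    # Binary mode: csv.reader receives bytes and raises csv.Error on Python 3.
    try:
        with open(path, "rb") as f:
            list(csv.reader(f, delimiter="\t"))
        binary_error = None
    except csv.Error as exc:
        binary_error = str(exc)

    # Text mode: rows are parsed as lists of strings.
    with open(path, "r", newline="") as f:
        rows_text = list(csv.reader(f, delimiter="\t"))

print("binary mode error:", binary_error)
print("text mode rows:", rows_text)
```

Passing `newline=""` when opening the file is the form recommended by the csv module documentation for text-mode files.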

Warning when algorithm diverges

library(catboost)
# https://github.com/catboost/catboost/pull/56
catboost.caret$grid <- function (x, y, len = 5, search = "grid") {
  grid <- if (search == "grid") {
    expand.grid(
      depth =  c(2, 4, 6), learning_rate = exp(-(0:len)), 
      iterations = 100, l2_leaf_reg = 1e-06, rsm = 0.9, 
      border_count = 255)
  } else {
    data.frame(
      depth = sample.int(len, len, replace = TRUE), 
      learning_rate = runif(len, min = 1e-06, max = 1), 
      iterations = rep(100, len), 
      l2_leaf_reg = sample(c(0.1, 0.001, 1e-06), len, replace = TRUE), 
      rsm = sample(c(1, 0.9, 0.8, 0.7), len, replace = TRUE), 
      border_count = sample(c(255), len, replace = TRUE))
  }
  return(if (len == 1) grid[1,] else grid)
}

data(segmentationData, package = 'caret', envir = environment())
y_col <- 'Class'
X <- segmentationData[, !names(segmentationData) %in% c(y_col, 'Case')]

pool <- catboost.from_data_frame(X, as.double(segmentationData[[y_col]]) - 1)
for (i in 1:5) {
  # model <- catboost.train(
  #   pool, calc_importance = TRUE, params = list(
  #     loss_function = 'Logloss', train_dir = 'train_dir'))
  # stopifnot(100 - sum(model$var_imp) < 1)  # works fine
  
  model <- caret::train(method = catboost.caret,
    x = segmentationData[, !names(segmentationData) %in% c(y_col)],
    y = segmentationData[[y_col]], 
    trControl = trainControl(method = "none", allowParallel = F),
    train_dir = 'train_dir')
  stopifnot(100 - sum(model$finalModel$var_imp) < 1)  # is the model fitted?
  
  print(i)
}

Sparse matrix support

Currently CatBoost does not seem to support sparse matrices, while XGBoost does. Are there any plans to add sparse matrix support to CatBoost?

Windows: Did anyone succeed in installing R package?

Error during installing from zip file:
install.packages("C:/Users/kogayaa/Downloads/catboost-master.zip", repos = NULL, type = "win.binary")
Warning in install.packages :
cannot open compressed file 'catboost-master/DESCRIPTION', probable reason 'No such file or directory'
Error in install.packages : cannot open the connection

install_github error:

install_github('catboost/catboost', subdir = 'catboost/R-package')
Downloading GitHub repo catboost/catboost@master
from URL https://api.github.com/repos/catboost/catboost/zipball/master
Installing catboost
"C:/Users/kogayaa/Desktop/R-34~1.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD
INSTALL
"C:/Users/kogayaa/AppData/Local/Temp/Rtmp8g9Ngq/devtools81cc45f52243/catboost-catboost-87e5a37/catboost/R-package"
--library="C:/Users/kogayaa/Desktop/R-3.4.0/library" --install-tests

* installing *source* package 'catboost' ...
** libs
  running 'src/Makefile.win' ...
/cygdrive/c/Users/kogayaa/AppData/Local/Temp/Rtmp8g9Ngq/devtools81cc45f52243/catboost-catboost-87e5a37/catboost/R-package/src/../../../ya.bat make -r -o ../../..
[ya.bat] Error: Python not found
make: *** [libcatboostr.dll] Error 1
Warning: running command 'make --no-print-directory -f "Makefile.win"' had status 2
ERROR: compilation failed for package 'catboost'
* removing 'C:/Users/kogayaa/Desktop/R-3.4.0/library/catboost'
Error: Command failed (1)

Add Interaction strength to python package

CatBoostClassifier Example from https://tech.yandex.com/catboost/doc/dg/concepts/python-quickstart-docpage/

I was exploring the model attributes in this example and got the following errors. Please check whether this is a bug.

model.feature_importance_

CatboostError: Invalid attribute feature_importance_: use calc_feature_importance=True in model params for use it

model.feature_importance

AttributeError: 'CatBoostClassifier' object has no attribute 'feature_importance'

By the way, why do we need two attributes for feature importance?

Error saving the model

I am trying to save a trained model:
clf.save_model(fname='cb.model')

But I get an error message:

Traceback (most recent call last):
  File "/home/myhome/PycharmProjects/123/CatBoost/catboost_train.py", line 120, in <module>
    clf.save_model(fname='cb.model')
  File "/home/myhome/virtualenv/tensorflow/lib/python3.5/site-packages/catboost/core.py", line 638, in save_model
    self._save_model(fname, format, export_parameters)
  File "_catboost.pyx", line 768, in _catboost._CatBoostBase._save_model (/home/donskov/.ya/build/build_root/6d786f6630393533326272727668366a/catboost/python-package/catboost/_catboost.pyx.cpp:14763)
  File "_catboost.pyx", line 667, in _catboost._CatBoost._save_model (/home/donskov/.ya/build/build_root/6d786f6630393533326272727668366a/catboost/python-package/catboost/_catboost.pyx.cpp:11571)
  File "_catboost.pyx", line 669, in _catboost._CatBoost._save_model (/home/donskov/.ya/build/build_root/6d786f6630393533326272727668366a/catboost/python-package/catboost/_catboost.pyx.cpp:11458)
  File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_TString (/home/donskov/.ya/build/build_root/6d786f6630393533326272727668366a/catboost/python-package/catboost/_catboost.pyx.cpp:17179)
TypeError: expected bytes, str found

Python 3.5, CatBoost 0.1.1.5

Fatal error on fit()

screenshot
import catboost - ok
create the regressor (or classifier) - ok
fit() - always error
Python 2.7 (64 bit), windows7

Problem in catboost.train: write failed

Sometimes catboost.train fails with the following error:

Error in catboost.train(pool, NULL, param, calc_importance = TRUE) :
(Cannot create a file when that file already exists.) sman/appdata/local/temp/rtmpcleqvg/devtools2120232836e1/catboost-catboost-03459cd/util/stream/output.cpp:271: write failed

For instance, it happens in your "Select hyperparameters" example on https://tech.yandex.com/catboost/doc/dg/concepts/r-usages-examples-docpage/ in this code:

report <- train(x, as.factor(make.names(y)),
method = catboost.caret,
verbose = TRUE, preProc = NULL,
tuneGrid = grid, trControl = fit_control)

Error Installing R Package

Hi, I have some difficulties while installing the package to R on Windows. Here is the error message:

> devtools::install_github('catboost/catboost', subdir = 'catboost/R-package')
Downloading GitHub repo catboost/catboost@master
from URL https://api.github.com/repos/catboost/catboost/zipball/master
Installing catboost
"C:/PROGRA~1/MICROS~1/MRO-33~1.2/bin/x64/R" --no-site-file --no-environ  \
  --no-save --no-restore --quiet CMD INSTALL  \
  "C:/Users/Aris/AppData/Local/Temp/RtmpI1WyoO/devtools35866d040ee/catboost-catboost-7e4ba38/catboost/R-package"  \
  --library="C:/Users/Aris/Documents/R/win-library/3.3" --install-tests 

* installing *source* package 'catboost' ...
** libs
  running 'src/Makefile.win' ...
/cygdrive/c/Users/Aris/AppData/Local/Temp/RtmpI1WyoO/devtools35866d040ee/catboost-catboost-7e4ba38/catboost/R-package/src/../../../ya.bat make -r -o ../../..
2017-07-19T14:22:55.521000 [MainThread] Info: Attention! Using system user-defined compiler: cl.exe (check CC and CXX env vars).

Config was not generated due to errors in C:\Users\Aris\AppData\Local\Temp\RtmpI1WyoO\devtools35866d040ee\catboost-catboost-7e4ba38\build\ymake_conf.py
ERROR:root:Could not detect cxx compiler, error is Can not find Microsoft.VCToolsVersion.default.txt

make: *** [libcatboostr.dll] Error 1
Warning: running command 'make --no-print-directory -f "Makefile.win"' had status 2
ERROR: compilation failed for package 'catboost'
* removing 'C:/Users/Aris/Documents/R/win-library/3.3/catboost'
Error: Command failed (1)

Partial dependence / model dump

Good evening!
Is there any way to intuitively explore the model?

For example, the scikit-learn gradient boosting package has the function plot_partial_dependence, which calculates the relationship between one feature and the target.
XGBoost has no such built-in function, but the user can parse the model dump and recreate a partial dependence plot or any other useful statistic from it.

What is the best way to do this in CatBoost? Will partial dependence calculation be supported? Or is there a way to transform the model into traversable trees?
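
Partial dependence for a single feature can be computed for any model that exposes a prediction function, without access to the trees. The sketch below is model-agnostic plain Python under that assumption; `partial_dependence` and `toy_model` are illustrative names, and the toy model stands in for a trained CatBoost model.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over the data with one feature fixed to each grid value."""
    result = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)          # copy so the input data is not changed
            modified[feature_index] = value
            total += predict(modified)
        result.append(total / len(rows))
    return result


def toy_model(row):
    """Stand-in for a trained model: a simple linear function of two features."""
    return 2.0 * row[0] + row[1]


rows = [[0.0, 1.0], [1.0, 3.0]]
pd_values = partial_dependence(toy_model, rows, feature_index=0, grid=[0.0, 1.0, 2.0])
print(pd_values)
```

The cost is one prediction per row per grid value, so a small grid or a sample of the data keeps it cheap.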

Problem in installing R package

> catboost.caret$fit(..., train_dir = NULL)

Error in catboost.train(pool, NULL, param, calc_importance = TRUE) : 
  /appdata/local/temp/rtmpmzktb8/devtools20103baf6fbb/catboost-catboost-bcf5caf/library/json/writer/json_value.cpp:491: Not a string 

Create #catboost tag on Stackoverflow

To create a new tag on Stack Overflow you must have at least 1500 reputation.
Maybe you could write a first post on Stack Overflow and create the #catboost tag, for instance with instructions on how to start contributing to the package?

"TypeError: expected bytes, str found" on Python 3.5

Hi
I am trying to run my own script which is based (with minor modifications) on catboost/catboost/tutorials/quora_catboost_w2v.ipynb
I get the following error message:
Traceback (most recent call last):
  File "/home/myhome/PycharmProjects/123/CatBoost/test.py", line 219, in <module>
    clf.fit(train, y_train)
  File "/home/myhome/virtualenv/tensorflow/lib/python3.5/site-packages/catboost/core.py", line 929, in fit
    self._fit(X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, plot)
  File "/home/myhome/virtualenv/tensorflow/lib/python3.5/site-packages/catboost/core.py", line 365, in _fit
    X = Pool(X, y, cat_features=cat_features, weight=sample_weight, baseline=baseline)
  File "/home/myhome/virtualenv/tensorflow/lib/python3.5/site-packages/catboost/core.py", line 94, in __init__
    self._init(data, label, cat_features, weight, baseline, feature_names)
  File "/home/myhome/virtualenv/tensorflow/lib/python3.5/site-packages/catboost/core.py", line 305, in _init
    self._init_pool(data_matrix, label, weight, baseline, feature_names)
  File "_catboost.pyx", line 361, in _catboost._PoolBase._init_pool (/home/donskov/.ya/build/build_root/78633066626b733765616e696f713972/catboost/python-package/catboost/_catboost.pyx.cpp:5141)
  File "_catboost.pyx", line 373, in _catboost._PoolBase._init_pool (/home/donskov/.ya/build/build_root/78633066626b733765616e696f713972/catboost/python-package/catboost/_catboost.pyx.cpp:5006)
  File "_catboost.pyx", line 418, in _catboost._PoolBase._set_feature_names (/home/donskov/.ya/build/build_root/78633066626b733765616e696f713972/catboost/python-package/catboost/_catboost.pyx.cpp:6629)
  File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_TString (/home/donskov/.ya/build/build_root/78633066626b733765616e696f713972/catboost/python-package/catboost/_catboost.pyx.cpp:17167)
TypeError: expected bytes, str found

Is there any way to fix it?
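
The message indicates that the compiled extension received a Python 3 str where it expected bytes. The sketch below only illustrates that distinction; whether passing bytes works around the problem in this CatBoost version is an assumption the issue does not confirm. The helper `to_bytes` and the feature names are hypothetical.

```python
def to_bytes(value, encoding="utf-8"):
    """Return `value` as bytes, encoding it first if it is a str."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


# Hypothetical feature names; in Python 3 these literals are str, not bytes.
feature_names = ["question1_len", "question2_len"]
encoded_names = [to_bytes(name) for name in feature_names]
print(encoded_names)
```

On Python 2 the same literals are already byte strings, which would explain why the error appears only on Python 3.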

Errors while compiling python package with clang on Darwin (10.12.x)

Hi,

I checked out the latest code from master and ran the following build command (because clang is no longer default on 10.12.x but /usr/bin/llvm-gcc(++) are):

env CC=/usr/bin/clang CXX=/usr/bin/clang++ ../../ya make -r -DUSE_ARCADIA_PYTHON=no -DPYTHON_CONFIG=python3-config

I get the following linking error:

------- [LD] {FAILED} $(B)/contrib/tools/yasm/yasm{, .dSYM/Contents/Resources/DWARF/yasm} command /usr/bin/clang++ -o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/yasm -mmacosx-version-min=10.9 -nodefaultlibs /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/x86regtmod.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/x86cpu.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/preprocs/raw/raw-preproc.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/preprocs/nasm/nasmlib.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/preprocs/nasm/nasm-preproc.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/preprocs/nasm/nasm-pp.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/preprocs/nasm/nasm-eval.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/preprocs/gas/gas-preproc.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/preprocs/gas/gas-eval.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/preprocs/cpp/cpp-preproc.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/parsers/nasm/nasm-parser.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/parsers/nasm/nasm-parse.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/parsers/gas/gas-parser.c.o 
/Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/parsers/gas/gas-parse.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/parsers/gas/gas-parse-intel.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/objfmts/xdf/xdf-objfmt.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/objfmts/rdf/rdf-objfmt.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/objfmts/macho/macho-objfmt.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/objfmts/elf/elf.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/objfmts/elf/elf-x86-x86.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/objfmts/elf/elf-x86-x32.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/objfmts/elf/elf-x86-amd64.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/objfmts/elf/elf-objfmt.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/objfmts/dbg/dbg-objfmt.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/objfmts/coff/win64-except.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/objfmts/coff/coff-objfmt.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/objfmts/bin/bin-objfmt.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/nasm-token.c.o 
/Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/listfmts/nasm/nasm-listfmt.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/lc3bid.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/init_plugin.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/gas-token.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/dbgfmts/stabs/stabs-dbgfmt.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/dbgfmts/null/null-dbgfmt.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/dbgfmts/dwarf2/dwarf2-line.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/dbgfmts/dwarf2/dwarf2-info.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/dbgfmts/dwarf2/dwarf2-dbgfmt.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/dbgfmts/dwarf2/dwarf2-aranges.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/dbgfmts/codeview/cv-type.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/dbgfmts/codeview/cv-symline.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/dbgfmts/codeview/cv-dbgfmt.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/arch/x86/x86id.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/arch/x86/x86expr.c.o 
/Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/arch/x86/x86bc.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/arch/x86/x86arch.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/arch/lc3b/lc3bbc.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/modules/arch/lc3b/lc3barch.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/xstrdup.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/xmalloc.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/value.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/valparam.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/symrec.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/strsep.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/strcasecmp.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/section.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/phash.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/mergesort.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/md5.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/linemap.c.o 
/Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/inttree.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/intnum.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/insn.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/hamt.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/floatnum.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/file.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/expr.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/errwarn.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/cmake-module.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/bytecode.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/bitvect.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/bc-reserve.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/bc-org.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/bc-incbin.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/bc-data.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/bc-align.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/libyasm/assocdat.c.o 
/Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/frontends/yasm/yasm.c.o /Volumes/Users/arnaudsj/.ya/build/build_root/793165623763396d6773336f73786f69/contrib/tools/yasm/frontends/yasm/yasm-options.c.o -lpthread -lpthread DARWIN -lc -lm -Wl,-macosx_version_min -Wl,10.9 failed with exit code 1 clang: error: no such file or directory: 'DARWIN' Failed

Looking at the last portion of the output, it seems some parameters are passed to the linker incorrectly: "-lpthread -lpthread DARWIN", when it should probably be simply "-lpthread". I am not very familiar with ya, so I welcome suggestions on how to fix this.

Thank you!

Invalid DESCRIPTION file during R installation. Did anyone succeed in installing the R package?

install_github('catboost/catboost', subdir = 'catboost/R-package')
Downloading GitHub repo catboost/catboost@master
from URL https://api.github.com/repos/catboost/catboost/zipball/master
Installing catboost
"C://R-34~1.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL
"C:/
/Temp/Rtmp8g9Ngq/devtools81cc1e1c2710/catboost-catboost-8fde831/catboost/R-package"
--library="C:/***R-3.4.0/library" --install-tests

  • installing source package 'catboost' ...
    Error : Invalid 'DESCRIPTION' file


ERROR: installing package DESCRIPTION failed for package 'catboost'

  • removing 'C:/***/R-3.4.0/library/catboost'
    Error: Command failed (1)

QuickStart fails on Windows, Python 3.5

Successful installation:
C:\Users\asuil_000>pip install catboost
Collecting catboost
Downloading catboost-0.1.1.2-py3-none-win_amd64.whl (2.3MB)
100% |################################| 2.3MB 361kB/s
Requirement already satisfied: numpy in c:\app\anaconda3\lib\site-packages (from catboost)
Requirement already satisfied: pandas in c:\app\anaconda3\lib\site-packages (from catboost)
Requirement already satisfied: six in c:\app\anaconda3\lib\site-packages (from catboost)
Requirement already satisfied: python-dateutil>=2 in c:\app\anaconda3\lib\site-packages (from pandas->catboost)
Requirement already satisfied: pytz>=2011k in c:\app\anaconda3\lib\site-packages (from pandas->catboost)
Installing collected packages: catboost
Successfully installed catboost-0.1.1.2

Error after running the first example on the QuickStart page:


ImportError Traceback (most recent call last)
C:\app\Anaconda3\lib\site-packages\catboost\core.py in ()
17 try:
---> 18 from _catboost import _PoolBase, _CatBoostBase, CatboostError, _cv, _set_logger, _reset_logger
19 except ImportError:

ImportError: No module named '_catboost'

During handling of the above exception, another exception occurred:

ImportError Traceback (most recent call last)
in ()
1 import numpy as np
----> 2 from catboost import CatBoostClassifier
3 # initialize data
4 train_data = np.random.randint(0, 100, size=(100, 10))
5 train_label = np.random.randint(0, 2, size=(100))

C:\app\Anaconda3\lib\site-packages\catboost\__init__.py in ()
----> 1 from .core import Pool, CatBoost, CatBoostClassifier, CatBoostRegressor, CatboostError, cv # noqa
2 try:
3 from .widget import CatboostIpythonWidget # noqa
4 except:
5 pass

C:\app\Anaconda3\lib\site-packages\catboost\core.py in ()
18 from _catboost import _PoolBase, _CatBoostBase, CatboostError, _cv, _set_logger, _reset_logger
19 except ImportError:
---> 20 from ._catboost import _PoolBase, _CatBoostBase, CatboostError, _cv, _set_logger, _reset_logger
21
22 from contextlib import contextmanager

ImportError: DLL load failed: The specified module could not be found.

Sparse matrix

Does catboost support sparse matrix representation?

Issue on param class_weights in CatBoostClassifier

I have some trouble using class weights for binary classification. I can't share the dataset because of confidentiality, but the problem looks like it is in the algorithm itself. I tried to track the problem in the source, but ended up at the file catboost/app/mode_fit.cpp and can't find where the problem is.

Using the value [1, 13], i.e. 13 for the positive class, works (AUC ~ 0.91) but is slightly worse than training without class weighting. XGBoost performs better, ~0.93, with [1, 15] class weights.
But the training process gets stuck at AUC 0.5 if I set 15 for the positive class:

import numpy as np
import catboost as cgb  # alias used in this report

class_weights = np.array([1., 15.])
# class_weights = (class_weights / np.sum(class_weights))
class_weights = class_weights.tolist()
print(class_weights)
random_seed = 123456
model = cgb.CatBoostClassifier(iterations=500, learning_rate=0.1, depth=5, l2_leaf_reg=2, rsm=1, class_weights=class_weights,
                               auto_stop_pval=1e-4, random_seed=random_seed, eval_metric='AUC')
model.fit(train_X, train_Y, eval_set=(test_X, test_Y), verbose=True, plot=True)

No module named '_catboost'

Anaconda Windows x64, Python 3.5, install was ok:

pip install catboost

Jupyter Notebook code:

import numpy as np
from catboost import CatBoostClassifier

I got this:

ImportError: No module named '_catboost'

and this:

ImportError: DLL load failed: The specified module could not be found.

What is "ya.make"? Please use build files for a normal build system. Thanks!

Good day!

What is "ya.make"?

How can I build this project without downloading opaque binary files of doubtful origin?

There are some normal build systems: gmake, cmake, meson, etc.

Please use these tools (or some other regular build system) for open source projects (and please continue to use "ya.make" for Yandex-internal projects).

Thank you in advance!

Pool class using cat_features could not convert string to float error

Hi,
I am using the Python Pool class for my string categorical features.
cat_index=[22, 32, 34, 49, 55]
train_pool = Pool(x_train, y_train, cat_features=cat_index)

I am getting the following error for the string LARS.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-36-5e1830151743> in <module>()
----> 1 train_pool = Pool(x_train, y_train, cat_features=cat_index)
      2 #test_pool = Pool(test_data, cat_features=cat_index)

/home/dorukhan/.local/lib/python2.7/site-packages/catboost/core.pyc in __init__(self, data, label, cat_features, column_description, delimiter, has_header, weight, baseline, feature_names, thread_count)
     92                 self._read(data, column_description, delimiter, has_header, thread_count)
     93             else:
---> 94                 self._init(data, label, cat_features, weight, baseline, feature_names)
     95         super(Pool, self).__init__()
     96 

/home/dorukhan/.local/lib/python2.7/site-packages/catboost/core.pyc in _init(self, data_matrix, label, cat_features, weight, baseline, feature_names)
    303         if feature_names is not None:
    304             self._check_feature_names(feature_names, data_shape[1])
--> 305         self._init_pool(data_matrix, label, weight, baseline, feature_names)
    306 
    307 

_catboost.pyx in _catboost._PoolBase._init_pool (/home/donskov/.ya/build/build_root/7970737a6565726b34656c346638377a/catboost/python-package/catboost/_catboost.pyx.cpp:5141)()

_catboost.pyx in _catboost._PoolBase._init_pool (/home/donskov/.ya/build/build_root/7970737a6565726b34656c346638377a/catboost/python-package/catboost/_catboost.pyx.cpp:4860)()

_catboost.pyx in _catboost._PoolBase._set_data (/home/donskov/.ya/build/build_root/7970737a6565726b34656c346638377a/catboost/python-package/catboost/_catboost.pyx.cpp:5586)()

ValueError: could not convert string to float: LARS

Python: Python 2.7.13 :: Anaconda custom (64-bit)
Installed with: pip install catboost

Thanks.
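
The message itself is what Python raises when a non-numeric string reaches a float conversion. Before building the Pool, it can help to check which columns hold such strings and whether every one of them is listed in cat_features. The sketch below uses only the standard library; `non_numeric_columns` and the sample rows are hypothetical and not part of CatBoost.

```python
def non_numeric_columns(rows, cat_features):
    """Return indices of columns that contain a value float() rejects
    and that are not declared in cat_features."""
    declared = set(cat_features)
    offending = set()
    for row in rows:
        for idx, value in enumerate(row):
            if idx in declared:
                continue
            try:
                float(value)
            except (TypeError, ValueError):
                offending.add(idx)
    return sorted(offending)


# Hypothetical sample: column 1 holds strings but only column 2 is declared.
sample_rows = [
    [1.0, "LARS", "a", 3],
    [2.5, "OSLO", "b", 4],
]
undeclared = non_numeric_columns(sample_rows, cat_features=[2])
print(undeclared)
```

An empty result would mean every string column is already declared, which would point the search elsewhere.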

Error during installing in RStudio

I get the following error:

Downloading GitHub repo catboost/catboost@master
from URL https://api.github.com/repos/catboost/catboost/zipball/master
Installing catboost
"C:/PROGRA1/R/R-341.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL
"C:/Users/yahooomg/AppData/Local/Temp/RtmpqyHggA/devtools24941ce5572/catboost-catboost-7e4ba38/catboost/R-package"
--library="C:/Program Files/R/R-3.4.0/library" --install-tests

  • installing source package 'catboost' ...
    ** libs
    running 'src/Makefile.win' ...
    /cygdrive/c/Users/yahooomg/AppData/Local/Temp/RtmpqyHggA/devtools24941ce5572/catboost-catboost-7e4ba38/catboost/R-package/src/../../../ya.bat make -r -o ../../..
    2017-07-19T08:15:12.209000 [MainThread] Info: Attention! Using system user-defined compiler: cl.exe (check CC and CXX env vars).
    2017-07-19T08:15:12.210000 [MainThread] Info: will fetch 'YMake' from sandbox

Traceback (most recent call last):
  File "devtools/ya/entry/entry.py", line 157, in exit_interceptor
    func()
  File "devtools/ya/entry/entry.py", line 56, in <lambda>
    return lambda: wrapper(f)
  File "devtools/ya/entry/entry.py", line 121, in f
    res = func()
  File "devtools/ya/entry/entry.py", line 250, in <lambda>
    run_main = lambda: do_main(args)
  File "devtools/ya/entry/entry.py", line 49, in do_main
    res = handler.handle(handler, args, prefix=['ya'])
  File "devtools/ya/core/handler.py", line 157, in handle
    return handler.handle(self, args[1:], prefix + [name])
  File "devtools/ya/core/dispatch.py", line 37, in handle
    return self.command().handle(root_handler, args, prefix)
  File "devtools/ya/core/handler.py", line 337, in handle
    return self._action(params)
  File "devtools/ya/app.py", line 64, in helper
    return action(ctx.params)
  File "devtools/ya/build/build_handler.py", line 11, in do_ya_make
    return YaMake(params, app_ctx).go()
  File "devtools/ya/build/ya_make.py", line 519, in __init__
    self.ctx = Context(self.opts, app_ctx=app_ctx, graph=graph, tests=tests, configure_errors=configure_errors, make_files=make_files)
  File "devtools/ya/build/ya_make.py", line 347, in __init__
    self.graph, self.tests, self.configure_errors, self.make_files = _build_graph_and_tests(self.opts, app_ctx)
  File "devtools/ya/build/ya_make.py", line 282, in _build_graph_and_tests
    graph, tests, gh, make_files = lg.build_graph_and_tests(opts, check=True, ev_listener=ev_listener)
  File "devtools/ya/build/graph.py", line 1372, in build_graph_and_tests
    real_ymake_bin = ct.tool('ymake')
  File "devtools/ya/core/tools.py", line 214, in tool
    return toolchain.find(name, with_params, for_platform)
  File "devtools/ya/core/tools.py", line 152, in find
    executable = cur_bottle[executable_name] # if executable_name is None it's Ok
  File "devtools/ya/core/tools.py", line 60, in __getitem__
    path = self.resolve()
  File "devtools/ya/core/tools.py", line 42, in resolve
    return self.__fetcher.fetch_if_need(self.__match, tared, binname).where
  File "devtools/ya/yalibrary/fetcher/__init__.py", line 286, in fetch_if_need
    self.__c[key] = self._fetch_if_need(*args, **kwargs)
  File "devtools/ya/yalibrary/fetcher/__init__.py", line 296, in _fetch_if_need
    if self._fetch(name, tared, lambda x: name.lower() in x.lower(), binname):
  File "devtools/ya/yalibrary/fetcher/__init__.py", line 276, in _fetch
    _install(self.where, do_install)
  File "devtools/ya/yalibrary/fetcher/__init__.py", line 71, in _install
    func(install_guard)
  File "devtools/ya/yalibrary/fetcher/__init__.py", line 239, in do_install
    http_client.download_file(url=config.mapping()["resources"][by_platform[best]["id"]], path=download_to)
  File "devtools/ya/exts/retry.py", line 49, in wrapper
    return retry(proxy_func, **retry_kwargs)
  File "devtools/ya/exts/retry.py", line 22, in retry
    return func()
  File "devtools/ya/exts/retry.py", line 48, in <lambda>
    proxy_func = lambda: func(*args, **kwargs)
  File "devtools/ya/exts/http_client.py", line 56, in download_file
    res = urllib2.urlopen(request, timeout=timeout)
  File "contrib/tools/python/src/Lib/urllib2.py", line 154, in urlopen
  File "contrib/tools/python/src/Lib/urllib2.py", line 431, in open
  File "contrib/tools/python/src/Lib/urllib2.py", line 449, in _open
  File "contrib/tools/python/src/Lib/urllib2.py", line 409, in _call_chain
  File "contrib/tools/python/src/Lib/urllib2.py", line 1227, in http_open
  File "contrib/tools/python/src/Lib/urllib2.py", line 1194, in do_open
  File "contrib/tools/python/src/Lib/httplib.py", line 1057, in request
  File "contrib/tools/python/src/Lib/httplib.py", line 1097, in _send_request
  File "contrib/tools/python/src/Lib/httplib.py", line 1053, in endheaders
  File "contrib/tools/python/src/Lib/httplib.py", line 889, in _send_output
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcf in position 25: ordinal not in range(128)
make: *** [libcatboostr.dll] Error 1
Warning: running command 'make --no-print-directory -f "Makefile.win"' had status 2
ERROR: compilation failed for package 'catboost'

  • removing 'C:/Program Files/R/R-3.4.0/library/catboost'
    Installation failed: Command failed (1)

Can anybody help with this?
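
The final UnicodeDecodeError is the kind of failure Python raises when a byte string containing a non-ASCII byte is decoded with the ASCII codec. The sketch below (written for Python 3) reproduces that class of error in isolation. It assumes, without confirmation from the log, that the offending byte came from a cp1251-encoded Cyrillic character (0xcf is 'П' in cp1251); the sample bytes are made up.

```python
# Hypothetical byte string: 25 ASCII bytes followed by a cp1251-encoded 'П'.
raw = b"x" * 25 + "П".encode("cp1251")

failure = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    failure = exc

# Same byte value and position as in the report above.
print(hex(raw[failure.start]), failure.start)

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode("cp1251")
print(text[-1])
```

If a non-ASCII user name, path, or environment value is involved, building from an ASCII-only location may be worth trying, though the log does not show which value carried the byte.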

Error when calling fit(plot=True), Jupyter, Python 3.5, Ubuntu

import ipywidgets; print(ipywidgets.__version__)
import IPython; print(IPython.__version__)
X = [[1, 2], [3, 4]]
y = [1, 2]
ckf = cgb.CatBoostRegressor()
ckf.fit(X, y, plot=True)

Outputs:

6.0.0
6.1.0

/usr/local/lib/python3.5/dist-packages/catboost/core.py:385: UserWarning: For drow plots in fit() method you should install ipywidgets and ipython
  warnings.warn("For drow plots in fit() method you should install ipywidgets and ipython")

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/usr/local/lib/python3.5/dist-packages/catboost/core.py in _fit(self, X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, plot)
    380             try:
--> 381                 from widget import CatboostIpythonWidget
    382                 widget = CatboostIpythonWidget(train_dir)

ImportError: No module named 'widget'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-32-cc8632cdb220> in <module>()
      4 y = [1, 2]
      5 ckf = cgb.CatBoostRegressor()
----> 6 ckf.fit(X, y, plot=True)

/usr/local/lib/python3.5/dist-packages/catboost/core.py in fit(self, X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, plot)
    432         model : CatBoost
    433         """
--> 434         return self._fit(X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, plot)
    435 
    436     def _predict(self, data, weight, prediction_type, ntree_limit, verbose):

/usr/local/lib/python3.5/dist-packages/catboost/core.py in _fit(self, X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, plot)
    384             except ImportError as e:
    385                 warnings.warn("For drow plots in fit() method you should install ipywidgets and ipython")
--> 386                 raise ImportError(e.message)
    387         with log_fixup():
    388             self._train(X, eval_set, params)

AttributeError: 'ImportError' object has no attribute 'message'
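
The last error in this chain comes from `e.message`: Python 3 exceptions do not have a `message` attribute, so the handler itself fails. Below is a minimal sketch of a handler that works on both Python 2 and 3 by using `str(e)` instead. The function and module names are made-up placeholders, and this is an illustration rather than the library's actual fix.

```python
def import_optional(module_name):
    """Import a module, re-raising ImportError with a portable message."""
    try:
        return __import__(module_name)
    except ImportError as e:
        # str(e) works on Python 2 and 3; e.message exists only on Python 2.
        raise ImportError(str(e))


caught = None
try:
    import_optional("hypothetical_missing_widget_module")
except ImportError as err:
    caught = err

print(type(caught).__name__, caught)
```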

TypeError: expected bytes, str found

From the documentation I assumed that a pandas DataFrame is a valid input format for CatBoostClassifier(). However, when I run the code

m4 = catboost.CatBoostClassifier()
m4.fit(X_train, y_train.factorize()[0])

it gives an error message TypeError: expected bytes, str found. Note that type(X_train) gives pandas.core.frame.DataFrame.

If I change the code to

m4 = catboost.CatBoostClassifier()
m4.fit(X_train.as_matrix(), y_train.factorize()[0])

then everything is fine.

Does the R package need the Visual Studio compiler only?

config was not generated due to errors in C:\xxx\Temp\RtmpymqlxK\devtools3e9853651dec\catboost-catboost-87e5a37\build\ymake_conf.py
ERROR:root:Could not detect cxx compiler, error is [Error 2]

Problem with multiclass

clf = cb.CatBoostClassifier()
score = cross_val_score(clf,x_train2,y,cv=5,scoring='neg_log_loss')
score.mean()

/usr/local/lib/python2.7/dist-packages/catboost/core.pyc in _fit(self, X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, plot)
386 raise ImportError(e.message)
387 with log_fixup():
--> 388 self._train(X, eval_set, params)
389 if calc_feature_importance:
390 setattr(self, "feature_importance", self.feature_importances(X))

_catboost.pyx in _catboost._CatBoostBase._train (/home/rnefyodov/.ya/build/build_root/7466703371396a7832336d7773326674/catboost/python-package/catboost/_catboost.pyx.cpp:13630)()

_catboost.pyx in _catboost._CatBoost._train (/home/rnefyodov/.ya/build/build_root/7466703371396a7832336d7773326674/catboost/python-package/catboost/_catboost.pyx.cpp:9713)()

_catboost.pyx in _catboost._CatBoost._train (/home/rnefyodov/.ya/build/build_root/7466703371396a7832336d7773326674/catboost/python-package/catboost/_catboost.pyx.cpp:9507)()

CatboostError: catboost/libs/algo/train_model.cpp:109: All targets are greater than border
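
The message suggests that a binary loss is being applied to labels that all fall on one side of its threshold. The sketch below illustrates that situation with plain Python. The 0.5 border and the sample labels are assumptions for illustration, and `binarize` is a hypothetical helper, not CatBoost code.

```python
def binarize(targets, border=0.5):
    """Map each target to 1 if it exceeds the border, else 0."""
    return [1 if t > border else 0 for t in targets]


# Hypothetical multiclass labels encoded as 1, 2, 3.
labels = [1, 2, 3, 1, 2, 3]
binary = binarize(labels)

# Every label exceeds the border, so only one class remains.
print(sorted(set(binary)))
```

If that is what happens here, requesting a multiclass objective, for example `loss_function='MultiClass'`, would be the natural thing to try.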

caret wrapper failing with numerical target.

Hello,

Thank you very much for making catboost available and for all your work on this. Unfortunately the catboost.caret wrapper seems to fail even in very simple examples when using numerical responses. For example:

library(caret)
require(catboost)

set.seed(12345)

target_ <- sample(c(1, 2, 3), size = 1000, replace = TRUE)

data <- data.frame(f_numeric = target_ + rnorm(length(target_), mean = 0, sd = 1),
                   f_logical = (target_ + rnorm(length(target_), mean = 0, sd = 1)) > 0,
                   f_factor = as.factor(round(10 * (target_ + rnorm(length(target_), mean = 0, sd = 1)))),
                   f_character = as.character(round(10 * (target_ + rnorm(length(target_), mean = 0, sd = 1)))))

data$f_logical = as.factor(data$f_logical)
data$f_character = as.factor(data$f_character)

data$target <- target_ # as.factor(make.names(target_))

fit_control <- trainControl(method = "cv",    number = 4,   classProbs = TRUE)

grid <- expand.grid(depth = c(4, 6), learning_rate = c(0.01, 0.1, 0.2), iterations = 10)

report <- train(target ~ f_numeric + f_logical + f_factor,    data = data,   
                       method = catboost.caret, verbose = FALSE,   preProc = NULL, 
                       tuneGrid = grid,   trControl = fit_control)
Something is wrong; all the RMSE metric values are missing:
      RMSE        Rsquared        MAE     
 Min.   : NA   Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA   Max.   : NA  
 NA's   :6     NA's   :6     NA's   :6     

In addition, would it be possible to make the l2_leaf_reg and rsm parameters tunable? These seem to be the two most obvious parameters for controlling over-fitting.
Thanks again for all your work on catboost!

Java library with Maven support

Given the inferiority of Python compared with proper languages/ecosystems like Java/Kotlin, please add support for CatBoost as a Java module published in the Maven Central / JCenter repository.

XGBoost already has such an implementation, called XGBoost4J: https://github.com/dmlc/xgboost/tree/master/jvm-packages

The Java ecosystem even has a very popular Keras-like framework, DeepLearning4J: http://deeplearning4j.org

P.S. Please embed all platform-specific native libraries into a single JAR, so that Java developers don't have to recompile your project during deployment or change pom.xml to support different CPUs/GPUs in development and production environments.

Problem with save_model

I am having a problem with save_model, although I was able to pickle and unpickle the model.

classifier.save_model(fileName, format="cbm")
Traceback (most recent call last):

File "", line 1, in
classifier.save_model(fileName, format="cbm")

File "C:\Anaconda3\lib\site-packages\catboost\core.py", line 638, in save_model
self._save_model(fname, format, export_parameters)

File "_catboost.pyx", line 768, in _catboost._CatBoostBase._save_model (c:\users\donskov\.ya\build\build_root\37396b7a77396336667370646e6e6a34\catboost\python-package\catboost\_catboost.pyx.cpp:14763)

File "_catboost.pyx", line 667, in _catboost._CatBoost._save_model (c:\users\donskov\.ya\build\build_root\37396b7a77396336667370646e6e6a34\catboost\python-package\catboost\_catboost.pyx.cpp:11571)

File "_catboost.pyx", line 669, in _catboost._CatBoost._save_model (c:\users\donskov\.ya\build\build_root\37396b7a77396336667370646e6e6a34\catboost\python-package\catboost\_catboost.pyx.cpp:11458)

File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_TString (c:\users\donskov\.ya\build\build_root\37396b7a77396336667370646e6e6a34\catboost\python-package\catboost\_catboost.pyx.cpp:17179)

TypeError: expected bytes, str found
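
The error comes from a compiled layer that expects a byte string while Python 3 passes `str`. One thing to try is encoding the file name before the call. Below is a minimal sketch of that conversion using only the standard library. `as_bytes_path` and the file name are hypothetical, and whether `save_model` in this version accepts a bytes file name is an assumption that has not been verified.

```python
import os


def as_bytes_path(path):
    """Return the path as bytes, encoding str with the filesystem encoding."""
    return path if isinstance(path, bytes) else os.fsencode(path)


file_name = "model.cbm"  # hypothetical file name
encoded = as_bytes_path(file_name)
print(type(encoded).__name__)
```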

Quickstart example fails on Python 3.5

This example https://tech.yandex.com/catboost/doc/dg/concepts/python-quickstart-docpage/ fails with the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-e28159aea7ef> in <module>()
      4 model = CatBoostClassifier(iterations=2, depth=2, learning_rate=1, loss_function='Logloss', verbose=True)
      5 model.fit(train_data, train_label, cat_features=[0,2,5], verbose=True)
----> 6 preds_class = model.predict(test_data)
      7 preds_proba = model.predict_proba(test_data)

~/shub/memex/hh-page-classifier/venv/lib/python3.5/site-packages/catboost/core.py in predict(self, data, weight, prediction_type, ntree_limit, verbose)
    964         prediction : numpy.array
    965         """
--> 966         return self._predict(data, weight, prediction_type, ntree_limit, verbose)
    967
    968     def predict_proba(self, data, weight=None,  ntree_limit=0, verbose=None):

~/shub/memex/hh-page-classifier/venv/lib/python3.5/site-packages/catboost/core.py in _predict(self, data, weight, prediction_type, ntree_limit, verbose)
    435
    436     def _predict(self, data, weight, prediction_type, ntree_limit, verbose):
--> 437         verbose = verbose or self.get_param('verbose')
    438         if verbose is None:
    439             verbose = False

~/shub/memex/hh-page-classifier/venv/lib/python3.5/site-packages/catboost/core.py in get_param(self, key)
    630             The param value of the key, returns None if param do not exist.
    631         """
--> 632         params = self.get_params()
    633         if params is None:
    634             return {}

~/shub/memex/hh-page-classifier/venv/lib/python3.5/site-packages/catboost/core.py in get_params(self, deep)
    644             Dictionary of {param_key: param_value}.
    645         """
--> 646         return self._get_params()
    647
    648     def set_params(self, **params):

_catboost.pyx in _catboost._CatBoostBase._get_params (/Users/rnefyodov/.ya/build/build_root/726f38643039646b323639696e6f6e64/catboost/python-package/catboost/_catboost.pyx.cpp:15539)()

_catboost.pyx in _catboost._CatBoost._get_params (/Users/rnefyodov/.ya/build/build_root/726f38643039646b323639696e6f6e64/catboost/python-package/catboost/_catboost.pyx.cpp:12194)()

_catboost.pyx in _catboost._CatBoost._get_params (/Users/rnefyodov/.ya/build/build_root/726f38643039646b323639696e6f6e64/catboost/python-package/catboost/_catboost.pyx.cpp:12049)()

/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    310     if not isinstance(s, str):
    311         raise TypeError('the JSON object must be str, not {!r}'.format(
--> 312                             s.__class__.__name__))
    313     if s.startswith(u'\ufeff'):
    314         raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",

TypeError: the JSON object must be str, not 'bytes'

Python 3.5.1 on OS X, catboost-0.1.1.2-py3-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
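
The traceback ends inside `json.loads`, which on Python 3.5 rejects `bytes` input (later Python versions accept it). Below is a minimal sketch of a wrapper that decodes first. `loads_compat` is a hypothetical helper, UTF-8 is assumed as the encoding, and this is not presented as the library's actual fix.

```python
import json


def loads_compat(raw):
    """Parse JSON from either str or UTF-8 encoded bytes."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


# Hypothetical serialized parameters, as a compiled layer might return them.
params = loads_compat(b'{"iterations": 2, "depth": 2}')
print(params["iterations"], params["depth"])
```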

Error in "iterations"-key when passing dict as parameter

There seems to be an error when passing a dict as a parameter to the booster. Apparently, the "iterations" key gets set to a dict of the other parameters instead of its own value.
from catboost import CatBoostRegressor
import numpy as np

cat_params = {'auto_stop_pval': 0,
              'depth': 6,
              'feature_border_type': 'MinEntropy',
              'has_time': False,
              'iterations': 1000,
              'l2_leaf_reg': 3,
              'learning_rate': 0.03,
              'loss_function': 'RMSE',
              'name': 'experiment',
              'random_strength': 1,
              'rsm': 1,
              'store_all_simple_ctr': False,
              'use_best_model': False,
              'verbose': False}

cat = CatBoostRegressor(cat_params)

cat.get_params()

[Out]
{'auto_stop_pval': 0,
'depth': 6,
'feature_border_type': 'MinEntropy',
'has_time': False,
'iterations': {'auto_stop_pval': 0,
'depth': 6,
'feature_border_type': 'MinEntropy',
'has_time': False,
'l2_leaf_reg': 3,
'learning_rate': 0.03,
'loss_function': 'RMSE',
'name': 'experiment',
'random_strength': 1,
'rsm': 1,
'store_all_simple_ctr': False,
'use_best_model': False,
'verbose': False},
'l2_leaf_reg': 3,
'learning_rate': 0.03,
'loss_function': 'RMSE',
'name': 'experiment',
'random_strength': 1,
'rsm': 1,
'store_all_simple_ctr': False,
'use_best_model': False,
'verbose': False}

This obviously won't work, so when trying to fit the booster, the following error appears:
---------------------------------------------------------------------------
CatboostError Traceback (most recent call last)
in ()
----> 1 catreg = cat.fit(X, y, eval_set= (X_val, y_val))

/home/thomas/anaconda3/lib/python3.6/site-packages/catboost/core.py in fit(self, X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, plot)
432 model : CatBoost
433 """
--> 434 return self._fit(X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, plot)
435
436 def _predict(self, data, weight, prediction_type, ntree_limit, verbose):

/home/thomas/anaconda3/lib/python3.6/site-packages/catboost/core.py in _fit(self, X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, plot)
386 raise ImportError(e.message)
387 with log_fixup():
--> 388 self._train(X, eval_set, params)
389 if calc_feature_importance:
390 setattr(self, "feature_importance", self.feature_importances(X))

_catboost.pyx in _catboost._CatBoostBase._train (/home/rnefyodov/.ya/build/build_root/676e74667a3979746c773265657a6d73/catboost/python-package/catboost/_catboost.pyx.cpp:13630)()

_catboost.pyx in _catboost._CatBoost._train (/home/rnefyodov/.ya/build/build_root/676e74667a3979746c773265657a6d73/catboost/python-package/catboost/_catboost.pyx.cpp:9713)()

_catboost.pyx in _catboost._CatBoost._train (/home/rnefyodov/.ya/build/build_root/676e74667a3979746c773265657a6d73/catboost/python-package/catboost/_catboost.pyx.cpp:9507)()

CatboostError: library/json/writer/json_value.cpp:470: Not an integer

A quick fix is to call set_params() after passing the dict but before training.
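
The output above is consistent with the dict being bound to the constructor's first positional parameter. The sketch below shows that mechanism with a stand-in class. `FakeRegressor` and its signature are hypothetical and only assume that `iterations` comes first. Unpacking the dict with `**` binds each key to its own parameter.

```python
class FakeRegressor:
    """Stand-in with a keyword-style constructor (not the CatBoost class)."""

    def __init__(self, iterations=500, learning_rate=0.03, depth=6):
        self.params = {
            "iterations": iterations,
            "learning_rate": learning_rate,
            "depth": depth,
        }


cat_params = {"iterations": 1000, "learning_rate": 0.03, "depth": 6}

# Positional: the whole dict lands in the first parameter.
wrong = FakeRegressor(cat_params)

# Keyword unpacking: each key reaches the parameter of the same name.
right = FakeRegressor(**cat_params)

print(wrong.params["iterations"] is cat_params, right.params["iterations"])
```

Whether keyword unpacking behaves the same way with the real constructor in this version would need to be checked against the installed package.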
