Comments (6)
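@gutouyu xgboost does not support categorical features. Before training the model you have to preprocess them yourself: depending on the form of the feature, choose one-hot encoding (unordered) or label encoding (ordered).
When a categorical feature has a very large number of distinct values, one-hot encoding becomes very sparse and may perform poorly. In that case you can train a vector for the category with a neural network, or encode it some other way.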
from tgboost.
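In other words, whatever you pass in must be numeric. Say you have a day-of-week feature with the value "Wednesday": xgboost cannot take a non-numeric value like that directly. Whether you encode it as 0010000 (unordered) or as 3 (ordered, with the seven days of the week numbered 1 to 7) is something you have to settle by running experiments and comparing the results. Either way, after encoding it is no longer a non-numeric value like "Wednesday" but a numeric one.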
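To make the two options concrete, here is a minimal sketch of both encodings for the day-of-week example. The helper names `ordinal_encode` and `one_hot_encode` are made up for illustration and are not part of xgboost or tgboost; the week is assumed to start on Monday.

```python
# Hypothetical helpers for illustration only; not an xgboost or tgboost API.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def ordinal_encode(day):
    """Ordered encoding: Monday -> 1, ..., Sunday -> 7."""
    return WEEKDAYS.index(day) + 1


def one_hot_encode(day):
    """Unordered encoding: a 7-element 0/1 vector with a single 1."""
    return [1 if d == day else 0 for d in WEEKDAYS]


print(ordinal_encode("Wednesday"))  # 3
print(one_hot_encode("Wednesday"))  # [0, 0, 1, 0, 0, 0, 0]
```

Either output can be fed to a model that only accepts numbers. The ordinal form implies that Tuesday lies "between" Monday and Wednesday, while the one-hot form makes no such claim, which is why the choice is worth testing empirically.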
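@liudragonfly Does "xgboost does not support categorical features" mean that such features must be encoded before they can be fed in? My understanding was that a library supports them if it can handle ordered and unordered categories automatically, and does not support them otherwise.
Any guidance would be appreciated.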
@liudragonfly Got it, thank you very much.
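The implementation in the tgboost-python branch does not handle categorical features: it treats every input as a numeric feature, so users have to preprocess categorical features themselves. The implementation in the master branch does support categorical features:
Handle categorical features: TGBoost orders the values of a categorical feature by their statistic (Gradient_sum / Hessian_sum) on each tree node, then conducts split finding as for a numeric feature.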
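The ordering idea described above can be sketched roughly as follows. This is not TGBoost's actual code: the function names, the `reg_lambda` parameter, and the gain formula (the usual second-order gain without the 1/2 factor or a complexity penalty) are assumptions made for illustration, and every category is assumed to have a positive Hessian sum.

```python
from collections import defaultdict


def order_categories(categories, grads, hessians):
    """Sum gradients and Hessians per category, then sort categories by G/H."""
    g_sum = defaultdict(float)
    h_sum = defaultdict(float)
    for c, g, h in zip(categories, grads, hessians):
        g_sum[c] += g
        h_sum[c] += h
    # Assumes h_sum[c] > 0 for every category seen on this node.
    ordered = sorted(g_sum, key=lambda c: g_sum[c] / h_sum[c])
    return ordered, g_sum, h_sum


def best_categorical_split(categories, grads, hessians, reg_lambda=1.0):
    """Scan the ordered categories left to right, as for a numeric feature."""
    ordered, g_sum, h_sum = order_categories(categories, grads, hessians)
    g_total = sum(g_sum.values())
    h_total = sum(h_sum.values())

    def score(g, h):
        return g * g / (h + reg_lambda)

    best_gain, best_left = 0.0, None
    g_left = h_left = 0.0
    # The last position would put every category on the left, so skip it.
    for i, c in enumerate(ordered[:-1]):
        g_left += g_sum[c]
        h_left += h_sum[c]
        gain = (score(g_left, h_left)
                + score(g_total - g_left, h_total - h_left)
                - score(g_total, h_total))
        if gain > best_gain:
            best_gain, best_left = gain, ordered[:i + 1]
    return best_left, best_gain


# Toy data: per-sample category, gradient, and Hessian on one tree node.
cats = ["c", "a", "b", "a", "c", "b"]
grads = [2.0, -1.0, 0.5, -1.2, 1.8, 0.4]
hess = [1.0] * 6

ordered, _, _ = order_categories(cats, grads, hess)
left, gain = best_categorical_split(cats, grads, hess)
print(ordered)  # ['a', 'b', 'c']
print(left, gain)
```

Once the categories are sorted, only k - 1 candidate splits need to be checked for k categories, instead of every possible subset.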
Thank you very much. Does xgboost do it the same way? @wepe