qiyiping / gbdt
License: GNU General Public License v2.0
Computing the Gini impurity: the logic in the code is as follows, see fitness.cpp:
162     for (size_t j = unknown; j < len-1; ++j) {
163       s = data_copy[j]->target * data_copy[j]->weight;
164       ss = Squared(data_copy[j]->target) * data_copy[j]->weight;
165       c = data_copy[j]->weight;
166
167       ls += s;
168       lss += ss;
169       lc += c;
170
171       rs -= s;
172       rss -= ss;
173       rc -= c;
174
175       ValueType f1 = data_copy[j]->feature[index];
176       ValueType f2 = data_copy[j+1]->feature[index];
177       if (AlmostEqual(f1, f2))
178         continue;
179
180       fitness1 = lc > 1? (lss - ls*ls/lc) : 0;
181       if (fitness1 < 0) {
182         // std::cerr << "fitness1 < 0: " << fitness1 << std::endl;
183         fitness1 = 0;
184       }
185
186       fitness2 = rc > 1? (rss - rs*rs/rc) : 0;
187       if (fitness2 < 0) {
188         // std::cerr << "fitness2 < 0: " << fitness2 << std::endl;
189         fitness2 = 0;
190       }
191
192       double fitness = fitness0 + fitness1 + fitness2;
193
194       if (g_conf.feature_costs && g_conf.enable_feature_tunning) {
195         fitness *= g_conf.feature_costs[index];
196       }
197
198       if (*impurity > fitness) {
199         *impurity = fitness;
200         *value = (f1+f2)/2;
201         *gain = fitness00 - fitness1 - fitness2;
202       }
203     }
204
205     return *impurity != std::numeric_limits<double>::max();
206   }
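My understanding is that the impurity should be computed over the full set of samples, but the code appears to loop over every sample whose feature value is not missing and then take the minimum. So my question is: should lines 198-202 be outside the for loop, or is my understanding wrong? Any help is appreciated.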
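One way to read the loop, offered as a sketch rather than as the author's confirmed intent: each iteration treats the midpoint between two adjacent sorted feature values as a candidate threshold, computes the impurity of the whole sample set under that split from running sums, and keeps the smallest one. Under that reading the comparison has to sit inside the loop, because it is choosing among candidate splits, not among samples. The example below is self-contained and simplified: `SplitResult`, `Sse`, and `BestSplit` are hypothetical names, weights are fixed at 1, and missing values and feature costs are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical result type for this sketch (not part of gbdt).
struct SplitResult {
  bool found = false;
  double threshold = 0.0;
  double impurity = std::numeric_limits<double>::max();
};

// Sum of squared deviations from the mean, computed from running sums:
// sum((y - mean)^2) = sum(y^2) - (sum(y))^2 / n.
static double Sse(double s, double ss, double n) {
  if (n <= 1) return 0.0;
  return std::max(0.0, ss - s * s / n);  // clamp small negative rounding errors
}

// points: (feature value, target) pairs, all with a known feature value.
SplitResult BestSplit(std::vector<std::pair<double, double>> points) {
  SplitResult best;
  std::sort(points.begin(), points.end());

  // Start with every sample on the right side of the split.
  double rs = 0.0, rss = 0.0, rc = 0.0;
  for (const auto& p : points) {
    rs += p.second;
    rss += p.second * p.second;
    rc += 1.0;
  }

  double ls = 0.0, lss = 0.0, lc = 0.0;
  for (std::size_t j = 0; j + 1 < points.size(); ++j) {
    // Move sample j from the right side to the left side.
    const double y = points[j].second;
    ls += y; lss += y * y; lc += 1.0;
    rs -= y; rss -= y * y; rc -= 1.0;

    // No threshold can separate two equal feature values.
    if (points[j].first == points[j + 1].first) continue;

    // Impurity of the whole set under this candidate split.
    const double impurity = Sse(ls, lss, lc) + Sse(rs, rss, rc);
    if (impurity < best.impurity) {
      best.found = true;
      best.impurity = impurity;
      best.threshold = (points[j].first + points[j + 1].first) / 2;
    }
  }
  return best;
}
```

Because the left and right sums together always cover every sample, each candidate's impurity is already a full-sample quantity; the loop only changes where the samples are divided.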
The functions for the gradient boosting classifier confuse me.
The model seems to have a problem handling missing values. Could you please take a look? Thanks!
This is training set A, which I constructed: 20 samples, AUC on the training set = 0.55
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
Training set B, which is A with the missing features filled in: 20 samples, AUC on the training set = 1
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
11
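A possible explanation for the gap between the two sets, sketched under assumptions: the example below assumes each line is `label weight index:value ...` and that a feature absent from a line is missing. `Sample`, `ParseLine`, and `CountThresholds` are hypothetical names for illustration, not gbdt functions. It counts how many candidate thresholds exist between distinct known values of a feature.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sample type for this sketch (not part of gbdt).
struct Sample {
  double label = 0.0;
  double weight = 0.0;
  std::map<int, double> features;  // an absent key means the value is missing
};

// Parses one line, assuming the layout "label weight index:value ...".
Sample ParseLine(const std::string& line) {
  Sample s;
  std::istringstream in(line);
  in >> s.label >> s.weight;
  std::string token;
  while (in >> token) {
    const std::size_t colon = token.find(':');
    if (colon == std::string::npos) continue;  // skip malformed tokens
    const int index = std::stoi(token.substr(0, colon));
    s.features[index] = std::stod(token.substr(colon + 1));
  }
  return s;
}

// Number of thresholds lying strictly between two distinct known values
// of the given feature; samples where the feature is missing are ignored.
std::size_t CountThresholds(const std::vector<Sample>& data, int index) {
  std::set<double> values;
  for (const Sample& s : data) {
    const auto it = s.features.find(index);
    if (it != s.features.end()) values.insert(it->second);
  }
  return values.empty() ? 0 : values.size() - 1;
}
```

For set A, every known value of either feature is 1, so `CountThresholds` returns 0 for both features. For set B it returns 1 for both. If the split search only tries thresholds between distinct known values, as the `AlmostEqual` check in the fitness.cpp excerpt suggests, set A would give the tree nothing to split on, which would be consistent with a near-chance AUC. This is an inference from the quoted code, not a confirmed diagnosis.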
1. What does enable_initial_guess roughly mean?
2. Why does gbdt add a bias?
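On the second question, a general observation that has not been checked against the gbdt source: gradient boosting usually starts from a constant prediction and fits each tree to what remains. For squared-error loss, the constant that minimizes the loss is the weighted mean of the targets. The sketch below assumes the bias plays that role; `InitialBias` and `Residuals` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Weighted mean of the targets: the constant c that minimizes
// sum_i w_i * (y_i - c)^2. Assumes both vectors have the same length.
double InitialBias(const std::vector<double>& targets,
                   const std::vector<double>& weights) {
  double s = 0.0, c = 0.0;
  for (std::size_t i = 0; i < targets.size(); ++i) {
    s += targets[i] * weights[i];
    c += weights[i];
  }
  return c > 0.0 ? s / c : 0.0;
}

// Residuals that the first tree would be fit to under squared-error loss.
std::vector<double> Residuals(const std::vector<double>& targets, double bias) {
  std::vector<double> r;
  r.reserve(targets.size());
  for (double y : targets) r.push_back(y - bias);
  return r;
}
```

Starting from such a constant means the first tree only has to model deviations from the average rather than the average itself.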