
gbdt's Issues

Question about computing impurity

When computing the Gini impurity, the code does the following (see fitness.cpp):

162   for (size_t j = unknown; j < len-1; ++j) {
163     s = data_copy[j]->target * data_copy[j]->weight;
164     ss = Squared(data_copy[j]->target) * data_copy[j]->weight;
165     c = data_copy[j]->weight;
166
167     ls += s;
168     lss += ss;
169     lc += c;
170
171     rs -= s;
172     rss -= ss;
173     rc -= c;
174
175     ValueType f1 = data_copy[j]->feature[index];
176     ValueType f2 = data_copy[j+1]->feature[index];
177     if (AlmostEqual(f1, f2))
178       continue;
179
180     fitness1 = lc > 1 ? (lss - ls*ls/lc) : 0;
181     if (fitness1 < 0) {
182       // std::cerr << "fitness1 < 0: " << fitness1 << std::endl;
183       fitness1 = 0;
184     }
185
186     fitness2 = rc > 1 ? (rss - rs*rs/rc) : 0;
187     if (fitness2 < 0) {
188       // std::cerr << "fitness2 < 0: " << fitness2 << std::endl;
189       fitness2 = 0;
190     }
191
192     double fitness = fitness0 + fitness1 + fitness2;
193
194     if (g_conf.feature_costs && g_conf.enable_feature_tunning) {
195       fitness *= g_conf.feature_costs[index];
196     }
197
198     if (*impurity > fitness) {
199       *impurity = fitness;
200       *value = (f1+f2)/2;
201       *gain = fitness00 - fitness1 - fitness2;
202     }
203   }
204
205   return *impurity != std::numeric_limits<double>::max();
206 }

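My understanding is that the impurity should be computed over all samples. However, the code seems to loop over every sample whose feature value is not missing and then keep the minimum. So my question is: should lines 198-202 be outside the for loop, or am I misunderstanding something? Any explanation would be appreciated.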
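Here is one way to read the quoted loop. Each iteration j moves one sample from the right-hand running sums (rs, rss, rc) to the left-hand ones (ls, lss, lc). Every j is therefore a different candidate threshold, and fitness0 + fitness1 + fitness2 is the impurity of all samples under that threshold. (fitness0 presumably covers the samples with a missing value, which the loop skips by starting at `unknown`.) Under that reading, keeping the running minimum inside the loop selects the best threshold.

Note also that `lss - ls*ls/lc` is a weighted sum of squared deviations from the mean. For 0/1 targets with total weight W and positive rate p, it equals W*p*(1-p), which is half of W times the two-class Gini impurity 2*p*(1-p).

The sketch below reproduces the scan for one feature under these assumptions:
- `Sample`, `Split`, `WeightedSSE` and `FindBestSplit` are hypothetical names, not gbdt code.
- The samples are already sorted by feature value, with missing values removed.
- Exact equality stands in for the original `AlmostEqual`.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sample type; not gbdt's own data structure.
struct Sample {
  double feature;
  double target;
  double weight;
};

struct Split {
  bool found = false;
  double threshold = 0.0;
  double impurity = std::numeric_limits<double>::max();
};

// Weighted sum of squared deviations from the mean, from running sums:
// s = sum(w*y), ss = sum(w*y*y), c = sum(w).
double WeightedSSE(double s, double ss, double c) {
  if (c <= 0.0) return 0.0;
  double v = ss - s * s / c;
  return v < 0.0 ? 0.0 : v;  // clamp tiny negative values from round-off
}

// Tries every threshold between adjacent distinct feature values and keeps
// the one with the lowest left + right impurity.
// Precondition: samples sorted by feature, missing values already removed.
Split FindBestSplit(const std::vector<Sample>& samples) {
  double rs = 0.0, rss = 0.0, rc = 0.0;  // right side starts with everything
  for (const Sample& x : samples) {
    rs += x.target * x.weight;
    rss += x.target * x.target * x.weight;
    rc += x.weight;
  }
  double ls = 0.0, lss = 0.0, lc = 0.0;  // left side starts empty
  Split best;
  for (std::size_t j = 0; j + 1 < samples.size(); ++j) {
    const Sample& x = samples[j];
    double s = x.target * x.weight;
    double ss = x.target * x.target * x.weight;
    double c = x.weight;
    ls += s; lss += ss; lc += c;  // move sample j to the left side
    rs -= s; rss -= ss; rc -= c;

    double f1 = x.feature;
    double f2 = samples[j + 1].feature;
    assert(f1 <= f2 && "samples must be sorted by feature");
    if (f1 == f2) continue;  // no threshold fits between equal values

    double impurity = WeightedSSE(ls, lss, lc) + WeightedSSE(rs, rss, rc);
    if (impurity < best.impurity) {  // candidate j beats the best so far
      best.found = true;
      best.impurity = impurity;
      best.threshold = (f1 + f2) / 2.0;
    }
  }
  return best;
}
```

With samples at feature values 1, 2, 3, 4 and targets 0, 0, 1, 1, the scan evaluates three thresholds. It keeps 2.5, where both sides are pure. If all values are equal, no threshold is ever evaluated and `found` stays false.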

The method for processing missing values

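The model seems to have a problem handling missing values. Could you please take a look? Thanks!
This is training set A, which I constructed: 20 samples, with AUC = 0.55 on the training set.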
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
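The sketch below parses these rows and shows why set A may be hard for the split search. It rests on three assumptions:
- The rows use the format `label weight index:value ...`, and an index absent from a row means the value is missing.
- The split search only tries thresholds between adjacent distinct non-missing values, as the loop quoted in the first issue appears to do.
- `Row`, `ParseLine` and `CountCandidateThresholds` are hypothetical helpers, not gbdt code.

Under the first assumption, feature 0 in set A is present only on the negative rows and always equals 1. Feature 1 is present only on the positive rows and also always equals 1. Under the second assumption, neither feature offers a single candidate threshold in set A, while each feature offers one in set B.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for a line "label weight index:value ...".
// An index that does not appear on the line is treated as missing.
struct Row {
  double label = 0.0;
  double weight = 1.0;
  std::map<int, double> features;
};

Row ParseLine(const std::string& line) {
  Row row;
  std::istringstream in(line);
  in >> row.label >> row.weight;
  assert(!in.fail() && "line must start with label and weight");
  std::string token;
  while (in >> token) {
    std::size_t colon = token.find(':');
    if (colon == std::string::npos) continue;  // ignore tokens without ':'
    row.features[std::stoi(token.substr(0, colon))] =
        std::stod(token.substr(colon + 1));
  }
  return row;
}

// Number of thresholds that fit between adjacent distinct non-missing values
// of feature `index`; rows where the feature is missing are not considered.
std::size_t CountCandidateThresholds(const std::vector<Row>& rows, int index) {
  std::vector<double> present;
  for (const Row& r : rows) {
    auto it = r.features.find(index);
    if (it != r.features.end()) present.push_back(it->second);
  }
  std::sort(present.begin(), present.end());
  std::size_t count = 0;
  for (std::size_t j = 0; j + 1 < present.size(); ++j) {
    if (present[j] != present[j + 1]) ++count;
  }
  return count;
}
```

For set A, both features yield zero candidates, so a search of this kind would find no split at all. In set B, filling the gaps with 0 gives each feature exactly one candidate, which separates the classes perfectly.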

Training set B is set A with the missing features filled in: 20 samples, with AUC = 1 on the training set.
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
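The AUC values quoted for A and B are training-set figures. To check such numbers from a model's predicted scores, a pairwise AUC is enough at this size. It is the fraction of (positive, negative) pairs in which the positive sample scores higher, with ties counted as one half.

This is a generic sketch, not the evaluator that ships with gbdt, and `PairwiseAuc` is a hypothetical name. One consequence: a model that gives every sample the same score gets exactly 0.5.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pairwise AUC: share of (positive, negative) pairs ranked correctly,
// with ties counted as 0.5. O(P*N), which is fine for a 20-sample check.
double PairwiseAuc(const std::vector<double>& scores,
                   const std::vector<int>& labels) {
  assert(scores.size() == labels.size());
  double correct = 0.0;
  std::size_t pairs = 0;
  for (std::size_t i = 0; i < scores.size(); ++i) {
    if (labels[i] != 1) continue;    // i must be a positive sample
    for (std::size_t j = 0; j < scores.size(); ++j) {
      if (labels[j] != 0) continue;  // j must be a negative sample
      ++pairs;
      if (scores[i] > scores[j]) {
        correct += 1.0;
      } else if (scores[i] == scores[j]) {
        correct += 0.5;
      }
    }
  }
  // AUC is undefined without both classes; this sketch returns 0.5 then.
  return pairs > 0 ? correct / static_cast<double>(pairs) : 0.5;
}
```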

A few questions

1. Roughly what does enable_initial_guess mean?
2. Why does gbdt add a bias?
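In gradient boosting generally, the ensemble starts from a constant prediction F0, often called the initial guess or bias, which minimizes the training loss before any tree is added. Each tree then fits the negative gradient of the loss at the current prediction. Whether `enable_initial_guess` in gbdt computes exactly this is an assumption; the issue does not confirm it.

The sketch below computes the two standard constants:
- for squared loss, the weighted mean of the targets;
- for logistic loss with 0/1 labels and a sigmoid link, the log-odds of the weighted positive rate.

The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Constant that minimizes weighted squared error: the weighted mean of y.
double InitialGuessSquaredLoss(const std::vector<double>& y,
                               const std::vector<double>& w) {
  assert(y.size() == w.size());
  double sum_wy = 0.0, sum_w = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i) {
    sum_wy += w[i] * y[i];
    sum_w += w[i];
  }
  return sum_w > 0.0 ? sum_wy / sum_w : 0.0;
}

// Constant F0 that minimizes weighted logistic loss when labels are 0/1 and
// predictions pass through a sigmoid: the log-odds of the positive rate.
double InitialGuessLogLoss(const std::vector<double>& y,
                           const std::vector<double>& w) {
  double p = InitialGuessSquaredLoss(y, w);  // weighted positive rate
  const double eps = 1e-12;                  // keeps log-odds finite at p = 0 or 1
  p = std::min(std::max(p, eps), 1.0 - eps);
  return std::log(p / (1.0 - p));
}
```

Starting from this constant rather than from 0 means the first tree does not have to spend its splits learning the overall base rate.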
