gbdt's People

Contributors

qiyiping

gbdt's Issues

Question about computing impurity

For computing the Gini impurity, the logic in the code is as follows (see fitness.cpp):

162   for (size_t j = unknown; j < len-1; ++j) {
163     s = data_copy[j]->target * data_copy[j]->weight;
164     ss = Squared(data_copy[j]->target) * data_copy[j]->weight;
165     c = data_copy[j]->weight;
166
167     ls += s;
168     lss += ss;
169     lc += c;
170
171     rs -= s;
172     rss -= ss;
173     rc -= c;
174
175     ValueType f1 = data_copy[j]->feature[index];
176     ValueType f2 = data_copy[j+1]->feature[index];
177     if (AlmostEqual(f1, f2))
178       continue;
179
180     fitness1 = lc > 1? (lss - ls*ls/lc) : 0;
181     if (fitness1 < 0) {
182       // std::cerr << "fitness1 < 0: " << fitness1 << std::endl;
183       fitness1 = 0;
184     }
185
186     fitness2 = rc > 1? (rss - rs*rs/rc) : 0;
187     if (fitness2 < 0) {
188       // std::cerr << "fitness2 < 0: " << fitness2 << std::endl;
189       fitness2 = 0;
190     }
191
192     double fitness = fitness0 + fitness1 + fitness2;
193
194     if (g_conf.feature_costs && g_conf.enable_feature_tunning) {
195       fitness *= g_conf.feature_costs[index];
196     }
197
198     if (*impurity > fitness) {
199       *impurity = fitness;
200       *value = (f1+f2)/2;
201       *gain = fitness00 - fitness1 - fitness2;
202     }
203   }
204
205   return *impurity != std::numeric_limits<double>::max();
206 }

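My understanding is that impurity should be computed over the full set of samples, but the code appears to loop over every sample whose feature is not missing and then take the minimum. So my question is: should lines 198-202 be outside the for loop, or is my understanding wrong? Any clarification would be appreciated.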

The method to process missing values

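The model seems to have a problem handling missing values. Could you please take a look? Thanks!
This is training set A, which I constructed: 20 samples, AUC on the training set = 0.55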
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
0 1 0:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1
1 1 1:1

Training set B fills in the missing features of A: 20 samples, AUC on the training set = 1
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
0 1 0:1 1:0
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
1 1 0:0 1:1
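
The only difference between the two sets is that each row of A leaves one of the two features unspecified, while B writes it explicitly as 0. The sketch below makes that difference visible by parsing a row into a dense vector and marking absent features. It assumes each line reads `target weight index:value ...`, which is inferred from the rows above and not confirmed by the source. `Row`, `ParseRow` and `CountMissing` are hypothetical names, and using NaN as the missing marker is a choice made for this illustration only.

```cpp
#include <cmath>
#include <cstddef>
#include <exception>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical row type for illustration; not the repository's tuple type.
struct Row {
  double target = 0.0;
  double weight = 0.0;
  std::vector<double> feature;  // NaN marks a missing value (an assumption)
};

// Parses "target weight index:value ..." into a dense row with
// `num_features` slots. Features absent from the line stay NaN.
bool ParseRow(const std::string& line, std::size_t num_features, Row* row) {
  std::istringstream in(line);
  if (!(in >> row->target >> row->weight)) return false;
  row->feature.assign(num_features, std::numeric_limits<double>::quiet_NaN());

  std::string token;
  while (in >> token) {
    const std::size_t colon = token.find(':');
    if (colon == std::string::npos) return false;
    try {
      const std::size_t index = std::stoul(token.substr(0, colon));
      const double value = std::stod(token.substr(colon + 1));
      if (index >= num_features) return false;
      row->feature[index] = value;
    } catch (const std::exception&) {
      return false;  // malformed index or value
    }
  }
  return true;
}

// Number of features that the line did not specify.
std::size_t CountMissing(const Row& row) {
  std::size_t missing = 0;
  for (double v : row.feature) {
    if (std::isnan(v)) ++missing;
  }
  return missing;
}
```

With two feature slots, a row from A such as `0 1 0:1` has one missing feature, and a row from B such as `0 1 0:1 1:0` has none.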

A few questions

1. What does enable_initial_guess roughly mean?
2. Why does gbdt add a bias?
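
As background for the second question: in gradient boosting generally, the model starts from a constant prediction and each tree is then fit to what that prediction leaves unexplained. For squared loss, the constant that minimises the weighted error is the weighted mean of the targets. The sketch below shows only that general idea. `Bias` and `Residuals` are hypothetical names, and whether this repository computes its bias this way is an assumption. The source does not say what `enable_initial_guess` does, so the sketch does not model it.

```cpp
#include <cstddef>
#include <vector>

// Weighted mean of the targets: the constant prediction that minimises
// weighted squared error. Assumes both vectors have the same length.
double Bias(const std::vector<double>& target, const std::vector<double>& weight) {
  double s = 0.0;
  double c = 0.0;
  for (std::size_t i = 0; i < target.size(); ++i) {
    s += target[i] * weight[i];
    c += weight[i];
  }
  return c > 0.0 ? s / c : 0.0;
}

// Residuals left by the constant prediction; under squared loss these are
// what the first tree would be fit to.
std::vector<double> Residuals(const std::vector<double>& target, double bias) {
  std::vector<double> r;
  r.reserve(target.size());
  for (double y : target) {
    r.push_back(y - bias);
  }
  return r;
}
```

Without such a starting constant, the first trees would spend their capacity moving every prediction toward the overall mean before they could model any structure in the features.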
