
Comments (11)

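xiaoqianjia commented on May 28, 2024

I can get the code to run, but I found a bug: the author seems to have reversed the weight-update rules.

    # Adjust the weights of the source-domain samples
    for j in range(row_S):
        weights[row_A + j] = weights[row_A + j] * np.power(bata_T[0, i], -np.abs(result_label[row_A + j, i] - label_S[j]))

    # Adjust the weights of the auxiliary-domain samples
    for j in range(row_A):
        weights[j] = weights[j] * np.power(bata, np.abs(result_label[j, i] - label_A[j]))

For the source-domain data, bata_T should be raised to a negative power.
I checked this against Dai Wenyuan's paper, and the code does appear to be wrong.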
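
For readers who prefer a vectorized form, the corrected update in this comment can be sketched as below. This is only an illustration under stated assumptions: the function name `update_weights` and its argument layout are hypothetical (not from the repository), labels are assumed to be 0/1, and the auxiliary samples are assumed to come first in the weight vector, as the `row_A + j` indexing above suggests.

```python
import numpy as np

def update_weights(weights, pred, label_A, label_S, beta, beta_t):
    """One reweighting step, vectorized (hypothetical helper).

    weights : shape (row_A + row_S,), auxiliary samples first
    pred    : predicted 0/1 labels for all samples, same order
    beta    : fixed factor in (0, 1) for the auxiliary samples
    beta_t  : this round's factor in (0, 1) for the source samples
    """
    row_A = len(label_A)
    new_w = np.asarray(weights, dtype=float).copy()
    miss_A = np.abs(pred[:row_A] - label_A)   # 1 where an auxiliary sample is wrong
    miss_S = np.abs(pred[row_A:] - label_S)   # 1 where a source sample is wrong
    new_w[:row_A] *= np.power(beta, miss_A)       # wrong auxiliary samples shrink
    new_w[row_A:] *= np.power(beta_t, -miss_S)    # wrong source samples grow
    return new_w

# Toy call: the second auxiliary and the second source sample are misclassified.
w = update_weights(np.ones(4), np.array([1, 0, 1, 0]),
                   label_A=np.array([1, 1]), label_S=np.array([1, 1]),
                   beta=0.5, beta_t=0.25)
print(w)
```

Correctly classified samples keep their weight because the exponent is 0 for them; only the misclassified ones are rescaled.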

from tradaboost.

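zhangjiantianyasmile commented on May 28, 2024

Thanks to your hint, I also see that the weight update is wrong. The author swapped the order of the two for loops relative to the weight-update formula in Dai Wenyuan's paper, which makes the code awkward to read, but the real problem is that the minus sign is in the wrong place. Your corrected weight update looks right.
I went back over the code and the algorithm in the paper and have not found any other errors so far.
I'll run experiments with the corrected weight update and report back.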


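chenchiwei commented on May 28, 2024

The weight-update direction in the code is indeed the opposite of the paper's. When I trained with the paper's update, the results never converged and the error kept growing, whereas the reversed direction did converge. I hope we can work this out together. The data was downloaded from here: https://www.kesci.com/apps/home/#!/competition/58e46b3b9ed26b1e09bfbbb7/content/0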


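zhangjiantianyasmile commented on May 28, 2024

Looking at the code and the paper's algorithm again, the value of beta also seems to be wrong.

    bata = 1 / (1 + np.sqrt(2 * np.log(row_A) / N))  # my adjusted beta

There is also this piece of code, which the author added to prevent overfitting:

    if error_rate == 0:
        N = i
        break  # prevent overfitting

I don't quite understand why this prevents overfitting; could someone who knows please explain?
Another question concerns the Learner: the paper implements it with an SVM, while the code uses a decision tree.
I would like to use an SVM as well, but I'm not sure how the sample weights (the weight vector) should be used there. Any pointers would be appreciated.
To be continued (posting this comment first, will add more later).
Update:
The problem I mentioned in my first comment is still there (the predictions are still all 1, and the error_rate reported by tradaboost is still 0 throughout).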
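
A small sketch may help with the two code fragments in this comment. It assumes, as in the paper, that the per-round factor stored in `bata_T` is `error_rate / (1 - error_rate)`; the helper names `fixed_beta`, `weighted_error`, and `round_beta` are hypothetical. Under that assumption, a zero error gives a factor of 0, so the negative power in the weight update would divide by zero. Stopping at that point is therefore at least a numerical safeguard, whatever its effect on overfitting.

```python
import numpy as np

def fixed_beta(row_A, N):
    """Fixed factor for auxiliary samples: 1 / (1 + sqrt(2 ln(row_A) / N))."""
    return 1.0 / (1.0 + np.sqrt(2.0 * np.log(row_A) / N))

def weighted_error(weights_S, pred_S, label_S):
    """Weighted error on the source-domain samples only, labels in {0, 1}."""
    w = np.asarray(weights_S, dtype=float)
    return float(np.sum(w * np.abs(pred_S - label_S)) / np.sum(w))

def round_beta(error_rate):
    """Return error_rate / (1 - error_rate), or None when the loop should stop."""
    if error_rate == 0:
        # The factor would be 0, and 0 ** -1 in the weight update is undefined.
        return None
    return error_rate / (1.0 - error_rate)

print(fixed_beta(row_A=100, N=10))
print(weighted_error(np.array([1.0, 1.0, 2.0]),
                     np.array([1, 0, 1]), np.array([1, 1, 0])))
print(round_beta(0.2), round_beta(0.0))
```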


xiaoqianjia commented on May 28, 2024

For the SVM, just use sklearn directly; you can look up the usage ... also


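xiaoqianjia commented on May 28, 2024

If the weight update is reversed, the result is meaningless. The author can print the weights at every iteration: with the reversed update, the error eventually converges only because the misclassified points are all filtered out, so the classifier no longer does anything useful on the source dataset... My version also converges, but the initial error is fairly large, close to 0.5, and I cannot reproduce the convergence curve in Dai Wenyuan's paper. I'm somewhat puzzled too and hope we can keep exchanging ideas... I'd like to write my master's thesis on a related topic.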
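
The filtering effect described in this comment can be illustrated with a toy calculation. The sketch below tracks one source sample that is misclassified in every round and ignores normalization and all other samples. It assumes that "reversed" means the minus sign is dropped from the source-sample exponent; the function name `misclassified_weight` and the value 0.25 for the per-round factor are made up for illustration.

```python
def misclassified_weight(rounds, beta_t, sign):
    """Weight history of a source sample that is wrong in every round.

    sign=-1 follows the paper's update; sign=+1 is the reversed update
    (an assumption about what the original code did).
    """
    w = 1.0
    history = []
    for _ in range(rounds):
        w *= beta_t ** (sign * 1)  # |prediction - label| is 1 when misclassified
        history.append(w)
    return history

paper = misclassified_weight(5, 0.25, sign=-1)
reversed_ = misclassified_weight(5, 0.25, sign=+1)
for t, (p, r) in enumerate(zip(paper, reversed_), start=1):
    print(f"round {t}: paper={p:g}  reversed={r:g}")
```

With the paper's sign the hard sample gains weight each round, so the learner is pushed to fix it. With the reversed sign its weight decays toward zero, which matches the observation that misclassified points are effectively dropped.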


chenchiwei commented on May 28, 2024

Stopping the iterations early is one way to prevent overfitting.


chenchiwei commented on May 28, 2024

@xiaoqianjia Yes, so this problem does not have a good answer yet. If you have other data, please share it so we can look into it together. I'll update the code later.


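xiaoqianjia commented on May 28, 2024

@chenchiwei I used the 20newsgroup dataset from sklearn because Dai Wenyuan's paper uses it, with the same class split. It does converge in the end, but the initial error is too large... I'm not sure how to fix that either. Judging from how the weights change, though, this approach does leave very few misclassified points, and the overfitting-prevention condition works reasonably well.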


xiaoqianjia commented on May 28, 2024

@zhangjiantianyasmile I suggest trying linearSVM or logisticRegression. A decision tree is sometimes too strong a classifier, which can also drive the error to 0.
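
On the earlier question of how to pass the weight vector to an SVM: in scikit-learn, both `LinearSVC.fit` and `LogisticRegression.fit` accept a `sample_weight` argument. The sketch below is one possible way to swap the base learner, not the repository's code; the helper name `fit_weighted`, the toy data, and the choice to normalize the weights before fitting are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

def fit_weighted(X, y, weights, X_test, learner="svm"):
    """Fit a weighted linear base learner and predict labels for X_test."""
    w = np.ravel(weights) / np.sum(weights)   # normalize to a distribution
    clf = LinearSVC() if learner == "svm" else LogisticRegression()
    clf.fit(X, y, sample_weight=w)            # per-sample weights go here
    return clf.predict(X_test)

# Toy data: two well-separated groups with 0/1 labels.
X = np.array([[-2.0, -2.0], [-1.0, -1.5], [1.0, 1.5], [2.0, 2.0]])
y = np.array([0, 0, 1, 1])
weights = np.array([1.0, 1.0, 1.0, 3.0])
X_test = np.array([[-3.0, -3.0], [3.0, 3.0]])

svm_pred = fit_weighted(X, y, weights, X_test, learner="svm")
lr_pred = fit_weighted(X, y, weights, X_test, learner="logreg")
print(svm_pred, lr_pred)
```

A linear model cannot memorize the training set as easily as an unpruned decision tree, which is one plausible reason the training error stops collapsing to 0.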


zhangjiantianyasmile commented on May 28, 2024

Thanks, @xiaoqianjia. Switching to linearSVM fixed the problem of the predictions all being 1.

