
gaic_track3_pair_sim's Introduction

gaic_track3_pair_sim's People

Contributors

nilboy

gaic_track3_pair_sim's Issues

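Soft labels

In train.sh for approach one, M models are pretrained first, then M*K k-fold classification models are trained and used to label the k-fold data, producing training data A with classification soft labels. The ensemble model is then trained on A as k-fold regression models, which label the k-fold data again to produce training data B with regression soft labels. Finally, the ensemble model is trained on B as a full-data regression model.
Why are soft labels generated twice? Would it work to train the regression model just once, directly on the soft labels predicted by the classification models?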
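For readers following the question, the two labelling rounds described above can be sketched as out-of-fold prediction applied twice. This is only an illustration under stated assumptions, not the repository's train.sh: the helper names (`kfold_indices`, `out_of_fold_soft_labels`, `fit_mean`) are hypothetical, and a toy mean predictor stands in for the real classification and regression models.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[object], float]


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_soft_labels(
    xs: Sequence[object],
    ys: Sequence[float],
    k: int,
    fit: Callable[[list, list], Predictor],
) -> List[float]:
    """Label every example with a model that never saw it during training."""
    n = len(xs)
    soft = [0.0] * n
    for held_out in kfold_indices(n, k):
        held = set(held_out)
        train_x = [xs[i] for i in range(n) if i not in held]
        train_y = [ys[i] for i in range(n) if i not in held]
        predict = fit(train_x, train_y)
        for i in held_out:
            soft[i] = predict(xs[i])
    return soft


def fit_mean(train_x: list, train_y: list) -> Predictor:
    """Toy stand-in for a real model: always predicts the training-label mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


xs = list(range(8))
hard_labels = [0, 1, 0, 1, 1, 1, 0, 0]

# Round 1: models trained on hard labels produce soft labels (data A).
soft_a = out_of_fold_soft_labels(xs, hard_labels, k=4, fit=fit_mean)
# Round 2: models trained on data A produce a second set of soft labels (data B).
soft_b = out_of_fold_soft_labels(xs, soft_a, k=4, fit=fit_mean)
```

The single-round variant asked about would stop after `soft_a` and train the final regression model on it directly.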

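Test data and vocab questions

Hello, could you provide the test file gaiic_track3_round1_testB_20210317.tsv? Thank you very much!

Also, I have some questions after reading your code. Following docker run, the flow is:
run.sh -> run_inner_2.sh -> pipeline/pipeline_d.py -> process_data_s1.sh, which then runs these two .py scripts: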
convert_data.py --n_splits=8
process_oov_data.py

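convert_data: extracts a character table from train.tsv, saving the character-to-frequency mapping as normal_vocab.json and the character-to-index mapping as idmap.json; it then uses these two tables to convert train.tsv and test.tsv to id form and saves them.
convert_data.py: here the construct_vocab function builds another vocab.json (different from idmap.json), and the convert_record_style function then uses vocab.json to turn the previously saved train.tsv and test.tsv (both already converted to ids with idmap.json) back into text. The result looks like garbled text. What puzzles me is why a different vocabulary is used for the conversion. Why is it done this way?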
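The mechanics described above can be sketched as follows. This is a minimal illustration, not the code in convert_data.py: the function names `build_vocab`, `encode`, and `decode` are hypothetical, whitespace-separated tokens and frequency-ordered ids are assumptions, and the sketch does not try to explain why the repository uses two tables.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_vocab(lines: Iterable[str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (token -> frequency, token -> index) for whitespace-separated tokens."""
    counts = Counter(tok for line in lines for tok in line.split())
    # Assumption: most frequent token gets id 0; ties are broken alphabetically.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    idmap = {tok: i for i, tok in enumerate(ordered)}
    return dict(counts), idmap


def encode(line: str, idmap: Dict[str, int]) -> List[int]:
    """Convert a line of tokens to ids; raises KeyError on unseen tokens."""
    return [idmap[tok] for tok in line.split()]


def decode(ids: List[int], id_to_token: Dict[int, str]) -> str:
    """Map ids back to tokens using whichever table is supplied."""
    return " ".join(id_to_token[i] for i in ids)


counts, idmap = build_vocab(["a b a", "b c a"])
ids = encode("c a b", idmap)

# Decoding with a different table (hypothetical) gives different surface
# tokens, but the mapping stays one-to-one, so no information is lost.
other_table = {0: "x", 1: "y", 2: "z"}
remapped = decode(ids, other_table)
```

In this sketch `counts` and `idmap` play the roles the issue assigns to normal_vocab.json and idmap.json, and `other_table` plays the role of vocab.json.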

Data

The data on the Tianchi website is no longer available. Could you share a link to the competition data?

Model ensembling

How are the multiple models ensembled? I could not quite follow the code.

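Questions about the code and approach

Thanks for open-sourcing the solution. I learned a lot from reading and running the code, and I have two questions:

  1. In the preliminary round, the final model was a regression model. Is this approach described in any paper, or is it just a competition trick?
  2. Why do the preliminary round, the semi-final, and the semi-final leaderboard B all use different approaches? The approach seems to get simpler each time, and the final leaderboard model is just an average of several classification models. Why were the earlier methods (building soft labels to train a regression model, and building a large ensemble model) not carried forward?

If you still remember the details, I would appreciate your guidance. Thanks again.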
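The "average of several classification models" mentioned in question 2 can be pictured with a short sketch. It assumes each model outputs one probability per example and that all models are weighted equally; `average_probs` is a hypothetical name, and the repository's actual ensembling code may differ.

```python
from typing import List, Sequence


def average_probs(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-example probabilities across models with equal weights.

    model_probs[m][i] is model m's predicted probability for example i.
    """
    if not model_probs:
        raise ValueError("need at least one model's predictions")
    n_examples = len(model_probs[0])
    if any(len(p) != n_examples for p in model_probs):
        raise ValueError("all models must score the same number of examples")
    n_models = len(model_probs)
    # zip(*...) groups the predictions for each example across models.
    return [sum(per_example) / n_models for per_example in zip(*model_probs)]


# Two hypothetical models scoring the same two sentence pairs.
ensemble = average_probs([[0.2, 0.8], [0.4, 0.6]])
```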
