Comments (4)
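Hello!
Question 1.1: Dataset coverage
The CQIA-Subset used in the current version of the paper is the same as the data provided in the Hugging Face discussion section. The first version of the paper did not describe the Subset's specific data; we will add that description in the next version of the paper, once the remaining experiments are finished.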
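Question 1.2: Use of the ruozhiba data
- The ruozhiba data was constructed as follows: we first manually supplied GPT-4 with the logical fallacy behind each Ruozhiba (弱智吧) question, then had GPT-4 generate a response framework. That framework was usually not fully usable, so we revised it by hand until it met the requirements for SFT. Bias may well be present; we are still running experiments to reach a more rigorous conclusion.
- Only the titles of the ruozhiba posts appear in the final dataset. The Tieba users' remark that "you provided the way of asking the questions" is correct, because Ruozhiba replies are usually not suitable as SFT data.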
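For readers who want a concrete picture of that flow, here is a minimal sketch. It is an illustration only, not the code used to build the dataset: `SftRecord`, `build_record`, `draft_fn`, and `revise_fn` are hypothetical names, and the two callables merely stand in for the GPT-4 drafting step and the manual revision step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SftRecord:
    instruction: str  # the ruozhiba post title, kept verbatim
    output: str       # the human-revised answer


def build_record(
    title: str,
    fallacy_note: str,
    draft_fn: Callable[[str, str], str],
    revise_fn: Callable[[str], str],
) -> SftRecord:
    """Title + human-written fallacy note -> model draft -> human revision."""
    # Hypothetical stand-in for asking GPT-4 to draft a response framework.
    draft = draft_fn(title, fallacy_note)
    # Hypothetical stand-in for the manual editing pass.
    final = revise_fn(draft)
    # Only the title comes from the forum; the original replies are not used.
    return SftRecord(instruction=title, output=final)
```

The drafting and revision steps are passed in as plain functions so that the sketch stays runnable without assuming any particular model API.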
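Question 1.3: No evaluation of baseline capability
Both our human evaluation and our GPT-4 evaluation were meant to assess question answering and instruction following in the zero-shot setting. Base models such as Yi-6B have not been instruction-tuned or aligned, so they are poorly suited to open-ended generation tasks such as zero-shot question answering and instruction following, which is why we did not consider them as baselines at the time. We will nonetheless try to add this experiment. For other baselines, we will consider comparing against Chinese instruction-tuning datasets of comparable size.
If you still have doubts or other questions, feel free to leave a comment!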
from coig-cqia.
OK, thank you very much for the quick reply. Is there a planned release date for the next version of the paper?
from coig-cqia.
We will try to release the next version within two weeks.
from coig-cqia.
OK, looking forward to it.
from coig-cqia.