Comments (2)
Some of the corpora are protected by copyright, and the project owner has no right to release them.
The public corpora are actually easy to obtain. You can search for the keywords 'Chinese corpus' on GitHub, or gather the data yourself.
from chinese-xlnet.
Thank you for your clarification. @yaleimeng
I agree that freely accessible large-scale training data is important for future NLP research. However, licensing issues are unavoidable in practice. One thing you may have noticed: you CANNOT find ready-to-download large-scale Baike data, but you will find plenty of spider programs.
Given this, I'm afraid you will have to use these spider programs to crawl the data yourself. Sorry for the inconvenience this has caused.
Related Issues (20)
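- How do I train on multiple GPUs on a single machine? HOT 1
- train.py HOT 1
- Hello, I tested generation with the PyTorch version of XLNet-base without fine-tuning and the results are very poor; what could be wrong? HOT 7
- An error is raised during training; I retried several times and get the same error every time. I can't tell whether the cause is the code or the data. Please help. HOT 2
- How do I distill Chinese XLNet to produce a smaller model? HOT 1
- Compared with the official version, does the Chinese XLNet change the algorithm, and if so, where? HOT 2
- mem_len=384 was set during pre-training, but mem_len=null in the downloaded PyTorch model HOT 4
- XLNet doesn't actually beat RoBERTa consistently, does it? HOT 2
- How do I run prediction? HOT 2
- Using the chinese-xlnet-mid pre-trained model from huggingface.co for a generation task produces no results HOT 2
- Hello, I ran a binary classification baseline with the PyTorch version of XLNet and the results are very poor HOT 3
- Has anyone compared the pre-training results of the GPU (train_gpu.py) and TPU (train.py) versions? HOT 2
- A small question about word segmentation HOT 5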
- Performance issues in the program HOT 5
- Performance issue in src/data_utils.py (by P3) HOT 7
- I want to continue pre-training on my own domain dataset; what is the correct way to do this? HOT 6
- A question about autoregression in Chinese XLNet HOT 4
- ValueError: not enough values to unpack (expected 2, got 1) HOT 2
- Feature: cls_index (data type: int64) is required but could not be found HOT 4
- Is there a suitable framework for accelerating CPU inference? HOT 1