Comments (6)
Here it is:
https://github.com/ymcui/Chinese-BERT-wwm/tree/master/data/chnsenticorp
from chinese-xlnet.
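Below are the script parameters you provided. How do I get the model to produce output for dev.tsv and test.tsv?
The script does not seem to specify which file to predict on.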
XLNET_DIR=YOUR_GS_BUCKET_PATH_TO_XLNET
MODEL_DIR=YOUR_OUTPUT_MODEL_PATH
DATA_DIR=YOUR_DATA_DIR_TO_TFRECORDS
RAW_DIR=YOUR_RAW_DATA_DIR
TPU_NAME=v2-xlnet
TPU_ZONE=us-central1-b
python -u run_classifier.py \
    --spiece_model_file=./spiece.model \
    --model_config_path=${XLNET_DIR}/xlnet_config.json \
    --init_checkpoint=${XLNET_DIR}/xlnet_model.ckpt \
    --task_name=csc \
    --do_train=True \
    --do_eval=True \
    --eval_all_ckpt=False \
    --uncased=False \
    --data_dir=${RAW_DIR} \
    --output_dir=${DATA_DIR} \
    --model_dir=${MODEL_DIR} \
    --train_batch_size=48 \
    --eval_batch_size=48 \
    --num_hosts=1 \
    --num_core_per_host=8 \
    --num_train_epochs=3 \
    --max_seq_length=256 \
    --learning_rate=2e-5 \
    --save_steps=5000 \
    --use_tpu=True \
    --tpu=${TPU_NAME} \
    --tpu_zone=${TPU_ZONE}
run_classifier.py contains a CSCProcessor class; take a look at it. It reads the dev/test files automatically.
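Taking the function below as an example, the file it handles is dev.tsv.
To handle test.tsv instead, I would need to change set_type.
Is my understanding correct?
Thanks!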
def get_devtest_examples(self, data_dir, set_type="dev"):
    input_file = os.path.join(data_dir, set_type+".tsv")
    tf.logging.info("using file %s" % input_file)
    lines = self._read_tsv(input_file)
    examples = []
    for (i, line) in enumerate(lines):
        if i == 0:
            continue
        guid = "%s-%s" % (set_type, i)
        text_a = line[1]
        label = line[0]
        examples.append(
            InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
    return examples
Yes. Pass set_type="dev" or set_type="test" when calling get_devtest_examples.
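To make the effect of set_type concrete, here is a minimal standalone sketch of the same loading logic. It is not the repository's code: InputExample is a hypothetical namedtuple stand-in, the standard csv module replaces the processor's _read_tsv helper, and the demo rows and header names are made up. The only thing taken from the thread is the layout: a header row, the label in column 0, and the text in column 1.

```python
import csv
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for the InputExample class used in run_classifier.py.
InputExample = namedtuple("InputExample", ["guid", "text_a", "text_b", "label"])


def get_devtest_examples(data_dir, set_type="dev"):
    """Read <data_dir>/<set_type>.tsv; column 0 is the label, column 1 the text."""
    input_file = os.path.join(data_dir, set_type + ".tsv")
    examples = []
    with open(input_file, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for i, line in enumerate(reader):
            if i == 0:
                continue  # skip the header row
            guid = "%s-%s" % (set_type, i)
            examples.append(
                InputExample(guid=guid, text_a=line[1], text_b=None, label=line[0]))
    return examples


# Demo with made-up data: write tiny dev.tsv and test.tsv files, then load each.
demo_dir = tempfile.mkdtemp()
demo_rows = {
    "dev": [("1", "good hotel"), ("0", "bad service")],
    "test": [("1", "would stay again")],
}
for name, rows in demo_rows.items():
    path = os.path.join(demo_dir, name + ".tsv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("label\ttext_a\n")  # header row (assumed column names)
        for label, text in rows:
            f.write("%s\t%s\n" % (label, text))

# The only difference between the two calls is the set_type argument.
dev_examples = get_devtest_examples(demo_dir, set_type="dev")
test_examples = get_devtest_examples(demo_dir, set_type="test")
print(len(dev_examples), len(test_examples))
```

Because set_type is used both to build the file name and as the guid prefix, examples loaded from test.tsv carry guids such as test-1, which makes it easy to tell the two splits apart in logs.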
Thank you very much!