mengzi's People

Contributors

ag2s1, ak391, cooelf, huajingyun, lanse-sir, yingyibiao

mengzi's Issues

Mengzi-T5-base-MT model size

Why is Mengzi-T5-base-MT only half the size of Mengzi-T5-base? After loading the model and saving it again, it grows back to the same size as the base model.
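One common explanation for a halved checkpoint (speculative here, not confirmed by the maintainers) is that the weights were saved in fp16, and re-saving after loading writes them back out in fp32. The arithmetic, using an approximate T5-base parameter count:

```python
# Checkpoint size is dominated by (parameter count) x (bytes per parameter).
def checkpoint_size_mb(num_params, bytes_per_param):
    return num_params * bytes_per_param / 1024 / 1024

# T5-base has roughly 220M parameters (approximate figure, for illustration).
params = 220_000_000
fp32_mb = checkpoint_size_mb(params, 4)  # float32: 4 bytes per weight
fp16_mb = checkpoint_size_mb(params, 2)  # float16: 2 bytes per weight

print(f"fp32: ~{fp32_mb:.0f} MB, fp16: ~{fp16_mb:.0f} MB")
```

This would reproduce exactly the observed behavior: the distributed checkpoint is half-size, and a load-then-save round trip restores the fp32 size.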

Where is the test_ch.yaml file?

Hi, when running inference with the multimodal OSCAR model on the acc-icc dataset, the command used is:

python -m torch.distributed.launch --nproc_per_node=8 oscar/run_captioning.py
    --data_dir
    --do_test --test_yaml test_ch.yaml
    --num_beams 5 --per_gpu_eval_batch_size 128 --max_gen_length 20
    --eval_model_dir

Where is the test_ch.yaml file located?

Is Mengzi-T5-base pretrained with DAE or LM?

Thanks for contributing such an excellent pretrained model. If convenient, could you tell us whether Mengzi-T5-base's pretraining objective is denoising auto-encoding (DAE) or next-segment prediction (LM)? If it is DAE, what kind of noise is used, e.g. token infilling or sentence permutation?
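For reference, vanilla T5's objective is span corruption, a DAE variant in which random spans are replaced by sentinel tokens and the decoder reconstructs them. A toy single-span illustration follows; it sketches the original T5 objective, not necessarily what Mengzi-T5-base actually used, which is exactly what this issue asks:

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, seed=0):
    """Toy T5-style span corruption: mask one random span with a sentinel.

    Hypothetical sketch of the vanilla T5 objective; the real T5 masks
    multiple spans with consecutive sentinels <extra_id_0>, <extra_id_1>, ...
    """
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * corruption_rate))
    start = rng.randrange(len(tokens) - n_mask)
    masked = tokens[start:start + n_mask]
    # Encoder input: span replaced by a sentinel token.
    inputs = tokens[:start] + ["<extra_id_0>"] + tokens[start + n_mask:]
    # Decoder target: sentinel, then the masked span, then a closing sentinel.
    targets = ["<extra_id_0>"] + masked + ["<extra_id_1>"]
    return inputs, targets

inp, tgt = span_corrupt("the quick brown fox jumps over the lazy dog".split())
print(inp)
print(tgt)
```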

Is the batch size 128 or 16384?

I noticed that Section 2.1 of the technical report says:

We limit the length of sentences in each batch to up to 512 tokens, and the batch size is 128.

but later in the same section:

The batch sizes for the two stages are 16384 and 32768, respectively

So which is the actual batch size? Is the first the number of sequences and the second the number of tokens? Or is such a large batch size feasible because LAMB is used? The LAMB paper uses 32768.
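One speculative reconciliation of the two figures (not confirmed by the report): if 128 counts sequences and 16384 counts tokens, the numbers are consistent whenever the average sequence length is 128 tokens, since sequences are only capped at 512 tokens, not padded to that length:

```python
# Speculative reading: 128 = sequences per batch, 16384 = tokens per batch.
sequences_per_batch = 128
tokens_per_batch = 16_384

avg_len = tokens_per_batch / sequences_per_batch
print(avg_len)  # average sequence length implied by this reading

# This reading is internally consistent: the implied average length
# does not exceed the stated 512-token cap.
assert avg_len <= 512
```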

How to incorporate knowledge graph in marketing copywriting?

Hi,
Thanks for sharing this awesome work.
According to Figure 2 of your paper, you incorporate a knowledge graph into the marketing copywriting task, but there seems to be no further explanation of this.
Could you please explain this method in more detail?

What is the input format for the model to automatically generate marketing copy?

Hello Langboat, thanks for sharing this great work.
Regarding the automatically generated marketing copy in the paper:

Given the input title and keywords, the models are required to generate a corresponding descriptive passage

What is the input to the model?
Is it in the form of [CLS] title [SEP] [keyword1, keyword2, keyword3, keyword4] [SEP] [kg11, kg12, kg13] [kg21, kg22, kg23]?
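The format guessed at above can at least be written down concretely. A pure-Python sketch of that encoding follows; the separators, field order, and example values are all hypothetical, since the paper's actual input format is precisely the open question here:

```python
def build_input(title, keywords, kg_triples):
    """Hypothetical flat-text encoding of title / keywords / KG triples.

    This only illustrates the format guessed in the question above;
    it is not the format the paper actually used.
    """
    parts = [title, ",".join(keywords)]
    parts += [",".join(triple) for triple in kg_triples]
    return "[CLS] " + " [SEP] ".join(parts) + " [SEP]"

s = build_input("Wireless earbuds",
                ["bluetooth", "waterproof"],
                [("earbuds", "battery_life", "8h")])
print(s)
# -> [CLS] Wireless earbuds [SEP] bluetooth,waterproof [SEP] earbuds,battery_life,8h [SEP]
```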

What hardware and hyperparameters were used to fine-tune Mengzi-BERT-base on the 9 CLUE tasks?

Hello developers! The paper says Mengzi-BERT-base outperforms baselines such as RoBERTa and BERT on the 9 CLUE downstream tasks. I have a few questions:
① What hardware did you use for the downstream fine-tuning, e.g. GPU model and CUDA version?
② Could you share the more detailed fine-tuning settings, e.g. optimizer hyperparameters, warmup schedule, random seed for initialization, and whether fp16 was used?
③ I just saw in the FAQ that you don't plan to release the pretraining code; will the downstream fine-tuning code for Mengzi-BERT-base also remain closed?

Some characters in mengzi-gpt-neo-base cannot be displayed correctly

Example sentence: "Linux ⁇ 能和Windows相比,其支持的 ⁇ 能较低,但是 ⁇ 能很低。"

Here the character "性" comes out as "⁇". After some digging, I found that the vocab does not contain this character; it appears as the pinyin "xing" instead. My guess is that the character was converted everywhere in the training data.

My temporary workaround is as follows:

# Shell: fetch the SentencePiece proto definition and compile Python bindings
wget https://raw.githubusercontent.com/google/sentencepiece/master/src/sentencepiece_model.proto
protoc --python_out=. sentencepiece_model.proto

# Python: patch the mis-converted vocab piece "xing" back to "性"
import sentencepiece_model_pb2 as model

m = model.ModelProto()
with open('mengzi_gpt.model', 'rb') as f:
    m.ParseFromString(f.read())

for piece in m.pieces:
    if piece.piece == "xing":
        piece.piece = "性"
        print("modified")
        break

with open('new.model', 'wb') as f:
    f.write(m.SerializeToString())

I hope this gets fixed soon.

mengzi-gpt-neo-base cannot be tried out on Hugging Face; an exception is raised

As the title says. The error message is:
Can't load tokenizer using from_pretrained, please update its configuration: Can't load tokenizer for 'Langboat/mengzi-gpt-neo-base'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'Langboat/mengzi-gpt-neo-base' is the correct path to a directory containing all relevant files for a GPT2TokenizerFast tokenizer.

Inference speed

How does Mengzi's inference speed compare with other models?

Question about the natural language understanding tasks

Hi, I'd like to confirm something with you. When Hugging Face's BertForSequenceClassification class is used for text classification, it takes BERT's pooled_output and feeds it to a final classifier layer. Your paper says: "We build the downstream models for the natural language understanding tasks by adding a linear classifier on top of the "[CLS]" token to predict label probabilities." Does that mean you use only the [CLS] token's hidden state, fed directly into the final classifier? Since your pretraining includes an NSP task, I'd like to confirm which approach you used for text classification. Thanks!
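The two options being contrasted can be shown side by side. Below is a tiny pure-Python sketch with a 3-dimensional hidden state and made-up weights (nothing here reflects Mengzi's actual parameters); option (a) is the direct linear classifier on [CLS] that the paper describes, option (b) is BERT's pooler (linear + tanh on [CLS]) as used inside BertForSequenceClassification:

```python
import math

cls_hidden = [0.5, -1.0, 2.0]          # hidden state of the [CLS] token

def linear(vec, weights, bias):
    return sum(v * w for v, w in zip(vec, weights)) + bias

# (a) linear classifier applied directly to the [CLS] hidden state
logit_direct = linear(cls_hidden, [0.1, 0.2, 0.3], 0.0)

# (b) pooler first: tanh(W_pool . cls + b), then the same classifier
pooled = [math.tanh(linear(cls_hidden, w, 0.0))
          for w in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]  # identity pooler weights
logit_pooled = linear(pooled, [0.1, 0.2, 0.3], 0.0)

print(logit_direct, logit_pooled)  # the tanh squashing makes these differ
```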

Inference performance

How does Mengzi-BERT-base perform on CPU and GPU? What is the latency, and how many QPS can it reach?
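Latency and QPS depend heavily on hardware, batch size, and sequence length, so it may be easiest to measure them on your own setup. A minimal harness follows; the lambda at the bottom is a stand-in so the sketch runs as-is, and you would swap in an actual Mengzi-BERT-base forward pass (not shown here):

```python
import time

def benchmark(predict, batch, n_iter=50, warmup=5):
    """Minimal latency/QPS harness.

    `predict` is any callable taking a batch of texts; here it is a
    hypothetical stand-in for a real model inference function.
    """
    for _ in range(warmup):          # warm up caches / lazy init
        predict(batch)
    start = time.perf_counter()
    for _ in range(n_iter):
        predict(batch)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / n_iter * 1000   # per-batch latency
    qps = n_iter * len(batch) / elapsed    # examples per second
    return latency_ms, qps

# Dummy "model" so the harness is runnable without any dependencies.
lat, qps = benchmark(lambda b: [len(x) for x in b], ["这是一个测试句子"] * 8)
print(f"latency: {lat:.3f} ms/batch, qps: {qps:.0f}")
```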

I suggest re-testing RoBERTa with the same evaluation script

Many of the per-model scores listed on the CLUE GitHub page are on the low side. For example, I can get 0.86+ on CHID with RoBERTa-base myself, while the best result listed there is only 0.85+.

So, for fairness, I suggest not quoting those scores directly but re-running RoBERTa with the same fine-tuning script.

Input prefix of the model mengzi-t5-base

Hi,

I have a question about the input to mengzi-t5-base. The original T5 paper says "we need to add the task-specific prefix to the original input sequence before feeding it to the model". If I want to perform a text summarization task (or other downstream tasks) with mengzi-t5-base, do I need to add a prefix, and if so, what should it be? Thank you very much for your help; looking forward to your reply.
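For what it's worth, a T5 prefix is mechanically just plain text prepended to the input string; whether mengzi-t5-base was pretrained with any prefix is exactly the open question, and the "summarize" string below is a purely hypothetical choice:

```python
def with_prefix(text, prefix="summarize"):
    # The prefix string itself is a hypothetical choice; whatever prefix
    # you pick, it must be used consistently at fine-tuning and inference.
    return f"{prefix}: {text}"

src = "Mengzi is a family of lightweight Chinese pretrained models."
print(with_prefix(src))
# -> summarize: Mengzi is a family of lightweight Chinese pretrained models.
```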

A T5-small model?

Thanks to the developers for open-sourcing! The T5-base model is still a bit large; will you consider releasing a small version later?
