langboat / mengzi
Mengzi Pretrained Models
License: Apache License 2.0
For example, in the official demo, how is marketing copy generated from the input, tags, and knowledge graph?
https://github.com/Langboat/Mengzi/blob/main/Mengzi-Oscar.md
In the markdown file above, when using the pretrained X152-C4 model to extract image features, what is the format of the input data?
DATA_DIR <path of image feature>
So far I only know that the input file is a tsv file, but not the exact content format of that tsv. Could you provide a demo input file?
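For reference, a common layout for Oscar-style image-feature tsv files is one row per image: the image id, the number of regions, and base64-encoded float32 region features. The exact schema Mengzi-Oscar expects is an assumption here and should be verified against the official preprocessing scripts; this sketch only illustrates the general pattern.

```python
import base64
import struct

def encode_row(image_id, features):
    """Build one tsv row: image_id, region count, base64 float32 blob.
    features: list of per-region float vectors (hypothetical layout)."""
    flat = [v for region in features for v in region]
    blob = base64.b64encode(struct.pack(f"{len(flat)}f", *flat)).decode()
    return f"{image_id}\t{len(features)}\t{blob}"

# Two toy regions with 2-dim features (real features are much wider).
row = encode_row("img_0001", [[0.1, 0.2], [0.3, 0.4]])
fields = row.split("\t")
print(fields[0], fields[1])  # img_0001 2
```

Decoding on the reader side would reverse the base64/struct steps and reshape by the region count.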
What do the floating-point numbers in the T5 model's vocabulary file represent? Every token is followed by one.
Hello, why are the parentheses and commas in the mengzi-t5-base vocabulary half-width (English) symbols? Shouldn't they be full-width (Chinese)?
Why is Mengzi-T5-base-MT only half the size of Mengzi-T5-base? After loading the model and saving it again, it grows back to the same size as base.
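One plausible explanation (an assumption, not confirmed by the maintainers) is that the MT checkpoint is stored in fp16, while `from_pretrained` loads weights in fp32 by default, so `save_pretrained` writes them back at twice the size. The storage math for a T5-base-sized model (roughly 250M parameters, a round number used here for illustration):

```python
# fp16 stores 2 bytes per weight, fp32 stores 4 bytes per weight,
# so re-saving an fp16 checkpoint as fp32 doubles the file size.
n_params = 250_000_000       # assumed rough parameter count
fp16_bytes = n_params * 2    # half precision
fp32_bytes = n_params * 4    # single precision
print(fp16_bytes / 1e9, "GB vs", fp32_bytes / 1e9, "GB")
```

If this is the cause, passing `torch_dtype=torch.float16` to `from_pretrained` should preserve the half-precision size through a load/save round trip.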
Hello, when running inference with the multimodal Oscar model on the acc-icc dataset, the command used is:
python -m torch.distributed.launch --nproc_per_node=8 oscar/run_captioning.py
--data_dir
--do_test --test_yaml test_ch.yaml
--num_beams 5 --per_gpu_eval_batch_size 128 --max_gen_length 20
--eval_model_dir
Where is the test_ch.yaml file located?
Thank you for contributing such an excellent pretrained model. If convenient, could you tell us whether Mengzi-T5-base's pretraining task is denoising auto-encoding (DAE) or next-segment prediction (LM)? If it is DAE, what kind of noise was used — token infilling, sentence permutation, or something similar?
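For context, the standard T5 denoising objective is span corruption (token infilling): contiguous spans are replaced with sentinel tokens in the input, and the target lists each sentinel followed by the dropped tokens. Whether Mengzi-T5-base uses exactly this recipe is the question above, not a confirmed fact; this toy sketch only shows what the objective looks like.

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel token; the target
    is each sentinel followed by the tokens it replaced."""
    inp, tgt, prev = [], [], 0
    for i, (s, e) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[prev:s])
        inp.append(sentinel)
        tgt.append(sentinel)
        tgt.extend(tokens[s:e])
        prev = e
    inp.extend(tokens[prev:])
    return inp, tgt

tokens = ["孟子", "是", "战国", "时期", "的", "思想家"]
inp, tgt = span_corrupt(tokens, [(1, 2), (4, 5)])
print(inp)  # ['孟子', '<extra_id_0>', '战国', '时期', '<extra_id_1>', '思想家']
print(tgt)  # ['<extra_id_0>', '是', '<extra_id_1>', '的']
```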
How do you put text generation into production?
I noticed that Section 2.1 of the technical report says:
We limit the length of sentences in each batch to up to 512 tokens, and the batch size is 128.
and later in the same section:
The batch sizes for the two stages are 16384 and 32768, respectively
So which one is the actual batch size? Is the former the number of sequences and the latter the number of tokens? Or is such a large batch size possible because LAMB is used? The LAMB paper used 32768.
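The two readings can be checked with simple arithmetic (this is an interpretation of the report's numbers, not a confirmed reconciliation):

```python
# Reading 1: 128 is sequences per batch at up to 512 tokens each.
seqs_per_batch = 128
max_len = 512
tokens_if_full = seqs_per_batch * max_len  # 65536 tokens at max length

# Reading 2: 16384 / 32768 count tokens per batch, which at full
# length would correspond to far fewer sequences.
seqs_for_16384 = 16384 // max_len  # 32 full-length sequences
seqs_for_32768 = 32768 // max_len  # 64 full-length sequences
print(tokens_if_full, seqs_for_16384, seqs_for_32768)
```

Since 128 sequences at 512 tokens already exceeds 32768 tokens, the two figures cannot both be token counts for the same batch, which is exactly why the question asks which unit each number uses.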
When training the base and large models, how did you set the learning rate, warmup, and other hyperparameters?
I saw the impressive results of mengzi-bert-large in your paper, but it does not seem to have been released. Will it be made available?
Hi,
Thanks for sharing this awesome work.
According to Figure 2 of your paper, you incorporate a knowledge graph into the marketing copywriting task,
but there seems to be no further explanation of this.
Could you please elaborate on this method?
Is it still possible to join the WeChat discussion group?
Hello Langboat, thanks for sharing this great work.
Regarding the automatically generated marketing copy in the paper:
Given the input title and keywords, the models are required to generate a corresponding descriptive passage
What is the model's input?
Is it of the form [cls] title [sep] [keywords1,keywords2,keywords3,keywords4] [sep] [kg11,kg12,kg13] [kg21,kg22,kg23]?
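The layout hypothesized in the question above can be sketched as a simple string builder. To be clear, this format is the asker's guess, not a confirmed description of the Mengzi marketing-copy pipeline:

```python
def build_input(title, keywords, kg_triples):
    """Assemble the hypothesized [CLS] title [SEP] keywords [SEP] KG input."""
    kw = ",".join(keywords)
    kg = " ".join("[" + ",".join(t) + "]" for t in kg_triples)
    return f"[CLS] {title} [SEP] [{kw}] [SEP] {kg}"

# Toy example: a product title, two keywords, one knowledge-graph triple.
s = build_input("连衣裙", ["夏季", "碎花"], [("裙", "材质", "棉")])
print(s)  # [CLS] 连衣裙 [SEP] [夏季,碎花] [SEP] [裙,材质,棉]
```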
Hello developers! The paper says Mengzi-BERT-base outperforms RoBERTa, BERT, and other baselines on nine CLUE downstream tasks. I have a few questions:
① What hardware platform did you use for downstream fine-tuning, e.g. GPU model and CUDA version?
② Also, could you share the more detailed fine-tuning settings, e.g. optimizer configuration, warmup schedule, the seed used for model initialization, and whether fp16 was used in the downstream tasks?
③ I just saw in the FAQ that you do not plan to release the pretraining code; will the downstream fine-tuning code for Mengzi-BERT-base also not be released?
Example sentence: “Linux ⁇ 能和Windows相比,其支持的 ⁇ 能较低,但是 ⁇ 能很低。”
Here the character “性” is rendered as a question mark. After investigating, I found that the vocab does not contain this character; it contains the string “xing” instead. My guess is that all occurrences of this character were somehow converted in the training data.
My current temporary workaround is as follows:
wget https://raw.githubusercontent.com/google/sentencepiece/master/src/sentencepiece_model.proto
protoc --python_out=. sentencepiece_model.proto

import sentencepiece_model_pb2 as model

# Load the SentencePiece model and patch the mislabeled piece in place.
m = model.ModelProto()
m.ParseFromString(open('mengzi_gpt.model', 'rb').read())
for i in m.pieces:
    if i.piece == "xing":
        i.piece = "性"
        print("modified")
        break
with open('new.model', 'wb') as f:
    f.write(m.SerializeToString())
Hoping for a fix soon.
As the title says.
Error message:
Can't load tokenizer using from_pretrained, please update its configuration: Can't load tokenizer for 'Langboat/mengzi-gpt-neo-base'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'Langboat/mengzi-gpt-neo-base' is the correct path to a directory containing all relevant files for a GPT2TokenizerFast tokenizer.
It feels like leaving out the other pretraining tasks would hurt the model's performance.
How does Mengzi's inference speed compare to other models?
Was the bloom-zh model, obtained by pruning the token vocabulary, fine-tuned afterwards?
https://huggingface.co/Langboat/mengzi-oscar-base-caption
How can I run inference on a single specified image?
Hi, I'd like to confirm something with you. When Huggingface's BertForSequenceClassification class is used for text classification, it takes BERT's pooled_output and feeds it into a final classifier layer. But your paper says: “We build the downstream models for the natural language understanding tasks by adding a linear classifier on top of the “[CLS]" token to predict label probabilities.” Does this mean you use only BERT's [CLS] token hidden state, fed directly into the final classifier? Since your pretraining includes an NSP task, I want to confirm which approach you used for text classification. Thanks!
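The two readouts being contrasted in the question can be illustrated framework-free. Huggingface's BertPooler computes tanh(W·h_cls + b) over the [CLS] hidden state before the classifier, while "a linear classifier on top of [CLS]" would use the raw [CLS] vector directly. This is a minimal sketch of that difference, not the Mengzi training code:

```python
import math

def pooler(cls_hidden, weight, bias):
    """HF-style pooled output: tanh(W @ cls_hidden + b), as in BertPooler."""
    z = [sum(w * h for w, h in zip(row, cls_hidden)) + b
         for row, b in zip(weight, bias)]
    return [math.tanh(v) for v in z]

# Toy hidden states: seq_len=3, hidden dim=2.
hidden_states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
cls_hidden = hidden_states[0]  # raw [CLS] readout described in the paper
identity_w, zero_b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
pooled = pooler(cls_hidden, identity_w, zero_b)  # HF-style readout
print(cls_hidden, pooled)
```

With an identity pooler weight, the two readouts differ only by the tanh nonlinearity, which makes the contrast easy to see.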
How does Mengzi-BERT-base perform on CPU and GPU? What is the latency, and what QPS can it reach?
Many of the per-model scores on the CLUE GitHub page are on the low side. For example, with RoBERTa-base I can get CHID to 0.86+, while the best result listed there is only 0.85+.
So, to be fair, I would suggest not quoting those scores directly, but re-evaluating RoBERTa with the same fine-tuning scripts.
After fine-tuning on some dialogue datasets, is the model suitable for multi-turn chitchat?
Hi,
I have a question regarding the input format of mengzi-t5-base. The original T5 paper mentions that "we need to add the task-specific prefix to the original input sequence before feeding it to the model". If I want to perform a text summarization task (or other downstream tasks) with mengzi-t5-base, do I need to add a prefix, and what should it be? Thank you very much for your help; looking forward to your reply.
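For reference, the original T5 recipe just prepends a short task string before tokenization. Whether mengzi-t5-base was trained with prefixes at all, and the specific prefix "摘要" used below, are assumptions to verify, not confirmed behavior:

```python
def with_prefix(task, text):
    """Prepend a T5-style task prefix to the raw input text."""
    return f"{task}: {text}"

# "摘要" (summarize) is a hypothetical prefix for illustration only.
src = with_prefix("摘要", "孟子是战国时期的思想家……")
print(src)  # 摘要: 孟子是战国时期的思想家……
```

In practice, fine-tuning with a consistent prefix (or consistently without one) matters more than the exact wording of the prefix.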
Thanks to the developers for open-sourcing this! The T5-base model is still a bit large; will you consider releasing a small version later?