flagopen / flageval
FlagEval is an evaluation toolkit for AI large foundation models.
License: Apache License 2.0
Where to find the shot configuration for each benchmark? @xuanricheng
The code is a bit hard to understand:
@xuanricheng
When will a framework for evaluating language models on natural language processing tasks be released?
Running the command from evaluate.md:
python evaluate.py --datasets=cifar10,cifar100 --model_name=AltCLIP-XLMR-L
File "C:\anaconda3\lib\site-packages\flagai\model\mm\AltCLIP.py", line 83, in __init__
self.text_config = STUDENT_CONFIG_DICT[kwargs['text_config']['model_type']]
KeyError: 'text_config'
According to the source code of the AltCLIPConfig class, text_config should be passed in through **kwargs, but nothing is actually passed. How can I supply it?
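The failure mode in the traceback can be reproduced without FlagAI. The following is a minimal sketch, assuming a constructor that indexes kwargs['text_config'] directly; ToyConfig, ToyTextConfig, build_config, and the contents of STUDENT_CONFIG_DICT are hypothetical stand-ins for illustration, not the library's code.

```python
# Hypothetical stand-ins; this is not the FlagAI implementation.
class ToyTextConfig:
    def __init__(self, **kwargs):
        self.model_type = kwargs.get("model_type", "toy")
        self.hidden_size = kwargs.get("hidden_size", 8)


# Maps a model_type string to the config class that handles it.
STUDENT_CONFIG_DICT = {"toy": ToyTextConfig}


class ToyConfig:
    def __init__(self, **kwargs):
        # Direct indexing raises KeyError when the caller omits 'text_config'.
        text_kwargs = kwargs.pop("text_config")
        config_cls = STUDENT_CONFIG_DICT[text_kwargs["model_type"]]
        self.text_config = config_cls(**text_kwargs)


def build_config(**kwargs):
    """Return (config, error); error is None when construction succeeds."""
    try:
        return ToyConfig(**kwargs), None
    except KeyError as exc:
        return None, f"missing config key: {exc}"


ok_config, ok_error = build_config(
    text_config={"model_type": "toy", "hidden_size": 16}
)
bad_config, bad_error = build_config()  # no text_config supplied

print(ok_config.text_config.hidden_size)
print(bad_error)
```

In this sketch the error disappears once the caller passes a text_config dictionary. Whether the real evaluate.py is expected to load that dictionary from a model config file is not stated in the issue.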
As we know, ResNet50 has a total of 25,636,712 parameters. Of these, 25,583,592 are trainable and 53,120 are non-trainable. The model has 177 layers.
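The three parameter counts quoted above are consistent with each other, which a short check makes explicit. The variable names are chosen here for illustration only.

```python
# Figures quoted in the issue text.
trainable = 25_583_592
non_trainable = 53_120
reported_total = 25_636_712

# The trainable and non-trainable counts should add up to the total.
computed_total = trainable + non_trainable
non_trainable_share = non_trainable / computed_total

print(f"{computed_total:,}")
print(computed_total == reported_total)
print(f"{non_trainable_share:.4%}")
```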
Check this link for further explanation.
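I don't know the details of this evaluation, but for most models the chat version improves over the base version on Chinese objective questions, whereas Qwen (千问) drops off a cliff (0.596 -> 0.070), which looks anomalous. The result also differs considerably from the opencompass evaluation.
I suggest double-checking the evaluation details. Could there be a problem with the prompt or something similar?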
Such as video comprehension, generation, ...
from cached_property import cached_property
Is this import wrong? Shouldn't it be:
from functools import cached_property
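Both imports can work, depending on the environment: functools.cached_property is in the standard library from Python 3.8 onward, while cached_property is a separate third-party package. A minimal sketch of an import that prefers the standard library and falls back to the package follows; the ScoreTable class is hypothetical and exists only to show the caching behaviour.

```python
try:
    # Standard library, available from Python 3.8 onward.
    from functools import cached_property
except ImportError:
    # Third-party backport for older interpreters (pip install cached-property).
    from cached_property import cached_property


class ScoreTable:
    """Hypothetical example class; not part of FlagEval."""

    def __init__(self, values):
        self.values = values
        self.compute_calls = 0

    @cached_property
    def total(self):
        # Runs once; later accesses reuse the stored result.
        self.compute_calls += 1
        return sum(self.values)


table = ScoreTable([1, 2, 3])
first = table.total
second = table.total
print(first, second, table.compute_calls)
```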
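Where can I obtain the Gaokao2023 V1.0 evaluation set?
Why does the average I calculated differ from the one on the leaderboard? Is it weighted? If so, what are the weights?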
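One possible source of such a gap is a weighted average. The sketch below uses made-up scores and weights (hypothetical question counts) purely to show how a simple mean and a weighted mean diverge; it does not describe how the leaderboard actually aggregates results.

```python
def simple_mean(scores):
    """Unweighted arithmetic mean of the scores."""
    return sum(scores) / len(scores)


def weighted_mean(scores, weights):
    """Mean in which each score counts in proportion to its weight."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical numbers: two datasets with different sizes.
scores = [0.50, 0.80]
weights = [300, 100]

unweighted = simple_mean(scores)
weighted = weighted_mean(scores, weights)
print(round(unweighted, 3), round(weighted, 3))
```

With these made-up inputs the larger dataset pulls the weighted result toward its own score, so the two averages differ even though the per-dataset scores are identical.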
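As the title says. The current code appears to contain only the basic request-parameter classes.
I think classifying it under multiple-choice QA would also make sense.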
About how the Python main function is written:
def main(): cli()
Shouldn't it be:
if __name__ == "__main__": cli()
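The difference between the two forms can be shown in a few lines. This is a sketch only: cli here is a hypothetical stand-in for the project's command-line function, not the actual FlagEval code.

```python
def cli():
    """Hypothetical stand-in for the project's command-line function."""
    return "cli ran"


def main():
    # Defining main() does not run anything; something must call it.
    return cli()


if __name__ == "__main__":
    # Executes only when the file is run as a script, not when it is imported.
    main()
```

A bare def main() is still valid when a packaging entry point (for example a console script declared in the package metadata) calls it. Whether that applies to this repository is not stated in the issue.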
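Hello, has there been any progress on evaluation methods for text-to-image models, for example evaluating only one category of generated images, such as images of people?
Are the decimal values on the leaderboard scores?
ChatGLM-6B scores 0.212 on the Chinese_MMLU dataset under Chinese multiple-choice QA. Can this be read as 21.2 out of a full score of 100? In other words, out of 100 questions, only about 21 were answered correctly?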
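If the leaderboard value is an accuracy (the fraction of questions answered correctly), which the question assumes but the page does not confirm, the conversion is a single multiplication. The function name and the question count of 1000 are hypothetical.

```python
def describe_accuracy(score, num_questions):
    """Convert a fractional accuracy into a percentage and a correct-answer count."""
    percent = score * 100
    correct = round(score * num_questions)
    return percent, correct


# 0.212 is the score quoted in the question; 1000 questions is an assumed size.
percent, correct = describe_accuracy(0.212, 1000)
print(f"{percent:.1f}% -> about {correct} of 1000 correct")
```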
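If I want to evaluate a natural language model locally and offline, can I do so with the flageval-serving module?
Or does offline testing require uploading the model and code to the FlagEval platform? https://flageval.baai.ac.cn/#/rule?m=2
Hello, I have installed FlagEval and the installation appears to be fine. How do I use it? Should I start by trying models such as AltCLIP? https://github.com/FlagOpen/FlagEval/blob/master/imageEval/README.md Judging from this README, is this evaluation tool intended specifically for multimodal models such as AltCLIP, and not applicable to Aquila? Is my understanding off?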
I wrote a test using ChatGLM2, started the server, and sent the input "你是谁". The response came back as a string of Unicode escape sequences.
Is that acceptable for your evaluation?
The output is:
{
    "completions": [
        {
            "logprobs": [],
            "text": "\u4f60\u662f\u8c01?\n\n\u6211\u662f ChatGLM,\u662f\u6e05\u534e\u5927\u5b66KEG\u5b9e\u9a8c\u5ba4\u548c\u667a\u8c31AI\u516c\u53f8\u5171\u540c\u8bad\u7ec3\u7684\u8bed\u8a00\u6a21\u578b\u3002\u6211\u7684\u4efb\u52a1\u662f\u670d\u52a1\u5e76\u5e2e\u52a9\u4eba\u7c7b,\u4f46\u6211\u5e76\u4e0d\u662f\u4e00\u4e2a\u771f\u5b9e\u7684\u4eba\u3002",
            "tokens": "\u4f60\u662f\u8c01?\n\n\u6211\u662f ChatGLM,\u662f\u6e05\u534e\u5927\u5b66KEG\u5b9e\u9a8c\u5ba4\u548c\u667a\u8c31AI\u516c\u53f8\u5171\u540c\u8bad\u7ec3\u7684\u8bed\u8a00\u6a21\u578b\u3002\u6211\u7684\u4efb\u52a1\u662f\u670d\u52a1\u5e76\u5e2e\u52a9\u4eba\u7c7b,\u4f46\u6211\u5e76\u4e0d\u662f\u4e00\u4e2a\u771f\u5b9e\u7684\u4eba\u3002",
            "top_logprobs_dicts": []
        }
    ],
    "input_length": 0,
    "model_info": "",
    "status": 200
}
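The \uXXXX sequences are the standard JSON escape form for non-ASCII characters, so any JSON parser restores the original text. A minimal sketch with Python's json module, using a shortened payload rather than the full response above:

```python
import json

# Shortened payload: the escaped form of the three-character input string.
raw = '{"text": "\\u4f60\\u662f\\u8c01"}'

# Parsing turns the escape sequences back into the actual characters.
decoded = json.loads(raw)

# ensure_ascii=False keeps non-ASCII characters readable when serializing.
readable = json.dumps(decoded, ensure_ascii=False)

# The default (ensure_ascii=True) produces the escaped form again.
escaped = json.dumps(decoded)

print(decoded["text"])
print(readable)
print(escaped)
```

Under this reading the escaped and readable forms carry the same content, and the difference lies only in how the server serializes its response.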
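In your Zhihu article you wrote: "We used ImageEval-prompt to evaluate well-known text-to-image models. For each prompt, each model generates 8 images; annotators, without seeing the prompt, rank the 8 images and select the three highest-ranked ones, and finally annotate whether these three images correctly express the key information of the prompt."
In the final step, "annotate whether these three images correctly express the key information of the prompt", what exactly is done?
For example, for the prompt "穿着华丽的衣服的女士坐在椅子上,素描" ("a lady in gorgeous clothes sitting on a chair, sketch"), the annotations for color, gender, and facial features are 0, 1, and 2 respectively. Do evaluators only need to judge, based on the annotated dimensions and ignoring the prompt, whether the generated image matches the annotation for each dimension (0 = not present, 1 = simple assessment, 2 = complex assessment)? Or can evaluators see both the annotations and the prompt, and then go back to the prompt to judge whether the image expresses it accurately? For instance, knowing that gender is annotated as 1, the evaluator would have to decide from the prompt whether the generated image matches the "lady" it describes. If evaluators use the second approach, how do annotation 1 (simple assessment) and annotation 2 (complex assessment) differ in the evaluation workflow?
Finally, in the "Annotation dimension description" section you give specific annotations for each sub-dimension. Does every item in the dataset have specific annotations for every sub-dimension for evaluators to refer to? Or is the currently open-sourced dataset already everything there is?
Thank you for your answer!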