codefuse-ai / codefuse-devops-eval Goto Github PK

View Code? Open in Web Editor NEW

656.0 656.0 42.0 10.32 MB

Industrial-first evaluation benchmark for LLMs in the DevOps/AIOps domain.

License: Other

Shell 1.09% Python 98.91%

codefuse-devops-eval's People

Contributors

Stargazers

Watchers

codefuse-devops-eval's Issues

您好，请问fcdata-zh-luban和fcdata-zh-codefuse的区别是啥？

您好，打扰了，请问两个评测集的区别是啥啊？两份测试集report的结果分数差异很大？

测试文档优化

是不是可以提供多种模型的prompt模版以便更好降低偏差

en

Feature Request

Develop and open-source a summary benchmark.

Motivation

Dataset Investigation: Conduct research on summary datasets to understand their composition, quality, and applicable scenarios.

Corpus Collection: Focus on the collection of summary corpora to ensure a sufficient and diverse data source to support the construction of the benchmark.

Benchmark Construction: Based on the preliminary research and corpus collection achievements, build a summary benchmark to ensure it comprehensively assesses the performance of summary technologies.

Open-source Benchmark: Make the constructed summary benchmark open-source, allowing the entire community to benefit from it, and improve the transparency and reliability of summary technologies.

zh

功能请求

构建并开源一个summary benchmark。

动机

数据集调研：进行summary数据集的调研，了解现有的summary数据集的构成、质量以及适用场景。
语料收集：专注于summary语料的收集，以保证有足够的、多样化的数据来源来支撑benchmark的构建。
Benchmark构建：依据前期的调研和语料收集成果，构建summary benchmark，确保它能全面评估summary技术的性能。
开源Benchmark：将构建好的summary benchmark开源，让整个社区都能从中受益，提高summary技术的透明度和可靠性。

toollearning数据集是否可以提供？

Integrate with LiteLLM - Evaluate 100+LLMs, 92% faster

Hi @xudafeng @jglee2046
I'm the maintainer of LiteLLM. we allow you to create a proxy server to call 100+ LLMs to make it easier to run benchmark / evals

I'm making this issue because I believe LiteLLM makes it easier for you to run benchmarks and evaluate LLMs (I'd love your feedback if it does not)

Try it here: https://docs.litellm.ai/docs/simple_proxy
https://github.com/BerriAI/litellm

Using LiteLLM Proxy Server

Creating a proxy server

Ollama models

$ litellm --model ollama/llama2 --api_base http://localhost:11434

Hugging Face Models

$ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
$ litellm --model claude-instant-1

Anthropic

$ export ANTHROPIC_API_KEY=my-api-key
$ litellm --model claude-instant-1

Palm

$ export PALM_API_KEY=my-palm-key
$ litellm --model palm/chat-bison

Set api base to proxy

openai.api_base = "http://0.0.0.0:8000"

Using to run an eval on lm harness:

python3 -m lm_eval \
  --model openai-completions \
  --model_args engine=davinci \
  --task crows_pairs_english_age