codefuse-ai / codefuse-devops-eval Goto Github PK
View Code? Open in Web Editor NEWIndustrial-first evaluation benchmark for LLMs in the DevOps/AIOps domain.
License: Other
Industrial-first evaluation benchmark for LLMs in the DevOps/AIOps domain.
License: Other
比如deepseek-coder 33B?
您好,打扰了,请问两个评测集的区别是啥啊?两份测试集report的结果分数差异很大?
This work is very exciting!
Develop and open-source a summary benchmark.
Dataset Investigation: Conduct research on summary datasets to understand their composition, quality, and applicable scenarios.
Corpus Collection: Focus on the collection of summary corpora to ensure a sufficient and diverse data source to support the construction of the benchmark.
Benchmark Construction: Based on the preliminary research and corpus collection achievements, build a summary benchmark to ensure it comprehensively assesses the performance of summary technologies.
Open-source Benchmark: Make the constructed summary benchmark open-source, allowing the entire community to benefit from it, and improve the transparency and reliability of summary technologies.
构建并开源一个summary benchmark。
数据集调研:进行summary数据集的调研,了解现有的summary数据集的构成、质量以及适用场景。
语料收集:专注于summary语料的收集,以保证有足够的、多样化的数据来源来支撑benchmark的构建。
Benchmark构建:依据前期的调研和语料收集成果,构建summary benchmark,确保它能全面评估summary技术的性能。
开源Benchmark:将构建好的summary benchmark开源,让整个社区都能从中受益,提高summary技术的透明度和可靠性。
Hi @xudafeng @jglee2046
I'm the maintainer of LiteLLM. we allow you to create a proxy server to call 100+ LLMs to make it easier to run benchmark / evals
I'm making this issue because I believe LiteLLM makes it easier for you to run benchmarks and evaluate LLMs (I'd love your feedback if it does not)
Try it here: https://docs.litellm.ai/docs/simple_proxy
https://github.com/BerriAI/litellm
Ollama models
$ litellm --model ollama/llama2 --api_base http://localhost:11434
Hugging Face Models
$ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
$ litellm --model claude-instant-1
Anthropic
$ export ANTHROPIC_API_KEY=my-api-key
$ litellm --model claude-instant-1
Palm
$ export PALM_API_KEY=my-palm-key
$ litellm --model palm/chat-bison
openai.api_base = "http://0.0.0.0:8000"
python3 -m lm_eval \
--model openai-completions \
--model_args engine=davinci \
--task crows_pairs_english_age
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.