abacaj / code-eval
Run evaluation on LLMs using human-eval benchmark
License: MIT License
@abacaj My environment is an NVIDIA TX2. When I use the codecarbon package to get GPU information, it cannot find the GPU:
[codecarbon INFO @ 21:03:55] [setup] RAM Tracking...
[codecarbon INFO @ 21:03:55] [setup] GPU Tracking...
[codecarbon INFO @ 21:03:55] No GPU found.
[codecarbon INFO @ 21:03:55] [setup] CPU Tracking...
[codecarbon WARNING @ 21:03:55] No CPU tracking mode found. Falling back on CPU constant mode.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[codecarbon WARNING @ 21:03:55] We saw that you have a ARMv8 Processor rev 1 (v8l) but we don't know it. Please contact us.
[codecarbon INFO @ 21:03:55] CPU Model on constant consumption mode: ARMv8 Processor rev 1 (v8l)
[codecarbon INFO @ 21:03:55] >>> Tracker's metadata:
[codecarbon INFO @ 21:03:55] Platform system: Linux-5.10.104-tegra-aarch64-with-glibc2.17
[codecarbon INFO @ 21:03:55] Python version: 3.8.13
[codecarbon INFO @ 21:03:55] CodeCarbon version: 2.3.4
[codecarbon INFO @ 21:03:55] Available RAM : 6.329 GB
[codecarbon INFO @ 21:03:55] CPU count: 6
[codecarbon INFO @ 21:03:55] CPU model: ARMv8 Processor rev 1 (v8l)
[codecarbon INFO @ 21:03:55] GPU count: None
[codecarbon INFO @ 21:03:55] GPU model: None
But torch.cuda.is_available() returns True, so I want to know whether `codecarbon` can support the TX2 device. Looking forward to your reply.
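If I understand correctly, codecarbon detects NVIDIA GPUs through NVML (via pynvml), and Jetson/Tegra boards such as the TX2 expose CUDA but not NVML, which would explain why PyTorch sees the GPU while codecarbon does not. A minimal sketch to confirm the discrepancy (assuming pynvml is installed):

```python
# Check whether the GPU is visible to CUDA vs. NVML.
import torch

print("CUDA available:", torch.cuda.is_available())  # True on the TX2

try:
    import pynvml
    pynvml.nvmlInit()
    print("NVML device count:", pynvml.nvmlDeviceGetCount())
    pynvml.nvmlShutdown()
except Exception as e:  # NVML is typically unavailable on Jetson/Tegra
    print("NVML not available:", e)
```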
Why am I getting low scores on llama-2-13b (pass@1: 3.05%, pass@10: 19.51%)? Are you applying any additional prompt formatting in this setup, or are the scores related to batch decoding? My setup requires generating the samples sequentially; I can't perform batch decoding (see the sketch below for what I mean).
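Sampling completions one at a time with the same decoding parameters should be statistically equivalent to batch decoding, so batching alone shouldn't change pass@k much. A minimal sketch of sequential (batch-size 1) sampling with transformers; the model id and decoding parameters here are illustrative, not this repo's defaults:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"  # illustrative choice
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "def fibonacci(n):\n"
inputs = tok(prompt, return_tensors="pt").to(model.device)

samples = []
for _ in range(10):  # one completion per generate call, no batching
    out = model.generate(
        **inputs, do_sample=True, temperature=0.2,
        top_p=0.95, max_new_tokens=256,
    )
    # strip the prompt tokens, keep only the completion
    samples.append(tok.decode(out[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True))
```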
@abacaj After evaluating llama2-7b, I only get a single file, eval.json. How can I get results like the following:
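If eval.json follows the OpenAI human-eval samples format (one JSON record per line with "task_id" and "completion" fields), the pass@k numbers can be computed with the human-eval harness itself; a sketch under that assumption:

```python
from human_eval.evaluation import evaluate_functional_correctness

# Scores the samples against the HumanEval problems and writes a
# per-sample pass/fail results file alongside the input.
results = evaluate_functional_correctness("eval.json", k=[1, 10])
print(results)  # e.g. {'pass@1': ..., 'pass@10': ...}
```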
I'm keeping https://github.com/ErikBjare/are-copilots-local-yet up-to-date, and would love to see some codellama numbers given it's now SOTA :)
Suggestion: add support for https://github.com/THUDM/CodeGeeX2, which was just released; according to the published numbers it reaches a pass@1 of 35.9.
I got only 9.7% for llama2-7B-chat on human-eval using your script
{'pass@1': 0.0975609756097561}
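For reference, these pass@k numbers come from the unbiased estimator in the Codex/HumanEval paper, where n is the number of samples generated per task and c the number that pass; a short sketch:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n-c, k) / C(n, k), computed stably."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# With one sample per task, pass@1 is just the fraction of tasks solved:
# 0.0975609... = 16 / 164, i.e. 16 of the 164 HumanEval tasks passed.
print(16 / 164)  # 0.0975609756097561
```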