Comments (7)
We have evaluated the model on HumanEval using the evaluation harness; BF16 and FP16 give scores close to full precision. You can run the evaluation yourself to check the numbers, using the parameters we specify in the paper (for example, we use top-p sampling instead of greedy decoding, we strip the prompt before generation, and we post-process the output to remove the eos_token and any text after certain stop tokens).
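The prompt stripping and stop-token handling can be sketched as follows. This is a hypothetical helper, not the harness's actual code, and the stop sequences shown are illustrative:

```python
# Illustrative stop sequences; the harness's real list may differ.
STOP_SEQUENCES = ["\nclass ", "\ndef ", "\n#", "\nif __name__", "<|endoftext|>"]

def postprocess(prompt: str, generation: str, stop_sequences=STOP_SEQUENCES) -> str:
    """Strip the echoed prompt, then truncate at the earliest stop sequence."""
    # Remove the prompt if the model returns prompt + completion.
    if generation.startswith(prompt):
        generation = generation[len(prompt):]
    # Cut at the first occurrence of any stop sequence (eos included).
    cut = len(generation)
    for stop in stop_sequences:
        idx = generation.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return generation[:cut]
```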
As for the playground, it calls the inference endpoint to generate code, which is equivalent to calling model.generate locally; just make sure you use the same parameters as the playground. It uses random sampling, so it's normal not to get exactly the same result as the playground.
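Locally, that corresponds to something like the sketch below, assuming a standard transformers setup. The sampling values are illustrative assumptions, not the playground's confirmed defaults:

```python
# Illustrative sampling parameters; values are assumptions, not the
# playground's confirmed defaults.
SAMPLING_PARAMS = dict(
    do_sample=True,      # random sampling, so outputs vary run to run
    temperature=0.2,
    top_p=0.95,          # top-p (nucleus) sampling, as mentioned above
    max_new_tokens=256,
)

def generate_locally(prompt: str, checkpoint: str = "bigcode/starcoder") -> str:
    # Lazy import so the parameter sketch above is usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id,
                             **SAMPLING_PARAMS)
    # Decode and strip the echoed prompt from the output.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)[len(prompt):]
```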
from starcoder.
Update: I finally made StarCoder output reasonable code by following https://huggingface.co/spaces/bigcode/bigcode-playground/blob/2009abb380464f89aba1603069e720f031735cce/app.py and replicated a pretty nice pass@1.
The detailed usage is listed here: https://github.com/evalplus/evalplus/blob/694528a1e933ea1d12559f41cebac1a6ad1100dc/codegen/model.py#L494
- Use infilling
- Set repetition_penalty=1
- Set temperature=1e-2 instead of 0
Note that maybe not all of these are necessary to make "greedy" decoding work, but this is the configuration I found to be feasible. Thanks for creating the great bigcode project!
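The near-greedy configuration above would look roughly like this as transformers generate() keyword arguments. The values come from the list above except for the token budget, which is an assumption:

```python
# Near-greedy decoding as described above: sample with a near-zero
# temperature instead of using the temperature=0 greedy path.
NEAR_GREEDY_PARAMS = dict(
    do_sample=True,
    temperature=1e-2,        # near-zero instead of exactly 0
    repetition_penalty=1.0,  # explicitly disable the repetition penalty
    max_new_tokens=512,      # illustrative budget, not from the comment
)
```

These kwargs would be passed straight through, e.g. model.generate(**inputs, **NEAR_GREEDY_PARAMS).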
Hello, I have the same problem. This is my code.
Facing the same issue: I get good results with the HF inference API but not locally.
@loubnabnl Thanks for the reply. After some in-depth debugging, I found that StarCoder tends to work better with a higher temperature and does not seem to suit greedy decoding or very low temperatures such as 0.1. I am curious whether it is expected that StarCoder struggles under a greedy decoding setting for benchmarking, because many other models I have tried actually achieve a better pass@1 with greedy decoding than with random sampling. Thanks!
In addition, did you use autoregressive generation in the evaluation? The playground code seems to use infilling. I had the same experience with SantaCoder, where autoregressive (left-to-right) generation did not seem to work reasonably but the infilling mode worked fine.
It's great that it works. By the way, I ran HumanEval on StarCoder with greedy decoding and the score is pretty high by default (this is left-to-right generation with no infilling; the playground doesn't use infilling by default unless you add the <FILL_HERE> token to your prompt).
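For reference, StarCoder's fill-in-the-middle mode uses the <fim_prefix>, <fim_suffix>, and <fim_middle> special tokens. The sketch below shows how a prompt containing <FILL_HERE> could be converted into that format; the helper name is ours, not from the playground code:

```python
def to_fim_prompt(prompt: str, marker: str = "<FILL_HERE>") -> str:
    """Convert a <FILL_HERE>-style prompt into StarCoder's FIM token format."""
    if marker not in prompt:
        return prompt  # no marker: plain left-to-right generation
    # Split once around the marker; the model generates the middle span.
    prefix, suffix = prompt.split(marker, 1)
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
```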
CLI in evaluation harness:
accelerate launch main.py --model bigcode/starcoder --max_length_generation 512 --tasks humaneval --n_samples 1 --batch_size 1 --temperature 0 --do_sample False --precision bf16 --allow_code_execution --use_auth_token
Result:
{
  "humaneval": {
    "pass@1": 0.3475609756097561
  },
  "config": {
    "model": "bigcode/starcoder",
    "temperature": 0.0,
    "n_samples": 1
  }
}
Related Issues (20)
- Generating Embeddings of Code Tokens using StarCoder HOT 1
- Fine-tuning Starcoder or Octocoder for IDE Integration: Instruction Tuning vs Base Model Training Approach HOT 1
- does this support deepspeed zero train?
- inference problem
- Could somebody guide me how to fine-tune with fill-in-middle task based on StarCoderBase? HOT 1
- HuggingFaceH4/oasst1_en - missing dataset HOT 1
- Empty Generations / Failing Reproducing 40% on HumanEval HOT 3
- How many shots are used for evaluating HumanEval? HOT 1
- Fine tuning With SQLcoder-7b
- torch.cuda.OutOfMemoryError on HuhhingFace NVidia 4xA10G Large HOT 2
- Question about Improving Code Generation with Promting
- Better inference based on starcode2-3b model HOT 1
- FileNotFoundError: [Errno 2] No such file or directory: 'checkpoint-100/model-00001-of-00003.safetensors'
- Is finetune.py incompatible with older GPUs?
- What should be masking id . should it be -100 only . giving device side assert triggered
- v0.10.0 of Peft breaks finetune.py
- RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
- Removal request & notice: permissive licensing might often still be unsuitable(!) for training set inclusion HOT 2
- zero3 DPO starcoder OOM
- Can starcoder be used to create a structured file format?