
taco's People

Contributors

bowen92, eltociear, rongaoli

taco's Issues

Data contamination concerns

Great work! TACO is the best code-generation dataset among the open-source datasets I have seen so far.

I am curious whether you considered data contamination when building the dataset. In the era of LLMs, whether the test set is contaminated is a very important consideration, and I could not find any related information in the paper.
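
One common way to screen for this kind of contamination is a simple n-gram overlap check between the test problems and a candidate training corpus. Below is a minimal sketch, not the authors' procedure; the Hugging Face path BAAI/TACO, the field name "question", and the 13-gram size are assumptions to adapt.

    from datasets import load_dataset

    def ngrams(text: str, n: int = 13) -> set:
        # Split on whitespace and collect all contiguous n-token windows.
        toks = text.split()
        return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    def is_contaminated(question: str, corpus_ngrams: set, n: int = 13) -> bool:
        # Flag a test question if any of its n-grams appears verbatim in the
        # training corpus; 13-grams are a commonly used screening size.
        return not ngrams(question, n).isdisjoint(corpus_ngrams)

    # Hypothetical usage: corpus_ngrams would be built the same way from the
    # training corpus under suspicion.
    # taco_test = load_dataset("BAAI/TACO", split="test")
    # flags = [is_contaminated(ex["question"], corpus_ngrams) for ex in taco_test]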

AttributeError: module 'inspect' has no attribute 'getargspec'. Did you mean: 'getargs'?

When I use compute_metric.py to evaluate the generation results, the console reports "No module named 'pyext'". I installed pyext with pip and got the following error:

Collecting pyext
  Using cached pyext-0.7.tar.gz (7.8 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [9 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "...\AppData\Local\Temp\pip-install-exx9gznl\pyext_28002897deae467da164cbba24ad8613\setup.py", line 6, in <module>
          import pyext
        File "...\AppData\Local\Temp\pip-install-exx9gznl\pyext_28002897deae467da164cbba24ad8613\pyext.py", line 117, in <module>
          oargspec = inspect.getargspec
                     ^^^^^^^^^^^^^^^^^^
      AttributeError: module 'inspect' has no attribute 'getargspec'. Did you mean: 'getargs'?
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

After searching online, I found that inspect.getargspec was removed in Python 3.11, which I am using in order to support OpenAI's models.

Upon further investigation, it appears that this issue doesn't exist in Python 3.8, so I downgraded the Python version to 3.8.

Moreover, someone mentioned that the problem may be resolved in the second quarter of 2024: https://community.privacyidea.org/t/python-3-11-support/3115/2

I suggest adding detailed environment settings (in particular, the supported Python version) to the guide page.
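
For what it is worth, the root cause is that inspect.getargspec was removed in Python 3.11 while pyext still references it at import time. The sketch below is a best-effort compatibility shim for running an already-installed (or vendored) copy of pyext on 3.11+; it does not help during pip's build step, so installing under Python 3.10 or earlier remains the practical fix.

    import inspect
    import sys

    # inspect.getargspec was removed in Python 3.11, but pyext calls it at
    # import time. Restore the old name via getfullargspec before importing
    # pyext. getfullargspec returns a FullArgSpec with extra fields, so this
    # is a best-effort shim rather than an exact replacement.
    if sys.version_info >= (3, 11) and not hasattr(inspect, "getargspec"):
        inspect.getargspec = inspect.getfullargspec

    import pyext  # noqa: E402 -- must come after the shim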

How to construct the appropriate output?

I am trying to evaluate codellama-7b on the easy split of the TACO dataset, but I find that the generated code must follow a specific format to pass the test cases, e.g.

    s = input()
    print(s.swapcase())

rather than

    def solve(s):
        return s.swapcase()

How should I construct output in the appropriate format?
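
In case it helps others hitting the same issue, here is a minimal sketch (not the repo's official harness) of wrapping a function-style completion into the stdin/stdout form that the standard-input test cases expect; the function name solve is just the one from the example above.

    import sys

    # Function-style completion as generated by the model (example above).
    def solve(s):
        return s.swapcase()

    if __name__ == "__main__":
        # Read the single input line and print the result, matching the
        # `s = input(); print(s.swapcase())` form that the tests check.
        s = sys.stdin.readline().rstrip("\n")
        print(solve(s))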

compute_metric.py seems to have a problem

I sampled 20 problems from the test set and used compute_metric.py to evaluate the solution code that ships with the dataset, but only about 60-70% of the solutions passed. I picked one of the failing solutions: it is correct on manual inspection and was also Accepted when submitted to Codeforces, so the problem is most likely on the evaluation side. Also, were the results reported in the paper computed with this script?
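
For anyone who wants to cross-check this independently of compute_metric.py, below is a rough sanity check that runs a dataset-provided solution on its own input/output pairs in a subprocess. The field names "solutions" and "input_output" follow the APPS-style layout TACO uses, only standard-input problems are handled, and the whitespace comparison is deliberately simplistic; treat it as a sketch, not a reference checker.

    import json
    import subprocess

    def run_stdin_program(code: str, stdin: str, timeout: float = 10.0) -> str:
        # Run the solution as a standalone program, feeding the test input on
        # stdin and capturing stdout.
        proc = subprocess.run(
            ["python", "-c", code],
            input=stdin, text=True, capture_output=True, timeout=timeout,
        )
        return proc.stdout

    def solution_passes(sample: dict) -> bool:
        # Assumed layout: "solutions" and "input_output" are JSON-encoded
        # strings, as in the APPS format.
        code = json.loads(sample["solutions"])[0]
        io = json.loads(sample["input_output"])
        for inp, expected in zip(io["inputs"], io["outputs"]):
            if run_stdin_program(code, inp).strip() != str(expected).strip():
                return False
        return True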

code-llama-7b-python accuracy does not match the paper

I ran 200 problems from the easy portion of the test split and measured a pass@1 of only about 3, which does not match the accuracy reported in the paper.
I used the prompt and evaluation code from this repo, with n_samples=1 and temperature=0.8.
Could you help me figure out the cause? Many thanks!
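
One thing worth double-checking when comparing with the paper: with n_samples=1 at temperature 0.8, pass@1 is the raw pass rate of a single stochastic sample and is therefore quite noisy. Papers often estimate pass@1 from many samples per problem with the unbiased estimator from the Codex paper (whether that is done here is an assumption to confirm with the authors); for reference:

    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        # Unbiased pass@k estimator (Chen et al., 2021): the probability that
        # at least one of k samples is correct, given c of n samples passed.
        if n - c < k:
            return 1.0
        return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

    # Example: pass@1 estimated from 20 samples of which 2 passed.
    # pass_at_k(20, 2, 1) == 0.1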

How is the difficulty level obtained?

Hi, thanks for the great work! I am curious about the criteria of the difficulty level annotation in the files. Is it based on the website's own tagging or the user pass rates? Can you share more details on this? Thank you!

Is there a major bug in the updated evaluation framework?

I tried evaluating the solution code that ships with the dataset again, but all evaluations appear to fail: every result returned by check_correctness() is -1. I am not familiar with the APPS testing framework, so I have not investigated further. Could you look into this issue?

Finetuned Models

Hello! I'm having some trouble reproducing the finetuning with the script – would you mind releasing the trained models so I can verify their evaluation results and build on top of them? Thank you!

specific performance of gpt-4

Thanks for releasing this dataset and all the amazing work you have done! Do you have any data on the specific performance of gpt-4 on the test set? If so, could you send me a copy?
