Leetcode-Hard Gym

A gym, built on top of OpenAI's gym, for evaluating superhuman programming agents.

Written by: Beck Labash

Supports:

  • python
  • c
  • c#
  • java
  • javascript
  • ruby
  • swift
  • go
  • scala
  • kotlin
  • rust
  • php
  • typescript
  • racket
  • erlang
  • elixir
  • dart
  • mysql

Leaderboard for Leetcode Hard (Python): Pass@1

  • OpenAI's GPT-4: 10.7 (source)
  • OpenAI's Codex: 3.6 (source)
  • OpenAI's GPT-3.5: 0.0 (source)
  • Reflexion + GPT-4: 15.0 (source)
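As a rough illustration of how a Pass@1 score like those above could be computed, the sketch below assumes the figures are percentages of problems accepted on the first submission. The README does not spell out the formula, and pass_at_1 is a hypothetical helper, not part of this repository.

```python
from typing import Iterable


def pass_at_1(first_attempt_accepted: Iterable[bool]) -> float:
    """Return the percentage of problems accepted on the first submission.

    Assumption: one boolean per problem, True if the first (and only)
    attempt was accepted. This helper is hypothetical.
    """
    results = list(first_attempt_accepted)
    if not results:
        raise ValueError("need at least one problem to compute Pass@1")
    return 100.0 * sum(results) / len(results)


# Made-up input for illustration only: 1 of 4 problems accepted.
score = pass_at_1([True, False, False, False])
print(score)
```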

Setup:

  • Install the requirements (typically pip install -r requirements.txt)
  • Set the environment variable LEETCODE_SESSION to the value of the LEETCODE_SESSION cookie from a signed-in Leetcode session
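Before creating an environment, it can help to confirm that the variable is actually visible to Python. The following is a minimal sketch; get_leetcode_session is a hypothetical helper and not part of leetcode_env.

```python
import os
from typing import Mapping


def get_leetcode_session(environ: Mapping[str, str] = os.environ) -> str:
    """Return the LEETCODE_SESSION value, or raise if it is missing or blank.

    Hypothetical helper: the environment mapping is a parameter so the
    check can be exercised without touching the real environment.
    """
    session = environ.get("LEETCODE_SESSION", "").strip()
    if not session:
        raise RuntimeError(
            "LEETCODE_SESSION is not set; copy the cookie value from a "
            "signed-in Leetcode session and export it first."
        )
    return session
```

Calling get_leetcode_session() at the start of a script fails early with a readable message instead of failing later at submission time.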

Example usage:

We can load the code-snippet annotated dataset like so:

import pandas as pd
data = pd.read_csv("path/to/repo/leetcode_dataset/data/with_snippets/leetcode_hard_with_snippets.csv")
row = data.iloc[0]

Then we can instantiate a submission environment ...

from leetcode_env.environment import LeetCodeEnv

env = LeetCodeEnv()

... and build a submission using a row from the dataset ...

from leetcode_env.leetcode_types import LeetCodeSubmission

code = """
class Solution:
    def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:
        return 1
"""
lang = "python3"
question_id = row['id']
question_slug = row['title_slug']

sub = LeetCodeSubmission(code=code,
                         lang=lang,
                         question_id=question_id,
                         question_slug=question_slug,
                         timeout=5)
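When building many submissions from the dataset, the field mapping above can be factored into a small function. This is only a sketch: submission_fields is a hypothetical helper, and it assumes nothing beyond the 'id' and 'title_slug' columns and the keyword arguments shown above.

```python
from typing import Any, Dict, Mapping


def submission_fields(row: Mapping[str, Any], code: str,
                      lang: str = "python3", timeout: int = 5) -> Dict[str, Any]:
    """Collect the keyword arguments used above for LeetCodeSubmission.

    Hypothetical helper: `row` may be a pandas Series or a plain dict,
    as long as it has 'id' and 'title_slug' entries.
    """
    return {
        "code": code,
        "lang": lang,
        "question_id": row["id"],
        "question_slug": row["title_slug"],
        "timeout": timeout,
    }
```

The result can then be unpacked into the constructor, for example LeetCodeSubmission(**submission_fields(row, code)).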

Finally, we can step through the environment with the submission:

status, reward, done, submission_result = env.step(sub)
print(status, reward, done, submission_result)
# Wrong Answer
# False
# False
# {'status_code': 11, 'lang': 'python3', 'run_success': True, 'status_runtime': 'N/A', 'memory': 14160000, 'question_id': '4', 'elapsed_time': 105, 'compare_result': '00010000000...00000000001000', 'code_output': '1.00000', 'std_output': '', 'last_testcase': '[1,3]\n[2]', 'expected_output': '2.00000', 'task_finish_time': 1680132323596, 'total_correct': 6, 'total_testcases': 2094, 'runtime_percentile': None, 'status_memory': 'N/A', 'memory_percentile': None, 'pretty_lang': 'Python3', 'submission_id': '924506780', 'input_formatted': '[1,3], [2]', 'input': '[1,3]\n[2]', 'status_msg': 'Wrong Answer', 'state': 'SUCCESS'}

Note: compare_result is shortened here; it contains a sequence of boolean flags (0 or 1), one per test case, indicating whether that test passed.
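The compare_result string can be summarized with a few lines of Python. The sketch below assumes the format suggested by the sample output, one '0' or '1' character per test case; summarize_compare_result is a hypothetical helper and not part of leetcode_env.

```python
from typing import Optional, Tuple


def summarize_compare_result(compare_result: str) -> Tuple[int, int, Optional[int]]:
    """Return (passed, total, index_of_first_failure) for a 0/1 string.

    Assumption: each character is '1' for a passed test and '0' for a
    failed one. The failure index is None when every test passed.
    """
    passed = compare_result.count("1")
    total = len(compare_result)
    first_failure = compare_result.find("0")
    return passed, total, (first_failure if first_failure != -1 else None)


# Made-up flags for illustration only, not real submission output.
print(summarize_compare_result("0001"))
```

On a full, unshortened result, the passed and total counts should match the total_correct and total_testcases fields of the same submission_result.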

Cite

This benchmark was introduced in the following paper:

@misc{shinn2023reflexion,
      title={Reflexion: Language Agents with Verbal Reinforcement Learning}, 
      author={Noah Shinn and Federico Cassano and Beck Labash and Ashwin Gopinath and Karthik Narasimhan and Shunyu Yao},
      year={2023},
      eprint={2303.11366},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

