Fork of SWE-bench

A fork of SWE-bench, modified so that some of its scripts can be reused.

Docker images

# evaluation environment only
yuntongzhang/swe-bench:latest
# additionally includes project setups for other tools
yuntongzhang/swe-bench:experiment

Instructions

To install

First, install Miniconda:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Create and activate the conda environment for the benchmark:

conda env create -f environment.yml
conda activate swe-bench

On some distributions (e.g. Debian and Ubuntu), /bin/sh points to dash instead of bash. Because dash does not support the source builtin used in the benchmark code, this can break the benchmark scripts. To point /bin/sh to bash:

sudo ln -sf /bin/bash /bin/sh
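To see which shell /bin/sh currently resolves to (before or after relinking it), a quick check such as the following can help. It is a sketch, not part of the benchmark, and assumes a readlink that supports -f (GNU coreutils or BusyBox).

```shell
#!/bin/sh
# Follow the /bin/sh symlink chain to the real binary.
sh_target="$(readlink -f /bin/sh)"
echo "/bin/sh -> ${sh_target}"

# The benchmark code relies on 'source', which dash does not provide.
case "$(basename "${sh_target}")" in
  bash) echo "OK: /bin/sh is bash" ;;
  *)    echo "WARNING: /bin/sh is not bash; 'source' may fail in the benchmark code" ;;
esac
```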

Also install the system-level packages required by the benchmark subjects. They are needed both to install the subjects' dependencies and to run the subjects' tests:

sudo apt install -y libffi-dev python3-pytest libfreetype6-dev libqhull-dev pkg-config texlive cm-super dvipng python-tk ffmpeg imagemagick fontconfig ghostscript inkscape graphviz optipng fonts-comic-neue python3-pikepdf build-essential libssl-dev

sudo apt install ttf-mscorefonts-installer
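After installing, one rough way to confirm that the tools are on PATH is to look for commands these packages typically provide. This sketch is not part of the benchmark: the package-to-command mapping (e.g. imagemagick provides convert, ghostscript provides gs, graphviz provides dot) is an assumption of this check, and library-only packages such as libffi-dev or the fonts are not covered.

```shell
#!/bin/sh
# Commands assumed to come from some of the apt packages listed above.
required_cmds="ffmpeg convert gs inkscape dot optipng dvipng pkg-config fc-list latex gcc make"

missing=""
for cmd in $required_cmds; do
  # command -v succeeds only if the command is found on PATH
  if ! command -v "$cmd" >/dev/null 2>&1; then
    missing="$missing $cmd"
  fi
done

if [ -z "$missing" ]; then
  echo "All checked commands are available."
else
  echo "Missing commands:$missing"
fi
```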

To set up task instances for other tools

Sometimes you may want to set up only the projects' environments, without running any evaluation, for example to inspect the state of a particular project.

To set up all task instances, do:

python harness/run_setup.py --log_dir logs --testbed testbed --result_dir setup_result

You can also set up the environments with multiple processes. Note, however, that conda is not thread-safe, so doing this may result in a deadlock:

python harness/run_setup.py --log_dir logs --testbed testbed --result_dir setup_result --num_processes 16

To set up only a subset of tasks, write the list of tasks to a file <subset_file> and run:

python harness/run_setup.py --log_dir logs --testbed testbed --result_dir setup_result --subset_file <subset_file> --num_processes 16
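A subset run might look like the sketch below. The file format is an assumption here (one task instance ID per line), the file name subset_tasks.txt is hypothetical, and the two IDs are only illustrative; check harness/run_setup.py for the format it actually expects.

```shell
#!/bin/sh
# Hypothetical subset file: one task instance ID per line (assumed format).
subset_file="subset_tasks.txt"
cat > "$subset_file" <<'EOF'
django__django-11099
astropy__astropy-12907
EOF

# Only invoke the harness when run from the repository root.
# A single process avoids the conda deadlock risk noted above.
if [ -f harness/run_setup.py ]; then
  python harness/run_setup.py --log_dir logs --testbed testbed \
    --result_dir setup_result --subset_file "$subset_file" --num_processes 1
fi
```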

To only write out the setup JSON files, without cloning the repositories or performing the actual setup, run:

python harness/run_setup.py --log_dir logs --testbed testbed --result_dir setup_result --only_dump_files

To evaluate on some task instances

  1. Prepare a prediction file in JSON. This file (e.g. prediction.json) must contain the model's output in the field 'model_patch', which is what gets evaluated. To evaluate a single instance, include only that instance's entry in the file.
  2. Prepare the JSON file swe-bench.json, which contains all the task instance definitions. It can be downloaded from the original GitHub repository.
  3. Create the directories eval_logs and eval_testbed to store the logs and the temporarily cloned projects.
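For step 1, a minimal prediction file for one instance might look like the following. This is a sketch: apart from 'model_patch', the keys 'instance_id' and 'model_name_or_path' and the list-of-objects layout are assumed from upstream SWE-bench, and the instance ID, model name and patch are placeholders.

```shell
#!/bin/sh
# Hypothetical single-instance predictions file (assumed upstream layout:
# a JSON list of objects; "model_patch" holds the diff that is evaluated).
cat > predictions.json <<'EOF'
[
  {
    "instance_id": "django__django-11099",
    "model_name_or_path": "my-model",
    "model_patch": "diff --git a/path/to/file.py b/path/to/file.py\n--- a/path/to/file.py\n+++ b/path/to/file.py\n@@ -1 +1 @@\n-old line\n+new line\n"
  }
]
EOF

# Sanity check: the file must be valid JSON.
python3 -m json.tool predictions.json >/dev/null && echo "predictions.json is valid JSON"
```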

Then run the evaluation script as follows.

NOTE: do not overwrite the existing testbed directory, as it contains the setup used by other tools. Use a separate directory such as eval_testbed instead:

mkdir eval_logs
mkdir eval_testbed
python harness/run_evaluation.py --predictions_path ../predictions_for_swebench.json --swe_bench_tasks ./data/swe-bench.json --log_dir eval_logs --testbed eval_testbed --verbose
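The evaluation steps can be combined into one guarded script. This is a sketch: the LOG_DIR/TESTBED variables and the refusal check are illustrative additions, while the harness flags are the ones used in the command above. mkdir -p is used so that re-running the script does not fail on existing directories.

```shell
#!/bin/sh
# Directories used only by the evaluation run. Never point TESTBED at the
# existing "testbed" directory, which holds setups used by other tools.
LOG_DIR="eval_logs"
TESTBED="eval_testbed"

if [ "$TESTBED" = "testbed" ]; then
  echo "Refusing to reuse the shared 'testbed' directory" >&2
  exit 1
fi

# -p: do not fail if the directories already exist
mkdir -p "$LOG_DIR" "$TESTBED"

# Only invoke the harness when run from the repository root.
if [ -f harness/run_evaluation.py ]; then
  python harness/run_evaluation.py \
    --predictions_path ../predictions_for_swebench.json \
    --swe_bench_tasks ./data/swe-bench.json \
    --log_dir "$LOG_DIR" --testbed "$TESTBED" --verbose
fi
```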

Contributors

john-b-yang, carlosejimenez, yuntongzhang, klieret, ofirpress, jasongross, moresearch, crhf, rdnfn, sunwood-ai-labs, ysymyth, itaowei