
swe_test's Introduction

Setup

Prereqs

NOTE: Other dependencies may have been implicitly present on my system, so this list may be incomplete. I think you also need to install conda to test the generated patches.

Ubuntu (untested for now)

sudo apt update
sudo apt install wget libopenblas-base libomp-dev default-jdk memcached libmemcached-tools

Mac

brew install wget libomp openblas openjdk libmemcached

Both

wget https://repo1.maven.org/maven2/io/anserini/anserini/0.36.1/anserini-0.36.1-fatjar.jar
mkdir anserini-jar
mv anserini-0.36.1-fatjar.jar anserini-jar
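
Before moving on, it can help to confirm that the jar landed in the directory that the later BM25 step passes as ANSERINI_CLASSPATH. The following is a minimal sketch; the helper name `jar_ready` is hypothetical and not part of this repo.

```shell
# Hypothetical helper: succeed if DIR contains an Anserini fatjar.
jar_ready() {
  dir="${1:-anserini-jar}"
  for jar in "$dir"/anserini-*-fatjar.jar; do
    # If the glob matched nothing, $jar is the literal pattern and -f fails.
    [ -f "$jar" ] && return 0
  done
  return 1
}

if jar_ready anserini-jar; then
  echo "Anserini jar found"
else
  echo "Anserini jar missing; re-run the wget and mv steps" >&2
fi
```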

Now create a .env file with the following:

OPENAI_API_KEY=<key>
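
The `<key>` placeholder has to be replaced with a real key. A quick way to check that the file holds a non-placeholder entry, without printing the key, might look like this. The helper name `env_key_set` is hypothetical, and the sketch assumes a plain `KEY=value` line with no quoting.

```shell
# Hypothetical helper: succeed if FILE has a non-empty, non-placeholder OPENAI_API_KEY.
env_key_set() {
  file="${1:-.env}"
  [ -f "$file" ] || return 1
  # Take the value of the first OPENAI_API_KEY= line, if any.
  value=$(sed -n 's/^OPENAI_API_KEY=//p' "$file" | head -n 1)
  [ -n "$value" ] && [ "$value" != "<key>" ]
}

if env_key_set .env; then
  echo "OPENAI_API_KEY is set in .env"
else
  echo "OPENAI_API_KEY is missing or still the <key> placeholder" >&2
fi
```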

SWE-Bench

git submodule update --init --recursive
pip install -r requirements.txt
pip install ./SWE-bench

Sanity Check 1: Try inference and evaluation on lite oracle dataset with OpenAI

mkdir -p sanity-check
# Run inference on 3 examples: shard 0 of 100 over the 300-example dataset (300/100 = 3).
python SWE-bench/inference/run_api.py --dataset_name_or_path princeton-nlp/SWE-bench_Lite_oracle --model_name_or_path gpt-4-0613 --output_dir ./sanity-check --shard_id 0 --num_shards 100

# Evaluate the predictions. 1 of the 3 examples should pass.
python SWE-bench/swebench/harness/run_evaluation.py --predictions_path "sanity-check/gpt-4-0613__SWE-bench_Lite_oracle__test__shard-0__num_shards-100.jsonl" --swe_bench_tasks "princeton-nlp/SWE-bench_Lite_oracle" --log_dir "sanity-check" --testbed "sanity-check" --skip_existing --timeout 900 --verbose
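
As a rough check on the numbers above, the shard arithmetic and the size of the predictions file can be inspected directly. This is a sketch under two assumptions: that the predictions file holds one JSON record per line (suggested by the `.jsonl` extension), and that it sits at the path used in the evaluation command. `count_records` is a hypothetical helper.

```shell
# Examples per shard: the comment above gives 300 examples split into 100 shards.
total_examples=300
num_shards=100
per_shard=$(( total_examples / num_shards ))
echo "expected examples per shard: $per_shard"

# Path taken from the evaluation command above.
preds="sanity-check/gpt-4-0613__SWE-bench_Lite_oracle__test__shard-0__num_shards-100.jsonl"

# Hypothetical helper: count non-empty lines in a file.
count_records() {
  grep -c . "$1"
}

if [ -f "$preds" ]; then
  echo "predictions written: $(count_records "$preds")"
else
  echo "no predictions file yet at $preds" >&2
fi
```

If the record count differs from the expected per-shard count, some inferences may have failed or been skipped.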

Sanity Check 2: Try BM25 on lite oracle dataset.

mkdir -p sanity-check/data
ANSERINI_CLASSPATH=$PWD/anserini-jar python SWE-bench/inference/make_datasets/bm25_retrieval.py --dataset_name_or_path princeton-nlp/SWE-bench_Lite --shard_id 0 --num_shards 100 --splits test --output_dir sanity-check

python SWE-bench/inference/make_datasets/create_text_dataset.py --dataset_name_or_path princeton-nlp/SWE-bench_Lite --splits test --retrieval_file sanity-check/princeton-nlp__SWE-bench_Lite/file_name_and_contents.retrieval.jsonl --file_source bm25 --output_dir sanity-check/data --shard_id 0 --num_shards 100
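
Both commands above process only shard 0 of 100. To cover more shards, one option is to loop over shard ids. The sketch below only prints the BM25 retrieval commands so they can be reviewed first; `print_bm25_commands` is a hypothetical helper, and whether outputs from different shards overwrite each other in the same output directory is not something this README confirms.

```shell
# Hypothetical helper: print (not run) the BM25 retrieval command for shards FIRST..LAST.
print_bm25_commands() {
  first="$1"; last="$2"; num_shards="$3"
  shard="$first"
  while [ "$shard" -le "$last" ]; do
    # \$PWD is escaped so it expands when the printed command is run, not now.
    echo "ANSERINI_CLASSPATH=\$PWD/anserini-jar python SWE-bench/inference/make_datasets/bm25_retrieval.py --dataset_name_or_path princeton-nlp/SWE-bench_Lite --shard_id $shard --num_shards $num_shards --splits test --output_dir sanity-check"
    shard=$(( shard + 1 ))
  done
}

# Shards 0 through 2 of 100.
print_bm25_commands 0 2 100
```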

swe_test's People

Contributors

amlatyrngom
