
Model Editing with Canonical Examples

This codebase provides the code and datasets for replicating the experiments in the paper Model Editing with Canonical Examples.

The goal of model editing with canonical examples is to take as input simple examples of desirable or undesirable behaviors, and update a language model to behave better with respect to those examples in general without otherwise changing anything. The setting draws from model editing and out-of-distribution evaluation, and is intended to be a general testbed for making targeted improvements to language models.

Getting started

Installing dependencies

I suggest making a new conda environment, installing pip into it (conda install pip), and then running

pip install -r requirements.txt
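
For example, the setup might look like this (the environment name and Python version here are arbitrary choices, not pinned by the repo):

conda create -n model-editing python=3.10
conda activate model-editing
conda install pip
pip install -r requirements.txt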

Running an experiment

To run a finetune-and-evaluate experiment, you can execute:

python ft_experiment.py configs/example_full.yaml

Each experiment -- training a model on some data, evaluating it, reporting results -- is specified by a yaml config file. I suggest taking a look at configs/example_full.yaml to see some of the fields. All experiments in the paper have corresponding yaml configs.
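
If you want to inspect a config programmatically, here is a minimal sketch (it assumes PyYAML is available and just prints the top-level sections, such as the training and validation blocks described under "Making new datasets" below):

import yaml

# Load an experiment config and list its top-level sections.
with open("configs/example_full.yaml") as f:
    cfg = yaml.safe_load(f)
print(list(cfg.keys()))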

Code layout

Here are some important files and their functions:

  • ft_experiment.py sets up and runs an intervene-and-evaluate experiment.
  • evaluate.py has classes for various kinds of evaluation, like scoring whole strings or scoring suffixes (a generic sketch of suffix scoring follows this list).
  • importance.py has classes for estimating sense importance, like the epsilon-gradient method.
  • norm_only_model.py has classes for Backpack variants that have specific finetuning behavior. It should be renamed.
  • sense_finetuning.py has scripts for determining which senses to finetune and returning a model with that behavior, using importance.py and norm_only_model.py.
  • trainer.py has the finetuning loop.
  • utils.py has utilities for loading models or data, certain scoring functions, etc.
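
For intuition, here is a generic sketch of suffix scoring with a Hugging Face causal LM. It is not the repository's evaluate.py implementation, just the underlying idea of scoring a suffix conditioned on a prefix (the model name in the usage comment is illustrative, and tokenization boundary effects are ignored):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def suffix_logprob(model, tokenizer, prefix, suffix):
    """Sum of log-probabilities of the suffix tokens, conditioned on the prefix."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    full_ids = tokenizer(prefix + suffix, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    # The token at position t is predicted by the logits at position t - 1.
    for t in range(prefix_ids.shape[1], full_ids.shape[1]):
        total += log_probs[0, t - 1, full_ids[0, t]].item()
    return total

# Example usage:
# tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
# lm = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")
# print(suffix_logprob(lm, tok, "The capital of Chile is", " Santiago"))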

Replicating paper experiments

These are the recipes to replicate the experiments in the paper. A few thousand training and evaluation runs went into the paper, so I suggest parallelizing on slurm or other cluster-management software (one way to submit a sweep is sketched after the Backpack recipe below).

Backpack experiments

  1. mkdir backpackresults to store the results of the val experiments.
  2. Run experiments for all configs in configs/*/backpack_sweep/*.yaml (or run configs/*/backpack_sweep/make_sweep.py to make configs, and then run those configs.)
  3. Run python plotting/report_val_backpack.py, which makes backpack_best.jsonl.
  4. Move it into the test config directory: mv backpack_best.jsonl configs/test.
  5. cd configs/test and then run python make_test.py to build the test configs from the best-performing val configs.
  6. Make a new directory (mkdir testbackpackresults) and run all configs/test/test-stanfordnlp*.yaml configs.
  7. Run python plotting/report_test_backpack.py testbackpackresults backpackresults, which prints the LaTeX for the results table. Do the same with python plotting/report_test_backpack_hard_negs.py testbackpackresults backpackresults for the hard-negative results.
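
One way to submit the sweep in step 2 as slurm jobs, assuming sbatch is available (the GPU request is a placeholder you would adjust for your cluster):

import glob
import subprocess

# Submit one slurm job per Backpack sweep config.
for config in sorted(glob.glob("configs/*/backpack_sweep/*.yaml")):
    cmd = f"python ft_experiment.py {config}"
    subprocess.run(["sbatch", "--gres=gpu:1", f"--wrap={cmd}"], check=True)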

Pythia experiments

Follow the same process as for Backpacks (replacing scripts and configs with their Pythia counterparts), but once you have run report_test_pythia.py, bring initial.jsonl and results.jsonl to plotting/workshop-pythia and run plot_pythia_all.py and plot_pythia_all_hard_negs.py to make the plots.

For MEMIT results:

  1. As done in memit/notebooks/make_sweep.py, for each relevant configuration of model_name and hyperparameters, run experiments via

python3 run_memit.py {model_name} --v_num_grad_steps 20 --clamp_norm_factor {clamp_norm_factor} --mom2_update_weight {mom2_update_weight} --kl_factor {kl_factor} --dataset_names {dataset_name} --subject_types true_subject,prefix_subject --log_dir {log_dir}

  2. Identify the best val hyperparameter configs and create test scripts via memit/notebooks/load_val_results.ipynb.
  3. Run the test scripts and collect results via memit/notebooks/load_test_results.ipynb, which generates memit_test_results.json and memit_test_results.noedit.json.

GPT-J experiments

Run the GPT-J validation and test experiments as for Pythia. Once you've run both the Backpack and GPT-J experiments, submit all the Backpack senses test configs, but with the improve_llama_experiment.py script instead of ft_experiment.py. This will produce results files; concatenate them with cat testbackpackresults/*.intervention~ > plotting/workshop-gptj/gptj_intervene_results.jsonl. Then go to plotting/workshop-gptj and run plot_gptj_test.py and plot_gptj_test_hard_negs.py.

Making new datasets

If you want to run existing intervention methods on a new dataset, you should replace:

training:
  dataset_path: <here>

with the path to a file in jsonlines format, where each line has the structure:

{'prefix': <text>, 'suffix': <text>}

where the prefix specifies some context and the suffix specifies the desired or undesired behavior in that context.

Next, you should replace:

validation:
  degredation_targeted_path: <here>
  intervention_eval_path: <here>

where intervention_eval_path is the path to the generalization evaluation set (for example, mentions of countries' capitals in naturalistic text), and degredation_targeted_path is the path to a dataset just like the one at intervention_eval_path, except that the prefix and suffix are joined into a single text variable. This is intended to test whether LM predictions degrade a lot when the target senses appear (i.e., where we actually took gradients!).

And now you're good to go: you can run your new experiments.
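
As a sketch of what producing these files might look like (the example rows and the 'text' key in the degradation-targeted file are illustrative assumptions based on the description above, not copied from the released data):

import json

# Hypothetical canonical examples in the {'prefix': ..., 'suffix': ...} format.
canonical = [
    {"prefix": "The capital of Chile is", "suffix": " Santiago"},
    {"prefix": "The capital of Kenya is", "suffix": " Nairobi"},
]

# Hypothetical generalization-evaluation examples (naturalistic contexts).
evaluation = [
    {"prefix": "She flew into the capital of Chile,", "suffix": " Santiago"},
]

def write_jsonl(path, rows):
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

write_jsonl("my_dataset_train.jsonl", canonical)   # training: dataset_path
write_jsonl("my_dataset_eval.jsonl", evaluation)   # validation: intervention_eval_path
# Degradation-targeted file: the evaluation examples, but with prefix and suffix
# joined into a single field (the key name 'text' is an assumption).
write_jsonl(
    "my_dataset_degradation.jsonl",                # validation: degredation_targeted_path
    [{"text": row["prefix"] + row["suffix"]} for row in evaluation],
)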

Data

The data for this work has files for canonical examples separate from those for out-of-distribution evaluation, and also separates validation files (i.e., a pair of train and evaluation files for method development) from test files (i.e., a pair of train and evaluation files for final reporting). In our experiments, all hyperparameters are chosen on the validation pairs, and final results are reported by porting those hyperparameters over to the test pairs. There are 6 datasets:

Country-capital

Validation

  • Canonical examples: data/country_capital/split/country_capital_fixed-val.jsonl
  • Evaluation examples: data/country_capital/split/country_capital_clear_eval-val.jsonl

Test

  • Canonical examples: data/country_capital/split/country_capital_fixed-test.jsonl
  • Evaluation examples: data/country_capital/split/country_capital_clear_eval-test.jsonl

Company-CEO

Validation

  • Canonical examples: data/company_ceo/split/company_ceo_train-val.jsonl
  • Evaluation examples: data/company_ceo/split/company_ceo_eval_clear-val.jsonl

Test

  • Canonical examples: data/company_ceo/split/company_ceo_train-test.jsonl
  • Evaluation examples: data/company_ceo/split/company_ceo_eval_clear-test.jsonl

Gender Bias in Pronouns

Validation

  • Canonical examples: data/pronoun_gender_bias/split/pronoun_gender_bias_train-val.jsonl
  • Evaluation examples: data/pronoun_gender_bias/split/pronoun_gender_bias_eval-val.jsonl

Test

  • Canonical examples: data/pronoun_gender_bias/split/pronoun_gender_bias_train-test.jsonl
  • Evaluation examples: data/pronoun_gender_bias/split/pronoun_gender_bias_eval-test.jsonl

Temporal Updating

Validation

  • Canonical examples: data/temporal/split/temporal_train-val.jsonl
  • Evaluation examples: data/temporal/split/temporal_eval_clear-val.jsonl

Test

  • Canonical examples: data/temporal/split/temporal_train-test.jsonl
  • Evaluation examples: data/temporal/split/temporal_eval_clear-test.jsonl

Stereoset destereotyping

Validation

  • Canonical examples: data/stereoset/split/stereoset_train-val.jsonl
  • Evaluation examples: data/stereoset/split/stereoset_eval_clear-val.jsonl

Test

  • Canonical examples: data/stereoset/split/stereoset_train-test.jsonl
  • Evaluation examples: data/stereoset/split/stereoset_eval_clear-test.jsonl

Hard syntax -- verb conjugation

Validation

  • Canonical examples: data/verb_conjugation/split/verb_conjugation_train-val.jsonl
  • Evaluation examples: data/verb_conjugation/split/verb_conjugation_eval-val.jsonl

Test

  • Canonical examples: data/verb_conjugation/split/verb_conjugation_train-test.jsonl
  • Evaluation examples: data/verb_conjugation/split/verb_conjugation_eval-test.jsonl

model-editing-canonical-examples's Issues

requirements.txt points to local file references and not standard version specification

Hey, I'm trying to clone and run this repo, but I noticed that several packages in requirements.txt are listed with paths to local files (I think).

These are specified using the @ symbol followed by a URI (uniform resource identifier), which points to a local file path.
Would it be possible to convert the local file references to standard version specifications? Thank you!

From:

boltons @ file:///home/conda/feedstock_root/build_artifacts/boltons_1677499911949/work
cffi @ file:///croot/cffi_1670423208954/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work

to something like this:

brotlipy==0.7.0
certifi==2023.7.22
jsonpointer==2.0
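
Until the pinned file is fixed upstream, one workaround is to strip the local @ file:// references and keep just the package names, letting pip resolve them from PyPI. A minimal sketch (the output filename is a placeholder, and exact versions are not recovered this way):

import re

# Drop '@ file://...' local references from requirements.txt, keeping package names.
with open("requirements.txt") as f:
    lines = f.read().splitlines()

cleaned = []
for line in lines:
    match = re.match(r"^\s*([A-Za-z0-9._\[\]-]+)\s*@\s*file://", line)
    cleaned.append(match.group(1) if match else line)

with open("requirements.clean.txt", "w") as f:
    f.write("\n".join(cleaned) + "\n")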
