probe-sampling's Introduction

Accelerating Greedy Coordinate Gradient via Probe Sampling

This repository contains code for the paper "Accelerating Greedy Coordinate Gradient via Probe Sampling". Below is the workflow of probe sampling.

[Figure: workflow of probe sampling]

Installation

Our codebase builds on the code released with the paper "Universal and Transferable Adversarial Attacks on Aligned Language Models" (github). Install the package by running the following command at the root of this repository:

pip install -e .

Parameters

In addition to the parameters of the original GCG, probe sampling requires two further parameters: the probe set size and the filtered set size, named probe_set and filtered_set, respectively. Set them in the launch scripts under ./experiments/launch_scripts/ as follows.

--config.probe_set=xx \
--config.filtered_set=xx
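
To make the roles of the two sizes concrete, here is a minimal Python sketch of one candidate-selection step. A cheap draft loss scores every candidate, the expensive target loss is computed only on a random probe set of size probe_set, and at most filtered_set draft-preferred candidates are then checked with the target loss. The function names, the rank-correlation agreement score, and the rule that shrinks the filtered set as agreement grows are assumptions made for illustration; they are not taken from this repository's code.

```python
import random


def ranks(values):
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for position, index in enumerate(order):
        out[index] = position
    return out


def agreement(draft_losses, target_losses):
    """Spearman rank correlation, rescaled from [-1, 1] to [0, 1]."""
    n = len(draft_losses)
    if n < 2:
        return 0.0  # too few points to measure agreement; trust the draft least
    d2 = sum((a - b) ** 2
             for a, b in zip(ranks(draft_losses), ranks(target_losses)))
    rho = 1 - 6 * d2 / (n * (n * n - 1))
    return (1 + rho) / 2


def probe_sampling_step(candidates, draft_loss, target_loss,
                        probe_set, filtered_set, rng):
    """Pick the best candidate while calling target_loss on only a few.

    Returns (best_candidate, best_target_loss, number_of_target_calls).
    """
    # Cheap pass: score every candidate with the draft model.
    draft = [draft_loss(c) for c in candidates]

    # Expensive pass, but only on a small random probe set.
    probe_idx = rng.sample(range(len(candidates)), probe_set)
    target = {i: target_loss(candidates[i]) for i in probe_idx}

    # How well does the draft ranking match the target ranking on the probe set?
    alpha = agreement([draft[i] for i in probe_idx],
                      [target[i] for i in probe_idx])

    # ASSUMPTION: shrink the filtered set as agreement grows, keeping at least one.
    keep = max(1, round((1 - alpha) * filtered_set))
    by_draft = sorted(range(len(candidates)), key=lambda i: draft[i])
    for i in by_draft[:keep]:
        if i not in target:  # reuse losses already computed for the probe set
            target[i] = target_loss(candidates[i])

    # Final choice uses target losses only, over probe set plus filtered set.
    best = min(target, key=target.get)
    return candidates[best], target[best], len(target)
```

In this sketch the target loss is called at most probe_set + filtered_set times per step, rather than once per candidate, which is where a speedup would come from.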

In addition, to combine probe sampling with simulated annealing, modify the config.anneal setting in the ./experiments/configs/template.py file as shown below.

config.anneal=True
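
For readers unfamiliar with simulated annealing, the general idea is to accept a worse candidate occasionally, with a probability that shrinks as a temperature decays. The sketch below shows the standard Metropolis acceptance rule in Python. The function names, the exponential schedule, and its constants are assumptions for illustration; how config.anneal is actually implemented in this repository is not shown here.

```python
import math
import random


def temperature(step, t0=1.0, decay=0.99):
    """Hypothetical exponential cooling schedule: t0 * decay**step."""
    return t0 * decay ** step


def accept(current_loss, candidate_loss, temp, rng):
    """Metropolis rule: always accept improvements, sometimes accept regressions."""
    if candidate_loss <= current_loss:
        return True
    if temp <= 0:
        return False  # fully cooled: behave greedily
    # A worse candidate is accepted with probability exp(-increase / temp).
    return rng.random() < math.exp(-(candidate_loss - current_loss) / temp)
```

With a high temperature almost any candidate is accepted; as the temperature approaches zero the rule reduces to the greedy choice.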

Models

Target Models

The path to the target model should be specified in ./experiments/configs/, with /DIR representing the directory where the model is stored.

  config.model_paths = [
      "/DIR/Llama2-7b-chat",
  ]
  config.tokenizer_paths = [
      "/DIR/Llama2-7b-chat",
  ]

Draft Model

The location of the draft model is defined in ./experiments/main.py; replace /DIR with the directory path where the model is stored. Additionally, the GPU on which the draft model is placed is determined by the setting params_small.devices.

params_small.model_paths = ["/DIR/GPT2"]
params_small.tokenizer_paths = ["/DIR/GPT2"]
params_small.devices = ['cuda:0']
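
Because the target and draft model locations are plain directory paths, a leftover /DIR placeholder typically only surfaces once model loading starts. A small pre-flight check along these lines may catch it earlier; missing_paths is a hypothetical helper and not part of this repository.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the configured paths that are not existing directories."""
    return [p for p in paths if not Path(p).is_dir()]
```

For example, calling missing_paths(params_small.model_paths + params_small.tokenizer_paths) before launching returns an empty list when every configured directory exists.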

Experiments

  • To run the individual experiments on harmful behaviors or harmful strings, run the commands below from the repository root. Replace vicuna with llama2 to switch the target model, and behaviors with strings to switch the dataset:

    cd experiments/launch_scripts
    bash run_gcg_individual.sh vicuna behaviors

    Running this code reproduces the results for the 'Harmful Strings' and 'Harmful Behaviors' datasets under the Individual setting, as presented in Table 1 of our paper. For example, in the Individual Harmful Behaviors setting with the Llama2-7b-chat model, probe sampling without simulated annealing should reproduce an ASR of 81.0 and a 3.5x speedup.

  • To run the multiple-behaviors experiments, run the following commands from the repository root:

    cd experiments/launch_scripts
    bash run_gcg_multiple.sh vicuna

    Running this code reproduces the results for the 'Harmful Behaviors' dataset under the Multiple setting, as presented in Table 1 of our paper. For example, in the Multiple Harmful Behaviors setting with the Llama2-7b-chat model, probe sampling should reproduce an ASR of 96.0 and a 5.6x speedup.

probe-sampling's People

Contributors

zhaoyiran924

probe-sampling's Issues

Some questions about the parameters

Hi,
In this repo, the code sets the sizes of the two sets as follows:
```
--config.probe_set=64
--config.filtered_set=32
```

However, your paper says probe_set should be 32 (B/16, with B = 512) and filtered_set should be 8 (R = 8).

Which values should I use to get the best results?
