acr-memorization's Introduction

Rethinking LLM Memorization through the Lens of Adversarial Compression

A compression-based approach to defining and measuring memorization with LLMs.

This repository contains the code needed to measure memorization in LLMs using input-output compression, the method presented in our paper, and to reproduce the paper's results. It was developed collaboratively by Avi Schwarzschild, Zhili Feng, and Pratyush Maini at Carnegie Mellon University in 2024.

Getting Started

Requirements

This code was developed and tested with Python 3.10.4. After cloning the repository, you can install the requirements and run our experiments.

To install requirements:

$ pip install -r requirements.txt

Memorization Measurements

Try computing the compression ratio of the first sample in the Famous Quotes dataset with the following command.

% python prompt-minimization-main.py dataset=famous_quotes data_idx=0
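
As a rough illustration of what is being measured, the sketch below assumes the compression ratio is the number of tokens in the target string divided by the number of tokens in the shortest prompt found to elicit it. The function name and the token lists are hypothetical and are not taken from prompt-minimization-main.py.

```python
def compression_ratio(target_tokens, prompt_tokens):
    """Ratio of target length to prompt length, both counted in tokens.

    Hypothetical helper for illustration only; the repository's own
    computation may differ in its details.
    """
    if len(prompt_tokens) == 0:
        raise ValueError("prompt must contain at least one token")
    return len(target_tokens) / len(prompt_tokens)


# Made-up token ids: a 12-token target elicited by a 4-token prompt.
target = list(range(12))
prompt = list(range(4))
ratio = compression_ratio(target, prompt)
print(ratio)  # 3.0
```

A ratio above 1 means the prompt is shorter than the output it elicits, which is the sense in which the model "compresses" the target.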

Logging Style and Data Analysis

Each run writes its configuration, results, and log to a folder under outputs, for example:

outputs
└── happy-Melissa
    ├── .hydra
    │   ├── config.yaml
    │   ├── hydra.yaml
    │   └── overrides.yaml
    ├── results.json
    └── log.log

These output folders can be parsed and analyzed as a DataFrame using Pandas. Open the analyze_results notebook to process experiments, or run make_table_of_results.py to parse the output folder. The notebook loads all the results into a Pandas DataFrame and can then be edited (for example, by adding cells) to do whatever analysis is needed. The script is a short Python script that shows the set of experiment names, a table with every entry, and a summary table aggregating across (model, dataset, optimizer) groups. It can also be used with the flag --experiment_name <experiment-name-0> <experiment-name-1>... to aggregate results from any number of experiments.
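
For readers who want to parse the folders by hand, here is a minimal sketch of the idea. It is not the code in analyze_results or make_table_of_results.py. It assumes only that each run folder contains a results.json file, as in the tree above; the keys inside that file are not documented here, so the sketch copies whatever it finds. The function name load_results and the run_folder column are hypothetical.

```python
import json
from pathlib import Path


def load_results(outputs_dir="outputs"):
    """Collect every results.json under outputs_dir into a list of dicts.

    Hypothetical helper: the schema of results.json is an unknown here,
    so each file's contents are copied as-is.
    """
    records = []
    for results_file in sorted(Path(outputs_dir).rglob("results.json")):
        with results_file.open() as f:
            data = json.load(f)
        # Record which run folder the file came from.
        record = {"run_folder": results_file.parent.name}
        if isinstance(data, dict):
            record.update(data)
        else:
            # Non-dict payloads are kept under a single key.
            record["results"] = data
        records.append(record)
    return records
```

The returned list can be passed to pandas.DataFrame(records) to get one row per run, after which the usual groupby and aggregation tools apply.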

Optimizing Prompts

We include a simple script for optimizing input tokens to elicit a targeted output from an LLM. This is only one step in finding minimal prompts, but it shows how prompt optimization can be done in general.

% python example_script.py
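
To give a feel for discrete prompt optimization without loading a model, the toy sketch below runs a greedy coordinate search over prompt tokens. Everything in it is an assumption made for illustration: toy_model is a stand-in that simply reverses its input, the vocabulary is eight letters, and the search is not the optimizer used in example_script.py.

```python
VOCAB = list("abcdefgh")  # hypothetical toy vocabulary


def toy_model(prompt):
    """Stand-in for an LLM: deterministically maps a prompt to an output."""
    return prompt[::-1]


def loss(prompt, target):
    """Count positions where the model output differs from the target."""
    out = toy_model(prompt)
    mismatches = sum(a != b for a, b in zip(out, target))
    return mismatches + abs(len(out) - len(target))


def coordinate_search(target, prompt_len, max_sweeps=10):
    """Greedy search: at each position, keep the token that lowers the loss."""
    prompt = [VOCAB[0]] * prompt_len
    best = loss(prompt, target)
    for _ in range(max_sweeps):
        if best == 0:
            break
        for pos in range(prompt_len):
            for tok in VOCAB:
                candidate = prompt[:pos] + [tok] + prompt[pos + 1:]
                candidate_loss = loss(candidate, target)
                if candidate_loss < best:
                    prompt, best = candidate, candidate_loss
    return prompt, best


target = list("hgfe")
prompt, final_loss = coordinate_search(target, prompt_len=len(target))
print(prompt, final_loss)
```

With a real LLM the loss would come from the model's likelihood of the target, and candidate tokens are typically ranked rather than enumerated exhaustively; the structure of the loop is the part this sketch is meant to convey.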

Contributing

We encourage anyone using the code to reach out to us directly and open issues and pull requests with questions and improvements!

Citing Our Work

@misc{schwarzschild2024rethinking,
      title={Rethinking LLM Memorization through the Lens of Adversarial Compression}, 
      author={Avi Schwarzschild and Zhili Feng and Pratyush Maini and Zachary C. Lipton and J. Zico Kolter},
      year={2024},
      eprint={2404.15146},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
