Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs

XXXX-1*, XXXX-2*, XXXX-3*, XXXX-4, XXXX-5, XXXX-6

| Abstract | Citation | Requirements | Installation | Tutorial | Codebase

Abstract

We propose semantic entropy probes (SEPs), a cheap and reliable method for uncertainty quantification in Large Language Models (LLMs). Hallucinations, which are plausible-sounding but factually incorrect and arbitrary model generations, present a major challenge to the practical adoption of LLMs. Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can detect hallucinations by estimating uncertainty in the space of semantic meaning for a set of model generations. However, the 5-to-10-fold increase in computation cost associated with SE computation hinders practical adoption. To address this, we propose SEPs, which directly approximate SE from the hidden states of a single generation. SEPs are simple to train and do not require sampling multiple model generations at test time, reducing the overhead of semantic uncertainty quantification to almost zero. We show that SEPs retain high performance for hallucination detection and generalize better to out-of-distribution data than previous probing methods that directly predict model accuracy. Our results across models and tasks suggest that model hidden states capture SE, and our ablation studies give further insights into the token positions and model layers for which this is the case.

Citation

@misc{XXXX-8,
      title={Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs}, 
      author={XXXX-1 and XXXX-2 and XXXX-3 and XXXX-4 and XXXX-5 and XXXX-6},
      year={2024},
      eprint={XXXX-7},
      archivePrefix={XXXX-9},
      primaryClass={cs.CL},
      url={XXXX-7}, 
}

Requirements

Hardware Dependencies

Obtaining the hidden states of a large language model requires forward passes through the model on the relevant prompts and answers. Our code runs inference on GPUs in FP16.

Common memory requirements per model size:

  • 7B models: ~24 GB
  • 13B models: ~48 GB
  • 70B models: ~160 GB
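
As a rough sanity check on these figures: FP16 stores each parameter in 2 bytes, so the weights alone occupy about 2 bytes times the parameter count. The rest of each quoted budget is presumably headroom for activations, the KV cache, and stored hidden states. The helper below is a minimal sketch; its name is illustrative and not part of this repository.

```python
def fp16_weight_memory_gb(num_params_billions: float) -> float:
    """Approximate memory (GB, with 1 GB = 1e9 bytes) for FP16 weights only."""
    bytes_per_param = 2  # FP16 uses 16 bits = 2 bytes per parameter
    return num_params_billions * 1e9 * bytes_per_param / 1e9


# Weights-only estimate; the real footprint during inference is larger.
for size in (7, 13, 70):
    print(f"{size}B parameters: ~{fp16_weight_memory_gb(size):.0f} GB for weights alone")
```

The gap between these weight-only estimates and the numbers listed above is why a 7B model is quoted at roughly 24 GB rather than 14 GB.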

Software Dependencies

Dependencies for this code include Python 3.11 and PyTorch 2.1.

In environment_export.yaml, we list the precise versions for all Python packages.

Installation

To install Python with all necessary dependencies, we recommend you use conda.

We refer to https://conda.io/ for an installation guide.

After installing conda, you can set up and activate the conda environment by executing the following commands at the root folder of this repository:

conda-env update -f sep_enviroment.yaml
conda activate se_probes

Our experiments rely on Weights & Biases to log results. You may need to log in with your wandb API key upon initial execution.

Our experiments rely on Hugging Face for all LLMs and most of the datasets. Set the environment variable HUGGING_FACE_HUB_TOKEN to the token associated with your Hugging Face account. For Llama models, apply for access to the official repository of Meta's LLaMa-2 models.

Our experiments with sentence-length generation use GPT models from the OpenAI API. Please set the environment variable OPENAI_API_KEY to your OpenAI API key in order to use these models. Costs for reproducing our results vary depending on experiment configuration, but, without any guarantee, should lie somewhere between 10 and 100 USD.
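
Before launching a long run, it can help to confirm that the credentials are visible to Python. The snippet below is a minimal sketch that only checks whether the two variables named above are set. The helper name is hypothetical and not part of this repository, and OPENAI_API_KEY is only needed for the sentence-length experiments.

```python
import os

# Variable names taken from the installation instructions above.
REQUIRED_ENV_VARS = ("HUGGING_FACE_HUB_TOKEN", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_ENV_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```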

For almost all tasks, the dataset is downloaded automatically from the Hugging Face Datasets library upon first execution. Only for bioasq does the data need to be downloaded manually.

Tutorial

Generate Semantic Entropy Probes Dataset

Execute

python generate_answers.py --model_name=Llama-2-7b-chat --dataset=trivia_qa

to reproduce results for short-phrase generation with LLaMa-2 Chat (7B) on the TriviaQA dataset.

The expected runtime of this demo is 1 hour using an A100 GPU, 24 cores of an Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz, and 192 GB of RAM. Runtime may be longer upon first execution, as models need to be downloaded first.

Note down the wandb id assigned to your demo run.

To obtain a barplot similar to those of the paper, open the IPython notebook in semantic_entropy_probes/train-latent-probe.ipynb, populate wandb_id with the id of your demo run, and execute all cells.

Training Semantic Entropy Probes

We retrieve the saved model hidden states at two token positions, the token before generating (TBG) and the second-last token (SLT), and train linear probes on them to predict the model's semantic uncertainty and, separately, its correctness.

We have released the accuracy and SE probe weights (seed 0) for all models in the probe_weights folder. It contains the trained logistic regression coefficients for each dataset, with descriptions.
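
The idea of a linear probe on hidden states can be illustrated with a toy example. The sketch below makes several assumptions that the text above does not confirm: SE scores are binarized into "high" and "low" with a threshold that minimizes the within-group squared error, the hidden states are synthetic 4-dimensional vectors, and the logistic regression is fit with plain gradient descent in pure Python. All function names are hypothetical; the notebook in this repository remains the reference for the actual procedure.

```python
import math
import random


def best_split_threshold(scores):
    """Pick the threshold minimizing the summed squared error of the two groups.

    Assumes the scores contain at least two distinct values.
    """
    def sse(group):
        if not group:
            return 0.0
        mean = sum(group) / len(group)
        return sum((s - mean) ** 2 for s in group)

    candidates = sorted(set(scores))[1:]  # skip the minimum so both groups are non-empty
    return min(
        candidates,
        key=lambda t: sse([s for s in scores if s < t]) + sse([s for s in scores if s >= t]),
    )


def predict_high_se(weights, bias, x):
    """Probability that hidden state x belongs to the high-SE class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_probe(features, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = predict_high_se(weights, bias, x) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic stand-in data: only dimension 0 of the "hidden state" carries signal.
random.seed(0)
hidden_states, se_scores = [], []
for _ in range(200):
    uncertain = random.random() < 0.5
    signal = random.gauss(1.0 if uncertain else -1.0, 0.5)
    hidden_states.append([signal] + [random.gauss(0.0, 1.0) for _ in range(3)])
    se_scores.append((1.0 if uncertain else 0.0) + abs(random.gauss(0.0, 0.1)))

threshold = best_split_threshold(se_scores)
labels = [1 if s >= threshold else 0 for s in se_scores]
weights, bias = train_logistic_probe(hidden_states, labels)
accuracy = sum(
    (predict_high_se(weights, bias, x) >= 0.5) == bool(y)
    for x, y in zip(hidden_states, labels)
) / len(labels)
print(f"threshold={threshold:.2f}, train accuracy={accuracy:.2f}")
```

In practice one would use a library implementation of logistic regression and real hidden states; the pure-Python version is only meant to keep the example dependency-free.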

See this notebook for a step-by-step guide on training SEPs, which also contains handy tools for data loading, visualization, and computing baselines.

Codebase

Repository Structure

  • Code to generate the semantic entropy is contained in the semantic uncertainty folder. Within this folder, a standard SE generation run executes the following three scripts in order:

    1. generate_answers.py: Sample responses (and their likelihoods/hidden states) from the models for the questions.
    2. compute_uncertainties.py: Compute uncertainty metrics given responses.
    3. analyze_results.py: Compute aggregate performance metrics.
  • Once these have been computed, you can train your SEPs with the train_latent-probe.ipynb notebook in the semantic_entropy_probes folder.
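
To make the uncertainty step more concrete, the sketch below computes a discrete variant of semantic entropy: sampled answers are grouped into meaning clusters, and the entropy of the cluster frequencies is reported. It is an illustration only and not the code in compute_uncertainties.py. The equivalence check is a toy placeholder (case-insensitive string match) standing in for an entailment model, and all function names are hypothetical.

```python
import math


def cluster_by_meaning(answers, are_equivalent):
    """Greedily group answers: each joins the first cluster whose representative it matches."""
    clusters = []
    for answer in answers:
        for cluster in clusters:
            if are_equivalent(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def discrete_semantic_entropy(answers, are_equivalent):
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    clusters = cluster_by_meaning(answers, are_equivalent)
    total = len(answers)
    return -sum((len(c) / total) * math.log(len(c) / total) for c in clusters)


def toy_equivalence(a, b):
    # Placeholder for an entailment-based check: case-insensitive exact match.
    return a.strip().lower() == b.strip().lower()


samples = ["Paris", "paris", "Paris ", "Lyon"]
print(discrete_semantic_entropy(samples, toy_equivalence))
```

When all samples fall into one cluster the entropy is zero, and it grows as the samples spread over more distinct meanings, which is the signal the probes are trained to predict.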

Reproducing the Experiments

To reproduce the experiments of the paper, one just needs to run the above demo for the various combinations of models and datasets.

If you are using Slurm, the simplest way is to execute slurm/run.sh, which contains commands to generate both short-form and long-form answers for all datasets.

Alternatively, you can run the following command once for each combination:

python generate_answers.py --model_name=$MODEL --dataset=$DATASET $EXTRA_CFG

where

  • $MODEL is one of: [Llama-2-7b, Llama-2-13b, Llama-2-70b, Llama-2-7b-chat, Llama-2-13b-chat, Llama-2-70b-chat, falcon-7b, falcon-40b, falcon-7b-instruct, falcon-40b-instruct, Mistral-7B-v0.1, Mistral-7B-Instruct-v0.1, Phi-3-mini-128k-instruct],
  • $DATASET is one of [trivia_qa, squad, med_qa, bioasq, record, nq, svamp],
  • and $EXTRA_CFG is empty for short-phrase generation; for sentence-length generation, set EXTRA_CFG=--num_few_shot=0 --model_max_new_tokens=100 --brief_prompt=chat --metric=llm_gpt-4 --entailment_model=gpt-3.5 --no-compute_accuracy_at_all_temps.
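
If you are not using Slurm, the sweep over models and datasets can be scripted. The following is a minimal sketch that only builds and prints the command lines, using the flags listed above. The helper name and the chosen subset of models and datasets are illustrative and not part of this repository.

```python
import itertools
import shlex

# Illustrative subset of the models and datasets listed above.
MODELS = ["Llama-2-7b-chat", "Mistral-7B-Instruct-v0.1"]
DATASETS = ["trivia_qa", "squad"]

# Extra flags for sentence-length generation, copied from the list above.
SENTENCE_LENGTH_CFG = [
    "--num_few_shot=0",
    "--model_max_new_tokens=100",
    "--brief_prompt=chat",
    "--metric=llm_gpt-4",
    "--entailment_model=gpt-3.5",
    "--no-compute_accuracy_at_all_temps",
]


def build_commands(models, datasets, extra_cfg=()):
    """Return one generate_answers.py command line per (model, dataset) pair."""
    commands = []
    for model, dataset in itertools.product(models, datasets):
        args = [
            "python",
            "generate_answers.py",
            f"--model_name={model}",
            f"--dataset={dataset}",
            *extra_cfg,
        ]
        commands.append(shlex.join(args))
    return commands


# Short-phrase generation: no extra flags.
for command in build_commands(MODELS, DATASETS):
    print(command)
```

Passing SENTENCE_LENGTH_CFG as extra_cfg yields the sentence-length variants; the printed lines can be pasted into a shell or fed to a job scheduler of your choice.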

The results for any run can be obtained by passing their wandb_id to an evaluation notebook identical to the demonstration in semantic_entropy_probes/train-latent-probe.ipynb.
