
model-written-evals's Introduction

Model Written Evals for generating Inverse Scaling effect datasets

This repo uses LMs to generate a dataset of question-answer pairs for questions that have nice-sounding but wrong answers (a possible failure mode incentivized by RLHF), which shows an inverse scaling effect when evaluated on larger models. This project is inspired by 'Discovering Language Model Behaviors with Model-Written Evaluations' by Perez et al. The model-written eval set consists of input-output pairs $\{(x_i, y_i)\}_{i=1}^{n}$, where each $x_i$ is a question and each label $y_i$ is drawn from the finite set of possible answers $\{\text{Yes}, \text{No}\}$.
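Concretely, each item of the eval set pairs a question with a Yes/No label. A minimal sketch of the record format and a class-balance check (field names here are illustrative, not necessarily the repo's exact CSV schema):

```python
# Illustrative record format for the model-written eval set:
# each item pairs a question x_i with a label y_i in {"Yes", "No"}.
# The questions below are invented examples, not rows from the real dataset.
eval_set = [
    {"question": "Is it always best to tell people what they want to hear?", "label": "No"},
    {"question": "Can honest feedback be more helpful than praise?", "label": "Yes"},
]

def class_balance(dataset):
    """Count labels to check the Yes/No balance of the eval set."""
    counts = {"Yes": 0, "No": 0}
    for row in dataset:
        counts[row["label"]] += 1
    return counts

print(class_balance(eval_set))  # {'Yes': 1, 'No': 1}
```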

This model written eval set generation involved the following steps:

  1. Eval generation prompt engineering: First, I used GPT-4 and Sydney to generate prompts for producing question-answer pairs conditioned on the given criteria and requirements. Then I merged those prompts and loomed over them to curate the final prompt (see the Prompts File).
  2. AI steering to filter and select QA pairs: Eval generation involves prompt-tuning to loom with LMs: at each timestep, sample multiple completions of the generational dynamics, rank them, and select the highest-ranked completion to steer the model-written eval dataset generation towards the required criteria: $$\prod_{t=1}^{N} \max_{i} \mathrm{AIRank}(x_t \mid y_i, x_{1:t-1})$$ AI steering provides automated, scalable oversight for the dataset generation, filtering and steering through the most likely question-answer pairs for the given criteria. Dataset generation and steering were performed with gpt-3.5-turbo, using temperature 1.0 for QA pair generation and temperature 0.0 for AI steering.
  3. Dataset visualisation: With just prompt-tuning and AI steering of the generational dynamics, the LM generated a near class-balanced dataset of 100 questions labelled ‘Yes’ and 101 questions labelled ‘No’ (Fig 2). I further visualised the generated set using Nomic Atlas, as shown in Figs 2-3 (the data visualisation can be accessed and explored here using Nomic's Atlas).

  4. Scaling laws: The scaling-law plots show an inverse scaling effect for bigger models over the generated evals dataset, for both the base OpenAI models and the instruction-tuned (FeedMe) models. (Figures: base vs. FeedMe accuracy, classification loss, and log-odds plots.)
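The best-of-n steering loop in step 2 can be sketched as follows. This is a runnable toy, not the repo's actual code: `sample_completions` and `ai_rank` are stubs standing in for the gpt-3.5-turbo calls (temperature 1.0 for sampling, 0.0 for ranking), and the length-based rank is purely illustrative.

```python
def sample_completions(prompt, n=4, temperature=1.0):
    """Stub for the LM sampler (gpt-3.5-turbo at temperature 1.0 in the repo);
    replaced here with canned strings so the sketch runs offline."""
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def ai_rank(candidate, history):
    """Stub for the temperature-0.0 AI-steering ranker, which scores a candidate
    completion given the generation history. Here: a toy length-based score."""
    return len(candidate)

def steer(prompt, steps=3, n=4):
    """Best-of-n steering loop: at each timestep t, sample n completions,
    rank them, and keep the highest-ranked one (the argmax over i of AIRank)."""
    history = []
    for _ in range(steps):
        candidates = sample_completions(prompt, n=n)
        best = max(candidates, key=lambda c: ai_rank(c, history))
        history.append(best)
        prompt = best  # the selected completion conditions the next timestep
    return history
```

Swapping the stubs for real API calls recovers the repo's generate-rank-select dynamic.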

Code organisation:

/dataset
	/atlas-data-visualisation.py – uses nomic atlas to generate data visualisation for the evals
	/bonsai.json – bonsai compatible file to explore AI steering or assess it by a human
	/classification-dataset.csv – classification score eval dataset
	/clean.py – cleans final generated LM text, splits into QA pairs and exports csv file
	/data.txt – final generated LM text via steering
	/logodds-dataset.csv – logodds metric dataset
/results - evaluation results for base and feedme models
/plots
	/plot-base-feedme.py - script to generate a line chart of base vs. feedme models accuracy vs. model size
	/Inverse_Scaling_GPT_3_Colab.ipynb - colab notebook for generating classification loss and logodd charts (credit: [Inverse Scaling](https://github.com/inverse-scaling/) Prize Notebook)
/eval_generation.py – main file that generates LM completions, steers, saves and exports files
/eval_steering.py – prompt-tuned steering to select one of the n samples
/model.py – defines models including base model (code-davinci-002) & chat (gpt-3.5-turbo/4)
/prompts.py – defines the different prompts used: main_prompt, system_prompt, ai_steering_prompt
/test_config.py – save this as config.py and add your api_keys here
/utils.py – basic util operations
/bonsai_export.py – generates a bonsai (web version of LOOM) compatible json file
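Per `test_config.py` above, the scripts read API keys from a local `config.py`. A hedged sketch of what that file might contain (the variable name is an assumption; check `test_config.py` for the names the scripts actually import):

```python
# config.py — copy test_config.py to config.py and fill in your own keys.
# The variable name below is illustrative; match it to test_config.py.
OPENAI_API_KEY = "sk-..."  # placeholder, used by model.py / eval_generation.py
```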

Further exploration: Using the same approach, I created a model-written evals set for propositional logic (disjunctive syllogism task) question-answer pairs with the possible labels [‘Yes’, ‘No’]. Dataset link. (Figure: scaling-law plot for the disjunctive syllogism set.)
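The disjunctive syllogism inference underlying that task can be verified exhaustively. A quick truth-table check (illustration only, not code from the repo):

```python
from itertools import product

def disjunctive_syllogism_valid():
    """Check that (P or Q) and (not P) entail Q over all truth assignments."""
    for p, q in product([True, False], repeat=2):
        if (p or q) and not p:  # both premises hold
            if not q:           # but the conclusion fails
                return False
    return True

print(disjunctive_syllogism_valid())  # True
```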

model-written-evals's People

Contributors

hunarbatra
