LLMs

This is a tool for evaluating large language models on NLP tasks such as text classification and summarization. It implements a common API for traditional encoder-decoder models and prompt-based large language models, as well as for hosted APIs such as OpenAI and Cohere.

The following features are currently available:

  • Prompting and truncation logic
  • Support for vanilla LLMs (OPT, LLaMA) and instruction-tuned models (T0, Alpaca)
  • Evaluation based on 🤗 Datasets or CSV files
  • Memoization: inference outputs are cached on disk
  • Parallelized computation of metrics
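
To illustrate the "prompting and truncation" item above, here is a minimal sketch of fitting a source document into a prompt under a token limit. It is not this tool's implementation: `build_prompt`, the template text, and the whitespace-based token count are all assumptions made for illustration (a real setup would count tokens with the model's tokenizer).

```python
def build_prompt(document: str, template: str, max_tokens: int) -> str:
    """Fill `template` with `document`, truncating the document to fit.

    Tokens are approximated by whitespace-separated words (an assumption);
    a real implementation would use the model's tokenizer instead.
    """
    # Words consumed by the template itself, with an empty document.
    overhead = len(template.format(document="").split())
    # Words left over for the document; never negative.
    budget = max(max_tokens - overhead, 0)
    truncated = " ".join(document.split()[:budget])
    return template.format(document=truncated)


# Hypothetical template; the tool's actual prompts may differ.
template = "Summarize the article below.\n\nArticle: {document}\n\nSummary:"
prompt = build_prompt("word " * 50, template, max_tokens=20)
```

Truncating the document rather than the whole prompt keeps the instruction and the trailing "Summary:" cue intact, which matters for prompt-based models.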

Setup

git clone https://github.com/thefonseca/llms.git
cd llms && pip install -e .

Classification

llm-classify \
--model_name llama-2-7b-chat \
--model_checkpoint_path path_to_llama2_checkpoint \
--model_dtype float16 \
--dataset_name imdb \
--split test \
--source_key text \
--target_key label \
--model_labels "{'Positive':1,'Negative':0}" \
--max_samples 1000
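
The value passed to --model_labels above has the shape of a Python dict literal that maps label names to dataset label ids. The sketch below shows one way such a mapping could be parsed and used to score generated text. The helper names (`parse_labels`, `to_label_id`, `accuracy`) and the prefix-matching rule are hypothetical and may not match how llm-classify works internally.

```python
import ast


def parse_labels(spec: str) -> dict:
    """Parse a dict literal such as "{'Positive':1,'Negative':0}" safely."""
    labels = ast.literal_eval(spec)  # evaluates literals only, never code
    if not isinstance(labels, dict):
        raise ValueError("expected a dict literal mapping names to ids")
    return labels


def to_label_id(output: str, labels: dict):
    """Return the id of the label the model output starts with, else None."""
    text = output.strip().lower()
    for name, label_id in labels.items():
        if text.startswith(name.lower()):
            return label_id
    return None


def accuracy(outputs, targets, labels) -> float:
    """Fraction of outputs whose mapped label id equals the target id."""
    predictions = [to_label_id(o, labels) for o in outputs]
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


labels = parse_labels("{'Positive':1,'Negative':0}")
score = accuracy(["Positive.", " negative", "Unsure"], [1, 0, 1], labels)
```

Outputs that match no label name map to None and therefore count as errors in this sketch.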

Summarization

Evaluating BigBird on the PubMed validation split and saving the results to the output folder:

llm-summarize \
--dataset_name scientific_papers \
--dataset_config pubmed \
--split validation \
--source_key article \
--target_key abstract \
--max_samples 1000 \
--model_name google/bigbird-pegasus-large-pubmed \
--output_dir output

where --model_name is a Hugging Face model identifier.

Summarizing a single arXiv paper with Alpaca (float16), selected with --arxiv_id:

llm-summarize \
--arxiv_id https://arxiv.org/abs/2304.15004v1 \
--model_name alpaca-7b \
--model_checkpoint_path path_to_alpaca_checkpoint \
--budget 7 \
--budget_unit sentences \
--model_dtype float16 \
--output_dir output

Notes:

  • --budget controls the length of summaries from instruction-tuned models; the unit is set by --budget_unit (sentences by default).
  • --model_checkpoint_path lets you point to a different checkpoint folder while keeping the cache key (--model_name) constant.
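
The note on the cache key can be pictured with a small on-disk memoization sketch. It assumes, purely for illustration, that a cached output is looked up by a hash of the model name and the prompt; the names `cache_key` and `cached_generate` and the JSON file layout are hypothetical, not the tool's actual cache format. Because the checkpoint path is not part of the key, moving the checkpoint would not invalidate cached outputs.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(model_name: str, prompt: str) -> str:
    """Hash the model name and prompt; the checkpoint path is not included."""
    payload = json.dumps({"model": model_name, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_generate(generate, model_name: str, prompt: str, cache_dir) -> str:
    """Return the cached output if present, else call `generate` and store it."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(model_name, prompt)}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["output"]
    output = generate(prompt)
    path.write_text(json.dumps({"output": output}), encoding="utf-8")
    return output


calls = []


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call; records each invocation."""
    calls.append(prompt)
    return prompt.upper()


with tempfile.TemporaryDirectory() as tmp:
    first = cached_generate(fake_generate, "alpaca-7b", "hello", tmp)
    second = cached_generate(fake_generate, "alpaca-7b", "hello", tmp)
```

In this sketch the second call is served from disk, so `fake_generate` runs only once.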

Evaluating the ChatGPT API (gpt-3.5-turbo) on the arXiv validation split:

export OPENAI_API_KEY=<your_api_key>
llm-summarize \
--dataset_name scientific_papers \
--dataset_config arxiv \
--split validation \
--source_key article \
--target_key abstract \
--max_samples 1000 \
--model_name gpt-3.5-turbo \
--output_dir output

Evaluating summary predictions from a CSV file:

llm-summarize \
--dataset_name scientific_papers \
--dataset_config arxiv \
--split validation \
--source_key article \
--target_key abstract \
--prediction_path path_to_csv_file \
--prediction_key prediction \
--max_samples 1000 \
--output_dir output
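
For the CSV case, the command above names the column holding the summaries through --prediction_key. A minimal sketch of producing and reading such a file with Python's csv module follows. It assumes one prediction per row, in the same order as the dataset samples, which the source does not state; `write_predictions` and `read_predictions` are hypothetical helpers, not part of the tool.

```python
import csv
import io


def write_predictions(rows, handle, key: str = "prediction") -> None:
    """Write one summary per row under a single header column named `key`."""
    writer = csv.DictWriter(handle, fieldnames=[key])
    writer.writeheader()
    for text in rows:
        writer.writerow({key: text})


def read_predictions(handle, key: str = "prediction") -> list:
    """Read the `key` column back as a list of strings."""
    return [row[key] for row in csv.DictReader(handle)]


# An in-memory buffer stands in for the file passed to --prediction_path.
buffer = io.StringIO(newline="")
write_predictions(["First summary.", "Second, with a comma."], buffer)
buffer.seek(0)
loaded = read_predictions(buffer)
```

Using the csv module rather than manual string joins ensures that commas, quotes, and line breaks inside summaries are quoted correctly.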

