LLMs

This tool evaluates large language models (LLMs) on NLP tasks such as text classification and summarization. It provides a common API for traditional encoder-decoder models and prompt-based LLMs, as well as for hosted APIs such as OpenAI and Cohere.

The following features are currently available:

Prompting and truncation logic
Support for vanilla LLMs (OPT, LLaMA) and instruction-tuned models (T0, Alpaca)
Evaluation based on 🤗 Datasets or CSV files
Memoization: inference outputs are cached on disk
Parallelized computation of metrics
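
To illustrate the memoization feature, here is a minimal sketch of an on-disk cache for inference outputs. It is not the project's actual implementation: the function names (cache_key, cached_generate), the JSON file layout, and the choice of SHA-256 over the model name, prompt, and generation settings are all assumptions made for this example.

```python
import hashlib
import json
from pathlib import Path


def cache_key(model_name: str, prompt: str, **generation_kwargs) -> str:
    """Build a stable key from the model name, prompt, and generation settings."""
    payload = json.dumps(
        {"model": model_name, "prompt": prompt, "kwargs": generation_kwargs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_generate(generate_fn, cache_dir, model_name, prompt, **generation_kwargs):
    """Return a cached output if one exists; otherwise call generate_fn and store it."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(model_name, prompt, **generation_kwargs)}.json"
    if path.exists():
        # Cache hit: skip inference entirely.
        return json.loads(path.read_text(encoding="utf-8"))["output"]
    # Cache miss: run the (hypothetical) generation function and persist the result.
    output = generate_fn(prompt, **generation_kwargs)
    path.write_text(json.dumps({"output": output}), encoding="utf-8")
    return output
```

Keying on the model name rather than on a checkpoint location means that rerunning an evaluation with the same model name and inputs would reuse earlier outputs instead of repeating inference.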

Setup

git clone https://github.com/thefonseca/llms.git
cd llms && pip install -e .

Classification

llm-classify \
--model_name llama-2-7b-chat \
--model_checkpoint_path path_to_llama2_checkpoint \
--model_dtype float16 \
--dataset_name imdb \
--split test \
--source_key text \
--target_key label \
--model_labels "{'Positive':1,'Negative':0}" \
--max_samples 1000
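
The --model_labels argument above maps label names to the integer ids used in the dataset. As a rough sketch of how such a mapping could be parsed and applied to generated text, consider the following. The helper names (parse_label_map, output_to_label) and the earliest-match rule are assumptions for illustration; the tool may parse and match labels differently.

```python
import ast


def parse_label_map(spec: str) -> dict:
    """Parse a string such as "{'Positive':1,'Negative':0}" into a dict."""
    # literal_eval only accepts Python literals, so arbitrary code is not executed.
    labels = ast.literal_eval(spec)
    if not isinstance(labels, dict):
        raise ValueError("label map must be a dict literal")
    return labels


def output_to_label(generated_text: str, labels: dict, default=None):
    """Return the id of the label name that appears earliest in the generated text."""
    text = generated_text.lower()
    best_id, best_pos = default, len(text) + 1
    for name, label_id in labels.items():
        pos = text.find(name.lower())
        if pos != -1 and pos < best_pos:
            best_id, best_pos = label_id, pos
    return best_id


labels = parse_label_map("{'Positive':1,'Negative':0}")
prediction = output_to_label("Sentiment: positive.", labels)
```

Returning a default for outputs that mention no label keeps unparseable generations visible as a separate case instead of silently counting them toward one class.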

Summarization

Evaluating BigBird on the PubMed validation split and saving the results to the output folder:

llm-summarize \
--dataset_name scientific_papers \
--dataset_config pubmed \
--split validation \
--source_key article \
--target_key abstract \
--max_samples 1000 \
--model_name google/bigbird-pegasus-large-pubmed \
--output_dir output

Here, --model_name is a Hugging Face model identifier.

Summarizing a single arXiv paper (given by --arxiv_id) with Alpaca (float16):

llm-summarize \
--arxiv_id https://arxiv.org/abs/2304.15004v1 \
--model_name alpaca-7b \
--model_checkpoint_path path_to_alpaca_checkpoint \
--budget 7 \
--budget_unit sentences \
--model_dtype float16 \
--output_dir output

Notes:

--budget controls the length of summaries generated by instruction-tuned models (in sentences by default).
--model_checkpoint_path allows changing the checkpoint folder while keeping the cache key (--model_name) constant.

Evaluating the ChatGPT API on the arXiv validation split:
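
One plausible way for a length budget to reach an instruction-tuned model is through the instruction text itself. The sketch below shows that idea only; the function name build_summary_prompt and the prompt wording are hypothetical and are not taken from the tool, which may apply the budget in another way.

```python
def build_summary_prompt(article: str, budget: int, budget_unit: str = "sentences") -> str:
    """Compose an instruction asking for a summary within the given budget."""
    if budget < 1:
        raise ValueError("budget must be a positive integer")
    # Hypothetical template: the real tool's prompt wording is not shown in this README.
    return (
        f"Summarize the article below in at most {budget} {budget_unit}.\n\n"
        f"Article: {article}\n\n"
        "Summary:"
    )


prompt = build_summary_prompt("Some article text.", budget=7)
```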

export OPENAI_API_KEY=<your_api_key>
llm-summarize \
--dataset_name scientific_papers \
--dataset_config arxiv \
--split validation \
--source_key article \
--target_key abstract \
--max_samples 1000 \
--model_name gpt-3.5-turbo \
--output_dir output

Evaluating summary predictions from a CSV file:

llm-summarize \
--dataset_name scientific_papers \
--dataset_config arxiv \
--split validation \
--source_key article \
--target_key abstract \
--prediction_path path_to_csv_file \
--prediction_key prediction \
--max_samples 1000 \
--output_dir output
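
To make the CSV workflow concrete, the sketch below reads a prediction column and scores it against reference summaries. It uses a simple unigram-overlap F1 as a stand-in metric, not the metrics the tool actually reports, and the names unigram_f1 and score_predictions are assumptions for this example. The default column name mirrors the --prediction_key value used in the command above.

```python
import csv
import io
from collections import Counter


def unigram_f1(prediction: str, reference: str) -> float:
    """F1 over lowercase whitespace tokens, counting repeated tokens."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def score_predictions(csv_file, references, prediction_key="prediction") -> float:
    """Average unigram F1 between a CSV prediction column and the references."""
    predictions = [row[prediction_key] for row in csv.DictReader(csv_file)]
    if len(predictions) != len(references):
        raise ValueError("number of predictions and references differ")
    scores = [unigram_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0


# In-memory CSV standing in for the file passed via --prediction_path.
demo_csv = io.StringIO("prediction\nthe cat sat\na dog\n")
mean_f1 = score_predictions(demo_csv, ["the cat sat", "a cat"])
```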
