
a24-llm-stdir

Stage direction classification in French theater with LLMs

Pablo Ruiz Fabo & Alexia Schneider, Université de Strasbourg

Repository structure

  • The clients that run the LLMs are:
    • openai_client.py
    • llama_client.py
    • mistral_client.py
  • For each batch, the clients create an output directory under outputs.
  • Evaluation scripts are:
    • evaluate.py for GPT-4 and Mistral results
    • evaluate_llama.py for Llama results
  • Configuration options are in config.py
  • Examples to classify are in data

Usage

The CLI options are defined in the argparse setup of each client.

Example

  • The command python openai_client.py batch_014 data/stgdir_labelGeneric_trainvalid_100-test.csv gpt-4o-mini def-few-shot en does the following (CLI options listed in order):
    • Output is saved to outputs/batch_014
    • Examples to classify are in data/stgdir_labelGeneric_trainvalid_100-test.csv
    • The model is gpt-4o-mini
    • Prompting mode is def-few-shot, which uses a detailed definition and 20 examples for each category. Allowed prompt modes are in config.py.
    • The language for the category definitions is en (possible values in config.py).

Data

Prompts

  • The prompts, in both English and French, are in prompts.py. The prompts are templates with several fields, filled dynamically by the client with category definitions in all settings and, additionally, some examples in the few-shot setting.
  • Category definitions are in data/category_info.py.
  • In the case of few-shot batches, the examples in data/sampled_df_for_prompts_0001.tsv are added to the prompt.
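As a rough illustration of how such a template might be filled, the sketch below builds a zero-shot and a few-shot prompt from the same template. The template text, the category names (category_a, category_b), the example pairs and the function build_prompt are all hypothetical placeholders; the real templates are in prompts.py and the real definitions in data/category_info.py.

```python
# Hypothetical template; the real ones live in prompts.py and may differ.
TEMPLATE = (
    "Classify the following French stage direction.\n"
    "Categories:\n{definitions}\n"
    "{examples}"
    "Stage direction: {text}\n"
    "Category:"
)

# Placeholder category definitions and few-shot examples (not the project's data).
DEFINITIONS = {
    "category_a": "placeholder definition for the first category",
    "category_b": "placeholder definition for the second category",
}
EXAMPLES = [
    ("placeholder stage direction 1", "category_a"),
    ("placeholder stage direction 2", "category_b"),
]


def build_prompt(text, definitions, examples=None):
    """Fill the template with definitions and, in the few-shot case, examples."""
    definition_block = "\n".join(
        f"- {name}: {description}" for name, description in definitions.items()
    )
    example_block = ""
    if examples:
        lines = [f"- {example_text} -> {label}" for example_text, label in examples]
        example_block = "Examples:\n" + "\n".join(lines) + "\n"
    return TEMPLATE.format(
        definitions=definition_block, examples=example_block, text=text
    )


zero_shot_prompt = build_prompt("Evelyn sort", DEFINITIONS)
few_shot_prompt = build_prompt("Evelyn sort", DEFINITIONS, EXAMPLES)
print(few_shot_prompt)
```

Category definitions are always included, while the example block is empty unless examples are passed, which matches the zero-shot versus few-shot distinction described above.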

Evaluation

  • The scripts evaluate.py (GPT-4 and Mistral results) and evaluate_llama.py (Llama results) produce a classification report and a confusion matrix for a batch. Results are written to the plots directory in each batch's output directory.
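For readers unfamiliar with the metrics reported in the result summary, here is a minimal sketch of how macro F1, weighted F1 and accuracy can be derived from gold and predicted labels. It uses only the standard library and is not the code in the evaluation scripts, which may rely on a library such as scikit-learn; the function names and the toy labels are assumptions.

```python
from collections import Counter


def confusion_counts(gold, pred):
    """Count (gold, predicted) label pairs, i.e. a sparse confusion matrix."""
    return Counter(zip(gold, pred))


def f1_scores(gold, pred):
    """Return (macro F1, weighted F1, accuracy) for two equal-length label lists."""
    labels = sorted(set(gold) | set(pred))
    pairs = confusion_counts(gold, pred)
    support = Counter(gold)  # number of gold items per label
    per_label = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(n for (g, p), n in pairs.items() if p == label and g != label)
        fn = sum(n for (g, p), n in pairs.items() if g == label and p != label)
        denom = 2 * tp + fp + fn
        per_label[label] = 2 * tp / denom if denom else 0.0
    # Macro F1: unweighted mean over labels.
    macro = sum(per_label.values()) / len(labels)
    # Weighted F1: mean over labels, weighted by gold support.
    weighted = sum(per_label[label] * support[label] for label in labels) / len(gold)
    accuracy = sum(pairs[(label, label)] for label in labels) / len(gold)
    return macro, weighted, accuracy


# Toy labels for illustration only.
gold = ["a", "a", "b", "b", "c"]
pred = ["a", "b", "b", "b", "c"]
macro_f1, weighted_f1, acc = f1_scores(gold, pred)
print(round(macro_f1, 2), round(weighted_f1, 2), round(acc, 2))
```

Macro F1 treats every category equally, whereas weighted F1 gives more weight to frequent categories, which is why the two columns can differ noticeably when categories are imbalanced.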

Result summary

A description of column values follows the table.

Batch IDs starting with 0 are for GPT-4 models, starting with 1 are for Llama models, and starting with 2 are for Mistral models.

| id  | definition type | prompt language | example use             | data split     | model       | macro F1 | weighted F1 | acc  |
|-----|-----------------|-----------------|-------------------------|----------------|-------------|----------|-------------|------|
| 001 | simple          | en              | two-shot                | 1              | gpt-4o-mini | 0.48     | 0.53        | 0.52 |
| 002 | simple          | en              | zero-shot               | 0.3 stratified | gpt-4o-mini | 0.51     | 0.57        | 0.57 |
| 003 | detailed        | en              | zero-shot               | 0.3 stratified | gpt-4o-mini | 0.53     | 0.59        | 0.57 |
| 004 | detailed        | en              | zero-shot               | 0.3 stratified | gpt-4o      | 0.7      | 0.73        | 0.72 |
| 005 | detailed        | en              | zero-shot               | 1              | gpt-4o      | 0.71     | 0.74        | 0.73 |
| 006 | detailed        | en              | zero-shot               | 1              | gpt-4o-mini | 0.58     | 0.64        | 0.61 |
| 007 | simple          | en              | few-shot (20 per class) | 0.3 stratified | gpt-4o-mini | 0.57     | 0.64        | 0.63 |
| 008 | detailed        | en              | few-shot (20 per class) | 0.3 stratified | gpt-4o-mini | 0.58     | 0.65        | 0.67 |
| 009 | detailed        | en              | few-shot (20 per class) | 0.3 stratified | gpt-4o      | 0.73     | 0.79        | 0.78 |
| 010 | detailed        | fr              | few-shot (20 per class) | 0.3 stratified | gpt-4o-mini | 0.62     | 0.7         | 0.69 |
| 011 | detailed        | fr              | zero-shot               | 1              | gpt-4o      | 0.69     | 0.72        | 0.71 |
| 012 | detailed        | fr              | zero-shot               | 1              | gpt-4o-mini | 0.54     | 0.59        | 0.57 |
| 013 | detailed        | fr              | few-shot (20 per class) | 1              | gpt-4o-mini | 0.61     | 0.68        | 0.67 |
| 014 | detailed        | en              | few-shot (20 per class) | 1              | gpt-4o-mini | 0.6      | 0.67        | 0.67 |
| 015 | detailed        | fr              | few-shot (20 per class) | 1              | gpt-4o      | 0.7      | 0.75        | 0.75 |
| 102 | detailed        | en              | zero-shot               | 1              | llama-3     | 0.43     | 0.52        | 0.49 |
| 103 | detailed        | en              | zero-shot               | 1              | llama-3.1   | 0.56     | 0.63        | 0.61 |
| 104 | detailed        | fr              | zero-shot               | 1              | llama-3.1   | 0.52     | 0.6         | 0.62 |
| 105 | detailed        | en              | few-shot (20 per class) | 0.3            | llama-3.1   | 0.46     | 0.5         | 0.51 |
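The batch ID convention (first digit 0 for GPT-4, 1 for Llama, 2 for Mistral) and the matching evaluation script can be expressed as a small lookup. The helper names model_family and evaluation_script are hypothetical and not part of the repository; only the mapping itself comes from this README.

```python
# Mapping from the first digit of a batch ID to the model family (from the README).
MODEL_FAMILY_BY_PREFIX = {"0": "GPT-4", "1": "Llama", "2": "Mistral"}

# Evaluation script per family, as listed under "Repository structure".
EVALUATION_SCRIPT_BY_FAMILY = {
    "GPT-4": "evaluate.py",
    "Mistral": "evaluate.py",
    "Llama": "evaluate_llama.py",
}


def model_family(batch_id: str) -> str:
    """Return the model family for IDs such as '014' or 'batch_014'."""
    digits = batch_id.split("_")[-1]  # drop an optional 'batch_' prefix
    family = MODEL_FAMILY_BY_PREFIX.get(digits[:1])
    if family is None:
        raise ValueError(f"Unknown batch ID prefix in {batch_id!r}")
    return family


def evaluation_script(batch_id: str) -> str:
    """Return the evaluation script that applies to a batch."""
    return EVALUATION_SCRIPT_BY_FAMILY[model_family(batch_id)]


print(model_family("batch_014"), evaluation_script("batch_014"))
print(model_family("103"), evaluation_script("103"))
```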

Legend for the result summary

Definition type column

  • simple: The prompt contains a short definition for each category. It may or may not also contain examples; this is reflected in the example use column.
  • detailed: Besides a definition (which may provide some more detail than the simple type), the prompt contains a list of expressions that may be related to the category. These expressions are not direct examples of stage directions; they may be verbs or other expressions related to the vocabulary of the category's stage directions. The prompt may or may not also contain examples; this is reflected in the example use column.

Note that, in all batches from 005 onwards, all English prompts state explicitly that the text to classify is in French. The reason for adding this information is the following: we asked the models to generate a text "explaining" each classification, and in early batches gpt-4o-mini was seen to interpret some potentially ambiguous examples as English text. For example, a case like "Evelyn sort" might be interpreted as English text, as a kind of "sort" (a method for arranging things in an order), rather than as French text (an inflected form of the verb "sortir", "to exit").

Data splits column
