CopycatAI 😼

Fine-tune an OpenAI model based on your favorite Medium blogger in two easy steps.

Copycat provides an effortless way to fine-tune OpenAI models using Medium posts. The project automates collecting posts, cleaning them, generating prompt-completion pairs, and training, making it easier to tailor AI models to specific writing tasks.

Features

  • Automatically get links to Medium posts written by the author
  • Generate prompt-completion pairs from blog posts
  • Fine-tune OpenAI models based on generated prompt-completion pairs

Installation

Prerequisites

  • Python 3.9 or later
  • BYOK 🔑 (bring your own key: an OpenAI API key)

Steps

Automatic Installation on Linux

  1. Clone the repository:
$ git clone https://github.com/jcorbett/copycatAI.git
  2. Navigate to the project directory:
$ cd copycatAI
  3. Install the Python dependencies (a virtual environment is recommended):
$ pip install -r requirements.txt
  4. Add your OPENAI_API_KEY to a .env file in the project directory:
$ echo "OPENAI_API_KEY=sk-your-api-key" > .env
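
If you want to confirm that the key in .env is readable before running anything, a minimal sketch like the one below may help. It is not how this project loads the key (that code is not shown here); `load_env_file` is a hypothetical helper that uses only the standard library and assumes simple KEY=VALUE lines.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ.

    Hypothetical helper for illustration only. Variables that are
    already set in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.exists():
        return {}
    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


if __name__ == "__main__":
    load_env_file()
    print("OPENAI_API_KEY set:", "OPENAI_API_KEY" in os.environ)
```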

Usage

Generate prompt-pairs

$ python ./train.py [medium_username] [output_directory] [style|subject] [linear|chatgpt]

medium_username: this is the username of the author you'd like to train a model on. Make sure the author has posts on Medium that aren't "member-only". (Member-only posts could be handled with the current Selenium implementation, but I haven't worried about it yet.)

output_directory: this is the directory where the generated prompt-completion pairs are saved as [medium_username].jsonl. If the directory doesn't exist, it will be created.

style|subject: whether to train on the author's style or subject; this determines how the prompt-completion pairs are generated. With style, every pair uses the OpenAI-recommended blank prompt (prompt=""). With subject, the prompts are generated from what the author is writing about, using the method selected by the next parameter.
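
To make the blank-prompt idea concrete, here is a rough sketch of what style-mode records could look like when written as JSONL. The function names and the choice of one record per passage are assumptions for illustration; train.py may split and clean the text differently.

```python
import json


def style_records(passages):
    """Build one record per passage with an empty prompt (style mode).

    Hypothetical helper: how the text is split into passages is an
    assumption, not taken from train.py.
    """
    records = []
    for passage in passages:
        text = passage.strip()
        if text:  # skip passages that are empty after trimming
            records.append({"prompt": "", "completion": text})
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


if __name__ == "__main__":
    print(to_jsonl(style_records(["First passage.", "Second passage."])), end="")
```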

linear|chatgpt: how prompts are generated in subject mode. With linear, the current sentence is the prompt (prompt=[current sentence]) and the next sentence is the completion (completion=[next sentence]). With chatgpt, ChatGPT is asked to write a prompt that would produce the current sentence as its completion. Defaults to linear if not specified.

NOTE: The linear|chatgpt parameter is only necessary when using subject as the third parameter. Also, chatgpt is much slower than linear and can be expensive.
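
The linear pairing described above can be sketched in a few lines of Python. This is an illustration only: the regex-based sentence splitter is a naive assumption, and `split_sentences` and `linear_pairs` are hypothetical names rather than functions from train.py.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def linear_pairs(text):
    """Pair each sentence (prompt) with the sentence that follows it (completion)."""
    sentences = split_sentences(text)
    return [
        {"prompt": current, "completion": following}
        for current, following in zip(sentences, sentences[1:])
    ]


if __name__ == "__main__":
    for pair in linear_pairs("One. Two! Three?"):
        print(pair)
```

A text with n sentences yields n - 1 pairs, so a single-sentence post produces no training data in this scheme.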

example:

$ python ./train.py MarcoAngeloBendigo .output style 
$ python ./train.py cryptohayes .output subject linear

WARNING - generating prompt pairs based on subject with chatgpt from several long Medium articles can take a long time and can be expensive.

Prep fine-tuning file

openai tools fine_tunes.prepare_data -f [directory]/[medium_username].jsonl

example:

openai tools fine_tunes.prepare_data -f .output/cryptohayes.jsonl

Launch fine-tuning

openai api fine_tunes.create -t [directory]/[medium_username]_prepared.jsonl -m [model] --suffix [something to help track name]

example:

openai api fine_tunes.create -t .output/cryptohayes_prepared.jsonl -m davinci --suffix "cryptohayes"

Best Practices

While you can run the fine-tuning process automatically, make sure to check each jsonl file to ensure clean prompt pairs. The quality of your fine-tuning is fully dependent on the quality of your data.
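
A quick script can catch the most obvious problems before you read through the file by hand. The sketch below assumes each line is a JSON object with string "prompt" and "completion" fields; `find_bad_pairs` is a hypothetical helper, and it does not replace a manual review of the content itself.

```python
import json


def find_bad_pairs(jsonl_path):
    """Return (line_number, reason) for each record that looks unusable."""
    problems = []
    with open(jsonl_path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append((number, "not valid JSON"))
                continue
            if not isinstance(record, dict):
                problems.append((number, "not a JSON object"))
                continue
            for key in ("prompt", "completion"):
                if not isinstance(record.get(key), str):
                    problems.append((number, f"missing or non-string {key}"))
            completion = record.get("completion")
            # An empty prompt is allowed (style mode); an empty completion is not.
            if isinstance(completion, str) and not completion.strip():
                problems.append((number, "empty completion"))
    return problems


if __name__ == "__main__":
    import sys

    for line_number, reason in find_bad_pairs(sys.argv[1]):
        print(f"line {line_number}: {reason}")
```

For example, `python check_pairs.py .output/cryptohayes.jsonl` would list suspicious lines, assuming you saved the sketch under the hypothetical name check_pairs.py.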

Fine-tune best practices can be found here: https://platform.openai.com/docs/guides/fine-tuning

Share what you build with me on Twitter @_ustin 👋

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

