
thepipe's Introduction


The pipe is a multimodal-first tool for feeding real-world information into large language models. It is built on top of dozens of carefully-crafted heuristics to create sensible text and image prompts from files, directories, web pages, papers, GitHub repos, and more.

Demo

Features 🌟

  • Prepare prompts from dozens of complex file types 📄
  • Visual document extraction for complex PDFs, markdown, etc 🧠
  • Outputs optimized for multimodal LLMs 🖼️ + 💬
  • Multi-threaded ⚡️
  • Works with missing file extensions and in-memory data streams 💾
  • Works with directories, URLs, git repos, and more 🌐

To use the pipe with Python, pass its output directly as the messages for your chat completion:

import openai
import thepipe

openai_client = openai.OpenAI()

# make_prompt_from_source returns an OpenAI-style message list, so it can be
# passed directly as the messages for a vision-capable chat completion.
response = openai_client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=thepipe.make_prompt_from_source("https://github.com/emcf/thepipe"),
)

How it works 🛠️

The pipe is accessible from the command line or from Python. The input source is either a file path, a URL, or a directory (or zip file) path. The pipe extracts information from the source and processes it for downstream use with language models, vision transformers, or vision-language models. The output is a sensible text-based (or multimodal) representation of the extracted information, carefully crafted to fit within the context window of any model from gemma-7b to GPT-4. It uses a variety of heuristics for optimal performance with vision-language models, including AI filetype detection, AI PDF extraction, efficient token compression, automatic image encoding, reranking to counter lost-in-the-middle effects, and more, all pre-built to work out-of-the-box.
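
As an illustration, here is a minimal sketch of consuming the pipe's output from Python. It assumes the OpenAI-style multimodal message format used in the example above; the exact field names may differ between versions, so treat it as a sketch rather than a reference.

import thepipe

# Build prompt messages from a local directory (path is illustrative).
messages = thepipe.make_prompt_from_source("path/to/directory")

# Inspect the text and image pieces of the prompt. The "type", "text", and
# "image_url" keys assume the OpenAI-style multimodal content schema.
for message in messages:
    for part in message["content"]:
        if part["type"] == "text":
            print("TEXT:", part["text"][:80])
        elif part["type"] == "image_url":
            print("IMAGE:", part["image_url"]["url"][:40])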

Getting Started 🚀

To use The Pipe, clone the repository and install the requirements:

git clone https://github.com/emcf/thepipe
cd thepipe
pip install -r requirements.txt
npm install
npx playwright install --with-deps

Linux users can install ctags with

sudo apt-get install -y universal-ctags

Windows users must ensure ctags.exe is in their PATH environment variable.

To use The Pipe from the command line, simply run

python thepipe.py path/to/directory

This command will process all supported files within the specified directory, compress any information over the token limit if necessary, and output the resulting prompt and images to a folder.

Arguments are:

  • The input source (required): can be a file path, a URL, or a directory path.
  • --match (optional): Regex pattern to match files in the directory.
  • --ignore (optional): Regex pattern to ignore files in the directory.
  • --limit (optional): The token limit for the output prompt, defaults to 100K. Prompts exceeding the limit will be compressed.
  • --mathpix (optional): Extract images, tables, and math from PDFs using Mathpix.
  • --text_only (optional): Do not extract images from documents or websites. Additionally, image files will be represented with OCR instead of as images.
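
For example, the following invocation (the path and patterns are illustrative) restricts extraction to Python files, ignores anything under a tests directory, lowers the token limit, and skips image extraction:

python thepipe.py path/to/directory --match ".*\.py$" --ignore ".*tests.*" --limit 50000 --text_only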

You can use the pipe's output with other LLM providers via LiteLLM.
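
For instance, here is a minimal sketch using LiteLLM to route the same messages to a non-OpenAI provider (the model name is illustrative, and the relevant provider API key is assumed to be configured):

import litellm
import thepipe

# LiteLLM accepts OpenAI-style messages and dispatches them to the provider
# implied by the model name, so the pipe's output can be reused directly.
response = litellm.completion(
    model="claude-3-opus-20240229",
    messages=thepipe.make_prompt_from_source("https://github.com/emcf/thepipe"),
)
print(response.choices[0].message.content)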

Supported File Types 📚

Source Type | Input types | Token Compression 🗜️ | Image Extraction 👁️ | Notes 📌
--- | --- | --- | --- | ---
Directory | Any /path/to/directory | ✔️ | ✔️ | Extracts from all files in directory, supports match and ignore patterns
Code | .py, .tsx, .js, .html, .css, .cpp, etc | ✔️ (varies) | ❌ | Combines all code files. .c, .cpp, .py are compressible with ctags, others are not
Plaintext | .txt, .md, .rtf, etc | ✔️ | ❌ | Regular text files
PDF | .pdf | ✔️ | ✔️ | Extracts text and optionally images; can use Mathpix for enhanced extraction
Image | .jpg, .jpeg, .png, .gif, .bmp, .tiff, .webp, .svg | ❌ | ✔️ | Extracts images and can convert to text using OCR
Data Table | .csv, .xls, .xlsx, supabase | ✔️ | ❌ | Extracts data from spreadsheets or SQL tables; converts to text representation. For very large datasets, will only extract column names and types
Jupyter Notebook | .ipynb | ❌ | ❌ | Extracts content from Jupyter notebooks
Microsoft Word Document | .docx | ✔️ | ✔️ | Extracts text from Word documents
Microsoft PowerPoint Presentation | .pptx | ✔️ | ✔️ | Extracts text from PowerPoint presentations
Website | URLs (http, https, www, ftp) | ✔️ | ✔️ | Extracts content from web pages; text-only extraction available
GitHub Repository | GitHub repo URLs | ✔️ | ✔️ | Extracts from GitHub repositories; supports branch specification
ZIP File | .zip | ✔️ | ✔️ | Extracts contents of ZIP files; supports nested directory extraction
