Git Product home page Git Product logo

programming-gpts's Introduction

Programming GPTs 🦜

For the better part of 2023, my hobby has been programming and experimenting with it. What I mean by programming GPTs does not mean that I am recreating the AI behind GPT 3.5 or 4 from scratch. I have tried to fine tune an open-source pretrained AI model like LLaMa2 (from Meta), or start from scratch with much smaller models, but the results you get with such an approach are not as good as what OpenAI provides. And it takes way more skill to reinvent the service that OpenAI offers at a price that is very competitive. So instead, I am focusing here on building on top of the OpenAI giants.

In this blog, you will learn how to program GPTs primarily leveraging OpenAI’s APIs. If you don’t know what an API is (Application Programming Interface), then this is not the blog you’re looking for. Go look it up on the internet, and come back after learning some basics of programming, preferably in Python, as this will be the language used for the tutorials in each chapter. Generative Pre-Trained Transformers are quite complicated general-purpose machines that can do a lot of different things, like handling sequences of text, images, and sounds. You will learn the basic concepts and principles behind GPTs, and how they work under the hood. But more importantly, you will learn how to integrate them inside of your applications.

The blog is divided into 10 chapters, each covering a different topic and a different aspect of programming GPTs. The chapters are:

  • Chapter 1: πŸ€– Introduction to GPTs. How they work and their evolution.

  • Chapter 2: πŸ±πŸ’¬ The Chat API. In this chapter, you will learn how to use the Chat API, a simple way to create conversational agents with GPTs. You will learn how to create your own chatbot.

    Open in Streamlit

streamlit-streamlit_app-2023-07-15-22-07-70.webm

  • Chapter 3: πŸ”— Chaining & Summarization. In this chapter, you will learn how to chain calls to a Large Language Model and use it to summarize texts, such as articles, books, or transcripts. You will learn how to use the Chat API together with the LangChain package to enhance GPTs.

    Open in Streamlit

streamlit-summarize_chain-2023-04-01-17-04-80.webm

  • Chapter 4: πŸ”Žβ“ Vector search & Question Answering. In this chapter, you will learn how to use embeddings and vector search as a way to retrieve informative answers to answer questions while quoting sources.

    Open in Streamlit

streamlit-qa_doc-2023-05-26-15-05-74.webm

  • Chapter 5: πŸ•΅οΈβ€β™€οΈπŸ› οΈ Agent & Tools. In this chapter, you will learn to build an Agent, called Smith, that has access to tools, such as getting the current weather. You will also learn how to use the Assistant API provided by OpenAI, and to extend their capabilities with tools to integrate GPTs with external services. This will be illustrated with the implementation of your own Code Interpreter, that can help you write and run code with GPTs.

    Open in Streamlit

streamlit-smith-2023-06-17-11-06-28.webm

  • Chapter 6: πŸ—£οΈπŸ“’ Text to Speech & Synthesis. In this chapter, you will learn how to use GPTs to transcript text from speech (such as Youtube videos), and synthetize speech from text (such as articles).

    Open in Streamlit

streamlit-streamlit_app-2023-11-20-22-11-57.webm

  • Chapter 7: πŸ‘€ Vision API. In this chapter, you will learn how to use GPTs to process and analyze images, such as mock-ups or drawings. You will learn how to use the Vision API, to perform various tasks with GPTs, such as text recognition, or video captioning.

    Open in Streamlit

streamlit-vision-2023-11-06-22-11-97.webm

  • Chapter 8: πŸŽ¨πŸ–ŒοΈ Dall-E image generation. In this chapter, you will learn how to use Dall-E 2 & 3, which can create stunning and creative images from any text input. You will also learn how to use the outpainting, inpainting and variations APIs, which can complete or modify existing images.

    Open in Streamlit

streamlit-image_app-2023-11-30-21-11-28.webm

  • Chapter 9: πŸ“Œ Conclusion. In this chapter, you will review what you have learned, and reflect on the potential and challenges of programming GPTs. You will also learn how to keep up with the latest developments and innovations in the field of GPTs and OpenAI with additional resources.

  • Chapter 10: πŸ“š Appendix. In this chapter, you will find additional resources, such as a glossary of terms, and a list of references and further readings. You will also find some applications I developed to support some of my work.

programming-gpts's People

Contributors

yanndebray avatar

Stargazers

David avatar Bennet Sunder avatar Arjun Guha avatar Hans Scharler avatar

Watchers

 avatar

Forkers

davidbellamy

programming-gpts's Issues

Add resources to build the book

Like pandoc recipes from word to markdown (storing images in a specific folder)

pandoc "input.docx" -o README.md --extract-media="img"

Manual changes to the md generated:

  • Escape \', \", \[, \], \*, \#
  • Need to add ``` for code blocks
  • Footnotes added to the end as [^12]
  • Images with description and dimensions
    ![A screenshot of a software Description automatically generated](img/media/image7.png){width="6.5in" height="3.767361111111111in"}

More resources

Chap 10 - Open interpreter

Open Interpreter is an open-source project similar to ChatGPT's Code Interpreter that allows you to run large language models locally.

  • It has made significant progress since initial release - can now fully control your computer, developer-friendly to build apps on top, and has vision capabilities.
  • Installing is very easy with the Python package manager - just pip install open-interpreter.
  • To use, first export your OpenAI API key, then run interpreter to start.
  • It can execute shell commands to control your computer, like listing folders, opening files, converting images etc.
  • The vision version can read screenshots and generate code to recreate UI elements.
  • You can create reusable scripts/tools with it to automate tasks.
  • Easily build applications on top with the Python module. It will write, fix, and execute code.
  • Can run completely locally using open-source models from LM Studio instead of API. Quality isn't as good but works.

More resources:

Chap 7 - Investigate LayoutLM alternative to GPT-4V

LayoutLM is another language model (not as large as GPTs) that extends the BERT architecture to incorporate the layout information of the document, such as the bounding boxes, sizes, and positions of the text segments. The model can encode both the textual and visual features of the document and perform tasks such as document classification, form understanding, or entity extraction.

Chap 9 - ChromaDB sqlite on Linux

RuntimeError: οΏ½[91mYour system has an unsupported version of sqlite3. Chroma requires sqlite3 >= 3.35.0.οΏ½[0m οΏ½[94mPlease visit https://docs.trychroma.com/troubleshooting#sqlite to learn how to upgrade.οΏ½[0m

chap 7 - improve GPT-4V object detector with function calling syntax

i = 767
prompt = 'Is there at least one car in the image?'
base64_image = base64.b64encode(requests.get(bucket+keys[i]).content).decode('utf-8')
car = vision_function(prompt, base64_image, tools)
print(car)
Image.open(io.BytesIO(requests.get(bucket+keys[i]).content))

image

This shouldn't return {'presence': 'True'}

chap 4 - simple chatbot with search

import streamlit as st
from utils import *

st.set_page_config(page_title='search',page_icon='πŸ”')
st.sidebar.title(f'Search πŸ”')

if 'avatar' not in st.session_state:
  st.session_state.avatar = {"assistant": "πŸ€–", "user": "🐱"}

avatar = st.session_state.avatar

if 'convo' not in st.session_state:
    st.session_state.convo = []

n = len(os.listdir('chat'))
if 'id' not in st.session_state:
    st.session_state.id = n

id = st.session_state.id


if 'model' not in st.session_state:
    st.session_state.model = 'gpt-3.5-turbo'
# models_name = ['gpt-3.5-turbo', 'gpt-4o']
# selected_model = st.sidebar.selectbox('Select OpenAI model', models_name)
selected_model = st.session_state.model
st.sidebar.write(f'Selected model: {selected_model}')

if st.sidebar.button(f'New Chat {avatar["user"]}'):
   new_chat()
for file in sorted(os.listdir('chat')):
  filename = file.replace('.json','')
  if st.sidebar.button(f'πŸ’¬ {filename}'):
     select_chat(file)

# Display the response in the Streamlit app
for line in st.session_state.convo:
    # st.chat_message(line.role,avatar=avatar[line.role]).write(line.content)
    if line['role'] == 'user':
      st.chat_message('user',avatar=avatar['user']).write(line['content'])
    elif line['role'] == 'assistant':
      st.chat_message('assistant',avatar=avatar['assistant']).write(line['content'])

# Create a text input widget in the Streamlit app
prompt = st.chat_input(f'convo{st.session_state.id}')

if prompt:
    # Append the text input to the conversation
    with st.chat_message('user',avatar=avatar['user']):
        st.write(prompt)
        text = search(prompt)
    question = f"""Given the following context of Google search, answer the question:
    {prompt}
    ---
    Here is the context retrieve from Google search:
    {text}
    """
    st.session_state.convo.append({'role': 'user', 'content': prompt })
    convo_search = st.session_state.convo
    convo_search.append({'role': 'user', 'content': text})
    # Query the chatbot with the complete conversation
    with st.chat_message('assistant',avatar=avatar['assistant']):
        result = chat_stream(convo_search,selected_model)
        #  result = dumb_chat()
    # Add response to the conversation
    st.session_state.convo.append({'role':'assistant', 'content':result})
    save_chat(id)

# Debug
# st.sidebar.write(st.session_state.convo)

chap 7 - more about computer vision

More about object detection:
Object detection, a fundamental task in computer vision, has seen remarkable advancements through various AI approaches. Traditional methods, like sliding window and region-based convolutional neural networks (CNNs), paved the way for modern techniques. One significant breakthrough came with the emergence of deep learning, particularly CNNs, which revolutionized object detection by learning hierarchical features directly from data. Models like YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector) introduced the concept of real-time detection by framing it as a regression problem, enabling swift inference on images and videos. Another notable approach is the region-based detection, exemplified by Faster R-CNN, which combines a region proposal network with a CNN, achieving impressive accuracy by efficiently generating region proposals. Recent advancements incorporate attention mechanisms and transformer architectures, enhancing the ability to capture long-range dependencies and contextual information, thus further improving object detection performance, especially in complex scenes and varied object scales. These AI approaches collectively propel object detection into new realms of accuracy, speed, and scalability, fostering its wide-ranging applications across industries like autonomous vehicles, surveillance, and augmented reality.

chap 5 - support image outputs in code interpreter

Example: plot function 1/sin(x)

AttributeError: 'ImageFileContentBlock' object has no attribute 'text'
Traceback:
File "C:\Users\ydebray\Downloads\gpt-programming-book\env\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 600, in _run_script
    exec(code, module.__dict__)
File "C:\Users\ydebray\Downloads\gpt-programming-book\chap5\code_interpreter.py", line 115, in <module>
    st.chat_message('assistant',avatar=avatar['assistant']).write(line.content[0].text.value)
File "C:\Users\ydebray\Downloads\gpt-programming-book\env\lib\site-packages\pydantic\main.py", line 755, in __getattr__
    raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}') from exc
[
  "Message(id='msg_KGjc0NInvSi38Y16DFSFTDoz', assistant_id='asst_5zjj3Cp5W2DOT6sRLeT6Cf23', attachments=[], completed_at=None, content=[ImageFileContentBlock(image_file=ImageFile(file_id='file-0dhn9eURNlHGJbNJkQvqr0wY'), type='image_file'), TextContentBlock(text=Text(annotations=[], value='Here is the plot of the function \\\\( \\\\frac{1}{\\\\sin(x)} \\\\). The plot shows the behavior of the function over the range of \\\\([-2\\\\pi, 2\\\\pi]\\\\).'), type='text')], created_at=1716945774, incomplete_at=None, incomplete_details=None, metadata={}, object='thread.message', role='assistant', run_id='run_qphSwg615K1aZG4ml6yBzcU8', status=None, thread_id='thread_TJIOSZyHsk6vu1rJ6DEsux6j')",
  "Message(id='msg_S8q6tSmXfzdqMWH5sntSyMv8', assistant_id=None, attachments=[], completed_at=None, content=[TextContentBlock(text=Text(annotations=[], value='plot function 1/sin(x)'), type='text')], created_at=1716945760, incomplete_at=None, incomplete_details=None, metadata={}, object='thread.message', role='user', run_id=None, status=None, thread_id='thread_TJIOSZyHsk6vu1rJ6DEsux6j')"
]

Fix Daily Tech Podcast

Traceback (most recent call last):
  File "/home/runner/work/programming-GPTs/programming-GPTs/chap6/6_3_daily_tech_podcast.py", line 1[13](https://github.com/yanndebray/programming-GPTs/actions/runs/9010117278/job/24755589032#step:6:14), in <module>
Channel Title: TechCrunch
Channel Description: Startup and Technology News
Channel Link: https://techcrunch.com/
Last Build Date: Thu, 09 May 2024 00:01:49 +0000
    (title,link,text) = scrape_article(item,episode)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/programming-GPTs/programming-GPTs/chap6/6_3_daily_tech_podcast.py", line 34, in scrape_article
    text = soup.find(class_="article-content").get_text()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get_text'

image

chap 10 - fine tuning

A GPT can then be fine-tuned, which means that it can be trained on a smaller amount of text data related to a specific task or domain. For example, ChatGPT can be fine-tuned on a dataset of customer service conversations, to make it better at answering questions and solving problems.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.