yanndebray / programming-gpts Goto Github PK

View Code? Open in Web Editor NEW

4.0 1.0 1.0 193.15 MB

Book in writing ... 🦜

Home Page: https://yanndebray.github.io/programming-GPTs/

License: MIT License

Python 0.31% Jupyter Notebook 98.28% HTML 1.41%

openai streamlit

programming-gpts's Introduction

Programming GPTs 🦜

For the better part of 2023, my hobby has been programming and experimenting with it. What I mean by programming GPTs does not mean that I am recreating the AI behind GPT 3.5 or 4 from scratch. I have tried to fine tune an open-source pretrained AI model like LLaMa2 (from Meta), or start from scratch with much smaller models, but the results you get with such an approach are not as good as what OpenAI provides. And it takes way more skill to reinvent the service that OpenAI offers at a price that is very competitive. So instead, I am focusing here on building on top of the OpenAI giants.

In this blog, you will learn how to program GPTs primarily leveraging OpenAI’s APIs. If you don’t know what an API is (Application Programming Interface), then this is not the blog you’re looking for. Go look it up on the internet, and come back after learning some basics of programming, preferably in Python, as this will be the language used for the tutorials in each chapter. Generative Pre-Trained Transformers are quite complicated general-purpose machines that can do a lot of different things, like handling sequences of text, images, and sounds. You will learn the basic concepts and principles behind GPTs, and how they work under the hood. But more importantly, you will learn how to integrate them inside of your applications.

The blog is divided into 10 chapters, each covering a different topic and a different aspect of programming GPTs. The chapters are:

Chapter 1: 🤖 Introduction to GPTs. How they work and their evolution.
Chapter 2: 🐱💬 The Chat API. In this chapter, you will learn how to use the Chat API, a simple way to create conversational agents with GPTs. You will learn how to create your own chatbot.

streamlit-streamlit_app-2023-07-15-22-07-70.webm

Chapter 3: 🔗 Chaining & Summarization. In this chapter, you will learn how to chain calls to a Large Language Model and use it to summarize texts, such as articles, books, or transcripts. You will learn how to use the Chat API together with the LangChain package to enhance GPTs.

streamlit-summarize_chain-2023-04-01-17-04-80.webm

Chapter 4: 🔎❓ Vector search & Question Answering. In this chapter, you will learn how to use embeddings and vector search as a way to retrieve informative answers to answer questions while quoting sources.

streamlit-qa_doc-2023-05-26-15-05-74.webm

Chapter 5: 🕵️‍♀️🛠️ Agent & Tools. In this chapter, you will learn to build an Agent, called Smith, that has access to tools, such as getting the current weather. You will also learn how to use the Assistant API provided by OpenAI, and to extend their capabilities with tools to integrate GPTs with external services. This will be illustrated with the implementation of your own Code Interpreter, that can help you write and run code with GPTs.

streamlit-smith-2023-06-17-11-06-28.webm

Chapter 6: 🗣️📢 Text to Speech & Synthesis. In this chapter, you will learn how to use GPTs to transcript text from speech (such as Youtube videos), and synthetize speech from text (such as articles).

streamlit-streamlit_app-2023-11-20-22-11-57.webm

Chapter 7: 👀 Vision API. In this chapter, you will learn how to use GPTs to process and analyze images, such as mock-ups or drawings. You will learn how to use the Vision API, to perform various tasks with GPTs, such as text recognition, or video captioning.

streamlit-vision-2023-11-06-22-11-97.webm

Chapter 8: 🎨🖌️ Dall-E image generation. In this chapter, you will learn how to use Dall-E 2 & 3, which can create stunning and creative images from any text input. You will also learn how to use the outpainting, inpainting and variations APIs, which can complete or modify existing images.

streamlit-image_app-2023-11-30-21-11-28.webm

Chapter 9: 📌 Conclusion. In this chapter, you will review what you have learned, and reflect on the potential and challenges of programming GPTs. You will also learn how to keep up with the latest developments and innovations in the field of GPTs and OpenAI with additional resources.
Chapter 10: 📚 Appendix. In this chapter, you will find additional resources, such as a glossary of terms, and a list of references and further readings. You will also find some applications I developed to support some of my work.

programming-gpts's People

Contributors

Stargazers

Watchers

Forkers

davidbellamy

programming-gpts's Issues

Add resources to build the book

Like pandoc recipes from word to markdown (storing images in a specific folder)

pandoc "input.docx" -o README.md --extract-media="img"

Manual changes to the md generated:

Escape \', \", \[, \], \*, \#
Need to add ``` for code blocks
Footnotes added to the end as [^12]
Images with description and dimensions
![A screenshot of a software Description automatically generated](img/media/image7.png){width="6.5in" height="3.767361111111111in"}

More resources

Links, links, links:

Chap 5 - Add Tour operator app 🌍 with function calling

https://github.com/andfanilo/social-media-tutorials/blob/master/20231116-st_assistants/streamlit_app.py
https://github.com/opengeos/streamlit-map-template

chap 5 - add file input to code interpreter

Chap 6 - create app to generate podcast from an article url

Chap 7 - Fix the JS callback for the joke website

https://yanndebray.github.io/programming-GPTs/chap7/joke_website

Chap 10 - Open interpreter

Open Interpreter is an open-source project similar to ChatGPT's Code Interpreter that allows you to run large language models locally.

It has made significant progress since initial release - can now fully control your computer, developer-friendly to build apps on top, and has vision capabilities.
Installing is very easy with the Python package manager - just pip install open-interpreter.
To use, first export your OpenAI API key, then run interpreter to start.
It can execute shell commands to control your computer, like listing folders, opening files, converting images etc.
The vision version can read screenshots and generate code to recreate UI elements.
You can create reusable scripts/tools with it to automate tasks.
Easily build applications on top with the Python module. It will write, fix, and execute code.
Can run completely locally using open-source models from LM Studio instead of API. Quality isn't as good but works.

More resources:

Open Interpreter
Tutorial: https://www.youtube.com/watch?v=xPd8FFzIeOw

chap 5 - enable download from file generated by code interpreter

links to sandbox created in the following form:
sandbox:/mnt/data/titanic_data_analysis_report.ipynb

Chap 7 - Investigate LayoutLM alternative to GPT-4V

LayoutLM is another language model (not as large as GPTs) that extends the BERT architecture to incorporate the layout information of the document, such as the bounding boxes, sizes, and positions of the text segments. The model can encode both the textual and visual features of the document and perform tasks such as document classification, form understanding, or entity extraction.

Chap 9 - ChromaDB sqlite on Linux

RuntimeError: �[91mYour system has an unsupported version of sqlite3. Chroma requires sqlite3 >= 3.35.0.�[0m �[94mPlease visit https://docs.trychroma.com/troubleshooting#sqlite to learn how to upgrade.�[0m

chap 7 - improve GPT-4V object detector with function calling syntax

i = 767
prompt = 'Is there at least one car in the image?'
base64_image = base64.b64encode(requests.get(bucket+keys[i]).content).decode('utf-8')
car = vision_function(prompt, base64_image, tools)
print(car)
Image.open(io.BytesIO(requests.get(bucket+keys[i]).content))

This shouldn't return {'presence': 'True'}

chap 10 - S3 Image viewer

S3 (Simple Storage Service) is a foundational AWS Web Service. It provides programmers with the ability to store file easily and make those accessible through an API, which access might be controlled by another service called IAM (Identity and Access Management).
https://car-sim.streamlit.app/
https://car-dataset.streamlit.app/
https://s3-image-viewer.streamlit.app/

Chap 9 - Fix Rag demo

Chap 7 - Solve puzzles with vision

chap 7 - Improve OCR assistant

image

prompt

write python code to extract the following selection

chap 4 - simple chatbot with search

import streamlit as st
from utils import *

st.set_page_config(page_title='search',page_icon='🔍')
st.sidebar.title(f'Search 🔍')

if 'avatar' not in st.session_state:
  st.session_state.avatar = {"assistant": "🤖", "user": "🐱"}

avatar = st.session_state.avatar

if 'convo' not in st.session_state:
    st.session_state.convo = []

n = len(os.listdir('chat'))
if 'id' not in st.session_state:
    st.session_state.id = n

id = st.session_state.id


if 'model' not in st.session_state:
    st.session_state.model = 'gpt-3.5-turbo'
# models_name = ['gpt-3.5-turbo', 'gpt-4o']
# selected_model = st.sidebar.selectbox('Select OpenAI model', models_name)
selected_model = st.session_state.model
st.sidebar.write(f'Selected model: {selected_model}')

if st.sidebar.button(f'New Chat {avatar["user"]}'):
   new_chat()
for file in sorted(os.listdir('chat')):
  filename = file.replace('.json','')
  if st.sidebar.button(f'💬 {filename}'):
     select_chat(file)

# Display the response in the Streamlit app
for line in st.session_state.convo:
    # st.chat_message(line.role,avatar=avatar[line.role]).write(line.content)
    if line['role'] == 'user':
      st.chat_message('user',avatar=avatar['user']).write(line['content'])
    elif line['role'] == 'assistant':
      st.chat_message('assistant',avatar=avatar['assistant']).write(line['content'])

# Create a text input widget in the Streamlit app
prompt = st.chat_input(f'convo{st.session_state.id}')

if prompt:
    # Append the text input to the conversation
    with st.chat_message('user',avatar=avatar['user']):
        st.write(prompt)
        text = search(prompt)
    question = f"""Given the following context of Google search, answer the question:
    {prompt}
    ---
    Here is the context retrieve from Google search:
    {text}
    """
    st.session_state.convo.append({'role': 'user', 'content': prompt })
    convo_search = st.session_state.convo
    convo_search.append({'role': 'user', 'content': text})
    # Query the chatbot with the complete conversation
    with st.chat_message('assistant',avatar=avatar['assistant']):
        result = chat_stream(convo_search,selected_model)
        #  result = dumb_chat()
    # Add response to the conversation
    st.session_state.convo.append({'role':'assistant', 'content':result})
    save_chat(id)

# Debug
# st.sidebar.write(st.session_state.convo)

chap 10 - GPT action calling Retrieval Plugin

https://github.com/openai/chatgpt-retrieval-plugin

chap 7 - more about computer vision

More about object detection:
Object detection, a fundamental task in computer vision, has seen remarkable advancements through various AI approaches. Traditional methods, like sliding window and region-based convolutional neural networks (CNNs), paved the way for modern techniques. One significant breakthrough came with the emergence of deep learning, particularly CNNs, which revolutionized object detection by learning hierarchical features directly from data. Models like YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector) introduced the concept of real-time detection by framing it as a regression problem, enabling swift inference on images and videos. Another notable approach is the region-based detection, exemplified by Faster R-CNN, which combines a region proposal network with a CNN, achieving impressive accuracy by efficiently generating region proposals. Recent advancements incorporate attention mechanisms and transformer architectures, enhancing the ability to capture long-range dependencies and contextual information, thus further improving object detection performance, especially in complex scenes and varied object scales. These AI approaches collectively propel object detection into new realms of accuracy, speed, and scalability, fostering its wide-ranging applications across industries like autonomous vehicles, surveillance, and augmented reality.

chap 5 - support image outputs in code interpreter

Example: plot function 1/sin(x)

AttributeError: 'ImageFileContentBlock' object has no attribute 'text'
Traceback:
File "C:\Users\ydebray\Downloads\gpt-programming-book\env\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 600, in _run_script
    exec(code, module.__dict__)
File "C:\Users\ydebray\Downloads\gpt-programming-book\chap5\code_interpreter.py", line 115, in <module>
    st.chat_message('assistant',avatar=avatar['assistant']).write(line.content[0].text.value)
File "C:\Users\ydebray\Downloads\gpt-programming-book\env\lib\site-packages\pydantic\main.py", line 755, in __getattr__
    raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}') from exc

[
  "Message(id='msg_KGjc0NInvSi38Y16DFSFTDoz', assistant_id='asst_5zjj3Cp5W2DOT6sRLeT6Cf23', attachments=[], completed_at=None, content=[ImageFileContentBlock(image_file=ImageFile(file_id='file-0dhn9eURNlHGJbNJkQvqr0wY'), type='image_file'), TextContentBlock(text=Text(annotations=[], value='Here is the plot of the function \\\\( \\\\frac{1}{\\\\sin(x)} \\\\). The plot shows the behavior of the function over the range of \\\\([-2\\\\pi, 2\\\\pi]\\\\).'), type='text')], created_at=1716945774, incomplete_at=None, incomplete_details=None, metadata={}, object='thread.message', role='assistant', run_id='run_qphSwg615K1aZG4ml6yBzcU8', status=None, thread_id='thread_TJIOSZyHsk6vu1rJ6DEsux6j')",
  "Message(id='msg_S8q6tSmXfzdqMWH5sntSyMv8', assistant_id=None, attachments=[], completed_at=None, content=[TextContentBlock(text=Text(annotations=[], value='plot function 1/sin(x)'), type='text')], created_at=1716945760, incomplete_at=None, incomplete_details=None, metadata={}, object='thread.message', role='user', run_id=None, status=None, thread_id='thread_TJIOSZyHsk6vu1rJ6DEsux6j')"
]

Chap 7 - Investigate Tesseract OCR alternative to GPT-4V

chap 10 - investigate options to deploy GPTs actions

Replit
https://www.pythonanywhere.com/pricing/
Firebase
Vercel
Render
- https://youtu.be/moi8WPO3Xhs?si=ckUsiv0snkdn4ea4
Fly.io
Digital Ocean 🌊
Zapier

Chap 8 - Save adventure report

Report formats:

doc
ppt
zip (txt + png)
mp4

Chap 1 - explain base concepts

https://www.thoughtspot.com/data-trends/ai/what-is-transformer-architecture-chatgpt

Chap 2 - store conversations in S3 or supabase

Add chap 5 Application: Agent Smith surfing the internet

Fix Daily Tech Podcast

Traceback (most recent call last):
  File "/home/runner/work/programming-GPTs/programming-GPTs/chap6/6_3_daily_tech_podcast.py", line 1[13](https://github.com/yanndebray/programming-GPTs/actions/runs/9010117278/job/24755589032#step:6:14), in <module>
Channel Title: TechCrunch
Channel Description: Startup and Technology News
Channel Link: https://techcrunch.com/
Last Build Date: Thu, 09 May 2024 00:01:49 +0000
    (title,link,text) = scrape_article(item,episode)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/programming-GPTs/programming-GPTs/chap6/6_3_daily_tech_podcast.py", line 34, in scrape_article
    text = soup.find(class_="article-content").get_text()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get_text'

Chap 2 - Update prices of the API

chap 10 - local and open-weight models

Ollama - local models on your machine
https://youtu.be/Ox8hhpgrUi0?si=LxpAd1n29InncB78

Open-weight models

Llama3
Mistral 7B v0.3

Use cases:

interactive vs non-intersecting
local RAG with sensitive data
overnight text processing

Chap 6 - store daily tech podcast on S3

chap 10 - fine tuning

A GPT can then be fine-tuned, which means that it can be trained on a smaller amount of text data related to a specific task or domain. For example, ChatGPT can be fine-tuned on a dataset of customer service conversations, to make it better at answering questions and solving problems.