gpt4_langchain_pdf-detailed-walktrough's Introduction

Foreword for users in China

Some users have reported issues when starting this project in China. OpenAI or Pinecone may be blocking access by region (as of 21.05.2023).

Solution from jchermy

  1. Create a Cloudflare Worker; refer to noobnooc/noobnooc#9.
  2. Proxy all OpenAI requests through it. For example:
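// Import path as used by LangChain JS at the time of writing; adjust it to your installed version.
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
// Send embedding requests to the proxy instead of directly to OpenAI.
const embeddings = new OpenAIEmbeddings(undefined, {
  basePath: 'https://openai.1rmb.tk/v1/',
  apiKey: 'xxx', // your OpenAI API key
});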
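
The Worker side of this setup can be sketched as follows. This is a minimal illustration, not the code from the referenced issue: the names `UPSTREAM_HOST`, `toUpstreamUrl` and `handleRequest` are hypothetical, and the sketch assumes the proxy only needs to swap the host while leaving path, query, headers and body untouched.

```typescript
// Assumption: every request reaching the proxy should go to the OpenAI API host.
const UPSTREAM_HOST = 'api.openai.com';

/** Rewrites an incoming proxy URL so that it points at the upstream host over HTTPS. */
function toUpstreamUrl(incomingUrl: string): string {
  const url = new URL(incomingUrl);
  url.protocol = 'https:';
  url.hostname = UPSTREAM_HOST;
  url.port = ''; // drop any custom port used by the proxy itself
  return url.toString();
}

/** Forwards the request unchanged (method, headers, body) to the rewritten URL. */
async function handleRequest(request: Request): Promise<Response> {
  return fetch(new Request(toUpstreamUrl(request.url), request));
}
```

In a Cloudflare Worker, `handleRequest` would be called from the Worker's `fetch` handler. The `basePath` in the client snippet would then point at the Worker's own URL.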

Prerequisites

Discord Server

Join the Discord server if you have questions.

Project setup

Download the project

Download the project: https://github.com/mayooear/gpt4-pdf-chatbot-langchain

Install nodejs and npm

Install Node.js and npm. Official website: https://nodejs.org/en/download

Verify that everything is installed with the following terminal commands (for example "node -v" and "npm -v").

  • Your versions may differ. The only requirement is that Node.js is version 18 or higher.

image
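
The version requirement can also be checked from a script. The following is a small sketch; the helper name `isSupportedNodeVersion` is made up for this example, and the minimum of 18 is taken from the note above.

```typescript
/** Returns true when a version string such as "v18.14.0" meets the minimum major version. */
function isSupportedNodeVersion(version: string, minMajor: number = 18): boolean {
  // Strip an optional leading "v" and read the major component.
  const major = Number.parseInt(version.replace(/^v/, '').split('.')[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// process.version is the version of the Node.js runtime executing this script.
console.log(`Node ${process.version} supported: ${isSupportedNodeVersion(process.version)}`);
```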

Install all required packages

Open a terminal at the top level of the project and run "npm install" to install all necessary dependencies.

image

Create a ".env"

Create a file named ".env" inside your project with the following content:

OPENAI_API_KEY = 

PINECONE_API_KEY = 
PINECONE_ENVIRONMENT = 

PINECONE_INDEX_NAME = 

image
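
Since every later step depends on these four values, it can help to check that none of them is still empty. The sketch below parses the text of a ".env" file and lists the keys without a value. The function name `missingEnvKeys` is hypothetical, and the parsing is deliberately simple (no quoting or multi-line values).

```typescript
// The four keys from the ".env" template above.
const REQUIRED_KEYS = [
  'OPENAI_API_KEY',
  'PINECONE_API_KEY',
  'PINECONE_ENVIRONMENT',
  'PINECONE_INDEX_NAME',
];

/** Returns the required keys that are absent or have an empty value in the given .env text. */
function missingEnvKeys(envText: string, required: string[] = REQUIRED_KEYS): string[] {
  const values = new Map<string, string>();
  for (const line of envText.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue; // skip blanks and comments
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue; // not a KEY=VALUE line
    values.set(trimmed.slice(0, eq).trim(), trimmed.slice(eq + 1).trim());
  }
  return required.filter((key) => !values.get(key));
}
```

To use it, read the file contents (for example with `fs.readFileSync('.env', 'utf8')`) and pass them in. An empty result means all four keys have a value.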

OpenAI API key

Go to https://platform.openai.com/overview and open "View API keys".

image

Click "+ Create new secret key" and copy the key into your ".env" file.

image

Pinecone API key

Go to https://app.pinecone.io/organizations/ and create an API key there as well. Add it to the ".env" file.

image

Pinecone index

Create a new Pinecone index with the dimension set to 1536.

image

It should look similar to this:

image
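
The dimension has to match the length of the vectors you store. 1536 is the output size of OpenAI's text-embedding-ada-002 model, which is presumably the embedding model in use here. The sketch below shows a sanity check for a vector before it is written to the index; `hasIndexDimension` is a hypothetical helper, not part of the project.

```typescript
// Dimension configured for the Pinecone index in the step above.
const INDEX_DIMENSION = 1536;

/** Returns true when the vector has exactly the index dimension and only finite numbers. */
function hasIndexDimension(vector: number[], dimension: number = INDEX_DIMENSION): boolean {
  return vector.length === dimension && vector.every((value) => Number.isFinite(value));
}
```

A vector of any other length would not fit an index created with dimension 1536.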

Add the Pinecone index to the ".env" file

Add the details of your index to the ".env" file. At this point it should look like this:

image

(Optional) Set namespace in config/pinecone.ts

Give the namespace for your vectors a name in "config/pinecone.ts".

image

All ingested vectors are stored under this namespace in your Pinecone index. The name you define in "config/pinecone.ts" is used both to store your ingested data and to look it up when you ask the chatbot a question.

  • It is not possible to search data in multiple namespaces simultaneously.

image
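
To make the namespace behaviour concrete, here is a toy in-memory model. It is not the Pinecone client and all names in it are invented for illustration. It only shows that data ingested under one namespace is invisible to a lookup in another.

```typescript
type StoredVector = { id: string; values: number[] };

/** Toy stand-in for a namespaced vector index (illustration only). */
class ToyNamespacedIndex {
  private readonly namespaces = new Map<string, StoredVector[]>();

  /** Stores vectors under exactly one namespace. */
  ingest(namespace: string, vectors: StoredVector[]): void {
    const existing = this.namespaces.get(namespace) ?? [];
    this.namespaces.set(namespace, [...existing, ...vectors]);
  }

  /** Looks up ids in ONE namespace; other namespaces are never consulted. */
  listIds(namespace: string): string[] {
    return (this.namespaces.get(namespace) ?? []).map((vector) => vector.id);
  }
}

const index = new ToyNamespacedIndex();
index.ingest('pdf-a', [{ id: 'a1', values: [0.1, 0.2] }]);
index.ingest('pdf-b', [{ id: 'b1', values: [0.3, 0.4] }]);
console.log(index.listIds('pdf-a')); // only the ids ingested under 'pdf-a'
```

This is why the same name must be in place when ingesting and when asking questions.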

Docs folder

Create a folder named "docs" in the project and put your PDF files in it.

image

Ingest your pdf

Run "npm run ingest" in a terminal. This ingests your PDF files into the Pinecone index. Afterwards, remove the PDF documents from the "docs" folder; otherwise they will be ingested a second time when you run "npm run ingest" again.

The terminal must be at the top level of the project folder. In this example, "gpt4-pdf-chatbot-langchain" is installed in "Documents".

image
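
Removing the files by hand after each run is easy to forget. One option is to move ingested PDFs into a separate folder. The sketch below assumes an archive folder outside "docs"; the function name `archiveIngestedPdfs` and the folder name in the usage note are hypothetical.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

/** Moves every .pdf file from docsDir into archiveDir and returns the moved file names. */
function archiveIngestedPdfs(docsDir: string, archiveDir: string): string[] {
  fs.mkdirSync(archiveDir, { recursive: true }); // create the archive folder if needed
  const moved: string[] = [];
  for (const name of fs.readdirSync(docsDir)) {
    if (path.extname(name).toLowerCase() !== '.pdf') continue; // leave other files alone
    fs.renameSync(path.join(docsDir, name), path.join(archiveDir, name));
    moved.push(name);
  }
  return moved.sort();
}
```

After a successful "npm run ingest", calling `archiveIngestedPdfs('docs', 'docs_ingested')` from the project root would leave "docs" empty for the next batch.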

Start application

Start the application with "npm run dev" in the terminal. Open a browser and go to http://localhost:3000/. You should see the project.

image

Debugging

Alternative ingest with Python

For troubleshooting, it may be useful to look at my Python reimplementation of the ingest step. It gives more control over each step of the ingest.

https://github.com/ucl98/pinecone_ingest_python_implementation

OpenAI paid account

Check if you have a paid account. If you see the following message, you need to set up a paid account.

OpenAI free trial expired

You may have used a free trial for the OpenAI API. Once the trial expires, you can no longer use the API until you add a paid plan.

image

image

Issues with GPT-4 API access

Go to the OpenAI playground and check if you have access to GPT-4. If the model is not listed there, you have no access to it and need to use GPT-3.5.

  • Access to GPT-4 through the API does not depend on a ChatGPT subscription. You need to apply for it.
image
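
The fallback can be expressed as a small helper. This is a sketch under the assumption that you already know which model ids your account can use (for example from the playground); `pickChatModel` is a hypothetical name, while 'gpt-4' and 'gpt-3.5-turbo' are the OpenAI model ids.

```typescript
/** Picks GPT-4 when the account has access to it, otherwise falls back to GPT-3.5. */
function pickChatModel(availableModelIds: string[]): string {
  return availableModelIds.includes('gpt-4') ? 'gpt-4' : 'gpt-3.5-turbo';
}

console.log(pickChatModel(['gpt-3.5-turbo']));
```

The returned id would then go wherever the project sets the chat model name.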

Check if Pinecone is working

You can make a Pinecone API request on the Pinecone website. Go to the "Query" tab, enter the namespace, confirm it, and then hit "Query". If it returns a result, the connection to Pinecone is working.

image

image

API keys

Another issue can be that your API key is not working. Create a new one and test again.

image
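
When a request fails, the HTTP status code in the error message often points to one of the checks in this section. The mapping below is a rough sketch, not an exhaustive list. `hintForStatus` is a hypothetical helper, and the hints rely only on the general meaning of 401 (Unauthorized) and 429 (Too Many Requests).

```typescript
/** Maps an HTTP status code from a failed API call to the most likely check in this guide. */
function hintForStatus(status: number): string {
  switch (status) {
    case 401:
      return 'API key rejected: create a new key and update the .env file.';
    case 429:
      return 'Rate limit or quota reached: check for a paid account or an expired free trial.';
    default:
      return 'No specific hint: check the Pinecone connection and the values in the .env file.';
  }
}

console.log(hintForStatus(429));
```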

Contribution

  • ucl
  • angelina-magidova-synder
  • chaudhary_181
  • jchermy


gpt4_langchain_pdf-detailed-walktrough's Issues

Pinecone npm run ingest gives 429 error

c:\Users\KaihaoLiu\Desktop\AI\onepdf\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:46
throw new Error('Failed to ingest your data');
^

Error: Failed to ingest your dataError: Request failed with status code 429
at run (c:\Users\KaihaoLiu\Desktop\AI\onepdf\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:46:11)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at (c:\Users\KaihaoLiu\Desktop\AI\onepdf\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:51:3)

Node.js v18.14.0

The PDF is extremely short. What is the problem here?
