
gpt4-pdf-chatbot-langchain's People

Contributors

ankri, jacoblee93, mayooear


gpt4-pdf-chatbot-langchain's Issues

Chat history is not included in the prompt (model is not aware of previous interactions)

Hi,

I've noticed that the chat history is not included in the prompt. The logic for this is implemented, but the model is not aware of previous interactions (see the attached screenshot).

    //Ask a question
    const response = await chain.call({
      question: sanitizedQuestion,
      chat_history: history || [],
    });

[screenshot]

This looks like a bug or unexpected behaviour in LangChain, but I'm not sure. What do you think? Can you spot a bug? Is there a workaround?
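For what it's worth, some chain versions expect {chat_history} as a single flattened string rather than an array of [question, answer] tuples, in which case an array silently contributes nothing to the prompt. A minimal sketch of flattening the history before the call (the formatChatHistory helper and the Human/Assistant labels are my own, not from this repo):

```typescript
// Hypothetical helper: flatten [question, answer] pairs into the single
// string that some chain versions expect for {chat_history}.
type HistoryPair = [string, string];

function formatChatHistory(history: HistoryPair[]): string {
  return history
    .map(([question, answer]) => `Human: ${question}\nAssistant: ${answer}`)
    .join('\n');
}

// Example: one prior turn becomes a prompt-ready block.
const flattened = formatChatHistory([
  ['What is this case about?', 'It concerns fair use of the Java API.'],
]);
console.log(flattened);
```

If the chain does take tuples in your version, then the bug is more likely that `history` is empty at call time, which is worth logging just before `chain.call`.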

How to use HNSWLib to create and consume a vector store?

Hey hi!

First of all thank you for all the work here. I really appreciate it.

I've been trying to write a script similar to ingest-data but using HNSWLib. So far, no luck: I've followed LangChain's example from the docs as well as the examples from the hnswlib-node docs, but I keep getting "Failed to ingest your data". Looking into the error, it seems I'm not splitting the PDF correctly.

Any guidance here would be appreciated.
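A "failed to ingest" during splitting is often a chunk-size/overlap problem rather than anything HNSWLib-specific. Conceptually, a character splitter does something like the following (a simplified sketch, not LangChain's actual RecursiveCharacterTextSplitter; the chunkSize/chunkOverlap values are illustrative):

```typescript
// Simplified sketch of fixed-size chunking with overlap, roughly what a
// character text splitter does to the extracted PDF text before embedding.
function splitText(text: string, chunkSize = 1000, chunkOverlap = 200): string[] {
  if (chunkOverlap >= chunkSize) {
    // Overlap >= size would loop forever or produce degenerate chunks.
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - chunkOverlap;
  }
  return chunks;
}
```

If the chunks come out empty or as one giant string, the PDF extraction step (not the vector store) is the thing to debug first.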

Error when running ingest

Hello Mayooear,
Thanks for your great work. I'm running the code locally; when I run the npm run ingest command, I get the errors below:

 creating vector store...
error [TypeError: t.replaceAll is not a function]
(node:32832) UnhandledPromiseRejectionWarning
(Use `node --trace-warnings ...` to show where the warning was created)
(node:32832) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 27)
(node:32832) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code

What is the cause? Thanks.
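`t.replaceAll is not a function` usually means String.prototype.replaceAll is missing at runtime, i.e. the script actually executed under Node older than 15 (replaceAll is ES2021), even when a newer version is installed elsewhere on the machine. A quick check, plus an equivalent fallback for plain (non-regex) search strings (replaceAllCompat is a hypothetical helper, not part of this repo):

```typescript
// replaceAll exists only in Node 15+ / ES2021; on older runtimes calling it
// throws "t.replaceAll is not a function". For plain search strings,
// split/join is an exact equivalent.
function replaceAllCompat(input: string, search: string, replacement: string): string {
  return input.split(search).join(replacement);
}

// When debugging, log the version the script actually ran under; a version
// manager or stale PATH entry can differ from what `node -v` shows elsewhere.
console.log('node version:', process.version);
console.log(replaceAllCompat('a\nb\nc', '\n', ' '));
```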

Getting an error on ingestion

Hi, thanks for the amazing video and repo.

I have just cloned it to start playing around; however, when I go to ingest I get the following error, which I can't seem to debug.

error [TypeError: t.replaceAll is not a function]

Any suggestions most welcome.

error [TypeError: t.replaceAll is not a function]

Document {
pageContent: 'But, it added, this “is not a case in which the record contains \n' +
'sufficient factual findings upon which we could base a de\n' +
'novo assessment of Google’s affirmative defense of fair use.” \n' +
'Id., at 1377. And it remanded the case for another trial on \n' +
'that question. Google petitioned this Court for a writ of cer-\n' +
'tiorari, seeking review of the Federal Circuit’s copyrighta-\n' +
'bility determination. We denied the petition. Google, Inc. \n' +
'v. Oracle America, Inc., 576 U. S. 1071 (2015).\n' +
'On remand the District Court, sitting with a jury, heard\n' +
'evidence for a week. The court instructed the jury to an-\n' +
'swer one question: Has Google “shown by a preponderance\n' +
'of the evidence that its use in Android” of the declaring code',
metadata: { source: 'docs/18-956_d18f.pdf' }
},
Document {
pageContent: '11 Cite as: 593 U. S. ____ (2021) \n' +
'Opinion of the Court \n' +
'and organizational structure contained in the 37 Sun Java\n' +
'API packages that it copied “constitutes a ‘fair use’ under \n' +
'the Copyright Act?” App. 294. After three days of deliber-\n' +
'ation the jury answered the question in the affirmative. Id., \n' +
'at 295. Google had shown fair use.\n' +
'Oracle again appealed to the Federal Circuit. And the \n' +
'Circuit again reversed the District Court. The Federal Cir-\n' +
'cuit assumed all factual questions in Google’s favor. But, it \n' +
'said, the question whether those facts constitute a “fair use” \n' +
'is a question of law. 886 F. 3d, at 1193. Deciding that ques-\n' +
'tion of law, the court held that Google’s use of the Sun Java \n' +
'API was not a fair use. It wrote that “[t]here is nothing fair \n' +
'about taking a copyrighted work verbatim and using it for \n' +
'the same purpose and function as the original in a compet-\n' +
'ing platform.” Id., at 1210. It remanded the case again,',
metadata: { source: 'docs/18-956_d18f.pdf' }
},
Document {
pageContent: 'about taking a copyrighted work verbatim and using it for \n' +
'the same purpose and function as the original in a compet-\n' +
'ing platform.” Id., at 1210. It remanded the case again,\n' +
'this time for a trial on damages.\n' +
'Google then filed a petition for certiorari in this Court. It \n' +
'asked us to review the Federal Circuit’s determinations as \n' +
'to both copyrightability and fair use. We granted its\n' +
'petition. \n' +
'III \n' +
'A \n' +
'Copyright and patents, the Constitution says, are to “pro-\n' +
'mote the Progress of Science and useful Arts, by securing \n' +
'for limited Times to Authors and Inventors the exclusive \n' +
'Right to their respective Writings and Discoveries.” Art. I, \n' +
'§8, cl. 8. Copyright statutes and case law have made clear\n' +
'that copyright has practical objectives. It grants an author\n' +
'an exclusive right to produce his work (sometimes for a \n' +
'hundred years or more), not as a special reward, but in or-\n' +
'der to encourage the production of works that others might',
metadata: { source: 'docs/18-956_d18f.pdf' }
},
Document {
pageContent: 'an exclusive right to produce his work (sometimes for a \n' +
'hundred years or more), not as a special reward, but in or-\n' +
'der to encourage the production of works that others might \n' +
'reproduce more cheaply. At the same time, copyright \n' +
'has negative features. Protection can raise prices to con-\n' +
'sumers. It can impose special costs, such as the cost of con-\n' +
'tacting owners to obtain reproduction permission. And the',
metadata: { source: 'docs/18-956_d18f.pdf' }
},
Document {
pageContent: '12 GOOGLE LLC v. ORACLE AMERICA, INC. \n' +
'Opinion of the Court \n' +
'exclusive rights it awards can sometimes stand in the way\n' +
'of others exercising their own creative powers. See gener-\n' +
'ally Twentieth Century Music Corp. v. Aiken, 422 U. S. 151, \n' +
'156 (1975); Mazer v. Stein, 347 U. S. 201, 219 (1954).\n' +
'Macaulay once said that the principle of copyright is a\n' +
'“tax on readers for the purpose of giving a bounty to writ-\n' +
'ers.” T. Macaulay, Speeches on Copyright 25 (E. Miller ed. \n' +
'1913). Congress, weighing advantages and disadvantages, \n' +
'will determine the more specific nature of the tax, its \n' +
'boundaries and conditions, the existence of exceptions and \n' +
'exemptions, all by exercising its own constitutional power \n' +
'to write a copyright statute. \n' +
'Four provisions of the current Copyright Act are of par-\n' +
'ticular relevance in this case. First, a definitional provision',
metadata: { source: 'docs/18-956_d18f.pdf' }
},
Document {
pageContent: 'to write a copyright statute. \n' +
'Four provisions of the current Copyright Act are of par-\n' +
'ticular relevance in this case. First, a definitional provision \n' +
'sets forth three basic conditions for obtaining a copyright.\n' +
'There must be a “wor[k] of authorship,” that work must be\n' +
'“original,” and the work must be “fixed in any tangible me-\n' +
'dium of expression.” 17 U. S. C. §102(a); see also Feist Pub-\n' +
'lications, Inc. v. Rural Telephone Service Co., 499 U. S. 340, \n' +
'345 (1991) (explaining that copyright requires some origi-\n' +
'nal “creative spark” and therefore does not reach the facts \n' +
'that a particular expression describes). \n' +
'Second, the statute lists certain kinds of works that cop-\n' +
'yright can protect. They include “literary,” “musical,” “dra-\n' +
'matic,” “motion pictur[e],” “architectural,” and certain \n' +
'other works. §102(a). In 1980, Congress expanded the\n' +
'reach of the Copyright Act to include computer programs. \n' +
'And it defined “computer program” as “‘a set of statements',
metadata: { source: 'docs/18-956_d18f.pdf' }
},
Document {
pageContent: 'other works. §102(a). In 1980, Congress expanded the\n' +
'reach of the Copyright Act to include computer programs. \n' +
'And it defined “computer program” as “‘a set of statements \n' +
'or instructions to be used directly or indirectly in a com-\n' +
'puter in order to bring about a certain result.’” §10, 94 Stat.\n' +
'3028 (codified at 17 U. S. C. §101).\n' +
'Third, the statute sets forth limitations on the works that \n' +
'can be copyrighted, including works that the definitional\n' +
'provisions might otherwise include. It says, for example,\n' +
'that copyright protection cannot be extended to “any idea, \n' +
'procedure, process, system, method of operation, concept,',
metadata: { source: 'docs/18-956_d18f.pdf' }
}
]
before embedDocuments
error [TypeError: t.replaceAll is not a function]
/Users/truffles/Library/Mobile Documents/com~apple~CloudDocs/Data Science/My Projects/gpt4-pdf-chatbot-langchain/scripts/ingest-data.ts:56
throw new Error('Failed to ingest your data');
^

[Error: Failed to ingest your data]

Node.js v18.15.0

Error: Failed to ingest your data - 401 but credentials are valid

Hello all. Looking for a little help - I've walked through this step by step and apparently I need another set of eyes.

I read through the issues but none were quite the same as this. :\

I have double checked all API keys and I've tried multiple Pinecone databases, although this seems to point to unauthorized which doesn't make sense because the OpenAI key is valid.

If anyone can see what I'm missing I'd appreciate it. 💯

url: 'https://api.openai.com/v1/embeddings'
},
request: Request {
[Symbol(realm)]: { settingsObject: [Object] },
[Symbol(state)]: {
method: 'POST',
localURLsOnly: false,
unsafeRequest: false,
body: [Object],
client: [Object],
reservedClient: null,
replacesClientId: '',
window: 'client',
keepalive: false,
serviceWorkers: 'all',
initiator: '',
destination: '',
priority: null,
origin: 'client',
policyContainer: 'client',
referrer: 'client',
referrerPolicy: '',
mode: 'cors',
useCORSPreflightFlag: false,
credentials: 'same-origin',
useCredentials: false,
cache: 'default',
redirect: 'follow',
integrity: '',
cryptoGraphicsNonceMetadata: '',
parserMetadata: '',
reloadNavigation: false,
historyNavigation: false,
userActivation: false,
taintedOrigin: false,
redirectCount: 0,
responseTainting: 'basic',
preventNoCacheCacheControlHeaderModification: false,
done: false,
timingAllowFailed: false,
headersList: [HeadersList],
urlList: [Array],
url: [URL]
},
[Symbol(signal)]: AbortSignal { aborted: false },
[Symbol(headers)]: HeadersList {
[Symbol(headers map)]: [Map],
[Symbol(headers map sorted)]: null
}
},
response: {
ok: false,
status: 401,
statusText: 'Unauthorized',
headers: HeadersList {
[Symbol(headers map)]: [Map],
[Symbol(headers map sorted)]: null
},
config: {
transitional: [Object],
adapter: [AsyncFunction: fetchAdapter],
transformRequest: [Array],
transformResponse: [Array],
timeout: 0,
xsrfCookieName: 'XSRF-TOKEN',
xsrfHeaderName: 'X-XSRF-TOKEN',
maxContentLength: -1,
maxBodyLength: -1,
validateStatus: [Function: validateStatus],
headers: [Object],
method: 'post',
data: '{"model":"text-embedding-ada-002","input":["551US2 Unit: $U68 [09-20-11 18:50:10] PAGES PGT: OPIN 393 OCTOBER TERM, 2006 Syllabus MORSE et al. v. FREDERICK certiorari to the united states court of appeals for the ninth circuit No. 06–278. Argued March 19, 2007—Decided June 25, 2007 At a school-sanctioned and school-supervised event, petitioner Morse, the high school principal, saw students unfurl a banner stating “BONG HiTS 4 JESUS,” which she regarded as promoting illegal drug use. Consistent with established school policy prohibiting such messages at school events, Morse directed the students to take down the banner. When one of the students who had brought the banner to the event— respondent Frederick—refused, Morse confiscated the banner and later suspended him. The school superintendent upheld the suspension, ex­ plaining, inter alia, that Frederick was disciplined because his banner","suspended him. The school superintendent upheld the suspension, ex­ plaining, inter alia, that Frederick was disciplined because his banner appeared to advocate illegal drug use in violation of school policy. Peti­ tioner school board also upheld the suspension. Frederick filed suit under 42 U. S. C. § 1983, alleging that the school board and Morse had violated his First Amendment rights. The District Court granted peti­ tioners summary judgment, ruling that they were entitled to qualified immunity and that they had not infringed Frederick’s speech rights. The Ninth Circuit reversed. Accepting that Frederick acted during a school-authorized activity and that the banner expressed a positive sen­ timent about marijuana use, the court nonetheless found a First Amend­ ment violation because the school punished Frederick without demon­ strating that his speech threatened substantial disruption. It also","ment violation because the school punished Frederick without demon­ strating that his speech threatened substantial disruption. 
It also concluded that Morse was not entitled to qualified immunity because Frederick’s right to display the banner was so clearly established that a reasonable principal in Morse’s position would have understood that her actions were unconstitutional. Held: Because schools may take steps to safeguard those entrusted to their care from speech that can reasonably be regarded as encouraging illegal drug use, the school officials in this case did not violate the First Amendment by confiscating the pro-drug banner and suspending Fred­ erick. Pp. 400–410. (a) Frederick’s argument that this is not a school speech case is re­ jec ted. The event in question occurred during normal school hours and was sanctioned by Morse as an approved social event at which the dis­","jec ted. The event in question occurred during normal school hours and was sanctioned by Morse as an approved social event at which the dis­ trict’s student conduct rules expressly applied. Teachers and adminis­ trators were among the students and were charged with supervising them. Frederick stood among other students across the street from"]}',
url: 'https://api.openai.com/v1/embeddings'
},
request: Request {
[Symbol(realm)]: [Object],
[Symbol(state)]: [Object],
[Symbol(signal)]: [AbortSignal],
[Symbol(headers)]: [HeadersList]
},
data: { error: [Object] }
},
isAxiosError: true,
toJSON: [Function: toJSON]
}
e:\DEV\LANG\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:51
throw new Error('Failed to ingest your data');
^

[Error: Failed to ingest your data]

Node.js v18.14.0
 ELIFECYCLE  Command failed with exit code 1.

Pinecone error

I keep getting this error:

error [Error: PineconeClient: Error calling query: Error: PineconeClient: Error calling queryRaw: FetchError: The request failed and the interceptors did not return an alternative response]

when I try to ask a question. Ingestion also doesn't work.

I've double checked my env settings as well as pinecone config.

any idea what this could be?

Error getting project name: Fetch from Pinecone /whoami

I'm having trouble connecting to the Pinecone index. When I run the project and type in a question, I get this error:

PineconeClient: Error getting project name: FetchError: request to https://controller.documents-2xxxxxx.svc.us-central1-gcp.pinecone.io.pinecone.io/actions/whoami

By reading another issue I found out that in node_modules/@pinecone-database the library builds this variable:
whoami = "".concat(controllerPath, "/actions/whoami");
[screenshot]

If I remove the concat and hardcode the path, it actually works, but that's not really ideal when rebuilding node_modules and deploying.

Is anyone else having this problem, or does anyone know how to fix it?
Thanks in advance
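The doubled `.pinecone.io.pinecone.io` in the failing URL suggests the environment variable already contains a full hostname while the client appends its own suffix, so fixing the variable may avoid patching node_modules. A sketch of the difference (buildControllerUrl is illustrative, not Pinecone's actual code):

```typescript
// Illustrative sketch of how a controller URL is assembled from the
// environment name. If the configured environment already contains a full
// hostname, the suffix gets duplicated, producing "...pinecone.io.pinecone.io"
// as in the error above.
function buildControllerUrl(environment: string): string {
  return `https://controller.${environment}.pinecone.io/actions/whoami`;
}

// Correct: pass only the environment name.
console.log(buildControllerUrl('us-central1-gcp'));
// Wrong: passing a full index hostname duplicates the suffix.
console.log(buildControllerUrl('documents-2xxxxxx.svc.us-central1-gcp.pinecone.io'));
```

So it is worth checking whether PINECONE_ENVIRONMENT is set to just the environment name rather than the index URL.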

Unhandled Runtime Error

When I ask it a question in Chinese, I see this error.

Unhandled Runtime Error

AbortError: The operation was aborted. 
Source

pages\index.tsx (104:17) @ abort

  102 |   }));
  103 |   setLoading(false);
> 104 |   ctrl.abort();
      |       ^
  105 | } else {
  106 |   const data = JSON.parse(event.data);
  107 |   if (data.sourceDocs) {
Unhandled Runtime Error

AbortError: The operation was aborted. 
Call Stack
onVisibilityChange
node_modules\.pnpm\@[email protected]\node_modules\@microsoft\fetch-event-source\lib\esm\fetch.js (25:0)

[screenshot]
But the command line can sometimes find the answer.
[screenshot]

My Node is v18.15.0, npm is v9.5.0, and the system is Windows Server 2022 Datacenter 21H2 20348.169.
I changed lines 15-26 of makechain.ts to:

  `You are an AI assistant providing helpful advice. You are given the following extracted parts of a long document and a question. Answer conversationally based on the context provided.
  You may either cite hyperlinks from the context below or reason out an answer yourself, but do not make up hyperlinks.
  If you cannot find the answer in the context below, say "Hmm, I'm not sure." and then introduce your own answer with "My thoughts are:". Do not try to make up an answer.
  If the question is unrelated to the context, politely respond that you can only answer questions related to the context. Answer in Chinese where possible; a bilingual Chinese/English answer is fine.

Question: {question}
=========
{context}
=========
Answer in Markdown:`,
);
What can I do to fix this?
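One pattern that avoids an unhandled AbortError is to guard the abort() call and observe the signal's abort event, rather than letting the rejection escape when the stream is aborted twice (for example once by the code and once by fetch-event-source's visibility handler). AbortController is built into Node 15+ and all modern browsers; the stopStreaming helper below is my own sketch, not code from this repo:

```typescript
// Guarded abort: observe the signal instead of letting an AbortError
// escape unhandled when abort() fires more than once.
const ctrl = new AbortController();

ctrl.signal.addEventListener('abort', () => {
  console.log('stream aborted cleanly');
});

function stopStreaming(controller: AbortController): void {
  if (!controller.signal.aborted) {
    controller.abort();
  }
}

stopStreaming(ctrl); // aborts once
stopStreaming(ctrl); // guarded: second call does nothing
```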

Error on launch when I send a message

next-dev.js?613c:20 Warning: Each child in a list should have a unique "key" prop.

Check the render method of Home. See https://reactjs.org/link/warning-keys for more information.
at Home (webpack-internal:///./pages/index.tsx:30:78)
at main
at MyApp (webpack-internal:///./pages/_app.tsx:12:11)
at PathnameContextProviderAdapter (webpack-internal:///./node_modules/.pnpm/[email protected]_biqbaboplfbrettd7655fr4n2y/node_modules/next/dist/shared/lib/router/adapters.js:62:11)
at ErrorBoundary (webpack-internal:///./node_modules/.pnpm/[email protected]_biqbaboplfbrettd7655fr4n2y/node_modules/next/dist/compiled/@next/react-dev-overlay/dist/client.js:301:63)
at ReactDevOverlay (webpack-internal:///./node_modules/.pnpm/[email protected]_biqbaboplfbrettd7655fr4n2y/node_modules/next/dist/compiled/@next/react-dev-overlay/dist/client.js:850:919)
at Container (webpack-internal:///./node_modules/.pnpm/[email protected]_biqbaboplfbrettd7655fr4n2y/node_modules/next/dist/client/index.js:62:1)
at AppContainer (webpack-internal:///./node_modules/.pnpm/[email protected]_biqbaboplfbrettd7655fr4n2y/node_modules/next/dist/client/index.js:172:11)
at Root (webpack-internal:///./node_modules/.pnpm/[email protected]_biqbaboplfbrettd7655fr4n2y/node_modules/next/dist/client/index.js:347:11)
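The warning itself just means React wants a unique `key` prop on each element rendered from a list in Home. When messages have no stable id, a common fix is deriving a key from the type and index; a non-JSX sketch (withKeys and the Message shape are my own illustration; the real fix goes on the element inside the `.map()` in index.tsx):

```typescript
// Sketch: derive a unique key for each rendered message. In JSX this
// becomes <div key={`${msg.type}-${i}`}>…</div> inside the map call.
interface Message {
  message: string;
  type: 'apiMessage' | 'userMessage';
}

function withKeys(messages: Message[]): Array<Message & { key: string }> {
  return messages.map((msg, i) => ({ ...msg, key: `${msg.type}-${i}` }));
}

const keyed = withKeys([
  { message: 'Hi, what would you like to learn about this legal case?', type: 'apiMessage' },
  { message: 'Summarise the holding.', type: 'userMessage' },
]);
console.log(keyed.map((m) => m.key));
```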

please teach us to deploy

Hey @mayooear thank you so much for your incredible work and sharing it with the world.
For those of us who are not very experienced, can you please share step-by-step recommendations for deploying it on Vercel or another platform? Vercel in my case keeps throwing tantrums.
Thank you!

PINECONE_NAME_SPACE

Hi

Is it possible to not have PINECONE_NAME_SPACE configured in the config file?
The reason is that I already have a Pinecone DB with vectors that were upserted without a namespace.

regards
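If the existing vectors were upserted without a namespace, one option is to omit the namespace field from queries entirely rather than sending an empty value; a conditional spread does that cleanly. The queryRequest shape below is a simplification for illustration, not Pinecone's full API:

```typescript
// Illustrative sketch: include the namespace key only when a namespace is
// actually configured, so an unset PINECONE_NAME_SPACE leaves the field
// absent instead of sending namespace: ''.
function buildQueryRequest(namespace?: string): Record<string, unknown> {
  return {
    topK: 5,
    includeMetadata: true,
    ...(namespace ? { namespace } : {}),
  };
}

console.log('namespace' in buildQueryRequest());          // absent when unset
console.log('namespace' in buildQueryRequest('my-docs')); // present when set
```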

chat history

I lost my chat history. How can I keep my chat history?

Error on launch

This is on localhost and I get same error on Vercel:

  type: 'apiMessage'
}

],
history: [],
pendingSourceDocs: []
}
Warning: Each child in a list should have a unique "key" prop.

Check the top-level render call using <div>. See https://reactjs.org/link/warning-keys for more information.
at Home (webpack-internal:///./pages/index.tsx:32:78)
at main
at MyApp (webpack-internal:///./pages/_app.tsx:14:18)
at StyleRegistry (C:\Users\Ross\Desktop\Github Projects\gpt4-pdf-chatbot-langchain-1\node_modules\styled-jsx\dist\index\index.js:449:36)
at PathnameContextProviderAdapter (C:\Users\Ross\Desktop\Github Projects\gpt4-pdf-chatbot-langchain-1\node_modules\next\dist\shared\lib\router\adapters.js:60:11)
at AppContainer (C:\Users\Ross\Desktop\Github Projects\gpt4-pdf-chatbot-langchain-1\node_modules\next\dist\server\render.js:291:29)
at AppContainerWithIsomorphicFiberStructure (C:\Users\Ross\Desktop\Github Projects\gpt4-pdf-chatbot-langchain-1\node_modules\next\dist\server\render.js:327:57)
at div
at Body (C:\Users\Ross\Desktop\Github Projects\gpt4-pdf-chatbot-langchain-1\node_modules\next\dist\server\render.js:614:21)
wait - compiling /api/chat (client and server)...
event - compiled successfully in 93 ms (75 modules)

Auth / 401 error on ingestion

I have updated the pinecone and OpenAI API keys in the env file, and my pinecone index matches the pinecone index name in pinecone.ts.

It seems that it's all fine on Pinecone's end. My pinecone API key seems to be fine as I can run curl -i https://controller.us-central1-gcp.pinecone.io/actions/whoami -H 'API_KEY' without receiving an error. The initialisation of Pinecone in pinecone-client.ts seems to be fine with no errors.

I'm thinking it's a problem when dealing with the OpenAI API. My OpenAI API key has access to GPT-4 and works fine in another chatbot I created, so I know that's ok.

What seems weird (to my limited knowledge of the embeddings API) is that when I log the embeddings instance, the apiKey value is not my API key. It looks like an OpenAI API key, but it is not mine, and it doesn't appear anywhere in the codebase. This may be normal, but it seems odd, as this is also the place that throws the error.

Here is the output when I log embeddings:

EMBEDDINGS:  OpenAIEmbeddings {
  modelName: 'text-embedding-ada-002',
  batchSize: 512,
  maxRetries: 6,
  stripNewLines: true,
  apiKey: 'sk-9Ng******j3Wun',
  client: undefined
}

I have tried creating a new OpenAI key, but it still results in the same error.
It might be a problem on OpenAI's end. This seems a little similar to the problem here - https://community.openai.com/t/api-key-trouble/20017/10
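A 401 from /v1/embeddings means the key that actually reached the client was rejected, regardless of what the .env file says: a stale shell export or a mis-named variable can silently override it, which would also explain logging an unfamiliar-looking key. One sanity check is logging a masked form of exactly what the process sees (maskKey is a hypothetical helper, not from this repo):

```typescript
// Hypothetical helper: log a masked form of whatever key the process
// actually sees, to catch a stale shell export or mis-named .env variable
// overriding the intended key. Never log the full key.
function maskKey(key: string | undefined): string {
  if (!key) return '(not set)';
  return `${key.slice(0, 5)}...${key.slice(-5)}`;
}

console.log('OPENAI_API_KEY:', maskKey(process.env.OPENAI_API_KEY));
```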

error when start and send message

Hi there, when I enter the web page and send a message, I got the error below and did not get any response.

 messages: [
    {
      message: 'Hi, what would you like to learn about this legal case?',
      type: 'apiMessage'
    }
  ],
  history: [],
  pendingSourceDocs: []
}
Warning: Each child in a list should have a unique "key" prop.

Check the top-level render call using <div>. See https://reactjs.org/link/warning-keys for more information.
    at Home (webpack-internal:///./pages/index.tsx:32:78)
    at main
    at MyApp (webpack-internal:///./pages/_app.tsx:14:18)
    at StyleRegistry (/home/ubuntu/gpt/gpt4-pdf-chatbot-langchain/node_modules/.pnpm/[email protected][email protected]/node_modules/styled-jsx/dist/index/index.js:449:36)
    at PathnameContextProviderAdapter (/home/ubuntu/gpt/gpt4-pdf-chatbot-langchain/node_modules/.pnpm/[email protected]_biqbaboplfbrettd7655fr4n2y/node_modules/next/dist/shared/lib/router/adapters.js:60:11)
    at AppContainer (/home/ubuntu/gpt/gpt4-pdf-chatbot-langchain/node_modules/.pnpm/[email protected]_biqbaboplfbrettd7655fr4n2y/node_modules/next/dist/server/render.js:291:29)
    at AppContainerWithIsomorphicFiberStructure (/home/ubuntu/gpt/gpt4-pdf-chatbot-langchain/node_modules/.pnpm/[email protected]_biqbaboplfbrettd7655fr4n2y/node_modules/next/dist/server/render.js:327:57)
    at div
    at Body (/home/ubuntu/gpt/gpt4-pdf-chatbot-langchain/node_modules/.pnpm/[email protected]_biqbaboplfbrettd7655fr4n2y/node_modules/next/dist/server/render.js:614:21)
messageState {
  messages: [
    {
      message: 'Hi, what would you like to learn about this legal case?',
      type: 'apiMessage'
    }
  ],
  history: [],
  pendingSourceDocs: []
}
wait  - compiling /api/chat (client and server)...
event - compiled successfully in 80 ms (75 modules)
us-central1-gcp
error [Error: PineconeClient: Error calling query: Error: PineconeClient: Error calling queryRaw: FetchError: The request failed and the interceptors did not return an alternative response]
error [Error: PineconeClient: Error calling query: Error: PineconeClient: Error calling queryRaw: FetchError: The request failed and the interceptors did not return an alternative response]
error [Error: PineconeClient: Error calling query: Error: PineconeClient: Error calling queryRaw: FetchError: The request failed and the interceptors did not return an alternative response]

"npm run ingest" network error on macOS (ClashX): solved with ClashX Pro

If you encounter this problem on a Mac, installing ClashX Pro solves it: after enabling enhanced mode and the global proxy, the error goes away. Sharing this for anyone struggling with the same issue.

download:https://itlanyan.com/download.php?filename=/v2/macos/ClashX-Pro-v1.72.0.4.dmg

How would I ingest text files instead of PDF files?

I have text files generated by Whisper API that I would like to ingest, instead of PDFs. I did also try saving a Google Doc in PDF format but the PDF loader does not seem to load the text correctly.

PDF:

[screenshot]

Context included in ChatGPT prompt:

[screenshot]

new issue with pinecone now - any idea how to solve?

us-east1-gcp
PineconeClient: Error getting project name: FetchError: invalid json response body at https://controller.us-east1-gcp.pinecone.io/actions/whoami reason: Unexpected token A in JSON at position 0
error - [Error: PineconeClient: Project name not set. Call init() first.] {
page: '/api/chat'
}
(the same error repeats for every subsequent request)

Error when using an HTTP proxy

In the project, the module that calls OpenAI contains:

import fetchAdapter from "../util/axios-fetch-adapter.js";

const clientConfig = new Configuration({
              ...this.clientConfig,
              baseOptions: { adapter: fetchAdapter },
          });

I edited this code as follows:

import axios from 'axios';
import tunnel from 'tunnel';

const clientConfig = new Configuration({
  ...this.clientConfig,
  // baseOptions: { adapter: fetchAdapter },
});
// route requests through a local HTTP proxy instead of the fetch adapter
const tunnelProxy = tunnel.httpsOverHttp({
  proxy: {
    host: '127.0.0.1',
    port: 7890,
  },
});
// use a custom axios instance with the proxy agent
const customAxios = axios.create({
  proxy: false,
  httpsAgent: tunnelProxy,
});

this.batchClient = new OpenAIApi(clientConfig, BASE_PATH, customAxios);

I commented out this line:

// baseOptions: { adapter: fetchAdapter },

With these changes, scripts/ingest-data.ts runs fine through the HTTP proxy, but utils/makechain.ts still errors, because the chat flow uses the langchain/llms module, whose request code is:

 /** @ignore */
    async completionWithRetry(request) {
        if (!request.stream && !this.batchClient) {
            const clientConfig = new Configuration({
                ...this.clientConfig,
                baseOptions: { adapter: fetchAdapter },
            });
            this.batchClient = new OpenAIApi(clientConfig);
        }
        if (request.stream && !this.streamingClient) {
            const clientConfig = new Configuration(this.clientConfig);
            this.streamingClient = new OpenAIApi(clientConfig);
        }
        const client = !request.stream ? this.batchClient : this.streamingClient;
        const makeCompletionRequest = async () => client.createChatCompletion(request, request.stream ? { responseType: "stream" } : undefined);
        return backOff(makeCompletionRequest, {
            startingDelay: 4,
            maxDelay: 10,
            numOfAttempts: this.maxRetries,
            // TODO(sean) pass custom retry function to check error types.
        });
    }

At the line if (request.stream && !this.streamingClient), the streaming request fails with a server error.

I haven't found a proxy configuration for the fetchAdapter anywhere. Can I replace it with another adapter?
I'm not using a VPN.
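For reference, the tunnel approach above needs the proxy host and port in the shape tunnel.httpsOverHttp expects. A small, testable helper (hypothetical, assuming a local proxy such as ClashX on 127.0.0.1:7890) that derives that shape from a standard HTTPS_PROXY-style URL instead of hard-coding it:

```typescript
// Parse an HTTP(S)_PROXY-style URL into the { host, port } shape that
// tunnel.httpsOverHttp expects.
function parseProxyUrl(proxyUrl: string): { host: string; port: number } {
  const u = new URL(proxyUrl);
  return { host: u.hostname, port: Number(u.port) || 80 };
}
```

This way the proxy address can come from process.env.HTTPS_PROXY rather than being baked into the patched library code.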

Use it as a translation bot

First of all, thanks for creating the code that allows people to feed in PDFs.

I was trying to change the prompt to something like:

"You are an AI assistant providing translation service of the document. You are given the following extracted parts of a long document and a question."

However, the program does not seem to have a concept of page numbers. When I tell the bot to translate page 1 into English, it returns some random page and translates that. I wonder if this bot can work as a translator from a foreign language into English?

My ultimate goal is to feed in a foreign language pdf and it will translate into a English PDF that I can download.

Thanks.
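Regarding page numbers: the chunks stored in Pinecone only carry whatever metadata was attached at ingest time, so "translate page 1" can only work if each chunk knows its page. A hedged sketch of tagging chunks with page numbers before embedding (the field names are illustrative):

```typescript
interface PageChunk {
  text: string;
  metadata: { page: number };
}

// Tag each page's text with a page number so a retriever can later
// filter on metadata.page rather than relying on similarity search.
function chunkPages(pages: string[]): PageChunk[] {
  return pages.map((text, i) => ({ text, metadata: { page: i + 1 } }));
}
```

With that metadata in place, a Pinecone query filter such as { page: { $eq: 1 } } can restrict retrieval to the requested page before translating it.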

Vectors not properly uploaded to Pinecone

Even though the ingest execution finishes OK, the vectors don't seem right:

The Pinecone index dimension is set to 1536, and the query gives the same result as yours:

{"vector":[0,0,0......0,0],
"topK":5,
"includeMetadata":true,
"includeValues":true,
"namespace":""}

This is an example of the result of the fetch

{
"vectors":{}
"namespace":"demo"
}

How is this possible? Is it because of the chunk size or some other parameter?

Thanks in advance
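One thing worth checking in the example above: the query uses "namespace":"" while the fetch result shows "namespace":"demo" - Pinecone treats those as two separate namespaces, so vectors upserted into one are invisible to queries against the other. A sketch of building the query body with an explicit namespace so the two always match:

```typescript
// Build a Pinecone query body; the namespace must match the one used
// at upsert time ('' and 'demo' are different namespaces).
function buildQuery(namespace: string, dimension = 1536, topK = 5) {
  return {
    vector: new Array(dimension).fill(0),
    topK,
    includeMetadata: true,
    includeValues: true,
    namespace,
  };
}
```

Reading the namespace from one shared config constant (as the repo's config/pinecone.ts does) avoids this class of mismatch.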

Running Ingest from Colab

I'm running into the following error when trying to ingest data into Pinecone while running gpt4-pdf-chatbot-langchain in a Colab notebook. I have properly set the PINECONE_INDEX_NAME and created a new PINECONE_NAME_SPACE in the pinecone.ts file. I have set the environment variables using %env (e.g., %env PINECONE_ENVIRONMENT='us-west4-gcp').

This is the result of running ! pnpm run ingest:

> [email protected] ingest /content/gdrive/MyDrive/[Path]/gpt4-pdf-chatbot-langchain
> tsx -r dotenv/config scripts/ingest-data.ts

'us-west4-gcp'
PineconeClient: Error getting project name: TypeError: fetch failed
error [Error: EISDIR: illegal operation on a directory, read] {
  errno: -21,
  code: 'EISDIR',
  syscall: 'read'
}
/content/gdrive/MyDrive/[Path]/gpt4-pdf-chatbot-langchain/scripts/ingest-data.ts:51
    throw new Error('Failed to ingest your data');
          ^


[Error: Failed to ingest your data]

Node.js v19.8.1
 ELIFECYCLE  Command failed with exit code 1.
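One detail in the log above: the environment prints as 'us-west4-gcp' with literal quote characters, which suggests the %env assignment kept the quotes as part of the value - that alone can make the Pinecone controller lookup fail with "fetch failed". A hedged sketch of sanitizing the value before calling init():

```typescript
// Strip surrounding quote characters that Colab's %env can leave in
// the value (e.g. "'us-west4-gcp'" -> "us-west4-gcp").
function stripQuotes(value: string): string {
  return value.replace(/^['"]|['"]$/g, '');
}
```

Usage would be something like stripQuotes(process.env.PINECONE_ENVIRONMENT ?? '') when building the client config.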

Racking My Brain - Failing To Ingest

Have read the documentation over, and over, and over... Have created new API keys on OpenAI / Pinecone. Have double- and triple-checked that the namespace, index name, and environment are 100% correct.
image


** Visual Studio 2022 Developer PowerShell v17.5.3
** Copyright (c) 2022 Microsoft Corporation


PS C:\Users\Randall Cornett\Source\Repos\mayooear\gpt4-pdf-chatbot-langchain> url: 'https://api.openai.com/v1/embeddings'

},
request: Request {
  [Symbol(realm)]: [Object],
  [Symbol(state)]: [Object],
  [Symbol(signal)]: [AbortSignal],
  [Symbol(headers)]: [HeadersList]
},
data: { error: [Object] }

},
isAxiosError: true,
toJSON: [Function: toJSON]
}
c:\Users\Randall Cornett\source\repos\mayooear\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:49
throw new Error('Failed to ingest your data');
^

[Error: Failed to ingest your data]

Anything to help me fix this would be greatly appreciated. Spent like 6 hours last night and made like no progress.

Live Deployment Issues

I'm having a couple of issues with a live deployment to Vercel that I don't have in the local environment. Requests often end with "An error occurred with your deployment FUNCTION_INVOCATION_TIMEOUT". Looking in the Vercel function logs, you can actually see the responses are getting created, but they just end in a 504. Additionally, when a request does return successfully, there is no live text printout; the answer just shows up when completed. Everything works well in the local environment, however. Anyone else experiencing this?

image
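A possible mitigation, assuming the timeout comes from Vercel's serverless function invocation limit: moving the /api/chat route to the Edge runtime, which supports streamed responses without that fixed timeout. This is a sketch only - any Node-specific dependencies used in the route would need to be Edge-compatible:

```typescript
// pages/api/chat.ts (sketch): opt this API route into Vercel's Edge
// runtime so the response can stream instead of buffering until the
// serverless timeout is hit.
export const config = {
  runtime: 'edge',
};
```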

Empty bot response - `POST /v1/chat/completions` yields "404 Not Found"

Thank you for providing such a powerful tool for the community. I do have GPT-4 access and can't wait to try this with painful documents of mine.

I've encountered an issue while using the app that I wanted to bring to your attention.
Bootstrapping and pnpm run dev work flawlessly - great job! I am able to send a message, but I never get an answer from the bot.

The terminal tells me the POST to https://api.openai.com/v1/chat/completions yields a 404, and the UI shows an empty bot response. I removed sensitive information.

image

image

Terminal message:

error [Error: Request failed with status code 404] {
  config: {
    transitional: {
      silentJSONParsing: true,
      forcedJSONParsing: true,
      clarifyTimeoutError: false
    },
    adapter: [Function: httpAdapter],
    transformRequest: [ [Function: transformRequest] ],
    transformResponse: [ [Function: transformResponse] ],
    timeout: 0,
    xsrfCookieName: 'XSRF-TOKEN',
    xsrfHeaderName: 'X-XSRF-TOKEN',
    maxContentLength: -1,
    maxBodyLength: -1,
    validateStatus: [Function: validateStatus],
    headers: {
      Accept: 'application/json, text/plain, */*',
      'Content-Type': 'application/json',
      'User-Agent': 'OpenAI/NodeJS/3.2.1',
      Authorization: 'Bearer <removed>',
      'Content-Length': 2823
    },
    method: 'post',
    responseType: 'stream',
    data: `{"model":"gpt-4","temperature":0,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":true,"messages":[{"role":"user","content":"You are an AI assistant providing helpful advice. You are given the following extracted parts of a long document and a question. Provide a conversational answer based on the context provided.\\nYou should only provide hyperlinks that reference the context below. Do NOT make up hyperlinks.\\nIf you can't find the answer in the context below, just say \\"Hmm, I'm not sure.\\" Don't try to make up an answer.\\nIf the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.\\n\\nQuestion: <removed>\\n=========\\nAnswer in Markdown:"}]}`,
    url: 'https://api.openai.com/v1/chat/completions'
  },
  request: <ref *1> ClientRequest {
    _events: [Object: null prototype] {
      abort: [Function (anonymous)],
      aborted: [Function (anonymous)],
      connect: [Function (anonymous)],
      error: [Function (anonymous)],
      socket: [Function (anonymous)],
      timeout: [Function (anonymous)],
      prefinish: [Function: requestOnPrefinish]
    },
    _eventsCount: 7,
    _maxListeners: undefined,
    outputData: [],
    outputSize: 0,
    writable: true,
    destroyed: false,
    _last: true,
    chunkedEncoding: false,
    shouldKeepAlive: false,
    maxRequestsOnConnectionReached: false,
    _defaultKeepAlive: true,
    useChunkedEncodingByDefault: true,
    sendDate: false,
    _removedConnection: false,
    _removedContLen: false,
    _removedTE: false,
    _contentLength: null,
    _hasBody: true,
    _trailer: '',
    finished: true,
    _headerSent: true,
    _closed: false,
    socket: TLSSocket {
      _tlsOptions: [Object],
      _secureEstablished: true,
      _securePending: false,
      _newSessionPending: false,
      _controlReleased: true,
      secureConnecting: false,
      _SNICallback: null,
      servername: 'api.openai.com',
      alpnProtocol: false,
      authorized: true,
      authorizationError: null,
      encrypted: true,
      _events: [Object: null prototype],
      _eventsCount: 9,
      connecting: false,
      _hadError: false,
      _parent: null,
      _host: 'api.openai.com',
      _readableState: [ReadableState],
      _maxListeners: undefined,
      _writableState: [WritableState],
      allowHalfOpen: false,
      _sockname: null,
      _pendingData: null,
      _pendingEncoding: '',
      server: undefined,
      _server: null,
      ssl: [TLSWrap],
      _requestCert: true,
      _rejectUnauthorized: true,
      parser: null,
      _httpMessage: [Circular *1],
      [Symbol(res)]: [TLSWrap],
      [Symbol(verified)]: true,
      [Symbol(pendingSession)]: null,
      [Symbol(async_id_symbol)]: 137487,
      [Symbol(kHandle)]: [TLSWrap],
      [Symbol(kSetNoDelay)]: false,
      [Symbol(lastWriteQueueSize)]: 0,
      [Symbol(timeout)]: null,
      [Symbol(kBuffer)]: null,
      [Symbol(kBufferCb)]: null,
      [Symbol(kBufferGen)]: null,
      [Symbol(kCapture)]: false,
      [Symbol(kBytesRead)]: 0,
      [Symbol(kBytesWritten)]: 0,
      [Symbol(connect-options)]: [Object],
      [Symbol(RequestTimeout)]: undefined
    },
    _header: 'POST /v1/chat/completions HTTP/1.1\r\n' +
      'Accept: application/json, text/plain, */*\r\n' +
      'Content-Type: application/json\r\n' +
      'User-Agent: OpenAI/NodeJS/3.2.1\r\n' +
      'Authorization: Bearer <removed>\r\n' +
      'Content-Length: 2823\r\n' +
      'Host: api.openai.com\r\n' +
      'Connection: close\r\n' +
      '\r\n',
    _keepAliveTimeout: 0,
    _onPendingData: [Function: nop],
    agent: Agent {
      _events: [Object: null prototype],
      _eventsCount: 2,
      _maxListeners: undefined,
      defaultPort: 443,
      protocol: 'https:',
      options: [Object: null prototype],
      requests: [Object: null prototype] {},
      sockets: [Object: null prototype],
      freeSockets: [Object: null prototype] {},
      keepAliveMsecs: 1000,
      keepAlive: false,
      maxSockets: Infinity,
      maxFreeSockets: 256,
      scheduling: 'lifo',
      maxTotalSockets: Infinity,
      totalSocketCount: 1,
      maxCachedSessions: 100,
      _sessionCache: [Object],
      [Symbol(kCapture)]: false
    },
    socketPath: undefined,
    method: 'POST',
    maxHeaderSize: undefined,
    insecureHTTPParser: undefined,
    path: '/v1/chat/completions',
    _ended: false,
    res: IncomingMessage {
      _readableState: [ReadableState],
      _events: [Object: null prototype],
      _eventsCount: 1,
      _maxListeners: undefined,
      socket: [TLSSocket],
      httpVersionMajor: 1,
      httpVersionMinor: 1,
      httpVersion: '1.1',
      complete: true,
      rawHeaders: [Array],
      rawTrailers: [],
      aborted: false,
      upgrade: false,
      url: '',
      method: null,
      statusCode: 404,
      statusMessage: 'Not Found',
      client: [TLSSocket],
      _consuming: false,
      _dumped: false,
      req: [Circular *1],
      responseUrl: 'https://api.openai.com/v1/chat/completions',
      redirects: [],
      [Symbol(kCapture)]: false,
      [Symbol(kHeaders)]: [Object],
      [Symbol(kHeadersCount)]: 14,
      [Symbol(kTrailers)]: null,
      [Symbol(kTrailersCount)]: 0,
      [Symbol(RequestTimeout)]: undefined
    },
    aborted: false,
    timeoutCb: null,
    upgradeOrConnect: false,
    parser: null,
    maxHeadersCount: null,
    reusedSocket: false,
    host: 'api.openai.com',
    protocol: 'https:',
    _redirectable: Writable {
      _writableState: [WritableState],
      _events: [Object: null prototype],
      _eventsCount: 3,
      _maxListeners: undefined,
      _options: [Object],
      _ended: true,
      _ending: true,
      _redirectCount: 0,
      _redirects: [],
      _requestBodyLength: 2823,
      _requestBodyBuffers: [],
      _onNativeResponse: [Function (anonymous)],
      _currentRequest: [Circular *1],
      _currentUrl: 'https://api.openai.com/v1/chat/completions',
      [Symbol(kCapture)]: false
    },
    [Symbol(kCapture)]: false,
    [Symbol(kNeedDrain)]: false,
    [Symbol(corked)]: 0,
    [Symbol(kOutHeaders)]: [Object: null prototype] {
      accept: [Array],
      'content-type': [Array],
      'user-agent': [Array],
      authorization: [Array],
      'content-length': [Array],
      host: [Array]
    }
  },
  response: {
    status: 404,
    statusText: 'Not Found',
    headers: {
      date: 'Wed, 22 Mar 2023 11:22:14 GMT',
      'content-type': 'application/json; charset=utf-8',
      'content-length': '179',
      connection: 'close',
      vary: 'Origin',
      'x-request-id': '<removed>',
      'strict-transport-security': 'max-age=15724800; includeSubDomains'
    },
    config: {
      transitional: [Object],
      adapter: [Function: httpAdapter],
      transformRequest: [Array],
      transformResponse: [Array],
      timeout: 0,
      xsrfCookieName: 'XSRF-TOKEN',
      xsrfHeaderName: 'X-XSRF-TOKEN',
      maxContentLength: -1,
      maxBodyLength: -1,
      validateStatus: [Function: validateStatus],
      headers: [Object],
      method: 'post',
      responseType: 'stream',
      data: `{"model":"gpt-4","temperature":0,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":true,"messages":[{"role":"user","content":"You are an AI assistant providing helpful advice. You are given the following extracted parts of a long document and a question. Provide a conversational answer based on the context provided.\\nYou should only provide hyperlinks that reference the context below. Do NOT make up hyperlinks.\\nIf you can't find the answer in the context below, just say \\"Hmm, I'm not sure.\\" Don't try to make up an answer.\\nIf the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.\\n\\nQuestion: <removed>\\n=========\\nAnswer in Markdown:"}]}`,
      url: 'https://api.openai.com/v1/chat/completions'
    },
    request: <ref *1> ClientRequest {
      _events: [Object: null prototype],
      _eventsCount: 7,
      _maxListeners: undefined,
      outputData: [],
      outputSize: 0,
      writable: true,
      destroyed: false,
      _last: true,
      chunkedEncoding: false,
      shouldKeepAlive: false,
      maxRequestsOnConnectionReached: false,
      _defaultKeepAlive: true,
      useChunkedEncodingByDefault: true,
      sendDate: false,
      _removedConnection: false,
      _removedContLen: false,
      _removedTE: false,
      _contentLength: null,
      _hasBody: true,
      _trailer: '',
      finished: true,
      _headerSent: true,
      _closed: false,
      socket: [TLSSocket],
      _header: 'POST /v1/chat/completions HTTP/1.1\r\n' +
        'Accept: application/json, text/plain, */*\r\n' +
        'Content-Type: application/json\r\n' +
        'User-Agent: OpenAI/NodeJS/3.2.1\r\n' +
        'Authorization: Bearer <removed>\r\n' +
        'Content-Length: 2823\r\n' +
        'Host: api.openai.com\r\n' +
        'Connection: close\r\n' +
        '\r\n',
      _keepAliveTimeout: 0,
      _onPendingData: [Function: nop],
      agent: [Agent],
      socketPath: undefined,
      method: 'POST',
      maxHeaderSize: undefined,
      insecureHTTPParser: undefined,
      path: '/v1/chat/completions',
      _ended: false,
      res: [IncomingMessage],
      aborted: false,
      timeoutCb: null,
      upgradeOrConnect: false,
      parser: null,
      maxHeadersCount: null,
      reusedSocket: false,
      host: 'api.openai.com',
      protocol: 'https:',
      _redirectable: [Writable],
      [Symbol(kCapture)]: false,
      [Symbol(kNeedDrain)]: false,
      [Symbol(corked)]: 0,
      [Symbol(kOutHeaders)]: [Object: null prototype]
    },
    data: IncomingMessage {
      _readableState: [ReadableState],
      _events: [Object: null prototype],
      _eventsCount: 1,
      _maxListeners: undefined,
      socket: [TLSSocket],
      httpVersionMajor: 1,
      httpVersionMinor: 1,
      httpVersion: '1.1',
      complete: true,
      rawHeaders: [Array],
      rawTrailers: [],
      aborted: false,
      upgrade: false,
      url: '',
      method: null,
      statusCode: 404,
      statusMessage: 'Not Found',
      client: [TLSSocket],
      _consuming: false,
      _dumped: false,
      req: [ClientRequest],
      responseUrl: 'https://api.openai.com/v1/chat/completions',
      redirects: [],
      [Symbol(kCapture)]: false,
      [Symbol(kHeaders)]: [Object],
      [Symbol(kHeadersCount)]: 14,
      [Symbol(kTrailers)]: null,
      [Symbol(kTrailersCount)]: 0,
      [Symbol(RequestTimeout)]: undefined
    }
  },
  isAxiosError: true,
  toJSON: [Function: toJSON]
}
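A 404 from POST /v1/chat/completions most often means the account behind the API key does not have access to the requested model - OpenAI returns 404 for models the key cannot use, including gpt-4 for accounts still on the waitlist. A hypothetical fallback helper (the model ids and preference order are illustrative; the available list could be populated from GET /v1/models):

```typescript
// Pick the first preferred chat model the account actually has access
// to, falling back from gpt-4 to gpt-3.5-turbo.
function pickModel(
  available: string[],
  preferred: string[] = ['gpt-4', 'gpt-3.5-turbo'],
): string {
  const found = preferred.find((m) => available.includes(m));
  if (!found) throw new Error('No supported chat model available for this API key');
  return found;
}
```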

License?

Hi, there is no license shipped with the code, are you planning to add one?

Pinecone Ingest Error

It seems like there was an update to "chunk" it out while I was working on this, so it looks like a known problem. But, even with the chunking I am still getting:

error [Error: PineconeClient: Error calling upsert: Error: PineconeClient: Error calling upsertRaw: FetchError: The request failed and the interceptors did not return an alternative response]
/Users/n****l/gpt4-pdf-chatbot-langchain/scripts/ingest-data.ts:50
throw new Error('Failed to ingest your data');
^

[Error: Failed to ingest your data]

Node.js v18.15.0

Any ideas would be great, thanks!
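Oversized upsert requests are a common cause of this FetchError; splitting the vectors into small batches keeps each HTTP request under the size limit. A sketch with the actual Pinecone call abstracted behind an upsert callback:

```typescript
// Upsert in batches of `batchSize` so no single request carries the
// whole document's embeddings at once.
async function chunkedUpsert<T>(
  vectors: T[],
  upsert: (batch: T[]) => Promise<void>,
  batchSize = 50,
): Promise<void> {
  for (let i = 0; i < vectors.length; i += batchSize) {
    await upsert(vectors.slice(i, i + batchSize));
  }
}
```

In the repo this corresponds to passing the documents to the vector store in chunks rather than in one call.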

Error: Failed to ingest your data

Unfortunately, I get this error when trying to ingest the PDF:

creating vector store...
error [Error: PineconeClient: Error calling upsert: Error: PineconeClient: Error calling upsertRaw: FetchError: The request failed and the interceptors did not return an alternative response]
/Users/admin/gpt4-pdf-chatbot-langchain/scripts/ingest-data.ts:43
throw new Error('Failed to ingest your data');
^

[Error: Failed to ingest your data]

I get the same error with different PDFs, which seems odd because the terminal shows parts of the PDF text in green before the error message. Do you have any idea how to fix this?

upload file

Hi @mayooear,
would it be possible to have an upload-file button and run ingest-data with the uploaded file? Or upload the file directly to Pinecone?

429 Too Many Requests response status code?

The error is [Failed to ingest your data]:

      url: 'https://api.openai.com/v1/embeddings'
    },
    request: Request {
      [Symbol(realm)]: [Object],
      [Symbol(state)]: [Object],
      [Symbol(signal)]: [AbortSignal],
      [Symbol(headers)]: [HeadersList]
    },
    data: { error: [Object] }
  },
  isAxiosError: true,
  toJSON: [Function: toJSON]
}
c:\Users\Jack Ma\Desktop\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:51
    throw new Error('Failed to ingest your data');
          ^


[Error: Failed to ingest your data]

with a piece of the error somewhere above stating:

response: {
    ok: false,
    status: 429,
    statusText: 'Too Many Requests',
    headers: HeadersList {
      [Symbol(headers map)]: [Map],
      [Symbol(headers map sorted)]: null
    },

My troubleshooting steps are listed below:

Make sure you're running the latest Node version. Run node -v

  • Node.js v18.12.1

Make sure you're using the same versions of LangChain and Pinecone as this repo.

  • I never installed these packages before, and I only ran npm install after forking this repo, so they should be the same (?)

Check that you've created an .env file that contains your valid (and working) API keys.

  • Yes

If you change modelName in OpenAIChat note that the correct name of the alternative model is gpt-3.5-turbo

  • Yes

Pinecone indexes of users on the Starter(free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter.

  • I have just created the index, so it shouldn't have been deleted.

Make sure your pinecone dashboard environment and index matches the one in your config folder.

  • yes

Check that you've set the vector dimensions to 1536.

  • yes

Switch your Environment in pinecone to us-east1-gcp if the other environment is causing issues.

  • Currently I cannot change the environment.

If you're stuck after trying all these steps, delete node_modules, restart your computer, then pnpm install again.

  • yes
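Since the 429 comes from OpenAI's embeddings rate limit, retrying with exponential backoff (or simply slowing down ingestion) usually resolves it. A generic sketch - fn stands in for the real embeddings call:

```typescript
// Retry an async call with exponential backoff: wait baseMs, then
// 2*baseMs, 4*baseMs, ... between attempts, rethrowing the last error.
async function withBackoff<T>(
  fn: () => Promise<T>,
  attempts = 5,
  baseMs = 1000,
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i >= attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
    }
  }
}
```

Ingesting fewer chunks per batch also reduces how often the limit is hit in the first place.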

When running "npm run ingest" there is a "network error" - how to solve?

The problem I'm having now: whenever the program reaches "creating vector store...", after a minute or so there is a network error indicating the extraction failed. The screenshot is as follows. I use GPT-3 in the configuration file; the Pinecone and OpenAI settings are configured; the index name is the same as the one set in Pinecone; the dimension is set to 1536; and the index is in the ready state in the console. The OpenAI API parameters are also configured. To avoid GPT-4's access limitations, I also changed the model in the config to GPT-3 (following the format from the tips). It's not clear what the problem is now. The ChatGPT web version can be accessed normally at any time, and the PDF I chose is only 20 pages of English material. From the console logs, the content can be parsed, but it gets stuck at the extraction step. Please help take a look, thank you!

npm run ingest

Hi! I keep getting this message when I run "npm run ingest".
I have changed PINECONE_INDEX_NAME=xxxx in the .env file to my index name in Pinecone, but it does not work.
Do I need premium? Thank you!!

C:\Users\aref\Desktop\projects\gpt4-pdf-chatbot-langchain>npm run ingest

[email protected] ingest
tsx -r dotenv/config scripts/ingest-data.ts

c:\Users\aref\Desktop\projects\gpt4-pdf-chatbot-langchain\config\pinecone.ts:6
throw new Error('Missing Pinecone index name in .env file');
^

[Error: Missing Pinecone index name in .env file]

Node.js v18.15.0
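The thrown message comes from config/pinecone.ts, which fails fast when the variable is absent. Note that dotenv only reads a file named exactly .env in the project root, so a misnamed file (or unsaved edits) leaves process.env empty. A small checker in the same spirit:

```typescript
// Fail fast with a clear message when a required variable is missing
// from the environment.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (!value) throw new Error(`Missing ${name} in .env file`);
  return value;
}
```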

Total cost ($) for the 56-page PDF document vs. 1 query

Hello,

Could you give us an idea of the total cost for the 56-page document given 1 query:

  • creating the embedding (a one time step)
  • storing the embeddings in Pinecone
  • matching a query of 250 tokens vs. the embedding: costs of ADA, and costs of the query to Pinecone
  • the first query to gpt4: chat history + the query
  • the second query to gpt4: standalone question + relevant documents

It seems like a lot of queries; it would be very helpful to have an idea of these costs.

Btw, thank you for this tutorial!
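A rough back-of-envelope, using early-2023 list prices (ada-002 embeddings at $0.0004 per 1K tokens; 8K-context GPT-4 at $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens - verify against OpenAI's current pricing page; Pinecone's free tier adds no per-query cost). The token counts in the example are assumptions:

```typescript
// $ per 1K tokens, early-2023 list prices (assumptions; check the
// current pricing page before relying on these numbers).
const PRICE = { embed: 0.0004, gpt4Prompt: 0.03, gpt4Completion: 0.06 };

// Cost of one answer: embed the question, then one GPT-4 call with the
// question plus retrieved context. (With chat history, the standalone-
// question rewrite adds a second, smaller GPT-4 charge.)
function queryCost(queryTokens: number, contextTokens: number, answerTokens: number): number {
  const embedCost = (queryTokens / 1000) * PRICE.embed;
  const promptCost = ((queryTokens + contextTokens) / 1000) * PRICE.gpt4Prompt;
  const completionCost = (answerTokens / 1000) * PRICE.gpt4Completion;
  return embedCost + promptCost + completionCost;
}

// e.g. a 250-token question, ~1,500 tokens of retrieved context, and a
// 300-token answer come to roughly $0.07 per query.
console.log(queryCost(250, 1500, 300).toFixed(4));
```

The one-time embedding of a 56-page PDF (on the order of 30K tokens) would cost roughly $0.01 at the same embedding price; the recurring per-query cost is dominated by the GPT-4 calls.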

stream.getReader

error [Error: PineconeClient: Error calling upsert: TypeError: stream.getReader is not a function]
[Error: Failed to ingest your data]

Not sure where this issue is coming from - it comes after running ingest. Any advice?

[Error: Failed to ingest your data]

creating vector store...
error [Error: PineconeClient: Project name not set. Call init() first.]
g:\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:51
throw new Error('Failed to ingest your data');
^

[Error: Failed to ingest your data]

404 error

wait - compiling /api/chat (client and server)... event - compiled successfully in 45 ms (75 modules) us-central1-gcp error [Error: Request failed with status code 404] { config: { transitional: { silentJSONParsing: true, forcedJSONParsing: true, clarifyTimeoutError: false }, adapter: [Function: httpAdapter], transformRequest: [ [Function: transformRequest] ], transformResponse: [ [Function: transformResponse] ], timeout: 0, xsrfCookieName: 'XSRF-TOKEN', xsrfHeaderName: 'X-XSRF-TOKEN', maxContentLength: -1, maxBodyLength: -1, validateStatus: [Function: validateStatus], headers: { Accept: 'application/json, text/plain, */*', 'Content-Type': 'application/json', 'User-Agent': 'OpenAI/NodeJS/3.2.1', Authorization: 'Bearer sk-dTyvu35Xwgzxxxxxxxxxxxxxxxxxxxxxxxxxxx', 'Content-Length': 2330 }, method: 'post', responseType: 'stream', data:{"model":"gpt-3.5","temperature":0,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":true,"messages":[{"role":"user","content":"You are an AI assistant providing helpful advice. You are given the following extracted parts of a long document and a question. Provide a conversational answer based on the context provided.\nYou should only provide hyperlinks that reference the context below. Do NOT make up hyperlinks.\nIf you can't find the answer in the context below, just say \"Hmm, I'm not sure.\" Don't try to make up an answer.\nIf the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.\n\nQuestion: what is the case about?\n=========\nhouse. Yet no one wishes to substitute courts for school \nboards, or to turn the judge’s chambers into the principal’s \noffice. \nIn order to avoid resolving the fractious underlying consti­\ntutional question, we need only decide a different question \nthat this case presents, the question of “qualified immunity.” \nSee Pet. for Cert. 23–28. 
The principle of qualified immu­\nnity fits this case perfectly and, by saying so, we would di­\nminish the risk of bringing about the adverse consequences \nI have identified. More importantly, we should also adhere \nto a basic constitutional obligation by avoiding unnecessary \ndecision of constitutional questions. See Ashwander v.\n\nhouse. Yet no one wishes to substitute courts for school \nboards, or to turn the judge’s chambers into the principal’s \noffice. \nIn order to avoid resolving the fractious underlying consti­\ntutional question, we need only decide a different question \nthat this case presents, the question of “qualified immunity.” \nSee Pet. for Cert. 23–28. The principle of qualified immu­\nnity fits this case perfectly and, by saying so, we would di­\nminish the risk of bringing about the adverse consequences \nI have identified. More importantly, we should also adhere \nto a basic constitutional obligation by avoiding unnecessary \ndecision of constitutional questions. 
See Ashwander v.\n=========\nAnswer in Markdown:"}]}, url: 'https://api.openai.com/v1/chat/completions' }, request: <ref *1> ClientRequest { _events: [Object: null prototype] { abort: [Function (anonymous)], aborted: [Function (anonymous)], connect: [Function (anonymous)], error: [Function (anonymous)], socket: [Function (anonymous)], timeout: [Function (anonymous)], finish: [Function: requestOnFinish] }, _eventsCount: 7, _maxListeners: undefined, outputData: [], outputSize: 0, writable: true, destroyed: false, _last: true, chunkedEncoding: false, shouldKeepAlive: false, maxRequestsOnConnectionReached: false, _defaultKeepAlive: true, useChunkedEncodingByDefault: true, sendDate: false, _removedConnection: false, _removedContLen: false, _removedTE: false, strictContentLength: false, _contentLength: 2330, _hasBody: true, _trailer: '', finished: true, _headerSent: true, _closed: false, socket: TLSSocket { _tlsOptions: [Object], _secureEstablished: true, _securePending: false, _newSessionPending: false, _controlReleased: true, secureConnecting: false, _SNICallback: null, servername: 'api.openai.com', alpnProtocol: false, authorized: true, authorizationError: null, encrypted: true, _events: [Object: null prototype], _eventsCount: 9, connecting: false, _hadError: false, _parent: null, _host: 'api.openai.com', _closeAfterHandlingError: false, _readableState: [ReadableState], _maxListeners: undefined, _writableState: [WritableState], allowHalfOpen: false, _sockname: null, _pendingData: null, _pendingEncoding: '', server: undefined, _server: null, ssl: [TLSWrap], _requestCert: true, _rejectUnauthorized: true, parser: null, _httpMessage: [Circular *1], [Symbol(res)]: [TLSWrap], [Symbol(verified)]: true, [Symbol(pendingSession)]: null, [Symbol(async_id_symbol)]: 22735, [Symbol(kHandle)]: [TLSWrap], [Symbol(lastWriteQueueSize)]: 0, [Symbol(timeout)]: null, [Symbol(kBuffer)]: null, [Symbol(kBufferCb)]: null, [Symbol(kBufferGen)]: null, [Symbol(kCapture)]: false, 
[Symbol(kSetNoDelay)]: false, [Symbol(kSetKeepAlive)]: true, [Symbol(kSetKeepAliveInitialDelay)]: 60, [Symbol(kBytesRead)]: 0, [Symbol(kBytesWritten)]: 0, [Symbol(connect-options)]: [Object] }, _header: 'POST /v1/chat/completions HTTP/1.1\r\n' + 'Accept: application/json, text/plain, */*\r\n' + 'Content-Type: application/json\r\n' + 'User-Agent: OpenAI/NodeJS/3.2.1\r\n' + 'Authorization: Bearer sk-dTyvu35Xwgxxxxxxxxxxxxxxxxxxxxxxxx\r\n' + 'Content-Length: 2330\r\n' + 'Host: api.openai.com\r\n' + 'Connection: close\r\n' + '\r\n', _keepAliveTimeout: 0, _onPendingData: [Function: nop], agent: Agent { _events: [Object: null prototype], _eventsCount: 2, _maxListeners: undefined, defaultPort: 443, protocol: 'https:', options: [Object: null prototype], requests: [Object: null prototype] {}, sockets: [Object: null prototype], freeSockets: [Object: null prototype] {}, keepAliveMsecs: 1000, keepAlive: false, maxSockets: Infinity, maxFreeSockets: 256, scheduling: 'lifo', maxTotalSockets: Infinity, totalSocketCount: 1, maxCachedSessions: 100, _sessionCache: [Object], [Symbol(kCapture)]: false }, socketPath: undefined, method: 'POST', maxHeaderSize: undefined, insecureHTTPParser: undefined, joinDuplicateHeaders: undefined, path: '/v1/chat/completions', _ended: false, res: IncomingMessage { _readableState: [ReadableState], _events: [Object: null prototype], _eventsCount: 1, _maxListeners: undefined, socket: [TLSSocket], httpVersionMajor: 1, httpVersionMinor: 1, httpVersion: '1.1', complete: true, rawHeaders: [Array], rawTrailers: [], joinDuplicateHeaders: undefined, aborted: false, upgrade: false, url: '', method: null, statusCode: 404, statusMessage: 'Not Found', client: [TLSSocket], _consuming: false, _dumped: false, req: [Circular *1], responseUrl: 'https://api.openai.com/v1/chat/completions', redirects: [], [Symbol(kCapture)]: false, [Symbol(kHeaders)]: [Object], [Symbol(kHeadersCount)]: 14, [Symbol(kTrailers)]: null, [Symbol(kTrailersCount)]: 0 }, aborted: false, timeoutCb: 
null, upgradeOrConnect: false, parser: null, maxHeadersCount: null, reusedSocket: false, host: 'api.openai.com', protocol: 'https:', _redirectable: Writable { _writableState: [WritableState], _events: [Object: null prototype], _eventsCount: 3, _maxListeners: undefined, _options: [Object], _ended: true, _ending: true, _redirectCount: 0, _redirects: [], _requestBodyLength: 2330, _requestBodyBuffers: [], _onNativeResponse: [Function (anonymous)], _currentRequest: [Circular *1], _currentUrl: 'https://api.openai.com/v1/chat/completions', [Symbol(kCapture)]: false }, [Symbol(kCapture)]: false, [Symbol(kBytesWritten)]: 0, [Symbol(kEndCalled)]: true, [Symbol(kNeedDrain)]: false, [Symbol(corked)]: 0, [Symbol(kOutHeaders)]: [Object: null prototype] { accept: [Array], 'content-type': [Array], 'user-agent': [Array], authorization: [Array], 'content-length': [Array], host: [Array] }, [Symbol(errored)]: null, [Symbol(kUniqueHeaders)]: null }, response: { status: 404, statusText: 'Not Found', headers: { date: 'Mon, 20 Mar 2023 13:02:59 GMT', 'content-type': 'application/json; charset=utf-8', 'content-length': '167', connection: 'close', vary: 'Origin', 'x-request-id': 'a0e919fb2bxxxxxxxxxxxxxxxx', 'strict-transport-security': 'max-age=15724800; includeSubDomains' }, config: { transitional: [Object], adapter: [Function: httpAdapter], transformRequest: [Array], transformResponse: [Array], timeout: 0, xsrfCookieName: 'XSRF-TOKEN', xsrfHeaderName: 'X-XSRF-TOKEN', maxContentLength: -1, maxBodyLength: -1, validateStatus: [Function: validateStatus], headers: [Object], method: 'post', responseType: 'stream', data: {"model":"gpt-3.5","temperature":0,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":true,"messages":[{"role":"user","content":"You are an AI assistant providing helpful advice. You are given the following extracted parts of a long document and a question. 
Provide a conversational answer based on the context provided.\nYou should only provide hyperlinks that reference the context below. Do NOT make up hyperlinks.\nIf you can't find the answer in the context below, just say \"Hmm, I'm not sure.\" Don't try to make up an answer.\nIf the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.\n\nQuestion: what is the case about?\n=========\nhouse. Yet no one wishes to substitute courts for school \nboards, or to turn the judge’s chambers into the principal’s \noffice. \nIn order to avoid resolving the fractious underlying consti­\ntutional question, we need only decide a different question \nthat this case presents, the question of “qualified immunity.” \nSee Pet. for Cert. 23–28. The principle of qualified immu­\nnity fits this case perfectly and, by saying so, we would di­\nminish the risk of bringing about the adverse consequences \nI have identified. More importantly, we should also adhere \nto a basic constitutional obligation by avoiding unnecessary \ndecision of constitutional questions. See Ashwander v.\n\nhouse. Yet no one wishes to substitute courts for school \nboards, or to turn the judge’s chambers into the principal’s \noffice. \nIn order to avoid resolving the fractious underlying consti­\ntutional question, we need only decide a different question \nthat this case presents, the question of “qualified immunity.” \nSee Pet. for Cert. 23–28. The principle of qualified immu­\nnity fits this case perfectly and, by saying so, we would di­\nminish the risk of bringing about the adverse consequences \nI have identified. More importantly, we should also adhere \nto a basic constitutional obligation by avoiding unnecessary \ndecision of constitutional questions. See Ashwander v.\n=========\nAnswer in Markdown:"}]}`,
url: 'https://api.openai.com/v1/chat/completions'
},
request: <ref *1> ClientRequest {
_events: [Object: null prototype],
_eventsCount: 7,
_maxListeners: undefined,
outputData: [],
outputSize: 0,
writable: true,
destroyed: false,
_last: true,
chunkedEncoding: false,
shouldKeepAlive: false,
maxRequestsOnConnectionReached: false,
_defaultKeepAlive: true,
useChunkedEncodingByDefault: true,
sendDate: false,
_removedConnection: false,
_removedContLen: false,
_removedTE: false,
strictContentLength: false,
_contentLength: 2330,
_hasBody: true,
_trailer: '',
finished: true,
_headerSent: true,
_closed: false,
socket: [TLSSocket],
_header: 'POST /v1/chat/completions HTTP/1.1\r\n' +
      'Accept: application/json, text/plain, */*\r\n' +
'Content-Type: application/json\r\n' +
'User-Agent: OpenAI/NodeJS/3.2.1\r\n' +
      'Authorization: Bearer sk-dxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\r\n' +
'Content-Length: 2330\r\n' +
'Host: api.openai.com\r\n' +
'Connection: close\r\n' +
'\r\n',
_keepAliveTimeout: 0,
_onPendingData: [Function: nop],
agent: [Agent],
socketPath: undefined,
method: 'POST',
maxHeaderSize: undefined,
insecureHTTPParser: undefined,
joinDuplicateHeaders: undefined,
path: '/v1/chat/completions',
_ended: false,
res: [IncomingMessage],
aborted: false,
timeoutCb: null,
upgradeOrConnect: false,
parser: null,
maxHeadersCount: null,
reusedSocket: false,
host: 'api.openai.com',
protocol: 'https:',
_redirectable: [Writable],
[Symbol(kCapture)]: false,
[Symbol(kBytesWritten)]: 0,
[Symbol(kEndCalled)]: true,
[Symbol(kNeedDrain)]: false,
[Symbol(corked)]: 0,
[Symbol(kOutHeaders)]: [Object: null prototype],
[Symbol(errored)]: null,
[Symbol(kUniqueHeaders)]: null
},
data: IncomingMessage {
_readableState: [ReadableState],
_events: [Object: null prototype],
_eventsCount: 1,
_maxListeners: undefined,
socket: [TLSSocket],
httpVersionMajor: 1,
httpVersionMinor: 1,
httpVersion: '1.1',
complete: true,
rawHeaders: [Array],
rawTrailers: [],
joinDuplicateHeaders: undefined,
aborted: false,
upgrade: false,
url: '',
method: null,
statusCode: 404,
statusMessage: 'Not Found',
client: [TLSSocket],
_consuming: false,
_dumped: false,
req: [ClientRequest],
responseUrl: 'https://api.openai.com/v1/chat/completions',
redirects: [],
[Symbol(kCapture)]: false,
[Symbol(kHeaders)]: [Object],
[Symbol(kHeadersCount)]: 14,
[Symbol(kTrailers)]: null,
[Symbol(kTrailersCount)]: 0
}
},
isAxiosError: true,
toJSON: [Function: toJSON]
}

`
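For what it's worth, the request body captured in the log above sends `"model":"gpt-3.5"`, which is not a valid OpenAI model id — the chat completions endpoint returns `404 Not Found` for unknown models. Switching to a real chat model id such as `gpt-3.5-turbo` should clear the 404 (this is a guess from the log, not a confirmed fix). The corrected request body would look roughly like this, with the other field values taken from the log:

```json
{
  "model": "gpt-3.5-turbo",
  "temperature": 0,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "n": 1,
  "stream": true,
  "messages": [
    { "role": "user", "content": "...prompt with context and question..." }
  ]
}
```

If the model name is set via the LangChain config, updating the model name passed to the OpenAI wrapper in one place should propagate to this request.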
