dosubot commented on May 28, 2024

🤖

Hey @Darrenf040, great to see you back! Hope you're doing well.

The error "Cannot read properties of undefined (reading 'text')" typically occurs when the code is trying to access a property of an object that hasn't been defined. In the context of the gpt4-pdf-chatbot-langchain, this could be due to several reasons:

  1. The OpenAI or Pinecone API keys are not correctly set in the .env file. Make sure you have correctly copied your API keys into the .env file and that they are valid.

  2. The PINECONE_NAME_SPACE in the config folder might not have been correctly set. Ensure that you have replaced it with a namespace where you'd like to store your embeddings on Pinecone.

  3. The modelName passed to new OpenAI in utils/makechain.ts might be set to gpt-4 without your account having access to that model. Either confirm that you have access to the gpt-4 API, or change modelName to a model you can use.

  4. The PDF files might not have been converted to embeddings correctly. Make sure you have added your PDF files to the docs folder and run yarn run ingest to ingest and embed them. Also check your Pinecone dashboard to verify that your namespace and vectors have been added.

  5. Your LangChain and Pinecone package versions might differ from the ones this repo uses. Make sure you are using the same versions as listed in the repo's package.json.

  6. There might be an issue with your Node version. Make sure you are running the latest Node version.

If you have checked all these points and the error persists, console.log your environment variables to make sure they are actually loaded, and try temporarily hard-coding your API keys in place of the process.env values. If that still doesn't help, retry from scratch with a new Pinecone project, a new index, and a freshly cloned repo. See README.md for more details.
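Instead of logging raw secrets, a small pre-flight check can report which variables are set. This is a minimal sketch: findMissingEnv and assertEnv are hypothetical helpers, and the variable names in REQUIRED_ENV_VARS are an assumption based on this repo's .env example, so adjust the list to match your own .env and config files.

```typescript
// ASSUMPTION: variable names follow this repo's .env example; edit to match yours.
const REQUIRED_ENV_VARS: readonly string[] = [
  'OPENAI_API_KEY',
  'PINECONE_API_KEY',
  'PINECONE_ENVIRONMENT',
  'PINECONE_INDEX_NAME',
];

// Hypothetical helper: returns the names of required variables that are unset or blank.
function findMissingEnv(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => (env[name] ?? '').trim() === '');
}

// Hypothetical helper: logs whether each variable is set (never the secret value)
// and throws if any are missing.
function assertEnv(env: Record<string, string | undefined> = process.env): void {
  for (const name of REQUIRED_ENV_VARS) {
    console.log(`${name}: ${(env[name] ?? '').trim() ? 'set' : 'MISSING'}`);
  }
  const missing = findMissingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}
```

Calling assertEnv() at the top of the ingest script, after dotenv has loaded .env, would surface a missing key before any OpenAI or Pinecone call is made.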

As for where a 'text' property is read: one place is the 'parsed' object in the PDF loader. 'parsed' is the result of calling the 'pdf' function (imported from the 'pdf-parse/lib/pdf-parse.js' module) on the raw file buffer. Its 'text' property holds the extracted content of the PDF document, and its 'numpages' property holds the number of pages. Here is the relevant code:

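import { Document } from 'langchain/document';

// Simplified from utils/customPDFLoader.ts (the real loader wraps this in a class method)
async function parse(raw: Buffer, metadata: Record<string, unknown>): Promise<Document[]> {
  // The loader imports pdf-parse's lib file directly rather than the package's main entry point
  const { default: pdf } = await import('pdf-parse/lib/pdf-parse.js');
  const parsed = await pdf(raw); // parsed.text: extracted text, parsed.numpages: page count
  return [
    new Document({
      pageContent: parsed.text,
      metadata: {
        ...metadata,
        pdf_numpages: parsed.numpages,
      },
    }),
  ];
}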

You can find this in the utils/customPDFLoader.ts file.
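To rule the loader in or out, the parse result could be validated before it is turned into a Document, so a bad file fails with its name instead of passing undefined text downstream. This is a sketch only: ParsedPdf is a minimal local type covering just the two fields the loader reads, and extractPdfText is a hypothetical helper, not part of the repo.

```typescript
// Minimal shape of the pdf-parse result fields the loader reads.
interface ParsedPdf {
  text?: string;
  numpages?: number;
}

// Hypothetical guard: throw with the file name if no text came back.
function extractPdfText(
  parsed: ParsedPdf | undefined,
  source: string,
): { text: string; numpages: number } {
  if (!parsed || typeof parsed.text !== 'string') {
    throw new Error(`pdf-parse returned no text for ${source}`);
  }
  if (parsed.text.trim() === '') {
    // Scanned or image-only PDFs often have no text layer to extract.
    console.warn(`${source}: no extractable text`);
  }
  return { text: parsed.text, numpages: parsed.numpages ?? 0 };
}
```

In the loader, something like const { text, numpages } = extractPdfText(parsed, source) would then feed pageContent and pdf_numpages, where source is whatever identifies the file in your metadata.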

I also found some similar issues in the repository that might help you:

  1. Error when trying to transition from GPT-4 to PaLM
  2. "TypeError: Cannot read properties of undefined (reading 'text')"
  3. Missing value for input

Please check these issues and see if the solutions provided there can help you resolve your problem.

from gpt4-pdf-chatbot-langchain.

Darrenf040 commented on May 28, 2024

How do I check my Pinecone and LangChain versions?
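From the terminal, npm ls langchain @pinecone-database/pinecone lists the installed versions (Yarn users can run yarn why langchain). To check from a script instead, the sketch below reads the version field from each installed package's package.json under node_modules. installedVersion is a hypothetical helper, and it assumes a standard node_modules layout (it will not work with Yarn Plug'n'Play).

```typescript
import { readFileSync } from 'node:fs';
import { join } from 'node:path';

// Hypothetical helper: reads the version actually installed under node_modules,
// which can differ from the range declared in your own package.json.
function installedVersion(pkgName: string, projectRoot: string = process.cwd()): string | undefined {
  try {
    const manifestPath = join(projectRoot, 'node_modules', pkgName, 'package.json');
    const manifest = JSON.parse(readFileSync(manifestPath, 'utf8')) as { version?: string };
    return manifest.version;
  } catch {
    return undefined; // not installed, or manifest unreadable
  }
}

for (const name of ['langchain', '@pinecone-database/pinecone']) {
  console.log(`${name}: ${installedVersion(name) ?? 'not installed'}`);
}
```

Run it from the project root (for example with npx tsx check-versions.ts, where the file name is arbitrary) and compare the output with the versions in this repo's package.json.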

Darrenf040 commented on May 28, 2024

Also, my model name is 'gpt-3.5-turbo', since I don't have GPT-4.

scenaristeur commented on May 28, 2024

Hi @Darrenf040, I have the same error. Look at the end of the error message:

/gpt4-pdf-chatbot-langchain/scripts/ingest-data.ts:52
    throw new Error('Failed to ingest your data');

For me, it points to line 52 of ingest-data.ts, which can help you find which data is not being found.
In my case, it was an issue with the Pinecone namespace or index:

    await PineconeStore.fromDocuments(docs, embeddings, {
      pineconeIndex: index,
      namespace: PINECONE_NAME_SPACE,
      textKey: 'text',
    });

richard523 commented on May 28, 2024

I believe there are now type errors when ingesting with the newest Pinecone types.

I think they now want you to convert the embeddings into vectors and upsert them the new way?

But again, this conflicts with the makechain script.

Here's the new upsert Pinecone wants you to use: "https://docs.pinecone.io/docs/upsert-data"

LMK if you made any progress on cleaning up the types. I'm also incredibly stuck!
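For reference, the upsert in Pinecone's newer Node client takes records shaped like { id, values, metadata }. The sketch below builds such records from chunk texts and their embeddings, storing each chunk's text under a metadata key (textKey, defaulting to "text") so that a LangChain Pinecone store configured with the same textKey could read it back at query time. VectorRecord and toRecords are hypothetical names, and VectorRecord is a local type that only mirrors the client's record shape; the sketch does not import the Pinecone SDK.

```typescript
// Local type mirroring the record shape used for Pinecone upserts.
interface VectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, string | number>;
}

// Hypothetical helper: pair each chunk's text with its embedding and store
// the raw text under `textKey` in the record's metadata.
function toRecords(
  texts: string[],
  embeddings: number[][],
  idPrefix: string,
  textKey = 'text',
): VectorRecord[] {
  if (texts.length !== embeddings.length) {
    throw new Error(`Got ${texts.length} texts but ${embeddings.length} embeddings`);
  }
  return texts.map((text, i) => ({
    id: `${idPrefix}-${i}`,
    values: embeddings[i],
    metadata: { [textKey]: text },
  }));
}
```

The records would then be upserted in batches (with the v1+ client, roughly index.namespace(PINECONE_NAME_SPACE).upsert(batch)); check the linked docs for the exact call and recommended batch size for your client version.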

ScottBlinman commented on May 28, 2024

Make sure you are using a pod-based Pinecone index. The serverless index doesn't work.

mowliv commented on May 28, 2024

I'm stuck on this as well. I have a forked repo with an extended feature set at https://github.com/anandaworldwide/ananda-library-chatbot and it is failing with the error "Cannot read properties of undefined (reading 'text')."

I tried upgrading to langchain 0.1.30 but that didn't help and caused other issues from breaking changes.

What is the textKey parameter (set to "text") here? ChatGPT suggested changing it to pageContent, which is a field in my document data, but on smaller datasets it finds the content using "text". I haven't found any API docs that explain it.

await PineconeStore.fromDocuments(docs, embeddings, {
  pineconeIndex: index,
  textKey: 'text',
});

richard-aoede commented on May 28, 2024

textKey is the actual text being stored as metadata within Pinecone, I believe.
Edit: It's the key of the text once it's stored in the database as metadata.
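Assuming LangChain.js's Pinecone store works this way, it writes each document's pageContent into metadata[textKey] at ingest time and reads it back from the same field at query time. textKey is therefore not a field of your own Document objects; it only has to match between ingest and query. The sketch below imitates just the read side with simplified types; QueryMatch, SimpleDocument, and matchToDocument are hypothetical names, not LangChain APIs.

```typescript
// Simplified shapes for a query match and a LangChain-style document.
interface QueryMatch {
  id: string;
  score?: number;
  metadata?: Record<string, unknown>;
}

interface SimpleDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical read-side helper: take the chunk text from metadata[textKey]
// and keep the remaining metadata fields as document metadata.
function matchToDocument(match: QueryMatch, textKey = 'text'): SimpleDocument {
  const metadata: Record<string, unknown> = match.metadata ?? {};
  const { [textKey]: text, ...rest } = metadata;
  if (typeof text !== 'string') {
    throw new Error(`Match ${match.id} has no string metadata field "${textKey}"; was it ingested with a different textKey?`);
  }
  return { pageContent: text, metadata: rest };
}
```

Under this assumption, switching textKey to pageContent would only work if the vectors had also been stored with that key.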

I've given up on serverless Pinecone with this project because there are unresolved type errors between the Pinecone serverless docs and LangChain.js that I cannot figure out.

From what I've read about this project, pod-based storage is the way to go since serverless Pinecone is still experimental.

richard-aoede commented on May 28, 2024

I did successfully upsert PDFs to a serverless index using TypeScript, but when I tried to search, the makeChain function threw errors.

mowliv commented on May 28, 2024

I just figured out how to reproduce the error and how to fix it. I noticed that when I get the error, I sometimes also get a "JavaScript heap out of memory" error as a secondary exception. In the test I just ran, however, I only got the primary error. Either way, increasing Node's memory allocation solves it!

Here's the changed line in package.json that fixes it for me:
"ingest": "NODE_OPTIONS='--max-old-space-size=4096' tsx -r dotenv/config scripts/ingest-data.ts"
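To confirm the larger heap is actually in effect (for example, that NODE_OPTIONS reached the process tsx starts), a small script can print V8's heap limit using getHeapStatistics() from node:v8. heapLimitMb is a hypothetical helper name. With --max-old-space-size=4096 the reported value should be roughly 4096 MB or a little above, since the limit also covers young-generation space; treat the exact figure as version-dependent.

```typescript
import { getHeapStatistics } from 'node:v8';

// Returns V8's heap size limit for the current process, in MiB.
function heapLimitMb(): number {
  return Math.round(getHeapStatistics().heap_size_limit / (1024 * 1024));
}

// Run with the same NODE_OPTIONS as the ingest script to check the flag was applied.
console.log(`V8 heap limit: ${heapLimitMb()} MB`);
```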

Failure:
Failed to embed documents or store in Pinecone: TypeError: Cannot read properties of undefined (reading 'text')
    at <anonymous> (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/utils.js:44:57)
    at step (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/utils.js:33:23)
    at Object.next (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/utils.js:14:53)
    at <anonymous> (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/utils.js:8:71)
    at new Promise (<anonymous>)
    at __awaiter (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/utils.js:4:12)
    at extractMessage (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/utils.js:40:48)
    at <anonymous> (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/handling.js:66:70)
    at step (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/handling.js:33:23)
    at Object.next (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/handling.js:14:53)

And here's a failure that includes the JavaScript heap out of memory error:

Failed to embed documents or store in Pinecone: TypeError: Cannot read properties of undefined (reading 'text')
    at <anonymous> (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/utils.js:44:57)
    at step (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/utils.js:33:23)
    at Object.next (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/utils.js:14:53)
    at <anonymous> (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/utils.js:8:71)
    at new Promise (<anonymous>)
    at __awaiter (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/utils.js:4:12)
    at extractMessage (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/utils.js:40:48)
    at <anonymous> (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/handling.js:66:70)
    at step (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/handling.js:33:23)
    at Object.next (/Users/Michael/Documents/development/gpt4-pdf-chatbot-langchain-ananda-lib/node_modules/@pinecone-database/pinecone/dist/errors/handling.js:14:53)

<--- Last few GCs --->

[95569:0x7fb878008000]   614381 ms: Mark-Compact (reduce) 4065.8 (4122.3) -> 4065.8 (4122.3) MB, 75.23 / 0.00 ms  (average mu = 0.650, current mu = 0.302) allocation failure; scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0x102a931d2 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [/usr/local/Cellar/node/21.4.0/bin/node]
 2: 0x102c1dbfd v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/usr/local/Cellar/node/21.4.0/bin/node]
 3: 0x102c1db93 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/usr/local/Cellar/node/21.4.0/bin/node]
 4: 0x102db6a65 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/Cellar/node/21.4.0/bin/node]
 5: 0x102db59d4 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/Cellar/node/21.4.0/bin/node]
 6: 0x102dad49f v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/Cellar/node/21.4.0/bin/node]
 7: 0x102dadc95 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/Cellar/node/21.4.0/bin/node]
 8: 0x102d95ab2 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [/usr/local/Cellar/node/21.4.0/bin/node]
 9: 0x102d8ce42 v8::internal::MaybeHandle<v8::internal::SeqOneByteString> v8::internal::FactoryBase<v8::internal::Factory>::NewRawStringWithMap<v8::internal::SeqOneByteString>(int, v8::internal::Tagged<v8::internal::Map>, v8::internal::AllocationType) [/usr/local/Cellar/node/21.4.0/bin/node]
10: 0x102d8cda7 v8::internal::FactoryBase<v8::internal::Factory>::NewStringFromOneByte(v8::base::Vector<unsigned char const>, v8::internal::AllocationType) [/usr/local/Cellar/node/21.4.0/bin/node]
11: 0x102e91a13 v8::internal::JsonStringifier::Stringify(v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>) [/usr/local/Cellar/node/21.4.0/bin/node]
12: 0x102e91893 v8::internal::JsonStringify(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>) [/usr/local/Cellar/node/21.4.0/bin/node]
13: 0x102c89128 v8::internal::Builtin_JsonStringify(int, unsigned long*, v8::internal::Isolate*) [/usr/local/Cellar/node/21.4.0/bin/node]
14: 0x102883c76 Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit [/usr/local/Cellar/node/21.4.0/bin/node]
✨  Done in 614.90s.

mowliv commented on May 28, 2024

I found that the fix above sometimes still failed. So I upgraded langchain to 0.1.30 (and had to adapt the code a bit). That didn't do it, so I also upgraded @pinecone-database/pinecone to 1.1.3, and now it seems to work. It was never a problem when I processed only 4000 PDF files; the problem only came up when I processed my full set of 6000. So I'm guessing there was a memory leak in Pinecone that was fixed in a later version. (Though I'm still verifying things: the vector count is half of what it was before, so perhaps I'm not processing as much as I think.)
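A different approach from the ones tried in this thread, besides raising the heap limit or upgrading, is to ingest in batches so that only one batch of documents and embeddings is held in memory at a time. The generic helper below is a hypothetical sketch; ingestInBatches, the batch size, and how each batch is loaded and stored are all up to you.

```typescript
// Hypothetical batched-processing loop: handles items in fixed-size groups,
// awaiting each group before starting the next, so work never overlaps.
async function ingestInBatches<T>(
  items: T[],
  batchSize: number,
  processBatch: (batch: T[], batchIndex: number) => Promise<void>,
): Promise<void> {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error('batchSize must be a positive integer');
  }
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    await processBatch(batch, start / batchSize);
  }
}
```

For example, one could first list the PDF file paths, then inside processBatch load and split only that batch's files and call PineconeStore.fromDocuments on the resulting docs, rather than loading all 6000 files before embedding.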