Comments (5)

mayooear commented on May 15, 2024

Hi, thanks for the feedback.

From what you're saying, the translation itself works well, just not for the page you want? What language is the document in?

I will add page-number support in a PR soon.

from gpt4-pdf-chatbot-langchain.

sdugoten commented on May 15, 2024

It's a Japanese light novel. You can try it here: https://ufile.io/10rqqw7j

Basically, I feed the PDF into the chatbot and set up the prompt like this: "You are an AI assistant providing translation service of the document. You are given the following extracted parts of a long document and a question"

Then I asked the question, "Can you translate page 1 of the PDF into English". The bot translates some random page of the PDF instead. If you ask the chatbot to translate the whole PDF into English, it won't work either.

mayooear commented on May 15, 2024

Generally, OpenAI's embeddings aren't great for multilingual text.

If you ask it to translate from English to Japanese, how is the performance?

sdugoten commented on May 15, 2024

> Generally openai's embeddings aren't great for multilingual.
>
> If you ask it to translate from English to Japanese how is the performance?

You can test this with the court case PDF you provided, using my prompt:

"You are an AI assistant providing translation service of the document. You are given the following extracted parts of a long document and a question"

Using your provided PDF as an example: even if you specifically ask GPT to translate page 1 of the PDF, it will still pick a random page and translate it into whatever language you asked for. That's why I said it doesn't seem to have the concept of a page. This doesn't look like a multilingual problem; it's about getting GPT to understand page numbers, so that it can pinpoint exactly the page we refer to and use that as the input for translation.

Perhaps you can add some debug output to the result, so that we can see which page GPT is looking at when we ask the question.
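Something like the following is what I have in mind. It is only a rough sketch in plain TypeScript and not the project's actual code: the `RetrievedChunk` shape, the `page` field, and the helper names `extractPageNumber`, `selectChunks`, and `describeSources` are all made up for illustration, and it assumes each chunk already carries the page it came from. It reads a page number out of the question, keeps only chunks from that page, and prints which page each chunk belongs to.

```typescript
// Hypothetical shape of a retrieved chunk. `page` is assumed to be stored
// as metadata when the PDF is ingested; the thread does not confirm this.
interface RetrievedChunk {
  text: string;
  page: number; // 1-based page number
}

// Pull an explicit page reference such as "page 1" out of the question.
// Returns null when the question does not mention a page.
function extractPageNumber(question: string): number | null {
  const match = /\bpage\s+(\d+)\b/i.exec(question);
  return match ? Number(match[1]) : null;
}

// Keep only chunks from the requested page. With no page in the question,
// fall back to whatever the retriever returned.
function selectChunks(question: string, chunks: RetrievedChunk[]): RetrievedChunk[] {
  const page = extractPageNumber(question);
  return page === null ? chunks : chunks.filter((c) => c.page === page);
}

// Debug helper: one line per chunk showing which page it came from.
function describeSources(chunks: RetrievedChunk[]): string[] {
  return chunks.map((c, i) => `chunk ${i + 1}: page ${c.page}, ${c.text.length} chars`);
}

// Example data standing in for what a retriever might return.
const retrieved: RetrievedChunk[] = [
  { text: "Text from the seventh page.", page: 7 },
  { text: "Text from the first page.", page: 1 },
];
const selected = selectChunks("Can you translate page 1 of the PDF into English", retrieved);
console.log(describeSources(selected).join("\n"));
```

With output like this next to each answer, it would be obvious whether the text being translated really came from the page that was asked for.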

mayooear commented on May 15, 2024

There is no concept of a page because the chunks are currently split by character count. I will add a PR later to split the PDF docs by page number.
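Roughly, the difference looks like this. This is a minimal sketch in plain TypeScript, not the splitter the project uses and not the eventual PR: `PageChunk`, `splitByPage`, and `maxChars` are hypothetical names, and it assumes the PDF text is already available as one string per page. Each page is split on its own, so no chunk spans two pages, and every chunk keeps its page number.

```typescript
// Hypothetical chunk shape: text plus the 1-based page it came from.
interface PageChunk {
  text: string;
  page: number;
}

// Split each page separately. Pages longer than `maxChars` are still cut
// by character count, but every piece keeps the number of its page.
function splitByPage(pages: string[], maxChars: number): PageChunk[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const chunks: PageChunk[] = [];
  pages.forEach((pageText, index) => {
    for (let start = 0; start < pageText.length; start += maxChars) {
      chunks.push({ text: pageText.slice(start, start + maxChars), page: index + 1 });
    }
  });
  return chunks;
}

// Example input: one string per page (assumed to come from a PDF parser).
const pages = ["First page text.", "Second page, which is a little longer."];
const chunks = splitByPage(pages, 20);
for (const c of chunks) {
  console.log(`page ${c.page}: ${JSON.stringify(c.text)}`);
}
```

Cutting at a fixed character offset can still break a sentence in the middle of a page, so a real version would probably want a smarter splitter inside each page.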
