Hi, thanks for the feedback.
Based on what you're saying, the translation works well, but not for the page you want? What language is this?
I will add this page-number feature as a PR soon.
from gpt4-pdf-chatbot-langchain.
It's a Japanese light novel. You can try it here: https://ufile.io/10rqqw7j
Basically, I feed the PDF into the chatbot and set up the prompt like this: "You are an AI assistant providing translation service of the document. You are given the following extracted parts of a long document and a question".
Then I asked, "Can you translate page 1 of the PDF into English?" The bot translates some random page from the PDF instead. If you ask the chatbot to translate the whole PDF into English, that doesn't work either.
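A minimal sketch of why this can happen, assuming a standard retrieval-QA setup like the one the prompt describes: only the few chunks returned by the vector search are pasted into the prompt, so the model never sees the whole PDF and cannot tell which chunk is "page 1". `PromptChunk` and `buildPrompt` are hypothetical names for illustration, not code from gpt4-pdf-chatbot-langchain.

```typescript
// Hypothetical chunk shape: just the text that the vector search returned.
interface PromptChunk {
  text: string;
}

const SYSTEM_PROMPT =
  "You are an AI assistant providing translation service of the document. " +
  "You are given the following extracted parts of a long document and a question.";

// Assemble the final prompt: system instructions, then ONLY the retrieved
// chunks (not the full PDF), then the user's question.
function buildPrompt(retrieved: PromptChunk[], question: string): string {
  const context = retrieved.map((c) => c.text).join("\n\n");
  return `${SYSTEM_PROMPT}\n\n${context}\n\nQuestion: ${question}\nAnswer:`;
}

// Whatever chunks the similarity search happened to rank highest for
// "translate page 1" are all the model gets to work with.
const examplePrompt = buildPrompt(
  [{ text: "chunk A" }, { text: "chunk B" }],
  "Can you translate page 1 of the PDF into English?",
);
console.log(examplePrompt);
```

Because the question "translate page 1" carries almost no semantic content about the novel itself, the chunks it retrieves are effectively arbitrary, and a whole-PDF translation request fails for the same reason: the full text is never in the prompt.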
Generally, OpenAI's embeddings aren't great for multilingual text.
If you ask it to translate from English to Japanese, how is the performance?
You can use the court case PDF you provided to test my prompt:
"You are an AI assistant providing translation service of the document. You are given the following extracted parts of a long document and a question"
Using your PDF as an example, even if you specifically ask GPT to translate page 1 of the PDF, it will still pick a random page from the PDF and translate it into whatever language you asked for. That's why I said it seems to have no concept of a page. I don't think this is a multilingual problem; the problem is getting GPT to understand page numbers so it can pinpoint exactly the page we refer to and use that page as the input for translation.
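One way to make "page 1" mean something is to not leave it to the model at all: detect an explicit page reference in the question and select that page's chunks by metadata, falling back to similarity search otherwise. This is a minimal sketch under the assumption that every chunk already stores the page it came from; `PageChunk`, `pageFromQuestion`, and `selectChunks` are hypothetical names, and `similaritySearch` stands in for whatever vector-store query the app uses.

```typescript
// Hypothetical chunk shape: assumes each chunk records its source page.
interface PageChunk {
  text: string;
  pageNumber: number;
}

// Extract an explicit page reference such as "page 1" or "Page 12".
function pageFromQuestion(question: string): number | null {
  const match = /\bpage\s+(\d+)\b/i.exec(question);
  return match ? Number(match[1]) : null;
}

// If the question names a page, return that page's chunks directly instead of
// relying on embedding similarity, which has no notion of page numbers.
function selectChunks(
  question: string,
  chunks: PageChunk[],
  similaritySearch: (q: string) => PageChunk[],
): PageChunk[] {
  const page = pageFromQuestion(question);
  if (page === null) {
    return similaritySearch(question);
  }
  return chunks.filter((c) => c.pageNumber === page);
}

const sampleChunks: PageChunk[] = [
  { text: "first page text", pageNumber: 1 },
  { text: "second page text", pageNumber: 2 },
];
// Stand-in for the real vector-store query; always returns page 2 here.
const fakeSearch = (_q: string): PageChunk[] => [sampleChunks[1]];

const picked = selectChunks("Can you translate page 1 of the PDF into English?", sampleChunks, fakeSearch);
console.log(picked.map((c) => c.pageNumber));
```

Note that "page 1" here is the PDF's page index, which may not match the page numbers printed in the novel itself.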
Perhaps you could add some debug output to the result so that we can see which page GPT is looking at when we ask the question.
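A minimal sketch of such debug output, assuming the chain is configured to return the retrieved source documents alongside the answer and that each one carries page metadata (character-count chunks on their own do not). `SourceChunk` and `describeSources` are hypothetical names.

```typescript
// Hypothetical shape of a retrieved source document; pageNumber is optional
// because chunks ingested without page metadata will not have it.
interface SourceChunk {
  text: string;
  metadata: { source: string; pageNumber?: number };
}

// Produce one log line per retrieved chunk: rank, file, page, and a short preview.
function describeSources(docs: SourceChunk[]): string[] {
  return docs.map((d, i) => {
    const where =
      d.metadata.pageNumber !== undefined ? `page ${d.metadata.pageNumber}` : "page unknown";
    const preview = d.text.slice(0, 40).replace(/\s+/g, " ");
    return `#${i + 1} ${d.metadata.source} (${where}): ${preview}`;
  });
}

const retrieved: SourceChunk[] = [
  { text: "第一章 ...", metadata: { source: "novel.pdf", pageNumber: 7 } },
  { text: "Some other chunk", metadata: { source: "novel.pdf" } },
];
const debugLines = describeSources(retrieved);
debugLines.forEach((line) => console.log(line));
```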
There is no concept of a page because the chunks are currently split by character count. I will open a PR later to split the PDF docs by page number.
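A minimal sketch of what page-based splitting could look like, assuming the PDF text has already been extracted as one string per page (many PDF loaders can produce per-page output). The types and `splitByPage` are hypothetical, not the repo's ingest script: each page is split on its own, so no chunk spans two pages, and every chunk is tagged with its page number. Long pages are still cut into overlapping character windows; `chunkSize = 1000` and `overlap = 200` are illustrative defaults.

```typescript
// Hypothetical input: one entry per PDF page.
interface PdfPage {
  pageNumber: number;
  text: string;
}

// Hypothetical output chunk with page metadata kept alongside the text.
interface PageSplitChunk {
  text: string;
  metadata: { pageNumber: number; chunkIndex: number };
}

// Split page by page, then into overlapping character windows within a page.
function splitByPage(pages: PdfPage[], chunkSize = 1000, overlap = 200): PageSplitChunk[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const out: PageSplitChunk[] = [];
  for (const page of pages) {
    let chunkIndex = 0;
    // Empty pages produce no chunks because the loop body never runs.
    for (let start = 0; start < page.text.length; start += chunkSize - overlap) {
      out.push({
        text: page.text.slice(start, start + chunkSize),
        metadata: { pageNumber: page.pageNumber, chunkIndex: chunkIndex++ },
      });
      // Stop once this window reached the end of the page.
      if (start + chunkSize >= page.text.length) break;
    }
  }
  return out;
}

const pageChunks = splitByPage([
  { pageNumber: 1, text: "a".repeat(1500) },
  { pageNumber: 2, text: "short page" },
]);
console.log(pageChunks.map((c) => c.metadata));
```

Storing `pageNumber` in each vector's metadata is what would later allow both the debug output and a page filter at query time.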