infiniflow / ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Home Page: https://ragflow.io
License: Apache License 2.0
Collecting accelerate==0.27.2 (from -r requirements.txt (line 1))
Downloading accelerate-0.27.2-py3-none-any.whl (279 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 280.0/280.0 kB 6.4 MB/s eta 0:00:00
Requirement already satisfied: aiohttp==3.9.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 2)) (3.9.3)
Requirement already satisfied: aiosignal==1.3.1 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 3)) (1.3.1)
Requirement already satisfied: annotated-types==0.6.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 4)) (0.6.0)
Collecting anyio==4.3.0 (from -r requirements.txt (line 5))
Downloading anyio-4.3.0-py3-none-any.whl (85 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85.6/85.6 kB 13.8 MB/s eta 0:00:00
Requirement already satisfied: argon2-cffi==23.1.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 6)) (23.1.0)
Requirement already satisfied: argon2-cffi-bindings==21.2.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 7)) (21.2.0)
Collecting Aspose.Slides==24.2.0 (from -r requirements.txt (line 8))
Downloading Aspose.Slides-24.2.0-py3-none-manylinux1_x86_64.whl (88.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.7/88.7 MB 2.6 MB/s eta 0:00:00
Requirement already satisfied: attrs==23.2.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 9)) (23.2.0)
Collecting blinker==1.7.0 (from -r requirements.txt (line 10))
Downloading blinker-1.7.0-py3-none-any.whl (13 kB)
Collecting cachelib==0.12.0 (from -r requirements.txt (line 11))
Downloading cachelib-0.12.0-py3-none-any.whl (20 kB)
Requirement already satisfied: cachetools==5.3.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 12)) (5.3.3)
Requirement already satisfied: certifi==2024.2.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 13)) (2024.2.2)
Requirement already satisfied: cffi==1.16.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 14)) (1.16.0)
Requirement already satisfied: charset-normalizer==3.3.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 15)) (3.3.2)
Requirement already satisfied: click==8.1.7 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 16)) (8.1.7)
Collecting coloredlogs==15.0.1 (from -r requirements.txt (line 17))
Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 46.0/46.0 kB 7.2 MB/s eta 0:00:00
Requirement already satisfied: cryptography==42.0.5 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 18)) (42.0.5)
Collecting dashscope==1.14.1 (from -r requirements.txt (line 19))
Downloading dashscope-1.14.1-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 86.1 MB/s eta 0:00:00
Collecting datasets==2.17.1 (from -r requirements.txt (line 20))
Downloading datasets-2.17.1-py3-none-any.whl (536 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.7/536.7 kB 57.4 MB/s eta 0:00:00
Collecting datrie==0.8.2 (from -r requirements.txt (line 21))
Downloading datrie-0.8.2.tar.gz (63 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.3/63.3 kB 9.7 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting demjson==2.2.4 (from -r requirements.txt (line 22))
Downloading demjson-2.2.4.tar.gz (131 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 131.5/131.5 kB 17.2 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
pip install -r requirements.txt
pip install -r requirements.txt should succeed.
1. pip install -r requirements.txt
2. Error happened again.
Collecting accelerate==0.27.2 (from -r requirements.txt (line 1))
Using cached accelerate-0.27.2-py3-none-any.whl (279 kB)
Requirement already satisfied: aiohttp==3.9.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 2)) (3.9.3)
Requirement already satisfied: aiosignal==1.3.1 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 3)) (1.3.1)
Requirement already satisfied: annotated-types==0.6.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 4)) (0.6.0)
Collecting anyio==4.3.0 (from -r requirements.txt (line 5))
Using cached anyio-4.3.0-py3-none-any.whl (85 kB)
Requirement already satisfied: argon2-cffi==23.1.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 6)) (23.1.0)
Requirement already satisfied: argon2-cffi-bindings==21.2.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 7)) (21.2.0)
Collecting Aspose.Slides==24.2.0 (from -r requirements.txt (line 8))
Using cached Aspose.Slides-24.2.0-py3-none-manylinux1_x86_64.whl (88.7 MB)
Requirement already satisfied: attrs==23.2.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 9)) (23.2.0)
Collecting blinker==1.7.0 (from -r requirements.txt (line 10))
Using cached blinker-1.7.0-py3-none-any.whl (13 kB)
Collecting cachelib==0.12.0 (from -r requirements.txt (line 11))
Using cached cachelib-0.12.0-py3-none-any.whl (20 kB)
Requirement already satisfied: cachetools==5.3.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 12)) (5.3.3)
Requirement already satisfied: certifi==2024.2.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 13)) (2024.2.2)
Requirement already satisfied: cffi==1.16.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 14)) (1.16.0)
Requirement already satisfied: charset-normalizer==3.3.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 15)) (3.3.2)
Requirement already satisfied: click==8.1.7 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 16)) (8.1.7)
Collecting coloredlogs==15.0.1 (from -r requirements.txt (line 17))
Using cached coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Requirement already satisfied: cryptography==42.0.5 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 18)) (42.0.5)
Collecting dashscope==1.14.1 (from -r requirements.txt (line 19))
Using cached dashscope-1.14.1-py3-none-any.whl (1.2 MB)
Collecting datasets==2.17.1 (from -r requirements.txt (line 20))
Using cached datasets-2.17.1-py3-none-any.whl (536 kB)
Collecting datrie==0.8.2 (from -r requirements.txt (line 21))
Using cached datrie-0.8.2.tar.gz (63 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting demjson==2.2.4 (from -r requirements.txt (line 22))
Using cached demjson-2.2.4.tar.gz (131 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
These warnings are printed continually.
The first time you start up the system with:
docker compose up
On the demo website, creating a knowledge base shows the hint: 102 Tenant not found.
change language
Import an empty Excel file into the knowledge base.
Documents stop processing after uploading a PDF on demo.ragflow.io
Upload a PDF on demo.ragflow.io
pymysql.err.ProgrammingError: (1146, "Table 'rag_flow.knowledgebase' doesn't exist")
I signed up on the demo site and uploaded pdf and docx files. They are both stuck at 0.62% for over 10 minutes now and not moving.
I would expect parsing to finish, I guess.
1. Create an account on https://demo.ragflow.io/
2. Upload a document
The command "docker compose -f docker-compose-CN.yml up -d" runs normally, but when I execute "docker logs -f ragflow-server", an exception occurs. Has anyone encountered a similar situation?
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/ragflow/rag/svr/task_broker.py", line 180, in <module>
dispatch()
File "/ragflow/rag/svr/task_broker.py", line 64, in dispatch
rows = collect(tm)
^^^^^^^^^^^
File "/ragflow/rag/svr/task_broker.py", line 38, in collect
docs = DocumentService.get_newly_uploaded(tm)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3128, in inner
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/ragflow/api/db/services/document_service.py", line 101, in get_newly_uploaded
return list(docs.dicts())
^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 7243, in __iter__
self.execute()
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 2011, in inner
return method(self, database, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 2082, in execute
return self._execute(database)
^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 2255, in _execute
cursor = database.execute(self)
^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3299, in execute
return self.execute_sql(sql, params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3289, in execute_sql
with exception_wrapper:
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3059, in __exit__
reraise(new_type, new_type(exc_value, *exc_args), traceback)
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 192, in reraise
raise value.with_traceback(tb)
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3291, in execute_sql
cursor.execute(sql, params or ())
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/cursors.py", line 153, in execute
result = self._query(query)
^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/cursors.py", line 322, in _query
conn.query(q)
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 558, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 822, in _read_query_result
result.read()
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 1200, in read
first_packet = self.connection._read_packet()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 772, in _read_packet
packet.raise_for_error()
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/protocol.py", line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
raise errorclass(errno, errval)
peewee.ProgrammingError: (1146, "Table 'rag_flow.document' doesn't exist")
[WARNING] Load term.freq FAIL!
[WARNING] Load term.freq FAIL!
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 114044.52it/s]
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 26564.91it/s]
Traceback (most recent call last):
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3291, in execute_sql
cursor.execute(sql, params or ())
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/cursors.py", line 153, in execute
result = self._query(query)
^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/cursors.py", line 322, in _query
conn.query(q)
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 558, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 822, in _read_query_result
result.read()
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 1200, in read
first_packet = self.connection._read_packet()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 772, in _read_packet
packet.raise_for_error()
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/protocol.py", line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.ProgrammingError: (1146, "Table 'rag_flow.task' doesn't exist")
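The "Table ... doesn't exist" errors typically mean the server began querying MySQL before the schema had been created. A minimal docker-compose sketch that delays the server until MySQL reports healthy (the service names ragflow and mysql are assumptions about the local compose file, not RAGFlow's actual configuration):

```yaml
# sketch only; adapt service names to your docker-compose file
services:
  ragflow:
    depends_on:
      mysql:
        condition: service_healthy  # wait until the MySQL healthcheck passes
```

If the tables are still missing after startup, restarting the ragflow-server container once MySQL is up usually gives table creation a chance to run.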
tail -f -n 100 logs/rag/es.log
Elasticsearch version: (8, 12, 1)
Fail to connect to es: Connection error caused by: ConnectionError(Connection error caused by: NameResolutionError(<urllib3.connection.HTTPConnection object at 0x7ffa7bbce100>: Failed to resolve 'es01' ([Errno -3] Temporary failure in name resolution)))
Fail to connect to es: Connection error caused by: ConnectionError(Connection error caused by: NameResolutionError(<urllib3.connection.HTTPConnection object at 0x7ffa7bbce790>: Failed to resolve 'es01' ([Errno -3] Temporary failure in name resolution)))
After running the docker
As the title says: is it possible to integrate OpenAI or other LLMs with our own custom configuration?
After deploying with the pre-built Docker images and starting up the RAGFlow server, I could successfully access the RAGFlow web page, but failed to parse any PDF file in the knowledge base.
All configurations follow the official configuration, except for the MinIO service ports in ./docker/docker-compose.yml:
minio:
  image: quay.io/minio/minio:RELEASE.2023-12-20T01-00-02Z
  container_name: ragflow-minio
  command: server --console-address ":9001" /data
  ports:
    - 19000:9000
    - 19011:9001
  environment:
    - MINIO_ROOT_USER=${MINIO_USER}
ERROR msg:
ES updateByQuery deleteByQuery: NotFoundError(404, 'index_not_found_exception', 'no such index [ragflow_0eea0066f16411eeadae0242ac150006]', ragflow_0eea0066f16411eeadae0242ac150006, index_or_alias)【Q】:{'match': {'doc_id': '9f5aac32f18611eeb9eb0242ac150006'}}
Fail put 55d6b0f0f16411ee90d40242ac150006/xxxxxx_my_test_file.pdf: S3 operation failed; code: NoSuchKey, message: Object does not exist, resource: /55d6b0f0f16411ee90d40242ac150006/26-Tesla%20Model%20X%E8%AF%8A%E6%96%AD%E5%AF%B9%E6%A0%87%E6%8A%A5%E5%91%8A20171013.pdf, request_id: 17C2B3E1E8FCF95A, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: 55d6b0f0f16411ee90d40242ac150006, object_name: -xxxxxx_my_test_file.pdf
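Note that remapping MinIO's host ports (19000:9000) does not change the address other containers use: over the compose network they still reach MinIO at the container port. If the server's storage endpoint was edited to the remapped host port, the NoSuchKey/NoSuchBucket failures above can follow. A hedged sketch, with the file name and keys assumed from a typical RAGFlow deployment:

```yaml
# docker/service_conf.yaml (assumed layout; values are the .env defaults)
minio:
  user: 'rag_flow'
  password: 'infini_rag_flow'
  host: 'minio:9000'  # keep the internal container port, not the remapped host port 19000
```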
This happens on both demo site and a local deployment instance.
on the page: https://demo.ragflow.io/knowledge/dataset?id=<...>
After adding a dataset and trying to add text chunks to it via the UI, the following error message is encountered.
The likely issue is that the field 'create_time' in the index ragflow_15b4f374f2e011eeae1b0242ac180006 is a text field, and operations like sorting or aggregating require field data; field data is disabled by default on text fields to optimize performance.
BadRequestError(
"search_phase_execution_exception",
meta=ApiResponseMeta(
status=400,
http_version="1.1",
headers={
"X-elastic-product": "Elasticsearch",
"content-type": "application/vnd.elasticsearch+json;compatible-with=8",
"content-length": "2231",
},
duration=0.0018017292022705078,
node=NodeConfig(
scheme="http",
host="es01",
port=9200,
path_prefix="",
headers={
"user-agent": "elasticsearch-py/8.12.1 (Python/3.11.0; elastic-transport/8.12.0)"
},
connections_per_node=10,
request_timeout=10.0,
http_compress=False,
verify_certs=True,
ca_certs=None,
client_cert=None,
client_key=None,
ssl_assert_hostname=None,
ssl_assert_fingerprint=None,
ssl_version=None,
ssl_context=None,
ssl_show_warn=True,
_extras={},
),
),
body={
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on [create_time] in [ragflow_15b4f374f2e011eeae1b0242ac180006]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [create_time] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": True,
"failed_shards": [
{
"shard": 0,
"index": "ragflow_15b4f374f2e011eeae1b0242ac180006",
"node": "90aM0LzhTSqdYA-X6yX5mg",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on [create_time] in [ragflow_15b4f374f2e011eeae1b0242ac180006]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [create_time] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
},
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on [create_time] in [ragflow_15b4f374f2e011eeae1b0242ac180006]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [create_time] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on [create_time] in [ragflow_15b4f374f2e011eeae1b0242ac180006]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [create_time] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
},
},
},
"status": 400,
},
)
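One hedged way to resolve a fielddata error like this is to map the offending field as a date (or keyword) type instead of text and reindex, rather than enabling fielddata on a text field (which Elasticsearch warns can use significant memory). The fragment below is an illustrative mapping under that assumption, not RAGFlow's actual index template:

```json
{
  "mappings": {
    "properties": {
      "create_time": { "type": "date" }
    }
  }
}
```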
Add a new dataset via the WebUI (successful)
Add a new chunk to the newly created dataset (error).
This happens on both official demo site and a local deployment testing environment.
I pulled the images successfully and ran docker compose -f docker-compose-CN.yml up -d.
[+] Running 6/8
⠿ Network docker_ragflow Created 0.1s
⠿ Container ragflow-es-01 Healthy 21.2s
⠿ Container ragflow-mysql Healthy 11.2s
⠿ Container ragflow-minio Started 1.7s
⠇ es01 Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. 0.0s
⠿ Container ragflow-kibana Started 21.6s
⠿ Container ragflow-server Started 21.8s
⠇ kibana Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. 0.0s
(base) lk@lk:/media/lk/disk1/lk_git/6_NLPandCNN/LLM/ragflow/docker$ docker logs -f ragflow-server
[HUQIE]:Build default trie
[HUQIE]:Build default trie
[HUQIE]:Build default trie
[HUQIE]:Build trie /ragflow/rag/res/huqie.txt
[HUQIE]:Build trie /ragflow/rag/res/huqie.txt
[HUQIE]:Build trie /ragflow/rag/res/huqie.txt
WARNING:root:Realtime synonym is disabled, since no redis connection.
WARNING:root:Realtime synonym is disabled, since no redis connection.
WARNING:root:Realtime synonym is disabled, since no redis connection.
[WARNING] Load term.freq FAIL!
pytorch_model.bin: 7%|▋ | 94.4M/1.30G [00:29<06:09, 3.27MB/s]WARNING:root:Realtime synonym is disabled, since no redis connection.
[WARNING] Load term.freq FAIL!
Traceback (most recent call last):
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3291, in execute_sql
cursor.execute(sql, params or ())
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/cursors.py", line 153, in execute
result = self._query(query)
^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/cursors.py", line 322, in _query
Can anyone help? Thanks!
### Additional information
![screenshot1](https://github.com/infiniflow/ragflow/assets/20237650/246876fb-4737-4066-bae1-57605561a678)
The website shows {"data":null,"retcode":100,"retmsg":"<NotFound '404: Not Found'>"}.
I've uploaded docs to the dataset, parsed and chunked successfully but testing the retrieval fails consistently. Using OpenAI model.
Error in top right - 'Index Not Found'
It should produce output from the LLM.
Test anything in retrieval testing.
![image](https://github.com/infiniflow/ragflow/assets/8089971/3b77d8ce-fc78-4006-8c4c-1e25f7d29fa6)
For any type of file, if the parsing method is general, the chunk token number needs to be displayed.
![image](https://github.com/infiniflow/ragflow/assets/8089971/640577f4-a7ad-4394-a22c-4ab4db336491)
Building the project truly from source would involve building all resources, including the base image. Any chance the ragflow-base Dockerfile could be included in the repo?
Local Docker with the latest image: document processing is stuck at 80%. The LLM is ChatGLM and the API key is set in the web UI.
Error log is:
Can't update token usage for 0196ca84f5a111ee80170242ac150006/EMBEDDING
Can't update token usage for 0196ca84f5a111ee80170242ac150006/EMBEDDING
ES create index error ragflow_0196ca84f5a111ee80170242ac150006 ----BadRequestError(400, 'resource_already_exists_exception', 'index [ragflow_0196ca84f5a111ee80170242ac150006/gHToUXxJSNSqdLG9Yo0mNA] already exists')
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '372033f2f5a111eea71b0242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '371b7de4f5a111eea0880242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '367e6c34f5a111ee9a840242ac150006'}}
[the preceding three lines repeat many times in the log]
Fail put 0beab266f5a111eeab0c0242ac150006/附件1:《好烤漆金牌造》销售工具话术.pdf: S3 operation failed; code: NoSuchKey, message: Object does not exist, resource: /0beab266f5a111eeab0c0242ac150006/%E9%99%84%E4%BB%B61%EF%BC%9A%E3%80%8A%E5%A5%BD%E7%83%A4%E6%BC%86%E9%87%91%E7%89%8C%E9%80%A0%E3%80%8B%E9%94%80%E5%94%AE%E5%B7%A5%E5%85%B7%E8%AF%9D%E6%9C%AF.pdf, request_id: 17C44D75219FE3C5, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: 0beab266f5a111eeab0c0242ac150006, object_name: 附件1:《好烤漆金牌造》销售工具话术.pdf
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '9b962552f5a211eea4b10242ac150005'}}
Can't update token usage for 0196ca84f5a111ee80170242ac150006/EMBEDDING
[the preceding two lines repeat several times in the log]
Fail put 8568867ef5a411eebc050242ac150005/附件1:《好烤漆金牌造》销售工具话术.pdf: S3 operation failed; code: NoSuchBucket, message: The specified bucket does not exist, resource: /8568867ef5a411eebc050242ac150005, request_id: 17C44E3BAF64631C, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: 8568867ef5a411eebc050242ac150005
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
Can't update token usage for 0196ca84f5a111ee80170242ac150006/EMBEDDING
What might be the error?
main
No response
Configured the LLM as ChatGLM and chatted; got:
ERROR: Completions.create() got an unexpected keyword argument 'presence_penalty'
No response
Deployed a local docker environment.
Created a knowledge base and uploaded some docs.
Configured ChatGLM as the LLM and set the API key.
Then created an assistant with ChatGLM and chatted with it; the error occurs.
No response
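One possible workaround, assuming the ChatGLM endpoint rejects OpenAI-only sampling parameters such as `presence_penalty`: filter the generation config down to keys the backend accepts before calling `Completions.create()`. The `SUPPORTED_KEYS` set and `filter_gen_conf` name are illustrative, not RAGFlow's actual code.

```python
# Hypothetical workaround: drop parameters the ChatGLM endpoint rejects
# (e.g. presence_penalty) before building the request.
# SUPPORTED_KEYS is an assumption; adjust it to your backend.
SUPPORTED_KEYS = {"model", "messages", "temperature", "top_p", "max_tokens", "stream"}

def filter_gen_conf(gen_conf: dict) -> dict:
    """Keep only generation parameters the backend is known to accept."""
    return {k: v for k, v in gen_conf.items() if k in SUPPORTED_KEYS}
```

Pass `filter_gen_conf(gen_conf)` instead of the raw config when invoking the chat completion call.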
RAGFlow integrates the OCR model from InfiniFlow/deepdoc. How does its text extraction and table-structure extraction performance compare with commercial OCR tools, such as the text extraction services of Azure and AWS?
At first glance, I couldn't find any Swagger API for using this locally from an external code base, purely as a RAG engine.
main
pop_os 22.04
docker 26.0
Intel i7-12800h
32gb
(base) hitesh@whiskey:~/ragflow/docker$ docker logs -f ragflow-server
[HUQIE]:Build default trie
[HUQIE]:Build trie /ragflow/rag/res/huqie.txt
WARNING:root:Realtime synonym is disabled, since no redis connection.
[WARNING] Load term.freq FAIL!
[RAGFlow ASCII banner]
ERROR:dashscope:Request: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation failed, status: 401, message: Invalid API-key provided.
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
I followed the instructions for docker:
$ git clone https://github.com/infiniflow/ragflow.git
$ cd ragflow/docker
$ docker compose up -d
No response
main
c3b2d1
No response
I tested on https://demo.ragflow.io/ and uploaded a pdf file. Indexing fails every time.
Page(13~25): [ERROR]Index failure!
No response
Can somebody tell us how we can load a json(l) or csv file into the system?
main
newest
No response
No response
Just click here
https://github.com/infiniflow/ragflow?tab=readme-ov-file#-community
The Discord link does not work.
No response
main
No response
Documents in the knowledge base cannot be selected until they have been parsed.
![image](https://github.com/infiniflow/ragflow/assets/8089971/e580162d-149d-42ed-881d-7123beb35458)
![image](https://github.com/infiniflow/ragflow/assets/8089971/cea5d535-8613-4f3f-8cd7-b5b19d43ecea)
No response
If three chunks are recalled and all of them are passed to the LLM, how is it determined which chunk the final answer is based on?
Local LLMs, especially the LLaMA family, should be easy to integrate.
Support ollama
No response
No response
No response
Hi.
It looks like the docker image and instructions are for Linux. I tried to run docker compose on my M2 Mac, but I get errors related to MySQL.
! mysql The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
runtime: failed to create new OS thread (have 2 already; errno=22)
### Describe the feature you'd like
Apple Silicon support and clear instructions on how to install on a Mac
### Describe implementation you've considered
I tried to add `platform: linux/amd64` to the MySQL image definition in the docker compose file, but this didn't help.
### Documentation, adoption, use case
_No response_
### Additional information
_No response_
docker compose down -v
docker compose up
The above commands don't work.
We've observed that the size of the database.log file increases rapidly, reaching gigabytes in a very short span of time. By using tail -f to monitor the file, we noticed it generates numerous entries similar to the ones below. Is there a way to suppress these logs?
Returning 140199553734352 to pool.
('SELECT `t1`.`id`, `t1`.`doc_id`, `t1`.`from_page`, `t1`.`to_page`, `t2`.`kb_id`, `t2`.`parser_id`, `t2`.`parser_config`, `t2`.`name`, `t2`.`type`, `t2`.`location`, `t2`.`size`, `t3`.`tenant_id`, `t3`.`language`, `t4`.`embd_id`, `t4`.`img2txt_id`, `t4`.`asr_id`, `t1`.`update_time` FROM `task` AS `t1` INNER JOIN `document` AS `t2` ON (`t1`.`doc_id` = `t2`.`id`) INNER JOIN `knowledgebase` AS `t3` ON (`t2`.`kb_id` = `t3`.`id`) INNER JOIN `tenant` AS `t4` ON (`t3`.`tenant_id` = `t4`.`id`) WHERE ((((((`t2`.`status` = %s) AND (`t2`.`run` = %s)) AND NOT (`t2`.`type` = %s)) AND (`t1`.`progress` = %s)) AND (`t1`.`update_time` >= %s)) AND ((`t1`.`create_time` %% %s) = %s)) ORDER BY `t1`.`update_time` ASC LIMIT %s OFFSET %s', ['1', '1', 'virtual', 0.0, 1712592641735, 2, 1, 64, 0])
('SELECT `t1`.`id`, `t1`.`doc_id`, `t1`.`from_page`, `t1`.`to_page`, `t2`.`kb_id`, `t2`.`parser_id`, `t2`.`parser_config`, `t2`.`name`, `t2`.`type`, `t2`.`location`, `t2`.`size`, `t3`.`tenant_id`, `t3`.`language`, `t4`.`embd_id`, `t4`.`img2txt_id`, `t4`.`asr_id`, `t1`.`update_time` FROM `task` AS `t1` INNER JOIN `document` AS `t2` ON (`t1`.`doc_id` = `t2`.`id`) INNER JOIN `knowledgebase` AS `t3` ON (`t2`.`kb_id` = `t3`.`id`) INNER JOIN `tenant` AS `t4` ON (`t3`.`tenant_id` = `t4`.`id`) WHERE ((((((`t2`.`status` = %s) AND (`t2`.`run` = %s)) AND NOT (`t2`.`type` = %s)) AND (`t1`.`progress` = %s)) AND (`t1`.`update_time` >= %s)) AND ((`t1`.`create_time` %% %s) = %s)) ORDER BY `t1`.`update_time` ASC LIMIT %s OFFSET %s', ['1', '1', 'virtual', 0.0, 0, 2, 0, 64, 0])
Returning 139777349632080 to pool.
Returning 140199553734352 to pool.
Returning 139777349632080 to pool.
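If these entries come from peewee's query logger (the generated SQL above matches peewee's style), one way to quiet them is to raise that logger's level. A minimal sketch, assuming the standard `peewee` logger name:

```python
import logging

# Suppress per-query DEBUG output; assumes the entries are emitted
# by the 'peewee' logger, which the ORM layer appears to use.
logging.getLogger("peewee").setLevel(logging.WARNING)
```

Run this once at startup, before the task scheduler begins polling, so the query lines never reach database.log.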
No response
Add support for ollama
No response
No response
No response
After pip install -r requirements.txt, I ran python deepdoc/vision/t_ocr.py -h, and it returned the error below:
Traceback (most recent call last):
File "/deepdoc/vision/t_ocr.py", line 14, in
from deepdoc.vision.seeit import draw_box
ModuleNotFoundError: No module named 'deepdoc'
How can I use deepdoc from the CLI?
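A likely cause is that the repository root is not on `sys.path`, so the top-level `deepdoc` package cannot be imported. A sketch of two common workarounds, assuming you are in the cloned ragflow repository:

```shell
# Run from the repository root so the top-level 'deepdoc' package resolves;
# PYTHONPATH=. prepends the current directory to sys.path.
cd ragflow
PYTHONPATH=. python deepdoc/vision/t_ocr.py -h

# Or run it as a module, which puts the current directory on sys.path:
python -m deepdoc.vision.t_ocr -h
```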
How well does DeepDoc support Chinese? The documentation examples all use English documents.
System requirements: hardware, operating system.
How to get ragflow from Docker Hub
How to configure ragflow
Community
Roadmap
License
If a PDF has two columns with headings and tables, I want to extract the text/OCR results separately for individual layout segments. How can I do this directly using deepdoc?
Fail put 29f4f2dcf21b11ee97630242c0a80006/AcademicGPT.pdf: S3 operation failed; code: NoSuchBucket, message: The specified bucket does not exist, resource: /29f4f2dcf21b11ee97630242c0a80006, request_id: 17C2EC907DD3BC46, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: 29f4f2dcf21b11ee97630242c0a80006
Can't update token usage for d11309c4f0c111eea3da0242ac150005/EMBEDDING
Object of type ndarray is not JSON serializable
Traceback (most recent call last):
File "/ragflow/api/apps/conversation_app.py", line 172, in completion
ans = chat(dia, msg, **req)
^^^^^^^^^^^^^^^^^^^^^
File "/ragflow/api/apps/conversation_app.py", line 215, in chat
kbinfos = retrievaler.retrieval(" ".join(questions), embd_mdl, dialog.tenant_id, dialog.kb_ids, 1, dialog.top_n,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ragflow/rag/nlp/search.py", line 314, in retrieval
sres = self.search(req, index_name(tenant_id), embd_mdl)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ragflow/rag/nlp/search.py", line 115, in search
es_logger.info("【Q】: {}".format(json.dumps(s)))
^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/json/__init__.py", line 238, in dumps
**kw).encode(obj)
^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/json/encoder.py", line 200, in encode
chunks = self.iterencode(o, _one_shot=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/json/encoder.py", line 258, in iterencode
return _iterencode(o, 0)
^^^^^^^^^^^^^^^^^
File "/ragflow/api/utils/__init__.py", line 128, in default
return json.JSONEncoder.default(self, obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/json/encoder.py", line 180, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ndarray is not JSON serializable
How can I solve this?
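One way to avoid this crash is to give the JSON encoder a fallback for array-like objects. A minimal sketch (`SafeEncoder` is an illustrative name, not RAGFlow's actual encoder in api/utils): it converts anything exposing a `tolist()` method, such as `numpy.ndarray`, into a plain list.

```python
import json

class SafeEncoder(json.JSONEncoder):
    """Fallback encoder: turn array-like objects into plain lists."""
    def default(self, obj):
        # numpy.ndarray (and similar array types) expose tolist()
        if hasattr(obj, "tolist"):
            return obj.tolist()
        return super().default(obj)
```

Serializing with `json.dumps(payload, cls=SafeEncoder)` in the logging call should then no longer raise `TypeError` on embedding vectors.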
When a user asks a question and there's no relevant content in the knowledge base, we can reply with a custom message.
For example:
Knowledge base: 1. Professional knowledge
User input: Hello, xxxxxx?
[No relevant content found]
Assistant output: Sorry, I'm unable to answer your question. You can submit a ticket at https://xxx.com.
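The desired behavior can be sketched as a small gate in front of the model call; `answer_or_fallback` and `generate_answer` are hypothetical names, not RAGFlow's API:

```python
def answer_or_fallback(chunks, generate_answer,
                       fallback=("Sorry, I'm unable to answer your question. "
                                 "You can submit a ticket at https://xxx.com.")):
    """Return a custom message when retrieval found nothing relevant."""
    if not chunks:            # [No relevant content found]
        return fallback
    return generate_answer(chunks)
```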
No response
No response
No response
The README.md shows how to set vm.max_map_count on Linux. But how do I set it on Windows?
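On Windows, Docker Desktop runs containers inside a WSL2 VM, so the sysctl has to be set there rather than on the Windows host. A sketch, assuming the WSL2 backend (262144 is the value Elasticsearch requires):

```shell
# Set it for the current session inside the docker-desktop distro:
wsl -d docker-desktop sysctl -w vm.max_map_count=262144

# To persist across restarts, add this to %USERPROFILE%\.wslconfig:
#   [wsl2]
#   kernelCommandLine = "sysctl.vm.max_map_count=262144"
```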
main
No response
Refresh the login page and the language setting becomes invalid.
No response
main
No response
As the title describes.
No response
Save the knowledge base configuration.
Load it again.
The embedding configuration is lost.
No response
main
No response
README.md mentions:
RAGFlow flourishes via open-source collaboration. In this spirit, we embrace diverse contributions from the community. If you would like to be a part, review our Contribution Guidelines first.
But the main branch of ragflow does not contain CONTRIBUTING.md.
CONTRIBUTING.md should exist.
NA
No response
main
No response
I registered user yh and had a chat, as shown in pic1. After that, I registered a new user yh01 and found the chat history of user yh (pic2). I think this is a functional bug.
No response
1. Start ragflow.
2. Configure API keys.
3. Register user A, configure the model, and start chatting.
4. Register user B and start chatting.
By the way, there are still some error messages in ERROR.log, as follows:
Fail put 77dd584cf57a11eebdea0242ac190005/LOMO.pdf: S3 operation failed; code: NoSuchBucket, message: The specified bucket does not exist, resource: /77dd584cf57a11eebdea0242ac190005, request_id: 17C43DDAB4AB4F18, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: 77dd584cf57a11eebdea0242ac190005
Can't update token usage for c4c74360f54e11ee863b0242ac190005/EMBEDDING