rag-demystified's People

Contributors

abhayram-a-nair, aravindhp, jarulraj, pchunduri6

rag-demystified's Issues

pydantic_core.ValidationError: 4 validation errors for SubQuestionBundleList

I am getting the following error when running:

$ python complex_qa.py
⏳ Connect to EvaDB...
✅ Connected to EvaDB...
Creating vector store for Toronto...
01-03-2024 10:39:19 WARNING[executor_utils:executor_utils.py:handle_if_not_exists:0094] Table: Toronto_features already exists
WARNING:evadb.utils.logging_manager:Table: Toronto_features already exists
01-03-2024 10:39:19 WARNING[create_index_executor:create_index_executor.py:_create_evadb_index:0119] Index Toronto_index already exists. It will be updated on existing table.
WARNING:evadb.utils.logging_manager:Index Toronto_index already exists. It will be updated on existing table.
INFO:faiss.loader:Loading faiss with AVX2 support.
INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
Successfully created vector store for Toronto.
Creating vector store for Chicago...
01-03-2024 10:39:19 WARNING[executor_utils:executor_utils.py:handle_if_not_exists:0094] Table: Chicago_features already exists
WARNING:evadb.utils.logging_manager:Table: Chicago_features already exists
01-03-2024 10:39:19 WARNING[create_index_executor:create_index_executor.py:_create_evadb_index:0119] Index Chicago_index already exists. It will be updated on existing table.
WARNING:evadb.utils.logging_manager:Index Chicago_index already exists. It will be updated on existing table.
Successfully created vector store for Chicago.
Creating vector store for Houston...
01-03-2024 10:39:19 WARNING[executor_utils:executor_utils.py:handle_if_not_exists:0094] Table: Houston_features already exists
WARNING:evadb.utils.logging_manager:Table: Houston_features already exists
01-03-2024 10:39:19 WARNING[create_index_executor:create_index_executor.py:_create_evadb_index:0119] Index Houston_index already exists. It will be updated on existing table.
WARNING:evadb.utils.logging_manager:Index Houston_index already exists. It will be updated on existing table.
Successfully created vector store for Houston.
Creating vector store for Boston...
01-03-2024 10:39:20 WARNING[executor_utils:executor_utils.py:handle_if_not_exists:0094] Table: Boston_features already exists
WARNING:evadb.utils.logging_manager:Table: Boston_features already exists
01-03-2024 10:39:20 WARNING[create_index_executor:create_index_executor.py:_create_evadb_index:0119] Index Boston_index already exists. It will be updated on existing table.
WARNING:evadb.utils.logging_manager:Index Boston_index already exists. It will be updated on existing table.
Successfully created vector store for Boston.
Creating vector store for Atlanta...
01-03-2024 10:39:20 WARNING[executor_utils:executor_utils.py:handle_if_not_exists:0094] Table: Atlanta_features already exists
WARNING:evadb.utils.logging_manager:Table: Atlanta_features already exists
01-03-2024 10:39:20 WARNING[create_index_executor:create_index_executor.py:_create_evadb_index:0119] Index Atlanta_index already exists. It will be updated on existing table.
WARNING:evadb.utils.logging_manager:Index Atlanta_index already exists. It will be updated on existing table.
Successfully created vector store for Atlanta.
Question (enter 'exit' to exit): which city has the highest population?
🧠 Generating subquestions...
🤑 LLM call cost: $0.0006
Traceback (most recent call last):
  File "/home/aravindh/devel/ai/rag-demystified/complex_qa.py", line 167, in <module>
    subquestions_bundle_list, cost = generate_subquestions(question=question,
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aravindh/devel/ai/rag-demystified/subquestion_generator.py", line 76, in generate_subquestions
    subquestions_pydantic_obj = SubQuestionBundleList(**subquestions_list)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/pydantic/main.py", line 164, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 4 validation errors for SubQuestionBundleList
subquestion_bundle_list.0.function
  Input should be 'vector_retrieval' or 'llm_retrieval' [type=enum, input_value='getPopulation', input_type=str]
subquestion_bundle_list.0.data_source
  Input should be 'Toronto', 'Chicago', 'Houston', 'Boston' or 'Atlanta' [type=enum, input_value='Wikipedia', input_type=str]
subquestion_bundle_list.1.function
  Input should be 'vector_retrieval' or 'llm_retrieval' [type=enum, input_value='getCityWithHighestPopulation', input_type=str]
subquestion_bundle_list.1.data_source
  Input should be 'Toronto', 'Chicago', 'Houston', 'Boston' or 'Atlanta' [type=enum, input_value='Wikipedia', input_type=str]

I had to apply the following patch to get to this stage:

$ git diff
diff --git complex_qa.py complex_qa.py
index 336725e..116d6db 100644
--- complex_qa.py
+++ complex_qa.py
@@ -155,7 +155,7 @@ if __name__ == "__main__":
 
     vector_stores = generate_vector_stores(cursor, wiki_docs)
 
-    llm_model = "gpt-35-turbo"
+    llm_model = "gpt-3.5-turbo"
     total_cost = 0
     while True:
         question_cost = 0
diff --git openai_utils.py openai_utils.py
index abaf187..e9bfe52 100644
--- openai_utils.py
+++ openai_utils.py
@@ -16,6 +16,7 @@ logger = logging.getLogger(__name__)
 
 OPENAI_PRICING = {
     'gpt-35-turbo': {'prompt': 0.0015, 'completion': 0.002},
+    'gpt-3.5-turbo-0613': {'prompt': 0.0015, 'completion': 0.002},
     'gpt-35-turbo-16k': {'prompt': 0.003, 'completion': 0.004},
     'gpt-4-0613': {'prompt': 0.03, 'completion': 0.06},
     'gpt-4-32k': {'prompt': 0.06, 'completion': 0.12},
diff --git requirements.txt requirements.txt
index df6523e..69ca9e4 100644
--- requirements.txt
+++ requirements.txt
@@ -1,6 +1,6 @@
 evadb[document]
-openai
-instructor
+openai==0.28.1
+instructor==0.2.11
 pydantic==2.4.0
 python-dotenv==1.0.0
 tiktoken

I made an attempt to migrate the project to OpenAI 1.6.1 but ran into a similar issue.
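
The four errors in the traceback can be reproduced without calling the model. The sketch below is only an illustration: the enum class names and the `find_enum_violations` helper are hypothetical, while the allowed values and the rejected values are copied from the error message. It shows that each of the two subquestions fails on both `function` and `data_source`, which gives the four reported errors.

```python
from enum import Enum


class FunctionEnum(str, Enum):  # hypothetical name; values from the error message
    VECTOR_RETRIEVAL = "vector_retrieval"
    LLM_RETRIEVAL = "llm_retrieval"


class DataSourceEnum(str, Enum):  # hypothetical name; values from the error message
    TORONTO = "Toronto"
    CHICAGO = "Chicago"
    HOUSTON = "Houston"
    BOSTON = "Boston"
    ATLANTA = "Atlanta"


ALLOWED = {
    "function": {member.value for member in FunctionEnum},
    "data_source": {member.value for member in DataSourceEnum},
}


def find_enum_violations(bundles):
    """Return (index, field, value) for every value that is not an enum member."""
    violations = []
    for index, bundle in enumerate(bundles):
        for field, allowed_values in ALLOWED.items():
            value = bundle.get(field)
            if value not in allowed_values:
                violations.append((index, field, value))
    return violations


# The values the model returned, as reported in the traceback.
model_output = [
    {"function": "getPopulation", "data_source": "Wikipedia"},
    {"function": "getCityWithHighestPopulation", "data_source": "Wikipedia"},
]

for index, field, value in find_enum_violations(model_output):
    print(f"subquestion_bundle_list.{index}.{field}: unexpected value {value!r}")
```

A check like this, run before constructing `SubQuestionBundleList`, would make it easier to log exactly what the model returned.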

Code not running issue with EvaDB

I ran into an issue after installation:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\Home\rag-demystified\complex_qa.py", line 164, in <module>
    vector_stores = generate_vector_stores(cursor, wiki_docs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Home\rag-demystified\complex_qa.py", line 32, in generate_vector_stores
    cursor.query(f"DROP TABLE IF EXISTS {doc};").df()
  File "c:\Users\Home\AppData\Local\Programs\Python\Python312\Lib\site-packages\evadb\interfaces\relational\relation.py", line 123, in df
    batch = self.execute(drop_alias=drop_alias)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Home\AppData\Local\Programs\Python\Python312\Lib\site-packages\evadb\interfaces\relational\relation.py", line 141, in execute
    result = execute_statement(self._evadb, self._query_node.copy())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Home\AppData\Local\Programs\Python\Python312\Lib\site-packages\evadb\server\command_handler.py", line 53, in execute_statement
    batch_list = list(output)
                 ^^^^^^^^^^^^
  File "c:\Users\Home\AppData\Local\Programs\Python\Python312\Lib\site-packages\evadb\executor\plan_executor.py", line 183, in execute_plan
    raise ExecutorError(e)
evadb.executor.executor_utils.ExecutorError: Failed to drop the image table [WinError 3] The system cannot find the path specified: 'C:\Users\Home\rag-demystified\d97cc00e26f5961aa55eccb311c7b2a5'

enums are ignored when running example code

Hello, thanks for a very nice low-dependency write-up on an important topic. I came here from your comment on HN.

I spent yesterday playing around with your example code, both running it as is and adapting it to my own data and models. The model fails to observe the constraints set by the enums, and this happens consistently, not just intermittently. In each case the failure mode is the same: a Pydantic validation error on the enum field, because the model uses made-up function names instead of those specified in the enum.

I'm on Python 3.11.6, instructor 0.2.8, pydantic 2.4.2 and pydantic_core 2.10.1. I tried executing both through a Jupyter notebook and as a script from the command line.

Appreciate any guidance you might be able to provide! Please let me know if there's any additional information I can provide which would be helpful.
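
One common workaround for this kind of failure is to feed the validation errors back to the model and ask again. The sketch below is not the repository's code: `generate_validated`, `ask_model` and `validate` are hypothetical names, and the model is replaced by a stub that answers wrongly once and then correctly. It only illustrates the retry-with-feedback loop.

```python
def generate_validated(ask_model, validate, prompt, max_attempts=3):
    """Call ask_model until validate() accepts its output.

    ask_model(prompt) -> output and validate(output) -> list of error strings
    are both hypothetical callables supplied by the caller.
    """
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        output = ask_model(current_prompt)
        errors = validate(output)
        if not errors:
            return output, attempt
        # Feed the validation errors back so the next attempt can correct them.
        current_prompt = (
            f"{prompt}\n\nYour previous answer was rejected:\n"
            + "\n".join(f"- {error}" for error in errors)
        )
    raise ValueError(f"no valid output after {max_attempts} attempts: {errors}")


ALLOWED_FUNCTIONS = {"vector_retrieval", "llm_retrieval"}


def validate_function(output):
    """Return an error message if the function name is outside the enum values."""
    if output["function"] in ALLOWED_FUNCTIONS:
        return []
    return [
        f"function must be one of {sorted(ALLOWED_FUNCTIONS)}, "
        f"got {output['function']!r}"
    ]


# Stub standing in for the LLM: invalid on the first call, valid on the second.
replies = iter([{"function": "getPopulation"}, {"function": "vector_retrieval"}])
output, attempts = generate_validated(
    lambda _prompt: next(replies), validate_function, "Which city has the highest population?"
)
print(output, attempts)
```

A retry loop does not fix the underlying cause, so it makes sense to cap the number of attempts, since every retry is another paid LLM call.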

Missing .env prerequisites

I am getting the following error:

$ python complex_qa.py
Could not load .env file or it is empty. Please check if it exists and is readable.

What is expected to be present in the .env file?
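
The issue does not say which keys the script expects. Since the project calls the OpenAI API, a reasonable guess is an API key under the conventional name `OPENAI_API_KEY`; that name is an assumption here, not something confirmed by the source. The sketch below checks whether a `.env` file exists and defines the keys listed in the hypothetical `REQUIRED_KEYS` tuple, using only the standard library.

```python
from pathlib import Path

REQUIRED_KEYS = ("OPENAI_API_KEY",)  # assumption: not confirmed by the source


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(path=".env", required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the file."""
    values = read_env_file(path)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    print("Missing keys:", missing if missing else "none")
```

Running this from the repository root would report whether the assumed key is present before `complex_qa.py` is started.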

Handle Rate limit

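You can handle rate limits by wrapping the call to:

response = openai.ChatCompletion.create(

in a retry function (this one uses tenacity), like:

    import logging

    import openai
    from tenacity import after_log, retry, stop_after_attempt, wait_random_exponential

    logger = logging.getLogger("mylogger")  # create a logger named 'mylogger'
    logger.setLevel(logging.DEBUG)  # set the logger level to DEBUG

    # Retry with random exponential backoff (1 to 60 s), giving up after 20 attempts.
    @retry(
        wait=wait_random_exponential(min=1, max=60),
        stop=stop_after_attempt(20),
        after=after_log(logger, logging.DEBUG),
    )
    def completion_with_backoff(**kwargs):
        return openai.ChatCompletion.create(**kwargs)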

Error creating vector store

I am running into the following error:

$ python complex_qa.py
⏳ Connect to EvaDB...
✅ Connected to EvaDB...
Creating vector store for Toronto...
Traceback (most recent call last):
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/lark/lexer.py", line 665, in lex
    yield lexer.next_token(lexer_state, parser_state)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/lark/lexer.py", line 598, in next_token
    raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column,
lark.exceptions.UnexpectedCharacters: No terminal matches 'F' in the current parser context, at line 1 col 8

CREATE FUNCTION IF NOT EXISTS SentenceFeatureEx
       ^
Expected one of: 
        * INDEX
        * DATABASE
        * UDF
        * TABLE

Previous tokens: Token('CREATE', 'CREATE')


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aravindh/devel/ai/rag-demystified/complex_qa.py", line 156, in <module>
    vector_stores = generate_vector_stores(cursor, wiki_docs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aravindh/devel/ai/rag-demystified/complex_qa.py", line 29, in generate_vector_stores
    cursor.query(
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/evadb/interfaces/relational/db.py", line 442, in query
    stmt = parse_query(sql_query)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/evadb/parser/utils.py", line 142, in parse_query
    stmt = Parser().parse(query)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/evadb/parser/parser.py", line 38, in parse
    lark_output = self._lark_parser.parse(query_string)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/evadb/parser/lark_parser.py", line 49, in parse
    tree = self._parser.parse(query_string)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/lark/lark.py", line 658, in parse
    return self.parser.parse(text, start=start, on_error=on_error)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/lark/parser_frontends.py", line 104, in parse
    return self.parser.parse(stream, chosen_start, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/lark/parsers/lalr_parser.py", line 42, in parse
    return self.parser.parse(lexer, start)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/lark/parsers/lalr_parser.py", line 88, in parse
    return self.parse_from_state(parser_state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/lark/parsers/lalr_parser.py", line 111, in parse_from_state
    raise e
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/lark/parsers/lalr_parser.py", line 100, in parse_from_state
    for token in state.lexer.lex(state):
  File "/home/aravindh/devel/ai/rag/lib64/python3.11/site-packages/lark/lexer.py", line 674, in lex
    raise UnexpectedToken(token, e.allowed, state=parser_state, token_history=[last_token], terminals_by_name=self.root_lexer.terminals_by_name)
lark.exceptions.UnexpectedToken: Unexpected token Token('ID', 'FUNCTION') at line 1, column 8.
Expected one of: 
        * TABLE
        * DATABASE
        * INDEX
        * UDF
Previous tokens: [Token('CREATE', 'CREATE')]

I am on Fedora 39, running the code in a Python 3.11.6 venv with all the requirements satisfied.
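
The parser output above lists `UDF`, not `FUNCTION`, among the tokens it accepts after `CREATE`, which suggests the installed EvaDB release uses a grammar that predates the `CREATE FUNCTION` spelling. The sketch below is a hedged illustration rather than a confirmed fix: `installed_version` and `use_udf_keyword` are hypothetical helpers, and the statement is the truncated one from the error message. It prints the installed version and shows how the leading keyword could be rewritten for such a parser.

```python
import re
from importlib import metadata


def installed_version(package="evadb"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def use_udf_keyword(statement):
    """Rewrite a leading CREATE FUNCTION to CREATE UDF, leaving the rest untouched."""
    return re.sub(
        r"^(\s*CREATE\s+)FUNCTION\b", r"\1UDF", statement, count=1, flags=re.IGNORECASE
    )


print("evadb version:", installed_version())
# Statement prefix exactly as it appears (truncated) in the error message.
print(use_udf_keyword("CREATE FUNCTION IF NOT EXISTS SentenceFeatureEx"))
```

Upgrading EvaDB so that its grammar matches the queries in `complex_qa.py` is probably the cleaner route; rewriting the keyword is only a stopgap if the version has to stay pinned.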
