Comments (2)
To address the inconsistency in node usage between Elasticsearch and ChromaDB for VectorStoreIndex
in the DenseXRetrievalPack
class, you need to ensure that the node_dict
in the RecursiveRetriever
includes all nodes and sub_nodes. This will prevent the "ValueError: Query id not found" error. Here is the modified code:
class DenseXRetrievalPack(BaseLlamaPack):
def __init__(
self,
documents: List[Document],
proposition_llm: Optional[LLM] = None,
query_llm: Optional[LLM] = None,
embed_model: Optional[BaseEmbedding] = None,
text_splitter: TextSplitter = SentenceSplitter(),
vector_store: Optional[ElasticsearchStore] = None,
similarity_top_k: int = 4,
) -> None:
"""Init params."""
self._proposition_llm = proposition_llm or OpenAI(
model="gpt-3.5-turbo",
temperature=0.1,
max_tokens=750,
)
embed_model = embed_model or OpenAIEmbedding(embed_batch_size=128)
nodes = text_splitter.get_nodes_from_documents(documents)
sub_nodes = self._gen_propositions(nodes)
all_nodes = nodes + sub_nodes
all_nodes_dict = {n.node_id: n for n in all_nodes}
service_context = ServiceContext.from_defaults(
llm=query_llm or OpenAI(),
embed_model=embed_model,
num_output=self._proposition_llm.metadata.num_output,
)
if os.path.exists('./chroma_db'):
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
self.vector_index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
else:
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
self.vector_index = VectorStoreIndex(
all_nodes, service_context=service_context, show_progress=True, storage_context=storage_context, store_nodes_override=True
)
self.retriever = RecursiveRetriever(
"vector",
retriever_dict={
"vector": self.vector_index.as_retriever(
similarity_top_k=similarity_top_k
)
},
node_dict=all_nodes_dict, # Ensure all nodes are included here
)
self.query_engine = RetrieverQueryEngine.from_args(
self.retriever, service_context=service_context
)
async def _aget_proposition(self, node: TextNode) -> List[TextNode]:
"""Get proposition."""
inital_output = await self._proposition_llm.apredict(
PROPOSITIONS_PROMPT, node_text=node.text
)
outputs = inital_output.split("\n")
all_propositions = []
for output in outputs:
if not output.strip():
continue
if not output.strip().endswith("]"):
if not output.strip().endswith('"') and not output.strip().endswith(
","
):
output = output + '"'
output = output + " ]"
if not output.strip().startswith("["):
if not output.strip().startswith('"'):
output = '"' + output
output = "[ " + output
try:
propositions = json.loads(output)
except Exception:
# fallback to yaml
try:
propositions = yaml.safe_load(output)
except Exception:
# fallback to next output
continue
if not isinstance(propositions, list):
continue
all_propositions.extend(propositions)
assert isinstance(all_propositions, list)
nodes = [TextNode(text=prop) for prop in all_propositions if prop]
return [IndexNode.from_text_node(n, node.node_id) for n in nodes]
def _gen_propositions(self, nodes: List[TextNode]) -> List[TextNode]:
"""Get propositions."""
sub_nodes = asyncio.run(
run_jobs(
[self._aget_proposition(node) for node in nodes],
show_progress=True,
workers=8,
)
)
# Flatten list
return [node for sub_node in sub_nodes for node in sub_node]
def get_modules(self) -> Dict[str, Any]:
"""Get modules."""
return {
"query_engine": self.query_engine,
"retriever": self.retriever,
}
def run(self, query_str: str, **kwargs: Any) -> RESPONSE_TYPE:
"""Run the pipeline."""
return self.query_engine.query(query_str)
This modification ensures that the node_dict
in the RecursiveRetriever
includes all nodes and sub_nodes, which should prevent the "ValueError: Query id not found" error [1].
To continue talking to Dosu, mention @dosu.
from llama_index.
@LikhithRishi you've already opened this 3 times my guy 😅
from llama_index.
Related Issues (20)
- [Question]: How to insert/delete document to/from VectorStoreIndex when using IngestionPipeline? HOT 2
- Compatibility issue between Qdrant and DSPy when Qdrant is used as the VectorStoreIndex's storage context HOT 5
- [Question]: AttributeError: 'property' object has no attribute 'context_window' HOT 1
- [Question]: The created knowledge graph does not have edge relationships neo4j HOT 12
- [Documentation]: Some of the URL Not Working HOT 3
- [Question]: Unable to understand how document storage works in case nodes are deleted HOT 1
- [Documentation]: Broken 'Examples' Link HOT 3
- [Feature Request]: Add a notebook to show llamaindex agent works with graphRAG and Vertex AI
- [Bug]: File rename error in llama-index-finetuning/llama_index/finetuning/mistralai/utils.py HOT 1
- [Question]: How to enable "Calling function" print out after querying from Multi-Document Agent example HOT 3
- [Question]: Access LLM's response object CompleteResponse() attribute `additional_kwarg` in RAG HOT 2
- [Bug]: Error in initializing neo4j HOT 2
- Indexes cannot be created correctly using the MilvusVectorStore. HOT 12
- How should the dim parameter value of MilvusVectorStore be calculated? HOT 4
- [Bug]: ERROR: Failed building wheel for pystemmer HOT 1
- How to deploy open-source embedding models in auto-merging retriever: ValueError: shapes (1024,) and (384,) not aligned: 1024 (dim 0) != 384 (dim 0) HOT 2
- [Bug]: No module named 'llama_index.llms.openai.base HOT 1
- [Bug]: [OpenAILike] Cannot use llm_chat_callback on an instance without a callback_manager attribute HOT 4
- [Feature Request]: Version pinning for sub packages HOT 2
- I wonder how to use llama_index to retrieve the Milvus collection after it is created and indexed using the MilvusVectorStore. HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from llama_index.