Comments (3)
To resolve the issue with `MilvusVectorStore` so that the text is queried but not included in the metadata, you can modify the `query` method to exclude the `text_key` from the metadata when constructing the `TextNode`. Here is the relevant part of the `query` method with the necessary changes:
def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
    """Query index for top k most similar nodes.

    Args:
        query_embedding (List[float]): query embedding
        similarity_top_k (int): top k most similar nodes
        doc_ids (Optional[List[str]]): list of doc_ids to filter by
        node_ids (Optional[List[str]]): list of node_ids to filter by
        output_fields (Optional[List[str]]): list of fields to return
        embedding_field (Optional[str]): name of embedding field
    """
    if query.mode != VectorStoreQueryMode.DEFAULT:
        raise ValueError(f"Milvus does not support {query.mode} yet.")

    expr = []
    output_fields = ["*"]

    # Parse the filter
    if query.filters is not None:
        expr.extend(_to_milvus_filter(query.filters))

    # Parse any docs we are filtering on
    if query.doc_ids is not None and len(query.doc_ids) != 0:
        expr_list = ['"' + entry + '"' for entry in query.doc_ids]
        expr.append(f"{self.doc_id_field} in [{','.join(expr_list)}]")

    # Parse any nodes we are filtering on
    if query.node_ids is not None and len(query.node_ids) != 0:
        expr_list = ['"' + entry + '"' for entry in query.node_ids]
        expr.append(f"{MILVUS_ID_FIELD} in [{','.join(expr_list)}]")

    # Limit output fields
    if query.output_fields is not None:
        output_fields = query.output_fields

    # Convert to string expression
    string_expr = ""
    if len(expr) != 0:
        string_expr = " and ".join(expr)

    # Perform the search
    res = self.milvusclient.search(
        collection_name=self.collection_name,
        data=[query.query_embedding],
        filter=string_expr,
        limit=query.similarity_top_k,
        output_fields=output_fields,
        search_params=self.search_config,
    )

    logger.debug(
        f"Successfully searched embedding in collection: {self.collection_name}"
        f" Num Results: {len(res[0])}"
    )

    nodes = []
    similarities = []
    ids = []

    # Parse the results
    for hit in res[0]:
        if not self.text_key:
            node = metadata_dict_to_node(
                {
                    "_node_content": hit["entity"].get("_node_content", None),
                    "_node_type": hit["entity"].get("_node_type", None),
                }
            )
        else:
            try:
                text = hit["entity"].get(self.text_key)
            except Exception:
                raise ValueError(
                    "The passed in text_key value does not exist "
                    "in the retrieved entity."
                )
            metadata = {
                key: hit["entity"].get(key)
                for key in output_fields
                if key != self.text_key
            }
            node = TextNode(text=text, metadata=metadata)

        nodes.append(node)
        similarities.append(hit["distance"])
        ids.append(hit["id"])

    return VectorStoreQueryResult(nodes=nodes, similarities=similarities, ids=ids)
In this modified code, the `metadata` dictionary is constructed by excluding the `text_key` from the `output_fields`. This ensures that the text is queried but not included in the metadata [1][2].
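The split between text and metadata can be checked without a Milvus server. The following is a minimal sketch, assuming each hit's entity behaves like a plain dict; `split_entity` and the sample field names are hypothetical and not part of llama_index. It iterates over the keys actually present in the entity rather than over `output_fields`, since with the default `output_fields = ["*"]` a comprehension over `output_fields` would look up a literal `"*"` key.

```python
from typing import Any, Dict, Tuple


def split_entity(entity: Dict[str, Any], text_key: str) -> Tuple[Any, Dict[str, Any]]:
    """Return (text, metadata), where metadata omits the text field."""
    text = entity.get(text_key)
    metadata = {key: value for key, value in entity.items() if key != text_key}
    return text, metadata


# Hypothetical entity, shaped like the "entity" dict of a search hit.
entity = {"content": "Milvus is a vector database.", "author": "alice", "year": 2024}
text, metadata = split_entity(entity, text_key="content")
print(text)
print(metadata)
```

The text is still read from the entity, but it no longer appears among the metadata keys.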
To continue talking to Dosu, mention @dosu.
from llama_index.
Hello @RasmusAntons! I'm here to help you with any bugs, questions, or contributions you have. Let's work together to solve this issue.
To use `MilvusVectorStore` with a `text_key` and additional metadata without including the text itself in the metadata, you can set the `text_key` parameter to the desired key and ensure that the `output_fields` parameter includes only the metadata fields you want to retrieve. Here is an example of how to set it up:
from llama_index.vector_stores.milvus import MilvusVectorStore

# Setup MilvusVectorStore
vector_store = MilvusVectorStore(
    dim=1536,
    collection_name="your_collection_name",
    uri="http://milvus_address:port",
    token="your_milvus_token_here",
    overwrite=True,
    text_key="your_text_key",  # Specify the key where the text is stored
    output_fields=["metadata_field1", "metadata_field2"],  # Specify only the metadata fields you want
)
In this setup:
* `text_key` is set to `"your_text_key"`, which is the key where the text is stored.
* `output_fields` is set to `["metadata_field1", "metadata_field2"]`, which includes only the metadata fields you want to retrieve, excluding the text itself [1].
Regarding your question about dynamically modifying `output_fields` during a query, the `query` method in `MilvusVectorStore` allows for this. If `query.output_fields` is provided, it will use that; otherwise, it will fall back to the instance's `self.output_fields` if it is not empty. If neither is provided, it defaults to returning all fields (`["*"]`). Here is the relevant code snippet:
def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
    ...
    output_fields = ["*"]

    # Limit output fields
    if query.output_fields is not None:
        output_fields = query.output_fields
    elif len(self.output_fields) > 0:
        output_fields = self.output_fields
    ...
This shows that the `output_fields` parameter can indeed be dynamically modified during a query [1].
To achieve your goal, you can set the `output_fields` in the constructor to only the metadata fields and then set `query.output_fields` to the metadata fields plus the text field when performing a query. This way, the text field will be included in the query results but not in the metadata.
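The precedence described above can be sketched as a standalone function. This is only an illustration under stated assumptions: `resolve_output_fields` is a hypothetical helper that mirrors the snippet's branching, and it does not call llama_index or Milvus.

```python
from typing import List, Optional


def resolve_output_fields(
    query_fields: Optional[List[str]], instance_fields: List[str]
) -> List[str]:
    """Pick the fields to request: per-query first, then instance, then all."""
    if query_fields is not None:
        return query_fields
    if len(instance_fields) > 0:
        return instance_fields
    return ["*"]


metadata_fields = ["metadata_field1", "metadata_field2"]
text_key = "your_text_key"

# No per-query override: the constructor's fields win, so the text is not requested.
default_fields = resolve_output_fields(None, metadata_fields)

# Per-query override: metadata fields plus the text field.
query_fields = resolve_output_fields(metadata_fields + [text_key], metadata_fields)

print(default_fields)
print(query_fields)
```

With the override in place the text field is requested alongside the metadata fields, which is the combination suggested above.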
In this setup:
* `text_key` is set to `"your_text_key"`, which is the key where the text is stored.
* `output_fields` is set to `["metadata_field1", "metadata_field2"]`, which includes only the metadata fields you want to retrieve, excluding the text itself [[1]](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-milvus/llama_index/vector_stores/milvus/base.py).
This is the behaviour I expected, but - unless I'm doing something wrong - this will always fail because line 456 requires the `text_key` to be included in the `output_fields`. If I try this, the text is not queried at all and creating the text node fails.
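The failure described here can be illustrated with a plain dict standing in for the Milvus entity. This is a sketch under that assumption only: `fake_entity` and the sample row are hypothetical, and nothing below calls llama_index or pymilvus. When the text field is not among the requested fields, `.get(text_key)` returns `None` rather than raising, so a `try`/`except` around that call would not catch the missing field, and a `None` text is what reaches the node constructor.

```python
from typing import Any, Dict, List


def fake_entity(row: Dict[str, Any], output_fields: List[str]) -> Dict[str, Any]:
    """Hypothetical stand-in for a search hit: keep only the requested fields."""
    return {key: row[key] for key in output_fields if key in row}


row = {"your_text_key": "some text", "metadata_field1": "a", "metadata_field2": "b"}

# Text field left out of output_fields, as in the constructor example.
entity = fake_entity(row, ["metadata_field1", "metadata_field2"])
text = entity.get("your_text_key")  # missing key: returns None, no exception

# Text field requested explicitly.
entity_with_text = fake_entity(
    row, ["metadata_field1", "metadata_field2", "your_text_key"]
)
text_present = entity_with_text.get("your_text_key")

print(text)
print(text_present)
```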