[Question]: How to use MilvusVectorStore with text_key and additional metadata? (llama_index issue, open)

RasmusAntons commented on June 29, 2024

Comments (3)

dosubot commented on June 29, 2024

To resolve the issue with MilvusVectorStore so that the text is queried but not included in the metadata, you can modify the query method to exclude the text_key from the metadata when constructing the TextNode. Here is the relevant part of the query method with the necessary changes:

def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
    """Query index for top k most similar nodes.

    Args:
        query_embedding (List[float]): query embedding
        similarity_top_k (int): top k most similar nodes
        doc_ids (Optional[List[str]]): list of doc_ids to filter by
        node_ids (Optional[List[str]]): list of node_ids to filter by
        output_fields (Optional[List[str]]): list of fields to return
        embedding_field (Optional[str]): name of embedding field
    """
    if query.mode != VectorStoreQueryMode.DEFAULT:
        raise ValueError(f"Milvus does not support {query.mode} yet.")

    expr = []
    output_fields = ["*"]

    # Parse the filter
    if query.filters is not None:
        expr.extend(_to_milvus_filter(query.filters))

    # Parse any docs we are filtering on
    if query.doc_ids is not None and len(query.doc_ids) != 0:
        expr_list = ['"' + entry + '"' for entry in query.doc_ids]
        expr.append(f"{self.doc_id_field} in [{','.join(expr_list)}]")

    # Parse any nodes we are filtering on
    if query.node_ids is not None and len(query.node_ids) != 0:
        expr_list = ['"' + entry + '"' for entry in query.node_ids]
        expr.append(f"{MILVUS_ID_FIELD} in [{','.join(expr_list)}]")

    # Limit output fields
    if query.output_fields is not None:
        output_fields = query.output_fields

    # Convert to string expression
    string_expr = ""
    if len(expr) != 0:
        string_expr = " and ".join(expr)

    # Perform the search
    res = self.milvusclient.search(
        collection_name=self.collection_name,
        data=[query.query_embedding],
        filter=string_expr,
        limit=query.similarity_top_k,
        output_fields=output_fields,
        search_params=self.search_config,
    )

    logger.debug(
        f"Successfully searched embedding in collection: {self.collection_name}"
        f" Num Results: {len(res[0])}"
    )

    nodes = []
    similarities = []
    ids = []

    # Parse the results
    for hit in res[0]:
        if not self.text_key:
            node = metadata_dict_to_node(
                {
                    "_node_content": hit["entity"].get("_node_content", None),
                    "_node_type": hit["entity"].get("_node_type", None),
                }
            )
        else:
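            entity = hit["entity"]
            # dict.get never raises, so check for the key explicitly
            if self.text_key not in entity:
                raise ValueError(
                    "The passed in text_key value does not exist "
                    "in the retrieved entity."
                )
            text = entity[self.text_key]

            # Build metadata from the fields actually returned rather than
            # from output_fields (which may be ["*"]), leaving out the text
            metadata = {
                key: value for key, value in entity.items() if key != self.text_key
            }
            node = TextNode(text=text, metadata=metadata)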

        nodes.append(node)
        similarities.append(hit["distance"])
        ids.append(hit["id"])

    return VectorStoreQueryResult(nodes=nodes, similarities=similarities, ids=ids)

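In this modified code, the metadata dictionary leaves out the text_key field, so the text is still queried and used as the node text but is not duplicated in the metadata ^[1]^[2]. Note that the text field must still be among the fields returned by the search, otherwise there is no text to build the node from.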
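
As a rough, library-free illustration of that metadata construction, the sketch below uses a plain dict in place of a Milvus hit entity. The helper name `split_text_and_metadata` and the sample field names are hypothetical and not part of llama_index; it only shows the idea of separating the text field from the rest.

```python
from typing import Any, Dict, Tuple


def split_text_and_metadata(
    entity: Dict[str, Any], text_key: str
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: separate the text field from the remaining fields."""
    if text_key not in entity:
        raise ValueError(f"text_key {text_key!r} is missing from the retrieved entity")
    text = entity[text_key]
    # Everything except the text field becomes metadata
    metadata = {key: value for key, value in entity.items() if key != text_key}
    return text, metadata


# Sample entity with made-up field names, standing in for hit["entity"]
entity = {"content": "hello world", "author": "alice", "year": 2024}
text, metadata = split_text_and_metadata(entity, "content")
```

Here `text` holds the value stored under the text key and `metadata` holds only the remaining fields. If the text key was not returned by the search, the helper raises instead of silently producing a node without text.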

dosubot commented on June 29, 2024

Hello @RasmusAntons! I'm here to help you with any bugs, questions, or contributions you have. Let's work together to solve this issue.

To use MilvusVectorStore with a text_key and additional metadata without including the text itself in the metadata, you can set the text_key parameter to the desired key and ensure that the output_fields parameter includes only the metadata fields you want to retrieve. Here is an example of how to set it up:

from llama_index.vector_stores.milvus import MilvusVectorStore

# Setup MilvusVectorStore
vector_store = MilvusVectorStore(
    dim=1536,
    collection_name="your_collection_name",
    uri="http://milvus_address:port",
    token="your_milvus_token_here",
    overwrite=True,
    text_key="your_text_key",  # Specify the key where the text is stored
    output_fields=["metadata_field1", "metadata_field2"]  # Specify only the metadata fields you want
)

In this setup:

* `text_key` is set to `"your_text_key"`, which is the key where the text is stored.
* `output_fields` is set to `["metadata_field1", "metadata_field2"]`, which includes only the metadata fields you want to retrieve, excluding the text itself ^[1].

Regarding your question about dynamically modifying output_fields during a query, the query method in MilvusVectorStore allows for this. If query.output_fields is provided, it will use that; otherwise, it will fall back to the instance's self.output_fields if it is not empty. If neither is provided, it defaults to returning all fields (["*"]). Here is the relevant code snippet:

def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
    ...
    output_fields = ["*"]

    # Limit output fields
    if query.output_fields is not None:
        output_fields = query.output_fields
    elif len(self.output_fields) > 0:
        output_fields = self.output_fields
    ...

This shows that the output_fields parameter can indeed be dynamically modified during a query ^[1].

To achieve your goal, you can set the output_fields in the constructor to only the metadata fields and then set query.output_fields to the metadata fields plus the text field when performing a query. This way, the text field will be included in the query results but not in the metadata.
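
A minimal sketch of that idea, written as a standalone function so it can be run without Milvus. It mirrors the fallback order from the snippet above (query fields, then instance fields, then `["*"]`) and adds one assumed extra step: appending the text key when it is missing from an explicit field list. The name `resolve_output_fields` is hypothetical, and the extra step is an assumption about what one might want, not the library's actual behavior.

```python
from typing import List, Optional


def resolve_output_fields(
    query_fields: Optional[List[str]],
    instance_fields: List[str],
    text_key: Optional[str],
) -> List[str]:
    """Hypothetical helper: pick the fields to request from the search."""
    if query_fields is not None:
        fields = list(query_fields)
    elif len(instance_fields) > 0:
        fields = list(instance_fields)
    else:
        # Nothing configured: request every field
        return ["*"]
    # Assumed extra step: always request the text field so the node text
    # can be built, even if only metadata fields were listed
    if text_key and text_key not in fields and "*" not in fields:
        fields.append(text_key)
    return fields


fields = resolve_output_fields(None, ["metadata_field1", "metadata_field2"], "your_text_key")
```

With this approach the caller lists only the metadata fields, the text field is requested alongside them, and it can then be left out of the metadata when the node is built.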

RasmusAntons commented on June 29, 2024

> In this setup:
>
> * `text_key` is set to `"your_text_key"`, which is the key where the text is stored.
>
> * `output_fields` is set to `["metadata_field1", "metadata_field2"]`, which includes only the metadata fields you want to retrieve, excluding the text itself [[1]](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-milvus/llama_index/vector_stores/milvus/base.py).

This is the behaviour I expected, but - unless I'm doing something wrong - this will always fail because line 456 requires the text_key to be included in the output_fields. If I try this, the text is not queried at all and creating the text node fails.
