
Comments (3)

dosubot commented on June 29, 2024

To have MilvusVectorStore query the text without copying it into the metadata, you can modify the query method so that it excludes text_key from the metadata when constructing the TextNode. Here is the relevant part of the query method with the necessary changes:

def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
    """Query index for top k most similar nodes.

    Args:
        query_embedding (List[float]): query embedding
        similarity_top_k (int): top k most similar nodes
        doc_ids (Optional[List[str]]): list of doc_ids to filter by
        node_ids (Optional[List[str]]): list of node_ids to filter by
        output_fields (Optional[List[str]]): list of fields to return
        embedding_field (Optional[str]): name of embedding field
    """
    if query.mode != VectorStoreQueryMode.DEFAULT:
        raise ValueError(f"Milvus does not support {query.mode} yet.")

    expr = []
    output_fields = ["*"]

    # Parse the filter
    if query.filters is not None:
        expr.extend(_to_milvus_filter(query.filters))

    # Parse any docs we are filtering on
    if query.doc_ids is not None and len(query.doc_ids) != 0:
        expr_list = ['"' + entry + '"' for entry in query.doc_ids]
        expr.append(f"{self.doc_id_field} in [{','.join(expr_list)}]")

    # Parse any nodes we are filtering on
    if query.node_ids is not None and len(query.node_ids) != 0:
        expr_list = ['"' + entry + '"' for entry in query.node_ids]
        expr.append(f"{MILVUS_ID_FIELD} in [{','.join(expr_list)}]")

    # Limit output fields
    if query.output_fields is not None:
        output_fields = query.output_fields

    # Convert to string expression
    string_expr = ""
    if len(expr) != 0:
        string_expr = " and ".join(expr)

    # Perform the search
    res = self.milvusclient.search(
        collection_name=self.collection_name,
        data=[query.query_embedding],
        filter=string_expr,
        limit=query.similarity_top_k,
        output_fields=output_fields,
        search_params=self.search_config,
    )

    logger.debug(
        f"Successfully searched embedding in collection: {self.collection_name}"
        f" Num Results: {len(res[0])}"
    )

    nodes = []
    similarities = []
    ids = []

    # Parse the results
    for hit in res[0]:
        if not self.text_key:
            node = metadata_dict_to_node(
                {
                    "_node_content": hit["entity"].get("_node_content", None),
                    "_node_type": hit["entity"].get("_node_type", None),
                }
            )
        else:
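            # dict.get never raises, so check for a missing value explicitly.
            text = hit["entity"].get(self.text_key)
            if text is None:
                raise ValueError(
                    "The passed in text_key value does not exist "
                    "in the retrieved entity."
                )

            # Build the metadata from the returned entity rather than from
            # output_fields, which may be the wildcard ["*"].
            metadata = {
                key: value
                for key, value in hit["entity"].items()
                if key != self.text_key
            }
            node = TextNode(text=text, metadata=metadata)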

        nodes.append(node)
        similarities.append(hit["distance"])
        ids.append(hit["id"])

    return VectorStoreQueryResult(nodes=nodes, similarities=similarities, ids=ids)

In this modified code, the metadata dictionary omits the text_key field, so the text becomes the node's text but is not duplicated in its metadata. Note that text_key must still be among the fields returned by the search, otherwise there is no text to build the node from [1][2].
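The metadata handling described above can be tried in isolation, without a Milvus connection. The following is a minimal sketch, not the library's implementation: `split_entity` is a hypothetical helper, and the sample entity (with the field names `body`, `author` and `year`) is made up to resemble one `hit["entity"]` dictionary from the loop above.

```python
from typing import Any, Dict, Tuple


def split_entity(entity: Dict[str, Any], text_key: str) -> Tuple[str, Dict[str, Any]]:
    """Separate the text field from the remaining fields of one search hit.

    Hypothetical helper for illustration; not part of llama_index.
    """
    text = entity.get(text_key)
    if text is None:
        # Mirrors the error raised in the query method when the text is absent.
        raise ValueError(f"text_key {text_key!r} is missing from the retrieved entity.")
    # Every returned field except the text itself becomes metadata.
    metadata = {key: value for key, value in entity.items() if key != text_key}
    return text, metadata


# Made-up entity, shaped like hit["entity"] in the loop above.
entity = {"body": "hello world", "author": "alice", "year": 2024}
text, metadata = split_entity(entity, text_key="body")
print(text)
print(metadata)
```

Because the metadata is built from the fields that were actually returned, the same logic works whether the search asked for specific fields or for the wildcard `["*"]`.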

To continue talking to Dosu, mention @dosu.

from llama_index.

dosubot commented on June 29, 2024

Hello @RasmusAntons! I'm here to help you with any bugs, questions, or contributions you have. Let's work together to solve this issue.

To use MilvusVectorStore with a text_key and additional metadata without including the text itself in the metadata, you can set the text_key parameter to the desired key and ensure that the output_fields parameter includes only the metadata fields you want to retrieve. Here is an example of how to set it up:

from llama_index.vector_stores.milvus import MilvusVectorStore

# Setup MilvusVectorStore
vector_store = MilvusVectorStore(
    dim=1536,
    collection_name="your_collection_name",
    uri="http://milvus_address:port",
    token="your_milvus_token_here",
    overwrite=True,
    text_key="your_text_key",  # Specify the key where the text is stored
    output_fields=["metadata_field1", "metadata_field2"]  # Specify only the metadata fields you want
)

In this setup:

  • text_key is set to "your_text_key", which is the key where the text is stored.
  • output_fields is set to ["metadata_field1", "metadata_field2"], which includes only the metadata fields you want to retrieve, excluding the text itself [1].

Regarding your question about dynamically modifying output_fields during a query, the query method in MilvusVectorStore allows for this. If query.output_fields is provided, it will use that; otherwise, it will fall back to the instance's self.output_fields if it is not empty. If neither is provided, it defaults to returning all fields (["*"]). Here is the relevant code snippet:

def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
    ...
    output_fields = ["*"]

    # Limit output fields
    if query.output_fields is not None:
        output_fields = query.output_fields
    elif len(self.output_fields) > 0:
        output_fields = self.output_fields
    ...

This shows that the output_fields parameter can indeed be dynamically modified during a query [1].

To achieve your goal, you can set the output_fields in the constructor to only the metadata fields and then set query.output_fields to the metadata fields plus the text field when performing a query. This way, the text field will be included in the query results but not in the metadata.
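The fallback order quoted above (query-level fields first, then the instance's fields, then the wildcard) can be sketched as a standalone function. This is an illustration under assumptions: `resolve_output_fields` is a hypothetical name, and the field names are placeholders taken from the example setup.

```python
from typing import List, Optional


def resolve_output_fields(
    query_fields: Optional[List[str]],
    instance_fields: List[str],
) -> List[str]:
    """Pick the fields to request, following the precedence in the snippet above.

    Hypothetical helper for illustration; not part of llama_index.
    """
    if query_fields is not None:
        # Fields given on the query take priority.
        return query_fields
    if len(instance_fields) > 0:
        # Otherwise fall back to the fields configured on the store.
        return instance_fields
    # With nothing configured, request every field.
    return ["*"]


# Store configured with metadata fields only; the query adds the text field.
instance_fields = ["metadata_field1", "metadata_field2"]
query_fields = instance_fields + ["your_text_key"]

per_query = resolve_output_fields(query_fields, instance_fields)
from_instance = resolve_output_fields(None, instance_fields)
wildcard = resolve_output_fields(None, [])
```

Here `per_query` includes the text field for that one search, while `from_instance` and `wildcard` show the two fallbacks.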


RasmusAntons commented on June 29, 2024

> In this setup:
>
> * `text_key` is set to `"your_text_key"`, which is the key where the text is stored.
> * `output_fields` is set to `["metadata_field1", "metadata_field2"]`, which includes only the metadata fields you want to retrieve, excluding the text itself [[1]](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-milvus/llama_index/vector_stores/milvus/base.py).

This is the behaviour I expected, but - unless I'm doing something wrong - this will always fail because line 456 requires the text_key to be included in the output_fields. If I try this, the text is not queried at all and creating the text node fails.
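One possible way around the failure described here is to make sure the text field is always among the requested fields, and to drop it from the metadata afterwards. The sketch below is an assumption about how such a guard could look, not something the library is confirmed to do; `with_text_key` is a hypothetical helper.

```python
from typing import List


def with_text_key(output_fields: List[str], text_key: str) -> List[str]:
    """Return the requested fields, adding text_key if it would not be fetched.

    Hypothetical helper for illustration; not part of llama_index.
    """
    if "*" in output_fields or text_key in output_fields:
        # The wildcard already covers the text field; avoid duplicates too.
        return list(output_fields)
    return [*output_fields, text_key]


requested = with_text_key(["metadata_field1", "metadata_field2"], "your_text_key")
unchanged = with_text_key(["*"], "your_text_key")
```

With a guard like this, the text would be retrieved for building the node even when `output_fields` lists only metadata fields.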

