Comments (5)
Hi @hopkins385
Yes, you are right, it is a typo in the docs. Thanks for pointing it out! We will update it soon.
If you run into any other problems with filters, feel free to reach out to us or take a look at the tests, e.g. the nested filter example.
from qdrant-client.
Also, it should be filter=models.Filter(must=[models.FieldCondition(...)]), and not just filter=(must=...)
Cool, happy I was able to help.
Maybe someone can briefly explain which approach would be more suitable for my use case.
Scenario: A collection contains multiple "documents" (langchain), grouped by a custom metadata field "media_id".
Problem: I want to restrict the search to, say, just 2 out of 5 documents by passing an array of media_ids.
Which approach would be better?
Sidenote: It's not clear to me whether the database first looks for "similar" vectors and then filters the result, or first applies the filter and then locates "similar" vectors.
Approach 1:
def get_qdrant_documents(query: str, collection_name: str, media_ids: List[str] | None) -> List[Document]:
    # ...
    filtr = models.Filter(
        must=[]
    )
    if media_ids is not None:
        filtr.must.append(
            models.FieldCondition(
                key="metadata.media_id",
                match=models.MatchAny(any=media_ids)
            )
        )
    return qdrant.similarity_search(
        query=query,
        filter=filtr,
    )
Approach 2:
def get_qdrant_documents(query: str, collection_name: str, media_ids: List[str] | None) -> List[Document]:
    filtr = models.Filter(
        must=[
            models.NestedCondition(
                nested=models.Nested(
                    key="metadata",
                    filter=models.Filter(
                        must=[]
                    )
                )
            )
        ]
    )
    if media_ids is not None:
        for media_id in media_ids:
            filtr.must[0].nested.filter.must.append(
                models.FieldCondition(
                    key="media_id",
                    match=models.MatchValue(value=media_id)
                )
            )
    return qdrant.similarity_search(
        query=query,
        filter=filtr,
    )
Thank you for spotting this, we are updating the docs.
Hello, @hopkins385
I guess the second filter is not what you want, since should has to be used instead of must.
I think in your case the two approaches are not really different.
An explanation of the difference between filters with keys like metadata.media_id and those built with models.Nested can be found in the docs.
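To make the must versus should point concrete, here is a simplified plain-Python model of the two clause types. It is only an illustration of the boolean semantics, not Qdrant's implementation, and it assumes each point stores a single scalar media_id; the function names and sample payloads are made up for the example.

```python
from typing import Any, Dict, List

Payload = Dict[str, Any]


def matches_must(payload: Payload, key: str, values: List[str]) -> bool:
    """AND semantics: every equality condition has to hold for the same point."""
    return all(payload.get(key) == value for value in values)


def matches_should(payload: Payload, key: str, values: List[str]) -> bool:
    """OR semantics: at least one equality condition has to hold."""
    return any(payload.get(key) == value for value in values)


# Made-up payloads standing in for stored points.
points = [
    {"media_id": "a"},
    {"media_id": "b"},
    {"media_id": "c"},
]
wanted = ["a", "b"]

must_hits = [p for p in points if matches_must(p, "media_id", wanted)]
should_hits = [p for p in points if matches_should(p, "media_id", wanted)]

print(must_hits)    # [] because a single value cannot equal both "a" and "b"
print(should_hits)  # [{'media_id': 'a'}, {'media_id': 'b'}]
```

Under these assumptions, one MatchValue per id inside must selects nothing once two different ids are passed, while should (or a single MatchAny condition, as in Approach 1) selects points matching either id.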
About sidenote:
It is a bit more complex than just pre-filtering or post-filtering. Qdrant has an advanced query planner that helps it perform queries efficiently. We have also made certain modifications to the HNSW algorithm itself.
This blog post can help you better understand Qdrant internals.