Comments (4)
This could be a great feature for custom sharding.
from qdrant-client.
Hi @raulcarlomagno,
One of our concerns is that if we support providing a shard key per point, it becomes much easier to shoot yourself in the foot performance-wise.
For example, if shard_keys=["a", "b", "c", "d", "a", "b", "c", "d"] and batch size = 2, every batch contains two different shard keys, so we would need to send 8 requests, one per point.
It also makes the batching mechanism much more complex and would probably degrade performance.
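To make the arithmetic in the example concrete, here is a small Python sketch that counts requests under one assumed model: the client cuts the input into batches of batch_size points and then splits each batch so that every request targets a single shard key. The function name count_requests is hypothetical and not part of qdrant-client.

```python
def count_requests(shard_keys: list[str], batch_size: int) -> int:
    """Count upsert requests, assuming each client-side batch is split so that
    every request targets exactly one shard key (hypothetical model)."""
    requests = 0
    for start in range(0, len(shard_keys), batch_size):
        batch = shard_keys[start:start + batch_size]
        # One request per distinct shard key present in this batch.
        requests += len(set(batch))
    return requests


keys = ["a", "b", "c", "d", "a", "b", "c", "d"]
print(count_requests(keys, batch_size=2))          # 8: one request per point
print(count_requests(sorted(keys), batch_size=2))  # 4: batches are [a,a], [b,b], [c,c], [d,d]
```

Grouping the points by shard key before batching halves the request count for the same eight points, which is why the order of the input matters so much under this model.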
Just in case: you don't need to construct the final batches yourself. As you said, you only need to split the data by shard key, so it is one call to upload_collection per shard key; the final batching is still handled internally.
Regarding the situation with IDs: it is done this way to keep the API idempotent on the server side.
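A minimal sketch of the suggested approach: group the records by shard key and make one upload_collection call per key, leaving the final batching to the client. It assumes a qdrant-client version whose upload_collection accepts a shard_key_selector argument (custom sharding support); the helper name upload_by_shard_key and the (id, vector, payload, shard_key) record layout are hypothetical.

```python
from collections import defaultdict
from typing import Any, Iterable


def upload_by_shard_key(client: Any, collection_name: str,
                        records: Iterable[tuple[Any, list[float], dict, str]]) -> None:
    """Split records by shard key and issue one upload_collection call per key.

    Each record is (point_id, vector, payload, shard_key). The client is
    expected to batch each call internally.
    """
    groups: dict[str, dict[str, list]] = defaultdict(
        lambda: {"ids": [], "vectors": [], "payload": []}
    )
    for point_id, vector, payload, shard_key in records:
        group = groups[shard_key]
        group["ids"].append(point_id)
        group["vectors"].append(vector)
        group["payload"].append(payload)

    for shard_key, group in groups.items():
        client.upload_collection(
            collection_name=collection_name,
            vectors=group["vectors"],
            payload=group["payload"],
            ids=group["ids"],
            # Assumption: this client version supports custom sharding here.
            shard_key_selector=shard_key,
        )
```

This keeps every record in memory until the grouping is done, so for a very large dataset it would have to be applied chunk by chunk.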
The use case is this:
I have a lot of data (60 million records with more than 300 dimensions, unordered and not previously sharded), and I can't split it beforehand. I read it as a stream with all shard keys mixed together, so when processing one batch I might have 2 vectors for one shard key and 1500 for another. That forces me to send very small batches just for the small shard keys, instead of sending everything to the Qdrant server and letting Qdrant route each point to its shard.
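One possible client-side workaround for a mixed stream, not something proposed in the thread: keep a buffer per shard key, send a key's buffer once it reaches batch_size, and send whatever is left at the end. Small shard keys then produce one small request at the end of the stream instead of one per incoming batch. The names stream_by_shard_key and flush are hypothetical; flush stands in for whatever upload call you use (for example, an upload with the shard key selector set to that key).

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

Record = tuple[Any, list[float], dict, str]  # (point_id, vector, payload, shard_key)


def stream_by_shard_key(records: Iterable[Record], batch_size: int,
                        flush: Callable[[str, list[Record]], None]) -> None:
    """Buffer a mixed stream per shard key.

    A key's buffer is flushed as soon as it holds batch_size records; any
    leftovers are flushed once the stream is exhausted.
    """
    buffers: dict[str, list[Record]] = defaultdict(list)
    for record in records:
        shard_key = record[3]
        buf = buffers[shard_key]
        buf.append(record)
        if len(buf) >= batch_size:
            flush(shard_key, buf)    # hypothetical upload callback
            buffers[shard_key] = []  # start a fresh buffer for this key
    for shard_key, buf in buffers.items():
        if buf:
            flush(shard_key, buf)
```

Memory stays bounded by roughly the number of shard keys times batch_size, rather than by the size of the whole dataset.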