Comments (7)
👍 That makes sense. We will work on an import API for Typesense.
from typesense.
Our import endpoint already supports an update action.
from typesense.
@dpastoor Thank you for the suggestion. I am also glad to hear that you are liking Typesense! Here's something you can do for now to work around this limitation:
If you are crawling a website on host A and inserting into Typesense on host B, the inefficiency is largely in the network I/O. To speed things up, you can simply populate the Typesense server "locally" on host A. After you are done crawling, stop Typesense, then zip and copy the data directory over to your Typesense host B.
from typesense.
Thanks for the suggestion. I'm on the same computer, so that's not the issue. To be clear, performance has not been the problem. I'm using Node.js for the insertions, so I'm more worried that once I scale, having thousands of concurrent POSTs hit the Typesense server within a second or so leaves more chances for something to go wrong than a single batch insert would.
Basically, right now I have:
Data.forEach(d => client.collections().documents().insert(d))
Whereas I'd like to just be able to insert the whole array in one shot.
Do you know what kind of POST throughput I could safely expect before needing to consider bounding the requests?
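For reference, bounding the requests on the client side could look roughly like the sketch below. It is only an illustration and makes no assumptions about the Typesense client: `mapWithLimit` is a hypothetical helper name, and `worker` stands in for whatever function performs a single insert.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// `worker` is a stand-in for a function that performs one insert (hypothetical).
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has claimed yet

  // Each runner keeps claiming the next index until the list is exhausted.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  // Start up to `limit` runners; together they never exceed `limit` pending calls.
  const runners = [];
  for (let n = 0; n < Math.min(limit, items.length); n++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}
```

With something like this, the `forEach` above would become `await mapWithLimit(Data, 10, d => insertOne(d))`, where `insertOne` is whatever wraps the single-document insert and `10` is an arbitrary cap to tune. If any call rejects, the returned promise rejects with that error.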
from typesense.
Throughput is a function of:
- How much you can parallelize from the client. This would be the greatest bottleneck as doing I/O sequentially record-by-record would be slow.
- The document size being indexed.
- The underlying hardware (e.g. CPU, and SSD vs. HDD).
- Size of total dataset and other search/delete operations that will be happening at the same time.
If you can give me a sense of that, I can provide further details. Happy to chat on the specifics of your use case offline: kishore at wreally dot com
from typesense.
The client is highly parallelized: since it's JavaScript, it iterates asynchronously. As I mentioned before, I'm not actually worried about Typesense performance at this point; I'm more concerned with spawning too many requests at once. Likewise, I would like to be able to use this to restore from backups: for example, exporting a collection as a backup, then re-importing it with one API call. If I export like so, https://typesense.org/api/#export-collection, I currently must iterate over the exported collection and insert one document at a time. I'd like to just make one call.
An example of using the hypothetical import API:
let companyCollection = client.collections('companies').documents().export()
client.collections('companies').documents().import(companyCollection)
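Until something like that exists, the client-side half of the round trip can be sketched as below, assuming the export comes back as JSON Lines (one document per line); check the export documentation linked above to confirm that for your version. The helper names `toJsonl`, `fromJsonl`, and `chunk` are hypothetical and not part of any Typesense client.

```javascript
// Serialize documents to JSON Lines: one JSON object per line.
// JSON.stringify never emits raw newlines, so each document stays on one line.
function toJsonl(docs) {
  return docs.map((d) => JSON.stringify(d)).join('\n');
}

// Parse JSON Lines back into an array of documents, skipping blank lines.
function fromJsonl(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Split an array into batches of at most `size` items,
// so a large backup can be sent as a handful of requests.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Parsing the export with `fromJsonl` and splitting it with `chunk` would turn a restore into a few batched calls rather than one request per document.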
from typesense.
For the same reason, would there be an 'updateMany' function in the future?
from typesense.