
Comments (7)

kishorenc commented on May 14, 2024

👍 That makes sense. We will work on an import API for Typesense.

from typesense.

kishorenc commented on May 14, 2024

Our import endpoint already supports an update action.
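For reference, the import endpoint takes newline-delimited JSON (JSONL, one document per line) plus an `action` query parameter such as `update`. The sketch below only builds the HTTP request, using the standard `URL` class and a plain options object suitable for `fetch`. The host, API key, and the `buildImportRequest` helper are assumptions for illustration, and the exact set of supported actions depends on your Typesense version.

```javascript
// Hypothetical helper: build (but do not send) a request for the bulk
// import endpoint. The body is JSONL: one JSON document per line.
function buildImportRequest(host, apiKey, collection, docs, action = "create") {
  const path = `/collections/${encodeURIComponent(collection)}/documents/import`;
  const url = new URL(path, host);
  url.searchParams.set("action", action); // e.g. "create" or "update"
  return {
    url: url.toString(),
    init: {
      method: "POST",
      headers: {
        "X-TYPESENSE-API-KEY": apiKey,
        "Content-Type": "text/plain",
      },
      // Serialize each document on its own line.
      body: docs.map((d) => JSON.stringify(d)).join("\n"),
    },
  };
}
```

With a runtime that has a global `fetch` (e.g. Node 18+), `const { url, init } = buildImportRequest(...)` followed by `await fetch(url, init)` would send it. The response is typically JSONL with one result line per document, so it is worth checking each line rather than only the HTTP status.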

kishorenc commented on May 14, 2024

@dpastoor Thank you for the suggestion. I am also glad to hear that you are liking Typesense! Here's something you can do for now to work around this limitation:

If you are crawling a website on host A and inserting into Typesense on host B, the inefficiency is largely in the network I/O. To speed things up, you can populate a Typesense server running locally on host A. After you are done crawling, stop Typesense, zip the data directory, and copy it over to your Typesense host B.

dpastoor commented on May 14, 2024

Thanks for the suggestion, but I'm on the same computer, so that's not the issue. To be clear, performance has not been the problem. I'm using Node.js for the insertions, so I'm more worried that once I scale, having 1000s of concurrent POSTs hit the Typesense server within a second or so leaves more chances for something to go wrong than a single batch insert would.

Basically, right now I have:

data.forEach(d => client.collections('companies').documents().create(d))

Whereas I'd like to just be able to insert the whole array in one shot.

Do you know what kind of POST throughput I could safely expect before needing to consider bounding the requests?
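Until a bulk call is available, one common way to bound the number of simultaneous requests is a small worker pool. The sketch below is plain JavaScript, not part of the Typesense client; `mapWithConcurrency` is a hypothetical helper, and the limit used in the usage note is an arbitrary example rather than a recommended value.

```javascript
// Hypothetical helper: run `worker` over every item with at most `limit`
// calls in flight at once. Results come back in the same order as `items`.
async function mapWithConcurrency(items, limit, worker) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each runner keeps claiming the next unclaimed index until none remain.
  // JavaScript runs this on one thread, so `next++` cannot race.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runners = [];
  for (let k = 0; k < Math.min(limit, items.length); k++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}
```

Assuming the `client` and `companies` collection from this thread, the `forEach` loop above could become `await mapWithConcurrency(data, 10, d => client.collections('companies').documents().create(d))`, which never has more than 10 inserts outstanding at once.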

kishorenc commented on May 14, 2024

Throughput is a factor of:

  1. How much you can parallelize from the client. This is likely the biggest bottleneck, since doing I/O sequentially, record by record, is slow.
  2. The size of the documents being indexed.
  3. The underlying hardware (e.g. CPU, and SSD vs. HDD).
  4. The total dataset size, and any other search/delete operations happening at the same time.
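To make point 1 concrete, a rough ceiling on client-side throughput follows from Little's law: documents completed per second is at most the number of requests in flight, times documents per request, divided by the average latency per request. `maxDocsPerSecond` is a hypothetical helper and the latencies below are made-up numbers; factors 2 to 4 determine the real latency and where the server saturates.

```javascript
// Rough ceiling on client-side throughput via Little's law:
// throughput <= (requests in flight) * (docs per request) / (avg latency).
// `docsPerRequest` > 1 models a batched/bulk call carrying several documents.
function maxDocsPerSecond(inFlight, latencyMs, docsPerRequest = 1) {
  if (inFlight <= 0 || latencyMs <= 0 || docsPerRequest <= 0) {
    throw new RangeError("all arguments must be positive");
  }
  return (inFlight * docsPerRequest * 1000) / latencyMs;
}
```

For example, `maxDocsPerSecond(1, 5)` gives 200 for strictly sequential inserts at an assumed 5 ms each, while `maxDocsPerSecond(20, 5)` gives 4000, but only if the server could keep latency at 5 ms with 20 requests in flight, which it may not.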

If you can give me a sense of that, I can provide further details. Happy to chat on the specifics of your use case offline: kishore at wreally dot com

dpastoor commented on May 14, 2024

The client is highly parallelized: since it's JavaScript, it iterates asynchronously. As I mentioned before, I'm not actually worried about Typesense performance at this point; I'm more concerned about spawning too many requests at once. I would also like to be able to use this to restore from backups, for example by exporting a collection as a backup and then re-importing it in a single API call. If I export as described in https://typesense.org/api/#export-collection, I currently have to iterate over the exported collection and insert documents one at a time. I'd like to make just one call.

Here's an example of using the hypothetical import API:

const companyCollection = await client.collections('companies').documents().export()
await client.collections('companies').documents().import(companyCollection)
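If the export is large, re-importing it in one request may itself be heavy. A minimal sketch, assuming the export is a JSONL string (one document per line): split it into fixed-size batches and import each batch in turn. `chunkJsonl` and the batch size are hypothetical.

```javascript
// Hypothetical helper: split a JSONL export into batches of at most
// `batchSize` lines so a backup can be restored in bounded requests.
function chunkJsonl(jsonl, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  // Drop empty lines (e.g. a trailing newline at the end of the export).
  const lines = jsonl.split("\n").filter((line) => line.trim() !== "");
  const batches = [];
  for (let i = 0; i < lines.length; i += batchSize) {
    batches.push(lines.slice(i, i + batchSize).join("\n"));
  }
  return batches;
}
```

Assuming the import call above accepts a JSONL string, a restore could then look like `for (const batch of chunkJsonl(companyCollection, 1000)) await client.collections('companies').documents().import(batch)`, sending one bounded request at a time.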

HananoshikaYomaru commented on May 14, 2024

For the same reason, would there be an 'updateMany' function in the future?
