Comments (16)

Ngalstyan4 commented on June 9, 2024

Thanks for the info @heyufeng666888.

We assume that the HNSW index fits in memory and will always be cached by Postgres. This is a typical assumption for all kinds of Postgres indexes. If the index is not in memory, both insert and search queries become extremely slow.

In Lantern, inserts from multiple threads are indeed not any faster right now because of our implementation limitations. This will be improved soon, however!

Is the current speed of insertions unacceptable for you?
Do you think you could share your use-case, storage, throughput and latency requirements?

from lantern.

Ngalstyan4 commented on June 9, 2024

Hi @heyufeng666888, could you report the throughput numbers you are getting?

Insert throughput should not be very low. I will investigate once I have more details!

You can get even faster throughput if you insert into the table before creating the vector index, and create the vector index externally, as described here: https://docs.lantern.dev/lantern-cli/lantern-index#run-index-creation
This allows potentially offloading the index creation load from your main database instance.

heyufeng666888 commented on June 9, 2024

@Ngalstyan4 Inserting data directly without an index is indeed fast. But my use case does not allow building the index after the data is inserted, because the production workload involves both ingestion and retrieval. Therefore, I create the index before inserting data. In that setup, inserts are very slow: it takes 3336 seconds to insert 100k rows with 20 threads.

heyufeng666888 commented on June 9, 2024

My vectors are already available (the features have already been extracted). My current workflow is to create the index in advance and then insert data, and this gives low throughput.

heyufeng666888 commented on June 9, 2024

Because my features have already been generated, what I may need is asynchronous index construction, so that building the index does not slow down data insertion.

var77 commented on June 9, 2024

Hi @heyufeng666888, can you run this Jupyter notebook on your machine? I inserted 100k 1536-dimensional vectors in ~700 seconds on my MacBook Pro using a single connection.
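To put the two timings reported so far on the same scale, here is a small arithmetic sketch. The helper name `items_per_second` is made up for illustration, and it assumes the 3336-second figure reported earlier refers to 100k rows in total across all 20 threads, which the thread does not state explicitly.

```python
def items_per_second(n_items: int, seconds: float) -> float:
    """Average insert throughput over a whole run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return n_items / seconds


# ~700 s for 100k vectors over a single connection (figure from this comment)
single_connection = items_per_second(100_000, 700)

# 3336 s for 100k rows with 20 threads (assumption: 100k rows in total)
twenty_threads = items_per_second(100_000, 3336)

print(f"single connection: {single_connection:.1f} item/s")
print(f"20 threads:        {twenty_threads:.1f} item/s")
print(f"slowdown factor:   {single_connection / twenty_threads:.1f}x")
```

Under that assumption the two runs differ by roughly a factor of five, which points at the environment (hardware, build type, Postgres settings) rather than at the number of client threads.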

Do you have any particular example with more details, so we can help better?

Ngalstyan4 commented on June 9, 2024

Hi @heyufeng666888,

Did you get a chance to check out the notebook @var77 shared above?

Please let us know if you are still having performance issues. Having more details would definitely help us address the issue more quickly, assuming it still exists.

heyufeng666888 commented on June 9, 2024

Hi @Ngalstyan4,
Did you first create an hnsw index for the vector field before inserting the data, or did you insert the vector field directly without an hnsw index? What are your testing steps?

heyufeng666888 commented on June 9, 2024

@Ngalstyan4 @var77
Let me test it

heyufeng666888 commented on June 9, 2024

Hi @Ngalstyan4 @var77, I tested the throughput using @var77's notebook, and it is still very low. Could you tell me the Postgres configuration and the Lantern version you tested with?

{'platform': 'Darwin', 'platform-release': '23.1.0', 'platform-version': 'Darwin Kernel Version 23.1.0: Mon Oct 9 21:27:27 PDT 2023; root:xnu-10002.41.9~6/RELEASE_X86_64', 'architecture': 'x86_64', 'processor': 'i386', 'ram': '16 GB', 'cores': 8}

Inserted 10000 items - speed 14.33975296961922 item/s
Inserted 20000 items - speed 13.36656929866538 item/s
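For anyone who wants to reproduce numbers in the same format outside the notebook, here is a minimal sketch of a timing loop. It is not the notebook's code: `measure_insert_throughput`, `insert_batch`, and the batch sizes are hypothetical names and values chosen for illustration. `insert_batch` stands in for whatever actually writes rows to the table (for example a function that runs an `INSERT` through your database driver), so the sketch runs without a database.

```python
import time
from typing import Callable, Iterable, List, Sequence, Tuple


def measure_insert_throughput(
    rows: Iterable[Sequence[float]],
    insert_batch: Callable[[List[Sequence[float]]], None],
    batch_size: int = 1000,
    report_every: int = 10_000,
    clock: Callable[[], float] = time.perf_counter,
) -> List[Tuple[int, float]]:
    """Insert rows in batches; return (items_inserted, items_per_second) checkpoints."""
    if batch_size <= 0 or report_every <= 0:
        raise ValueError("batch_size and report_every must be positive")

    start = clock()
    inserted = 0
    next_report = report_every
    checkpoints: List[Tuple[int, float]] = []
    batch: List[Sequence[float]] = []

    def flush() -> None:
        nonlocal inserted, next_report
        if not batch:
            return
        insert_batch(list(batch))  # hypothetical sink; replace with a real DB write
        inserted += len(batch)
        batch.clear()
        if inserted >= next_report:
            elapsed = clock() - start
            # Cumulative average since the start of the run, not per-interval speed
            speed = inserted / elapsed if elapsed > 0 else float("inf")
            checkpoints.append((inserted, speed))
            print(f"Inserted {inserted} items - speed {speed} item/s")
            while next_report <= inserted:
                next_report += report_every

    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            flush()
    flush()  # write any trailing partial batch
    return checkpoints


# Demo with an in-memory sink instead of a database
sink: List[Sequence[float]] = []
measure_insert_throughput(
    rows=([0.0, 1.0, 2.0] for _ in range(50)),
    insert_batch=sink.extend,
    batch_size=10,
    report_every=20,
)
```

The reported speed is a cumulative average, so a slow start (for example a cold cache) keeps dragging down later checkpoints. Whether the notebook reports cumulative or per-interval speed is not stated in the thread.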

var77 commented on June 9, 2024

Hi @heyufeng666888, sorry for the inconvenience.

I used the latest version of Lantern (built from source) with Postgres 15 installed via Homebrew (with the default Postgres settings).
Can you make sure that Lantern is built in release mode (you can clone the repo and run cmake and make install)?

You can also try increasing shared_buffers in the Postgres config to 20% of your memory, and setting maintenance_work_mem to something like 1GB.
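As a rough sketch of what that tuning advice translates to, the helper below turns a RAM size into the two corresponding `ALTER SYSTEM` statements. The function name `suggested_settings` is hypothetical, and the 20% fraction and 1GB value are just the suggestions from this comment, not measured optima.

```python
from typing import List


def suggested_settings(
    total_ram_gb: float,
    shared_buffers_fraction: float = 0.20,  # "20% of your memory" from the comment
    maintenance_work_mem: str = "1GB",      # "like 1GB" from the comment
) -> List[str]:
    """Build ALTER SYSTEM statements for the two settings mentioned above."""
    if total_ram_gb <= 0 or not 0 < shared_buffers_fraction < 1:
        raise ValueError("total_ram_gb must be > 0 and the fraction in (0, 1)")
    shared_buffers_mb = int(total_ram_gb * 1024 * shared_buffers_fraction)
    return [
        f"ALTER SYSTEM SET shared_buffers = '{shared_buffers_mb}MB';",
        f"ALTER SYSTEM SET maintenance_work_mem = '{maintenance_work_mem}';",
    ]


# Example for the 16 GB machine reported earlier in the thread
for statement in suggested_settings(16):
    print(statement)
```

Note that a change to `shared_buffers` only takes effect after a server restart, while `maintenance_work_mem` can be picked up with a configuration reload.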

heyufeng666888 commented on June 9, 2024

@var77 lantern_v0.0.5?

var77 commented on June 9, 2024

Yes @heyufeng666888

heyufeng666888 commented on June 9, 2024

@var77 Is Postgres installed directly on your MacBook Pro?

var77 commented on June 9, 2024

Yes @heyufeng666888 via homebrew

heyufeng666888 commented on June 9, 2024

Hi @var77,
My earlier low-throughput tests ran on mechanical hard drives. Testing on a Mac with a solid-state drive showed a significant improvement.

I tested single-threaded throughput on the Mac and it indeed matches yours. However, multi-threaded inserts run at the same speed as single-threaded ones, with no improvement.
