jbellis commented on May 25, 2024

TL;DR: Lucene's hand-unrolled code seems to actually be worse than Jake's simplified versions. The nosimd:simd ratio is 1.37 for Lucene and 1.45 for JVector. Jake was seeing ~1.25, but it looks like that's just a difference in hardware.

ChatGPT does the math: https://chat.openai.com/share/2419df9d-3159-4225-b719-7682f82d0768
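
For anyone who wants to redo the arithmetic locally, here is a minimal Python sketch that parses the benchmark lines and averages the nosimd:simd ratio per configuration. The function names (`parse_run`, `mean_ratios`) are made up for illustration, and the comment above does not say whether its figures average build time, query time, or both, so the sketch reports the two separately. The sample input is just the first two rows of each Lucene run below.

```python
import re
from statistics import mean

# Matches one benchmark line, e.g.
# "HNSW   M=16 ef=160: top 100/1 recall 0.7093, build 27.70s, query 1.37s. ..."
LINE_RE = re.compile(
    r"M=(\d+) ef=(\d+): top (\d+)/(\d+) recall [\d.]+, "
    r"build ([\d.]+)s, query ([\d.]+)s"
)

def parse_run(log_text):
    """Map (M, ef, K, N) from 'M=.. ef=..: top K/N' to (build_s, query_s)."""
    timings = {}
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            key = tuple(int(g) for g in m.groups()[:4])
            timings[key] = (float(m.group(5)), float(m.group(6)))
    return timings

def mean_ratios(nosimd_log, simd_log):
    """Mean nosimd:simd ratio of build and query time over shared configs.

    Each build time appears on both the /1 and /2 rows, so every build is
    counted twice; that does not change the mean.
    """
    nosimd, simd = parse_run(nosimd_log), parse_run(simd_log)
    shared = sorted(nosimd.keys() & simd.keys())
    build = mean(nosimd[k][0] / simd[k][0] for k in shared)
    query = mean(nosimd[k][1] / simd[k][1] for k in shared)
    return build, query

# Sample input: first two rows of each Lucene run from the logs below.
LUCENE_NOSIMD = """\
HNSW   M=16 ef=160: top 100/1 recall 0.7093, build 27.70s, query 1.37s. 40402092 nodes visited
HNSW   M=16 ef=160: top 100/2 recall 0.7725, build 27.70s, query 1.85s. 68021786 nodes visited
"""
LUCENE_SIMD = """\
HNSW   M=16 ef=160: top 100/1 recall 0.7085, build 20.54s, query 1.04s. 37141336 nodes visited
HNSW   M=16 ef=160: top 100/2 recall 0.7716, build 20.54s, query 1.34s. 64743348 nodes visited
"""

build_ratio, query_ratio = mean_ratios(LUCENE_NOSIMD, LUCENE_SIMD)
print(f"build nosimd:simd = {build_ratio:.2f}, query nosimd:simd = {query_ratio:.2f}")
```

Pasting the full log of each run into the two strings gives the averages over all configurations; the same regex also matches the JVector lines, since they differ only in the `Index` prefix.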

Lucene without simd:

hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
HNSW   M=16 ef=160: top 100/1 recall 0.7093, build 27.70s, query 1.37s. 40402092 nodes visited
HNSW   M=16 ef=160: top 100/2 recall 0.7725, build 27.70s, query 1.85s. 68021786 nodes visited
HNSW   M=16 ef=200: top 100/1 recall 0.7327, build 34.89s, query 1.11s. 39631050 nodes visited
HNSW   M=16 ef=200: top 100/2 recall 0.7865, build 34.89s, query 1.85s. 68973674 nodes visited
HNSW   M=16 ef=400: top 100/1 recall 0.7640, build 63.68s, query 1.23s. 44485000 nodes visited
HNSW   M=16 ef=400: top 100/2 recall 0.8114, build 63.68s, query 2.05s. 77989640 nodes visited
HNSW   M=16 ef=600: top 100/1 recall 0.7724, build 91.02s, query 1.26s. 46114612 nodes visited
HNSW   M=16 ef=600: top 100/2 recall 0.8183, build 91.02s, query 2.17s. 81109964 nodes visited
HNSW   M=16 ef=800: top 100/1 recall 0.7748, build 116.84s, query 1.30s. 46717632 nodes visited
HNSW   M=16 ef=800: top 100/2 recall 0.8211, build 116.84s, query 2.21s. 82387410 nodes visited
HNSW   M=24 ef=160: top 100/1 recall 0.7511, build 45.21s, query 1.30s. 48685892 nodes visited
HNSW   M=24 ef=160: top 100/2 recall 0.8062, build 45.21s, query 2.29s. 86242262 nodes visited
HNSW   M=24 ef=200: top 100/1 recall 0.7721, build 53.83s, query 1.41s. 52491700 nodes visited
HNSW   M=24 ef=200: top 100/2 recall 0.8220, build 53.83s, query 2.49s. 93662468 nodes visited
HNSW   M=24 ef=400: top 100/1 recall 0.8072, build 95.76s, query 1.70s. 63279642 nodes visited
HNSW   M=24 ef=400: top 100/2 recall 0.8522, build 95.76s, query 2.95s. 113996650 nodes visited
HNSW   M=24 ef=600: top 100/1 recall 0.8167, build 134.56s, query 1.78s. 67251422 nodes visited
HNSW   M=24 ef=600: top 100/2 recall 0.8603, build 134.56s, query 3.11s. 121447002 nodes visited

Lucene with simd:

hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
HNSW   M=16 ef=160: top 100/1 recall 0.7085, build 20.54s, query 1.04s. 37141336 nodes visited
HNSW   M=16 ef=160: top 100/2 recall 0.7716, build 20.54s, query 1.34s. 64743348 nodes visited
HNSW   M=16 ef=200: top 100/1 recall 0.7334, build 25.56s, query 0.86s. 39514380 nodes visited
HNSW   M=16 ef=200: top 100/2 recall 0.7869, build 25.56s, query 1.46s. 68945092 nodes visited
HNSW   M=16 ef=400: top 100/1 recall 0.7638, build 47.40s, query 0.95s. 44509010 nodes visited
HNSW   M=16 ef=400: top 100/2 recall 0.8112, build 47.40s, query 1.62s. 78002356 nodes visited
HNSW   M=16 ef=600: top 100/1 recall 0.7722, build 68.20s, query 0.99s. 46143760 nodes visited
HNSW   M=16 ef=600: top 100/2 recall 0.8181, build 68.20s, query 1.64s. 81123014 nodes visited
HNSW   M=16 ef=800: top 100/1 recall 0.7745, build 88.20s, query 0.99s. 46701082 nodes visited
HNSW   M=16 ef=800: top 100/2 recall 0.8211, build 88.20s, query 1.66s. 82386926 nodes visited
HNSW   M=24 ef=160: top 100/1 recall 0.7524, build 29.63s, query 1.04s. 48835936 nodes visited
HNSW   M=24 ef=160: top 100/2 recall 0.8060, build 29.63s, query 1.75s. 86302652 nodes visited
HNSW   M=24 ef=200: top 100/1 recall 0.7725, build 36.51s, query 1.09s. 52468796 nodes visited
HNSW   M=24 ef=200: top 100/2 recall 0.8223, build 36.51s, query 1.89s. 93635118 nodes visited
HNSW   M=24 ef=400: top 100/1 recall 0.8071, build 68.52s, query 1.30s. 63247954 nodes visited
HNSW   M=24 ef=400: top 100/2 recall 0.8521, build 68.52s, query 2.24s. 114010318 nodes visited
HNSW   M=24 ef=600: top 100/1 recall 0.8168, build 98.29s, query 1.37s. 67252014 nodes visited
HNSW   M=24 ef=600: top 100/2 recall 0.8603, build 98.29s, query 2.39s. 121441292 nodes visited

JVector without simd:

hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
Index   M=16 ef=160: top 100/1 recall 0.7140, build 30.20s, query 4.91s. 185528320 nodes visited
Index   M=16 ef=160: top 100/2 recall 0.7720, build 30.20s, query 8.29s. 323914800 nodes visited
Index   M=16 ef=200: top 100/1 recall 0.7241, build 38.65s, query 5.20s. 207153550 nodes visited
Index   M=16 ef=200: top 100/2 recall 0.7853, build 38.65s, query 9.00s. 355127830 nodes visited
Index   M=16 ef=400: top 100/1 recall 0.7622, build 70.34s, query 5.82s. 229773510 nodes visited
Index   M=16 ef=400: top 100/2 recall 0.8115, build 70.34s, query 10.01s. 397866900 nodes visited
Index   M=16 ef=600: top 100/1 recall 0.7718, build 98.29s, query 6.02s. 237238330 nodes visited
Index   M=16 ef=600: top 100/2 recall 0.8192, build 98.29s, query 10.40s. 412561070 nodes visited
Index   M=16 ef=800: top 100/1 recall 0.7749, build 126.83s, query 6.09s. 240236370 nodes visited
Index   M=16 ef=800: top 100/2 recall 0.8224, build 126.83s, query 10.54s. 419315380 nodes visited
Index   M=24 ef=160: top 100/1 recall 0.7473, build 46.57s, query 6.30s. 250193170 nodes visited
Index   M=24 ef=160: top 100/2 recall 0.8051, build 46.57s, query 11.04s. 438624360 nodes visited
Index   M=24 ef=200: top 100/1 recall 0.7700, build 57.13s, query 6.76s. 268582770 nodes visited
Index   M=24 ef=200: top 100/2 recall 0.8220, build 57.13s, query 11.84s. 475761230 nodes visited
Index   M=24 ef=400: top 100/1 recall 0.8069, build 102.66s, query 8.10s. 324259460 nodes visited
Index   M=24 ef=400: top 100/2 recall 0.8512, build 102.66s, query 14.42s. 578344450 nodes visited
Index   M=24 ef=600: top 100/1 recall 0.8167, build 143.75s, query 8.46s. 341072130 nodes visited
Index   M=24 ef=600: top 100/2 recall 0.8598, build 143.75s, query 15.14s. 611984800 nodes visited

JVector with simd:

hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
Index   M=16 ef=160: top 100/1 recall 0.7127, build 19.85s, query 3.70s. 187676330 nodes visited
Index   M=16 ef=160: top 100/2 recall 0.7728, build 19.85s, query 6.26s. 326377860 nodes visited
Index   M=16 ef=200: top 100/1 recall 0.7232, build 25.49s, query 3.90s. 206051240 nodes visited
Index   M=16 ef=200: top 100/2 recall 0.7839, build 25.49s, query 6.76s. 354166730 nodes visited
Index   M=16 ef=400: top 100/1 recall 0.7620, build 47.78s, query 4.32s. 228629970 nodes visited
Index   M=16 ef=400: top 100/2 recall 0.8105, build 47.78s, query 7.47s. 396610530 nodes visited
Index   M=16 ef=600: top 100/1 recall 0.7715, build 69.82s, query 4.44s. 236785010 nodes visited
Index   M=16 ef=600: top 100/2 recall 0.8187, build 69.82s, query 7.74s. 412374650 nodes visited
Index   M=16 ef=800: top 100/1 recall 0.7759, build 90.16s, query 4.50s. 239333220 nodes visited
Index   M=16 ef=800: top 100/2 recall 0.8222, build 90.16s, query 7.85s. 418157600 nodes visited
Index   M=24 ef=160: top 100/1 recall 0.7464, build 29.46s, query 4.67s. 252084040 nodes visited
Index   M=24 ef=160: top 100/2 recall 0.8061, build 29.46s, query 8.26s. 441173410 nodes visited
Index   M=24 ef=200: top 100/1 recall 0.7694, build 35.74s, query 5.04s. 271780770 nodes visited
Index   M=24 ef=200: top 100/2 recall 0.8218, build 35.74s, query 8.95s. 478582210 nodes visited
Index   M=24 ef=400: top 100/1 recall 0.8082, build 69.16s, query 5.98s. 323595720 nodes visited
Index   M=24 ef=400: top 100/2 recall 0.8521, build 69.16s, query 10.64s. 577337550 nodes visited
Index   M=24 ef=600: top 100/1 recall 0.8166, build 99.86s, query 6.29s. 341908910 nodes visited
Index   M=24 ef=600: top 100/2 recall 0.8596, build 99.86s, query 11.29s. 613498820 nodes visited

from jvector.
