
pynear's Introduction

Personal Projects

  • gpt-pdf-organizer: an automatic PDF file organizer based on LLM API prompts, such as ChatGPT.
  • pynear: an optimized C++ library with Python bindings for KNN search.
  • spline surface 2d: an efficient implementation of spline surfaces using Coons patches for interpolating grid coordinates.
  • big-o-estimator: a trivial implementation of a time complexity estimator for algorithms.
  • offscreen renderer: template code for RTT (Render To Texture) using an offscreen framebuffer object.

Data

  • knowledge base: knowledge base repository for several subjects such as math and programming.


pynear's People

Contributors

dobatymo, pablocael


Forkers

dobatymo

pynear's Issues

PyPI package

Since we are already building wheels on CI, it would be easy to upload them to PyPI automatically through GitHub Actions as well. We need to agree on a name, though. Should we use pyvptree (it's still available on PyPI: https://pypi.org/project/pyvptree/) or another name, perhaps something more generic related to nearest neighbor search?

Improve Binary Indices performance

We can make use of the fact that binary distances have a limited number of states and try something similar to a BK-tree, or keep a map from possible pairs to distances, to speed up binary index calculation.
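One possible reading of the "map between possible pairs and distances" idea is a per-byte lookup table: the XOR of two bytes has only 256 possible values, so the number of differing bits can be precomputed once. The sketch below is only an illustration under that assumption. `make_popcount_table` and `hamming_lookup` are hypothetical names, not part of pynear, and a table lookup is not guaranteed to beat a hardware popcount instruction.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of set bits for every possible byte value.
std::array<std::uint8_t, 256> make_popcount_table() {
    std::array<std::uint8_t, 256> table{};
    for (int value = 0; value < 256; ++value) {
        int bits = 0;
        for (int v = value; v != 0; v >>= 1) {
            bits += v & 1;
        }
        table[static_cast<std::size_t>(value)] = static_cast<std::uint8_t>(bits);
    }
    return table;
}

// Hamming distance between two equally sized byte vectors via table lookup.
std::size_t hamming_lookup(const std::vector<std::uint8_t> &a,
                           const std::vector<std::uint8_t> &b) {
    // Built once, on first use.
    static const std::array<std::uint8_t, 256> table = make_popcount_table();
    assert(a.size() == b.size());
    std::size_t distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // XOR leaves a 1 bit exactly where the two bytes differ.
        distance += table[static_cast<std::size_t>(a[i] ^ b[i])];
    }
    return distance;
}
```

The same bounded range of distances (at most the number of bits) is what would make a BK-tree style bucketing by distance feasible, but that is not shown here.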

Feature request: more distance metrics

It would be great to support more distance metrics, especially some which cannot be emulated by pre/post-processing of the data.

I would also like to see the Haversine metric (https://en.wikipedia.org/wiki/Haversine_formula). It can be emulated by first transforming spherical latitude/longitude coordinates to Cartesian coordinates, but this transform is a bit cumbersome and increases memory usage from 2 to 3 dimensions, so it would be useful to support the metric directly. It's useful in GPS applications.

I can help implement them, but my SIMD/AVX knowledge is very limited.
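To make the trade-off concrete, here is a minimal scalar sketch (no SIMD) of both approaches: the haversine formula applied directly to latitude/longitude in radians, and the Cartesian emulation, where each point becomes a 3D unit vector and the Euclidean chord length is converted back to an arc. All names (`Vec3`, `haversine`, `to_unit_vector`, `chord_length`, `chord_to_arc`) are hypothetical and not part of pynear.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical 3D point type for the Cartesian emulation.
struct Vec3 {
    double x, y, z;
};

// Great-circle distance between two points given in radians,
// on a sphere of the given radius (haversine formula).
double haversine(double lat1, double lon1, double lat2, double lon2,
                 double radius = 1.0) {
    const double s_lat = std::sin((lat2 - lat1) / 2.0);
    const double s_lon = std::sin((lon2 - lon1) / 2.0);
    const double h = s_lat * s_lat + std::cos(lat1) * std::cos(lat2) * s_lon * s_lon;
    // Clamp to guard against rounding pushing h slightly above 1.
    return 2.0 * radius * std::asin(std::sqrt(std::min(1.0, h)));
}

// Emulation step: map latitude/longitude (radians) to a point on the unit sphere.
Vec3 to_unit_vector(double lat, double lon) {
    return Vec3{std::cos(lat) * std::cos(lon), std::cos(lat) * std::sin(lon), std::sin(lat)};
}

// Plain Euclidean distance between two 3D points (the chord through the sphere).
double chord_length(const Vec3 &a, const Vec3 &b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Convert a unit-sphere chord length back to the central angle (arc length).
double chord_to_arc(double chord) {
    return 2.0 * std::asin(std::min(1.0, chord / 2.0));
}
```

Because the chord length grows monotonically with the arc length, a Euclidean nearest-neighbor search over the unit vectors returns the same ranking as haversine. The cost is the third coordinate per point and the conversion at the boundary.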

Update installation instructions to use the PyPI repo

The README install instructions point to a local package build, run with pip install .

We can add a build section containing the current instructions and a separate install section with the PyPI instructions.

Squared Euclidean is not a metric

Hi, great library!

But I think it's incorrect to use the squared Euclidean optimization, since VP-trees require a true metric, and I think the squared distance is not one: it does not satisfy the triangle inequality.
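The concern can be checked with three collinear points, 0, 1 and 2 on a line. The sketch below is self-contained and does not use any pynear code; `Point`, `squared_euclidean`, `euclidean` and `triangle_inequality_holds` are names chosen for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Sum of squared coordinate differences (no square root).
double squared_euclidean(const Point &a, const Point &b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// True Euclidean distance.
double euclidean(const Point &a, const Point &b) {
    return std::sqrt(squared_euclidean(a, b));
}

// Checks d(a, c) <= d(a, b) + d(b, c) for one triple of points.
template <typename Distance>
bool triangle_inequality_holds(Distance d, const Point &a, const Point &b, const Point &c) {
    return d(a, c) <= d(a, b) + d(b, c);
}
```

For a = {0}, b = {1}, c = {2}, the squared distance gives 4 on the left and 1 + 1 = 2 on the right, so the inequality fails, while the true Euclidean distance gives 2 <= 2. Squaring is monotonic, so it preserves the ordering of distances, but any pruning rule derived from the triangle inequality needs true distances.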

Fix Binary indices number of bits

Since binary indices must be used with proper data dimension, we have two options:

  1. Select the proper implementation based on the dimension
  2. Name indices for each size, like BinaryIndex256 and BinaryIndex64, and assert that dimension * 8 == size

This is because Hamming distances are optimized for a specific number of bits.

Right now we use a fixed 256 bits for Hamming:

// The bit width is hard-coded to 256, regardless of the actual data dimension.
int64_t dist_hamming(const arrayli &p1, const arrayli &p2) {
    return hamming<256>(reinterpret_cast<const uint64_t *>(&p1[0]), reinterpret_cast<const uint64_t *>(&p2[0]));
}

So if users set the wrong data dimension, the distance computation will fail.
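Option 1 could look roughly like the sketch below, which picks a template instantiation from the runtime dimension and rejects unsupported sizes instead of silently reading 256 bits. This is an assumption-laden illustration: `hamming_bits` is a hypothetical stand-in for the library's `hamming<N>` template (whose implementation is not shown in this issue), and `dist_hamming_checked` and the byte-vector argument type are invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for hamming<Bits>: counts differing bits, 64 at a time.
template <std::size_t Bits>
std::int64_t hamming_bits(const std::uint8_t *a, const std::uint8_t *b) {
    static_assert(Bits % 64 == 0, "Bits must be a multiple of 64");
    std::int64_t distance = 0;
    for (std::size_t word = 0; word < Bits / 64; ++word) {
        std::uint64_t x = 0, y = 0;
        // memcpy avoids unaligned reads through a reinterpret_cast pointer.
        std::memcpy(&x, a + 8 * word, sizeof(x));
        std::memcpy(&y, b + 8 * word, sizeof(y));
        std::uint64_t diff = x ^ y;
        while (diff != 0) {
            diff &= diff - 1;  // clear the lowest set bit
            ++distance;
        }
    }
    return distance;
}

// Option 1: select the implementation from the data dimension (in bytes).
std::int64_t dist_hamming_checked(const std::vector<std::uint8_t> &p1,
                                  const std::vector<std::uint8_t> &p2) {
    if (p1.size() != p2.size()) {
        throw std::invalid_argument("dimension mismatch");
    }
    switch (p1.size() * 8) {
        case 64:  return hamming_bits<64>(p1.data(), p2.data());
        case 128: return hamming_bits<128>(p1.data(), p2.data());
        case 256: return hamming_bits<256>(p1.data(), p2.data());
        default:  throw std::invalid_argument("unsupported binary dimension");
    }
}
```

Option 2 would move the same check into each named index type, so that the size is fixed at compile time and only the dimension * 8 == size assertion runs when data is added.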
