Top-K Entity Resolution for Apache Spark. The algorithm is described in the paper "Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing" by Vasilis Verroios and Hector Garcia-Molina of Stanford University. Some of the Adaptive LSH code is based on the pyspark-lsh project, an implementation of the classic LSH technique.
ajaykumarr123 / pyspark-adalsh
This project is forked from dr-pato/pyspark-adalsh.
PySpark implementation of Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing
License: GNU General Public License v3.0
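
For readers unfamiliar with the classic LSH technique mentioned above, the following is a minimal sketch of MinHash LSH with banding, written in plain Python without Spark. It is not taken from this repository or from pyspark-lsh, and it does not implement the adaptive variant from the paper. All function names and parameters (`shingles`, `minhash_signature`, `candidate_pairs`, `num_hashes`, `bands`) are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a lowercased string."""
    text = text.lower()
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _seeded_hash(seed, token):
    """Deterministic 64-bit hash of a token for a given seed (illustrative)."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def minhash_signature(tokens, num_hashes=32):
    """One minimum per seeded hash function over the token set."""
    return [min(_seeded_hash(seed, t) for t in tokens)
            for seed in range(num_hashes)]


def candidate_pairs(records, num_hashes=32, bands=8):
    """Group records whose signatures agree on at least one full band."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for rec_id, text in records.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(rec_id)
    pairs = set()
    for ids in buckets.values():
        # Every pair sharing a bucket becomes a candidate for comparison.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    records = {1: "John Smith", 2: "John Smith", 3: "Jon Smith", 4: "Maria Rossi"}
    print(candidate_pairs(records))
```

Records with similar shingle sets are likely to share at least one band and so end up as candidate pairs, while dissimilar records rarely collide. Using more bands with fewer rows each raises recall at the cost of more candidates to verify.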