Git Product home page Git Product logo

consistent-weighted-sampling's Introduction

Improved Consistent Weighted Sampling

As datasets become larger and more high-dimensional, it becomes increasingly important to find data representations that allow compact storage and efficient distance computation and retrieval. Improved Consistent Weighted Sampling(ICWS, ICDM, 2010) is the state-of-the-art of the sampling methods in weighted sets. There are some variants of ICWS which decreases the time of sampling or increase the quality of samples.

This repository implements ICWS and its variants.

Build Instruction

$ git clone [email protected]:yuuun/Consistent-Weighted-Sampling.git
$ cd Consistent-Weighted-Sampling
$ pip3 install -r requirements.txt
$ mkdir dataset

After Downloading datasets to 'dataset/' file, you can run the file by the following command

$ python3 main.py

Datasets

  • Datasets can be downloaded in LIBSVM

Input File Data Formats

[label1] [idx1_1]:[weight1_1] [idx1_2]:[weight1_2] ...
[label2] [idx2_1]:[weight2_1] [idx2_2]:[weight2_2] ...
[label3] [idx3_1]:[weight3_1] [idx3_2]:[weight3_2] ...

Example

1 1:5 3:7 10:9
2 2:6 3:1 8:10 

Models

  • ICWS
    • state of the CWS method that improves the effectiveness and efficiency
  • 0-bit CWS
    • improves the space efficiency and GJS estimation time
  • CCWS
    • improves the time efficiency of ICWS by removing the cacluation of logarithm
  • PCWS
    • replace one of the two gamma distributions with uniform distribution which improves both time and space complexities in ICWS
  • I2CWS
    • hashes yk and zk separately which makes k' and yk' dependent
  • BCWS
    • applies OPH on CWS and increased the time efficiency dramatically

Experiments

The classification accuracy of Generalized Jaccard Similarity

Classification Accuracy
Mnist 93.6%

Evaluation

  • Hashing Time

  • Classification Accuracy

  • Precision(the criteria of Jaccard Similarity)

  • The experiments will be depicted as follows,

[Name of Method] | [Dataset]([Number of Samples])

Time

Method Mnist(100) Mnist(500)
ICWS 175.4s 869.6s
0-bit CWS 169.3s 842.1s
CCWS 129.2s 647.4s
PCWS 154.4s 767.0s
I2CWS 162.3s 805.7s
BCWS 6.3s 17.9s

Accuracy

Method Mnist(100) Mnist(500)
ICWS 88.7% 92.6%
0-bit CWS 90.7% 94.1%
CCWS 79.3% 81.8%
PCWS 90.6% 93.2%
I2CWS 89.7% 92.5%
BCWS 84.3% 84.4%

Precision@10

Method Mnist(100) Mnist(500)
ICWS 59.9% 80.4%
0-bit CWS 61.6% 80.5%
CCWS 35.43% 40.8%
PCWS 60.9% 79.5%
I2CWS 59.2% 78.8%
BCWS 46.4% 46.4%

Precision@100

Method Mnist(100) Mnist(500)
ICWS 71.1% 86.8%
0-bit CWS 71.8% 86.3%
CCWS 48.7% 54.8%
PCWS 70.5% 85.8%
I2CWS 69.9% 84.8%
BCWS 57.5% 52.8%

References

consistent-weighted-sampling's People

Contributors

yuuun avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.