
multimedia-indexing's People

Contributors

kleinmind, lefman, manosetro, markzampoglou, tzeikob

multimedia-indexing's Issues

Docker Container

Hey guys,

Your project looks great, but I was wondering if there is a docker-compose file or Dockerfile for multimedia-indexing, or for social sensor?

Thanks in advance.

Cheers,
Luc Michalski

Berkeley DB is too slow compared to recent DB engines

Symas's benchmarks show that their LMDB is much faster than Berkeley DB, while InfluxDB's benchmarks found that Facebook's RocksDB performed best overall.

Facebook RocksDB officially supports Java.
https://github.com/facebook/rocksdb/tree/master/java
https://github.com/facebook/rocksdb/wiki/RocksJava-Basics

<dependency>
    <groupId>org.rocksdb</groupId>
    <artifactId>rocksdbjni</artifactId>
    <version>3.9.1</version>
</dependency>

Benchmarking LevelDB vs. RocksDB vs. HyperLevelDB vs. LMDB Performance for InfluxDB

RocksDB Performance Benchmarks

Symas On-Disk Microbenchmark

Symas In-Memory Microbenchmark

Benchmark charts (images not preserved):

  • On disk, 4000 byte values, Physical Server
  • In memory, larger data set
  • In memory, small data set

Sum of within cluster distances is always 0.0

Computing the sum of within-cluster distances takes a long time, and the value is always displayed as 0.0. Disabling the computation would speed up the learning process.

Thu Apr 23 10:40:05 CST 2015: Iter 1 Sum of within cluster distances: 0.0
Thu Apr 23 10:41:06 CST 2015: Iter 2 Sum of within cluster distances: 0.0
Thu Apr 23 10:41:39 CST 2015: Iter 3 Sum of within cluster distances: 0.0
Thu Apr 23 10:42:11 CST 2015: Iter 4 Sum of within cluster distances: 0.0
Thu Apr 23 10:42:40 CST 2015: Iter 5 Sum of within cluster distances: 0.0
Thu Apr 23 10:43:05 CST 2015: Iter 6 Sum of within cluster distances: 0.0
Thu Apr 23 10:43:28 CST 2015: Iter 7 Sum of within cluster distances: 0.0
Thu Apr 23 10:43:50 CST 2015: Iter 8 Sum of within cluster distances: 0.0
Thu Apr 23 10:44:11 CST 2015: Iter 9 Sum of within cluster distances: 0.0
Thu Apr 23 10:44:31 CST 2015: Iter 10 Sum of within cluster distances: 0.0
Thu Apr 23 10:44:52 CST 2015: Iter 11 Sum of within cluster distances: 0.0
Thu Apr 23 10:45:11 CST 2015: Iter 12 Sum of within cluster distances: 0.0
Thu Apr 23 10:45:32 CST 2015: Iter 13 Sum of within cluster distances: 0.0
Thu Apr 23 10:45:52 CST 2015: Iter 14 Sum of within cluster distances: 0.0
Thu Apr 23 10:46:13 CST 2015: Iter 15 Sum of within cluster distances: 0.0
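
For reference, here is a minimal sketch of what this quantity is expected to be, so that the 0.0 in the log can be sanity-checked. It is not the project's implementation: the class and method names are hypothetical, and it assumes squared Euclidean distance and an explicit point-to-centroid assignment array. Under those assumptions the sum is 0.0 only if every point coincides with its centroid.

```java
final class WithinClusterDistance {
    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Sums, over all points, the squared distance to the assigned centroid.
    // assignment[p] is the index of the centroid that point p belongs to.
    static double sumWithinCluster(double[][] points, double[][] centroids, int[] assignment) {
        double total = 0.0;
        for (int p = 0; p < points.length; p++) {
            total += squaredDistance(points[p], centroids[assignment[p]]);
        }
        return total;
    }
}
```

Running this on a small sample of the training vectors and the learned centroids would show whether the logged value is a display problem or a real result.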

Kryo could be used for much easier and more efficient serialization

Spark, the highly popular in-memory big data computation platform, uses Kryo as its faster serialization library. Besides speed, the major benefit is that complex models become easy to manage: a multi-dimensional array or a nested Java object graph can be saved to and loaded from a single file. I have successfully used Kryo to serialize other algorithms' models, one of which included a few hundred smaller sub-models.

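import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;

public final class ModelUtils {
  // Writes the whole object graph of the model to a single file.
  public static <T> void save(T model, String path) throws FileNotFoundException {
    Kryo kryo = new Kryo();
    // try-with-resources flushes and closes the output even if writing fails
    try (Output output = new Output(new FileOutputStream(path))) {
      kryo.writeObject(output, model);
    }
  }

  // Reads a model of the given class back from the file.
  public static <T> T load(String path, Class<T> classT) throws FileNotFoundException {
    Kryo kryo = new Kryo();
    try (Input input = new Input(new FileInputStream(path))) {
      return kryo.readObject(input, classT);
    }
  }
}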

With Kryo, it is no longer necessary to pass in the number of centroids and the centroid length. All the information lives with the data in the self-contained model file, so users don't have to remember how big each of their many models is.

public static double[][][] readQuantizers(String[] filenames, int[] numCentroids, int centroidLength)
            throws IOException {
        int numQuantizers = filenames.length;
        double[][][] quantizers = new double[numQuantizers][][];
        for (int i = 0; i < numQuantizers; i++) {
            quantizers[i] = AbstractFeatureAggregator.readQuantizer(filenames[i], numCentroids[i],
                    centroidLength);
        }
        return quantizers;

    }

becomes

ProductQuantizer quantizer = ModelUtils.load(productQuantizationFilePath, ProductQuantizer.class);

class ProductQuantizer {
    double[][][] data;
    int[] numCentroids;
    int[] centroidLengths;

    // empty (no-argument) constructor, used by Kryo to instantiate the class
    ProductQuantizer() {
    }

    // constructor generated from the fields using Eclipse Source generation

    // getters and setters generated from the fields using Eclipse Source generation
}

Optimized Product Quantization and Locally Optimized Product Quantization

Two important improvements to Product Quantization (PQ) appeared in 2013. The non-parametric solution of Optimized Product Quantization (OPQ) [2] is equivalent to Cartesian k-means [1] and performs better than PQ. In 2014, Locally Optimized Product Quantization (LOPQ) [3] further improved upon OPQ.

Figure: SIFT1B with 64-bit codes, K = 2^13 = 8192 and w = 64. For Multi-D-ADC, K = 2^14 and T = 100K.

Figure: SIFT1B with 128-bit codes and K = 2^13 = 8192 (resp. K = 2^14) for single index (resp. multi-index). For IVFADC+R and LOPQ+R, m′ = 8, w = 64.

  1. M. Norouzi, D. J. Fleet. Cartesian k-means. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
  2. T. Ge, K. He, Q. Ke, J. Sun. Optimized Product Quantization. In IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
  3. Y. Kalantidis, Y. Avrithis. Locally Optimized Product Quantization for Approximate Nearest Neighbor Search. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, Ohio, June 2014.

A major bug found in PCA when calculating the means adjustment vector

Each component of the means vector should be the average of the corresponding column of the matrix A. Instead, we end up with a vector in which each component equals the corresponding component of the last row divided by the number of samples. See more in the class at line 146.
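
The expected behavior can be sketched as follows. This is an illustration only, not the project's PCA code: the class and method names are hypothetical, and the matrix is assumed to be stored row-major with one sample per row. The comment on the accumulation line points at one way the described symptom could arise (overwriting instead of accumulating), which is a guess rather than a confirmed diagnosis.

```java
final class ColumnMeans {
    // Returns the mean of each column of a row-major matrix (one sample per row).
    static double[] columnMeans(double[][] samples) {
        int numSamples = samples.length;
        int dim = samples[0].length;
        double[] means = new double[dim];
        for (double[] row : samples) {
            for (int j = 0; j < dim; j++) {
                // Accumulate with '+='. Using '=' here would leave only the
                // last row, which after division matches the reported symptom.
                means[j] += row[j];
            }
        }
        for (int j = 0; j < dim; j++) {
            means[j] /= numSamples;
        }
        return means;
    }
}
```

For the three samples (1, 2), (3, 4) and (5, 6), the column means are (3, 4), whereas the buggy behavior described above would give (5/3, 2).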

Google Guava MinMaxPriorityQueue is a faster bounded priority queue with excellent Javadoc

SPARK-1321 Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation
"On my simple test of sorting 100 million (Int, Int) tuples using Spark, Guava's top k implementation (in Ordering) is much faster than the BoundedPriorityQueue implementation for roughly sorted input (10 - 20X faster), and still faster for purely random input (2 - 5X)."

Selecting top k items from a list efficiently in Java / Groovy
PriorityQueue: 300ms
Guava Ordering: 170ms

Google Guava MinMaxPriorityQueue
"A min-max priority queue can be configured with a maximum size. If so, each time the size of the queue exceeds that value, the queue automatically removes its greatest element according to its comparator (which might be the element that was just added). This is different from conventional bounded queues, which either block or reject new elements when full."
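
To make the eviction rule in the quote concrete, here is a small sketch of the same semantics built on the standard library's java.util.PriorityQueue. It is not Guava's MinMaxPriorityQueue and says nothing about its performance; the class and method names are hypothetical. It keeps the k smallest elements, for example the k smallest distances in a nearest-neighbor search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class BoundedTopK {
    // Returns the k smallest items according to the comparator, in ascending order.
    static <T> List<T> smallestK(Iterable<T> items, int k, Comparator<T> comparator) {
        // Reversed ordering turns the heap into a max-heap, so the head is
        // always the greatest element currently retained.
        PriorityQueue<T> heap = new PriorityQueue<>(k + 1, Collections.reverseOrder(comparator));
        for (T item : items) {
            heap.add(item);
            if (heap.size() > k) {
                // Evict the greatest element, which may be the one just added.
                heap.poll();
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(comparator);
        return result;
    }
}
```

The queue never holds more than k + 1 elements, so memory stays bounded regardless of the input size.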

The image names indexed in FolderIndexingMT may be incorrect

In gr.iti.mklab.visual.examples.FolderIndexingMT, the image names are incorrect if the original filenames contain more than one dot or don't end with ".jpg".

 if (imvr != null) {
        String name = imvr.getImageName();
        name = name.split("\\.")[0] + ".jpg";
        if (imvr.getExceptionMessage() == null) {
          // vectorization completed with success!s
          double[] vector = imvr.getImageVector();
          if (index.indexVector(name, vector)) {

In gr.iti.mklab.visual.examples.UrlIndexingMT, the image names are kept as they are.

            if (imvr != null) {
                String name = imvr.getImageName();
                double[] vector = imvr.getImageVector();
                if (index.indexVector(name, vector)) {
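
A small sketch of the difference for FolderIndexingMT: splitting on every dot keeps only the text before the first dot, while cutting at the last dot removes just the extension. The helper class and its method names are hypothetical and not part of the project. Whether the index should store the name with its original extension, as UrlIndexingMT does, is a separate decision.

```java
final class ImageNames {
    // Mirrors the current FolderIndexingMT logic: everything after the
    // first dot is dropped and ".jpg" is appended unconditionally.
    static String currentName(String fileName) {
        return fileName.split("\\.")[0] + ".jpg";
    }

    // Removes only the last extension and keeps earlier dots in the name.
    static String stripExtension(String fileName) {
        int lastDot = fileName.lastIndexOf('.');
        // lastDot <= 0 means no dot at all, or only a leading dot as in ".hidden".
        return lastDot > 0 ? fileName.substring(0, lastDot) : fileName;
    }
}
```

For a file named photo.2015.04.png, currentName gives photo.jpg, while stripExtension gives photo.2015.04.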
