mklab-iti / multimedia-indexing
A framework for large-scale feature extraction, indexing and retrieval.
License: Apache License 2.0
The classes in the gr.iti.mklab.visual.quantization package train the various models used by the classes in the gr.iti.mklab.visual.datastructures package. This makes it hard to build a single data pipeline that trains and then tests the same complete search structure.
Hey guys,
Your project looks great, but I was wondering if there is a docker-compose file or Dockerfile for multimedia-indexing, or for social sensor?
Thanks in advance.
Cheers,
Luc Michalski
Symas's benchmarks showed that their LMDB is much faster than Berkeley DB, while InfluxDB's benchmarks found that Facebook's RocksDB performed best overall.
Facebook RocksDB officially supports Java.
https://github.com/facebook/rocksdb/tree/master/java
https://github.com/facebook/rocksdb/wiki/RocksJava-Basics
<dependency>
<groupId>org.rocksdb</groupId>
<artifactId>rocksdbjni</artifactId>
<version>3.9.1</version>
</dependency>
Benchmarking LevelDB vs. RocksDB vs. HyperLevelDB vs. LMDB Performance for InfluxDB
RocksDB Performance Benchmarks
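RocksJava stores keys and values as raw byte arrays (for example `RocksDB.put(byte[] key, byte[] value)`), so using RocksDB as the backing store for an index would require encoding each `double[]` vector to bytes. The following is a minimal, stdlib-only sketch of such an encoding under that assumption; the class `VectorCodec` is hypothetical and not part of multimedia-indexing or RocksJava.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: converts index entries to the byte[] form a key-value store expects. */
final class VectorCodec {
    private VectorCodec() {}

    /** Encodes an image id (e.g. a file name) as a UTF-8 key. */
    static byte[] encodeKey(String id) {
        return id.getBytes(StandardCharsets.UTF_8);
    }

    /** Packs a vector into 8 bytes per component (big-endian, ByteBuffer's default order). */
    static byte[] encodeVector(double[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Double.BYTES);
        for (double x : vector) {
            buffer.putDouble(x);
        }
        return buffer.array();
    }

    /** Inverse of encodeVector; the vector length is recovered from the byte count. */
    static double[] decodeVector(byte[] bytes) {
        if (bytes.length % Double.BYTES != 0) {
            throw new IllegalArgumentException("length is not a multiple of 8: " + bytes.length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        double[] vector = new double[bytes.length / Double.BYTES];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = buffer.getDouble();
        }
        return vector;
    }
}
```

With RocksJava, a call such as `db.put(VectorCodec.encodeKey(name), VectorCodec.encodeVector(vector))` would then store one vector per image. Storing `float` components instead of `double` would halve the value size at the cost of precision.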
Computing the sum of within-cluster distances takes too much time, and the value always displays as 0.0. Disabling it would speed up the learning process.
Thu Apr 23 10:40:05 CST 2015: Iter 1 Sum of within cluster distances: 0.0
Thu Apr 23 10:41:06 CST 2015: Iter 2 Sum of within cluster distances: 0.0
Thu Apr 23 10:41:39 CST 2015: Iter 3 Sum of within cluster distances: 0.0
Thu Apr 23 10:42:11 CST 2015: Iter 4 Sum of within cluster distances: 0.0
Thu Apr 23 10:42:40 CST 2015: Iter 5 Sum of within cluster distances: 0.0
Thu Apr 23 10:43:05 CST 2015: Iter 6 Sum of within cluster distances: 0.0
Thu Apr 23 10:43:28 CST 2015: Iter 7 Sum of within cluster distances: 0.0
Thu Apr 23 10:43:50 CST 2015: Iter 8 Sum of within cluster distances: 0.0
Thu Apr 23 10:44:11 CST 2015: Iter 9 Sum of within cluster distances: 0.0
Thu Apr 23 10:44:31 CST 2015: Iter 10 Sum of within cluster distances: 0.0
Thu Apr 23 10:44:52 CST 2015: Iter 11 Sum of within cluster distances: 0.0
Thu Apr 23 10:45:11 CST 2015: Iter 12 Sum of within cluster distances: 0.0
Thu Apr 23 10:45:32 CST 2015: Iter 13 Sum of within cluster distances: 0.0
Thu Apr 23 10:45:52 CST 2015: Iter 14 Sum of within cluster distances: 0.0
Thu Apr 23 10:46:13 CST 2015: Iter 15 Sum of within cluster distances: 0.0
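An alternative to disabling the diagnostic is to accumulate it during the assignment step, where each point's distance to its nearest centroid has already been computed, so no extra pass over the data is needed. The sketch below shows one k-means assignment step doing this. It is a hypothetical illustration, not the library's clustering code, and it uses squared Euclidean distance (the usual k-means objective); whether the log above reports squared or plain distances is not stated.

```java
/** Hypothetical sketch of one k-means assignment step that also returns the objective. */
final class KMeansStep {
    private KMeansStep() {}

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Assigns every point to its nearest centroid (written into assignments) and returns
     * the sum of squared within-cluster distances, accumulated in the same loop.
     */
    static double assign(double[][] points, double[][] centroids, int[] assignments) {
        double withinClusterSum = 0.0;
        for (int p = 0; p < points.length; p++) {
            int best = 0;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double d = squaredDistance(points[p], centroids[c]);
                if (d < bestDistance) {
                    bestDistance = d;
                    best = c;
                }
            }
            assignments[p] = best;
            withinClusterSum += bestDistance; // reuses the distance already computed above
        }
        return withinClusterSum;
    }
}
```

Because the sum comes for free from the assignment loop, it can stay enabled; a correctly accumulated value is 0.0 only when every point coincides with its centroid.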
The highly popular in-memory big data computation platform Spark uses Kryo as its faster serialization library. Besides speed, the major benefit is the ability to manage complex models easily: a multi-dimensional array or a nested Java object graph can be saved to and loaded from a single file. I have successfully used Kryo to serialize other algorithms' models, one of which included a few hundred smaller sub-models.
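import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;

public final class ModelUtils {
    private static Kryo newKryo() {
        Kryo kryo = new Kryo();
        // Kryo 5.x requires class registration by default; older versions did not.
        kryo.setRegistrationRequired(false);
        return kryo;
    }

    public static <T> void save(T model, String path) throws FileNotFoundException {
        // try-with-resources flushes and closes the file even if serialization fails
        try (Output output = new Output(new FileOutputStream(path))) {
            newKryo().writeObject(output, model);
        }
    }

    public static <T> T load(String path, Class<T> classT) throws FileNotFoundException {
        try (Input input = new Input(new FileInputStream(path))) {
            return newKryo().readObject(input, classT);
        }
    }
}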
With Kryo, it is no longer necessary to pass in the number of centroids and the centroid length. All the information lives with the data in a self-contained model file, so users don't have to remember the sizes of their many models.
public static double[][][] readQuantizers(String[] filenames, int[] numCentroids, int centroidLength)
throws IOException {
int numQuantizers = filenames.length;
double[][][] quantizers = new double[numQuantizers][][];
for (int i = 0; i < numQuantizers; i++) {
quantizers[i] = AbstractFeatureAggregator.readQuantizer(filenames[i], numCentroids[i],
centroidLength);
}
return quantizers;
}
becomes
ProductQuantizer pq = ModelUtils.load(productQuantizationFilePath, ProductQuantizer.class);
class ProductQuantizer {
    double[][][] data;
    int[] numCentroids;
    int[] centroidLengths;
    // empty constructor for Kryo
    ProductQuantizer() {
    }
    // constructor generated from the fields using Eclipse Source generation
    // getters and setters generated from the fields using Eclipse Source generation
}
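The same self-describing idea can be illustrated without a Kryo dependency. The sketch below uses plain `java.io` data streams (not Kryo's format) and writes each quantizer's dimensions as a header before its values, so a reader recovers the number of centroids and the centroid length from the file itself. The class `QuantizerFile` and its methods are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical codec for a double[][][] quantizer set that stores its own shape. */
final class QuantizerFile {
    private QuantizerFile() {}

    static void write(double[][][] quantizers, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(quantizers.length);                  // number of quantizers
        for (double[][] quantizer : quantizers) {
            int centroidLength = quantizer.length == 0 ? 0 : quantizer[0].length;
            data.writeInt(quantizer.length);               // numCentroids for this quantizer
            data.writeInt(centroidLength);                 // length of every centroid
            for (double[] centroid : quantizer) {
                if (centroid.length != centroidLength) {
                    throw new IllegalArgumentException("all centroids must have the same length");
                }
                for (double x : centroid) {
                    data.writeDouble(x);
                }
            }
        }
        data.flush();
    }

    static double[][][] read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        double[][][] quantizers = new double[data.readInt()][][];
        for (int q = 0; q < quantizers.length; q++) {
            int numCentroids = data.readInt();
            int centroidLength = data.readInt();
            quantizers[q] = new double[numCentroids][centroidLength];
            for (double[] centroid : quantizers[q]) {
                for (int i = 0; i < centroidLength; i++) {
                    centroid[i] = data.readDouble();
                }
            }
        }
        return quantizers;
    }
}
```

A caller then needs only the stream, not `numCentroids` or `centroidLength`, which is the property the Kryo-based `ModelUtils.load` provides for arbitrary object graphs.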
In 2013, there were two important improvements to Product Quantization (PQ). The non-parametric solution of Optimized Product Quantization (OPQ) [2] is equivalent to Cartesian k-means [1] and performs better than PQ. In 2014, Locally Optimized Product Quantization (LOPQ) [3] improved further on OPQ.
SIFT1B with 64-bit codes, K = 2^13 = 8192 and w = 64. For Multi-D-ADC, K = 2^14 and T = 100K.
SIFT1B with 128-bit codes and K = 2^13 = 8192 (resp. K = 2^14) for single index (resp. multi-index). For IVFADC+R and LOPQ+R, m′ = 8, w = 64.
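The "ADC" in IVFADC and Multi-D-ADC stands for asymmetric distance computation: the query stays uncompressed, and its approximate squared distance to a PQ-encoded vector is the sum of per-subspace table lookups. A minimal sketch, assuming `codebooks[j][k]` is centroid k of subspace j and a code stores one centroid index per subspace; the class `Adc` is hypothetical and not taken from any of the cited implementations.

```java
/** Hypothetical sketch of PQ asymmetric distance computation (ADC) with lookup tables. */
final class Adc {
    private Adc() {}

    /** tables[j][k] = squared distance between query sub-vector j and centroid k of subspace j. */
    static double[][] distanceTables(double[] query, double[][][] codebooks) {
        int m = codebooks.length;
        int subLength = query.length / m;                  // assumes query.length is divisible by m
        double[][] tables = new double[m][];
        for (int j = 0; j < m; j++) {
            tables[j] = new double[codebooks[j].length];
            for (int k = 0; k < codebooks[j].length; k++) {
                double sum = 0.0;
                for (int i = 0; i < subLength; i++) {
                    double diff = query[j * subLength + i] - codebooks[j][k][i];
                    sum += diff * diff;
                }
                tables[j][k] = sum;
            }
        }
        return tables;
    }

    /** Approximate squared distance to an encoded vector: m table lookups, no decoding. */
    static double distance(double[][] tables, int[] code) {
        double sum = 0.0;
        for (int j = 0; j < tables.length; j++) {
            sum += tables[j][code[j]];
        }
        return sum;
    }
}
```

The tables are built once per query, so scanning many database codes costs only m additions each, which is what makes PQ-based search over SIFT1B-scale collections practical.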
The means vector should hold, in each component, the average of the corresponding column of matrix A. Instead, we end up with a vector where each component equals the last row's value in that column divided by the number of samples. See line 146 of the class for details.
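For reference, the column means of an n×d matrix A are mean[j] = (1/n) Σ_i A[i][j]. The reported symptom is consistent with the running value being overwritten (`=`) instead of accumulated (`+=`) inside the row loop; that is an inference from the symptom, not a reading of line 146. A minimal correct sketch, with a hypothetical helper name:

```java
/** Hypothetical helper computing the per-column mean of a row-major matrix. */
final class ColumnMeans {
    private ColumnMeans() {}

    static double[] of(double[][] a) {
        int numSamples = a.length;
        int dimension = a[0].length;                       // assumes at least one row
        double[] means = new double[dimension];
        for (double[] row : a) {
            for (int j = 0; j < dimension; j++) {
                // accumulate; "means[j] = row[j]" would keep only the last row
                means[j] += row[j];
            }
        }
        for (int j = 0; j < dimension; j++) {
            means[j] /= numSamples;
        }
        return means;
    }
}
```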
fgrep -xf IVFPQ.java PQ.java
showed more than 400 lines in common, while IVFPQ.java has 700+ lines and PQ.java has 500+ lines. Using PQ as a member of IVFPQ would make the IVFPQ implementation much simpler.
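The proposed composition could look roughly like this: IVFPQ owns a coarse quantizer plus a PQ instance and delegates residual encoding to it, instead of duplicating PQ's code. This is a sketch of the design under that assumption, not the library's actual `PQ` or `IVFPQ` classes; the `Sketch` suffix marks the classes as hypothetical.

```java
import java.util.Arrays;

/** Hypothetical product quantizer: encodes a vector as one centroid index per subspace. */
final class PQSketch {
    private final double[][][] codebooks; // codebooks[j][k] = centroid k of subspace j

    PQSketch(double[][][] codebooks) {
        this.codebooks = codebooks;
    }

    int[] encode(double[] vector) {
        int m = codebooks.length;
        int subLength = vector.length / m;                 // assumes the length divides evenly
        int[] code = new int[m];
        for (int j = 0; j < m; j++) {
            double[] sub = Arrays.copyOfRange(vector, j * subLength, (j + 1) * subLength);
            code[j] = nearest(sub, codebooks[j]);
        }
        return code;
    }

    /** Index of the centroid closest to v in squared Euclidean distance. */
    static int nearest(double[] v, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0.0;
            for (int i = 0; i < v.length; i++) {
                double diff = v[i] - centroids[c][i];
                d += diff * diff;
            }
            if (d < bestDistance) {
                bestDistance = d;
                best = c;
            }
        }
        return best;
    }
}

/** Hypothetical IVFPQ that reuses PQSketch as a member instead of copying its code. */
final class IVFPQSketch {
    static final class Encoded {
        final int list;   // inverted list (coarse centroid) id
        final int[] code; // PQ code of the residual
        Encoded(int list, int[] code) {
            this.list = list;
            this.code = code;
        }
    }

    private final double[][] coarseCentroids;
    private final PQSketch pq;

    IVFPQSketch(double[][] coarseCentroids, PQSketch pq) {
        this.coarseCentroids = coarseCentroids;
        this.pq = pq;
    }

    Encoded encode(double[] vector) {
        int list = PQSketch.nearest(vector, coarseCentroids);
        double[] residual = new double[vector.length];
        for (int i = 0; i < vector.length; i++) {
            residual[i] = vector[i] - coarseCentroids[list][i];
        }
        return new Encoded(list, pq.encode(residual)); // delegated to the PQ member
    }
}
```

With this split, the IVF-specific code reduces to coarse assignment and residual computation, while training, encoding and distance tables stay in one place.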
SPARK-1321 Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation
"On my simple test of sorting 100 million (Int, Int) tuples using Spark, Guava's top k implementation (in Ordering) is much faster than the BoundedPriorityQueue implementation for roughly sorted input (10 - 20X faster), and still faster for purely random input (2 - 5X)."
Selecting top k items from a list efficiently in Java / Groovy
PriorityQueue: 300ms
Guava Ordering: 170ms
Google Guava MinMaxPriorityQueue
"A min-max priority queue can be configured with a maximum size. If so, each time the size of the queue exceeds that value, the queue automatically removes its greatest element according to its comparator (which might be the element that was just added). This is different from conventional bounded queues, which either block or reject new elements when full."
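In Guava, the quoted behavior is available through `MinMaxPriorityQueue.orderedBy(comparator).maximumSize(k).create()`, and `Ordering.leastOf(iterable, k)` is the top-k method referenced in SPARK-1321. For comparison, the heap-based baseline can be sketched with only `java.util.PriorityQueue`: keep a max-heap of at most k candidates and evict the greatest one when a better item arrives. This is the bounded-queue approach, not Guava's algorithm; `TopK` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: the k smallest elements under a comparator, in ascending order. */
final class TopK {
    private TopK() {}

    static <T> List<T> smallest(Iterable<T> items, int k, Comparator<? super T> comparator) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Max-heap: the head is the worst of the current k candidates.
        PriorityQueue<T> heap = new PriorityQueue<T>(k, comparator.reversed());
        for (T item : items) {
            if (heap.size() < k) {
                heap.offer(item);
            } else if (comparator.compare(item, heap.peek()) < 0) {
                heap.poll();                               // evict the greatest candidate
                heap.offer(item);
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(comparator);
        return result;
    }
}
```

For nearest-neighbour search, `T` would typically be a (distance, id) pair ordered by distance, so the result holds the k closest items.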
In gr.iti.mklab.visual.examples.FolderIndexingMT, the image names are incorrect if the original file names contain more than one dot or don't end with ".jpg":
if (imvr != null) {
String name = imvr.getImageName();
name = name.split("\\.")[0] + ".jpg";
if (imvr.getExceptionMessage() == null) {
// vectorization completed with success!
double[] vector = imvr.getImageVector();
if (index.indexVector(name, vector)) {
In gr.iti.mklab.visual.examples.UrlIndexingMT, the image names are kept as they are:
if (imvr != null) {
String name = imvr.getImageName();
double[] vector = imvr.getImageVector();
if (index.indexVector(name, vector)) {
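The problem with `name.split("\\.")[0] + ".jpg"` is that it keeps only the text before the first dot and then forces a `.jpg` suffix, so "holiday.2015.png" becomes "holiday.jpg". If an extension-free name is wanted, one option is to strip only the last extension; the other is to keep the name unchanged, as UrlIndexingMT does. A minimal sketch of the first option, where `ImageNames.baseName` is a hypothetical helper:

```java
/** Hypothetical helper for deriving index names from image file names. */
final class ImageNames {
    private ImageNames() {}

    /** Removes only the final extension; names without one (or dotfiles) are returned unchanged. */
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // dot > 0 skips names whose only dot is the leading one, e.g. ".hidden"
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }
}
```

Not appending ".jpg" afterwards also avoids labelling PNG or other non-JPEG files as JPEGs in the index.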