imagecluster

About

Package for comparing and clustering images by content. We use a pre-trained deep convolutional neural network for calculating image fingerprints, which are then used to cluster similar images.

Install

$ git clone https://github.com/elcorto/imagecluster.git
$ cd imagecluster
$ pip3 install -e .

or

$ python3 setup.py develop --prefix=...

Usage

We use a pre-trained Keras NN model. Keras downloads the weights automatically on first use and places them in ~/.keras/models/.
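For illustration, instantiating the model in plain Keras triggers exactly this one-time download (standard Keras API, shown here only to clarify the caching behavior, not code from this package):

import keras.applications.vgg16

# Creating the pre-trained model downloads the ImageNet weights once
# and caches them in ~/.keras/models/ for all later runs.
model = keras.applications.vgg16.VGG16(weights='imagenet')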

See imagecluster.main.main() for a usage example.

If there is no fingerprints database, it will first run all images through the NN model and calculate fingerprints. Then it will cluster the images based on the fingerprints and a similarity index sim=0...1 (more details below).

Example session:

>>> from imagecluster import main
>>> main.main('/path/to/testpics/', sim=0.5)
no fingerprints database /path/to/testpics/imagecluster/fingerprints.pk found
running all images through NN model ...
/path/to/testpics/DSC_1061.JPG
/path/to/testpics/DSC_1080.JPG
...
/path/to/testpics/DSC_1087.JPG
clustering ...
cluster dir: /path/to/testpics/imagecluster/clusters
cluster size : ncluster
2 : 7
3 : 2
4 : 4
5 : 1
10 : 1

Have a look at the clusters (as dirs with symlinks to the relevant files):

$ tree /path/to/testpics/imagecluster/clusters
/path/to/testpics/imagecluster/clusters
├── cluster_with_10
│   └── cluster_0
│       ├── DSC_1068.JPG -> /path/to/testpics/DSC_1068.JPG
│       ├── DSC_1070.JPG -> /path/to/testpics/DSC_1070.JPG
│       ├── DSC_1071.JPG -> /path/to/testpics/DSC_1071.JPG
│       ├── DSC_1072.JPG -> /path/to/testpics/DSC_1072.JPG
│       ├── DSC_1073.JPG -> /path/to/testpics/DSC_1073.JPG
│       ├── DSC_1074.JPG -> /path/to/testpics/DSC_1074.JPG
│       ├── DSC_1075.JPG -> /path/to/testpics/DSC_1075.JPG
│       ├── DSC_1076.JPG -> /path/to/testpics/DSC_1076.JPG
│       ├── DSC_1077.JPG -> /path/to/testpics/DSC_1077.JPG
│       └── DSC_1078.JPG -> /path/to/testpics/DSC_1078.JPG
├── cluster_with_2
│   ├── cluster_0
│   │   ├── DSC_1037.JPG -> /path/to/testpics/DSC_1037.JPG
│   │   └── DSC_1038.JPG -> /path/to/testpics/DSC_1038.JPG
│   ├── cluster_1
│   │   ├── DSC_1053.JPG -> /path/to/testpics/DSC_1053.JPG
│   │   └── DSC_1054.JPG -> /path/to/testpics/DSC_1054.JPG
│   ├── cluster_2
│   │   ├── DSC_1046.JPG -> /path/to/testpics/DSC_1046.JPG
│   │   └── DSC_1047.JPG -> /path/to/testpics/DSC_1047.JPG
...

If you run this again on the same directory, only the clustering will be repeated.
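Since the fingerprints are cached, it is cheap to explore several similarity thresholds. A small sketch, re-using the main.main() call from the session above:

from imagecluster import main

# The fingerprints database is read from disk, so only the (fast)
# clustering step is repeated for each threshold.
for sim in (0.4, 0.5, 0.6, 0.7):
    main.main('/path/to/testpics/', sim=sim)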

Methods

Clustering and similarity index

We use hierarchical clustering (imagecluster.cluster()). The image fingerprints (4096-dim vectors) are compared using a distance metric and similar images are put together in a cluster. The threshold for what counts as similar is defined by a similarity index.

We use the similarity index sim=0...1 to define the height at which we cut through the dendrogram built by the hierarchical clustering. sim=0 corresponds to the root of the dendrogram, where there is only one node (all images in one cluster). sim=1 corresponds to the top of the dendrogram, where each image is its own cluster. By varying the index between 0 and 1, we thus vary the number of clusters from 1 to the number of images.

However, note that we only report clusters with at least 2 images, such that sim=1 will in fact produce no results at all (unless there are completely identical images).
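To illustrate the idea, here is a sketch using scipy's hierarchical clustering directly. The mapping from sim to the cut height is a simplified assumption for demonstration, not necessarily the package's exact formula:

import numpy as np
from collections import Counter
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# stand-in fingerprints: one 4096-dim vector per image (random here)
fps = np.random.rand(20, 4096)

dists = pdist(fps, metric='euclidean')  # pairwise distances (condensed)
Z = linkage(dists, method='average')    # build the dendrogram

sim = 0.5
# sim=0 cuts at the root (one cluster), sim=1 cuts at height 0
# (each image its own cluster), hence the (1 - sim) scaling.
labels = fcluster(Z, t=(1 - sim) * dists.max(), criterion='distance')

# report only clusters with at least 2 images
sizes = Counter(labels)
clusters = {lab: np.where(labels == lab)[0].tolist()
            for lab, size in sizes.items() if size >= 2}
print(clusters)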

Image fingerprints

The original goal was to cluster images based on a classification of their content, e.g.: image A is an image of my kitchen; image B is also an image of my kitchen, only from a different angle and with some people in the foreground, but the information (this is my kitchen) is the same. This is a feature-detection task which relies on the ability to recognize the content of the scene, regardless of other scene parameters (view angle, color, light, ...). It turns out that deep convolutional neural networks (convnets) can generate good feature vectors for this, e.g. a feature vector that always encodes the information "my kitchen", since deep nets, once trained on many different images, develop an internal representation of objects like chair, boat, car ... and kitchen. Simple image hashing, which we used previously, is rather limited in that respect: it only performs a pedestrian smoothing / low-pass filtering to reduce noise and extract the "important" parts of the image. This helps to find duplicates and near-duplicates in a photo collection.

To this end, we use a pre-trained NN (VGG16 as implemented by Keras). The network was trained on ImageNet and can categorize images into 1000 classes (the last layer has 1000 nodes). We chop off the last layer (thanks for the hint!) and use the activations of the second-to-last fully connected layer (4096 nodes) as image fingerprints (numpy 1d arrays of shape (4096,)).
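A minimal sketch of that setup in Keras (it mirrors the description above, but is not the package's exact code):

import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.models import Model
from keras.preprocessing import image

# VGG16 with the 1000-class output layer chopped off: the model's
# output is the second-to-last fully connected layer 'fc2'
# (4096 nodes), which serves as the fingerprint.
base = VGG16(weights='imagenet', include_top=True)
model = Model(inputs=base.input, outputs=base.get_layer('fc2').output)

def fingerprint(fname):
    # load one image at the 224x224 size VGG16 expects, preprocess it
    img = image.load_img(fname, target_size=(224, 224))
    arr = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(arr)[0]  # numpy 1d array, shape (4096,)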

The package can detect images which are rather similar, e.g. the same scene photographed two or more times with some camera movement in between, or a scene with the same background where, say, one person has been swapped. This was also possible with image hashes.

With NN-based fingerprints, we now also cluster all sorts of images which share, e.g., mountains, tents, or beaches, which is far better. However, if you run this on a large collection containing many tent or beach images, the system won't recognize that certain images belong together because they were taken on the same trip, for instance. All tent images will end up in one cluster, and so will all beach images. This is probably because in such cases, humans classify the image by looking at the background as well. A tent in the center of the image will always look the same, but it is the background which lets humans distinguish the context. The problem is: VGG16 and the other popular networks were trained on rather small 224x224 images because of computational limitations, at which resolution it is hard to recognize background details. Another point is that the background triggers, in the human observer, the activation of meta-information associated with that background -- data which wasn't used when training on ImageNet, of course. Thus, one way to improve things would be to re-train the network using this information, but then one would have to label all images by hand again.

Tests

Run nosetests3 (the Python 3 version of nosetests, on Linux).
