Git Product home page Git Product logo

images4's Introduction

Find similar images with Go (LATEST VERSION)

Near duplicates and resized images can be found with the package. No dependencies.

Demo: similar pictures search and clustering (deployed from).

Semantic versions:

  • v4 (/images4) - this repository, latest recommended,
  • v3 (/images3),
  • v1/v2 (/images).

All versions will be kept available indefinitely.

Release note (v4): simplified func Icon; more than 2x reduction of icon memory footprint; removal of all dependencies; removal of hashes (a separate new package imagehash can be used for fast large scale preliminary search); fixed GIF support; new func IconNN. The main improvements in v4 are package simplification and memory footprint reduction.

Key functions

Open supports JPEG, PNG and GIF. But other image types can be used through third-party decoders, because input for func Icon is Golang image.Image.

Icon produces "image hashes" called "icons", which will be used for comparision.

Similar gives a verdict whether 2 images are similar with well-tested default thresholds.

EucMetric can be used instead, when you need different precision or want to sort by similarity. Func PropMetric can be used for customization of image proportion threshold.

Go doc for code reference.

Example of comparing 2 images

package main

import (
	"fmt"
	"github.com/vitali-fedulov/images4"
)

func main() {

	// Photos to compare.
	path1 := "1.jpg"
	path2 := "2.jpg"

	// Open files (discarding errors here).
	img1, _ := images4.Open(path1)
	img2, _ := images4.Open(path2)

	// Icons are compact image representations (image "hashes").
	// Name "hash" is not used intentionally.
	icon1 := images4.Icon(img1)
	icon2 := images4.Icon(img2)

	// Comparison.
	// Images are not used directly. Icons are used instead,
	// because they have tiny memory footprint and fast to compare.
	if images4.Similar(icon1, icon2) {
		fmt.Println("Images are similar.")
	} else {
		fmt.Println("Images are distinct.")
	}
}

Algorithm

Detailed explanation, also as a PDF.

Summary: Images are resized in a special way to squares of fixed size called "icons". Euclidean distance between the icons is used to give the similarity verdict. Also image proportions are used to avoid matching images of distinct shape.

Customization suggestions

To increase precision you can either use your own thresholds in func EucMetric (and PropMetric) OR generate icons for image sub-regions and compare those icons.

To speedup file processing you may want to generate icons for available image thumbnails. Specifically, many JPEG images contain EXIF thumbnails, you could considerably speedup the reads by using decoded thumbnails to feed into func Icon. External packages to read thumbnails: 1 and 2. A note of caution: in rare cases there could be issues with thumbnails not matching image content. EXIF standard specification: 1 and 2.

To search in very large image collections (billions or more), use preliminary hash-table-based filtering with package imagehash.

images4's People

Contributors

vitali-fedulov avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.