
videohash


The Python package for near-duplicate video detection



⭐️ Introduction

Videohash is a Python package for detecting near-duplicate videos (perceptual video hashing). It takes any input video and generates a 64-bit hash value. Comparing two videos therefore means comparing a single 64-bit value, which is much faster than comparing the imagehash values of every frame of the video and more reliable than hashing keyframes.

The hash values of identical or near-duplicate videos are the same or similar. If a video is resized (upscaled or downscaled), transcoded, stabilized, color-adjusted, or cropped, has a watermark or black bars added or removed, or has its frame rate or aspect ratio changed, its hash value should remain unchanged or vary only slightly.

How the hash values are calculated

  • In layman's terms: one frame is extracted from the input video every second, each frame is shrunk to a 144x144 pixel square, a collage is built from all of the resized (square) frames, and the collage's wavelet hash is the video hash value of the original input video.
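The sampling and tiling steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not videohash's own code: the helper names `sample_timestamps` and `collage_layout` are hypothetical, and the near-square grid with ceil(sqrt(n)) columns is an assumed layout that the library may not use. The last step, computing a wavelet hash of the assembled collage image (for example with a function such as `imagehash.whash`), is not shown.

```python
import math
from typing import List, Tuple

FRAME_SIZE = 144  # each sampled frame is shrunk to a 144x144 pixel square


def sample_timestamps(duration_s: float) -> List[int]:
    """One timestamp per second of video: 0, 1, 2, ... (all < duration_s)."""
    return list(range(math.ceil(duration_s)))


def collage_layout(n_frames: int, tile: int = FRAME_SIZE) -> Tuple[int, int, List[Tuple[int, int]]]:
    """Place n square tiles on a near-square grid (an assumed layout).

    Returns (width, height, positions), where positions[i] is the
    top-left (x, y) pixel offset of frame i inside the collage.
    """
    if n_frames <= 0:
        raise ValueError("need at least one frame")
    cols = math.ceil(math.sqrt(n_frames))   # columns of the grid
    rows = math.ceil(n_frames / cols)       # rows needed to fit every frame
    positions = [((i % cols) * tile, (i // cols) * tile) for i in range(n_frames)]
    return cols * tile, rows * tile, positions
```

For a 10.5-second clip this yields 11 timestamps, and 11 frames fit on a 4 x 3 grid, giving a 576 x 432 pixel collage with one empty tile.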

When not to use Videohash

  • Videohash cannot be used to verify whether one video is part of another (video fingerprinting). If a video is reversed or rotated by a substantial angle (more than 10 degrees), Videohash will not produce the same or a similar hash value. You can, however, reverse the video manually and generate the hash value of the reversed video.
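Reversing a clip before hashing can be done with FFmpeg's `reverse` video filter. This is a minimal sketch, assuming FFmpeg is on your PATH; the function names and file names are hypothetical. Dropping the audio track with `-an` is a choice made here because audio plays no part in the collage-based hash. The `reverse` filter buffers the whole clip in memory, so it suits short clips best.

```python
import shutil
import subprocess
from typing import List


def reverse_command(src: str, dst: str) -> List[str]:
    """Build an FFmpeg command that writes a time-reversed, audio-free copy of src."""
    # -y overwrites dst if it exists; -vf reverse plays the frames backwards.
    return ["ffmpeg", "-y", "-i", src, "-vf", "reverse", "-an", dst]


def reverse_video(src: str, dst: str) -> None:
    """Run the command above, failing early if FFmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("FFmpeg not found on PATH")
    subprocess.run(reverse_command(src, dst), check=True)
```

After `reverse_video("clip.mp4", "clip_reversed.mp4")`, the reversed file can be hashed as usual with `VideoHash(path="clip_reversed.mp4")`.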

How to compare the video hash values stored in a database
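Subtracting two hashes in the usage example gives the number of differing bits (the Hamming distance): the hex values `0x341fefff8f780000` and `0x7cffff000000eff0` differ in exactly 34 bits, matching `videohash4 - videohash2`. Stored hashes can therefore be compared by XOR-ing them and counting set bits. Below is a minimal sketch using Python's built-in `sqlite3` module with a hypothetical `hamming` SQL function; the table layout, video names, and the 10-bit default threshold are assumptions to tune for your data, not values prescribed by videohash.

```python
import sqlite3
from typing import List, Tuple


def hamming_hex(a: str, b: str) -> int:
    """Number of differing bits between two hex-encoded 64-bit hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def open_hash_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical `videos` table of hex hashes."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL as hamming(x, y).
    conn.create_function("hamming", 2, hamming_hex)
    # Hashes are kept as hex TEXT: SQLite INTEGER is signed 64-bit, so
    # hashes >= 0x8000000000000000 would not fit as unsigned values.
    conn.execute("CREATE TABLE videos (name TEXT, hash_hex TEXT)")
    return conn


def find_near_duplicates(conn: sqlite3.Connection, query_hex: str,
                         max_distance: int = 10) -> List[Tuple[str, int]]:
    """(name, distance) for every stored hash within max_distance bits, closest first."""
    return conn.execute(
        "SELECT name, hamming(hash_hex, ?) AS d FROM videos "
        "WHERE hamming(hash_hex, ?) <= ? ORDER BY d",
        (query_hex, query_hex, max_distance),
    ).fetchall()


conn = open_hash_db()
conn.executemany(
    "INSERT INTO videos VALUES (?, ?)",
    [("artemis_hot_fire", "0x341fefff8f780000"),
     ("going_to_the_moon", "0x7cffff000000eff0")],
)
```

With the 10-bit threshold, querying the Artemis hash returns only its own row; raising the threshold to 40 also returns the second video at distance 34. This scans every row, so for large collections a metric index such as a BK-tree avoids comparing against every stored hash.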


🏗 Installation

To use this software, you must have FFmpeg installed. If you don't already have it, follow the FFmpeg installation instructions for your platform.
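A quick way to confirm that FFmpeg is installed and reachable from Python is to look it up on the PATH and ask it for its version. This is a minimal sketch; the function name is hypothetical, and it only checks for an executable named `ffmpeg` on your PATH.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """First line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    first_line = result.stdout.splitlines()[:1]
    return first_line[0] if first_line else None


print(ffmpeg_version() or "FFmpeg not found")
```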

Install videohash

Upgrade pip

python3 -m pip install --upgrade pip

If you do not want to upgrade pip and the installation fails, try appending --prefer-binary to the installation command(s) below.

  • Install from PyPI (recommended):
pip install videohash
  • Install directly from the GitHub repository (NOT recommended):
pip install git+https://github.com/akamhy/videohash.git

🌱 Features

  • It is fast!
  • Generate the videohash of a video directly from its URL (uses yt-dlp) or its path.
  • Can be used as the core of a scalable Near-Duplicate Video Retrieval (NDVR) system.
  • The end user can access the image representation (the collage) of the video.
  • A VideoHash instance can be compared with a stored 64-bit hash, its hex representation, its bit list, or another VideoHash instance.
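The representations listed above are different encodings of the same 64-bit integer. This sketch converts between them in plain Python and counts differing bits; the helper names are hypothetical and not part of the videohash API. The example hashes come from the usage section.

```python
from typing import List


def to_bin_str(hash_hex: str, bits: int = 64) -> str:
    """'0x...' -> zero-padded '0b...' string, the form shown by VideoHash.hash."""
    return "0b" + format(int(hash_hex, 16), f"0{bits}b")


def hex_to_bitlist(hash_hex: str, bits: int = 64) -> List[int]:
    """'0x...' -> list of 0/1 ints, most significant bit first."""
    return [int(c) for c in format(int(hash_hex, 16), f"0{bits}b")]


def bitlist_to_hex(bitlist: List[int]) -> str:
    """List of 0/1 ints -> zero-padded lowercase '0x...' string."""
    value = int("".join(str(b) for b in bitlist), 2)
    return "0x" + format(value, f"0{len(bitlist) // 4}x")


def distance(a: str, b: str) -> int:
    """Differing bits between two hashes written as '0x...' or '0b...' strings."""
    # Base 0 lets int() read the prefix and pick base 16 or base 2.
    return bin(int(a, 0) ^ int(b, 0)).count("1")


h1 = "0x341fefff8f780000"  # Artemis I Hot Fire Test
h4 = "0x7cffff000000eff0"  # How We Are Going to the Moon
print(to_bin_str(h1))
print(distance(to_bin_str(h1), h4))
```

The distance between the two example hashes is 34, the same value the usage example reports for `videohash4 - videohash2`.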

🚀 Usage

In the following usage example, the first three instances of the VideoHash class compute the hash of the same video (the same content, not necessarily byte-identical files as a checksum would require), and the last one is a different video.

>>> from videohash import VideoHash
>>> # video: Artemis I Hot Fire Test
>>> url1 = "https://www.youtube.com/watch?v=PapBjpzRhnA"
>>> videohash1 = VideoHash(url=url1)
>>>
>>> videohash1.hash  # video hash value of the file; same as str(videohash1)
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>>
>>> # video: Artemis I Hot Fire Test
>>> url2 = "https://raw.githubusercontent.com/akamhy/videohash/main/assets/rocket.mkv"
>>> videohash2 = VideoHash(url=url2)
>>> videohash2.hash
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> videohash2.hash_hex
'0x341fefff8f780000'
>>> videohash1 - videohash2
0
>>> videohash1 == videohash2
True
>>> videohash1 == "0b0011010000011111111011111111111110001111011110000000000000000000"
True
>>> videohash1 != videohash2
False
>>> path3 = "/home/akamhy/Downloads/rocket.mkv"  # video: Artemis I Hot Fire Test
>>> videohash3 = VideoHash(path=path3)
>>> videohash3.hash
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> videohash3 - videohash2
0
>>> videohash3 == videohash1
True
>>> url4 = "https://www.youtube.com/watch?v=_T8cn2J13-4"  # video: How We Are Going to the Moon
>>> videohash4 = VideoHash(url=url4)
>>> videohash4.hash_hex
'0x7cffff000000eff0'
>>> videohash4 - "0x7cffff000000eff0"
0
>>> videohash4.hash
'0b0111110011111111111111110000000000000000000000001110111111110000'
>>> videohash4 - videohash2
34
>>> videohash4 != videohash2
True

Run the above code at https://replit.com/@akamhy/videohash-usage-2xx-example-code-for-video-hashing#main.py

Extended usage: https://github.com/akamhy/videohash/wiki/Extended-Usage

API reference: https://github.com/akamhy/videohash/wiki/API-Reference


🙏 Credits


🛡 License

License: MIT

Copyright (c) 2021 Akash Mahanty. See license for details.

The VideoHash logo was created by iconolocode. See license for details.

Videos are from NASA and are in the public domain.

NASA copyright policy states that "NASA material is not protected by copyright unless noted".
