Git Product home page Git Product logo

deltagopher's Introduction

deltagopher

Command line utility to get the delta between 2 file. Includes signature, delta and patch commands.

Usage

generate a signature and a delta

./deltagopher --block-size 256 old.txt new.txt

It will display the diff, but deleted and to copy content won't be displayed. If you want a full report see the next command.

-  [ 647-657 ] ----------
+  [ 726-789 ] um vulputate gravida. Vestibulum ante ipsum primis in faucibus 
c  [ 779-789 ] ----------

where -, + and c - to delete, to insert and to copy,

generate a signature and a delta WITH FULL REPORT (!increases the size of generated files!)

./deltagopher --full --block-size 256 old.txt new.txt

generate a signature only

./deltagopher signature --block-size 256 --filename old.txt

generate a delta only

./deltagopher delta --sigfile signature.yml new.txt

Installation

You need a Golang version 1.16 or greater installed on your PC.

  1. To install dependencies, run from root directory:
go get
  1. To build the CLI
go build

Approach

deltagopher uses the Rolling Hash hash function to find a delta, it provides a way to efficiently traverse the sequence and calculate a new hash in O(1) time using the old hash. The total time for calculating a delta is O(n old file) + O(n changed file), where n in a length of the file.

Here is how it's done:

  1. Obtain a Signature from the first file. Signature represents a hashed chunks of the file with some metadata of chunks size, hashing algorythms and so on. Each chunk hashed in 2 ways - weak and strong. Weak hash is a adler32 checksum, strong hash - md5 hash.
type Checksum struct {
	WeakCheaksum   uint32   `yaml:"weak"`
	StrongChecksum [16]byte `yaml:"strong"`
	Start          int      `yaml:"start"`
	End            int      `yaml:"end"`
}
  1. Signature can be saved into a file (json/yml), and used to generate a Delta. Delta represents the difference between a new changed version of the file and the Signature of old file.
type Delta struct {
    Inserted []*SingleDelta `yaml:"insert,omitempty"`
    Deleted  []*SingleDelta `yaml:"delete,omitempty"`
    Copied   []*SingleDelta `yaml:"copy,omitempty"`
}
type SingleDelta struct {
    WeakCheaksum   uint32    `yaml:"weak,omitempty"`
    StrongChecksum *[16]byte `yaml:"strong,omitempty"`
    Start          int       `yaml:"start"`
    End            int       `yaml:"end"`
    
    // For Inserted
    DiffBytes []byte         `yaml:"diff,omitempty"`
}
  1. Delta is generated using the following algorithm:
    1. Load the first chunk of the file into buffer length Signature.blockSize
    2. Generate the weak hash for the chunk
    3. Search for this hash in the Signature.checksums from old file
      1. If it exists in Signature - it's a potential candidate to be a hash of the same text as for an old file. To see if it's exactly the text we had in the old file we compare strong hashes:
        1. if md5 hash old == md5 hash new => mathing, to be copied from the old file.
        2. if md5 hash old != md5 hash new => consider this block as new and to be inserted
        3. If it's not presented in old hashes - this block of the file to be inserted to the Delta
        4. Seek the reader pointer to size of the chunk to skip the chunk we already processed.
      2. If not - start rolling across the file content using Rolling Hash (adler32). On each step generate a new hash of the sliding window (it moves byte by byte), search for it in old checksums and generate a Delta on the fly.
    4. If some of the old hashes weren't found in the new file at all - they are to be deleted.

deltagopher's People

Contributors

boringmary avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.