diff-match-patch's Introduction

The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.

  1. Diff:
    • Compare two blocks of plain text and efficiently return a list of differences.
    • Diff Demo
  2. Match:
    • Given a search string, find its best fuzzy match in a block of plain text. Weighted for both accuracy and location.
    • Match Demo
  3. Patch:
    • Apply a list of patches onto plain text. Use best-effort to apply patch even when the underlying text doesn't match.
    • Patch Demo

Originally built in 2006 to power Google Docs, this library is now available in C++, C#, Dart, Java, JavaScript, Lua, Objective C, and Python.

Reference

Languages

Although each language port of Diff Match Patch uses the same API, there are some language-specific notes.

A standardized speed test tracks the relative performance of diffs in each language.

Algorithms

This library implements Myers' diff algorithm, which is generally considered to be the best general-purpose diff. A layer of pre-diff speedups and post-diff cleanups surrounds the diff algorithm, improving both performance and output quality.
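For readers unfamiliar with the algorithm, here is a minimal Python sketch of the greedy core of Myers' O(ND) method. It only computes the length of the shortest edit script (insertions plus deletions). It is not the library's implementation, which adds the speedups and cleanups described above. The function name `myers_edit_distance` is made up for this illustration.

```python
def myers_edit_distance(a, b):
    """Return the minimum number of single-element insertions and
    deletions needed to turn sequence a into sequence b."""
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] holds the furthest x reached so far on diagonal k (k = x - y).
    v = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            # Decide whether to step down (insertion) or right (deletion).
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            # Follow the "snake": consume equal elements for free.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return max_d


print(myers_edit_distance("ABCABBA", "CBABAC"))  # 5
print(myers_edit_distance("abc", "ab123c"))      # 3
```

Recovering the actual edit script requires keeping the history of `v` for each `d` and backtracking, which is omitted here for brevity.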

This library also implements a Bitap matching algorithm at the heart of a flexible matching and patching strategy.
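As a rough illustration of the Bitap idea, the sketch below implements only the exact-match ("shift-and") variant in Python. The library's matcher is fuzzy and weighs accuracy against location, which this sketch does not attempt. The function name `bitap_exact` is an assumption for this example.

```python
def bitap_exact(text, pattern):
    """Return the index of the first exact occurrence of pattern in text,
    or -1 if there is none, using the shift-and form of Bitap."""
    m = len(pattern)
    if m == 0:
        return 0
    # For each character, a bitmask of the pattern positions where it occurs.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    state = 0  # bit i set means pattern[:i + 1] matches text ending here
    for j, ch in enumerate(text):
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & (1 << (m - 1)):
            return j - m + 1
    return -1


print(bitap_exact("hello world", "world"))  # 6
print(bitap_exact("abc", "abd"))            # -1
```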

diff-match-patch's People

Contributors

alur, dmaclach, ingvarc, krzysiek84, llimeht, neilfraser, nikolas

diff-match-patch's Issues

JavaScript implementation crashes on Unicode code points

I stumbled upon this project from a bug in a downstream project that uses this library, Codiad.

The following function throws an exception:

function testPatchUnicode() {
  var cp = '\uD800\uDDE4'; // U+101E4; cannot put directly in source file
  var patches = dmp.patch_make(cp + cp + cp + cp + cp + 'a', cp + cp + cp + cp + cp + 'ab');
  dmp.patch_toText(patches);
}

In general, any string that contains a supplementary code point (these have become much more common with the rise of emoji) causes diff indices to be offset by some number of code points. This leads to strange or undefined behavior when applying the resulting patches.

This is a rather serious bug that is quietly affecting any downstream project that uses this library.

I think the best fix would be to rewrite the patch-to-string function to operate entirely in code point space instead of JavaScript's default code unit space.

This might also affect non-JavaScript implementations; I haven't looked.

P.S. I am on Google's i18n team and have seen issues like this before.
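The code unit versus code point mismatch behind this report can be seen without the library. The Python sketch below counts UTF-16 code units (what JavaScript string indices measure) next to code points, and converts a code unit offset into a code point offset. Both helper names are hypothetical, and this is not a fix for the library itself.

```python
def utf16_units(s):
    """Number of UTF-16 code units, i.e. what JavaScript's .length reports."""
    return len(s.encode("utf-16-le")) // 2


def unit_to_codepoint_offset(s, unit_offset):
    """Convert a UTF-16 code unit offset into a code point offset.
    An offset that falls inside a surrogate pair is rounded up."""
    units = 0
    for i, ch in enumerate(s):
        if units >= unit_offset:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(s)


cp = "\U000101E4"      # the same character as '\uD800\uDDE4' in the report
text = cp * 5 + "a"

print(len(text))                           # 6 code points
print(utf16_units(text))                   # 11 code units
print(unit_to_codepoint_offset(text, 10))  # 5, the position of 'a'
```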

bug in diff_cleanupMerge() in C#

I tried to implement the line-mode diff described at https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs in C#.

This is the code that I used (copied from the existing private List<Diff> diff_lineMode() method):

public List<Diff> diff_lineMode(string text1, string text2)
{
	// Scan the text on a line-by-line basis first.
	var a = diff_linesToChars(text1, text2);
	var lineText1 = (string)a[0];
	var lineText2 = (string)a[1];
	var lineArray = (List<string>)a[2];
	var diffs = diff_main(lineText1, lineText2, false);

	// Convert the diff back to original text.
	diff_charsToLines(diffs, lineArray);

	// Eliminate freak matches (e.g. blank lines)
	diff_cleanupSemantic(diffs);

	return diffs;
}

But when I compared these texts...

text1 = "
country|description\r\n
CN|CHINA\r\n
PH|PHILIPPINES\r\n
JP|JAPAN\r\n
UK|UNITED KINGDOM\r\n
USA|U.S.A.\r\n
ZA|SOUTH AFRICA
"
text2 = "
country|description\r\n
CN|REPUBLIC OF CHINA\r\n
PH|REPUBLIC OF THE PHILIPPINES\r\n
JP|JAPAN\r\n
UK|U.K.\r\n
USA|UNITED STATES OF AMERICA\r\n
ZA|S. AFRICA
"

The result is this:

image

The result should be something like this (except for the JP|JAPAN part which should be EQUAL):

image

I traced the code and found that the error appears after the diff_cleanupMerge() method is called inside the diff_cleanupSemantic() method, at the lines of code that look like this:

//// Normalize the diff.
if (changes)
{
	diff_cleanupMerge(diffs);
}

Perhaps there is a bug in the diff_cleanupMerge() method when used to compare texts line by line.

Thanks for creating this tool, and thanks to whoever fixes the bug :)

Add Line or Word diff example for Python 3 to the Wiki

The page is not publicly editable: https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs

import textwrap
import diff_match_patch

class DiffMatchPatch(diff_match_patch.diff_match_patch):

    def diff_prettyText(self, diffs):
        """Convert a diff array into a pretty Text report.
        Args:
          diffs: Array of diff tuples.
        Returns:
          Text representation.
        """
        results_diff = []

        def parse(sign):
            return "\n" if len(results_diff) else "", \
                    textwrap.indent( "%s" % text, sign, lambda line: True )

        # print(diffs)
        for (op, text) in diffs:

            if op == self.DIFF_INSERT:
                results_diff.append( "%s%s" % parse( "+ " ) )

            elif op == self.DIFF_DELETE:
                results_diff.append( "%s%s" % parse( "- " ) )

            elif op == self.DIFF_EQUAL:
                results_diff.append(textwrap.indent("%s" % text.lstrip('\n'), "  "))

        return "".join(results_diff)

expected = "Hello World"
actual = "Hi World"

diff_match = DiffMatchPatch()
diff_struct = diff_match.diff_linesToChars(expected, actual)

lineText1 = diff_struct[0] # .chars1
lineText2 = diff_struct[1] # .chars2
lineArray = diff_struct[2] # .lineArray

diffs = diff_match.diff_main( lineText1, lineText2, False )
diff_match.diff_charsToLines( diffs, lineArray )
diff_match.diff_cleanupSemantic( diffs )

# Prints:
# - Hello World
# + Hi World
print( diff_match.diff_prettyText(diffs) )

A more complex example from:

  1. https://stackoverflow.com/questions/52682351/how-to-wrap-correctly-the-unit-testing-diff

import re
import textwrap
import diff_match_patch

class DiffMatchPatch(diff_match_patch.diff_match_patch):

    def diff_prettyText(self, diffs):
        """Convert a diff array into a pretty Text report.
        Args:
          diffs: Array of diff tuples.
        Returns:
          Text representation.
        """
        results_diff = []
        cut_next_new_line = [False]
        # print('\ndiffs:\n%s\n' % diffs)

        operations = (self.DIFF_INSERT, self.DIFF_DELETE)

        def parse(sign):
            # print('new1:', text.encode( 'ascii' ))

            if text:
                new = text

            else:
                return ''

            new = textwrap.indent( "%s" % new, sign, lambda line: True )

            # force the diff change to show up on a new line for highlighting
            if len(results_diff) > 0:
                new = '\n' + new

            if new[-1] == '\n':

                if op == self.DIFF_INSERT and next_text and new[-1] == '\n' and next_text[0] == '\n':
                    cut_next_new_line[0] = True

                    # Avoids a double plus sign showing up when the diff has the element (1, '\n')
                    if len(text) > 1: new = new + '%s\n' % sign

            elif next_op not in operations and next_text and next_text[0] != '\n':
                new = new + '\n'

            # print('new2:', new.encode( 'ascii' ))
            return new

        for index in range(len(diffs)):
            op, text = diffs[index]
            if index < len(diffs) - 1: 
                next_op, next_text = diffs[index+1]
            else:
                next_op, next_text = (0, "")

            if op == self.DIFF_INSERT:
                results_diff.append( parse( "+ " ) )

            elif op == self.DIFF_DELETE:
                results_diff.append( parse( "- " ) )

            elif op == self.DIFF_EQUAL:
                # print('new3:', text.encode( 'ascii' ))
                text = textwrap.indent(text, "  ")

                if cut_next_new_line[0]:
                    cut_next_new_line[0] = False
                    text = text[1:]

                results_diff.append(text)
                # print('new4:', text.encode( 'ascii' ))

        return "".join(results_diff)

    def diff_linesToWords(self, text1, text2, delimiter=re.compile('\n')):
        """
            Split two texts into an array of strings.  Reduce the texts to a string
            of hashes where each Unicode character represents one line.

            95% of this function code is copied from `diff_linesToChars` on:
                https://github.com/google/diff-match-patch/blob/895a9512bbcee0ac5a8ffcee36062c8a79f5dcda/python3/diff_match_patch.py#L381

            Copyright 2018 The diff-match-patch Authors.
            https://github.com/google/diff-match-patch
            Licensed under the Apache License, Version 2.0 (the "License");
            you may not use this file except in compliance with the License.
            You may obtain a copy of the License at
              http://www.apache.org/licenses/LICENSE-2.0

            Args:
                text1: First string.
                text2: Second string.
                delimiter: a re.compile() expression for the word delimiter type

            Returns:
                Three element tuple, containing the encoded text1, the encoded text2 and
                the array of unique strings.  The zeroth element of the array of unique
                strings is intentionally blank.
        """
        lineArray = []  # e.g. lineArray[4] == "Hello\n"
        lineHash = {}   # e.g. lineHash["Hello\n"] == 4

        # "\x00" is a valid character, but various debuggers don't like it.
        # So we'll insert a junk entry to avoid generating a null character.
        lineArray.append('')

        def diff_linesToCharsMunge(text):
            """Split a text into an array of strings.  Reduce the texts to a string
            of hashes where each Unicode character represents one line.
            Modifies linearray and linehash through being a closure.
            Args:
                text: String to encode.
            Returns:
                Encoded string.
            """
            chars = []
            # Walk the text, pulling out a substring for each line.
            # text.split('\n') would temporarily double our memory footprint.
            # Modifying text would create many large strings to garbage collect.
            lineStart = 0
            lineEnd = -1
            while lineEnd < len(text) - 1:
                lineEnd = delimiter.search(text, lineStart)

                if lineEnd:
                    lineEnd = lineEnd.start()

                else:
                    lineEnd = len(text) - 1

                line = text[lineStart:lineEnd + 1]

                if line in lineHash:
                    chars.append(chr(lineHash[line]))
                else:
                    if len(lineArray) == maxLines:
                        # Bail out at 1114111 because chr(1114112) throws.
                        line = text[lineStart:]
                        lineEnd = len(text)
                    lineArray.append(line)
                    lineHash[line] = len(lineArray) - 1
                    chars.append(chr(len(lineArray) - 1))
                lineStart = lineEnd + 1
            return "".join(chars)

        # Allocate 2/3rds of the space for text1, the rest for text2.
        maxLines = 666666
        chars1 = diff_linesToCharsMunge(text1)
        maxLines = 1114111
        chars2 = diff_linesToCharsMunge(text2)
        return (chars1, chars2, lineArray)

def myCoolDiff(diffMode, expected, actual):
    """
        `diffMode` whether `characters diff=0`, `words diff=1` or `lines diff=2` will be used.
    """
    diff_match = DiffMatchPatch()

    if diffMode == 0:
        diffs = diff_match.diff_main(expected, actual)

    else:
        diff_struct = diff_match.diff_linesToWords(expected, actual,
                re.compile(r'\b') if diffMode == 1 else re.compile(r'\n') )

        lineText1 = diff_struct[0] # .chars1;
        lineText2 = diff_struct[1] # .chars2;
        lineArray = diff_struct[2] # .lineArray;

        diffs = diff_match.diff_main(lineText1, lineText2, False)
        diff_match.diff_charsToLines(diffs, lineArray)
        diff_match.diff_cleanupSemantic(diffs)

    return diff_match.diff_prettyText(diffs)

expected = "1. Duplicated target language name defined in your grammar on: [@-1,63:87='Abstract Machine Language'<__ANON_3>,3:19]\n" \
        "2. Duplicated master scope name defined in your grammar on: [@-1,138:147='source.sma'<__ANON_3>,5:20]"

actual = "1. Duplicated target language name defined in your grammar on: free_input_string\n" \
        "  text_chunk_end  Abstract Machine Language\n" \
        "\n" \
        "2. Duplicated master scope name defined in your grammar on: free_input_string\n" \
        "  text_chunk_end  source.sma"

diffsCharacters = myCoolDiff( 0, expected, actual )
diffsWords = myCoolDiff( 1, expected, actual )
diffsLines = myCoolDiff( 2, expected, actual )

print( "Characters diff: \n%s" % diffsCharacters )
print( "\nWords diff: \n%s" % diffsWords )
print( "\nLines diff: \n%s" % diffsLines )

image

cleanupSemantic doesn't return anything

The last part of the example doesn't work because cleanupSemantic doesn't return the result.

I looked at the code and indeed the function doesn't return anything.

MS Word vs. diff-match-patch comparison

Hello,
I’m trying to compare two strings using this library. However, in some cases the returned diff is not what I expect. Here’s an illustration: I compared two strings using MS Word. I created string 2 by removing a sentence from string 1 and adding another sentence to it. The comparison in MS Word correctly reflects the change as strikethrough/underline for removed/added text. See Fig. 1 in the attachment.

Using diff_match_patch, I got a different comparison; see Fig. 2. Here the deletion is reflected as two separate changes and the addition as another two changes. This was with semantic cleanup.
I could get the expected result by using efficiency-based cleanup (edit cost = 25), as shown in Fig. 3. However, I suspect that an edit cost of 25 will not give the desired results when there are large changes between the two strings. Is there a way to achieve results like MS Word's for strings of any length with any number of changes in string 2? Would setting a large edit cost do it?
What is the maximum value supported?

Diff Issue.docx
[text used for comparison is included in attachment]

Thanks
VinayakB

Diff'ing bytes

Is it possible to diff bytes in the Python 3 library? I'm not able to.

jhogan@bastion:/usr/lib/python3/dist-packages/diff_match_patch$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from diff_match_patch import diff_match_patch
>>> from uuid import uuid4
>>> dmp = diff_match_patch()
>>> dmp.diff_main(uuid4().bytes, uuid4().bytes)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 136, in diff_main
    self.diff_cleanupMerge(diffs)
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 936, in diff_cleanupMerge
    text_delete += diffs[pointer][1]
TypeError: Can't convert 'bytes' object to str implicitly

Note that I'm using the Ubuntu version of the software (the python3-diff-match-patch package). Also note that the stack trace can differ, but the exception is always raised on the same line. For example:

>>> dmp.diff_main(uuid4().bytes, uuid4().bytes)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 129, in diff_main
    diffs = self.diff_compute(text1, text2, checklines, deadline)
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 196, in diff_compute
    return self.diff_bisect(text1, text2, deadline)
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 351, in diff_bisect
    return self.diff_bisectSplit(text1, text2, x1, y1, deadline)
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 377, in diff_bisectSplit
    diffs = self.diff_main(text1a, text2a, False, deadline)
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 136, in diff_main
    self.diff_cleanupMerge(diffs)
  File "/usr/lib/python3/dist-packages/diff_match_patch/diff_match_patch.py", line 936, in diff_cleanupMerge
    text_delete += diffs[pointer][1]

I assume the traces differ because each call uses random input (UUIDs).

Is the solution to convert the binary data to strings first?

dmp.diff_main(str(uuid4().bytes), str(uuid4().bytes))
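One caveat about the call above: in Python 3, `str()` on a bytes object returns its repr (text beginning with `b'`), so the diff would run over the escaped representation, not the raw data. A possible alternative, sketched here with the standard library only, is to decode with Latin-1, which maps every byte to exactly one character and back. The helper names are hypothetical, and passing the resulting strings to `diff_main` is left out of the sketch.

```python
from uuid import uuid4


def bytes_to_text(data):
    """Map each byte 0-255 to the code point with the same value."""
    return data.decode("latin-1")


def text_to_bytes(text):
    """Inverse of bytes_to_text; only valid for code points below 256."""
    return text.encode("latin-1")


raw = uuid4().bytes
text = bytes_to_text(raw)

print(len(raw), len(text))         # 16 16: one character per byte
print(text_to_bytes(text) == raw)  # True: the mapping is lossless
print(str(b"\x00\xff"))            # b'\x00\xff' (the repr, not the data)
```

Diff fragments produced from such strings could be turned back into bytes with `text_to_bytes`, since a diff only rearranges characters that were already present in the inputs.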

Compatibility of the diff-match-patch

I have a few questions about the library:

  1. Does it work with IE 11, Edge, and Chrome?
  2. Does it support rich text comparison in non-English languages such as German? If so, is there a list of supported languages?
  3. Does it support comparison when the rich text I am comparing contains tags?
  4. The text I get from the database contains HTML, for example <textarea> this is my name </textarea>. It has to be compared with <textarea> this is not my name and it is not library</textarea>. The comparison should cover only the text part of it.

The result of the patch is strange.

image

I want the result of <div>ba</div> when I apply the above patch to <div>b</div>.

However, the result only displays <div>a</div> as shown below.

Is this correct?

image

Demo not updated to latest version

Hello, thank you very much for this tool and demo.
The current demo does not use the most recent compressed version available in the master tree.

Maven support for Java module

Is adding Maven support for the Java module in scope for this project? It would make the library much easier for users to adopt.

I can contribute the initial version if you are open to a PR.

del and ins tag are not recognized by Swing components

Hello,

When using diff_prettyHtml it appears that the generated html is not recognized by Swing components.

I fixed it by replacing the del and ins tags with span and adding the style attribute text-decoration:underline for insertions and text-decoration:line-through for deletions.

Here is the complete code:

case INSERT:
        html.append("<span style=\"background-color:#e6ffe6;text-decoration:underline\">")
            .append(text)
            .append("</span>");
        break;
case DELETE:
        html.append("<span style=\"background-color:#ffe6e6;text-decoration:line-through\">")
            .append(text)
            .append("</span>");
        break;

I hope this will help people using your awesome project.

Undefined var in Javascript code

Hi!
I'm using the current version of your Javascript code and the diff function causes a

diffs[j][1] = text.join('');
^

TypeError: Cannot set property '1' of undefined
at diff_match_patch.diff_charsToLines_ (diff_match_patch.js:539:17)

This is sample code to reproduce the error in Node.js:

        var dm = new DiffMatchPatch();
        var semantic=false,efficiency=false;
        
        var ms_start = (new Date()).getTime();

        var t1 = 
        "Let's enjoy right here where we at\n" + 
        "Who knows where this road is supposed to lead\n" + 
        "We got nothin' but time\n" + 
        "As long as you're right here next to me\n" + 
        "Everything's gonna be alright\n";


        var t2 = 
        "Let's enjoy right here where we at\n" + 
        "Who knows where this road supposed to lead?\n" + 
        "We got nothin' but time\n" + 
        "As long as you're right here next to me\n" + 
        "Everythin's gonna be alright\n" + 
        "If it's meant to be, it'll be, it'll be\n";


        var d = dm.diff_main(t1, t2);
        var ms_end = (new Date()).getTime();

        if (semantic) {
            dm.diff_cleanupSemantic(d);
        }
        else if (efficiency) {
            dm.diff_cleanupEfficiency(d);
        }

        var ds = dm.diff_prettyHtml(d);
        console.log(d);
        console.log(ds);

This seems to be the same as in #19

Javascript line diff breaks beyond 65K lines

I am trying to use the Google diff-match-patch library from Node.js for line diffs:
https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs. I get wrong patches when the combined number of lines in both inputs goes beyond 65,536 (2^16).

Is that a bug (in my code or diff-match-patch), or am I hitting a known limitation of javascript/nodejs? Anything I can do to use d-m-p with larger files?

This script reproduces the problem:

var diff_match_patch = require("diff-match-patch")

// function copied from google wiki 
// https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs
function diff_lineMode(text1, text2) {
  var dmp = new diff_match_patch();
  var a = dmp.diff_linesToChars_(text1, text2);
  var lineText1 = a.chars1;
  var lineText2 = a.chars2;
  var lineArray = a.lineArray;
  var diffs = dmp.diff_main(lineText1, lineText2, false);
  dmp.diff_charsToLines_(diffs, lineArray);
  return diffs;
}

// reproduce problem by diffing string with many lines to "abcb"
for (let size = 65534; size < 65538; size += 1) {
  let text1 = "";
  for (let i = 0; i < size; i++) {
    text1 += i + "\n";
  }

  var patches = diff_lineMode(text1, "abcb")
  console.log("######## Size: " + size + ": patches " + patches.length)
  for (let i = 0; i < patches.length; i++) {
    // patch[0] is action, patch[1] is value
    var action = patches[i][0] < 0 ? "remove" : (patches[i][0] > 0 ? "add" : "keep")
    console.log("patch" + i + ": " + action + "\n" + patches[i][1].substring(0, 10))
  }
}

Giving these outputs (using substring in code above to shorten outputs):

######## Size: 65534: patches 2
patch0: remove
0
1
2
3
4

patch1: add
abcb
######## Size: 65535: patches 2
patch0: remove
0
1
2
3
4

patch1: add

######## Size: 65536: patches 2
patch0: keep
0

patch1: remove
1
2
3
4
5

######## Size: 65537: patches 3
patch0: remove
0

patch1: keep
1

patch2: remove
2
3
4
5
6

Using

$ node --version
v6.3.1
$ cat package.json
{
  "name": "dmp_bug",
  "version": "1.0.0",
  "description": "reproduce issue with diff match patch",
  "main": "dmpbug.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "diff-match-patch": "^1.0.4"
  }
}
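One possible explanation, offered as an assumption and not a confirmed diagnosis: if each line index is turned into a character with String.fromCharCode, the index is truncated to 16 bits, so index 65536 collides with index 0 and index 65537 with index 1. The Python sketch below mimics that truncation with `& 0xFFFF`. The names are hypothetical and the encoder is a simplified stand-in for the library's line-to-character step.

```python
def encode_lines(lines, line_array, line_hash):
    """Encode each distinct line as one character, truncating the index
    to 16 bits the way String.fromCharCode would (assumed behavior)."""
    chars = []
    for line in lines:
        if line not in line_hash:
            line_array.append(line)
            line_hash[line] = len(line_array) - 1
        chars.append(chr(line_hash[line] & 0xFFFF))
    return "".join(chars)


line_array = [""]  # index 0 is a blank placeholder entry
line_hash = {}

text1_lines = ["%d\n" % i for i in range(65536)]
chars1 = encode_lines(text1_lines, line_array, line_hash)
chars2 = encode_lines(["abcb"], line_array, line_hash)

print(line_hash["abcb"])         # 65537
print(chars2 == chars1[0])       # True: "abcb" collides with the line "0\n"
print(repr(line_array[ord(chars2)]))  # '0\n'
```

Under this assumption, "abcb" would be decoded as the line "0", which is consistent with the "keep 0" entry reported above for size 65536.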

JavaScript: Diff is not iterable

There was a change in JS version cd60d24#diff-5270d640a6c9c1b0590326b029d71ec8R76 from plain Array to a diff_match_patch.Diff Object that's trying to emulate Array.

The new object is not iterable, which breaks, for example, array destructuring:

const a = dmp.diff_main('abc', 'ab123c', false);
const [eq, str] = a[0]; // => Uncaught TypeError: a[0] is not iterable
  1. Was this change necessary? I tested that a plain array works just fine with the current version.
  2. To really emulate an array here, adding [Symbol.iterator] would do the trick, but its browser support is questionable.

Continuous Integration

It would help if we set up continuous integration to make sure that pull requests won’t break anything.

%0A in output

I encountered this in Java (using patch_toText(patch_make(...))), but I can see it on the JavaScript demo page, too. I also saw a past Google Code discussion touch on it for Python usage, plus some Go code working around it.

Enter the following into the demo page:

Old version:

a
a

New version:

a
a
a

(I entered both with trailing newlines, but the behavior is similar without.)

Compute Patch gives:

@@ -1,4 +1,6 @@
 a%0Aa%0A
+a%0A

I would have expected something more like:

$ diff -u <(echo a; echo a) <(echo a; echo a; echo a) | tail -n +3
@@ -1,2 +1,3 @@
 a
 a
+a

(Hmm, I notice only now that diff-match-patch disagrees with diff -u about the number of lines involved. (At least, I think that's what the ,4/,6 and ,2/,3 are?))

As a workaround, I can replace %0A sequences with the empty string, but as best I can tell, even that will leave various other characters escaped.

I'm a little confused as to why URL encoding comes into this at all. I would expect for Patch.toString() to avoid all that, instead looking like this:

        text.append(aDiff.text.replaceAll("\n", ""))
            .append("\n");

(It seems strange that I need to remove internal \n characters at all. I haven't dug into it.)
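The patch text in this report uses URL-style percent-encoding, where %0A stands for a newline. Instead of deleting the %0A sequences, the body lines can be decoded. The Python sketch below does that for the patch text shown above using the standard library. The helper name is hypothetical, and it only splits and decodes lines; it does not parse the patch into objects.

```python
from urllib.parse import unquote

patch_text = "@@ -1,4 +1,6 @@\n a%0Aa%0A\n+a%0A\n"


def decode_patch_lines(text):
    """Return (sign, decoded_text) pairs for the body lines of a patch,
    skipping the @@ header lines."""
    decoded = []
    for line in text.split("\n"):
        if not line or line.startswith("@@"):
            continue
        decoded.append((line[0], unquote(line[1:])))
    return decoded


print(decode_patch_lines(patch_text))
# [(' ', 'a\na\n'), ('+', 'a\n')]
```

The decoded context is 4 characters long and the result after the insertion is 6, which suggests that the ,4 and ,6 in the header count characters, not lines.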

Line Diff causes an exception

Running the example usage from here on the javascript version:

  const text1="this\n is\n a\n test\n";
  const text2="this\n wasn't\n a\n test\n";
  // Example code "as is":
  var dmp = new diff_match_patch();
  var a = dmp.diff_linesToChars_(text1, text2);
  var lineText1 = a[0];
  var lineText2 = a[1];
  var lineArray = a[2];
  var diffs = dmp.diff_main(lineText1, lineText2, false);
  dmp.diff_charsToLines_(diffs, lineArray);

Causes this exception:

Uncaught Error: Null input. (diff_main)
at diff_match_patch.diff_main (:2:190)
at :6:19

C# port API refactoring (Renaming)

If PR #24 makes sense, I would like to continue and make a refactoring of C# API.

First of all it would be great to do something like grand renaming and change all names of methods and classes to C# standardized style: e.g. diff_main -> DiffMain with appropriate method documentation features, because for now it looks like a direct port from Java.

Does it make sense?

Reverse/Unpatch capability

Is there any possibility of reversing a diff/patch, or unpatch, to go backwards?

For example:

LinkedList<DiffMatchPatch.Diff> diffA_B = diffMatchPatch.diffMain(a, b);
LinkedList<DiffMatchPatch.Patch> patchA_B = diffMatchPatch.patchMake(a, diffA_B);
String rebuiltB = (String) diffMatchPatch.patchApply(patchA_B, a)[0];

// unapply (hypothetical method; this is the feature being requested)
String rebuiltA = (String) diffMatchPatch.unpatch(patchA_B, rebuiltB)[0];

From what I can gather, there may not be enough information in the diff/patch to unapply, but I was hoping to see if this was suggested in the past or if there is an alternate approach to reversing a diff/patch without having to create a new diff/patch from B -> A.

Thanks!
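At the diff level, as opposed to the patch level, a reversal can be sketched by swapping insertions and deletions, because a diff carries the full text of both. The Python sketch below assumes diffs are (op, text) tuples with -1 for delete, 0 for equal, and 1 for insert. The helper names are hypothetical, and reversing a list of patches is not covered.

```python
DIFF_DELETE, DIFF_EQUAL, DIFF_INSERT = -1, 0, 1


def invert_diff(diffs):
    """Swap insertions and deletions so the diff runs from B back to A."""
    return [(-op, text) for op, text in diffs]


def source_text(diffs):
    """Rebuild the text the diff starts from (equalities plus deletions)."""
    return "".join(text for op, text in diffs if op != DIFF_INSERT)


def target_text(diffs):
    """Rebuild the text the diff leads to (equalities plus insertions)."""
    return "".join(text for op, text in diffs if op != DIFF_DELETE)


diff_a_b = [(DIFF_EQUAL, "Hello "), (DIFF_DELETE, "World"), (DIFF_INSERT, "there")]
diff_b_a = invert_diff(diff_a_b)

print(source_text(diff_a_b))  # Hello World
print(target_text(diff_a_b))  # Hello there
print(target_text(diff_b_a))  # Hello World
```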

Would you welcome a typescript version?

Right now I'm working with a vendorized copy of diff-match-patch as part of CoCalc's code base. I'm considering converting diff-match-patch to typescript as the rest of cocalc's repository is switching over to typescript.

However, I view this repository as the source of truth for the implementation (just as the npm package for diff-match-patch from @JackuB et al. defers to it). Instead of modifying it "internally", I'd like to contribute upstream here.

I would assume we'd still make the compile target match what's documented in the wiki:

it works in Netscape 4, Internet Explorer 5.5, and all browsers released since the year 2000

As part of this, I'd keep it closely aligned with @types/diff-match-patch.

Port C++ code to use Qt5 and its QStringRef

Qt4 is no longer supported by Qt.

Apart from that, porting to Qt5 should also improve the performance, as it has added QStringRef class and QString::leftRef, QString::midRef and QString::rightRef methods. I see that the code splits QStrings into substrings quite a lot, and it does so by making copies, but with QStringRef you can avoid making copies since QStringRef substrings reference the original QString.

Extension to multi-diff

Dear all,

Thanks for sharing the source code of this text diff library.

The API has been developed to identify differences between only two strings.

I would like to identify the set of differences that is common among more than two strings.

Would that be possible? How could I extend the library to do it?

Please let me know about it.

Word level comparison instead of character level

For example:
String_1 : I included an apple in my basket.
String_2: I introduced apple to my basketball friend.

The resulting diffs are:
screen shot 2018-09-15 at 5 37 27 pm

I wonder if there are ways to mark the word "included" entirely as deletion and then "introduced" as an insertion, instead of breaking them up into differing or non-differing characters.
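One approach, along the lines of the Line-or-Word-Diffs wiki page referenced in other issues here, is to encode every word as a single character before diffing and to decode afterwards, so the diff cannot split a word. The Python sketch below shows only that encoding and decoding step with the standard library. The function names and the tokenizing regular expression are assumptions made for this illustration.

```python
import re

# A token is a run of whitespace, a run of word characters,
# or a single punctuation character.
TOKEN = re.compile(r"\s+|\w+|[^\w\s]")


def words_to_chars(text1, text2):
    """Encode both texts so that each distinct token becomes one character."""
    word_array = [""]  # index 0 is a placeholder so no token maps to chr(0)
    word_index = {}

    def encode(text):
        chars = []
        for token in TOKEN.findall(text):
            if token not in word_index:
                word_array.append(token)
                word_index[token] = len(word_array) - 1
            chars.append(chr(word_index[token]))
        return "".join(chars)

    return encode(text1), encode(text2), word_array


def chars_to_words(chars, word_array):
    """Decode an encoded string back into the original text."""
    return "".join(word_array[ord(c)] for c in chars)


s1 = "I included an apple in my basket."
s2 = "I introduced apple to my basketball friend."
chars1, chars2, words = words_to_chars(s1, s2)

print(len(chars1), len(chars2))            # 14 14: one character per token
print(chars_to_words(chars1, words) == s1)  # True
```

Diffing `chars1` against `chars2` and decoding each diff fragment with `chars_to_words` would then report "included" and "introduced" as whole-word changes.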

related to extra character ¶...

I don't want to display the extra "¶" character.

I found that the code that produces it is in the diff_match_patch_uncompressed.js file.

Can I remove it from the file, or was it placed there on purpose?

patch format doubt

When I generate a patch file on Ubuntu Linux, I sometimes get hunk headers like:
@@ -12,12 +13,35 @@ OBJS =
or @@ ... @@ ..\n
That is, the second @@ is sometimes followed by additional characters.

Is the patch file wrong? I generated it on a Linux system using the diff command.
If it is correct, this code does not support that case:
diff_match_patch.java:2225

    Pattern patchHeader
        = Pattern.compile("^@@ -(\\d+),?(\\d*) \\+(\\d+),?(\\d*) @@$");
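The effect of the trailing `$` anchor can be checked with an equivalent regular expression in Python. The sketch below compares the strict pattern quoted above with a relaxed variant that tolerates text after the second @@. The relaxed pattern is an illustration only and is not something the library provides.

```python
import re

# Same pattern as in the Java source quoted above.
STRICT_HEADER = re.compile(r"^@@ -(\d+),?(\d*) \+(\d+),?(\d*) @@$")

# Hypothetical variant that allows trailing section text.
RELAXED_HEADER = re.compile(r"^@@ -(\d+),?(\d*) \+(\d+),?(\d*) @@(?: .*)?$")

header = "@@ -12,12 +13,35 @@ OBJS ="

print(STRICT_HEADER.match(header))            # None
print(RELAXED_HEADER.match(header).groups())  # ('12', '12', '13', '35')
```

Accepting the header would probably not be enough to read such a file: the diff command works on lines, while patch text produced by this library counts characters and percent-encodes its body.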

diff doesn't consider \n as part of diff from where the text is removed

Hi,

I am using diff_match_patch cpp version and running into an issue where the API returns wrong diff between two texts if a line is deleted and that line had an empty line after it.
I am enclosing a screen shot of the wrong diff given by API in diff demo. The diff doesn't represent the correct LF removed.

I would appreciate if someone can take a look.

Thanks.

Example:

Text1:
Aa
Bb

Cc
Dd

Text2:
Aa

Cc
Dd

diffmatchpatch_issue

Possible to get diff and use diff2html

I tried to use diff2html to render the diff result, but sometimes the result is not correct.
Is this possible?
https://github.com/rtfpessoa/diff2html

        let patchUntil = new diff_match_patch();
        let diffMain = patchUntil.diff_main(firstDescribe, secondDescribe);
        patchUntil.diff_cleanupSemantic(diffMain);
        let patchMake = patchUntil.patch_make(firstDescribe, secondDescribe);
        let patchToText = patchUntil.patch_toText(patchMake);
        // let patches = patchUntil.patch_fromText(patchToText);
        // let app=patchUntil.patch_apply(patches, firstDescribe);
        let strInput = "--- compare\n+++ compare\n" + patchToText;

        let outputHtml = Diff2Html.getPrettyHtml(strInput, {
          inputFormat: "diff",
          outputFormat: this.outputFormat,
          matching: "lines"
        });
        this.outputHtml = outputHtml;
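
The text from `patch_toText` looks like a unified diff, but its hunks count characters rather than lines and their content is percent-encoded, so a line-oriented renderer may misread it. A hedged alternative is to generate a real line-based unified diff and hand that to the renderer. The sketch below does so with Python's standard `difflib`; `unified_diff_text` is a hypothetical helper, and whether diff2html accepts its output in every case is an assumption.

```python
import difflib


def unified_diff_text(old, new, name="compare"):
    """Return a line-based unified diff of two strings as a single string."""
    lines = list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=name,
            tofile=name,
            lineterm="",  # lines were split without terminators
        )
    )
    # No output lines means the inputs have no line-level differences.
    return "\n".join(lines) + "\n" if lines else ""


print(unified_diff_text("a\nb\nc\n", "a\nB\nc\n"))
```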

Nuget

Good afternoon,

I'd love to pull the C# version of this library into a project I am working on, but I see no way to do that via the .NET/Visual Studio package ecosystem (e.g. NuGet). Is the currently recommended way simply to clone this repo and copy the C# file we need?

dart version have something wrong, please fix it...

like
"message": "'Diff.==' ('(Diff) → bool') isn't a valid override of 'Object.==' ('(dynamic) →

"message": "A value of type 'Operation' can't be assigned to a variable of type 'int'.",
"startColumn": 27,
"endColumn": 54,

...

Diffing sequence of tokens instead sequence of characters

I need to diff a sequence of tokens instead of a sequence of characters. Is there a solution for it?
Each token is either a sequence of characters or some other object; it doesn't matter which, as long as two tokens can be compared for equality.

So I need what diff-match-patch does, but generalized: instead of being restricted to a sequence of characters, the algorithm should work on a sequence of any kind of comparable objects.

Is the Python difflib library as efficient as diff-match-patch?
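
One common approach, sketched here under stated assumptions, is to map every distinct token to a single character, diff the resulting strings, and map the characters back. This mirrors the idea behind the library's `diff_linesToChars_` helper. The names `tokens_to_chars` and `diff_tokens` are hypothetical, tokens are assumed to be hashable, and `difflib` stands in for the character differ so that the sketch runs with the standard library alone. It makes no claim about relative speed.

```python
from difflib import SequenceMatcher

BASE = 0xE000   # start of the Unicode private use area
LIMIT = 6400    # size of that area (U+E000..U+F8FF); a limit of this sketch


def tokens_to_chars(tokens1, tokens2):
    """Encode both token sequences as strings, one character per token."""
    table, index = [], {}

    def encode(tokens):
        chars = []
        for token in tokens:
            if token not in index:
                if len(table) >= LIMIT:
                    raise ValueError("too many distinct tokens for this sketch")
                index[token] = len(table)
                table.append(token)
            chars.append(chr(BASE + index[token]))
        return "".join(chars)

    return encode(tokens1), encode(tokens2), table


def diff_tokens(tokens1, tokens2):
    """Return (op, [tokens]) tuples with -1 delete, 1 insert, 0 equal."""
    chars1, chars2, table = tokens_to_chars(tokens1, tokens2)

    def decode(chars):
        return [table[ord(c) - BASE] for c in chars]

    result = []
    matcher = SequenceMatcher(None, chars1, chars2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.append((0, decode(chars1[i1:i2])))
            continue
        if i1 < i2:
            result.append((-1, decode(chars1[i1:i2])))
        if j1 < j2:
            result.append((1, decode(chars2[j1:j2])))
    return result


print(diff_tokens(["let", "x", "=", "1"], ["let", "y", "=", "1"]))
# [(0, ['let']), (-1, ['x']), (1, ['y']), (0, ['=', '1'])]
```

With diff-match-patch itself, the two encoded strings would presumably be passed to `diff_main` in place of the `SequenceMatcher` step, and the resulting text decoded the same way.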

Avoid LinkedList in Java implementation

diff_match_patch.java uses LinkedList in a number of places, but benchmarks suggest this is likely less performant than ArrayList or other alternatives. ErrorProne goes so far as to consider it obsolete.

Unfortunately some usages of LinkedList are surfaced in the API itself, so they cannot safely be changed to ArrayList or List, but we can introduce siblings that don't use LinkedList and (optionally) deprecate the existing methods.

How to get char positions of changed text in the original text

I want to get the character positions of the changed text along with the change tuples. Has anyone implemented this?

Example in Python:

import diff_match_patch as dmp_module

dmp = dmp_module.diff_match_patch()
diff = dmp.diff_main("Hello World.", "Goodbye World.")
# Result: [(-1, "Hell"), (1, "G"), (0, "o"), (1, "odbye"), (0, " World.")]
# Desired result: [(-1, "Hell", 0, 3, x, x), (1, "G", x, x, 0, 0), (0, "o", 4, 4, 1, 1), (1, "odbye", x, x, 2, 6), (0, " World.", 5, 11, 7, 13)]

In the desired result, "x" means "don't care" for delete and insert operations.

One approach I considered is to take the changed text from each tuple, find its index in the original text (this would be the starting position), and add the length of the changed text to that index. However, I think this will fail for repeated substrings. An implementation in another language would also be helpful.

Thanks.
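
A sketch of one way to do this without searching the text at all: walk the diff once and keep a running offset into each text, which avoids the repeated-substring problem. `diff_with_positions` is a hypothetical helper, not a library function; it takes the (operation, text) tuples as returned by `diff_main`, reports inclusive end positions, and uses `None` for the "don't care" fields.

```python
def diff_with_positions(diffs):
    """Annotate (op, text) tuples with inclusive positions in both texts.

    Each result is (op, text, start1, end1, start2, end2); positions that do
    not apply to an operation are None.
    """
    pos1 = pos2 = 0  # running offsets into the old and new text
    annotated = []
    for op, text in diffs:
        n = len(text)
        if op == 0:      # equal: the text occupies a span in both inputs
            annotated.append((op, text, pos1, pos1 + n - 1, pos2, pos2 + n - 1))
            pos1 += n
            pos2 += n
        elif op == -1:   # delete: only the old text advances
            annotated.append((op, text, pos1, pos1 + n - 1, None, None))
            pos1 += n
        else:            # insert: only the new text advances
            annotated.append((op, text, None, None, pos2, pos2 + n - 1))
            pos2 += n
    return annotated


diffs = [(-1, "Hell"), (1, "G"), (0, "o"), (1, "odbye"), (0, " World.")]
for row in diff_with_positions(diffs):
    print(row)
# (-1, 'Hell', 0, 3, None, None)
# (1, 'G', None, None, 0, 0)
# (0, 'o', 4, 4, 1, 1)
# (1, 'odbye', None, None, 2, 6)
# (0, ' World.', 5, 11, 7, 13)
```

The offsets are in Python string units (code points); ports that index by UTF-16 units may report different numbers for text outside the Basic Multilingual Plane.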

Incorrect results after applying patch

Hello,
I observe a reproducible error: after applying a patch, the resulting string is not the same as the one used to create the patch.
Setup: I have two JS files (bundles built with Metro Bundler from React Native). File2 is the next version of File1 (a couple of words changed in the source file).
I create the patch with the Python 3 implementation of diff-match-patch, then serialize it to a string.
Then the patch is read from the string in Java, and I apply it with the Java implementation of diff-match-patch.
Expected result: after applying the patch to File1, the resulting string is equal to File2.
Actual result: the strings differ by one symbol.
Test project can be found here.

No response for diff_charsToLines_

Try this sample and change path of diff_match_patch.js

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
  <TITLE>Diff, Match and Patch: Demo of Diff</TITLE>
  <SCRIPT SRC="./diff_match_patch.js"></SCRIPT>
</HEAD>

<BODY>
<H1>Diff, Match and Patch</H1>
<H2>Demo of Diff</H2>

<P>Diff takes two texts and finds the differences.  This implementation works on a character by character basis.
The result of any diff may contain 'chaff', irrelevant small commonalities which complicate the output.
A post-diff cleanup algorithm factors out these trivial commonalities.</P>

<SCRIPT>
var dmp = new diff_match_patch();

function launch() {
  var text1 = document.getElementById('text1').value;
  var text2 = document.getElementById('text2').value;
  dmp.Diff_Timeout = parseFloat(document.getElementById('timeout').value);
  dmp.Diff_EditCost = parseFloat(document.getElementById('editcost').value);

  var ms_start = (new Date()).getTime();
      var a = dmp.diff_linesToChars_(text1, text2);
      var lineText1 = a.chars1;
      var lineText2 = a.chars2;
      var lineArray = a.lineArray;
  var d = dmp.diff_main(text1, text2, false);
      dmp.diff_charsToLines_(d, lineArray);
  var ms_end = (new Date()).getTime();

  if (document.getElementById('semantic').checked) {
    dmp.diff_cleanupSemantic(d);
  }
  if (document.getElementById('efficiency').checked) {
    dmp.diff_cleanupEfficiency(d);
  }
    alert(d);
  /*var ds = dmp.diff_prettyHtml(d);
  document.getElementById('outputdiv').innerHTML = ds + '<BR>Time: ' + (ms_end - ms_start) / 1000 + 's';*/
}
</SCRIPT>

<FORM action="#" onsubmit="return false">
<TABLE WIDTH="100%"><TR>
  <TD WIDTH="50%">
<H3>Text Version 1:</H3>
<TEXTAREA ID="text1" STYLE="width: 100%" ROWS=10>I am the very model of a modern Major-General,
I've information vegetable, animal, and mineral,
I know the kings of England, and I quote the fights historical,
From Marathon to Waterloo, in order categorical.</TEXTAREA></TD>
  <TD WIDTH="50%">
<H3>Text Version 2:</H3>
<TEXTAREA ID="text2" STYLE="width: 100%" ROWS=10>I am the very model of a cartoon individual,
My animation's comical, unusual, and whimsical,
I'm quite adept at funny gags, comedic theory I have read,
From wicked puns and stupid jokes to anvils that drop on your head.</TEXTAREA></TD>
</TR></TABLE>

<H3>Diff timeout:</H3>
<P><INPUT TYPE="text" SIZE=3 MAXLENGTH=5 VALUE="1" ID="timeout"> seconds<BR>
If the mapping phase of the diff computation takes longer than this, then the computation
is truncated and the best solution to date is returned.  While guaranteed to be correct,
it may not be optimal.  A timeout of '0' allows for unlimited computation.</P>

<H3>Post-diff cleanup:</H3>
<DL>
<DT><INPUT TYPE="radio" NAME="cleanup" ID="semantic" CHECKED>
<LABEL FOR="semantic">Semantic Cleanup</LABEL></DT>
<DD>Increase human readability by factoring out commonalities which are likely to be
coincidental.</DD>
<DT><INPUT TYPE="radio" NAME="cleanup" ID="efficiency">
<LABEL FOR="efficiency">Efficiency Cleanup</LABEL>,
edit cost: <INPUT TYPE="text" SIZE=3 MAXLENGTH=5 VALUE="4" ID="editcost">
<DD>Increase computational efficiency by factoring out short commonalities which are
not worth the overhead.  The larger the edit cost, the more aggressive the cleanup.</DD>
<DT><INPUT TYPE="radio" NAME="cleanup" ID="raw">
<LABEL FOR="raw">No Cleanup</LABEL></DT>
<DD>Raw output.</DD>
</DL>

<P><INPUT TYPE="button" onClick="launch()" VALUE="Compute Diff"></P>
</FORM>

<DIV ID="outputdiv"></DIV>

<HR>
Back to <A HREF="https://github.com/google/diff-match-patch">Diff, Match and Patch</A>

</BODY>
</HTML>

Did I miss anything?
