sergeyt / parse-diff

Unified diff parser for nodejs and browser

License: MIT License

JavaScript 100.00%

parse-diff's Introduction

parse-diff

Simple unified diff parser for JavaScript

JavaScript Usage Example

var parse = require('parse-diff');
var diff = ''; // input diff string
var files = parse(diff);
console.log(files.length); // number of patched files
files.forEach(function(file) {
	console.log(file.chunks.length); // number of hunks
	console.log(file.chunks[0].changes.length); // number of added/deleted/context lines in the first hunk
	// each item in changes is an object (type, line number(s), content)
	console.log(file.deletions); // number of deletions in the patch
	console.log(file.additions); // number of additions in the patch
});

parse-diff's Issues

Meteor package

The package should work everywhere, on both server and client.

Newline messages are counted in line numbers

Hi @sergeyt

Thanks for creating this library!

I've noticed that if you have a diff like this:

Index: file.txt
===================================================================
--- file.txt
+++ file.txt
@@ -1,3 +1,2 @@
 Paragraph1
+Paragraph2
\ No newline at end of file
-Paragraph2
-Paragraph3
\ No newline at end of file

The newline messages increment the line counters (the ln, ln1 and ln2 values), even though they are not actual lines of the file. I have a tiny patch to fix this, but I wanted to let you know.
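A minimal sketch of the behaviour described above, assuming a hypothetical helper named numberHunkLines that is not part of parse-diff. It numbers the lines of one hunk body and lets the "\ No newline at end of file" marker pass through without consuming a line number.

```javascript
// Hypothetical helper, not part of parse-diff: assigns line numbers to the
// lines of a single hunk body, skipping "\ No newline at end of file" markers.
function numberHunkLines(lines, oldStart, newStart) {
  let oldLn = oldStart;
  let newLn = newStart;
  const out = [];
  for (const line of lines) {
    if (line.startsWith("\\ ")) {
      // Marker line: it annotates the previous line and consumes no line number.
      out.push({ type: "noeol", content: line });
    } else if (line.startsWith("+")) {
      out.push({ type: "add", ln: newLn++, content: line });
    } else if (line.startsWith("-")) {
      out.push({ type: "del", ln: oldLn++, content: line });
    } else {
      out.push({ type: "normal", ln1: oldLn++, ln2: newLn++, content: line });
    }
  }
  return out;
}

// The hunk body from the diff above, starting at line 1 on both sides.
const numbered = numberHunkLines(
  [
    " Paragraph1",
    "+Paragraph2",
    "\\ No newline at end of file",
    "-Paragraph2",
    "-Paragraph3",
    "\\ No newline at end of file",
  ],
  1,
  1
);
```

With this approach the two deletions are numbered 2 and 3, and the marker lines carry no line number at all.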

Issue handling a binary change with spaces and quotes in name

Hi there! Thanks for your work on this, I've just had to put down work on this bug for the second, so I thought I'd leave breadcrumbs for me/someone next to pick up. Basically the from and to aren't set correctly when the filename has a few different attributes which I'm seeing in danger/danger-js#807

Here's a failing test for it:

	it 'should parse file names for changed binaries with spaces in their names', ->
		diff = """
diff --git a/Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects 'home' by default as [email protected] b/Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects 'home' by default as [email protected]
index fc72ba34b..ec373e9a4 100644
Binary files a/Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects 'home' by default as [email protected] and b/Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects 'home' by default as [email protected] differ
"""
		files = parse diff
		expect(files.length).to.be(1)
		file = files[0]
		expect(file.from).to.be("Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects 'home' by default as [email protected]")
		expect(file.to).to.be("Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects 'home' by default as [email protected]")

Fails with:

  1) diff parser
       should parse file names for changed binaries with spaces in their names :
-     Error: expected 'Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects \'home\' by default as [email protected] b/Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects \'home\'' to equal 'Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects \'home\' by default as [email protected]'

I think a regex is hitting the ' and calling it the end of the day somewhere for the file 😄
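One possible way around the ambiguity, sketched under assumptions: when a file is changed in place (not renamed), both paths on the `diff --git` line are identical, so the separating space sits exactly in the middle of the remainder of the line. The helper name splitGitHeader and the shortened example path are illustrative only, and this is not how parse-diff currently parses names.

```javascript
// Hypothetical helper, not parse-diff's implementation. Assumes the header has
// the unquoted form "diff --git a/<path> b/<path>" with the same path on both
// sides, so the separator is the space exactly in the middle.
function splitGitHeader(line) {
  const prefix = "diff --git ";
  if (!line.startsWith(prefix)) return null;
  const rest = line.slice(prefix.length);
  // Two halves of equal length plus one separating space give an odd length.
  if (rest.length % 2 === 0) return null;
  const mid = (rest.length - 1) / 2;
  const left = rest.slice(0, mid);
  const right = rest.slice(mid + 1);
  if (rest[mid] !== " " || !left.startsWith("a/") || !right.startsWith("b/")) {
    return null;
  }
  const from = left.slice(2);
  const to = right.slice(2);
  // Give up on renames, where the two sides differ.
  return from === to ? { from, to } : null;
}

const names = splitGitHeader(
  "diff --git a/Specs/selects 'home' by default.png b/Specs/selects 'home' by default.png"
);
```

Because it never splits on spaces or quotes inside the path, this handles names containing both, at the cost of returning null for renames.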

Feature Request: rename parsing

Hey so I was wondering if I can add rename parsing to the tool?

I was thinking we can read rename ... lines and set a boolean file.renamed to true. Willing to make the PR.

Perhaps also a field for the similarity index?

Something along the lines of adding this test:

  it("should parse rename diff", function () {
    const diff = `\
diff --git a/test.txt b/text2.txt
similarity index 100%
rename from test.txt
rename to text2.txt
`;
    const files = parse(diff);
    expect(files.length).toBe(1);
    const [file] = files;
    expect(file.from).toBe("test.txt");
    expect(file.to).toBe("text2.txt");
    expect(file.chunks.length).toBe(0);
    expect(file.renamed).toBe(true);
  });
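A rough sketch of how the extended header lines might be read. The function parseRenameHeader and the fields renamed and similarity are the names proposed in this request, not an existing parse-diff API.

```javascript
// Hypothetical sketch: `renamed` and `similarity` are the field names proposed
// in this issue, not an existing parse-diff API.
function parseRenameHeader(lines) {
  const file = { renamed: false };
  for (const line of lines) {
    let m;
    if ((m = /^similarity index (\d+)%$/.exec(line))) {
      file.similarity = Number(m[1]);
    } else if ((m = /^rename from (.+)$/.exec(line))) {
      file.from = m[1];
      file.renamed = true;
    } else if ((m = /^rename to (.+)$/.exec(line))) {
      file.to = m[1];
      file.renamed = true;
    }
  }
  return file;
}

const renamedFile = parseRenameHeader([
  "diff --git a/test.txt b/text2.txt",
  "similarity index 100%",
  "rename from test.txt",
  "rename to text2.txt",
]);
```

Reading the names from the `rename from` / `rename to` lines avoids splitting the `diff --git` line, which is ambiguous when paths contain spaces.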

If a hunk contains just one line, only its start line number appears. Otherwise its line numbers look like ‘start,count’

Parsing this single line diff:

diff --git a/file1 b/file1
new file mode 100644
index 0000000..db81be4
--- /dev/null
+++ b/file1
@@ -0,0 +1 @@
+line1

produces the following result:

[
  {
    "lines": [
      {
        "type": "normal",
        "normal": true,
        "ln1": 0,
        "ln2": 0,
        "content": "@@ -0,0 +1 @@"
      },
      {
        "type": "add",
        "add": true,
        "ln": 1,
        "content": "+line1"
      }
    ],
    "deletions": 0,
    "additions": 1,
    "new": true,
    "index": [
      "0000000..db81be4"
    ],
    "from": "\/dev\/null" ,
    "to": "file1"
  }
]

The first line should be a chunk line instead of a normal line:

[
  {
    "lines": [
      {
        "type": "chunk",
        "chunk": true,
        "content": "@@ -0,0 +1 @@"
      },
      {
        "type": "add",
        "add": true,
        "ln": 1,
        "content": "+line1"
      }
    ],
    "deletions": 0,
    "additions": 1,
    "new": true,
    "index": [
      "0000000..db81be4"
    ],
    "from": "\/dev\/null" ,
    "to": "file1"
  }
]

Adjusting the chunk regex to handle single line formats should fix the issue.
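As an illustration only, a hunk-header pattern with optional counts could look like the sketch below. The names HUNK_HEADER and parseHunkHeader are made up for this example; the default of 1 for an omitted count follows the unified format rule quoted in the title.

```javascript
// Sketch of a hunk-header parser; not parse-diff's actual regex.
// An omitted count means the range covers exactly one line.
const HUNK_HEADER = /^@@\s+-(\d+)(?:,(\d+))?\s+\+(\d+)(?:,(\d+))?\s+@@/;

function parseHunkHeader(line) {
  const m = HUNK_HEADER.exec(line);
  if (!m) return null;
  return {
    oldStart: Number(m[1]),
    oldLines: m[2] === undefined ? 1 : Number(m[2]),
    newStart: Number(m[3]),
    newLines: m[4] === undefined ? 1 : Number(m[4]),
  };
}

const singleLine = parseHunkHeader("@@ -0,0 +1 @@");
const multiLine = parseHunkHeader("@@ -1,10 +1,9 @@");
```

Making each `,count` group optional as a whole keeps `-0,0` and `+1` both valid without letting a stray comma match.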

Report binary diff

If a diff contains binary files, it produces messages like:

diff --git a/screenshots/split-view.png b/screenshots/split-view.png
index 1c352c2..e1fb381 100644
Binary files a/screenshots/split-view.png and b/screenshots/split-view.png differ

It would be nice if parse-diff set a binary: true flag on the file object.
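A small sketch of what that could look like, assuming git's English "Binary files ... differ" message. The helpers isBinaryMarker and flagBinary are hypothetical, and binary is the flag proposed here rather than an existing field.

```javascript
// Hypothetical sketch: `binary` is the flag proposed in this issue.
// Assumes git's English "Binary files <a> and <b> differ" message.
function isBinaryMarker(line) {
  return /^Binary files .+ and .+ differ$/.test(line);
}

// Returns a copy of the file object with the proposed flag set.
function flagBinary(file, headerLines) {
  return { ...file, binary: headerLines.some(isBinaryMarker) };
}

const flagged = flagBinary(
  { from: "screenshots/split-view.png", to: "screenshots/split-view.png" },
  [
    "diff --git a/screenshots/split-view.png b/screenshots/split-view.png",
    "index 1c352c2..e1fb381 100644",
    "Binary files a/screenshots/split-view.png and b/screenshots/split-view.png differ",
  ]
);
```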

Wrong result when parsing a diff that contains a deleted SQL comment starting with `--`

Here is a test diff input to reproduce the bug

diff --git a/test.sql b/test.sql
index 305feaa..70865e7 100644
--- a/test.sql
+++ b/test.sql
@@ -1,10 +1,9 @@
--- Person
-create table person
-( id                   		bigint,
-  primary key ( id ) );
-
-
 -- Photo
 create table photo
 ( id                      bigint,
   primary key (id ) );
+
+-- Person
+create table person
+( id                   		bigint,
+  primary key ( id ) );

And the parsed result is

[
  {
    "chunks": [
      {
        "content": "@@ -1,10 +1,9 @@",
        "changes": [
          {
            "type": "del",
            "del": true,
            "ln": 1,
            "content": "-create table person"
          },
          {
            "type": "del",
            "del": true,
            "ln": 2,
            "content": "-( id                   \t\tbigint,"
          },
          {
            "type": "del",
            "del": true,
            "ln": 3,
            "content": "-  primary key ( id ) );"
          },
          {
            "type": "del",
            "del": true,
            "ln": 4,
            "content": "-"
          },
          {
            "type": "del",
            "del": true,
            "ln": 5,
            "content": "-"
          },
          {
            "type": "normal",
            "normal": true,
            "ln1": 6,
            "ln2": 1,
            "content": " -- Photo"
          },
          {
            "type": "normal",
            "normal": true,
            "ln1": 7,
            "ln2": 2,
            "content": " create table photo"
          },
          {
            "type": "normal",
            "normal": true,
            "ln1": 8,
            "ln2": 3,
            "content": " ( id                      bigint,"
          },
          {
            "type": "normal",
            "normal": true,
            "ln1": 9,
            "ln2": 4,
            "content": "   primary key (id ) );"
          },
          {
            "type": "add",
            "add": true,
            "ln": 5,
            "content": "+"
          },
          {
            "type": "add",
            "add": true,
            "ln": 6,
            "content": "+-- Person"
          },
          {
            "type": "add",
            "add": true,
            "ln": 7,
            "content": "+create table person"
          },
          {
            "type": "add",
            "add": true,
            "ln": 8,
            "content": "+( id                   \t\tbigint,"
          },
          {
            "type": "add",
            "add": true,
            "ln": 9,
            "content": "+  primary key ( id ) );"
          }
        ],
        "oldStart": 1,
        "oldLines": 10,
        "newStart": 1,
        "newLines": 9
      }
    ],
    "deletions": 0,
    "additions": 0,
    "from": "test.sql",
    "to": "test.sql",
    "index": [
      "305feaa..70865e7",
      "100644"
    ]
  },
  {
    "chunks": [],
    "deletions": 5,
    "additions": 5,
    "from": "Person"
  }
]

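which is wrong: the deleted line `--- Person` is parsed as a `---` file header, producing a spurious second file entry with from: "Person" that also receives the deletion and addition counts.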
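One way to avoid this kind of misclassification, sketched here as an assumption rather than as parse-diff's fix, is to count down the old and new line totals from the hunk header and treat every line as hunk content until both reach zero. The function classifyHunkBody and the shortened example body are illustrative.

```javascript
// Hypothetical sketch, not parse-diff's implementation: classify the body of
// one hunk by counting down the old/new line totals from its header.
function classifyHunkBody(lines, oldLines, newLines) {
  let oldLeft = oldLines;
  let newLeft = newLines;
  const changes = [];
  for (const line of lines) {
    if (oldLeft <= 0 && newLeft <= 0) break; // hunk complete; headers may follow
    if (line.startsWith("\\")) continue; // "\ No newline at end of file"
    if (line.startsWith("-")) {
      changes.push({ type: "del", content: line });
      oldLeft--;
    } else if (line.startsWith("+")) {
      changes.push({ type: "add", content: line });
      newLeft--;
    } else {
      changes.push({ type: "normal", content: line });
      oldLeft--;
      newLeft--;
    }
  }
  return changes;
}

// A shortened hunk "@@ -1,3 +1,2 @@" followed by the header of another file.
const body = [
  "--- Person",
  "-create table person",
  " -- Photo",
  "+-- Person",
  "--- a/next.sql",
];
const changes = classifyHunkBody(body, 3, 2);
```

Inside the hunk, `--- Person` is counted as a deletion; the later `--- a/next.sql` is left alone because the hunk's line budget is already used up.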

Move to JavaScript?

Here's a JS version powered by decaffeinate. It would need some manual cleanup.

// parses unified diff
// http://www.gnu.org/software/diffutils/manual/diffutils.html#Unified-Format
export default function(input) {
    if (!input) { return []; }
    if (input.match(/^\s+$/)) { return []; }

    let lines = input.split('\n');
    if (lines.length === 0) { return []; }

    let files = [];
    let file = null;
    let ln_del = 0;
    let ln_add = 0;
    let current = null;

    let start = function(line) {
        file = {
            chunks: [],
            deletions: 0,
            additions: 0
        };
        files.push(file);

        if (!file.to && !file.from) {
            let fileNames = parseFile(line);

            if (fileNames) {
                file.from = fileNames[0];
                return file.to = fileNames[1];
            }
        }
    };

    let restart = function() {
        if (!file || file.chunks.length) { return start(); }
    };

    let new_file = function() {
        restart();
        file.new = true;
        return file.from = '/dev/null';
    };

    let deleted_file = function() {
        restart();
        file.deleted = true;
        return file.to = '/dev/null';
    };

    let index = function(line) {
        restart();
        return file.index = line.split(' ').slice(1);
    };

    let from_file = function(line) {
        restart();
        return file.from = parseFileFallback(line);
    };

    let to_file = function(line) {
        restart();
        return file.to = parseFileFallback(line);
    };

    let chunk = function(line, match) {
        let oldStart;
        let newStart;
        ln_del = oldStart = +match[1];
        let oldLines = +(match[2] || 0);
        ln_add = newStart = +match[3];
        let newLines = +(match[4] || 0);
        current = {
            content: line,
            changes: [],
            oldStart, oldLines, newStart, newLines
        };
        return file.chunks.push(current);
    };

    let del = function(line) {
        current.changes.push({type:'del', del:true, ln:ln_del++, content:line});
        return file.deletions++;
    };

    let add = function(line) {
        current.changes.push({type:'add', add:true, ln:ln_add++, content:line});
        return file.additions++;
    };

    let noeol = '\\ No newline at end of file';
    let normal = function(line) {
        if (!file) { return; }
        return current.changes.push({
            type: 'normal',
            normal: true,
            // number real context lines only; the no-newline marker gets no line number
            ln1: line !== noeol ? ln_del++ : undefined,
            ln2: line !== noeol ? ln_add++ : undefined,
            content: line
        });
    };

    let schema = [
        // todo better regexp to avoid detecting a normal line starting with diff
        [/^\s+/, normal],
        [/^diff\s/, start],
        [/^new file mode \d+$/, new_file],
        [/^deleted file mode \d+$/, deleted_file],
        [/^index\s[\da-zA-Z]+\.\.[\da-zA-Z]+(\s(\d+))?$/, index],
        [/^---\s/, from_file],
        [/^\+\+\+\s/, to_file],
        [/^@@\s+\-(\d+),?(\d+)?\s+\+(\d+),?(\d+)?\s@@/, chunk],
        [/^-/, del],
        [/^\+/, add]
    ];

    let parse = function(line) {
        for (let i = 0; i < schema.length; i++) {
            let p = schema[i];
            let m = line.match(p[0]);
            if (m) {
                p[1](line, m);
                return true;
            }
        }
        return false;
    };

    for (let i = 0; i < lines.length; i++) {
        let line = lines[i];
        parse(line);
    }

    return files;
};

var parseFile = function(s) {
    if (!s) { return; }

    let fileNames = s.split(' ').slice(-2);
    fileNames.map((fileName, i) => fileNames[i] = fileName.replace(/^(a|b)\//, ''));

    return fileNames;
};

// fallback function to overwrite file.from and file.to if executed
var parseFileFallback = function(s) {
    s = ltrim(s, '-');
    s = ltrim(s, '+');
    s = s.trim();
    // ignore possible time stamp
    let t = (/\t.*|\d{4}-\d\d-\d\d\s\d\d:\d\d:\d\d(.\d+)?\s(\+|-)\d\d\d\d/).exec(s);
    if (t) { s = s.substring(0, t.index).trim(); }
    // ignore git prefixes a/ or b/
    if (s.match((/^(a|b)\//))) { return s.substr(2); } else { return s; }
};

var ltrim = function(s, chars) {
    s = makeString(s);
    if (!chars && trimLeft) { return trimLeft.call(s); }
    chars = defaultToWhiteSpace(chars);
    return s.replace(new RegExp(`^${chars}+`), '');
};

var makeString = function(s) { if (s === null) { return ''; } else { return s + ''; } };

var { trimLeft } = String.prototype;

var defaultToWhiteSpace = function(chars) {
  if (chars === null) { return '\\s'; }
  if (chars.source) { return chars.source; }
  return `[${escapeRegExp(chars)}]`;
 };

var escapeRegExp = s => makeString(s).replace(/([.*+?^=!:${}()|[\]\/\\])/g, '\\$1');

No from, to when a new empty file is in the diff

When the diff contains a new, empty file, the resulting object lacks the from and to properties that are usually present.

To reproduce, parse this diff:

diff --git a/a b/a
new file mode 100644
index 0000000..e69de29
diff --git a/a.s b/a.s
new file mode 100644
index 0000000..e69de29
diff --git a/a.txt b/a.txt
new file mode 100644
index 0000000..7898192
--- /dev/null
+++ b/a.txt
@@ -0,0 +1 @@
+a

to get:

[ { chunks: [],
    deletions: 0,
    additions: 0,
    new: true,
    index: [ '0000000..e69de29' ] },
  { chunks: [],
    deletions: 0,
    additions: 0,
    new: true,
    index: [ '0000000..e69de29' ] },
  { chunks: [ [Object] ],
    deletions: 0,
    additions: 1,
    new: true,
    index: [ '0000000..7898192' ],
    from: '/dev/null',
    to: 'a.txt' }]

I think it makes sense for them to have from: '/dev/null' and to: fileName.
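A possible post-processing step, sketched under the assumption that the header has the simple form `diff --git a/<path> b/<path>` with no spaces in the path. The function fillMissingNames is hypothetical and only illustrates the suggested result.

```javascript
// Hypothetical post-processing step. Assumes a simple header of the form
// "diff --git a/<path> b/<path>" with no spaces in the path.
function fillMissingNames(file, headerLine) {
  if (file.from !== undefined && file.to !== undefined) return file;
  const m = /^diff --git a\/(\S+) b\/(\S+)$/.exec(headerLine);
  if (!m) return file;
  const from = file.new ? "/dev/null" : m[1];
  const to = file.deleted ? "/dev/null" : m[2];
  // Only fill in what the parser left undefined.
  return {
    ...file,
    from: file.from === undefined ? from : file.from,
    to: file.to === undefined ? to : file.to,
  };
}

const filled = fillMissingNames(
  { chunks: [], deletions: 0, additions: 0, new: true, index: ["0000000..e69de29"] },
  "diff --git a/a.s b/a.s"
);
```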

Should quotes in file names in ---/+++ lines be removed?

I.e., should this test pass?

  it("should parse diff with single line quote escaped file names", function () {
    const diff = `
diff --git "a/file \\"space\\"" "b/file \\"space\\""
index 9daeafb..88bd214 100644
--- "a/file \\"space\\""  
+++ "b/file \\"space\\""  
@@ -1 +1 @@
-test
+test\n1234
`;
    const files = parse(diff);
    expect(files.length).toBe(1);
    const [file] = files;
    expect(file.from).toBe(`file \\"space\\"`);
    expect(file.to).toBe(`file \\"space\\"`);
  });

Repro:

Running this script will generate the diff above.

#!/bin/bash

git init > /dev/null
echo test > "file \"space\""
git add . > /dev/null
git commit -m "add file" > /dev/null
git tag -a BEFORE -m "" > /dev/null
echo "test\n1234" > "file \"space\""
git add . > /dev/null
git commit -m "update file" > /dev/null
git diff BEFORE

Ubuntu 20.04
node v14.16.0
npm v6.14.11
git v2.25.1

P.S. I am willing to make a PR to fix it (if it is a bug).
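For comparison, here is a sketch of what removing the quotes would yield, assuming git's C-style path quoting. The helper unquoteGitPath is hypothetical and handles only a few escapes; whether parse-diff should do this at all is the open question above.

```javascript
// Hypothetical helper. Assumes git's C-style quoting, and handles only the
// \" \\ \t and \n escapes; octal byte escapes (\NNN) are left untouched.
function unquoteGitPath(s) {
  const t = s.trim();
  if (t.length < 2 || !t.startsWith('"') || !t.endsWith('"')) return t;
  const simple = { '"': '"', "\\": "\\", t: "\t", n: "\n" };
  return t.slice(1, -1).replace(/\\(["\\tn])/g, (_, c) => simple[c]);
}

// The path from the ---/+++ lines above, including the trailing spaces.
const unquoted = unquoteGitPath('"a/file \\"space\\""  ');
```

The result still carries the `a/` prefix, so the existing prefix stripping would apply afterwards.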