
parse-diff's Introduction

parse-diff

Simple unified diff parser for JavaScript

JavaScript Usage Example

var parse = require('parse-diff');
var diff = ''; // input diff string
var files = parse(diff);
console.log(files.length); // number of patched files
files.forEach(function(file) {
	console.log(file.chunks.length); // number of hunks
	console.log(file.chunks[0].changes.length); // hunk added/deleted/context lines
	// each item in changes is an object with type, content and line-number fields
	console.log(file.deletions); // number of deletions in the patch
	console.log(file.additions); // number of additions in the patch
});

parse-diff's People

Contributors

417-72ki, alikhamesy, bittrance, dab0mb, dependabot[bot], fsahmad, just-boris, leon19, lukasz-szulborski, mikedidomizio, mironiasty, scottopherson, sergeyt, sheerun, stevethomp

parse-diff's Issues

Newline messages are counted in line numbers

Hi @sergeyt

Thanks for creating this library!

I've noticed that if you have a diff like this:

Index: file.txt
===================================================================
--- file.txt
+++ file.txt
@@ -1,3 +1,2 @@
 Paragraph1
+Paragraph2
\ No newline at end of file
-Paragraph2
-Paragraph3
\ No newline at end of file

The newline messages increment the line counters ln, ln1 and ln2, even though they are not real lines of the file. I have a tiny patch to fix this, but I wanted to let you know.
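
A minimal sketch of the behaviour being asked for, assuming a hypothetical helper numberHunkLines that is not part of parse-diff: it numbers the lines of one hunk body and leaves the counters untouched when it meets the newline message.

```javascript
// Hypothetical helper, not parse-diff's actual code: numbers the lines of one
// hunk body while skipping the "\ No newline at end of file" marker.
const NO_EOL = '\\ No newline at end of file';

function numberHunkLines(lines, oldStart, newStart) {
  let lnDel = oldStart; // next line number on the old side
  let lnAdd = newStart; // next line number on the new side
  const changes = [];
  for (const line of lines) {
    if (line === NO_EOL) {
      // Marker line: record it, but do not advance either counter.
      changes.push({ type: 'normal', normal: true, content: line });
    } else if (line.startsWith('-')) {
      changes.push({ type: 'del', del: true, ln: lnDel++, content: line });
    } else if (line.startsWith('+')) {
      changes.push({ type: 'add', add: true, ln: lnAdd++, content: line });
    } else {
      changes.push({ type: 'normal', normal: true, ln1: lnDel++, ln2: lnAdd++, content: line });
    }
  }
  return changes;
}

// The hunk body from the diff above, starting at line 1 on both sides.
const changes = numberHunkLines(
  [' Paragraph1', '+Paragraph2', NO_EOL, '-Paragraph2', '-Paragraph3', NO_EOL],
  1,
  1
);
```

With this approach the two deleted lines keep the numbers 2 and 3, and the marker entries carry no line number at all.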

Issue handling a binary change with spaces and quotes in name

Hi there! Thanks for your work on this. I've just had to put down work on this bug for the moment, so I thought I'd leave breadcrumbs for me or someone else to pick up next. Basically, from and to aren't set correctly when the filename contains spaces and quotes, which I'm seeing in danger/danger-js#807

Here's a failing test for it:

	it 'should parse file names for changed binaries with spaces in their names', ->
		diff = """
diff --git a/Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects 'home' by default as [email protected] b/Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects 'home' by default as [email protected]
index fc72ba34b..ec373e9a4 100644
Binary files a/Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects 'home' by default as [email protected] and b/Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects 'home' by default as [email protected] differ
"""
		files = parse diff
		expect(files.length).to.be(1)
		file = files[0]
		expect(file.from).to.be("Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects 'home' by default as [email protected]")
		expect(file.to).to.be("Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects 'home' by default as [email protected]")

Fails with:

  1) diff parser
       should parse file names for changed binaries with spaces in their names :
-     Error: expected 'Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects \'home\' by default as [email protected] b/Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects \'home\'' to equal 'Artsy_Tests/ReferenceImages/ARTopMenuViewControllerSpec/selects \'home\' by default as [email protected]'

I think a regex is hitting the ' and calling it the end of the day somewhere for the file 😄
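
One possible way around splitting on spaces or quotes, sketched under the assumption that both sides of the diff --git header name the same path (no rename). The helper name parseSamePathHeader and the example file name are hypothetical and not taken from parse-diff.

```javascript
// Hypothetical helper: extracts the path from a "diff --git a/<path> b/<path>"
// header when both sides name the same path, using only the header's length.
function parseSamePathHeader(line) {
  const prefix = 'diff --git ';
  if (!line.startsWith(prefix)) return null;
  const rest = line.slice(prefix.length);
  // rest has the shape "a/P b/P", so its length is 2 * P.length + 5.
  const pathLength = (rest.length - 5) / 2;
  if (!Number.isInteger(pathLength) || pathLength < 1) return null;
  const from = rest.slice(2, 2 + pathLength);
  const separator = rest.slice(2 + pathLength, 2 + pathLength + 3);
  const to = rest.slice(2 + pathLength + 3);
  // Reject anything that does not split cleanly into two identical paths.
  if (!rest.startsWith('a/') || separator !== ' b/' || from !== to) return null;
  return { from, to };
}

// Illustrative file name containing spaces and single quotes.
const names = parseSamePathHeader(
  "diff --git a/dir/selects 'home' by default.png b/dir/selects 'home' by default.png"
);
```

Renamed files would still need the rename from / rename to lines, since the header alone is ambiguous when the two paths differ.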

Feature Request: rename parsing

Hey so I was wondering if I can add rename parsing to the tool?

I was thinking we could read the rename ... lines and set a boolean file.renamed to true. I'm willing to make the PR.

Perhaps also a field for the similarity index?

Something along the lines of adding this test:

  it("should parse rename diff", function () {
    const diff = `\
diff --git a/test.txt b/text2.txt
similarity index 100%
rename from test.txt
rename to text2.txt
`;
    const files = parse(diff);
    expect(files.length).toBe(1);
    const [file] = files;
    expect(file.from).toBe("test.txt");
    expect(file.to).toBe("text2.txt");
    expect(file.chunks.length).toBe(0);
    expect(file.renamed).toBe(true);
  });
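
A rough sketch of how such lines could be read, assuming a hypothetical standalone function parseRenameHeaders. The renamed and similarity fields are the proposal from this request, not existing parse-diff API.

```javascript
// Hypothetical sketch: collects rename metadata from git's extended headers.
// The field names renamed and similarity are assumptions from this proposal.
function parseRenameHeaders(lines) {
  const file = {};
  for (const line of lines) {
    let m;
    if ((m = /^similarity index (\d+)%$/.exec(line))) {
      file.similarity = Number(m[1]);
    } else if ((m = /^rename from (.+)$/.exec(line))) {
      file.renamed = true;
      file.from = m[1];
    } else if ((m = /^rename to (.+)$/.exec(line))) {
      file.renamed = true;
      file.to = m[1];
    }
  }
  return file;
}

// Illustrative header lines with made-up file names.
const renamedFile = parseRenameHeaders([
  'diff --git a/old-name.txt b/new-name.txt',
  'similarity index 100%',
  'rename from old-name.txt',
  'rename to new-name.txt',
]);
```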

`chunk` lines are parsed as `normal` when parsing diffs w/ single line files

chunk lines are parsed incorrectly when parsing diffs with single line files.

http://www.gnu.org/software/diffutils/manual/html_node/Detailed-Unified.html#Detailed%20Unified

If a hunk contains just one line, only its start line number appears. Otherwise its line numbers look like ‘start,count’

Parsing this single line diff:

diff --git a/file1 b/file1
new file mode 100644
index 0000000..db81be4
--- /dev/null
+++ b/file1
@@ -0,0 +1 @@
+line1

produces the following result:

[
  {
    "lines": [
      {
        "type": "normal",
        "normal": true,
        "ln1": 0,
        "ln2": 0,
        "content": "@@ -0,0 +1 @@"
      },
      {
        "type": "add",
        "add": true,
        "ln": 1,
        "content": "+line1"
      }
    ],
    "deletions": 0,
    "additions": 1,
    "new": true,
    "index": [
      "0000000..db81be4"
    ],
    "from": "\/dev\/null" ,
    "to": "file1"
  }
]

The first line should be a chunk line instead of a normal line:

[
  {
    "lines": [
      {
        "type": "chunk",
        "chunk": true,
        "content": "@@ -0,0 +1 @@"
      },
      {
        "type": "add",
        "add": true,
        "ln": 1,
        "content": "+line1"
      }
    ],
    "deletions": 0,
    "additions": 1,
    "new": true,
    "index": [
      "0000000..db81be4"
    ],
    "from": "\/dev\/null" ,
    "to": "file1"
  }
]

Adjusting the chunk regex to handle single line formats should fix the issue.
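
As a sketch of that adjustment (not the library's actual pattern), the count can be made an optional group that defaults to 1, following the GNU rule quoted above. HUNK_HEADER and parseHunkHeader are hypothetical names.

```javascript
// Hypothetical pattern: the ",count" part of each range is optional.
const HUNK_HEADER = /^@@\s+-(\d+)(?:,(\d+))?\s+\+(\d+)(?:,(\d+))?\s+@@/;

function parseHunkHeader(line) {
  const m = HUNK_HEADER.exec(line);
  if (!m) return null;
  return {
    oldStart: Number(m[1]),
    // An omitted count means the range covers exactly one line.
    oldLines: m[2] === undefined ? 1 : Number(m[2]),
    newStart: Number(m[3]),
    newLines: m[4] === undefined ? 1 : Number(m[4]),
  };
}

const singleLine = parseHunkHeader('@@ -0,0 +1 @@');
const multiLine = parseHunkHeader('@@ -1,10 +1,9 @@');
```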

Report binary diff

If a diff contains binary files, it produces messages like:

diff --git a/screenshots/split-view.png b/screenshots/split-view.png
index 1c352c2..e1fb381 100644
Binary files a/screenshots/split-view.png and b/screenshots/split-view.png differ

It would be nice if parse-diff set a binary: true flag on the file object.
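
A minimal sketch of such a check, assuming a hypothetical handler markBinary that receives the current file object and one input line. The binary field is the flag proposed here, not a confirmed part of the library.

```javascript
// Hypothetical pattern for git's "Binary files ... differ" message.
const BINARY_LINE = /^Binary files (.+) and (.+) differ$/;

// Sets file.binary when the line is a binary-diff message.
// Returns true when the line was recognised, false otherwise.
function markBinary(file, line) {
  if (!BINARY_LINE.test(line)) return false;
  file.binary = true;
  return true;
}

const binaryFile = { chunks: [], deletions: 0, additions: 0 };
const recognised = markBinary(
  binaryFile,
  'Binary files a/screenshots/split-view.png and b/screenshots/split-view.png differ'
);
```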

Wrong result when parsing a diff that contains a deleted SQL comment starting with `--`

Here is a test diff input to reproduce the bug:

diff --git a/test.sql b/test.sql
index 305feaa..70865e7 100644
--- a/test.sql
+++ b/test.sql
@@ -1,10 +1,9 @@
--- Person
-create table person
-( id                   		bigint,
-  primary key ( id ) );
-
-
 -- Photo
 create table photo
 ( id                      bigint,
   primary key (id ) );
+
+-- Person
+create table person
+( id                   		bigint,
+  primary key ( id ) );

And the parsed result is:

[
  {
    "chunks": [
      {
        "content": "@@ -1,10 +1,9 @@",
        "changes": [
          {
            "type": "del",
            "del": true,
            "ln": 1,
            "content": "-create table person"
          },
          {
            "type": "del",
            "del": true,
            "ln": 2,
            "content": "-( id                   \t\tbigint,"
          },
          {
            "type": "del",
            "del": true,
            "ln": 3,
            "content": "-  primary key ( id ) );"
          },
          {
            "type": "del",
            "del": true,
            "ln": 4,
            "content": "-"
          },
          {
            "type": "del",
            "del": true,
            "ln": 5,
            "content": "-"
          },
          {
            "type": "normal",
            "normal": true,
            "ln1": 6,
            "ln2": 1,
            "content": " -- Photo"
          },
          {
            "type": "normal",
            "normal": true,
            "ln1": 7,
            "ln2": 2,
            "content": " create table photo"
          },
          {
            "type": "normal",
            "normal": true,
            "ln1": 8,
            "ln2": 3,
            "content": " ( id                      bigint,"
          },
          {
            "type": "normal",
            "normal": true,
            "ln1": 9,
            "ln2": 4,
            "content": "   primary key (id ) );"
          },
          {
            "type": "add",
            "add": true,
            "ln": 5,
            "content": "+"
          },
          {
            "type": "add",
            "add": true,
            "ln": 6,
            "content": "+-- Person"
          },
          {
            "type": "add",
            "add": true,
            "ln": 7,
            "content": "+create table person"
          },
          {
            "type": "add",
            "add": true,
            "ln": 8,
            "content": "+( id                   \t\tbigint,"
          },
          {
            "type": "add",
            "add": true,
            "ln": 9,
            "content": "+  primary key ( id ) );"
          }
        ],
        "oldStart": 1,
        "oldLines": 10,
        "newStart": 1,
        "newLines": 9
      }
    ],
    "deletions": 0,
    "additions": 0,
    "from": "test.sql",
    "to": "test.sql",
    "index": [
      "305feaa..70865e7",
      "100644"
    ]
  },
  {
    "chunks": [],
    "deletions": 5,
    "additions": 5,
    "from": "Person"
  }
]

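which is wrong: the deleted line "--- Person" is treated as a "---" file header, so a second file entry with from: "Person" is created and the deletion and addition counts end up on it.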
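
One way to avoid this kind of misreading, sketched here as an assumption rather than as the library's fix, is to use the line counts from the @@ header: while a hunk still expects lines, a leading "-" or "+" is always a change, and "---" / "+++" only count as file headers outside a hunk body. The function classifyLines is hypothetical.

```javascript
// Hypothetical sketch: classify each line of a diff, treating "---" and "+++"
// as file headers only when no hunk lines are still expected.
function classifyLines(lines) {
  let oldLeft = 0; // old-side lines still expected in the current hunk
  let newLeft = 0; // new-side lines still expected in the current hunk
  const hunkHeader = /^@@\s+-\d+(?:,(\d+))?\s+\+\d+(?:,(\d+))?\s+@@/;
  return lines.map((line) => {
    // "\ No newline at end of file" never counts towards either side.
    if (line.startsWith('\\')) return 'marker';
    if (oldLeft > 0 || newLeft > 0) {
      // Inside a hunk body: only the first character matters.
      if (line.startsWith('-')) { oldLeft--; return 'del'; }
      if (line.startsWith('+')) { newLeft--; return 'add'; }
      oldLeft--;
      newLeft--;
      return 'normal';
    }
    const m = hunkHeader.exec(line);
    if (m) {
      // An omitted count means one line.
      oldLeft = m[1] === undefined ? 1 : Number(m[1]);
      newLeft = m[2] === undefined ? 1 : Number(m[2]);
      return 'chunk';
    }
    if (line.startsWith('--- ')) return 'from';
    if (line.startsWith('+++ ')) return 'to';
    return 'header';
  });
}

// A shortened version of the SQL diff above.
const kinds = classifyLines([
  '--- a/test.sql',
  '+++ b/test.sql',
  '@@ -1,2 +1,2 @@',
  '--- Person',
  ' -- Photo',
  '+-- Person',
]);
```

The trade-off is that this relies on the counts in the hunk header being accurate, which hand-edited patches do not always guarantee.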

Move to JavaScript?

Here's a JS version powered by decaffeinate. It would need some manual cleanup.

// parses unified diff
// http://www.gnu.org/software/diffutils/manual/diffutils.html#Unified-Format
export default function(input) {
    if (!input) { return []; }
    if (input.match(/^\s+$/)) { return []; }

    let lines = input.split('\n');
    if (lines.length === 0) { return []; }

    let files = [];
    let file = null;
    let ln_del = 0;
    let ln_add = 0;
    let current = null;

    let start = function(line) {
        file = {
            chunks: [],
            deletions: 0,
            additions: 0
        };
        files.push(file);

        if (!file.to && !file.from) {
            let fileNames = parseFile(line);

            if (fileNames) {
                file.from = fileNames[0];
                return file.to = fileNames[1];
            }
        }
    };

    let restart = function() {
        if (!file || file.chunks.length) { return start(); }
    };

    let new_file = function() {
        restart();
        file.new = true;
        return file.from = '/dev/null';
    };

    let deleted_file = function() {
        restart();
        file.deleted = true;
        return file.to = '/dev/null';
    };

    let index = function(line) {
        restart();
        return file.index = line.split(' ').slice(1);
    };

    let from_file = function(line) {
        restart();
        return file.from = parseFileFallback(line);
    };

    let to_file = function(line) {
        restart();
        return file.to = parseFileFallback(line);
    };

    let chunk = function(line, match) {
        let oldStart;
        let newStart;
        ln_del = oldStart = +match[1];
        let oldLines = +(match[2] || 0);
        ln_add = newStart = +match[3];
        let newLines = +(match[4] || 0);
        current = {
            content: line,
            changes: [],
            oldStart, oldLines, newStart, newLines
        };
        return file.chunks.push(current);
    };

    let del = function(line) {
        current.changes.push({type:'del', del:true, ln:ln_del++, content:line});
        return file.deletions++;
    };

    let add = function(line) {
        current.changes.push({type:'add', add:true, ln:ln_add++, content:line});
        return file.additions++;
    };

    let noeol = '\\ No newline at end of file';
    let normal = function(line) {
        if (!file) { return; }
        return current.changes.push({
            type: 'normal',
            normal: true,
            ln1: line !== noeol ? ln_del++ : undefined,
            ln2: line !== noeol ? ln_add++ : undefined,
            content: line
        });
    };

    let schema = [
        // todo: better regexp to avoid detecting a normal line starting with diff
        [/^\s+/, normal],
        [/^diff\s/, start],
        [/^new file mode \d+$/, new_file],
        [/^deleted file mode \d+$/, deleted_file],
        [/^index\s[\da-zA-Z]+\.\.[\da-zA-Z]+(\s(\d+))?$/, index],
        [/^---\s/, from_file],
        [/^\+\+\+\s/, to_file],
        [/^@@\s+\-(\d+),?(\d+)?\s+\+(\d+),?(\d+)?\s@@/, chunk],
        [/^-/, del],
        [/^\+/, add]
    ];

    let parse = function(line) {
        for (let i = 0; i < schema.length; i++) {
            let p = schema[i];
            let m = line.match(p[0]);
            if (m) {
                p[1](line, m);
                return true;
            }
        }
        return false;
    };

    for (let i = 0; i < lines.length; i++) {
        let line = lines[i];
        parse(line);
    }

    return files;
};

var parseFile = function(s) {
    if (!s) { return; }

    let fileNames = s.split(' ').slice(-2);
    fileNames.map((fileName, i) => fileNames[i] = fileName.replace(/^(a|b)\//, ''));

    return fileNames;
};

// fallback function to overwrite file.from and file.to if executed
var parseFileFallback = function(s) {
    s = ltrim(s, '-');
    s = ltrim(s, '+');
    s = s.trim();
    // ignore possible time stamp
    let t = (/\t.*|\d{4}-\d\d-\d\d\s\d\d:\d\d:\d\d(.\d+)?\s(\+|-)\d\d\d\d/).exec(s);
    if (t) { s = s.substring(0, t.index).trim(); }
    // ignore git prefixes a/ or b/
    if (s.match((/^(a|b)\//))) { return s.substr(2); } else { return s; }
};

var ltrim = function(s, chars) {
    s = makeString(s);
    if (!chars && trimLeft) { return trimLeft.call(s); }
    chars = defaultToWhiteSpace(chars);
    return s.replace(new RegExp(`^${chars}+`), '');
};

var makeString = function(s) { if (s === null) { return ''; } else { return s + ''; } };

var { trimLeft } = String.prototype;

var defaultToWhiteSpace = function(chars) {
  if (chars === null) { return '\\s'; }
  if (chars.source) { return chars.source; }
  return `[${escapeRegExp(chars)}]`;
 };

var escapeRegExp = s => makeString(s).replace(/([.*+?^=!:${}()|[\]\/\\])/g, '\\$1');

No from, to when a new empty file is in the diff

When the diff contains a new, empty file, the resulting object does not have the from and to properties that are usually there.

To reproduce, parse this diff:

diff --git a/a b/a
new file mode 100644
index 0000000..e69de29
diff --git a/a.s b/a.s
new file mode 100644
index 0000000..e69de29
diff --git a/a.txt b/a.txt
new file mode 100644
index 0000000..7898192
--- /dev/null
+++ b/a.txt
@@ -0,0 +1 @@
+a

to get:

[ { chunks: [],
    deletions: 0,
    additions: 0,
    new: true,
    index: [ '0000000..e69de29' ] },
  { chunks: [],
    deletions: 0,
    additions: 0,
    new: true,
    index: [ '0000000..e69de29' ] },
  { chunks: [ [Object] ],
    deletions: 0,
    additions: 1,
    new: true,
    index: [ '0000000..7898192' ],
    from: '/dev/null',
    to: 'a.txt' }]

I think it makes sense for them to have from: '/dev/null' and to: fileName.
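
A small sketch of that behaviour, assuming a hypothetical post-processing helper fillMissingNames that is given the path taken from the diff --git header. It only fills in values that the ---/+++ lines did not provide.

```javascript
// Hypothetical post-processing step: fill in from/to for entries that never
// saw ---/+++ lines, using the path taken from the "diff --git" header.
function fillMissingNames(file, headerPath) {
  const result = { ...file };
  if (result.from === undefined) {
    // A new file has no old side.
    result.from = result.new ? '/dev/null' : headerPath;
  }
  if (result.to === undefined) {
    // A deleted file has no new side.
    result.to = result.deleted ? '/dev/null' : headerPath;
  }
  return result;
}

// The first entry from the output above, with the header path "a".
const filled = fillMissingNames(
  { chunks: [], deletions: 0, additions: 0, new: true, index: ['0000000..e69de29'] },
  'a'
);
```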

Should quotes in file names in ---/+++ lines be removed?

I.e., should this test pass?

  it("should parse diff with single line quote escaped file names", function () {
    const diff = `
diff --git "a/file \\"space\\"" "b/file \\"space\\""
index 9daeafb..88bd214 100644
--- "a/file \\"space\\""  
+++ "b/file \\"space\\""  
@@ -1 +1 @@
-test
+test\n1234
`;
    const files = parse(diff);
    expect(files.length).toBe(1);
    const [file] = files;
    expect(file.from).toBe(`file \\"space\\"`);
    expect(file.to).toBe(`file \\"space\\"`);
  });

Repro:

Running this script will generate the diff above.

#!/bin/bash

git init > /dev/null
echo test > "file \"space\""
git add . > /dev/null
git commit -m "add file" > /dev/null
git tag -a BEFORE -m "" > /dev/null
echo "test\n1234" > "file \"space\""
git add . > /dev/null
git commit -m "update file" > /dev/null
git diff BEFORE

Ubuntu v20.04
node v14.16.0
npm v6.14.11
git v2.25.1

P.S. I am willing to make a PR to fix it (if it is a bug).
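
If the answer turns out to be that the quotes should be removed, one possible approach is to undo git's C-style path quoting. The helper unquotePath below is a hypothetical sketch, not existing parse-diff code, and it only decodes octal escapes as single characters, so multi-byte UTF-8 names would need extra handling.

```javascript
// Hypothetical helper: undoes git's C-style quoting of a path such as
// "a/file \"space\"". Unquoted input is returned unchanged.
function unquotePath(s) {
  if (s.length < 2 || !s.startsWith('"') || !s.endsWith('"')) return s;
  const simple = { n: '\n', t: '\t', r: '\r', '"': '"', '\\': '\\' };
  return s.slice(1, -1).replace(/\\([0-7]{3}|.)/g, (whole, esc) => {
    // Three octal digits encode one byte; treated here as one character.
    if (esc.length === 3) return String.fromCharCode(parseInt(esc, 8));
    // Known single-character escapes are translated, others are left as-is.
    return Object.prototype.hasOwnProperty.call(simple, esc) ? simple[esc] : whole;
  });
}

// The quoted path from the --- line above, as it appears in the raw diff.
const unquoted = unquotePath('"a/file \\"space\\""');
```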
