inveniosoftware / dictdiffer


Dictdiffer is a module that helps you to diff and patch dictionaries.

Home Page: https://dictdiffer.readthedocs.io

License: Other

Python 99.25% Shell 0.75%

dictdiffer's People

brianr mariuscc mariorz vibster lutfidemirci mapping scraping-xx bigdata-tools big-data jeremyjbowers chu888chu888 theglycerine mortbauer jwpeddle lnielsen lawrlee tiborsimko jirikuncar andrewmichaelsmith mvesper pzborow jmartinm frodon1 sazlin jrjsmrtn geostellar mihaibivol kmuehlbauer mikaelho rikirenz nmeisels cclauss iulianav inspirehep drjova diegodelemos ccn-2m gmazelier psucurran barracel leewalter jyoti-arora1991 yoyonel gladiatr72 pombredanne robinchew rlizzo opoplawski allena29 enquora alexor2 eailoo danielduhh bradodarb bazinga012 adelevie eldruin tnusraddinov anuragsatish zone1511 fitzoreilly hellocoldworld eric-bonfadini adrien-berchet joesolly momirza mafrosis netshy major-mayer dalejung frankfanslc deniseschmitz42 koying aldobrett diegoguedes stoensin jayvdb eduardflorinescu capuanob mayhemheroes hugebig srmamit lupko zeitgeberh napam sam-phinizy arpitjain799 the-bird-is-the-word choromanski 5l1v3r1

dictdiffer's Issues

ignore doesn't work with CaseInsensitiveDict

ignore doesn't work correctly with a CaseInsensitiveDict:

>>> from dictdiffer import diff
>>> from ldap3.utils.ciDict import CaseInsensitiveDict

>>> d1 = CaseInsensitiveDict(A=2, b=3, C=7, d=9, e=1)
>>> d2 = CaseInsensitiveDict(a=3, b=3, d=9, E=4)
>>> list(diff(d1, d2))
[('change', 'A', (2, 3)), ('change', 'e', (1, 4)), ('remove', '', [('C', 7)])]

>>> list(diff(d1, d2, ignore=('a')))
[('change', 'A', (2, 3)), ('change', 'e', (1, 4)), ('remove', '', [('C', 7)])]
# ('change', 'A', (2, 3)) should be ignored

>>> list(diff(d1, d2, ignore=('A')))
[('change', 'e', (1, 4)), ('remove', '', [('C', 7)])]
# this is correct

>>> list(diff(d1, d2, ignore=('c')))
[('change', 'A', (2, 3)), ('change', 'e', (1, 4)), ('remove', '', [('C', 7)])]
# ('remove', '', [('C', 7)]) should be ignored

>>> list(diff(d1, d2, ignore=('C')))
[('change', 'A', (2, 3)), ('change', 'e', (1, 4))]
# this is correct

Wrong patch format WRT the standard RFC 6902

This is the test I ran:

from dictdiffer.merge import Merger
from dictdiffer.resolve import UnresolvedConflictsException

root = {}
head = {'foo': 'baz1'}
update = {'foo': 'baz2'}

non_list_merger = Merger(root, head, update, {})
try:
    non_list_merger.run()
except UnresolvedConflictsException as e:
    print(e.content)

This is the result of the previous code:

[Conflict(('add', '', [('foo', 'baz1')]), ('add', '', [('foo', 'baz2')]))]

According to RFC 6902 (https://tools.ietf.org/html/rfc6902), which defines "a JSON document structure for expressing a sequence of operations to apply to a JavaScript Object Notation (JSON) document", IMHO the result is wrong for two reasons:

It is not wrapped in an object (a minor issue; tuples are fine).
The key foo is not in the right place: it appears in the value instead of the path. This forces users to handle different cases and to build the path for a given patch manually.
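As a workaround, the tuples can be translated into RFC 6902 operations after the fact. The sketch below is hypothetical (to_json_patch, _pointer and _split are not part of dictdiffer) and assumes entries shaped like dictdiffer's diff output: ('change', node, (old, new)) or ('add' | 'remove', node, [(key, value), ...]), with node given as a dotted string ('' for the root) or as a list of keys. For the merge conflict above, each side of the conflict would need converting separately.

```python
def _pointer(parts):
    """Build an RFC 6901 JSON Pointer, escaping '~' and '/' in each component."""
    return "".join("/" + str(p).replace("~", "~0").replace("/", "~1") for p in parts)


def _split(node):
    """A node is either a dotted string ('' means the root) or a list of keys."""
    if isinstance(node, (list, tuple)):
        return list(node)
    # A dotted string is ambiguous when a key itself contains '.'
    return node.split(".") if node else []


def to_json_patch(diff_result):
    """Translate dictdiffer-style tuples into RFC 6902 operation objects."""
    ops = []
    for action, node, changes in diff_result:
        parts = _split(node)
        if action == "change":
            _old, new = changes
            ops.append({"op": "replace", "path": _pointer(parts), "value": new})
        elif action == "add":
            for key, value in changes:
                ops.append({"op": "add", "path": _pointer(parts + [key]), "value": value})
        elif action == "remove":
            for key, _value in changes:
                ops.append({"op": "remove", "path": _pointer(parts + [key])})
        else:
            raise ValueError("unsupported action: %r" % (action,))
    return ops


print(to_json_patch([('add', '', [('foo', 'baz1')])]))
# [{'op': 'add', 'path': '/foo', 'value': 'baz1'}]
```

Escaping follows RFC 6901 ('~' becomes '~0' before '/' becomes '~1'). Because a dotted node cannot be split reliably when a key contains '.', list-style nodes are the safer input.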

diff reports 'change' if the id() differs, even though the value is the same.

Let's say SomeModel.objects.all() returns []:

z = { 'x' : SomeModel.objects.all()}
w = { 'x' : SomeModel.objects.all()}

list(diff(z,w))
[('change', 'x', ([], []))]

because z is not equal to w

print z
{'x': []}
print w
{'x': []}

id(z.get('x'))
4426649360

id(w.get('x'))
4426647312

IndexError: list assignment index out of range

from dictdiffer import diff, patch, swap, revert

first = {
    "title": "hello",
    "fork_count": 20,
    "stargazers": ["/users/20", "/users/30"],
    "settings": {
        "assignees": [100, 101, 201],
    }
}

second = {
    "title": "hellooo",
    "fork_count": 20,
    "stargazers": ["/users/20", "/users/30", "/users/40"],
    "settings": {
        "assignees": [100, 101, 202],
    }
}

result = diff(second, first)
patched = patch(result, first)
print(patched)

Traceback (most recent call last):
  File "E:/MyCode/test/test31.py", line 26, in <module>
    patched = patch(result, first)
  File "D:\software\Python3.5.4\lib\site-packages\dictdiffer\__init__.py", line 332, in patch
    patchers[action](node, changes)
  File "D:\software\Python3.5.4\lib\site-packages\dictdiffer\__init__.py", line 323, in remove
    del dest[key]
IndexError: list assignment index out of range

Ignore does not work with integers

The ignore argument does not work with integer keys.

dictdiffer.__version__
'0.7.1'

a = {1:1,2:2,3:3}
b = {1:1,2:2,3:99,4:100}

list(diff(a,b,ignore=set([3,4])))
[('change', [3], (3, 99)), ('add', '', [(4, 100)])]

c = {'1':'1','2':'2','3':'3'}
d = {'1':'1','2':'2','3':'99','4':'100'}

list(diff(c,d,ignore=set(['3','4'])))
[]
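Until ignore handles integer keys, one workaround is to drop the ignored keys from both inputs before diffing, e.g. diff(drop_keys(a, ignored), drop_keys(b, ignored)), which should then yield no entries for this example. drop_keys is a hypothetical helper, not part of dictdiffer; it removes matching keys at any depth and returns plain dicts and lists, so dict subclasses such as OrderedDict are not preserved.

```python
def drop_keys(value, keys):
    """Return a copy of value without any dict entry whose key is in keys, at any depth."""
    if isinstance(value, dict):
        return {k: drop_keys(v, keys) for k, v in value.items() if k not in keys}
    if isinstance(value, list):
        return [drop_keys(v, keys) for v in value]
    return value


a = {1: 1, 2: 2, 3: 3}
b = {1: 1, 2: 2, 3: 99, 4: 100}
print(drop_keys(a, {3, 4}), drop_keys(b, {3, 4}))  # {1: 1, 2: 2} {1: 1, 2: 2}
```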

Ignore list order

It would be nice to have an ignore_order option:

from dictdiffer import diff

first = {'a': [1,2]}
second = {'a': [2,1]}

result = diff(first, second, ignore_order=True)
assert list(result) == []
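Until such an option exists, a possible workaround is to canonicalise both inputs before diffing, e.g. diff(sort_lists(first), sort_lists(second)). sort_lists is a hypothetical helper, and it assumes the items of each list can be compared with each other (lists of dicts would need an explicit sort key):

```python
def sort_lists(value):
    """Recursively sort every list so that element order no longer matters.

    Assumes the items of each list are mutually comparable (e.g. all ints or all strings).
    """
    if isinstance(value, dict):
        return {k: sort_lists(v) for k, v in value.items()}
    if isinstance(value, list):
        return sorted(sort_lists(v) for v in value)
    return value


first = {'a': [1, 2]}
second = {'a': [2, 1]}
print(sort_lists(first) == sort_lists(second))  # True
```

Any list indices reported by the resulting diff refer to the sorted lists, not to the original ones.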

Possible to treat a list always as a remove or add?

Disclaimer: this is probably out of scope, so please regard this as a feature wish

Currently, if an object in a list changes, the diff reports the individual field changes inside that object as usual. It would be nice to have an option to report a remove/add pair instead when an item in a list has changed.

Your package should be in the standard library

Thanks for making dictdiffer; it helps save time when inspecting dicts.

I couldn't think of a better way to support your work than opening this issue; please feel free to close it.

possible to diff lists of dicts?

I'm trying to use dictdiffer to get the diff of two lists of dictionaries, but it's not working well. It seems to get confused whenever the order of the dicts inside the list differs between the two lists, or when one list contains (slightly) more or fewer dictionaries than the other.

Is this functionality something that is currently possible and expected to work, or am I trying to accomplish something unsupported?

In case it matters, this is an example of a list of dictionaries that I'm trying to work with:

[{'prefix': '162.245.48.104/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.112/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.120/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.128/27',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.16/28',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.160/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.168/29',
  'effective_as_path_length': 1,
  'med': 15,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.176/28',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.192/30',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.200/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.208/30',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.216/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.224/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.232/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.240/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.248/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.32/29',
  'effective_as_path_length': 1,
  'med': 15,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.40/29',
  'effective_as_path_length': 1,
  'med': 15,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.48/29',
  'effective_as_path_length': 1,
  'med': 15,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.56/29',
  'effective_as_path_length': 1,
  'med': 15,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.64/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.72/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.96/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.50.0/24',
  'effective_as_path_length': 1,
  'med': 6,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.51.112/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.51.120/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.51.128/28',
  'effective_as_path_length': 1,
  'med': 15,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.51.144/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']}]
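Since lists are compared position by position, one workaround for data like this is to key each list by a field that uniquely identifies an entry (here 'prefix') and diff the resulting dicts instead; reordering then no longer matters, and an added or removed route should show up as a single add or remove. index_by is a hypothetical helper, and the two short lists below are made-up samples in the same shape as the data above:

```python
def index_by(records, key):
    """Turn a list of dicts into a dict keyed by one field, so entries are matched by that field."""
    indexed = {}
    for record in records:
        k = record[key]
        if k in indexed:
            raise ValueError("duplicate %s: %r" % (key, k))
        indexed[k] = record
    return indexed


old = [{'prefix': '162.245.48.104/29', 'med': 0},
       {'prefix': '162.245.48.112/29', 'med': 0}]
new = [{'prefix': '162.245.48.112/29', 'med': 15}]

# Routes present in old but missing from new, regardless of list order
print(index_by(old, 'prefix').keys() - index_by(new, 'prefix').keys())
# {'162.245.48.104/29'}
```

The prefixes themselves contain dots, so dotted-string nodes in the resulting diff would be ambiguous; check whether your dictdiffer version can report nodes as lists before patching with such results.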

The return value of patch is wrong when the destination dict's keys are ints.

In [97]: a = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}

In [98]: b = {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g', 7: 'h', 8: 'i', 9: 'j'}

In [99]: d = list(dd.diff(a, b))

In [100]: d
Out[100]:
[('change', '0', (0, 'a')),
('change', '1', (1, 'b')),
('change', '2', (2, 'c')),
('change', '3', (3, 'd')),
('change', '4', (4, 'e')),
('change', '5', (5, 'f')),
('change', '6', (6, 'g')),
('change', '7', (7, 'h')),
('change', '8', (8, 'i')),
('change', '9', (9, 'j'))]

In [101]: dd.patch(d, a)
Out[101]:
{0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
'0': 'a',
'1': 'b',
'2': 'c',
'3': 'd',
'4': 'e',
'5': 'f',
'6': 'g',
'7': 'h',
'8': 'i',
'9': 'j'}

The issue may be fixed by updating this code:

if isinstance(dest, list) or last_node not in dest:   # line 90
    last_node = int(last_node)
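The proposed one-liner would raise ValueError if it were ever reached with a non-numeric key that is not in dest. A slightly more defensive sketch (resolve_key is a hypothetical helper, not dictdiffer code) falls back to the integer form only when the string key is absent and the integer key actually exists:

```python
def resolve_key(dest, key):
    """Map a path component back to the key actually stored in dest."""
    if isinstance(dest, list):
        return int(key)
    if key in dest:
        return key
    # The path may have stringified an int key (0 -> '0'); use the int form only
    # if it really exists, so new non-numeric keys pass through unchanged.
    try:
        as_int = int(key)
    except (TypeError, ValueError):
        return key
    return as_int if as_int in dest else key


print(resolve_key({0: 0, 1: 1}, '0'))  # 0
```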

diff fails on complex dictionaries.

For simple cases everything works as expected:

In [39]: d = {"erik": 1}

In [40]: d2 = {"erik": 2, "d": {'y':'Y'}}

In [41]: list(dictdiffer.diff(d, d2)     )
Out[41]: [('add', '', [('d', {'y': 'Y'})]), ('change', 'erik', (1, 2))]

However, when comparing complex objects, even minor differences are missed.

import dictdiffer
from decimal import Decimal

def test_complex_diff():
    d1 = {
            'id': 1,
            'code': None,
            'type': u'foo',
            'bars': [
                {'id': 6934900},
                {'id': 6934977},
                {'id': 6934992},
                {'id': 6934993},
                {'id': 6935014}],
            'n': 10,
            'date_str': u'2013-07-08 00:00:00',
            'float_here': 0.454545,
            'complex': [{'id': 83865,
                'goal': Decimal('2.000000'),
                'state': u'active'}],
            'profile_id': None,
            'state': u'active'
            }

    d2 = {k:v for k,v in d1.items()}

    d2['id'] = "2"

    assert len(list(dictdiffer.diff(d1, {}))) > 0, "this should work"
    assert d1['id'] == 1
    assert d2['id'] == "2"
    assert d1 is not d2
    assert d1 != d2

    assert len(list(dictdiffer.diff(d1, d2))) > 0, "didn't catch the change to the id value"

test_complex_diff()

Version tested dictdiffer==0.0.3

Include interactive demo in README file

We've been using dictdiffer lately, and I think it would be great to provide some sort of interactive online demo for the library. I helped add one to the jsonschema project; you can see it running here:
https://github.com/Julian/jsonschema#demo

Do you think this would help users learn the library more easily? If you want, I can create a PR to include a demo link in dictdiffer as well.

BUG - accessing (type-casting) the diff result affects the patch

print(list(result)) ## typecast change causes the breakage
patched = patch(result,first)
assert patched == second

This fails when there is print(list(result)) but works fine when there is no print/type-cast involved.
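This is most likely not a patch bug: diff returns an iterator (a generator), so print(list(result)) consumes every entry, and the following patch(result, first) then receives nothing to apply. The sketch below uses a hypothetical fake_diff generator in place of dictdiffer.diff to show the effect; materialising the result once with result = list(diff(first, second)) avoids it.

```python
def fake_diff():
    """Stand-in for dictdiffer.diff, which likewise yields its entries lazily."""
    yield ('change', 'title', ('hello', 'hellooo'))


result = fake_diff()
print(list(result))  # [('change', 'title', ('hello', 'hellooo'))]; this consumes the generator
print(list(result))  # []; nothing is left for a later patch() call

# Materialise once and reuse the list instead:
result = list(fake_diff())
assert list(result) == list(result)
```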

RFC Detailed patches and additional difference algorithms

For my master thesis (RFC inveniosoftware/invenio#1897) I need functionality similar to the current dictdiffer, but with the following additions

More detailed patches
In the current implementation, the patch extraction of insertions would generate something like this:
```
{} -> {'author': {'name': 'John Doe', 'affiliation': 'CERN'}}
('insert', ('author',), {'name': 'John Doe', 'affiliation': 'CERN'})
```
For my case, something like this would be desirable:
```
('insert', ('author',), {})
('insert', ('author', 'name'), 'John Doe')
('insert', ('author', 'affiliation'), 'CERN')
```
Since this behavior is obviously not always wanted, there should be a way to control it.
Different algorithms for patch extraction
The current dictdiffer recognizes changes in lists simply by their length and by comparing their content position by position. This doesn't suit certain situations (e.g. text represented line by line as a list).
One alternative would be Python's SequenceMatcher from the difflib module, but since it also has its quirks, an easy way to plug in different difference algorithms seems desirable.
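A minimal sketch of the difflib alternative on a list of text lines, using only the standard library. Instead of comparing index by index, SequenceMatcher finds matching runs, so a line inserted in the middle is reported as one insert rather than a cascade of changes:

```python
from difflib import SequenceMatcher

old = ["first line", "second line", "third line"]
new = ["first line", "inserted line", "second line", "third line"]

# autojunk=False disables the "popular element" heuristic, which can skew results on long inputs
matcher = SequenceMatcher(a=old, b=new, autojunk=False)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(tag, old[i1:i2], new[j1:j2])
# insert [] ['inserted line']
```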

Approaches

More detailed patches
Since the dictdiffer is already in use, it would be a bad idea to change the default behavior to have more detailed patches, so the more reasonable approach seems to be to iterate over the patches and change them accordingly.
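The post-processing approach can be sketched as a generator that rewrites one coarse insertion into the detailed form shown earlier. expand_insert is hypothetical; it uses the 'insert' action name and tuple paths from the examples in this RFC (not dictdiffer's actual 'add' output) and treats anything that is not a dict as a leaf value:

```python
def expand_insert(path, value):
    """Split one nested insertion into one insert per level, parents before children."""
    if isinstance(value, dict):
        yield ('insert', path, {})
        for key, sub_value in value.items():
            yield from expand_insert(path + (key,), sub_value)
    else:
        yield ('insert', path, value)


for op in expand_insert(('author',), {'name': 'John Doe', 'affiliation': 'CERN'}):
    print(op)
# ('insert', ('author',), {})
# ('insert', ('author', 'name'), 'John Doe')
# ('insert', ('author', 'affiliation'), 'CERN')
```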

Different algorithms for patch extraction
Currently, the patch extraction for lists or dictionaries are hard coded in the diff method of the dictdiffer, but it should be easy to extract the corresponding methods.

def diff(first, second, node=None, ignore=None, methods=[dict_method, list_method]):
    for method in methods:
        differ, intersection, addition, deletion = method(first, second, node, ignore)
        if differ:
            break

And when we are at it we might as well introduce a key base choice for the algorithms:

def diff(first, second, node=None, ignore=None,
         methods={'default': [dict_method, list_method]}):
    # methods may also map a specific <node> to its own list, e.g. {<node>: [special_method]}
    _methods = methods.get(node, methods['default'])

    for method in _methods:
        differ, intersection, addition, deletion = method(first, second, node, ignore)
        if differ:
            break

That said, I have to admit I don't know if this is really needed.

UPDATE:

This issue will be closed, since the requested functionality is not needed anymore. See #53 for a more pragmatic approach.

New pypi release

Hi,

We urgently need the changes from PR #10 integrated and released on PyPI. Is there any chance that you can integrate them or, alternatively, give us permission to do so? We're using your module in inveniosoftware/invenio#2019.

Document how to ignore field in array of dicts

Let's say I have an array like:

"races": [
    {
        "reportingUnits": [
           {
               "apiAccessTime": sometimestamp
           }
        ]
    }
]

I want to ignore races[?].reportingUnits[?].apiAccessTime regardless of the array index for both the races array and the reportingUnits subarray. Is this possible?
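Assuming your dictdiffer version's ignore has no wildcard for "any list index", one workaround is to strip the field from both documents before diffing. strip_path and its '*' wildcard are hypothetical, not dictdiffer features; the function builds new containers along the path and leaves the input unmodified:

```python
def strip_path(value, path):
    """Return value without the key at path; '*' matches every index of a list."""
    if not path:
        return value
    head, rest = path[0], path[1:]
    if head == '*' and isinstance(value, list):
        return [strip_path(item, rest) for item in value]
    if isinstance(value, dict) and head in value:
        if not rest:
            # last path component: drop this key
            return {k: v for k, v in value.items() if k != head}
        return {k: (strip_path(v, rest) if k == head else v) for k, v in value.items()}
    # path does not exist here: nothing to strip
    return value


data = {"races": [{"reportingUnits": [{"apiAccessTime": 1, "votes": 10}]}]}
print(strip_path(data, ["races", "*", "reportingUnits", "*", "apiAccessTime"]))
# {'races': [{'reportingUnits': [{'votes': 10}]}]}
```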

Better control with swap: select which of change/add/remove to swap

I want to swap only the add/remove entries of the diff, not all changes.
Or, for example, swap only the change entries.
Is there any way to implement this?

data_old contains more keys than data_new.
I want to keep these old keys in the patched result, but update the shared keys to the newer values.

result = diff(data_old, data_new)
result = swap(result, 'remove') # I want to only swap remove to add, and not swap all changes.
patched = patch(result, data_old)
assert patched == data_old
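swap takes only the diff result, so selective swapping would have to be done on the entries yourself. swap_only and without_removals are hypothetical helpers; swap_only is meant to mirror swap's per-entry behaviour for dict keys (add and remove trade places, change values are reversed) and may not cover list removals whose order matters:

```python
def swap_only(diff_result, actions):
    """Swap only the entries whose action is in actions; pass the rest through."""
    opposite = {'add': 'remove', 'remove': 'add'}
    for action, node, changes in diff_result:
        if action not in actions:
            yield action, node, changes
        elif action == 'change':
            old, new = changes
            yield 'change', node, (new, old)
        else:
            yield opposite[action], node, changes


def without_removals(diff_result):
    """Drop 'remove' entries so patching keeps keys that exist only in the old data."""
    return [entry for entry in diff_result if entry[0] != 'remove']
```

For the data_old use case, patch(without_removals(diff(data_old, data_new)), data_old) should keep the keys that exist only in data_old while taking the new values for the others.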

Handling of arrays

Using dictdiffer 0.6.0, I get a list index out of range with the following code, probably because when reverting, the list elements are removed in the wrong order.

from dictdiffer import diff, patch, swap, revert

one = {
    'one': ['is two']
}
two = {
    'one': ['is two', 'is three', 'is four']
}

result_diff = diff(one, two)
print revert(result_diff, two)

Is this library still maintained?

The last commit was 9 months ago, and some issues are more than a year old.

Could you please let us know whether this library will stay maintained or whether you're looking for new maintainers?

`diff()` producing diffs with empty list removals

Produced diffs may contain empty-list instructions, e.g. ('remove', 'stargazers', []) in the canonical example from the user guide:

from dictdiffer import diff, patch, swap, revert

first = {
    "title": "hello",
    "fork_count": 20,
    "stargazers": ["/users/20", "/users/30"],
    "settings": {
        "assignees": [100, 101, 201],
    }
}

second = {
    "title": "hellooo",
    "fork_count": 20,
    "stargazers": ["/users/20", "/users/30", "/users/40"],
    "settings": {
        "assignees": [100, 101, 202],
    }
}

result = diff(first, second)

assert list(result) == [
    ('change', ['settings', 'assignees', 2], (201, 202)),
    ('remove', 'settings.assignees', []),
    ('add', 'stargazers', [(2, '/users/40')]),
    ('remove', 'stargazers', []),
    ('change', 'title', ('hello', 'hellooo'))]

Wouldn't it look nicer without those empty list removals?
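Until diff stops emitting them, the empty entries can be filtered out afterwards. drop_noops is a hypothetical helper; since an add or remove with no items changes nothing, dropping those entries should not affect patch results:

```python
def drop_noops(diff_result):
    """Filter out add/remove entries that carry no items, e.g. ('remove', 'stargazers', [])."""
    return [
        (action, node, changes)
        for action, node, changes in diff_result
        if not (action in ('add', 'remove') and not changes)
    ]
```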

diff returns a wrong result when NaN is held in variables of different types

The following example uses the latest dictdiffer:

In [58]: list(diff({"a": np.float32('nan')}, {"a" : float('nan')}))
Out[58]: [('change', 'a', (nan, nan))]

In [60]: are_different(np.float32('nan'),np.nan, 0)
Out[60]: True

This is wrong: the diff should be empty.

The error is (I guess) in

dictdiffer/dictdiffer/utils.py, lines 265 to 273 in f8dc205:

    elif bool(first != first) ^ bool(second != second):
        # only one of them is 'NaN', hence they are different
        return True
    elif isinstance(first, num_types) and isinstance(second, num_types):
        # (a) two numerical values are compared with tolerance
        # (b) both values are NaN and they will never fit the tolerance
        return abs(first-second) > tolerance * max(abs(first), abs(second))
    # we got different values
    return True

as np.float32 is not in num_types.

Instead of adding np.float32 to num_types I think a better solution is to return earlier as in

$ git diff .
diff --git a/dictdiffer/utils.py b/dictdiffer/utils.py
index b8885e9..981e200 100644
--- a/dictdiffer/utils.py
+++ b/dictdiffer/utils.py
@@ -262,12 +262,10 @@ def are_different(first, second, tolerance):
     if first == second:
         # values are same - simple case
         return False
-    elif bool(first != first) ^ bool(second != second):
-        # only one of them is 'NaN', hence they are different
-        return True
+    elif bool(first != first) or bool(second != second):
+        return not(bool(first != first) and bool(second != second))
     elif isinstance(first, num_types) and isinstance(second, num_types):
         # (a) two numerical values are compared with tolerance
-        # (b) both values are NaN and they will never fit the tolerance
         return abs(first-second) > tolerance * max(abs(first), abs(second))
     # we got different values
     return True
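The same idea can be sketched as a standalone check. values_differ and is_nan are hypothetical names, not dictdiffer's are_different; detecting NaN with math.isnan rather than x != x is a design choice of this sketch, so any float-convertible NaN (float, numpy float scalars, Decimal('NaN')) is recognised regardless of its type:

```python
import math
import numbers


def is_nan(value):
    """True if value converts to a float NaN; False for anything that cannot be converted."""
    try:
        return math.isnan(value)
    except (TypeError, ValueError, OverflowError):
        # strings, containers, complex numbers, huge ints, signalling NaNs, ...
        return False


def values_differ(first, second, tolerance=0.0):
    """are_different-style check that treats two NaNs as equal whatever their types."""
    first_nan, second_nan = is_nan(first), is_nan(second)
    if first_nan or second_nan:
        # different only if exactly one side is NaN
        return not (first_nan and second_nan)
    if first == second:
        return False
    if isinstance(first, numbers.Number) and isinstance(second, numbers.Number):
        return abs(first - second) > tolerance * max(abs(first), abs(second))
    return True
```

It does not handle numpy arrays, whose == returns an array rather than a bool.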

NaN comparison

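Right now NaNs compare as different; here's a patch to __init__.py to fix it:

```
-   if first != second:
+   if first != second and [first] != [second]:
```

This works because list comparison checks identity before equality, so it only helps when both sides hold the very same NaN object.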

Dictdiffer 0.3.0 changes and OrderedDicts

I used to use dictdiffer to diff OrderedDicts in version 0.2.0.

This testcase used to work there but it doesn't in the current git version (00e14e3):

def test_changed_order():
    from dictdiffer import diff
    from collections import OrderedDict
    o = OrderedDict
    diff_res = list(diff(
        o([('properties', o([(1, 'a'), (2, 'b')]))]),
        o([('properties', o([(2, 'b'), (1, 'a')]))])
    ))
    expected_result = [('change', 'properties', (
        o([(1, 'a'), (2, 'b')]),
        o([(2, 'b'), (1, 'a')])
    ))]
    assert diff_res == expected_result

diff_res is [] when I run it but I expect to see that something changed.
Did something change with 0.3.0 and OrderedDicts?

dictdiffer breaks if dict contains numpy arrays

When comparing two nested dictionaries containing numpy arrays, it fails with the well-known error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

To reproduce:

import numpy as np
from dictdiffer import diff

d1 = {'a': np.array([1, 2, 3])}
d2 = {'a': np.array([1, 2, 4])}

print(list(diff(d1,d2)))

Traceback:

Traceback (most recent call last):
  File "example.py", line 7, in <module>
    print(list(diff(d1,d2)))
  File "/local/kai/anaconda/lib/python2.7/site-packages/dictdiffer/__init__.py", line 150, in diff
    for diffed in recurred:
  File "/local/kai/anaconda/lib/python2.7/site-packages/dictdiffer/__init__.py", line 205, in diff
    if are_different(first, second, tolerance):
  File "/local/kai/anaconda/lib/python2.7/site-packages/dictdiffer/utils.py", line 264, in are_different
    if first == second:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Unfortunately I have no elegant solution to this, but changing are_different like this works for my specific use case:

def are_different(first, second, tolerance):
    # only check if same type
    if type(first) == type(second):
        # check if ndarray type
        if 'ndarray' in str(type(first)):
            return not (first == second).all()
        if first == second:
            # values are same - simple case
            return False
        elif bool(first != first) ^ bool(second != second):
            # only one of them is 'NaN', hence they are different
            return True
        elif isinstance(first, num_types) and isinstance(second, num_types):
            # (a) two numerical values are compared with tolerance
            # (b) both values are NaN and they will never fit the tolerance
            return abs(first-second) > tolerance * max(abs(first), abs(second))
    # we got different values
    return True

Support for OrderedDict?

It would be nice if dictdiffer had support for ordered dictionaries.

additional difference algorithm suggestion

Hi,

This dictdiffer library rocks! Thank you all.

This might be another nice difference algorithm per #42. It is basically similar to the symmetric difference operator of Python's set data structure.

Setup:

import dictdiffer as dd

a = {'numbers': [1, 2, 3]}
b = {'numbers': [0, 1, 2, 3]}

result = dd.diff(a, b)
for r in result:
    print(r)

Current output:

('change', ['numbers', 0], (1, 0))
('change', ['numbers', 1], (2, 1))
('change', ['numbers', 2], (3, 2))
('add', 'numbers', [(3, 3)])

Proposed output:

('symmetric_diff', ['numbers'], (, 0))

So (,0) would represent the addition of 0 to the second dict and that everything in the first dict is contained in the second. I think this would be like

(set(a['numbers']) - set(b['numbers']), set(b['numbers']) - set(a['numbers']))

Or at least some way of getting at the fact that the value 0 was added to the second list and not the value 3.
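A set difference drops duplicates, so a sketch of the proposed symmetric difference for lists could use collections.Counter instead. multiset_diff is a hypothetical helper and requires hashable list items:

```python
from collections import Counter


def multiset_diff(old, new):
    """Return (removed, added) items, ignoring order but counting duplicates."""
    old_counts, new_counts = Counter(old), Counter(new)
    removed = list((old_counts - new_counts).elements())
    added = list((new_counts - old_counts).elements())
    return removed, added


print(multiset_diff([1, 2, 3], [0, 1, 2, 3]))  # ([], [0])
```

With duplicates, e.g. multiset_diff([100, 101, 201, 101, 101], [100, 101, 202]), the two extra 101s are reported as removed along with 201, and 202 as added.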

Cheers!

List format vs dot Notation format in dotted node

Is it intentional that path_limits are denoted in list notation rather than dot notation?

>>> list(dictdiffer.diff({'a':{'b':{'e':'f'}}, 'c':'d'}, {'a':{'b':{'e':'g'}}, 'c':'d'}, path_limit=[('',)]))
[('change', 'a.b.e', ('f', 'g'))]
>>> list(dictdiffer.diff({'a':{'b':{'e':'f'}}, 'c':'d'}, {'a':{'b':{'e':'g'}}, 'c':'d'}, path_limit=[('a','b')]))
[('change', ['a', 'b'], ({'e': 'f'}, {'e': 'g'}))]

KeyError on different dicts

Hi, given a diff that removes a key which the patch target does not contain, is this the expected behaviour?

>>> dictdiffer.patch(dictdiffer.diff({'b': ''}, {'a': ''}), {})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/dictdiffer/__init__.py", line 330, in patch
    patchers[action](node, changes)
  File "/usr/local/lib/python3.5/site-packages/dictdiffer/__init__.py", line 321, in remove
    del dest[key]
KeyError: 'b'
>>> dictdiffer.__version__
'0.8.0'

When a dict has unicode keys and the ignore parameter is provided, an exception is raised

>>> d1 = {u'привет': 1}
>>> d2 = {'hello': 1}
>>> list(diff(d1, d2))
[('add', '', [('hello', 1)]), ('remove', '', [(u'\u043f\u0440\u0438\u0432\u0435\u0442', 1)])]
>>> list(diff(d1, d2, ignore='some'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alex/.virtualenvs/29bc9535-3886-4c57-8727-42160cb6847c/local/lib/python2.7/site-packages/dictdiffer/__init__.py", line 63, in diff
    deletion = [k for k in first if k not in second and check(k)]
  File "/home/alex/.virtualenvs/29bc9535-3886-4c57-8727-42160cb6847c/local/lib/python2.7/site-packages/dictdiffer/__init__.py", line 59, in check
    else '.'.join(node + [str(key)])) not in ignore
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

rewrite in ruby

patching dictionaries is so hipster. this project should be written in ruby.

👍

Wrong diff when removing an element other than the last one from an array

Suppose we have an array

old = ['a','b','c']
new = ['b','c']

# In this case I figured from the documentation that you should expect
# the diff to be ('remove','',[(0,'a')])

# but instead I'm getting:
[('change', [0], ('a', 'b')), ('change', [1], ('b', 'c')), ('remove', '', [(2, 'c')])]

Is that an error or did I misinterpret the usage of the library?
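dictdiffer compares lists index by index, so removing the first element shifts every later one and shows up as a series of changes. An index-aware alternative can be sketched with difflib.SequenceMatcher (list items must be hashable). list_diff is hypothetical and only mimics dictdiffer's ('remove' | 'add', node, [(index, value), ...]) tuple shape, listing removals from the highest index down so they can be applied in order:

```python
from difflib import SequenceMatcher


def list_diff(old, new, node=''):
    """Return dictdiffer-style remove/add tuples computed with SequenceMatcher."""
    removals, additions = [], []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ('replace', 'delete'):
            removals.extend((i, old[i]) for i in range(i1, i2))
        if tag in ('replace', 'insert'):
            additions.extend((j, new[j]) for j in range(j1, j2))
    result = []
    if removals:
        # highest index first, so deleting in order keeps the remaining indices valid
        result.append(('remove', node, list(reversed(removals))))
    if additions:
        # indices refer to positions in new; inserting in ascending order rebuilds it
        result.append(('add', node, additions))
    return result


print(list_diff(['a', 'b', 'c'], ['b', 'c']))  # [('remove', '', [(0, 'a')])]
```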

diff doesn't perform a dict diff on dict subclasses

from dictdiffer import diff


class Foo(dict):
    pass


print list(diff(
    Foo({2014: [
        dict(month=6, category=None, sum=672.00),
        dict(month=6, category=1, sum=-8954.00),
        dict(month=7, category=None, sum=7475.17),
        dict(month=7, category=1, sum=-11745.00),
        dict(month=8, category=None, sum=-12140.00),
        dict(month=8, category=1, sum=-11812.00),
        dict(month=9, category=None, sum=-31719.41),
        dict(month=9, category=1, sum=-11663.00),
    ]}),

    Foo({2014: [
       dict(month=6, category=None, sum=672.00),
       dict(month=6, category=1, sum=-8954.00),
       dict(month=7, category=None, sum=7475.17),
       dict(month=7, category=1, sum=-11745.00),
       dict(month=8, category=None, sum=-12141.00),
       dict(month=8, category=1, sum=-11812.00),
       dict(month=9, category=None, sum=-31719.41),
       dict(month=9, category=1, sum=-11663.00),
    ]})))

The code for diff() is a bit strange. It checks whether the input isinstance() dict, but then throws that result away at the end and calls type() on the input instead. Weird! The map "difffers" (https://github.com/inveniosoftware/dictdiffer/blob/master/dictdiffer/__init__.py#L110) is unnecessary. I think it would be better to just assign to the variable 'differ' at lines 61 and 69.

On another note, it'd be neat if one could hook in some way to convert stuff to dicts or lists. It might be super useful for testing in django if one could just convert any model instance or queryset into a dict or a list respectively.
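Such a hook could be sketched as a pre-processing step whose result is then passed to diff. to_plain, its converters argument and the FakeQuerySet class are hypothetical; FakeQuerySet stands in for an ORM result set without __eq__, which therefore compares by identity. Because the sketch uses isinstance(), dict subclasses such as Foo come out as plain dicts:

```python
def to_plain(value, converters=()):
    """Recursively turn value into plain dicts/lists before diffing.

    converters is a sequence of (type, function) pairs tried in order; each function
    should return a dict or list, which is then converted recursively.
    """
    for cls, convert in converters:
        if isinstance(value, cls):
            return to_plain(convert(value), converters)
    if isinstance(value, dict):
        return {k: to_plain(v, converters) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v, converters) for v in value]
    return value


class Foo(dict):
    pass


class FakeQuerySet:
    """Hypothetical stand-in for an ORM result set that defines no __eq__."""
    def __init__(self, rows):
        self.rows = rows

    def __iter__(self):
        return iter(self.rows)


hooks = [(FakeQuerySet, list)]
z = {'x': FakeQuerySet([])}
w = {'x': FakeQuerySet([])}
print(z == w, to_plain(z, hooks) == to_plain(w, hooks))  # False True
```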

ignore non-overlapping keys

As part of unit testing, I'd like to use dictdiffer to compare stable, correct output (with tolerance) from a scientific simulation with potentially buggy development output. Newer versions of the tool may output additional keys that cannot or need not be compared, so a flag to skip non-overlapping keys would be very useful.
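One way to get this without a new flag is to restrict both inputs to their shared keys before diffing, e.g. diff(*common_keys_only(stable, dev), tolerance=...) using diff's tolerance argument. common_keys_only is a hypothetical helper and the sample dicts are made up:

```python
def common_keys_only(reference, candidate):
    """Restrict both mappings to the keys they share, recursing into nested dicts.

    Returns a (reference, candidate) pair of new dicts; non-dict values are left as-is.
    """
    if not (isinstance(reference, dict) and isinstance(candidate, dict)):
        return reference, candidate
    ref_out, cand_out = {}, {}
    for key in reference:
        if key in candidate:
            ref_out[key], cand_out[key] = common_keys_only(reference[key], candidate[key])
    return ref_out, cand_out


stable = {'energy': 1.0, 'meta': {'version': 1}}
dev = {'energy': 1.01, 'meta': {'version': 2, 'build': 'abc'}, 'new_output': [1, 2]}
print(common_keys_only(stable, dev))
# ({'energy': 1.0, 'meta': {'version': 1}}, {'energy': 1.01, 'meta': {'version': 2}})
```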

reports that a NO-OP has occurred when diffing lists

In [1]: from dictdiffer import diff

In [2]: d={1: [1]}; list(diff(d, d))
Out[2]: [('remove', '1', [])]

@mvesper is working on this. This report is for reference.

Announce repo move from fatiherikli to inveniosoftware

Wrong diff() output when adding a value to a list: it returns a two-way operation. I think it should return only the difference.

current_data = {98: [], 2734: []}
new_data = {'98': ['lsjcalc'], '2734': [';sdlvmsld;vl']}
difference = list(diff(current_data, new_data))

difference
[('add', '', [('98', ['lsjcalc']), ('2734', [';sdlvmsld;vl'])]),
('remove', '', [(98, []), (2734, [])])]

Wrong diff generated if key string contains dot.

The following code will break for v0.7.0:

from dictdiffer import diff, patch

first = {
         "a.b": {
              "c.d": 1
         }
}

second = {
         "a.b": {
              "c.d": 2
         }
}

_diff = diff(first, second)
patch(_diff, first)

Exception:

Traceback (most recent call last):
  File "b.py", line 16, in <module>
    patch(_diff, first)
  File "/nail/home/yifan/virtualenv_run/lib/python2.7/site-packages/dictdiffer/__init__.py", line 308, in patch
    patchers[action](node, changes)
  File "/nail/home/yifan/virtualenv_run/lib/python2.7/site-packages/dictdiffer/__init__.py", line 283, in change
    dest = dot_lookup(destination, node, parent=True)
  File "/nail/home/yifan/virtualenv_run/lib/python2.7/site-packages/dictdiffer/utils.py", line 251, in dot_lookup
    value = value[key]
KeyError: 'a'

Can we always use list ['a', 'b', 'c', 'd'] instead of string 'a.b.c.d' to represent a path in the dict?
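A list path avoids the ambiguity because nothing has to be split. get_in is a hypothetical helper that only illustrates the lookup; the second call reproduces the failure mode from the traceback above by splitting the dotted string:

```python
def get_in(value, path):
    """Follow a path given as a sequence of keys/indices; keys may safely contain '.'."""
    for key in path:
        value = value[key]
    return value


first = {"a.b": {"c.d": 1}}
print(get_in(first, ["a.b", "c.d"]))  # 1

try:
    get_in(first, "a.b.c.d".split("."))
except KeyError as exc:
    print("dotted path is ambiguous:", exc)  # dotted path is ambiguous: 'a'
```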

installation: release universal wheels

setup.cfg universal=1
.travis.yml add deploy section

PYPI Release 0.4?

@fatiherikli Would it be possible for you to make a 0.4 release of dictdiffer on pypi with the latest changes in master? I'm dependent on the changes in one of my project, and would be nice if I could install from pypi instead of github :-)

If you need any help, I'd be happy to prepare everything so you just need to run python setup.py sdist upload.

Working with unknown length of changes?

What is the best way to work with changes of unknown length?
For example:
I know you can do x = list(diff(a, b)),
which returns a list of changes as tuples.
But x = dict(diff(a, b)) does not work.
Is there any way to easily create a dict object from the diff result?
In this case I'm not really interested in what is added or removed.
I want to check whether a certain key has changed, and get back the old and new values.
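dict(diff(a, b)) fails because each entry is a 3-tuple (action, node, changes), not a key/value pair. A sketch that keeps only the 'change' entries; changes_by_node is hypothetical, and in practice you would pass it diff(a, b) directly instead of the sample list used here:

```python
def changes_by_node(diff_result):
    """Map each changed node to its (old, new) pair, ignoring add/remove entries."""
    changed = {}
    for action, node, values in diff_result:
        if action == 'change':
            # list-style nodes such as ['settings', 'assignees', 2] are unhashable
            key = tuple(node) if isinstance(node, list) else node
            changed[key] = values
    return changed


result = [('change', 'title', ('hello', 'hellooo')),
          ('add', 'stargazers', [(2, '/users/40')]),
          ('change', ['settings', 'assignees', 2], (201, 202))]
x = changes_by_node(result)
print(x['title'])  # ('hello', 'hellooo')
```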

Support in-place patch/revert

I request adding an in-place patch/revert capability, i.e. option for performing these operations without copying the target structure.

This would be a useful performance optimization in some large-volume use cases. I also have a functional need of patching in place while recording all changes made to the structure.

This could be implemented as an additional, optional parameter to patch and revert:

Patch:

import copy

def patch(diff_result, destination, in_place=False):
    """Docstring"""
    if not in_place:
        destination = copy.deepcopy(destination)
    # ... existing patching logic, applied to `destination` ...
    return destination

Revert:

def revert(diff_result, destination, in_place=False):
    """Docstring"""
    return patch(swap(diff_result), destination, in_place)

It could be tested with, e.g.:

from dictdiffer import diff, patch, revert

a = {
    'a': 1
}
b = {
    'a': 2
}
changes = list(diff(a, b))

c = patch(changes, a)
assert a != c

d = revert(changes, c, in_place=True)
assert a == d
assert c == d

e = patch(changes, a, in_place=True)
assert a == e

Suggestion: add ignore_keys optional argument

I'm writing a script to sync selected elements from the OpenStreetMap database with a site database. The dictionaries I use to store the databases include some metadata that doesn't need to be synced. It looks like the ability to pass a list of keys to ignore during diff and patch would help a lot in this case.
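Until such an argument exists, one workaround is to strip the metadata keys from both structures before diffing. Here is a sketch of that approach. `strip_keys` is hypothetical and not part of dictdiffer. It recurses into dicts and lists and returns plain dicts, so dict subclasses such as OrderedDict are not preserved.

```python
def strip_keys(data, ignored):
    """Return a copy of nested dicts/lists without any dict key in `ignored`."""
    if isinstance(data, dict):
        return {k: strip_keys(v, ignored) for k, v in data.items() if k not in ignored}
    if isinstance(data, list):
        return [strip_keys(v, ignored) for v in data]
    # Scalars (and any other types) are returned unchanged.
    return data


record = {"id": 1, "meta": {"ts": 5}, "tags": [{"ts": 1, "k": "v"}]}
print(strip_keys(record, {"ts"}))  # {'id': 1, 'meta': {}, 'tags': [{'k': 'v'}]}
```

The stripped copies are only used to compute the diff, e.g. `diff(strip_keys(osm, META), strip_keys(site, META))` with a hypothetical `META` set. As a result, the changes never mention the metadata keys, and patching the original structure leaves its existing metadata untouched.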

Release v1.0?

Hello, and thanks for the helpful project you have here.

Looking through the issues and the lack of recent commits, the milestone seems complete. Is anything else holding up a 1.0 release?

handling duplicate values in a list

from dictdiffer import diff

first = {
    "title": "hello",
    "fork_count": 20,
    "stargazers": ["/users/20", "/users/30"],
    "settings": {
        "assignees": [100, 101, 201, 101, 101],
    }
}


second = {
    "title": "hellooo",
    "fork_count": 20,
    "stargazers": ["/users/20", "/users/30", "/users/40"],
    "settings": {
        "assignees": [100, 101, 202],
    }
}

result = diff(first, second)

for v in result:
    print(v)

Output:

('push', 'settings.assignees', [202])
('pull', 'settings.assignees', [201])
('push', 'stargazers', ['/users/40'])
('change', 'title', ('hello', 'hellooo'))

Expected:

('push', 'settings.assignees', [202])
('pull', 'settings.assignees', [201, 101, 101])
('push', 'stargazers', ['/users/40'])
('change', 'title', ('hello', 'hellooo'))
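To get the expected output, you can compare the lists as multisets so that each occurrence counts. The sketch below uses a hypothetical helper, `multiset_list_diff`, which is not part of dictdiffer and is built on `collections.Counter`. It assumes list items are hashable. Pushed and pulled items are returned in the order they appear in their source list.

```python
from collections import Counter


def multiset_list_diff(first, second):
    """Return (pushed, pulled), counting duplicate occurrences separately."""
    remaining_second = Counter(second)
    pulled = []
    for item in first:
        if remaining_second[item] > 0:
            remaining_second[item] -= 1  # matched one occurrence in `second`
        else:
            pulled.append(item)  # this occurrence exists only in `first`

    remaining_first = Counter(first)
    pushed = []
    for item in second:
        if remaining_first[item] > 0:
            remaining_first[item] -= 1  # matched one occurrence in `first`
        else:
            pushed.append(item)  # this occurrence exists only in `second`
    return pushed, pulled


pushed, pulled = multiset_list_diff([100, 101, 201, 101, 101], [100, 101, 202])
print(pushed, pulled)  # [202] [201, 101, 101]
```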

Unifier class: `unify` method returns duplicated patches

Unifier class: unify method returns a list containing duplicated patches in the case of conflicts containing equal patches (same path).

e.g.

from dictdiffer.conflict import Conflict
from dictdiffer.unify import Unifier

patch1 = ('remove', '', [('a', 'b')])
patch2 = ('remove', '', [('a', 'b')])
conflicts = Conflict(patch1, patch2)
conflicts.take = 'f'  # can be 's' too
u = Unifier()

The result of calling u.unify([patch1], [patch2], [conflicts]) will be [patch1, patch2] instead of [patch1]. This can lead to a KeyError when calling patch, because it tries to remove the same key twice. The second removal fails because the key has already been deleted.
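As a stopgap, you can deduplicate the list returned by u.unify(...) before calling patch. `dedupe_patches` is a hypothetical helper and not part of dictdiffer. Patches contain lists and are therefore unhashable, so the helper compares by equality. This is quadratic, but fine for small patch lists.

```python
def dedupe_patches(patches):
    """Drop repeated patches while keeping the first-seen order."""
    unique = []
    for p in patches:
        # `in` uses ==, which works for tuples containing lists
        if p not in unique:
            unique.append(p)
    return unique


patch1 = ('remove', '', [('a', 'b')])
patch2 = ('remove', '', [('a', 'b')])
print(dedupe_patches([patch1, patch2]))  # [('remove', '', [('a', 'b')])]
```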

Immutable diffs

The current implementation returns diffs that reference the original structures. As a result, a diff may change later if those structures change.

This is counter to at least my intuitive expectation that the diff is a snapshot of the structures at the time of running the diff.

I would request that the diff returns (deep) copies of the referenced structures by default, with maybe an option for returning references instead.

At the very least, I recommend making this very explicit in the documentation. I just suffered from some very hard-to-debug bugs because of this.

Thanks!
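Until diffs are copied by default, a caller can take the snapshot themselves. This sketch uses a hypothetical `snapshot_diff` helper that materializes the diff iterator and deep-copies it with `copy.deepcopy`. It assumes every value in the diff can be deep-copied. The demo builds a diff entry by hand, using the add-tuple shape shown on this page.

```python
import copy


def snapshot_diff(diff_result):
    """Materialize a diff iterator and deep-copy it, so that later mutations
    of the compared structures cannot leak into the recorded changes."""
    return copy.deepcopy(list(diff_result))


tags = ["x"]
raw = [("add", "", [("tags", tags)])]  # entry referencing a live list
snap = snapshot_diff(iter(raw))
tags.append("y")  # mutate the original structure afterwards
print(raw[0][2][0][1], snap[0][2][0][1])  # ['x', 'y'] ['x']
```

With dictdiffer, this would be used as `changes = snapshot_diff(diff(a, b))`.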

Wrong diff when the ignore set contains dotted dictionary keys

This issue occurs in dictdiffer-0.7.1.

Let's take two dictionaries:

from collections import OrderedDict
from dictdiffer import diff

config_dict = OrderedDict([('address', 'devops011-slv-01.gvs.ggn'), ('nifi.zookeeper.session.timeout', '3 secs')])

ref_dict = OrderedDict([('address', 'devops011-slv-01.gvs.ggn'), ('nifi.zookeeper.session.timeout', '4 secs')])

list(diff(config_dict, ref_dict, ignore=set(['nifi.zookeeper.session.timeout'])))

Output for above is:

[('change', ['nifi.zookeeper.session.timeout'], ('3 secs', '4 secs'))]

It seems that the ignore functionality does not work when a dotted key is passed in the ignore set.
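Until the `ignore` handling is fixed, one workaround is to filter the diff output afterwards, matching on key paths rather than dotted strings. `drop_ignored` is a hypothetical helper. It assumes nodes are either dotted strings or lists of keys, as in the output above, where the dotted key appears inside a list. It also assumes each ignored path is given as a list or tuple of keys.

```python
def _path_tuple(node):
    """Normalize a diff node (dotted string or list of keys) to a tuple."""
    if isinstance(node, (list, tuple)):
        return tuple(node)
    # Assumption: string nodes only appear when no key contains a dot.
    return tuple(node.split(".")) if node else ()


def drop_ignored(diff_result, ignored_paths):
    """Yield diff entries whose full key path is not in `ignored_paths`.

    Each ignored path must be a list/tuple of keys, e.g.
    ['nifi.zookeeper.session.timeout'], never a bare dotted string.
    """
    ignored = {tuple(p) for p in ignored_paths}
    for action, node, changes in diff_result:
        path = _path_tuple(node)
        if action in ("add", "remove"):
            # add/remove carry (key, value) pairs under the parent node
            kept = [(k, v) for k, v in changes if path + (k,) not in ignored]
            if kept:
                yield (action, node, kept)
        elif path not in ignored:
            yield (action, node, changes)


entries = [("change", ["nifi.zookeeper.session.timeout"], ("3 secs", "4 secs"))]
print(list(drop_ignored(entries, [["nifi.zookeeper.session.timeout"]])))  # []
```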

Pretty printer

Is your feature request related to a problem? Please describe.

I use dictdiffer in tests to show differences in assertion errors. A method that pretty-prints the differences would make the library even more useful for me.

Describe the solution you'd like

Either

>>> dictdiffer.pretty_diff({}, {"foo": "bar"})
add: 'foo': 'bar'
>>> dictdiffer.pretty_diff({"foo": "jar"}, {"foo": "bar"})
change foo: 'jar' -> 'bar'
>>> dictdiffer.pretty_diff({"outer": {"foo": "jar"}}, {"outer": {"foo": "bar"}})
change outer.foo: 'jar' -> 'bar'

or something more like a side-by-side comparison view

>>> dictdiffer.pretty_diff({}, {"foo": "bar"})
+ 'foo': 'bar'
>>> dictdiffer.pretty_diff({"foo": "jar"}, {"foo": "bar"})
- 'foo': 'jar'        + 'foo': 'bar'
>>> dictdiffer.pretty_diff({"outer": {"foo": "jar"}}, {"outer": {"foo": "bar"}})
  'outer': {            'outer': {
-   'foo': 'jar'      +   'foo': 'bar'

Additional context

https://github.com/lukaszb/pytest-dictsdiff and https://github.com/hjwp/pytest-icdiff do something similar (at least one has dictdiffer as a dependency), but they don't work for my cases.
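Here is a rough sketch of the first proposed style, written against the (action, node, changes) tuples that diff() yields on this page. `format_diff` is hypothetical and not a dictdiffer function. Its output differs slightly from the proposal (for example, `add foo: 'bar'`).

```python
def format_diff(diff_result):
    """Render diff entries as lines like "change outer.foo: 'jar' -> 'bar'"."""
    lines = []
    for action, node, changes in diff_result:
        # Nodes may be dotted strings or lists of keys/indices.
        path = node if isinstance(node, str) else ".".join(str(k) for k in node)
        if action == "change":
            old, new = changes
            lines.append(f"change {path}: {old!r} -> {new!r}")
        elif action in ("add", "remove"):
            # add/remove carry a list of (key, value) pairs under `path`
            for key, value in changes:
                full = f"{path}.{key}" if path else str(key)
                lines.append(f"{action} {full}: {value!r}")
        else:
            # Other actions (e.g. push/pull in older outputs) carry bare values
            for value in changes:
                lines.append(f"{action} {path}: {value!r}")
    return "\n".join(lines)


print(format_diff([("change", ["outer", "foo"], ("jar", "bar"))]))
# change outer.foo: 'jar' -> 'bar'
```

In a test, it could serve as the assertion message, e.g. `assert a == b, format_diff(diff(a, b))`.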

expose a generic visit(diff_result, visitors) function ?

How about:

exposing a generic visit_diffs(diff_result, visitors, *args, **kwargs) function [1];
where visitors would be a dict of callables: {dictdiffer.ADD: do_something_with_add, ...};
and callables' signature would be visitor(node, changes, *args, **kwargs) ? [2]

Why ?

dictdiffer.diff() results are simple to consume, but writing a visit_diffs() will probably be the first thing most users do [3];
you already implemented such a pattern in dictdiffer.patch() and dictdiffer.swap();

What do you think ?

[1] neither iter_diffs() nor map_diffs() seemed like the right name to me.

[2] and yes, what I'm proposing does not model the Visitor pattern. For that, diff() should probably return DiffResult() instances with a visit() method ;-)

[3] that's what I've done. Twice already. The third one was generic and led me to this proposal... ;-)
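Here is a minimal sketch of the proposed function. It is not an existing dictdiffer API. It assumes visitors are keyed by the plain action strings seen in diff output ('add', 'remove', 'change') and that entries without a registered visitor are skipped.

```python
def visit_diffs(diff_result, visitors, *args, **kwargs):
    """Dispatch each (action, node, changes) entry to visitors[action].

    Returns the list of visitor return values, in diff order.
    """
    results = []
    for action, node, changes in diff_result:
        visitor = visitors.get(action)
        if visitor is None:
            continue  # design choice: skip actions nobody asked for
        results.append(visitor(node, changes, *args, **kwargs))
    return results


log = []
visitors = {
    "add": lambda node, changes, out: out.append(("add", node, changes)),
    "change": lambda node, changes, out: out.append(("change", node, changes)),
}
visit_diffs([("add", "", [("x", 1)]), ("remove", "", [("y", 2)])], visitors, log)
print(log)  # [('add', '', [('x', 1)])]
```

Skipping unknown actions keeps partial visitor maps convenient. A stricter variant could raise KeyError instead, as a dispatch-table lookup like `patchers[action]` in the traceback above does.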

Tolerance is not working if dict values are tuples

When dict values are tuples of ints or floats, the tolerance is not applied:

from dictdiffer import diff

d1 = {'a': 10.0}
d2 = {'a': 10.5}
d3 = {'a': (10.0,11,12)}
d4 = {'a': (10.5,11,12)}

result1 = diff(d1, d2, tolerance=0.1)
result2 = diff(d3, d4, tolerance=0.1)


print("result1:", list(result1))
print("result2:", list(result2))

Output result:

result1: []
result2: [('change', 'a', ((10.0, 11, 12), (10.5, 11, 12)))]

In the comparison code below, the tolerance is only applied when both values are instances of num_types. A tuple falls through to the final branch and is reported as different:

    elif bool(first != first) ^ bool(second != second):
        # only one of them is 'NaN', hence they are different
        return True
    elif isinstance(first, num_types) and isinstance(second, num_types):
        # (a) two numerical values are compared with tolerance
        # (b) both values are NaN and they will never fit the tolerance
        return abs(first-second) > tolerance * max(abs(first), abs(second))
    # we got different values
    return True
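Here is a minimal sketch of a tolerance check that recurses element-wise into tuples and lists. It assumes equal-length sequences should be compared position by position, and that a length mismatch always counts as different. `values_differ` is a hypothetical helper, not dictdiffer's function. Its NaN and numeric branches mirror the relative-tolerance logic in the fragment above.

```python
from numbers import Number


def values_differ(first, second, tolerance):
    """Relative-tolerance comparison that also recurses into tuples/lists."""
    if isinstance(first, (tuple, list)) and isinstance(second, (tuple, list)):
        if len(first) != len(second):
            return True  # different lengths can never match
        return any(values_differ(a, b, tolerance) for a, b in zip(first, second))
    if first == second:
        return False
    first_nan, second_nan = bool(first != first), bool(second != second)
    if first_nan != second_nan:
        return True  # only one of them is NaN
    if isinstance(first, Number) and isinstance(second, Number):
        return abs(first - second) > tolerance * max(abs(first), abs(second))
    return True


print(values_differ((10.0, 11, 12), (10.5, 11, 12), 0.1))  # False
```

With tolerance=0.1, the difference 0.5 is below 0.1 * 10.5, so the tuples compare as equal. This matches how the scalar case 10.0 vs 10.5 already behaves (result1 above).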