inveniosoftware / dictdiffer
Dictdiffer is a module that helps you to diff and patch dictionaries.
Home Page: https://dictdiffer.readthedocs.io
License: Other
ignore doesn't work correctly with a CaseInsensitiveDict:
>>> from ldap3.utils.ciDict import CaseInsensitiveDict
>>> d1 = CaseInsensitiveDict(A=2, b=3, C=7, d=9, e=1)
>>> d2 = CaseInsensitiveDict(a=3, b=3, d=9, E=4)
>>> list(diff(d1, d2))
[('change', 'A', (2, 3)), ('change', 'e', (1, 4)), ('remove', '', [('C', 7)])]
>>> list(diff(d1, d2, ignore=('a')))
[('change', 'A', (2, 3)), ('change', 'e', (1, 4)), ('remove', '', [('C', 7)])]
# ('change', 'A', (2, 3)) should be ignored
>>> list(diff(d1, d2, ignore=('A')))
[('change', 'e', (1, 4)), ('remove', '', [('C', 7)])]
# this is correct
>>> list(diff(d1, d2, ignore=('c')))
[('change', 'A', (2, 3)), ('change', 'e', (1, 4)), ('remove', '', [('C', 7)])]
# ('remove', '', [('C', 7)]) should be ignored
>>> list(diff(d1, d2, ignore=('C')))
[('change', 'A', (2, 3)), ('change', 'e', (1, 4))]
# this is correct
This is the test that I run
self.root = {}
self.head = {'foo': 'baz1'}
self.update = {'foo': 'baz2'}
non_list_merger = Merger(self.root, self.head, self.update, {})
try:
    non_list_merger.run()
except UnresolvedConflictsException as e:
    print(e.content)
This is the result of the previous code:
[Conflict(('add', '', [('foo', 'baz1')]), ('add', '', [('foo', 'baz2')]))]
According to RFC 6902 (https://tools.ietf.org/html/rfc6902), which defines a JSON document structure for expressing a sequence of operations to apply to a JavaScript Object Notation (JSON) document, IMHO the result is wrong for 2 reasons:
It is not wrapped in an object (but this is a minor issue; tuples are fine).
The format of the response is wrong because it does not return the key foo in the right place. This forces users to handle different cases and manually build the path for a given patch.
Let's say SomeModel.objects.all() returns []
z = { 'x' : SomeModel.objects.all()}
w = { 'x' : SomeModel.objects.all()}
list(diff(z,w))
[('change', 'x', ([], []))]
because z is not equal to w
print z
{'x': []}
print w
{'x': []}
id(z.get('x'))
4426649360
id(w.get('x'))
4426647312
from dictdiffer import diff, patch, swap, revert
first = {
"title": "hello",
"fork_count": 20,
"stargazers": ["/users/20", "/users/30"],
"settings": {
"assignees": [100, 101, 201],
}
}
second = {
"title": "hellooo",
"fork_count": 20,
"stargazers": ["/users/20", "/users/30", "/users/40"],
"settings": {
"assignees": [100, 101, 202],
}
}
result = diff(second, first)
patched = patch(result, first)
print(patched)
Traceback (most recent call last):
File "E:/MyCode/test/test31.py", line 26, in <module>
patched = patch(result, first)
File "D:\software\Python3.5.4\lib\site-packages\dictdiffer\__init__.py", line 332, in patch
patchers[action](node, changes)
File "D:\software\Python3.5.4\lib\site-packages\dictdiffer\__init__.py", line 323, in remove
del dest[key]
IndexError: list assignment index out of range
The ignore attribute does not work if one uses integers.
dictdiffer.__version__
'0.7.1'
a = {1:1,2:2,3:3}
b = {1:1,2:2,3:99,4:100}
list(diff(a,b,ignore=set([3,4])))
[('change', [3], (3, 99)), ('add', '', [(4, 100)])]
c = {'1':'1','2':'2','3':'3'}
d = {'1':'1','2':'2','3':'99','4':'100'}
list(diff(c,d,ignore=set(['3','4'])))
[]
It would be nice to have an ignore_order option:
from dictdiffer import diff
first = {'a': [1,2]}
second = {'a': [2,1]}
result = diff(first, second, ignore_order=True)
assert list(result) == []
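Until such an option exists, one possible workaround is to normalise list order on both sides before diffing. The following is only a minimal sketch under the assumption that the data is JSON-serialisable; normalize is a hypothetical helper and not part of dictdiffer.

```python
import json


def normalize(value):
    """Recursively sort every list so that element order no longer matters."""
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        # Sort by a canonical JSON rendering so that dicts and mixed types
        # can be ordered; assumes every element is JSON-serialisable.
        return sorted((normalize(item) for item in value),
                      key=lambda item: json.dumps(item, sort_keys=True))
    return value


first = {'a': [1, 2]}
second = {'a': [2, 1]}
assert normalize(first) == normalize(second)
```

Passing normalize(first) and normalize(second) to diff should then report no change for lists that differ only in order, at the cost of losing the original indices in the reported paths.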
Disclaimer: this is probably out of scope, so please regard this as a feature wish
As it is right now, if an object in a list is changed, the diff will be executed as usual through field changes in the object. Would be nice to have an option to change the behavior to return a remove/add when an item in a list has changed.
Thanks for making dictdiffer, it helps save time on inspection of dicts.
I could not think of a better way to support your work other than this issue, so please feel free to close it.
I'm trying to use dictdiffer to get the diff of two lists of dictionaries, but it's not working well. It seems to get confused whenever the ordering of the dicts inside the list is not identical between the two lists, or when the number of dictionaries in one list is (slightly) different from the other list.
Is this functionality something that is currently possible and expected to work, or am I trying to accomplish something unsupported?
In case it matters, this is an example of a list of dictionaries that I'm trying to work with:
[{'prefix': '162.245.48.104/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.112/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.120/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.128/27',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.16/28',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.160/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.168/29',
'effective_as_path_length': 1,
'med': 15,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.176/28',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.192/30',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.200/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.208/30',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.216/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.224/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.232/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.240/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.248/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.32/29',
'effective_as_path_length': 1,
'med': 15,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.40/29',
'effective_as_path_length': 1,
'med': 15,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.48/29',
'effective_as_path_length': 1,
'med': 15,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.56/29',
'effective_as_path_length': 1,
'med': 15,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.64/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.72/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.48.96/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.50.0/24',
'effective_as_path_length': 1,
'med': 6,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.51.112/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.51.120/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.51.128/28',
'effective_as_path_length': 1,
'med': 15,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']},
{'prefix': '162.245.51.144/29',
'effective_as_path_length': 1,
'med': 0,
'destination': ['abcd.jln001.norc'],
'origin_asn': ['393467'],
'next_hop_asn': ['393467']}]
In [97]: a = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
In [98]: b = {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g', 7: 'h', 8: 'i', 9: 'j'}
In [99]: d = list(dd.diff(a, b))
In [100]: d
Out[100]:
[('change', '0', (0, 'a')),
('change', '1', (1, 'b')),
('change', '2', (2, 'c')),
('change', '3', (3, 'd')),
('change', '4', (4, 'e')),
('change', '5', (5, 'f')),
('change', '6', (6, 'g')),
('change', '7', (7, 'h')),
('change', '8', (8, 'i')),
('change', '9', (9, 'j'))]
In [101]: dd.patch(d, a)
Out[101]:
{0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
'0': 'a',
'1': 'b',
'2': 'c',
'3': 'd',
'4': 'e',
'5': 'f',
'6': 'g',
'7': 'h',
'8': 'i',
'9': 'j'}
The issue may be fixed by updating the code:
if isinstance(dest, list) or last_node not in dest:  # line 90
    last_node = int(last_node)
For simple cases everything works as expected:
In [39]: d = {"erik": 1}
In [40]: d2 = {"erik": 2, "d": {'y':'Y'}}
In [41]: list(dictdiffer.diff(d, d2) )
Out[41]: [('add', '', [('d', {'y': 'Y'})]), ('change', 'erik', (1, 2))]
However, when comparing complex objects, even minor differences are missed.
import dictdiffer
from decimal import Decimal
def test_complex_diff():
    d1 = {
        'id': 1,
        'code': None,
        'type': u'foo',
        'bars': [
            {'id': 6934900},
            {'id': 6934977},
            {'id': 6934992},
            {'id': 6934993},
            {'id': 6935014}],
        'n': 10,
        'date_str': u'2013-07-08 00:00:00',
        'float_here': 0.454545,
        'complex': [{'id': 83865,
                     'goal': Decimal('2.000000'),
                     'state': u'active'}],
        'profile_id': None,
        'state': u'active'
    }
    d2 = {k:v for k,v in d1.items()}
    d2['id'] = "2"
    assert len(list(dictdiffer.diff(d1, {}))) > 0, "this should work"
    assert d1['id'] == 1
    assert d2['id'] == "2"
    assert d1 is not d2
    assert d1 != d2
    assert len(list(dictdiffer.diff(d1, d2))) > 0, "didn't catch the change to the id value"

test_complex_diff()
Version tested dictdiffer==0.0.3
We've been using dictdiffer lately and I think it would be great to provide some sort of interactive online demo for the library. I helped add one to the jsonschema project; you can see it running here:
https://github.com/Julian/jsonschema#demo
Do you think this would help users learn the library more easily? If you want, I can create a PR to include a demo link in dictdiffer as well.
print(list(result)) ## typecast change causes the breakage
patched = patch(result,first)
assert patched == second
This fails when there is print(list(result)) but works fine when there is no print/type-cast involved.
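A likely explanation is that diff() returns a lazy iterator rather than a list, so list(result) consumes it and patch() then receives nothing to apply. The sketch below shows the effect with a stand-in generator; fake_diff is hypothetical and only keeps the example free of dependencies.

```python
def fake_diff():
    """Hypothetical stand-in for a lazily evaluated diff result."""
    yield ('change', 'title', ('hello', 'hellooo'))


result = fake_diff()
first_pass = list(result)   # consumes the generator
second_pass = list(result)  # nothing left to iterate over

assert len(first_pass) == 1
assert second_pass == []

# Materialise once and reuse the list for both printing and patching.
changes = list(fake_diff())
print(changes)
```

Under that assumption, storing result = list(diff(first, second)) once and passing the list to both print and patch should avoid the problem.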
For my master thesis (RFC inveniosoftware/invenio#1897) I need functionality similar to the current dictdiffer, but with the following additions
More detailed patches
In the current implementation, the patch extraction of insertions would generate something like this:
{} -> {'author': {'name': 'John Doe', 'affiliation': 'CERN'}}
('insert', ('author',), {'name': 'John Doe', 'affiliation': 'CERN'})
For my case, something like this would be desirable:
('insert', ('author',), {})
('insert', ('author', 'name'), 'John Doe')
('insert', ('author', 'affiliation'), 'CERN')
Since this behavior is obviously not always wanted, there should be a way to control it.
Different algorithms for patch extraction
The current dictdiffer recognizes changes in lists simply by their length and their content. This is unsuitable in certain situations (e.g. text represented line by line as a list).
One alternative would be Python's SequenceMatcher from the difflib module, but since this also has its quirks, an easy way to plug in different difference algorithms seems desirable.
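To make the SequenceMatcher alternative concrete, here is a rough sketch of a list differ built on difflib. The function name sequence_diff and its output tuples are assumptions chosen for illustration, not dictdiffer's format, and the list elements must be hashable.

```python
from difflib import SequenceMatcher


def sequence_diff(first, second):
    """Yield ('remove' | 'add', index, items) operations between two lists."""
    matcher = SequenceMatcher(a=first, b=second, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ('delete', 'replace'):
            # Items present only in the first list.
            yield ('remove', i1, first[i1:i2])
        if tag in ('insert', 'replace'):
            # Items present only in the second list.
            yield ('add', j1, second[j1:j2])


old = ['a', 'b', 'c']
new = ['b', 'c']
print(list(sequence_diff(old, new)))
```

Unlike an index-by-index comparison, this reports a single removal when an element is dropped from the front of a list, instead of a chain of changes.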
Approaches
More detailed patches
Since the dictdiffer is already in use, it would be a bad idea to change the default behavior to have more detailed patches, so the more reasonable approach seems to be to iterate over the patches and change them accordingly.
Different algorithms for patch extraction
Currently, the patch extraction for lists or dictionaries are hard coded in the diff method of the dictdiffer, but it should be easy to extract the corresponding methods.
def diff(first, second, node=None, ignore=None, methods=[dict_method, list_method]):
    for method in methods:
        differ, intersection, addition, deletion = method(first, second, node, ignore)
        if differ:
            break
And while we are at it, we might as well introduce a key-based choice for the algorithms:
def diff(first, second, node=None, ignore=None, methods={'default': [dict_method, list_method], <node>: special_method}):
    # Use the node-specific method if one is registered, else the defaults.
    # ([methods.get(node)] would always be truthy, even as [None].)
    _methods = [methods[node]] if node in methods else methods['default']
    for method in _methods:
        differ, intersection, addition, deletion = method(first, second, node, ignore)
        if differ:
            break
Even though I have to admit I don't know if this is really needed.
This issue will be closed, since the requested functionality is not needed anymore. See #53 for a more pragmatic approach.
Hi,
We urgently need to have the changes from PR #10 integrated and released on PyPI. Is there any chance that you can integrate them or alternatively give us permission to do so? We are using your module in inveniosoftware/invenio#2019.
Let's say I have an array like:
"races": [
{
"reportingUnits": [
{
"apiAccessTime": sometimestamp
}
]
}
]
I want to ignore races[?].reportingUnits[?].apiAccessTime regardless of the array index for both the races array and the reportingUnits subarray. Is this possible?
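One possible workaround, sketched here under the assumption that it is acceptable to pre-process both structures before diffing, is to strip the unwanted key at every list index first. strip_path and its '*' wildcard are hypothetical and not part of dictdiffer.

```python
def strip_path(data, pattern):
    """Return a copy of data without the value at pattern.

    '*' in the pattern matches every element of a list.
    """
    if not pattern:
        return data
    head, rest = pattern[0], pattern[1:]
    if head == '*' and isinstance(data, list):
        return [strip_path(item, rest) for item in data]
    if isinstance(data, dict) and head in data:
        if not rest:
            # Last pattern element: drop the key itself.
            return {k: v for k, v in data.items() if k != head}
        return {k: (strip_path(v, rest) if k == head else v)
                for k, v in data.items()}
    return data


pattern = ('races', '*', 'reportingUnits', '*', 'apiAccessTime')
data = {'races': [{'reportingUnits': [{'apiAccessTime': 1, 'votes': 2}]}]}
print(strip_path(data, pattern))
```

Both inputs would be passed through strip_path with the same pattern before being handed to diff, so the timestamp never reaches the comparison.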
I want to swap only the add / remove entries of the diff, not change positions on all changes.
Or, for example, only change positions on changes.
Is there any way to implement this?
data_old contains more keys than data_new.
I want to keep these old keys in the patch, but change the values to the newer ones.
result = diff(data_old, data_new)
result = swap(result, 'remove') # I want to only swap remove to add, and not swap all changes.
patched = patch(result, data_old)
assert patched == data_old
Using dictdiffer 0.6.0, I get a list index out of range with the following code, probably because when reverting, the list elements are removed in the wrong order.
from dictdiffer import diff, patch, swap, revert
one = {
'one': ['is two']
}
two = {
'one': ['is two', 'is three', 'is four']
}
result_diff = diff(one, two)
print revert(result_diff, two)
Last commit is 9 months ago, some issues are more than a year old.
Could you pls update us if this library plans to stay maintained or you're looking for new maintainers?
Produced diffs may contain empty list instructions, e.g. see ('remove', 'stargazers', []) in the canonical example from the user guide:
from dictdiffer import diff, patch, swap, revert
first = {
"title": "hello",
"fork_count": 20,
"stargazers": ["/users/20", "/users/30"],
"settings": {
"assignees": [100, 101, 201],
}
}
second = {
"title": "hellooo",
"fork_count": 20,
"stargazers": ["/users/20", "/users/30", "/users/40"],
"settings": {
"assignees": [100, 101, 202],
}
}
result = diff(first, second)
assert list(result) == [
('change', ['settings', 'assignees', 2], (201, 202)),
('remove', 'settings.assignees', []),
('add', 'stargazers', [(2, '/users/40')]),
('remove', 'stargazers', []),
('change', 'title', ('hello', 'hellooo'))]
Would look nicer without those empty list removals?
The following example using latest dictdiffer
In [58]: list(diff({"a": np.float32('nan')}, {"a" : float('nan')}))
Out[58]: [('change', 'a', (nan, nan))]
In [60]: are_different(np.float32('nan'),np.nan, 0)
Out[60]: True
fails as the diff should be empty.
The error is (I guess) in dictdiffer/dictdiffer/utils.py, lines 265 to 273 in f8dc205, as np.float32 is not in num_types.
Instead of adding np.float32 to num_types I think a better solution is to return earlier as in
$ git diff .
diff --git a/dictdiffer/utils.py b/dictdiffer/utils.py
index b8885e9..981e200 100644
--- a/dictdiffer/utils.py
+++ b/dictdiffer/utils.py
@@ -262,12 +262,10 @@ def are_different(first, second, tolerance):
     if first == second:
         # values are same - simple case
         return False
-    elif bool(first != first) ^ bool(second != second):
-        # only one of them is 'NaN', hence they are different
-        return True
+    elif bool(first != first) or bool(second != second):
+        return not(bool(first != first) and bool(second != second))
     elif isinstance(first, num_types) and isinstance(second, num_types):
         # (a) two numerical values are compared with tolerance
-        # (b) both values are NaN and they will never fit the tolerance
         return abs(first-second) > tolerance * max(abs(first), abs(second))
     # we got different values
     return True
Right now NaNs compare as different; here's a patch to __init__.py to fix it:
- if first != second:
+ if first != second and [first] != [second]:
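Note that the list comparison in this patch relies on Python checking identity before equality when comparing container elements, so it only treats two NaNs as equal when they are the same object. The sketch below demonstrates that behaviour and adds a hypothetical both_nan helper that does not depend on identity.

```python
nan = float('nan')

assert nan != nan                        # NaN never equals itself
assert [nan] == [nan]                    # same object: identity short-circuits
assert [float('nan')] != [float('nan')]  # distinct objects: still unequal


def both_nan(first, second):
    """Return True when both values are NaN (hypothetical helper)."""
    return first != first and second != second


assert both_nan(float('nan'), float('nan'))
assert not both_nan(1.0, float('nan'))
```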
I used to use dictdiffer to diff OrderedDicts in version 0.2.0.
This testcase used to work there but it doesn't in the current git version (00e14e3):
def test_changed_order():
    from dictdiffer import diff
    from collections import OrderedDict
    o = OrderedDict
    diff_res = list(diff(
        o([('properties', o([(1, 'a'), (2, 'b')]))]),
        o([('properties', o([(2, 'b'), (1, 'a')]))])
    ))
    expected_result = [('change', 'properties', (
        o([(1, 'a'), (2, 'b')]),
        o([(2, 'b'), (1, 'a')])
    ))]
    assert diff_res == expected_result
diff_res is [] when I run it but I expect to see that something changed.
Did something change with 0.3.0 and OrderedDicts?
When comparing two nested dictionaries containing numpy arrays it breaks with the well known:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
To reproduce:
import numpy as np
from dictdiffer import diff
d1 = {'a': np.array([1, 2, 3])}
d2 = {'a': np.array([1, 2, 4])}
print(list(diff(d1,d2)))
Traceback:
Traceback (most recent call last):
File "example.py", line 7, in <module>
print(list(diff(d1,d2)))
File "/local/kai/anaconda/lib/python2.7/site-packages/dictdiffer/__init__.py", line 150, in diff
for diffed in recurred:
File "/local/kai/anaconda/lib/python2.7/site-packages/dictdiffer/__init__.py", line 205, in diff
if are_different(first, second, tolerance):
File "/local/kai/anaconda/lib/python2.7/site-packages/dictdiffer/utils.py", line 264, in are_different
if first == second:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Unfortunately I have no elegant solution to this, but changing are_different like this works for my specific use case:
# only check if same type
if type(first) == type(second):
    # check if ndarray type
    if 'ndarray' in str(type(first)):
        return not (first == second).all()
if first == second:
    # values are same - simple case
    return False
elif bool(first != first) ^ bool(second != second):
    # only one of them is 'NaN', hence they are different
    return True
elif isinstance(first, num_types) and isinstance(second, num_types):
    # (a) two numerical values are compared with tolerance
    # (b) both values are NaN and they will never fit the tolerance
    return abs(first-second) > tolerance * max(abs(first), abs(second))
# we got different values
return True
It would be nice if dictdiffer had support for ordered dictionaries.
Hi,
This dictdiffer library rocks! Thank you all.
This might be another nice difference algorithm per #42. Basically similar to the symmetric difference operator from python's Set data structure.
Setup:
a = {'numbers' : [1, 2, 3]}
b = {'numbers': [0, 1, 2, 3]}
result = dd.diff(a, b)
for r in result:
    print(r)
Current output:
('change', ['numbers', 0], (1, 0))
('change', ['numbers', 1], (2, 1))
('change', ['numbers', 2], (3, 2))
('add', 'numbers', [(3, 3)])
Proposed output:
('symmetric_diff', ['numbers'], (, 0))
So (,0) would represent the addition of 0 to the second dict and that everything in the first dict is contained in the second. I think this would be like
(set(a) - set(b), set(b)-set(a))
Or at least some way of getting at the fact that the value 0 was added to the second list and not the value 3.
Cheers!
Is it intentional that path_limits are denoted in list notation rather than dot notation?
>>> list(dictdiffer.diff({'a':{'b':{'e':'f'}}, 'c':'d'}, {'a':{'b':{'e':'g'}}, 'c':'d'}, path_limit=[('',)]))
[('change', 'a.b.e', ('f', 'g'))]
>>> list(dictdiffer.diff({'a':{'b':{'e':'f'}}, 'c':'d'}, {'a':{'b':{'e':'g'}}, 'c':'d'}, path_limit=[('a','b')]))
[('change', ['a', 'b'], ({'e': 'f'}, {'e': 'g'}))]
Hi, given two different dictionaries where the second lacks the diff REMOVE key, I'm asking if this is the expected behaviour:
>>> dictdiffer.patch(dictdiffer.diff({'b': ''}, {'a': ''}), {})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/site-packages/dictdiffer/__init__.py", line 330, in patch
patchers[action](node, changes)
File "/usr/local/lib/python3.5/site-packages/dictdiffer/__init__.py", line 321, in remove
del dest[key]
KeyError: 'b'
>>> dictdiffer.__version__
'0.8.0'
>>> d1 = {u'привет': 1}
>>> d2 = {'hello': 1}
>>> list(diff(d1, d2))
[('add', '', [('hello', 1)]), ('remove', '', [(u'\u043f\u0440\u0438\u0432\u0435\u0442', 1)])]
>>> list(diff(d1, d2, ignore='some'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/alex/.virtualenvs/29bc9535-3886-4c57-8727-42160cb6847c/local/lib/python2.7/site-packages/dictdiffer/__init__.py", line 63, in diff
deletion = [k for k in first if k not in second and check(k)]
File "/home/alex/.virtualenvs/29bc9535-3886-4c57-8727-42160cb6847c/local/lib/python2.7/site-packages/dictdiffer/__init__.py", line 59, in check
else '.'.join(node + [str(key)])) not in ignore
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
patching dictionaries is so hipster. this project should be written in ruby.
👍
Suppose we have an array
old = ['a','b','c']
new = ['b','c']
# In this case I figured from the documentation that you should expect
# the diff to be ('remove','',[(0,'a')])
# but instead I'm getting:
[('change', [0], ('a', 'b')), ('change', [1], ('b', 'c')), ('remove', '', [(2, 'c')])]
Is that an error or did I misinterpret the usage of the library?
from dictdiffer import diff
class Foo(dict):
    pass
print list(diff(
Foo({2014: [
dict(month=6, category=None, sum=672.00),
dict(month=6, category=1, sum=-8954.00),
dict(month=7, category=None, sum=7475.17),
dict(month=7, category=1, sum=-11745.00),
dict(month=8, category=None, sum=-12140.00),
dict(month=8, category=1, sum=-11812.00),
dict(month=9, category=None, sum=-31719.41),
dict(month=9, category=1, sum=-11663.00),
]}),
Foo({2014: [
dict(month=6, category=None, sum=672.00),
dict(month=6, category=1, sum=-8954.00),
dict(month=7, category=None, sum=7475.17),
dict(month=7, category=1, sum=-11745.00),
dict(month=8, category=None, sum=-12141.00),
dict(month=8, category=1, sum=-11812.00),
dict(month=9, category=None, sum=-31719.41),
dict(month=9, category=1, sum=-11663.00),
]})))
The code for diff() is a bit strange. It checks if the input isinstance() dict, but then throws away that data at the end and does type() on the input instead. Weird! The map "difffers" (https://github.com/inveniosoftware/dictdiffer/blob/master/dictdiffer/__init__.py#L110) is unnecessary. It'd be better to just assign to the variable 'differ' at line 61 and 69 I think.
On another note, it'd be neat if one could hook in some way to convert stuff to dicts or lists. It might be super useful for testing in django if one could just convert any model instance or queryset into a dict or a list respectively.
As part of unit testing, I'd like to use dictdiffer to compare stable/correct output (with tolerance) from a scientific simulation with potentially buggy development output. Newer versions of the tool may output additional keys that need/can not be compared and a flag to skip non-overlapping keys would be very useful.
In [1]: from dictdiffer import diff
In [2]: d={1: [1]}; list(diff(d, d))
Out[2]: [('remove', '1', [])]
@mvesper is working on this. This report is for reference.
current_data = {98: [], 2734: []}
new_data = {'98': ['lsjcalc'], '2734': [';sdlvmsld;vl']}
difference = list(diff(current_data, new_data))
difference
[('add', '', [('98', ['lsjcalc']), ('2734', [';sdlvmsld;vl'])]),
('remove', '', [(98, []), (2734, [])])]
The following code will break for v0.7.0:
from dictdiffer import diff, patch
first = {
"a.b": {
"c.d": 1
}
}
second = {
"a.b": {
"c.d": 2
}
}
_diff = diff(first, second)
patch(_diff, first)
Exception:
Traceback (most recent call last):
File "b.py", line 16, in <module>
patch(_diff, first)
File "/nail/home/yifan/virtualenv_run/lib/python2.7/site-packages/dictdiffer/__init__.py", line 308, in patch
patchers[action](node, changes)
File "/nail/home/yifan/virtualenv_run/lib/python2.7/site-packages/dictdiffer/__init__.py", line 283, in change
dest = dot_lookup(destination, node, parent=True)
File "/nail/home/yifan/virtualenv_run/lib/python2.7/site-packages/dictdiffer/utils.py", line 251, in dot_lookup
value = value[key]
KeyError: 'a'
Can we always use list ['a', 'b', 'c', 'd'] instead of string 'a.b.c.d' to represent a path in the dict?
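The ambiguity can be illustrated without dictdiffer itself. In this small sketch, lookup is a hypothetical helper that walks a list of keys, which keeps dotted keys intact where a dotted string cannot.

```python
def lookup(data, path):
    """Follow a list of keys into nested dictionaries or lists."""
    for key in path:
        data = data[key]
    return data


first = {"a.b": {"c.d": 1}}

# A list path keeps dotted keys intact.
assert lookup(first, ["a.b", "c.d"]) == 1

# Splitting a dotted string loses the key boundaries, so the first
# lookup would be first['a'], which raises KeyError as in the report.
assert "a.b.c.d".split(".") == ["a", "b", "c", "d"]
```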
setup.cfg universal=1
.travis.yml add deploy section
@fatiherikli Would it be possible for you to make a 0.4 release of dictdiffer on PyPI with the latest changes in master? I depend on those changes in one of my projects, and it would be nice if I could install from PyPI instead of GitHub :-)
If you need any help, I'd be happy to prepare everything so you just need to run python setup.py sdist upload.
What is the best way to work with an unknown number of changes?
For example:
I know you can do x = list(diff(a, b)),
which returns a list of changes as tuples.
But x = dict(diff(a, b)) does not work.
Is there any way to easily create a dict object from this diff result?
In this case I'm not really interested in what is added or removed.
I want to check if a certain key has changed, and get back the old and new values.
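One way to get such a mapping is to fold the 'change' entries into a dict keyed by path. This is only a sketch: changes_by_key is a hypothetical helper, and the sample entries are hand-written in the tuple format shown elsewhere in these reports.

```python
def changes_by_key(diff_result):
    """Map each changed path to its (old, new) pair, ignoring add/remove."""
    changed = {}
    for action, path, values in diff_result:
        if action == 'change':
            # Paths may be dotted strings or lists of keys; use one string form.
            key = path if isinstance(path, str) else '.'.join(str(p) for p in path)
            changed[key] = values
    return changed


sample = [
    ('change', 'title', ('hello', 'hellooo')),
    ('change', ['settings', 'assignees', 2], (201, 202)),
    ('add', 'stargazers', [(2, '/users/40')]),
]
changes = changes_by_key(sample)
old, new = changes['title']
print(old, new)
```

Checking whether a key changed then becomes a plain membership test such as 'title' in changes.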
I request adding an in-place patch/revert capability, i.e. option for performing these operations without copying the target structure.
This would be a useful performance optimization in some large-volume use cases. I also have a functional need of patching in place while recording all changes made to the structure.
This could be implemented as an additional, optional parameter to patch and revert:
Patch:
def patch(diff_result, destination, in_place=False):
    """Docstring"""
    if not in_place:
        destination = copy.deepcopy(destination)
Revert:
def revert(diff_result, destination, in_place=False):
    """Docstring"""
    return patch(swap(diff_result), destination, in_place)
And tested with e.g.
a = {
'a' : 1
}
b = {
'a' : 2
}
changes = list(diff(a, b))
c = patch(changes, a)
assert a != c
d = revert(changes, c, in_place=True)
assert a == d
assert c == d
e = patch(changes, a, in_place=True)
assert a == e
I'm writing a script to sync selected elements from OpenStreetMap database with a site database. And the dictionaries I use to store databases include some metadata that doesn't need to be synced. Looks like the ability to pass a list of keys to ignore during diff and patch would help a lot in this case.
Hello, and thanks for the helpful project you have here.
Looking through the issues, lack of recent commits to the lib and the milestone seems complete; is there anything else holding up a 1.0 release?
first = {
"title": "hello",
"fork_count": 20,
"stargazers": ["/users/20", "/users/30"],
"settings": {
"assignees": [100, 101, 201, 101, 101],
}
}
second = {
"title": "hellooo",
"fork_count": 20,
"stargazers": ["/users/20", "/users/30", "/users/40"],
"settings": {
"assignees": [100, 101, 202],
}
}
result = diff(first, second)
for v in result:
    print v
Output:
('push', 'settings.assignees', [202])
('pull', 'settings.assignees', [201])
('push', 'stargazers', ['/users/40'])
('change', 'title', ('hello', 'hellooo'))
Expected:
('push', 'settings.assignees', [202])
('pull', 'settings.assignees', [201, 101, 101])
('push', 'stargazers', ['/users/40'])
('change', 'title', ('hello', 'hellooo'))
Unifier class: the unify method returns a list containing duplicated patches in the case of conflicts containing equal patches (same path).
e.g.
patch1 = ('remove', '', [('a', 'b')])
patch2 = ('remove', '', [('a', 'b')])
conflicts = Conflict(patch1, patch2)
conflicts.take = 'f' # can be 's' too
The result of calling u.unify([patch1], [patch2], [conflicts]) will be [patch1, patch2] instead of [patch1]. This particular case can lead to KeyErrors when calling patch, as it will try to remove the same key twice. The second time it fails as the key has already been deleted.
Current implementation returns diffs that reference the original structure. Thus the diff may change later when the structures change.
This is counter to at least my intuitive expectation that the diff is a snapshot of the structures at the time of running the diff.
I would request that the diff returns (deep) copies of the referenced structures by default, with maybe an option for returning references instead.
At the very least I recommend making this very explicit in the documentation - I just suffered from some very hard to debug bugs because of this.
Thanks!
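Until the behaviour changes, a caller can take the snapshot explicitly by deep-copying the materialised diff. In this sketch the diff entry is hand-written in the tuple format used in these reports and is assumed to hold a reference to the live structure, as described above.

```python
import copy

settings = {'assignees': [100, 101]}
# Hand-written entry that references the live `settings` dict.
live_diff = [('add', '', [('settings', settings)])]
snapshot = copy.deepcopy(live_diff)

settings['assignees'].append(999)  # later mutation of the source structure

# The live diff follows the mutation; the snapshot does not.
assert live_diff[0][2][0][1]['assignees'] == [100, 101, 999]
assert snapshot[0][2][0][1]['assignees'] == [100, 101]
```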
Issue occurring on dictdiffer-0.7.1.
Let's take two dictionaries:
config_dict = OrderedDict([('address', 'devops011-slv-01.gvs.ggn'),('nifi.zookeeper.session.timeout', '3 secs')])
ref_dict = OrderedDict([('address', 'devops011-slv-01.gvs.ggn'),('nifi.zookeeper.session.timeout', '4 secs')])
list(diff(config_dict, ref_dict,ignore=set(['nifi.zookeeper.session.timeout'])))
Output for above is:
[('change', ['nifi.zookeeper.session.timeout'], ('3 secs', '4 secs'))]
It seems the ignore functionality does not work when a dotted key is passed in the ignore set.
I use dictdiffer in tests to show differences for assertion errors. Having a method that would pretty-print the differences would make the library even more useful for me.
Either
>>> dictdiffer.pretty_diff({}, {"foo": "bar"})
add: 'foo': 'bar'
>>> dictdiffer.pretty_diff({"foo": "jar"}, {"foo": "bar"})
change foo: 'jar' -> 'bar'
>>> dictdiffer.pretty_diff({"outer": {"foo": "jar"}}, {"outer": {"foo": "bar"}})
change outer.foo: 'jar' -> 'bar'
or more like a comparison view
>>> dictdiffer.pretty_diff({}, {"foo": "bar"})
+ 'foo': 'bar'
>>> dictdiffer.pretty_diff({"foo": "jar"}, {"foo": "bar"})
- 'foo': 'jar' + 'foo': 'bar'
>>> dictdiffer.pretty_diff({"outer": {"foo": "jar"}}, {"outer": {"foo": "bar"}})
'outer': { 'outer': {
- 'foo': 'jar' + 'foo': 'bar'
https://github.com/lukaszb/pytest-dictsdiff and https://github.com/hjwp/pytest-icdiff do something similar (at least one has dictdiffer as a dependency), but they don't work for my cases.
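The first style can be approximated in a few lines on top of an existing diff result. This is a sketch only: format_diff is a hypothetical name, it is not the requested pretty_diff API, and it assumes entries shaped like the (action, path, values) tuples shown in these reports.

```python
def format_diff(diff_result):
    """Render diff entries as one human-readable line each."""
    lines = []
    for action, path, values in diff_result:
        dotted = path if isinstance(path, str) else '.'.join(map(str, path))
        if action == 'change':
            old, new = values
            lines.append('change {}: {!r} -> {!r}'.format(dotted, old, new))
        else:
            # 'add' and 'remove' carry a list of (key, value) pairs.
            for key, value in values:
                full = '{}.{}'.format(dotted, key) if dotted else str(key)
                lines.append('{} {}: {!r}'.format(action, full, value))
    return '\n'.join(lines)


print(format_diff([('add', '', [('foo', 'bar')])]))
print(format_diff([('change', 'outer.foo', ('jar', 'bar'))]))
```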
How about:
a visit_diffs(diff_result, visitors, *args, **kwargs) function [1];
visitors would be a dict of callables: {dictdiffer.ADD: do_something_with_add, ...};
visitor(node, changes, *args, **kwargs)? [2]
Why?
dictdiffer.diff() results are obviously simple to exploit, but writing a visit_diffs() will probably be the first thing most users will do [3];
dictdiffer.patch() and dictdiffer.swap();
What do you think?
G.
[1] neither iter_diffs() nor map_diffs() seemed correct to me.
[2] and yes, what I'm proposing does not model the Visitor pattern. For that, diff() should probably return DiffResult() instances with a visit() method ;-)
[3] that's what I've done. Twice already. The third one was generic and led me to this proposal... ;-)
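For reference, the proposed function could look roughly like the following. This is a sketch of the proposal's signature only; it dispatches on plain action strings rather than on any dictdiffer constants, which is an assumption made to keep it dependency-free.

```python
def visit_diffs(diff_result, visitors, *args, **kwargs):
    """Call the visitor registered for each entry's action, if any."""
    for action, node, changes in diff_result:
        visitor = visitors.get(action)
        if visitor is not None:
            visitor(node, changes, *args, **kwargs)


added = []
visit_diffs(
    [('add', '', [('foo', 'bar')]), ('change', 'title', ('a', 'b'))],
    {'add': lambda node, changes: added.extend(changes)},
)
print(added)
```

Entries whose action has no registered visitor are skipped silently here; raising an error instead would be an equally reasonable design.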
For dict values as tuples of ints or floats, the tolerance is not working:
d1 = {'a': 10.0}
d2 = {'a': 10.5}
d3 = {'a': (10.0,11,12)}
d4 = {'a': (10.5,11,12)}
result1 = diff(d1, d2, tolerance=0.1)
result2 = diff(d3, d4, tolerance=0.1)
print "result1: ", list(result1)
print "result2: ", list(result2)
Output result:
result1: []
result2: [('change', 'a', ((10.0, 11, 12), (10.5, 11, 12)))]
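A tolerance check that descends into tuples and lists could be sketched as follows. differs is a hypothetical helper, not the library's are_different; it reuses the relative-tolerance formula quoted in the reports above.

```python
def differs(first, second, tolerance):
    """Compare numbers with a relative tolerance, descending into sequences."""
    if isinstance(first, (tuple, list)) and isinstance(second, (tuple, list)):
        if len(first) != len(second):
            return True
        return any(differs(a, b, tolerance) for a, b in zip(first, second))
    if isinstance(first, (int, float)) and isinstance(second, (int, float)):
        return abs(first - second) > tolerance * max(abs(first), abs(second))
    return first != second


assert not differs(10.0, 10.5, 0.1)
assert not differs((10.0, 11, 12), (10.5, 11, 12), 0.1)
assert differs((10.0, 11, 12), (12.0, 11, 12), 0.1)
```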