Comments (5)
Hi @Zaab1t! This is definitely one of the more common requests I get. It's mostly a matter of how you want to do duplicate detection. In most cases people have an "id" key of some sort, so they can do something like
from boltons import iterutils
dupe_dicts = [{"id": 1, "val": 3}, {"id": 2, "val": 5}, {"id": 1, "val": 1}]
deduped_dicts = iterutils.unique(dupe_dicts, key=lambda x: x.get('id'))
print(deduped_dicts)
But it gets more complex as the data structures become more nested and the equality test stricter. frozendict would help in some cases, but even that won't cover a highly nested dictionary. Did you have a specific use case you can share?
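For the nested case mentioned above, one possible approach is to recursively convert each dict into a hashable stand-in and use that as the deduplication key. The sketch below is only an illustration: `freeze` and `unique_by_frozen` are hypothetical helper names, not part of boltons, and it assumes every value is a dict, list, tuple, set, or an already-hashable leaf.

```python
def freeze(value):
    """Recursively build a hashable stand-in for a nested value (hypothetical helper)."""
    if isinstance(value, dict):
        # Key order should not matter, so collect (key, frozen value) pairs in a frozenset.
        return ("dict", frozenset((k, freeze(v)) for k, v in value.items()))
    if isinstance(value, list):
        return ("list", tuple(freeze(v) for v in value))
    if isinstance(value, tuple):
        return ("tuple", tuple(freeze(v) for v in value))
    if isinstance(value, (set, frozenset)):
        return ("set", frozenset(freeze(v) for v in value))
    # Assumption: anything else is already hashable (str, int, None, ...).
    return value


def unique_by_frozen(dicts):
    """Keep the first occurrence of each dict, comparing by frozen form."""
    seen = set()
    result = []
    for d in dicts:
        marker = freeze(d)
        if marker not in seen:
            seen.add(marker)
            result.append(d)
    return result


nested = [
    {"id": 1, "tags": ["a", "b"], "meta": {"x": 1, "y": 2}},
    {"meta": {"y": 2, "x": 1}, "tags": ["a", "b"], "id": 1},  # same content, different key order
    {"id": 1, "tags": ["b", "a"], "meta": {"x": 1, "y": 2}},  # list order differs, so not a duplicate
]
deduped_nested = unique_by_frozen(nested)
```

The type tags ("dict", "list", and so on) keep a list and a tuple with the same elements from being treated as equal, which matches how `==` behaves for those types.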
from boltons.
Well, I get my data online, and sometimes the same data appears multiple times, which causes problems. Current solution:
def remove_dublicate_dicts(iterable):
    """Only keep one copy of each dict with the exact same key/value
    pairs.

    :rtype: list of dicts.
    """
    s = set()
    for d in iterable:
        hashable_dict = tuple((key, value) for key, value in d.items())
        s.add(hashable_dict)
    return [dict(item) for item in s]

>>> remove_dublicate_dicts([{'a': 123, 'b': 0}, {'a': 123, 'b': 1}, {'a': 123, 'b': 0}])
[{'a': 123, 'b': 0}, {'a': 123, 'b': 1}]
from boltons.
Right, so your dicts' values are not nested and all hashable, which means you can just do:
unique_dicts = iterutils.unique(dupe_dicts, key=lambda d: d.items())
And you should be good! :)
from boltons.
I see. Taking care of nested mutable values wouldn't be fun. I'm gonna take a shot at it though, and see how pretty a solution we can come up with :)
from boltons.
In Python 3, d.items() is a view object, which is not hashable, so the key needs to be key=lambda d: tuple(d.items()) or, so that item order doesn't matter, key=lambda d: frozenset(d.items()).
On the other hand, while dicts and lists don't support hashing, they have "deep" equality comparison - of course, then you'd have to compare each new item with each past item, which is inefficient.
The same goes for Python 2, where d.items() is a list, I believe, and the order isn't guaranteed.
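To make the two options in this comment concrete, here is a minimal sketch in plain Python, so it runs without boltons installed. The function names `unique_dicts` and `unique_dicts_by_equality` are hypothetical, chosen for illustration. The first assumes flat dicts with hashable values; the second is the slower equality-based fallback for values that can't be hashed.

```python
def unique_dicts(dicts):
    """Order-preserving dedupe for flat dicts whose values are all hashable."""
    seen = set()
    result = []
    for d in dicts:
        # frozenset ignores item order, so {'a': 1, 'b': 2} and {'b': 2, 'a': 1} match.
        key = frozenset(d.items())
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result


def unique_dicts_by_equality(dicts):
    """Fallback for unhashable values: compare with every dict kept so far (quadratic)."""
    result = []
    for d in dicts:
        # `in` falls back to ==, which compares dicts and lists deeply.
        if d not in result:
            result.append(d)
    return result


flat = [{'a': 123, 'b': 0}, {'a': 123, 'b': 1}, {'b': 0, 'a': 123}]
flat_unique = unique_dicts(flat)

nested = [{'a': [1, 2]}, {'a': [1, 2]}, {'a': [2, 1]}]
nested_unique = unique_dicts_by_equality(nested)
```

The frozenset version does one set lookup per dict, while the equality version compares each new dict against every one already kept, so it is only reasonable for small inputs.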
from boltons.