A method, function or perhaps even a frozendict could help achieve this in a clean way


Remove duplicates from list of dicts (boltons, closed)

mahmoud commented on May 21, 2024

Remove duplicates from list of dicts


Comments (5)

mahmoud commented on May 21, 2024

Hi @Zaab1t! This is definitely one of the more common requests I get. It's mostly a matter of how you want to detect duplicates. In most cases people have an "id" key of some sort, so they can do something like:

from boltons import iterutils

dupe_dicts = [{"id": 1, "val": 3}, {"id": 2, "val": 5}, {"id": 1, "val": 1}]

deduped_dicts = iterutils.unique(dupe_dicts, key=lambda x: x.get('id'))

print(deduped_dicts)

But it gets more complex as the data structures become more nested and the equality test stricter. frozendict would help in some cases, but even that won't cover a highly nested dictionary. Did you have a specific use case you can share?
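To illustrate the nested case, here is a minimal sketch that recursively converts a value into a hashable stand-in and uses it as the deduplication key. The helper names `freeze` and `unique_nested` are hypothetical (they are not part of boltons), and the sketch assumes every leaf value is already hashable and that only dicts, lists, tuples and sets need converting.

```python
def freeze(obj):
    """Recursively build a hashable stand-in for obj (hypothetical helper)."""
    if isinstance(obj, dict):
        # Key order is irrelevant for dict equality, so use a frozenset of pairs.
        return ("dict", frozenset((key, freeze(value)) for key, value in obj.items()))
    if isinstance(obj, list):
        return ("list", tuple(freeze(item) for item in obj))
    if isinstance(obj, tuple):
        return ("tuple", tuple(freeze(item) for item in obj))
    if isinstance(obj, (set, frozenset)):
        return ("set", frozenset(freeze(item) for item in obj))
    return obj  # assumption: anything else is already hashable


def unique_nested(dicts):
    """Keep the first occurrence of each dict, comparing nested contents."""
    seen = set()
    result = []
    for d in dicts:
        marker = freeze(d)
        if marker not in seen:
            seen.add(marker)
            result.append(d)
    return result


nested = [
    {"id": 1, "tags": ["a", "b"], "meta": {"x": 1}},
    {"meta": {"x": 1}, "tags": ["a", "b"], "id": 1},  # same content, different key order
    {"id": 1, "tags": ["b", "a"], "meta": {"x": 1}},  # list order differs, so not a duplicate
]
deduped_nested = unique_nested(nested)
print(len(deduped_nested))
```

The type tags ("dict", "list", and so on) keep a list and a tuple with the same items from being treated as duplicates, which matches how `==` behaves for those types.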


carlbordum commented on May 21, 2024

Well, I get my data online, and sometimes the same data appears multiple times, which causes problems. Current solution:

def remove_dublicate_dicts(iterable):
    """Only keep one copy of each dict with the exact same key/value
    pairs.
    :rtype: list of dicts.
    """
    s = set()
    for d in iterable:
        hashable_dict = tuple((key, value) for key, value in d.items())
        s.add(hashable_dict)
    return [dict(item) for item in s]

>>> remove_dublicate_dicts([{'a': 123, 'b': 0}, {'a': 123, 'b': 1}, {'a': 123, 'b': 0}])
[{'a': 123, 'b': 0}, {'a': 123, 'b': 1}]
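One thing worth noting about the approach above: a set does not preserve input order, and a tuple of items depends on the order in which keys were inserted. A possible variant, sketched here under the assumption that all values are flat and hashable, uses a `frozenset` of the items as the marker and keeps the first occurrence of each dict. The name `unique_flat_dicts` is made up for this illustration.

```python
def unique_flat_dicts(dicts):
    """Keep the first copy of each dict with the same key/value pairs.

    Assumes every value is hashable and not nested.
    """
    seen = set()
    result = []
    for d in dicts:
        marker = frozenset(d.items())  # ignores key insertion order
        if marker not in seen:
            seen.add(marker)
            result.append(d)
    return result


records = [{"a": 123, "b": 0}, {"a": 123, "b": 1}, {"b": 0, "a": 123}]
unique_records = unique_flat_dicts(records)
print(unique_records)
# → [{'a': 123, 'b': 0}, {'a': 123, 'b': 1}]
```

The third record has the same pairs as the first but in a different key order; a tuple-based marker would treat it as distinct, while the frozenset marker does not.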


mahmoud commented on May 21, 2024

Right, so your dicts' values are not nested and all hashable, which means you can just do:

unique_dicts = iterutils.unique(dupe_dicts, key=lambda d: d.items())

And you should be good! :)


carlbordum commented on May 21, 2024

I see. Taking care of nested mutable values wouldn't be fun. I'm gonna take a shot at it though. Let's see how pretty a solution we can come up with :)


tiwo commented on May 21, 2024

In Python 3, d.items() is a view object, so it needs to be ~~key=lambda d: tuple(d.items())~~ key=lambda d: frozenset(d.items()).
On the other hand, while dicts and lists don't support hashing, they have "deep" equality comparison - of course, then you'd have to compare each new item with each past item, which is inefficient.

The same applies in Python 2, where d.items() returns a list, I believe, and the order isn't guaranteed.
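A rough sketch of both points, in plain Python 3: the items view itself cannot be hashed, and the "deep" equality fallback works for unhashable values at the cost of comparing each item with everything kept so far. The function name `unique_by_equality` is hypothetical.

```python
def unique_by_equality(items):
    """Deduplicate using == only; works for unhashable values, but is O(n^2)."""
    kept = []
    for item in items:
        # Compare against every item kept so far (the inefficient part).
        if not any(item == earlier for earlier in kept):
            kept.append(item)
    return kept


# In Python 3 a dict items view is not hashable, so it cannot be used
# directly as a set member or dict key.
try:
    hash({"a": 1}.items())
    items_view_hashable = True
except TypeError:
    items_view_hashable = False

rows = [{"a": [1, 2]}, {"a": [1, 2]}, {"a": [2, 1]}]
unique_rows = unique_by_equality(rows)
print(items_view_hashable, unique_rows)
```

For small inputs the quadratic cost is usually tolerable; for larger ones a hashable key such as `frozenset(d.items())` is the cheaper route when the values allow it.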

