Comments (5)

mahmoud avatar mahmoud commented on May 21, 2024

Hi @Zaab1t ! This is definitely one of the more common requests I get. It's mostly a matter of how you want to do duplicate detection. In most cases people have an "id" key of some sort, so they can do something like:

from boltons import iterutils

dupe_dicts = [{"id": 1, "val": 3}, {"id": 2, "val": 5}, {"id": 1, "val": 1}]

deduped_dicts = iterutils.unique(dupe_dicts, key=lambda x: x.get('id'))

print(deduped_dicts)

But it gets more complex as the data structures become more nested and the equality test stricter. frozendict would help in some cases, but even that won't cover a highly nested dictionary. Did you have a specific use case you can share?

from boltons.

carlbordum avatar carlbordum commented on May 21, 2024

Well, I get my data online, and sometimes the same data appears multiple times, which causes problems. Current solution:

def remove_dublicate_dicts(iterable):
    """Only keep one copy of each dict with the exact same key/value
    pairs.
    :rtype: list of dicts.
    """
    s = set()
    for d in iterable:
        hashable_dict = tuple((key, value) for key, value in d.items())
        s.add(hashable_dict)
    return [dict(item) for item in s]

>>> remove_dublicate_dicts([{'a': 123, 'b': 0}, {'a': 123, 'b': 1}, {'a': 123, 'b': 0}])
[{'a': 123, 'b': 0}, {'a': 123, 'b': 1}]
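
A possible variant of the function above, as a sketch only: it keeps the first occurrence of each dict in input order, and it ignores key order by using frozenset(d.items()) as the lookup key. It assumes flat dicts whose values are all hashable. The name unique_flat_dicts is made up for this example.

```python
def unique_flat_dicts(iterable):
    """Keep the first copy of each dict, preserving input order.

    Assumes every value is hashable (no nested lists or dicts).
    """
    seen = set()
    result = []
    for d in iterable:
        # A frozenset ignores key order, so {'a': 1, 'b': 2} and
        # {'b': 2, 'a': 1} produce the same fingerprint.
        fingerprint = frozenset(d.items())
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(d)
    return result


print(unique_flat_dicts([{'a': 123, 'b': 0}, {'a': 123, 'b': 1}, {'b': 0, 'a': 123}]))
# [{'a': 123, 'b': 0}, {'a': 123, 'b': 1}]
```

Unlike collecting everything into a set and converting back, this returns the original dict objects in the order they first appeared.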


mahmoud avatar mahmoud commented on May 21, 2024

Right, so your dicts' values are not nested and all hashable, which means you can just do:

unique_dicts = iterutils.unique(dupe_dicts, key=lambda d: d.items())

And you should be good! :)


carlbordum avatar carlbordum commented on May 21, 2024

I see. Taking care of nested mutable values wouldn't be fun. I'm gonna take a shot at it though. Let's see how pretty a solution we can come up with :)
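
One way the nested case might be approached, as a rough sketch rather than anything boltons provides: recursively convert each dict into a hashable "fingerprint" and deduplicate on that. The helper names freeze and unique_nested_dicts are hypothetical, and the sketch assumes the leaves are hashable and that only dicts, lists, tuples, and sets need converting.

```python
def freeze(value):
    """Recursively build a hashable stand-in for a nested value.

    Each container is tagged with its kind so that, for example,
    the list [1, 2] and the tuple (1, 2) stay distinct.
    """
    if isinstance(value, dict):
        return ("dict", frozenset((key, freeze(val)) for key, val in value.items()))
    if isinstance(value, list):
        return ("list", tuple(freeze(item) for item in value))
    if isinstance(value, tuple):
        return ("tuple", tuple(freeze(item) for item in value))
    if isinstance(value, (set, frozenset)):
        return ("set", frozenset(freeze(item) for item in value))
    return value  # assumed to be hashable already


def unique_nested_dicts(iterable):
    """Keep the first copy of each dict, comparing nested contents."""
    seen = set()
    result = []
    for d in iterable:
        fingerprint = freeze(d)
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(d)
    return result


records = [
    {"id": 1, "tags": ["a", "b"], "meta": {"x": [1, 2]}},
    {"meta": {"x": [1, 2]}, "tags": ["a", "b"], "id": 1},  # same content, other key order
    {"id": 1, "tags": ["b", "a"], "meta": {"x": [1, 2]}},  # list order differs
]
print(len(unique_nested_dicts(records)))  # 2
```

Key order inside dicts is ignored, while element order inside lists still matters, which mirrors how == behaves for those types.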


tiwo avatar tiwo commented on May 21, 2024

In Python 3, d.items() is a view object, which isn't hashable, so it needs to be key=lambda d: frozenset(d.items()). (key=lambda d: tuple(d.items()) is hashable too, but it depends on key order.)
On the other hand, while dicts and lists don't support hashing, they do have "deep" equality comparison. Of course, then you'd have to compare each new item with each past item, which is inefficient.

The same goes for Python 2, where d.items() is a list, I believe, and the order isn't guaranteed.
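
To illustrate the equality-based fallback mentioned above, here is a minimal sketch. The function name unique_by_equality is made up. It relies only on ==, so it works with unhashable nested values, at the cost of comparing every new item against everything kept so far.

```python
def unique_by_equality(iterable):
    """Deduplicate using == only; quadratic in the number of items."""
    result = []
    for item in iterable:
        # `in` on a list falls back to ==, which compares dicts
        # and lists by their (nested) contents.
        if item not in result:
            result.append(item)
    return result


data = [{"a": [1, {"b": 2}]}, {"a": [1, {"b": 2}]}, {"a": [1, {"b": 3}]}]
print(unique_by_equality(data))
# [{'a': [1, {'b': 2}]}, {'a': [1, {'b': 3}]}]
```

This is probably fine for a few hundred items. For larger inputs a hash-based key is likely the better choice.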
