
Comments (4)

mrckzgl commented on May 23, 2024

Some more data. At the line

eval_obj.calculate_scores(true_positives=true_positives)

I printed eval_obj.__dict__:

{'total_matching_pairs': 76.0, 'data': <pyjedai.datamodel.Data object at 0x7e11d1839db0>, 'true_positives': 102, 'true_negatives': 185456764.0, 'false_positives': -26.0, 'false_negatives': 553360, 'all_gt_ids': {0, 1, 2, [...], 19316}, 'num_of_true_duplicates': 553462, 'precision': 1.3421052631578947, 'recall': 0.00018429449537637633, 'f1': 0.00036853838399531744}

So total_matching_pairs (76) is smaller than true_positives (102), which produces a negative false_positives count (-26) and a precision above 1.
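The printed values are consistent with the usual pair-based formulas, where total_matching_pairs is the number of predicted matching pairs and num_of_true_duplicates is the number of ground-truth pairs. The sketch below recomputes them from the raw counts. The formulas are inferred from the numbers above rather than copied from pyjedai, and the function name scores_from_counts is hypothetical. It shows why an inflated true_positives is the root cause: once it exceeds total_matching_pairs, false_positives goes negative and precision exceeds 1, neither of which can happen with a valid confusion matrix.

```python
def scores_from_counts(true_positives, total_matching_pairs, num_of_true_duplicates):
    """Recompute pair-based metrics from raw counts.

    Hypothetical helper; the formulas are inferred from the __dict__ dump,
    not taken from pyjedai's source.
    """
    # Predicted pairs that are not true matches.
    false_positives = total_matching_pairs - true_positives
    # Ground-truth pairs that were not predicted.
    false_negatives = num_of_true_duplicates - true_positives
    precision = true_positives / total_matching_pairs
    recall = true_positives / num_of_true_duplicates
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts taken from the printed __dict__ above.
scores = scores_from_counts(102, 76.0, 553462)
print(scores)
```

Feeding in the counts from the dump reproduces the negative false_positives and the precision above 1, so the metric arithmetic itself behaves as expected; the problem is upstream, in how true_positives is counted.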


mrckzgl commented on May 23, 2024

Ah, I got it. Our ground truth contains matching pairs where both ids are the same, i.e. rows like "id1|id1" in the CSV file. Strictly speaking, this is not incorrect: an entity is obviously identical to itself. Still, the ground truth is not as clean as it should be. I will clean it up, but an additional safeguard would be to check whether the two ids are identical here:

if id1 in entity_index and \

and, in that case, not increment true_positives, which would make the evaluation more robust. Of course, one would also need to check the Clean-Clean ER case and the evaluations of the other steps to make sure the calculations remain correct and consistent.
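A minimal sketch of the proposed guard, assuming (as the quoted condition suggests) that entity_index maps each entity id to the id of the cluster it was assigned to, and that a ground-truth pair counts as a true positive when both ids land in the same cluster. The function count_true_positives and its skip_self_pairs flag are hypothetical, not pyjedai's actual code.

```python
from typing import Dict, Hashable, Iterable, Tuple


def count_true_positives(
    gt_pairs: Iterable[Tuple[Hashable, Hashable]],
    entity_index: Dict[Hashable, int],
    skip_self_pairs: bool = True,
) -> int:
    """Count ground-truth pairs whose two ids were placed in the same cluster.

    Hypothetical helper: entity_index is assumed to map entity id -> cluster id.
    """
    true_positives = 0
    for id1, id2 in gt_pairs:
        if skip_self_pairs and id1 == id2:
            # A row like "id1|id1" is trivially satisfied and would inflate the count.
            continue
        if (
            id1 in entity_index
            and id2 in entity_index
            and entity_index[id1] == entity_index[id2]
        ):
            true_positives += 1
    return true_positives


gt = [("a", "b"), ("c", "c"), ("d", "e")]
clusters = {"a": 0, "b": 0, "c": 1, "d": 2, "e": 3}
print(count_true_positives(gt, clusters))                         # 1
print(count_true_positives(gt, clusters, skip_self_pairs=False))  # 2
```

Without the guard, the self-pair ("c", "c") is counted as a correct match simply because an id always shares a cluster with itself.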


Nikoletos-K commented on May 23, 2024

We hadn't considered this scenario before. I fully agree that it should be addressed, given how common errors in real-world data are. We will handle it by adding a validation check.

Thanks for the detailed trace and feedback!


Nikoletos-K commented on May 23, 2024

We added a drop_duplicates call when parsing the GT file:

self.ground_truth.drop_duplicates(inplace=True)

I think this will work better.

Cheers,
Konstantinos
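A small pandas sketch of what deduplicating the ground truth does, assuming the GT file is loaded into a DataFrame with two id columns. The CSV contents and the column names id1 and id2 are hypothetical. drop_duplicates removes rows that repeat exactly; a self-pair row such as "c|c" is not a duplicate of any other row, so filtering it out takes a separate comparison of the two columns.

```python
import io

import pandas as pd

# Hypothetical ground-truth CSV using '|' as the separator, as in the issue.
raw = "id1|id2\na|b\na|b\nc|c\nd|e\n"
gt = pd.read_csv(io.StringIO(raw), sep="|", dtype=str)

# drop_duplicates removes rows that repeat exactly, here the second "a|b" ...
deduped = gt.drop_duplicates()

# ... but "c|c" is not a duplicate of another row, so it survives.
# Removing self-pairs needs an explicit comparison of the two id columns.
cleaned = deduped[deduped["id1"] != deduped["id2"]].reset_index(drop=True)

print(len(gt), len(deduped), len(cleaned))  # 4 3 2
```

Because drop_duplicates compares whole rows, a reversed pair such as "b|a" would also be kept alongside "a|b"; putting each pair into a canonical order before deduplicating would cover that case.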

