Comments (4)
Some more data. At pyJedAI/src/pyjedai/clustering.py (line 366 in 2e41af4), I printed eval_obj.__dict__:
{'total_matching_pairs': 76.0, 'data': <pyjedai.datamodel.Data object at 0x7e11d1839db0>, 'true_positives': 102, 'true_negatives': 185456764.0, 'false_positives': -26.0, 'false_negatives': 553360, 'all_gt_ids': {0, 1, 2, [...], 19316}, 'num_of_true_duplicates': 553462, 'precision': 1.3421052631578947, 'recall': 0.00018429449537637633, 'f1': 0.00036853838399531744}
So total_matching_pairs (76) is smaller than true_positives (102), which is why false_positives is negative and precision is greater than 1.
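The printed values are consistent with precision being computed as true_positives / total_matching_pairs and false_positives as total_matching_pairs - true_positives. A minimal sketch that reproduces the arithmetic under those assumed standard definitions (the variable names mirror the dump; this is not the library's own code):

```python
# Values copied from the printed eval_obj.__dict__ above.
total_matching_pairs = 76.0
true_positives = 102
num_of_true_duplicates = 553462

# Assumed standard definitions; they reproduce the printed values.
false_positives = total_matching_pairs - true_positives
false_negatives = num_of_true_duplicates - true_positives
precision = true_positives / total_matching_pairs
recall = true_positives / num_of_true_duplicates
f1 = 2 * precision * recall / (precision + recall)

print(false_positives)  # -26.0
print(false_negatives)  # 553360
print(precision > 1.0)  # True
```

Any count of true positives that can exceed the number of emitted pairs will push precision above 1, so the counting step is the place to look.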
Ah, I got it. Our ground truth contains matching pairs in which both ids are the same, i.e. a row like "id1|id1" in the CSV file. Strictly speaking this is not incorrect, since an entity is obviously identical to itself, but it also shows that the ground truth is not as clean as it should be. I will clean up the ground truth. An additional approach might be to check whether the two ids are identical at pyJedAI/src/pyjedai/clustering.py (line 362 in 2e41af4) and, in that case, not increase true_positives, which would make the evaluation more robust. Of course, one would also need to verify that the calculations remain correct and consistent for the clean-clean ER case and for the evaluations of the other steps.
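A rough sketch of what such a guard could look like. The function name, the input shapes, and the choice to treat (a, b) and (b, a) as the same pair are all assumptions for illustration and are not taken from the pyJedAI code:

```python
def count_true_positives(gt_pairs, cluster_of):
    """Count ground-truth pairs whose two ids fall in the same cluster.

    gt_pairs: iterable of (id1, id2) tuples (hypothetical input shape).
    cluster_of: dict mapping an entity id to its cluster label.
    """
    seen = set()
    true_positives = 0
    for id1, id2 in gt_pairs:
        if id1 == id2:
            continue  # self-pair such as "id1|id1": trivially true, skip it
        key = frozenset((id1, id2))  # assumption: pairs are unordered
        if key in seen:
            continue  # repeated ground-truth row, count it only once
        seen.add(key)
        c1, c2 = cluster_of.get(id1), cluster_of.get(id2)
        if c1 is not None and c1 == c2:
            true_positives += 1
    return true_positives


gt = [(1, 1), (1, 2), (2, 1), (3, 4)]
clusters = {1: "a", 2: "a", 3: "b", 4: "c"}
print(count_true_positives(gt, clusters))  # 1
```

The unordered-pair assumption fits dirty ER, where both ids come from the same dataset. For clean-clean ER the ids belong to different datasets, so the self-pair check and the deduplication key would need to be revisited.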
We hadn't considered this scenario before. I fully agree that it should be addressed, given the prevalence of errors in data. We will address this by adding a validation check.
Thanks for the detailed trace and feedback!
We added a drop_duplicates call when we parse the GT file, at pyJedAI/src/pyjedai/datamodel.py (line 159 in c19399a). I think this will work better.
Cheers,
Konstantinos