Comments (3)
The condition used to determine whether or not parameters are estimated for a comparison is whether or not any of the data columns in the training blocking rule are used in any of that comparison's levels.
In your case, the `sname` comparison makes reference to the columns `sex` and `mar`, which also appear in your training blocking rules, and so this comparison cannot be estimated. To train the parameters for the `sname` comparison you will need to use a blocking rule that does not use any of the columns `sname`, `sex`, or `mar`, as these are the columns that the `sname` comparison depends on.
The match weight chart (and the m u parameters chart) will show the default m-values for any comparison that has no trained values associated with it, so those will probably be what you are seeing there.
The parameter estimates chart should not show default values; it should display only values estimated from training sessions (expectation maximisation or estimating u from random sampling). If you do have m-values appearing there for `sname`, would you be able to upload an image of it?
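To make the rule above concrete, here is a minimal sketch in plain Python. It is not Splink's actual implementation: the helper `estimable_comparisons` is hypothetical, as are the `fname` and `dob` comparisons, and it only models the idea that a comparison is skipped during a training session when any column it depends on also appears in the training blocking rule.

```python
def estimable_comparisons(comparison_columns, blocking_rule_columns):
    """Return the comparisons that share no columns with the blocking rule.

    Hypothetical helper for illustration only; not part of Splink.
    """
    blocked = set(blocking_rule_columns)
    return [
        name
        for name, columns in comparison_columns.items()
        if blocked.isdisjoint(columns)
    ]


# Assumed model: the sname comparison also references sex and mar,
# as in the issue. fname and dob are made-up comparisons.
comparison_columns = {
    "sname": {"sname", "sex", "mar"},
    "fname": {"fname"},
    "dob": {"dob"},
}

# Blocking on sex and mar: sname overlaps, so it would not be estimated.
print(estimable_comparisons(comparison_columns, ["sex", "mar"]))
# → ['fname', 'dob']

# Blocking on dob: sname shares no columns, so it could be estimated.
print(estimable_comparisons(comparison_columns, ["dob"]))
# → ['sname', 'fname']
```

In other words, picking a training blocking rule on a column that `sname` does not depend on is what would allow its m-values to be trained.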
Thanks both for the replies, this solves it. @ADBond apologies, there were indeed no values shown for `sname` in `parameter_estimate_comparisons_chart()`.
I think possibly the distinction here is whether you're displaying from `linker.match_weights_chart()` (which iirc does display default values) or the charts returned by the training session:
training_session = linker.estimate_parameters_using_expectation_maximisation(block_on(["first_name"]))
training_session.match_weights_interactive_history_chart()
(which shouldn't)
I admit, it's a bit confusing that `linker.match_weights_chart()` shows default values; we should probably improve that somehow!