dscolby / CausalELM.jl
Taking causal inference to the extreme!
Home Page: https://dscolby.github.io/CausalELM/
License: MIT License
@JuliaRegistrator register
This is not a priority, and it will take a while since it will require working with CSS. Right now the theme is clean, but the color palette is very bland.
Explain that the final model is linear and that the functional form is partially linear and semiparametric.
Right now the models work on arrays, but they should be able to accept other inputs such as DataFrames, Tables.jl tables, etc. This will also require figuring out what people usually use to read and manipulate their data in Julia. Getting this done will probably be a v0.4 task.
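One way this might look is a small conversion helper built on the Tables.jl interface; the function name and placement are illustrative, not part of the current API:

```julia
using Tables

# Hypothetical helper: convert any Tables.jl-compatible source
# (a DataFrame, a NamedTuple of vectors, etc.) to the matrix that
# the estimators currently expect; pass matrices through unchanged.
function to_matrix(data)
    if data isa AbstractMatrix
        return data
    elseif Tables.istable(data)
        return Tables.matrix(data)
    else
        throw(ArgumentError("expected a matrix or a Tables.jl-compatible table"))
    end
end
```

Because most tabular packages in the Julia ecosystem implement the Tables.jl interface, one helper like this would cover DataFrames, CSV.jl output, and plain NamedTuples of vectors at once.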
ITS conducts sensitivity analysis by generating random variables and re-estimating the causal effect. Instead, we should implement E-values, as proposed in VanderWeele, Tyler J., and Peng Ding. "Sensitivity Analysis in Observational Research: Introducing the E-Value." Annals of Internal Medicine 167, no. 4 (2017): 268-274. Besides ITS, we should implement this as a test of confounding/exchangeability for all estimators.
The steps to implement are:
1. Start with the observed effect estimate (e.g., mean difference or average treatment effect) from the study, denoted MEAN_OBS.
2. Calculate the lower and upper confidence limits (LL and UL) for MEAN_OBS from the confidence interval of the statistical analysis.
3. To calculate MD_U (minimum strength for the upper limit), start with UL and set up the equation MD_U * MEAN_OBS = UL, which solves to MD_U = UL / MEAN_OBS.
4. To calculate MD_L (minimum strength for the lower limit), start with LL and set up the equation MD_L * MEAN_OBS = LL, which solves to MD_L = LL / MEAN_OBS.
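The calculation described in the steps above can be sketched as follows (the function name is illustrative):

```julia
# Sketch of the bound calculation described above: mean_obs is the
# observed effect estimate, and ll/ul are its confidence limits.
function confounding_bounds(mean_obs::Real, ll::Real, ul::Real)
    md_u = ul / mean_obs  # minimum strength for the upper limit
    md_l = ll / mean_obs  # minimum strength for the lower limit
    return md_l, md_u
end
```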
It's common for researchers to consider an E-value greater than 2 as indicative of a relatively strong association, meaning that unmeasured confounding would need to be at least twice as strong as the observed effect to explain it away. However, this is a heuristic, and the specific threshold for what is considered "high" may vary based on the field and the judgment of the researchers involved in a given study.
Currently, classification is done by just using the sigmoid activation function, which is basically just regression; this could potentially lead to predicted probabilities outside of [0, 1]. Instead, for classification we should use a normal ELM with ReLU or another activation to get raw predictions and then apply the sigmoid to those outputs, similar to the way we use a softmax layer for multiclass classification.
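The proposed change could be sketched like this, assuming the raw predictions come from an ELM with a ReLU (or other) hidden activation; the helper name is hypothetical:

```julia
sigmoid(x) = 1 / (1 + exp(-x))

# Apply the sigmoid to raw (unbounded) ELM outputs so the predicted
# probabilities are guaranteed to lie in (0, 1), analogous to applying
# a softmax layer for multiclass classification.
predict_proba(raw_predictions::AbstractVector{<:Real}) = sigmoid.(raw_predictions)
```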
Convert all the functions and methods from flat case.
Add that for functions with multiple methods there should be a function definition with no parameters in the main module that includes the docstring for all the methods.
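In Julia, this convention looks like the following (`foo` and its methods are placeholders):

```julia
"""
    foo(x)
    foo(x, y)

Docstring that documents all methods of `foo` in one place.
"""
function foo end

foo(x) = x
foo(x, y) = x + y
```

The bare `function foo end` declaration creates the generic function so the shared docstring has something to attach to before any methods are defined.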
Specifically, for the estimators and metalearners we should make a separate risk ratio function; we should find a way to make estimate_causal_effect! for G-computation smaller; estimate_causal_effect! for double machine learning should definitely be broken up into smaller functions; and crossfitting_sets should also be smaller. These smaller functions will then need to be tested.
Pretty self-explanatory
S-learning does almost the same thing as G-computation, and R-learning is almost the same as double machine learning. Instead of encapsulating GComputation and DoubleMachineLearner within SLearner and RLearner and reusing the estimate_causal_effect! methods of GComputation and DoubleMachineLearner, we should create AbstractSingleModelLearner and AbstractDoubleMachineLearner types. Then we can get rid of the encapsulation, the separate methods for them in inference.jl, and the overloading of Base.getproperty and Base.setproperty! to get the Y vectors, and just have estimate_causal_effect! methods for the abstract types. These methods would then do slightly different things for CATE vs. ATE estimation.
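A rough sketch of the proposed hierarchy (the stub type names and estimand helper are illustrative, not the actual API):

```julia
# Shared supertypes so SLearner/GComputation and RLearner/DoubleMachineLearner
# can share estimate_causal_effect! methods instead of delegating through
# encapsulated fields and getproperty/setproperty! overloads.
abstract type AbstractSingleModelLearner end
abstract type AbstractDoubleMachineLearner end

struct GComputationStub <: AbstractSingleModelLearner end
struct SLearnerStub <: AbstractSingleModelLearner end

# One method covers both concrete types; the CATE vs. ATE distinction
# becomes a small branch instead of a duplicated method.
estimand(m::AbstractSingleModelLearner) = m isa SLearnerStub ? "CATE" : "ATE"
```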
They should have separate sections for notes and references and not have "..." to sandwich the arguments section. The updated format should also be in the contributor guidelines.
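A docstring following the proposed format might look like this (the section contents are illustrative):

```julia
"""
    estimate_causal_effect!(model)

Estimate the causal effect for `model`.

# Arguments
- `model`: the estimator to fit.

# Notes
Notes about assumptions or caveats go in their own section.

# References
Reference entries go in their own section.
"""
function estimate_causal_effect! end
```

Note the Arguments section is a plain `# Arguments` header with a bulleted list, with no "..." lines sandwiching it.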
Currently, most of the functions and methods only work with floats, but they should be able to accept any subtype of Real. This should be a very easy fix.
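The change is mostly a matter of loosening type annotations, e.g. (the signature is illustrative):

```julia
# Before: only Float64 vectors accepted
# mean_effect(x::Vector{Float64}) = sum(x) / length(x)

# After: any subtype of Real, so Int, Float32, Rational, etc. all work
mean_effect(x::AbstractVector{<:Real}) = sum(x) / length(x)
```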
Only where the output can be kept short and there is no randomization.
Implement doubly robust estimation to estimate ATE/ATT and CATE.
Specifically, we need to remake the table with all the estimators, update the supported treatment and outcome types, add a star explaining how we clip binary treatments and outcomes, and add these explanations to the tutorials.
Need to go back and look at Kennedy (2022) again. The algorithm he gives is kind of ambiguous and confusing about what split to train on and what split to predict on.
Some possible approaches are:
Self-explanatory
This is essentially the same thing as #36. We will use the clipping function because it preserves class predictions and constrains the predictions within the range of natural values.
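A minimal sketch of such a clipping function (the name and bounds are illustrative):

```julia
# Clip raw predictions into [0, 1] so class predictions are preserved
# and probabilities stay within the range of natural values.
clip_if_binary(x::AbstractVector{<:Real}) = clamp.(x, 0, 1)
```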
Specifically incorporate accuracy because MSE doesn't work for classification.
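A minimal accuracy metric could look like this (the name is illustrative):

```julia
# Fraction of predictions that match the true labels; unlike MSE,
# this is meaningful for classification.
accuracy(y::AbstractVector, ŷ::AbstractVector) = sum(y .== ŷ) / length(y)
```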
This project has gotten pretty large, so there needs to be some more comments.
Re-implement data splitting using rolling folds for temporal data for G-Computation.
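Rolling folds for temporal data could be sketched as follows (the function name and block sizing are illustrative):

```julia
# Split 1:n into contiguous, time-ordered blocks; each fold trains on all
# observations before the block and validates on the block, so the
# temporal order is never violated.
function rolling_folds(n::Integer, folds::Integer)
    size = n ÷ folds
    splits = Tuple{UnitRange{Int},UnitRange{Int}}[]
    for k in 1:(folds - 1)
        train = 1:(k * size)
        validate = (k * size + 1):min((k + 1) * size, n)
        push!(splits, (train, validate))
    end
    return splits
end
```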
Change multi-line function definitions and calls to

foo(arg1, arg2,
    arg3, arg4)
Remove docstrings from every struct field and only put a docstring at the top of each struct.
The current permutation test for p-values using ITS estimates the null distribution by using alternative cutoff points for the pre- and post-intervention periods. Instead, we should permute the order of the post-treatment observations to estimate null effects and use those estimated effects to build the null distribution.
As a reminder, we can easily create permutations using a[shuffle(1:end), :].
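A sketch of the proposed scheme, assuming `estimate_effect` is whatever function re-estimates the causal effect from the (permuted) post-treatment data; the function name and signature are hypothetical:

```julia
using Random

# Estimate the null distribution by permuting the order of the
# post-treatment observations and re-estimating the effect each time;
# the p-value is the share of null effects at least as extreme as the
# observed effect.
function permutation_pvalue(estimate_effect, post::AbstractMatrix, observed::Real; n=1000)
    nulls = map(1:n) do _
        permuted = post[shuffle(1:end), :]  # as noted above
        estimate_effect(permuted)
    end
    return count(x -> abs(x) >= abs(observed), nulls) / n
end
```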
Priority should probably be integration with CUDA, then Intel, then AMD, then Mac. In theory this should be easy, but we should watch for type instability.
While all of the examples in the documentation are taken from the unit tests and have all passed, it would be better to turn as many of them as possible into doctests. However, many of these doctests fail even though they produce correct results, likely because the printed output doesn't match exactly, so we need to figure out how to get the doctests working.
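For reference, a Documenter.jl doctest looks like the following; since the `julia>` output must match byte-for-byte, deterministic examples (fixed seeds, no randomization) are the ones most likely to pass:

````julia
"""
    double(x)

# Examples
```jldoctest
julia> double(2)
4
```
"""
double(x) = 2x
````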
Add a W parameter for covariates for the treatment function in double machine learning, X-learning, doubly robust estimation, and R-learning.
This issue is used to trigger TagBot; feel free to unsubscribe. If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details. If you'd like for me to do this for you, comment TagBot fix on this issue. I'll open a PR within a few hours, please be patient!
Generate interpretations from model validation and return them as strings.