py-why / dodiscover
[Experimental] Global causal discovery algorithms
Home Page: https://www.pywhy.org/dodiscover/
License: MIT License
Yep! It is any Callable with the BaseConditionalIndependenceTest API. However, I will end up refactoring all those to just "functions", as suggested by @bloebp and @petergtz. In a downstream PR this will look like the following: replace BaseConditionalIndependenceTest with Callable[[np.ndarray, np.ndarray, np.ndarray], float] for type hints in the dodiscover/ci/ submodule.
Originally posted by @adam2392 in #30 (comment)
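A minimal sketch of what such a function-style CI test could look like (the Fisher-z form here is just an illustration of the proposed `Callable[[np.ndarray, np.ndarray, np.ndarray], float]` signature, not existing dodiscover code):

```python
import numpy as np
from scipy import stats

def fisherz_ci_test(x: np.ndarray, y: np.ndarray, z: np.ndarray) -> float:
    """Illustrative CI test matching Callable[[ndarray, ndarray, ndarray], float]."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    z = np.asarray(z, dtype=float)
    if z.size == 0:
        z = np.empty((len(x), 0))
    elif z.ndim == 1:
        z = z[:, None]
    if z.shape[1] > 0:
        # partial out the conditioning set via least-squares regression
        zc = np.column_stack([np.ones(len(x)), z])
        x = x - zc @ np.linalg.lstsq(zc, x, rcond=None)[0]
        y = y - zc @ np.linalg.lstsq(zc, y, rcond=None)[0]
    r = np.clip(np.corrcoef(x, y)[0, 1], -0.9999, 0.9999)
    # Fisher z-transform is approximately standard normal under the null
    stat = np.sqrt(len(x) - z.shape[1] - 3) * np.arctanh(r)
    return float(2 * stats.norm.sf(abs(stat)))
```

Any function with this shape could then be passed wherever the refactored API expects a CI test.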
Is your feature request related to a problem? Please describe.
We need a tutorial that demonstrates the utility of constraints in causal discovery.
Describe the solution you'd like
Update the notebook in this PR to work with constraints. This needs to be resolved first.
Describe alternatives you've considered
I think we could actually have multiple notebooks that address this.
Additional context
See links above.
Constraints are not working with the PC algorithm. See code here to reproduce.
This is probably an overlap with Issue 46 but I include it as a bug since we have code to reproduce.
Reproduced on the main branch.
No traceback, no error thrown, just got unexpected results.
OS: OS X
Python version: 3.10.8
pip freeze:
anyio==3.6.2
appnope==0.1.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asttokens==2.0.8
attrs==22.1.0
backcall==0.2.0
beautifulsoup4==4.11.1
bleach==5.0.1
cffi==1.15.1
contourpy==1.0.5
cycler==0.11.0
debugpy==1.6.3
decorator==5.1.1
defusedxml==0.7.1
-e git+https://github.com/py-why/dodiscover.git@c89670d2fcd1cc3333fea5e6ea00b9a095bdb6d1#egg=dodiscover
dowhy==0.8
entrypoints==0.4
executing==1.1.1
fastjsonschema==2.16.2
fonttools==4.38.0
graphviz==0.20.1
idna==3.4
ipykernel==6.16.2
ipython==8.5.0
ipython-genutils==0.2.0
ipywidgets==8.0.2
jedi==0.18.1
Jinja2==3.1.2
joblib==1.2.0
jsonschema==4.16.0
jupyter==1.0.0
jupyter-console==6.4.4
jupyter-server==1.21.0
jupyter_client==7.4.4
jupyter_core==4.11.2
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.3
kiwisolver==1.4.4
MarkupSafe==2.1.1
matplotlib==3.6.1
matplotlib-inline==0.1.6
mistune==2.0.4
mpmath==1.2.1
nbclassic==0.4.5
nbclient==0.7.0
nbconvert==7.2.2
nbformat==5.7.0
nest-asyncio==1.5.6
networkx @ git+https://github.com/adam2392/networkx.git@100c93d213b1f00ac6dff67e5c02bb098060fd9e
notebook==6.5.1
notebook_shim==0.2.0
numpy==1.23.4
packaging==21.3
pandas==1.5.1
pandocfilters==1.5.0
parso==0.8.3
patsy==0.5.3
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.2.0
prometheus-client==0.15.0
prompt-toolkit==3.0.31
psutil==5.9.3
ptyprocess==0.7.0
pure-eval==0.2.2
pycparser==2.21
pydot==1.4.2
Pygments==2.13.0
pyparsing==3.0.9
pyrsistent==0.18.1
python-dateutil==2.8.2
pytz==2022.5
-e git+https://github.com/py-why/pywhy-graphs.git@a5b74ac31d0f02b8b7a35ef0ca01ea10d8436646#egg=pywhy_graphs
pyzmq==24.0.1
qtconsole==5.3.2
QtPy==2.2.1
scikit-learn==1.1.3
scipy==1.9.3
Send2Trash==1.8.0
six==1.16.0
sniffio==1.3.0
soupsieve==2.3.2.post1
stack-data==0.5.1
statsmodels==0.13.2
sympy==1.11.1
terminado==0.17.0
threadpoolctl==3.1.0
tinycss2==1.2.1
tornado==6.2
tqdm==4.64.1
traitlets==5.5.0
typing_extensions==4.4.0
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.4.1
widgetsnbextension==4.0.3
Go through the notebook in [this PR](https://github.com//pull/67), and then run this code (imports added here for completeness; the exact module paths are best guesses):

```python
import networkx as nx
from dodiscover import make_context
from dodiscover.ci import GSquareCITest
from dodiscover.constraint import PC

# `data` is the discrete DataFrame loaded in the linked notebook
included_edges = nx.DiGraph([('S', 'L'), ('S', 'B')])
context = make_context().variables(data=data).edges(include=included_edges).build()

ci_estimator = GSquareCITest(data_type="discrete")
pc = PC(ci_estimator=ci_estimator)

def convert_to_int(df):
    # note: the original snippet mutated the global `data`; fixed to use `df`
    for var in df.columns:
        df[var] = [1 if x == "yes" else 0 for x in df[var]]
    return df

data_mod = convert_to_int(data)
pc.fit(data_mod, context)

graph = pc.graph_
draw(graph)  # plotting helper from the notebook
```
Is your feature request related to a problem? Please describe.
We need ways to evaluate the performance of causal discovery on some ground truth
Describe the solution you'd like
J. Peters and P. Bühlmann: "Structural intervention distance (SID) for evaluating causal graphs", Neural Computation 27, pages 771-799, 2015.
Describe alternatives you've considered
In causal terms, this is preferable to structural Hamming distance, which is also proposed in a separate issue.
Additional context
Is your feature request related to a problem? Please describe.
Implement GIN as a native algorithm.
Describe the solution you'd like
We have a wrapper around causal-learn for the GIN algorithm. The GIN algorithm uses kernel-based conditional independence tests. So do we. At the very least, we can leverage our KCI test and other CI tests. In my view, GIN is a constraint-based algorithm (that learns latents) and should live with the other constraint-based algorithms. @adam2392
A native implementation would help us apply constraints as well.
Describe alternatives you've considered
Could stay in a separate directory for causal rep related algorithms.
Is your feature request related to a problem? Please describe.
No causal discovery library worth its salt supports only observational causal discovery.
Describe the solution you'd like
Hard interventions: for data in a data frame, I imagined a list of lists, where the jth element of the outer list is a list of column indices that were intervened on in the jth row. Although, this already feels like there is a missing abstraction that needs definition.
Soft interventions: a list, where the jth element corresponds to the value the soft intervention had in the jth row of the data.
Describe alternatives you've considered
Perhaps an intervention column in the data itself.
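To make the list-of-lists idea concrete, here is a rough sketch (the helper name and representation are hypothetical, not settled API) that turns per-row intervention targets into a boolean mask aligned with the data:

```python
import pandas as pd

def intervention_mask(df, interventions):
    """Hypothetical helper: interventions[j] lists the column indices
    that were hard-intervened on in row j of df."""
    mask = pd.DataFrame(False, index=df.index, columns=df.columns)
    for row, cols in enumerate(interventions):
        if cols:
            mask.iloc[row, cols] = True
    return mask

df = pd.DataFrame({"x": [0.1, 0.5, 0.9], "y": [1.0, 0.2, 0.4]})
# row 0: observational; row 1: do(x); row 2: do(x) and do(y)
mask = intervention_mask(df, [[], [0], [0, 1]])
```

A mask like this could also be derived from an intervention column stored in the data itself, so the two alternatives are interchangeable downstream.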
Right now, I believe the PC algorithm requires discrete variables to be integers instead of characters.
I tried running PC on this data:
| A | S | T | L | B | E | X | D |
|---|---|---|---|---|---|---|---|
| no | yes | no | no | yes | no | no | yes |
| no | yes | no | no | no | no | no | no |
| no | no | yes | no | no | yes | yes | yes |
| no | no | no | no | yes | no | no | yes |
| no | no | no | no | no | no | no | yes |
But it threw an error. To get it to work, I had to convert the values to ints:

```python
def convert_to_int(df):
    # note: fixed to operate on `df` rather than the global `data`
    for var in df.columns:
        df[var] = [1 if x == "yes" else 0 for x in df[var]]
    return df

data_mod = convert_to_int(data)
pc.fit(data_mod, context)
```

Calling this a bug. pc.fit(data, context) should work.
LGTM. My only issues are with some of the type hints.
@adam2392 can you comment here on why some flavors are left out? For example:
Originally posted by @robertness in #16 (review)
Is your feature request related to a problem? Please describe.
Implement: https://www.jmlr.org/papers/volume10/li09a/li09a.pdf
Describe the solution you'd like
Basically, perform an FDR procedure on the sets of p-values over the entire graph.
Note that the storage cost will go up, since we need to keep track of these values.
Additional context
Came up in #18
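As a minimal sketch of the underlying step, the Benjamini-Hochberg step-up procedure over a flat array of edge p-values could look like this (integration with the graph data structures is left open):

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up BH procedure; returns a boolean rejection mask per hypothesis."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # rank-scaled thresholds: alpha * i / m for rank i = 1..m
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank passing its threshold
        reject[order[: k + 1]] = True
    return reject
```

In the proposed feature, the p-values would be those recorded per candidate edge during skeleton learning, and the mask would decide which adjacencies survive.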
The general issue here is that many structure learning algorithms are sort of "state dependent" and thus it is difficult to parse what is occurring before the final result (i.e. graph) is returned. Logging is a way for developers and users to debug their pipeline, but it should also be easy to turn on/off various levels of messaging.
I think we can possibly emulate something like MNE-Python's solution, or some other package's: https://mne.tools/stable/logging.html
"There should be an option to turn off logging information. Maybe make the logger an optional parameter that can be passed or add flag to turn it off. (This applies to all logging/print information)"
Originally posted by @bloebp in #20 (comment)
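A sketch of the MNE-style approach, using a package-level logger (the logger name and helper function are assumptions, not existing dodiscover API):

```python
import logging

logger = logging.getLogger("dodiscover")  # hypothetical package-level logger
logger.addHandler(logging.NullHandler())  # silent unless the user opts in

def set_log_level(verbose=True):
    """Toggle verbosity globally, similar in spirit to mne.set_log_level."""
    if isinstance(verbose, bool):
        level = logging.INFO if verbose else logging.WARNING
    else:
        # accept string levels such as "debug" or "error"
        level = getattr(logging, str(verbose).upper())
    logger.setLevel(level)
```

Algorithms would then call `logger.info(...)`/`logger.debug(...)` instead of `print`, and a per-estimator `verbose` keyword could temporarily override the global level.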
Is your feature request related to a problem? Please describe.
We need notebooks for our algorithms.
Describe the solution you'd like
A notebook describing the FCI algorithm on a public benchmark dataset
Is your feature request related to a problem? Please describe.
Add causal constraints to Causica.
Most likely we will be moving the CI tests from dodiscover/dowhy to a new CI testing repo. Then this thread talks about some of the design choices of type of input to expect.
Originally posted by @adam2392 in #25 (comment)
https://app.codecov.io/gh/py-why/dodiscover/tree/main/dodiscover shows the coverage from our unit testing. A robust software package should aim for at least 90% test coverage.
This is feasible for us given that many of the APIs and algorithms have expected usage. The main components that we DEFINITELY should improve upon are:
- Context is a central ingredient of our API, so we should have full test coverage of its expected errors and edge cases.

The first two should be relatively straightforward upon inspection of the codecov files. However, adding additional tests to the causal discovery algos may require some more thinking, since these need to test different parametrizations of the algorithms.
We are interested in a new user contribution of an example where they demonstrate how to use the ClassifierCITest
with a scikit-learn classifier and a neural-net based classifier.
This would enable CI testing over high-dimensional objects, such as images and pixels for example. The example does not necessarily need to operate over images/pixels.
Ideally the PyTorch model is already trained and fitted, and is just pulled from something like Hugging Face. We can introduce a test-time dependency on PyTorch and Hugging Face. The model should be relatively small, so it can run on the continuous-integration platform.
"Could we expand this method to work with a neural-net based classifier using Keras or PyTorch? It likely feels like overkill, but it would be a first step to telling a story to the broader community that this repo is using cutting-edge "deep" methods. We could also workshop an example with multidimensional variables later (e.g. pixels), though not in this PR.
Originally posted by @robertness in #28 (review)"
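The core classifier-based trick can be sketched independently of the dodiscover API (function name and details here are illustrative, not the ClassifierCITest signature): train a classifier to distinguish real samples from samples where Y has been permuted; test accuracy near chance suggests X and Y are independent.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def classifier_independence_score(x, y, clf=None, seed=0):
    """Accuracy of real-vs-permuted classification; ~0.5 when X is independent of Y."""
    rng = np.random.default_rng(seed)
    real = np.column_stack([x, y])
    fake = np.column_stack([x, rng.permutation(y)])  # break any X-Y dependence
    X = np.vstack([real, fake])
    labels = np.r_[np.ones(len(real)), np.zeros(len(fake))]
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=seed)
    clf = clf if clf is not None else RandomForestClassifier(
        n_estimators=100, random_state=seed
    )
    clf.fit(X_tr, y_tr)
    return clf.score(X_te, y_te)
```

Any scikit-learn-compatible classifier (including a pretrained neural network wrapped to expose fit/score) can be dropped in, which is the substitution the requested example would demonstrate.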
LGTM. One comment. The CMI value is calculated as follows:

```python
val = hxyz - (hxz + hyz - hz).mean()
```

Doesn't this have an asymptotic chi-squared distribution under the null hypothesis? If so, should there be an option to calculate the p-value that way?
Originally posted by @robertness in #85 (review)
Hi @adam2392,
just as a heads-up: by using type hints in
dodiscover/dodiscover/_protocol.py
Line 10 in 3c4f8d2
it could be that type checkers will report a type violation when passing in a networkx Graph for this protocol, because networkx does not use type hints. We tried that originally and then removed them again, because IntelliJ would show an error, and I'd assume other type checkers will do the same.
Feel free to close this issue after reading, just wanted to make you aware of it.
One may easily extend the code in dodiscover/metrics.py for the confusion_matrix_networks function to allow comparisons based on edge directionality. This is an ALL-OR-NONE type of comparison, where we either test for the presence of an adjacency, or we test for absolute correctness of the adjacency and the directionality of the edge.
Right now, the function only implements testing for the presence of an adjacency. See Robert's comments for more details.
I would say either you have an option for strict matching of edge type, or just presence of an edge. So if the true edge is A <- B and the detected edge is A -> B or A - B, then this would be a false negative in the former case and a true positive in the latter. I think you're just doing the latter case. I think that is the preferred case, and it's fine if we just have that.
Originally posted by @robertness in #48 (comment)
Reposting from @bloebp and Jaron
https://github.com/cmu-phil/example-causal-datasets for example datasets
Below are real/synthetic datasets used in dowhy.
"You can generate new ones (see the code at the bottom). Otherwise, the notebook uses this:
Latencies:
https://github.com/py-why/dowhy/blob/main/docs/source/example_notebooks/rca_microservice_architecture_latencies.csv
Supply chain:
https://github.com/py-why/dowhy/blob/main/docs/source/example_notebooks/supply_chain_week_over_week.csv"
Based on the larger discussion today, we would like toy examples in the examples/ directory for communicating core concepts, and tutorials/ for larger, more involved code and documentation that works with more complex datasets, such as the above.
What would be nice is a tutorial for the key structure learning algorithms that leverages the above two datasets and demonstrates the difference in output.
In addition, it would be nice to have a short, sweet example for each core algo.
This Issue is just keeping track of that discussion.
Is your feature request related to a problem? Please describe.
We need more ways to compare a discovered graph to a ground truth graph.
Describe the solution you'd like
Overlay two graphs using graph viz, and use different edge colors to indicate:
Describe alternatives you've considered
We've proposed structural Hamming distance and confusion matrices. However, graph visualization is more intuitive, and it shows you exactly where in the graph you made your errors, which can be much more informative than a single statistic.
Additional context
This exists in bnlearn though it is not clear to the user which graph is the graph being evaluated (the discovered graph) and which graph is the reference graph (the ground truth).
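The classification bookkeeping behind such an overlay can be sketched like this (color choices and function name are placeholders; actual rendering would pass these colors to graphviz):

```python
import networkx as nx

def edge_colors(truth, learned):
    """Label each edge for a colored overlay of a learned graph vs. ground truth."""
    colors = {}
    for edge in set(truth.edges) | set(learned.edges):
        in_truth, in_learned = edge in truth.edges, edge in learned.edges
        if in_truth and in_learned:
            colors[edge] = "black"  # true positive
        elif in_learned:
            colors[edge] = "red"    # false positive (extra edge)
        else:
            colors[edge] = "blue"   # false negative (missed edge)
    return colors

truth = nx.DiGraph([("A", "B"), ("B", "C")])
learned = nx.DiGraph([("A", "B"), ("C", "D")])
colors = edge_colors(truth, learned)
```

Making the truth/learned argument order explicit in the signature also addresses the bnlearn ambiguity noted above about which graph is the reference.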
The team has proposed an API for causal discovery in: https://github.com/py-why/dowhy/wiki/API-for-Global-Causal-Discovery
Porting over a relevant example algorithm such as the PC algorithm will require:
Ask @adam2392 for details. Involves refactoring how the algorithm deals with colliders. Requires only one method to be rewritten.
Is your feature request related to a problem? Please describe.
Would like the graphviz.compare feature from bnlearn.
graphviz.compare() plots one or more figures and returns invisibly a list containing the graph objects generated from the networks that were passed as arguments (in the same order). They can be further modified using the graph and Rgraphviz packages.
graphviz.compare(groundtruth, learned_graph)
Describe the solution you'd like
Basically, you provide a ground truth graph and a discovered graph. Edge color and type is used to
You get a false positive in the case of comparing a directed edge against an undirected edge, e.g.:

```r
net <- model2network("[B][A|B]")
graphviz.compare(cpdag(net), net)
```
Describe alternatives you've considered
We have some other graph comparison
Is your feature request related to a problem? Please describe.
Need to apply constraints, if possible, to the GIN algorithm. If this is not possible with the causal-learn interface, we need to surface that to the user.
Describe the solution you'd like
Constraints in the context variable should be applied to GIN learning.
Describe alternatives you've considered
Could just have a warning message.
Additional context
Initial GIN wrapper PR
Hi @adam2392,
I recommend renaming
dodiscover/dodiscover/_protocol.py
Line 6 in 3c4f8d2
to just Graph. In Java it became kind of an anti-pattern to name an entity ISomeEntity for interfaces, and I would apply the same here. The reason is that it gives you less flexibility in your design to change from a concrete class to a protocol or back, because you'd have to change the name too, everywhere it's used.
Is your feature request related to a problem? Please describe.
Currently, the FCI algorithm implementation does not implement rules R5-R8, which account for learning graphs with selection bias.
It would be a straightforward issue for someone to add these to the existing implementation.
Describe the solution you'd like
Implement R5-8 of the FCI algorithm. See publication.
- Implement _apply_rule5, _apply_rule6 and _apply_rule7
- Hook them into _apply_rules_1to10
- Add unit tests in tests/unit_tests/constraint/test_fcialg.py

Probably the most involved part would be implementing the unit tests. The current graph API is a work in progress.
Required and excluded edges are not taken into account at the orientation phase (i.e. setting edge directions) of constraint-based structure learning algos.
There are a few challenges though to tackle:
"Required edges and excluded edges are built into the LearnSkeleton class right now, meaning they do not account for directionality. However, these can also be taken into account at the orientation phase...
There currently is a tradeoff that theory has yet to answer:
^ the nuanced challenge here is that the resulting object is NOT a CPDAG/PAG, and thus would not be expected to work with the downstream ID/estimation algorithms that assume an input CPDAG/PAG. I'm leaning towards attaching a warning string to the graph object that, when printed out, warns users not to use it in ID/estimation if they set prior edges that might break the conditional-independence statements.
Originally posted by @adam2392 in #30 (comment)"
Is your feature request related to a problem? Please describe.
The Psi-FCI algorithm is an extension of the FCI algorithm that is complete for Psi-PAGs. The algorithm can learn a Psi-PAG given a combination of observational and soft-interventional datasets. This will be a good introduction of a more complex multi-dataset algorithm that we can start taking advantage of to flesh out the Context/data API. Moreover, it will help us test the limits of the existing graph and dowhy.gcm API.
We should implement the algorithm as outlined in https://causalai.net/r67.pdf.
Describe the solution you'd like
The Psi-FCI should subclass the existing FCI algorithm. In terms of graphical structure, there is additional graph API to consider, the main part being the encoding of what are known as "F-nodes". Another issue to sort out is how to test for conditional distribution differences (i.e. is P_1(y|x) = P_2(y|x)?), which is a central problem within Psi-FCI and other multi-dataset structure learning algos.
- testing P_1(y|x) = P_2(y|x)
- the fit(data, context) / learn_graph(data, context) API

Additional context
cc: @robertness @darthtrevino
Is your feature request related to a problem? Please describe.
Before implementing a native version, a first step could be to demonstrate the usage of GIN by, for example, replicating one of the simulation results in the original paper. One should note that the algorithm only has guarantees when the observed variables have no direct causal relationships with each other; this should be noted in the tutorial.
Afterwards, a native implementation (#87) can take place that can use the unit tests and the tutorial to check for correctness.
Describe the solution you'd like
Add a notebook to docs/tutorials, or add a short Python script example to examples/. I think since this is "explaining an algorithm", it would be more "in-depth" and should therefore be a tutorial.
Additional context
cc: @robertness wanna add this, since you're more familiar?
In constraint-based causal discovery, there are various types of "conditional testing". In traditional algos, like PC/FCI, one typically has variables of the same type (i.e. binary, discrete, continuous) and one is interested in the conditional independence test:
$H_0: X \perp Y | Z$
On the other hand, modern algorithms that leverage multiple datasets, such as the Psi-FCI algorithm, inherently want to perform a conditional k-sample test:
$H_0: P_t(Y|X) = P_{t'}(Y|X)$
i.e., testing for differences in the conditional distributions of Y|X.
Right now the API is slightly different, but they could be consolidated.
The test(..) API could probably have the following:
- df: the dataset
- x_vars: X variable(s)
- y_vars: Y variable(s)
- z_vars: Z variables in CI testing, or the group indicator in CD testing

"That directory handles CI testing, which tests $H_0: P(y | x, z) = P(y | z)$, where typically x, y, z have to be of the same variable type (i.e. binary, discrete, continuous), while conditional discrepancy (CD) tests $H_0: P(y | x) = P'(y | x)$, or formulated as $H_0: P(y|x,t) = P(y|x)$, where t is discrete and y, x can be continuous.
I guess they could... be formulated as the same arguments in the API. Perhaps I can refactor the code, if there is a lot of similarity between the ci and cd tests, into a common conditional_testing submodule."
Originally posted by @adam2392 in #82 (comment)
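One way the consolidated signature could be expressed (a sketch using typing.Protocol; the names follow the proposal above but are not settled API):

```python
from typing import Optional, Protocol, Set, Tuple, runtime_checkable

import pandas as pd

@runtime_checkable
class ConditionalTest(Protocol):
    """Shared surface for CI tests and conditional-discrepancy (CD) tests."""

    def test(
        self,
        df: pd.DataFrame,
        x_vars: Set[str],
        y_vars: Set[str],
        z_vars: Optional[Set[str]] = None,  # conditioning set, or group indicator in CD
    ) -> Tuple[float, float]:  # (test statistic, pvalue)
        ...
```

Both the existing ci and cd implementations could then satisfy this one protocol, which is what would let them live in a common conditional_testing submodule.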
Is your feature request related to a problem? Please describe.
We need more ways to compare a discovered graph to a ground truth graph.
Describe the solution you'd like
Structural Hamming distance (SHD) is one solution. In simple terms, it is the number of edge insertions, deletions, or flips needed to transform one graph into another. Tsamardinos I, Brown LE, Aliferis CF (2006). "The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm". Machine Learning, 65(1):31–78.
Describe alternatives you've considered
Treating edges as labels and providing a confusion matrix is already a work in progress.
Created an issue for comparing graphs through visualization
Additional context
Implementation in CDT
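A minimal sketch of SHD over two networkx DiGraphs, counting a reversed edge once (conventions vary across implementations; CDT's is one reference point):

```python
import networkx as nx

def structural_hamming_distance(true_graph, learned_graph):
    """Edge insertions + deletions + flips to turn learned_graph into true_graph."""
    true_edges = set(true_graph.edges)
    learned_edges = set(learned_graph.edges)
    # missing edges, with flipped edges counted once here
    dist = len(true_edges - learned_edges)
    # extra edges that are not merely flips of a true edge
    dist += sum(1 for u, v in learned_edges - true_edges if (v, u) not in true_edges)
    return dist
```

A CPDAG-aware variant would additionally treat undirected edges specially, which is where the confusion-matrix discussion above overlaps.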
Tracking what it will take to do an initial release for v0.1. We should do a simultaneous release of pywhy-graphs v0.1, so that the two stay in sync.
Some left-over PRs I want to get merged:
- pin pywhy-graphs to v0.1 once we release there
- Context and the constraint-based causal discovery

Other possible improvements we can stack in are #29, but this really also requires someone to champion it.
The classifier-based CI tests are nice because they are generally non-parametric, but slow, because for each test you need to re-fit a classifier. For modular components, we should optionally cache predictions.
For example:
Another issue is that we should ensure the classification model is refit from scratch each time ci_estimator.test(...) is called. Downstream classification models in scikit-learn might support warm starts, and this may be the subject
Create a nested dictionary
For CCMI:
For CCIT:
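The caching idea could be sketched like this (class and callback names are hypothetical): key the expensive fit-and-predict step on the variable sets, so repeated tests over the same variables skip the refit.

```python
class PredictionCache:
    """Hypothetical memoization layer for classifier-based CI tests."""

    def __init__(self, fit_predict):
        # fit_predict(df, x_vars, y_vars, z_vars) -> predictions (the expensive step)
        self._fit_predict = fit_predict
        self._cache = {}
        self.n_fits = 0

    def predictions(self, df, x_vars, y_vars, z_vars=()):
        # frozensets make the key order-independent
        key = (frozenset(x_vars), frozenset(y_vars), frozenset(z_vars))
        if key not in self._cache:
            self.n_fits += 1
            self._cache[key] = self._fit_predict(df, x_vars, y_vars, z_vars)
        return self._cache[key]
```

The cache must be cleared whenever the data changes, which is one reason refitting from scratch on every test(...) call is the safe default.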
Is your feature request related to a problem? Please describe.
We need an initial score-based algorithm.
Describe the solution you'd like
Use causal-learn as a dependency to build GES
Describe alternatives you've considered
We could build it on our own, but causal-learn has a solid implementation and there is no need to reinvent the wheel
Is your feature request related to a problem? Please describe.
Add a basic wrapper to the DECI algorithm in the causica library.
Describe the solution you'd like
At a first pass, implement with no constraints and no intervention model. Make this an optional dependency.
Describe alternatives you've considered
Causica is a good example of a SCM-based causal discovery algorithm that utilizes deep learning.
The results of causal discovery algorithms should be comparable to one another, such that a user can see how well an algorithm performed without having to understand the details of the algorithm.
Currently, we do copying of Context within the ContextBuilder. I think this is fine for simplifying the API of Context, but adding new properties to Context then necessarily means we must update this one area of copying, making it possible for bugs to be introduced.
```python
def make_context(context: Optional[Context] = None) -> ContextBuilder:
    result = ContextBuilder()
    if context is not None:
        # this part needs to be updated manually every time
        result.graph(deepcopy(context.init_graph))
        result.edges(deepcopy(context.included_edges), deepcopy(context.excluded_edges))
        result.variables(copy(context.observed_variables), copy(context.latent_variables))
        result.state_variables(deepcopy(context.state_variables))
    return result
```
Instead, I propose to dynamically get all possible parameter names, similar to how scikit-learn does it. We would add _get_params() and _set_params() to the Context class, and then change the above LOC to:
```python
def make_context(context: Optional[Context] = None) -> ContextBuilder:
    result = ContextBuilder()
    if context is not None:
        context_params = context.get_params()
        result.set_params(context_params)
    return result
```
We can make get_params/set_params private to discourage users from manipulating their context classes themselves.
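A sketch of the scikit-learn-style introspection (using a toy stand-in class; it assumes Context's constructor parameters mirror its attributes, which is how sklearn's BaseEstimator.get_params works):

```python
import inspect
from copy import deepcopy

class Context:
    """Toy stand-in illustrating the proposed introspection-based _get_params."""

    def __init__(self, init_graph=None, included_edges=None, observed_variables=None):
        self.init_graph = init_graph
        self.included_edges = included_edges
        self.observed_variables = observed_variables

    def _get_params(self):
        # discover parameter names from the constructor signature, like sklearn
        names = [
            p for p in inspect.signature(type(self).__init__).parameters if p != "self"
        ]
        return {name: deepcopy(getattr(self, name)) for name in names}

ctx = Context(observed_variables={"x", "y"})
params = ctx._get_params()
```

With this, make_context never needs updating when a new Context property is added, as long as the new property appears in __init__.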
Implement the GIN causal discovery algorithm. GIN learns latent causes that cluster observed variables. It also learns causal orderings between those latent causes.
References:
This epic targets a variety of baseline causal discovery algorithms in dodiscover. The goal is to have a MVP as a discovery library.
Implement an algorithm from the class of causal discovery algorithms that learn a hierarchical tree over an observed set of variables.
The primary reference is: Xie, Huang, Chen, He, Geng, Zhang, "Estimation of Linear Non-Gaussian Latent Hierarchical Structure," arXiv 2022.
Related algorithms: