Git Product home page Git Product logo

pywhy-graphs's People

Contributors

adam2392 avatar amit-sharma avatar aryan26roy avatar dependabot[bot] avatar jaron-lee avatar petergtz avatar siebert-julien avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

pywhy-graphs's Issues

Check the Validity of an MAG

Is your feature request related to a problem? Please describe.
Add the ability to check if the provided graph is a valid MAG.

This is a sub-procedure of #73

Describe the solution you'd like
A function in the algorithm section that returns a bool.

[ENH] Test the pag_to_mag function with PAG containing "--o" edge.

Is your feature request related to a problem? Please describe.
Currently there is a code path in the function that requires a PAG with a --o edge.

Describe the solution you'd like
Generate a MAG-PAG pair using selection bias and add it as a test.

Additional context
This is leftover work from #93.

Refactor or redesign implementation of graphs with directed and undirected edges

Is your feature request related to a problem? Please describe.
At the moment there is a graph class pywhy_graphs.CPDAG that handles graphs with directed and undirected edges. However, such graphs appear in other contexts throughout causal inference; for example, chain graphs. These graphs have the same representation but have a distinct and separate meaning.

However, simply using the CPDAG class to represent chain graphs could be confusing as these concepts are distinct in the literature.

Describe the solution you'd like
A simple way to address this is to rename the CPDAG class to something more generic. Unfortunately the correct term (Acyclic Directed Mixed Graph) already applies to graphs with bidirected edges in place of undirected edges; thus another term like "acyclic directed undirected graph" might be considered.

CPDAGs and chain graphs would use this new class to implement their graphical representations. The onus would be on the user to pass this to downstream functions in a way that makes sense (e.g. it would be up to the user to not take output from the PC algorithm and fit a chain graph model to it, for example).

Describe alternatives you've considered
Implement a separate chain graph class, with potential refactoring of the reusable parts for both into a parent class. This would leave recognizable API graph classes that might help users who are familiar with the literature to use the software.

Ability to have layout of graph returned by viz.draw match that of another graph.

Is your feature request related to a problem? Please describe.
Visual comparison of discovered graphs. For example, the following is awkward because the layouts are different.

image

Describe the solution you'd like
I'd like the layout of a graph returned by viz.draw to match the layout of another graph.

Describe alternatives you've considered
Something like this.

layout = get_dot_layout(G1)
draw(G2, layout=layout)

Setting the layout of a graph seems simple with the "pos" argument, but I'm not sure how to easily get the pos values when they are set by a layout engine like dot.

A function for enumerating front-door adjustment sets

Is your feature request related to a problem? Please describe.
In causal estimation tasks, one is interested in typically using front door adjustment to estimate the total causal effects given a fully specified causal graph (i.e. nx.DiGraph, or pywhy_graphs.ADMG).

Describe the solution you'd like
Implement https://arxiv.org/pdf/2210.05816.pdf and refactor https://github.com/CausalAILab/FrontdoorAdjustmentSets to use the pywhy_graphs API.

Describe alternatives you've considered
Note that the algorithm does not work for Markov equivalence classes.

Implement graph generator functions for common causal graphs

Is your feature request related to a problem? Please describe.
They should be under pywhy_graphs.generators.*, where each function corresponds to some graph that is generated.

Describe the solution you'd like
The following graphs are very standard in causal inference:

  1. napkin graph
  2. bow graph
  3. iv graph
  4. front door graph
  5. back door graph

These can be implemented using the ADMG class.

An example pseudocode would be:

def napkin_graph():
     # implementation

Note this differs from networkx-API for generators, because there is no concept of a create_using. Since these are causal graphs with mixed-edges, they would always use a MixedEdgeGraph class.

A function for enumerating d-separators in causal graphs

Is your feature request related to a problem? Please describe.
In many cases, we are interested in generating/querying d-separators from a causal graph. Polynomial time-delay algorithms exist for such a task and should be added to the pywhy_graphs.algorithms submodule.

Describe the solution you'd like
Implement https://proceedings.mlr.press/v115/van-der-zander20a.html for ADMG, nx.DiGraph and CPDAG.

This should operate over the subgraphs property in the MixedEdgeGraph that operates over different edge's subgraphs.

Additional reference: https://arxiv.org/pdf/1803.00116.pdf

Adding codecov and circleCI

Hi,

I do not have org privileges, so is it possible for someone to add us to circleCI and codecov and setup this repo?

@amit-sharma @robertness

My strategy for CI for most of my OSS work has always been:

  • all testing done on GH actions
  • building docs on circleCI to offload some burden
  • codecov to track unit-testing code coverage

[Networkx] Rewrite internals of `MixedEdgeGraph` to have a similar API, but simplified internals

A lot of the mixedEdgeGraph internals was essentially copied over from networkx. I think with the assumption that "maybe" we could've kept similar internal data structures like _adj, _nodes, etc. Pretty much the only thing that works is _nodes because _adj is a very specific format for storing nested dicts. This issue is just to track what needs to eventually get rewritten for robustness.

Some of the existing code can be simplified because we don't have some of these underlying internals. Rather we let the networkx subgraphs deal w/ them. For example:

  • subgraph does't work with subclasses

Add support for undirected edges in m-separation

Is your feature request related to a problem? Please describe.
Currently the m-separation function in pywhy_graphs/networkx/algorithms/causal/m_separation.py only supports directed and bidirected edges.

Describe the solution you'd like
Extend the m-separation function to incorporate logic over undirected edges.

Describe alternatives you've considered
Could implement this in a separate function, but I anticipate the implementation will leverage machinery that exists already in the m-separation function.

Additional context
This function would be useful because currently dodiscover/ci/oracle.py has an Oracle class that accepts any graph (including a PAG) but does not support dealing with undirected edges, since it uses. This allows a user to create a CI Oracle that does not detect dependence where undirected edges are involved, hampering downstream tasks like e.g. verifying FCI works correctly under selection bias.

A function for enumerating backdoor adjustment sets in causal graphs

Is your feature request related to a problem? Please describe.
In causal estimation tasks, one is interested in typically using covariate adjustment to estimate the total causal effects given a fully specified causal graph (i.e. nx.DiGraph, or pywhy_graphs.ADMG).

In addition, these sets exist for Markov equivalence classes as well: pywhy_graphs.CPDAG and pywhy_graphs.PAG

Describe the solution you'd like
Implement https://www.jmlr.org/papers/volume18/16-319/16-319.pdf construction algorithm.

The implementation should be careful to note computational complexity of the algorithms. For example, many graph set listing algorithms are exponential in the number of variables, so ideally this function returns a generator, rather than a fully specified list.

NetworkX related CI checks fail for Python 3.8

The networkX main tests keep failing in CI.

Checklist

  • I have verified that the issue exists against the main branch.
  • I have read the relevant section in the contribution guide on reporting bugs.
  • I have checked the issues list for similar or identical bug reports.
  • I have checked the pull requests list for existing proposed fixes.
  • I have checked the CHANGELOG and the commit log to find out if the bug was already fixed in the main branch.
  • I have included in the "Description" section below a traceback from any exceptions related to this bug.
  • I have included in the "Related issues or possible duplicates" section beloew all related issues and possible duplicate issues (If there are none, check this box anyway).
  • I have included in the "Environment" section below the name of the operating system and Python version that I was using when I discovered this bug.
  • I have included in the "Environment" section below the output of pip freeze.
  • I have included in the "Steps to reproduce" section below a minimally reproducible example.

Description

Python traceback:

Related issues or possible duplicates

  • None

Environment

OS:

Python version:

Output of pip freeze:

Steps to reproduce

Example source:

Uncovered circle paths in PAGs

Is your feature request related to a problem? Please describe.
The FCI algorithm R5 requires looking for uncovered circle paths, so we want a generic graph traversal function for circle paths.

Describe the solution you'd like
Implement pywhy-graphs function for uncovered circle paths that works on PAGs such that the FCI algorithm R5 can use it.

Describe alternatives you've considered
We may or may not refactor uncovered_pd_path.

Additional context
Original post: py-why/dodiscover#53 (comment)

cc: @jaron-lee

Checking the validity of a constructed PAG

Is your feature request related to a problem? Please describe.
Whenever a PAG is constructed, it is not necessary a valid PAG in the sense of the definition. I.e. it might be a PAG with additional orientations.

However, for many use-cases, one might want to construct only a valid PAG. So they would like to check its validity in a graphical algorithm.

Describe the solution you'd like
As suggested by @jdramsey in https://reader.elsevier.com/reader/sd/pii/S0004370208001008?token=1C0B24A74D739F3C8D05A0D2DEC40AA540835B1E58C74D189AE09D9C6FABBF30C941385A4A1F9FC9DFAE1E316E022D1E&originRegion=us-east-1&originCreation=20230320132639, Theorem 2 provides a method for turning a PAG into a MAG. This was originally suggested by Peter Spirtes.

Imo, this is an extremely valuable algorithm in terms of applications.

Then one can check the validity of the MAG using ancestrality and maximality. If there is a violation of the definition of the MAG, then the PAG is essentially invalid.

This can be implemented as a graph algorithm.

Additional context
xref: cmu-phil/tetrad#1556

Convert PAG to MAG

Is your feature request related to a problem? Please describe.
I want to be able to convert a PAG to an MAG.

Describe the solution you'd like
A function that takes in a PAG and outputs a valid MAG.

Describe alternatives you've considered

Additional context
Needed for #67

[ENH] Array representation enabling compatibility with pcalg, causal-learn and other packages

Is your feature request related to a problem? Please describe.
In most causal graph packages, the assumption of the underlying data structure is a structured array representation where certain numbers mean certain edges/endpoints. This is partially due to the lack of a robust graph data structure that is commonly accepted in the community (eg. NetworkX does not support causal graphs with mixed edges).

While this package aims to stand up causal graphs with an API layer that leverages networkx, it seems plausible and beneficial to be able to store with roundtrip IO array-like representations of the causal graphs. Moreover, it would be beneficial to have these array-like representations able to go back and forth with the structured array representations of pcalg in R, causal-learn and more.

Describe the solution you'd like
There should be a number of functions implemented:

def graph_to_array(G: MixedEdgeGraph, pattern: str = 'causal-learn'):
    if pattern == 'causal-learn':
          arr = _graph_to_causal_learn(G)
    elif pattern == 'pcalg':
          arr = _graph_to_pcalg(G)
   return arr

def _graph_to_causal_learn(G):
     # defines how to convert edges to an array for G
     ...

def _graph_to_pcalg(G):
    # defines how to convert edges to an array for G
     ...

In this way, it is very easy to then also implement compatibility among also the different packages that use structured array backends:

def causal_learn_to_pcalg(arr):
    # example
    arr[np.argwhere(arr == Endpoint.tail)] = pcalgEndPoint.tail
    ...

causal-learn enumeration: https://github.com/cmu-phil/causal-learn/blob/main/causallearn/graph/Endpoint.py
pcalg enumeration for admg/cpdag/pag: https://cran.r-project.org/web/packages/pcalg/pcalg.pdf

Note that pcalg relies on an asymmetric matrix, while causal-learn can leverage essentially an upper-triangular matrix to represent each graph.

Additional context
There are some open questions that need to be discussed with pywhy team that are independent of adding this feature in pywhy-graphs:

  1. What should the return types / internal representation in dodiscover need to be? What about dowhy? etc.?
  2. What should the dependency strategy among then pywhy repos be eventually? Rn obviously dowhy is the base. But does that make sense? E.g. dowhy contains graph stuff, conditional independence tests, etc. At least for CI tests it is presumed that would be migrated to an upstream package altogether.
  3. To interface with these graphs in a similar API that graph objects would have (e.g. neighbors, has_edge, add_edge, etc.), should the API be housed in pywhy-graphs?

Convert MAG to PAG

Is your feature request related to a problem? Please describe.
A MAG is a mixed-edge graph with directed and bidirected edges and has the 'maximality' property.

This is a sub-procedure of #73 and will enable that to function properly.

Describe the solution you'd like
Implement a function mag_to_pag, which takes in a valid MAG and outputs the corresponding PAG.

Additional context
@aryan26roy if you're still interested, this can be the next step before revisiting #73.

I presume tetrad has a function written in java. More importantly, the paper linked in the issue for #73 should contain the relevant algorithm written in the paper to convert a MAG to a PAG.

Implement pre-commit hooks

Is your feature request related to a problem? Please describe.
Previously pre-commit hooks were implemented in dodiscover, this functionality would greatly improve the development experience for pywhy_graphs.

Describe the solution you'd like
Implement pre-commit hooks as was implemented in dodiscover

Describe alternatives you've considered
All pre-commit functions can be done with poetry run poe commands but these are easy to forget.

Bug in CPDAG drawing

Checklist

  • I have verified that the issue exists against the main branch.
  • I have read the relevant section in the contribution guide on reporting bugs.
  • I have checked the issues list for similar or identical bug reports.
  • I have checked the pull requests list for existing proposed fixes.
  • I have checked the CHANGELOG and the commit log to find out if the bug was already fixed in the main branch.
  • I have included in the "Description" section below a traceback from any exceptions related to this bug.
  • I have included in the "Related issues or possible duplicates" section beloew all related issues and possible duplicate issues (If there are none, check this box anyway).
  • I have included in the "Environment" section below the name of the operating system and Python version that I was using when I discovered this bug.
  • I have included in the "Environment" section below the output of pip freeze.
  • I have included in the "Steps to reproduce" section below a minimally reproducible example.

Description

When drawing a CPDAG with 4 nodes (even 3 nodes) the drawing shows more edges than the edge list displays.

{'directed': OutEdgeView([('xy', 'x'), ('x', 'z')]), 'undirected': EdgeView([])}

cpdag

{'directed': OutEdgeView([('xy', 'x'), ('x', 'z'), ('z', 'w')]), 'undirected': EdgeView([])}

cpdag

Python traceback:

Related issues or possible duplicates

  • None

Environment

OS: Ubuntu 22.10

Python version: 3.10.7

Output of pip freeze:

accessible-pygments==0.0.4
alabaster==0.7.13
ananke==0.1
attrs==22.2.0
Babel==2.12.1
backcall==0.2.0
bandit==1.7.4
beautifulsoup4==4.12.0
black==22.12.0
CacheControl==0.12.11
causal-learn==0.1.3.3
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==3.1.0
cleo==2.0.1
click==8.1.3
cloudpickle==2.2.1
codespell==2.2.4
contourpy==1.0.7
crashtest==0.4.1
cryptography==40.0.1
cycler==0.11.0
Cython==0.29.33
decorator==5.1.1
distlib==0.3.6
docutils==0.19
dowhy==0.9.1
dulwich==0.20.50
dunamai==1.16.0
econml==0.14.0
exceptiongroup==1.1.1
filelock==3.10.7
flake8==5.0.4
fonttools==4.39.3
-e git+ssh://[email protected]/scikit-hep/formulate.git@1658ad732fb441c11ce8fac08e3a4eda0d43d44f#egg=formulate
gitdb==4.0.10
GitPython==3.1.31
graphviz==0.20.1
html5lib==1.1
idna==3.4
imagesize==1.4.1
importlib-metadata==6.1.0
iniconfig==2.0.0
ipython==7.34.0
isort==5.12.0
jaraco.classes==3.2.3
jedi==0.18.2
jeepney==0.8.0
Jinja2==3.1.2
joblib==1.2.0
jsonschema==4.17.3
keyring==23.13.1
kiwisolver==1.4.4
lark==1.1.5
latexcodec==2.0.1
lightgbm==3.3.5
llvmlite==0.39.1
lockfile==0.12.2
MarkupSafe==2.1.2
matplotlib==3.7.1
matplotlib-inline==0.1.6
mccabe==0.7.0
more-itertools==9.1.0
mpmath==1.3.0
msgpack==1.0.5
mypy==0.971
mypy-extensions==1.0.0
networkx==2.8.8
numba==0.56.4
numpy==1.23.5
numpydoc==1.5.0
packaging==23.0
paho-mqtt==1.6.1
pandas==1.5.3
parso==0.8.3
pastel==0.2.1
pathspec==0.11.0
patsy==0.5.3
pbr==5.11.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.4.0
pkginfo==1.9.6
platformdirs==3.1.0
pluggy==1.0.0
poethepoet==0.16.5
poetry==1.3.0
poetry-core==1.4.0
poetry-dynamic-versioning==0.21.4
poetry-plugin-export==1.3.0
pre-commit-hooks @ file:///home/aryan/.cache/pre-commit/repolkdstyo6
prompt-toolkit==3.0.38
ptyprocess==0.7.0
pybtex==0.24.0
pybtex-docutils==1.0.2
pycodestyle==2.9.1
pycparser==2.21
pydata-sphinx-theme==0.13.3
pydocstyle==6.3.0
pydot==1.4.2
pyflakes==2.5.0
Pygments==2.14.0
pyparsing==3.0.9
pyrsistent==0.19.3
pytest==7.3.0
python-dateutil==2.8.2
pytz==2023.3
-e git+ssh://[email protected]/aryan26roy/pywhy-graphs.git@c13d202c7af244ff94a1c8d29d7740e62a3e7c93#egg=pywhy_graphs
PyYAML==6.0
rapidfuzz==2.13.7
requests==2.28.2
requests-toolbelt==0.10.1
ruamel.yaml==0.17.21
ruamel.yaml.clib==0.2.7
ruff==0.0.263
scikit-learn==1.1.3
scipy==1.10.1
SecretStorage==3.3.3
shap==0.40.0
shellingham==1.5.0.post1
six==1.16.0
slicer==0.0.7
smmap==5.0.0
snowballstemmer==2.2.0
soupsieve==2.4
sparse==0.14.0
Sphinx==5.3.0
sphinx-copybutton==0.5.1
sphinx-gallery==0.12.2
sphinx-issues==3.0.1
sphinx_design==0.3.0
sphinxcontrib-applehelp==1.0.4
sphinxcontrib-bibtex==2.5.0
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
statsmodels==0.13.5
stevedore==5.0.0
sympy==1.11.1
sync-with-poetry @ file:///home/aryan/.cache/pre-commit/repoat95e_7n
threadpoolctl==3.1.0
tokenize-rt==5.0.0
toml==0.10.2
tomli==2.0.1
tomlkit==0.7.2
tqdm==4.65.0
traitlets==5.9.0
trove-classifiers==2023.3.9
typing_extensions==4.5.0
urllib3==1.26.15
virtualenv==20.21.0
wcwidth==0.2.6
webencodings==0.5.1
zipp==3.15.0

Steps to reproduce

Example source:

import pywhy_graphs
from pywhy_graphs import CPDAG
from pywhy_graphs.viz import draw



cpdag = CPDAG()

cpdag.add_edge("xy", "x", cpdag.directed_edge_name)
cpdag.add_edge("x", "z", cpdag.directed_edge_name)
cpdag.add_edge("z", "w", cpdag.directed_edge_name) # comment out the edges to see it reproduced with less number of edges.

print(cpdag.edges())

dot_graph = draw(cpdag, name="bug")
dot_graph.render(outfile="cpdag.png", view=True)

Convert edge probability matrix(matrices) into consensus network

Is your feature request related to a problem? Please describe.
Given some representation of a set of DAGs, we want to reduce this to one DAG. For example, given a collection of DAGs, or some graph posterior, construct a DAG.

Describe the solution you'd like
Input: A matrix of the probability of edge presence, and the probability of edge direction given presence
Output: A DAG.
Algo:

  1. Construct an undirected based on matrix 1 and a threshold.
  2. Construct a directed graph based on matrix 2 and a threshold.
  3. Remove edges based on some heuristic:
    • Removes or reverses as few edges as possible
    • Minimizes probabilistic loss (e.g. reverse X->Y is bad if it has a 99% chance of being true, but not so bad if it has a 50% chance of being true

Describe alternatives you've considered

https://rdrr.io/cran/bnlearn/src/R/averaged.network.R

Might need to work with only one matrix (the product of the two matrices I described). Many algorithms return such a matrix as output, especially those that use variational inference.

In my proposed approach a collection of graphs would be rendered into matrices. It might make sense to derive a consensus graph from a list of graphs directly. For example, one might try to preserve colliders across graphs in the collection.

Conversions to equivalence class representations that satisfy constraints

Suppose the ground truth DAG is A🡢B🡢C. Suppose you apply a score-based discovery algorithm captures an intervention on C. This algorithm should give A🡢B🡢C and A🡠B🡢C have a higher score than A🡠B🡠C. Equivalently, include/exclude lists or a Bayesian prior might force or put high probability on B🡢C and exclude/put low probability on B🡠C.

However, the traditional algorithm for converting a DAG to a CPDAG will return A--B--C, which essentially weights these three DAGs as equivalent.

This task calls to implement an algorithm that converts a DAG to a CPDAG that freezes edges oriented by interventions or prior knowledge.

Here is a sketch of an algorithm: algo

Add ability to add title to viz.draw

Is your feature request related to a problem? Please describe.
I want to visualize several graphs and distinguish them with a title.

Describe the solution you'd like
An optional argument for an image title.

Describe alternatives you've considered
Adding title outside of image. A bit clumsy.

Additional context
I'm trying to compare graphs.

Update `intro` folder of Sphinx `examples/` to show intuition and concepts of inducing path

As a follow-up to #78.

Perhaps we can just use one graph to illustrate the point. Can we use figure 2 from the paper https://arxiv.org/pdf/1104.5617.pdf

Then proceed by:

  1. drawing the figure and describing the inducing path and figure
  2. By default adjacent nodes have a trivial inducing path (the edge between them)
  3. show that inducing paths among non-adjacent nodes relative to L={L1, L2}, S={} is only possible in the figure between X1 and X5.
  4. There is no inducing path relative to L and S between (X1, X6) and (X2, X4)
  5. By adding X6 to S, then there is now an inducing path relative to L and S among (X2, X4), (X1, X3), (X5, X3)

This walks through the core concepts we've walked through on this PR on a graph that is in a well-known publication and then describes some of the intricacies of the hyperparameters (L and S).

Originally posted by @adam2392 in #78 (comment)

cc: @aryan26roy may be interested :)

Check the Markov Equivalence of two MAGs

Is your feature request related to a problem? Please describe.
I want to be able to check if two MAGs are Markov equivalent with one function call.

Describe the solution you'd like
A function that takes in two MAGs and outputs a boolean indicating whether they are markov equivalent or not.

Additional context
Needed this feature to implement tests for #93

m-separation function does not work with only bidirected graphs

Checklist

  • I have verified that the issue exists against the main branch.
  • I have read the relevant section in the contribution guide on reporting bugs.
  • I have checked the issues list for similar or identical bug reports.
  • I have checked the pull requests list for existing proposed fixes.
  • I have checked the CHANGELOG and the commit log to find out if the bug was already fixed in the main branch.
  • I have included in the "Description" section below a traceback from any exceptions related to this bug.
  • I have included in the "Related issues or possible duplicates" section beloew all related issues and possible duplicate issues (If there are none, check this box anyway).
  • I have included in the "Environment" section below the name of the operating system and Python version that I was using when I discovered this bug.
  • I have included in the "Environment" section below the output of pip freeze.
  • I have included in the "Steps to reproduce" section below a minimally reproducible example.

Description

I noticed that we have a bug in the existing implementation, in the case where we have an ADMG with only bidirected edges.

In particular,

  bigraph = nx.Graph([(1, 2), (2, 3)])
  G = pywhy_nx.MixedEdgeGraph([bigraph], ["bidirected"])

  assert pywhy_nx.m_separated(G, {1}, {3}, set())

fails.

Python traceback:

Related issues or possible duplicates

  • None

Environment

OS: Mac

Python version:

Output of pip freeze:

Steps to reproduce

Example source:

A function to determine whether an inducing path exists between two nodes

Is your feature request related to a problem? Please describe.
An inducing path defined in [1]_ is central to extending reasoning of DAGs to MAG/PAGs, so it makes sense that we should expose an algorithm for computing inducing paths

Describe the solution you'd like
Implement a function in the algorithms submodule with the API:

inducing_path(G, node_x, node_y, relative_to=None) -> Tuple[bool, path], which checks whether a path exists and if it does, it returns the path. We do not necessarily need to check all paths though(?) That is up for discussion I guess.

Additional context
[1]: https://reader.elsevier.com/reader/sd/pii/S0004370208001008?token=4B17FC536FD415013F477F7092BFDCC8861C4A6C05FB9562DA2A15DE777CD23D132FA76C0ECAE9B59FE84EADAC54959F&originRegion=us-east-1&originCreation=20230322132043

This will then enable someone to implement the DAGtoMAG algorithm.

Check edge types allowed given a `graph_type` passed in export functions

          What happens if a user has `graph_type="dag"` instead here? Will this function detect that there are `o->` edges and prevent that? Clearly the imported file does not represent a DAG so we shouldn't let the user make this mistake. Or we should have clear tests to show what happens when you import a graph under a different graph assumption than the one it was created with

Originally posted by @jaron-lee in #60 (comment)

Conversion API to other causal libraries' graph

Is your feature request related to a problem? Please describe.
We should expose an API (there is one untested really rn) for exporting to the following packages:

  • tigramite (3D time-series graph that supports time-series DAG, ADMG, PAG) -> WIP
  • pgmpy (DAG and PDAG)
  • causal-learn (currently we have this implemented) -> done
  • dagitty (DAG, ADMG) -> handled by pygraphviz
  • pcalg -> done
  • networkx to a DiGraph with a specific structure
  • numpy (tigramite, causal-learn, pcalg are all numpy-like array structures) -> done
  • ananke -> @jaron-lee to handle
  • tetrad text -> done
  • bnlearn -> handled by pygraphviz I think

This will enable pywhy-graphs to be interoperable across many other packages and also allow other users to take their graphs and convert to pywhy-graphs for the sake of using it in dodiscover and dowhy.

Describe the solution you'd like
If the other package is not in R, we can directly test roundtrip compatibility.

If the other package is in R, we should just do some basic testing that the format is "as expected". I think loading in R runtime is a whole other beast we don't want to figure out how to do in the CI pipeline.

The solution requires the following for each export:

io roundtrip: That is the basic graph structure should be preserved. Some metadata might be lost depending on the format each package stores graphs in. When possible, metadata should be preserved. A unit test would look like the following where the edge and node structure is preserved.

pcalg_G = pywhy_graphs.export(G, 'pcalg')
G_copy = pywhy_graphs.import(G, 'pcalg')

for edge_type in G.edge_types:
    assert nx.is_isomorphic(G.get_graphs(edge_type), G_copy.get_graphs(edge_type))

Additional context
n/a

[DOC] User Guide

We should have a user-guide with high-level sections dedicated to the following:

  1. What do we mean by networkx-compliant? Explain design philosophy and what classes are useful (i.e. MixedEdgeGraph) and how to use networkx causal algorithms vs networkx algorithms (i.e. by querying the sub-graph of specific edge types).
  2. What is the interpretation of a causal graph? And link possibly relevant examples.
  3. What is the interpretation of interventions in causal graphs? (i.e. F-nodes)
  4. What is the interpretation of an equivalence class of graphs?

We should also have a glossary, where we define causal terms that might not be as readily accessible to new users: F-nodes, c-components, Markov equivalence class, SCM

M-separation does not work

Checklist

  • I have verified that the issue exists against the main branch.
  • I have read the relevant section in the contribution guide on reporting bugs.
  • I have checked the issues list for similar or identical bug reports.
  • I have checked the pull requests list for existing proposed fixes.
  • I have checked the CHANGELOG and the commit log to find out if the bug was already fixed in the main branch.
  • I have included in the "Description" section below a traceback from any exceptions related to this bug.
  • I have included in the "Related issues or possible duplicates" section beloew all related issues and possible duplicate issues (If there are none, check this box anyway).
  • I have included in the "Environment" section below the name of the operating system and Python version that I was using when I discovered this bug.
  • I have included in the "Environment" section below the output of pip freeze.
  • I have included in the "Steps to reproduce" section below a minimally reproducible example.

Description

I encountered a test case where the m-separation implementation fails, and the way it fails leads me to think that the approach of extension of the d-separation algorithm from Darwiche 2007 (e.g. the networkx implementation) does not work, and I don't have confidence that this extension is sound without further detailed investigation.

In particular, consider the graph A -> B, B -> D, B <-> C. Currently the m-separation implementation fails to detect A independent of C marginally. I attempted to devise a fix, but encountered further problems in dealing with undirected edges.

I found https://arxiv.org/pdf/1803.00116.pdf which provides a linear time m-separation criterion for ancestral graphs. I propose to implement that instead.

Python traceback:

Related issues or possible duplicates

  • None

Environment

OS:

Python version:

Output of pip freeze:

Steps to reproduce

Example source:

"Consistent" extension of PAGs?

Is your feature request related to a problem? Please describe.
There is an algorithm that takes a PDAG and returns a DAG that is a member of the equivalence class. The algorithm returns a "consistent extension" which might have some theoretical properties I'm not aware of. I'm not sure if such a "consistent extension" exists for PAGs, but it does seem useful to be able to get an ancestral graph member of the equivalence class the PAG represents.

Describe the solution you'd like
An algorithm that takes a PAG and returns a MAG.

Describe alternatives you've considered
Or just a ancestral graph in the equivalence class, may doesn't need to be maximal.

Integration tests that run against the latest stable version of dodiscover

Is your feature request related to a problem? Please describe.
Causal graphs and their implementation inherently affect all upstream causal inference procedures. Most importantly dodiscover relies on the API of pywhy-graphs. Any changes introduced here should not break the "next version of dodiscover". In order to catch these, we need integration tests.

Describe the solution you'd like
There are three options:

  1. integration tests in pywhy-graphs
  2. integration tests in dodiscover
  3. integration tests in both

I am leaning towards adding this to dodiscover only, or both. We can delete one of them down the line, but the CI times are already pretty fast, so we can sacrifice this for the sake of ensuring integrability across the two packages (and eventually dowhy).

The solution would just be to create a tight "accuracy" constraint on causal discovery algorithms that leverage each graph class and then this performance should not change across any changes done in pywhy_graphs.

Convert DAG to a MAG

Is your feature request related to a problem? Please describe.
I need to be able to convert a DAG to a PAG to generate tests for #93 .

Describe the solution you'd like
A function implementing the algorithm to convert DAGs to PAGs as given in Zhang. 2008

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.