Comments (14)

nsalas24 avatar nsalas24 commented on June 4, 2024 2

Hey @dcompgriff,

You might find this repo useful as well: https://github.com/quantumblacklabs/causalnex

They implement the NOTEARS algorithm: https://arxiv.org/abs/1803.01422

from dowhy.

amit-sharma avatar amit-sharma commented on June 4, 2024 2

Thanks for sharing the link to causalnex, @nsalas24. That looks like an excellent library for structure learning.

dcompgriff avatar dcompgriff commented on June 4, 2024 2

Awesome, thanks @nsalas24. I've been waiting for the QuantumBlack folks to come out with their causal inference package. I'll definitely take a look at this.

yangliu2 avatar yangliu2 commented on June 4, 2024 1

I don't have anything to add, but yes, please add the causal discovery part to the package so people can use both parts in a unified framework. This is nice, btw.

amit-sharma avatar amit-sharma commented on June 4, 2024

Thanks for the pointer, @dcompgriff. The causal discovery toolbox (cdt) looks quite cool. I would definitely like to see causal discovery integrated with DoWhy.

However, @emrekiciman and I have been discussing how exactly to integrate it with DoWhy. One option is to use discovery algorithms upfront in the modelling stage. This has the benefit of helping people work with complex datasets, as you say. But many discovery algorithms do not handle unobserved confounders well, so any obtained graph may be susceptible to bias from unobserved confounding. So we'll need some way of conveying to users the exact assumptions under which the causal model is generated.

Another option is to let the user specify a graph in the model stage, but then use causal discovery algorithms to detect any obvious problems with the user's graph. Of course, to override the user's graph, we will probably use only the edges about which the causal discovery algorithm is most certain. This may need additional work (to identify which edges are more robust in the learnt causal graph), but may be a nice way to combine the user's domain knowledge with the power of causal discovery algorithms. It may also convey to the user that causal discovery algorithms are better thought of as algorithmic suggestions, rather than the true correct graph. The downside, of course, is that the process (and the API) for doing this will look complicated. More generally, there's an opportunity to frame some of the causal discovery work as a refutation of the user's model.
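To make the second option concrete, here is a minimal sketch in plain Python. It assumes the discovery step returns a per-edge confidence score (for example, how often an edge appears across bootstrap resamples); that score, the `find_conflicts` helper, and the 0.9 threshold are all hypothetical and not part of DoWhy's API. The sketch only reports disagreements and never changes the user's graph.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def find_conflicts(
    user_edges: Set[Edge],
    discovered: Dict[Edge, float],
    min_confidence: float = 0.9,  # hypothetical threshold
) -> List[str]:
    """List confident discovered edges that disagree with the user's graph."""
    warnings = []
    for (src, dst), score in sorted(discovered.items()):
        if score < min_confidence:
            continue  # skip edges the algorithm is unsure about
        if (src, dst) in user_edges:
            continue  # discovery agrees with the user
        if (dst, src) in user_edges:
            warnings.append(
                f"{src} -> {dst} (score {score:.2f}) reverses user edge {dst} -> {src}"
            )
        else:
            warnings.append(
                f"{src} -> {dst} (score {score:.2f}) is missing from the user graph"
            )
    return warnings


# Toy inputs: the user's graph and made-up confidence scores.
user_graph = {("Z", "X"), ("X", "Y")}
discovered_scores = {("Z", "X"): 0.97, ("Y", "X"): 0.95, ("Z", "Y"): 0.40}
for message in find_conflicts(user_graph, discovered_scores):
    print(message)
```

With these toy inputs only the reversed `X -> Y` edge is flagged; the low-scoring `Z -> Y` edge is ignored, which matches the idea of treating discovery output as a suggestion rather than the true graph.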

What do you think about these two alternatives?
As a library, it might make sense for DoWhy to provide both options to the user, but it will be good to discuss how we would like the default experience to be.

emrekiciman avatar emrekiciman commented on June 4, 2024

dcompgriff avatar dcompgriff commented on June 4, 2024
  1. When to integrate causal discovery?
    Causal discovery can be done completely before construction of the 'CausalModel' object. For example, today I use causal discovery algorithms to generate a networkx graph, and then feed this graph structure into the CausalModel, because I can convert the networkx graph into GML format. Truthfully, this could probably just be its own sub-module of dowhy, one that doesn't even have to change the existing API, because it doesn't touch any of the analysis stages that come after the graph is defined.

  2. Discovery limitations.
    You bring up a great point about limitations of causal discovery, and these should for sure be outlined as either a warning or just in the documentation.

  3. Checking/refuting provided models?
    I think this may be interesting to include as an optional flag during the modeling stage. When the CausalModel object is first created, there can simply be an optional flag for whether to validate the model's graph using causal discovery algorithms.

  4. Ambiguous edges?
    The way I deal with ambiguous edges is to manually examine the graph output from causal discovery, and then attempt to orient the edges I can using domain knowledge. Not exactly automated, but then again, classical causal inference already has its own assumptions on the entire graph, so I feel this is ok. However, I think I've seen some packages that will output causal estimates for all graphs (even with ambiguous edges) by enumerating the possible orientations of those edges, and then applying causal estimation to each. I'm less of a fan of this for more than 2 ambiguous edges, however, and don't think it should be incorporated regardless, because users can already do this enumeration themselves with the current package.
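
A rough sketch of points 1 and 4 together, in plain Python: enumerate every orientation of the ambiguous edges and serialize each candidate as a GML string that could then be handed to `CausalModel` through its `graph` argument. The helper names (`to_gml`, `enumerate_orientations`) and the toy edges are hypothetical, and the GML layout follows the style I believe DoWhy's examples use, so treat it as an assumption. The sketch does not filter out orientations that create cycles.

```python
from itertools import product
from typing import Iterator, List, Tuple

Edge = Tuple[str, str]  # (source, target)


def to_gml(edges: List[Edge]) -> str:
    """Serialize directed edges as a single-line GML string."""
    nodes = sorted({node for edge in edges for node in edge})
    parts = ["graph [", "directed 1"]
    parts += [f'node [ id "{n}" label "{n}" ]' for n in nodes]
    parts += [f'edge [ source "{s}" target "{t}" ]' for s, t in edges]
    parts.append("]")
    return " ".join(parts)


def enumerate_orientations(
    directed: List[Edge], ambiguous: List[Edge]
) -> Iterator[List[Edge]]:
    """Yield one edge list per orientation of the ambiguous edges (2**k total)."""
    for flips in product([False, True], repeat=len(ambiguous)):
        oriented = [
            (b, a) if flip else (a, b) for (a, b), flip in zip(ambiguous, flips)
        ]
        yield list(directed) + oriented


# Toy discovery output: two oriented edges and one the algorithm could not orient.
directed_edges = [("Z", "X"), ("Z", "Y")]
ambiguous_edges = [("X", "Y")]
candidates = [
    to_gml(edges) for edges in enumerate_orientations(directed_edges, ambiguous_edges)
]
for gml in candidates:
    print(gml)
```

The number of candidates doubles with every ambiguous edge, which is one practical reason to orient as many edges as possible by hand first.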

emrekiciman avatar emrekiciman commented on June 4, 2024

dcompgriff avatar dcompgriff commented on June 4, 2024

Speaking to the handling of ambiguous edges:
I think that if the current code for identification/estimation/sensitivity analysis requires a DAG, then raising an error (unless you have that already) when a graph with bi-directional edges is passed is at least one way to deal with ambiguous edges. It forces users to either set the direction of these edges themselves or find causal estimates for both directions.
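
As a minimal sketch of such a check, assuming the graph is available as a list of (source, target) pairs: the `GraphNotDAGError` type and `check_is_dag` function below are hypothetical names, not existing DoWhy code. It rejects bi-directional edges first, then uses Kahn's algorithm to catch longer directed cycles.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (source, target)


class GraphNotDAGError(ValueError):
    """Hypothetical error raised when a graph cannot be used as a DAG."""


def check_is_dag(edges: List[Edge]) -> None:
    """Raise GraphNotDAGError for bi-directional edges or directed cycles."""
    edge_set = set(edges)

    # An edge present in both directions is treated as ambiguous.
    bidirected = sorted(
        {tuple(sorted(e)) for e in edge_set if (e[1], e[0]) in edge_set}
    )
    if bidirected:
        raise GraphNotDAGError(f"Orient these edges before estimation: {bidirected}")

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    children: Dict[str, Set[str]] = {}
    indegree: Dict[str, int] = {}
    for src, dst in edge_set:
        children.setdefault(src, set()).add(dst)
        indegree.setdefault(src, 0)
        indegree[dst] = indegree.get(dst, 0) + 1

    ready = [node for node, degree in indegree.items() if degree == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in children.get(node, ()):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Any node never removed must sit on, or downstream of, a cycle.
    if visited != len(indegree):
        raise GraphNotDAGError("Graph contains a directed cycle")


check_is_dag([("Z", "X"), ("X", "Y")])  # a valid DAG passes silently
try:
    check_is_dag([("X", "Y"), ("Y", "X")])
except GraphNotDAGError as err:
    print(err)
```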

As for how to force users to evaluate their graphs constructed from causal discovery... There's only so much that can be done from an API perspective, I think. Outputting a warning for sure would be good, but some of it may just come down to more documentation. While causal discovery and causal estimation have some nice theoretical foundations, I've found that I've needed to be more involved with validating the graphs output by causal discovery. From everyone I've talked to who uses this analysis in industry, I think it's still best practice to sit down and visually validate proposed graphs. Causal discovery is useful, but not perfect. It can help in providing insight into unknown causal directions in some cases, but in others its output is nonsensical. For example, I've had discovery algorithms try to tell me that 'number of products purchased' caused 'total users' for a customer in one of my projects. I think the best thing to do is to output warnings about the potential issues of discovered graphs, and provide good tutorial and API documentation discussing these issues. But either way I still feel discovery algorithms are valuable to have. I'm personally wary of fully specifying the causal graph structure myself without at least trying FIC (Fast IC), GES (Greedy Equivalence Search), and other algorithms.

emrekiciman avatar emrekiciman commented on June 4, 2024

dcompgriff avatar dcompgriff commented on June 4, 2024

Sure, I can work on making this contribution. I've been meaning to implement some of these algorithms anyway, because I have some challenges with the existing implementations and the constraints they allow me to specify before performing discovery.

emrekiciman avatar emrekiciman commented on June 4, 2024

BoltzmannBrain avatar BoltzmannBrain commented on June 4, 2024

FWIW my team has found problems with the aforementioned CausalNex NOTEARS for causal discovery, summarized well by others here: https://arxiv.org/abs/2104.05441

If there's initiative for adding causal discovery to dowhy @amit-sharma, happy to help in some capacity.

amit-sharma avatar amit-sharma commented on June 4, 2024

Yeah, I'd seen that paper too and realized that NOTEARS-like continuous optimizers are not yet ready for causal discovery.

Thanks for restarting this thread, @BoltzmannBrain. We just added an experimental implementation of causal discovery in DoWhy. It leans on the existing implementations of standard algorithms, and simply provides an API wrapper to standardize the interface and allow multiple methods. Here's a notebook: https://github.com/microsoft/dowhy/blob/master/docs/source/example_notebooks/dowhy_causal_discovery_example.ipynb

As you can see, this is very basic (and the different methods still do not agree). Would you like to try it out and see how we can extend it?
