Causal Inference and Discovery in Python by Packt Publishing

License: MIT License

Causal Inference and Discovery in Python

This is the code repository for Causal Inference and Discovery in Python, published by Packt.

Unlock the secrets of modern causal machine learning with DoWhy, EconML, PyTorch and more

What is this book about?

Causal methods present unique challenges compared to traditional machine learning and statistics. Learning causality can be challenging, but it offers distinct advantages that elude a purely statistical mindset. Causal Inference and Discovery in Python helps you unlock the potential of causality.

You’ll start with basic motivations behind causal thinking and a comprehensive introduction to Pearlian causal concepts, such as structural causal models, interventions, counterfactuals, and more. Each concept is accompanied by a theoretical explanation and a set of practical exercises with Python code.

Next, you’ll dive into the world of causal effect estimation, steadily progressing towards modern machine learning methods. Step by step, you’ll discover the Python causal ecosystem and harness the power of cutting-edge algorithms. You’ll further explore the mechanics of how “causes leave traces” and compare the main families of causal discovery algorithms.

The final chapter gives you a broad outlook into the future of causal AI where we examine challenges and opportunities and provide you with a comprehensive list of resources to learn more.

This book covers the following exciting features:

Master the fundamental concepts of causal inference
Decipher the mysteries of structural causal models
Unleash the power of the 4-step causal inference process in Python
Explore advanced uplift modeling techniques
Unlock the secrets of modern causal discovery using Python
Use causal inference for social impact and community benefit

If you feel this book is for you, get your copy today!

Instructions and Navigations

All of the code is organized into folders.

The code will look like the following:

preds = causal_bert.inference(
    texts=df['text'],
    confounds=df['has_photo'],
)[0]

Following is what you need for this book:

This book is for machine learning engineers, data scientists, and machine learning researchers looking to extend their data science toolkit and explore causal machine learning. It will also help developers familiar with causality who have worked in another technology and want to switch to Python, and data scientists with a history of working with traditional causality who want to learn causal machine learning. It’s also a must-read for tech-savvy entrepreneurs looking to build a competitive edge for their products and go beyond the limitations of traditional machine learning.

With the following software and hardware list, you can run all code files present in the book (Chapters 1-15).

Software and Hardware List

Chapter	Software required	OS required
1-15	Python 3.9	Windows, macOS, or Linux
1-15	DoWhy 0.8	Windows, macOS, or Linux
1-15	EconML 0.12.0	Windows, macOS, or Linux
1-15	CATENets 0.2.3	Windows, macOS, or Linux
1-15	gCastle 1.0.3	Windows, macOS, or Linux
1-15	Causica 0.2.0	Windows, macOS, or Linux
1-15	Causal-learn 0.1.3.3	Windows, macOS, or Linux
1-15	Transformers 4.24.0	Windows, macOS, or Linux
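
To compare an existing environment against the table above, a minimal sketch along these lines may help. The versions come from the table; the distribution names passed to `importlib.metadata` are assumptions about how each package is published on PyPI, and `installed_versions` is a hypothetical helper, not part of the repository.

```python
from importlib import metadata

# Versions from the table above. The distribution names are assumptions
# about the PyPI package names and may need adjusting.
EXPECTED = {
    "dowhy": "0.8",
    "econml": "0.12.0",
    "catenets": "0.2.3",
    "gcastle": "1.0.3",
    "causica": "0.2.0",
    "causal-learn": "0.1.3.3",
    "transformers": "4.24.0",
}


def installed_versions(expected=EXPECTED):
    """Map each distribution name to (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found)
    return report


if __name__ == "__main__":
    for name, (wanted, found) in installed_versions().items():
        status = "ok" if found == wanted else "check"
        print(f"{name}: expected {wanted}, found {found} [{status}]")
```

A mismatch is not necessarily a problem, but it is a reasonable first thing to check when a notebook cell fails.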

Join our Discord server

Join our Discord community to meet like-minded people and learn alongside more than 2000 members at Discord

Get to Know the Author

Aleksander Molak is a Machine Learning Researcher and Consultant who gained experience working with Fortune 100, Fortune 500, and Inc. 5000 companies across Europe, the USA, and Israel, designing and building large-scale machine learning systems. On a mission to democratize causality for businesses and machine learning practitioners, Aleksander is a prolific writer, creator, and international speaker. As a co-founder of Lespire, an innovative provider of AI and machine learning training for corporate teams, Aleksander is committed to empowering businesses to harness the full potential of cutting-edge technologies that allow them to stay ahead of the curve. He's the host of the Causal AI-centered Causal Bandits Podcast.

Note from the Author:

Environment installation

See the section Using graphviz and GPU below
To install the basic environment run: conda env create -f causal_book_py39_cuda117.yml
To install the environment for notebook Chapter_11.2.ipynb run: conda env create -f causal-pymc.yml

Selecting the kernel

After a successful installation of the environment, open your notebook and select the kernel causal_book_py39_cuda117

For notebook Chapter_11.2.ipynb change kernel to causal-pymc

Using `graphviz` and GPU

Note: Depending on your system settings, you might need to install graphviz manually in order to recreate the graph plots in the code. Check https://pypi.org/project/graphviz/ for instructions specific to your operating system.

Note 2: To use GPU you'll need to install CUDA 11.7 drivers. This can be done here: https://developer.nvidia.com/cuda-11-7-0-download-archive
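
Before running the notebooks, it can be useful to confirm that both prerequisites are visible from Python. The following is a minimal sketch, not part of the repository: `graphviz_on_path` and `pick_device` are hypothetical helper names, and the check assumes that the Graphviz system install provides a `dot` executable on `PATH`.

```python
import importlib.util
import shutil


def graphviz_on_path():
    """Return True if the Graphviz `dot` executable is found on PATH."""
    return shutil.which("dot") is not None


def pick_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so nothing else to check
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch builds
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("Graphviz `dot` found:", graphviz_on_path())
    print("Device:", pick_device())
```

If `pick_device()` returns `'cpu'` on a machine with an NVIDIA GPU, the CUDA drivers or the PyTorch build are the likely places to look.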

Citation

BibTeX

@book{Molak2023,
    title={Causal Inference and Discovery in Python: Unlock the secrets of modern causal machine learning with DoWhy, EconML, PyTorch and more},
    author={Molak, Aleksander},
    publisher={Packt Publishing},
    address={Birmingham},
    edition={1.},
    year={2023},
    isbn={1804612987},
    note={\url{https://amzn.to/3RebWzn}}
}

APA

Molak, A. (2023). Causal Inference and Discovery in Python: Unlock the secrets of modern causal machine learning with DoWhy, EconML, PyTorch and more. Packt Publishing.

‼️ Known mistakes // errata

For known errors and corrections check:

If you spotted a mistake, let us know at book(at)causalpython.io or just open an issue in this repo. Thank you 🙏🏼

causal-inference-and-discovery-in-python's Issues

[Chapter9] Causal_estimator.effect for prediction

Hello, I am Jake Lee from Korea, a passionate reader of your book.

I have found that a couple of code cells in the prediction part of Ch09 do not work.
Your code flows as follows:

1. Instantiate the causal model
2. Estimand
3. Estimate
4. Predict test data using .causal_estimator.effect

However, #4 does not work on my side (the error says there is no causal_estimator object).
I would appreciate your help with it, especially when the code runs on the latest DoWhy version (11.0).

Thanks in advance!
Jake

Chapter 7 notebook array shape error messages

Possibly related to the numba deprecation warning, the following code raises an array shape error, which then propagates errors through the remaining cells in the chapter 7 notebook. I tried upgrading shap to 0.42.0, which resolved some but not all of the errors.

`estimate = model.estimate_effect(
    identified_estimand=estimand,
    method_name='backdoor.econml.dml.DML',
    method_params={
        'init_params': {
            'model_y': GradientBoostingRegressor(),
            'model_t': GradientBoostingRegressor(),
            'model_final': LassoCV(fit_intercept=False),
        },
        'fit_params': {}}
)

print(f'Estimate of causal effect (DML): {estimate.value}')
`

A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
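
For readers unfamiliar with the message, the reshaping it refers to looks like the NumPy sketch below. This only illustrates the difference between a column vector and a 1d array; it is not a confirmed fix for the notebook, since in this case the target array is built inside the estimator call rather than by the user. The variable names are illustrative.

```python
import numpy as np

# A column vector of shape (n_samples, 1), as described in the message
y_column = np.arange(6, dtype=float).reshape(-1, 1)

# ravel() flattens it to the 1d shape (n_samples,) that the message asks for
y_flat = y_column.ravel()

print(y_column.shape, y_flat.shape)
```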

Environment for M1 Silicon

Hi Alex,

Please find below an environment file that successfully runs the GPU-related code in Chapters 11.1 and 14 (attached as .txt because .yml is not accepted; changing the extension back should be enough)
causal_book_py39_for_m1.txt. The changes from the yml provided in your repo are:

remove - nvidia from channels; remove - pytorch and -pytorch-cuda=11.7 from dependencies
add - notebook=6.5 to dependencies

Then replace the set device cell with

# Set device
device = "mps" if torch.backends.mps.is_available() else "cpu"

I still then had to pip install CausalPy once the env was activated.

The full yml as exported by conda is
causal_book_py39_applem1.txt

Notes:

This has only been tested to run on notebooks 11.1 and 14 but I did not closely monitor whether the results were the same. I'm only assuming at this point it should run fine on the other chapters
In notebook 14, "Expert knowledge" section, in the cell after the one with augmented Lagrangian loss objects (first line assert len(dataset_train.batch_size) == 1, "Only 1D batch size is supported"), an error occurs with the message "NotImplementedError: The operator 'aten::triu_indices' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS."
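
The temporary fix named in that error message can be applied from inside the notebook. A minimal sketch, assuming the variable is set in the first cell, before `torch` is imported (the usual advice, since a value set later may not be picked up):

```python
import os

# Ask PyTorch to fall back to the CPU for operators that are not
# implemented on the MPS device. Set this before importing torch.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# import torch  # import torch only after the variable has been set
```

As the message warns, operators that fall back to the CPU will run slower than they would natively on MPS.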

Dowhy CausalModel does not have 'causal_estimator' attribute

Both chapter 9 and chapter 10 notebooks have code like effect_pred = model.causal_estimator.effect().
I got an error running them: AttributeError: 'CausalModel' object has no attribute 'causal_estimator'

The book states that it uses DoWhy 0.8, but I am using DoWhy 0.10.1 (I want to keep my learning experience up to date), and I cannot determine whether that is the cause. If it is, how do I apply the model to a test dataset with the current version of DoWhy? If not, what have I missed?

Thanks!
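
One defensive way to cope with the attribute moving between versions is sketched below. It rests on an assumption that this thread does not confirm: that newer DoWhy releases attach the fitted estimator to the estimate object as `estimator` instead of exposing `causal_estimator` on the model. `find_fitted_estimator` is a hypothetical helper, and the stand-in objects in the usage lines are not real DoWhy classes.

```python
from types import SimpleNamespace


def find_fitted_estimator(model, estimate):
    """Return the fitted estimator from whichever object exposes it.

    Tries `model.causal_estimator` (the attribute used in the notebooks)
    and then `estimate.estimator` (ASSUMED location in newer DoWhy).
    """
    for owner, attr in ((model, "causal_estimator"), (estimate, "estimator")):
        fitted = getattr(owner, attr, None)
        if fitted is not None:
            return fitted
    raise AttributeError(
        "No fitted estimator found on the model or the estimate object"
    )


# Usage with stand-in objects (not real DoWhy classes)
old_style = find_fitted_estimator(
    SimpleNamespace(causal_estimator="fitted"), SimpleNamespace()
)
new_style = find_fitted_estimator(
    SimpleNamespace(), SimpleNamespace(estimator="fitted")
)
```

In a notebook, the call would be `find_fitted_estimator(model, estimate)` with the objects created in the earlier steps; whether the returned object's `effect` method accepts the same arguments across versions would still need checking against the installed DoWhy release.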

Small errata for p27 of book (post-June 2023)

Loving that these resources are on GitHub - thank you @AlxndrMlk!

Quick notational suggestion: on p27, the line $X_{sample} = 1.9 < X < -1.9$ doesn't make sense at the moment.
The right-side reads as a boolean, and the condition can never be satisfied, as written.

Would be clearer what's meant if it was "...sampled according to the condition $X < - 1.9$ or $X > 1.9$".
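
The point can be checked with a few lines of Python. This is a sketch under one assumption that the thread does not state: X is drawn from a standard normal distribution. The chained comparison never selects anything, while the "or" condition keeps the two tails.

```python
import random

random.seed(0)
# ASSUMPTION: X ~ N(0, 1); the distribution is not stated in this thread
xs = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# The condition as printed: no number is both above 1.9 and below -1.9
never = [x for x in xs if 1.9 < x < -1.9]

# The suggested reading: keep samples from either tail
tails = [x for x in xs if x < -1.9 or x > 1.9]

print(len(never), len(tails))
```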
