mrosol / nonlincausality

Python package for Granger causality test with nonlinear forecasting methods.

License: MIT License

Python 100.00%
causality-analysis granger-causality nonlinear-models python causality causality-test nonlinear-causality prediction time-series forecasting neural-networks recurrent-neural-networks multilayer-perceptron-network lstm-neural-networks gru-neural-networks

nonlincausality

The traditional Granger causality test, which uses linear regression for prediction, may not capture more complex causality relations. This package enables the utilization of nonlinear forecasting methods for prediction, offering an alternative to the linear regression approach found in traditional Granger causality.

For each tested lag, the function builds two models. The first forecasts the present value of X from the past n values of X, where n is the current lag. The second forecasts the same value from the past n values of both the X and Y time series. If the prediction error of the second model is statistically significantly smaller than that of the first, Y is said to Granger-cause X (Y➔X). The errors are compared using the Wilcoxon signed-rank test.
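The error comparison described above can be sketched in plain Python. This is only an illustration of a one-sided Wilcoxon signed-rank test using the normal approximation (no tie or continuity correction); the function name `wilcoxon_greater` and the error values are made up for the example, and the package's own call may differ in its details. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used.

```python
import math


def wilcoxon_greater(a, b):
    """One-sided Wilcoxon signed-rank test, normal approximation.

    Alternative hypothesis: paired values in `a` tend to be larger than in `b`.
    Returns (W_plus, p_value). Illustrative only: no tie/continuity correction.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability P(Z >= z)
    return w_plus, p_value


# Made-up absolute prediction errors: the X-only model is consistently worse.
abs_err_x = [0.50 + 0.01 * i for i in range(30)]
abs_err_xy = [0.30 + 0.005 * i for i in range(30)]

w_plus, p_value = wilcoxon_greater(abs_err_x, abs_err_xy)
print("Y Granger-causes X at the 0.05 level:", p_value < 0.05)
```

A small p-value here means the errors of the X-only model are significantly larger than those of the X-and-Y model, which is the condition for concluding Y➔X.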

The package supports the use of neural networks (MLP, GRU, and LSTM as presented in the paper), Scikit-learn, and ARIMAX models for causality analysis (both bivariate and conditional). Another innovative feature of the package is the ability to study changes in causality over time for a given time window (w1) with a step (w2). The measure of the change in causality over time is expressed by the following equation:

Equation 1

Author

Maciej Rosoł [email protected], [email protected]
Warsaw University of Technology

Reference

Maciej Rosoł, Marcel Młyńczak, Gerard Cybulski
Granger causality test with nonlinear neural-network-based methods: Python package and simulation study.
Computer Methods and Programs in Biomedicine, Volume 216, 2022
https://doi.org/10.1016/j.cmpb.2022.106669

Example usage

Assume that there are two signals X and Y, which are stored in the variable data, where X is in the first column and Y in the second. The variable data has been split into data_train (first 60% of the data), data_val (the next 20% of the data), and data_test (last 20% of the data). To test the presence of causality Y➔X for the given lag values (defined as a list e.g. [50, 150]) the following functions can be used (note that all arguments are examples and may vary depending on the data).
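The data layout assumed above can be sketched as follows. The helpers `split_60_20_20` and `lagged_samples` are hypothetical names written for this illustration and are not part of the package; they only show the 60/20/20 split and what "n past values of X (and Y)" means for one lag. The package builds its own model inputs internally, possibly in a different shape (for example, a 2-D window per sample for recurrent networks).

```python
def split_60_20_20(rows):
    """Split rows chronologically into train (60%), validation (20%), test (20%)."""
    n = len(rows)
    i_train, i_val = n * 6 // 10, n * 8 // 10
    return rows[:i_train], rows[i_train:i_val], rows[i_val:]


def lagged_samples(rows, lag, use_y):
    """Return (features, target) pairs for one lag.

    features: the `lag` past values of X, followed by the `lag` past values
    of Y when `use_y` is True. target: the present value of X.
    Each row is [x_t, y_t], with X in the first column and Y in the second.
    """
    samples = []
    for t in range(lag, len(rows)):
        past = rows[t - lag:t]
        features = [r[0] for r in past]
        if use_y:
            features += [r[1] for r in past]
        samples.append((features, rows[t][0]))
    return samples


# Toy data: X counts up by 1, Y counts up by 10.
data = [[float(t), float(10 * t)] for t in range(10)]
data_train, data_val, data_test = split_60_20_20(data)

inputs_x = lagged_samples(data_train, lag=2, use_y=False)   # first model: past X only
inputs_xy = lagged_samples(data_train, lag=2, use_y=True)   # second model: past X and Y
```

The split is chronological (no shuffling), so the test set always follows the training and validation data in time.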

NN

Using nonlincausalityNN, all types of neural networks presented in the paper can be utilized (GRU, LSTM, MLP). Below is an example for MLP:

results = nlc.nonlincausalityNN(
    x=data_train,
    maxlag=lags,
    NN_config=['d','dr','d','dr'],
    NN_neurons=[100,0.05,100,0.05],
    x_test=data_test,
    run=1,
    epochs_num=[50, 50],
    learning_rate=[0.0001, 0.00001],
    batch_size_num=32,
    x_val=data_val,
    reg_alpha=None,
    callbacks=None,
    verbose=True,
    plot=True,
)
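Once the call returns, the outcome for each lag can be collected into a small table. The sketch below assumes that `results` is indexed by lag and that each entry exposes `p_value` and `test_statistic`, as in the user code quoted in the Issues section further down. The `summarize` helper, the 0.05 threshold, and the stand-in values are assumptions made for this illustration only.

```python
from types import SimpleNamespace


def summarize(results, lags, alpha=0.05):
    """Collect the per-lag test outcome into a list of plain dictionaries."""
    rows = []
    for lag in lags:
        r = results[lag]
        rows.append(
            {
                "lag": lag,
                "test_statistic": r.test_statistic,
                "p_value": r.p_value,
                "significant": r.p_value < alpha,  # alpha is an assumed threshold
            }
        )
    return rows


# Stand-in objects with made-up numbers, used here instead of a real `results`.
fake_results = {
    50: SimpleNamespace(p_value=0.003, test_statistic=1200.0),
    150: SimpleNamespace(p_value=0.41, test_statistic=2100.0),
}

summary = summarize(fake_results, [50, 150])
for row in summary:
    print(row)
```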

Sklearn

Using nonlincausality_sklearn, any Scikit-learn model can be utilized, with hyperparameter optimization applied (based on mean squared error minimization). Below is an example for SVR:

from sklearn.svm import SVR

parametres = {
    'kernel':['poly', 'rbf'],
    'C':[0.01,0.1,1], 
    'epsilon':[0.01,0.1,1.]
}

results_skl = nlc.nonlincausality_sklearn(    
    x=data_train,
    sklearn_model=SVR,
    maxlag=lags,
    params=parametres,
    x_test=data_test,
    x_val=data_val,
    plot=True)
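To give a sense of what the hyperparameter search covers, the sketch below expands the grid from the example into its individual combinations and picks the one with the lowest score. The helpers `expand_grid` and `pick_best` and the toy scoring function are hypothetical; in the package the score would be the mean squared error of a fitted model, and the internal search may be implemented differently (Scikit-learn itself offers `ParameterGrid` and `GridSearchCV` for this purpose).

```python
from itertools import product


def expand_grid(param_grid):
    """Return every combination of the listed hyperparameter values."""
    keys = list(param_grid)
    return [
        dict(zip(keys, values))
        for values in product(*(param_grid[k] for k in keys))
    ]


def pick_best(candidates, mse_of):
    """Return the candidate with the smallest score (e.g. validation MSE)."""
    return min(candidates, key=mse_of)


parametres = {
    "kernel": ["poly", "rbf"],
    "C": [0.01, 0.1, 1],
    "epsilon": [0.01, 0.1, 1.0],
}

candidates = expand_grid(parametres)


def toy_mse(p):
    """Made-up stand-in for 'fit the model and compute the validation MSE'."""
    kernel_penalty = 0.0 if p["kernel"] == "rbf" else 1.0
    return kernel_penalty + abs(p["C"] - 0.1) + abs(p["epsilon"] - 0.1)


best = pick_best(candidates, toy_mse)
print(len(candidates), "combinations; best:", best)
```

The grid size grows multiplicatively with each added parameter, and every combination is fitted for every tested lag, so large grids can make the analysis slow.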

ARIMA

results_ARIMA = nlc.nonlincausalityARIMA(x=data_train, maxlag=lags, x_test=data_test)

Change of causality over time

For a deeper understanding of the dependency between the signals, the change of causality over time can be studied as well. Example usage for an MLP, using nonlincausalitymeasureNN:

results = nlc.nonlincausalitymeasureNN(
    x=data_train,
    maxlag=lags,
    window=100,
    step=1,
    NN_config=['d','dr','d','dr'],
    NN_neurons=[100,0.05,100,0.05],
    x_test=data_test_measure,
    run=3,
    epochs_num=[50,50],
    learning_rate=[0.0001, 0.00001],
    batch_size_num=32,
    x_val=data_val,
    verbose=True,
    plot=True,
)
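The `window` and `step` arguments correspond to w1 and w2 from the introduction. As a rough illustration, the sketch below lists which sample ranges a sliding window would cover. The helper `window_bounds` is hypothetical, and how the package aligns each window with a time index (for instance, any offset introduced by the lag) is not shown here and may differ.

```python
def window_bounds(n_samples, window, step):
    """Return (start, end) index pairs of consecutive sliding windows.

    `end` is exclusive. Only windows that fit entirely inside the data are kept.
    """
    return [
        (start, start + window)
        for start in range(0, n_samples - window + 1, step)
    ]


# With 1000 samples, window=100 and step=1 (the values used in the example above).
bounds = window_bounds(1000, window=100, step=1)
print(len(bounds), "windows, from", bounds[0], "to", bounds[-1])

# A larger step gives fewer, less overlapping windows.
coarse = window_bounds(1000, window=100, step=50)
```

A smaller step gives a smoother curve of causality over time at the cost of more computation; a larger window smooths over short-lived changes.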

Conditional causality

The nonlincausality package also supports the study of conditional causality (conditioned on a signal Z).

results_conditional = nlc.nonlincausalityNN(
    x=data_train,
    maxlag=lags,
    NN_config=['d','dr','d','dr'],
    NN_neurons=[100,0.05,100,0.05],
    x_test=data_test,
    run=1,
    z=z_train,
    z_test=z_test,
    epochs_num=[50, 50],
    learning_rate=[0.0001, 0.00001],
    batch_size_num=32,
    x_val=data_val,
    z_val=z_val,
    reg_alpha=None,
    callbacks=None,
    verbose=True,
    plot=True,
)

Release Note

2.0.0 - All types of neural networks (GRU, LSTM, MLP) are now handled by nonlincausalityNN (nonlincausalityGRU, nonlincausalityLSTM and nonlincausalityMLP are deprecated). Added support for Scikit-learn models as the forecasting kernel for causal analysis.

nonlincausality's Issues

nonlincausality is not running on a CPU machine

Hello mrosol,

I am trying to run the following code on a CPU machine, but I get the error pasted below the code. Please help me resolve it. I am using Python 3.7 and the latest version of nonlincausality.

import nonlincausality as nlc
import numpy as np
lags = [10,20,30]
n_obs = 53
result_of_non_causality=[]
for col in columns_candidate_for_x:
    initial_list=[col]
    print(f'non-causality for {initial_list}')
    initial_list.append('Nifty_Price')
    col_nifty_df=sentiment_all_exch_rate_all_indices_normal_Df[initial_list]
    df_train, df_test = col_nifty_df[0:-n_obs], col_nifty_df[-n_obs:]
    results = nlc.nonlincausalityGRU(x=np.array(df_train), maxlag=lags, GRU_layers=2, GRU_neurons=[25,25], Dense_layers=2, Dense_neurons=[100, 100], x_test=np.array(df_test), run=2, add_Dropout=True, Dropout_rate=0.01, epochs_num=[100], learning_rate=[0.001], batch_size_num=128, verbose=False, plot=False)
    for lag in lags:
        single_record={}
        single_record['X_value']=col

        p_value = results[lag].p_value
        test_statistic = results[lag].test_statistic

        best_errors_X = results[lag].best_errors_X
        best_errors_XY = results[lag].best_errors_XY
        cohens_d = np.abs(
            (np.mean(np.abs(best_errors_X)) - np.mean(np.abs(best_errors_XY)))
            / np.std([best_errors_X, best_errors_XY])
        )
        single_record['lag']=lag
        single_record['test_statistic']=test_statistic
        single_record['p_value']=p_value
        single_record['cohens_d']=cohens_d
        result_of_non_causality.append(single_record)

    del results


KeyError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_22704\273327593.py in <module>
10 col_nifty_df=sentiment_all_exch_rate_all_indices_normal_Df[initial_list]
11 df_train, df_test = col_nifty_df[0:-n_obs], col_nifty_df[-n_obs:]
---> 12 results = nlc.nonlincausalityGRU(x=np.array(df_train), maxlag=lags, GRU_layers=2, GRU_neurons=[25,25], Dense_layers=2, Dense_neurons=[100, 100], x_test=np.array(df_test), run=2, add_Dropout=True, Dropout_rate=0.01, epochs_num=[100], learning_rate=[0.001], batch_size_num=128, verbose=False, plot=False)
13 for lag in lags:
14 single_record={}

~\Anaconda3\envs\thesispy37\lib\site-packages\nonlincausality\nonlincausality.py in nonlincausalityGRU(x, maxlag, GRU_layers, GRU_neurons, Dense_layers, Dense_neurons, x_test, run, z, z_test, add_Dropout, Dropout_rate, epochs_num, learning_rate, batch_size_num, regularization, reg_alpha, callbacks, verbose, plot)
713 input_layer = Input((data_shape[1], data_shape[2]))
714
--> 715 layers_dense = Dense(Dense_neurons[0], activation="relu")(input_layer)
716 # Adding Dropout
717 if add_Dropout:

~\Anaconda3\envs\thesispy37\lib\site-packages\nonlincausality\nonlincausality.py in run_nonlincausality(network_architecture, x, maxlag, Network_layers, Network_neurons, Dense_layers, Dense_neurons, x_test, run, z, z_test, add_Dropout, Dropout_rate, epochs_num, learning_rate, batch_size_num, regularization, reg_alpha, callbacks, verbose, plot, functin_type)
258 # Appending RSS, models, history of training and prediction errors to results object
259 result_lag.append_results(
--> 260 sum(error_X ** 2),
261 sum(error_XY ** 2),
262 model_X,

~\Anaconda3\envs\thesispy37\lib\site-packages\keras\utils\traceback_utils.py in error_handler(*args, **kwargs)
68 # To get the full stack trace, call:
69 # tf.debugging.disable_traceback_filtering()
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb

~\Anaconda3\envs\thesispy37\lib\site-packages\keras\engine\training.py in tf__train_function(iterator)
13 try:
14 do_return = True
---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
16 except:
17 do_return = False

KeyError: in user code:

File "C:\Users\00006262\Anaconda3\envs\thesispy37\lib\site-packages\keras\engine\training.py", line 1249, in train_function  *
    return step_function(self, iterator)
File "C:\Users\00006262\Anaconda3\envs\thesispy37\lib\site-packages\keras\engine\training.py", line 1233, in step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\00006262\Anaconda3\envs\thesispy37\lib\site-packages\keras\engine\training.py", line 1222, in run_step  **
    outputs = model.train_step(data)
File "C:\Users\00006262\Anaconda3\envs\thesispy37\lib\site-packages\keras\engine\training.py", line 1027, in train_step
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "C:\Users\00006262\Anaconda3\envs\thesispy37\lib\site-packages\keras\optimizers\optimizer_experimental\optimizer.py", line 527, in minimize
    self.apply_gradients(grads_and_vars)
File "C:\Users\00006262\Anaconda3\envs\thesispy37\lib\site-packages\keras\optimizers\optimizer_experimental\optimizer.py", line 1140, in apply_gradients
    return super().apply_gradients(grads_and_vars, name=name)
File "C:\Users\00006262\Anaconda3\envs\thesispy37\lib\site-packages\keras\optimizers\optimizer_experimental\optimizer.py", line 634, in apply_gradients
    iteration = self._internal_apply_gradients(grads_and_vars)
File "C:\Users\00006262\Anaconda3\envs\thesispy37\lib\site-packages\keras\optimizers\optimizer_experimental\optimizer.py", line 1169, in _internal_apply_gradients
    grads_and_vars,
File "C:\Users\00006262\Anaconda3\envs\thesispy37\lib\site-packages\keras\optimizers\optimizer_experimental\optimizer.py", line 1217, in _distributed_apply_gradients_fn
    var, apply_grad_to_update_var, args=(grad,), group=False
File "C:\Users\00006262\Anaconda3\envs\thesispy37\lib\site-packages\keras\optimizers\optimizer_experimental\optimizer.py", line 1213, in apply_grad_to_update_var  **
    return self._update_step(grad, var)
File "C:\Users\00006262\Anaconda3\envs\thesispy37\lib\site-packages\keras\optimizers\optimizer_experimental\optimizer.py", line 217, in _update_step
    f"The optimizer cannot recognize variable {variable.name}. "

KeyError: 'The optimizer cannot recognize variable gru_8/gru_cell_8/kernel:0. This usually means you are trying to call the optimizer to update different parts of the model separately. Please call `optimizer.build(variables)` with the full list of trainable variables before the training loop or use legacy optimizer `tf.keras.optimizers.legacy.{self.__class__.__name__}.'

Error running example.py

Hi, thanks for open-sourcing this. I am trying to understand how to use your code by running example.py, but I get an error. It seems something is broken in the function definitions or their usage. Please check.

TypeError Traceback (most recent call last)
Cell In [68], line 41
38 data_train = data[:7000, :]
39 data_test = data[7000:, :]
---> 41 results = nlc.nonlincausalityMLP(
42 x=data_train,
43 maxlag=lags,
44 Dense_layers=2,
45 Dense_neurons=[100, 100],
46 x_test=data_test,
47 run=1,
48 add_Dropout=True,
49 Dropout_rate=0.01,
50 epochs_num=[50, 100],
51 learning_rate=[0.001, 0.0001],
52 batch_size_num=128,
53 verbose=True,
54 plot=True,
55 )
57 #%% Example of obtaining the results
58 for lag in lags:

File ~/anaconda3/envs/nlctest/lib/python3.8/site-packages/nonlincausality/nonlincausality.py:817, in nonlincausalityMLP(x, maxlag, Dense_layers, Dense_neurons, x_test, run, z, z_test, add_Dropout, Dropout_rate, epochs_num, learning_rate, batch_size_num, verbose, plot)
743 def nonlincausalityMLP(
744 x,
745 maxlag,
(...)
758 plot=False,
759 ):
760 """
761 Parameters
762 ----------
(...)
815
816 """
--> 817 results = run_nonlincausality(
818 MLP_architecture,
819 x,
820 maxlag,
821 None,
822 None,
823 Dense_layers,
824 Dense_neurons,
825 x_test,
826 run,
827 z,
828 z_test,
829 add_Dropout,
830 Dropout_rate,
831 epochs_num,
832 learning_rate,
833 batch_size_num,
834 verbose,
835 plot,
836 "MLP",
837 )
839 return results

TypeError: run_nonlincausality() missing 3 required positional arguments: 'verbose', 'plot', and 'functin_type'

`run_nonlincausality` Arguments not filled

verbose,
plot,
"LSTM",

These three arguments are not aligned with the definition of run_nonlincausality and cause the following error:

TypeError: run_nonlincausality() missing 3 required positional arguments: 'verbose', 'plot', and 'functin_type'

Suggestion:

verbose=verbose,
plot=plot,
function_type='LSTM'

The same thing should be fixed in the definitions of other neural network types.
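A minimal sketch of why the call fails, using a hypothetical stand-in (`run_stub`) with a shortened parameter list rather than the real function. The order `regularization, reg_alpha, callbacks, verbose, plot, functin_type` is taken from the signature shown in the first traceback above; everything else is illustrative. It also shows that a keyword fix has to use the parameter name exactly as the error message spells it (`functin_type`), and that the three parameters before `verbose` still need values.

```python
def run_stub(x, maxlag, regularization, reg_alpha, callbacks,
             verbose, plot, functin_type):
    """Hypothetical stand-in with the same trailing parameters as the traceback."""
    return {"verbose": verbose, "plot": plot, "functin_type": functin_type}


def error_message(call):
    """Run `call` and return the TypeError message, or None if it succeeds."""
    try:
        call()
    except TypeError as err:
        return str(err)
    return None


# Positional call that skips three parameters: True, False and "LSTM" are bound
# to regularization, reg_alpha and callbacks, so the last three are missing.
positional_error = error_message(
    lambda: run_stub("x", [1], True, False, "LSTM")
)

# Keyword call with the correctly spelled name 'function_type' is rejected,
# because the parameter is actually named 'functin_type'.
misspelled_error = error_message(
    lambda: run_stub("x", [1], regularization=None, reg_alpha=None,
                     callbacks=None, verbose=True, plot=False,
                     function_type="LSTM")
)

# Keyword call that matches the definition.
ok = run_stub("x", [1], regularization=None, reg_alpha=None, callbacks=None,
              verbose=True, plot=False, functin_type="LSTM")

print(positional_error)
print(misspelled_error)
print(ok)
```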

GRBF functionality

In your paper, you showed results for AR and GRBF NN models.
I was wondering whether the GRBF architecture is included in this library. Based on the NN_architecture function, it seems that only LSTM, GRU, or standard MLP (dense with ReLU) networks are available.

Coefficients

Hi!

Thank you for sharing the code, this is really impressive.
Is it possible to extract or inspect the coefficients for each lag that are used when fitting the model?

Best regards

TypeError: missing positional arguments

Hi Maciej,

Thanks for the interesting package. An error occurs when any of the usage examples is run:

TypeError: run_nonlincausality() missing 3 required positional arguments: 'verbose', 'plot', and 'functin_type'

Compound causality

Hi,

Based on the example, I want to add Z to the generation of X, so that Y and Z cause X at the same time.

How do I use more than two columns (X, Y, Z) to predict X and test whether Y and Z together are a compound cause of X?

Thank you.
