pyfts / notebooks

Code examples for pyFTS

data-science forecasting fuzzy-logic fuzzy-time-series pyfts time-series time-series-analysis

notebooks's Introduction

pyFTS - Fuzzy Time Series for Python

What is pyFTS Library?

This package is intended for students, researchers, data scientists, or anyone who wants to exploit Fuzzy Time Series methods. These methods provide simple, easy-to-use, computationally cheap and human-readable models, suitable for everyone from statistics laypeople to experts.

This project is continuously being improved, and contributors are welcome.

How to reference pyFTS?

Silva, P. C. L. et al. pyFTS: Fuzzy Time Series for Python. Belo Horizonte. 2018. DOI: 10.5281/zenodo.597359. Url: http://doi.org/10.5281/zenodo.597359

How to install pyFTS?

pyFTS was developed and tested with Python 3.6. To install pyFTS with pip:

pip install -U pyFTS

Or install directly from the GitHub repo:

pip install -U git+https://github.com/PYFTS/pyFTS

What are Fuzzy Time Series (FTS)?

Fuzzy Time Series (FTS) are non-parametric methods for time series forecasting based on Fuzzy Theory. The original method was proposed by [1] and later improved by many researchers. The general approach of FTS methods, based on [2], is listed below:

1. Data preprocessing: Data transformation functions contained in pyFTS.common.Transformations, such as differencing, Box-Cox, scaling and normalization.
2. Universe of Discourse Partitioning: This is the most important step. Here, the range of values of the numerical time series Y(t) is split into overlapping intervals, and a fuzzy set is created for each interval. This step is performed by the pyFTS.partitioners module and its classes (for instance GridPartitioner, EntropyPartitioner, etc.). The main parameters are:

- the number of intervals
- the fuzzy membership function (in pyFTS.common.Membership)
- the partition scheme (GridPartitioner, EntropyPartitioner [3], FCMPartitioner, CMeansPartitioner, HuarngPartitioner [4])

Check out the Jupyter notebook notebooks/Partitioners.ipynb for sample code.

3. Data Fuzzification: Each data point of the numerical time series Y(t) is translated to a fuzzy representation (usually one or more fuzzy sets), and a fuzzy time series F(t) is created.
4. Generation of Fuzzy Rules: In this step the temporal transition rules are created. These rules depend on the method and its characteristics:

- order: the number of time lags used for forecasting
- weights: weighted models introduce weights on the fuzzy rules for smoothing [5], [6], [7]
- seasonality: used by seasonal models [8]
- steps ahead: the number of steps ahead to predict. Almost all standard methods are based on one-step-ahead forecasting
- forecasting type: almost all standard methods are point-based, but pyFTS also provides interval and probabilistic forecasting methods

5. Forecasting: The forecasting step takes a sample (with a minimum length equal to the model's order) and generates a fuzzy output (one or more fuzzy sets) for the next time step.
6. Defuzzification: This step transforms the fuzzy forecast into a real number.
7. Data postprocessing: The inverse operations of step 1.
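
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration of a first-order, Chen-style [2] model under simplifying assumptions, not the pyFTS implementation: the function names (grid_partition, membership, fuzzify, learn_rules, forecast), the triangular sets, the way the 10% margin is applied and the sample data are all assumptions made for the example. Preprocessing and postprocessing are left out.

```python
# Minimal first-order FTS sketch in pure Python. Illustrative only, NOT the pyFTS API.

def grid_partition(data, npart, margin=0.1):
    """Partitioning: npart overlapping triangular sets (left, peak, right).
    Assumes the data is not constant (otherwise the step would be zero)."""
    lo, hi = min(data), max(data)
    span = hi - lo
    lo, hi = lo - margin * span, hi + margin * span  # assumed widening rule
    step = (hi - lo) / (npart + 1)
    # Neighbouring triangles share half of their base, so they overlap.
    return [(lo + i * step, lo + (i + 1) * step, lo + (i + 2) * step)
            for i in range(npart)]

def membership(x, fset):
    """Triangular membership degree of x in fset."""
    a, b, c = fset
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(x, sets):
    """Fuzzification: index of the set with maximum membership."""
    return max(range(len(sets)), key=lambda i: membership(x, sets[i]))

def learn_rules(data, sets):
    """Rule generation: map each antecedent set to the sets that followed it."""
    labels = [fuzzify(x, sets) for x in data]
    rules = {}
    for lhs, rhs in zip(labels, labels[1:]):
        rules.setdefault(lhs, set()).add(rhs)
    return rules

def forecast(x, sets, rules):
    """Forecasting + defuzzification: mean of the peaks of the consequent sets."""
    lhs = fuzzify(x, sets)
    rhs = rules.get(lhs, {lhs})  # unseen antecedent: fall back to its own peak
    return sum(sets[i][1] for i in rhs) / len(rhs)

# Made-up sample series, used only to exercise the functions.
data = [10, 12, 15, 13, 11, 14, 16, 15]
sets = grid_partition(data, npart=5)
rules = learn_rules(data, sets)
next_value = forecast(data[-1], sets, rules)
```

The forecast is always a crisp number inside the widened universe of discourse, because it is an average of set peaks. Higher-order, weighted, interval and probabilistic models change the rule and forecasting functions but keep the same overall flow.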

Usage examples

There is nothing better than good code examples to get started. Check out the demo Jupyter Notebooks of the methods implemented in pyFTS!

A Google Colab example can also be found here.

MINDS - Machine Intelligence And Data Science Lab

This tool is the result of a collective effort of the MINDS Lab, headed by Prof. Frederico Gadelha Guimaraes. Some of the research on FTS that was developed with pyFTS:

2020
- ORANG, Omid; Solar Energy Forecasting With Fuzzy Time Series Using High-Order Fuzzy Cognitive Maps. IEEE World Congress On Computational Intelligence 2020 (WCCI).
- ALYOUSIFI, Y; FAYE, Othman M; SOKKALINGAM, I; SILVA, P. Markov Weighted Fuzzy Time-Series Model Based on an Optimum Partition Method for Forecasting Air Pollution. International Journal of Fuzzy Systems, 2020. http://doi.org/10.1007/s40815-020-00841-w
- SILVA, Petrônio CL et al. Forecasting in Non-stationary Environments with Fuzzy Time Series. https://arxiv.org/abs/2004.12554
- SILVA, Petrônio CL et al. Distributed Evolutionary Hyperparameter Optimization for Fuzzy Time Series. IEEE Transactions on Network and Service Management, 2020. http://doi.org/10.1109/TNSM.2020.2980289
- ALYOUSIFI, Yousif et al. Predicting Daily Air Pollution Index Based on Fuzzy Time Series Markov Chain Model. Symmetry, v. 12, n. 2, p. 293, 2020. http://doi.org/10.3390/sym12020293
2019
- SILVA, Petrônio C. L. Scalable Models of Fuzzy Time Series for Probabilistic Forecasting. PhD Thesis. https://doi.org/10.5281/zenodo.3374641
- SADAEI, Hossein J. et al. Short-term load forecasting by using a combined method of convolutional neural networks and fuzzy time series. Energy, v. 175, p. 365-377, 2019. http://doi.org/10.1016/j.energy.2019.03.081
- SILVA, Petrônio CL et al. Probabilistic forecasting with fuzzy time series. IEEE Transactions on Fuzzy Systems, 2019. http://doi.org/10.1109/TFUZZ.2019.2922152
- SILVA, Petrônio C. L.; LUCAS, Patrícia de O.; GUIMARÃES, Frederico Gadelha. A Distributed Algorithm for Scalable Fuzzy Time Series. In: International Conference on Green, Pervasive, and Cloud Computing. Springer, Cham, 2019. p. 42-56. http://doi.org/10.1007/978-3-030-19223-5_4
- SILVA, Petrônio Cândido de Lima et al. A New Granular Approach for Multivariate Forecasting. In: Latin American Workshop on Computational Neuroscience. Springer, Cham, 2019. p. 41-58. http://doi.org/10.1007/978-3-030-36636-0_4
- ALVES, Marcos Antonio et al. Otimização Dinâmica Evolucionária para Despacho de Energia em uma Microrrede usando Veículos Elétricos. In: Anais do 14º Simpósio Brasileiro de Automação Inteligente. Campinas: GALOÁ. 2019. http://doi.org/10.17648/sbai-2019-111524
- LUCAS, Patrícia de O.; SILVA, Petrônio C. L.; GUIMARAES, Frederico G. Otimização Evolutiva de Hiperparâmetros para Modelos de Séries Temporais Nebulosas. In: Anais do 14º Simpósio Brasileiro de Automação Inteligente. Campinas: GALOÁ. 2019. http://doi.org/10.17648/sbai-2019-111141
2018
- ALVES, Marcos Antônio et al. An extension of nonstationary fuzzy sets to heteroskedastic fuzzy time series. In: ESANN. 2018.
2017
- SEVERIANO, Carlos A. et al. Very short-term solar forecasting using fuzzy time series. In: 2017 IEEE international conference on fuzzy systems (FUZZ-IEEE). IEEE, 2017. p. 1-6. http://doi.org/10.1109/FUZZ-IEEE.2017.8015732
- SILVA, Petrônio C. L.; et al. Probabilistic forecasting with seasonal ensemble fuzzy time-series. In: XIII Brazilian Congress on Computational Intelligence, Rio de Janeiro. 2017. http://doi.org/10.21528/CBIC2017-54
- COSTA, Francirley R. B.; SILVA, Petrônio C. L.; GUIMARAES, Frederico G. REGRESSÃO LINEAR APLICADA NA PREDIÇÃO DE SERIES TEMPORAIS FUZZY. Simpósio Brasileiro de Automação Inteligente (SBAI), 2017.
2016
- SILVA, Petrônio C. L.; SADAEI, Hossein Javedani; GUIMARAES, Frederico G. Interval forecasting with fuzzy time series. In: 2016 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2016. p. 1-8. http://doi.org/10.1109/SSCI.2016.7850010

References

[1] Q. Song and B. S. Chissom, “Fuzzy time series and its models,” Fuzzy Sets Syst., vol. 54, no. 3, pp. 269–277, 1993.
[2] S.-M. Chen, “Forecasting enrollments based on fuzzy time series,” Fuzzy Sets Syst., vol. 81, no. 3, pp. 311–319, 1996.
[3] C. H. Cheng, R. J. Chang, and C. A. Yeh, “Entropy-based and trapezoidal fuzzification-based fuzzy time series approach for forecasting IT project cost,” Technol. Forecast. Social Change, vol. 73, no. 5, pp. 524–542, Jun. 2006.
[4] K. H. Huarng, “Effective lengths of intervals to improve forecasting in fuzzy time series,” Fuzzy Sets Syst., vol. 123, no. 3, pp. 387–394, Nov. 2001.
[5] H.-K. Yu, “Weighted fuzzy time series models for TAIEX forecasting,” Phys. A Stat. Mech. its Appl., vol. 349, no. 3, pp. 609–624, 2005.
[6] R. Efendi, Z. Ismail, and M. M. Deris, “Improved weight Fuzzy Time Series as used in the exchange rates forecasting of US Dollar to Ringgit Malaysia,” Int. J. Comput. Intell. Appl., vol. 12, no. 1, p. 1350005, 2013.
[7] H. J. Sadaei, R. Enayatifar, A. H. Abdullah, and A. Gani, “Short-term load forecasting using a hybrid model with a refined exponentially weighted fuzzy time series and an improved harmony search,” Int. J. Electr. Power Energy Syst., vol. 62, pp. 118–129, 2014.
[8] C.-H. Cheng, Y.-S. Chen, and Y.-L. Wu, “Forecasting innovation diffusion of products using trend-weighted fuzzy time-series model,” Expert Syst. Appl., vol. 36, no. 2, pp. 1826–1832, 2009.

notebooks's Issues

Forecasting real future

Hi Petronio.
Thank you for your quick response to my problem.
I'm back to ask about predictions of the real future. Your predict method needs a dataset. What if I want to forecast "n" days ahead from today, where there is no data? What dataset should I pass to the predict method when I use all the data (until today) for training, so that there is no test set?
Let's assume we have the model below:

from pyFTS.partitioners import Grid
from pyFTS.models import hofts

fs = Grid.GridPartitioner(data=all_data, npart=190)
model = hofts.HighOrderFTS(partitioner=fs, order=4, alpha_cut=.4)
model.fit(all_data.values.flatten())

Let's say I keep the last 4 points from "all_data" to accommodate the order of the model. How can I call the model.predict method to forecast future days (until the end of August)?

I forgot to mention that when I use steps_ahead = n, I get the same number n times. In case you ask about my dataset: it is public, and it is the closing price of Bitcoin.

Best regards
George
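
For readers with the same question: one common way to forecast past the end of the data is to iterate a one-step model, feeding each forecast back in as the newest lag. The sketch below assumes a hypothetical one_step callable standing in for a fitted model; it does not use pyFTS and makes no claim about what steps_ahead does internally.

```python
from collections import deque

def iterate_forecast(one_step, history, order, steps):
    """Forecast `steps` points past the end of `history`.

    one_step: hypothetical callable taking the last `order` values (a list)
              and returning the next value.
    """
    window = deque(history[-order:], maxlen=order)  # oldest lag drops out automatically
    future = []
    for _ in range(steps):
        y = one_step(list(window))
        future.append(y)
        window.append(y)  # the forecast becomes the newest lag
    return future

def mean_of_lags(lags):
    """Stand-in model for the example: the mean of the lags."""
    return sum(lags) / len(lags)

future = iterate_forecast(mean_of_lags, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                          order=4, steps=3)

# A model that maps its own output back to the same value repeats it forever.
flat = iterate_forecast(lambda lags: 7.0, [1.0, 2.0, 3.0, 4.0], order=4, steps=3)
```

If the one-step function reaches such a fixed point, every later step repeats the same value. That might be one reason for seeing n identical numbers, but it is a guess rather than a diagnosis of this particular model.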

Need advice for modeling pipeline and hyper optimization

hi @petroniocandido
Following up on my comment in #1: I have 100 series that each need a one-step-ahead forecast. For your information, I also have other series that can be used as external information. A short description of my data:

Each series behaves differently. Some are stationary, some are non-stationary, some show strong seasonality, and some show drift (sudden jumps in value). To tackle this complexity, I am thinking of building one model per series (what do you think?).
The data is monthly; some series are complete and some have missing values.
The data range is huge: values can lie anywhere between negative millions and positive millions.

I plan to try different approaches, both univariate (without any external information) and multivariate (with external information).
Questions that apply to both scenarios:

I see there are a lot of FTS model implementations. In order to build a robust automatic modeling pipeline for my problem, would you mind mapping which FTS models work best for which kind of problem or type of series? Is there a model that can be used for any type of series, so that all I need is the best hyperparameter setting?
Would you mind summarizing which models work only with order 1 and which can work with higher orders?
Does the implementation require complete data (no missing values)? If yes, I am thinking of imputing the missing values and building another series that indicates whether each row was missing or not. What do you think? Is that good practice?
I see there is a hyperparameter search module, but I couldn't find examples to try it out. Do you have any? My experiments with FTS on my data show that the type of transformation, the order and the model have a large impact on the result; different options give very different results. So I think automatic hyperparameter search would be very helpful. However, I haven't tried a different partitioner yet.

Questions that apply to the multivariate scenario:

How does the multivariate approach work here? Can the other series that act as external information be used as future information (using the external information at t+1 to predict the target y at t+1)?
What if I can't use future values of the external information (using t+0 as the latest external information to predict the target y at t+1)? Does the implementation support that?

Please advise,

thank you

About the issue raised earlier

Hi! What was the outcome of the issue raised on 12.07.2019 by georgevarelas? There was no follow-up. My question is: is a processing time of 4 hours normal for the datasets in this notebook?
Sincerely, Eugene.

will it work for multivariate time series

Great code, thanks.
Could you clarify:
Will it work for multivariate time series prediction, both regression and classification,
1. where all values are continuous, or
2. where the values are a mixture of continuous and categorical values?
For example, 2 dimensions have continuous values and 3 dimensions are categorical values:

   color   weight  gender  height  age
1  black   56      m       160     34
2  white   77      f       170     54
3  yellow  87      m       167     43
4  white   55      m       198     72
5  white   88      f       176     32

Exception on Chen - ConventionalFTS.ipynb

I downloaded the notebook and changed cell 9 (after section 1.4 Partitioning) to use the local node instead of the distributed setup, and I got exceptions:

from pyFTS.partitioners import Grid, Util as pUtil
from pyFTS.benchmarks import benchmarks as bchmk
from pyFTS.models import chen

tag = 'chen_partitioning'
_type = 'point'

for dataset_name in dataset_names:
    dataset = get_dataset(dataset_name)

    bchmk.sliding_window_benchmarks(dataset, 1000, train=0.8, inc=0.2,
                                    methods=[chen.ConventionalFTS],
                                    benchmark_models=False,
                                    transformations=[None],
                                    partitions=np.arange(10,100,2), 
                                    progress=False, type=_type,
                                    distributed=False,# nodes=['192.168.0.110', '192.168.0.107','192.168.0.106'],
                                    file="benchmarks.db", dataset=dataset_name, tag=tag)

    bchmk.sliding_window_benchmarks(dataset, 1000, train=0.8, inc=0.2,
                                    methods=[chen.ConventionalFTS],
                                    benchmark_models=False,
                                    transformations=[tdiff],
                                    partitions=np.arange(3,30,1), 
                                    progress=False, type=_type,
                                    distributed=False,# nodes=['192.168.0.110', '192.168.0.107', '192.168.0.106'],
                                    file="benchmarks.db", dataset=dataset_name, tag=tag)

EXCEPTION! CFTS 1 Grid 27 Differential(1)
Traceback (most recent call last):
File "/home/steve/anaconda3/lib/python3.7/site-packages/pyFTS/benchmarks/benchmarks.py", line 281, in sliding_window_benchmarks
job = experiment_method(deepcopy(model), deepcopy(partitioner), train, test, **kwargs)
File "/home/steve/anaconda3/lib/python3.7/site-packages/pyFTS/benchmarks/benchmarks.py", line 367, in run_point
mfts.fit(train_data, **kwargs)
File "/home/steve/anaconda3/lib/python3.7/site-packages/pyFTS/common/fts.py", line 384, in fit
self.train(mdata, **kwargs)
File "/home/steve/anaconda3/lib/python3.7/site-packages/pyFTS/models/chen.py", line 53, in train
tmpdata = self.partitioner.fuzzyfy(data, method='maximum', mode='sets')
File "/home/steve/anaconda3/lib/python3.7/site-packages/pyFTS/partitioners/partitioner.py", line 144, in fuzzyfy
mv = self.fuzzyfy(inst, **kwargs)
File "/home/steve/anaconda3/lib/python3.7/site-packages/pyFTS/partitioners/partitioner.py", line 157, in fuzzyfy
tmp = self[ix].membership(data)
File "/home/steve/anaconda3/lib/python3.7/site-packages/pyFTS/partitioners/partitioner.py", line 288, in __getitem__
raise ValueError("The fuzzy set index must be between 0 and {}.".format(self.partitions))
ValueError: The fuzzy set index must be between 0 and 27.

upper and lower bounds of universe of discourse

Hello, I want to ask you something. By default, you extend the upper and lower bounds of the universe of discourse by 10%, but I want to try a different value. How can I change it from 10% to another value?
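
To make the question concrete, here is a minimal sketch of what widening the bounds by a margin could look like. It is plain Python, not the pyFTS API: the function name universe_bounds and the margin parameter are hypothetical, and applying the margin to the data range (rather than to the minimum and maximum values themselves) is an assumption.

```python
def universe_bounds(data, margin=0.10):
    """Return (lower, upper) bounds widened by `margin` times the data range.

    Hypothetical helper for illustration; `margin=0.10` mirrors the 10%
    default mentioned in the question.
    """
    lo, hi = min(data), max(data)
    pad = margin * (hi - lo)
    return lo - pad, hi + pad

default_bounds = universe_bounds([0, 50, 100])              # 10% margin
wider_bounds = universe_bounds([0, 50, 100], margin=0.25)   # 25% margin
```

A larger margin leaves more room for test values that fall outside the training range, at the cost of spending some fuzzy sets on regions with no training data.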

Issue Regarding Multiple Step Forecasting

Hi Petrônio,

I am currently working on a real-world problem which is to forecast the next 24-hrs demand. However, when I try to run model.predict(input_data, steps_ahead=24), it only gives me constant values that are not reasonable.

I referred to the article you published, and the results in the graph look quite nice. I am keen to know how to achieve this multi-step forecasting. Are any pre-processing steps needed for the input test data before predicting?

In addition, I noticed that some of the examples you gave are as follows:

ax[count].plot(dataset[train_split:train_split+200])
model1 = cUtil.load_obj('model1'+dataset_name+str(order))
forecasts = model1.predict(dataset[train_split:train_split+200])
ax[count].plot(forecasts)

From my understanding, if the input test data is from time t, the forecasts should be starting from time (t+1). We need to use the previous data to predict the next one. Then ax[count].plot(dataset[train_split:train_split+200]) should be changed to ax[count].plot(dataset[train_split+1:train_split+201]), right?

Looking forward to your reply!
Thank you so much!
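
The alignment question can be illustrated with a small sketch. It assumes, as the post does, that a forecast computed from the window ending at time t targets time t+1; the helper names and the persistence model are hypothetical, and whether pyFTS's predict returns exactly one forecast per window is not confirmed here.

```python
def one_step_forecasts(series, order, model):
    """One forecast per window; the window series[i:i+order] targets index i+order."""
    return [model(series[i:i + order]) for i in range(len(series) - order + 1)]

def align(series, forecasts, order):
    """Pair each actual value with the forecast that targets it.

    The first `order` actuals have no forecast, and the last forecast
    targets a point beyond the end of the series, so both are dropped.
    """
    targets = series[order:]
    return list(zip(targets, forecasts[:len(targets)]))

def persistence(lags):
    """Stand-in model for the example: predict the last observed value."""
    return lags[-1]

series = [1, 2, 3, 4]
forecasts = one_step_forecasts(series, order=1, model=persistence)
pairs = align(series, forecasts, order=1)  # (actual, forecast) tuples
```

Under these assumptions, plotting the forecasts against the actual values shifted by the model order, as the post proposes, compares each forecast with the point it was meant to predict.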