
nowcasting's Introduction


nowcasting

An R Package for Forecasting Models with Real-Time Data.

The nowcasting package contains useful tools for working with dynamic factor models. This version of the package presents three methods, based on the articles of Giannone et al. (2008) and Bańbura et al. (2011). The package also offers auxiliary functions to treat variables, construct vintages, visualize results, etc.

The package is in development. Reviews, comments and pull requests are welcome.

Installation

GitHub

devtools::install_github('nmecsys/nowcasting')

CRAN

install.packages('nowcasting')

How to use the nowcasting package

The examples below show how the nowcasting package can be used. The first, in two parts, nowcasts US GDP: part 1 uses the dataset from Giannone et al. (2008), a time-series panel for the US economy, and part 2 uses the data of the Federal Reserve Bank of New York. The second example shows how to make forecasts based on pseudo-real-time vintages using Brazilian data. It also shows how to use information criteria to determine the number of factors and shocks in the model.

nowcasting US GDP 1/2

The dataset used in Giannone et al. (2008) can be added to the environment with the following command.

library(nowcasting)
data(USGDP)

USGDP is a list with two data frames:

  • USGDP$base: an unbalanced panel with the model's variables;
  • USGDP$legend: the legend, with the relevant information for each variable in the USGDP$base data frame.

The nowcast function requires stationary variables, which can be obtained with Bpanel. This function creates a balanced panel from an unbalanced one. By default, missing observations and outliers are replaced using the outlier-correction methodology of Giannone et al. (2008). The function includes the most common transformations for obtaining stationary variables. For this example, the object USGDP$legend contains all the transformations used in Giannone et al. (2008). All the explanatory variables are monthly, while GDP is quarterly, as captured by the frequency vector. Note, however, that the data object is a monthly mts. Quarterly variables are usually included as monthly time series in which the first two months of each quarter are NA. In this example, the quarterly value is repeated.

data <- Bpanel(base = USGDP$base,
              trans = USGDP$legend$Transformation,
              aggregate = FALSE)
              
frequency <- c(rep(12, ncol(data) -1), 4)
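The base R sketch below uses made-up numbers, not USGDP data, to show the two ways of storing a quarterly series in a monthly ts:

- NA in the first two months of each quarter;
- the quarterly value repeated in all three months.

The vector freq is a hypothetical frequency vector for a panel of three monthly series followed by GDP. It is built the same way as frequency above.

```r
# Hypothetical quarterly GDP growth values for 2000Q1-2000Q4 (made-up numbers)
q_values <- c(1.2, 0.8, 1.5, 0.9)

# Convention 1: the quarterly value sits in the third month of each quarter;
# the first two months of the quarter are NA
gdp_na <- ts(as.vector(rbind(NA, NA, q_values)), start = c(2000, 1), frequency = 12)

# Convention 2: the quarterly value is repeated in all three months
gdp_rep <- ts(rep(q_values, each = 3), start = c(2000, 1), frequency = 12)

# Hypothetical frequency vector: three monthly series followed by quarterly GDP
freq <- c(rep(12, 3), 4)

cbind(gdp_na, gdp_rep)
```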

Once the variables have been treated, the nowcast function can estimate the model's parameters. The estimation depends on the selected method, the number r of dynamic factors, the lag order p of the factors and the number q of shocks to the factors. For this example we use the two-stage with aggregation method ('2s_agg'), which aggregates the monthly factors as in Mariano and Murasawa (2003). The arguments r, p and q were set following Giannone et al. (2008).

nowcastUSGDP <- nowcast(formula = RGDPGR ~ ., data = data, r = 2, p = 2, q = 2, 
                    method = '2s_agg', frequency = frequency)

The in-sample evaluation of Giannone et al. (2008) can be reproduced by examining the ACF of the residuals of the model specified above.

res <- ts(nowcastUSGDP$reg$residuals, start = start(data), frequency = 4)
acf(window(res, start = c(1985,1), end = c(2004,4)))
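The sketch below computes the lag-k sample autocorrelation by hand and compares it with stats::acf(). This is the quantity the ACF check measures. It uses simulated white noise as a placeholder for the model residuals.

The qnorm(0.975)/sqrt(n) band is the usual approximate 95% white-noise band drawn by plot.acf. Residual autocorrelations well outside it suggest dynamics the model has not captured.

```r
# Sample autocorrelation at lag k, the quantity stats::acf() reports:
# r_k = sum_{t=1}^{n-k} (e_t - mean(e)) (e_{t+k} - mean(e)) / sum_t (e_t - mean(e))^2
sample_acf <- function(e, k) {
  e <- as.numeric(e)
  n <- length(e)
  d <- e - mean(e)
  sum(d[1:(n - k)] * d[(k + 1):n]) / sum(d^2)
}

# Placeholder residuals (simulated white noise), NOT the model's residuals
set.seed(42)
e <- ts(rnorm(80), start = c(1985, 1), frequency = 4)

r <- acf(e, lag.max = 8, plot = FALSE)
band <- qnorm(0.975) / sqrt(length(e))  # approximate 95% white-noise band

# Lags (1..8) whose autocorrelation falls outside the band
which(abs(r$acf[-1]) > band)
```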

The results can be accessed from the object nowcastUSGDP.

# y forecasts
tail(nowcastUSGDP$yfcst,8)

# the regression between y and its factors can be accessed using `$reg`.
summary(nowcastUSGDP$reg)

# the results related to the estimation of factors 
tail(nowcastUSGDP$factors$dynamic_factors) # factors
head(nowcastUSGDP$factors$Lambda) # Lambda matrix
nowcastUSGDP$factors$A # A matrix
nowcastUSGDP$factors$BB # BB': u's variance covariance matrix (factor equation)
diag(nowcastUSGDP$factors$Psi) # Psi: epsilon's variance covariance matrix (x equation)

# the forecasts of the explanatory variables are in `$xfcst`.
tail(nowcastUSGDP$xfcst[,1:5]) # x forecasts (first 5 variables)

The graphs available through the nowcast.plot function let the user visualize some results of interest.

# y fcst
nowcast.plot(nowcastUSGDP, type = "fcst")

# factors
nowcast.plot(nowcastUSGDP, type = "factors") 

# how much of the variability in the dataset is explained by each factor
nowcast.plot(nowcastUSGDP, type = "eigenvalues")

# importance of each variable in the first factor
nowcast.plot(nowcastUSGDP, type = "eigenvectors") 
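The eigenvalues and eigenvectors plots can be read with a principal-components picture in mind. The sketch below reflects our assumption about the idea, not the code behind nowcast.plot. On a simulated panel with one common factor, it does three things:

- takes the eigen decomposition of the correlation matrix;
- turns the eigenvalues into shares of total variance;
- keeps the first eigenvector as each variable's weight in the first factor.

```r
# Simulated panel: 10 series driven by one common factor (hypothetical data)
set.seed(1)
f <- rnorm(120)
x <- sapply(1:10, function(i) 0.8 * f + rnorm(120, sd = 0.6))
colnames(x) <- paste0("x", 1:10)

# Eigen decomposition of the correlation matrix (eigenvalues sorted decreasingly)
e <- eigen(cor(x), symmetric = TRUE)

# Share of total variance associated with each eigenvalue
share <- e$values / sum(e$values)

# Weight of each variable in the first eigenvector
first_vec <- setNames(e$vectors[, 1], colnames(x))

round(share, 3)
```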

nowcasting US GDP 2/2

In this example we work with the data that the Federal Reserve Bank of New York uses in its weekly nowcasting report. The explanatory variables have mixed frequencies, including both monthly and quarterly series.

library(nowcasting)
data(NYFED)

As in the previous example, the object NYFED contains all the information needed to run the nowcast function. The block structure, the transformations that make the variables stationary, and the frequencies of the variables can be loaded as follows.

base <- NYFED$base
blocks <- NYFED$blocks$blocks
trans <- NYFED$legend$Transformation
frequency <- NYFED$legend$Frequency

The dataset x can be prepared with Bpanel. With the two-stage method, missing values that are not part of the jagged edge were replaced. With the EM algorithm, we do not want to replace them. This is done by telling the function not to replace those missing values, i.e. NA.replace = FALSE. We also do not want to discard series with many missing values, so we set na.prop = 1.

x <- Bpanel(base = base, trans = trans, NA.replace = FALSE, na.prop = 1)
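The hypothetical helper trailing_na() below (not part of the package) makes the distinction between jagged-edge and interior missing values concrete. It counts the trailing NAs of each series, i.e. the jagged edge caused by publication lags. Interior NAs, such as the second value of b, are not counted. The interior values are the ones whose replacement NA.replace = FALSE switches off.

```r
# Hypothetical helper: number of trailing NAs per series (the "jagged edge")
trailing_na <- function(x) {
  apply(as.matrix(x), 2, function(col) {
    obs <- which(!is.na(col))
    # a series with no observations is entirely "edge"
    if (length(obs) == 0) length(col) else length(col) - max(obs)
  })
}

# Toy panel: b has an interior NA (row 2) and a one-month edge; c has a two-month edge
m <- cbind(a = c(1, 2, 3, 4),
           b = c(1, NA, 3, NA),
           c = c(1, 2, NA, NA))
trailing_na(m)
```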

We use the same settings as the NY Fed: the number of factors r per block is limited to one, and the factor process is a VAR(1). The algorithm displays the convergence of the log-likelihood function every 5 iterations. Here x contains the entire dataset, including GDPC1, and the left-hand side of the formula names the variable being forecast.

nowEM <- nowcast(formula = GDPC1 ~ ., data = x, r = 1, p = 1,
                 method = "EM", blocks = blocks, frequency = frequency)

The forecasts can be visualized using the function nowcast.plot as illustrated below.

nowcast.plot(nowEM)

Nowcasting Brazilian GDP using vintages

A vintage is a dataset as observed on a specific date. Vintages are useful for evaluating the out-of-sample performance of a model. The nowcasting package contains a dataset of Brazilian economic time series.

library(nowcasting)
data(BRGDP)

The PRTDB function constructs pseudo-real-time vintages of any dataset. It removes observations from each time series based on the publication-lag information provided by the user, simulating what would have been observed on the reference date. Here we construct a 10-year database ending on 2015-06-01.

vintage <- PRTDB(mts = BRGDP$base, delay = BRGDP$delay, vintage = "2015-06-01")
base <- window(vintage, start = c(2005,06), frequency = 12)
x <- Bpanel(base = base, trans = BRGDP$trans)
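The idea behind PRTDB can be sketched in a few lines of base R. The helper pseudo_vintage() below is hypothetical and is not the package's implementation. It truncates the panel at the vintage date and blanks the last delay[i] observations of series i. For simplicity we assume delays are measured in months; the units of BRGDP$delay may differ.

```r
# Hypothetical helper (not PRTDB): keep only what would have been observable on
# vintage_date, ASSUMING series i is published with a lag of delay[i] months.
pseudo_vintage <- function(x, delay, vintage_date) {
  x <- window(x, end = vintage_date)      # drop everything after the vintage date
  n <- nrow(x)
  for (i in seq_len(ncol(x))) {
    k <- min(delay[i], n)
    if (k > 0) x[(n - k + 1):n, i] <- NA  # not yet released on vintage_date
  }
  x
}

# Toy monthly panel, January 2014 to December 2015
panel <- ts(matrix(1:48, ncol = 2, dimnames = list(NULL, c("a", "b"))),
            start = c(2014, 1), frequency = 12)
v <- pseudo_vintage(panel, delay = c(0, 2), vintage_date = c(2015, 6))
tail(v, 3)
```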

The variable to be forecast is then made stationary. The month2qtr function can also be used to convert GDP into a quarterly variable.

GDP <- base[,which(colnames(base) == "PIB")]
y <- diff(diff(GDP,3),12)
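The transformation diff(diff(GDP, 3), 12) takes the change over three months and then compares it with the same change twelve months earlier. The sketch below checks this reading on a synthetic series (not BRGDP data). Each value equals (x_t - x_{t-3}) - (x_{t-12} - x_{t-15}), and the first 15 months are lost.

```r
# Synthetic monthly level series (made-up numbers), starting in January 2005
gdp <- ts((1:60)^1.5, start = c(2005, 1), frequency = 12)

# Same transformation as above: 3-month difference, then 12-month difference
y <- diff(diff(gdp, 3), 12)

# The first transformed value, written out: t = 16 (April 2006)
g <- as.numeric(gdp)
manual_t16 <- (g[16] - g[13]) - (g[4] - g[1])

start(y)  # 2006, month 4: the first 15 months are lost
```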

Information criteria can be used in order to help determine the number of factors r and shocks to the factors q that the model should have.

ICR1 <- ICfactors(x = x, type = 1)
ICR2 <- ICfactors(x = x, type = 2)
ICQ1 <- ICshocks(x = x, r = 2, p = 2)
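The sketch below shows how such a criterion works by implementing the IC_p1 criterion of Bai and Ng (2002) on a simulated panel with two factors. We have not checked which criteria ICfactors computes for type = 1 and type = 2, so treat bai_ng_icp1() as a hypothetical stand-in. It chooses the k that minimizes the log of the mean squared residual from a k-component principal-components fit, plus a penalty that grows with k.

```r
# Hypothetical sketch of the Bai and Ng (2002) IC_p1 criterion (an assumption
# about what ICfactors does, not its source code). x: balanced T x N panel.
bai_ng_icp1 <- function(x, rmax = 8) {
  x <- scale(as.matrix(x))                 # standardize each series
  n_t <- nrow(x)
  n_s <- ncol(x)
  s <- svd(x)                              # principal components via SVD
  penalty <- (n_s + n_t) / (n_s * n_t) * log((n_s * n_t) / (n_s + n_t))
  ic <- sapply(seq_len(rmax), function(k) {
    # rank-k fit from the first k principal components
    fit <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], k) %*%
      t(s$v[, 1:k, drop = FALSE])
    v_k <- sum((x - fit)^2) / (n_s * n_t)  # mean squared idiosyncratic residual
    log(v_k) + k * penalty
  })
  list(ic = ic, r = which.min(ic))
}

# Simulated panel: 200 periods, 50 series, two common factors plus noise
set.seed(1)
f <- matrix(rnorm(200 * 2), 200, 2)
lambda <- matrix(rnorm(50 * 2), 50, 2)
x_sim <- f %*% t(lambda) + matrix(rnorm(200 * 50, sd = 0.5), 200, 50)
bai_ng_icp1(x_sim)$r
```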

The user is now ready to forecast the variable of interest. The summary of the regression can be accessed as illustrated below.

data <- cbind(y,x)
colnames(data) <- c("y",colnames(x))
frequency <- c(4,rep(12,ncol(x)))
now <- nowcast(formula = y ~ ., data = data, r = 2, q = 2, p = 2, frequency = frequency)
summary(now$reg)

Finally, the in-sample and out-of-sample forecasts for this particular vintage can be visualized with the nowcast.plot function.

nowcast.plot(now, type = "fcst")

nowcasting's People

Contributors

daianemarcolino, greedblink, guilbran, stjdevalk


nowcasting's Issues

Column order matters?

Not sure if this is a bug or my own ignorance, but I've noticed that the ordering of the columns in the data seems to matter. Example:

library(nowcasting)
data(NYFED)

base <- NYFED$base
blocks <- NYFED$blocks$blocks
trans <- NYFED$legend$Transformation
frequency <- NYFED$legend$Frequency

#randomly generated reordering
shuffle <- c(7, 11, 24, 21, 10, 15, 2,1, 22, 25, 19, 5, 18, 17, 4, 12, 16, 6, 14, 23, 8, 9, 3, 13, 20)

base_shuffle <- base[,shuffle]
frequency_shuffle <- frequency[shuffle]
blocks_shuffle <- blocks[shuffle,]
trans_shuffle <- trans[shuffle]

x <- Bpanel(base = base, trans = trans, NA.replace = F, na.prop = 1)
x_shuffle <- Bpanel(base = base_shuffle, trans = trans_shuffle, NA.replace = F, na.prop = 1)

nowEM <- nowcast(formula = GDPC1 ~ ., data = x, r = 1, p = 1, 
                 method = "EM", blocks = blocks, frequency = frequency)
nowEM_shuffle <- nowcast(formula = GDPC1 ~ ., data = x_shuffle, r = 1, p = 1, 
                 method = "EM", blocks = blocks_shuffle, frequency = frequency_shuffle)

You can verify that nowEM$yfcst and nowEM_shuffle$yfcst are different. Why is this? Do you have any recommendations for a "correct" column order?

Nowcasting with the EM algorithm without imposing block restrictions on the factors.

Hello,

I want to start by infinitely thanking the contributors/builders of this package! It has greatly smoothed the process of my application. However, I stumbled across a problem and I really hope that someone can help me with it!

I want to nowcast my dataset using the EM algorithm with only 1 factor, without the block-structure restrictions on the variables.
I really hope that someone can guide me on how to adapt the nowcast function for this, or tell me whether this can be achieved with this package.

Thank you!
Best,
Carlos.

Back to CRAN?

Thanks for the package!

Do you intend to bring this back to CRAN? The dependency 'matlab' is back on CRAN, so it should be easy enough?

Since you wrote a nice article for the R Journal, it would be fantastic to have it back.

remNANs_spline() should not fail when no row has more than 80% NaNs

The following code fails when rowSums(indNaN) <= N*0.8 for every row. In that case nanLead is empty, so X[-nanLE,] contains no rows, t1 and t2 become infinite, and the function stats::spline() fails.

Instead, the code should work in all cases, including when no row is mostly NaNs.
I don't understand why there is this 80% threshold.

}else if(options$method == 2){ # replace missing values after removing leading and closing zeros
   
   rem1 <- (rowSums(indNaN)>N*0.8)
   nanLead <- which(rem1)
   # nanEnd <- which(rem1[length(rem1):1])
   # nanLE <- c(nanEnd,nanLead)
   nanLE<-nanLead
   X<-X[-nanLE,]
   indNaN=is.na(X)
   for (i in 1:N){  
     x = X[,i]
     isnanx = is.na(x)
     t1 = min(which(!isnanx))
     t2 = max(which(!isnanx))
     
     x1<-stats::spline(x[t1:t2],xout = 1:(t2-t1+1))
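The failure can be reproduced in base R: negative indexing with an empty vector, X[-integer(0), ], selects no rows rather than all of them. A minimal guard is sketched below, keeping the 80% threshold from the snippet above. The name drop_sparse_rows() is hypothetical, not a function in the package.

```r
# Hypothetical helper: drop rows in which more than `threshold` of the series
# are missing, but only when there is at least one such row.
drop_sparse_rows <- function(X, threshold = 0.8) {
  indNaN <- is.na(X)
  nanLE <- which(rowSums(indNaN) > ncol(X) * threshold)
  # X[-integer(0), ] selects zero rows, so skip the subsetting when nanLE is empty
  if (length(nanLE) > 0) X <- X[-nanLE, , drop = FALSE]
  X
}

X <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4)
nrow(X[-integer(0), ])     # the pitfall: 0
nrow(drop_sparse_rows(X))  # 4
```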

Wrong result while replicating NYFED model

Hi!

First of all, thank you for this great package, it is a pleasure to see a lot of tools (especially vintages based on the delay) made in such a simple form.

I've been working with an NYFED-like model for several years for Ukraine, but I've done it in MATLAB (where the original model is coded). Then I was asked to replicate it in R, my primary language.

I've found several possible issues, one of which is that the results from the MATLAB and R code don't match. The databases are exactly the same. There was an issue with the transformations (they differ in the raw code), but I've fixed it and applied the same transformations in both cases (yoy for all variables; I know that economically it's nonsense, but I want to simplify the statistical part as much as possible).

But the results still aren't the same. The intriguing part is that not only does the nowcast differ, but the filled-in (filtered) values for missing observations, for example GDP, are different too.

Thank you for your attention and help!

Using quarterly data to forecast a quarterly variable

Sorry in advance, but I honestly don't know where I can ask for help with this question.
I would like advice on using explanatory variables with frequency = 4 to forecast GDP, which also has frequency = 4. Is that possible with the 2s and 2s_agg methods?
Thank you in advance; any help will be much appreciated.

Question

Right at the beginning of the "nowcast" function there is this code:

  if(is.null(q) & is.null(r) & is.null(p)){
    warnings('Parameters q, r and p must be specified.')
  }

From what I understand, the warning text doesn't match the code as written. Entering the if statement requires q, r and p to all be NULL, so the statement is skipped as soon as one of them is not NULL. The text I think should go with it would therefore be "Parameters q, r or p must be specified."

Take a look at this when you can. Thanks!
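The behaviour described in this issue can be checked with a small stand-alone sketch. The function check_params() is hypothetical, not the package's code. It keeps the all-NULL condition and uses the wording suggested in the issue.

It also calls warning() rather than warnings(). In base R, warnings() only prints warnings that were already stored and does not raise a new one. The original line therefore would not signal anything.

```r
# Hypothetical stand-alone version of the check quoted above.
check_params <- function(q = NULL, r = NULL, p = NULL) {
  # The condition is TRUE only when all three parameters are NULL,
  # so the message asks for at least one of them.
  if (is.null(q) && is.null(r) && is.null(p)) {
    warning("Parameters q, r or p must be specified.")  # warning(), not warnings()
  }
  invisible(NULL)
}

check_params()       # signals the warning
check_params(r = 2)  # silent: one parameter is given
```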

Kalman filtering and EM in Rcpp

Hi,

It's great to finally see someone making a decent nowcasting package for R. A few years ago I made this (there's a more up-to-date version on GitLab) but never took the time to package it into something nice.

One thing I recall is that R was pretty slow at running EM, especially with long and large datasets, which led me to rewrite the Kalman filter and EM algorithm in RcppArmadillo instead. I guess there is a significant amount of overlap, and some of the C++ code might still be reusable in your case.

You went much further in the implementation though. 👍

Source code reference

I was testing the nowcast function, only to find that there is a function called EM_DFM_SS_block_idioQARMA_restrMQ in Nowcast.R

Res <- EM_DFM_SS_block_idioQARMA_restrMQ(x,Par)

Could you tell me how I can read the source code of this function? I may need to change some parameters. Thanks

Dynamic factors

I noticed that the dynamic factors shown are not the same as the ones used in the regression of y. Why does this happen?
