Introduction
State-space models were originally developed by control engineers, particularly for applications that require continuously updating an estimate of the current position. An example, from the field of navigation systems, is updating a user equipment’s position. The models have since found increasing use in many types of time-series problems, including parameter estimation, smoothing, and prediction. Structural time-series models are state-space models for time-series data. They are useful in practice because they are
flexible: a very large class of models can be expressed in state-space form, including all ARIMA and VARMA models;
modular: the model can be assembled from a library of state-component sub-models that capture important features of the data. Several widely used state components are available for capturing trend, seasonality, or the effects of holidays.
The bsts R package (short for Bayesian structural time series) is a tool for fitting structural time-series models using Bayesian methods. It can be configured for short-term or long-term forecasting, can incorporate one or more seasonal effects, and can fit explanatory models when forecasting is not the primary goal.
General Form
A general form of (univariate) structural time-series model can be written as \[\begin{split} & y_t = Z_t^T \alpha_t + \epsilon_t, \qquad (\text{observation equation})\\ & \alpha_{t+1} = T_t\alpha_{t} + R_t\eta_t, \qquad (\text{transition or state equation})\\ & \epsilon_t\sim N(0,\sigma^2_t),\quad \eta_t \sim N(0, Q_t) \end{split}\] where \(y_t\) is the observed value of the time series at time \(t\), \(\alpha_t\) is an \(m\)-dimensional state vector, \(Z_t\) is an \(m\)-dimensional output vector, \(T_t\) is an \(m\times m\) transition matrix, \(R_t\) is an \(m\times q\) control matrix, \(\epsilon_t\) is a scalar observation error, and \(\eta_t\) is a \(q\)-dimensional system error with a \(q\times q\) state-diffusion matrix \(Q_t\), where \(q\leq m\).
The observation equation links the observed data \(y_t\) with the unobserved latent state \(\alpha_t\).
The state vector \(\alpha_t\) is of prime importance and is usually unobserved or only partially known. Although it may not be directly observable, it is often reasonable to assume we know how it changes over time; this updating rule is the transition (state) equation above, which defines how the latent state evolves.
The arrays \(Z_t, T_t, R_t\) typically contain a mix of known values (often 0 and 1) and unknown parameters. The 0’s and 1’s indicate which elements of \(\alpha_t\) enter a particular computation.
The errors \(\epsilon_t\) and \(\eta_t\) are generally assumed to be serially uncorrelated, and to be uncorrelated with each other at all time periods.
The term \(R_t\eta_t\) allows us to incorporate state components of less than full rank; the seasonal model below is the most important example.
The analyst chooses the structure of \(\alpha_t\) based on the data and the task at hand: whether the goal is short-term or long-term forecasting, whether the data contain seasonal effects, and whether and how regressors are to be included. Many of these models are standard and can be fit with a variety of tools, such as the StructTS function distributed with base R, or one of several R packages for fitting these models, such as the dlm package for dynamic linear models. The bsts package handles all the standard cases, but it also includes several useful extensions.
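As a concrete instance of the general form, take \(m = q = 1\) with \(Z_t = 1\), \(T_t = 1\), \(R_t = 1\), and \(Q_t = \sigma^2_\eta\); the state is then a scalar level that follows a random walk, \[\begin{split} & y_t = \mu_t + \epsilon_t,\\ & \mu_{t+1} = \mu_t + \eta_t, \end{split}\] which is the local level model described in the Trend section below.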
State Components
Static intercept
We can add a static intercept term to a model, \[y_t = c+\epsilon_t,\] where \(c\) is a constant value. If the structural time-series model includes a traditional trend component (e.g. local level, local linear trend, etc) then a separate intercept is not needed (and will probably cause trouble, as it will be confounded with the initial state of the trend model). However, if there is no trend, or the trend is an AR process centered around zero, then adding a static intercept will shift the center to a data-determined value.
# R code
ss <- list()
ss <- bsts::AddStaticIntercept(ss, y)
Trend
Local level
The local level model assumes the trend is a random walk: \[\begin{split} &y_t = \mu_t +\epsilon_t,\\ &\mu_{t+1} = \mu_{t} + \eta_{t} \end{split}\]
# R code
ss <- list()
ss <- bsts::AddLocalLevel(ss, y)
AR
An autoregressive (AR) model specifies that the current state depends linearly on its own previous values and on a stochastic term. In general, the AR model of order \(p\), i.e. \(AR(p)\), can be written as \[\begin{split} &y_t = \mu_t +\epsilon_t,\\ &\mu_{t+1} = \sum_{i=0}^{p-1} \phi_i \mu_{t-i} + \eta_{t} \end{split}\] where \(\phi_0,\cdots,\phi_{p-1}\) are the parameters of the model.
# R code
ss <- list()
ss <- bsts::AddAr(ss, y, lags = p)
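In the general state-space form, an AR component stacks the \(p\) most recent values of \(\mu\) into the state vector. For example, with \(p = 2\), \[Z_t = \left[\begin{smallmatrix}1\\0\end{smallmatrix}\right],\quad \alpha_t = \left[\begin{smallmatrix}\mu_t\\\mu_{t-1}\end{smallmatrix}\right],\quad T_t = \left[\begin{smallmatrix}\phi_0 & \phi_1\\ 1 & 0\end{smallmatrix}\right],\quad R_t = \left[\begin{smallmatrix}1\\0\end{smallmatrix}\right],\] so that \(\mu_{t+1} = \phi_0\mu_t + \phi_1\mu_{t-1} + \eta_t\) and \(y_t = \mu_t + \epsilon_t\).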
The bsts package also supports a sparse AR(p) process for large \(p\), in which a spike-and-slab prior is placed on the autoregression coefficients \(\phi_0,\cdots, \phi_{p-1}\). This model differs from the one in AddAr() only in that some of its coefficients may be set to zero.
# R code
ss <- list()
ss <- bsts::AddAutoAr(ss, y, lags = p)
Local linear trend
The local linear trend is a popular choice for modelling trends because it quickly adapts to local variation, which is desirable when making short-term predictions. However, this degree of flexibility may not be desired when making long-term predictions, as such predictions often come with implausibly wide uncertainty intervals.
A local linear trend model for \(y_t\) can be written as, \[\begin{split} & y_t = \mu_t +\epsilon_t, \\ & \mu_{t+1} = \mu_{t} + \delta_t + \eta_{\mu,t}, \qquad \text{(stochastic level component)}\\ & \delta_{t+1} = \delta_t + \eta_{\delta,t}, \qquad \text{(stochastic slope component)}\\ & \eta_{\mu, t} \sim N(0,\sigma^2_{\mu, t}),\quad \eta_{\delta, t} \sim N(0,\sigma^2_{\delta, t}) \end{split}\] where \(\mu_t\) is the value of the trend at time \(t\), \(\delta_t\) is the expected increase in \(\mu\) between time \(t\) and \(t+1\), so it can be thought of as the slope at time \(t\), and \(\eta_{\mu,t}\) and \(\eta_{\delta, t}\) are error terms independent of each other. A local linear trend allows both the level (\(\mu_t\)) and the slope (\(\delta_t\)) to be stochastic: each follows a random walk.
The model can also be expressed in the general form \[\begin{split} & y_t = [1\quad 0]\left[\begin{matrix}\mu_t\\\delta_t\end{matrix}\right] +\epsilon_t, \\ & \left[\begin{matrix}\mu_{t+1}\\\delta_{t+1}\end{matrix}\right] = \left[\begin{matrix}1 & 1\\ 0 & 1 \end{matrix}\right] \left[\begin{matrix}\mu_t\\\delta_t\end{matrix}\right] + \left[\begin{matrix}1 & 0 \\ 0 & 1 \end{matrix}\right] \left[\begin{matrix}\eta_{\mu, t}\\\eta_{\delta, t}\end{matrix}\right], \end{split}\] where \(Z_t = (1,0)^T\), \(\alpha_t = (\mu_t, \delta_t)^T\), \(T_t = \left[\begin{smallmatrix}1 & 1\\ 0 & 1 \end{smallmatrix}\right]\), \(R_t=\left[\begin{smallmatrix}1 & 0 \\ 0 & 1 \end{smallmatrix}\right]\), and \(\eta_t = (\eta_{\mu,t}, \eta_{\delta, t})^T\).
# R code
ss <- list()
ss <- bsts::AddLocalLinearTrend(ss, y)
Semi-local linear trend
The semi-local linear trend is similar to the local linear trend, but more useful for long-term forecasting. The model can be written as \[\begin{split} & y_t = \mu_t +\epsilon_t, \\ & \mu_{t+1} = \mu_{t} + \delta_t + \eta_{\mu,t}, \qquad \text{(stochastic level component)}\\ & \delta_{t+1} = a + \phi\times (\delta_t-a) + \eta_{\delta,t}, \qquad \text{(stochastic slope component)}\\ & \eta_{\mu, t} \sim N(0,\sigma^2_{\mu, t}),\quad \eta_{\delta, t} \sim N(0,\sigma^2_{\delta, t}) \end{split}\] where the slope component \(\delta_{t+1}\) follows an AR(1) process centered at \(a\). A stationary AR(1) process is less variable than a random walk, so it often gives more reasonable uncertainty estimates for long-term forecasts.
The model can be expressed in the general form with \(\alpha_t = (\mu_t, \delta_t-a, 1)^T\), \(Z_t = (1, 0, 0)^T\), \(T_t = \left[\begin{smallmatrix}1 & 1 & a \\ 0 & \phi & 0 \\ 0 & 0 & 1 \end{smallmatrix}\right]\), \(R_t=\left[\begin{smallmatrix}1 & 0 \\ 0 & 1\\0&0 \end{smallmatrix}\right]\), and \(\eta_t = (\eta_{\mu,t}, \eta_{\delta, t})^T\); the constant third state component carries the long-run slope \(a\) into the level update.
# R code
ss <- list()
ss <- bsts::AddSemilocalLinearTrend(ss, y)
Seasonality
Regression with Seasonal Dummy Variables
There are several commonly used state-component models to capture seasonality. The most frequently used model in the time domain is \[\begin{split} & y_t = \gamma_t +\epsilon_t, \\ & \gamma_{t+d} = - \sum_{i=0}^{s-2}\gamma_{t-i\times d} + \eta_{\gamma, t}, \end{split}\] where \(s\) is the number of seasons and \(d\) is the seasonal duration (the number of time periods in each season, often set to 1). The model can be thought of as a regression on \(s\) dummy variables representing the \(s\) seasons, where \(\gamma_{t}\) denotes their joint contribution to the observed response \(y_t\). The mean of \(\gamma_{t+d}\) is such that the total seasonal effect is zero when summed over \(s\) seasons (i.e. \(E(\gamma_{t+d}+\sum_{i=0}^{s-2}\gamma_{t-i\times d}) = 0\)). The model can be rewritten as \[\begin{split} & y_t = [1\quad 0 \quad \cdots\quad 0]\left[\begin{matrix}\gamma_{t}\\\gamma_{t-d}\\ \vdots\\ \gamma_{t-(s-2)d}\end{matrix}\right] +\epsilon_t, \\ & \left[\begin{matrix}\gamma_{t+d}\\\gamma_t\\\gamma_{t-d}\\ \vdots\\ \gamma_{t-(s-3)d}\end{matrix}\right] = \left[\begin{matrix} -1 & -1 & \cdots & -1 & -1 \\ 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ \end{matrix}\right] \left[\begin{matrix}\gamma_{t}\\\gamma_{t-d}\\\gamma_{t-2d}\\\vdots \\ \gamma_{t-(s-2)d}\end{matrix}\right] + \left[\begin{matrix}1\\0\\0\\ \vdots\\ 0\end{matrix}\right]\eta_{\gamma, t} \end{split}\] where the state vector and transition matrix each have \(s-1\) rows.
The seasonal model can be generalized to allow for multiple seasonal components with different periods.
# R code
# Suppose that y is a time series collected hourly
ss <- list()
# daily seasonality
ss <- bsts::AddSeasonal(ss, y, nseasons = 24, season.duration = 1)
# weekly seasonality
ss <- bsts::AddSeasonal(ss, y, nseasons = 7, season.duration = 24)
Trigonometric Seasonal model
Another way to model seasonality is with a trigonometric seasonal model, \[\begin{split} & y_t = \gamma_t +\epsilon_t, \\ & \gamma_{j, t+1} = \gamma_{j,t}\times \cos(\lambda_j) + \gamma^*_{j,t}\times \sin(\lambda_j) + \omega_{j, t},\\ & \gamma^*_{j, t+1} = -\gamma_{j,t}\times \sin(\lambda_j) + \gamma^*_{j,t}\times \cos(\lambda_j) + \omega^*_{j,t},\\ \end{split}\] where \(\gamma_t = \sum_{j=1}^{k}\gamma_{j,t}\), \(\lambda_j = 2\pi j/s\) is the \(j\)-th seasonal frequency, \(j = 1,\cdots, k\), and \(s\) is the number of time steps required for the longest cycle to repeat (i.e. the number of seasons). The auxiliary state \(\gamma^*_{j,t}\) exists only so that each cycle can be written as a first-order recursion.
# R code
ss <- list()
# The harmonic method is strongly preferred to the direct method.
# For a time series collected hourly with daily
# seasonality, we can try
# the component below, where frequencies = 1:4 specifies the frequencies of the sine and cosine functions we will use.
ss <- bsts::AddTrig(ss, y, period = 24, frequencies = 1:4, method = "harmonic")
Linear regression
The covariates in structural time-series models are assumed to be contemporaneous with the response. The coefficients of the contemporaneous covariates \(\mathbf{x}_t\) can be static or time-varying.
A static regression can be written in state-space form by setting \(Z_t = \beta^T \mathbf{x}_t\) and \(\alpha_t =1\), where \(\beta\) is static.
# R code
ss <- list()
bsts::bsts(y ~ x, ss, niter = 500)
A dynamic regression component can be written as \[\begin{split} & \mathbf{x}_t^T\beta_t = \sum_{j=1}^J x_{j,t} \beta_{j,t}\\ & \beta_{j, t+1} = \beta_{j,t} + \eta_{\beta,j,t}, \end{split}\] where \(\beta_{j,t}\) is the coefficient for the \(j\)-th covariate \(x_{j,t}\) at time \(t\). The state-space form is given as \(Z_t = \mathbf{x}_t, \alpha_t =\beta_t, T_t = R_t = I_{J\times J}\).
# R code
ss <- list()
ss <- bsts::AddDynamicRegression(ss, y ~ x)
bsts::bsts(y, ss, niter = 500)
Assembling multiple state components
Independent state components can be combined by concatenating their observation vectors \(Z_t\) and arranging the other model matrices as blocks in a block-diagonal matrix. For example, combining the local linear trend with seasonality gives the following model matrices: \[Z_t = \left[\begin{smallmatrix} 1 \\ 0 \\ 1 \\ 0 \\ \vdots\\ 0\end{smallmatrix}\right],\quad T_t = \left[\begin{smallmatrix} 1 & 1 & & & & & \\ 0 & 1 & & & & & \\ & & -1 & -1 & \cdots & -1 & -1 \\ & & 1 & 0 & \cdots & 0 & 0\\ & & 0 & 1 & \cdots & 0 & 0 \\ & & \vdots & \vdots & \ddots & \vdots & \vdots \\ & & 0 & 0 & \cdots & 1 & 0 \\ \end{smallmatrix}\right],\quad R_t=\left[\begin{smallmatrix} 1 & 0 & \\ 0 & 1 & \\ & & 1 \\ & & 0 \\ & & \vdots \\ & & 0 \\ \end{smallmatrix}\right] \]
# R code
ss <- list()
ss <- bsts::AddLocalLinearTrend(ss, y)
ss <- bsts::AddSeasonal(ss, y, nseasons = s, season.duration = d)
bsts::bsts(y, ss, niter = 500)
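Once fit, a model can produce forecasts through the predict method for bsts objects; a minimal sketch, reusing the state specification above (the 12-step horizon is arbitrary):
# R code
model <- bsts::bsts(y, state.specification = ss, niter = 500)
pred <- predict(model, horizon = 12)  # posterior predictive draws 12 steps ahead
plot(pred)                            # forecast path with credible intervals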
Similarly, we can add other state components, such as a regression, a local level, an AR process, or a random walk for holiday effects (see the bsts package documentation for more details). As mentioned in the Introduction, the model is modular and can be easily extended using the bsts package. For example,
# R code
ss <- list()
ss <- bsts::AddLocalLevel(ss, y)
ss <- bsts::AddAr(ss, y)
ss <- bsts::AddSeasonal(ss, y, nseasons = 24)
ss <- bsts::AddDynamicRegression(ss, y ~ x1)
bsts::bsts(y ~ x2, ss, niter = 500)
Some examples of applications to real data analyses can be found in the references below.
Model diagnostics
Prediction errors
As part of the model fitting process, the algorithm in bsts generates the one-step-ahead prediction errors \(y_t - E(y_t\mid Y_{t-1},\theta)\), where \(Y_{t-1}=y_1,\cdots,y_{t-1}\), and the vector of model parameters \(\theta\) is fixed at its current value in the MCMC algorithm. The one-step-ahead prediction errors can be obtained from the bsts model by calling
# R code
errors <- bsts::bsts.prediction.errors(model1)
The one-step-ahead prediction errors are a useful diagnostic for comparing several bsts models fit to the same data. They are used to implement the function CompareBstsModels, which is called as shown below.
# R code
CompareBstsModels(list("Model 1" = model1,
"Model 2" = model2,
"Model 3" = model3),
colors = c("black", "red", "blue"))
Model assumptions
In BSTS models, the residual term is often assumed to follow a Gaussian distribution. It is important to check the validity of this assumption.
The output of a bsts model contains a matrix of Monte Carlo draws of the residual errors. Each row is a Monte Carlo draw, and each column is an observation.
The qqdist function sorts the columns of the draws matrix by their means and plots the resulting set of curves against the quantiles of the standard normal distribution. A reference line is added, and the mean of each column of draws is shown as a blue dot. If the dots fall near the straight line, the normality assumption holds well for the residuals. If the dots depart greatly from the line, the normality assumption may be violated; in that case, we may either transform the data to make it more nearly normal or assume a different distribution for the data (e.g. a t-distribution).
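For example, assuming model1 is a fitted bsts model (residuals() applied to a bsts object returns the matrix of residual draws):
# R code
r <- residuals(model1)  # rows are MCMC draws, columns are observations
bsts::qqdist(r)         # Q-Q plot of the residual draws against N(0, 1)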
The AcfDist function plots the posterior distribution of the autocorrelation function (ACF) of the residuals using a set of side-by-side boxplots. If the boxplot at lag \(k\) contains 0, we may consider the residuals uncorrelated at lag \(k\). If the ACF does not die out (i.e. fall to zero) within about 15 to 20 lags, the residual term is nonstationary and we should try a different model.
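For example, again assuming model1 is a fitted bsts model (the choice of lag.max = 20 matches the 15-to-20-lag rule of thumb above):
# R code
r <- residuals(model1)          # matrix of residual draws
bsts::AcfDist(r, lag.max = 20)  # side-by-side boxplots of the posterior ACF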
References
Scott, Steven L. (2017). Fitting Bayesian structural time series with the bsts R package. http://www.unofficialgoogledatascience.com/2017/07/fitting-bayesian-structural-time-series.html.
Chatfield, Chris. (2016). The analysis of time series: an introduction. CRC Press.
Montgomery, Douglas C., Cheryl L. Jennings, and Murat Kulahci. (2015). Introduction to time series analysis and forecasting. John Wiley & Sons.