Data-Driven Time-Series Prediction


Chapter 5 ARX

The ARX (Auto-Regressive with eXtra input, or Auto-Regressive with eXogenous input) model extends the AR approach by incorporating an additional input signal \(x[n]\) that influences the output \(y[n]\). The term "exogenous" indicates that the additional input signal comes from outside the system, unlike an AR model that relies solely on past values of the output.

Systems classification Two classes of models:

  • An endogenic/endogenous system is a system without inputs.

  • An exogenic/exogenous system is a system with inputs.

  • Goal: Extend the AR model to the ARX model.

The \(ARX(p,q)\) model is given by

\begin{equation} \begin{aligned} y[n] &= h_1y[n-1] + \cdots + h_py[n-p] \\ &\quad + b_1x[n-1] + \cdots + b_qx[n-q] + \epsilon [n], \end {aligned} \end{equation}

where

  • \(h_i\) are AR coefficients related to the past of \(y[n]\),

  • \(b_i\) are the coefficients that relate the current output \(y[n]\) to past values of the exogenous input \(x[n]\),

  • \(\epsilon [n]\) is noise or modeling error.
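
To make the definition concrete, the following minimal sketch (Python/NumPy) simulates data from a hypothetical ARX(2,2) model; the coefficient values, signal length, and noise level are arbitrary choices for illustration and are not part of the text.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ARX(2,2) coefficients, chosen only for illustration.
h = np.array([0.6, -0.2])   # AR coefficients acting on y[n-1], y[n-2]
b = np.array([0.5, 0.3])    # input coefficients acting on x[n-1], x[n-2]

L = 500
x = rng.standard_normal(L)            # exogenous input
eps = 0.1 * rng.standard_normal(L)    # noise / modeling error
y = np.zeros(L)

for n in range(2, L):
    # y[n] = h1*y[n-1] + h2*y[n-2] + b1*x[n-1] + b2*x[n-2] + eps[n]
    y[n] = h @ y[n-2:n][::-1] + b @ x[n-2:n][::-1] + eps[n]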

5.1 Cross-Correlation Function

  • Goal: Whereas the ACF measures how a single signal correlates with its own time-shifted versions, the cross-correlation function measures the relationship between two different signals.

The goal is to predict \(y[n]\) from \(x[n-k]\), i.e., using a single exogenous term at lag \(k\) with coefficient \(b_k\),

\begin{equation} \hat {y}[n]=b_k x[n-k]. \end{equation}

The resulting MSE-based loss function is of the form

\begin{equation} \mathcal {L}(b_k) = \frac {1}{2}\sum _n \left (y[n] - b_kx[n-k]\right )^2 \end{equation}

The coefficient is found by setting the derivative to zero,

\begin{equation} \frac {d\mathcal {L}(b_k)}{db_k} =\sum _n (y[n]-b_kx[n-k])(-x[n-k])=0 \end{equation}

The corresponding solution is

\begin{equation} b_k = \frac {\sum _n y[n]x[n-k]}{\sum _n x^2[n-k]}. \end{equation}

Cross-Correlation Function The resulting coefficients are related to the cross-correlation function,

\begin{equation} \label {eq-ccf} R_{\bx \by }[k]=\sum _n x[n]y[n-k],\quad k=-L+1,\ldots ,L-1 \end{equation}

Similar to the ACF, cross-correlation can also be defined in biased, unbiased, or normalized forms:

\begin{align} R_{\bxy ,biased}[k] &= \frac {1}{L}R_\bxy [k]\\ R_{\bxy ,unbiased}[k] &= \frac {1}{L-\abs {k}}R_\bxy [k]\\ R_{\bxy ,norm}[k] &= \frac {R_\bxy [k]}{\sqrt {R_\bx [0]R_\by [0]}} \end{align} Note that these modifications apply only if \(x[n]\) and \(y[n]\) are of the same length. Otherwise, only Eq. (5.6) is used.

The normalized cross-correlation function can be interpreted as a correlation coefficient,

\begin{equation} R_{\bxy ,norm}[k] \approx \rho _{\bxy }[k] \end{equation}

Properties:

\begin{align} R_{\mathbf {xy}}[k] &= R_{\mathbf {yx}}[-k]\\ R_{\mathbf {xy}}[-k] &= R_{\mathbf {yx}}[k]\\ \left | R_{\mathbf {xy}}[k]\right | &\leqslant \sqrt {R_\mathbf {x}[0]R_\mathbf {y}[0]}\\ \left | R_{\mathbf {xy}}[k]\right | &\leqslant \frac {1}{2}\left [R_\mathbf {x}[0]+R_\mathbf {y}[0]\right ] \end{align}

(image)

(a) Cross-correlation

(image)

(b) \(y[n]\) vs. \(x[n-k]\)
Figure 5.1: Illustration of the linear dependence between \(y[n]\) and \(x[n-k]\).

  • Example 5.1: The solution in Eq. (3.12) is

    \begin{equation} \hat {A} = \frac {R_\bxy [0]}{R_\bx [0]} \end{equation}

Interpretation If there is a strong correlation at some lag \(k\), it suggests that \(y[n]\) is influenced by \(x[n-k]\). In an ARX setting, identifying the lag \(k\) at which the cross-correlation peaks can guide the selection of \(q\) and help determine which past inputs are most relevant for predicting \(y[n]\).
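
As a sketch of this lag-selection idea, the helper below (an illustrative function, not from the text) computes the normalized cross-correlation of Eq. (5.6) for two equal-length signals and picks the lag with the strongest correlation; the single-lag coefficient \(b_k\) then follows from the LS solution above. The toy data use an arbitrary delayed-and-scaled relation assumed only for demonstration.

import numpy as np

def xcorr_norm(x, y, max_lag):
    """Normalized cross-correlation R_xy[k] = sum_n x[n] y[n-k],
    for k = -max_lag, ..., max_lag (equal-length signals assumed)."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    denom = np.sqrt(np.sum(x**2) * np.sum(y**2))    # sqrt(R_x[0] R_y[0])
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(lags.size)
    for i, k in enumerate(lags):
        if k >= 0:
            r[i] = np.sum(x[k:] * y[:len(y) - k])
        else:
            r[i] = np.sum(x[:len(x) + k] * y[-k:])
    return lags, r / denom

# Toy data: y[n] is (approximately) a delayed, scaled copy of x[n] plus noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(400)
y = np.roll(0.8 * x, 3) + 0.1 * rng.standard_normal(400)   # delay of 3 samples

lags, r = xcorr_norm(y, x, max_lag=10)    # sum_n y[n] x[n-k]: compare y[n] with x[n-k]
k_star = lags[np.argmax(np.abs(r))]       # expected to be near 3
# Single-lag LS coefficient (assuming k_star > 0, as here)
b_k = np.sum(y[k_star:] * x[:len(x) - k_star]) / np.sum(x[:len(x) - k_star] ** 2)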

5.1.1 Cross-Covariance Function

For simplicity, zero-mean signals, \(\bar {x}[n]=\bar {y}[n]=0\), were assumed. When either signal has a non-zero mean, subtracting the signal average before the cross-correlation calculation yields what is termed the cross-covariance function.

This parallels the relationship between the auto-correlation and auto-covariance functions in Sec. 4.1.5.

5.2 ARX(0,q) model

The ARX(0,q) model describes a scenario where the output \(y[n]\) depends purely on the past values of an external (exogenous) input \(x[n]\), without feedback from its own past outputs [6, Example 4.3, p. 90]:

\begin{equation} \begin{aligned} y[n] &= b_1x[n-1] + \cdots + b_{q}x[n-q] + \epsilon [n]\\ &= \sum _{k=1}^q b_kx[n-k] + \epsilon [n] \end {aligned} \end{equation}

In matrix form, the past values are arranged into a matrix \(\bX \),

\begin{equation} \underbrace {\begin{bmatrix} \hat {y}[1] \\ \hat {y}[2]\\ \vdots \\ \hat {y}[L-1] \end {bmatrix}}_{\hat {\by }} = \underbrace {\begin{bmatrix} x[0] & 0 & \cdots & 0 \\ x[1] & x[0] & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ x[L-2] & x[L-3] & \cdots & x[L-1-q] \end {bmatrix}}_{\bX } \underbrace {\begin{bmatrix} b_1 \\ \vdots \\ b_{q} \end {bmatrix}}_{\bb } \end{equation}

with \(\hat {\by }\in \real ^{L-1},\bX \in \real ^{(L-1)\times q},\bb \in \real ^{q}\). The resulting \(\bb \) coefficients are found by the corresponding LS minimization. Similar to the AR model, the solution involves the corresponding auto-correlations \(R_{\bx \bx }[k]\) resulting from \(\bX ^T\bX \) and cross-correlations \(R_{\bx \by }[k]\) resulting from \(\bX ^T\by \). This reveals that the solution leverages both the structure of the input’s autocorrelation and the input-output cross-correlation.

Interpretation The model is essentially a linear filter of \(x[n]\).
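
A minimal LS-fitting sketch of this model (the function name and toy data below are illustrative assumptions, not part of the text) builds the matrix \(\bX \) above and solves for \(\bb \):

import numpy as np

def fit_arx_0q(x, y, q):
    """Least-squares fit of an ARX(0,q) model, y[n] ~ sum_{k=1..q} b_k x[n-k].
    Builds the X matrix above (rows n = 1, ..., L-1, zeros for negative indices)."""
    L = len(y)
    X = np.zeros((L - 1, q))
    for n in range(1, L):
        for k in range(1, q + 1):
            if n - k >= 0:
                X[n - 1, k - 1] = x[n - k]
    b, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return b

# Toy usage: y is an FIR-filtered version of x plus noise, so the model is exact.
rng = np.random.default_rng(0)
x = rng.standard_normal(300)
y = np.convolve(x, [0.0, 0.7, 0.2, -0.1])[:300] + 0.05 * rng.standard_normal(300)
b_hat = fit_arx_0q(x, y, q=3)   # expected to be close to [0.7, 0.2, -0.1]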

5.3 General ARX model

In a general ARX(p,q) model, the output is represented as a linear combination of both its own past values and the past values of an exogenous input. The LS formulation involves a matrix \(\bX \) constructed from past values of \(x[n]\) and \(y[n]\), shifted according to the lags involved. The weight vector \(\bw \) includes both the \(h_i\) and the \(b_k\) values.

  • Example 5.2: ARX(3,3) model with signals

    \begin{align*} x[n] &= x[0],x[1],\ldots ,x[7]\\ y[n] &= y[0],y[1],\ldots ,y[7] \end{align*} The required difference equation is

    \begin{equation} \begin{aligned} \hat {y}[n] &= h_1y[n-1] + h_2y[n-2] + h_3y[n-3] \\ &\quad + b_1x[n-1] + b_2x[n-2] + b_3x[n-3] \end {aligned} \end{equation}

    Find the prediction \(\hat {y}[8]\).

    • Solution: The coefficients are given by

      \begin{equation} \arg \min \limits _{\bw }\norm {\by -\bX \bw } \end{equation}

      where

      \begin{align} \bX = \begin{bmatrix} x[0] & 0 & 0 & y[0] & 0 & 0 \\ x[1] & x[0] & 0 & y[1] & y[0] & 0 \\ x[2] & x[1] & x[0] & y[2] & y[1] & y[0] \\ x[3] & x[2] & x[1] & y[3] & y[2] & y[1] \\ x[4] & x[3] & x[2] & y[4] & y[3] & y[2] \\ x[5] & x[4] & x[3] & y[5] & y[4] & y[3] \\ x[6] & x[5] & x[4] & y[6] & y[5] & y[4] \end {bmatrix}, \\ \bw = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ h_1 \\ h_2 \\ h_3 \end {bmatrix}, \by = \begin{bmatrix} y[1] \\ y[2] \\ y[3]\\ y[4]\\ y[5] \\y[6] \\ y[7] \end {bmatrix}\nonumber \end{align} The prediction of \(\hat {y}[8]\) is straightforward after finding the prediction coefficients by LS minimization. The resulting calculation is comprised of the corresponding \(R_{\bx \bx }[k]\) and \(R_{\bx \by }[k]\) values.
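
A numerical sketch of this solution (the function name is a hypothetical choice; x and y are assumed to hold the eight samples of the example) is:

import numpy as np

def predict_y8(x, y):
    """LS fit of the ARX(3,3) model of Example 5.2 from x[0..7], y[0..7],
    followed by the one-step prediction of y[8]."""
    rows, targets = [], []
    for n in range(1, 8):
        x_lags = [x[n - k] if n - k >= 0 else 0.0 for k in (1, 2, 3)]
        y_lags = [y[n - k] if n - k >= 0 else 0.0 for k in (1, 2, 3)]
        rows.append(x_lags + y_lags)       # one row of the X matrix above
        targets.append(y[n])
    X = np.array(rows)                     # shape (7, 6)
    w, *_ = np.linalg.lstsq(X, np.array(targets), rcond=None)
    b, h = w[:3], w[3:]                    # w = [b1, b2, b3, h1, h2, h3]
    y8_hat = h @ [y[7], y[6], y[5]] + b @ [x[7], x[6], x[5]]
    return w, y8_hat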

A synthetic prediction example is shown in Fig. 5.2, where the input is a noise-corrupted binary signal processed through an unknown filter. The ARX model, using the estimated parameters, approximates this filtering process and predicts future outputs.

(image)

Figure 5.2: Example of ARX model-based prediction. The input is a noise-corrupted binary signal passed through a filter.

5.4 Time-Domain Filtering

  • Goal: Use the AR model for signal enhancement. There are two common versions of this problem, involving different signal and noise combinations, which differ by the signals available during training. We assume both the signal and the noise are stationary. In the first one, the clean and the noisy versions are available,

    \begin{equation} \begin{cases} x[n]\\ y[n]=x[n] + \epsilon [n] \end {cases} \end{equation}

    In the second one (Wiener filter), the noise and the noisy versions are available,

    \begin{equation} \begin{cases} \epsilon [n]\\ y[n]=x[n] + \epsilon [n] \end {cases} \end{equation}

    In either case, the learned AR(p) model can be used.

Noise filtering (denoising)

We have training data \(\left \lbrace x[n], y[n]\right \rbrace _{n=0}^{L-1}\) and we are interested in learning coefficients \(h_k\) such that

\begin{equation} \hat {x}[n] = \sum _{k=0}^{p}h_k y[n-k] \end{equation}

The key idea is to construct the AR matrix \(\bX \) (as in (4.36)) from the shifted versions of \(y[n]\) rather than \(x[n]\). By fitting such a model, the resulting coefficients effectively learn how to reconstruct the clean signal from the noisy input. The matrices are

\begin{equation} \begin{aligned} \bX &= \begin{bmatrix} y[p] & y[p-1] & \cdots & y[0]\\ y[p+1] & y[p] & \cdots & y[1]\\ \vdots & \vdots & \ddots & \vdots \\ y[L-1] & y[L-2] & \cdots & y[L-1-p] \end {bmatrix},\\ \by &= \begin{bmatrix} x[p] \\ x[p+1] \\ \vdots \\ x[L-1] \end {bmatrix} \end {aligned} \end{equation}

and the resulting coefficients can be found by the normal equation.

Using the notation in Sec. 4.3.1, the problem can be rewritten in terms of a matrix \(\bR \) with \(R_\byy [k]\) elements and a vector \(\br \) with \(R_\bxy [k]\) elements. The resulting \(h_k\) values are optimal in the MSE sense.
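
A sketch of this denoising setup (illustrative function names; the sinusoidal toy signal is an arbitrary assumption) builds the matrices above and solves the normal equation:

import numpy as np

def fit_denoising_filter(x_clean, y_noisy, p):
    """LS estimate of the taps h_0, ..., h_p mapping the noisy signal to the clean one."""
    L = len(y_noisy)
    X = np.array([[y_noisy[n - k] for k in range(p + 1)] for n in range(p, L)])
    target = np.asarray(x_clean)[p:]
    h = np.linalg.solve(X.T @ X, X.T @ target)      # normal equation
    return h

def denoise(h, y_noisy):
    """x_hat[n] = sum_k h_k y[n-k], i.e., causal FIR filtering of the noisy signal."""
    return np.convolve(y_noisy, h)[:len(y_noisy)]

# Toy usage: a slow sinusoid observed in additive noise.
n = np.arange(500)
x_clean = np.sin(2 * np.pi * 0.01 * n)
y_noisy = x_clean + 0.3 * np.random.default_rng(0).standard_normal(500)
h = fit_denoising_filter(x_clean, y_noisy, p=10)
x_hat = denoise(h, y_noisy)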

This approach sets the foundation for adaptive filtering methods, where the model continuously adjusts its parameters to best estimate the clean signal under changing noise conditions.

5.5 ARMAX

  • Goal: The ARMAX model extends the ARMA and ARX models by combining their elements to capture more complex dynamics.

The model combines ARMA terms (signal history and noise) with an exogenous input (as in the ARX model):

\begin{equation} \begin{aligned} y[n] &= h_1y[n-1] + \cdots + h_{n_a}y[n-n_a] \\ &+ b_1x[n-1] + \cdots + b_{n_b}x[n-n_b]\\ &+ c_1\epsilon [n-1] + \cdots + c_{n_c}\epsilon [n-n_c]+ \epsilon [n] \end {aligned} \end{equation}

5.6 ARI, ARIMA, ARIMAX

  • Goal: Handle time series with a trend.

When time series data exhibit trends, the basic AR, MA, and ARMA models may not be directly suitable. Trends refer to a systematic change in the mean level of the series, often increasing or decreasing over time. To accommodate such trends, de-trending can be applied before fitting ARMA-type models.

De-trending

The basic model with linear trend is

\begin{equation} y[n] = A + Bn + \epsilon [n], \end{equation}

where \(B\) is the slope of the trend. Let’s define

\begin{equation} \begin{aligned} y[n] - y[n-1] &= A + Bn + \epsilon [n] - A - B(n-1) - \epsilon [n-1]\\ &= B + \epsilon [n] - \epsilon [n-1] \end {aligned} \end{equation}

This is known as first-order differencing; it effectively removes the constant slope.

The quadratic (or parabolic) trend is given by

\begin{equation} y[n] = an^2 + bn + c + \epsilon [n] \end{equation}

Applying the differencing twice,

\begin{align*} y'[n] &= y[n] - y[n-1]\\ y''[n] &= y'[n] - y'[n-1]\\ &= y[n] - y[n-1] - (y[n-1] - y[n-2]) \\ &= y[n] - 2y[n-1] + y[n-2] \end{align*} can remove a quadratic trend.

In practice, differencing is often done as a preliminary step. If a single differencing is needed to achieve stationarity, this is referred to as \(d=1\); if twice, \(d=2\), and so forth.
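
A minimal differencing sketch (the trend parameters are arbitrary values assumed for illustration):

import numpy as np

rng = np.random.default_rng(0)
n = np.arange(200)

y_lin = 2.0 + 0.05 * n + rng.standard_normal(200)             # linear trend + noise
d1 = np.diff(y_lin)         # first difference (d = 1): slope removed, mean close to B = 0.05

y_quad = 0.01 * n**2 + 0.05 * n + rng.standard_normal(200)    # quadratic trend + noise
d2 = np.diff(y_quad, n=2)   # second difference (d = 2): quadratic trend removed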

ARI(p,d) An AR(p) model applied to data that have been differenced \(d\) times to remove the trend is termed ARI(p,d). The "I" stands for "Integrated," indicating differencing to remove the trend.

ARIMA(p,d,q) An ARMA model with de-trending is termed ARIMA(p,d,q); ARIMA(p,0,q) is simply ARMA(p,q).

ARIMAX(p,d,q) Similar to ARMAX, ARIMAX includes exogenous inputs (X) along with the ARIMA model.
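
In practice, such models are usually fitted with an existing library rather than hand-built matrices. The sketch below assumes the statsmodels package is available; the toy data, the chosen order (p, d, q) = (2, 1, 1), and the variable names are placeholders for illustration.

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Toy data: y depends on past x through a short FIR filter plus a linear trend.
rng = np.random.default_rng(0)
x = rng.standard_normal(205)
y = np.convolve(x, [0.0, 0.5, 0.3])[:205] + 0.05 * np.arange(205)

model = SARIMAX(y[:200], exog=x[:200], order=(2, 1, 1))   # ARIMAX(p=2, d=1, q=1)
result = model.fit(disp=False)
forecast = result.forecast(steps=5, exog=x[200:].reshape(-1, 1))   # next 5 exogenous samples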

5.7 Non-linear AR and ARX Models

A non-linear autoregressive (NAR) model replaces the linear combination \(h_{1}y[n-1]+\cdots +h_{p}y[n-p]\) with a non-linear mapping implemented by a non-linear function approximator, e.g., a neural network:

\begin{equation} \hat {y}[n]= f(y[n-1],\ldots , y[n-p];\bw ) + \epsilon [n] \end{equation}

where \(f(\cdot ;\bw )\) is the model and \(\bw \) its weights.

For a neural network with purely linear activations, the NAR(\(p\)) model collapses to the classical AR(\(p\)) model.

The corresponding NARX model is

\begin{equation} \hat {y}[n] = f(y[n-1],\ldots ,y[n-p],x[n-1],\ldots ,x[n-q];\bw ) + \epsilon [n] \end{equation}

The network learns an arbitrary non-linear mapping from the selected lags of \(y[n]\) and \(x[n]\) to the current output.
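
One possible realization of the NARX idea (using scikit-learn's MLPRegressor as the non-linear approximator; the lag orders, network size, and toy data are arbitrary illustration choices) is:

import numpy as np
from sklearn.neural_network import MLPRegressor

def narx_features(x, y, p, q):
    """Stack the selected lags of y and x into a feature matrix for a NARX(p,q) model."""
    start = max(p, q)
    feats = [np.concatenate([y[n - p:n][::-1], x[n - q:n][::-1]])
             for n in range(start, len(y))]
    return np.array(feats), np.asarray(y)[start:]

# Toy data with a mildly non-linear dependence on the past output.
rng = np.random.default_rng(0)
x = rng.standard_normal(600)
y = np.zeros(600)
for n in range(3, 600):
    y[n] = (0.5 * np.tanh(y[n - 1]) + 0.4 * x[n - 1] - 0.2 * x[n - 2]
            + 0.05 * rng.standard_normal())

X_feat, y_target = narx_features(x, y, p=3, q=3)
f = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
f.fit(X_feat, y_target)
y_hat = f.predict(X_feat)   # in-sample one-step-ahead predictions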

5.8 Multivariate AR

The goal is multivariate AR prediction of an \(N\)-dimensional signal \(\by [n]\) from its \(L\) historical values,

\begin{equation} \begin{Bmatrix} \begin{bmatrix} y_0[0]\\[3pt] y_1[0]\\[3pt] \vdots \\ y_{N-1}[0] \end {bmatrix}, \begin{bmatrix} y_0[1]\\[3pt] y_1[1]\\[3pt] \vdots \\ y_{N-1}[1] \end {bmatrix} \begin{matrix} \cdots \\[3pt] \cdots \\[3pt] \cdots \\[5pt] \cdots \end {matrix} \begin{bmatrix} y_0[L-1]\\[3pt] y_1[L-1]\\[3pt] \vdots \\ y_{N-1}[L-1] \end {bmatrix} \end {Bmatrix} \end{equation}

VAR(1) Model

For example, the VAR(1) model of a 2-dimensional signal \((N=2)\) is

\begin{equation} \begin{aligned} \begin{bmatrix} \hat {y}_0[n] \\[5pt] \hat {y}_1[n] \end {bmatrix} =\begin{bmatrix} a_{00} & a_{01}\\[5pt] a_{10} & a_{11}\\ \end {bmatrix} \begin{bmatrix} y_0[n-1]\\[5pt] y_1[n-1]\\ \end {bmatrix} + \begin{bmatrix} \epsilon _0[n]\\[5pt] \epsilon _1[n]\\ \end {bmatrix} \end {aligned} \end{equation}

In compressed matrix notation,

\begin{equation} \hat {\by }[n] = \bA \by [n-1] + \bm {\epsilon }[n] \end{equation}

where \(\bA \in \real ^{N\times N},\bm {\epsilon }[n]\in \real ^N\).

History The relation to the MA model is similar to the univariate case,

\begin{equation} \begin{aligned} \hat {\by }[n] &= \bA \by [n-1] + \bm {\epsilon }[n]\\ &= \bA (\bA \by [n-2] + \bm {\epsilon }[n-1]) + \bm {\epsilon }[n]\\ &= \bA ^2\by [n-2] + \bA \bm {\epsilon }[n-1] + \bm {\epsilon }[n]\\ &= \bA ^3\by [n-3] + \bA ^2\bm {\epsilon }[n-2] + \bA \bm {\epsilon }[n-1]+ \bm {\epsilon }[n]\\ \end {aligned} \end{equation}

The stability criterion is \(\abs {\lambda _{max}(\bA )}<1\).

Biased form In biased form,

\begin{equation} \hat {\by }[n] = \bm {\mu } + \bA \by [n-1] + \bm {\epsilon }[n] \end{equation}

with biases \(\bm {\mu } = \begin {bmatrix}\mu _0 \\ \mu _1\end {bmatrix}\).

VAR(p) Model

The further extension to a history of \(p\) values is straightforward,

\begin{equation} \hat {\by }[n] = \bA _1\by [n-1] + \bA _2\by [n-2] + \ldots + \bA _p\by [n-p] + \bm {\epsilon }[n] \end{equation}

Again, a biased version is possible.

Coefficients

The LS solution is straightforward once the \(\by [n]\) values are organized into the corresponding LS problem matrices.
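
A sketch of this organization (illustrative function name; the toy VAR(1) data are assumed for demonstration) stacks the past vectors into a regression matrix and solves by LS; the last line checks the stability criterion above.

import numpy as np

def fit_var(Y, p):
    """LS fit of a VAR(p) model. Y has shape (N, L), one row per signal component.
    Returns the coefficient matrices [A_1, ..., A_p], each of shape (N, N)."""
    N, L = Y.shape
    # Each regressor row stacks the p past vectors [y[n-1]; ...; y[n-p]].
    Z = np.array([np.concatenate([Y[:, n - k] for k in range(1, p + 1)])
                  for n in range(p, L)])                      # shape (L-p, N*p)
    T = Y[:, p:].T                                            # targets y[n], shape (L-p, N)
    W, *_ = np.linalg.lstsq(Z, T, rcond=None)                 # shape (N*p, N)
    return [W[k * N:(k + 1) * N, :].T for k in range(p)]

# Toy usage: simulate a stable VAR(1) and recover A.
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.1], [-0.2, 0.4]])
Y = np.zeros((2, 500))
for n in range(1, 500):
    Y[:, n] = A_true @ Y[:, n - 1] + 0.1 * rng.standard_normal(2)

A_hat = fit_var(Y, p=1)[0]
stable = np.all(np.abs(np.linalg.eigvals(A_hat)) < 1)   # |lambda_max(A)| < 1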

Diagonal \(\bA \) matrix If matrix \(\bA \) is diagonal, the model reduces to a set of univariate independent models.

Trending The standard VAR model does not provide de-trending capabilities similar to the univariate ARI model. If de-trending is required, it is part of the signal pre-processing pipeline.

Bibliography

  • [1]  Tomas Andersson. Selected topics in frequency estimation. PhD thesis, KTH Royal Institute of Technology, 2003.

  • [2]  Dima Bykhovsky. Experimental lognormal modeling of harmonics power of switched-mode power supplies. Energies, 15(2), 2022.

  • [3]  Dima Bykhovsky and Asaf Cohen. Electrical network frequency (ENF) maximum-likelihood estimation via a multitone harmonic model. IEEE Transactions on Information Forensics and Security, 8(5):744–753, 2013.

  • [4]  Sharon Gannot, Zheng-Hua Tan, Martin Haardt, Nancy F Chen, Hoi-To Wai, Ivan Tashev, Walter Kellermann, and Justin Dauwels. Data science education: The signal processing perspective [sp education]. IEEE Signal Processing Magazine, 40(7):89–93, 2023.

  • [5]  Monson H Hayes. Statistical Digital Signal Processing and Modeling. John Wiley & Sons, 1996.

  • [6]  Steven M. Kay. Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Prentice Hall, 1993.

  • [7]  Boaz Porat. Digital processing of random signals: theory and methods. Courier Dover Publications, 2008.