Machine Learning & Signals Learning

$\newcommand{\footnotename}{footnote}$ $\def \LWRfootnote {1}$ $\newcommand {\footnote }[2][\LWRfootnote ]{{}^{\mathrm {#1}}}$ $\newcommand {\footnotemark }[1][\LWRfootnote ]{{}^{\mathrm {#1}}}$ $\let \LWRorighspace \hspace $ $\renewcommand {\hspace }{\ifstar \LWRorighspace \LWRorighspace }$ $\newcommand {\TextOrMath }[2]{#2}$ $\newcommand {\mathnormal }[1]{{#1}}$ $\newcommand \ensuremath [1]{#1}$ $\newcommand {\LWRframebox }[2][]{\fbox {#2}} \newcommand {\framebox }[1][]{\LWRframebox } $ $\newcommand {\setlength }[2]{}$ $\newcommand {\addtolength }[2]{}$ $\newcommand {\setcounter }[2]{}$ $\newcommand {\addtocounter }[2]{}$ $\newcommand {\arabic }[1]{}$ $\newcommand {\number }[1]{}$ $\newcommand {\noalign }[1]{\text {#1}\notag \\}$ $\newcommand {\cline }[1]{}$ $\newcommand {\directlua }[1]{\text {(directlua)}}$ $\newcommand {\luatexdirectlua }[1]{\text {(directlua)}}$ $\newcommand {\protect }{}$ $\def \LWRabsorbnumber #1 {}$ $\def \LWRabsorbquotenumber "#1 {}$ $\newcommand {\LWRabsorboption }[1][]{}$ $\newcommand {\LWRabsorbtwooptions }[1][]{\LWRabsorboption }$ $\def \mathchar {\ifnextchar "\LWRabsorbquotenumber \LWRabsorbnumber }$ $\def \mathcode #1={\mathchar }$ $\let \delcode \mathcode $ $\let \delimiter \mathchar $ $\def \oe {\unicode {x0153}}$ $\def \OE {\unicode {x0152}}$ $\def \ae {\unicode {x00E6}}$ $\def \AE {\unicode {x00C6}}$ $\def \aa {\unicode {x00E5}}$ $\def \AA {\unicode {x00C5}}$ $\def \o {\unicode {x00F8}}$ $\def \O {\unicode {x00D8}}$ $\def \l {\unicode {x0142}}$ $\def \L {\unicode {x0141}}$ $\def \ss {\unicode {x00DF}}$ $\def \SS {\unicode {x1E9E}}$ $\def \dag {\unicode {x2020}}$ $\def \ddag {\unicode {x2021}}$ $\def \P {\unicode {x00B6}}$ $\def \copyright {\unicode {x00A9}}$ $\def \pounds {\unicode {x00A3}}$ $\let \LWRref \ref $ $\renewcommand {\ref }{\ifstar \LWRref \LWRref }$ $ \newcommand {\multicolumn }[3]{#3}$ $\require {textcomp}$ $ \newcommand {\abs }[1]{\lvert #1\rvert } $ $ \DeclareMathOperator {\sign }{sign} $ $\newcommand {\intertext 
}[1]{\text {#1}\notag \\}$ $\let \Hat \hat $ $\let \Check \check $ $\let \Tilde \tilde $ $\let \Acute \acute $ $\let \Grave \grave $ $\let \Dot \dot $ $\let \Ddot \ddot $ $\let \Breve \breve $ $\let \Bar \bar $ $\let \Vec \vec $ $\newcommand {\bm }[1]{\boldsymbol {#1}}$ $\require {physics}$ $\newcommand {\LWRphystrig }[2]{\ifblank {#1}{\textrm {#2}}{\textrm {#2}^{#1}}}$ $\renewcommand {\sin }[1][]{\LWRphystrig {#1}{sin}}$ $\renewcommand {\sinh }[1][]{\LWRphystrig {#1}{sinh}}$ $\renewcommand {\arcsin }[1][]{\LWRphystrig {#1}{arcsin}}$ $\renewcommand {\asin }[1][]{\LWRphystrig {#1}{asin}}$ $\renewcommand {\cos }[1][]{\LWRphystrig {#1}{cos}}$ $\renewcommand {\cosh }[1][]{\LWRphystrig {#1}{cosh}}$ $\renewcommand {\arccos }[1][]{\LWRphystrig {#1}{arcos}}$ $\renewcommand {\acos }[1][]{\LWRphystrig {#1}{acos}}$ $\renewcommand {\tan }[1][]{\LWRphystrig {#1}{tan}}$ $\renewcommand {\tanh }[1][]{\LWRphystrig {#1}{tanh}}$ $\renewcommand {\arctan }[1][]{\LWRphystrig {#1}{arctan}}$ $\renewcommand {\atan }[1][]{\LWRphystrig {#1}{atan}}$ $\renewcommand {\csc }[1][]{\LWRphystrig {#1}{csc}}$ $\renewcommand {\csch }[1][]{\LWRphystrig {#1}{csch}}$ $\renewcommand {\arccsc }[1][]{\LWRphystrig {#1}{arccsc}}$ $\renewcommand {\acsc }[1][]{\LWRphystrig {#1}{acsc}}$ $\renewcommand {\sec }[1][]{\LWRphystrig {#1}{sec}}$ $\renewcommand {\sech }[1][]{\LWRphystrig {#1}{sech}}$ $\renewcommand {\arcsec }[1][]{\LWRphystrig {#1}{arcsec}}$ $\renewcommand {\asec }[1][]{\LWRphystrig {#1}{asec}}$ $\renewcommand {\cot }[1][]{\LWRphystrig {#1}{cot}}$ $\renewcommand {\coth }[1][]{\LWRphystrig {#1}{coth}}$ $\renewcommand {\arccot }[1][]{\LWRphystrig {#1}{arccot}}$ $\renewcommand {\acot }[1][]{\LWRphystrig {#1}{acot}}$ $\require {cancel}$ $\newcommand *{\underuparrow }[1]{{\underset {\uparrow }{#1}}}$ $\DeclareMathOperator *{\argmax }{argmax}$ $\DeclareMathOperator *{\argmin }{arg\,min}$ $\def \E [#1]{\mathbb {E}\!\left [ #1 \right ]}$ $\def \Var [#1]{\operatorname {Var}\!\left [ #1 \right ]}$ $\def \Cov 
[#1]{\operatorname {Cov}\!\left [ #1 \right ]}$ $\newcommand {\floor }[1]{\lfloor #1 \rfloor }$ $\newcommand {\DTFTH }{ H \brk 1{e^{j\omega }}}$ $\newcommand {\DTFTX }{ X\brk 1{e^{j\omega }}}$ $\newcommand {\DFTtr }[1]{\mathrm {DFT}\left \{#1\right \}}$ $\newcommand {\DTFTtr }[1]{\mathrm {DTFT}\left \{#1\right \}}$ $\newcommand {\DTFTtrI }[1]{\mathrm {DTFT^{-1}}\left \{#1\right \}}$ $\newcommand {\Ftr }[1]{ \mathcal {F}\left \{#1\right \}}$ $\newcommand {\FtrI }[1]{ \mathcal {F}^{-1}\left \{#1\right \}}$ $\newcommand {\Zover }{\overset {\mathscr Z}{\Longleftrightarrow }}$ $\renewcommand {\real }{\mathbb {R}}$ $\newcommand {\ba }{\mathbf {a}}$ $\newcommand {\bb }{\mathbf {b}}$ $\newcommand {\bc }{\mathbf {c}}$ $\newcommand {\bd }{\mathbf {d}}$ $\newcommand {\be }{\mathbf {e}}$ $\newcommand {\bf }{\mathbf {f}}$ $\newcommand {\bh }{\mathbf {h}}$ $\newcommand {\bi }{\mathbf {i}}$ $\newcommand {\bn }{\mathbf {n}}$ $\newcommand {\bo }{\mathbf {o}}$ $\newcommand {\bp }{\mathbf {p}}$ $\newcommand {\bq }{\mathbf {q}}$ $\newcommand {\br }{\mathbf {r}}$ $\newcommand {\bs }{\mathbf {s}}$ $\newcommand {\bt }{\mathbf {t}}$ $\newcommand {\bu }{\mathbf {u}}$ $\newcommand {\bv }{\mathbf {v}}$ $\newcommand {\bw }{\mathbf {w}}$ $\newcommand {\bx }{\mathbf {x}}$ $\newcommand {\bxx }{\mathbf {xx}}$ $\newcommand {\bxy }{\mathbf {xy}}$ $\newcommand {\by }{\mathbf {y}}$ $\newcommand {\byy }{\mathbf {yy}}$ $\newcommand {\bz }{\mathbf {z}}$ $\newcommand {\bA }{\mathbf {A}}$ $\newcommand {\bB }{\mathbf {B}}$ $\newcommand {\bC }{\mathbf {C}}$ $\newcommand {\bD }{\mathbf {D}}$ $\newcommand {\bH }{\mathbf {H}}$ $\newcommand {\bI }{\mathbf {I}}$ $\newcommand {\bK }{\mathbf {K}}$ $\newcommand {\bM }{\mathbf {M}}$ $\newcommand {\bP }{\mathbf {P}}$ $\newcommand {\bQ }{\mathbf {Q}}$ $\newcommand {\bR }{\mathbf {R}}$ $\newcommand {\bS }{\mathbf {S}}$ $\newcommand {\bU }{\mathbf {U}}$ $\newcommand {\bW }{\mathbf {W}}$ $\newcommand {\bX }{\mathbf {X}}$ $\newcommand {\bY }{\mathbf {Y}}$ $\newcommand 
{\bZ }{\mathbf {Z}}$ $\newcommand {\balpha }{\bm {\alpha }}$ $\newcommand {\bth }{{\bm {\theta }}}$ $\newcommand {\bepsilon }{{\bm {\epsilon }}}$ $\newcommand {\bmu }{{\bm {\mu }}}$ $\newcommand {\bphi }{\bm {\phi }}$ $\newcommand {\bOne }{\mathbf {1}}$ $\newcommand {\bZero }{\mathbf {0}}$ $\newcommand {\indFunc }{\mathbb {1}}$ $\newcommand {\btx }{\tilde {\bx }}$ $\newcommand {\loss }{\mathcal {L}}$ $\newcommand {\appropto }{\mathrel {\vcenter { \offinterlineskip \halign {\hfil $##$\cr \propto \cr \noalign {\kern 2pt}\sim \cr \noalign {\kern -2pt}}}}}$ $\newcommand {\SSE }{\mathrm {SSE}}$ $\newcommand {\MSE }{\mathrm {MSE}}$ $\newcommand {\RMSE }{\mathrm {RMSE}}$ $\newcommand {\toprule }[1][]{\hline }$ $\let \midrule \toprule $ $\let \bottomrule \toprule $ $\def \LWRbooktabscmidruleparen (#1)#2{}$ $\newcommand {\LWRbooktabscmidrulenoparen }[1]{}$ $\newcommand {\cmidrule }[1][]{\ifnextchar (\LWRbooktabscmidruleparen \LWRbooktabscmidrulenoparen }$ $\newcommand {\morecmidrules }{}$ $\newcommand {\specialrule }[3]{\hline }$ $\newcommand {\addlinespace }[1][]{}$ $\newcommand {\LWRsubmultirow }[2][]{#2}$ $\newcommand {\LWRmultirow }[2][]{\LWRsubmultirow }$ $\newcommand {\multirow }[2][]{\LWRmultirow }$ $\newcommand {\mrowcell }{}$ $\newcommand {\mcolrowcell }{}$ $\newcommand {\STneed }[1]{}$ $\newcommand {\tcbset }[1]{}$ $\newcommand {\tcbsetforeverylayer }[1]{}$ $\newcommand {\tcbox }[2][]{\boxed {\text {#2}}}$ $\newcommand {\tcboxfit }[2][]{\boxed {#2}}$ $\newcommand {\tcblower }{}$ $\newcommand {\tcbline }{}$ $\newcommand {\tcbtitle }{}$ $\newcommand {\tcbsubtitle [2][]{\mathrm {#2}}}$ $\newcommand {\tcboxmath }[2][]{\boxed {#2}}$ $\newcommand {\tcbhighmath }[2][]{\boxed {#2}}$ $\require {colortbl}$ $\let \LWRorigcolumncolor \columncolor $ $\renewcommand {\columncolor }[2][named]{\LWRorigcolumncolor [#1]{#2}\LWRabsorbtwooptions }$ $\let \LWRorigrowcolor \rowcolor $ $\renewcommand {\rowcolor }[2][named]{\LWRorigrowcolor [#1]{#2}\LWRabsorbtwooptions }$ $\let 
\LWRorigcellcolor \cellcolor $ $\renewcommand {\cellcolor }[2][named]{\LWRorigcellcolor [#1]{#2}\LWRabsorbtwooptions }$

22 Statistical Inference

Goal: Quantify how much of an observed signal-processing estimate is signal versus sampling noise, and provide thresholds that separate the two.

22.1 Overview

Every quantity computed from a finite record of length $L$ (autocorrelation, cross-correlation, spectral density, fitted coefficient, multi-step forecast) is a random estimate, not a fixed number. A confidence bound is the threshold beyond which the estimate is unlikely to lie under a stated null hypothesis, allowing significant structure to be separated from finite-sample noise.

In the context of the ARMA and ARX chapters, several distinct confidence-bound problems arise. Their construction follows the same recipe (sampling distribution under $H_0$, plus a threshold $z_{\alpha /2}\sqrt {\Var [\cdot ]}$), but the variance formula differs:

• ACF bound (Sec. 22.2). Used to identify MA order, validate model residuals, and decide whether a lag is significant. Variance under white-noise $H_0$: $1/L$ (Bartlett’s formula in its simple form, generalized for autocorrelated processes).
• CCF bound (Sec. 22.3). Used in ARX model identification to pick relevant input lags. Bartlett’s general formula accounts for the autocorrelation of both signals to avoid spurious detections.
• PSD confidence interval. The bound is multiplicative (asymmetric on linear scale, symmetric on dB scale) rather than additive.
• Coefficient confidence intervals. AR/ARMA/ARX parameter estimates from least squares or Yule-Walker have approximately Gaussian sampling distributions. The resulting interval $\hat {w}_i \pm z_{\alpha /2}\sqrt {\Var [\hat {w}_i]}$ tests whether a coefficient is meaningfully nonzero.
• Forecast/prediction intervals. The $h$-step-ahead prediction error variance grows with the horizon and depends on the model’s MA representation; the resulting interval widens with $h$.

This chapter develops the ACF and CCF cases in detail.

Hypotheses

A confidence bound is most often used as a decision rule in a hypothesis test. Two complementary statements are formulated about the unknown parameter $\theta $:

• Null hypothesis $H_0$: the “no effect” or baseline statement. In this chapter, $H_0$ typically asserts the absence of structure, e.g. $\rho _{\bx \bx }[k]=0$ (white-noise ACF), $\rho _{\bx \by }[k]=0$ (independent signals), or $w_i=0$ (an irrelevant model coefficient).
• Alternative hypothesis $H_1$: the complementary statement that some structure is present, e.g. $\rho _{\bx \bx }[k]\ne 0$, $\rho _{\bx \by }[k]\ne 0$, or $w_i\ne 0$.

The sampling distribution of $\hat {\theta }$ (and hence its SE) is derived under $H_0$. The CI built from this distribution acts as the acceptance region: if the estimate $\hat {\theta }$ falls outside the bound, $H_0$ is rejected in favor of $H_1$ at significance level $\alpha $; otherwise, $H_0$ is retained as consistent with the data.

Standard error vs. confidence interval

The sampling distribution of $\hat {\theta }$ under $H_0$ is summarized by two related but distinct quantities:

• The standard error (SE) is its standard deviation,
$\seteqnumber{0}{}{0}$
\begin{equation} \label {eq:se-definition} \mathrm {SE}(\hat {\theta }) = \sqrt {\Var [\hat {\theta }\mid H_0]}. \end{equation}

It is a single non-negative number describing the typical spread of $\hat {\theta }$ around the value asserted by $H_0$, and depends only on the estimator and the sample size $L$, not on the chosen significance level $\alpha $.
• The confidence interval (CI) at level $1-\alpha $ is the acceptance region introduced above, expressed via the SE,
$\seteqnumber{0}{}{1}$
\begin{equation} \label {eq:ci-definition} \hat {\theta } \pm z_{\alpha /2}\cdot \mathrm {SE}(\hat {\theta }), \end{equation}

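under the approximately Gaussian sampling distribution implied by the CLT. The half-width $c_\alpha = z_{\alpha /2}\cdot \mathrm {SE}(\hat {\theta })$ is the threshold whose violation rejects $H_0$: writing $\theta _0$ for the value asserted by $H_0$ (zero throughout this chapter), $H_0$ is rejected when $\abs {\hat {\theta }-\theta _0} > c_\alpha $, i.e. when $\theta _0$ falls outside the interval.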

The two are linked but not interchangeable:

• SE is a scale (one number); CI is an interval (two endpoints).
• Tightening the test (smaller $\alpha $) widens the CI but leaves the SE unchanged.
• Increasing the sample size $L$ shrinks the SE, and therefore the CI, at a rate proportional to $1/\sqrt {L}$.
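The three properties listed above can be checked numerically. The following is a minimal sketch, assuming the white-noise rule $\mathrm {SE}=1/\sqrt {L}$ as the example estimator; the helper names `z_score`, `white_noise_se` and `half_width` are illustrative and not part of the text.

```python
from math import sqrt
from statistics import NormalDist


def z_score(alpha):
    """Two-sided critical value z_{alpha/2} = Phi^{-1}(1 - alpha/2)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def white_noise_se(L):
    """SE under a white-noise H0 (assumed 1/sqrt(L) rule); alpha plays no role."""
    return 1.0 / sqrt(L)


def half_width(alpha, se):
    """CI half-width c_alpha = z_{alpha/2} * SE."""
    return z_score(alpha) * se


for L in (100, 400):
    se = white_noise_se(L)
    widths = {alpha: round(half_width(alpha, se), 4) for alpha in (0.10, 0.05, 0.01)}
    print(f"L={L}: SE={se:.4f}, half-widths by alpha={widths}")
```

For a fixed $L$ the SE is a single number while the half-width grows as $\alpha $ shrinks; quadrupling $L$ halves both.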

Theoretical vs. estimated SE The SE in (22.1) is a population quantity: $\Var [\hat {\theta }\mid H_0]$ is computed from the true (typically unknown) distribution of the data, so the resulting expression involves population parameters such as the true ACFs $\rho _{\bx \bx }[m]$, $\rho _{\by \by }[m]$. We refer to this as the theoretical SE and write it $\mathrm {SE}(\hat {\theta })$. To actually evaluate a CI on data, the population parameters are replaced by their sample estimates (e.g. $\rho _{\bx \bx }[m]\to R_{\bx \bx ,norm}[m]$). The resulting plug-in quantity is the estimated SE,

\begin{equation} \label {eq:se-estimated} \widehat {\mathrm {SE}}(\hat {\theta }) = \sqrt {\Var [\hat {\theta }\mid H_0]}\Big |_{\text {sample plug-in}}, \end{equation}

and is what enters the operational confidence bound $c_\alpha = z_{\alpha /2}\cdot \widehat {\mathrm {SE}}(\hat {\theta })$. The two coincide in the limit $L\to \infty $, but for finite $L$ the estimated SE itself carries sampling error. In the white-noise case (e.g. the simple $1/\sqrt {L}$ rule) no plug-in is needed, since $\Var [\hat {\theta }\mid H_0]$ depends only on $L$; in Bartlett’s formulas the distinction is non-trivial because the variance depends on unknown ACFs.

All bound formulas in this chapter (the simple $1/\sqrt {L}$ rule, Bartlett’s formulas, coefficient and forecast intervals) are recipes for $\mathrm {SE}(\hat {\theta })$ under the relevant $H_0$; substituting sample quantities yields $\widehat {\mathrm {SE}}(\hat {\theta })$, and multiplying by $z_{\alpha /2}$ yields the CI half-width $c_\alpha $ used to accept or reject $H_0$.

Notation

Throughout the chapter we distinguish theoretical (population) quantities from their sample-based estimates:

• Normalized correlations:
- – $\rho _{\bx \bx }[k]$, $\rho _{\bx \by }[k]$ denote the population correlation coefficients, defined as the $L\to \infty $ limit of the sample Cauchy–Schwarz form of Eq. (20.21),
  $\seteqnumber{0}{}{3}$
  \begin{equation} \label {eq:rho-population-limit} \rho _{\bx \bx }[k] \triangleq \lim _{L\to \infty }\frac {R_{\bx \bx }[k]}{\sqrt {R_{\bx \bx }[0]\,\sum _n x^2[n-k]}}, \end{equation}
  
  and similarly for $\rho _{\bx \by }[k]$. This is the standard time-series convention used in Bartlett’s formulas below.
- – $R_{\bx \bx ,norm}[k]$, $R_{\bx \by ,norm}[k]$ denote the sample estimates computed from a finite record of length $L$, as defined in Sec. 20.1 and Sec. 21.1. Under the stationarity approximation Eq. (20.19), the sample $\rho _{\bxx }[k]$ of (20.21) reduces to $R_{\bxx ,norm}[k]$, so the latter serves as the operational plug-in for $\rho _{\bxx }[k]$ in this chapter.
• Standard error:
- – $\mathrm {SE}(\hat {\theta })$ for the theoretical SE (function of population parameters such as $\rho _{\bx \bx }[m]$, $\rho _{\by \by }[m]$)
- – $\widehat {\mathrm {SE}}(\hat {\theta })$ for the plug-in (estimated) SE in which population parameters are replaced by their sample counterparts; see (22.1) and (22.3).

Stationarity prerequisite

The ACF and CCF sampling distributions and the corresponding analysis are derived under second-order stationarity: constant mean, and ACF/CCF that depend only on the lag, not on absolute time. When this assumption fails, the bounds no longer control the false-detection rate. For example, in the cross-correlation case, two independent random walks routinely produce large $R_{\bx \by ,norm}[k]$ (the spurious-regression effect).
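The spurious-regression effect is easy to reproduce by simulation. The sketch below is illustrative only: it assumes a mean-removed lag-0 normalized correlation (the definition in Sec. 21.1 may differ in detail), Gaussian increments, and hypothetical helper names. It counts how often two independent series exceed the simple 95% bound $1.96/\sqrt {L}$.

```python
import random
from math import sqrt


def normalized_corr(x, y):
    """Lag-0 normalized cross-correlation after mean removal (assumed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def white_noise(L, rng):
    """Stationary case: i.i.d. Gaussian samples."""
    return [rng.gauss(0.0, 1.0) for _ in range(L)]


def random_walk(L, rng):
    """Non-stationary case: cumulative sum of white noise."""
    walk, level = [], 0.0
    for step in white_noise(L, rng):
        level += step
        walk.append(level)
    return walk


def exceed_rate(generator, L=300, trials=200, seed=0):
    """Fraction of independent pairs whose |correlation| exceeds 1.96/sqrt(L)."""
    rng = random.Random(seed)
    bound = 1.96 / sqrt(L)
    hits = sum(
        abs(normalized_corr(generator(L, rng), generator(L, rng))) > bound
        for _ in range(trials)
    )
    return hits / trials


print("white noise  :", exceed_rate(white_noise))
print("random walks :", exceed_rate(random_walk))
```

For independent white-noise pairs the exceedance rate should sit near the nominal 5%, while for independent random walks most pairs exceed the bound even though no relationship exists.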

22.2 ACF

Goal: Determine whether the estimated ACF (Sec. 20.1) at lag $k \ne 0$ is statistically significant or merely a result of random chance.

Practical summary

• White-noise series (or model residuals tested against white noise): compare $\abs {R_{\bx \bx ,norm}[k]}$ to the simple bound $z_{\alpha /2}/\sqrt {L}$ (from (22.6)).
• Autocorrelated series:
- – Pick a truncation lag $M$ (rule of thumb $M\approx \lfloor \sqrt {L}\rfloor $ or $M\approx 10\log _{10}L$),
- – compute the plug-in Bartlett $\widehat {\mathrm {SE}}[k] = \sqrt {\tfrac {1}{L}\bigl (1 + 2\sum _{j=1}^{M} R_{\bx \bx ,norm}^2[j]\bigr )}$,
- – bound $c_\alpha [k] = z_{\alpha /2}\,\widehat {\mathrm {SE}}[k]$ (Sec. 22.2.2).
• Decision rule: reject white-noise $H_0$ at lag $k$ iff $\abs {R_{\bx \bx ,norm}[k]} > c_\alpha [k]$.
• Common choice $\alpha =0.05$, $z_{\alpha /2}=1.96$.

The ACF estimate at lag $k$ averages products $x[n]x[n-k]$ across many time indices $n$. Even for pure white noise, where the theoretical ACF is zero for all $k\ne 0$, the estimated ACF will not be exactly zero due to finite sample effects. A confidence bound quantifies how large these random fluctuations can be, so that values exceeding the bound can be attributed to genuine correlation rather than chance.

The bounds below presuppose stationarity.

22.2.1 Simple Case: White-Noise Signal

Null hypothesis and distribution Consider the null hypothesis $H_0$: the underlying process is white noise, i.e., $\rho _{\bx \bx }[k]=0$ for all $k\ne 0$ (population coefficient, Eq. (22.4)). Under the assumptions of stationarity and weak dependence, the Central Limit Theorem (CLT) applies: for large $L$, the ACF estimate at each lag is approximately normally distributed, regardless of the original distribution of $x[n]$. Specifically, under $H_0$, the variance of the normalized sample ACF is approximately $\frac {1}{L}$ (a special case of Bartlett’s formula derived below), so that

\begin{equation} R_{\bx \bx ,norm}[k] \sim \mathcal {N}\!\left (0,\;\frac {1}{L}\right ),\quad k\ne 0. \end{equation}

The corresponding standard error is

\begin{equation} \label {eq:se-acf-simple} \mathrm {SE} = \frac {1}{\sqrt {L}}. \end{equation}

Confidence bound For a significance level $\alpha $, the confidence bound is $\pm c_\alpha $, where

\begin{equation} c_\alpha = z_{\alpha /2}\cdot \mathrm {SE} = \dfrac {z_{\alpha /2}}{\sqrt {L}} \end{equation}

and $z_{\alpha /2}$ is the $z$-score (the number of standard deviations from the mean, (4.15)) obtained from the inverse of the standard normal CDF, $z_{\alpha /2}=\Phi ^{-1}(1-\alpha /2)$. Common values are

Confidence level	$\alpha $	$z_{\alpha /2}$
90%	0.10	1.645
95%	0.05	1.960
99%	0.01	2.576

For the most common choice of $\alpha =0.05$ (95% confidence), the bound becomes $c_{0.05} = 1.96/\sqrt {L}$.

Interpretation If the estimated ACF at a particular lag exceeds the confidence bound, i.e.,

\begin{equation} \abs {R_{\bx \bx ,norm}[k]} > c_\alpha , \end{equation}

the null hypothesis is rejected: the correlation at that lag is statistically significant and unlikely to arise by chance alone. If the value remains within the bound, the observed correlation is consistent with white noise.
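The decision rule can be sketched in a few lines. This is a minimal illustration, assuming the normalized sample ACF is the mean-removed lag-$k$ sum of products divided by the lag-0 sum (the definition in Sec. 20.1 may differ, e.g. in mean removal); `sample_acf` and `significant_lags` are hypothetical names.

```python
import random
from math import sqrt
from statistics import NormalDist


def sample_acf(x, max_lag):
    """Normalized sample ACF r[0..max_lag] after mean removal (assumed form)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    r0 = sum(v * v for v in d)
    return [sum(d[i] * d[i - k] for i in range(k, n)) / r0
            for k in range(max_lag + 1)]


def significant_lags(x, max_lag, alpha=0.05):
    """Lags k >= 1 whose |ACF| exceeds the simple bound z_{alpha/2}/sqrt(L)."""
    bound = NormalDist().inv_cdf(1.0 - alpha / 2.0) / sqrt(len(x))
    r = sample_acf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


# White noise: only a few of the 40 lags should be flagged by chance.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
print("flagged lags (white noise):", significant_lags(noise, 40))

# A strongly structured series: every tested lag is flagged.
alternating = [1.0, -1.0] * 50
print("flagged lags (alternating):", significant_lags(alternating, 3))
```

With $\alpha =0.05$, roughly one lag in twenty is expected to cross the bound for genuine white noise, so isolated exceedances are not by themselves evidence of structure.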

An example of the ACF with 95% and 99% confidence bounds for white noise is shown in Fig. 22.1.

The variance $1/L$ is a large-sample approximation. A finer approximation accounts for the fact that only $L-\abs {k}$ sample pairs contribute at lag $k$, giving the variance $1/(L-\abs {k})$ and a lag-dependent standard error and bound

\begin{equation} \mathrm {SE}[k] = \frac {1}{\sqrt {L-\abs {k}}},\qquad c_\alpha [k] = z_{\alpha /2}\cdot \mathrm {SE}[k] = \dfrac {z_{\alpha /2}}{\sqrt {L-\abs {k}}} \end{equation}

which widens at larger lags where fewer sample pairs contribute.

22.2.2 General Case: Bartlett’s Formula

The simple $1/L$ variance assumes the process is white noise under $H_0$. When the process is autocorrelated (e.g., an MA or AR signal), this assumption breaks down: the simple bound becomes anti-conservative at higher lags and can flag spurious correlations.

Bartlett’s formula (1946) corrects for this by accounting for the internal autocorrelation structure of the process. For a stationary process with population ACF $\rho _{\bx \bx }[m]$ (the $L\to \infty $ limit, Eq. (22.4)), the variance of the normalized sample ACF at lag $k$ is

\begin{equation} \label {eq:bartlett-acf-general} \Var [R_{\bx \bx ,norm}[k]] \approx \frac {1}{L}\sum _{m=-\infty }^{\infty }\Bigl (\rho _{\bx \bx }^2[m] + \rho _{\bx \bx }[m+k]\,\rho _{\bx \bx }[m-k] - 4\rho _{\bx \bx }[k]\,\rho _{\bx \bx }[m]\,\rho _{\bx \bx }[m-k] + 2\rho _{\bx \bx }^2[k]\,\rho _{\bx \bx }^2[m]\Bigr ). \end{equation}

The four terms arise because the same series enters the estimator twice: the leading $\rho _{\bx \bx }^2[m]$ matches the white-noise variance, the cross product $\rho _{\bx \bx }[m+k]\rho _{\bx \bx }[m-k]$ couples symmetric lags, and the remaining $\rho _{\bx \bx }[k]$-dependent terms correct for the bias introduced when the true ACF at lag $k$ is itself nonzero.
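The four-term sum can be evaluated numerically for any assumed population ACF. The sketch below truncates the infinite sum at $\abs {m}\le m_{\max }$ (an implementation choice, not part of the formula) and checks two cases: white noise, where the sum reduces to $1/L$, and an AR(1) process with $\rho _{\bx \bx }[m]=\phi ^{\abs {m}}$, for which the sum at $k=1$ works out to $(1-\phi ^2)/L$. Function names are illustrative.

```python
def bartlett_variance(rho, k, L, m_max=200):
    """Four-term Bartlett sum for Var of the sample ACF at lag k.

    rho   : callable returning the population ACF at an integer lag
    m_max : truncation of the infinite sum (assumes rho decays quickly)
    """
    total = 0.0
    for m in range(-m_max, m_max + 1):
        total += (rho(m) ** 2
                  + rho(m + k) * rho(m - k)
                  - 4.0 * rho(k) * rho(m) * rho(m - k)
                  + 2.0 * rho(k) ** 2 * rho(m) ** 2)
    return total / L


def white_acf(m):
    """Population ACF of white noise."""
    return 1.0 if m == 0 else 0.0


def ar1_acf(phi):
    """Population ACF of a stationary AR(1) process: phi^|m|."""
    return lambda m: phi ** abs(m)


L = 200
print("white noise, k=3 :", bartlett_variance(white_acf, 3, L), "vs 1/L =", 1 / L)
print("AR(1) phi=0.6, k=1:", bartlett_variance(ar1_acf(0.6), 1, L),
      "vs (1-phi^2)/L =", (1 - 0.6 ** 2) / L)
```

Note that the AR(1) variance at lag 1 is smaller than $1/L$: the formula describes the spread of the estimate around its true (nonzero) value, not around zero.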

Time-limited (truncated) variant

The expression (22.10) is theoretical: it depends on the true ACF $\rho _{\bx \bx }[m]$, which is unknown in practice. Two simplifications turn it into a usable formula.

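First, restrict the sum. Beyond a finite lag the true ACF is negligible, so the infinite sum is truncated at a chosen lag $M$. Under $H_0$ at lag $k>0$, where $\rho _{\bx \bx }[k]=0$, the $\rho [k]$-dependent corrections in (22.10) vanish. The cross term $\rho _{\bx \bx }[m+k]\,\rho _{\bx \bx }[m-k]$ is dropped as well: it vanishes exactly when the ACF is zero at lag $k$ and beyond (for every $m$, one of $\abs {m+k}$, $\abs {m-k}$ is at least $k$), and is neglected otherwise. Using ACF symmetry $\rho [-m]=\rho [m]$ and $\rho _{\bx \bx }[0]=1$, the theoretical SE becomes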

\begin{equation} \label {eq:se-acf-bartlett-theory} \mathrm {SE}[k] = \sqrt {\frac {1}{L}\Bigl (1 + 2\sum _{j=1}^{M}\rho _{\bx \bx }^2[j]\Bigr )}. \end{equation}

Second, replace the unknown $\rho _{\bx \bx }[j]$ by its sample estimate $R_{\bx \bx ,norm}[j]$. This yields the plug-in (estimated) SE actually computed from data,

\begin{equation} \label {eq:bartlett-acf-trunc} \widehat {\mathrm {SE}}[k] = \sqrt {\frac {1}{L}\Bigl (1 + 2\sum _{j=1}^{M}R_{\bx \bx ,norm}^2[j]\Bigr )}, \end{equation}

so only one-sided sample ACF lags need to be evaluated.
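The plug-in SE is a one-line computation once the sample ACF is available. A minimal sketch, assuming the sample ACF is passed as a list `r` with `r[0] = 1` and `r[j]` the value at lag $j$ (the function name is hypothetical):

```python
from math import sqrt


def bartlett_se_acf(r, L, M):
    """Plug-in Bartlett SE: sqrt((1 + 2 * sum_{j=1..M} r[j]^2) / L).

    r : sample ACF values, r[0] = 1, r[j] = normalized ACF at lag j
    L : record length
    M : truncation lag (must satisfy M < len(r))
    """
    if M >= len(r):
        raise ValueError("need sample ACF values up to lag M")
    return sqrt((1.0 + 2.0 * sum(r[j] ** 2 for j in range(1, M + 1))) / L)


r = [1.0, 0.6, 0.3, 0.1]
for M in (0, 1, 3):
    print(f"M={M}: SE_hat={bartlett_se_acf(r, 200, M):.4f}")
```

Setting `M = 0` reproduces the white-noise value $1/\sqrt {L}$; each additional nonzero lag can only widen the estimate, since the summed terms are squares.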

Choosing $M$ The truncation lag $M$ should be large enough to include all non-negligible ACF mass (past the visible decay of $R_{\bx \bx ,norm}[m]$), but small enough that the included terms are not dominated by sampling noise. Typical rules of thumb are $M\approx 10\log _{10}L$ or $M=\lfloor \sqrt {L}\rfloor $.

Choosing $L$ (absolute sample size) The truncation lag $M$ is dictated by the process (how fast its ACF decays); the record length $L$ must then satisfy two separate constraints, both expressed as absolute floors on $L$:

• CLT validity for $R_{\bx \bx ,norm}[k]$. The Gaussian approximation underlying $z_{\alpha /2}$ thresholds is reliable only once $L$ is large in absolute terms, independently of $M$. A practical floor is $L\gtrsim 50\text {--}100$; below this the sampling distribution at small lags is visibly non-Gaussian and the bounds are unreliable regardless of how the variance is estimated.
• Truncation validity, $M\ll L$. Inverting the rule of thumb $M\approx \lfloor \sqrt {L}\rfloor $ gives the lower bound $L\gtrsim M^2$. A safer engineering rule is $L\gtrsim 10M$ to $20M$, which keeps each included sample ACF $R_{\bx \bx ,norm}[j]$ ($j\le M$) accurate enough that the plug-in $\widehat {\mathrm {SE}}[k]$ approaches the theoretical $\mathrm {SE}[k]$. Recall that an individual sample ACF has SE of order $1/\sqrt {L-j}$, so each squared term in (22.12) contributes its own $O(1/L)$ noise.

Combining the two:

\[ L \;\ge \; \max \bigl (50\text {--}100,\; 10M\text { to }20M,\; M^2\bigr ). \]

Concrete examples: $M=3$ requires $L\gtrsim 50\text {--}100$ (the CLT floor dominates; the squared rule alone would give only $L\ge 9$, far too few); $M=5$ requires $L\gtrsim 50\text {--}100$ (the CLT floor and the $10M$ to $20M$ rule coincide); $M=10$ requires $L\gtrsim 100\text {--}200$ (the $10M$ to $20M$ rule now dominates). Small $M$ is a feature of a short-memory process, not a license to shrink $L$: even when the simple bound is nearly correct, $L$ must still be large enough for both the sampling distribution to be Gaussian and the plug-in $\widehat {\mathrm {SE}}$ to converge to $\mathrm {SE}$.
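The combined floor can be tabulated for any $M$. The sketch below reads the ranges "50 to 100" and "$10M$ to $20M$" as a lenient and a cautious end, which is one possible interpretation of the rule rather than a prescribed one; `record_length_floor` is a hypothetical helper.

```python
def record_length_floor(M):
    """Return (lenient, cautious) lower limits on L for truncation lag M.

    lenient  = max(50, 10*M, M^2)
    cautious = max(100, 20*M, M^2)
    """
    lenient = max(50, 10 * M, M * M)
    cautious = max(100, 20 * M, M * M)
    return lenient, cautious


for M in (3, 5, 10, 20):
    lo, hi = record_length_floor(M)
    print(f"M={M:2d}: L >= {lo} (lenient) to {hi} (cautious)")
```

For larger $M$ the squared term takes over: at $M=20$ both ends equal $M^2=400$.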

MA($q$) cutoff special case

A particularly clean special case arises when the process is MA($q$), so $\rho _{\bx \bx }[m]=0$ for $\abs {m}>q$. The truncation lag in (22.12) then collapses to $M=q$, and the variance of the normalized sample ACF at lag $k>q$ is

\begin{equation} \label {eq:bartlett-acf} \Var [R_{\bx \bx ,norm}[k]] \approx \frac {1}{L}\Bigl (1 + 2\sum _{m=1}^{q}\rho _{\bx \bx }^2[m]\Bigr ), \quad k>q, \end{equation}

giving the theoretical standard error

\begin{equation} \label {eq:se-acf-bartlett-theory-mq} \mathrm {SE}[k] = \sqrt {\frac {1}{L}\Bigl (1 + 2\sum _{m=1}^{q}\rho _{\bx \bx }^2[m]\Bigr )}, \quad k>q. \end{equation}

Substituting the sample ACFs $R_{\bx \bx ,norm}[m]$ for the unknown $\rho _{\bx \bx }[m]$ yields the plug-in estimate

\begin{equation} \label {eq:se-acf-bartlett} \widehat {\mathrm {SE}}[k] = \sqrt {\frac {1}{L}\Bigl (1 + 2\sum _{m=1}^{q}R_{\bx \bx ,norm}^2[m]\Bigr )}, \quad k>q, \end{equation}

and the corresponding confidence bound is

\begin{equation} c_\alpha [k] = z_{\alpha /2}\cdot \widehat {\mathrm {SE}}[k] = z_{\alpha /2}\sqrt {\frac {1}{L}\Bigl (1 + 2\sum _{m=1}^{q}R_{\bx \bx ,norm}^2[m]\Bigr )}, \quad k>q. \end{equation}

For white noise ($q=0$) the sum vanishes and $\widehat {\mathrm {SE}}=1/\sqrt {L}$, recovering the simple bound $c_\alpha = z_{\alpha /2}/\sqrt {L}$. The bound widens with the strength of the autocorrelation at lower lags, so that higher lags are tested against a more conservative threshold.

Choosing $q$ The order $q$ is rarely known in advance and is typically selected from the data:

• Cutoff inspection (Box-Jenkins). Plot the ACF with the simple bound. Set $q$ to the largest lag at which $\abs {R_{\bx \bx ,norm}[m]}$ exceeds $z_{\alpha /2}/\sqrt {L}$, then re-evaluate higher lags using the Bartlett bound.
• Recursive widening. Start with $q=0$; if lag $1$ is significant, set $q=1$ and re-test lag $2$ with the wider bound. Continue until the next lag falls within the bound.
• Conservative upper bound. If no prior information is available, pick a safe upper limit (e.g., $q=\lfloor \sqrt {L}\rfloor $). This sacrifices statistical power but avoids underestimating the variance.
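The recursive-widening procedure can be written as a short loop. This is a sketch under the assumption that the sample ACF is supplied as a list `r` with `r[0] = 1`; the name `select_ma_order` and the stopping behavior when every available lag is significant are illustrative choices.

```python
from math import sqrt


def select_ma_order(r, L, z=1.96):
    """Recursive widening: grow q while the next lag exceeds the Bartlett bound.

    r : sample ACF values, r[0] = 1, r[m] = normalized ACF at lag m
    L : record length
    z : critical value z_{alpha/2} (1.96 for alpha = 0.05)
    """
    q = 0
    while q + 1 < len(r):
        # Bartlett bound built from the lags already accepted (1..q).
        bound = z * sqrt((1.0 + 2.0 * sum(r[m] ** 2 for m in range(1, q + 1))) / L)
        if abs(r[q + 1]) <= bound:
            break  # next lag is within the bound: stop widening
        q += 1
    return q


print(select_ma_order([1.0, 0.6, 0.16, 0.05], L=200))  # lag 2 passes the widened bound
print(select_ma_order([1.0, 0.05, 0.20], L=200))       # lag 1 already within the bound
```

If every supplied lag is significant, the loop returns the largest available lag, which signals that more ACF lags (or a different model class) should be examined.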

Example 22.1 (Bartlett bound for an AR(1) signal): Consider an AR(1) signal of length $L=200$ with normalized sample ACF $R_{\bx \bx ,norm}[1]=0.6$. Apply Bartlett’s MA($q$) approximation with $q=1$ (treating higher-lag autocorrelations as negligible) and compute the 95% confidence bound at lag $k=2$ under (a) the simple white-noise assumption and (b) Bartlett’s formula.
- Solution: Take $z_{\alpha /2}=1.96$ (95% confidence).
  - 1. Simple bound:
    $\seteqnumber{0}{}{16}$
    \begin{equation*} c_{0.05} = \frac {1.96}{\sqrt {200}} \approx 0.139. \end{equation*}
  - 2. Bartlett bound, with $q=1$ and $R_{\bx \bx ,norm}^2[1]=0.36$:
    $\seteqnumber{0}{}{16}$
    \begin{equation*} c_{0.05}[2] = 1.96\sqrt {\frac {1+2(0.36)}{200}} = 1.96\sqrt {\frac {1.72}{200}} \approx 0.182. \end{equation*}
  The Bartlett bound is roughly $31\%$ wider, reflecting that the lag-1 correlation inflates the sampling variability of $R_{\bx \bx ,norm}[k]$ at higher lags. An estimated $R_{\bx \bx ,norm}[2]=0.16$ would be flagged as significant by the simple bound ($0.16>0.139$), but lies within the Bartlett bound ($0.16<0.182$) and is therefore consistent with zero correlation at lag 2.
A simulated AR(1) realization with $\phi =0.6$ and $L=200$ is shown in Fig. 22.2: for $k>1$ the Bartlett bound (blue) widens above the constant simple bound (red), so several mid-lag fluctuations that pierce the simple threshold remain within the Bartlett threshold and are correctly attributed to chance.

Figure 22.2: Sample ACF of an AR(1) signal ($\phi =0.6$, $L=200$) with simple and Bartlett 95% confidence bounds. For $k>q=1$ the Bartlett bound widens to account for the lag-1 autocorrelation, avoiding spurious detections that the simple bound would flag.
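The arithmetic of Example 22.1 can be reproduced directly. The snippet below is only a numerical check of the two bounds using the values stated in the example; variable names are illustrative.

```python
from math import sqrt

L = 200          # record length
r1 = 0.6         # normalized sample ACF at lag 1
z = 1.96         # z_{alpha/2} for alpha = 0.05
r2 = 0.16        # hypothetical estimate at lag 2 being tested

simple = z / sqrt(L)                             # white-noise bound
bartlett = z * sqrt((1.0 + 2.0 * r1 ** 2) / L)   # MA(1) Bartlett bound at k = 2
widening = bartlett / simple - 1.0               # relative widening

print(f"simple bound   : {simple:.3f}")
print(f"Bartlett bound : {bartlett:.3f}")
print(f"widening       : {100 * widening:.0f}%")
print(f"lag 2 significant (simple)  : {abs(r2) > simple}")
print(f"lag 2 significant (Bartlett): {abs(r2) > bartlett}")
```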

22.3 CCF

Goal: Determine whether the estimated CCF (Sec. 21.1) at lag $k$ reflects a genuine relationship between $x[n]$ and $y[n]$, or is merely a result of random chance.

Practical summary

• At least one signal is white noise: compare $\abs {R_{\bx \by ,norm}[k]}$ to the simple bound $z_{\alpha /2}/\sqrt {L-\abs {k}}$ (from (22.18)).
• Both signals autocorrelated:
- – Pick a truncation lag $M$ (rule of thumb $M\approx \lfloor \sqrt {L}\rfloor $ or $M\approx 10\log _{10}L$),
- – compute the plug-in Bartlett $\widehat {\mathrm {SE}} = \sqrt {\tfrac {1}{L}\bigl (1 + 2\sum _{j=1}^{M} R_{\bx \bx ,norm}[j]\,R_{\by \by ,norm}[j]\bigr )}$ from the one-sided sample ACFs of $\bx $ and $\by $,
- – bound $c_\alpha [k] = z_{\alpha /2}\,\widehat {\mathrm {SE}}$ (eq. (22.24)).
• Conservative alternative: the decoupled bound (Sec. 22.3.2, “Conservative decoupled form”), which removes the joint ACF product at the cost of a wider threshold.
• Decision rule: reject independence $H_0$ at lag $k$ iff $\abs {R_{\bx \by ,norm}[k]} > c_\alpha [k]$.
• Common choice $\alpha =0.05$, $z_{\alpha /2}=1.96$.

Even for two independent signals, the estimated CCF will not be exactly zero due to finite sample effects. A confidence bound quantifies how large these random fluctuations can be. In practice, confidence intervals are computed for the normalized cross-correlation (bounded between $-1$ and $1$), rather than the raw cross-covariance, which is scale-dependent. As with the ACF, the bounds below assume both series are stationary; on integrated data the spurious-regression effect produces large $R_{\bx \by ,norm}[k]$ even between independent inputs.

22.3.1 Simple Case: at Least One White-Noise Signal

Consider the null hypothesis $H_0$: the signals $x[n]$ and $y[n]$ are independent, i.e., $\rho _{\bx \by }[k]=0$ for all $k$. If at least one of the signals is white noise (no autocorrelation), then the normalized sample CCF is approximately normally distributed (analogous to the ACF case in Sec. 22.2),

\begin{equation} R_{\bx \by ,norm}[k] \sim \mathcal {N}\!\left (0,\;\frac {1}{L-\abs {k}}\right ), \end{equation}

with standard error

\begin{equation} \label {eq:se-ccf-simple} \mathrm {SE}[k] = \frac {1}{\sqrt {L-\abs {k}}}. \end{equation}

The confidence bound is $\pm c_\alpha [k]$, where

\begin{equation} c_\alpha [k] = z_{\alpha /2}\cdot \mathrm {SE}[k] = \dfrac {z_{\alpha /2}}{\sqrt {L-\abs {k}}}. \end{equation}

For large $L$, $L-\abs {k}\approx L$, yielding the common approximation $\mathrm {SE}\approx 1/\sqrt {L}$ and $c_\alpha [k]\approx z_{\alpha /2}/\sqrt {L}$. If $\abs {R_{\bx \by ,norm}[k]} > c_\alpha [k]$, the null hypothesis is rejected: the cross-correlation at lag $k$ is statistically significant.
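A minimal sketch of the simple CCF test follows. It assumes a mean-removed normalized CCF that pairs $x[n]$ with $y[n-k]$; the lag sign convention and normalization of Sec. 21.1 may differ, and the helper names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def sample_ccf(x, y, k):
    """Normalized sample CCF at lag k, pairing x[n] with y[n-k] (assumed convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    norm = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    lo, hi = max(0, k), min(n, n + k)   # indices where both samples exist
    return sum(dx[i] * dy[i - k] for i in range(lo, hi)) / norm


def ccf_bound(L, k, alpha=0.05):
    """Simple bound z_{alpha/2} / sqrt(L - |k|)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) / sqrt(L - abs(k))


# y is x advanced by two samples, so the CCF should peak at k = 2.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = x[2:] + [0.0, 0.0]
peak = sample_ccf(x, y, 2)
print(f"CCF at k=2: {peak:.3f}, bound: {ccf_bound(len(x), 2):.3f}")
print("reject independence at k=2:", abs(peak) > ccf_bound(len(x), 2))
```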

22.3.2 General Case: Bartlett’s Formula

The simple $1/L$ variance assumes that at least one signal is white noise. When both $x[n]$ and $y[n]$ are autocorrelated (e.g., periodic), this assumption breaks down and the simple bound can produce spurious detections: falsely significant CCF values between two signals that are actually independent.

Bartlett’s formula corrects for this by accounting for the internal autocorrelation structure of both signals, following the same theory$\to $truncation$\to $plug-in pattern as the ACF case in Sec. 22.2.2. Under $H_0$ (independence), the variance of the normalized sample CCF at lag $k$ is

\begin{equation} \label {eq:bartlett-ccf} \Var [R_{\bx \by ,norm}[k]] \approx \frac {1}{L}\sum _{m=-\infty }^{\infty } \rho _{\bx \bx }[m]\,\rho _{\by \by }[m] \end{equation}

where $\rho _{\bx \bx }[m]$ and $\rho _{\by \by }[m]$ are the population autocorrelation coefficients of $x[n]$ and $y[n]$ at lag $m$ (the $L\to \infty $ limit, Eq. (22.4)), respectively. The corresponding theoretical standard error is

\begin{equation} \label {eq:se-ccf-bartlett-theory} \mathrm {SE} = \sqrt {\frac {1}{L}\sum _{m=-\infty }^{\infty } \rho _{\bx \bx }[m]\,\rho _{\by \by }[m]}. \end{equation}

If either signal is white noise (theoretically, $\rho _{\cdot }[m]=0$ for $m\ne 0$), only the $m=0$ term survives and $\mathrm {SE}=1/\sqrt {L}$, recovering $c_\alpha [k] = z_{\alpha /2}/\sqrt {L}$.

Time-limited (truncated) variant

As in Sec. 22.2.2, the expression (22.21) is theoretical: it depends on the unknown true ACFs. Truncating the sum at a chosen lag $M$ beyond which the ACFs are negligible gives the truncated theoretical SE,

\begin{equation} \label {eq:se-ccf-bartlett} \mathrm {SE} \approx \sqrt {\frac {1}{L}\sum _{m=-M}^{M} \rho _{\bx \bx }[m]\,\rho _{\by \by }[m]}. \end{equation}

Substituting the sample ACFs $R_{\bx \bx ,norm}[m]$ and $R_{\by \by ,norm}[m]$ for the unknown $\rho _{\bx \bx }[m]$ and $\rho _{\by \by }[m]$ yields the plug-in (estimated) SE

\begin{equation} \label {eq:bartlett-ccf-trunc} \widehat {\mathrm {SE}} \approx \sqrt {\frac {1}{L}\sum _{m=-M}^{M} R_{\bx \bx ,norm}[m]\,R_{\by \by ,norm}[m]}, \end{equation}

and using ACF symmetry $R[-m]=R[m]$ together with $R_{\bx \bx ,norm}[0]=R_{\by \by ,norm}[0]=1$,

\begin{equation} \label {eq:bartlett-ccf-trunc-onesided} \widehat {\mathrm {SE}} \approx \sqrt {\frac {1}{L}\Bigl (1 + 2\sum _{j=1}^{M} R_{\bx \bx ,norm}[j]\,R_{\by \by ,norm}[j]\Bigr )}, \end{equation}

so only one-sided sample ACF lags need to be evaluated. The confidence bound is $c_\alpha [k] = z_{\alpha /2}\cdot \widehat {\mathrm {SE}}$.
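A minimal sketch of the one-sided plug-in computation follows. It assumes a mean-removed, biased sample ACF normalized by its lag-0 value, which may differ in detail from the estimator defined earlier in the chapter. The helper names `sample_acf` and `bartlett_ccf_bound` are hypothetical, and clipping a negative plug-in variance to zero is an added guard, not something the text prescribes.

```python
import numpy as np
from scipy.stats import norm

def sample_acf(x, max_lag):
    """Normalized sample ACF for lags 0..max_lag (mean removed, biased scaling)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    full = np.correlate(x, x, mode="full")    # lags -(L-1)..(L-1)
    mid = len(x) - 1                          # index of lag 0
    return full[mid:mid + max_lag + 1] / full[mid]

def bartlett_ccf_bound(x, y, M, alpha=0.05):
    """Plug-in SE from the one-sided sum over lags 1..M, and the bound z * SE."""
    L = len(x)
    rx = sample_acf(x, M)
    ry = sample_acf(y, M)
    var = (1.0 + 2.0 * np.sum(rx[1:] * ry[1:])) / L
    se = np.sqrt(max(var, 0.0))               # guard: plug-in sum can dip below zero
    return norm.ppf(1.0 - alpha / 2.0) * se, se

# Sanity check on two independent white-noise records: SE should be near 1/sqrt(L).
rng = np.random.default_rng(0)
L = 1000
x = rng.standard_normal(L)
y = rng.standard_normal(L)
bound, se = bartlett_ccf_bound(x, y, M=int(np.sqrt(L)))
print(se * np.sqrt(L))                        # close to 1 for white noise
```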

Choosing $M$ and $L$: The same considerations as in the ACF case (Sec. 22.2.2, “Choosing $M$” and “Choosing $L$”) apply, with $M$ now required to exceed the visible ACF decay of both signals (e.g., Fig. 22.3a and 22.3b). The rules of thumb $M\approx \lfloor \sqrt {L}\rfloor $ or $M\approx 10\log _{10}L$ and the absolute floor $L\geq \max (50\text {--}100,\,10M\text { to }20M,\,M^2)$ carry over unchanged.

Conservative decoupled form

A cleaner conservative form of the plug-in (estimated) SE is sometimes used in practice, since it removes the need to evaluate the joint ACF product. Starting from (22.23),

\[ \widehat {\mathrm {SE}}^2 = \frac {1}{L}\sum _{m=-M}^{M} R_{\bx \bx ,norm}[m]\,R_{\by \by ,norm}[m], \]

keep the $m=0$ term, which equals $R_{\bx \bx ,norm}[0]\,R_{\by \by ,norm}[0]=1$ exactly, and apply the elementary inequality to every remaining lag $m\ne 0$ (valid lag-by-lag, in absolute value),

\[ \abs {R_{\bx \bx ,norm}[m]\,R_{\by \by ,norm}[m]} \le R_{\bx \bx ,norm}[m]^2 + R_{\by \by ,norm}[m]^2. \]

ACF symmetry $R[-m]=R[m]$ then folds the two-sided sums over $m\ne 0$ into one-sided sums:

\[ \sum _{\substack {m=-M\\ m\ne 0}}^{M} R_{\bx \bx ,norm}[m]^2 = 2\sum _{j=1}^{M} R_{\bx \bx ,norm}[j]^2, \]

and likewise for $\by $. Combining the $m=0$ term with the two bounds,

\[ \widehat {\mathrm {SE}} \le \frac {1}{\sqrt {L}}\sqrt {\,1 + 2\sum _{j=1}^{M} R_{\bx \bx ,norm}[j]^2 + 2\sum _{j=1}^{M} R_{\by \by ,norm}[j]^2\,}. \]

The bound decouples into two independent one-sided ACF sums, so each series can be evaluated in isolation.
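The relation between the joint plug-in SE and the decoupled bound can be checked numerically. This is a sketch only: the function names are hypothetical, and the one-sided ACF values below are made-up geometric sequences standing in for sample ACFs at lags $1,\ldots ,M$.

```python
import numpy as np

def joint_se(rx, ry, L):
    """Plug-in SE from the one-sided joint product sum (lags 1..M)."""
    return np.sqrt(max(1.0 + 2.0 * np.sum(rx * ry), 0.0) / L)

def decoupled_se(rx, ry, L):
    """Conservative bound: each series contributes its own sum of squares."""
    return np.sqrt((1.0 + 2.0 * np.sum(rx**2) + 2.0 * np.sum(ry**2)) / L)

# Hypothetical one-sided ACF values at lags 1..5 (not taken from the text)
rx = 0.8 ** np.arange(1, 6)
ry = 0.7 ** np.arange(1, 6)
L = 200

se_joint = joint_se(rx, ry, L)
se_cons = decoupled_se(rx, ry, L)
print(se_joint, se_cons)       # the decoupled value is never the smaller one
```

Because the two sums of squares depend on one series each, they can be computed once per signal and reused for every pair, which is the practical appeal of the decoupled form.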

Example 22.2: Fig. 22.3 illustrates the difference: two independent but autocorrelated signals are generated, and their CCF is plotted against the simple, Bartlett, and conservative one-sided confidence bounds. The simple bound (narrow) falsely flags several lags as significant; the Bartlett bound widens to account for the autocorrelation and avoids the spurious detections; the conservative one-sided bound is wider still, since it replaces the joint ACF product by the sum of squared single-series ACFs.

(a) ACF of $x[n]$

(b) ACF of $y[n]$

(c) CCF with simple, Bartlett, and conservative bounds

Figure 22.3: Two independent AR(1) signals ($L=200$): (a) and (b) show that each signal is strongly autocorrelated, violating the white-noise assumption of the simple bound; (c) the resulting CCF crosses the simple bound (red dashed, $\approx \pm 0.14$) at several lags (false positives), while the Bartlett bound (blue dash-dotted, $\approx \pm 0.33$) correctly accounts for the internal autocorrelation of both signals. The conservative one-sided bound (black dotted, $\approx \pm 0.55$) is even looser.
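The false-positive behaviour described in the example can be reproduced with a small Monte Carlo sketch. The AR(1) coefficient $\phi =0.8$, the number of trials, the burn-in length, and the restriction to lag $k=0$ are all assumptions chosen for illustration; they are not the settings behind Fig. 22.3. For two AR(1) signals with the same coefficient, the Bartlett sum has the closed form $(1+\phi ^2)/(1-\phi ^2)$, which is used below in place of the plug-in estimate.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.stats import norm

rng = np.random.default_rng(1)
L, burn, trials = 200, 200, 2000
phi = 0.8                                   # assumed AR(1) coefficient of both signals

def ar1(phi, n_rows, n_cols):
    """Rows of s[n] = phi * s[n-1] + w[n]; the start-up transient is discarded."""
    w = rng.standard_normal((n_rows, n_cols + burn))
    return lfilter([1.0], [1.0, -phi], w, axis=-1)[:, burn:]

x = ar1(phi, trials, L)
y = ar1(phi, trials, L)                     # generated independently of x

# Lag-0 normalized CCF of every independent pair
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
r0 = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

z = norm.ppf(0.975)
c_simple = z / np.sqrt(L)
c_bartlett = z * np.sqrt((1 + phi**2) / (1 - phi**2) / L)

fp_simple = np.mean(np.abs(r0) > c_simple)      # far above the nominal 5%
fp_bartlett = np.mean(np.abs(r0) > c_bartlett)  # close to the nominal 5%
print(fp_simple, fp_bartlett)
```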

22.4 Cross-Coherence

Goal: Decide whether the estimated cross-coherence (Sec. 21.2.2) at frequency bin $k$ reflects a stable linear coupling between $x[n]$ and $y[n]$, or is merely a result of finite-sample noise.

Practical summary

• Estimate $\hat {\gamma }_{\bxy }^2[k]$ by averaging $M$ quasi-independent spectral replicates (Welch segments or multitaper tapers); for $M=1$ the estimate is identically $1$ and uninformative.
• Compare $\hat {\gamma }_{\bxy }^2[k]$ to the threshold $\gamma _{th}^2 = 1 - \alpha ^{1/(M-1)}$ from (22.27).
• Decision rule: reject independence $H_0$ at bin $k$ iff $\hat {\gamma }_{\bxy }^2[k] > \gamma _{th}^2$.
• For Welch with $50\%$ Hann overlap, replace $M$ by $M_{eff}\approx M/1.9$.

The true cross-coherence (Sec. 21.2.2) is a population quantity and must be estimated from a finite record. Both common estimators (Welch and multitaper, Sec. 22.4.2) have the same form: average $M$ quasi-independent replicates of the cross- and auto-spectra and combine them as

\begin{equation} \label {eq:coh-estimator} \hat {\gamma }_{\bxy }^2[k] = \frac {\abs {\hat {S}_{\bxy }[k]}^2}{\hat {S}_{\bxx }[k]\,\hat {S}_{\byy }[k]}. \end{equation}

For $M=1$ the Cauchy-Schwarz inequality saturates and $\hat {\gamma }_{\bxy }^2[k]\equiv 1$ identically, so $M>1$ is mandatory; the role of $M$ is to inject independent realizations of the spectra into the ratio.
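The single-replicate case can be written out explicitly. With $X[k]$ and $Y[k]$ denoting the one available pair of spectra (notation assumed here for illustration),

\[ \hat {\gamma }_{\bxy }^2[k]\Big |_{M=1} = \frac {\abs {X[k]^*\,Y[k]}^2}{\abs {X[k]}^2\,\abs {Y[k]}^2} = \frac {\abs {X[k]}^2\,\abs {Y[k]}^2}{\abs {X[k]}^2\,\abs {Y[k]}^2} = 1, \]

whatever the relation between the two signals. Only when several replicates with differing relative phases are averaged can the terms in the numerator partially cancel and pull the ratio below $1$.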

22.4.1 Hypothesis Test

The null hypothesis is the absence of linear coupling at bin $k$,

\begin{equation} H_0:\ \gamma _{\bxy }^2[k] = 0,\qquad H_1:\ \gamma _{\bxy }^2[k] > 0. \end{equation}

For two independent Gaussian signals with $M$ independent spectral replicates, the null distribution of $\hat {\gamma }_{\bxy }^2[k]$ admits a closed-form upper quantile, and the $(1-\alpha )$ significance threshold is

\begin{equation} \label {eq-coh-threshold} \gamma _{th}^2 = 1 - \alpha ^{1/(M-1)}. \end{equation}

Reject $H_0$ at bin $k$ iff $\hat {\gamma }_{\bxy }^2[k] > \gamma _{th}^2$. The threshold decreases as $M$ grows, so more averaging makes weaker couplings detectable; the cost is frequency resolution, which depends on the chosen estimator.
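The dependence of the threshold on $M$ is easy to tabulate. The sketch below uses a hypothetical helper name, `coherence_threshold`, and an arbitrary selection of $M$ values.

```python
def coherence_threshold(M, alpha=0.05):
    """Significance threshold 1 - alpha**(1/(M-1)) for M independent replicates."""
    if M <= 1:
        raise ValueError("M must exceed 1: a single replicate always gives coherence 1")
    return 1.0 - alpha ** (1.0 / (M - 1))

# Threshold versus number of replicates (values of M chosen for illustration)
for M in (2, 5, 10, 50):
    print(M, coherence_threshold(M, 0.05), coherence_threshold(M, 0.01))
```

The helper also accepts a non-integer $M$, which is convenient when an effective replicate count is substituted.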

22.4.2 Estimators

Welch: The classical estimator partitions the record into $M$ overlapping segments of length $N_{seg}$, applies a window, and averages the resulting periodograms (Sec. 20.2.3). The frequency resolution is $\Delta f = f_s / N_{seg}$. With $50\%$ Hann overlap the segments are not strictly independent: the standard correction is to replace $M$ in (22.27) by an effective count $M_{eff}\approx M/1.9$, which avoids an optimistic threshold.
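A sketch of the Welch variant using `scipy.signal.coherence`, with the threshold evaluated at the effective count $M_{eff}$. The test signals (a white-noise $x[n]$, a delayed and noisy copy as $y[n]$, and an unrelated white-noise record), the record length, and the segment length are assumptions made for the illustration; the segment count formula assumes no padding of the last segment.

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(2)
fs, L, n0 = 100.0, 4000, 6
nperseg, noverlap = 256, 128               # Hann window with 50% overlap

# Hypothetical test pair: y[n] = x[n - n0] + independent noise
s = rng.standard_normal(L + n0)
x = s[n0:]
y = s[:L] + rng.standard_normal(L)
y_indep = rng.standard_normal(L)           # unrelated record for comparison

f, coh_coupled = coherence(x, y, fs=fs, window="hann",
                           nperseg=nperseg, noverlap=noverlap)
_, coh_indep = coherence(x, y_indep, fs=fs, window="hann",
                         nperseg=nperseg, noverlap=noverlap)

M = (L - noverlap) // (nperseg - noverlap)  # number of overlapping segments
M_eff = M / 1.9                             # effective count from the text
alpha = 0.05
thr = 1.0 - alpha ** (1.0 / (M_eff - 1.0))

frac_coupled = np.mean(coh_coupled > thr)   # fraction of bins flagged, coupled pair
frac_indep = np.mean(coh_indep > thr)       # fraction of bins flagged, unrelated pair
print(M, thr, frac_coupled, frac_indep)
```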

Multitaper: The multitaper estimator keeps the record intact and multiplies it by $K$ orthogonal Slepian (DPSS) tapers, averaging the resulting tapered cross- and auto-spectra,

\begin{equation} \label {eq:multitaper-spectra} \hat {S}_{\bxy }[k] = \frac {1}{K}\sum _{j=1}^{K}{X^{(j)}[k]}^*\,Y^{(j)}[k],\qquad \hat {S}_{\bxx }[k] = \frac {1}{K}\sum _{j=1}^{K}\abs {X^{(j)}[k]}^2, \end{equation}

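where $X^{(j)}[k]$ is the DFT of the product of $x[n]$ and the $j$-th DPSS taper of time-bandwidth product $NW$ (typically $K=2NW-1$ tapers are used); $Y^{(j)}[k]$ and $\hat {S}_{\byy }[k]$ are defined analogously for $y[n]$. The tapers are orthogonal by construction, so each acts as a quasi-independent realization and (22.27) applies with $M$ replaced by $K$. Because the record is not segmented, multitaper keeps the full-record bin spacing $f_s/L$, although each estimate is smoothed over a bandwidth of about $2W$ (with $W$ the taper half-bandwidth in the product $NW$); it is preferred when the record is short.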
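A minimal multitaper sketch built on `scipy.signal.windows.dpss`. The function name `multitaper_coherence`, the mean removal before tapering, and the test signals are assumptions: two cosines at $f_0=5$ Hz, the second delayed by $n_0=6$ samples, each with its own independent noise of standard deviation $0.5$ (the noise level is a guess, not a value from the text).

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_coherence(x, y, NW=3.0, K=None):
    """Magnitude-squared coherence from K DPSS-tapered spectra (one-sided bins)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    L = len(x)
    if K is None:
        K = int(2 * NW - 1)
    tapers = dpss(L, NW, Kmax=K)                         # shape (K, L)
    X = np.fft.rfft(tapers * (x - x.mean()), axis=-1)    # one spectrum per taper
    Y = np.fft.rfft(tapers * (y - y.mean()), axis=-1)
    Sxy = np.mean(np.conj(X) * Y, axis=0)
    Sxx = np.mean(np.abs(X) ** 2, axis=0)
    Syy = np.mean(np.abs(Y) ** 2, axis=0)
    return np.abs(Sxy) ** 2 / (Sxx * Syy)

rng = np.random.default_rng(3)
fs, L, f0, n0 = 100.0, 100, 5.0, 6
sigma = 0.5                                              # assumed noise level
n = np.arange(L)
x = np.cos(2 * np.pi * f0 * n / fs) + sigma * rng.standard_normal(L)
y = np.cos(2 * np.pi * f0 * (n - n0) / fs) + sigma * rng.standard_normal(L)

K = 5
coh = multitaper_coherence(x, y, NW=3.0, K=K)
freqs = np.fft.rfftfreq(L, d=1.0 / fs)
thr99 = 1.0 - 0.01 ** (1.0 / (K - 1))                    # threshold with M replaced by K
k0 = np.argmin(np.abs(freqs - f0))                       # bin of the sinusoid
print(freqs[k0], coh[k0], thr99)
```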

Example 22.3 (Multitaper cross-coherence of delay-coupled sinusoids): Reuse the noisy delay-coupled sinusoid pair of Fig. 21.2, with $f_0=5$ Hz, $f_s=100$ Hz, $L=100$, and $y[n]$ a copy of $x[n]$ delayed by $n_0=6$ samples plus independent noise. The cross-coherence is estimated with $K=2NW-1=5$ DPSS tapers ($NW=3$), and the corresponding test bounds are
$\seteqnumber{0}{}{28}$
\begin{equation*} \gamma _{th}^2(\alpha ) = 1 - \alpha ^{1/(K-1)},\quad \gamma _{th}^2(0.05) \approx 0.527,\quad \gamma _{th}^2(0.01) \approx 0.684. \end{equation*}

Each individual signal is itself strongly autocorrelated (the cosine envelope dominates the noise), so the simple $1/\sqrt {L}$ ACF bound flags many spurious lags; the time-limited Bartlett bound with $M=\lfloor \sqrt {L}\rfloor =10$ widens to absorb that structure, as shown in Fig. 22.4a and 22.4b. The cross-correlation in Fig. 22.4c peaks near lag $k=n_0=6$, as expected from the delay; here too the simple bound flags many spurious lags, while the Bartlett joint bound absorbs the autocorrelation of both signals and leaves only the delay peak above threshold. In the frequency domain the same coupling shows up as a coherence peak: Fig. 22.4d shows that $\hat {\gamma }_{\bxy }^2[k]$ exceeds both the $95\%$ and $99\%$ thresholds at the sinusoid frequency $f_0=5$ Hz and stays below them elsewhere.

(a) ACF of $x[n]$

(b) ACF of $y[n]$

(c) CCF $R_{\bxy ,norm}[k]$

(d) Multitaper cross-coherence

Figure 22.4: Delay-coupled sinusoid pair of Fig. 21.2 ($L=100$, $NW=3$, $K=5$): (a) and (b) sample ACFs with the simple $1/\sqrt {L}$ bound (red dashed) and the time-limited Bartlett bound with $M=10$ (blue dash-dotted); (c) sample CCF with simple and Bartlett joint $95\%$ bounds, peaking at $k=n_0=6$; (d) multitaper cross-coherence with the $\alpha =0.05$ and $\alpha =0.01$ test bounds, $\approx 0.527$ and $\approx 0.684$, peaking at $f_0=5$ Hz.

22.5 Granger Causality

Goal: Decide whether the past of one signal $x[n]$ improves the prediction of another signal $y[n]$ beyond $y$’s own past: a directional, predictive notion of causality.

The null hypothesis states that the lagged-$x$ coefficients $b_i$ of the unrestricted model (22.31) are jointly zero,

\begin{equation} \begin{aligned} H_0&:\ b_1 = b_2 = \cdots = b_p = 0,\\ H_1&:\ \exists \, i:\ b_i \ne 0. \end {aligned} \end{equation}

Assumption: both $x[n]$ and $y[n]$ are stationary.

22.5.1 Test statistic

The restricted model predicts $y[n]$ from its own past only, i.e. the AR($p$) model of Sec. 20.3,

\begin{equation} \label {eq:granger-restricted} \hat {y}[n] = \sum _{i=1}^{p} a_i\,y[n-i], \end{equation}

with residual sum of squares $\SSE _r$. The unrestricted model adds the past of $x[n]$,

\begin{equation} \label {eq:granger-unrestricted} \hat {y}[n] = \sum _{i=1}^{p} a_i\,y[n-i] + \sum _{i=1}^{p} b_i\,x[n-i], \end{equation}

with residual sum of squares $\SSE _u$.

Because the unrestricted model nests the restricted one, adding the $x$-regressors cannot increase the residual, so $\SSE _u \le \SSE _r$; the test asks whether this reduction is larger than sampling noise alone would produce.

The nested-model $F$-statistic is

\begin{equation} \label {eq:granger-F} F = \frac {(\SSE _r - \SSE _u)/p}{\SSE _u/(L-2p-1)}, \end{equation}

which under $H_0$ follows an $F_{p,\,L-2p-1}$ distribution.

Practical summary

• Fit the restricted model: AR($p$) of $y[n]$ on its own $p$ lags, giving residual sum of squares $\SSE _r$.
• Fit the unrestricted model: add the $p$ past lags of $x[n]$, giving $\SSE _u$.
• Form the statistic
$\seteqnumber{0}{}{32}$
\begin{equation*} F = \frac {(\SSE _r - \SSE _u)/p}{\SSE _u/(L-2p-1)} \sim F_{p,\,L-2p-1} \end{equation*}
• Reject “$x$ does not Granger-cause $y$” iff $F > F_\alpha $.
• Run the test in both directions (Sec. 22.5.3).
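The steps in the summary can be sketched with ordinary least squares. The helper names `lag_matrix` and `granger_f_test` are hypothetical, the models carry no intercept (the signals are mean-removed instead, matching the form of (22.30) and (22.31)), and the denominator degrees of freedom follow the text's $L-2p-1$. The simulated pair, in which $x[n]$ drives $y[n]$ with a one-sample lag, is an assumption for the demonstration.

```python
import numpy as np
from scipy.stats import f as f_dist

def lag_matrix(s, p):
    """Columns s[n-1], ..., s[n-p] for n = p, ..., L-1."""
    L = len(s)
    return np.column_stack([s[p - i:L - i] for i in range(1, p + 1)])

def granger_f_test(x, y, p, alpha=0.05):
    """Test 'x does not Granger-cause y'; returns (F, F_crit, reject)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    L = len(y)
    target = y[p:]
    Z_r = lag_matrix(y, p)                          # restricted: own past of y
    Z_u = np.hstack([Z_r, lag_matrix(x, p)])        # unrestricted: add past of x

    def sse(Z):
        coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
        resid = target - Z @ coef
        return float(resid @ resid)

    sse_r, sse_u = sse(Z_r), sse(Z_u)
    dof = L - 2 * p - 1                             # as in the text's F-statistic
    F = ((sse_r - sse_u) / p) / (sse_u / dof)
    F_crit = f_dist.ppf(1.0 - alpha, p, dof)
    return F, F_crit, bool(F > F_crit)

# Simulated pair (assumed): x is white noise and drives y with a one-sample lag
rng = np.random.default_rng(4)
L = 500
x = rng.standard_normal(L)
e = rng.standard_normal(L)
y = np.zeros(L)
for n in range(1, L):
    y[n] = 0.5 * y[n - 1] + 0.8 * x[n - 1] + e[n]

F_xy, crit_xy, reject_xy = granger_f_test(x, y, p=2)   # x -> y
F_yx, crit_yx, reject_yx = granger_f_test(y, x, p=2)   # y -> x
print(F_xy, crit_xy, reject_xy, F_yx, crit_yx, reject_yx)
```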

Granger causality formalizes a predictive sense of causality: $x[n]$ is said to Granger-cause $y[n]$ if the past of $x$ helps forecast $y$ better than the past of $y$ alone. It is a statement about forecast improvement, not about a physical mechanism.

22.5.2 Relation to VAR

The paired tests are exactly the coefficient tests of a bivariate VAR($p$) (Sec. 21.9): stacking the two signals into $\by [n]=[x[n],\,y[n]]^\top $, channel $j$ does not Granger-cause channel $i$ iff $(\bA _m)_{ij}=0$ for all lags $m=1,\ldots ,p$.
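The correspondence with the VAR coefficients can be made concrete with a least-squares fit. This is a sketch under stated assumptions: `fit_var` is a hypothetical helper, the channel order is $[x,\,y]$ as in the stacking above, and the simulated system (with $x$ driving $y$ through a coefficient of $0.8$ at lag $1$) is invented for the illustration.

```python
import numpy as np

def fit_var(Y, p):
    """Least-squares VAR(p) fit; Y has shape (L, d). Returns [A_1, ..., A_p]."""
    Y = np.asarray(Y, dtype=float)
    Y = Y - Y.mean(axis=0)
    L, d = Y.shape
    # Each regressor row is [Y[n-1], ..., Y[n-p]] for n = p, ..., L-1
    Z = np.hstack([Y[p - m:L - m] for m in range(1, p + 1)])
    T = Y[p:]
    B, *_ = np.linalg.lstsq(Z, T, rcond=None)       # B stacks the transposed A_m
    return [B[d * (m - 1):d * m].T for m in range(1, p + 1)]

# Simulated system (assumed): x drives y at lag 1, y never feeds back into x
rng = np.random.default_rng(5)
L = 2000
ex, ey = rng.standard_normal(L), rng.standard_normal(L)
x = np.zeros(L)
y = np.zeros(L)
for n in range(1, L):
    x[n] = 0.5 * x[n - 1] + ex[n]
    y[n] = 0.5 * y[n - 1] + 0.8 * x[n - 1] + ey[n]

A = fit_var(np.column_stack([x, y]), p=2)
# With channel order [x, y]: A[m-1][1, 0] couples past x into y (x -> y),
# and A[m-1][0, 1] couples past y into x (y -> x).
print(A[0])
print(A[1])
```

In this simulation the entry coupling past $x$ into $y$ is clearly nonzero at lag $1$, while the entries coupling past $y$ into $x$ stay near zero at both lags, which is the pattern the paired tests detect.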

22.5.3 Granger causality comparisons

Causality is directional, so the test is run twice with the roles of the two signals swapped:

• $x \to y$: restricted = AR($p$) of $y$ on its own past; unrestricted adds lagged $x$. Null $H_0^{x\to y}$: the lagged-$x$ coefficients in the $y$ equation are all zero.
• $y \to x$: the symmetric test on the $x$ equation. Null $H_0^{y\to x}$: the lagged-$y$ coefficients in the $x$ equation are all zero.

The two outcomes combine into the four mutually exclusive conclusions listed in Table 22.1.

Table 22.1: Granger causality comparisons: the four directional conclusions from the paired tests.

$x\to y$	$y\to x$	Conclusion
reject	retain	unidirectional $x\to y$
retain	reject	unidirectional $y\to x$
reject	reject	bidirectional feedback $x\leftrightarrow y$
retain	retain	no Granger causality detected in either direction

Example 22.4 (Granger F-test): A record of length $L=200$ is modeled with $p=2$ lags. The restricted (own-past) fit gives $\SSE _r=120$ and the unrestricted fit (adding the past of $x$) gives $\SSE _u=100$. Then
$\seteqnumber{0}{}{32}$
\begin{equation*} F = \frac {(120-100)/2}{100/(200-2\cdot 2-1)} = \frac {10}{100/195} \approx 19.5 \end{equation*}

Since the $5\%$ critical value is $F_{0.05;\,2,195}\approx 3.04$, the statistic far exceeds the threshold and $H_0$ is rejected: $x$ Granger-causes $y$. A simulated coupled pair is shown in Fig. 22.5.
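The arithmetic of the example can be checked with `scipy.stats.f`. The variable names are illustrative, and the p-value is an addition not quoted in the text.

```python
from scipy.stats import f as f_dist

L, p = 200, 2
sse_r, sse_u = 120.0, 100.0                    # values from Example 22.4

dof = L - 2 * p - 1                            # denominator degrees of freedom, 195
F = ((sse_r - sse_u) / p) / (sse_u / dof)      # nested-model F-statistic
F_crit = f_dist.ppf(0.95, p, dof)              # 5% critical value of F_{2,195}
p_value = f_dist.sf(F, p, dof)                 # upper-tail probability of F
reject = bool(F > F_crit)
print(F, F_crit, p_value, reject)
```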

Figure 22.5: Granger causality on a simulated pair where $x[n]$ drives $y[n]$ with a lag. Adding the past of $x$ to the own-past AR model of $y$ markedly reduces the prediction residual, and the resulting $F$-statistic exceeds the significance threshold.

• Granger causality is predictive, not mechanistic: it detects forecast improvement, not a physical cause.
• The bivariate test can be misled by a latent third driver that leads both signals; partial out the other channels with a conditional (multivariate) test on the full VAR.
• It presupposes stationarity; on trending or integrated data, difference first, otherwise the spurious-regression effect inflates significance.
• It is linear by construction and captures only lagged coupling, not instantaneous (same-sample) dependence.
• Conclusions depend on the lag order $p$; select it by AIC/BIC.
