Data-Driven Time-Series Prediction


Chapter 1 Descriptive Statistics Basics

  • Goal: Summarize the essential characteristics of a data set concisely.

Preliminaries

We assume a univariate random experiment described by a real-valued random variable \(X\) with

\begin{align*} \E [X] &=\mu \\ \Var [X] &= \sigma ^2. \end{align*}

Let

\[x=\{x_1,\dots ,x_n\}\]

be \(n\) observations of \(X\), where \(n\) may be fixed in advance or chosen arbitrarily.

1.1 Central Tendency

1.1.1 Mean

Given observations \(x_1,\dots ,x_n\), the sample mean \(\bar {x}\) is given by

\begin{equation} \bar x = \frac 1n\sum _{i=1}^n x_i \end{equation}

Properties:

  • As you gather more observations (higher \(n\)), \(\bar x\) tends to stabilize: it fluctuates less, and its empirical distribution concentrates around the true center \(\mu \).

  • Moreover, for large \(n\), the variability of \(\bar x\) scales like \(\sigma /\sqrt {n}\) (equivalently, \(\Var [\bar x]\approx \frac {\sigma ^2}{n}\)), meaning the more data we collect, the tighter our estimate of the process’s center becomes (see the sketch after this list).

  • In repeated experiments of size \(n\), the average of those \(\bar x\) values is itself approximately \(\mu \); that is, \(\E [\bar x]=\mu \), so on average the sample mean recovers the true mean.
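
Both properties are easy to check empirically. The following minimal numpy sketch (the Gaussian distribution, sample sizes, and replicate count are illustrative assumptions, not taken from the text) compares the empirical spread of \(\bar x\) with the theoretical \(\sigma /\sqrt {n}\):

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 0.0, 1.0                 # assumed true mean and std of X
    R = 2000                             # replicates per sample size

    for n in (5, 20, 100):
        # R independent samples of size n; one sample mean per row
        xbar = rng.normal(mu, sigma, size=(R, n)).mean(axis=1)
        # empirical spread of the sample mean vs. the theoretical sigma/sqrt(n)
        print(n, xbar.std(ddof=1), sigma / np.sqrt(n))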

1.1.2 Median

The median is the value lying at the midpoint of a frequency distribution of observed values, such that an observation is equally likely to fall above or below it.

Equivalently, for a sample \(x_1,\dots ,x_n\) with order statistics \(x_{(1)}\le \cdots \le x_{(n)}\), the sample median is

\[ \mathrm {median}( x) = \begin {cases} x_{(\frac {n+1}2)}, & n\text { odd},\\[6pt] \dfrac {x_{(\frac n2)} + x_{(\frac n2 + 1)}}2, & n\text { even}. \end {cases} \]
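
A direct implementation of this case split (a sketch; numpy’s built-in np.median computes the same quantity):

    import numpy as np

    def sample_median(x):
        """Median via order statistics: the middle value for odd n,
        the average of the two middle values for even n."""
        xs = np.sort(np.asarray(x, dtype=float))   # x_(1) <= ... <= x_(n)
        n = len(xs)
        if n % 2 == 1:
            return xs[(n + 1) // 2 - 1]            # 0-based indexing
        return (xs[n // 2 - 1] + xs[n // 2]) / 2

    print(sample_median([3, 1, 4, 1, 5]), np.median([3, 1, 4, 1, 5]))  # 3.0 3.0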

1.1.3 Mode

For a finite sample \(x_1,\dots ,x_n\), the sample mode is any value(s) that occur(s) most often among the observations.
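
Because several values can share the maximal count, a mode routine should return all of them. A minimal standard-library sketch:

    from collections import Counter

    def sample_mode(x):
        """All values attaining the maximal count (a sample may be multimodal)."""
        counts = Counter(x)
        top = max(counts.values())
        return [value for value, c in counts.items() if c == top]

    print(sample_mode([1, 2, 2, 3, 3, 4]))  # [2, 3]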

1.2 Dispersion

1.2.1 Variance

Sample variance is given by

\begin{align} s_{unbiased}^2 =\frac 1{n-1}\sum _{i=1}^n\bigl (x_i-\bar {x}\bigr )^2 \end{align} This formula is unbiased.

The more intuitive formula is

\begin{equation} s_{biased}^2 =\frac 1n\sum _{i=1}^n\bigl (x_i-\bar {x}\bigr )^2 \end{equation}

However, it is biased with \(\E [s_{biased}^2]=\dfrac {n-1}{n}\sigma ^2\). It systematically underestimates the true population variance \(\sigma ^2\).

Note that the mean-squared error (MSE) of the biased estimator is slightly lower than that of the unbiased one (see Example 1.1).
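
The two formulas differ only in the normalizer, as the sketch below makes explicit (the sample values are an arbitrary illustration):

    import numpy as np

    def sample_variance(x, unbiased=True):
        """Sum of squared deviations divided by n-1 (unbiased) or n (biased)."""
        x = np.asarray(x, dtype=float)
        ss = np.sum((x - x.mean()) ** 2)
        return ss / (len(x) - 1) if unbiased else ss / len(x)

    x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(sample_variance(x, unbiased=False))  # 4.0 (divides by n)
    print(sample_variance(x, unbiased=True))   # ~4.571 (divides by n-1)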

1.2.2 Standard Deviation

The sample standard deviation (std), \(s\), is simply the square root of the (unbiased or biased) sample variance.

Table 1.1: The difference between variance and std.

Aspect | Variance (\(s^2\)) | Std. Dev. (\(s\))
Units | (original unit)\(^2\) | original unit
Interpretation | “Mean squared deviation” | “Average deviation from the mean”
Ease of communication | Abstract (squared units) | Concrete (e.g., \(\pm 5\) kg)

1.2.3 Bias

Let \(\widehat \theta \) be an estimator of a parameter \(\theta \). Its bias is

\[ \mathrm {Bias}(\widehat \theta )\;=\;\E [\widehat \theta ]\;-\;\theta . \]

If \(\mathrm {Bias}(\widehat \theta )=0\), then \(\E [\widehat \theta ]=\theta \) and \(\widehat \theta \) is called unbiased.

  • Example 1.1: The sample mean is unbiased and its variance decays as \(\sigma ^2/n\); the usual sample-variance estimators can be biased or unbiased. We illustrate all three properties by simulation (a code sketch reproducing it follows the example):

    • 1. Parameters.

      • True distribution: \(X\sim \mathcal {N}(0,1)\), so \(\mu =0\), \(\sigma ^2=1\).

      • Sample sizes: \(n\in \{5,\,20,\,100\}\).

      • Number of replicates: \(R=5000\).

    • 2. Data generation. For each \(n\) and each replicate \(r=1,\dots ,R\):

      \[ x_{r,1},\dots ,x_{r,n}\;\overset {\mathrm {iid}}{\sim }\;\mathcal {N}(0,1), \quad \bar x_r = \frac {1}{n}\sum _{i=1}^n x_{r,i}. \]

    • 3. Empirical estimates.

      \begin{align*} \widehat \mu _n &= \frac {1}{R}\sum _{r=1}^R \bar x_r,\\ \widehat {\Var [\bar x]} &= \frac {1}{R-1}\sum _{r=1}^R\bigl (\bar x_r - \widehat \mu _n\bigr )^2. \end{align*} For the variance estimators,

      \[ \begin {aligned} s_{r,\mathrm {biased}}^2 &= \frac {1}{n}\sum _{i=1}^n (x_{r,i}-\bar x_r)^2, \\ s_{r,\mathrm {unbiased}}^2 &= \frac {1}{n-1}\sum _{i=1}^n (x_{r,i}-\bar x_r)^2, \end {aligned} \]

      and we compute

      \[ \widehat \sigma ^2_{\mathrm {biased}} = \frac {1}{R}\sum _{r=1}^R s_{r,\mathrm {biased}}^2, \quad \widehat \sigma ^2_{\mathrm {unbiased}} = \frac {1}{R}\sum _{r=1}^R s_{r,\mathrm {unbiased}}^2. \]

      We also compute the empirical mean-squared error (MSE) of each estimator relative to the true variance \(\sigma ^2=1\):

      \begin{align*} \widehat {\mathrm {MSE}}_{\mathrm {biased}} &= \frac {1}{R}\sum _{r=1}^R \bigl (s_{r,\mathrm {biased}}^2 - \sigma ^2\bigr )^2,\\ \widehat {\mathrm {MSE}}_{\mathrm {unbiased}} &= \frac {1}{R}\sum _{r=1}^R \bigl (s_{r,\mathrm {unbiased}}^2 - \sigma ^2\bigr )^2. \end{align*}

    • 4. Results.

      Table 1.2: Simulation results: sample mean, variance estimates, and their MSEs across different sample sizes.

      \(n\) | \(\widehat \mu _n\) | Empirical \(\Var [\bar x]\) | Theoretical \(\Var [\bar x]=1/n\) | \(\widehat \sigma ^2_{\mathrm {biased}}\) | \(\widehat \sigma ^2_{\mathrm {unbiased}}\) | \(\widehat {\mathrm {MSE}}_{\mathrm {biased}}\) | \(\widehat {\mathrm {MSE}}_{\mathrm {unbiased}}\)
      5 | \(0.001\) | \(0.198\) | \(0.200\) | \(0.79\) | \(0.99\) | \(0.36\) | \(0.50\)
      20 | \(-0.000\) | \(0.049\) | \(0.050\) | \(0.95\) | \(1.00\) | \(0.10\) | \(0.11\)
      100 | \(0.000\) | \(0.010\) | \(0.010\) | \(0.99\) | \(1.01\) | \(0.02\) | \(0.02\)

      Results in Table 1.2 agree closely with the theoretical values:

      \begin{align*} \E [\bar x]&=0,\\ \Var [\bar x]&=\sigma ^2/n,\\ \E [s_{\mathrm {biased}}^2]&=\frac {n-1}{n}\sigma ^2, \\ \E [s_{\mathrm {unbiased}}^2]&=\sigma ^2,\\ \mathrm {MSE}(s_{\mathrm {biased}}^2) &=\frac {2n-1}{n^2}\,\sigma ^4, \\ \mathrm {MSE}(s_{\mathrm {unbiased}}^2) &=\Var [s_{\mathrm {unbiased}}^2] =\frac {2}{\,n-1\,}\,\sigma ^4 \end{align*}
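
A compact sketch that reproduces the simulation above (exact numbers vary with the random seed, but should match Table 1.2 to the displayed precision):

    import numpy as np

    rng = np.random.default_rng(0)
    R = 5000                                   # replicates
    for n in (5, 20, 100):
        x = rng.standard_normal((R, n))        # R samples of size n from N(0, 1)
        xbar = x.mean(axis=1)
        ss = ((x - xbar[:, None]) ** 2).sum(axis=1)
        s2_b, s2_u = ss / n, ss / (n - 1)      # biased / unbiased variance estimates
        print(f"n={n:3d}  mu_hat={xbar.mean():+.3f}  var_xbar={xbar.var(ddof=1):.3f}  "
              f"s2_b={s2_b.mean():.2f}  s2_u={s2_u.mean():.2f}  "
              f"mse_b={((s2_b - 1)**2).mean():.2f}  mse_u={((s2_u - 1)**2).mean():.2f}")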

When to use the biased vs. the unbiased estimator:

  • Biased: common in machine-learning tasks (e.g., inside loss functions); it is the maximum-likelihood estimate of the variance for a Gaussian distribution.

  • Unbiased: when the particular value of \(\sigma \) is of higher importance.

Code implementation defaults (note the difference; a quick check follows this list):

  • Python (numpy.var, numpy.std): biased by default (ddof=0).

  • MATLAB (var, std): unbiased by default.
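
A quick check of these defaults in Python (the MATLAB equivalents, var(x) and std(x), normalize by \(n-1\)):

    import numpy as np

    x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    print(np.var(x))                      # 4.0    -> divides by n (ddof=0, biased)
    print(np.var(x, ddof=1))              # ~4.571 -> divides by n-1 (unbiased)
    print(np.std(x), np.std(x, ddof=1))   # the same convention applies to std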

1.3 Histogram

  • Goal: Visualization of the experimental data.

1.3.1 Count Type

  • Goal: Show how many times each discrete outcome occurs.

Consider an experiment with:

  • \(k\) possible distinct outcomes \(x_{1},x_{2},\ldots ,x_{k}\), where \(k\) is relatively small.

  • A total of \(N\) trials.

  • Recorded results:

    • \(n_{1}\) occurrences of \(x_{1}\),

    • \(n_{2}\) occurrences of \(x_{2}\),

    • \(\ldots \) and so on,

    with \(\sum _{i} n_{i}=N\).

A graphical representation of the outcomes is shown in Fig. 1.1(a).

(image)

Figure 1.1: Example of histograms: (a) count, (b) probability.
1.3.2 Probability Type

  • Goal: Show the proportion of each discrete outcome, providing an empirical estimate of the probability mass function (PMF).

Approximation to the PMF: The probability of a particular outcome is approximated by the ratio of its count to the total number of trials,

\begin{equation} \label {eq:rand1:numeric_PDF_discr} p_X[x_i]\approx \frac {n_i}{N}, \qquad i = 1,\ldots ,k. \end{equation}

Naturally, the approximation improves as \(N \to \infty \).

A graphical example of this histogram type is shown in Fig. 1.1(b).
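
A sketch computing both histogram types for a discrete experiment (a simulated fair die, purely an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.integers(1, 7, size=1000)                  # N = 1000 die rolls

    values, counts = np.unique(x, return_counts=True)  # outcomes x_i, counts n_i
    probs = counts / counts.sum()                      # p_X[x_i] ~= n_i / N
    for v, n_i, p in zip(values, counts, probs):
        print(v, n_i, round(p, 3))                     # each p approaches 1/6 as N grows

Plotting the pairs \((x_i, n_i)\) gives the count histogram; dividing by \(N\) gives the probability version.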

1.3.3 Large Number of Outcomes

When the number of possible outcomes, \(k\), is large (on the order of hundreds or more, or when the observations take continuous values), two main difficulties arise:

  • Presenting the results in a compact, readable form.

  • Some outcome categories contain very few observations because their probabilities are small.

A practical way to display the data is:

  • 1. Record the extreme values, \(x_{\max }\) and \(x_{\min }\).

  • 2. Partition the interval \(\bigl [x_{\min },x_{\max }\bigr ]\) into \(k\) equal-width bins of size \(\Delta x\).

  • 3. Mark each bin by its midpoint

    \begin{equation} \label {eq:rand1:mid_point} \tilde {x}_{i}=x_{\min }+\Bigl (i-\frac 12\Bigr )\,\Delta x, \qquad i=1,\dots ,k, \end{equation}

  • 4. Let \(n_{1}\) be the count in the first bin, \(n_{2}\) the count in the second, and so on.

  • 5. Use the pairs \(\left (\tilde {x}_{i},n_{i}\right )\) for count- or probability-type histograms (a binning sketch follows Fig. 1.2).

An example of a count histogram for a large data set is shown in Fig. 1.2.

(image)

Figure 1.2: A count histogram for a large number of experimental outcomes.
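
The binning procedure maps directly onto numpy, as in the sketch below (the data and bin count are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(10_000)        # continuous-valued observations
    k = 30                                 # number of equal-width bins

    # Steps 1-4: extremes, equal-width partition, counts per bin
    counts, edges = np.histogram(x, bins=k, range=(x.min(), x.max()))
    dx = edges[1] - edges[0]                                  # bin width
    midpoints = x.min() + (np.arange(1, k + 1) - 0.5) * dx    # bin midpoints
    # Step 5: (midpoints, counts) is the count histogram;
    # (midpoints, counts / counts.sum()) the probability version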