20 Regression Metrics
Given targets \(y[n]\) and predictions \(\hat y[n]\), \(n=0,\ldots ,L-1\), the prediction error (residual) is
\begin{equation}
e[n]=y[n]-\hat y[n].
\end{equation}
20.1 Scale-Dependent Metrics
Scale-dependent metrics retain the original units of the data, making them appropriate when the absolute error magnitude is meaningful.
MSE and RMSE
\begin{equation}
\mathrm {MSE}=\frac {1}{L}\sum _{n} e^2[n],\qquad \mathrm {RMSE}=\sqrt {\mathrm {MSE}}.
\end{equation}
Squaring heavily penalizes large errors, so both metrics are sensitive to outliers. MSE is also commonly used as a training loss (differentiable, convex).
MAE and MedAE
\begin{equation}
\mathrm {MAE}=\frac {1}{L}\sum _{n}|e[n]|,\qquad \mathrm {MedAE}=\operatorname {median}_n\,|e[n]|.
\end{equation}
Both are robust to outliers (MedAE especially, since the median ignores extreme residuals). MAE is also used as a training loss (convex, but nondifferentiable at \(0\)).
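All four scale-dependent metrics reduce to a few lines of NumPy. The sketch below is illustrative; the function names are ours, not from any particular library:

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: (1/L) * sum of e^2[n]."""
    e = np.asarray(y, float) - np.asarray(y_hat, float)
    return np.mean(e ** 2)

def rmse(y, y_hat):
    """Root mean squared error, in the original units of y."""
    return np.sqrt(mse(y, y_hat))

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float)))

def medae(y, y_hat):
    """Median absolute error; robust to outliers."""
    return np.median(np.abs(np.asarray(y, float) - np.asarray(y_hat, float)))
```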
20.2 Scale-Free Metrics
When comparing forecasts across series with different units or scales, scale-free metrics normalize the error so that results are comparable.
MAPE and sMAPE
\begin{equation}
\mathrm {MAPE}=\frac {100}{L}\sum _n \frac {|e[n]|}{|y[n]|+\varepsilon },\qquad \mathrm {sMAPE}=\frac {100}{L}\sum _n \frac {2|e[n]|}{|y[n]|+|\hat y[n]|+\varepsilon }.
\end{equation}
A small \(\varepsilon \) is added to avoid division by zero. sMAPE is bounded in \([0,200]\%\) but remains biased when target values are close to zero.
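A sketch of the two percentage metrics under the \(\varepsilon \)-guarded definitions above (the default eps value is an arbitrary choice, as is the naming):

```python
import numpy as np

def mape(y, y_hat, eps=1e-8):
    """Mean absolute percentage error, in percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs(y - y_hat) / (np.abs(y) + eps))

def smape(y, y_hat, eps=1e-8):
    """Symmetric MAPE, bounded in [0, 200] percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(2.0 * np.abs(y - y_hat)
                           / (np.abs(y) + np.abs(y_hat) + eps))
```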
MASE (Mean Absolute Scaled Error)
MASE normalizes the MAE by the error of an in-sample naive forecast (\(\hat {y}[n]=y[n-1]\)):
\begin{equation}
\mathrm {MASE}= \frac {\frac {1}{L}\sum _{n}|e[n]|} {\frac {1}{L-1}\sum _{n=1}^{L-1}|y[n]-y[n-1]|}.
\end{equation}
MASE \(<1\) means the model outperforms the naive baseline. The metric is robust across scales and can be extended to a seasonal naive denominator when seasonality is present.
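A sketch of MASE with the one-step naive baseline from the definition above (the seasonal extension mentioned in the text would difference at the seasonal lag instead of lag 1):

```python
import numpy as np

def mase(y, y_hat):
    """Mean absolute scaled error vs. the one-step naive forecast."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    mae_model = np.mean(np.abs(y - y_hat))    # (1/L) * sum |e[n]|
    mae_naive = np.mean(np.abs(np.diff(y)))   # (1/(L-1)) * sum |y[n] - y[n-1]|
    # Seasonal variant with period m: np.mean(np.abs(y[m:] - y[:-m]))
    return mae_model / mae_naive
```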
NRMSE
NRMSE normalizes the RMSE by a measure of the target’s spread:
\begin{equation}
\mathrm {NRMSE}_{\sigma }=\frac {\mathrm {RMSE}}{\sigma _y},\qquad \mathrm {NRMSE}_{\mathrm {range}}=\frac {\mathrm {RMSE}}{\max y-\min y}.
\end{equation}
Example 20.1: Two predictors are compared on the same target signal (Fig. 20.1). Predictor A has small, uniformly distributed errors. Predictor B is more accurate on most samples but contains three large outliers. The bar chart shows how different metrics rank the two: RMSE penalizes the outliers heavily (B is much worse), MAE is similar for both, and MedAE favors B (ignoring the outliers entirely). This illustrates why reporting multiple metrics provides a more complete picture of prediction quality.
Figure 20.1: Effect of outliers on metric values. Predictor A has uniform noise; Predictor B has sparse large errors. RMSE is sensitive to outliers, MAE is moderate, and MedAE is robust.
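The experiment behind Fig. 20.1 can be reproduced in a few lines. The target signal, noise levels, and outlier magnitudes below are assumed for illustration, since the figure does not specify them; the printed values should nonetheless show the qualitative ranking described above (B much worse in RMSE, comparable in MAE, better in MedAE):

```python
import numpy as np

rng = np.random.default_rng(0)
L = 200
y = np.sin(2 * np.pi * np.arange(L) / 50)   # assumed target signal

# Predictor A: small uniform errors on every sample.
y_a = y + rng.uniform(-0.2, 0.2, L)

# Predictor B: very accurate, except three large outliers.
y_b = y + rng.normal(0, 0.07, L)
y_b[[30, 90, 150]] += 3.0

for name, y_hat in [("A", y_a), ("B", y_b)]:
    e = y - y_hat
    print(name,
          "RMSE=%.3f" % np.sqrt(np.mean(e ** 2)),
          "MAE=%.3f" % np.mean(np.abs(e)),
          "MedAE=%.3f" % np.median(np.abs(e)))
```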
20.3 Information Criteria
For a probabilistic model with parameters \(\boldsymbol {\theta }\), the likelihood \(\mathcal {L}(\boldsymbol {\theta })\) is the probability that the model assigns to the observed data, and parameters are
commonly estimated by maximizing it. Its logarithm, the log-likelihood \(\ln \mathcal {L}\), is preferred because it turns products into sums and is numerically stable; a higher \(\ln \mathcal {L}\) means a better fit
to the data. For regression with Gaussian residuals, maximizing the log-likelihood is equivalent to minimizing the MSE, and up to an additive constant
\begin{equation}
-2\ln \mathcal {L} \;=\; L\,\ln (\mathrm {MSE}) + \text {const},
\end{equation}
which links the formulas below to the MSE-based variants.
Adding parameters never decreases the training log-likelihood, which encourages overfitting. Information criteria replace the raw log-likelihood with a penalized score that balances goodness of fit against the number of parameters \(k\), given \(L\) samples. Lower values indicate a better model.
Akaike’s Information Criterion (AIC)
AIC trades off goodness-of-fit against complexity:
\begin{equation}
\mathrm {AIC}=2k-2\ln \mathcal {L}.
\end{equation}
AICc adds a small-sample correction and should be preferred when \(L/k\) is small:
\begin{equation}
\mathrm {AICc}=\mathrm {AIC}+\frac {2k(k+1)}{L-k-1}.
\end{equation}
Bayesian Information Criterion (BIC)
BIC applies a stronger penalty on the number of parameters, especially as the sample size \(L\) grows, so it tends to favor simpler models than AIC:
\begin{equation}
\mathrm {BIC}=k\ln L-2\ln \mathcal {L}.
\end{equation}
These criteria are computed from the fitted log-likelihood and the parameter count, requiring no additional model fitting, and they enable principled comparison between models of different complexity. Their main limitation is
sensitivity to the assumed likelihood (commonly Gaussian residuals); AIC and BIC can also disagree about which model is best, so no single criterion is universally preferred. The MSE-based forms of these criteria, together with
Akaike’s Final Prediction Error (FPE), are given in Sec. 6.5.
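Under the Gaussian-residual link above, all three criteria follow directly from the MSE and the parameter count. A minimal sketch, dropping the additive constant (which cancels when comparing models fitted to the same data); the helper name is ours:

```python
import numpy as np

def gaussian_ic(y, y_hat, k):
    """AIC, AICc, and BIC for a model with k parameters, using
    -2 ln L_hat = L * ln(MSE) + const for Gaussian residuals."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    L = y.size
    neg2_loglik = L * np.log(np.mean((y - y_hat) ** 2))
    aic = 2 * k + neg2_loglik
    aicc = aic + 2 * k * (k + 1) / (L - k - 1)
    bic = k * np.log(L) + neg2_loglik
    return aic, aicc, bic
```

Comparing candidate models then amounts to calling this helper with each model's predictions and its \(k\), and selecting the lowest score.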
20.4 Summary
The table below summarizes the metrics discussed in this chapter.

Metric        | Scale-free | Outlier-robust | Also used as loss
------------- | ---------- | -------------- | -----------------
MSE / RMSE    | No         | No             | Yes
MAE / MedAE   | No         | Yes            | Yes (MAE)
MAPE / sMAPE  | Yes        | No             | No
MASE          | Yes        | Yes            | No
NRMSE         | Yes        | No             | No