Machine Learning & Signals Learning
19 Regression Metrics
Given targets \(y[n]\) and predictions \(\hat y[n]\), \(n=0,\ldots ,L-1\), the prediction error (residual) is
\begin{equation}
e[n]=y[n]-\hat y[n].
\end{equation}
19.1 Scale-Dependent Metrics
Scale-dependent metrics retain the original units of the data, making them appropriate when the absolute error magnitude is meaningful.
MSE and RMSE
\begin{equation}
\mathrm {MSE}=\frac {1}{L}\sum _{n} e^2[n],\qquad \mathrm {RMSE}=\sqrt {\mathrm {MSE}}.
\end{equation}
MSE heavily penalizes large errors and is therefore sensitive to outliers. It is also commonly used as a training loss (differentiable, convex).
MAE and MedAE
\begin{equation}
\mathrm {MAE}=\frac {1}{L}\sum _{n}|e[n]|,\qquad \mathrm {MedAE}=\operatorname {median}_n\,|e[n]|.
\end{equation}
Both are robust to outliers (MedAE especially). MAE is also used as a training loss (convex, but nondifferentiable at \(0\)).
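For concreteness, a minimal NumPy sketch of these scale-dependent metrics (the function and array names are illustrative, not from any particular library):

import numpy as np

def scale_dependent_metrics(y, y_hat):
    """Return MSE, RMSE, MAE, and MedAE for targets y and predictions y_hat."""
    e = y - y_hat                      # residual e[n] = y[n] - y_hat[n]
    mse = np.mean(e**2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(e)),
        "MedAE": np.median(np.abs(e)),
    }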
19.2 Scale-Free Metrics
When comparing forecasts across series with different units or scales, scale-free metrics normalize the error so that results are comparable.
MAPE and sMAPE
\begin{equation}
\mathrm {MAPE}=\frac {100}{L}\sum _n \frac {|e[n]|}{|y[n]|+\varepsilon },\qquad \mathrm {sMAPE}=\frac {100}{L}\sum _n \frac {2|e[n]|}{|y[n]|+|\hat y[n]|+\varepsilon }.
\end{equation}
A small \(\varepsilon \) is added to avoid division by zero. sMAPE is bounded in \([0,200]\%\) but remains biased when target values are close to zero.
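A sketch of both percentage metrics, with eps playing the role of \(\varepsilon \) (its default value here is an illustrative choice):

import numpy as np

def mape(y, y_hat, eps=1e-8):
    # Mean absolute percentage error, in percent.
    return 100.0 * np.mean(np.abs(y - y_hat) / (np.abs(y) + eps))

def smape(y, y_hat, eps=1e-8):
    # Symmetric MAPE, in percent; bounded in [0, 200].
    return 100.0 * np.mean(2.0 * np.abs(y - y_hat)
                           / (np.abs(y) + np.abs(y_hat) + eps))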
MASE (Mean Absolute Scaled Error)
MASE normalizes the MAE by the error of an in-sample naive forecast (\(\hat {y}[n]=y[n-1]\)):
\begin{equation}
\mathrm {MASE}= \frac {\frac {1}{L}\sum _{n}|e[n]|} {\frac {1}{L-1}\sum _{n=1}^{L-1}|y[n]-y[n-1]|}.
\end{equation}
MASE \(<1\) means the model outperforms the naive baseline. The metric is robust across scales and can be extended to a seasonal naive denominator when seasonality is present.
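A sketch of the non-seasonal version; np.diff(y) yields the naive-forecast errors \(y[n]-y[n-1]\) for \(n=1,\ldots ,L-1\):

import numpy as np

def mase(y, y_hat):
    mae_model = np.mean(np.abs(y - y_hat))     # (1/L) * sum |e[n]|
    mae_naive = np.mean(np.abs(np.diff(y)))    # (1/(L-1)) * sum |y[n] - y[n-1]|
    return mae_model / mae_naive

For a seasonal-naive denominator with period m, replacing np.diff(y) with y[m:] - y[:-m] gives the corresponding baseline errors.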
NRMSE
NRMSE normalizes the RMSE by a measure of the target’s spread:
\begin{equation}
\mathrm {NRMSE}_{\sigma }=\frac {\mathrm {RMSE}}{\sigma _y},\qquad \mathrm {NRMSE}_{\mathrm {range}}=\frac {\mathrm {RMSE}}{\max y-\min y}.
\end{equation}
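Both normalizations in one illustrative function (the mode argument is an assumption of this sketch):

import numpy as np

def nrmse(y, y_hat, mode="sigma"):
    # RMSE normalized by the target's std ("sigma") or range ("range").
    rmse = np.sqrt(np.mean((y - y_hat)**2))
    denom = np.std(y) if mode == "sigma" else np.max(y) - np.min(y)
    return rmse / denom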
Example 19.1: Two predictors are compared on the same target signal (Fig. 19.1). Predictor A has small, uniformly distributed errors. Predictor B is more accurate on most samples but contains three large outliers. The bar chart shows how different metrics rank the two: RMSE penalizes the outliers heavily (B is much worse), MAE is similar for both, and MedAE favors B (ignoring the outliers entirely). This illustrates why reporting multiple metrics provides a more complete picture of prediction quality.
Figure 19.1: Effect of outliers on metric values. Predictor A has uniform noise; Predictor B has sparse large errors. RMSE is sensitive to outliers, MAE is moderate, and MedAE is robust.
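The figure's exact data is not reproduced here, but the qualitative ranking can be checked with synthetic errors (the noise levels and outlier size below are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
L = 200

e_a = rng.uniform(-0.1, 0.1, L)                   # Predictor A: small uniform errors
e_b = rng.uniform(-0.03, 0.03, L)                 # Predictor B: tiny errors ...
e_b[rng.choice(L, size=3, replace=False)] += 2.0  # ... plus three large outliers

for name, e in (("A", e_a), ("B", e_b)):
    print(f"{name}: RMSE={np.sqrt(np.mean(e**2)):.3f}  "
          f"MAE={np.mean(np.abs(e)):.3f}  MedAE={np.median(np.abs(e)):.3f}")

With these settings, B's RMSE is several times A's, the MAEs are comparable, and B's MedAE is well below A's, matching the ranking described above.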
19.3 Information Criteria
Information criteria balance model fit against complexity to prevent overfitting. Given log-likelihood \(\ln \mathcal {L}\), number of parameters \(k\), and sample size \(L\):
\begin{equation}
\mathrm {AIC}=2k-2\ln \mathcal {L},\qquad \mathrm {BIC}=k\ln L-2\ln \mathcal {L}.
\end{equation}
Lower values indicate a better trade-off. AIC favors predictive fit; BIC penalizes complexity more strongly, preferring simpler models. The corrected variant
\begin{equation}
\mathrm {AICc}=\mathrm {AIC}+\frac {2k(k+1)}{L-k-1}
\end{equation}
adds a finite-sample correction and should be preferred when \(L/k\) is small.
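A sketch computing all three criteria from a given log-likelihood, together with the standard identity for the maximized Gaussian log-likelihood of a least-squares fit (using the MLE variance \(\hat \sigma ^2=\mathrm {SSE}/L\)):

import numpy as np

def information_criteria(log_lik, k, L):
    aic = 2 * k - 2 * log_lik
    bic = k * np.log(L) - 2 * log_lik
    aicc = aic + 2 * k * (k + 1) / (L - k - 1)   # requires L > k + 1
    return {"AIC": aic, "BIC": bic, "AICc": aicc}

def gaussian_log_lik(e):
    # Maximized log-likelihood of i.i.d. Gaussian residuals e[n].
    L = len(e)
    sigma2 = np.mean(e**2)
    return -0.5 * L * (np.log(2 * np.pi * sigma2) + 1)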
19.4 Summary
\begin{tabular}{lccc}
\toprule
Metric & Scale-free & Outlier-robust & Also used as loss \\
\midrule
MSE / RMSE & No & No & Yes \\
MAE / MedAE & No & Yes & Yes (MAE) \\
MAPE / sMAPE & Yes & No & No \\
MASE & Yes & Yes & No \\
NRMSE & Yes & No & No \\
\bottomrule
\end{tabular}