Machine Learning & Signals Learning
19 Regression Metrics
Given targets \(y[n]\) and predictions \(\hat y[n]\), \(n=0,\ldots ,L-1\), the prediction error (residual) is
\begin{equation}
e[n]=y[n]-\hat y[n].
\end{equation}
19.1 Scale-Dependent Metrics
Scale-dependent metrics retain the original units of the data, making them appropriate when the absolute error magnitude is meaningful.
MSE and RMSE
\begin{equation}
\mathrm {MSE}=\frac {1}{L}\sum _{n} e^2[n],\qquad \mathrm {RMSE}=\sqrt {\mathrm {MSE}}.
\end{equation}
MSE heavily penalizes large errors and is therefore sensitive to outliers. It is also commonly used as a training loss (differentiable, convex).
MAE and MedAE
\begin{equation}
\mathrm {MAE}=\frac {1}{L}\sum _{n}|e[n]|,\qquad \mathrm {MedAE}=\operatorname {median}_n\,|e[n]|.
\end{equation}
Both are robust to outliers (MedAE especially). MAE is also used as a training loss (convex, but nondifferentiable at \(0\)).
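For concreteness, a minimal NumPy sketch of these scale-dependent metrics (the function and array names are illustrative, not from any particular library):

import numpy as np

def scale_dependent_metrics(y, y_hat):
    """Return MSE, RMSE, MAE, and MedAE for targets y and predictions y_hat."""
    e = y - y_hat                      # residual e[n] = y[n] - y_hat[n]
    mse = np.mean(e**2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(e)),
        "MedAE": np.median(np.abs(e)),
    }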
19.2 Scale-Free Metrics
When comparing forecasts across series with different units or scales, scale-free metrics normalize the error so that results are comparable.
MAPE and sMAPE
\begin{equation}
\mathrm {MAPE}=\frac {100}{L}\sum _n \frac {|e[n]|}{|y[n]|+\varepsilon },\qquad \mathrm {sMAPE}=\frac {100}{L}\sum _n \frac {2|e[n]|}{|y[n]|+|\hat y[n]|+\varepsilon }.
\end{equation}
A small \(\varepsilon \) is added to avoid division by zero. sMAPE is bounded in \([0,200]\%\) but remains biased when target values are close to zero.
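A sketch of both percentage metrics, with eps playing the role of \(\varepsilon \) (its default value here is an illustrative choice):

import numpy as np

def mape(y, y_hat, eps=1e-8):
    # Mean absolute percentage error, in percent.
    return 100.0 * np.mean(np.abs(y - y_hat) / (np.abs(y) + eps))

def smape(y, y_hat, eps=1e-8):
    # Symmetric MAPE, in percent; bounded in [0, 200].
    return 100.0 * np.mean(2.0 * np.abs(y - y_hat)
                           / (np.abs(y) + np.abs(y_hat) + eps))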
MASE (Mean Absolute Scaled Error)
MASE normalizes the MAE by the error of an in-sample naive forecast (\(\hat {y}[n]=y[n-1]\)):
\begin{equation}
\mathrm {MASE}= \frac {\frac {1}{L}\sum _{n}|e[n]|} {\frac {1}{L-1}\sum _{n=1}^{L-1}|y[n]-y[n-1]|}.
\end{equation}
MASE \(<1\) means the model outperforms the naive baseline. The metric is robust across scales and can be extended to a seasonal naive denominator when seasonality is present.
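A sketch of the non-seasonal version; np.diff(y) yields the naive-forecast errors \(y[n]-y[n-1]\) for \(n=1,\ldots ,L-1\):

import numpy as np

def mase(y, y_hat):
    mae_model = np.mean(np.abs(y - y_hat))     # (1/L) * sum |e[n]|
    mae_naive = np.mean(np.abs(np.diff(y)))    # (1/(L-1)) * sum |y[n] - y[n-1]|
    return mae_model / mae_naive

For a seasonal-naive denominator with period m, replacing np.diff(y) with y[m:] - y[:-m] gives the corresponding baseline errors.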
NRMSE
NRMSE normalizes the RMSE by a measure of the target’s spread:
\begin{equation}
\mathrm {NRMSE}_{\sigma }=\frac {\mathrm {RMSE}}{\sigma _y},\qquad \mathrm {NRMSE}_{\mathrm {range}}=\frac {\mathrm {RMSE}}{\max y-\min y}.
\end{equation}
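Both normalizations in one illustrative function (the mode argument is an assumption of this sketch):

import numpy as np

def nrmse(y, y_hat, mode="sigma"):
    # RMSE normalized by the target's std ("sigma") or range ("range").
    rmse = np.sqrt(np.mean((y - y_hat)**2))
    denom = np.std(y) if mode == "sigma" else np.max(y) - np.min(y)
    return rmse / denom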
Example 19.1: Two predictors are compared on the same target signal (Fig. 19.1). Predictor A has small, uniformly distributed errors. Predictor B is more accurate on most samples but contains three large outliers. The bar chart shows how different metrics rank the two: RMSE penalizes the outliers heavily (B is much worse), MAE is similar for both, and MedAE favors B (ignoring the outliers entirely). This illustrates why reporting multiple metrics provides a more complete picture of prediction quality.
Figure 19.1: Effect of outliers on metric values. Predictor A has uniform noise; Predictor B has sparse large errors. RMSE is sensitive to outliers, MAE is moderate, and MedAE is robust.
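The figure's exact data is not reproduced here, but the qualitative ranking can be checked with synthetic errors (the noise levels and outlier size below are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
L = 200

e_a = rng.uniform(-0.1, 0.1, L)                   # Predictor A: small uniform errors
e_b = rng.uniform(-0.03, 0.03, L)                 # Predictor B: tiny errors ...
e_b[rng.choice(L, size=3, replace=False)] += 2.0  # ... plus three large outliers

for name, e in (("A", e_a), ("B", e_b)):
    print(f"{name}: RMSE={np.sqrt(np.mean(e**2)):.3f}  "
          f"MAE={np.mean(np.abs(e)):.3f}  MedAE={np.median(np.abs(e)):.3f}")

With these settings, B's RMSE is several times A's, the MAEs are comparable, and B's MedAE is well below A's, matching the ranking described above.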
19.3 Information Criteria
Information criteria balance model fit against complexity to prevent overfitting. Given log-likelihood \(\ln \mathcal {L}\), number of parameters \(k\), and sample size \(L\):
\begin{equation}
\mathrm {AIC}=2k-2\ln \mathcal {L},\qquad \mathrm {BIC}=k\ln L-2\ln \mathcal {L}.
\end{equation}
Lower values indicate a better trade-off. AIC favors predictive fit; BIC penalizes complexity more strongly, preferring simpler models. The corrected variant
\begin{equation}
\mathrm {AICc}=\mathrm {AIC}+\frac {2k(k+1)}{L-k-1}
\end{equation}
adds a finite-sample correction and should be preferred when \(L/k\) is small.
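A sketch computing all three criteria from a given log-likelihood, together with the standard identity for the maximized Gaussian log-likelihood of a least-squares fit (using the MLE variance \(\hat \sigma ^2=\mathrm {SSE}/L\)):

import numpy as np

def information_criteria(log_lik, k, L):
    aic = 2 * k - 2 * log_lik
    bic = k * np.log(L) - 2 * log_lik
    aicc = aic + 2 * k * (k + 1) / (L - k - 1)   # requires L > k + 1
    return {"AIC": aic, "BIC": bic, "AICc": aicc}

def gaussian_log_lik(e):
    # Maximized log-likelihood of i.i.d. Gaussian residuals e[n].
    L = len(e)
    sigma2 = np.mean(e**2)
    return -0.5 * L * (np.log(2 * np.pi * sigma2) + 1)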
19.4 Summary
\begin{tabular}{lccc}
\toprule
Metric & Scale-free & Outlier-robust & Also used as loss \\
\midrule
MSE / RMSE & No & No & Yes \\
MAE / MedAE & No & Yes & Yes (MAE) \\
MAPE / sMAPE & Yes & No & No \\
MASE & Yes & Yes & No \\
NRMSE & Yes & No & No \\
\bottomrule
\end{tabular}