Machine Learning & Signals Learning


17 Signal classification

  • Goal: Classify signals.

  • Direct approach: distance metrics and classifier.

    Distance metric examples: MSE, DTW.

  • Feature approach: feature extraction and classifier.

  • Shapelets: discriminative sub-sequences of time-series data, combined with a classifier.

17.1 Dynamic Time Warping (DTW)

  • Goal: “Elastic” distance metric between two signals.

    • Time-domain, non-linear metric that can stretch and compress the time axis to find the best possible match between sequences.

    • Applied for k-NN and other classifiers.

    • Historically, developed to compare the same or different words spoken at different speeds.

Definition

Two signals are defined [?],

\begin{equation} \begin{aligned} X &= \left (x_1,\ldots ,x_m,\ldots ,x_M\right )\\ Y &= \left (y_1,\ldots ,y_n,\ldots ,y_N\right ) \end{aligned} \end{equation}

The dissimilarity function is the input to the DTW algorithm,

\begin{equation} d_{mn} = f(x_m,y_n)\ge 0 \end{equation}

For example, it can be the (1D) Euclidean distance, \(d_{mn} = \lvert x_m-y_n\rvert \). The selected path is subject to the following restrictions: the start and end points are fixed at \(d_{11}\) and \(d_{MN}\), and the possible moves are monotonic:

  • Vertical: \((m,n)\rightarrow (m+1,n)\)

  • Horizontal: \((m,n)\rightarrow (m,n+1)\)

  • Diagonal: \((m,n)\rightarrow (m+1,n+1)\)

No skips or backward turns are allowed. The path minimizing the accumulated cost \(\sum d_{mn}\) is computed; this minimum cost serves as a “distance” between the two stretched signals.
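The accumulated-cost recursion implied by these moves can be sketched in NumPy. This is a minimal illustrative implementation; libraries such as tslearn provide optimized versions with windowing constraints:

```python
import numpy as np

def dtw_distance(x, y):
    """Minimum accumulated cost with local cost d_mn = |x_m - y_n|
    and the vertical/horizontal/diagonal moves described above."""
    M, N = len(x), len(y)
    d = np.abs(np.subtract.outer(np.asarray(x, float), np.asarray(y, float)))
    D = np.full((M, N), np.inf)           # accumulated cost matrix
    D[0, 0] = d[0, 0]
    for m in range(M):
        for n in range(N):
            if m == 0 and n == 0:
                continue
            prev = min(D[m - 1, n] if m > 0 else np.inf,                 # vertical
                       D[m, n - 1] if n > 0 else np.inf,                 # horizontal
                       D[m - 1, n - 1] if m > 0 and n > 0 else np.inf)   # diagonal
            D[m, n] = d[m, n] + prev
    return D[M - 1, N - 1]

# A signal compared with itself has zero DTW distance.
x = np.sin(np.linspace(0, 2 * np.pi, 50))
print(dtw_distance(x, x))   # 0.0
```

The full cost matrix makes the complexity \(O(MN)\) in time and memory, which is why practical implementations add band constraints (e.g. Sakoe-Chiba).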

  • Example 17.1: Two signals of different lengths: \(X\) is a chirp of \(M=1000\) samples and \(Y\) is a sinusoid of \(N=399\) samples (Fig. 17.1). The cost matrix \(d_{mn}=\lvert x_m-y_n\rvert \) and the optimal warping path are shown in Fig. 17.2. After alignment along the optimal path, the two signals match closely (Fig. 17.3), demonstrating how DTW stretches the time axis to find the best correspondence.

(image)

Figure 17.1: Signal examples.

(image)

Figure 17.2: \(d_{mn}\) with the optimal path.

(image)

Figure 17.3: Aligned signals.
Multivariate DTW

Same as the univariate case, except for the definition of the distance. For example, for complex signals of dimension \(K\), the Euclidean distance is

\begin{equation} d_{mn} = \sqrt {\sum _{k=1}^K \left (x_{k,m}-y_{k,n}\right )\left (x_{k,m}-y_{k,n}\right )^*} \end{equation}
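As a small sketch (assuming NumPy), this local cost for \(K\)-dimensional, possibly complex samples reduces to:

```python
import numpy as np

def local_cost(xm, yn):
    """Euclidean local cost between two K-dimensional (possibly complex) samples."""
    diff = np.asarray(xm) - np.asarray(yn)
    # (x - y)(x - y)^* summed over dimensions; the result is real and >= 0
    return float(np.sqrt(np.real(np.sum(diff * np.conj(diff)))))

print(local_cost([1 + 1j, 0], [0, 1]))   # sqrt(3) ≈ 1.732
```

The DTW recursion itself is unchanged; only \(d_{mn}\) is replaced by this multivariate cost.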

17.2 Shapelets

  • Goal: Classify time series by identifying short, discriminative sub-sequences (shapelets) that characterize each class.

A shapelet is a sub-sequence \(S = (s_1,\ldots ,s_l)\) of length \(l\) that is “maximally representative of a class” [?].

Shapelet distance

Given a shapelet \(S\) of length \(l\) and a signal \(X = (x_1,\ldots ,x_M)\), the shapelet distance is the minimum distance over all possible alignments:

\begin{equation} d(S, X) = \min _{i=1,\ldots ,M-l+1} \sqrt {\frac {1}{l}\sum _{j=1}^{l}(s_j - x_{i+j-1})^2} \end{equation}

This sliding-window RMSE measures how well the shapelet matches anywhere in the signal (Fig. 17.4).
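A direct NumPy sketch of this shapelet distance (illustrative; tslearn and sktime provide optimized versions):

```python
import numpy as np

def shapelet_distance(S, X):
    """Minimum sliding-window RMSE of shapelet S over signal X."""
    S, X = np.asarray(S, float), np.asarray(X, float)
    l = len(S)
    return min(np.sqrt(np.mean((S - X[i:i + l]) ** 2))
               for i in range(len(X) - l + 1))

# The shapelet (a bump) occurs exactly inside X, so the distance is zero.
X = np.array([0, 0, 1, 2, 1, 0, 0], float)
S = np.array([1, 2, 1], float)
print(shapelet_distance(S, X))   # 0.0
```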

Classification
  • 1. Extract shapelet candidates from the training set (sub-sequences of varying lengths).

  • 2. For each candidate, compute its distance to every training signal.

  • 3. Select the shapelet(s) whose distances best separate the classes (e.g. by information gain).

  • 4. Use the shapelet distances as features for a classifier (e.g. decision tree, SVM).

(image)

Figure 17.4: Shapelet-based classification. Left: two signal classes (Class A contains a characteristic bump). Center: the shapelet (orange) extracted from Class A. Right: distance profile — Class A has a near-zero minimum distance, while Class B remains high, enabling discrimination.
Properties
  • A signal may contain more than one discriminative shapelet; multiple shapelets can be combined as a feature vector.

  • Shapelets are interpretable — they correspond to physically meaningful signal patterns.

  • Optimized for time-domain analysis; less effective for signals whose discriminative information lies in repeated patterns or the frequency domain.

  • Shapelet discovery can be computationally expensive (\(O(M^2 \cdot l)\) per candidate); approximate and learned methods exist to reduce cost.

In Python: tslearn (shapelet learning), sktime (shapelet transform classifier).

17.3 Time-series forest

  • Goal: Classify time series using random forests built on summary features extracted from random intervals.

Time-series forest (TSF) is an interval-based method. Instead of operating on raw samples or shapelets, it extracts simple statistics from randomly chosen sub-intervals of the signal.

Algorithm
  • 1. For each tree in the forest, randomly select \(\sqrt {M}\) intervals \([a_i, b_i] \subset \{1,\ldots ,M\}\).

  • 2. From each interval, extract three summary features: mean, standard deviation, and slope.

  • 3. Train a decision tree on the resulting feature vector (of length \(3\sqrt {M}\)).

  • 4. Repeat for all trees and aggregate predictions by majority vote.
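Steps 1-2 can be sketched as follows. This is illustrative only: the interval sampling scheme, the minimum interval length, and the fixed seed are assumptions, not the exact TSF implementation:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for reproducibility

def interval_features(x):
    """Mean, standard deviation and least-squares slope over
    sqrt(M) randomly chosen intervals (the features for one tree)."""
    x = np.asarray(x, float)
    M = len(x)
    feats = []
    for _ in range(int(np.sqrt(M))):
        a = rng.integers(0, M - 2)
        b = rng.integers(a + 2, M + 1)   # interval length >= 2
        seg = x[a:b]
        slope = np.polyfit(np.arange(len(seg)), seg, 1)[0]
        feats += [seg.mean(), seg.std(), slope]
    return np.array(feats)               # length 3 * sqrt(M)

x = np.sin(np.linspace(0, 4 * np.pi, 100))
print(interval_features(x).shape)        # (30,)
```

Each tree in the forest draws its own random intervals; the resulting feature vectors then feed an ordinary decision tree.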

Properties
  • Somewhat interpretable: each split corresponds to a statistic computed over a specific time interval.

  • Computationally efficient compared to shapelet-based methods.

  • Captures interval-level (phase) information rather than point-level or subsequence-level patterns.

Variants

Random Interval Spectral Ensemble (RISE) replaces the time-domain summary statistics with spectral features (e.g. periodogram coefficients), making it effective for signals whose discriminative information lies in the frequency domain.

In Python: sktime (interval-based classifiers including TSF and RISE).

17.4 Symbolic aggregate approximation (SAX)

  • Goal: Convert a real-valued time series into a discrete symbolic string, enabling the use of text-mining techniques for classification.

Algorithm
  • 1. Piecewise Aggregate Approximation (PAA): divide the signal of length \(M\) into \(w\) equal-length segments and replace each segment by its mean, producing a reduced representation of length \(w\).

  • 2. Symbolization: map each mean value to a symbol from an alphabet of size \(\alpha \) using breakpoints derived from the standard normal distribution (assuming z-normalized data). The result is a string of length \(w\) over the alphabet \(\{a, b, \ldots \}\).

  • Example 17.2: A signal of \(M=128\) samples is converted to SAX with \(w=8\) segments and alphabet size \(\alpha =4\). The PAA reduces the signal to 8 mean values (Fig. 17.5, left). Each mean is mapped to a symbol based on the breakpoints of the standard normal distribution (center). The resulting SAX string is shown on the right.

(image)

Figure 17.5: SAX visualization: original signal with PAA overlay (left), symbolization with breakpoints (center), and resulting SAX string (right).
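The two steps can be sketched with NumPy and the standard library, assuming \(w\) divides the signal length; the breakpoints are taken from `statistics.NormalDist`:

```python
import numpy as np
from statistics import NormalDist

def sax(x, w, alpha):
    """z-normalize, reduce to w segment means (PAA), then map each mean
    to one of alpha symbols using equiprobable normal breakpoints."""
    x = np.asarray(x, float)
    x = (x - x.mean()) / x.std()                 # z-normalization
    paa = x.reshape(w, -1).mean(axis=1)          # assumes w divides len(x)
    breakpoints = [NormalDist().inv_cdf(i / alpha) for i in range(1, alpha)]
    return ''.join(chr(ord('a') + s) for s in np.searchsorted(breakpoints, paa))

x = np.sin(np.linspace(0, 2 * np.pi, 128, endpoint=False))
print(sax(x, w=8, alpha=4))   # 'cddcbaab'
```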
SAX-VSM

SAX combined with the Vector Space Model (SAX-VSM) [?] applies text-classification ideas to time-series classification:

  • 1. Convert each training signal to a SAX string using a sliding window, producing a “bag of words.”

  • 2. Build a tf-idf weighted term-frequency vector for each class.

  • 3. Classify a new signal by computing the cosine similarity of its bag-of-words vector to each class vector.
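A toy sketch of steps 2-3, with hypothetical SAX words standing in for the sliding-window output of step 1:

```python
import numpy as np
from collections import Counter

def tfidf_class_vectors(class_bags):
    """One tf-idf weighted term vector per class from its bag of SAX words."""
    vocab = sorted({w for bag in class_bags for w in bag})
    tf = np.array([[Counter(bag)[w] for w in vocab] for bag in class_bags], float)
    idf = np.log(len(class_bags) / (tf > 0).sum(axis=0))  # words in every class get weight 0
    return vocab, tf * idf

def classify(bag, vocab, class_vecs):
    """Assign the class whose tf-idf vector has highest cosine similarity."""
    v = np.array([Counter(bag)[w] for w in vocab], float)
    sims = class_vecs @ v / (np.linalg.norm(class_vecs, axis=1) * np.linalg.norm(v) + 1e-12)
    return int(np.argmax(sims))

bags = [['abba', 'abba', 'abcd'],   # hypothetical words from class-0 signals
        ['dcba', 'dcba', 'abcd']]   # hypothetical words from class-1 signals
vocab, vecs = tfidf_class_vectors(bags)
print(classify(['abba', 'abcd'], vocab, vecs))   # 0
```

Note how the word 'abcd', shared by both classes, receives idf weight zero and therefore carries no discriminative information.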

Properties
  • Interpretable: discriminative patterns correspond to readable symbolic words.

  • Dimensionality reduction is controlled by two parameters: number of segments \(w\) and alphabet size \(\alpha \).

  • Sensitive to the choice of \(w\) and \(\alpha \); typically selected by cross-validation.

In Python: tslearn (PAA, SAX), sktime (SAX-based classifiers).

17.5 ROCKET

  • Goal: Classify time series by transforming them with a large number of random convolutional kernels, then training a linear classifier on the resulting features.

RandOm Convolutional KErnel Transform (ROCKET) [?].

Algorithm
  • 1. Generate a large number (e.g. \(10{,}000\)) of random convolutional kernels with random length, weights, bias, dilation, and padding.

  • 2. Convolve each kernel with the input signal.

  • 3. From each convolution output, extract two features: the maximum value and the proportion of positive values (ppv).

  • 4. Train a linear classifier (e.g. ridge regression) on the resulting feature vector.

The key insight is that a sufficiently large collection of random kernels captures diverse patterns (trends, spikes, oscillations at various scales) without any learning or search over the kernel space.
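A simplified sketch of the transform (steps 1-3); the kernel lengths, dilation sampling, and the omission of padding are simplifications relative to the published ROCKET:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_kernel_features(x, n_kernels=100):
    """Convolve x with random dilated kernels; keep two features per
    kernel: the max value and the proportion of positive values (ppv)."""
    x = np.asarray(x, float)
    feats = []
    for _ in range(n_kernels):
        length = int(rng.choice([7, 9, 11]))
        weights = rng.normal(size=length)
        weights -= weights.mean()                 # mean-centred random weights
        bias = rng.uniform(-1, 1)
        max_dil = np.log2((len(x) - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0, max_dil))
        kernel = np.zeros((length - 1) * dilation + 1)
        kernel[::dilation] = weights              # dilated kernel
        out = np.convolve(x, kernel, mode='valid') + bias
        feats += [out.max(), np.mean(out > 0)]    # max + ppv
    return np.array(feats)                        # length 2 * n_kernels

x = np.sin(np.linspace(0, 8 * np.pi, 256))
print(random_kernel_features(x).shape)            # (200,)
```

A linear classifier (e.g. ridge regression) would then be trained on these feature vectors, as in step 4.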

Properties
  • Fast: the transform and training scale linearly with the number of signals and kernels.

  • Accuracy comparable to deep learning and ensemble methods (e.g. HIVE-COTE).

  • Not interpretable: random kernels have no direct physical meaning.

MINIROCKET

MINIROCKET [?] is a faster, near-deterministic variant. The main differences are summarized below.

Property        ROCKET                MINIROCKET
Kernel weights  Random (continuous)   Fixed \(\{-1,\,2\}\)
Features        Max value + ppv       ppv only
Determinism     Stochastic            Near-deterministic
Speed           Fast                  Up to \(75\times \) faster

In Python: sktime (RocketClassifier, MiniRocketClassifier).

17.6 HIVE-COTE

  • Goal: Achieve state-of-the-art accuracy by combining classifiers that operate on different signal representations.

The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) [?] is a meta-ensemble that fuses the outputs of several diverse classifiers, each operating on a different signal domain:

  • Time domain (e.g. shapelet transform, TSF).

  • Frequency domain (e.g. RISE).

  • Dictionary-based representation (e.g. SAX-based methods).

  • Interval-based and whole-series distance-based classifiers.

Each component classifier produces a probability distribution over the classes. HIVE-COTE combines them using a weighted vote, where the weights reflect each component’s estimated accuracy.
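The weighted vote can be sketched as follows; the component probabilities and accuracy-based weights are hypothetical numbers for illustration:

```python
import numpy as np

def hive_cote_vote(probas, weights):
    """Combine per-component class-probability vectors by a weighted
    soft vote and renormalize to a probability distribution."""
    probas = np.asarray(probas, float)    # shape (n_components, n_classes)
    weights = np.asarray(weights, float)  # one accuracy-based weight per component
    combined = weights @ probas           # weighted sum over components
    return combined / combined.sum()

# Three hypothetical component classifiers, two classes
probas = [[0.7, 0.3],    # e.g. shapelet-based component
          [0.4, 0.6],    # e.g. frequency-domain component
          [0.8, 0.2]]    # e.g. dictionary-based component
weights = [0.9, 0.6, 0.85]
p = hive_cote_vote(probas, weights)
print(p.argmax())   # 0
```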

Properties
  • Among the most accurate non-DL classifiers on standard benchmarks.

  • Computationally expensive: trains multiple full classifiers.

  • Modular: individual components can be replaced or upgraded independently.

17.7 Summary

Method comparison

The methods presented in this chapter are compared below. The “domain” column indicates the signal representation used by each method. Accuracy and speed are qualitative rankings on standard benchmarks.

Method           Domain                        Accuracy  Speed      Interpretable
DTW + \(k\)-NN   Time (distance)               Moderate  Slow       No
Shapelets        Time (sub-sequence)           Moderate  Slow       Yes
TSF / RISE       Time / Frequency (interval)   Moderate  Fast       Partially
SAX-VSM          Symbolic                      Moderate  Fast       Yes
ROCKET           Time (random convolution)     High      Fast       No
HIVE-COTE        Multi-domain (ensemble)       Highest   Very slow  No
Classical vs. DL approaches
Property              Classical (non-DL)                DL
Signal modeling       Not required                      Not required
Multivariate signals  Less effective                    Naturally supported
Interpretability      Limited (except shapelets, SAX)   Limited
Generalization        Low                               High
Complexity            High (vs. model-based)            High
Feature engineering   Required (hand-crafted)           Learned automatically
Data requirements     Moderate                          Large (orders of magnitude more)
Hyper-parameters      Few                               Many; optimization required
Non-linear signals    Supported                         Supported