Machine Learning & Signals Learning


25 Signal Classifiers

  • Goal: Assign a whole signal \(y[n]\) to one of \(C\) discrete classes.

25.1 Preface

Given a training set of labeled signals, the task is to assign a whole signal \(y[n],\,n=0,\ldots ,L-1\) to one of \(C\) discrete classes. Unlike the regression setting of earlier chapters, the input is a sequence of inter-related samples rather than a fixed-length feature vector.

Dataset. Signals in a dataset may be

  • univariate (single channel \(y[n]\)) or multivariate (\(K\) channels \(y_k[n]\), e.g. ECG leads or accelerometer axes);

  • equal-length or variable-length (length \(L\) may differ between examples);

  • scarce: the number of labeled examples is typically small relative to the signal length \(L\).

Overfitting challenge. Using each sample \(y[n]\) as a feature yields an \(L\)-dimensional input, while only a handful of training examples are available, so a raw-sample classifier overfits easily. The methods below differ mainly in how they summarize, transform, or compare signals before classification in order to control this.

Main categories of methods

  • Distance-based: define an elastic distance between raw signals and apply \(k\)-NN (Dynamic Time Warping).

  • Feature-based: extract interpretable summaries and classify on them (time-series forest / RISE, shapelets, SAX and SAX-VSM). The general feature-extraction pipeline these methods specialize is covered in Sec. 24.6.

  • Specially adapted transforms: apply a large bank of random or fixed transforms followed by a linear classifier (ROCKET, MINIROCKET), or combine several domain-specific classifiers in a meta-ensemble (HIVE-COTE).

  • Deep learning (out of scope here, pointer only): 1D CNNs, RNNs, and pretrained time-series foundation models. These are contrasted with classical methods in the closing table.

25.2 Distance Metrics

The standard distances are poorly suited to signals

The standard distance metrics of Sec. 10.2 treat a signal as a plain vector in \(\real ^L\) and pair samples index-by-index. For signals this is rarely what we want, for two reasons:

  • Sample-to-sample comparison: distances such as \(L_p\) and cosine are sensitive to any misalignment along the time axis (a shift, a stretch, a local speed change, or an added or missing event), even when the two signals share the same underlying shape. They are also restricted to equal-length inputs.

  • Curse of dimensionality: treating each of the \(L\) samples as a separate feature places the classifier in the ultra-high-dimensional regime (Sec. 10.4).
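The first point can be illustrated with a small sketch in plain Python. The helper names `pulse` and `euclidean` and the signal values are hypothetical, chosen only for illustration: a pulse is compared with a shifted copy of itself and with an all-zero signal.

```python
import math

def pulse(length, start, width=5):
    """Rectangular pulse of unit height; purely illustrative test signal."""
    return [1.0 if start <= n < start + width else 0.0 for n in range(length)]

def euclidean(x, y):
    """L2 distance; only defined here for equal-length inputs."""
    if len(x) != len(y):
        raise ValueError("L_p distances need equal-length signals")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x = pulse(50, start=10)
y_shifted = pulse(50, start=20)    # same shape, shifted by 10 samples
z_flat = [0.0] * 50                # no pulse at all

d_shift = euclidean(x, y_shifted)  # sqrt(10): the two pulses do not overlap
d_flat = euclidean(x, z_flat)      # sqrt(5)
```

In this toy case the shifted copy ends up farther from `x` than a signal with no pulse at all, which is exactly the behavior an index-by-index distance cannot avoid.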

The distances below are better suited to particular kinds of signals.

25.2.1 Dynamic Time Warping (DTW)
  • Goal: “Elastic” distance metric between two signals.

    • Time-domain, non-linear measure that stretches and compresses the time axis to find the best match between two sequences.

    • Used with \(k\)-NN and other distance-based classifiers.

    • Originally developed for speech recognition, to compare words spoken at different speeds.

Definition

Given two signals [9]

\begin{equation} \begin{aligned} X &= \left (x_1,\ldots ,x_m,\ldots ,x_M\right ),\\ Y &= \left (y_1,\ldots ,y_n,\ldots ,y_N\right ),\\ \end {aligned} \end{equation}

the DTW algorithm takes as input a pointwise dissimilarity function

\begin{equation} d_{mn} = f(x_m,y_n)\ge 0. \end{equation}

A common choice is the 1D Euclidean distance, \(d_{mn} = \abs {x_m-y_n}\).

A warping path is a sequence of index pairs \(\pi = \{(m_k,n_k)\}_{k=1}^{K}\) that satisfies three constraints:

  • Boundary: \((m_1,n_1)=(1,1)\) and \((m_K,n_K)=(M,N)\).

  • Monotonicity: \(m_{k+1}\ge m_k\) and \(n_{k+1}\ge n_k\) (no turns back).

  • Step size: each step is vertical \((m{+}1,n)\), horizontal \((m,n{+}1)\), or diagonal \((m{+}1,n{+}1)\) (no skips).

The DTW distance is the minimum cumulative cost along any admissible path,

\begin{equation} \mathrm {DTW}(X,Y) = \min _{\pi }\sum _{k=1}^{K} d_{m_k n_k}, \end{equation}

and is obtained by dynamic programming over the cost matrix \([d_{mn}]\) in \(O(MN)\) time. The minimizing path gives the alignment between the two signals, and its cumulative cost serves as an elastic distance that allows non-linear stretching of the time axis.

Unlike Euclidean distance, which pairs samples index-by-index and therefore penalizes any time shift or stretch, DTW is free to re-align the time axes and then measures the residual mismatch under the best alignment. The distance is therefore small whenever \(X\) and \(Y\) share the same shape, even if one is faster, slower, or shifted in time.
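The dynamic program can be sketched as follows. This is a minimal, unoptimized illustration in plain Python (no windowing constraint, quadratic memory); the function name `dtw` and the default absolute-difference cost are choices made for this sketch, not a reference implementation.

```python
def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Return (DTW cost, warping path) for two sequences.

    The path is a list of 1-based index pairs (m, n), as in the definition.
    """
    M, N = len(x), len(y)
    INF = float("inf")
    # D[m][n] = minimum cumulative cost of aligning x[:m] with y[:n].
    D = [[INF] * (N + 1) for _ in range(M + 1)]
    D[0][0] = 0.0
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            d = dist(x[m - 1], y[n - 1])
            # Allowed predecessors: vertical, horizontal, diagonal step.
            D[m][n] = d + min(D[m - 1][n], D[m][n - 1], D[m - 1][n - 1])
    # Backtrack from (M, N) to (1, 1) to recover one optimal path.
    path = [(M, N)]
    m, n = M, N
    while (m, n) != (1, 1):
        candidates = [(D[m - 1][n - 1], m - 1, n - 1),
                      (D[m - 1][n], m - 1, n),
                      (D[m][n - 1], m, n - 1)]
        _, m, n = min(candidates)
        path.append((m, n))
    path.reverse()
    return D[M][N], path

x = [0, 1, 2, 1, 0]
y = [0, 0, 1, 2, 2, 1, 0]          # same shape, locally stretched
cost, path = dtw(x, y)
mean_cost = cost / len(path)       # cost per step, K = len(path)
```

Here `x` and `y` have different lengths but the same shape, so the cumulative cost along the optimal path is zero. Dividing by `len(path)` gives the per-step cost, which is one way to compare pairs of different lengths.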

  • DTW is a dissimilarity measure, not a true metric; the triangle inequality does not hold in general.

  • The raw sum depends on path length. To compare pairs with different \(M,N\), it is common to normalize by the number of steps \(K\) (mean cost per step).

  • Example 25.1: Two signals of different lengths: \(X\) is a chirp of \(M=1000\) samples and \(Y\) is a sinusoid of \(N=399\) samples (Fig. 25.1). The cost matrix \(d_{mn}=\abs {x_m-y_n}\) and the optimal warping path are shown in Fig. 25.2. After alignment along the optimal path, the two signals match closely (Fig. 25.3), demonstrating how DTW stretches the time axis to find the best correspondence.

(image)

Figure 25.1: Signal examples.

(image)

Figure 25.2: \(d_{mn}\) with the optimal path.

(image)

Figure 25.3: Aligned signals.
Multivariate DTW

The multivariate case is identical to the univariate one, except that the pointwise dissimilarity is defined between vectors rather than scalars. For \(K\)-dimensional (possibly complex-valued) samples \(\mathbf {x}_m,\mathbf {y}_n\in \mathbb {C}^K\), a natural choice is the Euclidean norm of the difference,

\begin{equation} d_{mn} = \norm {\mathbf {x}_m-\mathbf {y}_n}_2 = \sqrt {\sum _{k=1}^K \left (x_{k,m}-y_{k,n}\right )\left (x_{k,m}-y_{k,n}\right )^{*}}. \end{equation}
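As a sketch of this pointwise dissimilarity, the snippet below builds the cost matrix \([d_{mn}]\) for vector-valued samples in plain Python. The function names and the layout of the data (a list of \(K\)-tuples per signal) are assumptions made for illustration.

```python
import math

def vector_dissimilarity(xm, yn):
    """Euclidean norm of the difference of two K-channel samples.

    abs() returns the modulus for complex entries, so each squared term
    equals (x - y) * conj(x - y).
    """
    if len(xm) != len(yn):
        raise ValueError("both samples must have the same number of channels")
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(xm, yn)))

def cost_matrix(X, Y):
    """X, Y: lists of K-channel samples; returns the M-by-N matrix [d_mn]."""
    return [[vector_dissimilarity(xm, yn) for yn in Y] for xm in X]

X = [(0.0, 0.0), (3.0, 4.0)]              # M = 2 samples, K = 2 channels
Y = [(0.0, 0.0), (0.0, 0.0), (1j, 0.0)]   # N = 3 samples, one complex entry
C = cost_matrix(X, Y)
```

The rest of the DTW algorithm is unchanged: the same dynamic program runs over this matrix.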

k-NN with DTW

Because DTW provides a pairwise distance between signals of possibly different lengths, it plugs directly into any distance-based classifier. The canonical choice is \(k\)-NN (Sec. 10): to classify a new signal, compute its DTW distance to every training signal and take a majority vote among the \(k\) nearest. With \(k=1\), 1-NN + DTW is a strong and frequently cited baseline for time-series classification.
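A minimal sketch of this classifier in plain Python follows. The function names, the toy training set, and the class labels are hypothetical; the DTW cost uses the absolute difference as pointwise dissimilarity and keeps only two rows of the dynamic-programming table.

```python
from collections import Counter

def dtw_cost(x, y):
    """DTW cost with |x_m - y_n| as pointwise dissimilarity (two-row DP)."""
    INF = float("inf")
    prev = [0.0] + [INF] * len(y)
    for a in x:
        curr = [INF] * (len(y) + 1)
        for n, b in enumerate(y, start=1):
            curr[n] = abs(a - b) + min(prev[n], curr[n - 1], prev[n - 1])
        prev = curr
    return prev[len(y)]

def knn_dtw_predict(query, train_signals, train_labels, k=1):
    """Majority vote among the k training signals closest to `query` in DTW."""
    order = sorted(range(len(train_signals)),
                   key=lambda i: dtw_cost(query, train_signals[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy training set with variable-length signals (illustrative values only).
train_signals = [[0, 1, 2, 1, 0, 0], [0, 0, 1, 2, 1, 0, 0, 0],
                 [0, 1, 2, 3, 4], [0, 0, 1, 2, 3, 4, 4]]
train_labels = ["bump", "bump", "ramp", "ramp"]

label_1 = knn_dtw_predict([0, 0, 0, 1, 2, 1, 0], train_signals, train_labels)
label_2 = knn_dtw_predict([0, 1, 2, 3, 3, 4], train_signals, train_labels)
```

Every prediction costs one DTW computation per training signal, which is why this baseline is accurate but slow on large training sets.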

25.2.2 Shapelets
  • Goal: Classify time series by identifying short, discriminative sub-sequences (shapelets) that characterize each class.

A shapelet is a sub-sequence \(S = (s_1,\ldots ,s_l)\) of length \(l\) that is “maximally representative of a class” [18].

Shapelet distance

Given a shapelet \(S\) of length \(l\) and a signal \(X = (x_1,\ldots ,x_M)\), the shapelet distance is the minimum distance over all possible alignments:

\begin{equation} d(S, X) = \min _{i=1,\ldots ,M-l+1} \sqrt {\frac {1}{l}\sum _{j=1}^{l}(s_j - x_{i+j-1})^2} \end{equation}

This sliding-window RMSE measures how well the shapelet matches anywhere in the signal (Fig. 25.4).
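The sliding-window computation can be sketched directly from the formula. The function name and the toy signals below are hypothetical, and the start index is 0-based, unlike the 1-based index \(i\) in the equation.

```python
import math

def shapelet_distance(shapelet, signal):
    """Minimum sliding-window RMSE between `shapelet` and `signal`.

    Returns (distance, best_start) with a 0-based start index.
    """
    l, M = len(shapelet), len(signal)
    if l == 0 or l > M:
        raise ValueError("need 0 < len(shapelet) <= len(signal)")
    best, best_start = float("inf"), -1
    for i in range(M - l + 1):
        sq = sum((s - signal[i + j]) ** 2 for j, s in enumerate(shapelet))
        rmse = math.sqrt(sq / l)
        if rmse < best:
            best, best_start = rmse, i
    return best, best_start

bump = [0.0, 1.0, 2.0, 1.0, 0.0]
class_a = [0.0] * 6 + bump + [0.0] * 4   # contains the bump
class_b = [0.0] * 15                     # flat signal
d_a, start_a = shapelet_distance(bump, class_a)
d_b, _ = shapelet_distance(bump, class_b)
```

The signal that contains the bump yields a distance of zero at the matching position, while the flat signal stays at a clearly larger value.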

Classification
  • 1. Extract shapelet candidates from the training set (sub-sequences of varying lengths).

  • 2. For each candidate, compute its distance to every training signal.

  • 3. Select the shapelet(s) whose distances best separate the classes (e.g. by information gain).

  • 4. Use the shapelet distances as features for a classifier (e.g. decision tree, SVM).
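Step 3 can be sketched as a search for the distance threshold with the highest information gain. The helper names are hypothetical, and a single binary split on the shapelet distance is assumed; other quality measures are possible.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_information_gain(distances, labels):
    """Best threshold on shapelet distances and the information gain it gives."""
    pairs = sorted(zip(distances, labels))
    base = entropy(labels)
    best_gain, best_thr = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal distances
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best_gain = gain
            best_thr = 0.5 * (pairs[i - 1][0] + pairs[i][0])
    return best_gain, best_thr

# Distances of two candidate shapelets to six training signals (toy values).
labels = ["A", "A", "A", "B", "B", "B"]
good_gain, good_thr = best_information_gain([0.1, 0.2, 0.15, 1.0, 1.2, 0.9], labels)
poor_gain, _ = best_information_gain([0.1, 0.9, 0.5, 0.2, 1.0, 0.4], labels)
```

A candidate whose distances separate the classes perfectly reaches the full class entropy as gain; a candidate with interleaved distances scores lower and would be discarded.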

(image)

Figure 25.4: Shapelet-based classification. Left: two signal classes (Class A contains a characteristic bump). Center: the shapelet (orange) extracted from Class A. Right: distance profile — Class A has a near-zero minimum distance, while Class B remains high, enabling discrimination.
Properties
  • A signal may contain more than one discriminative shapelet; multiple shapelets can be combined as a feature vector.

  • Shapelets are interpretable — they correspond to physically meaningful signal patterns.

  • Optimized for time-domain analysis; less effective for signals whose discriminative information lies in repeated patterns or the frequency domain.

  • Shapelet discovery can be computationally expensive (\(O(M^2 \cdot l)\) per candidate); approximate and learned methods exist to reduce cost.

In Python: tslearn (shapelet learning), sktime (shapelet transform classifier).

25.3 Pseudo-Random-Features-Based Classifiers

25.3.1 Time-series forest
  • Goal: Classify time series using random forests built on summary features extracted from random intervals.

Time-series forest (TSF) is an interval-based method. Instead of operating on raw samples or shapelets, it extracts simple statistics from randomly chosen sub-intervals of the signal.

Algorithm
  • 1. For each tree in the forest, randomly select \(\sqrt {M}\) intervals \([a_i, b_i] \subset \{1,\ldots ,M\}\).

  • 2. From each interval, extract three summary features: mean, standard deviation, and slope.

  • 3. Train a decision tree on the resulting feature vector (of length \(3\sqrt {M}\)).

  • 4. Repeat for all trees and aggregate predictions by majority vote.
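Steps 1 and 2 can be sketched in plain Python as below. The function names, the minimum interval length of 3, and the 0-based half-open intervals \([a, b)\) are assumptions of this sketch; the tree training of step 3 is left to any decision-tree implementation.

```python
import math
import random

def interval_features(x, a, b):
    """Mean, standard deviation, and least-squares slope of x[a:b] (b exclusive)."""
    seg = x[a:b]
    n = len(seg)
    mean = sum(seg) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in seg) / n)
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - mean) for t, v in enumerate(seg)) / denom
    return mean, std, slope

def random_intervals(M, rng, min_len=3):
    """Draw about sqrt(M) random intervals [a, b) inside a length-M signal."""
    count = max(1, int(math.sqrt(M)))
    intervals = []
    for _ in range(count):
        length = rng.randint(min_len, M)
        a = rng.randint(0, M - length)
        intervals.append((a, a + length))
    return intervals

def tsf_features(x, intervals):
    """Concatenate the three statistics over all intervals of one tree."""
    feats = []
    for a, b in intervals:
        feats.extend(interval_features(x, a, b))
    return feats

rng = random.Random(0)
signal = [float(t) for t in range(16)]         # toy ramp, M = 16
intervals = random_intervals(len(signal), rng)  # drawn once per tree
features = tsf_features(signal, intervals)      # length 3 * sqrt(M) = 12
```

Each tree draws its own intervals once and applies them to every training and test signal, so the same feature positions always refer to the same time span.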

Properties
  • Somewhat interpretable: each split corresponds to a statistic computed over a specific time interval.

  • Computationally efficient compared to shapelet-based methods.

  • Captures interval-level (phase) information rather than point-level or subsequence-level patterns.

25.3.2 Random Interval Spectral Ensemble (RISE)

RISE replaces the time-domain summary statistics with spectral features (e.g. periodogram coefficients), making it effective for signals whose discriminative information lies in the frequency domain.
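As an illustration of such spectral features, the sketch below computes a periodogram of one interval with a direct DFT in plain Python. The scaling \(|X_k|^2/n\) and the function name are assumptions of this sketch, and an FFT would replace the \(O(n^2)\) loop in practice.

```python
import cmath
import math

def periodogram(seg):
    """|DFT|^2 / n for frequency bins 0..n//2 (direct O(n^2) DFT)."""
    n = len(seg)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(seg))
        power.append(abs(coeff) ** 2 / n)
    return power

n = 32
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]  # 4 cycles per interval
spectrum = periodogram(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

For the toy sinusoid the energy concentrates in a single bin, regardless of where the oscillation starts within the interval, which is what makes these features useful when timing carries no class information.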

In Python: sktime (interval-based classifiers including TSF and RISE).

25.4 Symbolic aggregate approximation (SAX)

  • Goal: Convert a real-valued time series into a discrete symbolic string, enabling the use of text-mining techniques for classification.

Algorithm
  • 1. Piecewise Aggregate Approximation (PAA): divide the signal of length \(M\) into \(w\) equal-length segments and replace each segment by its mean, producing a reduced representation of length \(w\).

  • 2. Symbolization: map each mean value to a symbol from an alphabet of size \(\alpha \) using breakpoints derived from the standard normal distribution (assuming z-normalized data). The result is a string of length \(w\) over the alphabet \(\{a, b, \ldots \}\).

  • Example 25.2: A signal of \(M=128\) samples is converted to SAX with \(w=8\) segments and alphabet size \(\alpha =4\). The PAA reduces the signal to 8 mean values (Fig. 25.5, left). Each mean is mapped to a symbol based on the breakpoints of the standard normal distribution (center). The resulting SAX string is shown on the right.

(image)

Figure 25.5: SAX visualization: original signal with PAA overlay (left), symbolization with breakpoints (center), and resulting SAX string (right).
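The two steps can be sketched in plain Python as follows. The function names are hypothetical, the signal length is assumed to be a multiple of \(w\), and the breakpoints are taken from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def znormalize(x):
    """Zero mean and unit variance; a constant signal maps to all zeros."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / std for v in x] if std > 0 else [0.0] * len(x)

def paa(x, w):
    """Mean of each of w equal-length segments."""
    if len(x) % w != 0:
        raise ValueError("this sketch assumes len(x) is a multiple of w")
    seg = len(x) // w
    return [sum(x[i * seg:(i + 1) * seg]) / seg for i in range(w)]

def sax(x, w, alpha):
    """SAX string of length w over the first alpha lowercase letters."""
    # Breakpoints split the standard normal into alpha equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / alpha) for i in range(1, alpha)]
    symbols = []
    for mean in paa(znormalize(x), w):
        idx = sum(mean > bp for bp in breakpoints)  # breakpoints below the mean
        symbols.append(chr(ord("a") + idx))
    return "".join(symbols)

ramp = [float(t) for t in range(128)]   # toy signal, M = 128
word = sax(ramp, w=8, alpha=4)
```

The parameters match those of Example 25.2 (\(M=128\), \(w=8\), \(\alpha=4\)), but the ramp is only a stand-in signal and is not the one plotted in the figure.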
SAX-VSM

SAX combined with the Vector Space Model (SAX-VSM) [16] applies text-classification ideas to time-series classification:

  • 1. Convert each training signal to a SAX string using a sliding window, producing a “bag of words.”

  • 2. Build a tf-idf weighted term-frequency vector for each class.

  • 3. Classify a new signal by computing the cosine similarity of its bag-of-words vector to each class vector.
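Steps 2 and 3 can be sketched as below, taking the SAX words of step 1 as given. The weighting \(\mathrm{tf} = 1 + \log(\text{count})\), \(\mathrm{idf} = \log(C/\mathrm{df})\) is one common tf-idf variant and is an assumption here, as are the function names and the toy words.

```python
import math
from collections import Counter

def class_tfidf(class_bags):
    """class_bags: {label: Counter of SAX words pooled over that class}.

    Returns {label: {word: weight}} with tf = 1 + log(count) and
    idf = log(C / df), where df counts the classes containing the word.
    """
    C = len(class_bags)
    df = Counter(word for bag in class_bags.values() for word in bag)
    weights = {}
    for label, bag in class_bags.items():
        weights[label] = {w: (1 + math.log(c)) * math.log(C / df[w])
                          for w, c in bag.items()}
    return weights

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(val * v.get(key, 0.0) for key, val in u.items())
    nu = math.sqrt(sum(val ** 2 for val in u.values()))
    nv = math.sqrt(sum(val ** 2 for val in v.values()))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def classify(words, weights):
    """Assign the label whose tf-idf vector is most similar to the word counts."""
    tf = Counter(words)
    return max(weights, key=lambda label: cosine(tf, weights[label]))

class_bags = {
    "A": Counter(["abba", "abba", "aabb", "cccc"]),
    "B": Counter(["dcba", "dcba", "ddcc", "cccc"]),
}
weights = class_tfidf(class_bags)
pred_a = classify(["abba", "cccc", "aabb"], weights)
pred_b = classify(["dcba", "cccc"], weights)
```

The word `cccc` appears in both classes, so its idf and therefore its weight is zero; only class-specific words contribute to the similarity.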

Properties
  • Interpretable: discriminative patterns correspond to readable symbolic words.

  • Dimensionality reduction is controlled by two parameters: number of segments \(w\) and alphabet size \(\alpha \).

  • Sensitive to the choice of \(w\) and \(\alpha \); typically selected by cross-validation.

In Python: tslearn (PAA, SAX), sktime (SAX-based classifiers).

25.5 Random Convolutions

  • Goal: Classify time series by transforming them with a large number of random convolutional kernels, then training a linear classifier on the resulting features.

25.5.1 ROCKET

RandOm Convolutional KErnel Transform (ROCKET) [5].

Algorithm
  • 1. Generate a large number (e.g. \(10{,}000\)) of random convolutional kernels with random length, weights, bias, dilation, and padding.

  • 2. Convolve each kernel with the input signal.

  • 3. From each convolution output, extract two features: the maximum value and the proportion of positive values (ppv).

  • 4. Train a linear classifier (e.g. ridge regression) on the resulting feature vector.

The key insight is that a sufficiently large collection of random kernels captures diverse patterns (trends, spikes, oscillations at various scales) without any learning or search over the kernel space.
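Steps 1 to 3 can be sketched in plain Python as follows. The sampling choices (lengths from \(\{7, 9, 11\}\), zero-mean Gaussian weights, uniform bias, power-of-two dilation) are simplifying assumptions of this sketch, padding is omitted, and the function names are hypothetical.

```python
import math
import random

def random_kernel(rng, signal_length):
    """Draw one kernel as (weights, bias, dilation)."""
    if signal_length < 12:
        raise ValueError("this sketch assumes signals of at least 12 samples")
    length = rng.choice([7, 9, 11])
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mean_w = sum(weights) / length
    weights = [w - mean_w for w in weights]        # zero-mean weights
    bias = rng.uniform(-1.0, 1.0)
    max_exp = 0                                     # largest exponent that fits
    while (length - 1) * 2 ** (max_exp + 1) < signal_length:
        max_exp += 1
    dilation = 2 ** rng.randint(0, max_exp)
    return weights, bias, dilation

def apply_kernel(x, kernel):
    """Dilated convolution without padding; returns (max, ppv)."""
    weights, bias, dilation = kernel
    span = (len(weights) - 1) * dilation
    out = [bias + sum(w * x[i + j * dilation] for j, w in enumerate(weights))
           for i in range(len(x) - span)]
    ppv = sum(v > 0 for v in out) / len(out)
    return max(out), ppv

def rocket_transform(x, kernels):
    """Feature vector with two entries (max, ppv) per kernel."""
    feats = []
    for kernel in kernels:
        feats.extend(apply_kernel(x, kernel))
    return feats

rng = random.Random(0)
signal = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]  # toy input
kernels = [random_kernel(rng, len(signal)) for _ in range(100)]
features = rocket_transform(signal, kernels)   # 200 features
```

The kernels are drawn once and reused for every signal, so the transform itself involves no training; only the linear classifier of step 4 is fitted to the resulting features.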

Properties
  • Cost: the transform scales linearly with the number of signals, the signal length, and the number of kernels, so it can still be slow on very large datasets.

  • Accuracy comparable to deep learning and ensemble methods (e.g. HIVE-COTE).

  • Not interpretable: random kernels have no direct physical meaning.

25.5.2 MINIROCKET

MINIROCKET [6] is a faster, near-deterministic variant. The main differences are summarized below.

Property | ROCKET | MINIROCKET
Kernel weights | Random (continuous) | Fixed, from \(\{-1,\,2\}\)
Features | Max value + ppv | ppv only
Determinism | Stochastic | Near-deterministic
Speed | Fast | Up to \(75\times \) faster

In Python: sktime (RocketClassifier, MiniRocketClassifier).
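To illustrate what fixed weights from \(\{-1,\,2\}\) look like, the sketch below enumerates a small kernel set. It assumes the configuration reported for MINIROCKET in [6] (kernels of length 9 with exactly three weights equal to 2); the function name is hypothetical.

```python
from itertools import combinations

def minirocket_kernels(length=9, num_twos=3):
    """All weight vectors of the given length with `num_twos` entries equal
    to 2 and the remaining entries equal to -1."""
    kernels = []
    for positions in combinations(range(length), num_twos):
        kernels.append([2 if i in positions else -1 for i in range(length)])
    return kernels

kernels = minirocket_kernels()
num_kernels = len(kernels)                  # 9 choose 3 = 84
all_zero_sum = all(sum(k) == 0 for k in kernels)
```

Because each kernel has three weights of 2 and six weights of -1, the weights sum to zero, so the convolution output is unaffected by a constant offset in the input.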

25.6 Ensemble Classifiers

  • Goal: Achieve state-of-the-art accuracy by combining classifiers that operate on different signal representations.

25.6.1 HIVE-COTE

The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) [14] is a meta-ensemble that fuses the outputs of several diverse classifiers, each operating on a different signal domain:

  • Time domain (e.g. shapelet transform, TSF).

  • Frequency domain (e.g. RISE).

  • Dictionary-based representation (e.g. SAX-based methods).

  • Interval-based and whole-series distance-based classifiers.

Each component classifier produces a probability distribution over the classes. HIVE-COTE combines them using a weighted vote, where the weights reflect each component’s estimated accuracy.
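A minimal sketch of such a weighted vote is given below. The weighting rule (accuracy raised to a power `alpha`), the default `alpha=4.0`, the function name, and all numbers are assumptions for illustration, not the exact HIVE-COTE procedure.

```python
def weighted_vote(probabilities, accuracies, alpha=4.0):
    """Fuse per-component class probabilities with accuracy-based weights.

    probabilities: list of dicts {class_label: probability}, one per component.
    accuracies:    estimated accuracy of each component (e.g. cross-validated).
    alpha:         exponent that sharpens the weights (hypothetical default).
    """
    weights = [acc ** alpha for acc in accuracies]
    total = sum(weights)
    fused = {}
    for probs, w in zip(probabilities, weights):
        for label, p in probs.items():
            fused[label] = fused.get(label, 0.0) + w * p / total
    return fused

component_probs = [
    {"A": 0.8, "B": 0.2},   # e.g. a time-domain component
    {"A": 0.3, "B": 0.7},   # e.g. a frequency-domain component
    {"A": 0.3, "B": 0.7},   # e.g. a dictionary-based component
]
component_acc = [0.95, 0.70, 0.60]

fused = weighted_vote(component_probs, component_acc)
prediction = max(fused, key=fused.get)
unweighted = weighted_vote(component_probs, component_acc, alpha=0.0)
```

With equal weights (`alpha=0.0`) the two weaker components outvote the strong one; with accuracy-based weights the more reliable component dominates the fused distribution.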

Properties
  • Among the most accurate non-DL classifiers on standard benchmarks.

  • Computationally expensive: trains multiple full classifiers.

  • Modular: individual components can be replaced or upgraded independently.

25.7 Summary

Method comparison

The methods presented in this chapter are compared below. The “domain” column indicates the signal representation used by each method. Accuracy and speed are qualitative rankings on standard benchmarks.

Method | Domain | Accuracy | Speed | Interpretable
DTW + \(k\)-NN | Time (distance) | Moderate | Slow | No
Shapelets | Time (sub-sequence) | Moderate | Slow | Yes
TSF / RISE | Time / Frequency (interval) | Moderate | Fast | Partially
SAX-VSM | Symbolic | Moderate | Fast | Yes
ROCKET | Time (random convolution) | High | Fast | No
HIVE-COTE | Multi-domain (ensemble) | Highest | Very slow | No

Classical vs. DL approaches

Aspect | Classical (non-DL) | DL
Signal modeling | Not required | Not required
Multivariate signals | Less effective | Naturally supported
Interpretability | Limited (except shapelets, SAX) | Limited
Generalization | Low | High
Complexity | High (vs. model-based methods) | High
Feature engineering | Required (hand-crafted) | Learned automatically
Data requirements | Moderate | Large (orders of magnitude more)
Hyper-parameters | Few | Many; optimization required
Non-linear signals | Supported | Supported