Machine Learning & Signals Learning

\(\newcommand{\footnotename}{footnote}\) \(\def \LWRfootnote {1}\) \(\newcommand {\footnote }[2][\LWRfootnote ]{{}^{\mathrm {#1}}}\) \(\newcommand {\footnotemark }[1][\LWRfootnote ]{{}^{\mathrm {#1}}}\) \(\let \LWRorighspace \hspace \) \(\renewcommand {\hspace }{\ifstar \LWRorighspace \LWRorighspace }\) \(\newcommand {\TextOrMath }[2]{#2}\) \(\newcommand {\mathnormal }[1]{{#1}}\) \(\newcommand \ensuremath [1]{#1}\) \(\newcommand {\LWRframebox }[2][]{\fbox {#2}} \newcommand {\framebox }[1][]{\LWRframebox } \) \(\newcommand {\setlength }[2]{}\) \(\newcommand {\addtolength }[2]{}\) \(\newcommand {\setcounter }[2]{}\) \(\newcommand {\addtocounter }[2]{}\) \(\newcommand {\arabic }[1]{}\) \(\newcommand {\number }[1]{}\) \(\newcommand {\noalign }[1]{\text {#1}\notag \\}\) \(\newcommand {\cline }[1]{}\) \(\newcommand {\directlua }[1]{\text {(directlua)}}\) \(\newcommand {\luatexdirectlua }[1]{\text {(directlua)}}\) \(\newcommand {\protect }{}\) \(\def \LWRabsorbnumber #1 {}\) \(\def \LWRabsorbquotenumber "#1 {}\) \(\newcommand {\LWRabsorboption }[1][]{}\) \(\newcommand {\LWRabsorbtwooptions }[1][]{\LWRabsorboption }\) \(\def \mathchar {\ifnextchar "\LWRabsorbquotenumber \LWRabsorbnumber }\) \(\def \mathcode #1={\mathchar }\) \(\let \delcode \mathcode \) \(\let \delimiter \mathchar \) \(\def \oe {\unicode {x0153}}\) \(\def \OE {\unicode {x0152}}\) \(\def \ae {\unicode {x00E6}}\) \(\def \AE {\unicode {x00C6}}\) \(\def \aa {\unicode {x00E5}}\) \(\def \AA {\unicode {x00C5}}\) \(\def \o {\unicode {x00F8}}\) \(\def \O {\unicode {x00D8}}\) \(\def \l {\unicode {x0142}}\) \(\def \L {\unicode {x0141}}\) \(\def \ss {\unicode {x00DF}}\) \(\def \SS {\unicode {x1E9E}}\) \(\def \dag {\unicode {x2020}}\) \(\def \ddag {\unicode {x2021}}\) \(\def \P {\unicode {x00B6}}\) \(\def \copyright {\unicode {x00A9}}\) \(\def \pounds {\unicode {x00A3}}\) \(\let \LWRref \ref \) \(\renewcommand {\ref }{\ifstar \LWRref \LWRref }\) \( \newcommand {\multicolumn }[3]{#3}\) \(\require {textcomp}\) \( 
\newcommand {\abs }[1]{\lvert #1\rvert } \) \( \DeclareMathOperator {\sign }{sign} \) \(\newcommand {\intertext }[1]{\text {#1}\notag \\}\) \(\let \Hat \hat \) \(\let \Check \check \) \(\let \Tilde \tilde \) \(\let \Acute \acute \) \(\let \Grave \grave \) \(\let \Dot \dot \) \(\let \Ddot \ddot \) \(\let \Breve \breve \) \(\let \Bar \bar \) \(\let \Vec \vec \) \(\newcommand {\bm }[1]{\boldsymbol {#1}}\) \(\require {physics}\) \(\newcommand {\LWRphystrig }[2]{\ifblank {#1}{\textrm {#2}}{\textrm {#2}^{#1}}}\) \(\renewcommand {\sin }[1][]{\LWRphystrig {#1}{sin}}\) \(\renewcommand {\sinh }[1][]{\LWRphystrig {#1}{sinh}}\) \(\renewcommand {\arcsin }[1][]{\LWRphystrig {#1}{arcsin}}\) \(\renewcommand {\asin }[1][]{\LWRphystrig {#1}{asin}}\) \(\renewcommand {\cos }[1][]{\LWRphystrig {#1}{cos}}\) \(\renewcommand {\cosh }[1][]{\LWRphystrig {#1}{cosh}}\) \(\renewcommand {\arccos }[1][]{\LWRphystrig {#1}{arcos}}\) \(\renewcommand {\acos }[1][]{\LWRphystrig {#1}{acos}}\) \(\renewcommand {\tan }[1][]{\LWRphystrig {#1}{tan}}\) \(\renewcommand {\tanh }[1][]{\LWRphystrig {#1}{tanh}}\) \(\renewcommand {\arctan }[1][]{\LWRphystrig {#1}{arctan}}\) \(\renewcommand {\atan }[1][]{\LWRphystrig {#1}{atan}}\) \(\renewcommand {\csc }[1][]{\LWRphystrig {#1}{csc}}\) \(\renewcommand {\csch }[1][]{\LWRphystrig {#1}{csch}}\) \(\renewcommand {\arccsc }[1][]{\LWRphystrig {#1}{arccsc}}\) \(\renewcommand {\acsc }[1][]{\LWRphystrig {#1}{acsc}}\) \(\renewcommand {\sec }[1][]{\LWRphystrig {#1}{sec}}\) \(\renewcommand {\sech }[1][]{\LWRphystrig {#1}{sech}}\) \(\renewcommand {\arcsec }[1][]{\LWRphystrig {#1}{arcsec}}\) \(\renewcommand {\asec }[1][]{\LWRphystrig {#1}{asec}}\) \(\renewcommand {\cot }[1][]{\LWRphystrig {#1}{cot}}\) \(\renewcommand {\coth }[1][]{\LWRphystrig {#1}{coth}}\) \(\renewcommand {\arccot }[1][]{\LWRphystrig {#1}{arccot}}\) \(\renewcommand {\acot }[1][]{\LWRphystrig {#1}{acot}}\) \(\require {cancel}\) \(\newcommand *{\underuparrow }[1]{{\underset {\uparrow }{#1}}}\) 
\(\DeclareMathOperator *{\argmax }{argmax}\) \(\DeclareMathOperator *{\argmin }{arg\,min}\) \(\def \E [#1]{\mathbb {E}\!\left [ #1 \right ]}\) \(\def \Var [#1]{\operatorname {Var}\!\left [ #1 \right ]}\) \(\def \Cov [#1]{\operatorname {Cov}\!\left [ #1 \right ]}\) \(\newcommand {\floor }[1]{\lfloor #1 \rfloor }\) \(\newcommand {\DTFTH }{ H \brk 1{e^{j\omega }}}\) \(\newcommand {\DTFTX }{ X\brk 1{e^{j\omega }}}\) \(\newcommand {\DFTtr }[1]{\mathrm {DFT}\left \{#1\right \}}\) \(\newcommand {\DTFTtr }[1]{\mathrm {DTFT}\left \{#1\right \}}\) \(\newcommand {\DTFTtrI }[1]{\mathrm {DTFT^{-1}}\left \{#1\right \}}\) \(\newcommand {\Ftr }[1]{ \mathcal {F}\left \{#1\right \}}\) \(\newcommand {\FtrI }[1]{ \mathcal {F}^{-1}\left \{#1\right \}}\) \(\newcommand {\Zover }{\overset {\mathscr Z}{\Longleftrightarrow }}\) \(\renewcommand {\real }{\mathbb {R}}\) \(\newcommand {\ba }{\mathbf {a}}\) \(\newcommand {\bb }{\mathbf {b}}\) \(\newcommand {\bd }{\mathbf {d}}\) \(\newcommand {\be }{\mathbf {e}}\) \(\newcommand {\bh }{\mathbf {h}}\) \(\newcommand {\bn }{\mathbf {n}}\) \(\newcommand {\bq }{\mathbf {q}}\) \(\newcommand {\br }{\mathbf {r}}\) \(\newcommand {\bt }{\mathbf {t}}\) \(\newcommand {\bv }{\mathbf {v}}\) \(\newcommand {\bw }{\mathbf {w}}\) \(\newcommand {\bx }{\mathbf {x}}\) \(\newcommand {\bxx }{\mathbf {xx}}\) \(\newcommand {\bxy }{\mathbf {xy}}\) \(\newcommand {\by }{\mathbf {y}}\) \(\newcommand {\byy }{\mathbf {yy}}\) \(\newcommand {\bz }{\mathbf {z}}\) \(\newcommand {\bA }{\mathbf {A}}\) \(\newcommand {\bB }{\mathbf {B}}\) \(\newcommand {\bI }{\mathbf {I}}\) \(\newcommand {\bK }{\mathbf {K}}\) \(\newcommand {\bP }{\mathbf {P}}\) \(\newcommand {\bQ }{\mathbf {Q}}\) \(\newcommand {\bR }{\mathbf {R}}\) \(\newcommand {\bU }{\mathbf {U}}\) \(\newcommand {\bW }{\mathbf {W}}\) \(\newcommand {\bX }{\mathbf {X}}\) \(\newcommand {\bY }{\mathbf {Y}}\) \(\newcommand {\bZ }{\mathbf {Z}}\) \(\newcommand {\balpha }{\bm {\alpha }}\) \(\newcommand {\bth }{{\bm {\theta }}}\) 
\(\newcommand {\bepsilon }{{\bm {\epsilon }}}\) \(\newcommand {\bmu }{{\bm {\mu }}}\) \(\newcommand {\bOne }{\mathbf {1}}\) \(\newcommand {\bZero }{\mathbf {0}}\) \(\newcommand {\loss }{\mathcal {L}}\) \(\newcommand {\appropto }{\mathrel {\vcenter { \offinterlineskip \halign {\hfil $##$\cr \propto \cr \noalign {\kern 2pt}\sim \cr \noalign {\kern -2pt}}}}}\) \(\newcommand {\SSE }{\mathrm {SSE}}\) \(\newcommand {\MSE }{\mathrm {MSE}}\) \(\newcommand {\RMSE }{\mathrm {RMSE}}\) \(\newcommand {\toprule }[1][]{\hline }\) \(\let \midrule \toprule \) \(\let \bottomrule \toprule \) \(\def \LWRbooktabscmidruleparen (#1)#2{}\) \(\newcommand {\LWRbooktabscmidrulenoparen }[1]{}\) \(\newcommand {\cmidrule }[1][]{\ifnextchar (\LWRbooktabscmidruleparen \LWRbooktabscmidrulenoparen }\) \(\newcommand {\morecmidrules }{}\) \(\newcommand {\specialrule }[3]{\hline }\) \(\newcommand {\addlinespace }[1][]{}\) \(\newcommand {\LWRsubmultirow }[2][]{#2}\) \(\newcommand {\LWRmultirow }[2][]{\LWRsubmultirow }\) \(\newcommand {\multirow }[2][]{\LWRmultirow }\) \(\newcommand {\mrowcell }{}\) \(\newcommand {\mcolrowcell }{}\) \(\newcommand {\STneed }[1]{}\) \(\newcommand {\tcbset }[1]{}\) \(\newcommand {\tcbsetforeverylayer }[1]{}\) \(\newcommand {\tcbox }[2][]{\boxed {\text {#2}}}\) \(\newcommand {\tcboxfit }[2][]{\boxed {#2}}\) \(\newcommand {\tcblower }{}\) \(\newcommand {\tcbline }{}\) \(\newcommand {\tcbtitle }{}\) \(\newcommand {\tcbsubtitle [2][]{\mathrm {#2}}}\) \(\newcommand {\tcboxmath }[2][]{\boxed {#2}}\) \(\newcommand {\tcbhighmath }[2][]{\boxed {#2}}\)

Part III Learning in Signals

16 Feature Extraction from Signals

  • Goal: Feature extraction (FE) from signals: converting raw signal windows into numerical feature vectors suitable for ML models.

16.1 Windowing

  • Goal: Divide a continuous signal into fixed-length segments (windows) suitable for feature extraction.

Non-overlapping windows (Fig. 16.1a) are the standard choice. Overlapping windows (Fig. 16.1b) are generally discouraged: consecutive windows share samples, so the resulting feature vectors are nearly identical, adding redundancy without new information and potentially degrading performance.

For overlapping windows, the overlap is controlled by the step size \(S\) (number of samples between consecutive window starts). The overlap ratio is

\begin{equation} r = \frac {L - S}{L}, \end{equation}

where \(r=0\) corresponds to non-overlapping windows and \(r=0.5\) to 50% overlap.

Overlapping windows may introduce data leakage: adjacent windows share samples, so train and test sets may contain nearly identical data if split carelessly.

Each window contains an equal number of samples, \(L\). The value of \(L\) is:

  • Field-related, e.g. 20–40 msec in acoustic and speech processing.

  • Hand-picked by visual analysis.

  • Hyper-parameter (the worst case).
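The windowing step above can be sketched in a few lines of Python (a minimal illustration; the helper name `make_windows` is our own, not from any library):

```python
def make_windows(y, L, S):
    """Split signal y into windows of length L with step size S.

    S == L gives non-overlapping windows; S < L gives overlap r = (L - S) / L.
    Trailing samples that do not fill a complete window are dropped.
    """
    return [y[i:i + L] for i in range(0, len(y) - L + 1, S)]

y = list(range(10))

# Non-overlapping: L = 4, S = 4  ->  overlap ratio r = 0
print(make_windows(y, L=4, S=4))   # [[0, 1, 2, 3], [4, 5, 6, 7]]

# 50% overlap: L = 4, S = 2  ->  r = (4 - 2) / 4 = 0.5
print(make_windows(y, L=4, S=2))
```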

16.2 Workflow

  • Goal: Understand the end-to-end pipeline from raw signals to a feature dataset ready for ML.

Feature extraction: A mapping \(f:\real ^L\to \real ^N\) that converts a signal window of \(L\) samples into a feature vector of \(N\) numerical values.

The FE workflow is presented in Fig. 16.1c.

(image)

(a) Non-overlapping windows

(image)

(b) Overlapping windows

(image)

(c) Typical workflow
Figure 16.1: Feature extraction from signals. Dataset is further applied for ML tasks, such as regression or classification.

Typical dimensions:

  • \(L\) typically ranges from 100s to 10,000s of samples.

  • \(N\) typically ranges from a few to 1,000s of features.

Feature extraction and feature selection are distinct steps. FE creates new features from raw data; feature selection (FS) then removes redundant or irrelevant features from the extracted set.

FE from images FE from images follows similar principles. The main difference lies in the image-specific feature functions applied.

16.3 Signal transformation

  • Goal: Functional mapping of a signal to improve prediction (or classification). The result can be:

    • Univariate to another univariate signal (basic case).

    • Univariate to multivariate set of signals for further multivariate processing (advanced case).

Notes:

  • The transformations should be invertible, so that predictions can be mapped back to the original scale.

  • Some transformations are restricted to positive signals, \(y[n]>0\,\forall n\).

Some of the common transformations:

  • Logarithmic transformation

    \begin{equation} \tilde {y}[n]=\log (y[n]), \qquad y[n]>0, \end{equation}

    or, for non-negative signals that may reach zero, the shifted variant

    \begin{equation} \tilde {y}[n]=\log (y[n]+1), \qquad y[n]\geq 0. \end{equation}

    The \((+1)\) offset avoids taking the logarithm of zero.

  • Square root transformation,

    \begin{equation} \tilde {y}[n]=\sqrt {y[n]} \end{equation}

  • Power transformation,

    \begin{equation} \tilde {y}[n]=y^m[n] \end{equation}

  • Modified Box-Cox transformations [?],

    \begin{equation} w_t = \begin{cases} \log (y_t) & \text {if $\lambda =0$}; \\ \text {sign}(y_t)(|y_t|^\lambda -1)/\lambda & \text {otherwise}. \end {cases} \end{equation}

    with inverse

    \begin{equation} y_{t} = \begin{cases} \exp (w_{t}) & \text {if $\lambda =0$};\\ \text {sign}(\lambda w_t + 1)|\lambda w_t+1|^{1/\lambda } & \text {otherwise}. \end {cases} \end{equation}

  • Example 16.1: A sensor signal \(y[n]\) with values in \([0,\,10{,}000]\) is highly right-skewed (Fig. 16.2). Applying \(\tilde {y}[n]=\log (y[n]+1)\) compresses the dynamic range to \([0,\,\approx 9.2]\), making the distribution more symmetric and improving downstream classifier performance.

(image)

Figure 16.2: Log transformation of a right-skewed signal.
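The log and modified Box-Cox transformations above can be sketched in plain Python (a minimal standard-library illustration; the function names are our own). Note the roundtrip through the inverse:

```python
import math

def log1p_transform(y):
    # Shifted log transform: valid for y[n] >= 0; compresses dynamic range.
    return [math.log(v + 1) for v in y]

def boxcox(y, lam):
    # Modified (signed) Box-Cox transformation; lam == 0 requires y > 0.
    if lam == 0:
        return [math.log(v) for v in y]
    return [math.copysign(1, v) * (abs(v) ** lam - 1) / lam for v in y]

def boxcox_inverse(w, lam):
    # Inverse of the modified Box-Cox transformation.
    if lam == 0:
        return [math.exp(v) for v in w]
    return [math.copysign(1, lam * v + 1) * abs(lam * v + 1) ** (1 / lam)
            for v in w]

# Range [0, 10000] is compressed to roughly [0, 9.2], as in Example 16.1.
print(log1p_transform([0.0, 10.0, 100.0, 10_000.0]))

w = boxcox([1.0, 4.0, 9.0], lam=0.5)
back = boxcox_inverse(w, lam=0.5)
print(back)   # recovers approximately [1.0, 4.0, 9.0]
```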

16.4 Signal Features

  • Goal: Numerical functions that convert raw signal windows into scalar values that capture the characteristics of interest.

Statistical (time-domain) features

Descriptive statistics computed directly on the signal window \(y[1],\ldots ,y[L]\):

  • Mean:

    \begin{equation} \bar {y} = \frac {1}{L}\sum _{n=1}^{L} y[n] \end{equation}

  • Variance:

    \begin{equation} \sigma ^2 = \frac {1}{L}\sum _{n=1}^{L}\bigl (y[n]-\bar {y}\bigr )^2 \end{equation}

  • Root mean square (RMS):

    \begin{equation} y_{\text {rms}} = \sqrt {\frac {1}{L}\sum _{n=1}^{L} y^2[n]} \end{equation}

  • Zero-crossing rate (ZCR):

    \begin{equation} \text {ZCR} = \frac {1}{L-1}\sum _{n=2}^{L}\mathbf {1}\bigl [y[n]\cdot y[n-1]<0\bigr ] \end{equation}

    where \(\mathbf {1}[\cdot ]\) is the indicator function.

  • Additional: maximum, minimum, median, skewness, kurtosis, energy (\(\sum y^2[n]\)), number of peaks.
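The statistical features above can be computed directly on a window with a short standard-library sketch (the helper name `stat_features` is ours):

```python
import math

def stat_features(y):
    """Map a window of L samples to a small statistical feature vector."""
    L = len(y)
    mean = sum(y) / L
    var = sum((v - mean) ** 2 for v in y) / L
    rms = math.sqrt(sum(v * v for v in y) / L)
    # Zero-crossing rate: fraction of adjacent pairs with a sign change.
    zcr = sum(1 for n in range(1, L) if y[n] * y[n - 1] < 0) / (L - 1)
    return [mean, var, rms, zcr]

# Alternating window: zero mean, unit variance and RMS,
# and every adjacent pair is a sign change.
print(stat_features([1, -1, 1, -1]))   # [0.0, 1.0, 1.0, 1.0]
```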

Spectral (frequency-domain) features

Features derived from the Fourier transform \(Y[k]\) of the signal window:

  • Spectral centroid — the “center of mass” of the spectrum:

    \begin{equation} \text {SC} = \frac {\sum _{k=1}^{K} f_k |Y[k]|^2}{\sum _{k=1}^{K} |Y[k]|^2} \end{equation}

    where \(f_k\) is the frequency of the \(k\)-th bin.

  • Spectral bandwidth, spectral rolloff, spectral flatness.
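The spectral centroid formula above can be sketched with NumPy's FFT routines (assuming NumPy is available; the sampling rate `fs` and the function name are our own choices for illustration):

```python
import numpy as np

def spectral_centroid(y, fs):
    """Power-weighted mean frequency of a window's one-sided spectrum."""
    Y = np.fft.rfft(y)                           # one-sided spectrum Y[k]
    power = np.abs(Y) ** 2                       # |Y[k]|^2
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fs)  # bin frequencies f_k in Hz
    return float(np.sum(freqs * power) / np.sum(power))

fs = 1000                      # assumed sampling rate, Hz
t = np.arange(1000) / fs       # 1 s window, so 100 Hz falls exactly on a bin
y = np.sin(2 * np.pi * 100 * t)
print(round(spectral_centroid(y, fs), 1))   # close to 100.0
```

For a pure tone, the centroid coincides with the tone frequency; for broadband signals it shifts toward the frequency region carrying most of the power.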

Auto-correlation based features

Time-domain features derived from the auto-correlation function, capturing periodicity and temporal structure.
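A typical auto-correlation feature is the lag of the strongest peak, which estimates the dominant period. A minimal NumPy sketch (the helper name `dominant_lag` is ours):

```python
import numpy as np

def dominant_lag(y, max_lag):
    """Lag (in samples) of the highest auto-correlation peak after lag 0."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    ac = np.correlate(y, y, mode="full")[len(y) - 1:]  # lags 0 .. L-1
    return int(np.argmax(ac[1:max_lag + 1]) + 1)

# Periodic signal with period 8 samples, repeated 16 times.
y = np.tile([0, 1, 2, 1, 0, -1, -2, -1], 16)
print(dominant_lag(y, max_lag=20))   # 8
```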

Mixed-domain features

Features based on STFT and wavelet transforms, capturing both time and frequency information simultaneously.

Examples: tsfel feature list

Field-tailored features

Historically developed features. For example, the cepstrum in speech processing, the inverse Fourier transform of the logarithm of the signal's spectrum.

Features from different domains (statistical, spectral, etc.) may have very different scales. Feature normalization is recommended before combining them into a single feature vector.

16.4.1 Output

At the end of the FE process, each signal window of \(L\) samples is mapped to a single \(N\)-dimensional feature vector. Stacking all \(M\) windows yields an \(M\times N\) feature matrix (dataset) ready for ML (Fig. 16.1a).
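The end-to-end mapping (window \(\to\) feature vector \(\to\) \(M\times N\) matrix) can be sketched as follows (a pure-Python illustration with \(N=2\) toy features; all names are our own):

```python
import math

def extract_features(window):
    # N = 2 features per window: mean and RMS.
    L = len(window)
    mean = sum(window) / L
    rms = math.sqrt(sum(v * v for v in window) / L)
    return [mean, rms]

def build_dataset(y, L):
    # Non-overlapping windows stacked into an M x N feature matrix.
    windows = [y[i:i + L] for i in range(0, len(y) - L + 1, L)]
    return [extract_features(w) for w in windows]

X = build_dataset(list(range(12)), L=4)   # 12 samples, L = 4 -> M = 3 windows
print(len(X), len(X[0]))                  # 3 2
```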

Complexity Some features (e.g. spectral or wavelet-based) have non-negligible computational cost, especially for large \(L\). Speed-up can be achieved by:

  • Parallel computation: since windows are independent, they can be distributed across multiple CPU/GPU cores.

  • Selecting a compact feature subset (see feature selection) to avoid computing unnecessary features.

16.5 Dedicated Libraries

  • Goal: Overview of ready-to-use toolboxes for time-series feature extraction and selection.

Several open-source libraries provide large collections of pre-implemented signal features, eliminating the need to code them from scratch.

tsfresh (Time Series FeatuRe extraction based on Scalable Hypothesis tests)

  • Python-based, \(150+\) features covering statistical, spectral, and non-linear domains.

  • Built-in statistical feature selection (relevance testing via hypothesis tests).

  • Seamless scikit-learn integration (transformers, pipelines).

tsfel (Time Series Feature Extraction Library)

  • Python-based, \(60+\) features organized by domain (statistical, temporal, spectral).

hctsa (highly comparative time-series analysis)

  • Matlab-based, \(7{,}700+\) features — the largest available feature library.

  • Designed for exploratory comparison across many time-series datasets.

catch22 (CAnonical Time-series CHaracteristics)

  • A curated subset of 22 features selected from \(4{,}791\) hctsa features on 93 publicly available datasets.

  • Provides strong baseline performance on a broad variety of signals.

  • Operates on internally normalized (z-scored) signals; the mean and standard deviation can be added as optional extra features.

  • Fast C implementation with Python, R, and Matlab wrappers.

scikit-learn

  • Does not include signal-level feature extraction, but provides classifiers, regressors, and feature selection methods.

  • Feature selection can be embedded in a Pipeline for end-to-end workflows.

mlxtend Supplementary ML tools including sequential feature selection (forward/backward) and model evaluation utilities.

16.6 Train-Test Split in Signals

  • Goal: Correctly partition signal data into training and test sets without introducing data leakage.

Train and test cannot be adjacent windows.

Random train-test splitting, which is standard for i.i.d. tabular data, is invalid for time-series signals. Adjacent windows share temporal context (and samples, if overlapping), so random splitting causes data leakage and over-optimistic performance estimates.

16.6.1 Classification

When the goal is to classify signals (e.g. fault detection, speaker identification), the split must be done by source rather than by individual windows. Each source (e.g. speaker, device, or recording session) appears entirely in either the train or the test set, never both.

This is known as group splitting (or leave-one-group-out CV). It ensures the model is evaluated on truly unseen sources, not on different windows from a source it has already learned.

  • Example 16.2: In a speaker identification task with 20 speakers, group splitting assigns 16 speakers to training and 4 to testing. All windows from a given speaker belong to the same set. A random window-level split would leak speaker-specific patterns into training, inflating accuracy.
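A group split like the one in Example 16.2 can be sketched in plain Python (illustrative only; scikit-learn's `GroupShuffleSplit` and `LeaveOneGroupOut` provide ready-made implementations):

```python
import random

def group_split(window_groups, test_frac=0.2, seed=0):
    """Split window indices so that each group (source) lands entirely
    in train or entirely in test -- never both."""
    groups = sorted(set(window_groups))
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, int(len(groups) * test_frac))
    test_groups = set(groups[:n_test])
    train = [i for i, g in enumerate(window_groups) if g not in test_groups]
    test = [i for i, g in enumerate(window_groups) if g in test_groups]
    return train, test

# 5 speakers, 3 windows each; every window carries its speaker label.
labels = [s for s in "ABCDE" for _ in range(3)]
train_idx, test_idx = group_split(labels)

train_groups = {labels[i] for i in train_idx}
test_groups = {labels[i] for i in test_idx}
print(train_groups & test_groups)   # set() -- no speaker appears in both
```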

16.6.2 Prediction

A temporal split is used: all data before a cutoff time \(t_c\) is used for training, and data after \(t_c\) for testing. The test set must always lie in the future relative to the training set.

Time-series cross-validation Standard \(k\)-fold CV shuffles the data, violating temporal order. Instead, time-series CV (Fig. 16.3) uses an expanding (or sliding) window:

  • 1. Start with a minimal training set of the earliest samples.

  • 2. Train the model and evaluate on the next time step (or window).

  • 3. Expand the training set to include the previous test point, and repeat.

Each fold preserves the temporal ordering: training data always precedes test data (reference).

Optionally, a gap of \(g\) samples can be inserted between the training and test sets. This prevents leakage from auto-correlated signals where adjacent windows carry similar information. The gap size \(g\) is a hyper-parameter that depends on the correlation length of the signal.

Example in Python
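The expanding-window scheme with a gap can be sketched as follows (a minimal pure-Python illustration; scikit-learn's `TimeSeriesSplit`, which accepts a `gap` parameter, provides a ready-made splitter of this kind):

```python
def expanding_window_cv(n_samples, n_folds, min_train, gap=0):
    """Yield (train_indices, test_index) pairs: training data always
    precedes the test point, separated by an optional gap of g samples."""
    folds = []
    for k in range(n_folds):
        train_end = min_train + k        # expanding training set
        test_point = train_end + gap     # test sample lies after the gap
        if test_point >= n_samples:
            break
        folds.append((list(range(train_end)), test_point))
    return folds

for train, test in expanding_window_cv(n_samples=10, n_folds=3,
                                       min_train=5, gap=1):
    print(len(train), test)
# 5 6
# 6 7
# 7 8
```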

(image)

Figure 16.3: Time-series cross-validation with expanding window. Each row is one fold: training samples (blue) always precede the test sample (orange); future samples (gray) are unused (inspired by source).

16.7 Summary

  • Feature extraction maps signal windows of \(L\) samples to \(N\)-dimensional feature vectors via \(f:\real ^L\to \real ^N\).

  • Non-overlapping windows are preferred to avoid redundancy and data leakage.

  • Signal transformations (log, Box-Cox, etc.) can improve downstream model performance by reshaping distributions.

  • Features span statistical, spectral, auto-correlation, and mixed domains; dedicated libraries (tsfresh, tsfel, catch22) provide ready-to-use implementations.

  • Train-test splitting must respect temporal or source structure — random splits are invalid for signal data.