Machine Learning & Signals Learning

$\newcommand{\footnotename}{footnote}$ $\def \LWRfootnote {1}$ $\newcommand {\footnote }[2][\LWRfootnote ]{{}^{\mathrm {#1}}}$ $\newcommand {\footnotemark }[1][\LWRfootnote ]{{}^{\mathrm {#1}}}$ $\let \LWRorighspace \hspace $ $\renewcommand {\hspace }{\ifstar \LWRorighspace \LWRorighspace }$ $\newcommand {\TextOrMath }[2]{#2}$ $\newcommand {\mathnormal }[1]{{#1}}$ $\newcommand \ensuremath [1]{#1}$ $\newcommand {\LWRframebox }[2][]{\fbox {#2}} \newcommand {\framebox }[1][]{\LWRframebox } $ $\newcommand {\setlength }[2]{}$ $\newcommand {\addtolength }[2]{}$ $\newcommand {\setcounter }[2]{}$ $\newcommand {\addtocounter }[2]{}$ $\newcommand {\arabic }[1]{}$ $\newcommand {\number }[1]{}$ $\newcommand {\noalign }[1]{\text {#1}\notag \\}$ $\newcommand {\cline }[1]{}$ $\newcommand {\directlua }[1]{\text {(directlua)}}$ $\newcommand {\luatexdirectlua }[1]{\text {(directlua)}}$ $\newcommand {\protect }{}$ $\def \LWRabsorbnumber #1 {}$ $\def \LWRabsorbquotenumber "#1 {}$ $\newcommand {\LWRabsorboption }[1][]{}$ $\newcommand {\LWRabsorbtwooptions }[1][]{\LWRabsorboption }$ $\def \mathchar {\ifnextchar "\LWRabsorbquotenumber \LWRabsorbnumber }$ $\def \mathcode #1={\mathchar }$ $\let \delcode \mathcode $ $\let \delimiter \mathchar $ $\def \oe {\unicode {x0153}}$ $\def \OE {\unicode {x0152}}$ $\def \ae {\unicode {x00E6}}$ $\def \AE {\unicode {x00C6}}$ $\def \aa {\unicode {x00E5}}$ $\def \AA {\unicode {x00C5}}$ $\def \o {\unicode {x00F8}}$ $\def \O {\unicode {x00D8}}$ $\def \l {\unicode {x0142}}$ $\def \L {\unicode {x0141}}$ $\def \ss {\unicode {x00DF}}$ $\def \SS {\unicode {x1E9E}}$ $\def \dag {\unicode {x2020}}$ $\def \ddag {\unicode {x2021}}$ $\def \P {\unicode {x00B6}}$ $\def \copyright {\unicode {x00A9}}$ $\def \pounds {\unicode {x00A3}}$ $\let \LWRref \ref $ $\renewcommand {\ref }{\ifstar \LWRref \LWRref }$ $ \newcommand {\multicolumn }[3]{#3}$ $\require {textcomp}$ $ \newcommand {\abs }[1]{\lvert #1\rvert } $ $ \DeclareMathOperator {\sign }{sign} $ $\newcommand {\intertext 
}[1]{\text {#1}\notag \\}$ $\let \Hat \hat $ $\let \Check \check $ $\let \Tilde \tilde $ $\let \Acute \acute $ $\let \Grave \grave $ $\let \Dot \dot $ $\let \Ddot \ddot $ $\let \Breve \breve $ $\let \Bar \bar $ $\let \Vec \vec $ $\newcommand {\bm }[1]{\boldsymbol {#1}}$ $\require {physics}$ $\newcommand {\LWRphystrig }[2]{\ifblank {#1}{\textrm {#2}}{\textrm {#2}^{#1}}}$ $\renewcommand {\sin }[1][]{\LWRphystrig {#1}{sin}}$ $\renewcommand {\sinh }[1][]{\LWRphystrig {#1}{sinh}}$ $\renewcommand {\arcsin }[1][]{\LWRphystrig {#1}{arcsin}}$ $\renewcommand {\asin }[1][]{\LWRphystrig {#1}{asin}}$ $\renewcommand {\cos }[1][]{\LWRphystrig {#1}{cos}}$ $\renewcommand {\cosh }[1][]{\LWRphystrig {#1}{cosh}}$ $\renewcommand {\arccos }[1][]{\LWRphystrig {#1}{arcos}}$ $\renewcommand {\acos }[1][]{\LWRphystrig {#1}{acos}}$ $\renewcommand {\tan }[1][]{\LWRphystrig {#1}{tan}}$ $\renewcommand {\tanh }[1][]{\LWRphystrig {#1}{tanh}}$ $\renewcommand {\arctan }[1][]{\LWRphystrig {#1}{arctan}}$ $\renewcommand {\atan }[1][]{\LWRphystrig {#1}{atan}}$ $\renewcommand {\csc }[1][]{\LWRphystrig {#1}{csc}}$ $\renewcommand {\csch }[1][]{\LWRphystrig {#1}{csch}}$ $\renewcommand {\arccsc }[1][]{\LWRphystrig {#1}{arccsc}}$ $\renewcommand {\acsc }[1][]{\LWRphystrig {#1}{acsc}}$ $\renewcommand {\sec }[1][]{\LWRphystrig {#1}{sec}}$ $\renewcommand {\sech }[1][]{\LWRphystrig {#1}{sech}}$ $\renewcommand {\arcsec }[1][]{\LWRphystrig {#1}{arcsec}}$ $\renewcommand {\asec }[1][]{\LWRphystrig {#1}{asec}}$ $\renewcommand {\cot }[1][]{\LWRphystrig {#1}{cot}}$ $\renewcommand {\coth }[1][]{\LWRphystrig {#1}{coth}}$ $\renewcommand {\arccot }[1][]{\LWRphystrig {#1}{arccot}}$ $\renewcommand {\acot }[1][]{\LWRphystrig {#1}{acot}}$ $\require {cancel}$ $\newcommand *{\underuparrow }[1]{{\underset {\uparrow }{#1}}}$ $\DeclareMathOperator *{\argmax }{argmax}$ $\DeclareMathOperator *{\argmin }{arg\,min}$ $\def \E [#1]{\mathbb {E}\!\left [ #1 \right ]}$ $\def \Var [#1]{\operatorname {Var}\!\left [ #1 \right ]}$ $\def \Cov 
[#1]{\operatorname {Cov}\!\left [ #1 \right ]}$ $\newcommand {\floor }[1]{\lfloor #1 \rfloor }$ $\newcommand {\DTFTH }{ H \brk 1{e^{j\omega }}}$ $\newcommand {\DTFTX }{ X\brk 1{e^{j\omega }}}$ $\newcommand {\DFTtr }[1]{\mathrm {DFT}\left \{#1\right \}}$ $\newcommand {\DTFTtr }[1]{\mathrm {DTFT}\left \{#1\right \}}$ $\newcommand {\DTFTtrI }[1]{\mathrm {DTFT^{-1}}\left \{#1\right \}}$ $\newcommand {\Ftr }[1]{ \mathcal {F}\left \{#1\right \}}$ $\newcommand {\FtrI }[1]{ \mathcal {F}^{-1}\left \{#1\right \}}$ $\newcommand {\Zover }{\overset {\mathscr Z}{\Longleftrightarrow }}$ $\renewcommand {\real }{\mathbb {R}}$ $\newcommand {\ba }{\mathbf {a}}$ $\newcommand {\bb }{\mathbf {b}}$ $\newcommand {\bc }{\mathbf {c}}$ $\newcommand {\bd }{\mathbf {d}}$ $\newcommand {\be }{\mathbf {e}}$ $\newcommand {\bf }{\mathbf {f}}$ $\newcommand {\bh }{\mathbf {h}}$ $\newcommand {\bi }{\mathbf {i}}$ $\newcommand {\bn }{\mathbf {n}}$ $\newcommand {\bo }{\mathbf {o}}$ $\newcommand {\bp }{\mathbf {p}}$ $\newcommand {\bq }{\mathbf {q}}$ $\newcommand {\br }{\mathbf {r}}$ $\newcommand {\bs }{\mathbf {s}}$ $\newcommand {\bt }{\mathbf {t}}$ $\newcommand {\bu }{\mathbf {u}}$ $\newcommand {\bv }{\mathbf {v}}$ $\newcommand {\bw }{\mathbf {w}}$ $\newcommand {\bx }{\mathbf {x}}$ $\newcommand {\bxx }{\mathbf {xx}}$ $\newcommand {\bxy }{\mathbf {xy}}$ $\newcommand {\by }{\mathbf {y}}$ $\newcommand {\byy }{\mathbf {yy}}$ $\newcommand {\bz }{\mathbf {z}}$ $\newcommand {\bA }{\mathbf {A}}$ $\newcommand {\bB }{\mathbf {B}}$ $\newcommand {\bC }{\mathbf {C}}$ $\newcommand {\bD }{\mathbf {D}}$ $\newcommand {\bH }{\mathbf {H}}$ $\newcommand {\bI }{\mathbf {I}}$ $\newcommand {\bK }{\mathbf {K}}$ $\newcommand {\bM }{\mathbf {M}}$ $\newcommand {\bP }{\mathbf {P}}$ $\newcommand {\bQ }{\mathbf {Q}}$ $\newcommand {\bR }{\mathbf {R}}$ $\newcommand {\bS }{\mathbf {S}}$ $\newcommand {\bU }{\mathbf {U}}$ $\newcommand {\bW }{\mathbf {W}}$ $\newcommand {\bX }{\mathbf {X}}$ $\newcommand {\bY }{\mathbf {Y}}$ $\newcommand 
{\bZ }{\mathbf {Z}}$ $\newcommand {\balpha }{\bm {\alpha }}$ $\newcommand {\bth }{{\bm {\theta }}}$ $\newcommand {\bepsilon }{{\bm {\epsilon }}}$ $\newcommand {\bmu }{{\bm {\mu }}}$ $\newcommand {\bphi }{\bm {\phi }}$ $\newcommand {\bOne }{\mathbf {1}}$ $\newcommand {\bZero }{\mathbf {0}}$ $\newcommand {\indFunc }{\mathbb {1}}$ $\newcommand {\btx }{\tilde {\bx }}$ $\newcommand {\loss }{\mathcal {L}}$ $\newcommand {\appropto }{\mathrel {\vcenter { \offinterlineskip \halign {\hfil $##$\cr \propto \cr \noalign {\kern 2pt}\sim \cr \noalign {\kern -2pt}}}}}$ $\newcommand {\SSE }{\mathrm {SSE}}$ $\newcommand {\MSE }{\mathrm {MSE}}$ $\newcommand {\RMSE }{\mathrm {RMSE}}$ $\newcommand {\toprule }[1][]{\hline }$ $\let \midrule \toprule $ $\let \bottomrule \toprule $ $\def \LWRbooktabscmidruleparen (#1)#2{}$ $\newcommand {\LWRbooktabscmidrulenoparen }[1]{}$ $\newcommand {\cmidrule }[1][]{\ifnextchar (\LWRbooktabscmidruleparen \LWRbooktabscmidrulenoparen }$ $\newcommand {\morecmidrules }{}$ $\newcommand {\specialrule }[3]{\hline }$ $\newcommand {\addlinespace }[1][]{}$ $\newcommand {\LWRsubmultirow }[2][]{#2}$ $\newcommand {\LWRmultirow }[2][]{\LWRsubmultirow }$ $\newcommand {\multirow }[2][]{\LWRmultirow }$ $\newcommand {\mrowcell }{}$ $\newcommand {\mcolrowcell }{}$ $\newcommand {\STneed }[1]{}$ $\newcommand {\tcbset }[1]{}$ $\newcommand {\tcbsetforeverylayer }[1]{}$ $\newcommand {\tcbox }[2][]{\boxed {\text {#2}}}$ $\newcommand {\tcboxfit }[2][]{\boxed {#2}}$ $\newcommand {\tcblower }{}$ $\newcommand {\tcbline }{}$ $\newcommand {\tcbtitle }{}$ $\newcommand {\tcbsubtitle [2][]{\mathrm {#2}}}$ $\newcommand {\tcboxmath }[2][]{\boxed {#2}}$ $\newcommand {\tcbhighmath }[2][]{\boxed {#2}}$ $\require {colortbl}$ $\let \LWRorigcolumncolor \columncolor $ $\renewcommand {\columncolor }[2][named]{\LWRorigcolumncolor [#1]{#2}\LWRabsorbtwooptions }$ $\let \LWRorigrowcolor \rowcolor $ $\renewcommand {\rowcolor }[2][named]{\LWRorigrowcolor [#1]{#2}\LWRabsorbtwooptions }$ $\let 
\LWRorigcellcolor \cellcolor $ $\renewcommand {\cellcolor }[2][named]{\LWRorigcellcolor [#1]{#2}\LWRabsorbtwooptions }$

26 Exploratory Analysis of Univariate Signal Samples

Goal: Screen a labeled set of signals with exploratory data analysis (EDA).

The signal-classification chapters so far assume that each input sample is a clean, uniformly sampled univariate record of comparable length and energy. In practice, raw acquisitions may violate any of these assumptions. This chapter is a partial diagnostic checklist to apply before signal classification.

26.1 Visual inspection for artifacts

Goal: Before running any quantitative diagnostic, look at a representative sample of recordings per class.

Every quantitative test below answers a pre-specified question. Visual inspection answers the question you did not think to ask. A few minutes spent plotting raw samples typically exposes problems that would otherwise survive into the model: a class with a constant DC offset, a sensor that dropped to zero for a fraction of a second, a mains-hum line at $50$/$60$ Hz, isolated spikes from electrical interference, or stretches of missing samples filled with zeros or NaN.

Minimum protocol

• Plot at least $5$ randomly drawn samples per class, with the same y-axis range across classes.
• For each class plot one sample’s amplitude histogram and one sample’s spectrogram (Sec. 19.7) or PSD (Sec. 19.4.2).
• Report the count of samples containing NaN, $\pm \infty $, or runs of identical-valued samples.
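The last item of the protocol can be automated. The following is a minimal sketch, assuming each recording is available as a 1-D NumPy array; the function name `integrity_report` and the run-length threshold `min_run` are illustrative choices, not part of the protocol.

```python
import numpy as np

def integrity_report(x, min_run=8):
    """Count NaN and +/-inf values and find the longest run of identical values."""
    x = np.asarray(x, dtype=float)
    n_nan = int(np.isnan(x).sum())
    n_inf = int(np.isinf(x).sum())
    # Longest run of identical consecutive values (NaN never equals NaN).
    longest = 1 if x.size else 0
    run = 1
    for a, b in zip(x[:-1], x[1:]):
        run = run + 1 if a == b else 1
        longest = max(longest, run)
    flagged = bool(n_nan or n_inf or longest >= min_run)  # min_run is an assumed threshold
    return {"nan": n_nan, "inf": n_inf, "longest_run": longest, "flagged": flagged}

# Hypothetical record: a sine with a 20-sample zero dropout and one NaN.
x = np.sin(2 * np.pi * 5 * np.arange(200) / 200)
x[50:70] = 0.0
x[120] = np.nan
report = integrity_report(x)
```

Summing `flagged` over all recordings of a class gives the per-class count the protocol asks for.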

Common artifacts

• DC offset: a constant shift of the baseline. Check whether it is consistent only within a class or across all classes.
• Mains hum: narrow spectral lines at $50$ Hz (Europe) or $60$ Hz (US) and their harmonics.
• Saturation: flat tops at the A/D rails, treated quantitatively in Sec. 26.2.
• Spikes and glitches: isolated single-sample excursions far outside the local envelope. They inflate $P_i$ (Sec. 26.5) and bias high-order spectral features.
• Dropouts: stretches of repeated values (often exactly zero) where the sensor lost lock or the buffer was empty. Indistinguishable in features from genuine silence unless explicitly flagged.
• Phase or polarity flip: a class consistently inverted relative to the other.

Visual inspection does not scale, and that is fine

On a dataset of $10^5$ samples no human inspects them all. Inspect a stratified sample, e.g. $20$ per class, plus the longest, shortest, highest-power, and lowest-power (Sec. 19.1.2) recordings per class. The goal is not full coverage; it is to discover failure modes.

Re-run on any change

Re-run the inspection plots after any change to the acquisition pipeline. Every protocol change is an opportunity for a new artifact.

26.2 Detecting saturated samples

Goal: Identify samples whose amplitude was hard-limited by the A/D converter.

A saturated sample is one whose input exceeded the converter’s full-scale range $\pm A_{\max }$ and was clipped at the rails. The signature appears in three views:

• Time domain: flat-top runs of consecutive points pinned at $\pm A_{\max }$.
• Amplitude histogram: an isolated spike of mass at the rails, separated from the interior distribution.

Three per-sample diagnostics make this quantitative:

1. Rail fraction: $\rho _i = \frac {1}{N_i}\sum _{n=0}^{N_i-1}\indFunc \!\left [\abs {x_i[n]} \ge (1-\epsilon )\,A_{\max }\right ]$ for a small tolerance $\epsilon $ (e.g. $10^{-3}$).
2. Longest rail run: $\ell _i = \max \{k : \exists \, n \text { such that } x_i[n],\ldots ,x_i[n+k-1]\text { all satisfy }\abs {\cdot }\ge (1-\epsilon )A_{\max }\}$, i.e. the length of the longest run of consecutive rail samples.
3. Histogram inspection: visible spike of mass at $\pm A_{\max }$, separated from the interior distribution.
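The first two diagnostics can be sketched as follows, assuming each recording is a 1-D NumPy array and the full-scale value `a_max` is known. The function name `saturation_stats` and the synthetic test signals are illustrative only.

```python
import numpy as np

def saturation_stats(x, a_max, eps=1e-3):
    """Return (rail fraction rho, longest rail run ell) for one record."""
    x = np.asarray(x, dtype=float)
    at_rail = np.abs(x) >= (1.0 - eps) * a_max   # indicator inside the sum for rho
    rho = float(at_rail.mean())
    longest = run = 0
    for flag in at_rail:                          # longest run of consecutive rail samples
        run = run + 1 if flag else 0
        longest = max(longest, run)
    return rho, longest

# Hypothetical data: a clean sinusoid and one clipped at A_max = 1.
n = np.arange(1000)
clean = 0.8 * np.sin(2 * np.pi * 10 * n / 1000)
clipped = np.clip(1.5 * np.sin(2 * np.pi * 10 * n / 1000), -1.0, 1.0)
rho_clean, ell_clean = saturation_stats(clean, a_max=1.0)
rho_clip, ell_clip = saturation_stats(clipped, a_max=1.0)
```

Because the definition uses $\abs {x_i[n]}$, this sketch does not distinguish the positive rail from the negative one. A per-class histogram of $\rho _i$ and $\ell _i$ then shows whether saturation is concentrated in one class.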

Fig. 26.1 contrasts a clean and a saturated sinusoid across the three views.

Handle explicitly

When saturation is detected, choose explicitly how to handle it: discard the sample, re-acquire with adjusted gain, or record saturation status as an explicit feature.

26.3 Signal boundaries: onset and offset

Goal: Locate the actual signal interval $[n_{\mathrm {on}},\,n_{\mathrm {off}}]$ inside each sample and/or characterize its decay.

The nominal duration $T_i = N_i / f_s$ counts every recorded sample, including pre-event silence and post-event decay. The signal of interest sometimes occupies only a sub-interval $[n_{\mathrm {on}},\,n_{\mathrm {off}}]$ with effective duration

\begin{equation} T_i^{\mathrm {eff}} = (n_{\mathrm {off}} - n_{\mathrm {on}}) / f_s. \end{equation}

A small ratio $T_i^{\mathrm {eff}} / T_i$ is itself a quality metric: most of the recording is not signal.

Onset and offset

Given a per-sample log envelope $20\log _{10} e_i[n]$ (e.g. short-time $\mathrm {RMS}$ over a window of a few fundamental periods; the factor $20$ is the amplitude-to-dB convention and becomes $10$ for a squared/power envelope), locate the peak and define

\begin{equation} n_{\mathrm {on}} = \min \{n : 20\log _{10} e_i[n] \ge \mathrm {peak}_{\mathrm {dB}} - \theta \}, \quad n_{\mathrm {off}} = \max \{n : 20\log _{10} e_i[n] \ge \mathrm {peak}_{\mathrm {dB}} - \theta \}, \end{equation}

with a threshold $\theta $ measured in dB (e.g. $\theta =20$), where $\mathrm {peak}_{\mathrm {dB}} = \max _n 20\log _{10} e_i[n]$, and a hangover of $L_h$ samples that keeps the segment marked active so that an occasional brief drop in amplitude does not end it.
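The onset/offset rule can be sketched as below. This is one possible reading, not a reference implementation: the envelope is a short-time RMS computed with a moving average of $x_i[n]^2$, the window length `win` is a free parameter, and the hangover is applied to the per-sample active mask (each active sample keeps the following $L_h$ samples active). All names and the synthetic burst are illustrative.

```python
import numpy as np

def onset_offset(x, win, theta_db=20.0, hangover=0):
    """Return (n_on, n_off, active mask) from a thresholded log envelope."""
    x = np.asarray(x, dtype=float)
    # Short-time RMS envelope: moving average of x^2 over `win` samples.
    env = np.sqrt(np.convolve(x ** 2, np.ones(win) / win, mode="same"))
    env_db = 20.0 * np.log10(env + 1e-12)          # small floor avoids log(0)
    active = env_db >= env_db.max() - theta_db     # within theta dB of the peak
    idx = np.flatnonzero(active)
    n_on, n_off = int(idx[0]), int(idx[-1])
    # Hangover (assumed form): each active sample keeps the next L_h samples active.
    mask = active.copy()
    for k in range(1, hangover + 1):
        mask[k:] |= active[:-k]
    return n_on, n_off, mask

# Hypothetical record: 1 s of a 100 Hz burst surrounded by low-level noise.
rng = np.random.default_rng(0)
fs = 1000
x = 1e-3 * rng.standard_normal(3000)
n = np.arange(1000, 2000)
x[1000:2000] += np.sin(2 * np.pi * 100 * n / fs)
n_on, n_off, mask = onset_offset(x, win=50, theta_db=20.0, hangover=20)
t_eff = (n_off - n_on) / fs                        # effective duration in seconds
```

The detected interval is slightly wider than the true burst because the averaging window smears the edges by about half its length.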

If $\theta $ is derived from the noise floor separately for each class, it may itself become a class-discriminative feature.

Thresholds may cause leakage

Do not use the test set to derive the threshold.

Exponential decay

When the signal has the form

\begin{equation} x_i[n] \approx a\,e^{-(n - n_{\mathrm {on}})/\tau }\cos (2\pi f_0 n / f_s + \phi ), \end{equation}

the log envelope is linear in $n$ between onset and the late noise floor, with slope $-1/\tau $ when the natural logarithm is used. A direct linear fit of $\log e_i[n]$ on $[n_{\mathrm {on}},\,n_{\mathrm {off}}]$ therefore returns $\hat \tau $ as the negative reciprocal of the fitted slope. The per-class distribution of $\hat \tau $ is sometimes discriminative.

For decay-dominated signals, $\hat \tau $ becomes a per-class feature in its own right.
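A minimal sketch of the decay fit, assuming a short-time RMS envelope and the natural logarithm, so that the fitted slope equals $-1/\tau $ with $\tau $ in samples. The function name, the window length, and the synthetic signal are illustrative assumptions.

```python
import numpy as np

def fit_decay(env, n_on, n_off):
    """Estimate tau (in samples) from a linear fit of log(envelope) on [n_on, n_off]."""
    n = np.arange(n_on, n_off + 1)
    slope, _intercept = np.polyfit(n, np.log(env[n_on:n_off + 1]), 1)
    return -1.0 / slope

# Hypothetical decaying sinusoid with tau = 300 samples, f0 = 50 Hz, fs = 1000 Hz.
fs, tau = 1000, 300.0
n = np.arange(1500)
x = np.exp(-n / tau) * np.cos(2 * np.pi * 50 * n / fs)
win = 40                                            # two fundamental periods
env = np.sqrt(np.convolve(x ** 2, np.ones(win) / win, mode="same"))
tau_hat = fit_decay(env, n_on=win, n_off=1400)      # skip the edge of the averaging window
tau_hat_seconds = tau_hat / fs
```

On real recordings the fit interval should stop before the envelope reaches the noise floor; otherwise the flat tail biases $\hat \tau $ upward.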

Takeaway

Trim each sample to its active interval $[n_{\mathrm {on}},\,n_{\mathrm {off}}]$ before any downstream diagnostic.

26.4 Sample-rate adequacy

Goal: Confirm $f_s$ is high enough to resolve the informative band, and check whether it is wastefully too high.

The Nyquist criterion $f_s > 2 f_{\max }$ (see the signals chapter) is the lower bound. The classification literature obsesses over the under-sampling case; the over-sampling case is rarely discussed but just as common in practice.

The too-high case

When $f_s$ is many times the informative bandwidth, every per-sample vector $\bx _i$ is artificially long. The concrete costs:

• Feature-extraction computation costs.
• Some features have higher dimensions or are less informative.
• Curse of dimensionality on a representation whose intrinsic dimension is unchanged.

Spectral-occupancy diagnostic

Estimate the PSD of a representative subset of samples and locate the frequency $f_{\mathrm {occ}}$ below which a chosen fraction (e.g. 99%) of the in-band power lies. Set the working sample rate to $f_s' \gtrsim 2.5\,f_{\mathrm {occ}}$ via decimation (with an anti-alias prefilter). Fig. 26.3 shows that PSDs at $2.5\,f_{\max }$ and $25\,f_{\max }$ contain the same informative band.
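The occupancy estimate can be sketched with a single windowed periodogram. The assumptions here are a Hann window, mean removal so that a DC offset does not dominate the cumulative power, and one record rather than an average over a subset. The function name and test signal are hypothetical.

```python
import numpy as np

def occupied_frequency(x, fs, fraction=0.99):
    """Lowest frequency at which the cumulative periodogram power reaches `fraction`."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                   # assumed: ignore DC
    psd = np.abs(np.fft.rfft(x * np.hanning(x.size))) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    cum = np.cumsum(psd) / psd.sum()
    return float(freqs[np.searchsorted(cum, fraction)])

# Hypothetical record: 100 Hz and 300 Hz tones sampled at 8 kHz, plus weak noise.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
x += 0.01 * rng.standard_normal(t.size)
f_occ = occupied_frequency(x, fs)
k = int(fs // (2.5 * f_occ))                           # largest integer decimation factor
fs_new = fs / k                                        # working rate, >= 2.5 * f_occ
```

In practice the periodograms of several recordings would be averaged before taking the cumulative sum, since a single periodogram is a noisy PSD estimate.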

Set $f_s$ from spectral occupancy

The acquisition sample rate is usually a sensor default, not a modeling choice. Estimate the spectral occupancy of representative recordings and decimate to a working $f_s' \gtrsim 2.5\,f_{\mathrm {occ}}$ before feature extraction.

Terminology: decimation vs. undersampling

Undersampling keeps every $k$-th sample as-is. Decimation first applies a low-pass anti-alias filter with cutoff at the new Nyquist frequency $f_s'/2$, then keeps every $k$-th sample. Use decimation. Prefer a linear-phase (FIR) anti-alias filter: a non-linear-phase one (e.g. IIR Butterworth) introduces frequency-dependent group delay that distorts onset positions and decay envelopes used downstream (Sec. 26.3).
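The difference is easy to demonstrate. The sketch below assumes a Hamming-windowed sinc FIR with $101$ taps and a cutoff at $0.8$ of the new Nyquist frequency; these design values, the function names, and the two-tone test signal are illustrative and not prescribed by the text.

```python
import numpy as np

def fir_lowpass(cutoff_hz, fs, numtaps=101):
    """Linear-phase windowed-sinc low-pass (odd, symmetric taps)."""
    m = np.arange(numtaps) - (numtaps - 1) / 2
    fc = cutoff_hz / fs                                # cutoff in cycles per sample
    h = 2 * fc * np.sinc(2 * fc * m) * np.hamming(numtaps)
    return h / h.sum()                                 # unit gain at DC

def decimate_fir(x, k, fs, numtaps=101):
    """Low-pass below the new Nyquist frequency, then keep every k-th sample."""
    h = fir_lowpass(0.8 * (fs / k) / 2, fs, numtaps)   # assumed margin of 0.8
    y = np.convolve(x, h, mode="same")                 # "same" cancels the constant delay
    return y[::k]

# Hypothetical signal: 50 Hz (informative) plus 400 Hz (out of band), fs = 1 kHz.
fs, k = 1000, 4
t = np.arange(4000) / fs
x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 400 * t)
naive = x[::k]                                         # undersampling
dec = decimate_fir(x, k, fs)                           # decimation
freqs = np.fft.rfftfreq(naive.size, d=k / fs)          # new rate is 250 Hz
bin_100 = int(np.argmin(np.abs(freqs - 100.0)))        # 400 Hz folds to 100 Hz
bin_50 = int(np.argmin(np.abs(freqs - 50.0)))
alias_naive = np.abs(np.fft.rfft(naive))[bin_100]
alias_dec = np.abs(np.fft.rfft(dec))[bin_100]
kept_dec = np.abs(np.fft.rfft(dec))[bin_50]
```

With undersampling the $400$ Hz tone reappears at $100$ Hz inside the new band; with decimation it is suppressed while the $50$ Hz component is kept. Because the filter is symmetric, its delay is a constant number of samples and is compensated by the centered convolution.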

Re-check with the classifier

The spectral-occupancy cut $f_s'$ is necessary but not sufficient. After decimating the training set to $f_s'$ (identical decimation applied to the test set, never with parameters fit on test), refit the chosen classifier and compare its held-out accuracy to a baseline fit at the original $f_s$. Equal accuracy, within run-to-run variation, indicates that the discarded frequencies carried no class-discriminative information for this classifier.

Sweep recipe

Decimate to a small grid of working rates (e.g. $1\times $, $0.5\times $, $0.25\times $ of the original $f_s$) and plot held-out performance (e.g., accuracy) vs. $f_s'$.

Leakage warning

Decimation parameters (cutoff, filter order) are derived from training-set statistics only. This is the same end-to-end check pattern used in the permutation diagnostics of Sec. 26.8.

26.5 Energy and power across classes

Goal: Check whether per-sample amplitude alone discriminates the classes, and decide whether that discrimination reflects the phenomenon or the acquisition setup.

For each sample define

\begin{equation} E_i = \sum _{n=0}^{N_i-1} x_i[n]^2, \qquad P_i = \frac {E_i}{N_i}. \end{equation}

When sample lengths $N_i$ differ across the dataset, $E_i$ is misleading and the average power $P_i$ is the fair cross-class quantity. When the recordings include silence or decay tails, restrict the sum to the signal interval $[n_{\mathrm {on}},\,n_{\mathrm {off}}]$ recovered in Sec. 26.3; otherwise silence padding deflates $P_i$ and a class with a longer recorded tail looks quieter than it is.
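The effect of silence padding on $P_i$ can be checked numerically. This is a small sketch with a hypothetical helper `energy_power`; the interval bounds are assumed to come from an onset/offset detector such as the one in Sec. 26.3.

```python
import numpy as np

def energy_power(x, n_on=None, n_off=None):
    """Return (E_i, P_i), optionally restricted to the interval [n_on, n_off]."""
    x = np.asarray(x, dtype=float)
    if n_on is not None and n_off is not None:
        x = x[n_on:n_off + 1]
    e = float(np.sum(x ** 2))
    return e, e / x.size

# The same hypothetical 1000-sample burst, recorded with a short and a long silent tail.
burst = np.sin(2 * np.pi * np.arange(1000) / 10)
short_tail = np.concatenate([burst, np.zeros(1000)])
long_tail = np.concatenate([burst, np.zeros(3000)])
_, p_short = energy_power(short_tail)
_, p_long = energy_power(long_tail)
_, p_short_trim = energy_power(short_tail, 0, 999)
_, p_long_trim = energy_power(long_tail, 0, 999)
```

The untrimmed powers differ by a factor of two although the underlying burst is identical; after restricting to the signal interval they agree.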

A per-class boxplot of $P_i$ (Fig. 26.4) answers two questions at once: is amplitude class-discriminative, and is the discrimination credible? Two distinct interpretations apply:

• Phenomenon-driven: classes genuinely differ in radiated power (e.g. loud vs. quiet sound source). Amplitude is a valid feature.
• Acquisition-driven: classes were collected on different hardware, gains, or distances. Amplitude is a confound and the classifier will exploit it instead of the phenomenon.

Disambiguation recipe

1. Compare $P_i$ distributions before and after per-sample amplitude normalization (e.g. divide by $\sqrt {P_i}$ or scale to $\max \abs {x_i} = 1$).
2. Refit the classifier on the normalized samples. If accuracy collapses to chance, amplitude was the only working signal; whether that is acceptable depends on the answer to the phenomenon-vs-acquisition question above.
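Step 1 of the recipe needs a per-sample normalization. A minimal sketch of the two options mentioned above follows; the function name and the `mode` argument are illustrative.

```python
import numpy as np

def normalize_amplitude(x, mode="rms"):
    """Scale one record to unit RMS (divide by sqrt(P_i)) or to unit peak."""
    x = np.asarray(x, dtype=float)
    scale = np.sqrt(np.mean(x ** 2)) if mode == "rms" else np.max(np.abs(x))
    return x / scale if scale > 0 else x               # leave all-zero records unchanged

# Two hypothetical recordings of the same waveform at very different gains.
wave = np.sin(2 * np.pi * np.arange(500) / 25)
low_gain, high_gain = 0.1 * wave, 5.0 * wave
a = normalize_amplitude(low_gain, "rms")
b = normalize_amplitude(high_gain, "rms")
c = normalize_amplitude(high_gain, "peak")
```

After normalization the two recordings are indistinguishable by amplitude, so any accuracy that survives the refit in step 2 must come from the waveform shape. Peak scaling is sensitive to single-sample spikes, so RMS scaling is usually the safer default when glitches are present.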

Interaction with saturation

A class whose samples are saturated while the other’s are not has its $P_i$ artificially pinned toward $A_{\max }^2$.

Acquisition-driven amplitude is leakage

If the amplitude difference between classes traces to a hardware or protocol difference at acquisition time rather than to the phenomenon, the classifier is learning the acquisition setup.

The leak is invisible to a held-out test set drawn from the same acquisition and is exposed only when the model is deployed on a new setup.

Normalize amplitude away or rerun acquisition under a single protocol.

26.6 Effective signal length

Goal: Match the recording duration $T_i = N_i / f_s$ to the time scale of the phenomenon being classified.

Two failure modes appear at the ends of the $T_i$ scale.

Too short

A sample that contains only a fraction of a fundamental period cannot yield a reliable $\hat f_0$ (Sec. 26.7), and its spectral-resolution floor is $\Delta f = 1/T_i$. As a practical lower bound, target

\begin{equation} T_i \cdot \hat f_0 \gtrsim 5, \end{equation}

i.e. at least five fundamental periods per sample. Plot the distribution of $T_i \hat f_0$ per class; an excess of samples below this threshold flags a recording protocol that needs to be lengthened.

Too long

A long recording is often a mixture of regimes (onset transient, steady state, decay) that should be classified separately. The diagnostic is intra-sample non-stationarity: split each $\bx _i$ into $Q$ contiguous sub-windows of equal length and compare a summary statistic (e.g. power or $\hat f_0$) across them.

Significant spread within a single sample means either the sample should be re-windowed (Sec. 24.4) or the class label refers to a regime that occupies only part of the recording.
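The sub-window diagnostic can be sketched as follows. The text does not fix which per-window statistic to compare; this example assumes mean power and reports the spread as a max/min ratio in dB. Both choices, and the function names, are illustrative.

```python
import numpy as np

def subwindow_power(x, q):
    """Mean power in each of q contiguous, equal-length sub-windows."""
    x = np.asarray(x, dtype=float)
    usable = (x.size // q) * q                         # drop the remainder samples
    return np.mean(x[:usable].reshape(q, -1) ** 2, axis=1)

def power_spread_db(x, q):
    """Ratio of the largest to the smallest sub-window power, in dB."""
    p = subwindow_power(x, q)
    return float(10 * np.log10(p.max() / p.min()))

# Hypothetical records: a stationary tone and the same tone with an exponential decay.
fs = 1000
n = np.arange(4000)
tone = np.sin(2 * np.pi * 50 * n / fs)
decaying = np.exp(-n / 1000.0) * tone
spread_tone = power_spread_db(tone, q=8)
spread_decay = power_spread_db(decaying, q=8)
```

A spread near $0$ dB indicates a single regime; a spread of tens of dB indicates that the recording mixes regimes. The same pattern applies to other per-window statistics, such as a per-window $\hat f_0$.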

Spectral-resolution floor

No spectral feature can resolve detail finer than $\Delta f = 1/T_i$. A classifier asked to discriminate two classes whose spectra differ on a finer scale will fail regardless of the model; the fix is longer recordings, not a richer model.
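As a worked illustration with hypothetical numbers (not taken from any dataset in this chapter): suppose two classes differ only by spectral peaks at $100$ Hz and $102$ Hz, and the recordings last $T_i = 0.25$ s. Then

\begin{equation} \Delta f = \frac {1}{T_i} = \frac {1}{0.25\ \mathrm {s}} = 4\ \mathrm {Hz} > 2\ \mathrm {Hz}, \qquad \text {so resolving the peaks requires} \quad T_i \ge \frac {1}{2\ \mathrm {Hz}} = 0.5\ \mathrm {s}. \end{equation}

Under these assumptions the recordings would have to be at least twice as long before any spectral feature could separate the two classes.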

Class-conditional duration histogram

A per-class histogram of $T_i$ frequently reveals acquisition mismatch: one class consistently recorded longer than another (Fig. 26.5).

Acquisition-driven time-length is leakage

$T_i$ itself can leak into duration-sensitive features.

26.7 Fundamental frequency

Goal: Estimate $\hat f_0$ per sample and per class, and check that its distribution agrees with the physical phenomenon.

For periodic or quasi-periodic signals, the fundamental frequency $f_0$ is often the single most informative scalar summary. The standard estimators are:

• Autocorrelation peak (Sec. 20.2.2): $\hat f_0 = f_s / \arg \max _{\tau \ge \tau _{\min }} r_{xx}[\tau ]$, where $r_{xx}$ is the biased autocorrelation and $\tau _{\min }$ excludes the trivial peak at $\tau =0$.
• Periodogram peak: $\hat f_0 = \arg \max _f \abs {X(f)}^2$, restricted to the expected band (Sec. 19.5).
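Both estimators can be sketched in a few lines. The assumptions here are mean removal, a Hann window for the periodogram, a lag search limited to half the record, and $\tau _{\min }$ derived from an assumed upper bound `f_max` on $f_0$. The function names and the test signal are hypothetical.

```python
import numpy as np

def f0_autocorr(x, fs, f_max):
    """f0 from the lag of the largest biased-autocorrelation peak beyond tau_min."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    # Biased autocorrelation r_xx[tau] for tau >= 0.
    r = np.correlate(x, x, mode="full")[x.size - 1:] / x.size
    tau_min = int(np.ceil(fs / f_max))                 # excludes the trivial peak at tau = 0
    tau = tau_min + int(np.argmax(r[tau_min:x.size // 2]))
    return fs / tau

def f0_periodogram(x, fs, f_lo, f_hi):
    """f0 from the periodogram peak restricted to the expected band [f_lo, f_hi]."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    spec = np.abs(np.fft.rfft(x * np.hanning(x.size))) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(freqs[band][np.argmax(spec[band])])

# Hypothetical quasi-periodic record: 100 Hz fundamental with two harmonics.
fs = 8000
t = np.arange(4000) / fs
x = (np.sin(2 * np.pi * 100 * t)
     + 0.5 * np.sin(2 * np.pi * 200 * t)
     + 0.25 * np.sin(2 * np.pi * 300 * t))
f0_ac = f0_autocorr(x, fs, f_max=400.0)
f0_pg = f0_periodogram(x, fs, f_lo=50.0, f_hi=400.0)
```

The autocorrelation estimate is quantized to integer lags and the periodogram estimate to the bin spacing $1/T_i$. If $\tau _{\min }$ is set too small, the search range still overlaps the main lobe around $\tau = 0$ and the estimator returns $\tau _{\min }$ itself, so `f_max` should be chosen from prior knowledge of the phenomenon.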

A per-class histogram of $\hat f_0$ (Fig. 26.6) often reveals class separability via $f_0$ alone, and an empty overlap region is itself an important finding: a one-dimensional decision rule on $\hat f_0$ may already saturate accuracy and a complex classifier is unnecessary.

Interaction with saturation (Sec. 26.2). Saturation inflates higher harmonics but does not shift the fundamental, so $\hat f_0$ from autocorrelation or cepstrum remains usable for mildly saturated samples. A periodogram-based estimator restricted to the fundamental band is also robust; only an estimator that searches the full spectrum can be misled into reporting a harmonic.

$f_0$ ambiguity at short records

When the sample length $T_i$ contains fewer than $\approx 3$ fundamental periods, $\hat f_0$ from any of these estimators has high variance and bias. This is a recording-length issue, not an estimator issue (Sec. 26.6).

26.8 Data-integrity permutation tests

Goal: Use targeted shuffles of labels and class membership to expose chance-level pipelines and per-sample leakage cues.

Two of the diagnostics in Ch. 13 are particularly relevant once a signal-classification pipeline is in place: the label-permutation null baseline (Sec. 13.3.1), which establishes the dataset- and pipeline-specific chance accuracy, and the cross-class sample permutation (Sec. 13.3.5), which exposes per-sample leakage cues (filename, timestamp, recording-session id, residual saturation signature, duration) that survive a swap between class buckets. Run both before trusting any held-out score.
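A generic label-permutation baseline can be sketched as below. This is not necessarily the exact procedure of Sec. 13.3.1: it assumes a fixed train/test split, permutes the labels of the whole dataset jointly, and uses a toy nearest-centroid classifier on hypothetical two-dimensional features as a stand-in for the real pipeline.

```python
import numpy as np

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Fit class centroids on the training set and score accuracy on the test set."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_te[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(classes[np.argmin(d, axis=1)] == y_te))

def label_permutation_null(X_tr, y_tr, X_te, y_te, n_perm=200, seed=0):
    """Accuracies obtained after shuffling all labels; features and split stay fixed."""
    rng = np.random.default_rng(seed)
    y_all = np.concatenate([y_tr, y_te])
    null = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = rng.permutation(y_all)
        null[i] = nearest_centroid_accuracy(
            X_tr, y_perm[:y_tr.size], X_te, y_perm[y_tr.size:])
    return null

# Hypothetical features: two Gaussian classes separated along the first axis.
rng = np.random.default_rng(1)
X0 = rng.standard_normal((200, 2))
X1 = rng.standard_normal((200, 2)) + np.array([3.0, 0.0])
X_tr = np.vstack([X0[:100], X1[:100]])
X_te = np.vstack([X0[100:], X1[100:]])
y_tr = np.repeat([0, 1], 100)
y_te = np.repeat([0, 1], 100)
acc_real = nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te)
null = label_permutation_null(X_tr, y_tr, X_te, y_te)
p_value = (1 + np.sum(null >= acc_real)) / (1 + null.size)
```

The mean of `null` is the empirical chance level for this dataset and pipeline. A real accuracy that is not clearly above the null distribution means the pipeline has not learned anything label-related, whatever the nominal chance level suggests.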