Machine Learning & Signals Learning


Part II: Linear Least Squares in Signals & Systems Modeling

Preface

In recent years, the convergence of machine learning (ML) and signal processing (SP) has attracted growing attention in engineering education. Students are often introduced to ML principles at an early stage, yet many advanced SP topics, ranging from linear systems and time-frequency analysis to probabilistic modeling, traditionally require multiple specialized courses [?]. Although these SP methods yield comprehensive performance insights and rigorous conclusions, teaching them can be both time-consuming and demanding.

A key bridge between basic ML concepts and advanced SP techniques is the least squares (LS) method. LS is grounded in a simple and intuitive idea: minimizing the sum of squared errors. While direct LS computations may be \(\mathcal {O}(N^3)\), and thus less efficient than typical SP methods (\(\mathcal {O}(N \log N)\) to \(\mathcal {O}(N^2)\)), the LS perspective fosters a simpler, data-driven understanding of fundamental SP tasks. For example, the estimation of sinusoidal signal parameters in noise can be introduced by viewing it purely as a regression problem, bypassing the need for more involved probabilistic analyses. Likewise, the discrete Fourier transform (DFT) can be reframed as an extension of sinusoidal parameter estimation, illustrating SP principles with real arithmetic alone.

An LS-centric viewpoint aligns well with the foundational prerequisites of many ML courses and can be integrated at an early stage of engineering or data science programs. It offers an accessible path for teaching core SP ideas to engineering students who might lack extensive mathematical or probabilistic training. Although the underlying techniques are not new, this data-driven, regression-based interpretation may be more intuitive for those already familiar with basic ML concepts, enabling them to explore SP topics with minimal additional theoretical overhead.

13 Sinusoidal Signal Analysis

  • Goal: This chapter introduces the fundamental concepts and methods for analyzing and estimating (learning) parameters of a discrete-time sinusoidal signal observed in additive noise.

13.1 Signal Preliminaries

13.1.1 Cosine Signal

A general continuous-time cosine signal can be written as

\begin{equation} \begin{aligned} x(t) &= A\cos (2\pi F_0t + \theta ),\\ &= A\cos (\Omega _0 t + \theta ), \end {aligned} \end{equation}

where

  • \(A>0\) is the amplitude,

  • \(-\pi <\theta \leq \pi \) or \(0\le \theta < 2\pi \) is the phase,

  • \(F_0\) is the frequency in Hz,

  • \(\Omega _0 = 2\pi F_0\) is the angular frequency in rad/s.

For the analysis that follows, we use the sampled version \(x[n]\) of the continuous-time signal \(x(t)\), sampled at frequency \(F_s=1/T\), where \(T\) is the sampling period,

\begin{equation} \label {eq:base_signal} \begin{aligned} x[n] &= x(nT)\\ &= A\cos \left (\omega _0 n+\theta \right )\quad n=0,\ldots ,L-1, \end {aligned} \end{equation}

where

\begin{equation} \label {eq-signal-omega0} \begin{aligned} \omega _0 &= 2\pi F_0 T=2\pi \frac {F_0}{F_s}\\ \end {aligned} \end{equation}

is the angular frequency (measured in radians per sample) derived from the analog frequency \(F_0\) and \(L\) is the resulting number of samples.

Nyquist criterion

In order to accurately reproduce a cosine signal, the Nyquist criterion demands \(F_0<F_s/2\), which implies \(0\le \omega _0<\pi \). This requirement can be easily illustrated by the following example. Consider two signals:

\[ x_1 (t) = \cos (0.6\pi t ), \ \ \ x_2(t) = \cos (2.6\pi t) \]

Sampling with \(F_s=1\) Hz results in two identical signals,

\begin{align*} x_1[n] &= \cos (0.6\pi n ), \\ x_2[n] &= \cos (2.6\pi n) = \cos (0.6\pi n + 2 \pi n) = x_1[n]. \end{align*} This phenomenon is called aliasing. Therefore, the sampling frequency requirement is \(0\leqslant \omega _0<\pi \). Note that for \(\omega _0=0\) the signal reduces to the DC level, \(x(t)=x[n]=A\cos (\theta )\), while for \(\omega _0=\pi \) it becomes the alternating-sign signal \(x[n] = A\cos (\theta )(-1)^n\).
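A quick numerical check in Matlab (a minimal sketch; the number of samples is an arbitrary choice) confirms that the two sampled sequences coincide:

n  = 0:20;            % sample indices; with Fs = 1 Hz, t = n
x1 = cos(0.6*pi*n);
x2 = cos(2.6*pi*n);   % alias of x1 after sampling
max(abs(x1 - x2))     % ~1e-15: the sequences coincide up to rounding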

This relation holds for all the following discussions and derivations.

Energy & power

The energy of the signal \(x[n]\) is defined as

\begin{equation} E_\bx = \norm {\bx }^2 = \sum _{n=0}^{L-1} x^2[n], \end{equation}

where \(\bx \) is the vector of samples of the signal \(x[n]\). The corresponding power is time-averaged energy,

\begin{equation} \label {eq-signal-power} P_\bx = \frac {1}{L}E_\bx = \frac {1}{L}\norm {\bx }^2. \end{equation}

13.1.2 Noise

We define the noisy signal

\begin{equation} y[n] = x[n] + \epsilon [n], \end{equation}

where \(\epsilon [n]\) is additive noise.

The only assumption for the additive noise is that it is zero-mean,

\begin{equation} \bar \epsilon = \frac {1}{L}\sum _{n=0}^{L-1}\epsilon [n] = 0. \end{equation}

No additional assumptions, such as Gaussianity, are required; however, the special case of additive white Gaussian noise (AWGN) is discussed further in the advanced notes of selected topics.

Since the noise is zero-mean, its (unbiased) variance estimate coincides with the power (13.5) of the noise signal.

13.1.3 Basic Signal Analysis

Given the noisy signal \(y[n]\), the goal is to find the parameters of \(x[n]\), as well as the noise power, as illustrated in Fig. 13.1.

(image)

Figure 13.1: Basic signal analysis goals.
Linear regression

The analysis is based on LS minimization with the linear model,

\begin{equation} \hat {\by } = \bX \bw , \end{equation}

where \(\bw \) is the vector of learned weights. The SSE loss is

\begin{equation} \loss (\bw ) = \norm {\by -\hat {\by }}^2 =\norm {\by - \bX \bw }^2 \end{equation}

with the loss minimum at

\begin{equation} \hat {\bw } = \left (\bX ^T\bX \right )^{-1}\bX ^T\by \end{equation}

Due to the orthogonality of the residual \(\be =\by -\hat {\by }\) to the fitted signal \(\hat {\by }\),

\begin{equation} \norm {\by }^2 = \norm {\hat {\by }}^2 + \norm {\be }^2 \end{equation}

In the following, for each method, the quantities \(\bX ,\by ,\bw \) and their relation to a sinusoidal signal are defined.
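As a minimal Matlab sketch of this generic LS step (the design matrix here is a hypothetical example, for illustration only):

L = 51;
X = [ones(L,1), (0:L-1)'];        % hypothetical two-column design matrix
y = X*[1; 0.5] + 0.1*randn(L,1);  % synthetic observations
w_hat = X \ y;                    % solves Eq. (13.10) in a stable way
y_hat = X*w_hat;                  % fitted signal
e     = y - y_hat;                % residual, orthogonal to y_hat (13.11)

The backslash operator solves the LS problem via a QR factorization rather than forming \(\bX ^T\bX \) explicitly, which is numerically preferable.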

13.2 Amplitude estimation

The goal is to find the amplitude of a sinusoidal signal in noise that best fits the model in a least squares (LS) sense.

Given a signal model with \(\theta =0\) and a known frequency \(\omega _0\),

\begin{equation} x[n] = A\cos (\omega _0n) \end{equation}

the goal is to estimate the amplitude \(A\) from

\begin{equation} y[n] = x[n] + \epsilon [n]\quad n=0,\ldots ,L-1 \end{equation}

that best fits a provided model,

\begin{equation} \label {eq-amplitude-model} \hat {x}[n] = \hat {A}\cos (\omega _0n) \end{equation}

13.2.1 LS Formulation

Technically, we are looking for the value of \(A\) that minimizes the squared error,

\begin{equation} \begin{aligned} \loss =\sum _n (y[n]-\hat {x}[n])^2. \end {aligned} \end{equation}

The corresponding vector form of the SSE loss is

\begin{equation} \begin{aligned} \loss (A) &= \norm {\begin{bmatrix} y[0]\\ y[1] \\ y[2]\\ \vdots \\ y[L-1] \end {bmatrix} - \begin{bmatrix} 1 \\ \cos (\omega _0) \\ \cos (2\omega _0) \\ \vdots \\ \cos ((L-1)\omega _0) \end {bmatrix} A\,}^2\\ &=\norm {\by -\bX \bw }^2 \end {aligned} \end{equation}

where the required weight is \(\bw =A\), \(\bX \) is formed by samples of the signal \(\left \lbrace \cos (\omega _0n)\right \rbrace _{n=0}^{L-1}\), and \(\by \) is formed by samples of the signal \(\left \lbrace y[n]\right \rbrace _{n=0}^{L-1}\). The resulting minimization is straightforward,

\begin{equation} \label {eq:signal:amplitude} \begin{aligned} \hat {A} &= (\bX ^T\bX )^{-1}\bX ^T\by \\[2pt] &=\frac {\sum _n y[n]\cos (\omega _0n)}{\sum _n \cos ^2(\omega _0n)} \end {aligned} \end{equation}

Substituting the signal model into the resulting solution,

\begin{equation} \begin{aligned} \hat {A} &=\frac {\sum _n (A\cos (\omega _0n) + \epsilon [n])\cos (\omega _0n)}{\sum _n \cos ^2(\omega _0n)}\\[3pt] &=A + \frac {\sum _n \epsilon [n]\cos (\omega _0n)}{\sum _n \cos ^2(\omega _0n)}, \end {aligned} \end{equation}

shows that it produces the true value of \(A\) plus an additive noise term.

13.2.2 Signal-to-Noise-Ratio

Using the interpretation of \(\norm {\by }^2\) as the signal energy and \(\norm {\hat {\by }}^2\) as the energy of the estimated sinusoidal signal (13.14), the residual error \(\norm {\mathbf {e}}^2\) provides an estimate of the noise energy,

\begin{equation} \label {eq-signal-noise-orthogonality} \norm {\by }^2 = \norm {\hat {\by }}^2 + \norm {\mathbf {e}}^2\,\Rightarrow \, \begin{cases} E_\by = E_{\hat {\by }} + E_\be \\ P_\by = P_{\hat {\by }} + P_\be \end {cases} \end{equation}

Moreover, due to the zero-mean property of the noise, the estimated power and/or variance of the noise (13.5) is

\begin{equation} \label {eq:signal:Anoise} \hat {P}_\epsilon = \hat {\sigma }_\epsilon ^2 = \frac {1}{L}\norm {\mathbf {e}}^2. \end{equation}

An interesting interpretation of this result is the estimated signal-to-noise ratio (SNR), defined as

\begin{equation} \label {eq:signal:Asnr} \widehat {SNR} = \frac {P_{\hat {\by }}}{P_\be } =\frac {\norm {\hat {\by }}^2}{\norm {\mathbf {e}}^2} \end{equation}

The following example (Fig. 13.2) uses a synthetic cosine signal of length \(L = 51\) samples, angular frequency \(\omega _0 = 0.1\pi \) and amplitude \(A = 1.5\). Gaussian noise with standard deviation \(\sigma = 5\) is then added to create a noisy observation. A least-squares regression is applied to estimate the amplitude, yielding \(\hat {\sigma }_\epsilon =4.43\). Matlab code.
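The following minimal Matlab sketch reproduces this configuration (seeding and plotting omitted; the linked Matlab code may differ in details):

L = 51; n = (0:L-1)'; w0 = 0.1*pi; A = 1.5; sigma = 5;
y = A*cos(w0*n) + sigma*randn(L,1);       % noisy observation
X = cos(w0*n);                            % single-column design matrix
A_hat = X \ y;                            % LS amplitude estimate, Eq. (13.17)
e     = y - X*A_hat;                      % residual error
sigma_hat = sqrt(mean(e.^2));             % noise std estimate, Eq. (13.20)
SNR_hat   = sum((X*A_hat).^2)/sum(e.^2);  % SNR estimate, Eq. (13.21)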

(image)

(a) Reconstructed signal, \(\hat {\mathbf {y}}\). Note the significant amount of noise.

(image)

(b) Residual error. Ideally, if the model were perfect, the residual would equal the added noise.
Figure 13.2: Example of the cosine signal amplitude estimation.

13.3 Amplitude and phase estimation

The goal is to find both the amplitude and the phase of a sinusoidal signal in noise that best fits the model in a least squares (LS) sense. The following analysis is provided for the more general model,

\begin{equation} x[n] = A\cos \left (\omega _0n + \theta \right )\quad n=0,\ldots ,L-1, \end{equation}

The goal is to estimate the amplitude \(A\) and the phase \(\theta \) that best fit the model,

\begin{equation} \hat {x}[n] = \hat {A}\cos \left (\omega _0n + \hat {\theta }\right ) \end{equation}

LS Formulation

The first step involves the use of trigonometric identities to express the cosine with a phase shift as a linear combination of sine and cosine signals,

\begin{align} A\cos \left (\omega _0n + \theta \right ) &=w_c\cos (\omega _0n) + w_s\sin (\omega _0n), \end{align} where

\begin{equation} \begin{aligned} w_c &= A\cos (\theta )\\ w_s &= -A\sin (\theta ) \end {aligned} \end{equation}

and

\begin{equation} \label {eq-cos-linear-coeff} \begin{aligned} A &= \sqrt {w_c^2 + w_s^2}\\ \theta &= -\arctan \left (\frac {w_s}{w_c}\right ) \end {aligned} \end{equation}

This transforms the problem into a two-parameter linear LS problem in terms of \(w_c\) and \(w_s\) [?]. The resulting LS formulation involves a two-element weight vector, \(\bw = \begin {bmatrix} w_c & w_s\end {bmatrix}^T\), the vector \(\by \) formed by samples of the signal \(\left \lbrace y[n]\right \rbrace _{n=0}^{L-1}\), and the \(L\times 2\) matrix \(\bX \) given by

\begin{equation} \label {eq-X-single-freq} \bX = \begin{bmatrix} 1 & 0 \\ \cos (\omega _0) & \sin (\omega _0) \\ \cos (2\omega _0) & \sin (2\omega _0) \\ \vdots & \vdots \\ \cos ((L-1)\omega _0) & \sin ((L-1)\omega _0) \end {bmatrix}. \end{equation}

Once \(\hat {\bw }\) has been found, the amplitude and phase can be recovered from (13.26).

The SNR and noise variance interpretations are similar to those of the previous model in Eqs. (13.21) and (13.20).

The numerical example is presented in Fig. 13.3. The configuration is similar to that of the previous figure, except for the lower noise level, \(\sigma =1.5\). Nevertheless, there is a decrease in performance, since two parameters are estimated simultaneously. Matlab code.
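A minimal Matlab sketch of the two-parameter fit (the phase value is an assumed choice for illustration):

L = 51; n = (0:L-1)'; w0 = 0.1*pi; A = 1.5; theta = -pi/4; sigma = 1.5;
y = A*cos(w0*n + theta) + sigma*randn(L,1);
X = [cos(w0*n), sin(w0*n)];          % Eq. (13.27)
w = X \ y;                           % w = [w_c; w_s]
A_hat     = sqrt(w(1)^2 + w(2)^2);   % amplitude, Eq. (13.26)
theta_hat = atan2(-w(2), w(1));      % four-quadrant form of Eq. (13.26)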

(image)

Figure 13.3: Example of the cosine signal amplitude and phase estimation. Note the lower estimation accuracy compared to Fig. 13.2, since two parameters are estimated simultaneously.
13.3.1 Advanced notes (*)
  • This estimation procedure is optimal in the maximum likelihood (ML) sense under additive white Gaussian noise (AWGN) and achieves the Cramér-Rao lower bound (CRLB) [?, ?].

  • The theoretical lower bound (CRLB) on the variance of the estimates of \(w_c,w_s\) is given by [?, Eqs. (5.47-48)]

    \begin{equation} \label {eq-signals-Ap_cr} \Var [\hat {w}_{c,s}]\gtrsim \frac {2\sigma ^2}{L} \end{equation}

    This bound is the tightest for the AWGN case and is less accurate for other noise distributions.

  • While the derivation of the CRLB is non-trivial, the approximate estimation variance can be easily evaluated numerically by Monte Carlo simulations for any set of parameters and any noise distribution of interest, as sketched below.
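A minimal Monte Carlo sketch for the variance of \(\hat {w}_c\) (all parameter values are assumed for illustration):

L = 51; n = (0:L-1)'; w0 = 0.1*pi; sigma = 1; K = 5000;
X  = [cos(w0*n), sin(w0*n)];
wc = zeros(K,1);
for k = 1:K
    y = 1.5*cos(w0*n + pi/4) + sigma*randn(L,1);  % fresh noise realization
    w = X \ y;
    wc(k) = w(1);                                 % store the w_c estimate
end
var(wc)            % compare with the CRLB 2*sigma^2/L, Eq. (13.28)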

13.4 Frequency estimation

The goal is now to estimate all of the unknown parameters \(A,\theta ,\omega _0\).

If the frequency \(\omega _0\) is also unknown, it can be estimated by searching for the \(\hat {\omega } _0\) that best fits a sinusoidal model for the observed data, i.e., that minimizes the residual error norm or maximizes the reconstructed signal energy. The corresponding matrix \(\bX \) may be parameterized as a frequency-dependent one, \(\bX (\omega )\). Here, the estimated signal is frequency-dependent

\begin{equation} \label {eq:signal:P_frequency_dependent} \hat {\by }(\omega ) =\bX (\omega )\bw (\omega ), \end{equation}

where \(\bw (\omega )\) are the estimated parameters \(w_c\) and \(w_s\) at that frequency. The corresponding frequency-dependent residual error is given by (see also (13.19))

\begin{equation} \mathbf {e}(\omega ) = \by - \hat {\by }(\omega ). \end{equation}

Since the error \(\mathbf {e}(\omega )\) is orthogonal to \(\hat {\by }(\omega )\),

\begin{equation} \norm {\by }^2 = \norm {\hat {\by }(\omega )}^2 + \norm {\be (\omega )}^2. \end{equation}

To find the frequency that best represents the data, we seek the one that maximizes the energy of the reconstructed signal (or equivalently minimizes the residual error), as mentioned above

\begin{equation} \label {eq:signal:omega0} \hat {\omega }_0 = \arg \min \limits _{\omega } \norm {\mathbf {e}(\omega )}^2 = \arg \max \limits _{\omega } \norm {\hat {\mathbf {y}}(\omega )}^2. \end{equation}

Note that this optimization problem can be challenging because the objective function may exhibit multiple local maxima/minima. Therefore, an appropriate numerical global optimization method (e.g., a dense grid search over \(\omega \)) is required.

Once \(\hat {\omega }_0\) has been found, the amplitude and phase are estimated using the corresponding linear LS solution \(\bw (\omega _0)\) (Sec. 13.3). This solution also results in SNR and noise variance estimations, as in Eqs. (13.21) and (13.20).

A numerical example of the signal with additive white Gaussian noise (AWGN), with the parameters \(A=1.5,\omega _0=0.1\pi ,\theta =-\pi /4\) and \(\sigma _\epsilon ^2=1\), is presented in Fig. 13.4. First, the periodogram peak is found (Fig. 13.4a). Then, the subsequent amplitude/phase estimation result is presented (Fig. 13.4b). Matlab code.
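A minimal Matlab sketch of the grid search (the grid density is an assumed choice; a finer grid or local refinement may be required in practice):

L = 51; n = (0:L-1)'; A = 1.5; w0 = 0.1*pi; theta = -pi/4;
y = A*cos(w0*n + theta) + randn(L,1);       % sigma_eps^2 = 1
omegas = linspace(0.02*pi, 0.98*pi, 2000);  % dense frequency grid
P = zeros(size(omegas));
for i = 1:numel(omegas)
    X = [cos(omegas(i)*n), sin(omegas(i)*n)];
    P(i) = norm(X*(X\y))^2 / L;             % periodogram, Eq. (13.33)
end
[~, imax] = max(P);                         % Eq. (13.32)
w0_hat = omegas(imax);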

(image)

(a) The periodogram \(P(\omega )\) with a prominent peak at \(\omega _0 \approx 0.1\pi \).

(image)

(b) Reconstructed signal.
Figure 13.4: The reconstruction in (b) uses the estimated amplitude, phase, and angular frequency \((\hat {A}, \hat {\theta }, \hat {\omega }_0)\) found by maximizing the periodogram in (a). Note the lower estimation accuracy than in Fig. 13.3, since three parameters are estimated.
13.4.1 Advanced notes (*)
Interpretation in Terms of the Periodogram

The function

\begin{equation} \label {eq:signal:periodogram} P(\omega ) = \frac {1}{L}\norm {\hat {\by }(\omega )}^2 \end{equation}

as a function of \(\omega \) is termed the periodogram: a frequency-dependent quantification of signal power that approximates the power spectral density (PSD) of the signal. By scanning over frequencies, the location of the maximum periodogram value is taken as the frequency estimate,

\begin{equation} \hat {\omega }_0 = \arg \max \limits _{\omega } P(\omega ) \end{equation}

The complexity of frequency estimation (without the amplitude/phase step) can be significantly reduced by noting that, under the approximation \(\bX ^T\bX \approx \frac {L}{2}\bI \),

\begin{equation} \norm {\hat {\by }(\omega )}^2 \approx \frac {2}{L}\abs {\sum _{n=0}^{L-1}y[n]\exp (-j\omega n)}^2, \end{equation}

so the maximization can be carried out over the right-hand side instead. However, this formulation involves frequencies in the range \(0\le \omega <2\pi \); frequencies above \(\pi \) have no direct physical interpretation, being aliases of those below \(\pi \).

Theoretical performance bounds

Under the AWGN assumption, the theoretical SNR is given by

\begin{equation} SNR = \frac {A^2}{2\sigma ^2} \end{equation}

and the corresponding CRLB on the estimation variances are [?]

\begin{align} \Var [\hat {A}] &\geqslant \frac {2\sigma ^2}{L}\quad [V^2] \\ \Var [\hat {\omega }_0] &\geqslant \frac {12}{SNR\times L(L^2-1)}\approx \frac {12}{SNR\times L^3}\quad \left [\left (\frac {rad}{sample}\right )^2\right ] \\ \Var [\hat {\theta }] &\geqslant \frac {2(2L-1)}{SNR\times L(L+1)}\approx \frac {4}{SNR\times L}\quad [rad^2] \end{align} For analog frequency \(F_0 = \dfrac {\omega _0}{2\pi }F_s\),

\begin{equation} \Var [F_0] = \Var [\omega _0]\left (\frac {F_s}{2\pi }\right )^2 \quad [Hz^2] \end{equation}

13.5 Harmonic Signal Analysis

Signal model

A particularly important class of signals encountered in many practical applications is the harmonic or periodic signal (also termed Fourier series). Such a signal can be expressed as a sum of cosine terms whose frequencies are integer multiples (harmonics) of a fundamental frequency \(\omega _0\).

\begin{equation} y[n] = A_0 + \sum _{m=1}^M A_m\cos (m\omega _0 n + \theta _m) = \sum _{m=\mathbf {0}}^M A_m\cos (m\omega _0 n + \theta _m), \end{equation}

where:

  • \(A_0\) is the constant (DC) component,

  • \(A_m\) and \(\theta _m\) represent the amplitude and phase of the \(m\)-th harmonic,

  • \(\omega _0\) is the fundamental angular frequency,

  • \(m\omega _0\) corresponds to the frequency of the \(m\)-th harmonic,

  • and \(M\) is the number of harmonics in the model.

The model order \(M\) (number of harmonics) is a hyper-parameter that should be chosen carefully. Too few harmonics can fail to capture essential signal structure, while too many may overfit noise. The maximum value of \(M\) is bounded by the Nyquist criterion, \(M\omega _0<\pi \).

Total harmonic distortion (THD) is a measure commonly used in electrical engineering, audio processing, and other fields to quantify how much the harmonic components of a signal differ from a pure sinusoid at the fundamental frequency. It is defined as the ratio of the root-sum-square of the harmonic amplitudes to the amplitude of the fundamental frequency,

\begin{equation} THD = \frac {\sqrt {\sum _{m=2}^M A_m^2}}{A_1}. \end{equation}

A lower THD value indicates that the signal is closer to a pure sinusoidal shape, whereas a higher THD signifies a stronger presence of higher-order harmonics.

Amplitudes and phases estimation

Given \(\omega _0\), the model is linear in terms of the unknown parameters \(\{A_m, \theta _m\}\) for each harmonic \(m=1,\ldots ,M\). Similar to the single-frequency case, the LS matrix \(\bX \) is constructed with columns corresponding to \(\cos (m\omega _0 n)\) and \(\sin (m\omega _0 n)\) for \(m=1,\ldots ,M\), plus a column of ones for the DC component. Each pair \((A_m,\theta _m)\) can be recovered from the LS estimated cosine and sine coefficients in the manner described for single-frequency amplitude-phase estimation. The resulting SNR and noise variance estimates are similar to those described in the previous sections.
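A minimal Matlab sketch of the harmonic LS fit (the test signal and model order are assumed for illustration):

L = 500; n = (0:L-1)'; w0 = 0.05*pi; M = 3;
y = 1 + 2*cos(w0*n) + 0.5*cos(2*w0*n + 0.3) + 0.2*randn(L,1);
X = ones(L, 2*M+1);                  % first column: DC component
for m = 1:M
    X(:, 2*m)   = cos(m*w0*n);       % cosine column of the m-th harmonic
    X(:, 2*m+1) = sin(m*w0*n);       % sine column of the m-th harmonic
end
w   = X \ y;                         % [A_0, w_c1, w_s1, ..., w_cM, w_sM]
A0  = w(1);
Am  = sqrt(w(2:2:end).^2 + w(3:2:end).^2);  % harmonic amplitudes A_m
thm = atan2(-w(3:2:end), w(2:2:end));       % harmonic phases theta_m
THD = sqrt(sum(Am(2:end).^2)) / Am(1);      % Eq. (13.42)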

Fundamental frequency estimation

If \(\omega _0\) is not known, the approach that is described in the frequency estimation section can also be applied here. Once \(\hat {\omega }_0\) has been determined from a maximum of the harmonic periodogram,

\begin{equation} P_h(\omega ) = \frac {1}{L}\sum _{m=1}^M\norm {\hat {\by }(m\omega )}^2, \end{equation}

the harmonic amplitudes and phases can be estimated via LS at this frequency [?].

The example uses the sampled current of a switch-mode power supply in a 50 Hz network, sampled at 50 kHz [?]. Figure 13.5a shows a reconstruction of the signal with \(M=250\) harmonics. The estimated amplitudes \(\hat {A}_m\) are shown (Fig. 13.5b) as a function of the harmonic index \(m\), including the DC term at \(m = 0\). A larger magnitude indicates a more prominent harmonic component. The first non-DC harmonic amplitude, \(m=1\), corresponds to the fundamental frequency \(\omega _0\), while higher indices capture additional harmonics in the signal. The estimated fundamental frequency is 50.104 Hz, with a corresponding THD of about 1.6 (\(\approx 160\%\)). Figure 13.5c shows how the estimated SNR (top) and the noise standard deviation (bottom) vary as the number of harmonics \(M\) in the model increases. Matlab code.

(image)

(a) The upper plot shows the signal overlaid with the least-squares harmonic reconstruction using the estimated frequency, amplitude, and phase parameters. The lower plots zoom in on a smaller portion of the time axis for different values of \(M\), and demonstrate the challenging shape of the signal.

(image)

(b) Estimated harmonic amplitudes, \(A_m\).

(image)

(c) Estimated SNR and noise std, \(\hat {\sigma }_\epsilon \).
Figure 13.5: Example of a harmonic signal analysis.
13.5.1 Advanced notes (*)

The presented estimator, for both known and unknown \(\omega _0\), is the efficient ML estimator in the case of AWGN, with a known analytical CRLB [?].

13.6 Discrete Fourier Transform (DFT)

13.6.1 Definition

The discrete Fourier transform (DFT) is a special case of the harmonic model with:

  • \(N\ge L\) harmonics,

  • fundamental angular frequency \(\omega _0 = \dfrac {2\pi }{N}\).

Formally speaking, the DFT is a harmonic decomposition of a finite-length signal into a sum of \(N\) harmonically related sinusoids that are evenly spaced in frequency.

DFT representation states that:

  • any arbitrary, finite-time signal \(y[n]\) may be represented as a sum of sinusoidal signals,

    \begin{equation} y[n] = \sum _{k=0}^{N-1}A_k\cos (k\frac {2\pi }{N}n+\theta _k),\quad n=0,\ldots ,L-1 \end{equation}

  • For \(N \geq L\), the DFT allows for perfect reconstruction of the signal using its harmonic representation parameters, \(\left \lbrace A_k,\theta _k\right \rbrace _{k=0}^{N-1}\).

The corresponding LS formulation follows the harmonic model above for known \(\omega _0\).

13.6.2 Properties
Zero-padding

The basic requirement is \(N=L\) harmonics. Sometimes, a denser spectral representation with \(N>L\) is required. In this case, the corresponding time-domain signal can be interpreted as the original signal \(y[n]\) with \(N-L\) zeros appended at the end. This is termed zero padding.

Symmetry

It is also worth noting the symmetry in the DFT: with \(\Delta \omega = \frac {2\pi }{N}\),

\begin{equation} \begin{aligned} \cos ((N-k)\Delta \omega \, n) &= \cos (k\Delta \omega \, n)\\ \sin ((N-k)\Delta \omega \, n) &= -\sin (k\Delta \omega \, n), \end {aligned} \end{equation}

resulting in redundant information for frequencies \(k\omega _0\) above and below \(\pi \),

\begin{equation} \begin{aligned} A_k &= A_{N-k}\\ \theta _k &= -\theta _{N-k} \end {aligned}. \end{equation}

As a result, only frequencies \(k\omega _0 \leq \pi \) need to be considered uniquely.

13.6.3 Power Spectral Density

The power of a signal of the form

\begin{equation} x_k[n] = A_k\cos (k\frac {2\pi }{N}n + \theta _k) \end{equation}

is

\begin{equation} S_\by [k]=P_{\by _k} = \frac {1}{L}\norm {\by _k}^2 = \frac {A_k^2}{2}. \end{equation}

This value is known as the power spectral density (PSD) at the frequency \(\omega = k\frac {2\pi }{N}\). The corresponding squared magnitude values \(A_k^2/2\) are known as the discrete-frequency periodogram (Eq. (13.33)), and this is the basic method for the PSD estimation of a signal. Plotting such a periodogram gives a frequency-domain representation of the signal’s power distribution, highlighting which frequencies carry the most power.

Parseval’s Theorem The DFT is an energy-conserving transform, with

\begin{equation} \sum _{k=0}^{N-1} A_k^2 = \frac {1}{L}\norm {\by }^2. \end{equation}

13.6.4 Advanced notes (*)
Fast Fourier transform (FFT) & Goertzel algorithm

The fast Fourier transform (FFT) algorithm efficiently computes \(Y[k]=\DFTtr {y[n]}\), providing \(A_k = |Y[k]|/N\) and \(\theta _k = \angle (Y[k])\) with significantly lower memory requirements and complexity than direct calculation in Eq. (13.59).
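A minimal Matlab sketch of the fft-based extraction (the test bin, amplitude, and phase are assumed; note Matlab’s 1-based indexing, so bin \(k\) is element \(k+1\)):

N = 64; n = (0:N-1)'; k0 = 5;
y = 1.5*cos(2*pi*k0*n/N + pi/6);   % noiseless test tone at bin k0
Y = fft(y);                        % DFT computed via the FFT
A_k     = abs(Y)/N;                % A_k = |Y[k]|/N
theta_k = angle(Y);                % theta_k = angle(Y[k])
[A_k(k0+1), theta_k(k0+1)]         % returns [0.75, pi/6], i.e., A/2 and theta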

When only a single frequency value is of interest, the Goertzel algorithm is a more efficient method for the task. Moreover, it can be used for a computationally efficient search for the maximum in Eq. (13.32).

Parseval’s Theorem Using FFT notation,

\begin{equation} A_k^2 = \abs {Y[k]}^2/L^2\Rightarrow \sum _{k=0}^{L-1} \abs {Y[k]}^2 = L\norm {\by }^2. \end{equation}

Sinusoidal signal power

For a more general case of an arbitrary \(\omega \) value, the signal of the form

\begin{equation} y[n] = A\cos (\omega _0 n) \end{equation}

has the \(\omega _0\)-dependent power,

\begin{equation} \begin{aligned}\label {eq-cos-power} P_\by &= \frac {A^2}{L}\sum _{n=0}^{L-1}\cos ^2(\omega _0 n)\\ &=\frac {A^2}{4 L}\left (1+2 L-\frac {\sin (\omega _0-2 L \omega _0)}{\sin (\omega _0)}\right ), \end {aligned} \end{equation}

which results from the time-limited nature of the signal \(y[n]\). As the number of samples grows, the power converges to the continuous-time expression,

\begin{equation} \lim \limits _{L\rightarrow \infty } P_\by = \frac {A^2}{2}, \end{equation}

or, equivalently,

\begin{equation} \sum _n \cos ^2(\omega _0n)\approx \frac {L}{2}\qquad \omega _0\ne 0,\pi . \end{equation}

An illustration of (13.52) is presented in Fig. 13.6.

(image)

Figure 13.6: Power of the sinusoidal signal as a function of \(\omega _0\) for different values of \(L\).
Single frequency analysis

Consider a signal \(y_k[n]\) at a discrete frequency \(\omega _k = \dfrac {2\pi }{N}k\),

\begin{equation} y_k[n] = A\cos (\omega _k n + \theta ) = A\cos (\frac {2\pi }{N}k n + \theta ). \end{equation}

To estimate the amplitude and phase at this predefined frequency, we can form the matrix \(\bX _k\),

\[ \bX _k = \begin {bmatrix} \bx _c & \bx _s \end {bmatrix}, \]

where

\[ \bx _c = \begin {bmatrix} \cos (2\pi \frac {k}{N}\cdot 0) \\ \cos (2\pi \frac {k}{N}\cdot 1) \\ \vdots \\ \cos (2\pi \frac {k}{N}(L-1)) \end {bmatrix} \ \text {and} \ \bx _s = \begin {bmatrix} \sin (2\pi \frac {k}{N}\cdot 0) \\ \sin (2\pi \frac {k}{N}\cdot 1) \\ \vdots \\ \sin (2\pi \frac {k}{N}(L-1)) \end {bmatrix}. \]

By evaluating

\begin{equation} \bX _k^T \bX _k = \begin{bmatrix} \bx _c^T\bx _c & \bx _s^T\bx _c\\[5pt] \bx _s^T\bx _c & \bx _s^T\bx _s \end {bmatrix} \end{equation}

we find that the sine and cosine columns form an orthogonal basis for this single frequency, with (irrespective of \(k\), assuming \(L=N\) and \(0<k<N/2\))

\begin{equation} \label {eq:orthogonality} \begin{aligned} \bx _{c}^T\bx _{c} &= \sum _{n=0}^{N-1}\cos ^2(2\pi \frac {k}{N}n) = \frac {N}{2}\\ \bx _{c}^T\bx _{s} &= \sum _{n=0}^{N-1}\cos (2\pi \frac {k}{N}n)\sin (2\pi \frac {k}{N}n) = 0\\ &= \bx _s^T\bx _c\\ \bx _{s}^T\bx _{s} &= \sum _{n=0}^{N-1}\sin ^2(2\pi \frac {k}{N}n) = \frac {N}{2} \end {aligned} \end{equation}

Stacking these results for all \(k=0,\ldots ,N-1\) yields the complete DFT matrix, which forms an orthogonal basis for the \(L\)-sample signal space. Further discussion of the \(\left (\bX ^T\bX \right )^{-1}\) matrix properties can be found in Examples 4.2 and 8.5 in [?].

Moreover, \(\left (\bX _k^T\bX _k\right )^{-1}\) takes a particularly simple diagonal form,

\begin{equation} \left (\bX _k^T\bX _k\right )^{-1} = \begin{bmatrix} \frac {2}{N} & 0\\[5pt] 0 & \frac {2}{N} \end {bmatrix} \end{equation}

so the least squares solution \(\hat {\bw }\) for the parameters \(w_{c,k}\) and \(w_{s,k}\) (corresponding to the amplitude and phase components at \(\omega _k\)) is

\begin{align} w_{c,k} &= \frac {2}{N}\sum _{n=0}^{L-1} y[n]\cos \left (2\pi \frac {k}{N}n\right ), \\ w_{s,k} &= \frac {2}{N}\sum _{n=0}^{L-1} y[n]\sin \left (2\pi \frac {k}{N}n\right ). \end{align}
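A minimal Matlab sketch comparing the closed-form coefficients (13.59)-(13.60) with the generic LS solution (assuming \(L=N\), where the two coincide exactly):

N = 64; n = (0:N-1)'; k = 5;
y  = 1.2*cos(2*pi*k*n/N - 0.4) + 0.1*randn(N,1);
xc = cos(2*pi*k*n/N);  xs = sin(2*pi*k*n/N);
wc = (2/N)*sum(y.*xc);            % Eq. (13.59)
ws = (2/N)*sum(y.*xs);            % Eq. (13.60)
w  = [xc, xs] \ y;                % generic LS; equals [wc; ws] by (13.57)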

General orthogonality

The orthogonality of \(\bX ^T\bX \) in its more general form follows from

\begin{align} \sum _{n=0}^{N-1}\cos (2\pi \frac {j}{N}n)\cos (2\pi \frac {k}{N}n) &= \frac {N}{2}\delta \left [j-k\right ]\\ \sum _{n=0}^{N-1}\cos (2\pi \frac {j}{N}n)\sin (2\pi \frac {k}{N}n) &= 0\quad \forall j,k\\ \sum _{n=0}^{N-1}\sin (2\pi \frac {j}{N}n)\sin (2\pi \frac {k}{N}n) &= \frac {N}{2}\delta \left [j-k\right ], \end{align} for \(0<j,k<N/2\).

Spectral Spreading and Leakage

In an idealized setting, a pure cosine signal has a perfectly defined frequency representation. For instance, consider the discrete-time signal,

\begin{equation} \label {eq:signal:single_freq_cos} x[n] = A\cos (k_0\frac {2\pi }{L}n),\ k_0\in \left \lbrace 1,\cdots ,L-1 \right \rbrace \end{equation}

where \(k_0\) is the frequency index. The DFT of this signal (with \(N=L\)) yields a single spectral component at frequency \(\omega _0=k_0\frac {2\pi }{L}\), such that the spectral amplitude \(A_k\) at each value of \(k\) is given by

\begin{equation} A_k = \begin{cases} \frac {A}{2} & k=k_0,N-k_0\\ 0 & \text {otherwise} \end {cases}. \end{equation}

Under these conditions, the signal’s spectral representation is strictly localized at the specific frequency \(\omega _{k_0}\), with no energy distributed elsewhere in the spectrum.

(image)

Figure 13.7: Illustration of a single-frequency cosine signal’s spectrum under ideal assumptions (discrete, integer-multiple frequencies) versus practical conditions (denser frequency grid or non-integer frequencies). Note how the ideal single peak broadens and additional low-level components appear, highlighting the effects of spectral spreading and leakage.

However, practical scenarios deviate from this ideal case. In particular, if a denser frequency grid is employed (i.e., \(N>L\)) or the frequency varies continuously (as in Eq. (13.29)), the resulting spectral distribution can differ substantially from the discrete, single-peak ideal (Fig. 13.7). This difference arises because, in general, the columns of \(\bX (\omega )\) are no longer orthogonal, so \(\bX (\omega )^T\bX (\omega )\) is not diagonal as in Eq. (13.57). As a result, two effects are introduced:

  • The main frequency peak broadens, resulting in “spectral spreading”.

  • Additional frequency components emerge beyond the broadened main peak, termed “spectral leakage.”

13.6.5 Short-Time Fourier Transform (STFT)

All the analyses presented above, e.g., harmonic analysis and the DFT, assume that the signal has time-constant harmonic components (a stationary signal) over the entire observation interval of \(L\) samples. In many practical applications, however, the frequency content of a signal changes over time.

  • Goal: Capture time-varying spectral behavior.

The short-time Fourier transform (STFT) addresses analysis of time-varying spectral behavior by applying the DFT to short, overlapping segments of the signal. A window function \(w[n]\) of length \(N_w\) is used to extract a local segment centered around a given time instant. The \(\ell \)-th windowed segment is defined as

\begin{equation} y_\ell [n] = w[n]\cdot y[\ell H + n],\quad n=0,\ldots ,N_w-1, \end{equation}

where \(H\) is the hop size (the number of samples between consecutive windows) and \(\ell = 0,1,2,\ldots \) is the segment (frame) index.

Each windowed segment \(y_\ell [n]\) is then analyzed using the DFT framework described above, i.e., the LS matrix \(\bX \) of dimension \(N_w\times 2N_w\) with cosine and sine columns at the DFT frequencies \(\omega _k = \frac {2\pi k}{N_w}\) is applied to the segment. The resulting DFT coefficients become time-dependent,

\begin{equation} A_k[\ell ],\quad \theta _k[\ell ],\quad k=0,\ldots ,N_w-1, \end{equation}

providing the amplitude and phase of each frequency component at each time frame \(\ell \).

The following example uses a chirp signal whose instantaneous frequency sweeps linearly from 50 Hz to 400 Hz over 1 s, sampled at \(F_s=1000\) Hz, with additive Gaussian noise (\(\sigma =0.5\)). The time-domain signal is shown in Fig. 13.8a and its spectrogram (\(N_w=128\), \(H=16\), Hann window) in Fig. 13.8b.
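A minimal Matlab sketch of this example (assuming the Signal Processing Toolbox for chirp and spectrogram):

Fs = 1000; t = (0:Fs-1)'/Fs;                      % 1 s of data
y  = chirp(t, 50, 1, 400) + 0.5*randn(Fs,1);      % 50->400 Hz sweep + noise
Nw = 128; H = 16;                                 % window length and hop size
spectrogram(y, hann(Nw), Nw-H, Nw, Fs, 'yaxis');  % cf. Fig. 13.8b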

(image)

(a) Time-domain chirp signal with linearly increasing frequency and additive noise.

(image)

(b) Spectrogram (\(N_w=128\), \(H=16\)). The linearly increasing frequency is clearly visible.
Figure 13.8: Example of STFT analysis of a chirp signal.
Time-frequency trade-off

The choice of the window length \(N_w\) involves a fundamental trade-off:

  • A longer window (large \(N_w\)) provides better frequency resolution (smaller \(\Delta \omega = \frac {2\pi }{N_w}\)) but poorer time localization, since each segment spans a longer time interval.

  • A shorter window (small \(N_w\)) provides better time localization but coarser frequency resolution.

This time-frequency trade-off is inherent to STFT and cannot be resolved by the choice of window function alone. Figure 13.9 illustrates this trade-off: a short window (\(N_w=32\)) provides sharp time localization but blurred frequency content, while a long window (\(N_w=256\)) yields precise frequency resolution at the cost of temporal smearing.

(image)

Figure 13.9: Time-frequency trade-off. Left: short window (\(N_w=32\)) — good time resolution, poor frequency resolution. Right: long window (\(N_w=256\)) — good frequency resolution, poor time resolution.
Spectrogram

The power spectral density of each segment defines a time-frequency representation known as the spectrogram,

\begin{equation} S[\ell , k] = \frac {A_k^2[\ell ]}{2}, \end{equation}

which displays the signal’s power distribution as a function of both time (frame index \(\ell \)) and frequency (index \(k\)). The spectrogram is one of the most widely used tools for visualizing non-stationary signals and serves as a basis for feature extraction in signal classification tasks.

Software reference

Common implementations of the STFT and spectrogram include Matlab’s spectrogram function (Signal Processing Toolbox) and SciPy’s scipy.signal.stft and scipy.signal.spectrogram.

Summary

The STFT extends the DFT to non-stationary signals by applying windowed DFT analysis to overlapping segments of the signal. The key parameters are:

  • Window length \(N_w\) — controls the time-frequency resolution trade-off.

  • Hop size \(H\) — determines the overlap between consecutive frames.

  • Window function \(w[n]\) — shapes the spectral characteristics of each segment (e.g., Hann, Hamming).

The resulting spectrogram \(S[\ell , k]\) provides a real-valued two-dimensional representation of the signal’s power as a function of time and frequency.

13.7 Summary

The summary of the presented approach is shown in Table 13.1. The approach involves the design of a matrix \(\bX \) and the use of LS to estimate the unknown parameters. The key tasks addressed are as follows.

Amplitude Estimation: With a known frequency \(\omega _0\), the amplitude \(A\) is found via LS. The resulting residuals provide noise variance and SNR estimates.

Amplitude and Phase Estimation: For known \(\omega _0\), rewriting

\[ A\cos (\omega _0 n + \theta ) = w_c\cos (\omega _0 n) + w_s\sin (\omega _0 n) \]

transforms the problem into a two-parameter LS regression.

Frequency Estimation: If \(\omega _0\) is unknown, it is found by searching for the frequency that maximizes the fitted signal energy.

Harmonic Signal Analysis: Signals can be expressed as sums of multiple harmonics. Extending the LS approach to multiple harmonics allows estimation of each amplitude and phase. THD quantifies deviations from a pure tone.

Discrete Fourier Transform (DFT): The DFT is a special case of harmonic modeling, decomposing a signal into equally spaced frequency components. Efficiently computed by the FFT, the DFT is central to signal spectral analysis.

STFT: For non-stationary signals, the STFT applies the DFT to short, overlapping windowed segments, producing time-dependent spectral coefficients. The resulting spectrogram provides a two-dimensional time-frequency representation of the signal’s power distribution. The window length \(N_w\) controls the trade-off between time and frequency resolution.

Although the estimators presented above have been extensively analyzed for the specific case of additive white Gaussian noise (AWGN) in the statistical signal processing literature [?, ?], conducting such an analysis requires a significantly more extensive mathematical framework. Furthermore, it is worth noting that any bias and variance in these estimators can be readily approximated via Monte Carlo simulations under various parameter settings and noise distributions.

Table 13.1: Comparison and summary of different signal estimation methods.

Task | Parameters | Matrix \(\bX \) | SNR
Amplitude only | \(A\); given \(\omega _0,\theta =0\) | A single column of \(\cos (\omega _0n)\) | \(\dfrac {\norm {\hat {\by }}^2}{\norm {\mathbf {e}}^2}\)
Amplitude & phase | \(A,\theta \); given \(\omega _0\) | Two columns of \(\cos (\omega _0n)\) and \(\sin (\omega _0n)\) | \(\dfrac {\norm {\hat {\by }}^2}{\norm {\mathbf {e}}^2}\)
Frequency estimation | \(\omega _0, A,\theta \) | Frequency-dependent \(\cos (\omega n)\) and \(\sin (\omega n)\) columns | Maximum of \(\dfrac {\norm {\hat {\by }(\omega )}^2}{\norm {\be (\omega )}^2}\)
Fourier series (harmonic decomposition) | \(A_0,\left \lbrace A_m,\theta _m\right \rbrace _{m=1}^{M}\), possibly \(\omega _0\) | Harmonic cos/sin columns at multiples of \(\omega _0\), \(\cos (m\omega _0 n)\), \(\sin (m\omega _0 n)\) | \(\dfrac {\norm {\hat {\by }}^2}{\norm {\mathbf {e}}^2}\); can include frequency dependence if \(\omega _0\) unknown
DFT | \(\left \lbrace A_k,\theta _k\right \rbrace _{k=0}^{N-1}\) | Multiple pairs of columns \(\cos (\tfrac {2\pi k}{N} n)\), \(\sin (\tfrac {2\pi k}{N} n)\) for \(k=0,\dots ,N-1\) | Not used directly; perfect reconstruction for \(N\geq L\)
Table 13.2: Summary of \(\bX \) matrices.

Task | \(\bX \)
Amplitude | \(\bX = \begin {bmatrix} 1 \\ \cos (\omega _0) \\ \cos (2\omega _0) \\ \vdots \\ \cos ((L-1)\omega _0) \end {bmatrix}\)
Amplitude & phase | \(\bX = \begin {bmatrix} 1 & 0 \\ \cos (\omega _0) & \sin (\omega _0) \\ \cos (2\omega _0) & \sin (2\omega _0) \\ \vdots & \vdots \\ \cos ((L-1)\omega _0) & \sin ((L-1)\omega _0) \end {bmatrix}\)
Frequency | \(\bX (\omega ) = \begin {bmatrix} 1 & 0 \\ \cos (\omega ) & \sin (\omega ) \\ \cos (2\omega ) & \sin (2\omega ) \\ \vdots & \vdots \\ \cos ((L-1)\omega ) & \sin ((L-1)\omega ) \end {bmatrix}\)
Fourier series, \(\omega _0\) known | \(\bX = \begin {bmatrix} 1 & \cdots & 1 & 0 & \cdots \\ 1 & \cdots & \cos (\omega _0m) & \sin (\omega _0m) & \cdots \\ 1 & \cdots & \cos (2\omega _0m) & \sin (2\omega _0m) & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \\ 1 & \cdots & \cos ((L-1)\omega _0m) & \sin ((L-1)\omega _0m) & \cdots \\ \end {bmatrix}\)
Fourier series, \(\omega _0\) unknown | \(\bX (\omega )\)
DFT | \(\bX = \begin {bmatrix} 1 & \cdots & 1 & 0 & \cdots \\ 1 & \cdots & \cos (\frac {2\pi }{N}k) & \sin (\frac {2\pi }{N}k) & \cdots \\[6pt] 1 & \cdots & \cos (2\frac {2\pi }{N}k) & \sin (2\frac {2\pi }{N}k) & \cdots \\ \vdots & & \vdots & \vdots & \ddots \\ 1 & \cdots & \cos ((L-1)\frac {2\pi }{N}k) & \sin ((L-1)\frac {2\pi }{N}k) & \cdots \end {bmatrix}\)

Answer to exercise 3

\begin{align*} \loss (A) &= \sum _n \left (x[n]-A\right )^2\\ \frac {\partial }{\partial A}\loss (A) &= 2\sum _n \left (x[n]-A\right )(-1)=0\\ LA &= \sum _{n=0}^{L-1}x[n]\\ A&=\bar {x}\\ MSE&=\frac {1}{L}\sum _{n=0}^{L-1}\left (x[n]-\bar {x}\right )^2=\Var [x[n]] \end{align*}

Answer to exercise 4 The solution is based on

\begin{equation} \bX = \begin{bmatrix} \bOne _L & \bn \end {bmatrix} = \begin{bmatrix} 1 & 0\\ 1 & 1 \\ \vdots & \vdots \\ 1 & L - 1 \end {bmatrix} \end{equation}

The closed-form analysis is:

\begin{align*} \bX ^T\bX &= \begin{bmatrix} \bOne ^T\bOne & \bOne ^T\bn \\[4pt] \bn ^T\bOne & \bn ^T\bn \end {bmatrix}\\ \bOne ^T\bOne &= L\\ \bOne ^T\bn &= 0 + 1 + \cdots + (L-1) = \frac {L}{2}(L-1)\\ &=\bn ^T\bOne \\ \bn ^T\bn &= 0 + 1^2 + 2^2 + \cdots + (L-1)^2 \\ &= \frac {1}{6} (L-1) L (2L-1)\\ \left (\bX ^T\bX \right )^{-1} &= \begin{bmatrix} \dfrac {6}{L+1}-\dfrac {2}{L} & -\dfrac {6}{L^2+L} \\[10pt] -\dfrac {6}{L^2+L} & \dfrac {12}{L^3-L} \\ \end {bmatrix}\\[4pt] \bX ^T\by &= \begin{bmatrix} \bOne ^T \by \\[4pt] \bn ^T \by \end {bmatrix}\\[4pt] &=\begin{bmatrix} \sum _n y[n]\\[4pt] \sum _n ny[n] \end {bmatrix}\\ \begin{bmatrix} \hat {A} \\ \hat {B} \end {bmatrix} &= \left (\bX ^T\bX \right )^{-1}\bX ^T\by \end{align*}
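A minimal numerical check of the closed-form inverse above:

L = 10;
X = [ones(L,1), (0:L-1)'];
inv(X'*X)                                           % direct computation
[6/(L+1)-2/L, -6/(L^2+L); -6/(L^2+L), 12/(L^3-L)]   % closed form: identical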