Machine Learning & Signals Learning
7 Regression Losses and Metrics
For regression, loss and metrics are often the same quantity.
7.1 Preface
The range of the predictions \(\hat {y}_i\) is determined by the particular model applied.
Metric A performance metric is a function \(J:\hat {\by },\by \rightarrow \mathscr {R}\) that is used to evaluate and quantify the effectiveness of a model. These metrics provide insight into how well a model is performing according to various aspects of the data it predicts.
Loss function The loss function is a metric of the form \(L:\hat {\by },\by \rightarrow \mathscr {R}\) that is calculated over the training set, such that the optimal parameters corresponding to the minimum loss
\(\seteqnumber{0}{}{0}\)\begin{equation} \bth = \arg \min _{\bth } \loss (\hat {\by },\by ) \end{equation}
can be evaluated (Sec. 4.4.3).
A metric is not necessarily a loss function, and a loss function is not necessarily a metric.
For example, cross-entropy loss in classification is not a metric, and \(R^2\) is not a loss.
Summary Metrics are for communication; losses are for training. A loss is the objective minimized during training, while a metric is reported to summarize performance. They may coincide (e.g., MSE as both loss and metric), but need not.
7.2 Loss Function Properties
A loss function has a few desired properties that are presented below. While not obligatory, these properties may significantly ease the evaluation of the parameters \(\bm {\theta }\). In the following, only the basic description of these properties is provided, sacrificing mathematical rigor for brevity.
Continuity: Single unbroken (without jumps) curve for all possible input values, i.e., without discontinuities.
Lipschitz continuity: A formal, stricter continuity requirement that limits the rate of change of the function. A real-valued function \(f(\cdot ):\mathcal {R}\rightarrow \mathcal {R}\) is called Lipschitz continuous if there exists a positive real constant \(K\) such that, for all real \(x_1\) and \(x_2\),
\(\seteqnumber{0}{}{1}\)\begin{equation} \abs {f(x_1) - f(x_2)} \le K \abs {x_1 - x_2}. \end{equation}
This feature formally limits the maximum gradient values of a loss function.
Differentiability: A differentiable function of one real variable is a function whose derivative exists at each point in its domain.
• If the function is differentiable, it is also continuous.
• If the derivative is bounded, the function is also Lipschitz continuous.
This property is particularly important in NNs, which are based on the back-propagation principle.
Convexity and strict convexity: Each chord lies on or above the graph between its two endpoints (Fig. 7.1). Formally, it means that the line between \(\left (x_1,f(x_1)\right )\) and \(\left (x_2,f(x_2)\right )\) lies on or above the graph of the function \(f(x),x_1\le x \le x_2\). Mathematically, for all \(0\le t \le 1\) and \(\forall x_1,x_2\in \real \),
\(\seteqnumber{0}{}{2}\)\begin{equation} f\left (tx_1+(1-t)x_2\right )\le tf(x_1) + (1-t)f(x_2) \end{equation}
A twice-differentiable function of a single variable is convex if and only if its second derivative is nonnegative on its entire domain. For strict convexity, the second derivative must be positive everywhere. In the context of loss function properties, convexity guarantees that any local minimum is also the global minimum.
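The chord inequality above is easy to verify numerically. The following sketch (NumPy; purely illustrative) checks it for the strictly convex function \(f(x)=x^2\):

```python
import numpy as np

# Check the chord (convexity) inequality
#   f(t*x1 + (1-t)*x2) <= t*f(x1) + (1-t)*f(x2)
# for the strictly convex function f(x) = x**2.
f = lambda x: x**2
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-10, 10, size=2)
t = np.linspace(0.0, 1.0, 101)
lhs = f(t * x1 + (1 - t) * x2)      # function evaluated along the segment
rhs = t * f(x1) + (1 - t) * f(x2)   # the chord between the two endpoints
assert np.all(lhs <= rhs + 1e-12)   # the chord lies on or above the graph
```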
7.3 Losses
Per-sample error notation is \(e_i = y_i - \hat {y}_i, i=1,\ldots ,M\). The corresponding vector notation is \(\be = \by - \hat {\by }\). Note that the order of subtraction does not matter in the context of the following material.
Some of the following losses are also used as metrics.
7.3.1 Mean-squared error (MSE)
• Continuous, differentiable, convex
MSE loss (also termed L2-loss) and metric is
\(\seteqnumber{0}{}{3}\)\begin{equation} \begin{aligned} J(\by ,\hat {\by }) &= \frac {1}{M}\norm {\by - \hat {\by }}^2=\frac {1}{M}\be ^T\be \\ &= \frac {1}{M}\sum _{i=1}^M (y_i -\hat {y}_i)^2 = \frac {1}{M}\sum _{i=1}^M e_i^2 \end {aligned} \end{equation}
When used as a loss, a factor of \(\frac {1}{2}\) is sometimes applied to “compensate” for the factor of 2 arising in the derivative,
\(\seteqnumber{0}{}{4}\)\begin{equation} \loss (\by ,\hat {\by }) = \frac {1}{2M}\sum _{i=1}^M e_i^2 \end{equation}
Sum-of-squared error (SSE) (2.4) is also used.
Important properties:
• Popular regression loss.
• Popular metric.
• Analytical gradient that is error-dependent.
• Sometimes an analytical solution is available, e.g., the normal equation.
• The main drawback is inherent outlier sensitivity.
7.3.2 RMSE
• Continuous, differentiable, convex
A convenient complementary metric to MSE is the root-MSE (RMSE),
\(\seteqnumber{0}{}{5}\)\begin{equation} \begin{aligned} J(\by ,\hat {\by }) &= \frac {1}{\sqrt {M}}\norm {\by - \hat {\by }}\\ &= \sqrt {\frac {1}{M}\sum _{i=1}^M (y_i -\hat {y}_i)^2} = \sqrt {\frac {1}{M}\sum _{i=1}^M e_i^2} \end {aligned} \end{equation}
• Theoretically, RMSE can also be used as a loss, but it is very similar to MSE, differing only in its gradients.
• Easier human interpretation, since it has the same units as \(y\).
7.3.3 Mean absolute error (MAE)
MAE is used both as loss and metric.
\(\seteqnumber{0}{}{6}\)\begin{equation} \begin{aligned} \loss (\by ,\hat {\by }) &= \frac {1}{M}\sum _{i=1}^M \abs {y_i -\hat {y}_i} = \frac {1}{M}\sum _{i=1}^M \abs {e_i} \end {aligned} \end{equation}
Important properties:
• Popular loss and metric.
• All errors are weighted equally; MAE is therefore less sensitive to outliers than MSE.
• Error-independent gradient magnitude that may result in slower convergence under certain conditions. In particular, the gradient is high even for very small errors. To fix this, a dynamic learning rate that decreases as we move closer to the minimum is required. MSE behaves nicely in this case and converges even with a fixed learning rate: the gradient of the MSE loss is high for larger loss values and decreases as the loss approaches 0, making it more precise at the end of training.
• Non-differentiable at \(e_i = 0\), though without dramatic influence on most learning algorithms.
A brief numerical example of MSE, RMSE and MAE is presented in Table 7.1 and in Fig. 7.2.
| True Values | Predicted Values | MSE | RMSE | MAE |
| (30, 25) | (40, 30) | \(\frac {(40-30)^2}{2} + \frac {(30-25)^2}{2}=62.5\) | \(\sqrt {62.5}=7.91\) | \(\frac {\abs {40-30}}{2} + \frac {\abs {30-25}}{2}=7.5\) |
| (30, 25) | (50, 30) | \(\frac {(50-30)^2}{2} + \frac {(30-25)^2}{2}=212.5\) | \(\sqrt {212.5}=14.6\) | \(\frac {\abs {50-30}}{2} + \frac {\abs {30-25}}{2}=12.5\) |
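The values in Table 7.1 can be reproduced with a few lines of NumPy. This is an illustrative sketch; the function names are ours, not from the text:

```python
import numpy as np

def mse(y, y_hat):
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return np.mean(e**2)

def rmse(y, y_hat):
    return np.sqrt(mse(y, y_hat))

def mae(y, y_hat):
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return np.mean(np.abs(e))

y_true = [30, 25]
print(mse(y_true, [40, 30]))    # 62.5
print(rmse(y_true, [40, 30]))   # ~7.91
print(mae(y_true, [40, 30]))    # 7.5
print(mse(y_true, [50, 30]))    # 212.5
print(rmse(y_true, [50, 30]))   # ~14.6
print(mae(y_true, [50, 30]))    # 12.5
```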
7.3.4 Huber loss
• Lipschitz continuous, differentiable, convex (strictly convex only in the quadratic region \(\abs {e_i}\le \delta \))
\begin{equation} \loss (e_i) = \begin{cases} \dfrac {1}{2}e_i^2 & \abs {e_i}\le \delta \\[7pt] \delta \left (\abs {e_i} - \dfrac {1}{2}\delta \right ) & \text {otherwise} \end {cases} \end{equation}
For small \(e_i\) it behaves like MSE and for larger \(e_i\) like MAE.
The problem with Huber loss is the need to tune the hyperparameter \(\delta \), which is a non-trivial process.
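A minimal NumPy sketch of the piecewise definition above (the function name and the default \(\delta =1\) are our choices) illustrates the quadratic/linear transition:

```python
import numpy as np

def huber(e, delta=1.0):
    """Huber loss per sample: quadratic near zero, linear in the tails."""
    e = np.asarray(e, dtype=float)
    quadratic = 0.5 * e**2
    linear = delta * (np.abs(e) - 0.5 * delta)
    return np.where(np.abs(e) <= delta, quadratic, linear)

print(huber(0.5))    # 0.125 = 0.5 * 0.5**2     (MSE-like region)
print(huber(10.0))   # 9.5   = 1.0 * (10 - 0.5) (MAE-like region)
```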
7.3.5 Log-cosh loss
\(\seteqnumber{0}{}{8}\)\begin{equation} \loss (e_i) = \log {\cosh {e_i}} \end{equation}
Properties:
• Twice differentiable everywhere, \(\dfrac {\partial }{\partial e_i}L(e_i)=\tanh {e_i}\).
• Approximation:
\(\seteqnumber{0}{}{9}\)\begin{equation} L(e_i)\approx \begin{cases} \dfrac {e_i^2}{2} & e_i \text { small}\\[5pt] \abs {e_i}-\log (2) & e_i \text { large} \end {cases} \end{equation}
• No hyper-parameters.
• Similar to Huber loss with \(\delta =1\).
• Requires non-trivial error handling; otherwise, the optimization may get stuck in either of the two regions.
• Explicit hyper-parameter optimization of the Huber loss is often recommended instead.
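One numerical pitfall worth noting: \(\cosh (e_i)\) overflows in double precision for \(\abs {e_i}\) around 710, so practical implementations typically use the identity \(\log \cosh x = \abs {x} + \log (1+e^{-2\abs {x}}) - \log 2\). A sketch (function name ours):

```python
import numpy as np

def logcosh(e):
    """Numerically stable log-cosh via
    log(cosh(e)) = |e| + log1p(exp(-2|e|)) - log(2);
    the naive np.log(np.cosh(e)) overflows for |e| around 710."""
    a = np.abs(np.asarray(e, dtype=float))
    return a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0)

print(np.isclose(logcosh(2.0), np.log(np.cosh(2.0))))  # True: matches naive form
print(np.isfinite(logcosh(1e6)))                       # True: no overflow
```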
7.3.6 Cauchy
\(\seteqnumber{0}{}{10}\)\begin{equation} \loss (e_i) = \log \left (1+\left (\frac {e_i}{d}\right )^2\right ) \end{equation}
• \(d\) is the “sharpness” parameter.
• More robust against outliers than MAE, but less so than the atan loss.
7.3.7 Atan
\(\seteqnumber{0}{}{11}\)\begin{equation} \loss (e_i) = \arctan (e_i^2) \end{equation}
• Atan tends to \(\pi /2\) as its input tends to infinity, and its derivative tends to 0.
• This means extreme outliers will have a negligible effect on the search direction compared to non-outliers.
• It is important to ensure the data is well scaled and that the starting point is a reasonable guess at the true solution, if possible (similar to the log-cosh loss).
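A short sketch (NumPy; function names ours) illustrating how both robust losses damp extreme outliers, with atan saturating at \(\pi /2\):

```python
import numpy as np

def cauchy(e, d=1.0):
    # d is the "sharpness" parameter
    return np.log(1.0 + (np.asarray(e, dtype=float) / d) ** 2)

def atan_loss(e):
    return np.arctan(np.asarray(e, dtype=float) ** 2)

# An extreme outlier barely moves the atan objective (it saturates
# at pi/2), while the Cauchy loss keeps growing, but only logarithmically.
print(atan_loss(1e6))   # ~ pi/2
print(cauchy(1e6))      # ~ log(1e12) ~ 27.6
```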
7.3.8 Mean Squared Logarithmic Error (MSLE)
MSLE is the relative difference between the log-transformed actual and predicted values.
\(\seteqnumber{0}{}{12}\)\begin{equation} \begin{aligned} \loss (y_i,\hat {y}_i) &= \frac {1}{M}\sum _{i=1}^M \left (\log (y_i+1)-\log (\hat {y}_i+1)\right )^2\\ & = \frac {1}{M}\sum _{i=1}^M\left (\log \frac {y_i+1}{\hat {y}_i+1}\right )^2 \end {aligned} \end{equation}
‘1’ is added to both \(y\) and \(\hat {y}\) for mathematical convenience, since \(\log (0)\) is undefined while both \(y\) and \(\hat {y}\) can be 0.
• Addresses \(y\) with a high dynamic range, i.e., relative error; less useful for a low dynamic range.
• MSLE tries to treat small and large differences between the actual and predicted values similarly, e.g., in Table 7.2.
• Penalizes underestimated values more than overestimated values.
• Root MSLE (RMSLE) is also used, e.g., in scikit-learn.
| True Values | Predicted Values | MSE Loss | MSLE Loss |
| 40 | 30 | 100 | 0.0782 |
| 4000 | 3000 | 1,000,000 | 0.0827 |
| 20 | 10 | 100 | 0.4181 |
| 20 | 30 | 100 | 0.1517 |
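The MSLE values in Table 7.2 can be reproduced directly from the definition; `np.log1p(x)` computes \(\log (x+1)\). An illustrative sketch (function name ours):

```python
import numpy as np

def msle(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    # log1p(x) = log(x + 1), matching the '+1' in the definition
    return np.mean((np.log1p(y) - np.log1p(y_hat)) ** 2)

print(round(float(msle(40, 30)), 4))      # 0.0782
print(round(float(msle(4000, 3000)), 4))  # 0.0827
print(round(float(msle(20, 10)), 4))      # 0.4181 (underestimate)
print(round(float(msle(20, 30)), 4))      # 0.1517 (overestimate)
```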
7.4 Relative Metrics
Relative squared error (RSE)
Normalized MSE loss.
\(\seteqnumber{0}{}{13}\)\begin{equation} J(y_i,\hat {y}_i) = \dfrac {\displaystyle \sum _i \left ( y_i - \hat {y}_i\right )^2}{\displaystyle \sum _i\left (y_i-\overline {y}\right )^2}=\dfrac {MSE}{\Var [\by ]} =\frac {\norm {\by - \hat {\by }}^2}{\norm {\by - \bar {\by }}^2} \end{equation}
Shows the fraction of the unexplained variance, since \(\hat {y}_i = \bar {y}\) is the minimum-MSE predictor when the input data and \(\by \) are statistically independent. Closer to 0 is better.
R2
The common metric in social sciences,
\(\seteqnumber{0}{}{14}\)\begin{equation} R^2 = 1 - RSE \end{equation}
Opposite to RSE, i.e., \(R^2\) close to 1 is better. When the model performs worse than the mean baseline, it is negative.
For example, an \(R^2\) of 40% indicates that your model has reduced the mean squared error by 40% compared to the baseline, which is the mean model. This is the same as an RSE of 60%.
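A small sketch (NumPy; function names ours) of RSE and \(R^2\), showing the three regimes: perfect prediction, the mean baseline, and a model worse than the mean:

```python
import numpy as np

def rse(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def r2(y, y_hat):
    return 1.0 - rse(y, y_hat)

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r2(y, y))                               # 1.0: perfect prediction
print(r2(y, np.full(4, y.mean())))            # 0.0: the mean baseline
print(r2(y, np.array([4.0, 3.0, 2.0, 1.0])))  # -3.0: worse than the mean
```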
Normalized Root Mean Squared Error Expressed as a percentage, defined as:
\(\seteqnumber{0}{}{15}\)\begin{equation} J(y_i,\hat {y}_i) =100\left (1-\frac {\norm {\by - \hat {\by }}}{\norm {\by - \bar {\by }}}\right ) =100\left (1-\sqrt {RSE}\right ) \end{equation}
Used in Matlab.
Relative absolute error (RAE)
Normalized MAE loss.
\(\seteqnumber{0}{}{16}\)\begin{equation} J(y_i,\hat {y}_i) = \dfrac {\displaystyle \sum _i \abs {y_i - \hat {y}_i}}{\displaystyle \sum _i\abs {y_i-\overline {y}}} = \frac {MAE}{\frac {1}{M}\displaystyle \sum _i\abs {y_i-\overline {y}}} \end{equation}
Closer to 0 is better.
Mean Absolute Percentage Error (MAPE)
Scaled error metric.
\(\seteqnumber{0}{}{17}\)\begin{equation} J = \frac {1}{M}\sum _{i=1}^M\frac {\abs {y_i-\hat {y}_i}}{\abs {y_i}}\times 100\% \end{equation}
• Beware of small denominators!
• Can exceed 100%.
• Asymmetric, as described in Table 7.3.
| True Values | Predicted Values | Absolute Error | MAPE |
| 100 | 60 | 40 | 40% |
| 20 | 60 | 40 | 200% |
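The asymmetry is easy to reproduce from the definition of MAPE above. A sketch (function name ours):

```python
import numpy as np

def mape(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean(np.abs(y - y_hat) / np.abs(y)) * 100.0

# The same absolute error of 40 gives very different percentages,
# depending on the magnitude of the true value in the denominator:
print(mape(100, 60))  # 40.0
print(mape(20, 60))   # 200.0
```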
Additional Reading Additional reading on regression metrics: [?]
7.5 Number of Parameters Penalty
Adding parameters improves the fit to the training data but may result in overfitting. These metrics attempt to resolve this problem by introducing a penalty term for the number of parameters in the model with the MSE loss.
All these have lengthy theoretical justification that is not provided here.
• \(N\) is the number of parameters,
• \(M\) is the number of data-points.
Main assumption: the residuals have a Gaussian distribution.
Akaike’s Final Prediction Error (FPE)
Akaike’s Final Prediction Error (FPE) criterion provides a measure of model quality.
\(\seteqnumber{0}{}{18}\)\begin{equation} FPE = MSE\frac {1+N/M}{1-N/M} = MSE\frac {M+N}{M-N} \end{equation}
Akaike’s Information Criterion (AIC) Penalizes the number of parameters,
\(\seteqnumber{0}{}{19}\)\begin{equation} AIC = 2N + M\ln (MSE) \end{equation}
AICc is AIC with a correction for small sample sizes
\(\seteqnumber{0}{}{20}\)\begin{equation} AICc = AIC + 2N\frac {N+1}{M-N-1} \end{equation}
Bayesian Information Criterion (BIC)
\(\seteqnumber{0}{}{21}\)\begin{equation} BIC = M\ln (MSE) + N\ln (M) \end{equation}
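All four criteria can be computed side by side. A sketch (function name ours), comparing a small model against a larger one that must lower the MSE enough to justify its extra parameters:

```python
import numpy as np

def information_criteria(mse_val, N, M):
    """FPE, AIC, AICc and BIC for a model with N parameters
    fitted on M data points (Gaussian residuals assumed)."""
    fpe = mse_val * (M + N) / (M - N)
    aic = 2 * N + M * np.log(mse_val)
    aicc = aic + 2 * N * (N + 1) / (M - N - 1)
    bic = M * np.log(mse_val) + N * np.log(M)
    return fpe, aic, aicc, bic

# The larger model barely reduces the MSE, so its AIC is worse:
print(information_criteria(1.00, N=3, M=100))  # AIC = 6.0
print(information_criteria(0.98, N=6, M=100))  # AIC ~ 9.98: not justified
```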
7.6 Summary
7.6.1 Loss Selection Guidelines
• MSE: Default choice for clean data; analytical gradients enable fast convergence.
• MAE: Prefer when outliers are present; requires an adaptive learning rate.
• Huber: Best of both worlds when \(\delta \) can be tuned via cross-validation.
• MSLE: Use for high-dynamic-range targets where relative error matters.
• Cauchy/Atan: Heavy outlier contamination; require careful initialization.
7.6.2 Metric Selection Guidelines
• RMSE: Same units as the target; intuitive for stakeholders.
• \(R^2\): Explains variance reduction vs. the mean baseline; use for model comparison.
• MAPE: Percentage interpretation; avoid when \(y\) approaches zero.
7.6.3 Common Pitfalls
• Scale confusion: MSE/RMSE/MAE are scale-dependent; compare across datasets only after normalization or via dimensionless metrics.
• MAPE near zero: Avoid MAPE when \(y\) can be small or cross zero.
• \(R^2\) misuse: High \(R^2\) does not guarantee good predictions; inspect residuals and error magnitudes.
• Non-convex robust losses: Without careful initialization/scaling, they may converge to poor local minima.
• MAE with fixed learning rate: May oscillate near the optimum; use learning-rate decay or adaptive optimizers.