\section{Modeling} \label{sec_2_4_Modeling}
In this section, the fourth main step of \gls{cnmc}, i.e., modeling, is elaborated. The data and workflow are depicted in figure \ref{fig_42}. The step comprises two main sub-tasks: modeling the \glsfirst{cpevol} and modeling the transition property tensors $\bm Q / \bm T$. As usual, the settings are defined in \emph{settings.py} and the extracted attributes are distributed to the sub-tasks. Modeling the \gls{cpevol} and modeling the $\bm Q / \bm T$ tensors can be executed separately from each other. If the output of one of the two modeling sub-steps is already at hand, \gls{cnmc} is not forced to recalculate both sub-steps. Since the tracked states are used as training data, they are a prerequisite for both modeling parts. The modeling of the centroid position is explained in the upcoming subsection \ref{subsec_2_4_1_CPE}, followed by the explanation of the transition properties in subsection \ref{subsec_2_4_2_QT}. A comparison between this \gls{cnmc} and the \emph{first CNMc} version is provided at the end of the respective subsections. The results of both modeling steps can be found in sections \ref{sec_3_2_MOD_CPE} and \ref{sec_3_3_SVD_NMF}.
\begin{figure} [!h]
	\hspace*{-4cm}
	\resizebox{1.2\textwidth}{!}{
		\input{2_Figures/2_Task/2_Modeling/0_Modeling.tikz}
	}
	\caption{Data and workflow of the fourth step: Modeling}
	\label{fig_42}
\end{figure}

\subsection{Modeling the centroid position evolution} \label{subsec_2_4_1_CPE}
In this subsection, the modeling of the \gls{cpevol} is described. The objective is to find a surrogate model which returns all $K$ centroid positions for an unseen model parameter value $\beta_{unseen}$. The training data for this purpose are the tracked centroids from the previous step, which is described in section \ref{sec_2_3_Tracking}. To explain the modeling of the \emph{CPE}, figure \ref{fig_43} shall be inspected. The model parameter values with which the model is trained, $\vec{\beta}_{tr}$, are used to generate a so-called candidate library matrix $\boldsymbol{\Theta}\,(\vec{\beta}_{tr})$. The candidate library matrix $\boldsymbol{\Theta}\,(\vec{\beta}_{tr})$ is obtained by making use of a \emph{pySindy} function \cite{Silva2020,Kaptanoglu2022,Brunton2016}. In \cite{Brunton2016} the term $\boldsymbol{\Theta}\,(\vec{\beta}_{tr})$ is explained well. In brief terms, it allows the construction of a matrix which comprises the output of user-defined functions. These functions could be, e.g., linear, polynomial, trigonometric or any other non-linear functions. Made-up functions that include logical conditions can also be applied. \newline
\begin{figure} [!h]
	\hspace*{-4cm}
	\resizebox{1.2\textwidth}{!}{
		\input{2_Figures/2_Task/2_Modeling/1_Pos_Mod.tikz}
	}
	\caption{Data and workflow of modeling the \glsfirst{cpevol}}
	\label{fig_43}
\end{figure}
Since the goal is not to explain how to operate \emph{pySindy} \cite{Brunton2016}, the curious reader is referred to the very extensive \emph{pySindy} online documentation and \cite{Silva2020,Kaptanoglu2022}. Nevertheless, to understand $\boldsymbol{\Theta}\,(\vec{\beta}_{tr})$, equation \eqref{eq_20} shall be considered. In this example, three different functions, denoted as $f_i$ in the first row, are employed. The remaining rows contain the output of the chosen $f_i$. Furthermore, $n$ is the number of samples, i.e., the size of $\vec{\beta}_{tr}$, denoted $n_{\beta,tr}$, and $m$ denotes the number of features, i.e., the number of functions $f_i$.
\newline
\begin{equation}
	\boldsymbol{\Theta}_{example\,(n \times m)}\,(\vec{\beta}_{tr}) =
	% \renewcommand\arraystretch{3}
	\setlength{\arraycolsep}{10pt}
	\begin{bmatrix}
		f_1 = \beta & f_2 = \beta^2 & f_3 = \cos(\beta)^2 - \exp\left(\dfrac{\beta}{-0.856} \right) \\[1.5em]
		1 & 1^2 & \cos(1)^2 - \exp\left(\dfrac{1}{-0.856} \right) \\[1.5em]
		2 & 2^2 & \cos(2)^2 - \exp\left(\dfrac{2}{-0.856} \right) \\[1.5em]
	\end{bmatrix}
	\label{eq_20}
\end{equation}
The actual candidate library matrix $\boldsymbol{\Theta}\,(\vec{\beta}_{tr})$ incorporates a quadratic polynomial, the inverse $\frac{1}{\beta}$, the exponential $\exp(\beta)$ and three frequencies of cosine and sine, i.e., $\cos(\vec{\beta}_{freq}), \ \sin(\vec{\beta}_{freq})$, where $\vec{\beta}_{freq} = [1, \, 2,\, 3]$. Much more complex candidate library matrices $\boldsymbol{\Theta}\,(\vec{\beta}_{tr})$ are available in \gls{cnmc} and can be selected if desired. Nonetheless, the default $\boldsymbol{\Theta}\,(\vec{\beta}_{tr})$ is the one described above. Once $\boldsymbol{\Theta}\,(\vec{\beta}_{tr})$ is generated, the system of equations \eqref{eq_21} is solved. Note that this is very similar to solving the well-known system of equations $\bm A \, \vec{x} = \vec{y}$. The difference is that while $\vec{x}, \, \vec{y}$ can also be vectors in the case of \eqref{eq_21}, in general they are the matrices $\bm{X}, \, \bm Y$, respectively. The matrix $\bm{X}$ is the desired output; it contains the coefficients which assign importance to the used functions $f_i$. The matrix $\bm Y$ contains the targets, i.e., the known output for the chosen functions $f_i$. Mathematically, there is no difference between $\bm A$ and $\boldsymbol{\Theta}\,(\vec{\beta}_{tr})$.\newline
\begin{equation}
	\boldsymbol{\Theta}\,(\vec{\beta}_{tr}) \: \bm X = \bm Y
	\label{eq_21}
\end{equation}
Staying within the \emph{pySindy} environment, the system of equations \eqref{eq_21} is solved by means of the optimizer \emph{SR3}, which is implemented in \emph{pySindy}. Details and some advantages of the \emph{SR3} optimizer can be found in \cite{SR3}. Nevertheless, two main points shall be stated. It is highly unlikely that $\boldsymbol{\Theta}\,(\vec{\beta}_{tr}),\: \bm X,\, \bm Y$ lead to a well-posed problem, i.e., one where the number of equations equals the number of unknowns and a unique solution exists. In most cases the configuration will be ill-posed, i.e., the number of equations differs from the number of unknowns. In the latter case, two scenarios are possible: the configuration could result in an over- or under-determined system of equations.\newline
For an over-determined system, there are more equations than unknowns. Thus, generally, no outcome that satisfies equation \eqref{eq_21} exists. In order to find a representation that comes close to a solution, an error metric is defined as the objective function for optimization. There are many error metrics or norms; some commonly used ones \cite{Brunton2019} are given in equations \eqref{eq_22} to \eqref{eq_24}, where $f(x_k)$ are true values of a function and $y_k$ are their corresponding predictions. An under-determined system has more unknowns than equations, thus infinitely many solutions exist. To find one prominent solution, again, optimization is performed. Note that for practical applications, penalization or regularization parameters are exploited as additional constraints within the definition of the optimization problem.
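To make this concrete, the following minimal Python sketch illustrates how a small candidate library matrix in the spirit of equation \eqref{eq_20} can be assembled with \emph{pySindy} and how the resulting system \eqref{eq_21} can then be solved. This is an illustrative sketch only: the targets $\bm Y$ are random stand-ins for the tracked centroid coordinates, and a plain least-squares call replaces the \emph{SR3} optimizer that \gls{cnmc} actually employs.
\begin{verbatim}
import numpy as np
import pysindy as ps

# Training model parameter values beta_tr (n samples, one feature)
beta_tr = np.array([[1.0], [2.0], [3.0], [4.0]])

# Candidate functions f_i, mirroring the example in equation (eq_20)
library_functions = [
    lambda b: b,                                    # f_1 = beta
    lambda b: b ** 2,                               # f_2 = beta^2
    lambda b: np.cos(b) ** 2 - np.exp(b / -0.856),  # f_3
]
function_names = [
    lambda b: b,
    lambda b: b + "^2",
    lambda b: "cos(" + b + ")^2 - exp(" + b + "/-0.856)",
]

library = ps.CustomLibrary(
    library_functions=library_functions,
    function_names=function_names,
)

# Candidate library matrix Theta(beta_tr) of shape (n x m)
Theta = library.fit_transform(beta_tr)

# Hypothetical targets Y, standing in for tracked centroid coordinates
Y = np.random.default_rng(0).random((beta_tr.shape[0], 6))

# Solve Theta X = Y in a least-squares sense (stand-in for SR3);
# for under-determined systems lstsq returns the minimum-norm solution
X, *_ = np.linalg.lstsq(Theta, Y, rcond=None)
\end{verbatim}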
For more about over- and under-determined systems, as well as for deploying optimization to find a satisfying result, the reader is referred to \cite{Brunton2019}.\newline
\begin{equation}
	E_{\infty} = \max_{1 \le k \le n} |f(x_k) - y_k|
	\label{eq_22}
\end{equation}
\begin{equation}
	E_{1} = \frac{1}{n} \sum_{k=1}^{n} |f(x_k) - y_k|
	\label{eq_23}
\end{equation}
\begin{equation}
	E_{2} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left( f(x_k) - y_k \right)^2 }
	\label{eq_24}
\end{equation}
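For completeness, the following short Python sketch evaluates the three norms of equations \eqref{eq_22} to \eqref{eq_24}; the arrays are hypothetical data chosen purely for illustration.
\begin{verbatim}
import numpy as np

# f_xk: function values f(x_k); y_k: corresponding predictions
# (hypothetical data for illustration)
f_xk = np.array([1.02, 1.98, 3.05, 3.97])
y_k  = np.array([1.00, 2.00, 3.00, 4.00])

residual = f_xk - y_k
n = residual.size

E_inf = np.max(np.abs(residual))            # maximum error, eq. (eq_22)
E_1   = np.sum(np.abs(residual)) / n        # mean absolute error, eq. (eq_23)
E_2   = np.sqrt(np.sum(residual ** 2) / n)  # root-mean-square error, eq. (eq_24)
\end{verbatim}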