# Methodology {#sec-chap_2_Methodology}

In this chapter, the entire pipeline for designing the proposed \gls{cnmc} is elaborated and the ideas behind the individual processes are explained. Results from the tracking step onwards will be presented in chapter [-@sec-ch_3]. \gls{cnmc} consists of multiple main process steps or stages. First, a broad overview of the \gls{cnmc} workflow shall be given, followed by a detailed explanation of each major operational step. The implemented process stages are presented in the same order in which they are executed in \gls{cnmc}. However, \gls{cnmc} is not forced to go through every stage. If the output of some steps is already available, the execution of the respective steps can be skipped.

The main idea behind such an implementation is to prevent computing the same task multiple times. Computational time can be reduced if the output of some \gls{cnmc} steps is already available. Consequently, it allows users to be flexible in their explorations. It could be the case that only one step of \gls{cnmc} is to be examined with different settings, or even with newly implemented functions, without running the full \gls{cnmc} pipeline. Let this one \gls{cnmc} step be denoted as C; then it is possible to skip steps A and B if their output has already been calculated and is thus available. Likewise, the upcoming steps can be skipped or activated depending on whether their respective outcomes are needed. Simply put, the mentioned flexibility makes it possible to load the existing data for A and B and execute only C. Executing follow-up steps or loading their data is also made selectable, as sketched below.
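To illustrate the principle, a minimal sketch of such a stage-skipping mechanism with cached intermediate results is given below. All flag names, file paths and placeholder computations are hypothetical and serve illustration purposes only; they are not taken from the actual \gls{cnmc} implementation or its *settings.py*.

```python
# Minimal sketch of stage skipping with cached intermediate results.
# All names (flags, paths, placeholder computations) are illustrative
# and do not reproduce the actual CNMc settings.py.
import pickle
from pathlib import Path

CACHE = Path("cache")

def run_or_load(run_step: bool, name: str, compute):
    """Run a pipeline stage, or load its previously cached output from disk."""
    target = CACHE / f"{name}.pkl"
    if run_step or not target.exists():
        result = compute()
        CACHE.mkdir(exist_ok=True)
        with open(target, "wb") as file:
            pickle.dump(result, file)
        return result
    with open(target, "rb") as file:
        return pickle.load(file)

# Example: steps A and B are skipped (their cached results are loaded),
# only step C is executed.
settings = {"step_A": False, "step_B": False, "step_C": True}
data  = run_or_load(settings["step_A"], "step_A", lambda: list(range(10)))
model = run_or_load(settings["step_B"], "step_B", lambda: sum(data))
if settings["step_C"]:
    result_C = model + 1  # placeholder for the actual computation of step C
```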
Since the tasks of this thesis required a substantial amount of coding, it is important to mention the used programming language and the dependencies. As the programming language, *Python 3* [@VanRossum2009] was chosen. Of the libraries, only a few important ones will be mentioned, because the number of used libraries is high. Note that each used module is freely available online and no licenses need to be purchased.

The important libraries in terms of performing the actual calculations are *NumPy* [@harris2020array], *SciPy* [@2020SciPy-NMeth], *Scikit-learn* [@scikit-learn] and *pySindy* [@Silva2020; @Kaptanoglu2022]; *sparse* is used for multi-dimensional sparse matrix management, and for plotting only *plotly* [@plotly] was deployed. One of the reasons why *plotly* is preferred over *Matplotlib* [@Hunter:2007] is the post-processing capabilities that are now available. Note that the previous \gls{cnmc} version used *Matplotlib* [@Hunter:2007], which in this work has been fully replaced by *plotly* [@plotly]. More reasons why this modification is useful, as well as the newly implemented post-processing capabilities, will be given in the upcoming sections.

For local coding, the author's Linux-Mint-based laptop with the following hardware was deployed: CPU: Intel Core i7-4702MQ \gls{cpu} @ 2.20 GHz × 4, RAM: 16 GB. The Institute of Fluid Dynamics of the Technische Universität Braunschweig also supported this work by providing two more powerful computational resources. Their hardware specifications will not be listed, because all computations and results elaborated in this thesis can be obtained with the hardware described above (the author's laptop). Nevertheless, the two provided resources shall be mentioned, and it shall be explained whether \gls{cnmc} benefits from faster computers.

The first machine is called *Buran*; it is a powerful Linux-based workstation to which access is provided directly by the Chair of Fluid Dynamics. The second resource is *Phoenix*, the high-performance cluster available across the Technische Universität Braunschweig. The first step, in which the dynamical systems are solved with an \gls{ode} solver, is written in a parallel manner. This step can, if specified in the *settings.py* file, be performed in parallel and thus benefits from multiple available cores. However, most of the implemented \gls{ode}s are solved within a few seconds. There are also some implemented dynamical systems whose \gls{ode} solution can take a few minutes. Applying \gls{cnmc} to the latter dynamical systems requires solving their \gls{ode}s for multiple different model parameter values. Thus, deploying the parallelization is advisable for these time-consuming \gls{ode}s; a minimal sketch of such a parallel parameter sweep is given at the end of this section.

By far the most time-intensive part of the improved \gls{cnmc} is the clustering step. The main computation for this step is done with *Scikit-learn* [@scikit-learn]. It is heavily parallelized, and the computation time can be reduced drastically when multiple threads are available. Apart from that, *NumPy* and *SciPy* are well-optimized libraries and are assumed to benefit from powerful computers as well. In summary, a powerful machine is certainly advised when multiple dynamical systems with a range of different settings are to be investigated, since parallelization is available. Yet for executing \gls{cnmc} on a single dynamical system, a regular laptop can be regarded as a sufficient tool.
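To give an impression of the parallel \gls{ode} stage mentioned above, the following sketch distributes the integration of a dynamical system over several model parameter values across multiple processes. The Lorenz system, the parameter values and all function names are chosen purely for illustration and are not taken from the actual \gls{cnmc} code; they merely exemplify how such a parameter sweep could benefit from multiple cores.

```python
# Hedged sketch: integrating one dynamical system for several model
# parameter values in parallel. The Lorenz system and all names are
# illustrative stand-ins, not the actual CNMc implementation.
import numpy as np
from multiprocessing import Pool
from scipy.integrate import solve_ivp

def lorenz(t, state, rho, sigma=10.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system."""
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

def solve_for_parameter(rho):
    """Integrate the system for one model parameter value rho."""
    solution = solve_ivp(lorenz, (0.0, 50.0), [1.0, 1.0, 1.0],
                         args=(rho,), t_eval=np.linspace(0.0, 50.0, 5001))
    return rho, solution.y

if __name__ == "__main__":
    rho_values = [24.0, 26.0, 28.0, 30.0]      # model parameter sweep
    with Pool(processes=4) as pool:            # one worker per parameter value
        trajectories = dict(pool.map(solve_for_parameter, rho_values))
```

In contrast to such an explicit process pool, the parallelism of the clustering step is handled internally by *Scikit-learn* and therefore requires no user-side parallel code.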