# Methodology {#sec-chap_2_Methodology}

In this chapter, the entire pipeline for designing the proposed \gls{cnmc} is elaborated and the ideas behind the individual processes are explained. Results from the tracking step onwards will be presented in chapter [-@sec-ch_3]. \gls{cnmc} consists of multiple main process steps or stages. First, a broad overview of the \gls{cnmc} workflow shall be given, followed by a detailed explanation of each major operational step. The implemented process stages are presented in the same order in which they are executed in \gls{cnmc}. However, \gls{cnmc} is not forced to go through every stage. If the output of some steps is already available, the execution of the respective steps can be skipped.

The main idea behind such an implementation is to prevent computing the same task multiple times. Computational time can be reduced if the output of some \gls{cnmc} steps is already available. Consequently, it allows users to be flexible in their explorations. It could be the case that only one step of \gls{cnmc} is to be examined with different settings, or even with newly implemented functions, without running the full \gls{cnmc} pipeline. Let this one \gls{cnmc} step be denoted as C; then it is possible to skip steps A and B if their output has already been calculated and is thus available. Likewise, the upcoming steps can be skipped or activated depending on whether their respective outcomes are needed. Simply put, the mentioned flexibility makes it possible to load the existing data for A and B and execute only C. Executing follow-up steps or loading their data is also made selectable, as sketched below.
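To illustrate the principle, a minimal sketch of such a stage-skipping mechanism with cached intermediate results is given below. All flag names, file paths and placeholder computations are hypothetical and serve illustration purposes only; they are not taken from the actual \gls{cnmc} implementation or its *settings.py*.

```python
# Minimal sketch of stage skipping with cached intermediate results.
# All names (flags, paths, placeholder computations) are illustrative
# and do not reproduce the actual CNMc settings.py.
import pickle
from pathlib import Path

CACHE = Path("cache")

def run_or_load(run_step: bool, name: str, compute):
    """Run a pipeline stage, or load its previously cached output from disk."""
    target = CACHE / f"{name}.pkl"
    if run_step or not target.exists():
        result = compute()
        CACHE.mkdir(exist_ok=True)
        with open(target, "wb") as file:
            pickle.dump(result, file)
        return result
    with open(target, "rb") as file:
        return pickle.load(file)

# Example: steps A and B are skipped (their cached results are loaded),
# only step C is executed.
settings = {"step_A": False, "step_B": False, "step_C": True}
data  = run_or_load(settings["step_A"], "step_A", lambda: list(range(10)))
model = run_or_load(settings["step_B"], "step_B", lambda: sum(data))
if settings["step_C"]:
    result_C = model + 1  # placeholder for the actual computation of step C
```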
Since the tasks of this thesis required a substantial amount of coding, it is important to mention the used programming language and the dependencies. As the programming language, *Python 3* [@VanRossum2009] was chosen. Of the libraries, only a few important ones will be mentioned, because the number of used libraries is high. Note that each used module is freely available online and no licenses need to be purchased.

The important libraries in terms of performing the actual calculations are *NumPy* [@harris2020array], *SciPy* [@2020SciPy-NMeth], *Scikit-learn* [@scikit-learn] and *pySindy* [@Silva2020; @Kaptanoglu2022]; *sparse* is used for multi-dimensional sparse matrix management, and for plotting only *plotly* [@plotly] was deployed. One of the reasons why *plotly* is preferred over *Matplotlib* [@Hunter:2007] is the post-processing capabilities that are now available. Note that the previous \gls{cnmc} version used *Matplotlib* [@Hunter:2007], which in this work has been fully replaced by *plotly* [@plotly]. More reasons why this modification is useful, as well as the newly implemented post-processing capabilities, will be given in the upcoming sections.

For local coding, the author's Linux-Mint-based laptop with the following hardware was deployed: CPU: Intel Core i7-4702MQ \gls{cpu} @ 2.20 GHz × 4, RAM: 16 GB. The Institute of Fluid Dynamics of the Technische Universität Braunschweig also supported this work by providing two more powerful computational resources. Their hardware specifications will not be listed, because all computations and results elaborated in this thesis can be obtained with the hardware described above (the author's laptop). Nevertheless, the two provided resources shall be mentioned, and it shall be explained whether \gls{cnmc} benefits from faster computers.

The first machine is called *Buran*; it is a powerful Linux-based workstation to which access is provided directly by the Chair of Fluid Dynamics. The second resource is *Phoenix*, the high-performance cluster available across the Technische Universität Braunschweig. The first step, in which the dynamical systems are solved with an \gls{ode} solver, is written in a parallel manner. This step can, if specified in the *settings.py* file, be performed in parallel and thus benefits from multiple available cores. However, most of the implemented \gls{ode}s are solved within a few seconds. There are also some implemented dynamical systems whose \gls{ode} solution can take a few minutes. Applying \gls{cnmc} to the latter dynamical systems requires solving their \gls{ode}s for multiple different model parameter values. Thus, deploying the parallelization is advisable for these time-consuming \gls{ode}s; a minimal sketch of such a parallel parameter sweep is given at the end of this section.

By far the most time-intensive part of the improved \gls{cnmc} is the clustering step. The main computation for this step is done with *Scikit-learn* [@scikit-learn]. It is heavily parallelized, and the computation time can be reduced drastically when multiple threads are available. Apart from that, *NumPy* and *SciPy* are well-optimized libraries and are assumed to benefit from powerful computers as well. In summary, a powerful machine is certainly advised when multiple dynamical systems with a range of different settings are to be investigated, since parallelization is available. Yet for executing \gls{cnmc} on a single dynamical system, a regular laptop can be regarded as a sufficient tool.
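To give an impression of the parallel \gls{ode} stage mentioned above, the following sketch distributes the integration of a dynamical system over several model parameter values across multiple processes. The Lorenz system, the parameter values and all function names are chosen purely for illustration and are not taken from the actual \gls{cnmc} code; they merely exemplify how such a parameter sweep could benefit from multiple cores.

```python
# Hedged sketch: integrating one dynamical system for several model
# parameter values in parallel. The Lorenz system and all names are
# illustrative stand-ins, not the actual CNMc implementation.
import numpy as np
from multiprocessing import Pool
from scipy.integrate import solve_ivp

def lorenz(t, state, rho, sigma=10.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system."""
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

def solve_for_parameter(rho):
    """Integrate the system for one model parameter value rho."""
    solution = solve_ivp(lorenz, (0.0, 50.0), [1.0, 1.0, 1.0],
                         args=(rho,), t_eval=np.linspace(0.0, 50.0, 5001))
    return rho, solution.y

if __name__ == "__main__":
    rho_values = [24.0, 26.0, 28.0, 30.0]      # model parameter sweep
    with Pool(processes=4) as pool:            # one worker per parameter value
        trajectories = dict(pool.map(solve_for_parameter, rho_values))
```

In contrast to such an explicit process pool, the parallelism of the clustering step is handled internally by *Scikit-learn* and therefore requires no user-side parallel code.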