<!--%------------------------------- SHIFT FROM INTRODUCTION ------------------------>
<!-- % ===================================================================== -->
<!-- % ============= Workflow ============================================== -->
<!-- % ===================================================================== -->
## CNMc's data and workflow {#sec-sec_2_1_Workflow}
In this section, the 5 main steps that characterize \gls{cnmc} will be discussed.
Before diving directly into \gls{cnmc}'s workflow, a few remarks
are in order.
First, \gls{cnmc} is written from scratch; it is not simply an updated version of the *first CNMc* described in subsection
[-@sec-subsec_1_1_3_first_CNMc].
Therefore, the workflow described in this section for \gls{cnmc} will not match that of *first CNMc*, e.g., *first CNMc* had no concept of a *settings.py* file and did not utilize *Plotly* [@plotly] to facilitate post-processing capabilities.
The reasons for a fresh start were given in subsection [-@sec-subsec_1_1_3_first_CNMc]:
the difficulty of running *first CNMc* and the time required to adjust it such that a generic dynamical system could be utilized were considered more time-consuming than starting from zero. \newline
Second, the reader is reminded that, although it is called a pipeline or workflow, \gls{cnmc} is not obliged to run the whole workflow. With the *settings.py* file, which will be explained below, it is possible to run only specific selected tasks.
The very broad concept of \gls{cnmc} was already provided at the beginning of chapter [-@sec-chap_1_Intro].
However, instead of providing data of dynamical systems for different model parameter values, the user defines a so-called *settings.py* file and executes \gls{cnmc}.
The outcome of \gls{cnmc} consists, very broadly, of the predicted trajectories and some accuracy measurements, as depicted in figure @fig-fig_1_CNMC_Workflow.
In the following, a more in-depth view shall be given.\newline
Despite its *.py* extension, *settings.py* is a regular *Python* file that essentially contains a single dictionary, thus there is no need to acquire specific knowledge about *Python*.
The syntax of a *Python* dictionary is quite similar to that of a *JSON* dictionary, in that the setting name is supplied within quote marks
and the argument is stated after a colon. In order to understand the main points of \gls{cnmc}, its main data and workflow are depicted in figure @fig-fig_3_Workflow as an XDSM diagram [@Lambe2012].
\newline
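To illustrate the dictionary-based syntax, a minimal *settings.py* could look as follows. Note that all attribute names and values shown here are hypothetical placeholders for illustration, not the exact keys used by \gls{cnmc}:

```python
# Hypothetical sketch of a settings.py file: a plain Python dictionary
# whose "name": value syntax closely resembles a JSON dictionary.
settings = {
    "Data_Generation": {
        "model": "Lorenz",                      # which dynamical system to solve
        "beta_training": [28.0, 30.0, 32.0],    # model parameter values for training
        "beta_unseen": [29.0],                  # values reserved for testing
    },
    "Clustering": {
        "K": 10,                                # number of k-means++ centroids
    },
    "Extras": {
        "informer": True,                       # write the Informer summary file
        "log_file": True,                       # store terminal output in a log file
        "overwriting": False,                   # protect existing results
    },
}
```

Each of the 5 main steps receives its own sub-dictionary, so a setting is always defined next to the stage that consumes it.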
<!-- % ============================================-->
<!-- % ================ 2nd Workflow ==============-->
<!-- % ============================================-->
<!-- NOTE Sideway figure -->
![\gls{cnmc} general workflow overview](../../3_Figs_Pyth/2_Task/0_Workflow.svg){#fig-fig_3_Workflow}
The first action for executing \gls{cnmc} is to define *settings.py*. It contains descriptive information about the entire pipeline, e.g., which dynamical system to use, which model parameters to select for training and which for testing, and which methods to use for modal decomposition and mode regression.
To be precise, it contains all the configuration attributes of all 5 main \gls{cnmc} steps and some other handy extra functions. It is written in
a very clear way such that the settings for the corresponding stages of \gls{cnmc}
and for the extra features can be distinguished at first glance.
First, there are separate dictionaries for each of the 5 steps to ensure that the desired settings are made where they are needed.
Second, instead of regular line breaks, multiline comment blocks with the stage names in the center are used.
Third, almost every *settings.py* attribute is explained with comments.
Fourth, there are some cases where
a specific attribute needs to be reused in other steps.
The user is not required to adapt it manually for all its occurrences, but rather to change it only at the first occurrence, where the considered attribute is defined.
*Python* will automatically ensure that all remaining steps receive the change correctly.
Other capabilities implemented in *settings.py* are mentioned when they are actively exploited.
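This reuse mechanism can be sketched as follows (attribute and step names are again hypothetical): because *settings.py* is ordinary *Python*, a value assigned to a variable once can be referenced in several sub-dictionaries, and changing the single assignment propagates to every step automatically:

```python
# Define the shared attribute once ...
model_name = "Lorenz"

# ... and reference it in every step that needs it; editing the
# single assignment above updates all three entries consistently.
settings = {
    "Data_Generation": {"model": model_name},
    "Modeling": {"model": model_name},
    "Prediction": {"model": model_name},
}
```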
In figure @fig-fig_3_Workflow it can be observed that, after passing *settings.py*, a so-called *Informer* and a log file are obtained.
The *Informer* is a file designed to save all user-defined settings from *settings.py* for each execution of \gls{cnmc}.
Here, too, usability and readability of the output are important, and it has been formatted accordingly. It proves particularly useful when a dynamical system is to be calculated with different settings, e.g., to observe the influence of one or multiple parameters. \newline
One of the important attributes which
can be arbitrarily defined by the user in *settings.py*, and is thus found again in the *Informer*, is the name of the model.
In \gls{cnmc}, multiple dynamical systems are implemented, which can be chosen by simply changing one attribute in *settings.py*.
Since different models can be calculated with the same settings, such a clear and fast way to distinguish between multiple calculations is required.
The name of the model is not only saved in the *Informer* but is also
used to generate a folder, where all of \gls{cnmc}'s output for this single
\gls{cnmc} workflow is stored.
The latter ensures, on the one hand, that the \gls{cnmc} models can be easily distinguished from each other and, on the other hand, that all results of one model are obtained in a structured way.
\newline
When executing \gls{cnmc}, many terminal outputs are displayed. This keeps the user up to date on the current progress on the one hand and shows important results directly on the other.
In case of unsatisfying results, \gls{cnmc} can be aborted immediately, instead of having to compute the entire workflow. In other words, if a computationally expensive \gls{cnmc} task shall be performed, knowing about possible issues in the first steps can
be regarded as a time-saving mechanism.
The terminal outputs are formatted to include the date, time, type of message, the message itself and the place in the code where the message can be found.
The terminal outputs are colored depending on the type of the message, e.g., green is used for successful computations.
Colored terminal outputs are applied for the sake of readability:
more relevant outputs can easily be distinguished from others.
The log file can be considered a memory, since the terminal outputs are saved in it.\newline
The stored terminal outputs have the same format as the terminal output described above, except that no coloring is utilized.
An instance where the log file can be very helpful is the following. Some implemented quality measurements give very significant information about prediction reliability. Comparing different settings in terms of prediction capability would become very challenging if the terminal outputs were lost whenever the \gls{cnmc} terminal is closed. The described *Informer* and the log file can be beneficial as explained; nevertheless, they are optional.
That is, both come as two of the extra features mentioned above and can be turned off in *settings.py*.\newline
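This dual output can be sketched with Python's standard *logging* module (the actual \gls{cnmc} implementation may differ in its details): a colored console handler and an uncolored file handler share one format that carries date, time, message type, the message itself, and the code location.

```python
import logging
import os
import tempfile

# One format string with date/time, severity, message and code location.
FMT = "%(asctime)s | %(levelname)s | %(message)s | %(module)s:%(lineno)d"

class ColorFormatter(logging.Formatter):
    """Colors the console line depending on the message type
    (green for successful/info messages, red for errors)."""
    COLORS = {"INFO": "\033[32m", "WARNING": "\033[33m", "ERROR": "\033[31m"}

    def format(self, record):
        line = super().format(record)
        color = self.COLORS.get(record.levelname, "")
        return f"{color}{line}\033[0m" if color else line

log_path = os.path.join(tempfile.mkdtemp(), "cnmc.log")

logger = logging.getLogger("cnmc_demo")
logger.setLevel(logging.INFO)

# Console handler: colored for readability.
console = logging.StreamHandler()
console.setFormatter(ColorFormatter(FMT))
logger.addHandler(console)

# File handler acting as a memory: same format, but without coloring.
file_handler = logging.FileHandler(log_path)
file_handler.setFormatter(logging.Formatter(FMT))
logger.addHandler(file_handler)

logger.info("clustering finished successfully")
```

The same `logger.info(...)` call thus appears colored in the terminal and as a plain, persistent line in the log file.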
Once *settings.py* is defined, \gls{cnmc} will filter the provided input, adapt the settings if required and send the corresponding parts to their respective steps.
The sending of the correct settings is depicted in figure @fig-fig_3_Workflow, where the abbreviation *st* stands for settings.
The second abbreviation *SOP*, found in all 5 stages, denotes storing output and plots. All the output is stored in a compressed form such that memory can be saved. All plots are saved as HTML files. There are many reasons to do so; to state the most crucial ones: first, an HTML file can be opened on any operating system.
In other words, it does not matter whether Windows, Linux or Mac is used.
Second, the big difference to a static image is that HTML files can be upgraded with, e.g., CSS, JavaScript and PHP functions.
Each generated HTML plot is equipped with post-processing features, e.g., zooming, panning and taking screenshots of the modified view. When zooming in or out, the axis labels are adapted accordingly. Depending on the position of
the cursor, a panel with the exact coordinates of one point and other information, such as the $\beta$ value, is made visible. \newline
In the same way that data is stored in a compressed format, all HTML files are generated in such a way that additional resources are not written directly into the HTML file; instead, a link is used so that the required content is obtained via the Internet.
Other features associated with HTML plots and which data are saved will be explained in their respective section in this chapter.
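The compressed-storage idea can be sketched with *NumPy*; the file names and array keys below are illustrative, not \gls{cnmc}'s actual layout. The analogous mechanism for the plots is Plotly's option to link *plotly.js* from a CDN instead of embedding it, which keeps the HTML files similarly small:

```python
import os
import tempfile

import numpy as np

# Hypothetical trajectory data for two model parameter values:
# 1000 time steps, 3 state variables each.
trajectory_a = np.random.rand(1000, 3)
trajectory_b = np.random.rand(1000, 3)

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "trajectories.npz")

# Store in compressed form (zip-deflated) to save disk space.
np.savez_compressed(path, beta_28=trajectory_a, beta_30=trajectory_b)

# Loading restores the arrays under their keyword names.
loaded = np.load(path)
```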
The purpose of \gls{cnmc} is to generate a surrogate model with which predictions can be made for unseen model parameter values $\beta$.
For a revision of important terminology, such as the model parameter value $\beta$,
the reader is referred to subsection [-@sec-subsec_1_1_1_Principles].
Usually, in order to obtain a sound predictive model, machine learning methods require a considerable amount of data. Therefore, the \gls{ode} is solved for a set of $\vec{\beta}$. An in-depth explanation of this data generation step is provided in
section [-@sec-sec_2_2_Data_Gen].
The next step is to cluster all the obtained trajectories deploying kmeans++ [@Arthur2006]. Once this has been done, tracking can be performed.
Here the objective is to keep track of the positions of all the centroids as $\beta$ is varied over the whole range of $\vec{\beta}$.
A more detailed description is given in section [-@sec-sec_2_3_Tracking].\newline
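The clustering step can be sketched as follows. This is a minimal k-means with k-means++ seeding written in *NumPy* purely for illustration; \gls{cnmc} itself may rely on a library implementation:

```python
import numpy as np

def kmeans_pp_init(X, K, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to the squared distance to the nearest chosen centroid."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        d2 = np.min(((X[:, None, :] - np.array(centroids)[None]) ** 2).sum(-1), axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans(X, K, iters=20, seed=0):
    """Lloyd iterations starting from k-means++ seeds."""
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, K, rng)
    for _ in range(iters):
        # Assign each sample to its nearest centroid ...
        labels = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        # ... and move each centroid to the mean of its assigned samples.
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return centroids, labels

# Toy "trajectory": samples around two separated regions of state space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(3.0, 0.1, (50, 2))])
centroids, labels = kmeans(X, K=2)
```

The centroids returned here play the role of the cluster representatives whose positions are subsequently tracked over $\beta$.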
The modeling step is divided into two subtasks, which are not displayed as such in figure @fig-fig_3_Workflow. The first subtask aims to get a model that yields the positions of all $K$ centroids for an unseen $\beta_{unseen}$, where $\beta_{unseen}$ is any $\beta$ that was not used to train the model. In the second subtask, multiple tasks are performed.
First, the regular \gls{cnm} [@Fernex2021] is applied to all the tracked clusters from the tracking step. For this purpose, the format of the tracked results is adapted such that \gls{cnm} can be executed without having to modify \gls{cnm} itself. By running \gls{cnm} on the tracked data of all $\vec{\beta}$, the transition property tensors $\boldsymbol Q$ and $\boldsymbol T$ for all $\vec{\beta}$ are obtained. \newline
Second, all the $\boldsymbol Q$ and the $\boldsymbol T$ tensors are stacked to form $\boldsymbol {Q_{stacked}}$ and $\boldsymbol {T_{stacked}}$ matrices.
These stacked matrices are subsequently supplied to one of the two possible implemented modal decomposition methods.
Third, a regression model for the obtained modes is constructed.
Clarifications on the modeling stage can be found in section [-@sec-sec_2_4_Modeling].\newline
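The stacking, decomposition and regression steps can be sketched as follows. This is illustrative *NumPy* code: the matrix sizes, the use of SVD as the decomposition, and polynomial regression of the mode amplitudes are assumptions for the example, not necessarily \gls{cnmc}'s exact choices:

```python
import numpy as np

K = 5                                    # number of centroids
betas = np.array([28.0, 30.0, 32.0])     # training model parameter values

# Hypothetical transition matrices Q, one K x K matrix per trained beta.
rng = np.random.default_rng(0)
Qs = [rng.random((K, K)) for _ in betas]

# Stack: each flattened Q becomes one column of Q_stacked.
Q_stacked = np.column_stack([Q.ravel() for Q in Qs])    # shape (K*K, n_beta)

# Modal decomposition via SVD: the columns of U are the modes,
# the rows of diag(s) @ Vt are the mode amplitudes over beta.
U, s, Vt = np.linalg.svd(Q_stacked, full_matrices=False)
amplitudes = np.diag(s) @ Vt                             # shape (n_beta, n_beta)

# Regress each mode amplitude as a polynomial in beta, so that it
# can be evaluated at an unseen beta value.
coeffs = [np.polyfit(betas, amplitudes[i], deg=2) for i in range(len(betas))]
amp_unseen = np.array([np.polyval(c, 29.0) for c in coeffs])

# Reconstruct the predicted Q matrix for the unseen beta.
Q_pred = (U @ amp_unseen).reshape(K, K)
```

Evaluating the regressed amplitudes at a training $\beta$ reproduces the corresponding original $\boldsymbol Q$, which is a quick sanity check on the reconstruction.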
The final step is to make the actual predictions for all provided $\beta_{unseen}$ and allow the operator to draw conclusions about the trustworthiness of the predictions.
For the trustworthiness, among others, the three quality measurement concepts explained in subsection
[-@sec-subsec_1_1_3_first_CNMc]
are leveraged. The first is comparing the \gls{cnmc}- and \gls{cnm}-predicted trajectories by overlaying them directly. The two remaining techniques, which were already applied in regular \gls{cnm} [@Fernex2021], are the \glsfirst{cpd} and the autocorrelation.\newline
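The autocorrelation check can be sketched with a generic *NumPy* implementation (not necessarily the exact estimator used in \gls{cnm}): if the autocorrelation functions of the true and predicted trajectories are similar, the characteristic time scales of the dynamics are reproduced.

```python
import numpy as np

def autocorrelation(x):
    """Normalized autocorrelation of a 1D signal for all non-negative lags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the mean first
    full = np.correlate(x, x, mode="full")
    acf = full[full.size // 2:]           # keep non-negative lags only
    return acf / acf[0]                   # normalize so that lag 0 equals 1

# Toy example: a periodic signal repeats its correlation structure,
# so the autocorrelation peaks again after one full period.
t = np.linspace(0.0, 20.0 * np.pi, 2000)
acf = autocorrelation(np.sin(t))
```

Comparing such curves for a reference and a predicted trajectory is what makes the measure useful as a reliability indicator.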
The data and workflow in figure @fig-fig_3_Workflow do not reveal one additional feature of the implementation of \gls{cnmc}. That is, multiple subfolders, each containing a *settings.py* file, e.g., for different dynamical systems, can be placed inside the folder *Inputs* to allow a sequential run. In the case of an empty subfolder, \gls{cnmc} will inform the user about it and continue its execution without an error.
As explained above, each model will have its own folder where the entire output will be stored.
To switch between the multiple-*settings.py* and the single-*settings.py* mode, the *settings.py* file outside the *Inputs* folder needs to be modified; the corresponding argument is *multiple\_Settings*.\newline
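The sequential mode can be sketched like this (the directory layout and message wording are illustrative): the *Inputs* directory is scanned, empty subfolders are reported and skipped rather than raising an error, and one run is started per found *settings.py*:

```python
import os
import tempfile

def collect_settings(inputs_dir):
    """Return the paths of all settings.py files found in subfolders of
    inputs_dir; empty subfolders are reported and skipped, not treated
    as errors."""
    found, skipped = [], []
    for name in sorted(os.listdir(inputs_dir)):
        sub = os.path.join(inputs_dir, name)
        if not os.path.isdir(sub):
            continue
        candidate = os.path.join(sub, "settings.py")
        if os.path.isfile(candidate):
            found.append(candidate)
        else:
            skipped.append(name)
            print(f"Inputs/{name} contains no settings.py -- skipping")
    return found, skipped

# Demo: two populated subfolders and one empty one.
inputs = tempfile.mkdtemp()
for model in ("Lorenz", "Rossler"):
    os.makedirs(os.path.join(inputs, model))
    with open(os.path.join(inputs, model, "settings.py"), "w") as f:
        f.write("settings = {}\n")
os.makedirs(os.path.join(inputs, "Empty_Case"))

found, skipped = collect_settings(inputs)
```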
Finally, one more extra feature shall be mentioned. After expensive models have been computed, it is not desired to overwrite the log file or any other output.
To prevent such unwanted events, the overwriting attribute in *settings.py* can be leveraged. If overwriting is disabled, \gls{cnmc} verifies whether a folder with the specified model name already exists.
If so, \gls{cnmc} proposes an alternative model name; a suggestion is accepted as the new model name only if it would not overwrite any existing folder.
Both the model name chosen in *settings.py* and the final replacement model name are printed to the terminal.\newline
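The name-collision logic can be sketched as follows; the numeric-suffix scheme is an assumption for illustration, not necessarily the scheme \gls{cnmc} uses:

```python
import os
import tempfile

def safe_model_name(output_root, requested):
    """Return requested if no results folder with that name exists;
    otherwise propose 'requested_i' for increasing i until a name is
    found that would not overwrite any existing folder."""
    candidate, i = requested, 1
    while os.path.isdir(os.path.join(output_root, candidate)):
        candidate = f"{requested}_{i}"
        i += 1
    if candidate != requested:
        # Report both the requested and the replacement name.
        print(f"model name '{requested}' exists -> using '{candidate}' instead")
    return candidate

# Demo: two folders already occupy the requested name and its first alternative.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "Lorenz"))
os.makedirs(os.path.join(root, "Lorenz_1"))
final_name = safe_model_name(root, "Lorenz")
```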
In summary, the data and workflow of \gls{cnmc} shown in figure @fig-fig_3_Workflow are sufficient for a broad understanding of the main steps.
However, each of the 5 steps can be invoked individually, without having to run the full pipeline. Through the implementation of *settings.py*, \gls{cnmc} is highly flexible: all settings for the steps and the extra features can be managed with *settings.py*.
A log file containing all terminal outputs, as well as a summary of the chosen settings stored in a separate file called the *Informer*, are part of \gls{cnmc}'s tools.