%!TEX root = ./BPM17.tex
\section{Evaluation} % (fold)
\label{sec:evaluation}
%%%% CONTENT
In this section, we provide an evaluation of our predictive business process monitoring techniques based on a-priori knowledge. In particular, we compare the proposed approaches with the best performing state-of-the-art approach~\cite{niek96732} and check: (i) whether the \nocycle algorithm, leveraging knowledge about the structure of the process execution traces (and in particular about the presence of cycles), actually improves the predictions; and (ii) whether the \protrack algorithm is able to leverage a-priori knowledge to improve the performance of the LSTM model.
\subsection{Event Logs}
\label{ssec:datasets}
%presentation of <datasets, constraints>
\begin{table}[t]
\centering
\begin{scriptsize}
\begin{tabular}{l|c|c|c|c|c}
\toprule
% & & & %\multicolumn{3}{c}{\textbf{Average}} \\ \cmidrule{5-7}
\textbf{Log} &\textbf{\#Tr.} & \textbf{\#Act.} & \textbf{avg-TL} & \textbf{avg-CR} & \textbf{Spars.}\\
\midrule
EnvLog & 937 & 381 & 41 & 0.14 & 0.4066 \\
HelpDesk & 3\,804 & 9 & 3.6 & 0.22 & 0.0024 \\
BPIC11 & 911 & 424 & 54 & 5.05 & 0.4654 \\
BPIC12 & 9\,658 & 6 & 7.5 & 1.35 & 0.0006 \\
BPIC13 & 7\,554 & 13 & 7 & 1.45 & 0.0017 \\
BPIC17 & 31\,508 & 26 & 18 & 0.46 & 0.0008 \\
\bottomrule
\end{tabular}
\end{scriptsize}
\caption{The event logs}
\label{table:dataset}
\end{table}
For the evaluation of the techniques, we used six real-life event logs. Four of them were provided for the BPI Challenges (BPIC) 2011~\cite{bpichallenge2011}, 2012~\cite{bpichallenge2012}, 2013~\cite{bpichallenge2013}, and 2017~\cite{bpichallenge2017}, respectively. We also used two additional event logs, one pertaining to an environmental permit application process (``WABO''), used in the context of the CoSeLoG project~\cite{EnvironmentalLog} (\emph{EnvLog} for short in this paper), and another containing cases from a ticketing management process of the help desk of an Italian software company (\emph{Helpdesk}\footnote{\url{https://data.mendeley.com/datasets/39bp3vv62t/1}} for short). Traces containing fewer than 2 activities were filtered out, as they do not contribute to the evaluation. We briefly describe the datasets in the following.
\begin{enumerate}
\item \textbf{Environment permit log} (\emph{EnvLog}) contains data from a Dutch municipality\footnote{\url{https://data.4tu.nl/repository/uuid:e8c3a53d-5301-4afb-9bcd-38e74171ca32}}. The cases correspond to applications for environmental permits.
\item \textbf{Helpdesk} is a dataset containing events from a ticketing management process of the help desk of an Italian software company. The process consists of 9 activities, and all cases start with the insertion of a new ticket into the ticketing management system. Each case ends when the issue is resolved and the ticket is closed.
\item \textbf{BPIC11 log} is a real-life log taken from a Dutch academic hospital. The cases of the log pertain to the Gynaecology department of the hospital, and each trace consists of the diagnosis and treatment phases of a patient. Some attributes are repeated, as the same procedure may take place several times. As the log contains sensitive information, it was anonymized.
\item \textbf{BPIC12 log\footnote{\url{http://data.4tu.nl/repository/uuid:3926db30-f712-4394-aebc-75976070e91f}.}} is a real-life log taken from a Dutch financial institute and provided for the BPI 2012 challenge. The process captured in the event log is an application process for a personal loan or overdraft within a global financing organization.
To enable a comparison with the state-of-the-art, the same filtering procedure described in~\cite{niek96732,breuker2016comprehensible,evermann} is applied to the log.
\item \textbf{BPIC13 log\footnote{\url{http://www.win.tue.nl/bpi/doku.php?id=2013:challenge}}} is a log provided by Volvo IT Belgium. The log contains events from an incident and problem management process supported by the Volvo ``VINST'' system.
\item \textbf{BPIC17 log\footnote{\url{http://data.4tu.nl/repository/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b}}} is a log of a process taken from a Dutch financial institute. In particular, the process we used is related to a loan application.
\end{enumerate}
The characteristics of these logs are summarized in Table~\ref{table:dataset}. For each log, we report the total number of traces, the number of activity labels (i.e., the size of the activity set of the log), the average trace length (avg-TL), the average number of repetitions of all cycles in the log (avg-CR), and the ratio between the number of activity labels and the number of traces, indicating the sparsity of the activity labels over the log.
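As an illustration of how these characteristics can be computed, the following Python sketch derives them from a log encoded as a list of traces, each trace being a list of activity labels. The sketch is ours and not part of the evaluation code; in particular, counting a cycle repetition as an immediate repetition of the same activity is a simplifying assumption of this sketch, not necessarily the exact definition behind Table~\ref{table:dataset}.
\begin{verbatim}
# Illustrative sketch: log characteristics as in Table 1, for a log
# encoded as a list of traces (each a list of activity labels).
def log_characteristics(log):
    activities = {a for trace in log for a in trace}
    avg_tl = sum(len(t) for t in log) / len(log)
    # avg-CR, here approximated as the average number of immediate
    # repetitions of the same activity per trace (an assumption)
    avg_cr = sum(sum(1 for x, y in zip(t, t[1:]) if x == y)
                 for t in log) / len(log)
    sparsity = len(activities) / len(log)  # #labels / #traces
    return len(log), len(activities), avg_tl, avg_cr, sparsity
\end{verbatim}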
Looking at Table~\ref{table:dataset}, one can see the differences in the characteristics of the logs. EnvLog and BPIC11 are logs with very long traces, and they also have a large activity set (381 and 424 activity labels, respectively). Moreover, BPIC11 has a high number of cycles in the log (5.05 repetitions per trace on average). Due to these characteristics, we expect these logs to yield inferior results, as machine learning algorithms generally perform poorly in sparse solution spaces. The BPIC12, BPIC13, BPIC17, and HelpDesk logs are expected to perform well, because they contain many traces while having a small activity set. Moreover, BPIC12 and BPIC13 should benefit most from the \nocycle technique, as their traces contain a non-negligible number of cycles (1.35 and 1.45 repetitions per trace on average).
\subsection{Extraction of LTL rules} \label{extractionLTLrules}
For testing the algorithms based on a-priori knowledge, we need to establish such knowledge for every log. In order to provide unbiased knowledge and reproducible results, we decided not to handpick the LTL formulas but to establish a strict procedure for finding them instead.
We also wanted to test the algorithms with rules that differ in complexity, or ``strength''. With this in mind, we defined two conjunctive rules describing a \textit{strong a-priori knowledge} and a \textit{weak a-priori knowledge}, which respectively strongly and weakly constrain the traces. In particular, we discovered rules of type $\sometime A$ (which imposes the occurrence of $A$) for defining the weak a-priori knowledge and rules of type $\always(A \imp \sometime B) \wedge \sometime A$ (which imposes the occurrence of both $A$ and $B$ and that every occurrence of $A$ is eventually followed by an occurrence of $B$) for defining the strong a-priori knowledge.
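To make the finite-trace semantics of the two rule templates concrete, the following Python sketch (ours, for illustration only) checks them over a trace represented as a list of activity labels:
\begin{verbatim}
# Illustrative sketch: finite-trace semantics of the rule templates.
def eventually(trace, a):
    # <> a : a occurs somewhere in the trace
    return a in trace

def response_and_existence(trace, a, b):
    # [](a -> <> b) /\ <> a : a occurs, and every occurrence of a
    # is eventually followed by an occurrence of b
    if a not in trace:
        return False
    return all(b in trace[i + 1:]
               for i, act in enumerate(trace) if act == a)
\end{verbatim}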
We then derived the a-priori knowledge on the traces of the testing set as follows:
\begin{enumerate}
\item We randomly selected 10\% of the traces of the testing set (so as to avoid discovering rules that are satisfied by the whole testing set).
\item We used the DeclareMiner ProM plug-in~\cite{Maggi2012} to discover LTL rules satisfied in the set chosen at step 1 (Figure~\ref{figure:ltl} shows the LTL extraction from DeclareMiner).
\subitem For the weak a-priori knowledge, we randomly selected one, two, or three\footnote{Depending on the sparsity of the log (see Table~\ref{table:dataset}): as a general rule, if the sparsity is low (less than 0.001) we choose one rule, while if the sparsity is very high (more than 0.45) we choose three rules. These parameters were determined experimentally, so that the formulas are satisfied by around 50\% of the testing set.} of the discovered rules of type $\sometime A$ and we composed them into a single conjunctive formula.
\subitem For the strong a-priori knowledge, we randomly selected one, two, or three of the discovered rules of type $\always(A \imp \sometime B) \wedge \sometime A$ and we composed them into a single conjunctive formula.
The schematic form of the rules used in the evaluation is reported in Table~\ref{table:apriori}, where, for the sake of readability, we replace the original activity names with single characters.
\item Starting from the strong and weak a-priori knowledge, we built a \emph{strong a-priori testing set} and a \emph{weak a-priori testing set}, respectively composed of the subsets of traces of the testing set that satisfy the strong and the weak a-priori knowledge (a sketch of this filtering step is shown after the list).
\end{enumerate}
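Under the (illustrative) assumption that the predicates from the previous sketch are available and that \texttt{testing\_set} is a list of traces, the two a-priori testing sets can be assembled as follows; the activity names are placeholders mirroring Table~\ref{table:apriori}, not the real labels.
\begin{verbatim}
# Illustrative sketch: filtering the testing set with the discovered
# conjunctive rules (placeholder activity names, as in Table 3).
weak_rules   = [lambda t: eventually(t, "a"),
                lambda t: eventually(t, "c")]
strong_rules = [lambda t: response_and_existence(t, "a", "b"),
                lambda t: response_and_existence(t, "c", "d")]

weak_testing_set   = [t for t in testing_set
                      if all(r(t) for r in weak_rules)]
strong_testing_set = [t for t in testing_set
                      if all(r(t) for r in strong_rules)]
\end{verbatim}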
\subsection{Experimental Procedure}
\label{ssec:procedure}
%presentation of baseline + metrics
In order to evaluate the techniques presented in the paper, we adopted the following procedure. For each event log:
\begin{enumerate}
\item We divided the event log into two parts: a \textbf{training set}, composed of 67\% of the whole event log, and a \textbf{testing set}, composed of the remaining 33\%.
\item Then, we extracted the LTL rules as described in Section~\ref{extractionLTLrules}.
\begin{figure}[!ht]
\begin{center}
\includegraphics[height=15cm]{3_ltl_prom.png}
\caption{Declare Miner in action}
\label{figure:ltl}
\end{center}
\end{figure}
\item We compared \nocycle and \protrack\footnote{We set $bSize$ to $3$ and, for the coefficient in charge of weakening the probabilities of activities in a cycle, we used the exponential formula $e^{j}$, where $j$ is the number of cycle repetitions.} against a baseline provided by the technique presented in~\cite{niek96732}. For each technique, we computed: (i) the length of the predicted suffixes; and (ii) their similarity with the prediction ground truth, measured using the Damerau-Levenshtein similarity~\cite{Damerau:1964:TCD:363958.363994} (both the similarity measure and the weakening coefficient are sketched after this list).
\end{enumerate}
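For concreteness, the following Python sketch computes the Damerau-Levenshtein distance (in its optimal-string-alignment variant) between a predicted suffix and the ground-truth suffix and normalises it into a similarity in $[0,1]$; normalising by the length of the longer sequence is our assumption of the usual convention, not a detail stated in~\cite{niek96732}. The last function sketches the cycle-weakening coefficient from the footnote above.
\begin{verbatim}
# Illustrative sketch: Damerau-Levenshtein similarity between a
# predicted suffix p and a ground-truth suffix g (lists of labels).
import math

def dl_distance(p, g):
    d = [[0] * (len(g) + 1) for _ in range(len(p) + 1)]
    for i in range(len(p) + 1):
        d[i][0] = i
    for j in range(len(g) + 1):
        d[0][j] = j
    for i in range(1, len(p) + 1):
        for j in range(1, len(g) + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and p[i - 1] == g[j - 2]
                    and p[i - 2] == g[j - 1]):      # transposition
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(p)][len(g)]

def dl_similarity(p, g):
    if not p and not g:
        return 1.0
    return 1.0 - dl_distance(p, g) / max(len(p), len(g))

def weaken(prob, j):
    # exponential weakening of a cyclic activity's probability
    # (footnote above): divide by e^j after j cycle repetitions
    return prob / math.exp(j)
\end{verbatim}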
\begin{table}[t!]
\centering
\begin{scriptsize}
\begin{tabular}{l|c|c}
\toprule
\textbf{Log} & \textbf{A-priori Strong} & \textbf{A-priori Weak} \\
\midrule
EnvLog & $\always(a \imp \sometime b) \wedge \sometime a \wedge \always(c \imp \sometime d) \wedge \sometime c$ & $\sometime a \wedge \sometime c$ \\
HelpDesk & $\always(e \imp \sometime f) \wedge \sometime e$ & $\sometime e$\\
BPIC11 & $\always(g \imp \sometime h) \wedge \sometime g \wedge \always(i \imp \sometime l) \wedge \sometime i \wedge \always(m \imp \sometime n) \wedge \sometime m$ &
$\sometime i \wedge \sometime h \wedge \sometime o$ \\
BPIC12 & $\always(p \imp \sometime q) \wedge \sometime p$ & $\sometime p$ \\
BPIC13 & $\always(r \imp \sometime s) \wedge \sometime r \wedge \always(t \imp \sometime r) \wedge \sometime t$ & $\sometime s \wedge \sometime r$ \\
BPIC17 & $\always(u \imp \sometime v) \wedge \sometime u$ & $\sometime u$ \\
\bottomrule
\end{tabular}
\end{scriptsize}
\caption{The a-priori knowledge.}
\label{table:apriori}
\end{table}
The experiments have been performed interchangeably on a GPU Tesla K40c and on a conventional laptop with a Core i5 CPU. As for the LSTM training settings, we used the ones identified by Tax et al.~\cite{niek96732}, as they have been shown to be the best performing ones for the problem of predicting sequences of future activities.\footnote{We used an architecture characterized by two LSTM layers, trained with the Adam learning algorithm and a categorical cross-entropy loss; the dropout coefficient has been set to 0.2.} The time required for training the LSTM neural network is about 2 minutes per epoch using the GPU and about 15 minutes using the CPU. The inference time for \nocycle is about 0.1-2 seconds per trace (depending on the log), whereas the inference time for \protrack is 4 times higher on average.
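For concreteness, the following Keras sketch mirrors the settings described in the footnote above; the layer sizes, the fixed-length one-hot prefix encoding, and the sequence length are illustrative assumptions of ours, not the exact configuration of~\cite{niek96732}.
\begin{verbatim}
# Illustrative sketch of the LSTM settings (two LSTM layers, Adam,
# categorical cross-entropy, dropout 0.2); sizes are assumptions.
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

num_activities = 26   # e.g. BPIC17; assumed for illustration
max_prefix_len = 20   # assumed fixed-length prefix encoding

inputs = Input(shape=(max_prefix_len, num_activities))
x = LSTM(100, return_sequences=True, dropout=0.2)(inputs)
x = LSTM(100, dropout=0.2)(x)
# one extra output class for the end-of-case symbol
outputs = Dense(num_activities + 1, activation="softmax")(x)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
\end{verbatim}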
\subsection{Results and Discussion}
\label{ssec:results}
%big table of results + comment the table (comment on logs with cycles, comment that and on why protrack traces are longer than backtrack, ... )
Tables~\ref{table:results-strong} and~\ref{table:results-weak} report, for each event log, the performance of the two techniques we propose on the strong a-priori and weak a-priori testing sets. The results for both testing sets are compared with the baseline presented in~\cite{niek96732}. For each log, we provide the average Damerau-Levenshtein similarity between the predicted sequences and the ground truth (in square brackets, the average length of the predicted sequences), as well as the average ground-truth length of the traces (column ``Groundtruth''). The best average Damerau-Levenshtein similarity for each log is highlighted in gray. Column ``Tested'' reports the number of traces tested, while column ``Prefix'' specifies the range of the prefix lengths used for the specific event log; the prefix range is chosen around the middle point of the average length of the traces.
\newcommand{\maxf}[1]{{\cellcolor[gray]{0.8}} #1}
\begin{table}[t]
\centering
\sisetup{detect-weight=true,detect-inline-weight=math}
\begin{scriptsize}
\begin{tabular}{lS[table-unit-alignment = right]s[table-unit-alignment = left]S[table-unit-alignment = right]s[table-unit-alignment = left]S[table-unit-alignment = right]s[table-unit-alignment = left]Scc}
\toprule
\textbf{Log} & \multicolumn{2}{c}{\textbf{Baseline}} & \multicolumn{2}{c}{\nocycle} & \multicolumn{2}{c}{\protrack} & \textbf{Groundtruth} & \textbf{Tested} & \textbf{Prefix} \\\hline
EnvLog & \maxf{0.250}&[\num{17.4}] & \maxf{0.250}&[\num{17.4}] & 0.070&[\num{95.00}] & 29.40 & 80 & 19-22 \\
HelpDesk & 0.551&[\num{1.44}] & 0.551&[\num{1.44}] & \maxf{0.831}&[\num{2.36}] & 3.00 & 576 & 2-5 \\
BPIC11 & 0.204&[\num{199.00}] & \maxf{0.281}&[\num{199.00}] & 0.274&[\num{198.00}] & 117.11 & 144 & 13-16 \\
BPIC12 & 0.071&[\num{47.07}] & 0.387&[\num{6.86}] & \maxf{0.416}&[\num{7.53}] & 10.95 & 1\,548 & 2-5 \\
BPIC13 & 0.116&[\num{100.80}] & 0.502&[\num{14.71}] & \maxf{0.596}&[\num{6.13}] & 7.15 & 3\,209 & 2-5 \\
BPIC17 & 0.448&[\num{11.78}] & 0.448&[\num{11.78}] & \maxf{0.510}&[\num{15.14}] & 16.01 & 10\,153 & 6-9 \\
\bottomrule
\end{tabular}
\end{scriptsize}
\caption{Prediction results on the strong a-priori testing set}
\label{table:results-strong}
\end{table}
\begin{table}[t]
\centering
\sisetup{detect-weight=true,detect-inline-weight=math}
\begin{scriptsize}
\begin{tabular}{lS[table-unit-alignment = right]s[table-unit-alignment = left]S[table-unit-alignment = right]s[table-unit-alignment = left]S[table-unit-alignment = right]s[table-unit-alignment = left]Scc}
\toprule
\textbf{Log} & \multicolumn{2}{c}{\textbf{Baseline}} & \multicolumn{2}{c}{\nocycle} & \multicolumn{2}{c}{\protrack} & \textbf{Groundtruth} & \textbf{Tested} & \textbf{Prefix} \\\hline
EnvLog & \maxf{0.246}&[\num{18.22}] & \maxf{0.246}&[\num{18.22}] & 0.084&[\num{89.91}] & 31.31 & 108 & 19-22 \\
HelpDesk & 0.551&[\num{1.44}] & 0.551&[\num{1.44}] & \maxf{0.747}&[\num{2.14}] & 3.00 & 576 & 2-5 \\
BPIC11 & 0.220&[\num{199.00}] & \maxf{0.292}&[\num{199.00}] & 0.282&[\num{197.4}] & 112.66 & 450 & 13-16 \\
BPIC12 & 0.100&[\num{48.08}] & 0.263&[\num{6.81}] & \maxf{0.264}&[\num{7.00}] & 8.33 & 3\,179 & 2-5 \\
BPIC13 & 0.130&[\num{95.19}] & 0.459&[\num{14.92}] & \maxf{0.569}&[\num{5.14}] & 5.85 & 4\,364 & 2-5 \\
BPIC17 & 0.448&[\num{11.78}] & 0.448&[\num{11.78}] & \maxf{0.469}&[\num{14.00}] & 16.01 & 10\,153 & 6-9 \\
\bottomrule
\end{tabular}
\end{scriptsize}
\caption{Prediction results on the weak a-priori testing set}
\label{table:results-weak}
\end{table}
\begin{figure}[!ht]
\begin{center}
\includegraphics[width=\textwidth]{2_evaluation.png}
\caption{Plots of prediction results}
\label{figure:results}
\end{center}
\end{figure}
The tables and Figure~\ref{figure:results} show that the proposed algorithms outperform the baseline on most of the logs. Specifically, the figure shows how the performance of the techniques varies with the prefix length; some lines in the plots overlap perfectly, meaning that on some logs the weak and strong formulas make no difference. The presence of cycles in the logs has a strong impact on the performance of the \nocycle algorithm. In particular, if a log has an average number of cycle repetitions smaller than $0.5$, as in the case of EnvLog, HelpDesk, and BPIC17, then \nocycle does not show any improvement over the baseline, whereas on the logs with more cycle repetitions (BPIC11, BPIC12, and BPIC13) it does. Therefore, we can conclude that \nocycle correctly deals with the presence of cycles in the logs to improve the predictions.
\protrack performs worse than the baseline on the EnvLog and BPIC11 logs. This can be explained by the fact that, in these two logs, the activity labels are sparse, with an unusually high number of labels with respect to the number of traces. Indeed, Table~\ref{table:dataset} shows that the ratio between the number of activity labels and the number of traces (column ``Spars.'') is higher for these logs than for the others.
We can also notice that the availability of highly constraining rules in the a-priori knowledge improves the performance of \protrack. Therefore, we can conclude that \protrack is able to correctly leverage a-priori knowledge: it performs better when the activity set of the log is not particularly large (and the log does not contain sparse behaviors) and when the a-priori knowledge constrains the process behavior more tightly.
To sum up, the evaluation shows that both the \nocycle and the \protrack techniques improve the performance of the baseline approach, and that they are particularly suitable when the event logs present several cycles or when the activity alphabet is not extremely large, respectively.
% section evaluation (end)