In-The-Field Monitoring of Functional Calls: Is It Feasible?

RQ1a - What is the overhead introduced by monitoring function calls?

Overhead per category and application.

Figure 1 shows the overhead that we observed for operations in each category and for each subject application. Note that not all types of operations occur in every application; for instance, Captive operations are present in Paint.NET only.

The overhead profile per category is quite consistent. For Instantaneous operations the overhead is always close to 0, probably because these operations execute a limited amount of logic and thus produce a small number of function calls. A similar result can be observed for Immediate operations, whose overhead is small for Adobe Reader DC and Notepad++; Paint.NET represents an exception, with a higher overhead. The overhead profile is again quite consistent across operations in the Continuous Simple and Continuous Complex categories, with the overhead ranging between 0% and 200%.

Although operations in the same category behave similarly even across different applications, we can also observe exceptions. In fact, the boxplot shows several outliers, some with overhead values very different from the rest of the samples. For example, two Continuous Simple operations in Notepad++ (selecting the Java highlighting and dismissing a save operation) exhibited a high overhead (the two outliers) compared to the other operations, which experienced at most 100% overhead.

Figure 2 shows the percentage of operations in each category affected by overhead levels within specific ranges. Collecting function calls produces an overhead in the interval 0-10% in the majority of the cases (65% of the executed operations). In 8% of the cases, operations experience an overhead between 10% and 30%; in 12% of the cases, between 30% and 80%; and for less than 15% of the operations the overhead is higher.
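As a concrete illustration of how these percentages are derived, the per-operation overhead and its interval can be computed as below. This is a minimal sketch: the function names and the example timings are ours, not the study's.

```python
# Sketch: computing per-operation overhead and binning it into the
# intervals reported in the text (0-10%, 10-30%, 30-80%, >80%).
# All timing values below are illustrative, not measured data.

def overhead_pct(monitored_ms, baseline_ms):
    """Relative overhead of the monitored run over the baseline run."""
    return (monitored_ms - baseline_ms) / baseline_ms * 100

BINS = [(0, 10), (10, 30), (30, 80), (80, float("inf"))]

def bin_overhead(pct):
    """Map an overhead percentage to the interval labels used above."""
    for lo, hi in BINS:
        if lo <= pct < hi:
            return f"{lo}-{hi}%" if hi != float("inf") else f">{lo}%"
    return "<0%"  # monitored run faster than baseline (measurement noise)

# An operation taking 100 ms unmonitored and 108 ms monitored
# falls in the most common interval:
print(bin_overhead(overhead_pct(monitored_ms=108, baseline_ms=100)))  # 0-10%
```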

Percentage of operations undergoing a specific overhead interval.

We can conclude that the observed behavior across operations of the same category is not significantly different, although specific operations may violate this pattern (Figure 1). Moreover, collecting function calls exposes operations to an overhead that is lower than 10% in the large majority of cases and is seldom higher than 80% (Figure 2). Estimating if and how much this overhead is intrusive with respect to user activity is the subject of the next research question.

RQ1b - What is the impact of monitoring function calls on the user experience?

Table [table:slow_operations] reports the analytical results obtained for the operations recorded as slow in the four subject applications. For each application, the table shows the number of operations executed in each category during the experiment and how each operation is classified once affected by the overhead caused by function calls monitoring. The overhead is not recognizable by users if the category does not change under monitoring: a perfect result would thus have all 0s outside the diagonal (highlighted with a grey background). When an operation changes its category, the table shows the new category of the operation. The column $`>`$ Captive shows the number of operations whose duration exceeds the maximum allowed for a Captive operation. The last column, Slow Operations $`[\%]`$, reports the percentage of slow operations across all executions.
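The classification step behind the table can be sketched as follows. The category thresholds below are assumptions in the style of standard system-response-time guidelines, not the paper's exact bounds, and the helper names are ours; Continuous Simple and Complex are merged into a single duration band for simplicity.

```python
# Sketch: an operation counts as "slow" if the monitoring overhead pushes
# its response time into a higher SRT category. Thresholds are assumed
# upper bounds in seconds (illustrative, not the study's exact values).

THRESHOLDS = [
    ("Instantaneous", 0.2),
    ("Immediate", 1.0),
    ("Continuous", 5.0),   # covers both Simple and Complex here
    ("Captive", 10.0),
]

def categorize(duration_s):
    """Return the SRT category for a given response time."""
    for name, limit in THRESHOLDS:
        if duration_s <= limit:
            return name
    return "> Captive"  # longer than the maximum allowed for Captive

def is_slow(baseline_s, monitored_s):
    """True if monitoring changes the category perceived by the user."""
    return categorize(monitored_s) != categorize(baseline_s)

# A 0.9 s operation slowed to 1.4 s crosses from Immediate to Continuous:
print(is_slow(0.9, 1.4))   # True
print(is_slow(0.05, 0.15)) # False: still Instantaneous
```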

Percentage of slow operations with respect to the SRT categories.

Figure 3 visually illustrates how slow operations distribute across operations categories. The last column in each category shows the percentage of slow operations for that category across all subject applications.

The empirical data suggests that Instantaneous operations seldom present a slowdown that affects the user experience: in fact, only 6% of the cases produced a recognizable slowdown. We obtained a similar result for Immediate operations, with the exception of Paint.NET, where the slowdown has been significant for every Immediate operation that has been executed. This result is coherent with the exceptional overhead reported for Immediate operations in Paint.NET for RQ1a. It is likely caused by the nature of the Immediate operations in Paint.NET, which execute non-trivial logic (e.g., the operation that closes an image) and are more expensive to monitor.

When the portion of application logic that is executed increases, the percentage of operations that become slow also increases, as observed for Continuous operations, which in some cases become even slower than Captive operations (see Table [table:slow_operations]): for instance, the execution time of five Continuous Simple operations in Notepad++ exceeded the time expected for a Captive operation. The higher cost of monitoring Continuous operations is also visible in Figure 3, where more than 20% of the Continuous operations (both Simple and Complex) have been significantly slowed down on average, compared to Instantaneous and Immediate operations, where about 5% of the operations have been slowed down, if we exclude those from Paint.NET (which is a special case).

Extremely long tasks, such as Captive operations, seem to tolerate the overhead caused by function calls monitoring well. However, since they are present in one application only, it is hard to distill a more general lesson.

We can conclude that the operations that are likely to be perceived as slowed down are quite limited in number ($`<`$20% overall) and mostly concentrated among the Continuous operations. Moreover, applications that implement small pieces of logic that must execute quickly, as Paint.NET does, might be particularly hard to monitor: all of its Immediate operations have been significantly slowed down when collecting function calls.

RQ1c - What is the tolerance of the operations to the introduced overhead?

Percentage of slow operations for different overhead intervals.

Since we exposed operations in different categories to various overhead levels, this research question studies how often a certain overhead causes operations to result in a too slow response time. Figure 4 shows, for each category, the percentage of operations reported to be slow when exposed to overhead within a given range.

In our previous study, we identified 30%, 80%, and 180% as interesting overhead values that may produce different reactions by users, so we used these ranges in this study to analyze the collected data.

We obtained a similar result with this experiment: an overhead level between 30% and 80% is hard to tolerate for operations in any category with the exception of Instantaneous operations, while overhead values higher than 80% can be prohibitive.

We can conclude that overhead levels up to 30% are not harmful, but higher overhead levels must be introduced wisely, with the exception of Instantaneous operations, which seem to tolerate overhead slightly better than operations in the other categories.

RQ2 - What is the impact of monitoring function calls when the availability of computational resources is limited?

In this section we study the impact of the monitoring activity when the computational resources cannot be fully dedicated to the monitored applications because they are shared with other tasks. We first discuss the impact of CPU availability and then the impact of memory availability.

Similarly to RQ1, we study the impact of collecting function calls by analyzing the overhead and the number of operations that change category when the CPU and RAM are under stress.

RQ2a - What is the impact of CPU availability on the intrusiveness of monitoring?

Execution time for various CPU load levels per operation category.

Figure 5 shows the system response time (presented in log scale) of the executed operations per operation category. We report timing information considering four CPU load levels: 0%, 60%, 75%, and 90%.

The figure includes two types of boxplots: the orange boxplot corresponds to the execution time observed when monitoring is in place, while the brown boxplot corresponds to the execution time when no monitoring is in place.

Treatment Chi-square p-value df
Instantaneous 2.2107 0.5298 3
Immediate 1.1327 0.7692 3
Continuous Simple 3.3914 0.3351 3
Continuous Complex 4.4726 0.2147 3
Captive 1.54 0.6731 3

Kruskal-Wallis test results per operation category.

The trend is quite similar for all classes of operations with the exception of Immediate operations, which show decreasing overhead values for higher CPU load values. We conducted a Kruskal-Wallis test to check if the overhead introduced for a given CPU load and a given class of operations differs from the overhead for the same class of operations exposed to a different CPU load (significance expected for p-value $`< 0.05`$). The test revealed no significant differences (see Table 1), suggesting that the impact of monitoring is not affected to a significant degree by the CPU load level; that is, an application is slowed down similarly by function calls monitoring regardless of the CPU availability.
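The statistical check can be reproduced with a pure-Python version of the Kruskal-Wallis H statistic (the study presumably used a standard statistics package); the overhead samples below are synthetic placeholders, not the measured data.

```python
# Sketch: one Kruskal-Wallis test per operation category, comparing
# overhead samples across the four CPU load levels. This minimal
# implementation assumes no tied observations.

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic over two or more samples."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # 1-based ranks
    n = len(pooled)
    rank_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)

overhead_by_load = {              # CPU load (%) -> overhead samples (%)
    0:  [4.1, 6.3, 5.2, 7.8],     # synthetic placeholder values
    60: [5.0, 6.9, 4.4, 8.1],
    75: [5.5, 7.2, 4.9, 6.6],
    90: [4.8, 6.1, 5.7, 7.0],
}

h = kruskal_h(*overhead_by_load.values())
df = len(overhead_by_load) - 1    # 4 groups -> 3 degrees of freedom
# Compare h against the chi-square critical value for df=3 at alpha=0.05
# (about 7.815): an h below that threshold means no significant
# difference, matching the conclusion drawn from Table 1.
```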

Percentage of slow operations for various CPU load levels per application.
Percentage of slow operations for various CPU load levels per operation category.

We also considered how monitoring affects the number of slow operations per application, shown in Figure 6, and per operation category, shown in Figure 7. A loaded CPU alone already generates a number of slow operations for each application, and adding function calls monitoring further increases that number. We can however notice that the addition of monitoring alone worsens the user experience by a similar degree across CPU load levels, confirming that the CPU load level is not a significant factor in the impact of monitoring. To confirm this intuition, we computed the linear regression of the number of slow operations for the instrumented and non-instrumented version of each application and considered the difference between the angular coefficients of the fitted lines. We further estimated, based on the computed trends, the percentage of operations with a different classification when the CPU saturates to 100% (the highest saturation possible).
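The trend analysis can be sketched as follows: fit a least-squares line to the fraction of slow operations versus CPU load for both versions, compare the slopes (angular coefficients), and extrapolate to 100% saturation. The data points below are illustrative, not the measured values.

```python
# Sketch of the trend analysis: slope comparison and extrapolation
# to full CPU saturation. All data points are illustrative.

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    a = num / den
    return a, my - a * mx

loads = [0, 60, 75, 90]                # CPU load levels (%)
slow_plain = [0.05, 0.20, 0.30, 0.45]  # fraction slow, no monitoring
slow_mon = [0.10, 0.24, 0.33, 0.47]    # fraction slow, monitored

a_plain, b_plain = fit_line(loads, slow_plain)
a_mon, b_mon = fit_line(loads, slow_mon)

# Positive difference: CPU saturation increases the number of slow
# operations by a lower degree when monitoring is active.
slope_diff = a_plain - a_mon

# Extrapolated gap between the two trends at 100% saturation
# (the right-hand values of Table 2 are obtained analogously):
gap_100 = (a_mon * 100 + b_mon) - (a_plain * 100 + b_plain)
```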

Table 2 reports the results. For each application we indicate the difference between the angular coefficients (on the left) and the percentage of operations with a different categorization (on the right).

We can notice that the difference in the increase of the number of slow operations is between 2.66% and 14.33% of the operations, indicating a similar trend (i.e., slope) for the two cases (with and without monitoring). The small positive differences between the coefficients indicate that, when a difference is observed (e.g., for 14.33% of the operations in Paint.NET), the saturation of the CPU increases the number of slow operations by a lower degree when monitoring is active.

Adobe Reader DC Notepad++ Paint.NET VLC Media Player
CPU 0.047 – 2.66% 0.123 – 8.78% 0.308 – 14.33% 0.256 – 13.85%

Trend analysis for CPU.

The per-category data in Figure 7 reveals that Instantaneous operations tolerate monitoring better than the other operations: the number of slow operations does not change significantly when monitoring is introduced in the system. Although Captive operations behave similarly to Instantaneous operations, it is hard to generalize this result since they are present in one application only. On the other hand, Instantaneous operations are more sensitive to the load of the CPU: more than 80% of the Instantaneous operations are slow when the CPU load reaches 90%.

We can conclude that the CPU load level does not significantly affect the intrusiveness of function calls monitoring. In fact, the impact of adding monitoring tends to be the same regardless of CPU availability, and when a difference is observed, monitoring turns out to be slightly less intrusive at higher CPU saturation.

RQ2b - What is the impact of memory availability on the intrusiveness of monitoring?

Adobe Reader DC Notepad++ Paint.NET VLC Media Player
Max RAM 356 MB 278 MB 520 MB 303 MB

Maximum memory used during experimentation.

To better discuss the results for RQ2b, we report the memory usage of each application as summarized in Table 3: the maximum memory consumption observed during the execution of our tests is 356 MB of RAM for Adobe Reader DC, 278 MB for Notepad++, 520 MB for Paint.NET, and 303 MB for VLC Media Player.

Figure 8 shows the overhead introduced in the system response time (presented in log scale) per operation category when varying the amount of occupied memory up to 90%.

The orange boxplot corresponds to the execution time observed when function calls are collected, while the brown boxplot corresponds to the execution time when no monitoring is in place.

Execution time for different RAM availability per operation category.

Treatment Chi-square p-value df
Instantaneous 3.5831 0.3101 3
Immediate 3.5298 0.3169 3
Continuous Simple 2.1604 0.5398 3
Continuous Complex 1.1438 0.7665 3
Captive 0.1913 0.979 3

Kruskal-Wallis test results per operation category.

As for the CPU analysis (RQ2a), we check for statistical differences between groups using a Kruskal-Wallis test (see Table 4), obtaining no significant difference between different levels of RAM load. In particular, the results show a clearly negligible effect of memory on the overhead: the overhead is similar for different levels of memory occupation.

We also investigated how memory occupation impacts the operations that become slow. Figure 9 shows the number of slow operations per application, while Figure 10 shows the number of slow operations per operation category. The behavior of the applications does not reveal any trend. To confirm this intuition, we computed the linear regression of the number of slow operations for the instrumented and non-instrumented version of each application and considered the difference between the angular coefficients of the fitted lines. We further estimated, based on the computed trends, the percentage of operations with a different classification when the memory saturates to 100% (the highest saturation possible).

Table 5 reports the results. For each application we indicate the difference between the angular coefficients (on the left) and the percentage of operations with a different categorization (on the right).

We can notice a negligible difference in the coefficients and in the number of slow operations, suggesting a similar trend for the two cases.

The results per operation category confirm the behavior we observed for CPU utilization: Instantaneous operations tolerate low availability of computational resources better than operations in the other categories. In any case, memory occupation does not produce relevant effects when analyzing the results per operation category either.

Percentage of slow operations for various RAM load levels per application.
Percentage of slow operations for various RAM load levels per operation category.

Adobe Reader DC Notepad++ Paint.NET VLC Media Player
RAM 0.084 – 4.77% 0.059 – 4.25% 0.023 – 1.08% 0.054 – 2.91%

Memory trend analysis.

We can conclude that the memory load level does not affect the intrusiveness of function calls monitoring to a significant degree. In fact, the monitoring overhead tends to be the same regardless of memory availability.

Discussion

The analysis of the impact of function calls monitoring on the user experience under limited availability of computational resources revealed that resource availability has little influence on the intrusiveness of monitoring.

As a consequence, the monitoring logic can be activated and deactivated with limited attention to the available computational resources. Only when CPU saturation exceeds 90% should monitoring be avoided, since it could render the application unresponsive.

Finally, the results revealed that Instantaneous operations are less sensitive to memory availability compared to other kinds of operations.