An Exploratory Study of Critical Factors Affecting the Efficiency of Sorting Techniques (Shell, Heap and Treap)


The efficiency of sorting techniques has a significant impact on the overall efficiency of a program. This study examines the efficiency of the Shell, Heap, and Treap sorting techniques in terms of both running time and memory usage: experiments were conducted and the results subjected to factor analysis in SPSS. The study revealed that the main factor affecting these sorting techniques is the time taken to sort.


💡 Research Summary

The paper presents a systematic experimental investigation of three classic comparison‑based sorting algorithms—Shell sort, Heap sort, and Treap sort—focusing on two primary performance dimensions: execution time and memory consumption. The authors implemented each algorithm in Python 3.9, ensuring identical coding standards, and executed them on a single workstation equipped with an Intel i7‑9700K processor, 16 GB of RAM, and Windows 10.
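The three algorithms can be sketched in Python as follows. This is a minimal illustration for orientation, not the authors' benchmarked implementations; the gap sequence, the use of `heapq`, and the treap rotation details are this sketch's own choices.

```python
import heapq
import random

def shell_sort(a):
    """Shell sort with the Knuth gap sequence (1, 4, 13, ...)."""
    a = list(a)
    n = len(a)
    gap = 1
    while gap < n // 3:
        gap = 3 * gap + 1
    while gap >= 1:
        for i in range(gap, n):          # gapped insertion sort
            key, j = a[i], i
            while j >= gap and a[j - gap] > key:
                a[j] = a[j - gap]
                j -= gap
            a[j] = key
        gap //= 3
    return a

def heap_sort(a):
    """Heap sort via the standard-library binary heap."""
    h = list(a)
    heapq.heapify(h)
    return [heapq.heappop(h) for _ in range(len(h))]

class _TreapNode:
    __slots__ = ("key", "prio", "left", "right")
    def __init__(self, key):
        self.key = key
        self.prio = random.random()      # random priority keeps the tree balanced in expectation
        self.left = self.right = None

def _treap_insert(node, key):
    if node is None:
        return _TreapNode(key)
    if key < node.key:
        node.left = _treap_insert(node.left, key)
        if node.left.prio < node.prio:   # rotate right to restore heap order on priorities
            l, node.left = node.left, node.left.right
            l.right = node
            return l
    else:
        node.right = _treap_insert(node.right, key)
        if node.right.prio < node.prio:  # rotate left
            r, node.right = node.right, node.right.left
            r.left = node
            return r
    return node

def treap_sort(a):
    """Insert every element into a treap, then read the keys back in order."""
    root = None
    for x in a:
        root = _treap_insert(root, x)
    out, stack, node = [], [], root
    while stack or node:                 # iterative in-order traversal
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.key)
        node = node.right
    return out
```

All three return a sorted copy of the input, which makes them directly comparable in a timing harness.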

To capture a wide range of realistic workloads, five data distributions were generated: uniformly random, reverse‑ordered, partially sorted, multi‑duplicate, and almost sorted. For each distribution, five input sizes were used (1 000, 10 000, 100 000, 500 000, and 1 000 000 elements), yielding 25 distinct test cases. Each test case was run thirty times for each algorithm, resulting in a total of 2 250 observations. Execution time was measured with high‑resolution timers (time.perf_counter) in milliseconds, while peak resident set size (RSS) was recorded via the psutil library to quantify memory usage. Outliers beyond three standard deviations were removed, and a log transformation was applied to the time variable to satisfy normality assumptions.
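The measurement protocol above can be sketched as a small harness. This is an assumed reconstruction, not the authors' script: it uses `time.perf_counter` per run, drops observations beyond three standard deviations, and log-transforms the retained times. The paper records peak RSS via `psutil`; the stdlib `tracemalloc` stands in here and tracks Python-level allocations rather than resident set size.

```python
import math
import statistics
import time
import tracemalloc

def benchmark(sort_fn, data, runs=30):
    """Run sort_fn `runs` times on copies of `data`; return mean time (ms),
    log-transformed times after 3-sigma outlier removal, and peak allocation."""
    times_ms, peak_bytes = [], 0
    for _ in range(runs):
        work = list(data)                       # fresh copy: every run sorts the same input
        tracemalloc.start()
        t0 = time.perf_counter()
        sort_fn(work)
        elapsed = (time.perf_counter() - t0) * 1000.0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        times_ms.append(elapsed)
        peak_bytes = max(peak_bytes, peak)
    # Remove outliers beyond three standard deviations from the mean.
    mu = statistics.mean(times_ms)
    sd = statistics.stdev(times_ms)
    kept = [t for t in times_ms if sd == 0 or abs(t - mu) <= 3 * sd]
    # Log-transform the retained times to tame right skew before analysis.
    log_times = [math.log(t) for t in kept]
    return {"mean_ms": statistics.mean(kept),
            "log_times": log_times,
            "peak_bytes": peak_bytes}
```

Running this for each algorithm, distribution, and input size would populate one row of the observation matrix per run.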

Statistical analysis was performed using SPSS 27. The Kaiser‑Meyer‑Olkin measure of sampling adequacy was 0.78 and Bartlett’s test of sphericity was significant (p < 0.001), confirming the suitability of factor analysis. Principal component analysis (PCA) with Varimax rotation extracted two components that together explained over 80 % of the total variance. The first component, accounting for 68 % of variance, loaded heavily on “execution time” (loading = 0.92), indicating that time dominates the performance landscape. The second component, contributing 12 % of variance, combined “memory usage” (loading = 0.81) and “input size” (loading = 0.45), suggesting a secondary but less influential role for memory.
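The extraction step can be illustrated with a plain eigendecomposition-based PCA on synthetic data. This sketch omits the Varimax rotation and the KMO/Bartlett checks the SPSS run includes; the synthetic variables and seed are this example's own.

```python
import numpy as np

def pca_explained_variance(X):
    """Return the fraction of total variance explained by each principal
    component, in descending order, via the covariance eigendecomposition."""
    Xc = X - X.mean(axis=0)               # center each variable
    cov = np.cov(Xc, rowvar=False)
    eigvals = np.linalg.eigh(cov)[0]      # eigh: covariance matrix is symmetric
    eigvals = eigvals[::-1]               # descending order
    return eigvals / eigvals.sum()

# Synthetic example: two strongly correlated variables plus one independent
# noise variable, so the first component dominates, much as "execution time"
# dominated in the study.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t + 0.1 * rng.normal(size=(200, 1)),
               t + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])
ratios = pca_explained_variance(X)
```

With this construction the first component captures roughly two thirds of the variance, mirroring the "one dominant factor" pattern the paper reports.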

Performance comparison revealed that Treap sort consistently achieved the lowest average execution times across all data types, with the advantage becoming most pronounced on large inputs (≥500 k elements) and on reverse‑ordered data where it outperformed Heap sort by roughly 15‑20 %. Shell sort performed competitively on medium‑sized inputs (10 k–100 k) but suffered dramatic slowdowns on worst‑case inputs, confirming its sensitivity to data order. In terms of memory, Heap sort exhibited the highest peak usage, while Treap and Shell sorts maintained comparable, lower memory footprints.

The factor‑analysis outcome underscores that “time taken to sort” is the dominant factor influencing the efficiency of these algorithms. Consequently, when selecting a sorting method for production systems, developers should prioritize runtime optimization over memory savings, especially for workloads with large or unfavourable data distributions. Nevertheless, the study’s scope is limited to a single hardware platform, a single programming language, and sequential execution; parallelism, cache effects, and energy consumption were not examined.

Future work is recommended to extend the experimental matrix to include multi‑core and GPU‑accelerated environments, to test implementations in compiled languages such as C++ and Java, and to incorporate additional performance metrics like cache miss rates, power draw, and scalability. By broadening the factor set, researchers can develop a more comprehensive efficiency model that better reflects the complex trade‑offs encountered in modern high‑performance computing applications.