A Comparison of Decision Forest Inference Platforms from A Database Perspective
Decision forests, including RandomForest, XGBoost, and LightGBM, are among the most popular machine learning techniques used in many industrial scenarios, such as credit card fraud detection, ranking, and business intelligence. Because the inference process is usually performance-critical, a number of frameworks have been developed and dedicated to decision forest inference, such as ONNX, TreeLite from Amazon, TensorFlow Decision Forest from Google, HummingBird from Microsoft, Nvidia FIL, and lleaves. However, these frameworks are all decoupled from data management frameworks. It is unclear whether in-database inference will improve the overall performance. In addition, these frameworks use different algorithms, optimization techniques, and parallelism models. It is unclear how these implementations affect the overall performance and how to make design decisions for an in-database inference framework. In this work, we investigated the above questions by comprehensively comparing the end-to-end performance of the aforementioned inference frameworks and netsDB, an in-database inference framework we implemented. Through this study, we found that netsDB is best suited for handling small-scale models on large-scale datasets and all-scale models on small-scale datasets, for which it achieved speedups of up to hundreds of times. In addition, the relation-centric representation we proposed significantly improved netsDB's performance in handling large-scale models, while the model reuse optimization we proposed further improved netsDB's performance in handling small-scale datasets.
💡 Research Summary
This paper investigates whether integrating decision-forest inference directly into a database management system (DBMS) can outperform the conventional approach of using external inference engines. Decision-forest models such as RandomForest, XGBoost, and LightGBM are widely deployed in production for tasks like fraud detection, ranking, and business intelligence. Because inference is often latency-critical, several specialized frameworks have emerged: ONNX, Amazon's TreeLite, Google's TensorFlow Decision Forest, Microsoft's HummingBird, Nvidia's Forest Inference Library (FIL), and lleaves. All of these operate outside the DBMS, requiring data to be extracted, possibly transformed, and then fed to the inference engine, which introduces data movement overhead and prevents the reuse of the DBMS's mature query-processing optimizations.
The authors therefore built an in-database inference engine called netsDB and performed a comprehensive end-to-end benchmark that compares netsDB against the six external frameworks. The benchmark varies two dimensions: model scale (small ≤10 trees, medium 10-100 trees, large ≥100 trees) and dataset size (small ≤10⁵ rows, medium 10⁵-10⁷ rows, large ≥10⁷ rows). For each of the nine (model, data) combinations the authors measured total latency (including data loading, preprocessing, and inference), memory consumption, CPU/GPU utilization, and scalability when adding more cores or nodes. Experiments were run on two mainstream DBMSs (MySQL 8.0 and PostgreSQL 15) and on two hardware configurations (CPU-only and GPU-accelerated).
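As a rough illustration of what an "end-to-end" measurement entails, the sketch below times the whole pipeline, model loading plus data preprocessing plus inference, rather than inference alone. The function names and toy stand-ins are hypothetical, not the paper's actual harness:

```python
import time

def end_to_end_latency(load_model, preprocess, infer, raw_rows):
    """Hypothetical harness: time the full pipeline, mirroring the
    end-to-end measurement methodology described above."""
    start = time.perf_counter()
    model = load_model()          # model loading / materialization
    batch = preprocess(raw_rows)  # data loading + feature preprocessing
    _ = infer(model, batch)       # the actual forest inference
    return time.perf_counter() - start

# Toy stand-ins so the sketch runs end to end.
latency = end_to_end_latency(
    load_model=lambda: {"trees": 5},
    preprocess=lambda rows: [tuple(r) for r in rows],
    infer=lambda m, b: [0.0] * len(b),
    raw_rows=[[1.0, 2.0]] * 1000,
)
print(f"end-to-end latency: {latency:.6f}s")
```

Measuring only the `infer` step would understate the cost of external engines, whose export and load phases fall outside the inference call.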
Key technical contributions of netsDB are two orthogonal optimizations:
- Relation-centric representation – Instead of storing a forest as a pointer-rich tree structure in memory, netsDB materializes each tree as a set of relational tables (node table, edge table, feature table). Traversal becomes a series of SQL joins and filters, allowing the engine to exploit the DBMS's existing cost-based optimizer, index structures, and vectorized execution. This representation dramatically reduces cache misses for large forests and enables seamless processing of data that resides on disk or is partitioned across nodes.
- Model-reuse optimization – In many workloads the same forest is applied repeatedly (e.g., batch scoring of new records or repeated queries). netsDB caches the relational representation of the forest in a global buffer and executes inference as a vectorized aggregate function. Consequently, the cost of loading and materializing the model is amortized across all rows, which is especially beneficial for small datasets where model loading dominates runtime.
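To make the two optimizations concrete, here is a minimal, hypothetical Python sketch of a tree stored as a flat node table and traversed by repeated key lookups (the relational analogue of a self-join on node id). The schema, field names, and root-id convention are illustrative assumptions, not netsDB's actual format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NodeRow:
    """One tuple of an illustrative 'node table' relation."""
    node_id: int
    feature: Optional[int]     # None for leaf rows
    threshold: Optional[float]
    left: Optional[int]        # child node_id when x[feature] <= threshold
    right: Optional[int]       # child node_id otherwise
    value: Optional[float]     # prediction, set only on leaf rows

def predict_row(node_table: dict, x: list) -> float:
    """Traverse the node table iteratively: each step is a key lookup,
    with no pointer chasing through a heap-allocated tree."""
    row = node_table[0]  # assume the root has node_id 0
    while row.feature is not None:
        nxt = row.left if x[row.feature] <= row.threshold else row.right
        row = node_table[nxt]
    return row.value

# A 3-node stump: the root splits on feature 0 at threshold 0.5.
stump = {r.node_id: r for r in [
    NodeRow(0, 0, 0.5, 1, 2, None),
    NodeRow(1, None, None, None, None, -1.0),  # leaf for x[0] <= 0.5
    NodeRow(2, None, None, None, None, +1.0),  # leaf for x[0] > 0.5
]}

# Model reuse in miniature: build the table once, then amortize that
# one-time cost over every row in the batch.
batch = [[0.2], [0.9], [0.4]]
preds = [predict_row(stump, x) for x in batch]
print(preds)  # [-1.0, 1.0, -1.0]
```

In netsDB the lookup would be a relational join executed by the DBMS rather than a Python dictionary access, but the amortization argument is the same: the forest is materialized once and streamed against arbitrarily many input rows.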
The experimental results reveal a clear performance pattern:
- For small models on large datasets (e.g., 5‑tree forest on 100 M rows), netsDB achieves speedups of 120× to 300× over the best external engine. The primary gain comes from eliminating the data‑export step and from the DBMS’s ability to stream rows directly through the relational forest without materializing intermediate structures.
- For all‑scale models on small datasets (e.g., 200‑tree forest on 10 K rows), the model‑reuse optimization yields 30×‑80× improvements because the cost of repeatedly loading the forest is avoided.
- For large models on large datasets, the relation‑centric layout reduces memory pressure and cache thrashing, allowing netsDB to stay within CPU cache limits where external frameworks suffer from pointer chasing and frequent memory allocations.
- External frameworks excel only in very narrow scenarios: GPU‑accelerated FIL can be faster for tiny batches that fit entirely in GPU memory, but performance collapses once the data exceeds GPU capacity due to costly host‑device transfers.
A deeper analysis of parallelism shows that netsDB inherits the DBMS’s multi‑core scheduler and can scale linearly with the number of CPU cores. In contrast, TreeLite and HummingBird rely on SIMD and thread‑level parallelism but encounter load‑balancing issues because tree depths are irregular. Nvidia FIL’s parallelism is limited to the GPU’s fixed kernel launch configuration, making it less adaptable to varying batch sizes.
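The load-imbalance point can be illustrated with a toy sketch. The tree depths and the round-robin partitioning scheme below are invented for illustration; they are not how TreeLite or HummingBird actually schedule work:

```python
# Why irregular tree depths cause load imbalance under static
# tree-to-thread partitioning. Depth values are made up.
depths = [3, 3, 3, 3, 40, 3, 3, 3]  # one pathologically deep tree

def static_partition(work, n_threads):
    """Round-robin trees to threads; per-thread cost is the sum of the
    depths (a proxy for traversal work) assigned to that thread."""
    buckets = [[] for _ in range(n_threads)]
    for i, d in enumerate(work):
        buckets[i % n_threads].append(d)
    return [sum(b) for b in buckets]

costs = static_partition(depths, 4)
print(costs)  # [43, 6, 6, 6]: one thread dominates the critical path
```

A work-stealing or DBMS-managed scheduler can rebalance such skew at runtime, which is one way to read the scaling advantage attributed to netsDB above.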
The paper’s broader significance lies in demonstrating that in‑database inference is not merely a convenience but a performance‑critical design choice for many real‑world workloads. By reusing the DBMS’s mature query‑optimization stack, netsDB can treat a decision forest as just another relational operator, enabling cost‑based decisions about join order, predicate push‑down, and partition pruning. This opens the door to seamless integration of AI services—such as real‑time fraud scoring or on‑the‑fly recommendation—directly inside transactional systems without the latency penalty of moving data to a separate inference service.
Future work suggested by the authors includes extending the relation‑centric approach to other ensemble methods (e.g., gradient‑boosted trees with categorical splits), exploring distributed DBMS deployments where forests are sharded across nodes, and developing a cost model that automatically selects the optimal combination of model representation, batch size, and hardware (CPU vs. GPU) based on workload characteristics.
In summary, the study provides a thorough, data‑driven comparison of the state‑of‑the‑art decision‑forest inference platforms and makes a compelling case that an in‑database engine like netsDB, equipped with relation‑centric representation and model‑reuse optimizations, can deliver orders‑of‑magnitude speedups for a wide range of model‑and‑data scales. This work therefore serves as a valuable reference for both researchers designing next‑generation AI‑enabled DBMSs and practitioners seeking to deploy high‑throughput, low‑latency inference services within existing data infrastructure.