A TEE-based Approach for Preserving Data Secrecy in Process Mining with Decentralized Sources


Process mining techniques enable organizations to gain insights into their business processes through the analysis of execution records (event logs) stored by information systems. While most process mining efforts focus on intra-organizational scenarios, many real-world business processes span multiple independent organizations. Inter-organizational process mining, though, faces significant challenges, particularly regarding confidentiality guarantees: The analysis of data can reveal information that the participating organizations may not consent to disclose to one another, or to a third party hosting process mining services. To overcome this issue, this paper presents CONFINE, an approach for secrecy-preserving inter-organizational process mining. CONFINE leverages Trusted Execution Environments (TEEs) to deploy trusted applications that are capable of securely mining multi-party event logs while preserving data secrecy. We propose an architecture supporting a four-stage protocol to secure data exchange and processing, allowing for protected transfer and aggregation of unaltered process data across organizational boundaries. To avoid out-of-memory errors due to the limited capacity of TEEs, our protocol employs a segmentation-based strategy, whereby event logs are transmitted to TEEs in smaller batches. We conduct a formal verification of correctness and a security analysis of the guarantees provided by the TEE core. We evaluate our implementation on real-world and synthetic data, showing that the proposed approach can handle realistic workloads. The results indicate logarithmic memory growth with respect to the event log size and linear growth with the number of provisioning organizations, highlighting scalability properties and opportunities for further optimization.


💡 Research Summary

The paper introduces CONFINE, a framework that enables secrecy‑preserving inter‑organizational process mining by leveraging Trusted Execution Environments (TEEs). Traditional inter‑organizational mining faces a fundamental tension: organizations must share event logs to obtain a global view of a collaborative process, yet the logs contain sensitive operational details that cannot be disclosed to partners or third‑party service providers. Existing approaches mitigate this risk by anonymizing, aggregating, or otherwise transforming the data before analysis, but these techniques inevitably degrade mining accuracy or restrict the set of applicable algorithms.

CONFINE resolves this dilemma by keeping the original, unaltered logs under the control of each data‑owning organization while still allowing a joint mining computation. The core idea is to run the mining algorithms inside a hardware‑isolated enclave (e.g., Intel SGX, AMD SEV, or ARM TrustZone) that guarantees data confidentiality, integrity, and code integrity. The framework defines a four‑stage protocol:

  1. Metadata Exchange – Organizations agree on log schemas, segment sizes, and cryptographic hashes that will be used for later verification.
  2. Attestation – Each participating enclave produces a remote attestation proof, signed with a hardware‑bound key, allowing the verifier to confirm that the code running inside the enclave is exactly the trusted mining application.
  3. Secure Transmission & Merge – Event logs are split into small batches (segments) and encrypted with a session key derived during attestation. Encrypted segments are streamed into the enclave, where they are merged using metadata‑only operations without ever exposing raw data outside the enclave. This segmentation strategy prevents out‑of‑memory failures, as the enclave only needs to hold a logarithmic amount of data relative to the total log size.
  4. Mining Execution – Once all segments are merged, standard process‑mining algorithms (e.g., the Heuristics Miner for discovery, or conformance checking against Declare models) are executed inside the enclave. Only the mining results are released to the participants; the raw logs never leave the enclave.

The authors formally specify the protocol in pseudo‑code (Algorithms 1‑4) and prove two key properties: soundness (only segments that were actually transmitted by a provisioning organization, unaltered, end up in the merged log) and completeness (every transmitted segment is eventually incorporated, so none is omitted). A security analysis maps the confidentiality, integrity, and code‑integrity guarantees of TEEs onto a threat model that includes external attackers, compromised operating systems, and malicious insiders.
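One way to picture how these two properties could be enforced operationally (a hypothetical sketch, not the paper's Algorithms 1‑4): the manifest of segment hashes agreed during metadata exchange lets the enclave reject anything not announced (soundness) and refuse to start mining until every announced segment has arrived (completeness).

```python
# Hypothetical enforcement of soundness and completeness via the
# metadata-exchange manifest. Class and method names are illustrative.
import hashlib

def digest(segment: bytes) -> str:
    return hashlib.sha256(segment).hexdigest()

class SegmentVerifier:
    def __init__(self, manifest):
        self.expected = set(manifest)   # hashes announced at metadata exchange
        self.received = set()

    def admit(self, segment: bytes) -> bool:
        """Soundness: admit only segments matching an announced hash;
        tampered or unannounced data is rejected."""
        h = digest(segment)
        if h not in self.expected:
            return False
        self.received.add(h)
        return True

    def complete(self) -> bool:
        """Completeness: mining may begin only once every announced
        segment has been received."""
        return self.received == self.expected

segments = [b"case1:register,check", b"case1:ship"]
verifier = SegmentVerifier([digest(s) for s in segments])
verifier.admit(segments[0])         # announced segment: admitted
verifier.admit(b"tampered data")    # unannounced segment: rejected
verifier.admit(segments[1])
# complete() is True only after both announced segments arrive.
```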

Empirical evaluation uses both real‑world event logs and synthetic datasets. Memory consumption is measured while varying (i) the number of events, (ii) the number of cases, and (iii) the number of participating organizations. Results show logarithmic memory growth with respect to log size and linear growth with the number of organizations. Even with logs containing up to one million events, the enclave’s memory usage stays well within the typical SGX EPC limit (≈ 128 MiB), and no out‑of‑memory errors occur. The overhead introduced by segment transmission is mitigated through asynchronous pipelining and batch‑size tuning, keeping total execution time comparable to a non‑secure baseline.

The discussion highlights practical deployment considerations: the cost of enclave initialization, key management across organizational boundaries, and the need for mutually agreed policies on segment size and attestation procedures. Limitations include the linear memory increase when many organizations collaborate, which may become a bottleneck for large consortia, and the reliance on specific TEE implementations whose security guarantees can differ. Future work proposes multi‑enclave federation, dynamic segment sizing, and extending the approach to other analytics domains such as machine‑learning model training on distributed confidential data.

In conclusion, CONFINE demonstrates that it is feasible to perform accurate, full‑fidelity inter‑organizational process mining without sacrificing data secrecy. By exploiting hardware‑based confidential computing, the framework avoids the accuracy‑loss trade‑offs of prior privacy‑preserving techniques and opens a path toward broader collaborative analytics in regulated industries such as healthcare, supply‑chain management, and manufacturing.

