Fraud detection and risk assessment of online payment transactions on e-commerce platforms based on LLM and GCN frameworks
With the rapid growth of e-commerce, online payment fraud has become increasingly complex, posing serious threats to financial security and consumer trust. Traditional detection methods often struggle to capture the intricate relational structures inherent in transactional data. This study presents a novel fraud detection framework that combines Large Language Models (LLMs) with Graph Convolutional Networks (GCNs) to effectively identify fraudulent activities in e-commerce online payment transactions. A dataset of 2,840,000 transactions was collected over 14 days from major platforms such as Amazon, involving approximately 2,000 U.S.-based consumers and 30 merchants. With fewer than 6,000 fraudulent instances, the dataset represents a highly imbalanced scenario. Consumers and merchants were modeled as nodes and transactions as edges to form a heterogeneous graph, upon which a GCN was applied to learn complex behavioral patterns. Semantic features extracted via GPT-4o and Tabformer were integrated with structural features to enhance detection performance. Experimental results demonstrate that the proposed model achieves an accuracy of 0.98, effectively balancing precision and recall in fraud detection. This framework offers a scalable and real-time solution for securing online payment environments and provides a promising direction for applying graph-based deep learning in financial fraud prevention.
💡 Research Summary
The paper addresses the escalating problem of online payment fraud in e‑commerce, where traditional detection methods often fail to capture the intricate relational patterns inherent in transaction data. To overcome this limitation, the authors propose a hybrid framework that integrates Large Language Models (LLMs) with Graph Convolutional Networks (GCNs). The core idea is to model consumers and merchants as nodes and individual transactions as edges, thereby constructing a heterogeneous graph that reflects the true network of financial interactions.
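The node-and-edge modeling described above can be sketched as follows. This is a minimal illustration of the construction, not the authors' implementation; the field names (`consumer_id`, `merchant_id`, `amount`, `timestamp`) are assumed for the example.

```python
def build_transaction_graph(transactions):
    """Model consumers and merchants as typed nodes and each
    transaction as an edge carrying its attributes."""
    nodes = set()
    edges = []
    for tx in transactions:
        consumer = ("consumer", tx["consumer_id"])
        merchant = ("merchant", tx["merchant_id"])
        nodes.add(consumer)
        nodes.add(merchant)
        # One edge per transaction; repeat purchases yield parallel edges.
        edges.append((consumer, merchant,
                      {"amount": tx["amount"], "timestamp": tx["timestamp"]}))
    return nodes, edges
```

Because nodes are deduplicated while edges are not, a consumer with many purchases from the same merchant contributes one node but many edges, which is what lets the GCN aggregate behavior over repeated interactions.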
Data were collected over a 14‑day period from major platforms such as Amazon, comprising 2,840,000 transactions involving roughly 2,000 U.S. consumers and 30 merchants. Only about 6,000 of these transactions are labeled as fraudulent, creating a highly imbalanced dataset (≈0.21% fraud). Each transaction record includes both structured fields (amount, timestamp, payment method) and unstructured textual metadata (customer reviews, payment notes).
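The quoted imbalance figure follows directly from the counts above:

```python
# ~6,000 fraud cases out of 2,840,000 transactions.
total_transactions = 2_840_000
fraud_cases = 6_000

fraud_rate = fraud_cases / total_transactions
print(f"{fraud_rate:.2%}")  # 0.21%, matching the rate quoted above
```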
For the unstructured components, the authors employ GPT‑4o and Tabformer to generate dense semantic embeddings. GPT‑4o processes free‑form text via carefully crafted prompts that highlight fraud‑related cues, while Tabformer encodes tabular metadata into comparable vectors. These embeddings are attached to the corresponding node features, enriching the graph with contextual information that traditional numeric features lack.
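The attachment of semantic embeddings to node features can be sketched as a simple concatenation. The vector contents and dimensions below are illustrative assumptions; the paper does not specify the fusion operator beyond "attached to the corresponding node features."

```python
def fuse_features(structural, semantic):
    """Concatenate a node's structural feature vector with its
    LLM-derived semantic embedding (assumed fusion: concatenation)."""
    return structural + semantic

# Hypothetical example values:
node_structural = [0.4, 1.2, 0.0]          # e.g. degree, mean amount, refund rate
node_semantic = [0.12, -0.53, 0.88, 0.07]  # e.g. truncated GPT-4o text embedding
enriched = fuse_features(node_structural, node_semantic)
```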
The graph is then fed into a Relational GCN (RGCN) with two convolutional layers followed by a pooling operation. Edge types (e.g., purchase, refund) receive distinct weight matrices, allowing the model to learn type‑specific propagation rules. Node representations after the GCN are concatenated with the LLM‑derived embeddings and passed through a multilayer perceptron (MLP) for binary classification (fraud vs. legitimate). To mitigate class imbalance, the loss function incorporates class‑wise weighting, and the training pipeline combines SMOTE‑like oversampling of fraudulent nodes with under‑sampling of legitimate nodes at the graph level.
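The class-wise weighting in the loss can be illustrated with a weighted binary cross-entropy. The inverse-frequency weighting scheme below is a common choice and an assumption here; the paper states only that the loss is class-weighted.

```python
import math

def weighted_bce(y_true, p_pred, w_fraud, w_legit):
    """Binary cross-entropy with per-class weights."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        w = w_fraud if y == 1 else w_legit
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Inverse-frequency weights for the ~0.21% fraud rate described earlier.
n_total, n_fraud = 2_840_000, 6_000
w_fraud = n_total / (2 * n_fraud)             # strongly up-weight the minority class
w_legit = n_total / (2 * (n_total - n_fraud)) # slightly down-weight the majority class
```

Under these weights, confidently misclassifying a fraudulent transaction costs far more than the equivalent error on a legitimate one, pushing the model toward the high recall the results report.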
Experimental results compare the proposed hybrid model against three baselines: (1) XGBoost trained solely on structured features, (2) a standalone GCN using only structural information, and (3) an LLM‑only MLP using semantic embeddings without graph context. The hybrid approach achieves an overall accuracy of 0.98, precision of 0.96, recall of 0.94, and an F1‑score of 0.95. Notably, recall improves by 8–12 percentage points over the baselines, indicating a substantial reduction in missed fraudulent cases. Inference latency averages 12 ms per transaction, demonstrating feasibility for real‑time deployment. An ablation study confirms that removing either the LLM component or the GCN component degrades performance, underscoring their complementary roles.
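The reported metrics are internally consistent: the F1-score of 0.95 follows from the stated precision and recall.

```python
# Sanity check on the reported figures: F1 is the harmonic mean
# of precision (0.96) and recall (0.94).
precision, recall = 0.96, 0.94
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.95
```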
The authors discuss several limitations. First, the dataset is confined to U.S. consumers and a limited set of merchants, raising questions about cross‑regional generalizability. Second, reliance on GPT‑4o incurs API costs and latency that may be prohibitive for large‑scale production environments. Third, the graph construction excludes auxiliary behavioral logs such as clickstreams or search histories, which could further enrich fraud patterns.
Future work is outlined along four dimensions: (a) expanding the dataset to include international users and a broader merchant base, (b) exploring lightweight LLM alternatives (e.g., DistilGPT) to reduce computational overhead, (c) integrating multimodal logs (click, browse, device fingerprints) into the graph, and (d) adopting federated learning or privacy‑preserving techniques to protect sensitive financial data while still benefiting from collective model updates. Additionally, the authors suggest investigating dynamic graph updates and streaming GCN training to adapt to evolving fraud tactics in near real‑time.
In conclusion, the study demonstrates that fusing semantic insights from LLMs with structural learning from GCNs yields a powerful, scalable solution for e‑commerce payment fraud detection. The framework not only attains high accuracy and low latency but also offers a flexible foundation for future research at the intersection of graph‑based deep learning and large‑scale language modeling in financial security.