Financial Fraud Identification and Interpretability Study for Listed Companies Based on Convolutional Neural Network


📝 Original Info

  • Title: Financial Fraud Identification and Interpretability Study for Listed Companies Based on Convolutional Neural Network
  • ArXiv ID: 2512.06648
  • Date: 2025-12-07
  • Authors: Author information is not given in the paper. (Please add author names and affiliations once confirmed.)

📝 Abstract

Since the emergence of joint-stock companies, financial fraud by listed firms has repeatedly undermined capital markets. Fraud is difficult to detect because of covert tactics and the high labor and time costs of audits. Traditional statistical models are interpretable but struggle with nonlinear feature interactions, while machine learning models are powerful but often opaque. In addition, most existing methods judge fraud only for the current year based on current-year data, limiting timeliness. This paper proposes a financial fraud detection framework for Chinese A-share listed companies based on convolutional neural networks (CNNs). We design a feature engineering scheme that transforms firm-year panel data into image-like representations, enabling the CNN to capture cross-sectional and temporal patterns and to predict fraud in advance. Experiments show that the CNN outperforms logistic regression and LightGBM in accuracy, robustness, and early-warning performance, and that proper tuning of the classification threshold is crucial in high-risk settings. To address interpretability, we analyze the model along the dimensions of entity, feature, and time using local explanation techniques. We find that solvency, ratio structure, governance structure, and internal control are general predictors of fraud, while environmental indicators matter mainly in high-pollution industries. Non-fraud firms share stable feature patterns, whereas fraud firms exhibit heterogeneous patterns concentrated in short time windows. A case study of Guanong Shares in 2022 shows that cash flow analysis, social responsibility, governance structure, and per-share indicators are the main drivers of the model's fraud prediction, consistent with the company's documented misconduct.

💡 Deep Analysis

📄 Full Content

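... and used a convolutional neural network to extract features [4]; Zhao et al. applied a convolutional neural network framework to time-series classification, where it outperformed existing time-series classification methods in both classification accuracy and noise tolerance [5]. ... [6]. In 2018, Miller was the first to define interpretability in the field of artificial intelligence: the degree to which an observer (a human) can understand the cause of a decision [7]. In the same year, Christoph Molnar published "Interpretable Machine Learning", the first book to systematically introduce machine learning interpretability, and divided interpretability by the scope of the explanation into global and local interpretability [8].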

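... predicted survival and death for patients and explained the predictions with gradient-weighted class activation mapping (Grad-CAM) [9]; Su Ying used representation visualization and deconvolutional neural networks to give a two-way explanation of power-system load patterns [10]; Fu Guishan classified breast ultrasound images with a convolutional neural network and explained the model with two methods, heat maps and semantic regression [11].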

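Constructing the indicator system is crucial for machine learning models. Liang Lijun et al. built a financial indicator system, but financial indicators alone are not enough to capture the complex logic of a company's operations [13]. To address this, Ye Qinhua et al., drawing on double-entry bookkeeping and accounting information system theory, considered four other dimensions, such as corporate governance (Govern), in addition to the financial and tax indicator dimension; the five dimensions correspond to the successive stages of accounting information production [14]. However, as the idea of sustainable development has taken hold, environmental (Environment) and social responsibility (Social) factors have become increasingly important for predicting corporate risk [15][16]. Therefore, this paper draws on the ... proposed by Ye Qinhua et al. ...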

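... premise is that the features themselves are interpretable [8], so every feature in the indicator system must have a real-world ... fabricated profits", "P2502 fictitious assets", "P2503 false records (misleading statements)", "P2506 inaccurate disclosure (other)" [17]. The sample label column is named "是否舞弊" (whether fraud occurred); fraud samples are labeled 1 and serve as positive sam...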

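Gray samples are samples that arise because the China Securities Regulatory Commission (CSRC) identifies financial fraud with a lag, or because fraud has actually occurred but has not yet been discovered and disclosed [2]. Because the dataset used in this paper has many samples and high dimensionality, this paper uses the Isolation Forest algorithm to remove gray samples [18]. The Isolation Forest method is a tree-based ...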
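The gray-sample filtering step described above can be illustrated with a minimal, standard-library sketch of the isolation-forest idea: points that random axis-aligned splits isolate after only a few cuts receive a high anomaly score. This is not the paper's implementation. The function names, the `contamination` fraction, the tree count, and the choice to score only non-fraud rows (label 0) while keeping every fraud row (label 1) are all assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random cut point."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feat = rng.randrange(len(rows[0]))
    lo = min(r[feat] for r in rows)
    hi = max(r[feat] for r in rows)
    if lo == hi:  # feature is constant here, so no split is possible
        return {"size": len(rows)}
    cut = rng.uniform(lo, hi)
    left = [r for r in rows if r[feat] < cut]
    right = [r for r in rows if r[feat] >= cut]
    return {
        "feat": feat,
        "cut": cut,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(row, node, depth=0):
    """Depth at which row lands in a leaf, adjusted for the leaf's unsplit size."""
    if "feat" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["feat"]] < node["cut"] else node["right"]
    return path_length(row, child, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Return one score in (0, 1) per row; higher means easier to isolate."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to build isolation trees")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    return [
        2.0 ** (-sum(path_length(r, t) for t in trees) / (n_trees * norm))
        for r in rows
    ]


def drop_gray_samples(rows, labels, contamination=0.1, **kwargs):
    """Assumed policy: drop the most anomalous non-fraud rows, keep all fraud rows.

    Returns (kept_indices, dropped_indices) into the original rows.
    """
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    scores = anomaly_scores([rows[i] for i in neg_idx], **kwargs)
    n_drop = int(contamination * len(neg_idx))
    ranked = sorted(zip(scores, neg_idx), reverse=True)
    dropped = {i for _, i in ranked[:n_drop]}
    keep = [i for i in range(len(rows)) if i not in dropped]
    return keep, sorted(dropped)
```

Calling `drop_gray_samples(rows, labels, contamination=0.05)` on a list of numeric feature rows and a parallel list of 0/1 labels returns the indices to keep and the indices treated as gray samples. In practice a maintained library implementation would be preferable to this sketch, and the fraction to drop would need to be chosen and validated against the data.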

Reference

This content is AI-processed based on open access ArXiv data.
