A Scoping Review of Deep Learning for Urban Visual Pollution and Proposal of a Real-Time Monitoring Framework with a Visual Pollution Index

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Urban Visual Pollution (UVP) has emerged as a critical concern, yet research on its automatic detection and application remains fragmented. This scoping review maps existing deep learning-based approaches for detecting and classifying visual pollution and for designing a comprehensive application framework for its management. Following the PRISMA-ScR guidelines, seven academic databases (Scopus, Web of Science, IEEE Xplore, ACM DL, ScienceDirect, SpringerNatureLink, and Wiley) were systematically searched, and 26 articles were included. Most research focuses on specific pollutant categories and employs variations of YOLO, Faster R-CNN, and EfficientDet architectures. Although several datasets exist, they are limited to specific areas and lack standardized taxonomies. Few studies integrate detection into real-time application systems, and those that do tend to be geographically skewed. We propose a framework for monitoring visual pollution that integrates a visual pollution index to assess the severity of visual pollution in a given area. This review highlights the need for a unified UVP management system that incorporates a pollutant taxonomy, a cross-city benchmark dataset, a generalized deep learning model, and an assessment index that supports sustainable urban aesthetics and enhances the well-being of urban dwellers.

💡 Research Summary

The paper presents a comprehensive scoping review of deep‑learning approaches for detecting, classifying, and managing Urban Visual Pollution (UVP), followed by a proposal for a real‑time monitoring framework that incorporates a Visual Pollution Index (VPI). Following the PRISMA‑ScR methodology, the authors systematically searched seven major scholarly databases (Scopus, Web of Science, IEEE Xplore, ACM Digital Library, ScienceDirect, SpringerNatureLink, and Wiley) using combined keyword sets for “visual pollution” and “deep learning”. From an initial pool of 3,439 records, 1,207 were retained after title and abstract screening, and ultimately 26 articles (24 after full‑text verification) were included for detailed analysis.

The review finds that most UVP studies focus on specific pollutant categories such as billboards, signage, waste dumps, and chaotic wiring, and that they predominantly employ object‑detection architectures. YOLO variants (v3 through v8) appear in 23 of the 26 papers, reflecting a strong preference for models that balance detection accuracy with inference speed suitable for edge deployment. Faster R‑CNN is used in 11 studies, offering higher precision at the cost of two‑stage processing and greater computational demand. EfficientDet appears in five works, leveraging BiFPN and shared prediction heads to achieve competitive accuracy (up to 97% mAP) while remaining lightweight. Transformer‑based models such as Swin‑Transformer are mentioned in only two papers, indicating that their adoption is still nascent because of their data and resource requirements.

Dataset analysis shows a fragmented landscape: 18 studies created custom, location‑specific datasets, while 14 leveraged publicly available collections like TACO and Place Pulse 2.0. Most datasets are limited to single cities or regions, lack standardized labeling taxonomies, and therefore hinder cross‑city model generalization. The authors highlight the absence of a unified pollutant taxonomy; different papers treat the same visual element (e.g., a billboard) as distinct classes or merge disparate objects into a single “visual pollution” label. This inconsistency propagates to the calculation of VPI, which, to date, relies largely on expert‑driven Analytic Hierarchy Process (AHP) weighting schemes and paper‑based audit scores ranging from 0 to 100. Only a few recent works attempt data‑driven or hybrid scoring, but no consensus metric exists.
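To make the expert‑driven AHP weighting mentioned above more concrete, the following is a minimal sketch of how priority weights can be derived from a pairwise comparison matrix. It uses the row geometric‑mean approximation rather than the principal‑eigenvector computation, and the three pollutant classes and their judgment values are hypothetical illustrations, not figures from the reviewed papers.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[i][j] states how much more severe class i is judged to be
    than class j; the matrix is assumed to be square and reciprocal.
    """
    n = len(pairwise)
    # Geometric mean of each row summarizes that class's relative importance.
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    # Normalize so the weights sum to 1.
    return [g / total for g in geo_means]


# Hypothetical expert judgments for three classes, in this order:
# billboards, waste dumps, chaotic wiring.
pairwise = [
    [1.0, 1 / 3, 2.0],
    [3.0, 1.0, 4.0],
    [1 / 2, 1 / 4, 1.0],
]
weights = ahp_weights(pairwise)
print([round(w, 3) for w in weights])
```

In this illustrative matrix the waste‑dump class receives the largest weight because the hypothetical experts rated it as more severe than both other classes.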

Beyond detection, the review identifies a modest number of application‑level implementations. Several studies integrate detection models into mobile apps, Raspberry‑Pi edge devices, or cloud‑based GIS dashboards, enabling citizen reporting, real‑time alerts, and decision‑support visualizations. However, these systems are geographically skewed, concentrated in a small number of countries (both developing and developed), and they rarely adopt a standardized VPI for comparative assessment.

Building on these findings, the authors propose a comprehensive real‑time UVP monitoring framework composed of five core components: (1) a standardized visual‑pollution taxonomy; (2) a cross‑city benchmark dataset that captures diverse urban contexts and pollutant types; (3) a generalized, lightweight deep‑learning model (e.g., a quantized YOLO‑v5 or EfficientDet‑D1) that can run on edge devices and be updated via federated learning; (4) a Visual Pollution Index that quantifies severity by aggregating weighted pollutant counts, contextual factors (e.g., proximity to residential zones), and temporal trends; and (5) a mobile‑cloud hybrid architecture that streams detections to a GIS‑based dashboard, visualizing VPI as choropleth heat maps for planners and the public.
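As a rough illustration of component (4), the sketch below aggregates weighted detection counts and a single contextual multiplier into a 0 to 100 score. The summary does not give the authors' exact formula, so the function name, the `context_factor` and `saturation` parameters, the confidence cutoff, and the class weights are all assumptions made for this example; temporal trends are left out for brevity.

```python
from collections import Counter


def visual_pollution_index(detections, weights, context_factor=1.0,
                           saturation=10.0, min_conf=0.5):
    """Hypothetical VPI: weighted pollutant counts scaled to the range 0-100.

    detections     -- list of (class_label, confidence) pairs from a detector
    weights        -- per-class severity weights (e.g. from expert judgment)
    context_factor -- assumed multiplier, e.g. > 1 near residential zones
    saturation     -- assumed weighted count at which the index reaches 100
    min_conf       -- detections below this confidence are ignored
    """
    counts = Counter(
        label for label, conf in detections
        if label in weights and conf >= min_conf
    )
    weighted = sum(weights[label] * n for label, n in counts.items())
    score = 100.0 * context_factor * weighted / saturation
    return min(100.0, score)  # cap the index at 100


# Illustrative inputs only; labels and weights are not from the paper.
weights = {"billboard": 0.2, "waste_dump": 0.6, "wiring": 0.2}
detections = [
    ("billboard", 0.9),
    ("waste_dump", 0.8),
    ("billboard", 0.7),
    ("wiring", 0.3),  # dropped: below the confidence cutoff
]
print(visual_pollution_index(detections, weights))
print(visual_pollution_index(detections, weights, context_factor=1.5))
```

Capping the score keeps areas comparable on a fixed scale, at the cost of not distinguishing between heavily polluted areas once they exceed the assumed saturation level.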

The framework acknowledges practical challenges: data privacy and labeling costs, model drift and the need for continuous retraining, limited battery and compute resources on edge hardware, variability in lighting and weather conditions, and the difficulty of deriving objective VPI weights. To address these, the authors recommend (a) international consortia to define the taxonomy and share annotated data, (b) semi‑automatic labeling pipelines using weak supervision or active learning, (c) model compression techniques (pruning, quantization, knowledge distillation) for edge efficiency, (d) adaptive inference strategies that adjust confidence thresholds based on environmental cues, and (e) a hybrid VPI calculation that blends expert judgments with statistical measures derived from detection frequencies.
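Recommendation (e) can be pictured as a convex combination of expert weights and observed class frequencies. The sketch below is one possible reading under that assumption; the mixing parameter `alpha`, the function name, and the sample values are hypothetical and are not taken from the paper.

```python
def hybrid_weights(expert_weights, detection_counts, alpha=0.5):
    """Blend expert weights with observed detection frequencies.

    expert_weights   -- per-class weights from expert judgment (positive)
    detection_counts -- how often each class was detected in an area
    alpha            -- assumed mixing factor: 1.0 trusts experts only,
                        0.0 uses detection frequencies only
    """
    total = sum(detection_counts.get(c, 0) for c in expert_weights)
    blended = {}
    for c, w in expert_weights.items():
        # With no detections at all, fall back to the expert weight.
        freq = detection_counts.get(c, 0) / total if total else w
        blended[c] = alpha * w + (1 - alpha) * freq
    # Renormalize so the result sums to 1 even if the inputs did not.
    norm = sum(blended.values())
    return {c: v / norm for c, v in blended.items()}


# Illustrative values only.
expert = {"billboard": 0.5, "waste_dump": 0.5}
counts = {"billboard": 3, "waste_dump": 1}
print(hybrid_weights(expert, counts, alpha=0.5))
```

One caveat with this design is that frequent but mild pollutants gain weight simply by being common, so `alpha` would need to be chosen with care.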

In conclusion, the review demonstrates that deep‑learning‑based UVP detection has matured technologically, yet the field suffers from fragmented data resources, lack of standard evaluation metrics, and limited integration into scalable, city‑wide monitoring systems. The proposed framework aims to bridge these gaps, offering a pathway toward a unified, data‑driven UVP management platform that can improve urban aesthetics, support sustainable city planning, and enhance the psychological well‑being of urban residents.
