Crime Prediction Based On Crime Types And Using Spatial And Temporal Criminal Hotspots

This paper focuses on finding spatial and temporal criminal hotspots. It analyses two real-world crime datasets, one for Denver, CO and one for Los Angeles, CA, and compares them through a statistical analysis supported by several graphs. It then explains how we applied the Apriori algorithm to produce interesting frequent patterns for criminal hotspots. In addition, the paper shows how we used a Decision Tree classifier and a Naive Bayes classifier to predict potential crime types. To analyse the crime datasets further, the paper combines our findings from the Denver crime dataset with the city's demographic information in order to capture the factors that might affect the safety of neighborhoods. The results could be used to raise awareness of dangerous locations and to help agencies predict future crimes in a specific location at a particular time.

💡 Research Summary

The paper presents a data‑driven framework for identifying and predicting urban crime hotspots using spatial‑temporal analysis, frequent‑pattern mining, and supervised classification. Two publicly available crime datasets, one from Denver, CO and one from Los Angeles, CA, cover the years 2015‑2020 and contain over one million incident records. After preprocessing (coordinate conversion to UTM, UTC time normalization, missing‑value imputation, and outlier removal), the authors generate heatmaps and kernel‑density estimates that reveal a pronounced concentration of incidents during late‑night hours (20:00‑24:00) in downtown areas and around transportation hubs in both cities.
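The hotspot idea can be illustrated with a minimal sketch that counts incidents per grid cell and hour of day. This is a simplification, not the authors' pipeline: it swaps kernel‑density estimation for plain grid counts, works directly on latitude/longitude degrees instead of UTM coordinates, and the function name `hotspot_counts`, the `cell_size` default, and the sample records are all hypothetical.

```python
import math
from collections import Counter
from datetime import datetime


def hotspot_counts(incidents, cell_size=0.01):
    """Count incidents per (grid cell, hour of day).

    incidents: iterable of (iso_timestamp, latitude, longitude) tuples.
    cell_size: grid resolution in degrees (hypothetical default, roughly 1 km).
    """
    counts = Counter()
    for timestamp, lat, lon in incidents:
        hour = datetime.fromisoformat(timestamp).hour
        # Snap the coordinate to an integer grid index.
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[(cell, hour)] += 1
    return counts


# Made-up records for illustration only; not taken from either dataset.
sample = [
    ("2016-03-05T22:15:00", 39.7392, -104.9903),
    ("2016-03-06T22:40:00", 39.7395, -104.9907),
    ("2016-03-06T09:00:00", 39.7012, -104.9411),
]
counts = hotspot_counts(sample)
busiest, n_incidents = counts.most_common(1)[0]
print(busiest, n_incidents)
```

Sorting the counter by count gives a ranked list of (cell, hour) buckets, which is the tabular counterpart of a heatmap.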

To uncover recurring crime configurations, the Apriori algorithm is applied with a minimum support of 0.02 and a confidence threshold of 0.6. The resulting association rules, such as “weekend night + specific zone → theft” and “weekday afternoon + high‑income area → vehicle damage,” are visualized in tables and bar charts, offering interpretable insights for law‑enforcement planners. The authors acknowledge that the sheer number of generated rules can be overwhelming and suggest post‑processing filters (lift, conviction) for future work.
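A minimal pure‑Python sketch of level‑wise Apriori mining with support, confidence, and lift is given here for orientation. It is not the authors' implementation: the function names, the item labels, and the toy transactions are hypothetical, and the thresholds in the example are raised from the reported 0.02 support so that the five‑record sample yields a readable result.

```python
from itertools import combinations


def apriori(transactions, min_support=0.02):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that occurs.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose support meets the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (downward closure).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence=0.6):
    """Yield (antecedent, consequent, support, confidence, lift)."""
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    yield antecedent, consequent, support, confidence, lift


# Toy transactions, invented for illustration only.
records = [
    {"weekend_night", "zone_A", "theft"},
    {"weekend_night", "zone_A", "theft"},
    {"weekend_night", "zone_A", "assault"},
    {"weekday_afternoon", "zone_B", "vehicle_damage"},
    {"weekday_afternoon", "zone_B", "theft"},
]
frequent = apriori(records, min_support=0.4)
for a, c, s, conf, lift in association_rules(frequent, min_confidence=0.6):
    print(sorted(a), "->", sorted(c), round(s, 2), round(conf, 2), round(lift, 2))
```

Lift is already computed per rule, so keeping only rules with lift above 1 is one way to apply the kind of post‑processing filter the summary mentions.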

For predictive modeling, the study builds two classifiers: a Decision Tree and a Naive Bayes model. Feature engineering incorporates temporal attributes (day of week, hour), spatial attributes (latitude, longitude, census tract ID), recent incident counts within a 24‑hour window, and one‑hot encoded crime types. Using 5‑fold cross‑validation, the Decision Tree achieves an overall accuracy of 71 % (max_depth = 10, min_samples_leaf = 5), while the Naive Bayes model reaches 66 %. Feature‑importance analysis shows that “time of day,” “location,” and “recent incident density” dominate predictive power. Although the paper reports accuracy, it omits a per‑class breakdown of precision, recall, and F1‑score, which limits any assessment of model robustness, especially for imbalanced crime categories.
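To show why accuracy alone can mislead on imbalanced crime categories, the following sketch computes per‑class precision, recall, and F1 from true and predicted labels. The function name `per_class_report` and the label lists are hypothetical, and the numbers are invented; they do not reproduce the paper's results.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: {"precision", "recall", "f1"}} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed an instance of the true label
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Invented labels: a classifier that always predicts the majority class.
y_true = ["theft"] * 8 + ["assault"] * 2
y_pred = ["theft"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
report = per_class_report(y_true, y_pred)
print(accuracy, report["assault"], report["theft"])
```

In this toy case the accuracy is 0.8 even though no assault is ever detected, which is exactly the failure mode a per‑class breakdown exposes.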

A distinctive contribution is the integration of demographic data for Denver. Census‑derived variables (median household income, population density, educational attainment, and housing‑type ratios) are merged with crime records. Multiple linear regression and logistic regression analyses reveal statistically significant relationships: low‑income, high‑density neighborhoods experience a 1.8‑fold increase in violent crimes, while areas with higher college‑graduation rates see a 30 % reduction in theft incidents. These findings illustrate how socioeconomic factors modulate crime risk and can inform targeted community interventions.
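One simple way to read a fold difference between neighborhood groups is as a ratio of population‑normalized crime rates. The sketch below assumes that interpretation, which the summary does not confirm (the reported figure comes from regression models); the function names, neighborhood labels, and all numbers are hypothetical.

```python
def rate_per_1000(crime_counts, population):
    """Join crime counts to population by neighborhood; return rates per 1,000."""
    return {
        hood: 1000 * crime_counts.get(hood, 0) / residents
        for hood, residents in population.items()
        if residents > 0
    }


def fold_difference(rates, group_a, group_b):
    """Ratio of the mean rate in group_a to the mean rate in group_b."""
    mean_a = sum(rates[h] for h in group_a) / len(group_a)
    mean_b = sum(rates[h] for h in group_b) / len(group_b)
    return mean_a / mean_b


# Invented figures for four fictional neighborhoods.
population = {"N1": 10000, "N2": 20000, "N3": 5000, "N4": 8000}
violent_crimes = {"N1": 100, "N2": 200, "N3": 25, "N4": 40}

rates = rate_per_1000(violent_crimes, population)
ratio = fold_difference(rates, group_a=["N1", "N2"], group_b=["N3", "N4"])
print(rates, ratio)
```

Normalizing by population before comparing groups matters because raw counts would otherwise favor whichever neighborhoods simply have more residents.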

The discussion outlines how the proposed pipeline could be extended to a real‑time crime monitoring system by ingesting streaming feeds from CCTV, 911 calls, or social‑media alerts. The authors propose future exploration of deep learning architectures that capture spatio‑temporal dependencies, such as LSTM‑CNN hybrids or graph neural networks, and suggest policy‑simulation studies to quantify the preventive impact of hotspot‑aware policing. Limitations noted include the lack of detailed hyper‑parameter tuning documentation, potential spatial misalignment between crime and demographic layers, and the absence of a field‑deployment case study.

In summary, the paper successfully combines visualization, association‑rule mining, conventional machine‑learning classifiers, and demographic analysis into a coherent workflow for crime hotspot detection and type prediction. While the methodological foundation is solid, the work would benefit from richer evaluation metrics, clearer reproducibility guidelines, and validation through operational policing scenarios. Nonetheless, it offers valuable actionable insights for law‑enforcement agencies seeking data‑informed strategies to mitigate urban crime.