A Comparative Simulation Study of the Fairness and Accuracy of Predictive Policing Systems in Baltimore City

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Predictive policing systems, such as those deployed in Los Angeles, California and Baltimore, Maryland, are the subject of ongoing debate over their fairness, including concerns about racial bias. Prior studies have attributed this unfairness to feedback loops and to training on historically biased crime records. However, comparative studies of predictive policing systems are few and not sufficiently comprehensive. In this work, we perform a comprehensive comparative simulation study of the fairness and accuracy of predictive policing technologies in Baltimore. Our results suggest that the situation around bias in predictive policing is more complex than previously assumed. While predictive policing exhibited bias due to feedback loops, as previously reported, we found that the traditional alternative, hot spots policing, had similar issues. Predictive policing was fairer and more accurate than hot spots policing in the short term, although it amplified bias faster, suggesting the potential for worse long-run behavior. In Baltimore, the bias in these systems in some cases tended toward over-policing in White neighborhoods, unlike in previous studies. Overall, this work demonstrates a methodology for city-specific evaluation and behavioral comparison of predictive policing systems, showing how such simulations can reveal inequities and long-term tendencies.


💡 Research Summary

This paper presents a comprehensive simulation-based comparison of predictive policing systems and traditional hot‑spot policing within Baltimore, Maryland. The authors begin by contextualizing Baltimore’s unique social and historical landscape—marked by pronounced racial segregation, the 2015 Freddie Gray protests, and the city’s early adoption of predictive policing tools in 2018. They argue that these factors likely embed bias into historical crime records, which in turn can affect algorithmic outcomes.

Using real crime data from 2018‑2019, the study implements two predictive models: a Kernel Density Estimation (KDE) approach and the widely cited PredPol model (an epidemic‑type aftershock sequence, ETAS, framework). For comparison, a conventional hot‑spot policing method—simply allocating officers to the areas with the highest past crime counts—is also constructed using the same dataset.
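The two baselines described above can be sketched minimally as follows. This is an illustrative reconstruction, not the paper's actual implementation: the function names, the use of SciPy's Gaussian KDE with its default bandwidth, and the representation of cells as centroid coordinates are all assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_risk(crime_xy: np.ndarray, grid_xy: np.ndarray) -> np.ndarray:
    """Estimate a per-cell risk surface from past crime locations.

    crime_xy: (2, n) array of past crime coordinates.
    grid_xy:  (2, m) array of grid-cell centroid coordinates.
    Returns a length-m array of estimated crime densities.
    """
    kde = gaussian_kde(crime_xy)   # Gaussian kernel, default bandwidth
    return kde(grid_xy)            # density evaluated at each cell centroid

def hotspot_risk(past_counts: np.ndarray) -> np.ndarray:
    """Hot-spot baseline: rank cells directly by historical crime counts."""
    return past_counts.astype(float)
```

The ETAS/PredPol model would replace `kde_risk` with a self-exciting point-process intensity, but the interface (one risk score per cell, recomputed as new records arrive) is the same.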

The core of the research is a 300‑day agent‑based simulation. Each day, every model generates a risk score for each geographic cell, selects the top‑ranked cells (typically ten) for officer deployment, and then simulates crime occurrences. Detected crimes feed back into the model for the next day’s training, thereby creating a feedback loop that mirrors real‑world deployment. The simulation tracks two primary performance dimensions: (1) accuracy, measured as the proportion of actual crimes that occur within the predicted hot‑spots, and (2) fairness, quantified by disparities in officer allocation across racial and socioeconomic groups (e.g., percentage point differences between Black‑majority and White‑majority neighborhoods).
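One day of the feedback loop described above might look like the sketch below. This is a simplified illustration under stated assumptions: the detection probability, Poisson crime model, and function names are mine, not the paper's; the key point is that only crimes in patrolled cells are recorded and fed back into training.

```python
import numpy as np

def simulate_day(risk, true_rates, history, rng, k=10, detect_p=0.9):
    """One day of the patrol/feedback loop.

    risk:       per-cell risk scores from the current model
    true_rates: latent per-cell daily crime rates (Poisson means)
    history:    per-cell counts of *recorded* crimes so far (updated in place)
    k:          number of cells patrolled (top-k by risk)
    detect_p:   chance a crime in a patrolled cell is recorded;
                crimes in unpatrolled cells go unrecorded (the feedback loop)
    """
    patrolled = np.argsort(risk)[-k:]          # deploy officers to top-k cells
    crimes = rng.poisson(true_rates)           # crimes actually occur everywhere
    detected = np.zeros_like(crimes)
    detected[patrolled] = rng.binomial(crimes[patrolled], detect_p)
    history += detected                        # only recorded crimes retrain the model
    hit_rate = crimes[patrolled].sum() / max(crimes.sum(), 1)
    return patrolled, hit_rate
```

Running this for 300 days, recomputing `risk` from `history` each day, reproduces the qualitative dynamic the paper studies: cells that get patrolled accumulate records, which raises their future risk scores.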

Results show that predictive policing outperforms hot‑spot policing in short‑term accuracy—approximately a 7% higher hit rate during the first 100 days. Fairness metrics also initially favor predictive methods, with an average 3% lower disparity between racial groups. However, the study uncovers a crucial dynamic: the bias amplification rate in predictive models is substantially higher. When the models are retrained every 30 days, the disparity metric begins to rise sharply, and by day 250 the allocation of officers skews toward White‑predominant neighborhoods—a reversal of the pattern reported in earlier studies that emphasized over‑policing of minority areas.
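The allocation-disparity metric tracked in these results (a percentage-point gap in patrol share between demographic groups) can be computed as in the sketch below; the grouping labels and normalization are assumptions for illustration, not the paper's exact definition.

```python
import numpy as np

def allocation_disparity(patrol_counts, group):
    """Percentage-point gap in patrol share between two groups of cells.

    patrol_counts: per-cell count of days each cell was patrolled
    group:         per-cell label, e.g. 1 = Black-majority, 0 = White-majority
    Returns share(group 1) - share(group 0), in percentage points.
    """
    total = patrol_counts.sum()
    share1 = patrol_counts[group == 1].sum() / total
    share0 = patrol_counts[group == 0].sum() / total
    return 100.0 * (share1 - share0)
```

Recomputing this metric at each simulated day yields the time series whose slope distinguishes the slower bias growth of hot-spot policing from the faster amplification of the predictive models.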

Hot‑spot policing, while simpler, also exhibits bias growth, though at a slower initial pace. The authors emphasize that both approaches suffer from feedback‑induced bias, challenging the assumption that traditional methods are inherently more equitable.

A notable methodological contribution is the open‑source simulation framework and accompanying visual analytics, which allow stakeholders to explore temporal trends, geographic distribution of police resources, and the interplay between accuracy and fairness. The authors advocate for the use of such virtual experiments before real‑world deployment, arguing that they can surface hidden inequities and inform mitigation strategies (e.g., periodic bias audits, demographic parity constraints, or hybrid allocation schemes).

The paper acknowledges limitations: the 300‑day horizon may not capture longer‑term equilibria; the simulation abstracts away from operational constraints such as officer fatigue, emergency events, or political pressures; and the model assumes a fixed reporting rate, whereas real‑world under‑reporting may vary dynamically.

In conclusion, the study demonstrates that predictive policing can deliver higher short‑term predictive performance while appearing more fair initially, yet its faster bias amplification poses a risk of long‑term inequity. Policymakers and community stakeholders are urged to adopt continuous monitoring, bias‑mitigation mechanisms, and collaborative decision‑making to ensure that the promise of data‑driven policing does not exacerbate existing social disparities.

