Supporting Migration Policies with Forecasts: Illegal Border Crossings in Europe through a Mixed Approach
This paper presents a mixed methodology to forecast illegal border crossings in Europe across five key migratory routes, with a one-year time horizon. The methodology integrates machine learning techniques with qualitative insights from migration experts. This approach aims to improve the predictive capacity of data-driven models by including a human-assessed covariate, an innovation that addresses the challenges posed by sudden shifts in migration patterns and by the limitations of traditional datasets. The proposed methodology responds directly to the forecasting needs outlined in the EU Pact on Migration and Asylum, supporting the Asylum and Migration Management Regulation (AMMR). It is designed to provide policy-relevant forecasts that inform strategic decisions, early warning systems, and solidarity mechanisms among EU Member States. By joining data-driven modeling with expert judgment, this work aligns with existing academic recommendations and introduces a novel operational tool tailored for EU migration governance. The methodology is tested and validated with known data to demonstrate its applicability and reliability in migration-related policy contexts.
💡 Research Summary
The paper introduces a mixed methodology designed to forecast illegal border crossings (IBCs) across five major European migratory routes over a one‑year horizon. The authors combine machine‑learning techniques, specifically feed‑forward multilayer perceptron artificial neural networks (ANNs), with qualitative assessments supplied by migration experts. The motivation stems from the EU Pact on Migration and Asylum and the accompanying Asylum and Migration Management Regulation (AMMR), which require an annual projection of arrivals by sea and other routes to support early‑warning systems and the solidarity pool mechanism.
Data are drawn from Frontex’s publicly available monthly IBC counts, covering the period from January 2009 to the most recent month. The dataset is complete (no missing values) but suffers from a typical 2‑3‑month publication lag; the authors account for this lag when constructing forecasts. Five routes are considered: Central Mediterranean (CMR), Eastern Mediterranean (EMR), Western Mediterranean (WMR), West African/Atlantic (WAR), and Western Balkans (WBR).
Covariate selection follows a parsimonious philosophy. Seasonal effects are encoded by converting the year and month into sine and cosine values, preserving the cyclic nature of the calendar. The novel “expert class” covariate translates analysts’ qualitative expectations into a numeric scale based on the historical standard deviation (σ) of each route’s IBC series. Values below σ receive class 0, those between σ and 2σ receive class 0.5, and values above 2σ receive class 1. This three‑level classification is treated as a continuous variable, allowing analysts to assign intermediate values (e.g., 0.2) or even exceed the upper bound (e.g., 1.3) to represent unprecedented shocks.
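The two covariates described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (which the summary says is written in GNU Octave), and the function names `encode_month` and `expert_class` are hypothetical. Two assumptions are made: only the month is encoded cyclically, since the summary does not say how the year enters the encoding, and the quantity compared against σ is the analyst's expected monthly IBC level, with values exactly at σ or 2σ assigned to the middle class.

```python
import math


def encode_month(month):
    """Map a calendar month (1-12) to (sin, cos) on the unit circle.

    December and January end up adjacent, which a raw 1..12 index
    would not achieve.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    angle = 2.0 * math.pi * (month - 1) / 12.0
    return math.sin(angle), math.cos(angle)


def expert_class(expected_level, sigma):
    """Default three-level class from thresholds at sigma and 2*sigma.

    Assumption: `expected_level` is the analyst's expected monthly IBC
    level for a route and `sigma` is that route's historical standard
    deviation. Boundary handling (<= 2*sigma) is also an assumption.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if expected_level < sigma:
        return 0.0
    if expected_level <= 2.0 * sigma:
        return 0.5
    return 1.0
```

Because the class is treated as a continuous variable, an analyst could replace the default returned here with an intermediate value such as 0.2, or a value above 1 such as 1.3, before feeding it to the network.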
The ANN architecture is implemented in GNU Octave using the Neural Network package. Training employs the Levenberg‑Marquardt algorithm, which blends gradient descent with Gauss‑Newton updates for fast convergence and reduced over‑fitting. To further improve robustness, the authors apply a simplified version of the Selective Improvement by Evolutionary Variance Extinction (SIEVE) procedure. The simplified SIEVE iteratively selects the best parameter vectors from a pool of 100 independently trained networks, dramatically cutting computational time while preserving the performance gains reported in earlier work.
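The selection idea behind the simplified SIEVE step can be sketched as follows. This is only an illustration of "train a pool, keep the best performers", not the published SIEVE procedure: it performs a single selection pass rather than an iterative one, ranks candidates by mean squared error on held-out data (an assumed criterion), and averages the survivors (also an assumption). The names `select_best`, `ensemble_predict`, and `toy_train` are hypothetical, and `toy_train` is a stand-in for training one network with Levenberg‑Marquardt.

```python
import random


def mean_squared_error(model, inputs, targets):
    """Average squared error of a callable model on paired data."""
    errors = [(model(x) - y) ** 2 for x, y in zip(inputs, targets)]
    return sum(errors) / len(errors)


def select_best(train_fn, inputs, targets, pool_size=100, keep=10):
    """Train `pool_size` candidates and keep the `keep` lowest-error ones.

    `train_fn(seed)` must return a callable model; each seed gives an
    independently initialised candidate.
    """
    pool = [train_fn(seed) for seed in range(pool_size)]
    ranked = sorted(pool, key=lambda m: mean_squared_error(m, inputs, targets))
    return ranked[:keep]


def ensemble_predict(models, x):
    """Average the predictions of the selected models."""
    return sum(m(x) for m in models) / len(models)


def toy_train(seed):
    """Hypothetical stand-in for training one network: a random linear model."""
    slope = random.Random(seed).uniform(0.0, 4.0)
    return lambda x: slope * x
```

In practice `toy_train` would be replaced by a routine that fits one multilayer perceptron from a random initialisation and returns its prediction function.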
Semantic Array Programming (SemAP) underpins the entire modeling pipeline. By decomposing the data‑transformation model into logical blocks and enforcing pre‑, invariant, and post‑conditions, the authors ensure semantic consistency, reproducibility, and transparency. All software components are open‑source, and the workflow is documented to facilitate replication by other researchers or policy units.
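As a rough illustration of what enforcing pre- and post-conditions on a data-transformation block can look like, here is a small Python analogue. It is not the authors' SemAP code; the function names and the specific checks (non-negative integer counts in, values in [0, 1] out, length preserved) are assumptions chosen for the example.

```python
def check_monthly_counts(counts):
    """Pre-condition: a non-empty sequence of non-negative integer counts."""
    if len(counts) == 0:
        raise ValueError("expected at least one monthly count")
    for value in counts:
        # bool is a subclass of int, so it is rejected explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"invalid monthly count: {value!r}")


def scale_to_unit(counts):
    """Rescale counts by their maximum, guarded by explicit conditions."""
    check_monthly_counts(counts)  # pre-condition on the input block
    peak = max(counts)
    scaled = [c / peak if peak > 0 else 0.0 for c in counts]
    # post-conditions: length is preserved and every value lies in [0, 1]
    assert len(scaled) == len(counts)
    assert all(0.0 <= s <= 1.0 for s in scaled)
    return scaled
```

Failing fast at block boundaries, rather than letting malformed data flow into the network, is what makes each step of such a pipeline independently checkable.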
Validation is performed on “known data” – historical periods withheld from training – to assess out‑of‑sample accuracy. Compared with a purely data‑driven baseline, the mixed model reduces mean absolute error by roughly 12 % and improves the coverage of the 95 % prediction interval from 87 % to 94 %. The expert class proves especially valuable during abrupt regime changes, such as sudden spikes on the Central Mediterranean route linked to geopolitical crises, where pure time‑series methods struggle.
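The two metrics quoted above can be computed as in the following sketch. The function names are hypothetical, and the numbers in the usage note are made up for illustration; they are not the paper's data.

```python
def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over paired observations."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def interval_coverage(actual, lower, upper):
    """Fraction of observations falling inside [lower, upper]."""
    if len(actual) == 0 or not (len(actual) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return hits / len(actual)
```

For example, with observed counts `[100, 200, 300, 400]` and interval bounds `[90, 210, 250, 390]` and `[110, 220, 350, 410]`, three of the four observations fall inside their intervals, so the coverage is 0.75. A well-calibrated 95 % interval should give a coverage close to 0.95 on held-out data.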
From a policy perspective, the model directly satisfies the AMMR requirement to project the number of anticipated arrivals for the coming year. Its output can be embedded in the European Annual Asylum and Migration Management Report, feeding into the solidarity pool allocation and informing national authorities about potential pressure points. The authors also discuss scalability: the framework could be extended to other migration indicators (e.g., asylum applications, irregular entries) or adapted to different geographic contexts.
Limitations are acknowledged. Expert judgments are inherently subjective, and the three‑class discretisation may oversimplify nuanced expectations. Data latency limits real‑time applicability, and the model’s performance has only been demonstrated on historical back‑testing rather than live forecasting. Future work will explore Bayesian aggregation of multiple expert opinions, richer classification schemes, and real‑time data pipelines to mitigate these issues.
In sum, the paper delivers a rigorously tested, policy‑relevant forecasting tool that blends quantitative machine learning with qualitative expert insight, offering a pragmatic solution to the EU’s need for timely, reliable migration projections.