Consistent-Point: Consistent Pseudo-Points for Semi-Supervised Crowd Counting and Localization

Crowd counting and localization are important in applications such as public security and traffic management. Existing methods have achieved impressive results, but they depend on extensive and laborious annotation. This paper proposes a novel point-localization-based semi-supervised crowd counting and localization method termed Consistent-Point. We identify and address two inconsistencies of pseudo-points, which have not been adequately explored. To enhance their position consistency, we aggregate the positions of neighboring auxiliary proposal-points. Additionally, an instance-wise uncertainty calibration is proposed to improve the class consistency of pseudo-points. By generating more consistent pseudo-points, Consistent-Point provides more stable supervision to the training process, yielding improved results. Extensive experiments across five widely used datasets and three different labeled ratio settings demonstrate that our method achieves state-of-the-art performance in crowd localization while also attaining impressive crowd counting results. The code will be available.


💡 Research Summary

This paper tackles the problem of noisy pseudo‑labels in semi‑supervised crowd counting and localization, where the teacher model’s predictions on unlabeled images change dramatically during training. Building on the mean‑teacher paradigm and the point‑localization network P2PNet, the authors identify two sources of inconsistency in pseudo‑points: (1) position inconsistency caused by the regression branch, and (2) class inconsistency caused by the classification branch. To mitigate these issues, they introduce two lightweight modules.

The Position Aggregation (PA) module addresses position inconsistency. For each positive pseudo‑point generated by the teacher, PA collects its four nearest auxiliary proposal points (which are usually suppressed as negatives) and averages their coordinates. This simple averaging reduces the variance of the predicted locations, effectively smoothing the pseudo‑points and making them more stable across training iterations.
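The averaging step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function name `aggregate_positions`, the array layout, and the choice to average only the neighbors (without the pseudo-point itself) are hypothetical.

```python
import numpy as np

def aggregate_positions(pseudo_points, proposal_points, k=4):
    """Replace each pseudo-point with the mean of its k nearest proposal points.

    Hypothetical helper: both inputs are (x, y) coordinate arrays, and the
    pseudo-point itself is assumed NOT to be part of the average.
    """
    pseudo = np.asarray(pseudo_points, dtype=float).reshape(-1, 2)       # (N, 2)
    proposals = np.asarray(proposal_points, dtype=float).reshape(-1, 2)  # (M, 2)
    k = min(k, len(proposals))
    if k == 0:
        # No auxiliary proposals available: leave the pseudo-points unchanged.
        return pseudo.copy()
    # Pairwise squared Euclidean distances, shape (N, M).
    diff = pseudo[:, None, :] - proposals[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    # Indices of the k closest proposals for every pseudo-point, shape (N, k).
    nearest = np.argsort(dist2, axis=1)[:, :k]
    # Average the selected proposal coordinates, shape (N, 2).
    return proposals[nearest].mean(axis=1)
```

With four proposals placed symmetrically around a pseudo-point, the aggregated position coincides with the original point; a skewed neighborhood pulls the point toward the local consensus of the proposals.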

The Instance‑wise Uncertainty Calibration (IUC) module tackles class inconsistency. It uses the classification scores output by the teacher’s classification branch as confidence weights for each pseudo‑point. Points with high confidence retain full influence in the student’s loss, while low‑confidence points are down‑weighted, thereby suppressing ambiguous instances that would otherwise introduce noisy supervision.
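A minimal sketch of confidence weighting, assuming the teacher's classification score is used directly as a per-point weight on the student's loss. The function name `weighted_pseudo_loss` and the normalization by the sum of weights are assumptions made for illustration; the summary does not specify the exact weighting formula.

```python
import numpy as np

def weighted_pseudo_loss(per_point_loss, teacher_scores):
    """Confidence-weighted mean of per-point losses on pseudo-labels.

    Hypothetical helper: `teacher_scores` are classification scores in [0, 1],
    one per pseudo-point, used directly as weights (an assumption).
    """
    losses = np.asarray(per_point_loss, dtype=float)
    weights = np.clip(np.asarray(teacher_scores, dtype=float), 0.0, 1.0)
    total = weights.sum()
    if total == 0.0:
        # No pseudo-points, or all of them have zero confidence.
        return 0.0
    # Low-confidence points contribute proportionally less to the loss.
    return float((weights * losses).sum() / total)
```

Under this scheme a pseudo-point with score 0 is ignored entirely, while points with equal scores contribute equally, which reduces to an ordinary mean.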

The overall training pipeline works as follows: labeled images are supervised with the standard P2PNet losses (localization and classification). Unlabeled images are fed to the teacher model; its raw pseudo‑points are first refined by PA, then weighted by IUC, and finally used as supervision for the student model. After each training step, the teacher’s parameters are updated as an exponential moving average (EMA) of the student’s parameters, as in classic mean‑teacher methods.
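The EMA teacher update at the end of each step can be sketched as follows. Parameters are represented here as a plain dictionary of NumPy arrays, and the decay value of 0.99 is an assumed placeholder rather than a setting taken from the paper.

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """Update teacher parameters in place as an EMA of the student's.

    `teacher` and `student` map parameter names to arrays of equal shape.
    The default `decay` is an illustrative assumption.
    """
    for name, student_param in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_param
    return teacher
```

A decay close to 1 makes the teacher change slowly, which is what lets it serve as a more stable source of pseudo-points than the student itself.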

Experiments are conducted on five widely used crowd datasets (ShanghaiTech A/B, UCF‑CC‑50, JHU‑CROWD++, NWPU‑CROWD) under three labeling ratios (1 %, 5 %, 10 %). The proposed Consistent‑Point method consistently outperforms previous state‑of‑the‑art semi‑supervised approaches (e.g., CU, SAL, OT‑M) in both localization (F1‑score, precision, recall) and counting (MAE, MSE). Notably, with only 1 % labeled data, Consistent‑Point improves localization F1‑score by 4–6 % and reduces counting MAE by 0.8–1.2 compared to the best baselines. Ablation studies show that PA alone improves positional stability, IUC alone improves class reliability, and their combination yields the greatest gains.
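For reference, the counting metrics named above can be computed per dataset from predicted and ground-truth counts. This sketch assumes the convention common in crowd counting work, where "MSE" denotes the root of the mean squared count error; the summary does not state which definition the paper uses, and the function name is hypothetical.

```python
import numpy as np

def counting_errors(pred_counts, gt_counts):
    """Return (MAE, MSE) over a set of images.

    MSE is computed as the root mean squared error, an assumed convention.
    """
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    err = pred - gt
    mae = float(np.abs(err).mean())
    mse = float(np.sqrt((err ** 2).mean()))
    return mae, mse
```

For example, predictions of 10 and 20 against ground-truth counts of 12 and 17 give per-image errors of -2 and 3, so the MAE is 2.5.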

Both modules add negligible computational overhead because they consist of simple averaging and scalar weighting operations, preserving the efficiency of the original P2PNet. Qualitative visualizations demonstrate that PA‑smoothed pseudo‑points form tighter clusters around true heads, while IUC effectively filters out uncertain points near dense or ambiguous regions.

In summary, Consistent‑Point offers a principled way to enhance pseudo‑label quality in semi‑supervised point‑based crowd counting, enabling strong performance even when labeled data are extremely scarce. The approach is generic and could be extended to other semi‑supervised tasks that rely on point or keypoint supervision. The authors plan to release their code and explore more sophisticated auxiliary point selection and uncertainty estimation techniques in future work.

