Focal Inverse Distance Transform Maps for Crowd Localization

Dingkang Liang, Wei Xu, Yingying Zhu, Yu Zhou

Huazhong University of Science and Technology
Beijing University of Posts and Telecommunications

Accepted by IEEE Transactions on Multimedia (TMM)

arXiv | PyTorch

Abstract

In this paper, we focus on the crowd localization task, a crucial topic of crowd analysis. Most regression-based methods utilize convolutional neural networks (CNNs) to regress a density map, which cannot accurately locate instances in extremely dense scenes for two crucial reasons: 1) the density map consists of a series of blurry Gaussian blobs, and 2) severe overlaps exist in the dense regions of the density map. To tackle this issue, we propose a novel Focal Inverse Distance Transform (FIDT) map for the crowd localization task. Compared with density maps, FIDT maps accurately describe each person's location without overlap in dense regions. Based on the FIDT maps, a Local-Maxima-Detection-Strategy (LMDS) is derived to effectively extract the center point of each individual. Furthermore, we introduce an Independent SSIM (I-SSIM) loss that encourages the model to learn independent region representations at the feature level, so that it better recognizes local maxima. Extensive experiments demonstrate that the proposed method achieves state-of-the-art localization performance on six crowd datasets and one vehicle dataset. Additionally, we find that the proposed method shows superior robustness on negative and extremely dense scenes, which further verifies the effectiveness of the FIDT maps.
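To make the idea of an inverse-distance map with local-maxima extraction concrete, the following is a minimal sketch in plain Python, not the authors' PyTorch implementation. The abstract does not state the map formula, so the sketch assumes the form 1 / (d^(alpha*d + beta) + c), where d is the distance to the nearest annotated point. The function names `fidt_map` and `local_maxima`, the constants `alpha`, `beta`, `c`, and the thresholds `rel_threshold` and `min_peak` are all illustrative assumptions.

```python
import math


def fidt_map(points, height, width, alpha=0.02, beta=0.75, c=1.0):
    """Build an inverse-distance-style map (assumed form, see lead-in).

    points: list of (row, col) head annotations.
    Each annotated pixel gets the value 1 / c; values decay with the
    distance d to the nearest annotation as 1 / (d**(alpha*d + beta) + c).
    """
    fmap = [[0.0] * width for _ in range(height)]
    if not points:
        # Negative scene (no people): the map stays all zeros.
        return fmap
    for y in range(height):
        for x in range(width):
            # Brute-force nearest-annotation distance; fine for a small demo.
            d = min(math.hypot(y - py, x - px) for py, px in points)
            fmap[y][x] = 1.0 / (d ** (alpha * d + beta) + c)
    return fmap


def local_maxima(fmap, rel_threshold=0.4, min_peak=0.1):
    """Return (row, col) positions that are maxima of their 3x3 window.

    rel_threshold and min_peak are hypothetical values: candidates below
    rel_threshold * (global peak) are dropped, and a map whose global peak
    is below min_peak is treated as containing no instances.
    """
    height, width = len(fmap), len(fmap[0])
    peak = max(max(row) for row in fmap)
    if peak < min_peak:
        return []
    found = []
    for y in range(height):
        for x in range(width):
            value = fmap[y][x]
            if value < rel_threshold * peak:
                continue
            # 3x3 window clipped at the image border (includes the pixel itself).
            window = [
                fmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            if value >= max(window):
                found.append((y, x))
    return found


annotations = [(2, 2), (2, 5), (7, 3)]
peaks = local_maxima(fidt_map(annotations, 10, 10))
print(peaks)  # [(2, 2), (2, 5), (7, 3)]
```

Because each annotated point is a strict peak of such a map, nearby instances remain separable as distinct local maxima, whereas overlapping Gaussian blobs in a density map can merge. In practice the map would be predicted by a network rather than built from ground truth, so the thresholds would need tuning on real predictions.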

Demo of FIDTM
