Abstract: To enhance the persistent tracking capability of drones for moving objects and to overcome the limitations of single-drone systems, this paper proposes a multi-drone multi-object tracking method based on collaborative perception that integrates multi-view projection with the spatiotemporal topology of objects. Using only the position and attitude data of the drones and their onboard photoelectric pods, without relying on image features, rapid projection between views is achieved through a consistency constraint between drone pose and object height. This enables preliminary object association across the dynamic, complementary perspectives of multiple drones. Bidirectional association matching is then performed on the spatiotemporal topological features of objects observed from different viewpoints; these spatial and temporal cues refine the initial associations, improving cross-view object matching accuracy and enhancing tracking robustness in occluded scenarios. To cover occlusions arising during drone maneuvers such as climbing, descending, circling, and rapid motion, a dedicated multi-drone multi-object tracking dataset incorporating pose data (DP-MDMT) was constructed. Experiments in real-world task scenarios show that the proposed method achieves recall, precision, and multi-device association (MDA) scores of 60.2%, 85.6%, and 47.1%, respectively, on the DP-MDMT dataset, improvements of 6.4%, 13.1%, and 7.4% over the MIA-Net algorithm. The tracking metrics, multiple object tracking accuracy (MOTA) and ID F1-score (IDF1), reach 80.1% and 85.1%, respectively, at an average processing rate of 29.7 fps, meeting the real-time requirements of multi-drone ground-object tracking.
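As a rough illustration of the pose-plus-height projection idea summarized above, the sketch below back-projects a detection from one drone's image onto a horizontal plane at an assumed object height and reprojects it into a second drone's view. This is a minimal geometric sketch, not the paper's implementation: the function names, pinhole intrinsics, camera poses, and world-frame conventions are all illustrative assumptions, and the actual system derives poses from the drones' navigation data and pod attitudes.

```python
import numpy as np

def pixel_to_world(pixel, K, R, C, object_height):
    """Back-project a pixel from one drone's camera to a world point.

    The ray through the pixel is intersected with the horizontal plane
    z = object_height; this encodes the pose/object-height consistency
    constraint, i.e. ground objects are assumed to lie at a known height.
    K: 3x3 intrinsics, R: 3x3 camera-to-world rotation, C: camera center.
    """
    u, v = pixel
    ray = R @ np.linalg.inv(K) @ np.array([u, v, 1.0])
    t = (object_height - C[2]) / ray[2]   # ray-plane intersection
    return C + t * ray                    # world coordinates of the object

def world_to_pixel(P, K, R, C):
    """Project a world point into another drone's image plane."""
    p_cam = R.T @ (P - C)                 # world -> camera frame
    p_img = K @ p_cam
    return p_img[:2] / p_img[2]           # perspective division

def project_across_views(pixel, cam_a, cam_b, object_height=0.0):
    """Map a detection from drone A's image into drone B's image."""
    P = pixel_to_world(pixel, *cam_a, object_height)
    return world_to_pixel(P, *cam_b)

if __name__ == "__main__":
    # Illustrative pinhole intrinsics shared by both cameras.
    K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
    # Two downward-looking cameras at 50 m altitude, 10 m apart (assumed).
    R_down = np.array([[1.0, 0, 0], [0, -1.0, 0], [0, 0, -1.0]])
    cam_a = (K, R_down, np.array([0.0, 0.0, 50.0]))
    cam_b = (K, R_down, np.array([10.0, 0.0, 50.0]))
    print(project_across_views((350.0, 260.0), cam_a, cam_b))
```

Because the mapping uses only pose and an assumed object height, it needs no image-feature matching, which is what makes the cross-view projection fast enough to seed the preliminary association step.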