Abstract: Pantograph-catenary arc faults pose a serious threat to the safe and stable operation of high-speed railway and urban rail transit systems. The intense light, high temperature, and electromagnetic interference generated by these arcs accelerate the wear of catenary components, shorten their service life, and can even trigger power supply failures that lead to serious safety incidents. Traditional arc detection methods based on visible light are susceptible to environmental interference such as illumination changes, occlusion, and adverse weather, which reduces their accuracy and robustness and limits their applicability to complex online monitoring scenarios. This paper proposes a multimodal imaging-based arc detection method that integrates visible, infrared, and acoustic signals to improve performance in complex scenes. First, the acoustic signals of the arc are collected by a microphone array and transformed into time-frequency matrices. Next, a noise suppression strategy based on variational inference attenuates environmental background noise while preserving arc-related acoustic information. Time-domain beamforming is then applied to the denoised signals to perform acoustic source imaging and energy focusing, yielding acoustic intensity maps. These acoustic images are registered and spatially aligned with the visible and thermal images to obtain a multimodal representation of arc morphology, and the registered images are fed into a multimodal object detection model that outputs arc locations and confidence scores, completing the detection and localization of the arc fault. To evaluate the method, an acoustic propagation model and an experimental platform were established to analyze the propagation characteristics of arc sources and to systematically verify the effect of the noise suppression strategy on the signal-to-noise ratio and imaging performance. 
Experimental results show that, compared with a single visible-light modality and a visible/infrared bimodal scheme, the proposed multimodal imaging fusion method improves recognition accuracy by 15.9% and 8.1%, respectively, providing an effective solution for robust online detection of pantograph-catenary arcs.
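The acoustic imaging step can be illustrated with a minimal delay-and-sum beamformer in Python with NumPy. The abstract specifies only "time-domain beamforming", so the choice of delay-and-sum, the integer-sample delays, the free-field propagation model without distance attenuation, the assumed speed of sound of 343 m/s, and the helper names `delay_and_sum_map` and `simulate_array` are illustrative assumptions rather than the authors' implementation. The variational noise suppression and image registration stages are not modeled. For each candidate grid point, the sketch computes the propagation delay to every microphone, shifts each channel so that a source at that point would line up, averages the channels, and takes the mean output power as that point's acoustic intensity.

```python
import numpy as np


def delay_and_sum_map(signals, fs, mic_pos, grid, c=343.0):
    """Time-domain delay-and-sum intensity map (illustrative sketch).

    signals : (M, N) array of microphone samples, one row per channel
    fs      : sampling rate in Hz
    mic_pos : (M, D) microphone coordinates in metres
    grid    : (G, D) candidate source positions in metres
    c       : assumed speed of sound in m/s
    Returns a length-G array with the beamformed output power per grid point.
    """
    signals = np.asarray(signals, dtype=float)
    mic_pos = np.asarray(mic_pos, dtype=float)
    num_mics, num_samples = signals.shape
    power = np.zeros(len(grid))
    for g, point in enumerate(np.asarray(grid, dtype=float)):
        # Propagation distance from the candidate point to every microphone.
        dist = np.linalg.norm(mic_pos - point, axis=1)
        # Integer-sample arrival delays relative to the earliest-arriving mic.
        lags = np.round(dist / c * fs).astype(int)
        lags -= lags.min()
        usable = num_samples - lags.max()
        if usable <= 0:
            continue  # recording too short to steer towards this point
        # Skip each channel's extra delay so a source at `point` lines up.
        aligned = np.stack(
            [signals[m, lags[m]:lags[m] + usable] for m in range(num_mics)]
        )
        beam = aligned.mean(axis=0)  # coherent average across channels
        power[g] = np.mean(beam ** 2)
    return power


def simulate_array(source, src_pos, mic_pos, fs, c=343.0, noise=0.1, seed=0):
    """Hypothetical test harness: delayed copies of `source` plus white noise.

    Free-field model with integer-sample delays and no distance attenuation.
    """
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(np.asarray(mic_pos) - np.asarray(src_pos), axis=1)
    lags = np.round(dist / c * fs).astype(int)
    out = noise * rng.standard_normal((len(dist), len(source) + lags.max()))
    for m, k in enumerate(lags):
        out[m, k:k + len(source)] += source
    return out


if __name__ == "__main__":
    fs = 48_000
    angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
    mics = 0.3 * np.column_stack([np.cos(angles), np.sin(angles)])  # 8-mic ring
    grid = np.array([[x, y] for x in np.linspace(-2.0, 2.0, 9)
                     for y in np.linspace(0.5, 3.0, 6)])
    arc = np.random.default_rng(1).standard_normal(4_800)  # broadband stand-in
    recording = simulate_array(arc, [1.0, 2.0], mics, fs)
    power = delay_and_sum_map(recording, fs, mics, grid)
    print("peak at", grid[np.argmax(power)])
```

Because a broadband source adds coherently only when the steering delays match its true position, the simulated intensity map should peak at the grid point of the source. A practical system would likely add fractional-delay interpolation and define the grid over the cameras' field of view so the resulting map can be registered with the visible and infrared images.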