Abstract: Synthetic Aperture Radar (SAR) and optical images capture surface features through distinct imaging mechanisms, providing highly complementary information for land classification research with significant application value. However, the existing MCANet-CM algorithm struggles to capture target contours effectively in multimodal data during cross-modal feature interaction, so the fused features lack the spatial detail needed to represent object boundaries in complex scenes; effectively integrating the two modalities for precise pixel-level classification therefore remains a critical challenge. To address this issue, this paper proposes a multimodal remote sensing image semantic segmentation algorithm based on an improved MCANet-CM. The algorithm introduces the DyCPCA attention mechanism, which dynamically calibrates inter-channel dependencies to adaptively enhance feature responses related to target contours, significantly improving the model's ability to capture fine-grained information from multimodal data. In addition, a Rectangular Self-Calibration Module is incorporated, which constructs an asymmetric receptive field to strengthen the model's perception of edge information along different orientations, markedly improving localization accuracy for foreground objects. Through the synergistic operation of these two modules, effective fusion of optical and SAR data is achieved. Experiments on the WHU-OPT-SAR dataset show that, compared with the baseline MCANet-CM model, the improved model achieves gains of 2.85% and 2.81% in mean Intersection over Union (mIoU) and mean F1-score, respectively. Compared with state-of-the-art algorithms such as FTransUNet, the proposed model also exhibits superior segmentation performance.
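To make the roles of the two modules concrete, the following is a minimal PyTorch sketch of the general ideas the abstract describes: a channel-attention block that dynamically recalibrates inter-channel responses, and a rectangular self-calibration block that builds an asymmetric receptive field from horizontal and vertical strip pooling. All class names, layer choices, and hyperparameters here are illustrative assumptions; the paper's actual DyCPCA and Rectangular Self-Calibration Module designs are not specified in this abstract.

```python
import torch
import torch.nn as nn


class DyCPCASketch(nn.Module):
    """Hypothetical sketch of dynamic channel-prior attention:
    global pooling -> small bottleneck MLP -> per-channel gates that
    recalibrate feature responses. Structure is an assumption, not
    the paper's DyCPCA implementation."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-channel gates dynamically reweight (calibrate) channels.
        return x * self.mlp(self.pool(x))


class RectangularSelfCalibrationSketch(nn.Module):
    """Hypothetical sketch of rectangular self-calibration: horizontal
    and vertical strip pooling form an asymmetric (rectangular)
    receptive field whose broadcast sum gates the input spatially."""

    def __init__(self, channels: int):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool across width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool across height -> (B, C, 1, W)
        self.conv = nn.Conv2d(channels, channels, kernel_size=3,
                              padding=1, groups=channels)  # depthwise refinement
        self.act = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcasting the two strip-pooled maps together yields a
        # rectangular attention region; the gate modulates the input.
        attn = self.act(self.conv(self.pool_h(x) + self.pool_w(x)))
        return x * attn


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)  # toy fused optical+SAR feature map
    y = RectangularSelfCalibrationSketch(64)(DyCPCASketch(64)(x))
    print(y.shape)  # torch.Size([2, 64, 32, 32])
```

In this sketch, the broadcast sum of the two strip-pooled maps is one common way to realize an orientation-sensitive receptive field: the horizontal and vertical strips respond to edges along different directions, which matches the abstract's stated goal of improving edge perception and foreground localization.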