Abstract: Current malicious URL detection models rely on a single type of feature and detect poorly when URLs have complex structures and diverse character combinations. To address this, this paper proposes a malicious URL detection model based on multi-scale attention feature fusion. First, character embeddings and DistilBERT encode the URL string at the character level and the word level, respectively, yielding both character-level and word-level feature representations. Next, an improved convolutional neural network (CNN) extracts multi-scale character structural features and word-level semantic features, and a bidirectional long short-term memory (BiLSTM) network further extracts deep sequential features. Finally, an attention feature fusion (AFF) module dynamically fuses the multi-scale character-level and word-level features, reducing information redundancy and strengthening the extraction of long-range sequence features. Experiments on datasets including ISCX-URL2016 show that the proposed model outperforms the baseline models, improving accuracy by 0.32% to 4.7% and F1 score by 0.46% to 5.5%.
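The abstract does not specify the internals of the AFF module. The sketch below is an illustration only. It assumes the formulation from Dai et al., "Attentional Feature Fusion" (2021). In that formulation, two feature maps X and Y are first summed. A multi-scale channel attention M (a local per-position bottleneck plus a global average-pooled bottleneck, passed through a sigmoid) then produces weights, and the fused output is M * X + (1 - M) * Y. Batch normalization is omitted here. The names `aff_fuse`, `bottleneck` and `random_weights` are hypothetical. The random weights stand in for learned parameters, and the random feature maps stand in for the character-level and word-level CNN/BiLSTM outputs.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]  # feature map of shape (channels, sequence length)


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def bottleneck(v: List[float], w_down: Matrix, w_up: Matrix) -> List[float]:
    """Pointwise (1x1) bottleneck: C -> C/r -> C with ReLU in between."""
    hidden = [max(0.0, h) for h in matvec(w_down, v)]
    return matvec(w_up, hidden)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def aff_fuse(x: Matrix, y: Matrix,
             local_w: Tuple[Matrix, Matrix],
             global_w: Tuple[Matrix, Matrix]) -> Matrix:
    """Hypothetical AFF: Z = M(x + y) * x + (1 - M(x + y)) * y.

    M is a sigmoid over the sum of a per-position (local) channel context
    and a sequence-averaged (global) channel context, as in Dai et al.
    """
    channels, length = len(x), len(x[0])
    s = [[x[c][t] + y[c][t] for t in range(length)] for c in range(channels)]
    # Global context: average each channel over all positions, then bottleneck.
    pooled = [sum(row) / length for row in s]
    g = bottleneck(pooled, *global_w)
    fused = [[0.0] * length for _ in range(channels)]
    for t in range(length):
        # Local context: bottleneck applied to the channel vector at position t.
        local = bottleneck([s[c][t] for c in range(channels)], *local_w)
        for c in range(channels):
            m = sigmoid(local[c] + g[c])  # attention weight in (0, 1)
            fused[c][t] = m * x[c][t] + (1.0 - m) * y[c][t]
    return fused


def random_weights(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Stand-in for learned parameters (illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Usage: fuse stand-in character-level and word-level feature maps.
rng = random.Random(0)
C, T, r = 4, 6, 2  # channels, sequence length, bottleneck reduction ratio
char_feats = random_weights(C, T, rng)
word_feats = random_weights(C, T, rng)
local_w = (random_weights(C // r, C, rng), random_weights(C, C // r, rng))
global_w = (random_weights(C // r, C, rng), random_weights(C, C // r, rng))
out = aff_fuse(char_feats, word_feats, local_w, global_w)
print(len(out), len(out[0]))  # 4 6
```

The weights are a sigmoid output in (0, 1), so each fused value is a convex combination of the two inputs at that channel and position. This is how such a module can adaptively favor one feature source over the other, rather than simply concatenating both.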