
Editor in chief:Prof. Sun Shenghe
Inauguration:1980
ISSN:1002-7300
CN:11-2175/TN
Domestic postal code:2-369
Jiang Chenxin , Xiao Hao , Xu Han , Zhu Jiaoyang
2025, 48(19):1-9.
Abstract:The Hough Transform is a widely used line detection method with strong robustness to interference. However, its high computational complexity and large storage requirements make hardware deployment challenging. This study proposes an improved algorithm based on the hierarchical Hough Transform, which decomposes a single transform into two stages. The first stage operates on a downsampled image, which reduces the storage demand of the first-level voting unit. The storage range of the second stage's voting unit is bounded by the parameters obtained from the first stage, which addresses the high storage requirement of hardware deployment. In addition, the transform is reformulated using trigonometric identities so that each stage can be implemented as a parallel pipeline, improving computational efficiency. A hierarchical pipelined Hough Transform hardware architecture has been implemented on an FPGA. Experimental results show that, compared with the classic Hough Transform hardware architecture, the proposed architecture reduces on-chip RAM usage by 89.8% and improves detection accuracy by 39.94%. At a clock frequency of 100 MHz, it detects lines in a 1024×1024 image in 13.11 ms, significantly faster than software-based Hough Transform line detection.
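The two-stage idea can be illustrated in software. The following is a minimal Python sketch, not the authors' FPGA design: the downsampling factor, the 4-degree coarse angle grid, the window sizes, and the function names `hough_peak` and `hierarchical_hough` are all assumptions made for illustration. Stage 1 votes on the downsampled points; stage 2 votes at full resolution only inside a small (theta, rho) window around the stage-1 peak, so its accumulator stays small.

```python
import math

def hough_peak(points, thetas, rho_min, rho_step, n_rho):
    """Vote in a (theta, rho) accumulator; return (theta, rho, votes) of the best cell."""
    acc = [[0] * n_rho for _ in thetas]
    for x, y in points:
        for ti, theta in enumerate(thetas):
            rho = x * math.cos(theta) + y * math.sin(theta)
            ri = int(round((rho - rho_min) / rho_step))
            if 0 <= ri < n_rho:
                acc[ti][ri] += 1
    votes, ti, ri = max(
        (acc[t][r], t, r) for t in range(len(thetas)) for r in range(n_rho)
    )
    return thetas[ti], rho_min + ri * rho_step, votes

def hierarchical_hough(points, size, factor=4, coarse_deg=4):
    """Coarse-to-fine line detection on edge points of a size x size image."""
    # Stage 1: vote on the downsampled image with a coarse angle grid.
    small = {(x // factor, y // factor) for x, y in points}
    diag = int(math.ceil(math.hypot(size, size) / factor))
    coarse_thetas = [math.radians(d) for d in range(0, 180, coarse_deg)]
    theta_c, rho_c, _ = hough_peak(small, coarse_thetas, -diag, 1.0, 2 * diag + 1)

    # Stage 2: vote at full resolution, restricted to a window around the
    # stage-1 estimate (1-degree angle steps, +/- 2*factor pixels in rho).
    center_deg = int(round(math.degrees(theta_c)))
    fine_thetas = [
        math.radians(d)
        for d in range(center_deg - coarse_deg, center_deg + coarse_deg + 1)
    ]
    half = 2 * factor
    return hough_peak(points, fine_thetas, rho_c * factor - half, 1.0, 2 * half + 1)
```

With these assumed settings, the stage-2 accumulator for a 64×64 image has 9×17 cells, whereas a single full-resolution accumulator with 1-degree steps would need 180 angle rows over the full rho range.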
Tang Dinglong , Wang Heng , Feng Jiao , Xie Shijun
2025, 48(19):10-17.
Abstract:In dynamic satellite communication systems with adjustable user coding and modulation modes, an inappropriate choice of coding and modulation mode by a user limits the bandwidth available when the system allocates resources. To solve this problem, the system assigns coding and modulation modes according to the actual service requirements of each user and improves resource utilization through the joint allocation of power and coding and modulation modes. This paper first analyzes the links of the satellite communication system, builds a mathematical model for the joint allocation of power and coding and modulation modes, and proposes a hybrid gravitational search and particle swarm optimization algorithm to solve it. To improve the performance of the algorithm on the constrained optimization problem, a dynamic inertia weight is introduced to keep the algorithm from becoming trapped in local optima, and a penalty function is added to handle the constraints. Simulation results show that, compared with particle swarm optimization or gravitational search alone, the proposed hybrid algorithm reduces the total second-order service rejection of the system and effectively improves the total system capacity.
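The abstract does not give the form of the inertia weight or the penalty term. As a rough illustration only, the Python sketch below uses two common textbook choices: a linearly decreasing inertia weight and a quadratic penalty on violated inequality constraints g(x) <= 0. The names `inertia_weight` and `penalized`, and the default constants, are hypothetical and may differ from the paper's actual scheme.

```python
def inertia_weight(iteration, max_iter, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight (assumed form): explore early, refine late."""
    return w_max - (w_max - w_min) * iteration / max_iter

def penalized(objective, constraints, rho=1e3):
    """Wrap an objective with a quadratic penalty for constraints written as g(x) <= 0."""
    def wrapped(x):
        violation = sum(max(0.0, g(x)) ** 2 for g in constraints)
        return objective(x) + rho * violation
    return wrapped

# Example: minimize (x - 3)^2 subject to x <= 1.
cost = penalized(lambda x: (x[0] - 3.0) ** 2, [lambda x: x[0] - 1.0])
```

A swarm or gravitational search loop would then minimize `cost` directly, so infeasible candidates are discouraged without a separate feasibility check.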
Wang Chaoyong , Ge Junxiang , Wang Jie , Liu Hengchao , Zhou Jiacheng
2025, 48(19):18-24.
Abstract:This paper presents the design of a low-profile, polarization-reconfigurable broadband substrate integrated waveguide (SIW) antenna with a thickness of approximately 2.5%λ. A circular slot introduced on the SIW cavity excites two radiating resonances. Adjusting the via positions within the SIW cavity controls these two resonant frequencies and thereby extends the operating bandwidth of the low-profile antenna. In addition, controlling the RF diodes mounted on the antenna surface manipulates the two resonances and enables switching between left-hand and right-hand circular polarization. Simulation and experimental results show that the antenna, with a thickness of only 2.5%λ, achieves a relative operating bandwidth of 6.4%. Within the frequency range of 5 to 5.3 GHz, the antenna gain exceeds 5 dBi. The antenna can be reconfigured between left-hand and right-hand circular polarization within 5.17±0.035 GHz, with a polarization isolation better than 15 dB.
Yang Xuhong , Ding Chuanhao , Qian Fengwei , Xu Qingguo , Wang Shun
2025, 48(19):25-35.
Abstract:To improve the disturbance rejection and tracking speed of modular multilevel converters (MMCs) under external disturbances, an MMC control strategy based on improved terminal sliding mode active disturbance rejection control is proposed. First, the mathematical model of the MMC is established, and a cascaded nonlinear extended state observer is designed to estimate the disturbances in real time and feed them into the system model. The linear error state feedback controller adopts terminal sliding mode control, using the observed disturbances and state errors to compensate for the effect of disturbances on system performance and to improve closed-loop stability and robustness. To avoid the singularity problem, the sliding surface takes the integral terminal sliding mode form. To reduce the chattering of traditional sliding mode control, a new variable-gain exponential reaching law is adopted, which weakens the chattering caused by the high gain of the discontinuous term. Finally, simulation experiments are conducted in MATLAB: under the proposed control strategy, the grid-connected current stabilizes at 0.015 s and the THD is 1.85%, which meets grid-connection requirements. The proposed strategy is compared with terminal sliding mode control, adaptive control, and PI control under sudden power changes and grid voltage dips. The results show that the proposed strategy provides good disturbance rejection and tracking speed.
Hu Yu , Li Denghua , Ding Yong
2025, 48(19):36-43.
Abstract:Monocular visual measurement lacks depth information, which makes it difficult to accurately calculate the three-dimensional deformation of cracks. To address this issue, this paper proposes a three-dimensional crack measurement method that integrates multiple coordinate systems. By designing a concentric-circle array target and establishing an equivalent displacement model for cracks, the problem of measuring the three-dimensional deformation of a crack is transformed into that of measuring the three-dimensional deformation of the main and auxiliary plates of the target. Images captured by a camera undergo density clustering and eccentricity correction to obtain a set of feature points. The EPnP algorithm is then used to obtain the projection matrix, and the least squares method is used to reconstruct the auxiliary-plate point set in three dimensions. The coordinates of the auxiliary-plate point set in a given world coordinate system are obtained, and the change in these coordinates before and after deformation gives the three-dimensional deformation of the crack. The accuracy, robustness, and generalization ability of the proposed method were verified through three-axis sliding table tests. The results show that the maximum deviation of the algorithm is 0.35 mm under indoor conditions, with measurement errors in all three directions within ±0.35 mm, and that a measurement accuracy of ±0.4 mm is maintained under on-site test conditions, meeting the requirement of crack measurement standards (±0.5 mm).
Deng Guang , Yao Jiangyun , Wang Kuantian , Chen Guoqing
2025, 48(19):44-50.
Abstract:When a wheel-legged robot crosses an obstacle, switching between wheeled and legged modes makes its dynamic model highly nonlinear. Existing linear control methods cannot accurately describe this nonlinearity, which results in poor control performance. Therefore, an obstacle-crossing control method for wheel-legged robots with high motion performance in complex environments is proposed. Based on an analysis of the forces acting on the robot during obstacle crossing, the method takes the angular velocity of the motors that drive the wheel-legs as the key control variable. It then analyzes the gait of the robot during obstacle crossing to obtain the obstacle-crossing position error, which is fed into a fuzzy cascade PID controller. An adaptive scaling factor is introduced to tune the PID controller parameters to the nonlinear dynamics of the robot during obstacle crossing, and the motor angular velocity adjustment is generated dynamically according to the obstacle environment to achieve obstacle-crossing control. Experiments show that the method enables the wheel-legged robot to cross steps, slopes, gullies, and multiple compound obstacles. During control, the slip rate of the robot can be kept below 0.1%, the operational stability is higher than 95%, and the trajectory of the robot's center of mass changes relatively smoothly, which indicates that the method achieves stable obstacle-crossing control and can support further development of wheel-legged robots for obstacle crossing.
Ran Yejun , Jin Liangqiong , Luo Shuxia , Li Qiongyi , Tao Yong
2025, 48(19):51-59.
Abstract:In fine-grained vehicle recognition, deep learning faces a challenge: new car models are constantly being introduced, but the ability to collect and annotate data is limited, which leads to the problem of few-shot class-incremental learning (FSCIL). In response, this article proposes a prompt-based FSCIL method that aims to enable the model to recognize existing categories and learn new categories from a small number of samples of new vehicle categories, without retraining or relying on a large amount of the original data. The method combines the advantages of prompt mechanisms and a pre-trained Vision Transformer (ViT) model. Two types of prompts, domain prompts and FSCIL prompts, are designed to address the challenges in FSCIL. In class-incremental learning, the average accuracy on the Stanford Cars and CompCars datasets reaches 70.47% and 73.56%, respectively, which is superior to existing methods.
Wu Weilin , Pan Zhiqiang , Lin Meihuan , Huang Hongben
2025, 48(19):60-68.
Abstract:A consensus control method based on distributed observer is proposed for a class of multi-agent systems with actuator faults and external disturbances. This method estimates the state and fault information of the system through a distributed observer, and designs a consensus control protocol, which can estimate the state and fault of the system under the condition of actuator failure and external disturbance, and realize the fault-tolerant consensus of the system. Firstly, based on the output information of the agent subsystem and the state estimation information of the adjacent subsystems, a distributed fault observer is constructed, and a new Lyapunov function is designed to obtain a sufficient condition for the global dynamic stability error convergence of the system. It is proved that the gain matrix of the designed observer can effectively estimate the system state and fault information. Secondly, based on the fault estimation results, a consensus control protocol based on output feedback is proposed, which can effectively compensate for actuator faults and suppress the influence of external disturbances on system stability. Finally, the feasibility and effectiveness of the proposed method are verified by the simulation experiment of the UAV system. The results show that the designed control method can achieve fast convergence and fault-tolerant consistency of the system, and has good robustness.
2025, 48(19):69-76.
Abstract:The traditional binocular Semi-Global Matching (SGM) algorithm is computationally complex and demands significant computational resources, making it difficult to meet the real-time and low-power requirements of small-scale embedded systems. To address this issue, this paper proposes an improved FPGA-based solution that aims to enhance the real-time performance and resource utilization of the stereo SGM algorithm while reducing resource overhead. The improved SGM algorithm adjusts the direction of cost aggregation to align with the data flow direction of the FPGA, enabling four-path parallel computation. In the disparity calculation phase, a binomial-based subpixel interpolation technique is introduced so that disparity computation and optimization proceed simultaneously, which reduces computation delay and further reduces resource consumption and system power usage. Experimental results show that, compared with the traditional SGM algorithm, the proposed method reduces the average disparity error by 32.4%, improves LUT resource utilization by 45%, decreases resource consumption by 25%, achieves a matching rate of 65.3 fps, and keeps system power consumption at only 2.85 W, meeting the requirements of small-scale real-time embedded systems.
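For readers unfamiliar with SGM, the sketch below shows the standard cost aggregation recurrence along a single path, followed by a three-point parabola fit for subpixel disparity. It is a software illustration in Python, not the paper's FPGA pipeline: the penalties `p1` and `p2` are arbitrary, only one of the four paths is shown, and the parabola fit is a common choice that may differ from the "binomial-based" interpolation the authors describe.

```python
def aggregate_path(costs, p1=1, p2=4):
    """Aggregate matching costs along one scanline, left to right.

    costs[x][d] is the matching cost of pixel x at disparity d.
    Recurrence: L(x, d) = C(x, d) + min(L(x-1, d), L(x-1, d±1) + p1,
                                        min_k L(x-1, k) + p2) - min_k L(x-1, k)
    """
    agg = [list(costs[0])]
    for x in range(1, len(costs)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d, c in enumerate(costs[x]):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < len(prev):
                candidates.append(prev[d + 1] + p1)
            row.append(c + min(candidates) - prev_min)
        agg.append(row)
    return agg

def subpixel_disparity(cost_row):
    """Refine the winning integer disparity with a parabola through its neighbors."""
    d = min(range(len(cost_row)), key=cost_row.__getitem__)
    if d == 0 or d == len(cost_row) - 1:
        return float(d)  # no neighbor on one side, keep the integer value
    c_m, c_0, c_p = cost_row[d - 1], cost_row[d], cost_row[d + 1]
    denom = c_m - 2 * c_0 + c_p
    if denom == 0:
        return float(d)  # flat cost curve, nothing to refine
    return d + (c_m - c_p) / (2.0 * denom)
```

In a full implementation the aggregated costs of all paths would be summed per pixel before `subpixel_disparity` is applied.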
Chang Shengjun , Hao Runfang , Cheng Yongqiang , Yang Kun , Bai Yunpeng
2025, 48(19):77-85.
Abstract:NOx produced during biomass combustion in thermal power plants causes serious environmental pollution, and accurate prediction of NOx emissions is crucial to reducing it. NOx emission prediction models built with traditional data-driven methods extract deep feature information insufficiently and have poor robustness. To address these problems, a hybrid NOx emission prediction model based on flame imaging, SADAE-MSViL, is proposed. First, a self-attention mechanism is introduced into the adversarial denoising autoencoder to extract deep image features and effectively remove noise interference. Second, a multi-scale feature fusion mechanism combining scales 8 and 16 is designed to capture the flame frequency-domain information of image blocks at different scales. Finally, Linformer is improved by integrating a gated low-rank attention mechanism, which improves NOx emission prediction accuracy while preserving the operating efficiency of the model. Experimental results show that the model reaches an R² of 0.98 and an RMSE of 3.0. Its prediction accuracy is better than that of other models, and it shows high robustness and reliability.
Ling Yuxin , Zhang Tianqi , Sun Haoyuan , Zou Han
2025, 48(19):86-94.
Abstract:In high-speed satellite communication systems, the problem of inter-symbol interference (ISI) becomes increasingly prominent as the data transmission rate increases. To effectively mitigate this issue, this paper proposes a blind equalization algorithm that first performs timing synchronization followed by a dual-mode switching mechanism. The proposed algorithm initially employs timing synchronization to resample the received signal, ensuring accurate symbol boundary alignment. Subsequently, a dual-mode switching strategy is implemented: first, the Modified Constant Modulus Algorithm (MCMA) is applied for preliminary equalization, accelerating convergence and ensuring the correct convergence direction; then, the algorithm switches to an improved Decision-Directed (DD) equalization scheme to achieve superior steady-state performance. By integrating timing synchronization with the dual-mode switching mechanism, this approach leverages the rapid convergence property of MCMA and the high-precision equalization capability of the DD algorithm, making it particularly suitable for the complex and dynamic channel conditions encountered in satellite communications. Through comparative analyses of constellation diagrams, mean squared error (MSE), and ISI across different modulation schemes and channel conditions, simulation results demonstrate that the proposed algorithm significantly enhances constellation clarity and compactness. The MSE is reduced by up to 18.77 dB, while ISI suppression reaches a maximum improvement of 14.32 dB, thereby significantly improving the overall performance of the communication system.
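The dual-mode switch described above can be sketched with the two error terms involved. The Python fragment below is only an assumption-laden illustration: it uses the standard MCMA error (real and imaginary parts treated separately), a plain decision-directed error for unit-energy QPSK rather than the paper's improved DD scheme, and a hypothetical switching rule based on a running MSE threshold. All function names and the threshold value are invented for the example.

```python
def mcma_error(y, r_re, r_im):
    """MCMA error for equalizer output y; r_re and r_im are the modulus constants."""
    return complex(y.real * (y.real ** 2 - r_re), y.imag * (y.imag ** 2 - r_im))

def qpsk_decision(y):
    """Nearest unit-energy QPSK symbol."""
    s = 2 ** -0.5
    return complex(s if y.real >= 0 else -s, s if y.imag >= 0 else -s)

def dd_error(y):
    """Decision-directed error: output minus the nearest constellation point."""
    return y - qpsk_decision(y)

def dual_mode_error(y, running_mse, threshold=0.05, r_re=0.5, r_im=0.5):
    """Use MCMA while the running MSE is high, then switch to DD (assumed rule)."""
    if running_mse > threshold:
        return mcma_error(y, r_re, r_im)
    return dd_error(y)

def update_taps(taps, window, err, mu):
    """Stochastic-gradient tap update for an output y = sum(w * x)."""
    return [w - mu * err * x.conjugate() for w, x in zip(taps, window)]
```

For QPSK symbols with components ±1/√2, the modulus constants are E[a⁴]/E[a²] = 0.5 per component, which is why 0.5 is used as the default.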
Zhang Shuzhao , Peng Liqiang , Guo Akang , Wang Lixin
2025, 48(19):95-105.
Abstract:To address the low detection accuracy and the false and missed detections of small arcs in existing pantograph arcing detection models, a lightweight pantograph arcing detection algorithm based on improved YOLOv8, RIL-YOLO, is proposed. First, combining the RepConv module with the GhostNet idea, a lightweight feature extraction module, RELAN, is designed to reduce the number of parameters and computations while maintaining the model's arc feature extraction performance. Second, to address missed detections of small arcs, a small target detection module is added, and a weighted bidirectional feature pyramid network structure is used to achieve a higher level of feature fusion and improve the model's ability to detect small targets. Because the small target detection module greatly increases the computational cost, the neck network is reconstructed; the reconstructed IBiFPN structure increases the computation by only 0.3G while preserving model accuracy. Finally, a lightweight detail-enhancement detection head is designed to replace the YOLOv8 detection head, which improves the model's ability to capture detailed features while reducing model parameters. The results show that, compared with the YOLOv8n model, RIL-YOLO improves AP@0.5 and AP@0.5:0.95 by 5.2% and 3.7%, respectively, while reducing the number of model parameters by 66% and the computation by 13.6%. The detection speed reaches 112.4 fps, which enables rapid and accurate detection of pantograph arcs. The method provides a reference for real-time detection of pantograph arcing.
Li Zhangpei , Zhang Tianqi , Sun Haoyuan , Zhong Yang
2025, 48(19):106-114.
Abstract:Current subcarrier modulation recognition methods for non-cooperative multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems suffer from insufficient recognition accuracy at low signal-to-noise ratio (SNR) and are limited in recognizing higher-order modulation schemes. To address these issues, this paper proposes a modulation recognition algorithm based on feature fusion. First, the received signal is preprocessed. Then the in-phase and quadrature (I/Q) components of the signal are extracted, and multiple features, including the wavelet transform, fourth-power spectrum, higher-order cumulants, and zero-centered normalized instantaneous amplitude, are computed as input features. These features are fed into a neural network for training. Finally, the modulation scheme of the MIMO-OFDM subcarriers is classified. Experimental results demonstrate that the proposed algorithm effectively recognizes six modulation types (BPSK, QPSK, 8PSK, 16QAM, 64QAM, and 128QAM), achieving a recognition accuracy of 90% at an SNR of 6 dB.
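Of the features listed, higher-order cumulants are the easiest to show concretely. The Python sketch below estimates the fourth-order cumulants C40 and C42 from complex baseband samples using the standard moment formulas for zero-mean signals. The function name `cumulants` is hypothetical, and the abstract does not say which cumulant orders the authors actually use.

```python
def cumulants(samples):
    """Estimate (C40, C42) of zero-mean complex samples from sample moments."""
    n = len(samples)
    m20 = sum(x * x for x in samples) / n          # E[x^2]
    m21 = sum(abs(x) ** 2 for x in samples) / n    # E[|x|^2]
    m40 = sum(x ** 4 for x in samples) / n         # E[x^4]
    m42 = sum(abs(x) ** 4 for x in samples) / n    # E[|x|^4]
    c40 = m40 - 3 * m20 ** 2
    c42 = m42 - abs(m20) ** 2 - 2 * m21 ** 2
    return c40, c42

# Noise-free, unit-power constellations as idealized inputs.
bpsk = [1 + 0j, -1 + 0j]
qpsk = [1 + 0j, 1j, -1 + 0j, -1j]
```

For these ideal constellations the estimates separate cleanly (BPSK gives C40 = C42 = -2, QPSK gives C40 = 1 and C42 = -1), which is what makes such values usable as classifier inputs. With noise and finite sample counts they only approximate these figures.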
Wang Shuai , Shen Jiewen , Xu Bin , Zhu Zhendong
2025, 48(19):115-125.
Abstract:Accurately predicting the Dynamic Line Rating of overhead transmission lines is crucial for ensuring safe line capacity expansion. Traditional prediction models, which rely on manual experience for selecting hyperparameters, often struggle to effectively reduce the volatility of DLR, leading to suboptimal prediction accuracy. To address this issue, this study innovatively proposes an SSA-VMD-LSTM-based prediction method. This approach deeply integrates the global optimization capability of the Sparrow Search Algorithm, the multi-scale data decomposition characteristics of Variational Mode Decomposition, and the temporal modeling advantages of Long Short-Term Memory networks, constructing a hierarchical artificial intelligence prediction model. First, the powerful search ability of SSA is employed to iteratively optimize the hyperparameters of VMD, obtaining the optimal hyperparameters. Subsequently, VMD is used to decompose the DLR data into multiple scales, yielding a series of components with different central frequencies but local stationarity. On this basis, separate LSTM models are established to predict each component. Finally, the prediction results of all components are aggregated to produce the final prediction. Experimental results demonstrate that, compared to several traditional prediction models, the proposed method achieves at least a 4.78% improvement in prediction accuracy, fully validating its effectiveness and superiority in DLR prediction.
Xu Jitong , He Jieying , Wang Xinbiao
2025, 48(19):126-133.
Abstract:A digital signal spectrum analysis system based on polyphase decimation filtering and a parallel FFT architecture is proposed to address challenges in ground-based microwave detection systems, such as weak ozone absorption peaks that are easily masked by noise and the mismatch between data input rates and processing speeds in backend digital spectrometers. The method applies polyphase decomposition to decimate and filter the input data, then performs parallel FFT processing on the multi-channel data to derive the signal power spectrum. Its performance was validated through simulations and experimental atmospheric detection on a dedicated testbed. Results demonstrate that, within a reasonable observation period, the approach significantly suppresses noise, achieving a mean channel sensitivity of 2.77 K with an inter-channel sensitivity standard deviation of 0.2 K, which reflects the stability of the system response function. By maintaining spectral resolution while ensuring high sensitivity, the proposed spectrometer meets the requirements of practical detection systems.
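The key property of polyphase decimation is that it produces the same output as filtering at the full rate and then discarding samples, while each branch runs at the low rate. The Python sketch below checks that equivalence on a toy signal. The filter taps, decimation factor, and function names are illustrative assumptions, and the parallel FFT stage is not shown.

```python
def fir_then_decimate(x, h, m):
    """Reference: FIR-filter at the input rate, then keep every m-th output."""
    full = []
    for n in range(len(x)):
        full.append(sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0))
    return full[::m]

def polyphase_decimate(x, h, m):
    """Same result using m low-rate branches: branch p uses taps h[p], h[p+m], ..."""
    n_out = (len(x) + m - 1) // m
    out = [0] * n_out
    for p in range(m):
        branch = h[p::m]
        for i in range(n_out):
            acc = 0
            for k, coeff in enumerate(branch):
                idx = (i - k) * m - p  # input sample feeding this branch tap
                if idx >= 0:
                    acc += coeff * x[idx]
            out[i] += acc
    return out
```

Because each branch only touches every m-th input sample, the m branches can run in parallel at 1/m of the input rate, which is the property that relaxes the input-rate versus processing-speed constraint mentioned in the abstract.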
2025, 48(19):134-143.
Abstract:The integrity and freshness of sample data directly determine the generalization ability and prediction accuracy of machine learning models. As the core data source in open environments, the network can provide wide-coverage, highly real-time sample support for model training. However, the dynamics, complexity, and scale of network data sources mean that traditional acquisition methods face low development efficiency and high maintenance cost. An analysis of current mainstream collection frameworks concludes that the Selenium framework offers higher development efficiency and stronger support for dynamic pages than the other frameworks. This paper therefore proposes a lightweight scripting language design method for network sample data collection and constructs a lightweight script syntax based on Type-3 (regular) grammars in the Chomsky hierarchy, built on top of Selenium. To make the proposed script syntax usable in practice, this paper implements a hierarchical syntax parser that supports multi-threaded asynchronous execution and dynamically translates scripts into standardized Selenium code. By abstracting the native Selenium API and binding mechanisms such as dynamic intelligent waiting, the scheme significantly reduces development and maintenance costs and improves developer efficiency. Acquisition tasks in two scenarios with different DOM structures were used for verification. In the satellite orbit parameter (TLE) acquisition task, compared with the traditional Selenium scheme, the proposed scripting language reduces the amount of code by more than 85% and the maintenance cost after page structure changes by more than 70%, while the increase in average acquisition delay is negligible.
This research provides a lightweight solution for efficient data acquisition in highly dynamic network environments. The simplified scripting language is also better suited to future training and inference of large language models (LLMs), which would allow sample data acquisition tasks to be generated automatically.
Wu Jiabin , Yang Xiaoming , Cao Taiqiang
2025, 48(19):144-152.
Abstract:To fully integrate multi-dimensional emotional information from EEG signals and improve emotion recognition performance, this paper proposes a network model based on multi-attention and multi-feature fusion. The model combines the asymmetry of the brain hemispheres and the spatial, spectral, and temporal characteristics of EEG signals, performing feature extraction through parallel dual-input pathways. A parallel attention mechanism is used to enhance the expression of frequency channels and spatial information, while the size of convolution kernels is adjusted through dynamic kernel selection. Additionally, depthwise separable convolutions are employed to further extract and compress features. Finally, the temporal dependencies and global associations between features are captured through fusion in the Transformer encoding layer, enabling emotion classification. In three-class experiments on the SEED dataset, the model achieved an average accuracy of 98.53%, demonstrating the superiority of this approach. Furthermore, visual analysis of the attention module further enhances the interpretability of the model.
Ye Zilong , Wang Sainan , Liu Wugang , Song Junbai , Zhou Wei
2025, 48(19):153-160.
Abstract:Thermal protection tile composites are widely used in the aerospace field due to their excellent high-temperature resistance. However, internal defects are prone to occur during service, posing potential threats to aircraft safety. Therefore, conducting nondestructive testing research on thermal protection tile composites holds significant engineering importance. This study systematically investigates the nondestructive testing of thermal protection tile materials with prefabricated hole defects based on a THz-TDS system. A reflective terahertz scanning system was used to acquire time-domain signals from the samples, and the Savitzky-Golay filtering algorithm was introduced to optimize signal denoising. Key feature parameters were then extracted to construct images. To address the limitations of single-parameter imaging in defect detection accuracy, a detection method combining wavelet image fusion and the Canny edge detection algorithm was proposed. Experimental results demonstrate that this method not only achieves a 100% recognition rate for all prefabricated hole defects but also keeps the detection error within 0.5 mm for hole defects with diameters of 10 mm and 5 mm, with a relative error not exceeding 6%. This high-precision detection method provides technical support and a methodological reference for the intelligent development of defect detection in thermal protection tile composites.
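As background on the denoising step, a Savitzky-Golay filter replaces each sample with the value of a low-order polynomial fitted to its neighborhood. The Python sketch below uses the classic 5-point quadratic smoothing weights (-3, 12, 17, 12, -3)/35. The window length and polynomial order are assumptions, since the abstract does not state the settings used on the terahertz signals, and `savgol5` is a hypothetical name.

```python
SG5_WEIGHTS = (-3, 12, 17, 12, -3)  # 5-point quadratic smoothing weights, sum = 35

def savgol5(signal):
    """Smooth with a 5-point quadratic Savitzky-Golay filter.

    The two samples at each edge are copied through unfiltered.
    """
    out = list(signal)
    for i in range(2, len(signal) - 2):
        out[i] = sum(w * signal[i + j - 2] for j, w in enumerate(SG5_WEIGHTS)) / 35
    return out
```

Unlike a moving average, this filter reproduces any quadratic trend exactly, so smooth pulse shapes in a time-domain waveform are attenuated less while point-to-point noise is still reduced.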
Hu Shuaichen , Zhang Peng , Cui Min
2025, 48(19):161-167.
Abstract:Traditional stitching methods perform poorly in complex scenes, and supervised methods face challenges due to the difficulty of annotating data. Existing unsupervised image stitching methods suffer from large model parameters and long stitching times. Therefore, a lightweight unsupervised deep learning-based image stitching framework is proposed, which consists of two stages: an unsupervised image deformation network and an unsupervised image fusion network. In the image deformation network, MobileNetV2 is used as the backbone, combined with the ECA attention mechanism module to obtain image deformation information. The image fusion module employs UNeXt as the backbone network to generate seamless stitching by identifying the seam lines in the overlapping regions of the images. The accuracy is improved by incorporating the AG module and enhancing the tokenized MLP module. Additionally, due to the lack of datasets for underwater image stitching, a real-world unsupervised underwater image stitching dataset is constructed. Comparative experiments are conducted on this dataset and the publicly available UDIS-D dataset, evaluating SIFT+Ransac, ORB+Ransac, UDIS, and UDIS++ algorithms. The experimental results demonstrate that the proposed algorithm reduces the model parameters by 74% and improves stitching speed by 46% while maintaining stitching accuracy.
Bao Liuzhen , Jia Wei , Zhao Xuefen , Kong Defeng , Jiang Haifeng
2025, 48(19):168-182.
Abstract:Breast cancer whole slide image classification is critical for accurate diagnosis. However, existing pseudo-label-based multiple instance learning methods suffer from low-quality pseudo-labels and suboptimal selection of hard negative instance ratios. To address these issues, this paper proposes a multiple instance learning method combining frequency domain features and dynamic hard negative instance screening. First, a multi-scale frequency domain feature encoding module is designed, which enhances high-frequency details and complex texture representations through frequency domain residual connections and cross-layer feature fusion. Second, a dual-branch bag prediction module is proposed to dynamically adjust instance weights via an attention mechanism, mitigating feature dilution caused by heterogeneity and improving pseudo-label generation quality. Finally, a dynamic hard negative instance pseudo-label mining strategy is introduced, progressively increasing the proportion of hard negative instances to enhance the model’s ability to capture discriminative features. On the Camelyon and TCGA-BRCA datasets, ACC, AUC, Precision, and Recall increased by 3.15%, 1.72%, 3.06%, and 2.12% and by 2.32%, 2.79%, 2.22%, and 2.22%, respectively. These results validate the effectiveness of the proposed method.
Cao Hongbo , An Weisheng , Liang Haipeng , Lin Qiang
2025, 48(19):183-192.
Abstract:Road surface defects impact traffic safety, road durability, and driving comfort. To address the low detection accuracy of complex features such as cracks and potholes in existing methods, this paper proposes YOLOv8n-Edge—a road defect detection algorithm based on YOLOv8n with enhanced edge features. RFAConv is integrated into the backbone to enlarge the receptive field while avoiding kernel parameter sharing. An Edge Enhance Conv module is introduced to fuse high-frequency details with the input, reinforcing feature representation. Additionally, the Manet-Star structure, combining Manet and Starnet, replaces parts of the C2f module to boost feature extraction. A shallow-layer auxiliary branch, Sub-GEIM, generates multi-scale edge features that are fused with corresponding detection heads to improve localization accuracy. Experimental results show that YOLOv8n-Edge achieves a mAP@50 of 72.1% on the preprocessed RDD2022 dataset—an improvement of 3.3% over the baseline—while only slightly increasing model complexity. Its effectiveness is further validated through generalization and comparative experiments.
Gu Suhang , Wang Ye , Zhang Yuanpeng , Jiao Zhuqing
2025, 48(19):193-204.
Abstract:The human visual system often focuses on the key features and structures of the target while weakening non-target areas when processing external information. In addition, in classical CNN models, noise in the image that propagates layer by layer may interfere with the representation of key information of the target, resulting in inaccurate feature extraction. Therefore, this article proposes an image classification method based on the dual fuzzy attention mechanism, named DFAM-CNN. Specifically, for the feature maps output by CNN convolutional layers, fuzzy channel attention mechanism and fuzzy spatial attention mechanism were first designed by introducing fuzzy logic technology. These two mechanisms were used to map and transform the feature maps along both the channel direction and spatial direction for generating important fuzzy feature maps that correspond to the original feature maps. Then, the channel weights of all feature maps and the weights of each element within each feature map were calculated based on all the determined important fuzzy feature maps, thereby highlighting the features related to the target in both the channel and spatial directions. Finally, dimensionality reduction was performed on the feature maps through fuzzy aggregation operations while retaining target-relevant features. To validate the effectiveness of DFAM-CNN, extensive experiments were conducted on both the public MedMNIST dataset and application-specific datasets. The experimental results validated the effectiveness of DFAM-CNN. Notably, compared with traditional max-pooling method, DFAM-CNN achieved accuracy improvements of 8.67% and 7.40% on the BreastMNIST and DermaMNIST subsets, respectively.
Liang Liequan , Li Xiang , He Yonghua , Zhou Xuan
2025, 48(19):205-216.
Abstract:Object detection is one of the key technologies for intelligent perception in autonomous parking systems in the era of autonomous driving. The perception process of fisheye cameras faces several challenges, including complex environmental factors, a diverse range of obstacles, and image distortion of detection targets under fisheye lenses. Conventional algorithms struggle to maintain high detection accuracy for various objects in complex parking scenarios. To address this, this paper proposes a rotation-based object detection method using an improved YOLOv10n model. The approach introduces the SPPELAN module into the backbone network, and utilizes DSConv to enhance the C2f module by improving the convolution fusion of the iRMB. This improves feature extraction capability under fisheye lenses and enhances the localization ability of small objects. Additionally, an ATFL function is employed to strengthen the model’s focus on target features. Experimental results demonstrate that the improved algorithm achieves a mAP@0.5 of 89.89% and a mAP@0.5:0.95 of 69.36% on the fisheye camera parking dataset, outperforming the baseline model by 0.62% and 0.6%, respectively. This provides new insights into the development of parking perception technologies.
Zhang Qian , Wu Yulu , Zheng Bingjie , Dong Jie , Yang Guan
2025, 48(19):217-224.
Abstract:Multi-organ lesion detection is of great clinical significance. However, lesions in different anatomical regions vary significantly in size and shape, and in CT images, lesion areas are typically small and similar to surrounding tissues, which increases the difficulty of detection. To address these challenges, this paper proposes an improved multi-organ small lesion detection algorithm based on the Salience-DETR model. First, an Efficient Spatial-Channel Collaborative Attention (ESCA) mechanism is designed to reconstruct the multi-scale features extracted by the backbone, enhancing the model’s focus on important lesion information. Second, the DenseASPP and AugFusion modules are incorporated to optimize the cross-layer token fusion network, improving multi-scale feature fusion across different levels. Finally, an Inner-GIoU loss function is introduced to accelerate model convergence and improve the detection performance for small lesions. Experimental results show that, at 0.5 to 4 false positives per image, the improved model achieves average detection sensitivities of 83.26% and 82.33% on the public DeepLesion dataset and an external validation set, respectively. These results demonstrate that the proposed algorithm achieves high detection accuracy and good generalization performance for multi-organ small lesion detection, with promising potential for real-world clinical applications.
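For context on the loss mentioned above, the following is a minimal sketch of the standard GIoU measure for axis-aligned boxes, on which Inner-GIoU builds. The Inner variant used in the paper is not reproduced here; the function names and the `(x1, y1, x2, y2)` box convention are assumptions for illustration.

```python
def giou(box_a, box_b):
    """Generalized IoU for boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    # Penalize the share of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Loss form: 0 for identical boxes, approaching 2 for distant boxes."""
    return 1.0 - giou(box_a, box_b)


if __name__ == "__main__":
    print(giou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Unlike plain IoU, the enclosing-box term still yields a useful signal when the boxes do not overlap at all, which matters for small targets whose predicted boxes often miss entirely early in training.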
