
Editor in chief:Prof. Sun Shenghe
Inauguration:1980
ISSN:1002-7300
CN:11-2175/TN
Domestic postal code:2-369
Xu Yiwei , Zhou Ranran , Wang Yong
2026, 49(7):1-8.
Abstract:With the development of semiconductor technology, the number of transistors is growing exponentially, and voltage drop violations have become a key challenge in the electronic design and testing of very large scale integrated circuits. The dynamic voltage drop depends strongly on the instantaneous current generated by standard cell switching, so it is of great significance to predict the dynamic currents of standard cells efficiently and accurately. This article proposes a standard cell transient current prediction model based on LightGBM. Training data is obtained through SPICE, features are extracted, and cross validation and grid search are used to find the optimal model parameters. The model can predict the dynamic current of a standard cell under different combinations of output toggle direction, supply voltage, input transition time and output load capacitance. The method requires no internal structural information about the standard cells, and the modeling process is efficient and versatile. The experimental results show that, in terms of accuracy, the coefficients of determination of the model for dynamic current prediction on various standard cells are all greater than 0.928. The model accuracy is superior to that of the XGBoost and RFR methods, and the generalization ability is superior to that of the ANN and LSTM methods; in terms of running time, both the modeling time and the prediction time are shorter than those of the ANN, LSTM and RFR methods. The model achieves a good balance between accuracy and computing resources and can be used to predict the dynamic current caused by standard cell switching, providing efficient and reliable support for dynamic voltage drop analysis and violation detection.
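The accuracy figure quoted in this abstract is a coefficient of determination. As a reminder of what that metric measures, here is a minimal sketch of the standard definition, R² = 1 - SS_res / SS_tot. The sample values are hypothetical and are not data from the paper.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the reference values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical simulated vs. predicted current samples (illustration only).
simulated = [0.10, 0.42, 0.95, 0.61, 0.20]
predicted = [0.12, 0.40, 0.90, 0.65, 0.18]
score = r2_score(simulated, predicted)
```

A value of 1.0 means a perfect fit, and a model that always predicts the mean of the reference values scores 0.0.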
2026, 49(7):9-17.
Abstract:This paper addresses the low computational efficiency and insufficient flexibility of the Paillier homomorphic encryption algorithm during encryption and homomorphic operations. We design and implement an acceleration scheme for the Paillier algorithm. Using software-hardware co-design, the scheme efficiently handles algorithmic precomputation, data interaction, and the parsing of algorithm operations, thereby enhancing flexibility and reducing resource consumption. Furthermore, significant improvements in computational throughput and real-time performance are achieved through the customized design and implementation of a dual-high-radix Montgomery modular multiplication core. Test results demonstrate a significant acceleration of the algorithm's critical computational steps. At a 1024-bit computational width, the average latencies of modular multiplication and modular exponentiation are approximately 0.523 μs and 667.42 μs, respectively. Compared to an Intel Core i9-13900HX processor, these latencies are reduced by approximately 68.74% and 42.76% (corresponding to speedups of 3.20× and 1.75×). The proposed scheme can provide efficient privacy computation support for secure multi-party computation and federated learning.
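For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of textbook Montgomery multiplication and a square-and-multiply exponentiation built on it. It is not the paper's dual-high-radix hardware core. The class name `Montgomery`, its methods, and the choice R = 2**bits are assumptions made for illustration.

```python
class Montgomery:
    """Textbook Montgomery arithmetic for an odd modulus n, with R = 2**bits > n."""

    def __init__(self, n, bits):
        if n % 2 == 0 or n >= (1 << bits):
            raise ValueError("n must be odd and smaller than 2**bits")
        self.n = n
        self.bits = bits
        self.mask = (1 << bits) - 1
        r = 1 << bits
        # Precomputed constants: n * n_prime = -1 (mod R), and R^2 mod n.
        self.n_prime = (-pow(n, -1, r)) % r
        self.r2 = (r * r) % n

    def redc(self, t):
        """Return t * R^-1 mod n for 0 <= t < n * R, using only shifts and masks."""
        m = ((t & self.mask) * self.n_prime) & self.mask
        u = (t + m * self.n) >> self.bits
        return u - self.n if u >= self.n else u

    def to_mont(self, a):
        return self.redc((a % self.n) * self.r2)

    def from_mont(self, a_bar):
        return self.redc(a_bar)

    def mul(self, a_bar, b_bar):
        """Multiply two operands that are already in Montgomery form."""
        return self.redc(a_bar * b_bar)

    def pow(self, base, exponent):
        """Square-and-multiply exponentiation; operands stay in Montgomery form."""
        result = self.to_mont(1)
        b = self.to_mont(base)
        while exponent > 0:
            if exponent & 1:
                result = self.mul(result, b)
            b = self.mul(b, b)
            exponent >>= 1
        return self.from_mont(result)
```

The point of the Montgomery form is that the reduction step replaces division by n with a shift and a mask, which is why it suits hardware. Keeping operands in that form across a whole exponentiation avoids repeated conversions.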
Liu Yunting , Li Siwei , Feng Xinyue , Zhang Zhixing
2026, 49(7):18-27.
Abstract:Obtaining defective samples of industrial products is difficult, and the manifestations of defects are diverse. To better identify defects and improve detection accuracy, SPGAN, an anomaly detection model that improves on GANomaly, is proposed. First, a SPAM dual attention module is designed, which realizes the joint perception of local defect texture and global spatial relationships through the synergy of spatial attention (SAM) and position-aware attention (PAM). Second, an improved Inception module is introduced between the encoder and decoder to enhance the reconstruction of tiny defect features using multi-scale convolutional kernels. Finally, a deep discriminator network based on ResNet18 is constructed to strengthen the discrimination of abnormal features through residual connections. To verify the effectiveness of the improved network, a series of comparative and ablation experiments were conducted on a self-made tire dataset. The experimental results show that the improved network significantly improves detection and segmentation performance on the self-made tire defect image dataset, with an AUC of 0.948 and an AP of 0.885, an increase of 9% in AUC and 8.9% in AP compared to the original model. The results demonstrate that this method has good application potential in industrial defect detection.
Hong Qingqing , Zhang Zhizhong
2026, 49(7):28-39.
Abstract:In recent years, modulation recognition, a key technology in wireless communication signal processing, has faced challenges in complex open environments, such as insufficient open-set recognition capability and limited use of input correlation in deep learning models. To address these issues, this paper proposes a modulation signal open-set recognition method that integrates complex-valued attention and multi-dimensional loss functions. Specifically, the method introduces a multi-dimensional attention mechanism into a depthwise separable complex-valued network structure, effectively mining the correlation features between signal amplitude and phase. It achieves multi-loss fusion optimization through decoder-assisted feature extraction, combining label-smoothing cross-entropy, a dynamic center constraint, and reconstruction error, to enhance feature distribution discrimination and model generalization. Experiments on the public dataset RadioML2016.10a show that this method achieves a classification accuracy of 95% for known categories in the closed-set recognition task. In the open-set recognition scenario, the recognition accuracy for known categories is 93%, the detection rate for unknown categories is 86%, and the overall open-set recognition performance is 89%. These results demonstrate excellent adaptability to open environments.
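The abstract does not state how unknown categories are rejected. One common decision rule for center-based open-set recognition, shown here purely as an assumed illustration, assigns a sample to its nearest class center and labels it unknown when that distance exceeds a threshold. The function names, the two-dimensional features, and the threshold value are all hypothetical.

```python
import math


def nearest_center(feature, centers):
    """Return (label, distance) of the class center closest to the feature."""
    best_label, best_dist = None, math.inf
    for label, center in centers.items():
        dist = math.dist(feature, center)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


def open_set_predict(feature, centers, threshold):
    """Known label if the feature lies near a center, otherwise 'unknown'."""
    label, dist = nearest_center(feature, centers)
    return label if dist <= threshold else "unknown"


# Hypothetical 2-D feature centers for two known modulation types.
centers = {"BPSK": (0.0, 0.0), "QPSK": (4.0, 0.0)}
known = open_set_predict((0.2, 0.1), centers, threshold=1.0)
rejected = open_set_predict((2.0, 5.0), centers, threshold=1.0)
```

Under such a rule, a loss term that pulls features toward their class center tightens the known clusters, which makes a distance threshold more discriminative.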
Ding Zhimin , Zhang Minjuan , Jing Ning , Li Linpeng , Ren Xiangxin
2026, 49(7):40-46.
Abstract:With the development of high-speed digital circuits and broadband communications, higher requirements have been placed on high-speed signal testing technology. As the core component of a sampling oscilloscope, the performance of the sampler directly determines the system bandwidth, and the pulse width of the trigger pulse is a key factor limiting the sampler bandwidth. Therefore, this paper designs a broadband microwave sampler based on ultra-narrow pulse triggering. By cascading a three-stage amplifier circuit with a step recovery diode, pulse waveform amplification and edge compression are achieved, generating a trigger pulse with an amplitude of >7 V and a falling edge of <150 ps. In the sampling gate circuit, short-line reflection technology is employed to further compress the pulse width to the picosecond level. For the comb-like spectral signal output by the sampling, a high-impedance integrating conditioning circuit is designed to achieve signal spreading and holding. Actual measurements show that this narrow pulse can effectively drive the sampler, achieving equivalent sampling within a 30 GHz bandwidth. The output sawtooth waveform envelope is consistent with the input signal, verifying the feasibility of the sampler in broadband microwave equivalent sampling.
Kang Xiaofei , Guo Hanyu , Jing Yiyang
2026, 49(7):47-54.
Abstract:With the wide application of multi-task learning in non-contact WiFi perception, how to simultaneously improve the accuracy of joint activity recognition and indoor positioning tasks and maintain a balance among tasks has become a key challenge. To this end, this paper proposes an improved MMoE method to achieve joint activity recognition and indoor localization tasks. This method designs a unified and shared feature extraction layer to enhance the expressive ability of the input features. By integrating XceptionTime and ResNet, a variety of experts are constructed. The former is suitable for extracting high-frequency dynamic features to improve the accuracy of activity recognition, while the latter is suitable for modeling low-frequency static features to enhance localization accuracy. It also introduces a dual-gate mechanism and regularization constraints, effectively balancing the differences between the two tasks while enhancing the overall performance. The experimental results show that the proposed method outperforms the existing representative models in both activity recognition and indoor localization tasks, demonstrating higher accuracy and stability.
Ren Shengtao , Wang Yang , Zha Sixi , Lin Nan
2026, 49(7):55-63.
Abstract:With the extension of the service life of polyethylene gas pipelines, defect detection has become the core issue for ensuring safety. To solve the problems of missed detection and insufficient accuracy in identifying internal defects of PE gas pipelines, this paper proposes an improved YOLOv8 target detection model. A new C2f-KS module is designed by introducing Kolmogorov-Arnold Networks into the bottleneck structure. In addition, the EffectiveSE attention mechanism is integrated after the split operation to distinguish effective information in complex backgrounds and enhance target feature extraction. The three detection heads of YOLOv8 are extended to four, and EefConv convolution is introduced to reduce model complexity and parameter count, thus enhancing the sensitivity to small targets and effectively reducing the missed detection and false detection rates for small foreign bodies. Finally, the Inner-Shape IoU loss function is used to optimize bounding box localization. The experimental results show that the accuracy of the improved algorithm on the pipeline defect dataset is 94.0%, the recall rate is 90.7%, the average accuracy is 94.2%, and the model size is only 4.9 MB, which can fully meet the needs of real-time detection of inner surface defects of PE gas pipelines.
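The box-overlap idea behind the loss named above can be sketched as follows. This shows plain IoU and an "inner" variant that compares boxes shrunk about their own centers by a ratio. It is an illustration of the general idea only: the shape-aware terms of Inner-Shape IoU are not included, and the default `ratio` is an assumed value, not one taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def scale_box(box, ratio):
    """Scale a box about its own center by the given ratio."""
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    half_w = (box[2] - box[0]) * ratio / 2.0
    half_h = (box[3] - box[1]) * ratio / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def inner_iou(box_a, box_b, ratio=0.75):
    """IoU of the two auxiliary boxes obtained by scaling each input box."""
    return iou(scale_box(box_a, ratio), scale_box(box_b, ratio))
```

With `ratio` below 1, the auxiliary boxes are smaller than the originals, so the overlap measure drops faster as the box centers drift apart.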
Liu Chao , Li Shuqing , Shen Yue , Liu Hui
2026, 49(7):64-73.
Abstract:To address the low pose estimation accuracy and unreliable information output in small unmanned surface vessels caused by complex water surface environments and low-frequency vibration interference, this paper proposes a pose estimation algorithm based on smoothed iterated error-state Kalman filtering. Under low-speed operating conditions, the algorithm employs an accelerometer to compensate and correct pitch and roll angles. In the data fusion process of micro-electro-mechanical system (MEMS) sensors, an improved fixed-interval smoothing algorithm is adopted, which utilizes the innovation from the next time step to perform backward smoothing corrections on the error state variables while conducting time-reversed inverse corrections, thereby reducing the interference of low-frequency linear vibrations on effective signals. The smoothed estimates are used to predict and correct the measurement values, with each time step's innovation iteratively refining both the estimated and measured values to enhance overall pose estimation accuracy. Experimental results demonstrate that compared to the standard error-state Kalman filter, the proposed SIESKF algorithm reduces the root mean square errors (RMSEs) of roll, pitch, and yaw angles by 0.762 1°, 1.818 8° and 0.340 5°, respectively. Under normal water surface navigation conditions, the RMSEs of eastward, northward, and upward velocities decrease by 0.402 3, 0.239 4 and 0.116 5 m/s, respectively. Similarly, the RMSEs of eastward, northward, and upward positions are reduced by 0.148 4, 0.258 9 and 0.083 2 m. This algorithm can provide more precise pose information for USVs.
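The accelerometer correction mentioned in this abstract rests on a standard relation: when the vessel is barely accelerating, the accelerometer reading is dominated by gravity, so roll and pitch can be recovered from its direction. The sketch below shows that relation only, not the SIESKF filter. It assumes an axis convention in which a level, stationary sensor reads (0, 0, +g); sign conventions vary between devices.

```python
import math


def tilt_from_accel(ax, ay, az):
    """Roll and pitch in radians from a quasi-static accelerometer reading.

    Assumed convention: a level, stationary sensor reads (0, 0, +g).
    """
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch


# Example: sensor pitched by 30 degrees with no roll (hypothetical reading).
g = 9.81
reading = (-g * math.sin(math.radians(30.0)), 0.0, g * math.cos(math.radians(30.0)))
roll, pitch = tilt_from_accel(*reading)
```

Yaw cannot be observed this way, because rotating about the gravity vector leaves the reading unchanged.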
Wang Hailiang , Li Min , Liu Yahong , Hao Haixia
2026, 49(7):74-82.
Abstract:To improve the accuracy of short-term wind power prediction, a BiLSTM network integrating a single-head attention mechanism (SA) and an improved northern goshawk algorithm for parameter optimization is proposed. Firstly, wind power data is preprocessed, and the degree of correlation between each factor and wind power is calculated using the Pearson correlation coefficient. The factors with a high degree of correlation are retained to improve the prediction accuracy of the model. Secondly, a single-head attention mechanism is introduced to capture long-range dependencies in the sequence, which increases the generalization ability of the model. Finally, because the hyperparameters of BiLSTM are difficult to select, the improved northern goshawk algorithm, which integrates refraction reverse learning initialization and a sine-cosine strategy, is used to optimize three hyperparameters of the model: the number of hidden units, the maximum training cycle and the initial learning rate. After the optimal parameters are obtained, the INGO_LSTM_SA model is used for prediction. Experimental verification is carried out on data from a wind power station in Xinjiang. The coefficient of determination of the proposed model is 2.08% higher than that of the original BiLSTM network, and the root mean square error and the mean absolute error are reduced by 23.0% and 24.8%, respectively.
2026, 49(7):83-91.
Abstract:Servo motor control systems require accurate motor dynamic parameters to improve controller accuracy. This study investigates the difficulty of achieving online mechanical parameter identification in the traditional speed loop PI control of PMSM, and proposes an STA online parameter identification strategy. First, friction torque feedforward compensation is performed using Cm and Bm obtained by RLS identification. Then, amplitude extraction is conducted based on the SOGI-QSG algorithm. On the basis of offline parameter fitting using the recursive least squares method, the proposed online mechanical parameter identification strategy analyzes the frequency characteristics of the system's speed loop. It realizes self-tuning of the PI parameters of the speed loop of the PMSM vector control system while achieving online inertia identification, and is verified by simulation experiments in MATLAB/Simulink. The research results show that the STA inertia identification strategy can realize online identification of the moment of inertia while adjusting the PI parameters of the PMSM speed loop, achieving a relative error of 0.352 9%. Compared with RLS and EKF, the accuracy is improved by 4.33% and 17.3%, respectively, which proves the effectiveness and practicality of this strategy.
Wang Wenxue , Qiao Yanjun , Kou Zhiwei , Cui Xiaoming , Ren Gang
2026, 49(7):92-102.
Abstract:Wind power output exhibits significant volatility and randomness, posing challenges to grid scheduling and wind power integration. To enhance forecasting accuracy, this paper presents CESF-Net, an ultra-short-term wind power prediction model that combines channel-wise EMA decomposition with a hybrid neural network. First, the model employs the density-based spatial clustering of applications with noise (DBSCAN) algorithm to identify and remove outliers in the wind speed-power relationship, and uses linear interpolation to impute missing values. Second, CW-EMA is applied to decompose multivariate time series into trend and seasonal components along the channel dimension. Then, a time-series segmentation mechanism is introduced to enhance the model's ability to capture local temporal structures, and a dual-stream network based on fast Fourier transform attention and gated recurrent units is constructed to separately extract seasonal and trend features. Finally, the outputs of the dual streams are concatenated to generate the prediction through CESF-Net. Experiments conducted on real wind farm datasets demonstrate that, for the 15-minute forecasting task, the CESF-Net model outperforms commonly used models by 18.78%, 11.11% and 0.26% in terms of MAE, RMSE and R², respectively. For the 60-minute forecasting task, although the prediction accuracy of all models decreases, CESF-Net still achieves improvements of 2.98%, 2.74% and 0.61% in the respective metrics.
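Two of the preprocessing steps described above, linear interpolation of missing values and a per-channel split into trend and seasonal parts, can be sketched in a few lines. This is a minimal illustration and not the paper's CW-EMA: it assumes EMA means an exponential moving average, takes the seasonal part as the residual, and uses an assumed smoothing factor `alpha`. The function and channel names are hypothetical.

```python
def linear_interpolate(values):
    """Fill None gaps linearly between known neighbors; hold edge values."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    for i in range(known[0]):
        filled[i] = filled[known[0]]
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            filled[i] = filled[left] + frac * (filled[right] - filled[left])
    return filled


def ema_decompose(series, alpha=0.3):
    """Split a series into an EMA trend and the residual (seasonal) part."""
    trend = [series[0]]
    for x in series[1:]:
        trend.append(alpha * x + (1.0 - alpha) * trend[-1])
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


def channelwise_decompose(channels, alpha=0.3):
    """Apply the decomposition independently to every channel (variable)."""
    return {name: ema_decompose(linear_interpolate(vals), alpha)
            for name, vals in channels.items()}


# Hypothetical two-channel input with one missing wind-speed sample.
parts = channelwise_decompose({
    "wind_speed": [5.0, None, 7.0, 6.5, 6.0],
    "power": [1.0, 1.4, 2.1, 1.9, 1.7],
})
```

Each channel yields a smooth trend and a residual that sums back to the original series, which is the property a dual-stream model relies on when it processes the two parts separately.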
Song Tengfei , Ayiguzhali Tuluhong , Xu Zhisen , Chang Qingpu
2026, 49(7):103-110.
Abstract:To address the issues of neutral-point potential balancing, current ripple, and common-mode voltage in T-type three-level inverters, an adaptive model predictive control strategy is proposed. This strategy enhances control accuracy by tackling the problem of low precision caused by model parameter variations in conventional model predictive control applications. Through system identification and real-time updating of parameter changes, the impact of system disturbances on model predictions is reduced. MATLAB simulations were conducted to validate the proposed method; the inverter output current exhibited a total harmonic distortion of 0.75%, the maximum neutral-point voltage fluctuation was 2.6 V, and the common-mode voltage peak was 2 V. These results indicate that, compared with conventional model predictive control, the proposed approach significantly enhances the control accuracy of neutral-point potential balancing, output-current ripple and common-mode voltage in T-type three-level inverters.
Cui Kebin , Lyu Siyi , Yang Liran
2026, 49(7):111-122.
Abstract:Drone inspection is one of the important methods for power transmission line detection. To address the issues of large target scale variations, difficulty in detecting small targets, and the inability to effectively capture defect details in complex scenarios with existing power transmission line detection algorithms, an improved YOLO11 model, HCDNet-YOLO11, is proposed. Firstly, the HCDNet network is designed to replace the original feature pyramid network, reducing the model's parameter count and enhancing its expression of features at different scales. Secondly, the MulCAA attention module is constructed, which extracts key information through a dual-branch structure of average pooling and max pooling, reducing information loss and weakening the interference of complex backgrounds on detection targets through long-range pixel dependencies. Finally, the RepConv reparameterization convolution is introduced to implement the Rep_C3k2 module, enabling the model to introduce a larger receptive field during the training phase, enhancing its nonlinear feature modeling ability. Experimental results show that the HCDNet-YOLO11 algorithm has increased the accuracy by 1.9%, recall by 6.5%, and mAP50 by 5.7% on the self-built power transmission line dataset, with a 24.42% reduction in parameters. The algorithm attains good performance on the premise of reducing the number of parameters. On the public VisDrone2019 dataset, the HCDNet-YOLO11 algorithm has increased mAP50 by 6.5% and 4.6% on the val set and test set, respectively, verifying its strong generalization ability in complex aerial scenarios.
Liu Bo , Shi Gang , Zhao Wei , Wang Yu , Li Yongqing
2026, 49(7):123-131.
Abstract:Based on research into the non-isothermal flow of molten steel in the tundish, and in response to the practical demand for monitoring the multi-point temperature of molten steel in the tundish under production conditions, a distributed continuous temperature measurement system embedded in the tundish was developed. It addresses the problem that the existing temperature measurement system in the continuous casting workshop cannot accurately monitor temperature at multiple distributed points. Firstly, an innovative measurement method using a through-wall blackbody cavity was proposed. The weak signal generated by the photovoltaic cell is acquired with low power consumption and with resistance to high-temperature leakage interference. A customized insulation box is designed through thermal conduction analysis to resist the thermal shock of the permanent layer. Finally, a comparative test was conducted on the continuous temperature measurement performance of the system. The results indicate that the blackbody cavity structure of the temperature sensing probe can sensitively detect temperature changes and that the temperature measurement node can accurately collect pA-level current signals. The static power consumption of the temperature measurement node is around 60 μA, and the thermal insulation box can keep the internal temperature below 50 °C for 12 h. Comparison with data from high-temperature furnaces and thermocouple temperature measurement systems shows that the overall system's continuous temperature measurement performance is stable and accurate. In summary, the system can monitor the multi-point temperature of molten steel in the tundish, providing practical system-level support for studying the non-isothermal flow of molten steel in the tundish.
Li Xiang , Arkin Hamdulla , Wang Lulu
2026, 49(7):132-142.
Abstract:To address the limitations in model decoding performance caused by the low spatial resolution and high inter-subject variability of motor imagery EEG signals, this paper proposes a novel DBTNet model based on an attention-enhanced dual-branch convolutional network and a temporal multi-scale attention mechanism. The model employs a dual-branch convolutional network to extract multi-scale spatiotemporal features and integrates an efficient multi-scale attention mechanism to enhance the extraction of spatial features from EEG signals. Subsequently, a temporal multi-scale attention mechanism is applied to capture both local features and global dependencies under different receptive fields, thereby obtaining more comprehensive feature representations. Finally, a classifier is used to fuse the extracted features for efficient decoding. In subject-dependent evaluations, the proposed model achieves a four-class classification accuracy of 86.57% with a Kappa coefficient of 0.821 0 on the BCI Competition IV-2a dataset, and a two-class classification accuracy of 88.95% with a Kappa coefficient of 0.779 0 on the BCI Competition IV-2b dataset. Experimental results demonstrate that the DBTNet model achieves superior model decoding performance.
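The accuracy and Kappa figures above can be related through a short calculation. Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (accuracy) and p_e the chance agreement. If the true classes are balanced, which is an assumption here about the evaluation data, then p_e = 1/k for k classes. The sketch below applies that formula; the function name is hypothetical.

```python
def kappa_from_accuracy(accuracy, n_classes):
    """Cohen's kappa under the assumption of balanced true classes."""
    chance = 1.0 / n_classes  # p_e when every class is equally frequent
    return (accuracy - chance) / (1.0 - chance)


four_class = kappa_from_accuracy(0.8657, 4)  # accuracy reported for IV-2a
two_class = kappa_from_accuracy(0.8895, 2)   # accuracy reported for IV-2b
```

Under this assumption, the reported accuracies give Kappa values of about 0.821 and 0.779, which match the abstract to three decimal places.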
2026, 49(7):143-150.
Abstract:To address the limitations of dynamic time warping (DTW) and its conventional weighted variants (WDTW) in urban traffic state time-series measurement, namely insufficient sensitivity to critical time periods and suboptimal weight allocation mechanisms, this paper proposes an enhanced DTW method based on Gaussian function weighting, termed Gaussian-weighted DTW (GS-WDTW), to improve the discriminative accuracy of trajectory similarity measurement. The proposed approach constructs a time-dependent multi-peak Gaussian weighting function that integrates the quasi-Gaussian distribution characteristics of urban traffic flow and prior knowledge of peak-hour patterns into the sequence alignment process, thereby nonlinearly amplifying the contribution of peak traffic periods in distance computation. Experiments are conducted using taxi GPS trajectory data from Chengdu, from which hourly origin-count time series are constructed on a spatial grid basis. Coupled with the K-Medoids clustering algorithm, the performance of GS-WDTW is quantitatively evaluated against DTW and WDTW using the silhouette score (SS) and the Calinski-Harabasz index (CHI). Results show that, at the optimal cluster number K=4, GS-WDTW achieves a 39.9% and 13.1% improvement over DTW, and a 41.1% and 13.0% improvement over WDTW, in terms of SS and CHI, respectively, demonstrating significantly enhanced capability in identifying spatiotemporal characteristics of travel patterns. Spatial analysis further confirms a high degree of consistency between the clustering results and the actual urban functional zone distribution. The GS-WDTW method thus enables more precise capture of critical nonlinear features in time-series data, offering a valuable reference for state perception, resource optimization, and functional zone identification in intelligent transportation systems.
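The idea of weighting DTW by time of day can be sketched as follows. The exact weighting function and how it enters the alignment are not given in the abstract, so everything specific here is an assumption: the weight is 1 plus a sum of Gaussian bumps, the peak centers (08:00 and 18:00), widths and amplitudes are hypothetical, and the local cost uses the mean of the two aligned hours' weights.

```python
import math

# Hypothetical peak-hour parameters: (center_hour, sigma, amplitude).
DEFAULT_PEAKS = ((8.0, 1.5, 1.0), (18.0, 1.5, 1.0))


def gaussian_weight(hour, peaks=DEFAULT_PEAKS):
    """Multi-peak weight: 1 off-peak, rising toward 1 + amplitude at a peak."""
    return 1.0 + sum(a * math.exp(-((hour - mu) ** 2) / (2.0 * s ** 2))
                     for mu, s, a in peaks)


def weighted_dtw(x, y, weight=gaussian_weight):
    """DTW distance whose local cost is scaled by the time-of-day weight."""
    n, m = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Average the weights of the two hours being aligned.
            w = 0.5 * (weight(i - 1) + weight(j - 1))
            cost = w * abs(x[i - 1] - y[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# Two 24-hour series that differ from a flat baseline by the same amount,
# once at a peak hour (08:00) and once off-peak (03:00).
baseline = [0.0] * 24
peak_diff = [1.0 if h == 8 else 0.0 for h in range(24)]
offpeak_diff = [1.0 if h == 3 else 0.0 for h in range(24)]
d_peak = weighted_dtw(baseline, peak_diff)
d_offpeak = weighted_dtw(baseline, offpeak_diff)
```

The same deviation costs more when it falls in a peak hour, which is the behavior the abstract describes as amplifying the contribution of peak traffic periods.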
Wu Shengbiao , You Tao , Gui Jiazheng
2026, 49(7):151-160.
Abstract:To address issues such as low prediction accuracy, poor real-time performance, and weak model generalization in lower limb joint angle prediction, this study proposes a method based on VMD-Informer and surface electromyography (sEMG) signals. First, sEMG signals and corresponding joint angle data were collected from subjects during walking and stair-climbing patterns. To enhance the stability of the raw data, the variational mode decomposition (VMD) algorithm was applied to decompose the EMG signals. The negative gradient and adaptive particle swarm optimization (NGSPSO) algorithm was used to optimize two key parameters of VMD: the number of intrinsic mode function (IMF) components and the penalty factor. Next, multi-domain features were extracted from each IMF. Principal component analysis (PCA) was applied to identify key factors within the feature sequences, thereby reducing the model's input dimension. Finally, the Informer model was employed for dynamic temporal modeling of the multivariate feature sequences. Experimental results demonstrate that the proposed VMD-Informer model achieves RMSE values of 2.688 5° and 3.351 6° for the hip and knee joints, respectively, in the flat walking scenario, and 3.508 8° and 3.856 2°, respectively, in the stair-climbing scenario. Compared to VMD-Transformer, this represents an 18% reduction in average error, significantly enhancing prediction accuracy and system real-time performance. This provides a technical foundation for recognizing movement intentions in rehabilitation exoskeletons.
Zheng Fangyan , He Xiaohai , Qing Linbo , He Haibo , Teng Qizhi
2026, 49(7):161-170.
Abstract:The accurate extraction of core regions is of great significance for tasks such as digital core construction and intelligent reservoir evaluation. However, core images often suffer from complex backgrounds, blurred edges, and multi-scale structural distributions, posing significant challenges to automated segmentation. To address these issues, this paper proposes a core object extraction algorithm based on an improved UNeXt architecture, aiming to enhance the model's segmentation performance for core regions. The method effectively strengthens the model's ability to represent edge details and global contextual features by introducing CBAM or EMA modules at different network levels. Simultaneously, a multi-scale feature enhancement module is designed and incorporated into the network neck to further improve the model's perception of multi-scale structures and complex textures. Additionally, given the current lack of publicly available datasets dedicated to core region extraction, this paper independently constructs a relevant core image dataset. Experimental results show that, compared to the original UNeXt network, the proposed algorithm achieves improvements of 1.49%, 2.06% and 0.75% in mIoU, F1-score and mPA, respectively, while the MSE decreases by 78.63%. Statistical tests confirm that these improvements are all significant. To validate the model's generalization capability, comparative experiments were also conducted on two public medical image datasets, BRISC 2025 and EBHI-Seg. The results demonstrate that the proposed algorithm performs excellently on both the self-constructed dataset and the public datasets.
You Yihui , Zhang Lixin , Zhu Linglong
2026, 49(7):171-180.
Abstract:Single-image desnowing is an important subtask in the field of image restoration. Its primary challenges lie in snow particle occlusion and snow-fog blur, which degrade image quality and affect the performance of downstream visual tasks. To address the limitations of existing methods in feature modeling and expert selection adaptability, a single-image desnowing model named SynergyRestorer was proposed. The model is based on a complementary mixture of experts and an agreement-biased sub-network routing scheme. A complementary mixture of experts decoder was designed to capture complementary information across multi-dimensional features by combining specialized and cooperative experts, thereby enhancing the model's representation capacity. An agreement-biased sub-network router was also introduced to fuse multi-source features and incorporate agreement signals. It dynamically balanced coordination and conflict among features, improving the discriminative and adaptive capacity of expert selection. Experimental results showed that the proposed method achieved an average PSNR of 33.71 dB and SSIM of 0.950 on three benchmark datasets: CSD, Snow100K and SRRS. The results validate its effectiveness in complex snowy scene restoration tasks.
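The PSNR figure reported above follows the standard definition PSNR = 10 · log10(MAX² / MSE), in decibels. A minimal sketch is given below. It assumes 8-bit images (MAX = 255) supplied as flat lists of pixel values, and the sample pixels are hypothetical.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixels: every restored value is off by one gray level (MSE = 1).
value_db = psnr([100.0, 100.0, 50.0, 50.0], [101.0, 99.0, 51.0, 49.0])
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to halving the mean squared error.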
Cui Bobin , Yi Junkai , Tan Lingling
2026, 49(7):181-189.
Abstract:To address the insufficient object detection accuracy in UAV aerial images caused by a high proportion of small targets, large scale differences among targets, and complex backgrounds, and considering the limited computational power and power budget of edge devices, this paper proposes an improved object detection algorithm based on YOLOv8n, called EGD-YOLO. First, a P2 layer for small target detection is added while the P5 layer for large target detection is removed, and a shallow channel expansion strategy is adopted to enhance the feature representation capability for small targets. Secondly, a global hierarchical fusion architecture cascading multi-scale feature fusion and weighted feature fusion is designed to achieve efficient propagation and deep integration of cross-scale semantic information in the neck network. Finally, a DyHead dynamic detection head with multiple attention mechanisms is employed to further optimize the model's small target detection performance. Experiments on the VisDrone2019 dataset demonstrate that the proposed EGD-YOLO achieves improvements of 12.0% in mAP@0.5 and 8.6% in mAP@0.5:0.95 over the baseline while maintaining a clear computational advantage; results on the DOTA dataset further confirm its strong generalization capability, providing an effective solution for small-object detection in UAV aerial imagery.
2026, 49(7):190-202.
Abstract:This paper proposes a lightweight object detection model based on YOLOv10n to improve the accuracy of rail surface defect detection and enhance the recognition of small targets. The model incorporates C2f_CGBlock into the P3 and P4 layers of the backbone network to strengthen local context perception and feature representation. The feature fusion part uses RepGFPN and integrates SimAM into some feedback paths to emphasize critical features. The training process adopts Inner-SIoU loss function to optimize localization accuracy. Experimental results on a rail surface defect dataset showed that the improved model outperformed the original one, with improvements of 3.38%, 3.72%, 3.55% and 4.01% in Precision, Recall, F1 and mAP@0.5. The model demonstrates clear advantages over the baseline in detecting small-size defects and challenging backgrounds. It effectively enhances the performance of rail defect detection while maintaining a balance between accuracy and real-time efficiency, and has good potential for engineering applications.
Hu Xinru , Lyu Xiaoqi , Gu Yu
2026, 49(7):203-214.
Abstract:Early diagnosis of chest diseases is important, but existing X-ray image classification methods perform poorly because of insufficient information interaction in feature extraction and difficulty in recognizing small lesions. To this end, FFA-Net, a chest X-ray image disease classification network based on an attention mechanism and multi-scale feature fusion, is proposed. First, the network effectively captures global context information in the horizontal and vertical directions through a task crossing attention module to enhance the interaction between features; second, the network fuses feature information at different scales by constructing a multi-branch extraction module, so that its deeper features can focus on the subtle pathology regions identified in the shallow features; finally, a multi-frequency semantic attention module is introduced. Comprehensive experiments were performed on the ChestX-ray14 dataset, yielding a mean AUC of 0.856 4 and an AUC of 0.973 4 for hernias. Ablation experiments were also performed, along with generalization experiments on two further datasets, CheXpert and the COVID-19 Radiography Database. The average AUC on the CheXpert dataset is 0.811, and the average accuracy on the COVID-19 Radiography Database is 0.956 0. Compared with currently popular classification networks, FFA-Net has better feature extraction ability and classification performance.
Zhang Hong , Chen Xiaotong , Xu Yongyan , Gao Xicheng , Wang Yuanbin
2026, 49(7):215-225.
Abstract:To address the issues of low accuracy and poor robustness in existing behavior detection models caused by complex underground backgrounds, large variations in miner behavior scales, and frequent occlusions, an improved RT-DETR-based unsafe behavior detection method for miners is proposed. The proposed method constructs a backbone network, CANet, featuring multi-path feature extraction and a dual-branch downsampling structure. By effectively fusing deep and shallow features while preserving edge details, CANet enhances the model's ability to perceive fine-grained behavior details in complex backgrounds. Meanwhile, a Diffusion-Aware Feature Pyramid Network (DAFPN) is designed by integrating a dimension-aware selective integration module with a cross-layer diffusion strategy, forming a two-stage fusion-diffusion mechanism to strengthen semantic interactions among multi-scale behavior features. This design significantly improves the model's adaptability to diverse postures and large-scale variations. In addition, a variable kernel convolution module (AKConv) is introduced, which dynamically adjusts sampling positions to enable the network to focus adaptively on key behavior regions under occlusion, thereby enhancing the robustness of miner behavior detection. Experimental results show that the improved RT-DETR model achieves 92.9% mAP@0.5 and 66.1% mAP@0.5:0.95, improving by 2.9% and 1.9% over the original model, while reducing parameters by 18% and computational cost by 13%. Compared with mainstream detection algorithms such as Faster R-CNN, SSD, YOLOv5m, YOLOv8m and YOLOv10m, the proposed model demonstrates superior overall performance, validating its effectiveness and engineering applicability for unsafe behavior detection in complex coal mine environments.
Li Rui , Ao Yinhui , Chen Xinsheng , Li Guishen
2026, 49(7):226-235.
Abstract:To address the insufficient local detail capture and high computational complexity of the LSTR algorithm in complex road scenarios, this paper proposes DMCTR, a dynamic multi-path covariance Transformer detection model. Firstly, a feature boosting and suppression module is constructed. Through feature enhancement and suppression operations, it alleviates the missed detection of weak features, such as curved lanes and dashed line segments, in traditional convolution. Secondly, a dynamic enhancement dual-path aggregation block is constructed, which uses deformable convolution to adapt to lane geometric deformation and combines a dual-attention mechanism to enhance local geometric features. Finally, cross-covariance attention is introduced into the Transformer architecture to replace the multi-head self-attention in the Transformer encoder. The experimental results show that on the TuSimple dataset, the accuracy of the proposed method reaches 96.74%, which is 0.56% higher than that of the baseline LSTR model. In the complex scenarios of the CULane dataset, the F1 score increases by 4.48%, and the detection accuracy in special scenarios such as blurred lane lines and strong light at night is significantly improved. The model maintains real-time performance (353 fps) while effectively addressing the feature modeling bottleneck of traditional methods in complex scenarios.
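As background on the cross-covariance attention named in this abstract, the sketch below shows the general idea in the form popularised by XCiT: attention is computed between feature channels (a d x d map) rather than between tokens (an N x N map), so cost grows linearly with the number of tokens. This is an illustrative single-head version under that assumption, not the DMCTR implementation; the function names and the `temperature` parameter are hypothetical.

```python
import math

def l2_normalize_columns(m):
    """Scale each column (channel) of an N x d matrix to unit L2 norm."""
    rows, cols = len(m), len(m[0])
    norms = [math.sqrt(sum(m[t][j] ** 2 for t in range(rows))) or 1.0
             for j in range(cols)]
    return [[m[t][j] / norms[j] for j in range(cols)] for t in range(rows)]

def cross_covariance_attention(q, k, v, temperature=1.0):
    """q, k, v: N x d lists (N tokens, d channels).

    Returns (out, attn) where out is N x d and attn is the d x d channel map.
    """
    n, d = len(q), len(q[0])
    qn, kn = l2_normalize_columns(q), l2_normalize_columns(k)
    # Channel-to-channel scores: (K^T Q) / temperature, shape d x d
    scores = [[sum(kn[t][i] * qn[t][j] for t in range(n)) / temperature
               for j in range(d)] for i in range(d)]
    # Softmax over each column so every output channel mixes inputs with
    # weights that sum to 1
    attn = [[0.0] * d for _ in range(d)]
    for j in range(d):
        col = [scores[i][j] for i in range(d)]
        peak = max(col)
        exps = [math.exp(s - peak) for s in col]
        total = sum(exps)
        for i in range(d):
            attn[i][j] = exps[i] / total
    # Output: V @ attn, shape N x d
    out = [[sum(v[t][i] * attn[i][j] for i in range(d)) for j in range(d)]
           for t in range(n)]
    return out, attn

# Usage: 4 tokens with 3 channels produce a 3 x 3 attention map.
x = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0], [0.0, 2.0, 1.0], [1.5, -1.0, 0.0]]
out, attn = cross_covariance_attention(x, x, x)
```

The attention map has size d x d regardless of how many tokens the image produces, which is the property that makes this substitution attractive when the abstract cites the computational complexity of the original self-attention.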
Xiong Ming , Li Hongyi , Lyu Kelin , Liu Yuxin
2026, 49(7):236-244.
Abstract:In industrial robot automation scenarios, existing target detection algorithms suffer from low detection accuracy on targets with large scale variations, poor handling of occlusion and insufficient real-time performance. This paper proposes the YOLOV11n-RLW algorithm based on the YOLOv11n benchmark model. Specific improvements include: adopting the RepViT backbone network to replace the traditional feature extraction network, enhancing the feature extraction capability; incorporating the LA-CBAM attention mechanism to address the lack of spatial features in the SE module and enhance multi-scale feature fusion; and replacing CIoU with the Wise-IoU loss function to improve the regression accuracy. On the VisDrone2019 and KITTI datasets, this model achieved a 38.4% mAP50 at 260 fps, with only 2.24 M parameters. Compared with the benchmark model, the real-time performance is improved by 6%, the recognition rate is increased by 5%, and the number of parameters is reduced by 13.6%. The algorithm addresses the problems of multi-scale target detection, occlusion handling and insufficient real-time performance. It meets the requirements of industrial scenarios for detection speed and accuracy, and is suitable for the engineering application of high-precision industrial robot target detection systems.
