
Editor in chief:Prof. Sun Shenghe
Inauguration:1980
ISSN:1002-7300
CN:11-2175/TN
Domestic postal code:2-369
Chen Boxuan , Huang Junyi , Gong Pingping
2026, 49(6):1-9.
Abstract:To address the lack of real-time monitoring for wind turbine pitch control system power supplies and the cumbersome replacement procedures after failures, this paper proposes a hot-swappable dual-module switching power supply system controlled by a microcontroller, which allows a faulty supply to be replaced without shutting down the system. The system adopts a modular dual-supply redundant architecture. Each power module is built on a flyback circuit, accepts a wide input voltage range of 20~80 V, and delivers a stable output of up to 24 V/3 A. An STM32 microcontroller and a TPS2491 hot-swap chip provide automatic switching when a supply fails and support rapid hot-swap replacement of the faulty unit. A monitoring platform developed with the Bootstrap5 framework provides intelligent power supply monitoring and management. Experimental results demonstrate that the system achieves short switching times (≤10 ms under full load) and minimal voltage dips during power failures. The monitoring platform also enables real-time power status monitoring, fault alerts, and operational data analysis, thereby enhancing the power supply reliability of the pitch control system.
Zhao Dongdong , Li Yudong , Li Xuejuan , Zhao Wenzhe , Wang Fuhao
2026, 49(6):10-19.
Abstract:To maintain high efficiency, an LLC resonant converter usually operates near its resonant frequency, which narrows the converter's gain range. To address this problem, this paper proposes a topology that combines a primary-side Buck-LLC cascade converter with a special secondary-side full-bridge rectifier and can realize a wide range of voltage gains. The primary side adopts a coordinated control strategy for the front-stage Buck unit and the back-stage LLC resonant converter: the front stage provides closed-loop voltage regulation through PWM, and the back-stage LLC runs open-loop at the resonant frequency. An overlapping conduction control method is introduced on the secondary side, where the voltage gain is adjusted through the overlapping duty cycle of the rectifier bridge switches, so that the system automatically switches its operation mode according to the output voltage and realizes a 3-fold gain extension range. Theoretical derivation shows that all switches of the system achieve soft switching over the wide gain range. The voltage gain equation and the soft-switching boundary conditions are derived with the aid of the state plane trajectory diagram. To validate the proposed scheme, an experimental prototype with a 300 V DC input and a 20-60 V DC/500 W output is built, and the experimental results verify the correctness and effectiveness of the topology and control strategy.
2026, 49(6):20-28.
Abstract:Multimodal emotion recognition in conversation (MERC) struggles to capture cross-modal semantic associations across conversation turns and has limited ability to discriminate minority classes and semantically confusable emotion classes. To address this, a new multimodal sentiment analysis model, FuseNet, is proposed. The model adopts the bidirectional attention dialogue encoder (BiDRN) to capture the context dependency of the dialogue, effectively integrates audio and visual cues from different speakers, and realizes dynamic multimodal fusion through the Hi-gated fusion module, which is based on a hierarchical gating mechanism. Meanwhile, a class-aware multimodal contrastive (CAMC) loss is introduced to enhance inter-class discriminability and improve the discrimination of minority classes and semantically similar emotion categories. Experimental results on two benchmark ERC datasets, IEMOCAP and MELD, show that, compared with the current advanced model CORECT, the F1 score of the proposed framework improves by 2.91% and 2.00%, respectively. The framework outperforms the existing baseline models in classification performance on most emotions, especially in identifying minority classes and semantically similar emotion categories.
Ling Rui , Yan Kun , Liang Hongyu , Wei Zhuoqi , Hao Hangbo
2026, 49(6):29-38.
Abstract:Existing single-stage deep models for traffic accident detection often suffer from high false alarm rates and computational redundancy in highway scenarios, severely limiting their practical deployment. To address these issues, this paper proposes a two-stage traffic accident detection method tailored for highways, following a "stationary vehicle filtering+appearance-based recognition" strategy. In the first stage, YOLO11 and Bot-SORT are integrated to detect and track vehicles, and inter-frame speed analysis is used to identify stationary vehicles as potential accident candidates. In the second stage, an improved model named YOLO-EA is introduced to perform appearance-based detection exclusively on the stationary vehicles, combined with a multi-frame voting mechanism to enhance stability and robustness. Built upon the YOLO11 architecture, YOLO-EA incorporates an EAS-Stem module and an AWD-Conv module. The former enhances edge and contour extraction in the input stage, while the latter improves downsampling efficiency by retaining critical features and reducing computational cost. Experimental results show that YOLO-EA improves Precision, mAP@0.5 and mAP@0.5:0.95 by 10.9%, 3.4% and 2.8% respectively, while reducing parameter count by 21%. On the constructed accident video dataset, the proposed method achieves an accident recognition rate of 81.25%, with a 24.46% reduction in false alarm rate compared to single-stage detection strategies. This method achieves a favorable balance between accuracy and inference efficiency, demonstrating strong potential for real-world deployment.
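The two-stage idea in the abstract above (flag stationary vehicles from inter-frame speed, then stabilize the per-frame appearance decision with multi-frame voting) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the pixel-per-second threshold, and the voting window are hypothetical and are not taken from the paper, and the detector and tracker themselves are not shown.

```python
from collections import deque


def is_stationary(centroids, fps, speed_thresh_px_s=5.0):
    """Return True if the mean inter-frame speed of one track is below a threshold.

    centroids: list of (x, y) box centers for consecutive frames of one track.
    speed_thresh_px_s is a hypothetical threshold in pixels per second.
    """
    if len(centroids) < 2:
        return False  # not enough history to estimate a speed
    dists = [
        ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        for (x0, y0), (x1, y1) in zip(centroids, centroids[1:])
    ]
    mean_speed = sum(dists) / len(dists) * fps  # pixels per second
    return mean_speed < speed_thresh_px_s


def vote_accident(frame_flags, window=5, min_votes=3):
    """Multi-frame voting over the most recent `window` per-frame decisions."""
    recent = deque(frame_flags, maxlen=window)  # keeps only the last `window` flags
    return sum(recent) >= min_votes


# A nearly motionless track becomes an accident candidate; a moving one does not.
parked = [(100.0, 100.0), (100.05, 100.0), (100.1, 100.0)]
moving = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
candidate = is_stationary(parked, fps=25)
not_candidate = is_stationary(moving, fps=25)
```

Only tracks that pass the first check would be handed to the appearance model, which is where the reduction in redundant computation described in the abstract would come from.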
Liu Ning , Han Jiasheng , Wang Tao , Feng Shuyi
2026, 49(6):39-46.
Abstract:In coastal estuary water quality monitoring, traditional conductivity sensors suffer from bulky size and susceptibility to corrosion. This paper proposes a non-contact seawater conductivity measurement method based on single-coil sweep-frequency resonant impedance measurement. An equivalent circuit model of the coil in seawater is established, and the mechanism by which seawater eddy current losses affect the resonance characteristics of the system is analyzed in depth. The analysis reveals a linear mapping between the resonant equivalent impedance and seawater conductivity under resonant conditions. Finite element simulation is used to perform linear fitting on simulated data, validating the accuracy of the theoretical derivation. Building on this, a sweep-frequency conductivity measurement system is constructed that precisely extracts the impedance at the resonant point. Experimental results demonstrate that in low-conductivity environments (saltwater intrusion), the method maintains consistently high measurement sensitivity, with a maximum fitting error of only 0.0417 mS/cm. Compared with existing research, the proposed approach significantly enhances detection precision for subtle conductivity variations while improving anti-contamination capability. Furthermore, the fitting parameters can be pre-calculated with simulation software, which reduces the human and material resources required for sensor calibration and optimizes the sensor fabrication process. The method offers a novel low-cost, high-reliability, and high-sensitivity solution for estuary water quality monitoring.
Wu Ye , Tao Xu , Hu Changyu
2026, 49(6):47-55.
Abstract:This paper proposes a superconducting parallel-nanowire dual-resolution single-photon detector capable of simultaneously achieving photon-number resolution and spatial-position resolution under a single-output readout scheme. The detector consists of N superconducting nanowire units connected in parallel. Each unit incorporates a uniquely valued marking resistor in parallel to form an asymmetric resistor network, along with a series resistor of identical value. The entire array is biased by a common current source and read out through a single output channel. Taking a four-pixel structure as an example, with gradient-distributed shunt resistors (100, 200, 400, and 800 Ω) and a 50 Ω series resistor, LTspice simulations demonstrate that the superposition of response pulse amplitudes enables simultaneous discrimination of both photon number and spatial location, allowing up to 4-photon events and 15 distinct spatial response patterns to be identified. Further analysis indicates that the proposed structure effectively suppresses current shunting and latching effects commonly found in conventional parallel-nanowire detectors, thereby enhancing operational stability, albeit at the cost of reduced output signal amplitude and signal-to-noise ratio. This study provides a novel and feasible technical pathway for developing dual-resolution PNDs, offering a new perspective for future large-scale, high-count-rate, and low-SWaP-C multifunctional PNDs with full-information acquisition capabilities, thereby broadening potential applications in quantum imaging, lidar, and quantum communication.
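The figure of 15 spatial response patterns for a four-pixel structure matches the number of non-empty subsets of four pixels (2^4 - 1). The sketch below enumerates those subsets. The per-pixel amplitude weights are purely hypothetical stand-ins chosen so that every subset sums to a distinct value; in the actual detector the pulse amplitudes are set by the resistor network, which this sketch does not model.

```python
from itertools import combinations

# Pixels labeled by their shunt resistor values from the abstract (in ohms).
PIXELS = ("R100", "R200", "R400", "R800")

# Hypothetical amplitude contribution of each pixel (arbitrary units).
WEIGHTS = {"R100": 1, "R200": 2, "R400": 4, "R800": 8}


def firing_patterns(pixels):
    """Enumerate every non-empty set of pixels that can fire together."""
    patterns = []
    for n_photons in range(1, len(pixels) + 1):
        patterns.extend(combinations(pixels, n_photons))
    return patterns


patterns = firing_patterns(PIXELS)

# Summed amplitude for each pattern; distinct sums mean one readout value
# identifies both how many pixels fired and which ones.
amplitudes = {p: sum(WEIGHTS[name] for name in p) for p in patterns}
max_photons = max(len(p) for p in patterns)
```

With distinct per-pixel contributions, the length of a pattern gives the photon number and its members give the spatial location, which is the dual-resolution property the abstract describes.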
2026, 49(6):56-66.
Abstract:In complex indoor and outdoor scenarios, a long shooting distance leaves the instrument occupying only a small proportion of the image pixels, which leads to low detection accuracy, a high missed detection rate, and poor real-time performance. To address these issues, this paper proposes an improved pointer instrument detection algorithm based on YOLOv8, named GRCP-YOLOv8. First, a C2f_CGA module that integrates the CGA attention mechanism is designed to enhance the model's ability to express features at different scales, and it replaces all C2f modules in the backbone network. Second, RFAConv is introduced to replace the conventional convolution layers, addressing the insufficient feature representation caused by parameter sharing in standard convolution modules. Third, a new neck network structure, CCFPN, is designed. By incorporating high-resolution feature maps extracted from the backbone network, it improves the model's capability to detect small targets, and it reduces the number of channels in the convolution layers via 1×1 convolutions, thus reducing the model's parameter count and computational complexity. Finally, a new detection head, RepHead, based on reparameterized convolution (RepConv), is introduced to reduce computational load and memory consumption during inference. Experimental results show that the proposed algorithm achieves accuracy, recall, and mAP@50 of 94.3%, 91.6%, and 92.5%, respectively, with recall and mAP@50 improving by 1.3% and 1.2% compared with the YOLOv8n model. The algorithm also reduces computational complexity and parameter count by 39% and 27%, respectively, and the model size is only 4.22 MB. These results demonstrate that the proposed algorithm not only improves detection accuracy but is also more suitable for deployment on edge devices.
Hua Yan , Li Peng , Yan Dong , Zhang Xiangkai
2026, 49(6):67-75.
Abstract:Electric vehicle (EV) charging load forecasting supports power dispatch decisions by anticipating the load fluctuations caused by widespread EV grid integration. A new short-term EV charging load forecasting method is proposed to enhance power grid stability and reliability by improving forecasting accuracy. First, historical load data are decomposed into subcomponents using VMD; these are combined with temperature data and fed into multiple TCN-LSTM branches for feature extraction, which reduces the complexity of the EV load sequence. Second, a two-stage attention mechanism enhances the LSTM structure, improving both the capture of load characteristics at specific times and the fusion across feature dimensions, and thereby strengthening the recognition of complex load patterns. Finally, a time conversion prediction module integrates the branch results through a fully connected layer to improve prediction accuracy and reduce errors. A case study analyzes real EV charging station load data from a community in Shaoxing. Experimental results show that the proposed method reduces MSE by 68% and MAE by 60% and improves the performance index by 4%, demonstrating strong predictive performance.
Zhang Zhenli , Hu Zhiqiang , Song Chenglin , Li Yongqun
2026, 49(6):76-85.
Abstract:Electromagnetic levitation systems are easily affected by external disturbances, and the integer-order PD in traditional linear active disturbance rejection control has an inherent contradiction. To address these problems, this paper proposes a fractional-order linear active disturbance rejection control method. A linear extended state observer estimates the total disturbance of the system in real time, and a fractional-order differential operator is introduced into the position loop control law. Because the order of this operator can be adjusted continuously within the interval (0, 2), the phase and amplitude requirements in the frequency domain can be met flexibly. Theoretical analysis shows that the fractional-order linear active disturbance rejection controller can simultaneously enhance disturbance suppression in the low-frequency band and suppress the amplification of high-frequency noise. Simulation and experimental results show that, compared with linear active disturbance rejection control, the fractional-order linear active disturbance rejection controller reduces the position deviation by 48.72% and shortens the adjustment time by 80.28%. It can also handle stronger disturbances effectively and improve tracking accuracy, significantly enhancing the anti-interference and tracking performance of the system.
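To make the notion of a continuously adjustable differentiation order concrete, the sketch below evaluates a fractional derivative numerically with the Grünwald-Letnikov approximation. This is an assumption made for illustration: the abstract does not say how the fractional-order operator is realized in the controller, and the function names and step size here are hypothetical.

```python
import math


def gl_weights(alpha, n):
    """Grünwald-Letnikov weights w_k = (-1)^k * C(alpha, k), built recursively."""
    weights = [1.0]
    for k in range(1, n + 1):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / k))
    return weights


def gl_derivative(f, t, alpha, h=1e-3):
    """Approximate the order-alpha derivative of f at time t with lower limit 0."""
    n = int(round(t / h))  # number of past samples back to t = 0
    weights = gl_weights(alpha, n)
    total = sum(weights[k] * f(t - k * h) for k in range(n + 1))
    return total / h ** alpha


def ramp(t):
    return t


# Order 1 reduces to an ordinary backward difference; order 0.5 lies between
# the signal itself and its first derivative.
d_one = gl_derivative(ramp, 1.0, 1.0)
d_half = gl_derivative(ramp, 1.0, 0.5)
half_exact = 2.0 / math.sqrt(math.pi)  # closed form of the half-derivative of t at t = 1
```

Sweeping `alpha` between 0 and 2 moves the operator smoothly between proportional-like and second-derivative-like behavior, which is the tuning freedom the abstract attributes to the fractional-order term.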
Jiao Huailiang , Liu Liqun , He Junqiang , Zhang Zheng , Wu Qingfeng
2026, 49(6):86-97.
Abstract:To solve the slow convergence, low convergence accuracy, and tendency to fall into local optima of the artificial lemmings algorithm (ALA), a multi-strategy improved artificial lemmings algorithm (IALA) is proposed. First, the Hammersley sequence is introduced to initialize the population, giving the initial population better search ability. Then, a reverse differential mutation mechanism is used to improve population diversity and enhance the ability of the algorithm to escape from local optima. Finally, a soft frost ice search mechanism lets the algorithm account for both local and global characteristics during optimization, which improves its optimization ability and convergence speed. To verify the effectiveness of the improved algorithm, nine benchmark functions are selected for comparison experiments. The comparison results show that IALA has faster convergence speed and higher convergence accuracy. The improved algorithm is then applied to simulation experiments of robot path planning on three kinds of complex maps. Compared with the original ALA, IALA reduces the optimal path value by 0.64% and the average value by 2.86% on the first map, by 10.24% and 6.91% on the second map, and by 2.6% and 1.3% on the last map. These results demonstrate that the improved algorithm has better path optimization ability.
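Hammersley-sequence initialization, mentioned in the abstract above, spreads the initial population more evenly over the search space than independent uniform sampling. A minimal sketch of the standard construction follows; the function names and the way points are scaled to the bounds are illustrative assumptions, not the paper's code.

```python
PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)


def radical_inverse(i, base):
    """Mirror the base-`base` digits of i around the radix point."""
    inverse, fraction = 0.0, 1.0 / base
    while i > 0:
        inverse += (i % base) * fraction
        i //= base
        fraction /= base
    return inverse


def hammersley_population(n, lower, upper):
    """Generate n points inside the box [lower, upper] from the Hammersley set.

    The first coordinate is i / n; each further coordinate is the radical
    inverse of i in successive prime bases.
    """
    dim = len(lower)
    if dim - 1 > len(PRIMES):
        raise ValueError("not enough prime bases for this dimension")
    population = []
    for i in range(n):
        unit = [i / n] + [radical_inverse(i, PRIMES[d]) for d in range(dim - 1)]
        population.append(
            [lo + u * (hi - lo) for u, lo, hi in zip(unit, lower, upper)]
        )
    return population


# Four individuals in the unit square.
population = hammersley_population(4, [0.0, 0.0], [1.0, 1.0])
```

Because the points are deterministic, repeated runs start from the same population; an implementation that needs run-to-run variation would have to add a random shift or permutation, which is not shown here.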
Liu Jie , Li Zhiwen , Zhang Tengqing , Xie Mingshan
2026, 49(6):98-109.
Abstract:With the continuous expansion of drone application scenarios, small object detection in aerial images has become a research hotspot in computer vision. Small object features are not distinctive, complex backgrounds lead to false and missed detections, and existing algorithms struggle to balance detection accuracy and real-time performance. To solve these problems, this paper proposes FST-RTDETR, an aerial image small object detection algorithm based on RT-DETR. First, FasterNet is combined with the EMA attention mechanism, and the structure of the Basic Block module of the original model is redesigned to improve network speed and the accuracy of visual tasks. Second, to avoid the excessive computation and more time-consuming post-processing incurred by adding a traditional P2 detection layer, this study uses the P2 feature layer within the original CCFM architecture: features rich in small object information are obtained through SPDConv and passed to P3 for fusion, and a CSP-OmniKernel module, built by combining the CSP idea with Omni-Kernel, then performs feature integration. This effectively learns feature representations from global to local, reduces the missed and false detection rates, and improves small object detection performance. Finally, to simplify the loss function calculation, improve regression efficiency and accuracy, and account for the loss more comprehensively, this study uses inner-MPDIoU to replace the original GIoU. Experiments on the VisDrone2019 dataset show that the FST-RTDETR model achieves a detection accuracy of 49.6%, which is 2.1% higher than the original RT-DETR model. The FST-RTDETR model significantly improves object detection performance on drone images, improves model efficiency, and performs well compared with other algorithms.
Jiang Junchao , Wang Yonglan , Fang Jiandong , Zhu Jin
2026, 49(6):110-122.
Abstract:In recent years, the application of 3D Gaussian splatting (3DGS) in simultaneous localization and mapping (SLAM) systems has made it possible to perform high-quality image rendering using explicit 3D Gaussian models, significantly improving the fidelity of environmental reconstruction. However, existing 3DGS-based methods suffer from limited tracking accuracy and a lack of global consistency in the 3D reconstruction of complex indoor environments. To this end, this paper proposes SNGO-SLAM, a dense SLAM algorithm based on 3D Gaussian splatting. The algorithm combines the advantages of frame-to-model and frame-to-frame tracking and uses surface normal perception to obtain richer geometric information, significantly improving tracking accuracy. To address the tracking error that accumulates over time, the algorithm introduces a loop closure process and optimizes the 3D Gaussian point representation, further enhancing tracking accuracy. In addition, the algorithm introduces a dual Gaussian pruning strategy that optimizes memory usage while ensuring precise camera tracking. Experiments on the Replica, ScanNet and TUM RGBD datasets show that, while maintaining high rendering quality, the algorithm achieves an absolute trajectory root mean square error of 0.27 cm on the Replica dataset. Compared with NICE SLAM, Vox-Fusion, Gaussian SLAM and SplaTAM, the tracking accuracy increases by 74.53%, 91.26%, 12.90% and 28.95% respectively, providing new ideas for SLAM technology.
Tursun Mamat , Liu Xiangshuo , He Chunguang , Yang Qiuju , Duan Ting
2026, 49(6):123-134.
Abstract:To address the low search efficiency, slow convergence speed, and limited path expansion diversity of the RRT family of algorithms, an adaptive multi-strategy dynamic step-size algorithm, AMDS-Bi-RRT*, is proposed. Based on the Bi-RRT* framework, the algorithm enhances convergence efficiency through a dynamic goal-directed extension strategy and an adaptive step-size evaluation function. A multi-directional emergency maneuver strategy is designed to improve adaptability in complex environments. Meanwhile, node sampling is optimized using an improved artificial potential field method, and a three-stage path smoothing approach is introduced to ensure path feasibility. Comparative experiments conducted in four simulation environments of varying complexity against five benchmark algorithms—Bi-APF-RRT*, Bi-RRT*, APF-RRT*, RRT*, and goal-biased RRT*—demonstrate that AMDS-Bi-RRT* reduces average planning time by 12.22%~23.45%, shortens average path length by 0.88%~1.89%, and decreases the average number of nodes by 6.69%~22.85%. The results verify that AMDS-Bi-RRT* outperforms the comparison algorithms in planning efficiency, path quality, and convergence speed, confirming its superior performance across diverse environments.
Gang Mingxu , Yan Bingjun , Hu Bo
2026, 49(6):135-145.
Abstract:To meet the corrosion monitoring requirements of large-scale infrastructure and supporting equipment in atmospheric environments, and to overcome the poor real-time performance and low monitoring accuracy of existing corrosion monitoring systems, a multi-channel atmospheric corrosion monitoring system based on the STM32F407ZGT6 is designed and implemented. The system combines electrochemical impedance spectroscopy measurement technology with the equivalent circuit model theory of the two-electrode galvanic cell probe. It uses time division multiplexing to control the multi-channel excitation signal generation module, which applies an excitation signal to the electrode system of each channel, and uses the multi-channel response signal acquisition module to collect the response voltage data generated by each electrode system in real time. The calculated and processed electrochemical impedance spectroscopy data are transmitted through the wireless communication module to the host software deployed on a cloud server, where they are analyzed to obtain the corrosion state information. The experimental results show that the system realizes multi-point corrosion status analysis and monitoring in atmospheric environments, the accuracy of the corrosion rate is more than 90%, and the monitoring data can be transmitted accurately to the monitoring platform in real time.
Zhang Wenze , Wang Zaijun , Jiang Yuheng , Yang Ruizhe
2026, 49(6):146-155.
Abstract:Accurate assessment of pilot cognitive states is critical for ensuring flight safety, yet existing methods exhibit limitations in fusing multimodal physiological signals. To address this, this paper proposes a dual-stream deep learning network based on bidirectional cross-modal attention. The model adopts a parallel dual-branch architecture: The electroencephalography (EEG) branch quantifies brain functional connectivity through phase locking value (PLV) features and employs a densely connected network enhanced with squeeze-and-excitation (SE) modules for deep feature extraction; the electrocardiogram (ECG) branch extracts heart rate variability (HRV) and waveform features, processed by a residual-connected multilayer perceptron to characterize autonomic nervous system activity. Building upon this, an innovatively designed bidirectional cross-modal attention module dynamically weights and fuses the dual-path deep features to achieve precise classification of three states—concentrated attention, distracted attention, and startle/surprise. Experimental results on the NASA public dataset demonstrate an overall recognition accuracy of 97.44%. Ablation and comparative analyses confirm that the fusion strategy significantly outperforms single-modality analysis and simple feature concatenation methods. The study reveals that deep integration of EEG functional connectivity and ECG physiological information via attention mechanisms effectively enhances cognitive state recognition performance. This approach provides reliable technical support for developing objective and efficient pilot state monitoring systems, holding significant application value for improving flight safety.
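The phase locking value (PLV) used by the EEG branch has a compact standard definition: the magnitude of the average unit phasor of the phase difference between two channels. The sketch below assumes the instantaneous phases have already been extracted (for example with a Hilbert transform, which is not shown); the function name and the sample data are illustrative only.

```python
import cmath
import math


def phase_locking_value(phase_a, phase_b):
    """PLV between two channels given their instantaneous phases in radians.

    Returns a value in [0, 1]: 1 means a constant phase difference,
    values near 0 mean the phase difference is spread around the circle.
    """
    if len(phase_a) != len(phase_b) or not phase_a:
        raise ValueError("phase sequences must be non-empty and equally long")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b))
    return abs(total) / len(phase_a)


# Two channels with a fixed lag of 0.7 rad are perfectly locked.
chan_a = [0.1 * k for k in range(200)]
chan_b = [0.1 * k - 0.7 for k in range(200)]
plv_locked = phase_locking_value(chan_a, chan_b)

# Phase differences spread evenly over a full cycle cancel out.
spread = [2.0 * math.pi * k / 8 for k in range(8)]
plv_spread = phase_locking_value(spread, [0.0] * 8)
```

Computing this value for every channel pair gives a symmetric connectivity matrix, which is the kind of functional connectivity feature the abstract says is passed to the densely connected network.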
Luo Pengyang , Zhu Wenzhong , Wang Wen
2026, 49(6):156-166.
Abstract:Medium- and long-term power load forecasting is a core link in ensuring the stability and economy of power system planning and operation. Some studies convert the input data to the frequency domain through the Fourier transform to obtain different signal components, thereby reducing the interference of noise. However, existing studies often handle all frequency-domain signals indiscriminately, mixing key frequency-domain components with irrelevant ones, which makes it difficult for the model to fully capture the features contained in the frequency-domain signals. Therefore, FTAformer, a multivariable long-term prediction model that integrates frequency-domain analysis and an attention mechanism, is proposed. The model jointly models time-domain and frequency-domain information to enhance its ability to capture global features. First, the input sequence is transformed into a frequency-domain signal using the fast Fourier transform, and a hierarchical filtering and isolation strategy is adopted to isolate the key frequency-domain components and suppress the noise. Then, the correlations among different variables are captured in the time domain through the multi-head attention mechanism, and the global representation of the sequence is modeled using layer normalization and the feedforward network module. The experimental results show that on two public power load datasets, the predictive performance of this model is significantly higher than that of other benchmark models. Compared with the existing optimal model iTransformer, the mean square error and mean absolute error of the proposed method are reduced by 15.26% and 8.76% respectively in the multi-step prediction scenario, verifying the effectiveness and superiority of the collaborative modeling of frequency-domain analysis and the multi-head attention mechanism in medium- and long-term power load forecasting.
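The idea of isolating key frequency-domain components can be illustrated with a very simple stand-in: transform the series, keep only the strongest frequency bins, and transform back. The abstract does not specify the hierarchical filtering and isolation strategy, so the top-k magnitude rule below is an assumption, and a naive discrete Fourier transform replaces the FFT only to keep the sketch free of dependencies.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def keep_dominant(x, top_k):
    """Zero every frequency bin except the top_k largest by magnitude."""
    spectrum = dft(x)
    ranked = sorted(range(len(spectrum)), key=lambda k: abs(spectrum[k]), reverse=True)
    keep = set(ranked[:top_k])
    return idft([spectrum[k] if k in keep else 0 for k in range(len(spectrum))])


# Synthetic example: a dominant cycle plus a weaker high-frequency disturbance.
N = 32
clean = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
noisy = [c + 0.1 * math.sin(2 * math.pi * 9 * t / N) for t, c in enumerate(clean)]
filtered = keep_dominant(noisy, top_k=2)  # one real sinusoid occupies two mirrored bins
max_error = max(abs(f - c) for f, c in zip(filtered, clean))
```

The synthetic signal is not load data; it only shows that discarding weak bins removes the disturbance while leaving the dominant cycle intact.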
Wu Jiaying , Yang Xiaowen , Han Xie , Han Huiyan , Zhang Yuan , Zhao Rong
2026, 49(6):167-176.
Abstract:Existing weakly supervised semantic segmentation models for point clouds struggle to balance local feature correlation, generalization, and feature utilization. To address these limitations, this paper proposes WS-MLF, a weakly supervised point cloud semantic segmentation model via multi-scale local feature fusion, based on the RAC-Net baseline. First, the raw point cloud data is taken as input, and a multi-scale spherical sampling method (MSSM) is employed to capture hierarchical features across varying spatial radii. Second, a multi-local feature aggregation enhancement module (MFA) is designed to refine geometric context within neighborhoods. Third, a spatial-channel-fused hybrid attention module (SCH-Att) is proposed to prioritize discriminative channels and key points. Finally, a decoder is used for upsampling to generate point-level semantic labels, thereby completing the semantic segmentation task. The proposed model is evaluated on the large-scale indoor scene datasets S3DIS and ScanNet-v2. Experimental results demonstrate that on the S3DIS dataset, when the label ratios are 0.02% and 0.06%, the mIoU surpasses RAC-Net by 2.71% and 0.54%, respectively. On the ScanNet-v2 dataset, with a label ratio of 20 pt, the mIoU increases by 1.55% compared with RAC-Net. These results validate the effectiveness of WS-MLF in extracting key features under weak supervision and enhancing segmentation accuracy.
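Sampling neighborhoods at several spatial radii around a point is commonly implemented as a ball query repeated per radius. The sketch below shows that generic operation as a rough illustration of multi-scale spherical sampling; it is not the paper's MSSM, and the function name, radii, and brute-force search are assumptions chosen for clarity.

```python
def multi_scale_ball_query(points, center, radii):
    """Return, for each radius, the indices of points inside that sphere.

    points: sequence of (x, y, z) tuples; center: one (x, y, z) tuple.
    Brute force is used for clarity; a spatial index would be preferable
    for large scenes.
    """
    groups = {}
    for radius in radii:
        limit = radius * radius  # compare squared distances, no square roots needed
        groups[radius] = [
            index
            for index, point in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(point, center)) <= limit
        ]
    return groups


cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.3, 0.0, 0.0), (1.0, 0.0, 0.0)]
neighborhoods = multi_scale_ball_query(cloud, center=(0.0, 0.0, 0.0), radii=(0.2, 0.5))
```

Each larger sphere contains the smaller ones, so features pooled per radius describe the same location at progressively coarser context.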
Hou Linjie , Lu Chengfang , Cui Yanrong
2026, 49(6):177-191.
Abstract:Small object detection in UAV aerial imagery encounters critical challenges including extremely small target sizes, complex background interference, and insufficient feature representation. Addressing the limitations of existing RT-DETR models in small object feature extraction and multi-scale fusion, this paper proposes an adaptive multi-scale gated enhancement fusion DETR (MGEF-DETR). A multi-order cross-stage gated aggregation (MCGA) module is designed to achieve selective enhancement of small object texture features through adaptive gating mechanisms. A Micro-OmniPyramid feature pyramid is constructed by integrating space-to-depth (SPD) convolution sparse encoding and cross-stage enhanced spectral kernel (CESK) modules, establishing lossless transmission pathways for small object features. An enhanced feature correlation (EFC) module is introduced to optimize cross-scale feature fusion through grouped attention and multi-level reconstruction strategies. An inner-modified penalty distance IoU (IMIoU) loss function is designed to enhance boundary regression sensitivity for small objects. Experimental results on the VisDrone2019 dataset demonstrate that MGEF-DETR achieves improvements of 3.9% and 3.1% in mAP@0.5 and mAP@0.5:0.95 metrics respectively compared to the baseline RT-DETR, while reducing parameters by 13.6%. Validation on TinyPerson and CODrone datasets further confirms the generalization capability of the algorithm, indicating significant improvements in both accuracy and efficiency for small object detection in aerial scenarios while maintaining lightweight characteristics.
Zhao Xuefeng , Ren Yi , Zhong Zhaoman , Zhong Xiaomin
2026, 49(6):192-201.
Abstract:Underwater litter detection is a crucial technology for maintaining the balance of underwater ecosystems. To address the significant variation in target scale encountered in underwater litter detection, YOLO11-MDA, a model based on YOLO11, is proposed. First, a multidomain feature extraction module (MFEM) is proposed, which extracts target features in both the spatial and frequency domains, captures features at different scales from the input feature map, and enhances the expression of global features and local information. Second, the lightweight dynamic up-sampling module DySample is introduced to integrate contextual information and improve the quality and efficiency of up-sampling. Finally, the adaptive threshold focused classification loss (ATFL) is introduced to reduce the impact of the uneven distribution of multi-scale samples on the detection results and improve the detection accuracy of multi-scale targets. The experimental results show that, compared with the baseline model, the mAP of YOLO11-MDA on the TrashCan and Trash_ICRA19 datasets reaches 91.4% and 97% respectively, an improvement of 3.1% and 10.7%, and the detection speed reaches 354.3 FPS. These results demonstrate that the overall performance of the improved model exceeds that of other algorithms and that it provides an effective method for the automated monitoring of underwater environments.
2026, 49(6):202-210.
Abstract:Steel defect detection is critical for industrial quality control, yet performance is constrained by multi-scale variations, small targets, and background interference. To enhance the accuracy and efficiency of the detection model, this paper proposes a defect detection network based on an improved version of YOLO11, named LiteSteel-YOLO. First, a lightweight multi-scale fusion module (C3k2-LMSF) is designed to enhance multi-scale defect perception through fused convolutional kernels and feature guidance mechanisms. Second, a spatial-channel aware upsampling module (SCAM) is proposed, which improves the robustness of small target detection and suppresses noise through channel reorganization and spatial offset operations. Finally, an Efficient-Head detector optimized via structural reconfiguration is introduced to maximize computational efficiency. Experimental results show that LiteSteel-YOLO achieves mAP@50 of 81.7% and 70.7% at inference speeds of 338 and 530 FPS on the NEU-DET and GC10-DET datasets, respectively, surpassing YOLO11 by 4.0% and 2.3%. The proposed framework enhances the accuracy and efficiency of steel defect detection, providing a solution for industrial inspection scenarios.
Lu Jingyi , Chen Bo , Wu Yang , Liang Qihao , Wang Peng
2026, 49(6):211-219.
Abstract:Fire and smoke detection is a critical component of intelligent surveillance and disaster early warning systems, with wide applications in forest fire prevention, industrial safety and other fields. However, existing algorithms often suffer from low detection precision, slow speed, and large model size under natural environments. To address these issues, this paper proposes a fire and smoke detection method based on the lightweight YOLOv8n. The proposed model replaces the original backbone with PP-LCNet to reduce model size, introduces the CARAFE upsampling operator to enhance feature reconstruction, and integrates the EMA attention mechanism to improve target perception capability. Experimental results show that, compared with the original YOLOv8n, the improved model reduces parameters by 1.01 M and computational cost by 2.2 G, while achieving a detection precision of 94.8% and an mAP50 of 93.6%. It outperforms other mainstream lightweight detection models, achieving an excellent balance between precision and real-time performance, and demonstrates strong practical value.
Guo Xinru , Lyu Weidong , Wang Rui , Zhao Dini
2026, 49(6):220-228.
Abstract:Brain tumors are highly invasive neurological diseases, and accurate early diagnosis is crucial for developing personalized treatment plans. Computer-aided diagnosis (CAD) based on deep learning techniques has achieved significant progress in medical image analysis, but limitations remain in terms of classification accuracy, computational efficiency, and interpretability. To address these issues, this study proposes an optimized EfficientNet model based on transfer learning and fine-tuning strategies. The model improves certain convolutional and fully connected layers and adds a global average pooling layer and a Dropout layer at the top of the network to enhance feature extraction capability and classification performance. Additionally, gradient-weighted class activation mapping (Grad-CAM) is introduced to visualize the model's decision-making process, effectively highlighting key discriminative regions of brain tumors, thereby improving interpretability and clinical reliability. Experimental results on the Figshare dataset demonstrate that the proposed model achieves an accuracy of 99.35% on the test set while significantly reducing parameter count and computational complexity, outperforming baseline models including VGG16, ResNet152V2, and Vision Transformer across all major metrics. Furthermore, cross-dataset validation shows that the model attains an accuracy of 92.51%, further demonstrating its robust stability and generalization capability.
2026, 49(6):229-238.
Abstract:To address the small target size, blurred edges, and susceptibility to noise and background interference of defect regions in photovoltaic infrared images, this paper proposes an improved algorithm based on YOLOv11. First, a guided local-global spatial attention (GLGSA) module is designed to integrate local salient-region information with global contextual semantics, improving the discriminability of the feature representation. Second, the GLGSA module is combined with the bidirectional feature fusion structure BiFPN to form a GLGSA-BiFPN structure that improves multi-scale feature fusion, and a P2 detection layer is added to enhance the detection of very small targets. Finally, the NWD loss function replaces the original loss function to improve the localization accuracy of small targets. Experiments on the PV-HSD-2025 photovoltaic hot-spot dataset show that the improved algorithm raises mAP50 and mAP50-95 by 9.1% and 5.6%, respectively, over YOLOv11n, effectively improving the accuracy of small-target defect detection in photovoltaic images.
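Editorial illustration: the abstract does not give the loss formula, so the following is only a minimal sketch of the normalized Wasserstein distance (NWD) as it is commonly defined for small-object detection. Each box (cx, cy, w, h) is modelled as a 2-D Gaussian, and the similarity decays exponentially with the Wasserstein distance between the two Gaussians. The function names and the constant `c` (a dataset-dependent scale, set here to an arbitrary placeholder) are assumptions and are not taken from the paper.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two (cx, cy, w, h) boxes.

    Assumption: each box is modelled as a Gaussian with mean (cx, cy) and
    covariance diag(w^2 / 4, h^2 / 4). `c` is a hypothetical scale constant.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Closed-form squared 2-Wasserstein distance between the two Gaussians.
    w2_sq = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Map the distance to a similarity in (0, 1].
    return math.exp(-math.sqrt(w2_sq) / c)


def nwd_loss(pred, target, c=12.8):
    """Regression loss that is 0 when the boxes coincide."""
    return 1.0 - nwd(pred, target, c)
```

Unlike IoU, this similarity stays non-zero and varies smoothly even when two small boxes do not overlap at all, which is the usual motivation for using it on small targets.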
2026, 49(6):239-246.
Abstract:This paper proposes a set of high-precision visual measurement methods to address the challenges of perspective distortion, thickness corner offset, and continuous tracking of multiple workpieces in the dynamic environment of intelligent manufacturing. In the preprocessing stage, the collected images are converted into approximately orthographic projection views through camera calibration and perspective correction. To obtain accurate edge images, this paper proposes an edge detection algorithm based on multi-scale edge fusion. By applying guided filtering to the collected images at different scales and then using dynamic Canny edge detection, the complete contour of the workpiece is obtained. To address the corner offset caused by the thickness of the workpiece, this paper proposes a high-precision corner extraction algorithm based on thickness interference elimination. By fusing sub-pixel corners and fitted corners, precise corner positioning is achieved. In addition, an object tracking algorithm is designed to match and identify the centroids of the workpieces, enabling automatic size recognition and measurement of multiple workpieces in consecutive frames. Experimental results show that this method can measure the sizes of multiple workpieces in arbitrary poses, with a mean error of 0.599 mm and a standard deviation of 0.172 mm, meeting the measurement requirements in industrial production.
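Editorial illustration: the abstract states that workpieces are tracked by matching centroids across frames but does not describe the matching rule. The sketch below assumes a simple greedy nearest-centroid association with a distance gate. The function name `match_centroids` and the `max_dist` threshold are hypothetical, and the paper's tracker may differ.

```python
import math


def match_centroids(prev, curr, max_dist=50.0):
    """Greedily match tracked centroids to detections in the current frame.

    prev: dict mapping track id -> (x, y) centroid from the previous frame
    curr: list of (x, y) centroids detected in the current frame
    Returns a dict mapping track id -> index into `curr`.
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (math.dist(p, c), tid, j)
        for tid, p in prev.items()
        for j, c in enumerate(curr)
    )
    matches, used = {}, set()
    for d, tid, j in pairs:
        if d > max_dist:
            break  # remaining pairs are even farther apart
        if tid in matches or j in used:
            continue  # each track and each detection is used at most once
        matches[tid] = j
        used.add(j)
    return matches
```

Tracks left unmatched can be treated as workpieces that have left the field of view, and unmatched detections as newly arrived workpieces.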
Liu Qingqiang , Zheng Xiaodong , Liu Yuanhong , Qian Kun
2026, 49(6):247-256.
Abstract:To address the key challenges of insulator fault detection in drone-based power inspection scenarios, such as the high missed-detection rate for small targets, significant interference from complex backgrounds, and insufficient real-time performance, this study proposes an improved YOLOv10n detection model based on multi-scale feature collaborative optimization. The model combines a lightweight adaptive feature extraction network with a hierarchical fusion mechanism for multi-scale semantic enhancement. In the shallow network, dynamic deformable grouped convolution and channel recalibration strategies enhance sensitivity to micro-defect features; in the deep network, a multi-branch dilated convolution pyramid and a cross-dimensional attention mechanism build cross-scale associations, jointly optimizing detection accuracy and computational efficiency. A shape-sensitive InSh-IoU loss function is proposed, which dynamically adjusts the weight coefficient of the bounding box shape to reduce the positioning error of targets with abnormal aspect ratios, enabling more accurate localization of insulators. On a self-built insulator fault dataset, the model maintains real-time detection speed while achieving a mean average precision (mAP@0.5) of 97.12%, an improvement of 2.82% over the baseline model.
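Editorial illustration: the abstract does not define InSh-IoU, so the sketch below shows only the plain intersection-over-union (IoU) that IoU-based losses and the mAP@0.5 metric build on. It is not the proposed loss. The helper names and the (x1, y1, x2, y2) box format are assumptions.

```python
def iou(a, b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle; width/height clamp to 0 when boxes are disjoint.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, thr=0.5):
    """A prediction counts as a hit at mAP@0.5 when IoU >= 0.5."""
    return iou(pred, gt) >= thr
```

For an elongated box, a small sideways offset already removes a large share of the overlap: a 1 x 10 box shifted by half its width has an IoU of only 1/3 with the ground truth, which illustrates why targets with extreme aspect ratios can benefit from shape-aware losses.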
