Volume 48, Issue 15, 2025: Table of Contents

Robust dynamic RGB-D SLAM based on motion probability screening and weighted pose estimation

Yu Xingyun, Cheng Xianghong, Liu Fengyu, Zhong Zhiwei

2025, 48(15):1-10.

Abstract: To reduce the interference of dynamic objects in visual SLAM, a robust dynamic RGB-D SLAM system that combines the motion probability of feature points with weighted pose estimation is proposed. First, the instance segmentation network Yolact is used to obtain semantic information about the scene; semantic and depth information are combined to restore the boundaries of the dynamic masks, and the semantic dynamic probability is calculated according to the magnitude of the prior motion probability. Then, a semantically guided method is used to calculate the geometric dynamic probability of each feature point. The semantic dynamic probability, the geometric dynamic probability, and their confidences are combined to construct the motion probability of the feature point, and a feature point screening strategy with an adaptive probability threshold is designed. Finally, in the pose tracking, local map optimization, and global optimization stages of the system, a weighted cost function based on the motion probability of feature points is designed to distinguish the contributions of different feature points to pose optimization. In addition, after the dynamic objects are removed, a global point cloud map of the static scene is built. Experimental results on public datasets demonstrate that, compared with ORB-SLAM2, the root mean square error (RMSE) of the absolute trajectory error (ATE) of the proposed algorithm is reduced on average by 69.16% on the TUM RGB-D dataset and by 91.94% on the Bonn dataset. Moreover, compared with other state-of-the-art dynamic SLAM algorithms, the proposed method shows noticeable improvements in both pose estimation accuracy and robustness. In real-world experiments, compared with ORB-SLAM2 and Dyna-SLAM, the trajectory endpoint drift error is reduced by an average of 52.20% and 19.15%, respectively.
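
The idea of fusing two dynamic probabilities, screening points with an adaptive threshold, and then weighting the remaining residuals can be illustrated with a small NumPy sketch. Everything specific here is an assumption for illustration, not the paper's formulation: the confidence-weighted mean used for fusion, the "mean plus k standard deviations" threshold, the weight w = 1 - p, and all function names are hypothetical.

```python
import numpy as np

def fuse_motion_probability(p_sem, p_geo, c_sem: float, c_geo: float) -> np.ndarray:
    """Hypothetical fusion: confidence-weighted mean of the semantic and
    geometric dynamic probabilities of each feature point."""
    p_sem = np.asarray(p_sem, dtype=float)
    p_geo = np.asarray(p_geo, dtype=float)
    return (c_sem * p_sem + c_geo * p_geo) / (c_sem + c_geo)

def screen_and_weight(p_motion, k: float = 1.0):
    """Hypothetical adaptive threshold (mean + k * std). Points above it are
    discarded; survivors get weight 1 - p so likely-static points count more."""
    p = np.asarray(p_motion, dtype=float)
    threshold = p.mean() + k * p.std()
    keep = p <= threshold
    weights = np.where(keep, 1.0 - p, 0.0)
    return keep, weights

def weighted_cost(residuals, weights) -> float:
    """Weighted sum of squared 2D reprojection residuals, shape (N, 2)."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(weights * np.sum(r ** 2, axis=1)))
```

In a real system the weighted cost would sit inside a nonlinear least-squares solver (as in ORB-SLAM2's g2o-based optimization); here it is only evaluated, to show how a point with high motion probability stops contributing.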

Research on the extremely near-field telemetry antenna for engine bearings

Zhang Xiaoxin, Cheng Long, Ma Lifeng, Zhang Feifan

2025, 48(15):11-19.

Abstract: Monitoring engine bearing temperatures often runs into difficulties with antenna installation and signal transmission. This paper proposes a complete design for an extremely near-field telemetry antenna based on surface acoustic wave (SAW) temperature sensors. First, the electromagnetic field distribution in the extremely near-field region is characterized through theoretical analysis, and a wideband antenna with an in-band reflection coefficient below -10 dB over the 2-3 GHz band is designed through simulation-based optimization. Next, an antenna cage with a rational assembly structure is introduced. Furthermore, covering the surface of the extremely near-field antenna with a dielectric superstrate improves the energy transfer efficiency by approximately 14.80% and reduces the rate of change during energy transfer by about 54.36%, which improves signal transmission quality and ultimately increases the stability of the downstream telemetry system. Finally, the sensor is installed on the extremely near-field antenna and connected to the telemetry system, and temperatures are measured during bearing operation, validating the practicality of the antenna and its assembly structure and the feasibility of the dielectric superstrate method.
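
For readers less familiar with the -10 dB criterion, the standard transmission-line relations can be sketched in a few lines: Γ = (Z_L - Z_0)/(Z_L + Z_0), the reflection coefficient in dB is 20 log10|Γ|, and the fraction of incident power accepted by the load is 1 - |Γ|². The 50 Ω reference impedance is an assumption; the abstract does not state the port impedance. At exactly -10 dB, |Γ| ≈ 0.316, so about 90% of the incident power is accepted.

```python
import math

def reflection_coefficient(z_load: complex, z0: float = 50.0) -> complex:
    """Voltage reflection coefficient of a load on a line of impedance z0."""
    return (z_load - z0) / (z_load + z0)

def s11_db(z_load: complex, z0: float = 50.0) -> float:
    """Reflection coefficient magnitude in dB (negative for a passive load)."""
    gamma = abs(reflection_coefficient(z_load, z0))
    return 20.0 * math.log10(gamma) if gamma > 0 else float("-inf")

def accepted_power_fraction(z_load: complex, z0: float = 50.0) -> float:
    """Fraction of incident power that is not reflected: 1 - |Gamma|^2."""
    return 1.0 - abs(reflection_coefficient(z_load, z0)) ** 2

# Example: a 75-ohm load on an assumed 50-ohm line gives |Gamma| = 0.2
print(round(s11_db(75), 2), accepted_power_fraction(75))
```

A 75 Ω load therefore sits at roughly -14 dB, comfortably inside a -10 dB specification.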

Research on fast wet etching release of front-side opened suspended structures based on <100>-oriented silicon

Li Bo

2025, 48(15):20-26.

Abstract: To meet the demand for efficient fabrication of the suspended structures in medical uncooled thermal infrared detectors, this paper proposes a rapid front-side-opening wet etching process based on <100>-oriented silicon wafers. Three types of slit-shaped opening structures for <100>-oriented silicon are designed: zigzag, strip, and composite. Combined with a SiO2/Si3N4 stress-compensated composite film, high-precision release of the suspended structure is achieved after 120 minutes of anisotropic etching in a KOH solution with a molar concentration of 30% at a water bath temperature of 80 ℃. The experimental results show that the composite opening, which adds auxiliary openings in the cantilever beam area, significantly improves the penetration of the etching solution. Compared with back-side sacrificial layer etching, the etching time is shortened by 60%, and the result is far better than that of front-side sacrificial layer etching, with a release area of over 98% and a yield increased to 95%. On this basis, using a single-side process flow that is fully compatible with CMOS, P/N polysilicon thermopiles and amorphous silicon microbolometer units are successfully fabricated. In ear and forehead temperature measurement, an accuracy of ±0.1 ℃ and a response time of less than 500 ms are achieved, meeting the high-precision and rapid-response requirements of medical-grade equipment. The integrated “crystal orientation design-stress regulation-etching optimization” process flow proposed in this study provides a reliable solution for the batch fabrication of high-performance suspended structures and has important application value for wearable health monitoring devices.

Medical image registration based on multiscale feature-perceptual loss and attention mechanism

Ma Tianyi, Jiang Dashuai, Zhu Dong, Zhang Lintao, Li Guoqiang

2025, 48(15):27-34.

Abstract: Deep learning-based methods have recently been widely applied to deformable medical image registration, and designing novel loss functions and effective network architectures is a common way to improve registration performance. This article proposes a multi-scale feature perceptual loss and an attention module, ECA-D, improving on designs that rely solely on mean square error (MSE) or normalized cross-correlation (NCC). Inspired by currently popular large language models (LLMs), a classification neural network is trained on multi-site medical image data, and a multi-scale feature learning process is constructed to improve the accuracy of the classification network. A multi-scale perceptual loss function is then designed to enhance registration accuracy. To improve the learning ability of the registration network, the new attention module ECA-D is designed to make more effective use of spatial and channel information. After training on the LPBA40 dataset, the model achieved a Dice score 3% higher than the most advanced methods on the unseen Neurite OASIS dataset. The experimental results show that, compared with other popular registration methods, the proposed method has higher registration accuracy and better robustness.
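
As a reference point for the MSE and NCC baselines mentioned above, a minimal global NCC and the corresponding loss 1 - NCC can be written as follows. This is only a sketch: many registration frameworks use a local, windowed NCC instead, and the paper's multi-scale perceptual loss (which requires a trained classification network) is not reproduced.

```python
import numpy as np

def ncc(fixed, moving, eps: float = 1e-8) -> float:
    """Global normalized cross-correlation between two images of equal shape."""
    f = np.asarray(fixed, dtype=float).ravel()
    m = np.asarray(moving, dtype=float).ravel()
    f = f - f.mean()
    m = m - m.mean()
    return float(f @ m / (np.sqrt((f @ f) * (m @ m)) + eps))

def ncc_loss(fixed, moving) -> float:
    """Similarity loss to minimize: close to 0 for a perfect linear match."""
    return 1.0 - ncc(fixed, moving)

def mse_loss(fixed, moving) -> float:
    """Mean squared intensity difference."""
    diff = np.asarray(fixed, dtype=float) - np.asarray(moving, dtype=float)
    return float(np.mean(diff ** 2))
```

The contrast between the two is the usual motivation for NCC: it ignores a linear intensity change such as 2·I + 3, whereas MSE penalizes it heavily.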

CEEMDAN-based pulse wave data augmentation with two-layer SMOTE

Li Hui, Li Zhenhua, Li Ruijie, Zhang Zhidong, Xue Chenyang

2025, 48(15):35-41.

Abstract: To address the noise sensitivity and physical distortion of the SMOTE algorithm when handling imbalanced pulse wave data, this study proposes CP-SMOTE, a CEEMDAN-enhanced method that decomposes preprocessed pulse waves into primary and secondary layers for stratified sample generation, effectively eliminating residual noise. By integrating adaptive distance metrics and constrained supervision mechanisms aligned with pulse wave characteristics, the algorithm generates physiologically authentic samples while enhancing inter-class discriminability. Evaluations on a proprietary dataset and the public PPG-BP dataset with four classifiers demonstrate CP-SMOTE's superiority: it consistently outperformed SMOTE-based algorithms, improving AUC, G-mean, and F1-score by 1.51% to 18.25% on the proprietary data and achieving gains of at least 1.43% on the public data in accuracy (2.24%), G-mean (1.47%), and AUC (1.43%). These results confirm its effectiveness in mitigating physical distortion and noise interference compared with SMOTE variants.
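
For context, the classic SMOTE step that CP-SMOTE builds on creates each synthetic sample by interpolating between a minority sample and one of its k nearest minority neighbours: x_new = x_i + λ(x_nn - x_i), with λ drawn uniformly from [0, 1). The NumPy sketch below shows only this baseline; the CEEMDAN layering, adaptive distance metric, and constraint mechanism of CP-SMOTE are not reproduced.

```python
import numpy as np

def smote(minority, n_new: int, k: int = 5, rng=None) -> np.ndarray:
    """Classic SMOTE: interpolate between minority samples and their k-NN."""
    X = np.asarray(minority, dtype=float)
    n = len(X)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of minority samples")
    rng = np.random.default_rng(rng)
    # Pairwise Euclidean distances; exclude each sample from its own neighbours.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic = np.empty((n_new, X.shape[1]))
    for j in range(n_new):
        i = rng.integers(n)                   # random minority sample
        nn = neighbours[i, rng.integers(k)]   # one of its k nearest neighbours
        lam = rng.random()                    # uniform in [0, 1)
        synthetic[j] = X[i] + lam * (X[nn] - X[i])
    return synthetic
```

Because every synthetic point lies on a segment between two real samples, a noisy minority sample drags synthetic points towards it; this is the noise sensitivity the abstract sets out to fix.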

Improved BitCN-LSTM short-term photovoltaic power prediction based on the KNN-LASSO-PCC method

He Yuxuan, Wang Kun, Zeng Jinhui, Liu Jie, Zhou Wuding

2025, 48(15):42-51.

Abstract: Photovoltaic power output is affected by the randomness and volatility of weather conditions. To address this, a short-term photovoltaic power forecasting method based on an improved BitCN-LSTM neural network and the KNN-LASSO-PCC approach is proposed. First, the KNN method is used to clean the dataset. Then, multi-layer feature selection is performed by combining LASSO and the Pearson correlation coefficient (PCC). Next, GRU and Elman neural networks are incorporated into the traditional BitCN-LSTM method: the GRU addresses long-term dependency and parameter optimization problems, while the Elman network enhances local time-series modeling and memory capacity. After multi-layer feature selection, global horizontal radiation, diffuse radiation, temperature, and humidity are chosen as input variables, and the predicted photovoltaic power output for each time period is the final output. A simulation is conducted over a 1-3 day period with predictions every 15 minutes. The best evaluation metrics obtained are a mean absolute error of 9.9763%, a mean squared error of 1.7029%, and a mean absolute percentage error of 10.6267%. The training time and best testing time are 181.3051 s and 0.058932 s, respectively. Compared with other commonly used short-term photovoltaic forecasting models, the proposed method achieves higher accuracy and faster speed.
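
The PCC stage of the feature selection and the error metrics quoted above can be sketched as follows. The 0.3 threshold is an arbitrary assumption, the LASSO stage is omitted, and MAPE uses its plain definition, which is undefined when the measured power is zero (for example at night). The paper may normalize these metrics differently, since its MAE and MSE are reported as percentages.

```python
import numpy as np

def pcc_filter(X, y, names, threshold: float = 0.3):
    """Keep features whose |Pearson r| with the target reaches the threshold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    kept = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) >= threshold:
            kept.append((name, float(r)))
    return kept

def mae(y, y_hat) -> float:
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))

def mse(y, y_hat) -> float:
    return float(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2))

def mape(y, y_hat) -> float:
    """Mean absolute percentage error; y must contain no zeros."""
    y = np.asarray(y, dtype=float)
    return float(100.0 * np.mean(np.abs((y - np.asarray(y_hat, float)) / y)))
```

A filter like this only captures linear dependence, which is presumably why it is paired with LASSO rather than used alone.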

Improved YOLOv10n defect detection algorithm for photovoltaic cells

Wang Haiqun, Wu Zekai, Yu Haifeng

2025, 48(15):52-62.

Abstract: Photovoltaic cell defects are irregular in shape, variable in size, and diverse in type, which makes them difficult to identify and leads to high missed-detection and false-detection rates. To solve these problems, an improved photovoltaic cell defect detection algorithm based on YOLOv10n is proposed. First, the bottleneck structure of the original C2f is removed and the PMSFA_CSP module is designed as a partial feature extraction module for the backbone and neck networks; its partial multi-scale feature extraction and residual structure capture context information and enhance the network's ability to fuse defect features. Second, the FPSC_SENetV2 module, which combines a shared convolutional layer with different dilation rates and the SENetV2 aggregated dense layer attention mechanism, is introduced into the backbone network to reduce local information loss and enhance the network's ability to capture detailed features. Third, the FreqFFPN and PMSFA_CSP modules are fused into the FreqFP_FPN feature pyramid network to reduce category inconsistency and enhance the high-frequency detail information of defects. Finally, the SESN loss function is constructed as the bounding box regression loss to balance the detection of defects at different scales, accelerate network convergence, and improve computational efficiency. The experimental results show that, compared with the original algorithm, the improved YOLOv10n increases mAP@0.5 by 3.0% while reducing computation by 0.7 GFLOPs and parameters by 0.08 M, and its overall performance meets the requirements of photovoltaic cell defect detection.

DenseNet feature grouping deep isolated forest for image anomaly detection

Zhou Xunhui, Huang Chengquan, Xiao Honghu, Dong Honglai

2025, 48(15):63-69.

Abstract: To broaden the application field of the Deep Isolation Forest (DIF) algorithm, we combine a pre-trained DenseNet-121 deep learning model with DIF and propose the DenseNet Deep Isolation Forest (DDIF) algorithm, whose effectiveness is explored on the industrial image anomaly detection dataset MVTec AD. However, the feature vectors extracted by DenseNet-121 are very high-dimensional, and when data attributes are selected at random to construct the trees, some important feature attributes may never be selected. We therefore also propose a Group Deep Isolation Forest (GDIF) algorithm and apply it to tabular datasets. Finally, combining DDIF with GDIF yields the DenseNet Group Deep Isolation Forest (DGDIF) algorithm, which solves the problem of missing important features in high-dimensional data. Anomaly detection experiments on different datasets show that DDIF outperforms other deep learning-based methods on 9 of the 15 image datasets; GDIF achieves better AUROC values than other classical anomaly detection algorithms on the 9 tabular datasets; and DGDIF outperforms DDIF, which does not use feature grouping, on 9 of the 15 image datasets. The experimental results validate the effectiveness of the proposed GDIF, DDIF, and DGDIF algorithms.
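
Isolation-forest methods, DIF included, convert the average path length E[h(x)] of a sample into an anomaly score s = 2^(-E[h(x)]/c(n)), where c(n) = 2H(n-1) - 2(n-1)/n is the average path length of an unsuccessful binary-search-tree lookup over n samples and H is the harmonic number. The sketch below implements that standard score together with a hypothetical `group_features` helper showing one simple way to partition high-dimensional features so that every group gets sampled; the paper's actual grouping rule is not described in the abstract.

```python
import math

def harmonic(k: int) -> float:
    """H(k) = 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / i for i in range(1, k + 1))

def c(n: int) -> float:
    """Average path-length normalizer for a subsample of size n."""
    if n <= 1:
        return 0.0
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length: float, n: int) -> float:
    """Near 1: likely anomaly; around 0.5 or below: normal (requires n >= 2)."""
    return 2.0 ** (-mean_path_length / c(n))

def group_features(n_features: int, n_groups: int) -> list[list[int]]:
    """Hypothetical grouping rule: contiguous blocks of feature indices, so
    trees can be built per group instead of over all features at once."""
    size = math.ceil(n_features / n_groups)
    return [list(range(s, min(s + size, n_features)))
            for s in range(0, n_features, size)]
```

A sample isolated after exactly the average path length scores 0.5, while one isolated at the root scores 1.0.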

Liquid crystal display surface defect detection algorithm based on improved YOLOv10

Yang Ruifeng, Liao Yinghua, Luo Qinpeng, Luo Xingran

2025, 48(15):70-79.

Abstract: Surface defects on liquid crystal displays (LCDs) have weak features, come in diverse types, and are highly similar to the background, so existing methods achieve low detection accuracy. To address this, this paper proposes LC-YOLO, an improved micro-defect detection model for LCDs based on YOLOv10. First, the convolutional modules in the neck network are replaced with omni-dimensional dynamic convolution (ODConv), which reduces the computational load of the model while maintaining detection accuracy and improves the extraction of small defect features. Next, the DySample dynamic upsampling module is introduced; it avoids background interference through point sampling, reducing false positives and false negatives and improving robustness against complex backgrounds. Finally, the EMAttention mechanism is incorporated to increase the model's sensitivity to small and dim targets. Experimental results on a dataset of 1,774 images containing three types of defects (scratches, corner breaks, and dents) demonstrate that, compared with the original YOLOv10 model, LC-YOLO improves mean average precision, accuracy, and recall by 2.9%, 2.4%, and 5.8%, respectively, while reducing the computational load by 2%. Compared with existing object detection algorithms, LC-YOLO retains its lightweight characteristics while improving detection accuracy and speed, performing well in detecting subtle surface defects in LCDs.

Lightweight early-stage forest fire detection algorithm integrating multi-scale attention

Xu Ruijie, Xie Hui, Jiang Wujin, Li Hongbing, Xiao Yang

2025, 48(15):80-90.

Abstract: To address the challenges of early forest fire detection, including complex environmental backgrounds, indistinct texture features of small flame and smoke targets, and high computational demands in resource-constrained deployments, we propose YOLO-VRG, a lightweight detection algorithm based on an improved YOLOv5s. First, VanillaNet is employed as the feature extraction backbone to significantly reduce model complexity while maintaining efficient feature capture. Second, the RVBC3EMA module with spatial-channel reconstruction attention is designed to minimize feature redundancy and enhance discriminative representation. Third, grouped shuffle convolution is used to further improve parameter efficiency. Experimental results demonstrate that YOLO-VRG achieves 87.6% mAP@0.5 (a 3.2% improvement over the baseline) with only 2.1 M parameters (a 74.1% reduction) and 4.5 GFLOPs (a 71.9% reduction). This architecture balances detection accuracy and hardware efficiency for edge deployment scenarios.
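
Grouped convolution splits the input channels into g groups, cutting the weights of a k×k layer from C_in·C_out·k² to C_in·C_out·k²/g, and a channel shuffle (as popularized by ShuffleNet) then interleaves the channels so that information flows between groups in the next layer. The NumPy sketch below shows both pieces for a single feature map in (C, H, W) layout; how YOLO-VRG arranges them is not specified in the abstract.

```python
import numpy as np

def grouped_conv_params(c_in: int, c_out: int, k: int, groups: int) -> int:
    """Weight count of a k x k grouped convolution (bias ignored)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave the channels of a (C, H, W) array across `groups` groups."""
    c, h, w = x.shape
    assert c % groups == 0
    return (x.reshape(groups, c // groups, h, w)
             .transpose(1, 0, 2, 3)   # swap the group and within-group axes
             .reshape(c, h, w))
```

With six channels and two groups, channel order [0, 1, 2 | 3, 4, 5] becomes [0, 3, 1, 4, 2, 5], so each group in the next grouped layer sees channels from both original groups.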

Positive and negative learning with prototype for distant supervision relation extraction

Xu Guoliang, Chen Qidong, Xu Yuxuan

2025, 48(15):91-100.

Abstract: Distant supervision relation extraction methods based on the multi-instance learning framework mostly rely on heuristically generated, noisy labels and focus on bag-level relation prediction. However, they perform poorly on sentence-level prediction, which is more useful for sentence-understanding tasks such as question answering and knowledge graph completion. To solve these problems, this paper proposes a novel distant supervision relation extraction method in which the model is trained at the sentence level via positive learning and negative learning to separate noisy data and enable faster convergence. Meanwhile, a constraint graph is constructed to encode the restrictions between relations and entity types, and it is optimized by an auxiliary loss towards relation prototypes; this allows information to propagate among different relations so that the model can learn essential and interpretable sentence representations. The method not only identifies noisy data but also iteratively revises their labels to refine the quality of the distantly supervised data and further enhance model performance. On the sentence-level relation extraction task of the NYT dataset, the method achieves an accuracy of 77.69%, 6.47% higher than the current best baseline model. Its F1 score on the noisily annotated test set reaches 85.88%, verifying its strong denoising ability. Ablation experiments show that the constraint graph contributes 11.02% to the optimization of the relation prototypes. Overall, the method significantly outperforms existing methods on sentence-level relation extraction, effectively reducing the impact of noise and improving model performance, and provides an efficient solution for distant supervision relation extraction.
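
Positive learning (PL) trains on the given label y with the usual loss -log p_y, while negative learning (NL) samples a complementary label ȳ ≠ y and minimizes -log(1 - p_ȳ); it only teaches the model that the sentence does not express ȳ, which is less harmful when y itself is noisy. The NumPy sketch below implements these two standard loss terms; how the paper schedules PL and NL, and its prototype and constraint-graph losses, are not reproduced.

```python
import numpy as np

def softmax(logits) -> np.ndarray:
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def positive_loss(logits, label: int) -> float:
    """PL: standard cross-entropy on the (possibly noisy) given label."""
    return float(-np.log(softmax(logits)[label]))

def negative_loss(logits, comp_label: int) -> float:
    """NL: push down the probability of a complementary label."""
    return float(-np.log(1.0 - softmax(logits)[comp_label]))

def sample_complementary(label: int, n_classes: int, rng=None) -> int:
    """Draw a label uniformly from all classes except `label`."""
    rng = np.random.default_rng(rng)
    choices = [c for c in range(n_classes) if c != label]
    return int(rng.choice(choices))
```

With many relation classes, a randomly drawn complementary label is almost always truly wrong, so the NL signal stays reliable even when the distant label is not.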

Harmonic and interharmonic detection method based on Nuttall window compressed sensing

Liu Yuanda, Liang Chengbin, Yang Ming, Wang Deguang, Dong Yu

2025, 48(15):101-109.

Abstract: To mitigate the measurement errors caused by spectral leakage and the picket-fence effect in harmonic and interharmonic detection, and to ease the burden of data transmission and storage, this paper proposes a harmonic and interharmonic detection method based on Nuttall window compressed sensing and interpolation, and builds an experimental platform for harmonic detection. First, the four-term third-order Nuttall window is integrated into the compressed sensing sampling process to perform windowed compressed sampling of the signal. Next, the sparsity adaptive matching pursuit (SAMP) algorithm is used to reconstruct the sparse vector obtained from compressed sampling, and three-spectral-line interpolation is applied to correct the estimates and obtain the signal parameters. Finally, the experimental platform is used to validate the theoretical correctness and practical feasibility of the method. The results show that, at a compression ratio of 50%, the maximum relative errors for frequency, amplitude, and phase are -0.008%, -0.42%, and 1.37%, respectively, enabling accurate measurement of harmonic and interharmonic parameters while significantly reducing the data processing burden. In hardware experiments, the maximum absolute errors for the frequency and amplitude of harmonic signals are 0.0261 Hz and 0.0805 V, respectively, confirming the effectiveness and feasibility of the proposed method.
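
The four-term Nuttall window with a continuous third derivative is usually given by the coefficients a0 = 0.338946, a1 = 0.481973, a2 = 0.161054, a3 = 0.018027 in w(n) = a0 - a1 cos(2πn/N) + a2 cos(4πn/N) - a3 cos(6πn/N); they satisfy a0 - a1 + a2 - a3 = 0, so the window starts at zero. The sketch below builds this window in its periodic form and applies it before a random measurement matrix, y = Φ(w ⊙ x). The Gaussian Φ and the reading of "compression ratio" as M/N are assumptions, and the SAMP reconstruction and three-spectral-line interpolation steps are omitted.

```python
import numpy as np

# Four-term Nuttall window with continuous third derivative
NUTTALL4_3RD = (0.338946, 0.481973, 0.161054, 0.018027)

def nuttall4_3rd(n_samples: int) -> np.ndarray:
    """Periodic form: a0 - a1 cos(x) + a2 cos(2x) - a3 cos(3x), x = 2*pi*n/N."""
    a0, a1, a2, a3 = NUTTALL4_3RD
    x = 2.0 * np.pi * np.arange(n_samples) / n_samples
    return a0 - a1 * np.cos(x) + a2 * np.cos(2 * x) - a3 * np.cos(3 * x)

def windowed_compressed_sample(signal, ratio: float = 0.5, rng=None):
    """Assumed measurement model: y = Phi @ (w * x) with a Gaussian Phi of
    size M x N, where M = round(ratio * N). Returns (y, Phi)."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    m = int(round(ratio * n))
    rng = np.random.default_rng(rng)
    phi = rng.standard_normal((m, n)) / np.sqrt(m)
    return phi @ (nuttall4_3rd(n) * x), phi
```

At a 50% ratio, 128 windowed samples are stored as 64 measurements; recovering the spectrum then relies on a sparse solver such as SAMP.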

Research on autism diagnosis based on MS-SAGCNs

Lin Bowen, Cao Xianqing, Yang Huan, Zhao Feng

2025, 48(15):110-119.

Abstract: To address limitations in autism diagnosis, such as insufficient multi-scale feature extraction and the inaccuracy of functional connectivity estimated with Pearson correlation, this study proposes a diagnostic framework based on a Multi-Scaled Self-Attention Graph Convolution Network (MS-SAGCN). The framework first applies the Morlet wavelet transform and dynamic time warping (DTW) to extract the time-frequency information of blood-oxygen-level-dependent (BOLD) signals and their multi-scale functional connectivity. A pre-trained embedding model then enhances the time-frequency features, which are combined with the functional connectivity to construct multi-scale brain networks. Finally, MS-SAGCN integrates and enhances these networks for automatic autism diagnosis. Experiments on the ABIDE dataset show that MS-SAGCN effectively enhances the multi-scale brain networks. The overall framework achieves an accuracy of 95.1%, a true positive rate of 97.4%, and an F1 score of 94.9% in the classification task, significantly outperforming other diagnostic models and demonstrating promising application prospects.
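
Dynamic time warping, used above to compare BOLD signals, aligns two sequences by finding the monotone warping path with the minimum cumulative cost, so signals that are similar but shifted or stretched in time still come out close. A plain O(nm) dynamic-programming version with absolute-difference cost is sketched below; how the paper turns DTW distances between brain regions into multi-scale functional-connectivity values is not specified in the abstract.

```python
def dtw_distance(a: list[float], b: list[float]) -> float:
    """Classic DTW with |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # repeat b[j-1]
                                 D[i][j - 1],      # repeat a[i-1]
                                 D[i - 1][j - 1])  # advance both
    return D[n][m]
```

For example, [0, 1, 2] and [0, 1, 1, 2] have DTW distance 0 because the repeated 1 is absorbed by the warping, whereas a sample-by-sample comparison could not even be computed for sequences of different lengths.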

Continuous non-invasive blood pressure prediction method based on Conformer-LSTM

Chen Xin, Liu Licheng, Wang Xiaolin

2025, 48(15):120-128.

Abstract: We propose a continuous, non-invasive blood pressure prediction method based on a Conformer-LSTM model that integrates a convolutional branch, a Transformer branch, multi-scale cross-attention modules, adaptive spatial feature fusion, and a two-layer LSTM. The method predicts the arterial blood pressure (ABP) waveform from the photoplethysmography (PPG) signal, and the systolic (SBP) and diastolic (DBP) blood pressures are derived from the predicted waveform. The model achieves low prediction error on a large dataset: experimental results on the MIMIC dataset show a high correlation between the predicted and actual ABP waveforms, with SBP and DBP prediction errors of (3.68±5.60) mmHg and (2.16±3.72) mmHg, respectively. The method meets the standard of the Association for the Advancement of Medical Instrumentation (AAMI) and achieves grade A under the British Hypertension Society (BHS) protocol.
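
The two validation criteria cited above can be checked from a list of prediction errors (predicted minus reference, in mmHg). The AAMI criterion requires a mean error within ±5 mmHg and a standard deviation of at most 8 mmHg. The BHS protocol grades by the cumulative percentage of absolute errors within 5, 10, and 15 mmHg (grade A: at least 60%, 85%, and 95%; B: 50%, 75%, 90%; C: 40%, 65%, 85%). This sketch checks only these statistics and ignores the protocols' requirements on subject numbers and measurement procedure.

```python
import numpy as np

BHS_THRESHOLDS = (5.0, 10.0, 15.0)  # mmHg
BHS_GRADES = (("A", (60, 85, 95)), ("B", (50, 75, 90)), ("C", (40, 65, 85)))

def aami_pass(errors, max_mean: float = 5.0, max_sd: float = 8.0) -> bool:
    """Mean error within +/- max_mean and sample SD at most max_sd."""
    e = np.asarray(errors, dtype=float)
    return bool(abs(e.mean()) <= max_mean and e.std(ddof=1) <= max_sd)

def bhs_grade(errors):
    """Return (grade, cumulative percentages within 5/10/15 mmHg)."""
    e = np.abs(np.asarray(errors, dtype=float))
    pct = [100.0 * float(np.mean(e <= t)) for t in BHS_THRESHOLDS]
    for grade, required in BHS_GRADES:
        if all(p >= r for p, r in zip(pct, required)):
            return grade, pct
    return "D", pct
```

Note that the two criteria are independent: a biased model can satisfy BHS grade A yet fail AAMI's mean-error limit, which is why papers typically report both.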

Lightweight improvement of inland ship detection algorithm

Dong Jian, Zhao Xin, Wu Luyao, Wu Kaili, Si Fuqi

2025, 48(15):129-140.

Abstract: To address the large parameter counts and computational demands of existing ship detection algorithms, as well as the fluctuations in detection results caused by scale and perspective variations, we propose YOLO-LISD, an improved lightweight inland ship detection algorithm based on YOLOv8n. First, an efficient feature-sharing detection head incorporating detail-enhanced convolution replaces the original detection head, improving detection consistency. Second, the slim-neck method is used to optimize the neck network, reducing model size while maintaining detection performance. Third, a global channel-adaptive magnitude-based pruning algorithm is proposed for deep compression, improving detection efficiency. Finally, a feature knowledge distillation approach that leverages spatial and channel correlations is designed to improve the detection accuracy of the pruned model. Experimental results on the SeaShips dataset show that, compared with YOLOv8n, YOLO-LISD reduces the number of parameters and the computational complexity by 68.4% and 56.8%, respectively, while improving detection accuracy and mAP50:95 by 1.1% and 2.1%, respectively. In practical applications, the detection speed on low-computing-power hardware reaches 55 fps, meeting real-time requirements. Compared with other algorithms, the method shows significant advantages, validating its superiority.
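
A global magnitude-based channel pruning step can be sketched as follows: score every output channel by the L1 norm of its weights, choose a single threshold across all layers so that a target fraction of channels falls at or below it, and mask those channels. This is plain global L1 pruning, not the paper's channel-adaptive variant, whose scoring rule is not given in the abstract. Practical implementations also keep a minimum number of channels per layer, which this sketch does not.

```python
import numpy as np

def global_channel_masks(layer_weights: dict, sparsity: float) -> dict:
    """layer_weights maps layer name -> conv weight (C_out, C_in, k, k) or any
    array whose first axis is the output channel. Returns boolean keep-masks."""
    # L1 norm of each output channel's weights
    scores = {name: np.abs(w).reshape(w.shape[0], -1).sum(axis=1)
              for name, w in layer_weights.items()}
    all_scores = np.sort(np.concatenate(list(scores.values())))
    n_prune = int(sparsity * all_scores.size)
    if n_prune == 0:
        return {name: np.ones(s.shape, dtype=bool) for name, s in scores.items()}
    # One threshold shared by every layer; ties at the threshold are pruned too
    threshold = all_scores[n_prune - 1]
    return {name: s > threshold for name, s in scores.items()}
```

Because the threshold is global, layers with many low-magnitude channels lose more of them, which is how such methods adapt the per-layer width without hand-tuned ratios.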

Stereo matching network based on fusing contextual information selectively

Ning Anqi, Yu Yuecheng, Yang Fan, Li Xiang

2025, 48(15):141-149.

Abstract: Although deep learning-based stereo matching networks achieve high accuracy, their complex model structures sharply increase computing time. To balance matching speed and accuracy, this paper proposes a stereo matching network based on selectively fusing contextual information. First, the cost volume is constructed with a correlation layer, and a single encoder-decoder structure is used in the aggregation module to reduce model complexity. Second, multi-scale cost volumes are fused in the encoder to capture disparity information at different levels, and a selective contextual information fusion module is designed in the decoder, which uses the context features of the reference image to guide the high-quality decoding of geometric information. Finally, a multi-branch aggregation pyramid pooling module is designed to enhance the encoder-decoder's understanding of the global context. Experimental results show that the mismatch rate over all regions on the KITTI 2015 dataset is 1.97% and the three-pixel error on the KITTI 2012 dataset is 1.50%. Compared with other algorithms, the proposed algorithm achieves more accurate stereo matching while meeting real-time requirements.
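
A correlation-layer cost volume of the kind mentioned above stores, for each candidate disparity d, the channel-averaged product of the left features at column x and the right features at column x - d. The NumPy sketch below builds it for feature maps in (C, H, W) layout, leaving zeros where x - d falls outside the image, and adds the plain three-pixel error rate. The official KITTI 2015 outlier metric additionally requires the error to exceed 5% of the true disparity; that condition is omitted here.

```python
import numpy as np

def correlation_cost_volume(feat_l: np.ndarray, feat_r: np.ndarray,
                            max_disp: int) -> np.ndarray:
    """feat_l, feat_r: (C, H, W). Returns cost of shape (max_disp, H, W) with
    cost[d, y, x] = mean_c feat_l[c, y, x] * feat_r[c, y, x - d]."""
    _, h, w = feat_l.shape
    cost = np.zeros((max_disp, h, w))
    for d in range(max_disp):
        if d == 0:
            cost[0] = (feat_l * feat_r).mean(axis=0)
        else:
            # Columns x < d have no valid match in the right image; left as 0.
            cost[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, :-d]).mean(axis=0)
    return cost

def three_pixel_error(pred, gt, valid=None) -> float:
    """Fraction of valid pixels whose disparity error exceeds 3 px."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if valid is None:
        valid = np.ones(gt.shape, dtype=bool)
    return float(np.mean(np.abs(pred - gt)[valid] > 3.0))
```

A correlation volume keeps one value per disparity instead of concatenating full feature vectors, which is why it suits the lightweight, real-time design described in the abstract.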

Dual encoder-based nasal septum medical image segmentation model

Zhou Baokang, Cao Shuang, Gao Hongyong, Song Weibo, Cui Shulin

2025, 48(15):150-158.

Abstract: Accurate segmentation of the nasal septum anatomy has significant clinical value for disease assessment and surgical planning. However, existing methods based on convolutional neural networks (CNNs) are limited in global feature representation. To address this, this study constructs the CTA-Net model, which achieves collaborative learning of local and global features through a dual-branch encoder: the CNN branch captures fine anatomical details, the Transformer branch models long-range spatial dependencies, and a feature fusion module enables effective information exchange between them. In addition, a multi-scale feature attention mechanism is introduced in the bottleneck layer to enhance the model's ability to represent complex anatomical structures through different receptive fields. Experiments were conducted on three medical datasets: a self-annotated clinical nasal septum dataset, ISIC 2018, and Kvasir. In the nasal septum segmentation task, the model achieves an IoU of 90.38% and a Dice coefficient of 94.94%. In cross-dataset experiments, the IoU for gastrointestinal endoscopic image segmentation reaches 76.17%, significantly outperforming other existing models and confirming the model's advantages in feature learning and generalization. By integrating local perception with global modeling, this study provides a new solution for medical image analysis that holds significant promise for intelligent diagnosis and treatment in otolaryngology.
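
The two overlap metrics reported above can be computed from binary masks as sketched below: Dice = 2|P∩T|/(|P|+|T|) and IoU = |P∩T|/|P∪T|. For a single mask pair they are linked by Dice = 2·IoU/(1 + IoU), so an IoU of 90.38% corresponds to a Dice of about 94.95%, consistent with the reported 94.94% (averages over many images need not satisfy the identity exactly).

```python
import numpy as np

def dice_iou(pred, target, eps: float = 1e-8) -> tuple[float, float]:
    """Dice coefficient and IoU of two binary masks of equal shape."""
    p = np.asarray(pred, dtype=bool)
    t = np.asarray(target, dtype=bool)
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    dice = 2.0 * inter / (p.sum() + t.sum() + eps)
    iou = inter / (union + eps)
    return float(dice), float(iou)

def dice_from_iou(iou: float) -> float:
    """Exact conversion for a single mask pair."""
    return 2.0 * iou / (1.0 + iou)
```

Since Dice is always at least as large as IoU for the same prediction, comparisons across papers should use the same metric.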

Underwater object detection model based on improved YOLOv11

Fang Zhenbo, Gao Xiangyang, Zhang Qieshi, Cheng Jun, Yang Mengjie

2025, 48(15):159-167.

Abstract: To address the poor performance of traditional YOLO object detection methods in complex underwater environments, an underwater object detection model based on an improved YOLO11 is proposed. First, the context guidance module CGBD is introduced, using a multi-scale feature extractor to enhance the network's feature capture capability. Second, to reduce the excessive parameter count caused by feature redundancy in the network, the lightweight and efficient aggregation module RGCSPELAN is designed to lighten the model. Third, because the original detection head has insufficient localization and recognition ability and a high computational cost, a lightweight and efficient DEC-Head detection head is constructed by combining a re-parameterization strategy with detail-enhanced convolution. In addition, the Wise-Inner-MPD loss function is used to improve generalization and accelerate convergence. On the URPC dataset, compared with the baseline YOLO11, the proposed method improves mAP50 and mAP50-90 by 2.4 and 2.1 percentage points, respectively. On the RUOD dataset, compared with YOLO11, the improved model increases mAP50 by 1.3% and recall R by 1.5%, showing better underwater object detection performance than other mainstream detection methods.

Surface defect detection method for wind turbine based on SAFPN-YOLO

Jin Xin, Jing Rui, Jiang Yichen

2025, 48(15):168-176.

Abstract: To address the insufficient ability of traditional methods to detect surface defects on wind turbines, this paper proposes a wind turbine surface defect detection algorithm based on SAFPN-YOLO. First, to ease multi-scale object detection, the SAFPN network, based on the idea of asymptotic fusion, replaces the classical feature pyramid fusion network to reduce the semantic gap during feature fusion. Second, to reduce redundant background information, the original SCDown module is replaced by the NAMBlock module embedded deep in the backbone network, so that the model can extract features with a broader field of view while retaining key local feature information. Finally, to address the difficulty of detecting and locating textured defects, an attention mechanism is proposed to strengthen spatial feature interaction and feature expression, further improving detection performance. Experimental results show that the mAP50 of the SAFPN-YOLO wind turbine surface defect detection algorithm reaches 82.4%, 3.3% higher than that of the baseline model, enabling more accurate detection of wind turbine surface defects.

Adaptive underwater image enhancement model based on multiple input fusion

Li Kexun, Gao Zhijun, Liu Jianyong, Zhang Meng, Zhang Wenzhou

2025, 48(15):177-184.

Abstract: To address common image degradation in complex underwater environments, such as low light, color distortion, and blurring, this paper proposes an image enhancement model based on multi-input fusion. First, two inputs are derived from the original underwater image: a white-balanced version and a denoised, contrast-enhanced version. Relying solely on the original image, the model uses image degradation information to generate the corresponding weights, effectively counteracting the limiting effects of the underwater medium. Four types of weight maps are then designed to improve the visibility of distant objects, which is reduced by light scattering and absorption, thus improving the overall visual quality and detail of the image. Finally, a multi-scale fusion process progressively merges features at different scales, reducing artifacts and enhancing image details. Experimental results on the UIEB, EUVP, and RUIE datasets show that the proposed model achieves average values of 0.6603 for UCIQE, 4.5569 for UIQM, and 7.4341 for information entropy. Compared with other typical and recent algorithms, the proposed model performs better in color distortion correction, detail enrichment, contrast enhancement, and subjective visual quality, validating its superiority and robustness in underwater image enhancement.
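
Two building blocks referenced above can be sketched briefly: a white-balance step and the information-entropy metric. For white balance, the sketch uses the gray-world assumption (scale each channel so that its mean equals the overall mean), a common choice but not necessarily the one used in the paper. Entropy is computed in bits from the 256-bin histogram of an 8-bit grayscale image, H = -Σ p_i log2 p_i, so its maximum is 8; the reported 7.4341 is close to that ceiling.

```python
import numpy as np

def gray_world(img: np.ndarray) -> np.ndarray:
    """Gray-world white balance for a float RGB array (H, W, 3) in [0, 1]."""
    img = np.asarray(img, dtype=float)
    channel_means = img.reshape(-1, 3).mean(axis=0)
    gains = channel_means.mean() / np.maximum(channel_means, 1e-8)
    return np.clip(img * gains, 0.0, 1.0)

def entropy_bits(gray_u8) -> float:
    """Shannon entropy (bits) of an 8-bit grayscale image histogram."""
    hist = np.bincount(np.asarray(gray_u8, dtype=np.uint8).ravel(), minlength=256)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())
```

Underwater images are typically dominated by green or blue, so gray-world boosts the attenuated red channel; the clipping step keeps the result in the valid range when the gain is large.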

Improved YOLO11 algorithm for student classroom behavior detection

Cao Qian, Cao Yi, Qian Chengshan

2025, 48(15):185-198.

Abstract: To address the loss of complex details, insufficient multi-scale perception, low computational efficiency, and low detection accuracy of YOLO11 in classroom behavior detection, an improved ATDW-YOLO algorithm is proposed. First, an Adaptive Polarized Feature Fusion module is constructed in the neck network to improve semantic feature fusion and better capture complex details. Second, a task dynamic align detection head is designed to enhance the model's recognition of multi-scale targets. Third, a dynamic group convolution shuffle transformer module is introduced into the backbone network to improve feature representation and make the network lightweight. Finally, the Wise-IoU function replaces the CIoU loss function to improve bounding box fitting and detection accuracy. Experimental results demonstrate that, compared with YOLO11n, ATDW-YOLO improves mAP@0.5 and mAP@0.5:0.95 by 3.1% and 4.0%, respectively, while reducing model parameters, computational complexity, and model size by 21.6%, 7.4%, and 20.6%, respectively, significantly improving detection accuracy and achieving a lightweight model.
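
Since Wise-IoU is introduced as a replacement for CIoU, a sketch of the CIoU loss being replaced may help: L = 1 - IoU + ρ²/c² + αv, where ρ is the distance between the box centres, c is the diagonal of the smallest enclosing box, v = (4/π²)(arctan(w_gt/h_gt) - arctan(w/h))² measures the aspect-ratio mismatch, and α = v/((1 - IoU) + v). Boxes are assumed to be (x1, y1, x2, y2) with positive width and height; Wise-IoU's dynamic focusing mechanism is not shown.

```python
import math

def ciou_loss(pred, gt, eps: float = 1e-9) -> float:
    """CIoU loss between two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    # Intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)
    # Squared centre distance over squared diagonal of the enclosing box
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```

Unlike plain 1 - IoU, this loss still provides a gradient when boxes do not overlap: two disjoint unit boxes two units apart give 1 + 4/10 = 1.4 rather than a flat 1.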
