Abstract: To overcome the limitations of existing Transformer-based super-resolution models, which rely on self-attention mechanisms and face challenges in computational complexity and local detail capture, an optimized lightweight super-resolution network is proposed. The network aims to efficiently exploit global, non-local, and local features for enhanced reconstruction. First, a spatial-frequency feature aggregation layer, incorporating dynamic strip attention and unbiased dynamic frequency awareness, captures global and non-local features, ensuring that the network can effectively recover image features. Then, to restore image details, a local detail enhancement layer is constructed to encode local context and perform channel mixing. Finally, multiple spatial-frequency feature modulation blocks progressively extract features and perform up-sampling reconstruction to produce the final super-resolution image. The proposed algorithm was benchmarked on five public super-resolution datasets, including Set14, BSD100, and Urban100. For ×2 reconstruction, it reduces FLOPs by 24.2% and requires a smaller training dataset than ShuffleMixer, another lightweight super-resolution network, while attaining gains of 0.54 dB in PSNR and 0.0055 in SSIM on Urban100. Experiments show that the proposed network excels in lightweight super-resolution tasks, achieving a good balance between performance and complexity.
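For concreteness, the sketch below illustrates, in PyTorch, the overall pipeline the abstract outlines: stacked spatial-frequency feature modulation blocks, each combining a spatial-frequency feature aggregation layer with a local detail enhancement layer, followed by pixel-shuffle up-sampling. The specific module designs (strip pooling as a stand-in for dynamic strip attention, an FFT-amplitude gate for the frequency-aware branch, depth-wise plus point-wise convolutions for local context encoding and channel mixing), as well as the channel width and block count, are illustrative assumptions rather than the paper's actual implementation.

```python
# Minimal sketch of the pipeline described in the abstract.
# All module designs (strip pooling, FFT-amplitude gating, depth-wise local
# mixing, channel counts, block depth) are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialFrequencyAggregation(nn.Module):
    """Hypothetical spatial-frequency feature aggregation layer: a strip-pooling
    branch for non-local spatial context plus a frequency branch that gates
    features by their FFT amplitude."""
    def __init__(self, dim):
        super().__init__()
        self.h_strip = nn.AdaptiveAvgPool2d((1, None))   # horizontal strips
        self.v_strip = nn.AdaptiveAvgPool2d((None, 1))   # vertical strips
        self.strip_proj = nn.Conv2d(dim, dim, 1)
        self.freq_proj = nn.Conv2d(dim, dim, 1)          # applied to FFT amplitude
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, x):
        # Strip attention: pooled strips broadcast back over the full feature map.
        strips = self.h_strip(x) + self.v_strip(x)
        spatial = x * torch.sigmoid(self.strip_proj(strips))
        # Frequency awareness: modulate features by their spectral amplitude.
        amp = torch.abs(torch.fft.rfft2(x, norm="ortho"))
        amp = F.interpolate(self.freq_proj(amp), size=x.shape[-2:], mode="nearest")
        freq = x * torch.sigmoid(amp)
        return self.fuse(torch.cat([spatial, freq], dim=1))


class LocalDetailEnhancement(nn.Module):
    """Hypothetical local detail enhancement layer: depth-wise convolution for
    local context encoding followed by point-wise channel mixing."""
    def __init__(self, dim):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.mix = nn.Sequential(nn.Conv2d(dim, dim, 1), nn.GELU(),
                                 nn.Conv2d(dim, dim, 1))

    def forward(self, x):
        return x + self.mix(self.local(x))


class SFModulationBlock(nn.Module):
    """One spatial-frequency feature modulation block."""
    def __init__(self, dim):
        super().__init__()
        self.agg = SpatialFrequencyAggregation(dim)
        self.local = LocalDetailEnhancement(dim)

    def forward(self, x):
        return self.local(x + self.agg(x))


class LightweightSRNet(nn.Module):
    """Stacked modulation blocks followed by pixel-shuffle up-sampling."""
    def __init__(self, dim=48, n_blocks=6, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, dim, 3, padding=1)
        self.body = nn.Sequential(*[SFModulationBlock(dim) for _ in range(n_blocks)])
        self.tail = nn.Sequential(nn.Conv2d(dim, 3 * scale ** 2, 3, padding=1),
                                  nn.PixelShuffle(scale))

    def forward(self, lr):
        feat = self.head(lr)
        return self.tail(self.body(feat) + feat)


if __name__ == "__main__":
    sr = LightweightSRNet()(torch.randn(1, 3, 64, 64))
    print(sr.shape)  # torch.Size([1, 3, 128, 128]) for the x2 setting
```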