Abstract: The traditional binocular Semi-Global Matching (SGM) algorithm is computationally complex and resource-intensive, which makes it difficult to meet the real-time and low-power requirements of small-scale embedded systems. To address this issue, this paper proposes an improved SGM algorithm based on an FPGA architecture. Its goals are to improve real-time performance and resource utilization and to reduce resource overhead. The improved algorithm aligns the cost aggregation directions with the FPGA data flow, which allows four aggregation paths to be computed in parallel. In the disparity calculation stage, a binomial-based subpixel interpolation technique is introduced so that disparity computation and optimization proceed simultaneously. This reduces computation latency and further lowers resource consumption and system power. Experimental results show that, compared with the traditional SGM algorithm, the proposed method reduces the average disparity error by 32.4%, improves LUT resource utilization by 45%, and reduces resource consumption by 25%. It achieves a matching rate of 65.3 fps at a system power consumption of only 2.85 W, which meets the requirements of small-scale real-time embedded systems.
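The abstract does not specify which four aggregation directions are used or the exact interpolation formula. The following is therefore only a minimal sketch under stated assumptions. It uses the standard SGM path recurrence L_r(p, d) = C(p, d) + min(L_r(p−r, d), L_r(p−r, d±1) + P1, min_k L_r(p−r, k) + P2) − min_k L_r(p−r, k). It picks the four directions whose predecessor pixels are already available in a single raster scan (left, upper-left, up, upper-right), which is one common way to match a streaming FPGA pipeline. It then refines the winner-takes-all disparity with a three-point parabola fit. The penalty values `P1`/`P2`, the function names, and the parabola fit are illustrative assumptions. The parabola fit stands in for, and is not, the paper's binomial-based scheme.

```python
# Hypothetical sketch: single-pass four-path SGM aggregation plus
# parabola-fit subpixel refinement (not the paper's exact design).
from typing import List

Cost = List[List[List[int]]]  # cost[y][x][d], matching cost per disparity

P1, P2 = 10, 120  # assumed smoothness penalties (illustrative values)


def aggregate_four_paths(cost: Cost, p1: int = P1, p2: int = P2) -> Cost:
    """Sum path costs along left, upper-left, up and upper-right paths.

    Each direction's predecessor pixel has already been visited in raster
    order, so a single top-to-bottom, left-to-right pass is sufficient.
    """
    h, w, nd = len(cost), len(cost[0]), len(cost[0][0])
    # (dy, dx) offset from the current pixel to its predecessor on each path.
    dirs = [(0, -1), (-1, -1), (-1, 0), (-1, 1)]
    path = [[[[0] * nd for _ in range(w)] for _ in range(h)] for _ in dirs]
    total = [[[0] * nd for _ in range(w)] for _ in range(h)]
    inf = float("inf")
    for y in range(h):
        for x in range(w):
            c = cost[y][x]
            for k, (dy, dx) in enumerate(dirs):
                py, px = y + dy, x + dx
                if 0 <= py < h and 0 <= px < w:
                    prev = path[k][py][px]
                    prev_min = min(prev)
                    for d in range(nd):
                        best = min(
                            prev[d],
                            prev[d - 1] + p1 if d > 0 else inf,
                            prev[d + 1] + p1 if d < nd - 1 else inf,
                            prev_min + p2,
                        )
                        # Subtracting prev_min keeps path costs bounded.
                        path[k][y][x][d] = c[d] + best - prev_min
                else:
                    path[k][y][x] = list(c)  # path starts at the image border
                for d in range(nd):
                    total[y][x][d] += path[k][y][x][d]
    return total


def subpixel_disparity(costs: List[int]) -> float:
    """Winner-takes-all disparity refined by a parabola through the
    minimum cost and its two neighbours."""
    d = min(range(len(costs)), key=costs.__getitem__)
    if d == 0 or d == len(costs) - 1:
        return float(d)  # a neighbour is missing: keep the integer result
    c_l, c_0, c_r = costs[d - 1], costs[d], costs[d + 1]
    denom = c_l - 2 * c_0 + c_r
    if denom == 0:
        return float(d)  # flat cost curve: no unique vertex
    return d + (c_l - c_r) / (2 * denom)
```

For example, `subpixel_disparity([4, 1, 2])` returns `1.25`, shifting the estimate toward the lower-cost right neighbour. On an FPGA, the full `path` arrays would typically be replaced by a line buffer holding the previous row's path costs for each direction. The Python version keeps whole arrays only for readability.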