Abstract: An information-rich and computationally efficient cost volume is crucial for accurate and fast stereo matching. To construct such a cost volume, a lightweight network, Efficient-ACVNet, was proposed based on Fast-ACVNet. First, a computationally cheaper 3D cost volume was used as the source of cost volume attention. Inverted bottleneck residual blocks were stacked into a symmetric hourglass structure for cost aggregation of the 3D cost volume, and multi-scale disparity channel attention modules were introduced to further enhance the aggregation. The aggregated 3D cost volume then served as cost volume attention to construct and filter the information-redundant 4D cost volume, improving its information content and computational efficiency. Finally, pseudo-3D residual blocks were introduced and pseudo-3D downsampling modules were designed for cost aggregation of the 4D cost volume, further reducing network complexity. Experimental results showed that, compared to the baseline method, the proposed algorithm reduced the endpoint error (EPE) by 9.375% on the SceneFlow dataset, decreased the outlier percentage in the foreground region (D1-fg) by 19% on the KITTI15 dataset, and reduced network runtime from 39 ms to 25 ms.
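
As a rough illustration of the attention-filtering idea summarized above, the following NumPy sketch builds a 3D correlation cost volume, turns it into per-pixel weights over disparity, and uses those weights to filter a 4D cost volume. It is a minimal sketch under stated assumptions, not the Efficient-ACVNet implementation: the function names, the group-wise correlation used for the 4D volume, and the softmax normalization are illustrative choices, and the hourglass aggregation, disparity channel attention, and pseudo-3D blocks are omitted, so the raw 3D volume stands in for the aggregated one.

```python
import numpy as np


def correlation_volume(left, right, max_disp):
    """3D correlation cost volume of shape (D, H, W) from (C, H, W) features."""
    _, h, w = left.shape
    volume = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        # Left pixel x is compared with right pixel x - d; columns x < d stay 0.
        volume[d, :, d:] = (left[:, :, d:] * right[:, :, : w - d]).mean(axis=0)
    return volume


def groupwise_volume(left, right, max_disp, groups):
    """4D cost volume of shape (G, D, H, W); group-wise correlation is assumed."""
    c, h, w = left.shape
    assert c % groups == 0, "channels must divide evenly into groups"
    lg = left.reshape(groups, c // groups, h, w)
    rg = right.reshape(groups, c // groups, h, w)
    volume = np.zeros((groups, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        volume[:, d, :, d:] = (lg[..., d:] * rg[..., : w - d]).mean(axis=1)
    return volume


def attention_filter(volume_4d, attention_3d):
    """Weight every channel of the 4D volume by a softmax over disparity.

    The softmax is an assumption of this sketch; attention_3d would be the
    aggregated 3D cost volume in a full network.
    """
    logits = attention_3d - attention_3d.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    # Broadcast (1, D, H, W) weights over the G channels of the 4D volume.
    return weights[None] * volume_4d, weights


# Hypothetical feature maps: 8 channels on a 6 x 8 grid, 4 disparity levels.
rng = np.random.default_rng(0)
left_feat = rng.standard_normal((8, 6, 8))
right_feat = np.roll(left_feat, -2, axis=2)  # synthetic shift of 2 pixels

attention = correlation_volume(left_feat, right_feat, max_disp=4)
volume = groupwise_volume(left_feat, right_feat, max_disp=4, groups=4)
filtered, weights = attention_filter(volume, attention)
print(filtered.shape)  # (4, 4, 6, 8)
```

Because the attention weights come from a single-channel 3D volume, computing them costs far less than aggregating the multi-channel 4D volume directly, which is the efficiency argument the abstract makes.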