Abstract: Multi-sensor simultaneous localization and mapping (SLAM) mitigates single-sensor limitations, yet current methods still face challenges such as monocular scale ambiguity, inaccurate inertial measurement unit (IMU) initialization, and limited local mapping precision. This paper proposes a tightly coupled, factor-graph-based SLAM approach that fuses data from three heterogeneous sensors: a 3D light detection and ranging (LiDAR) sensor, an IMU, and a camera. During initialization, LiDAR data provides depth for visual features, and outliers are removed through neighborhood selection and statistical optimization to improve accuracy. Visual, LiDAR, and IMU data are then fused to jointly estimate IMU biases and the gravity direction, reducing vertical map drift. For local optimization, factor graphs dynamically maintain keyframes and local maps within sliding windows; visual constraints are refined through co-visibility projection matching, which efficiently removes redundant map points while improving accuracy and robustness. Global optimization incorporates loop-closure factors detected via specialized algorithms and applies incremental optimization to the factor graph, suppressing cumulative error without compromising real-time performance. The proposed method is evaluated on the KITTI dataset, the M2UD extreme-weather dataset, and a real-campus dataset. Compared with LIO-SAM, it reduces the absolute trajectory error by 53.1% on KITTI, 66% in M2UD rain/snow scenarios, and 20.3% in campus environments. The resulting maps exhibit higher structural consistency and geometric accuracy in both overhead and side views.