Lip-reading model based on multi-feature fusion
Affiliation:

School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China

CLC Number:

TP391.4; TN911.73

Abstract:

Mainstream word-level lipreading models, based on three-dimensional convolutional neural networks and residual networks, struggle to capture the geometric dynamics of lip movements, and their reliance on pixel-level texture details makes them highly sensitive to noise and facial variations. To address these limitations, this paper proposes an end-to-end word-level lipreading model that integrates pixel-level texture features, geometry-level contour shape features, and word boundary features, achieving multi-feature fusion across the temporal, spatial, pixel, and geometric dimensions. The model incorporates the spatial and channel squeeze-and-excitation mechanism into the 3D CNN and ResNet-18 front-end to enhance texture feature extraction, while an improved spatial-temporal graph convolutional network with an embedded global context network strengthens global geometric relationships. Word boundary features further guide the model to focus on relevant temporal frames, reducing its sensitivity to noise. The fused features are processed by a back-end temporal module to complete recognition. Experiments show that with grayscale video as input, the model reaches 89.3% accuracy on the publicly available large-scale word-level lipreading dataset LRW, an improvement of 1.3%~3.9% over single-feature or partial-feature models under the same conditions and higher than most existing models, which verifies the effectiveness of the proposed model. When color video is used as input, accuracy further improves to 89.7%, confirming the contribution of color information to lipreading.
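
As a rough illustration of the fusion pipeline described in the abstract, the following PyTorch sketch (not the authors' code) concatenates per-frame texture features, landmark-based geometry features, and a word-boundary indicator before a temporal back-end and classifier. The layer sizes, the simplified texture and geometry branches, the BiGRU back-end, the 20-landmark lip representation, the 29-frame LRW-style clip length, and the 500-class output are all assumptions made for clarity.

```python
# Hedged sketch of concatenation-based multi-feature fusion for word-level lipreading.
# Not the paper's implementation; branch designs and dimensions are illustrative only.
import torch
import torch.nn as nn


class TextureBranch(nn.Module):
    """Pixel-level texture features from the mouth-region video via a 3D conv front-end.
    (A stand-in for the paper's 3D CNN + ResNet-18 with spatial/channel squeeze-and-excitation.)"""
    def __init__(self, out_dim=256):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64), nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
        )
        self.proj = nn.Linear(64, out_dim)

    def forward(self, video):                 # video: (B, 1, T, H, W) grayscale frames
        x = self.conv3d(video)                # (B, 64, T, H', W')
        x = x.mean(dim=(-2, -1))              # global spatial pooling -> (B, 64, T)
        return self.proj(x.transpose(1, 2))   # per-frame texture features (B, T, out_dim)


class GeometryBranch(nn.Module):
    """Geometry-level contour features from lip landmarks. A per-frame MLP stands in
    for the paper's improved spatial-temporal graph convolutional network."""
    def __init__(self, num_points=20, out_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_points * 2, 128), nn.ReLU(inplace=True),
            nn.Linear(128, out_dim),
        )

    def forward(self, landmarks):             # landmarks: (B, T, num_points, 2)
        B, T, P, C = landmarks.shape
        return self.mlp(landmarks.view(B, T, P * C))   # (B, T, out_dim)


class FusionLipReader(nn.Module):
    """Concatenate texture, geometry, and word-boundary features per frame,
    then classify with a bidirectional GRU temporal back-end."""
    def __init__(self, num_classes=500):
        super().__init__()
        self.texture = TextureBranch(out_dim=256)
        self.geometry = GeometryBranch(out_dim=128)
        self.backend = nn.GRU(256 + 128 + 1, 256, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 256, num_classes)

    def forward(self, video, landmarks, boundary):
        # boundary: (B, T, 1) indicator marking frames that lie inside the target word
        feats = torch.cat([self.texture(video), self.geometry(landmarks), boundary], dim=-1)
        out, _ = self.backend(feats)              # (B, T, 512)
        return self.classifier(out.mean(dim=1))   # temporal average pooling -> class logits


if __name__ == "__main__":
    model = FusionLipReader()
    video = torch.randn(2, 1, 29, 88, 88)           # 29-frame grayscale mouth crops
    landmarks = torch.randn(2, 29, 20, 2)           # 20 lip landmarks per frame
    boundary = torch.ones(2, 29, 1)                 # word-boundary indicator
    print(model(video, landmarks, boundary).shape)  # torch.Size([2, 500])
```

In the paper itself, the texture branch is an scSE-augmented 3D CNN + ResNet-18 and the geometry branch is a spatial-temporal graph convolutional network with a global context network; the stubs above only stand in for those components to show how the three feature streams are fused before the temporal back-end.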

History
  • Online: July 28, 2025