Research on autism screening based on cross-modal feature fusion
Affiliation:

School of Information and Communications Engineering, North University of China, Taiyuan 030051, China

CLC Number: TN911.7

    Abstract:

    Because children with autism exhibit abnormalities in visual attention from an early age, visual attention provides an important criterion for distinguishing them and enabling early intervention. To address the insufficient attention paid in autism research to semantic alignment and dynamic interaction between modalities, this study proposes a multimodal model that fuses saliency-map features with eye-movement trajectory features, providing an objective method for autism diagnosis. The method builds a dual-stream network architecture: a U-Net feature extractor processes the saliency map, and a temporal convolutional network performs temporal modeling of the eye-movement trajectory. A cross-modal attention mechanism is introduced to achieve dynamically weighted fusion of the two modalities. During temporal modeling, the eye-movement trajectory is also predicted, and the prediction error is fed into the classifier as an additional discriminative feature to improve classification performance. Comparative experiments verify that the proposed model achieves an accuracy of 98.89% on the early autism screening task.
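    The dynamically weighted fusion described above can be sketched as scaled dot-product cross-attention, in which features from one stream (here, trajectory features from the temporal convolutional network) act as queries over features from the other stream (saliency-map features from the U-Net). This is a minimal NumPy sketch under assumed feature shapes; the actual dimensions, projections, and fusion head of the published model are not specified in the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, keys, values):
    """One modality's features attend over the other's.

    queries: (n_q, d) features of the querying modality
    keys, values: (n_k, d) features of the attended modality
    Returns the fused (n_q, d) features and the (n_q, n_k) weights.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (n_q, n_k) similarities
    weights = softmax(scores, axis=-1)       # dynamic per-query weighting
    return weights @ values, weights

rng = np.random.default_rng(0)
d = 16                                        # hypothetical feature width
traj_feats = rng.standard_normal((32, d))     # e.g. TCN output over 32 time steps
sal_feats = rng.standard_normal((49, d))      # e.g. U-Net output, 7x7 grid flattened

fused, w = cross_modal_attention(traj_feats, sal_feats, sal_feats)
print(fused.shape)  # (32, 16): one saliency-weighted vector per time step
```

Each attention row sums to one, so every trajectory time step receives a convex combination of saliency features — the "dynamic weighting" between modalities that the abstract refers to.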

History
  • Online: February 26, 2026