Research on multimodal sentiment analysis technology based on conversations
Affiliation:

School of Computer Science and Technology, North University of China, Taiyuan 030051, China

CLC Number:

TP391;TN912.34

    Abstract:

    To address two weaknesses of multimodal emotion recognition in conversation (MERC), namely the difficulty of capturing cross-modal semantic associations across conversation turns and the limited ability to discriminate minority classes and semantically confusable emotion classes, a new multimodal sentiment analysis model, FuseNet, is proposed. The model uses a bidirectional attention dialogue encoder (BiDRN) to capture the contextual dependencies of the dialogue, integrates audio and visual cues from different speakers, and performs dynamic multimodal fusion through the Hi-gated fusion module, which is based on a hierarchical gating mechanism. In addition, a class-aware multimodal contrastive (CAMC) loss is introduced to enhance inter-class discriminability and to improve the recognition of minority classes and semantically similar sentiment categories. Experimental results on two benchmark ERC datasets, IEMOCAP and MELD, show that, compared with the advanced model CORECT, the proposed framework improves the F1 score by 2.91% and 2.00%, respectively. It also outperforms the existing baseline models on most emotion categories, especially minority classes and semantically similar emotions.
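
The abstract does not give the equations of the Hi-gated fusion module, so the following is only a minimal sketch of what a two-level gated fusion could look like, not the authors' implementation. It assumes one scalar sigmoid gate per level, with audio and visual features fused first and the result then fused with text. The function names, the `params` keys, and the fusion order are all hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fuse(a, b, w_a, w_b, bias=0.0):
    """Blend two equal-length feature vectors with one scalar gate.

    The gate g = sigmoid(w_a . a + w_b . b + bias) lies in (0, 1);
    the output is g * a + (1 - g) * b, so g decides how much of `a` is kept.
    """
    score = bias
    score += sum(w * x for w, x in zip(w_a, a))
    score += sum(w * x for w, x in zip(w_b, b))
    g = sigmoid(score)
    return [g * x + (1.0 - g) * y for x, y in zip(a, b)]


def hierarchical_fuse(text, audio, visual, params):
    """Two-level fusion (assumed order): audio+visual first, then text."""
    # Level 1: fuse the two non-verbal modalities.
    av = gated_fuse(audio, visual, params["w_audio"], params["w_visual"])
    # Level 2: fuse the text features with the audio-visual summary.
    return gated_fuse(text, av, params["w_text"], params["w_av"])


# With all-zero gate weights each gate equals 0.5, so every level
# reduces to a plain average of its two inputs.
zero = [0.0, 0.0]
params = {"w_audio": zero, "w_visual": zero, "w_text": zero, "w_av": zero}
fused = hierarchical_fuse([1.0, 0.0], [0.0, 2.0], [2.0, 0.0], params)
```

In a trained model the gate weights would be learned, so the gates could shift toward whichever modality is more informative for a given utterance. The zero-weight case above is only a sanity check.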

History
  • Online: May 13, 2026