Dual branch weakly supervised semantic segmentation based on activation modulation
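Authors: Wang Jiali (王家莉), Tan Mian (谭棉), Feng Fujian (冯夫健)
Affiliations:

1. College of Data Science and Information Engineering, Guizhou Minzu University, Guiyang 550025, China; 2. Key Laboratory of Pattern Recognition and Intelligent Systems of Guizhou Province, Guiyang 550025, China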

CLC number:

TP391.4; TN791

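Funding:

Supported by the Guizhou Provincial Science and Technology Program (QKHJCZK2022YB195, QKHJCZK2023YB143, QKHPTRCZCKJ2021007, QKHJCZK2022YB197), the Natural Science Research Project of the Guizhou Provincial Department of Education (QJJ2022015, QJJ2023061, QJJ2023012, QJJ2022047, QJJ2024063), and the Doctoral Research Start-up Project of Guizhou Minzu University (GZMUZK[2024]QD04).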


    Abstract:

    Weakly supervised semantic segmentation with image-level annotations has been widely studied because such annotations are easy to obtain and the resulting performance is satisfactory. To address the sparse activation regions of class activation maps (CAMs) and the semantic ambiguity between foreground and background, a dual-branch weakly supervised semantic segmentation network based on activation modulation is proposed. The network uses ResNet50 and Vision Transformer as its two feature-extraction branches. An activation modulation module embedded in the convolutional branch forces the model to activate pixels with intermediate scores, producing compact CAMs and thereby alleviating the sparse-activation problem. In addition, a dynamic threshold adjustment strategy based on cosine annealing decay adaptively determines the upper background threshold during training, so that more low-confidence foreground pixels take part in segmentation training and the resulting segmentation maps are more complete and accurate. The network is validated on the PASCAL VOC 2012 and MS COCO 2014 datasets: it reaches mIoU values of 74.2% and 74.0% on the PASCAL VOC 2012 validation and test sets, respectively, and 45.9% on the MS COCO 2014 validation set. The experimental results show that the network can resolve mis-segmentation in scenes where the foreground and background have similar colours and achieves excellent segmentation performance.
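The abstract does not give the formula for the cosine-annealing threshold schedule, so the following is only a minimal sketch of one plausible form: a background threshold that decays from a starting value to a final value along a half cosine over training. The function names, the `start`/`end` values (0.45 and 0.25), and the simple background/foreground split are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine_annealed_threshold(step, total_steps, start=0.45, end=0.25):
    """Decay a threshold from `start` to `end` along a half cosine.

    `start` and `end` are hypothetical values chosen for illustration;
    the abstract does not state the actual threshold range.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1] so out-of-range steps stay bounded.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))


def label_pixels(scores, threshold):
    """Assumed labelling rule: scores below the threshold are background (0),
    all other scores are foreground (1)."""
    return [0 if s < threshold else 1 for s in scores]


if __name__ == "__main__":
    scores = [0.10, 0.30, 0.50]  # toy per-pixel activation scores
    for step in (0, 50, 100):
        t = cosine_annealed_threshold(step, total_steps=100)
        print(step, round(t, 3), label_pixels(scores, t))
```

Under these assumptions, the pixel with score 0.30 is treated as background early in training and as foreground once the threshold has decayed below it, which matches the stated goal of letting more low-confidence foreground pixels take part in segmentation training.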

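Cite this article

Wang Jiali, Tan Mian, Feng Fujian. Dual branch weakly supervised semantic segmentation based on activation modulation[J]. Electronic Measurement Technology, 2024, 47(24): 139-148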

History
  • Published online: 2025-01-24