Yansong Tang

I am a tenure-track Assistant Professor at the Tsinghua-Berkeley Shenzhen Institute / Shenzhen International Graduate School, Tsinghua University, where I direct IVG@SZ (the Intelligent Vision Group at Shenzhen, sister group of the IVG in Beijing). Before that, I was a postdoctoral researcher in the Department of Engineering Science at the University of Oxford, working with Prof. Philip H. S. Torr and Prof. Victor Prisacariu. My research interests lie in computer vision. Currently, I work in the fields of video analytics, vision-language understanding, and 3D reconstruction.

I received my Ph.D. degree with honors from Tsinghua University, advised by Prof. Jie Zhou and Prof. Jiwen Lu, and my B.S. degree in Automation, also from Tsinghua University. I have also spent time at the Visual Computing Group of Microsoft Research Asia (MSRA) and at Prof. Song-Chun Zhu's VCLA lab at the University of California, Los Angeles (UCLA).

I am looking for self-motivated Master's students, Ph.D. students, and postdocs. If you have strong grades or coding skills, are highly creative, and are interested in joining my group, please do not hesitate to send me your CV and transcripts by email. Due to the large number of emails I receive, I apologize that I may not be able to respond to every email individually.


  • 2022-10: Our team, led by Yong Liu (an incoming Ph.D. student of IVG@SZ), won 1st place in the Long Video Instance Segmentation Challenge (ECCV 2022 Workshop).
  • 2022-09: Two papers were accepted to NeurIPS 2022.
  • 2022-07: ScalableViT and GSFM were accepted to ECCV 2022.
  • 2022-04: Gave a talk at MSRA about LAVT.
  • 2022-03: Five papers to appear at CVPR 2022.
  • Recent Selected Publications [ Full List ]

    (*Equal Contribution, #Corresponding Author)

    HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
    Yongming Rao*, Wenliang Zhao*, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu
    Conference on Neural Information Processing Systems (NeurIPS), 2022
    [arXiv] [Code] [Project Page] [Blog (in Chinese)]

    HorNet is a family of generic vision backbones that perform explicit high-order spatial interactions based on Recursive Gated Convolution.

    OrdinalCLIP: Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression
    Wanhua Li*, Xiaoke Huang*, Zheng Zhu, Yansong Tang, Xiu Li, Jiwen Lu, Jie Zhou
    Conference on Neural Information Processing Systems (NeurIPS), 2022
    [arXiv] [Code] [Project Page] [Blog (in Chinese)]

    We present a language-powered paradigm for ordinal regression.

    LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
    Zhao Yang*, Jiaqi Wang*, Yansong Tang#, Kai Chen, Hengshuang Zhao, Philip H.S. Torr
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
    [arXiv] [Code] [Blog (in Chinese)]

    We present an end-to-end hierarchical Transformer-based network for referring segmentation.

    BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion
    Kejie Li, Yansong Tang, Victor Adrian Prisacariu, Philip H.S. Torr
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
    [arXiv] [Code] [Blog (in Chinese)]

    We present Bi-level Neural Volume Fusion, which leverages recent advances in neural implicit representations and neural rendering for dense 3D reconstruction. To incrementally integrate new depth maps into a global neural implicit representation, we propose a novel bi-level fusion strategy that balances efficiency and reconstruction quality by design.

    DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
    Yongming Rao*, Wenliang Zhao*, Guangyi Chen, Yansong Tang, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
    [arXiv] [Code] [Project Page] [Blog (in Chinese)]

    DenseCLIP is a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP.

    Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation
    Yansong Tang, Jiwen Lu, and Jie Zhou
    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
    [arXiv] [Project Page] [Blog (in Chinese)]

    COIN is currently the largest and most comprehensive instructional video analysis dataset, with rich annotations.

    Uncertainty-aware Score Distribution Learning for Action Quality Assessment
    Yansong Tang*, Zanlin Ni*, Jiahuan Zhou, Danyang Zhang, Jiwen Lu, Ying Wu, and Jie Zhou
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
    Oral Presentation
    [arXiv] [Code]

    We propose an uncertainty-aware score distribution learning method and extend it to a multi-path model for action quality assessment.


  • Teaching
  • Data Mining: Theory and Algorithms, Fall 2022 (with Prof. Xinlei Chen)
  • Selected Honors and Awards

  • Excellent Doctoral Dissertation Award of CAAI, 2021.
  • Excellent PhD Graduate of Beijing, 2020.
  • Excellent Doctoral Dissertation of Tsinghua University, 2020.
  • Zijing Scholar Fellowship for Prospective Researcher, Tsinghua University, 2020.
  • Group

  • PhD Students:
    Zhiheng Li (2021-; with Prof. Jie Zhou)
    Yixuan Zhu (2022-; with Prof. Jie Zhou)
  • Master Students:
    Xiaoke Huang (2021-; with Prof. Jiwen Lu)
    Rong He (2021-; with Prof. Jiwen Lu)
    Yiji Cheng (2022-)
    Jinpeng Liu (2022-)
    Aoyang Liu (2022-)
    Sujia Wang (2022-)
    Yunzhi Teng (2022-)
    Wenjia Geng (2022-; with Prof. Jie Zhou)
  • Academic Services

  • Area Chair: FG 2023
  • Conference Reviewer: CVPR, ICCV, ECCV, AAAI, among others
  • Journal Reviewer: TPAMI, TIP, TMM, TCSVT, among others

  • Website Template