Yansong Tang
I am a tenure-track Assistant Professor at the Tsinghua-Berkeley Shenzhen Institute/Shenzhen International Graduate School, Tsinghua University, where I direct the IVG@SZ (Intelligent Vision Group at Shenzhen, the sister group of the IVG at Beijing). Before that, I was a postdoctoral researcher in the Department of Engineering Science at the University of Oxford, working with Prof. Philip H. S. Torr and Prof. Victor Prisacariu. My current research interests lie in computer vision, computer graphics, and machine learning.
I received my B.S. and Ph.D. degrees with honors from Tsinghua University, advised by Prof. Jie Zhou and Prof. Jiwen Lu. I have also spent time at Prof. Song-Chun Zhu’s VCLA lab at the University of California, Los Angeles (UCLA), and at Microsoft Research Asia (MSRA), hosted by Dr. Han Hu and Dr. Xin Tong respectively.
I am looking for self-motivated Master's students, PhD students, and postdocs. If you have top grades or strong coding skills, are highly creative, and are interested in joining my group, please do not hesitate to send me your CV and grade transcripts by email after reading this file.
|
|
News
2023-07: Four papers on video understanding and generation were accepted to ICCV 2023.
2023-02: Two papers on human activity understanding were accepted to CVPR 2023.
2023-01: One paper on instructional video analysis was accepted to ICLR 2023.
2023-01: One paper on referring segmentation was selected for oral presentation at AAAI 2023.
2022-10: Our team, led by Yong Liu (incoming Ph.D. student of IVG@SZ), won 1st place in the Long Video Instance Segmentation Challenge (ECCV 2022 Workshop).
2022-09: Two papers were accepted to NeurIPS 2022.
2022-07: ScalableViT and GSFM were accepted to ECCV 2022.
2022-04: Gave a talk at MSRA about LAVT.
|
Recent Selected Publications [ Full List ]
(*Equal Contribution, #Corresponding Author)
|
|
Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
Guangyi Chen*, Xiao Liu*, Guangrun Wang, Kun Zhang, Philip H.S. Torr, Xiao-Ping Zhang, Yansong Tang#
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
[arXiv]
[Project Page]
We present Tem-Adapter, a method that improves VQA by leveraging image-based knowledge and introducing temporal and semantic aligners.
|
|
Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning
Zhiheng Li, Wenjia Geng, Muheng Li, Lei Chen, Yansong Tang#, Jiwen Lu, Jie Zhou
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
[arXiv]
[Project Page] (coming soon)
We propose Skip-Plan, a condensed action space learning method for procedure planning in instructional videos.
|
|
FLAG3D: A 3D Fitness Activity Dataset with Language Instruction
Yansong Tang*, Jinpeng Liu*, Aoyang Liu*, Bin Yang, Wenxun Dai, Yongming Rao, Jiwen Lu, Jie Zhou, Xiu Li
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[arXiv]
[Project Page]
We present FLAG3D, a large-scale 3D fitness activity dataset with language instruction.
|
|
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
Shiyi Zhang, Wenxun Dai, Sujia Wang, Xiangwei Shen, Jiwen Lu, Jie Zhou, Yansong Tang#
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[PDF]
[Project Page]
LOGO is a new multi-person long-form video dataset for action quality assessment.
|
|
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Yongming Rao*, Wenliang Zhao*, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu
Conference on Neural Information Processing Systems (NeurIPS), 2022
[arXiv]
[Code]
[Project Page]
[Explanation in Chinese]
HorNet is a family of generic vision backbones that perform explicit high-order spatial interactions based on Recursive Gated Convolution.
|
|
OrdinalCLIP: Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression
Wanhua Li*, Xiaoke Huang*, Zheng Zhu, Yansong Tang, Xiu Li, Jiwen Lu, Jie Zhou
Conference on Neural Information Processing Systems (NeurIPS), 2022
[arXiv]
[Code]
[Project Page]
[Explanation in Chinese]
We present a language-powered paradigm for ordinal regression.
|
|
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Zhao Yang*, Jiaqi Wang*, Yansong Tang#, Kai Chen, Hengshuang Zhao, Philip H.S. Torr
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[arXiv]
[Code]
[Explanation in Chinese]
We present an end-to-end hierarchical Transformer-based network for referring segmentation.
|
|
BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion
Kejie Li, Yansong Tang, Victor Adrian Prisacariu, Philip H.S. Torr
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[arXiv]
[Code]
[Explanation in Chinese]
We present Bi-level Neural Volume Fusion, which leverages recent advances in neural implicit representations and neural rendering for dense 3D reconstruction. To incrementally integrate new depth maps into a global neural implicit representation, we propose a novel bi-level fusion strategy designed to balance efficiency and reconstruction quality.
|
|
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Yongming Rao*, Wenliang Zhao*, Guangyi Chen, Yansong Tang, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[arXiv]
[Code]
[Project Page]
[Explanation in Chinese]
DenseCLIP is a new framework for dense prediction that implicitly and explicitly leverages the pre-trained knowledge from CLIP.
|
|
Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation
Yansong Tang, Jiwen Lu, and Jie Zhou
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
[arXiv]
[Project Page]
[Explanation in Chinese]
COIN is currently the largest and most comprehensive instructional video analysis dataset with rich annotations.
|
|
Uncertainty-aware Score Distribution Learning for Action Quality Assessment
Yansong Tang*, Zanlin Ni*, Jiahuan Zhou, Danyang Zhang, Jiwen Lu, Ying Wu, and Jie Zhou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
Oral Presentation
[arXiv]
[Code]
We propose an uncertainty-aware score distribution learning method and extend it to a multi-path model for action quality assessment.
|
Teaching
Data Mining: Theory and Algorithms, Fall 2022 (with Prof. Xinlei Chen)
Deep Learning: Frontier and Interdisciplinary Research, Fall 2023
|
Selected Honors and Awards
Startrack Program by MSRA, 2023.
Young Elite Scientist Sponsorship Program by CAST, 2022.
Excellent Doctoral Dissertation Award of CAAI, 2021.
Excellent PhD Graduate of Beijing, 2020.
Excellent Doctoral Dissertation of Tsinghua University, 2020.
Zijing Scholar Fellowship for Prospective Researcher, Tsinghua University, 2020.
|
Group
PhD Students:
Master Students:
|
Academic Services
Area Chair: FG 2023
Conference Reviewer: CVPR, ICCV, ECCV, and AAAI, among others
Journal Reviewer: TPAMI, TIP, TMM, and TCSVT, among others