Meng Cao      

Currently, Meng Cao is a postdoctoral researcher at ohamed bin Zayed University of Artificial Intelligence (MBZUAI), working with Prof. Xiaodan Liang and Prof. Ian D Reid. Prior to that, he worked as a researcher at International Digital Economy Academy (IDEA), supervised by Lei Zhang. He received the Ph.D. degree from School of Computer Science, Peking University, supervised by Prof. Yuexian Zou (2018 - 2023). He received his B.E. degree from Huazhong University of Science and Technology (2014 - 2018).

During his Ph.D. period, he also worked closely with Prof. Mike Z. Shou from National University of Singapore, Prof. Long Chen from The Hong Kong University of Science and Technology, and Fangyun Wei from Microsoft Research Asia.

His primary research interests are Computer Vision and Multimedia Analysis. He aims to build an interactive AI assistant that can not only ground and reason over multi-modal signals, but also assist humans in customized content creation.

Email  /  CV  /  Google Scholar  /  Github

News

  • [2024/05]   Our paper on parameter-efficient text-video retrieval is accepted by findings of the Association for Computational Linguistics (ACL).
  • [2023/08]   Our paper on spatio-temporal video grounding is accepted by ACM International Conference on Multimedia Workshop.
  • [2023/08]   Our paper on video grounding is accepted by ICCV 2023 as as oral presentation.
  • [2023/05]   I successfully passed my PhD thesis defense. I'd like to thank everyone who has helped me. Thank you all!
  • [2023/03]   Our paper on weakly-supervised video grounding is accepted by CVPR 2023.
  • [2022/08]   Our team win the first place (1/4278) in Tianchi Challenge of Financial QA under Market Volatility.
  • [2022/07]   Our paper on temporal action localization is accepted by IEEE-TIP.
  • [2022/07]   Our paper on video-text pre-training is accepted by ECCV 2022.
  • [2022/06]   Our paper on spatio-temporal video grounding is accepted by ACM MM 2022.
  • [2021/08]   Our paper on video grounding is accepted by EMNLP 2021 as oral presentation.
  • [2021/06]   Our paper on video portrait manipulation is accepted by IEEE-TIP.
  • [2021/03]   Our paper on temporal action localization is accepted by CVPR 2021.
  • [2021/03]   Our paper on scene text detecton is accepted by IEEE-TCSVT.

  • Education

    Peking University
    Doctoral Student in School of Computer Science      • Sep. 2018 - Jun. 2023
    Supervisors: Prof. Yuexian Zou

    Huazhong Univeristy of Science and Technology
    Bachelor Degree      • Aug. 2014 - Jun. 2018

    Internship Experience

    Tencent AI Lab
    Engineering and Research Intern       • Jul. 2019 - May. 2022
    Advisors:   Haozhi Huang, Hao Wang, Long Chen, Tianyu Yang, and Wei Liu

    Micorsoft Research Asia
    Research Intern       • Jun. 2022 - Mar. 2023
    Advisors:   Fangyun Wei, and Can Xu

    Selected Publication [Google Scholar]
    RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
    Meng Cao, Haoran Tang, Jinfa Huang, Peng Jin, Can Zhang, Ruyang Liu, Long Chen, Xiaodan Liang, Li Yuan, Ge Li
    Findings of the Association for Computational Linguistics, ACL 2024 Findings
    [Paperlink], [Code]
    Area: Text-Video Retrieval, Parameter-efficient Fine-tuning

    We propose RAP to conduct efficient text-video retrieval with a sparse-and-correlated adapter.

    Iterative Proposal Refinement for Weakly-Supervised Video Grounding
    Meng Cao, Fangyun Wei, Can Xu, Xiubo Geng, Long Chen, Can Zhang, Yuexian Zou, Tao Shen, Daxin Jiang
    IEEE Computer Vision and Pattern Recognition Conference, CVPR 2023
    [Paperlink], [Code]
    Area: Video Grounding, Weakly-Supervised Learning

    We introduce IRON, which includes a novel iterative proposal refinement module to model explicit correspondence for each proposal at both semantic and conceptual levels.

    LocVTP: Video-Text Pre-training for Temporal Localization
    Meng Cao, Tianyu Yang, Junwu Weng, Can Zhang, Jue Wang, Yuexian Zou
    European Conference on Computer Vision, ECCV 2022
    [Paperlink], [Code]
    Area: Video-Langauge Pre-training, Video Retrieval, Temporal Grounding

    We propose a novel Localization-oriented VideoText Pre-training framework, dubbed as LocVTP, which benefits both retrieval-based and the less-explored localization-based downstream tasks.

    On Pursuit of Designing Multi-modal Transformer for Video Grounding
    Meng Cao, Long Chen, Mike Zheng Shou, Can Zhang, Yuexian Zou
    Empirical Methods in Natural Language Processing, EMNLP 2021
    (Oral Presentation)
    [Paperlink], [Code]
    Area: Video Grounding, Transformer Architecture Design

    We propose the first end-to-end model GTR for video grounding, which is inherently efficient with extremely fast inference speed. Our comprehensive explorations and empirical results can help to guide the design of more multi-modal Transformerfamily models in other multi-modal tasks..

    Correspondence Matters for Video Referring Expression Comprehension
    Meng Cao, Ji Jiang, Long Chen, Yuexian Zou
    ACM Multimedia, ACM MM 2022
    [Paperlink], [Code]
    Area: Referring Expression Comprehension, Correspondence Modeling

    We propose a novel Dual Correspondence Network (dubbed as DCNet) which explicitly enhances the dense associations in both the inter-frame and cross-modal manners.

    Deep Motion Prior for Weakly-Supervised Temporal Action Localization
    Meng Cao, Can Zhang, Long Chen, Mike Zheng Shou, Yuexian Zou
    IEEE Transactions on Image Processing, TIP 2022
    [Paperlink], [Code], [Video]
    Area: Temporal Action Localization, Weakly-Supervised Learning, Graph Network.

    We establish a context-dependent deep motion prior with a novel motion graph and propose an efficient motion-guided loss to inform the whole pipeline of more motion cues.

    UniFaceGAN: A UniFied Framework for Temporally Consistent Facial Video Editing
    Meng Cao, Haozhi Huang, Hao Wang, Xuan Wang, Li Shen, Sheng Wang, Linchao Bao, Zhifeng Li, Jiebo Luo
    IEEE Transactions on Image Processing, TIP 2021
    [Paperlink], [Code], [Video]
    Area: Facial Video Manipulation, Generative Adversarial Network

    We present a unified framework that offers solutions for multiple tasks, including face swapping, face reenactment, and ``fully disentangled manipulation".

    All You Need is a Second Look: Towards Arbitrary-Shaped Text Detection
    Meng Cao, Can Zhang, Dongming Yang, Yuexian Zou
    IEEE Transactions on Circuits and Systems for Video Technology, TCSVT 2021
    [Paperlink]
    Area: Scene Text Detection

    We propose a two-stage segmentation-based scene text detector, which conducts the detection in a coarse-to-fine manner. Besides, we establish a much tighter representation for arbitrary-shapedtexts.

    Technical Report for WAIC Challenge of Financial QA under Market Volatility
    Meng Cao, Ji Jiang, Qichen Ye, Yuexian Zou
    Technical Report for TianChi Challenge
    [Paperlink]
    Winner of Financial QA Challenge

    We address the problem of financial QA by proposing a graph transformer model for the efficient multi-source information fusion. As a result, we won the first place out of 4278 participating teams and outperformed the second place by 5.07 times on BLUE.

    Academic Service

  • Conference Reviewer:   ICCV'21, CVPR'22, ECCV'22, EMNLP'22, AAAI'23, CVPR'23, ICCV'23, NeurIPS'23
  • Journal Reviewer:   IEEE TCSVT, ACM ToMM, IEEE TIP

  • Honors & Scholarships

  • Outstanding graduate of Peking University,  2023
  • Outstanding graduates of Beijing City,  2023
  • May Fourth Scholarship (Annual Highest Honor Scholarship in PKU),  2022
  • The first place (1/4278) in Challenge of Financial QA under Market Volatility,  2022
  • Excellent Student Scholarship at PKU,  2018~2023
  • Academic Innovation Award of Peking University,  2020
  • Excellent Graduate at HUST (Bachelor degree),  2018
  • National Endeavor Scholarship (Top 5% students per year),  2017
  • Honorable Mention Price, Mathematical Contest in Modeling,  2017
  • Third prize in National Mathematical Modeling Contest,  2016


  • Last updated on Oct, 2023

    This awesome template borrowed from this guy~