Xubo Liu 刘徐博

Google Scholar GitHub LinkedIn X (Twitter)

Email: liuxubo717@gmail.com

I am a Research Scientist at Meta Superintelligence Labs (MSL), currently working on building “Her”, with a focus on speech tokenization, full-duplex modelling, and post-training. Before joining Meta, I was a Research Scientist & Head of Speech at Stability AI, where I led the company's efforts in generative AI for speech. I obtained my PhD from the University of Surrey, under the supervision of Prof. Wenwu Wang and Prof. Mark D. Plumbley, focusing on the understanding, separation, and generation of audio signals.

Selected Publications

My research centres on machine learning for audio, speech, and language. The selected publications below are grouped by topic. For the complete and up-to-date list, please see my Google Scholar profile.

Audio Tokenizer & Codec
  • Scaling Speech Tokenizers with Diffusion Autoencoders
    Yuancheng Wang, Zhenyu Tang, Yun Wang, Arthur Hinsvark, Yingru Liu, Yinghao Aaron Li, Kainan Peng, Junyi Ao, Mingbo Ma, Mike Seltzer, Qing He, Xubo Liu
    International Conference on Learning Representations (ICLR), 2026
  • Scaling Transformers for Low-Bitrate High-Quality Speech Coding
    Julian D Parker*, Anton Smirnov, Jordi Pons, CJ Carr, Zack Zukowski, Zach Evans, Xubo Liu*
    International Conference on Learning Representations (ICLR), 2025
  • ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
    Dongchao Yang, Songxiang Liu, Haohan Guo, Jiankun Zhao, Yuanyuan Wang, Helin Wang, Zeqian Ju, Xubo Liu, Xueyuan Chen, Xu Tan, Xixin Wu, Helen Meng
    Forty-second International Conference on Machine Learning (ICML), 2025
  • Learning Source Disentanglement in Neural Audio Codec
    Xiaoyu Bie, Xubo Liu, Gaël Richard
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
Audio Generation
  • DreamAudio: Customized Text-to-Audio Generation with Diffusion Models
    Yi Yuan, Xubo Liu, Haohe Liu, Xiyuan Kang, Zhuo Chen, Yuxuan Wang, Mark D Plumbley, Wenwu Wang
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2026
  • RiTTA: Modeling Event Relations in Text-to-Audio Generation
    Yuhang He, Yash Jain, Xubo Liu, Andrew Markham, Vibhav Vineet
    The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
  • Sound-VECaps: Improving Audio Generation With Visual Enhanced Captions
    Yi Yuan, Dongya Jia, Xiaobin Zhuang, Yuanzhe Chen, Zhengxi Liu, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xubo Liu, Xiyuan Kang, Mark D Plumbley, Wenwu Wang
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
  • WavJourney: Compositional Audio Creation with Large Language Models
    Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D Plumbley, Wenwu Wang
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
  • AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
    Haohe Liu, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Qiao Tian, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D Plumbley
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
  • Retrieval-Augmented Text-to-Audio Generation
    Yi Yuan, Haohe Liu, Xubo Liu, Qiushi Huang, Mark D Plumbley, Wenwu Wang
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
  • ComposerX: Multi-Agent Symbolic Music Composition with LLMs
    Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo
    International Society for Music Information Retrieval (ISMIR), 2024
  • WavCraft: Audio Editing and Generation with Natural Language Prompts
    Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D Plumbley, Huy Phan, Emmanouil Benetos
    ICLR 2024 Workshop on Large Language Model (LLM) Agents, 2024
  • AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
    Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D Plumbley
    Proceedings of the 40th International Conference on Machine Learning (ICML), 2023
  • Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning
    Xubo Liu, Turab Iqbal, Jinzheng Zhao, Qiushi Huang, Mark D Plumbley, Wenwu Wang
    IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP), 2021
Audio Source Separation
  • TMS: Text-Prompted Multi-channel Speech Separation on Smart Glasses
    Yang Liu, Li Wan, Yiteng Huang, Yifeng Fan, Haohe Liu, Xinhao Mei, Xubo Liu, Ming Sun, Yangyang Shi, Saurabh Adya, Florian Metze, Ariya Rastrow
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026
  • FlowSep: Language-Queried Sound Separation with Rectified Flow Matching
    Yi Yuan, Xubo Liu, Haohe Liu, Mark D Plumbley, Wenwu Wang
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
  • Separate Anything You Describe
    Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D Plumbley, Wenwu Wang
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
  • A Reference-free Metric for Language-Queried Audio Source Separation using Contrastive Language-Audio Pretraining
    Feiyang Xiao, Jian Guan, Qiaoxi Zhu, Xubo Liu, Wenbo Wang, Shuhan Qi, Kejia Zhang, Jianyuan Sun, Wenwu Wang
    Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2024
  • Audio Prompt Tuning for Universal Sound Separation
    Yuzhuo Liu, Xubo Liu, Yan Zhao, Yuanyuan Wang, Rui Xia, Pingchuan Tain, Yuxuan Wang
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
  • Universal Sound Separation with Self-Supervised Audio Masked Autoencoder
    Junqi Zhao, Xubo Liu, Jinzheng Zhao, Yi Yuan, Qiuqiang Kong, Mark D Plumbley, Wenwu Wang
    European Signal Processing Conference (EUSIPCO), 2024
  • Separate What You Describe: Language-Queried Audio Source Separation
    Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D Plumbley, Wenwu Wang
    INTERSPEECH, 2022
Voice Quality Enhancement
  • VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration
    Haohe Liu*, Xubo Liu*, Qiuqiang Kong, Qiao Tian, Yan Zhao, DeLiang Wang
    INTERSPEECH, 2022
  • Neural Vocoder is All You Need for Speech Super-resolution
    Haohe Liu, Woosung Choi, Xubo Liu, Qiuqiang Kong, Qiao Tian, DeLiang Wang
    INTERSPEECH, 2022
Audio Captioning
  • Towards Generating Diverse Audio Captions via Adversarial Training
    Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D Plumbley, Wenwu Wang
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
  • Visually-Aware Audio Captioning with Adaptive Audio-Visual Attention
    Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H Tang, Mark D Plumbley, Volkan Kılıç, Wenwu Wang
    INTERSPEECH, 2023
  • Knowledge Distillation for Efficient Audio-Visual Video Captioning
    Özkan Çaylı, Xubo Liu, Volkan Kılıç, Wenwu Wang
    European Signal Processing Conference (EUSIPCO), 2023
  • Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning
    Jianyuan Sun, Xubo Liu, Xinhao Mei, Volkan Kılıç, Mark D Plumbley, Wenwu Wang
    INTERSPEECH, 2023
  • Automated Audio Captioning: An Overview of Recent Progress and New Challenges
    Xinhao Mei, Xubo Liu, Mark D Plumbley, Wenwu Wang
    EURASIP Journal on Audio, Speech, and Music Processing, 2022
  • Diverse Audio Captioning via Adversarial Training
    Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D Plumbley, Wenwu Wang
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
  • Leveraging Pre-trained BERT for Audio Captioning
    Xubo Liu, Xinhao Mei, Qiushi Huang, Jianyuan Sun, Jinzheng Zhao, Haohe Liu, Mark D Plumbley, Volkan Kılıç, Wenwu Wang
    European Signal Processing Conference (EUSIPCO), 2022
  • An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning
    Xinhao Mei, Qiushi Huang, Xubo Liu, Gengyun Chen, Jingqian Wu, Yusong Wu, Jinzheng Zhao, Shengchen Li, Tom Ko, H Lilian Tang, Xi Shao, Mark D Plumbley, Wenwu Wang
    Proceedings of the Detection and Classification of Acoustic Scenes and Events (DCASE), 2021
  • CL4AC: A Contrastive Loss for Audio Captioning
    Xubo Liu, Qiushi Huang, Xinhao Mei, Tom Ko, H Lilian Tang, Mark D Plumbley, Wenwu Wang
    Proceedings of the Detection and Classification of Acoustic Scenes and Events (DCASE), 2021
  • Audio Captioning Transformer
    Xinhao Mei, Xubo Liu, Qiushi Huang, Mark D Plumbley, Wenwu Wang
    Proceedings of the Detection and Classification of Acoustic Scenes and Events (DCASE), 2021
Audio Classification
  • Noise-Robust Sound Event Detection and Counting via Language-Queried Sound Separation
    Yuanjian Chen, Yang Xiao, Han Yin, Yadong Guan, Xubo Liu
    IEEE Signal Processing Letters, 2025
  • Acoustic Prompt Tuning: Empowering Large Language Models with Audition Capabilities
    Jinhua Liang, Xubo Liu, Wenwu Wang, Mark D Plumbley, Huy Phan, Emmanouil Benetos
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
  • MASV: Speaker Verification with Global and Local Context Mamba
    Yang Liu, Li Wan, Yiteng Huang, Ming Sun, Xinhao Mei, Xubo Liu, Yangyang Shi, Florian Metze
    INTERSPEECH, 2025
  • Disentangling Hierarchical Features for Anomalous Sound Detection Under Domain Shift
    Jian Guan, Jiantong Tian, Qiaoxi Zhu, Feiyang Xiao, Hejing Zhang, Xubo Liu
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
  • T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining
    Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D Plumbley, Wenwu Wang
    The IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2024
  • First-Shot Unsupervised Anomalous Sound Detection with Unknown Anomalies Estimated by Metadata-Assisted Audio Generation
    Hejing Zhang, Qiaoxi Zhu, Jian Guan, Haohe Liu, Feiyang Xiao, Jiantong Tian, Xinhao Mei, Xubo Liu, Wenwu Wang
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
  • Learning Temporal Resolution in Spectrogram for Audio Classification
    Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D Plumbley
    AAAI Conference on Artificial Intelligence (AAAI), 2024
  • CM-PIE: Cross-modal Perception for Interactive-enhanced Audio-Visual Video Parsing
    Yaru Chen, Ruohao Guo, Xubo Liu, Peipei Wu, Guangyao Li, Zhenbo Li, Wenwu Wang
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
  • Adapting Language-Audio Models as Few-Shot Audio Learners
    Jinhua Liang, Xubo Liu, Haohe Liu, Huy Phan, Emmanouil Benetos, Mark D Plumbley, Wenwu Wang
    INTERSPEECH, 2023
  • Ontology-aware Learning and Evaluation for Audio Tagging
    Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D Plumbley
    INTERSPEECH, 2023
  • Simple Pooling Front-ends For Efficient Audio Classification
    Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Mark D Plumbley, Wenwu Wang
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
  • Continual Learning for On-Device Environmental Sound Classification
    Yang Xiao*, Xubo Liu*, James King, Arshdeep Singh, Eng Siong Chng, Mark D Plumbley, Wenwu Wang
    Proceedings of the Detection and Classification of Acoustic Scenes and Events (DCASE), 2022
  • Segment-level Metric Learning for Few-shot Bioacoustic Event Detection
    Haohe Liu, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Wenwu Wang, Mark D Plumbley
    Proceedings of the Detection and Classification of Acoustic Scenes and Events (DCASE), 2022
  • Token-Level Supervised Contrastive Learning for Punctuation Restoration
    Qiushi Huang, Tom Ko, H Lilian Tang, Xubo Liu, Bo Wu
    INTERSPEECH, 2021
Multimodal Speech Recognition
  • Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
    Umberto Cappellazzo, Xubo Liu, Pingchuan Ma, Stavros Petridis, Maja Pantic
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026
  • MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition
    Umberto Cappellazzo, Minsu Kim, Pingchuan Ma, Honglie Chen, Xubo Liu, Stavros Petridis, Maja Pantic
    The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025
  • SynthVSR: Scaling Up Visual Speech Recognition with Synthetic Supervision
    Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jachym Kolar, Stavros Petridis, Maja Pantic, Christian Fuegen
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Personalized Dialogue Generation
  • Selective Prompting Tuning for Personalized Conversations with LLMs
    Qiushi Huang, Xubo Liu, Tom Ko, Bo Wu, Wenwu Wang, Yu Zhang, Lilian Tang
    The 62nd Annual Meeting of the Association for Computational Linguistics (ACL), 2024
  • Learning Retrieval Augmentation for Personalized Dialogue Generation
    Qiushi Huang, Shuai Fu, Xubo Liu, Wenwu Wang, Tom Ko, Yu Zhang, Lilian Tang
    The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
  • Personalized Dialogue Generation with Persona-Adaptive Attention
    Qiushi Huang, Yu Zhang, Tom Ko, Xubo Liu, Bo Wu, Wenwu Wang, Lilian Tang
    AAAI Conference on Artificial Intelligence (AAAI), 2023
Speaker Localization & Tracking
  • Visually Assisted Self-supervised Audio Speaker Localization and Tracking
    Jinzheng Zhao, Peipei Wu, Shidrokh Goudarzi, Xubo Liu, Jianyuan Sun, Yong Xu, Wenwu Wang
    European Signal Processing Conference (EUSIPCO), 2022
  • Audio Visual Multi-Speaker Tracking with Improved GCF and PMBM Filter
    Jinzheng Zhao, Peipei Wu, Xubo Liu, Shidrokh Goudarzi, Haohe Liu, Yong Xu, Wenwu Wang
    INTERSPEECH, 2022
  • Audio-visual Tracking of Multiple Speakers via A PMBM Filter
    Jinzheng Zhao, Peipei Wu, Xubo Liu, Yong Xu, Lyudmila Mihaylova, Simon Godsill, Wenwu Wang
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
AI for Aquaculture
  • Fish tracking, counting, and behaviour analysis in digital aquaculture: a comprehensive survey
    Meng Cui, Xubo Liu, Haohe Liu, Jinzheng Zhao, Daoliang Li, Wenwu Wang
    Reviews in Aquaculture, 2025
  • Audio-Visual Class-Incremental Learning for Fish Feeding Intensity Assessment in Aquaculture
    Meng Cui, Xianghu Yue, Xinyuan Qian, Jinzheng Zhao, Haohe Liu, Xubo Liu, Daoliang Li, Wenwu Wang
    arXiv preprint arXiv:2504.15171, 2025
  • Multimodal Fish Feeding Intensity Assessment in Aquaculture
    Meng Cui, Xubo Liu, Haohe Liu, Zhuangzhuang Du, Tao Chen, Guoping Lian, Daoliang Li, Wenwu Wang
    IEEE Transactions on Automation Science and Engineering, 2024
  • Fish Feeding Intensity Assessment in Aquaculture: A New Audio Dataset AFFIA3K and A Deep Learning Algorithm
    Meng Cui*, Xubo Liu*, Jinzheng Zhao, Jianyuan Sun, Guoping Lian, Tao Chen, Mark D Plumbley, Daoliang Li, Wenwu Wang
    IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP), 2022

* indicates equal contribution.

Education


2020–2024
Ph.D. in Vision, Speech and Signal Processing
University of Surrey, Guildford, UK

2016–2020
B.Sc. in Telecommunications Engineering (First Class Honours)
Queen Mary University of London, London, UK

Employment


2025–Present
Research Scientist
Meta Superintelligence Labs, London, UK

2024–2025
Research Scientist & Head of Speech
Stability AI, London, UK

Professional Services

Area Chair:
  • ICLR 2026, NeurIPS 2026
Journal Reviewer:
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • IEEE Signal Processing Letters
  • International Journal of Computer Vision
Conference Reviewer:
  • ICLR 2025, ICML 2024–2025, CVPR 2024–2025, ICCV 2025, NeurIPS 2025, AAAI 2025, ACL 2025, ICASSP 2023–2025, ASRU 2025, ECCV 2024, INTERSPEECH 2023–2024, DCASE 2024, EMNLP 2023, MLSP 2023
Special Session & Challenge Organizer:

Teaching Experience

Mentoring

Research Interns at Meta:
Research Interns at Stability AI:

Invited Talks and Guest Lectures

My Cats

Beyond research, my days are happily ruled by three lovely girls 😻


The three of them

Goo-lee Joo (朱古力)

Hu-Hu (刘呼呼)

Bu-Bu (刘布布)
