Xubo Liu 刘徐博

I am a final-year Ph.D. student at the Centre for Vision, Speech & Signal Processing (CVSSP), University of Surrey, advised by Prof. Wenwu Wang and Prof. Mark D. Plumbley. My passion is building AI models that understand the world through multiple modalities and engage with humans. Currently, I work on computational auditory scene analysis, multimodal content creation, and large language models for audio/speech/music signals.

Previously, I spent four months working with Dr. Christian Fuegen and Dr. Egor Lakomkin at Meta AI, London. During my Ph.D., I worked closely with Dr. Qiuqiang Kong at the Chinese University of Hong Kong (CUHK). I graduated with First-Class Honours from Queen Mary University of London in 2020 with a BSc in Telecommunications Engineering.

I am open to research collaborations. Please feel free to email me.

Email: xubo.liu@surrey.ac.uk

Personal: [Google Scholar] | [GitHub] | [LinkedIn] | [Twitter]

Publications
Separate Anything You Describe
Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D Plumbley, Wenwu Wang
arXiv:2308.05037
paper | project | code
WavJourney: Compositional Audio Creation with Large Language Models
Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D Plumbley, Wenwu Wang
arXiv:2307.14335
paper | project | code

SynthVSR: Scaling Up Visual Speech Recognition with Synthetic Supervision
Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jachym Kolar, Stavros Petridis, Maja Pantic, Christian Fuegen
CVPR 2023
paper | project
Visually-Aware Audio Captioning with Adaptive Audio-Visual Attention
Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H Tang, Mark D Plumbley, Volkan Kılıç, Wenwu Wang
Interspeech 2023
paper | code
Simple Pooling Front-ends For Efficient Audio Classification
Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Mark D. Plumbley, Wenwu Wang
ICASSP 2023
paper | code
Separate What You Describe: Language-Queried Audio Source Separation
Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D Plumbley, Wenwu Wang
Interspeech 2022
paper | project | code
Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning
Xubo Liu, Turab Iqbal, Jinzheng Zhao, Qiushi Huang, Mark D Plumbley, Wenwu Wang
MLSP 2021
paper | code
CL4AC: A Contrastive Loss for Audio Captioning
Xubo Liu, Qiushi Huang, Xinhao Mei, Tom Ko, H Lilian Tang, Mark D Plumbley, Wenwu Wang
DCASE Workshop 2021
paper | code
Professional Services

Special Session Chair of "Multimodal Learning for Audio and Language" at EUSIPCO 2023
Journal reviewer: IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), International Journal of Computer Vision (IJCV)
Conference reviewer: ICML (24), CVPR (24), EMNLP (23), ICASSP (23-24), INTERSPEECH (22-24), MLSP (23)

Template credits: Unnat