Yixuan Xiao
I am currently a PhD student at the University of Stuttgart, supervised by Prof. Dr. Thang Vu. I started my PhD in April 2024. My research interests lie in speech processing tasks such as audio deepfake detection and speech synthesis. My contact info can be found here. I received my M.Sc. in Computational Linguistics, also from the University of Stuttgart. My Master’s thesis, titled “Mitigating Text Domain Mismatch in ASR Systems through Prompt-based Learning”, was also supervised by Prof. Dr. Thang Vu.
Prior to this, I worked as a senior algorithm engineer at Baidu’s Speech Team (specializing in high-performance computing) and at NetEase Youdao’s AI Team (specializing in ASR and computer-aided pronunciation training). Earlier, I completed a taught Master’s programme in Artificial Intelligence at the University of Edinburgh and a B.Sc. in Computer Science and Technology at the Beijing Institute of Technology, where I was supervised by Prof. Dr. Xianling Mao.
Teaching
Courses
I am/have been the (co-)lecturer for the following courses:
- (2026SS)
  - Computational Linguistics Team Laboratory: Phonetics: Main Lecturer
- (2025WS)
  - Introduction to Deep Learning for Speech and Text Processing: Co-lecturer, All Coding Sessions + Project Phase
  - Current Topics in Speech Technology: Co-lecturer, TTS, NAC, and Audio Deepfake Detection, Paper reading sessions
- (2025SS)
  - Computational Linguistics Team Laboratory: Phonetics: Main Lecturer
  - Optional Research Module: Project-based Research
- (2024WS)
  - Advanced Deep Learning: Co-lecturer, Continual Learning
  - Current Topics in Speech Technology: Co-lecturer, TTS and Audio Deepfake Detection, Paper reading sessions
Publications
- [Dataset] Xiao, Yixuan, Florian Lux, Alejandro Pérez-González-de-Martos, and Ngoc Thang Vu. “Codecdeepfakedetection”. https://doi.org/10.5281/zenodo.17225924
  - Relevant paper: “How to Label Resynthesized Audio? The Dual Role of Neural Audio Codecs in Audio Deepfake Detection”, accepted at ICASSP 2026.
- Xiao, Yixuan, and Ngoc Thang Vu. “Layer-Wise Decision Fusion for Fake Audio Detection Using XLS-R.” In Proc. Interspeech 2025, pp. 5618-5622. 2025. code
- Xiao, Yixuan, and Ngoc Thang Vu. “What Affects the Performance of Fake Audio Detection? Analyzing Factors in a Continual Learning Setting.” In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2025. code.
Supervision
Thesis Topics
NOTE: Due to a high number of teaching responsibilities in 2026SS (one 4-SWS lab course and two or possibly three Master’s thesis projects), I will not be able to supervise new students. Thank you for your understanding.
I have previously supervised the following topics:
- Audio Deepfake Detection: Model training and analysis; requires familiarity with our codebase IMS-ADD.
- Codec-based Speech Synthesis: Prompting and fine-tuning TTS or ALM models; audio reconstruction using neural audio codecs.
- Speech Analysis: Analyzing speech to better understand speech models. Example tools include librosa, openSMILE, Parselmouth, speechmetrics, and SpeechBrain. Relevant models can sometimes be found on HuggingFace, e.g., speech enhancement models.
HiWi Position
NOTE: Because the supervision workload is expected to be high in 2026SS, I also won’t be able to supervise research projects, so no research-oriented HiWi positions are available.