Yixuan Xiao

I am currently a PhD student at the University of Stuttgart, supervised by Prof. Dr. Thang Vu. I started my PhD in April 2024. My research interests lie in speech processing tasks such as audio deepfake detection and speech synthesis. My contact info can be found here. I received my M.Sc. in Computational Linguistics, also from the University of Stuttgart. My Master’s thesis, titled “Mitigating Text Domain Mismatch in ASR Systems through Prompt-based Learning”, was also supervised by Prof. Dr. Thang Vu.

Prior to this, I worked as a senior algorithm engineer on Baidu’s Speech Team (specializing in high-performance computing) and on NetEase Youdao’s AI Team (specializing in ASR and computer-aided pronunciation training). Earlier, I completed a taught Master’s programme in Artificial Intelligence at the University of Edinburgh and a B.Sc. in Computer Science and Technology at Beijing Institute of Technology, supervised by Prof. Dr. Xianling Mao.

Teaching

Courses

I am or have been a (co-)lecturer for the following courses:

(2026SS)
- Computational Linguistics Team Laboratory: Phonetics: Main Lecturer
(2025WS)
- Introduction to Deep Learning for Speech and Text Processing: Co-lecturer, All Coding Sessions + Project Phase
- Current Topics in Speech Technology: Co-lecturer, TTS, NAC, and Audio Deepfake Detection, Paper reading sessions
(2025SS)
- Computational Linguistics Team Laboratory: Phonetics: Main Lecturer
- Optional Research Module: Project-based Research
(2024WS)
- Advanced Deep Learning: Co-lecturer, Continual Learning
- Current Topics in Speech Technology: Co-lecturer, TTS and Audio Deepfake Detection, Paper reading sessions

Publications

[Dataset] Xiao, Yixuan, Florian Lux, Alejandro Pérez-González-de-Martos, and Ngoc Thang Vu. “Codecdeepfakedetection”. https://doi.org/10.5281/zenodo.17225924
- Relevant paper: How to Label Resynthesized Audio? The Dual Role of Neural Audio Codecs in Audio Deepfake Detection, Accepted at ICASSP 2026.
Xiao, Yixuan, and Ngoc Thang Vu. “Layer-Wise Decision Fusion for Fake Audio Detection Using XLS-R.” In Proc. Interspeech 2025, pp. 5618-5622. 2025. code
Xiao, Yixuan, and Ngoc Thang Vu. “What Affects the Performance of Fake Audio Detection? Analyzing Factors in a Continual Learning Setting.” In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2025. code.

Supervision

Thesis Topics

NOTE: Due to a high number of teaching and supervision responsibilities in 2026SS (one 4-SWS lab course and two or possibly three Master’s thesis projects), I will not be able to supervise new students. Thank you for your understanding.

I have previously supervised the following topics:

- Audio Deepfake Detection: Model training and analysis; requires familiarity with our codebase IMS-ADD.
- Codec-based Speech Synthesis: Prompting and fine-tuning TTS or ALM models; audio reconstruction using neural audio codecs.
- Speech Analysis: Analyzing speech to better understand speech models. Example tools include librosa, openSMILE, Parselmouth, speechmetrics, and SpeechBrain. Relevant models can sometimes be found on HuggingFace, e.g., speech enhancement models.

HiWi Position

NOTE: Because my supervision workload is expected to be high in 2026SS, I also won’t be able to supervise research projects. As a result, no research-oriented HiWi positions are available.