👋 About Me
I am currently a research assistant at the School of Data Science (SDS), The Chinese University of Hong Kong, Shenzhen (CUHKSZ), where I work under the supervision of Prof. Lihai Zhou and Prof. Benyou Wang. My research focuses on AI Interpretability & Trustworthiness, aiming to better understand Large Language Models (LLMs) and to design more advanced and safer AI.
Previously, I earned my Bachelor’s degree from the University of Electronic Science and Technology of China (UESTC) and obtained my Master’s degree in Artificial Intelligence from the Faculty of Science, The University of Hong Kong (HKU). I am also an incoming Ph.D. student in AI at the Department of Data Science, HKU, advised by Prof. Difan Zou.
My current research interest lies in understanding the internal mechanisms of LLMs in order to better plan for a future of safe AI. In particular, I am actively exploring Circuit Analysis and Sparse Autoencoders (SAEs), with the long-term goals of (1) understanding how LLMs work internally, (2) improving their performance, and (3) building models that are safer and more controllable.
If you share a passion for LLM interpretability and safety, or would simply like to get in touch, feel free to reach out via email: sunny615@connect.hku.hk.
🗞️ News
- 📝 [2025.05] Our paper “Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis” was accepted at ICML 2025.
📄 Publications
- **Xu Wang**, Z. Li, B. Wang, Y. Hu, D. Zou. *Model Unlearning via Sparse Autoencoder Subspace Guided Projections*. arXiv preprint arXiv:2505.24428.
- Z. Li, **Xu Wang**, Y. Yang, Z. Yao, H. Xiong, M. Du. *Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models*. arXiv preprint arXiv:2505.15634.
- **Xu Wang**, et al. *Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis*. ICML 2025 (accepted).
- L. Cheung, **Xu Wang**, J. Zhang, R. K. M. Poon, A. S. M. Lau. *Applications of Generative AI: A Case Study of AI Doctor*. Southeast Decision Sciences Institute (SEDSI) Conference, 29–31 Jan 2025 (accepted).
🔬 Experience
- Research Assistant, Department of Statistics and Actuarial Science, The University of Hong Kong (05/2024 – 08/2024). Research direction: LLM applications in healthcare.
- Research Assistant, School of Data Science, The Chinese University of Hong Kong, Shenzhen (09/2024 – 08/2025). Research direction: LLM mechanistic interpretability and AI safety.
🧭 Future Plan
- 🔍 Continue exploring LLM interpretability, following the work of Anthropic's Interpretability team
- 🧠 Leverage SAEs, circuit analysis, and related methods to uncover the internal mechanisms of LLMs, delivering improved foundational SAEs and features to the community
- 🛡️ Continue exploring AI safety, focusing on data security and training robustness in LLMs
- 🌐 Combine mechanistic interpretability with inference and reasoning: identify ways to integrate inference scaling and reinforcement learning (RL) theory into mechanistic interpretability research