Boyuan Sun (孙博远) is currently a 3rd year Ph.D. candidate at Nankai University, supervised by Prof. Qibin Hou and Prof. Ming-Ming Cheng. He received his bachelor's degree from the School of Computer Science and Technology at Xidian University in 2021. And now he is taking the Master-Ph.D. combined program in Nankai University. His research interests include Computer Vision and Multimodal Large Language Model, particularly focusing on multi-modal visual perception, vison-language model, semi-supervised learning, etc.
Internship Experience
Shanghai AI Laboratory
Working on MLE Agent Workflow, focusing on planner design, and tool-augmented agentic model training for scientific discovery.
Tongyi Lab, Alibaba
Focused on video understanding techniques, especially on training-free token compression, omni model architecture, and fine-grained object understanding for large multimodal models.
Selected Publications
* Equal contribution. # Corresponding author.
See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding
IEEE Computer Vision and Pattern Recognition 2026 (CVPR 2026)