Boyuan Sun (孙博远) is a third-year Ph.D. candidate at Nankai University, supervised by Prof. Qibin Hou and Prof. Ming-Ming Cheng. He received his bachelor's degree from the School of Computer Science and Technology at Xidian University in 2021 and is now enrolled in the combined Master-Ph.D. program at Nankai University. His research interests include computer vision and multimodal large language models, with a particular focus on multi-modal visual perception, vision-language models, and semi-supervised learning.

Tianjin, China
sbysbysby123@gmail.com

Internship Experience

Shanghai AI Laboratory
Dec 2025 – Present
Research Intern

Working on MLE agent workflows, focusing on planner design and tool-augmented agentic model training for scientific discovery.

Agent Workflow, MLE, Reasoning

Tongyi Lab, Alibaba
Sep 2024 – Dec 2025
Research Intern

Focused on video understanding techniques, especially training-free token compression, omni-model architecture, and fine-grained object understanding for large multimodal models.

Token Compression, Video Understanding, Video-LLM

Selected Publications

* Equal contribution. # Corresponding author.

arxiv

See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

Boyuan Sun, Bo-Wen Yin, Yuan-Ming Li, Xihan Wei, Qibin Hou

IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (CVPR 2026)

arxiv

GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics

Modi Jin, Yiming Zhang, Boyuan Sun, Dingwen Zhang, Ming-Ming Cheng, Qibin Hou

IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (CVPR 2026)

arxiv

Depth Anything at Any Condition

Boyuan Sun*, Modi Jin*, Bowen Yin, Qibin Hou#

arXiv preprint 2025

arxiv

LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

Boyuan Sun*, Jiaxing Zhao*, Xihan Wei, Qibin Hou#

arXiv preprint 2025

arxiv

HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context

Qize Yang*, Shimin Yao*, Weixuan Chen*, Shenghao Fu, Detao Bai, Jiaxing Zhao, Boyuan Sun, Bowen Yin, Xihan Wei, Jingren Zhou

Technical Report

arxiv

HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding

Jiaxing Zhao*, Qize Yang*, Yixing Peng*, Detao Bai*, Shimin Yao*, Boyuan Sun, Xiang Chen, Shenghao Fu, Weixuan Chen, Xihan Wei, Liefeng Bo#

Technical Report

arxiv

LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding

Jiaxing Zhao*, Boyuan Sun*, Xiang Chen, Xihan Wei, Qibin Hou#

arXiv preprint 2025

arxiv

Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness

Jiaxing Zhao*, Boyuan Sun*, Xiang Chen, Xihan Wei

AAAI Conference on Artificial Intelligence 2026 (AAAI 2026)

arxiv

AODRaw: Towards RAW Object Detection in Diverse Conditions

Zhong-Yu Li, Xin Jin, Boyuan Sun, Chun-Le Guo, Ming-Ming Cheng#

IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025 (CVPR 2025, Highlight)

arxiv

CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation

Boyuan Sun, Yuqi Yang, Weifeng Yuan, Le Zhang, Ming-Ming Cheng, Qibin Hou#

IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR 2024)

arxiv

CamoFormer: Masked Separable Attention for Camouflaged Object Detection

Bowen Yin, Xuying Zhang, Qibin Hou, Bo-Yuan Sun, Deng-Ping Fan, Luc Van Gool

arXiv preprint 2022

Find me on WeChat with the ID sby123bb, or scan my QR code:

QR code