Boyuan Sun (孙博远) is currently a third-year Ph.D. candidate at Nankai University, supervised by Prof. Qibin Hou and Prof. Ming-Ming Cheng. He received his bachelor's degree from the School of Computer Science and Technology at Xidian University in 2021 and is now enrolled in the combined Master-Ph.D. program at Nankai University. His research interests include computer vision and multimodal large language models, with a particular focus on multi-modal visual perception, vision-language models, and semi-supervised learning.

Internship Experience

ByteDance
Apr 2026 – Present
Research Intern

Working on streaming video understanding and omni models, focusing on agentic RL and on-policy training for streaming video.

Streaming Video Understanding, Omni Model

Shanghai AI Laboratory
Dec 2025 – Apr 2026
Research Intern

Working on the Science Discovery Agent System, focusing on MLE agentic model design, data analysis agents, and agentic model training.

Science Discovery Agent, MLE, Data Analyze Agent

Tongyi Lab, Alibaba
Sep 2024 – Dec 2025
Research Intern

Focused on video understanding techniques, especially training-free token compression, omni model architecture, and fine-grained object understanding for large multimodal models.

Token Compression, Video Understanding, Video-LLM

Selected Publications

* Equal contribution. # Corresponding author.

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Shangheng Du, Xiangchao Yan, Jinxin Shi, Zongsheng Cao, Shiyang Feng, Zichen Liang, Boyuan Sun, Tianshuo Peng, Yifan Zhou, Xin Li, Jie Zhou, Liang He, Bo Zhang, Lei Bai

Technical Report

See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

Boyuan Sun, Bo-Wen Yin, Yuan-Ming Li, Xihan Wei, Qibin Hou

IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (CVPR 2026)

GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics

Modi Jin, Yiming Zhang, Boyuan Sun, Dingwen Zhang, Ming-Ming Cheng, Qibin Hou

IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (CVPR 2026 Highlight)

Depth Anything at Any Condition

Boyuan Sun*, Modi Jin*, Bowen Yin, Qibin Hou#

Submitted to TPAMI

LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

Boyuan Sun*, Jiaxing Zhao*, Xihan Wei, Qibin Hou#

Submitted to TMM

HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context

Qize Yang*, Shimin Yao*, Weixuan Chen*, Shenghao Fu, Detao Bai, Jiaxing Zhao, Boyuan Sun, Bowen Yin, Xihan Wei, Jingren Zhou

Technical Report

HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding

Jiaxing Zhao*, Qize Yang*, Yixing Peng*, Detao Bai*, Shimin Yao*, Boyuan Sun, Xiang Chen, Shenghao Fu, Weixuan Chen, Xihan Wei, Liefeng Bo#

Technical Report

LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding

Boyuan Sun*, Jiaxing Zhao*, Xiang Chen, Xihan Wei, Qibin Hou#

arXiv preprint 2025

Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness

Jiaxing Zhao*, Boyuan Sun*, Xiang Chen, Xihan Wei

AAAI Conference on Artificial Intelligence 2026 (AAAI 2026)

AODRaw: Towards RAW Object Detection in Diverse Conditions

Zhong-Yu Li, Xin Jin, Boyuan Sun, Chun-Le Guo, Ming-Ming Cheng#

IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025 (CVPR 2025 Highlight)

CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation

Boyuan Sun, Yuqi Yang, Weifeng Yuan, Le Zhang, Ming-Ming Cheng, Qibin Hou#

IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR 2024)

CamoFormer: Masked Separable Attention for Camouflaged Object Detection

Bowen Yin, Xuying Zhang, Qibin Hou, Bo-Yuan Sun, Deng-Ping Fan, Luc Van Gool

arXiv preprint 2022

Find me on WeChat with the ID sby123bb.