Wenbo Hu. Personal Webpage

Wenbo (Gordon) Hu

whu at cs dot ucla dot edu

Hi! I am Wenbo. I'm a first-year CS PhD student at the University of California, Los Angeles (UCLA), advised by Prof. Kai-Wei Chang and Prof. Nanyun Peng. I received my M.S. in CS from UCLA in 2024. Before that, I received my B.S. in Data Science from the Halicioglu Data Science Institute (HDSI) at the University of California, San Diego in 2023. I was fortunate to work with Prof. Zhuowen Tu and Prof. Hao Su during my undergraduate studies.

My primary research interest lies at the intersection of vision, language, and agentic AI. In particular, I have worked on 2D and 3D vision-language models for visual understanding and embodied tasks, as well as evaluation benchmarks for multimodal models. My long-term research goal is to build intelligent systems that can perceive, understand, and interact with the complex physical world.

I'm actively looking for strong and motivated graduate and undergraduate students to collaborate with. If you have similar research interests or are interested in working on research projects in general, feel free to reach out to me!

CV / GitHub / Google Scholar / LinkedIn / Email / Twitter (X) / DBLP

News

3DLLM-Mem was selected for the Best Paper Award at CVPR 2025 Foundation Models Meet Embodied Agents Workshop!

We released MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models!

MQT-LLaVA is accepted at NeurIPS 2024!

Selected Publications (full list here)

3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model

Wenbo Hu, Yining Hong, Yanjun Wang, Leison Gao, Zibu Wei, Xingcheng Yao, Nanyun Peng, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang

CVPR 2025 Foundation Models Meet Embodied Agents Workshop (Best Paper Award)

[Paper] [Project Page] [Code]

Verbalized Representation Learning for Interpretable Few-Shot Generalization

Cheng-Fu Yang, Da Yin, Wenbo Hu, Heng Ji, Nanyun Peng, Bolei Zhou, Kai-Wei Chang

ICCV 2025

[Paper] [Code]

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models

Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou, Mohsen Fayyaz, Pan Lu, Kai-Wei Chang, Nanyun Peng

ICLR 2025

[Paper] [Project Page] [Code] [🤗 Data]

Matryoshka Query Transformer for Large Vision-Language Models

Wenbo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

NeurIPS 2024

[Paper] [Project Page] [Code] [🤗 Demo]

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Wenbo Hu*, Yifan Xu*, Yi Li, Weiyue Li, Zeyuan Chen, Zhuowen Tu

AAAI 2024

[Paper] [Project Page] [Code] [🤗 Demo] [Slides]

VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models

Haoyi Qiu*, Wenbo Hu*, Zi-Yi Dou, Nanyun Peng

ACL 2024 (Findings)

[Paper] [Project Page] [Code]

Academic Service

Conference Reviewer for ICLR 2025, NeurIPS 2025, CVPR 2025, ICCV 2025, ACL 2025, EMNLP 2025, NAACL 2025

Journal Reviewer for TPAMI and International Journal of Robotics Research

Conference Reviewer for ACL 2024, EMNLP 2024, NAACL 2024, ACL Rolling Review

Journal Reviewer for IEEE Transactions on Multimedia

Teaching

Assistant, CSE151A: Intro to Machine Learning, UCSD (Winter 2023)

Awards

Best Paper Award at CVPR 2025 Foundation Models Meet Embodied Agents Workshop

UCLA CS Departmental Fellowship Award

Thanks to the template from Jon Barron
