Gen Li

Research Fellow @ MAE, NTU

On Westminster Bridge, London (2025)

I am a Postdoctoral Research Fellow in the School of Mechanical and Aerospace Engineering at Nanyang Technological University (NTU), working with Jianfei Yang at the MARS Lab. I completed my PhD in Robotics and Autonomous Systems at the University of Edinburgh, where I was supervised by Laura Sevilla-Lara and co-supervised by Timothy Hospedales. During my PhD, I was fortunate to be partially supported by Google DeepMind and Stability AI, where I collaborated with Deqing Sun and Varun Jampani.

🎯 My research aims to build intelligent physical agents that can perceive, reason, and act in real-world environments with human-like capability and high efficiency. This spans topics including:

  • Embodied AI: Robot Learning, VLA, RL
  • Multimodal AI: VLMs, MLLMs
  • Generative AI: Image / Video Generation, World Models
  • Efficient AI: Transfer Learning, Learning under Limited Data & Supervision
  • Human-Centered AI: Human-Robot Interaction, Human-to-Robot Learning

📢 If you are interested in these topics and would like to explore working together, please feel free to reach out via email.

news

May 08, 2026 🚩 Our latest survey, World Model for Robot Learning, is now out on arXiv!
Apr 27, 2026 🎉 A2A-FM is accepted to RSS 2026!
Mar 24, 2026 📖 Invited to serve as an Area Chair for NeurIPS 2026!
Feb 21, 2026 🎉 Evo-1 and PALM are accepted to CVPR 2026!
Feb 11, 2026 📖 Invited to serve as an Area Chair for BMVC 2026!
Nov 08, 2025 🎉 Mask2IV has been accepted to AAAI 2026.
Sep 15, 2025 💂 Started my new position as a Postdoctoral Research Fellow at NTU!

selected publications

  1. Preprint
    World Model for Robot Learning: A Comprehensive Survey
    Bohan Hou*, Gen Li*, Jindou Jia*, Tuo An*, Xinying Guo*, and 13 more authors
    arXiv, 2026
  2. RSS’26
    Action-to-Action Flow Matching
    Jindou Jia*, Gen Li*, Xiangyu Chen, Tuo An, Yuxuan Hu, and 3 more authors
    In Robotics: Science and Systems, 2026
  3. CVPR’26
    Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment
    Tao Lin, Yilei Zhong, Yuxin Du, Jingjing Zhang, Jiting Liu, and 9 more authors
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026
  4. AAAI’26
    Mask2IV: Interaction-Centric Video Generation via Mask Trajectories
    Gen Li, Bo Zhao, Jianfei Yang, and Laura Sevilla-Lara
    In AAAI Conference on Artificial Intelligence, 2026
  5. ICCV’25
    Learning Precise Affordances from Egocentric Videos for Robotic Manipulation
    Gen Li, Nikolaos Tsagkas, Jifei Song, Ruaridh Mon-Williams, Sethu Vijayakumar, and 2 more authors
    In IEEE/CVF International Conference on Computer Vision, 2025
  6. NMI
    Embodied Large Language Models Enable Robots to Complete Complex Tasks in Unpredictable Environments
    Ruaridh Mon-Williams, Gen Li, Ran Long, Wenqian Du, and Chris Lucas
    Nature Machine Intelligence, 2025
  7. CVPR’24
    One-Shot Open Affordance Learning with Foundation Models
    Gen Li, Deqing Sun, Laura Sevilla-Lara, and Varun Jampani
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
  8. CVPR’23
    LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding
    Gen Li, Varun Jampani, Deqing Sun, and Laura Sevilla-Lara
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
  9. CVPR’21
    Adaptive Prototype Learning and Allocation for Few-Shot Segmentation
    Gen Li, Varun Jampani, Laura Sevilla-Lara, Deqing Sun, Jonghyun Kim, and 1 more author
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021