Gen Li

Research Fellow @ MAE, NTU

I am a Postdoctoral Research Fellow at Nanyang Technological University (NTU), working with Jianfei Yang at the MARS Lab. I completed my PhD in Robotics and Autonomous Systems at the University of Edinburgh, where I was supervised by Laura Sevilla-Lara and co-supervised by Timothy Hospedales. My PhD was partially supported by Google DeepMind and Stability AI, where I collaborated with Deqing Sun and Varun Jampani.

🎯 My research aims to build reliable, efficient, and human-centered embodied intelligence that can perceive, reason, and act in the physical world, with a focus on:

Embodied AI: Robot Learning, VLA, World Models
Efficient AI: Data-, Parameter-, and Resource-Efficient Methodologies
Multimodal AI: Multimodal Foundation Models and Representation Learning
Human-Centered AI: Human-Robot Interaction, Human-to-Robot Skill Transfer

📢 If you are interested in these topics and would like to explore working together, please feel free to reach out via email.

News

May 08, 2026	🚩 Our latest survey, World Model for Robot Learning, is now out on arXiv!
Apr 27, 2026	🎉 A2A Flow Matching is accepted to RSS 2026!
Mar 24, 2026	📖 Invited to serve as an Area Chair for NeurIPS 2026!
Feb 21, 2026	🎉 Evo-1 and PALM are accepted to CVPR 2026!
Feb 11, 2026	📖 Invited to serve as an Area Chair for BMVC 2026!
Nov 08, 2025	🎉 Mask2IV is accepted to AAAI 2026!

Selected Publications

Preprint

World Model for Robot Learning: A Comprehensive Survey

Bohan Hou^*, Gen Li^*, Jindou Jia^*, Tuo An^*, Xinying Guo^*, Sicong Leng, Haoran Geng, Yanjie Ze, Tatsuya Harada, Philip Torr, Oier Mees, Marc Pollefeys, Zhuang Liu, Jiajun Wu, Pieter Abbeel, Jitendra Malik, Yilun Du, and Jianfei Yang

arXiv, 2026

@article{wm-survey,
  title = {World Model for Robot Learning: A Comprehensive Survey},
  author = {Hou, Bohan and Li, Gen and Jia, Jindou and An, Tuo and Guo, Xinying and Leng, Sicong and Geng, Haoran and Ze, Yanjie and Harada, Tatsuya and Torr, Philip and Mees, Oier and Pollefeys, Marc and Liu, Zhuang and Wu, Jiajun and Abbeel, Pieter and Malik, Jitendra and Du, Yilun and Yang, Jianfei},
  year = {2026},
  journal = {arXiv},
}

RSS’26

Action-to-Action Flow Matching

Jindou Jia^*, Gen Li^*, Xiangyu Chen, Tuo An, Yuxuan Hu, Jingliang Li, Xinying Guo, and Jianfei Yang

In Robotics: Science and Systems, 2026

@inproceedings{a2a,
  title = {Action-to-Action Flow Matching},
  author = {Jia, Jindou and Li, Gen and Chen, Xiangyu and An, Tuo and Hu, Yuxuan and Li, Jingliang and Guo, Xinying and Yang, Jianfei},
  year = {2026},
  booktitle = {Robotics: Science and Systems},
}

CVPR’26

Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment

Tao Lin, Yilei Zhong, Yuxin Du, Jingjing Zhang, Jiting Liu, Yinxinyu Chen, Encheng Gu, Ziyan Liu, Hongyi Cai, Yanwen Zou, Lixing Zou, Zhaoye Zhou, Gen Li^†, and Bo Zhao^†

In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026

@inproceedings{evo1,
  title = {Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment},
  author = {Lin, Tao and Zhong, Yilei and Du, Yuxin and Zhang, Jingjing and Liu, Jiting and Chen, Yinxinyu and Gu, Encheng and Liu, Ziyan and Cai, Hongyi and Zou, Yanwen and Zou, Lixing and Zhou, Zhaoye and Li, Gen and Zhao, Bo},
  year = {2026},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
}

AAAI’26

Mask2IV: Interaction-Centric Video Generation via Mask Trajectories

Gen Li, Bo Zhao, Jianfei Yang, and Laura Sevilla-Lara

In AAAI Conference on Artificial Intelligence, 2026

@inproceedings{Mask2IV,
  title = {Mask2IV: Interaction-Centric Video Generation via Mask Trajectories},
  author = {Li, Gen and Zhao, Bo and Yang, Jianfei and Sevilla-Lara, Laura},
  year = {2026},
  booktitle = {AAAI Conference on Artificial Intelligence},
}

ICCV’25

Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

Gen Li, Nikolaos Tsagkas, Jifei Song, Ruaridh Mon-Williams, Sethu Vijayakumar, Kun Shao, and Laura Sevilla-Lara

In IEEE/CVF International Conference on Computer Vision, 2025

@inproceedings{Aff-Grasp,
  title = {Learning Precise Affordances from Egocentric Videos for Robotic Manipulation},
  author = {Li, Gen and Tsagkas, Nikolaos and Song, Jifei and Mon-Williams, Ruaridh and Vijayakumar, Sethu and Shao, Kun and Sevilla-Lara, Laura},
  year = {2025},
  booktitle = {IEEE/CVF International Conference on Computer Vision},
}

NMI

Embodied Large Language Models Enable Robots to Complete Complex Tasks in Unpredictable Environments

Ruaridh Mon-Williams^†, Gen Li^†, Ran Long, Wenqian Du, and Chris Lucas

Nature Machine Intelligence, 2025

@article{ELLMER,
  title = {Embodied Large Language Models Enable Robots to Complete Complex Tasks in Unpredictable Environments},
  author = {Mon-Williams, Ruaridh and Li, Gen and Long, Ran and Du, Wenqian and Lucas, Chris},
  journal = {Nature Machine Intelligence},
  year = {2025},
}

CVPR’24

One-Shot Open Affordance Learning with Foundation Models

Gen Li, Deqing Sun, Laura Sevilla-Lara, and Varun Jampani

In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

@inproceedings{OOAL,
  title = {One-Shot Open Affordance Learning with Foundation Models},
  author = {Li, Gen and Sun, Deqing and Sevilla-Lara, Laura and Jampani, Varun},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year = {2024},
}

CVPR’23

LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding

Gen Li, Varun Jampani, Deqing Sun, and Laura Sevilla-Lara

In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

@inproceedings{LOCATE,
  title = {LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding},
  author = {Li, Gen and Jampani, Varun and Sun, Deqing and Sevilla-Lara, Laura},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year = {2023},
}

CVPR’21

Adaptive Prototype Learning and Allocation for Few-Shot Segmentation

Gen Li, Varun Jampani, Laura Sevilla-Lara, Deqing Sun, Jonghyun Kim, and Joongkyu Kim

In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021

@inproceedings{ASGNet,
  title = {Adaptive Prototype Learning and Allocation for Few-Shot Segmentation},
  author = {Li, Gen and Jampani, Varun and Sevilla-Lara, Laura and Sun, Deqing and Kim, Jonghyun and Kim, Joongkyu},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages = {8334--8343},
  year = {2021},
}
