Publications

Please refer to Google Scholar for the full publication list.

2026

  1. Preprint
    World Model for Robot Learning: A Comprehensive Survey
    Bohan Hou*, Gen Li*, Jindou Jia*, Tuo An*, Xinying Guo*, Sicong Leng, Haoran Geng, Yanjie Ze, Tatsuya Harada, Philip Torr, Oier Mees, Marc Pollefeys, Zhuang Liu, Jiajun Wu, Pieter Abbeel, Jitendra Malik, Yilun Du, and Jianfei Yang
    arXiv, 2026
  2. Preprint
    CompassAD: Intent-Driven 3D Affordance Grounding in Functionally Competing Objects
    Jingliang Li, Jindou Jia, Tuo An, Chuhao Zhou, Xiangyu Chen, Shilin Shan, Boyu Ma, Bofan Lyu, Gen Li, and Jianfei Yang
    arXiv, 2026
  3. Preprint
    Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding
    Tao Lin*, Gen Li*, Yilei Zhong, Yanwen Zou, Yuxin Du, Jiting Liu, Encheng Gu, and Bo Zhao
    arXiv, 2026
  4. RSS’26
    Action-to-Action Flow Matching
    Jindou Jia*, Gen Li*, Xiangyu Chen, Tuo An, Yuxuan Hu, Jingliang Li, Xinying Guo, and Jianfei Yang
    In Robotics: Science and Systems, 2026
  5. CVPR’26
    Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment
    Tao Lin, Yilei Zhong, Yuxin Du, Jingjing Zhang, Jiting Liu, Yinxinyu Chen, Encheng Gu, Ziyan Liu, Hongyi Cai, Yanwen Zou, Lixing Zou, Zhaoye Zhou, Gen Li, and Bo Zhao
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026
  6. CVPR’26
    PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation
    Yuanzhe Liu, Jingyuan Zhu, Yuchen Mo, Gen Li, Xu Cao, Jin Jin, Yifan Shen, Zhengyuan Li, Tianjiao Yu, Wenzhen Yuan, Fangqiang Ding, and Ismini Lourentzou
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026
  7. AAAI’26
    Mask2IV: Interaction-Centric Video Generation via Mask Trajectories
    Gen Li, Bo Zhao, Jianfei Yang, and Laura Sevilla-Lara
    In AAAI Conference on Artificial Intelligence, 2026

2025

  1. ACM MM’25
    Dual Enhancement on 3D Vision-Language Perception for Monocular 3D Visual Grounding
    Yuzhen Li, Min Liu, Yuan Bian, Xueping Wang, Zhaoyang Li, Gen Li, and Yaonan Wang
    In Proceedings of the 33rd ACM International Conference on Multimedia, 2025
  2. ICCV’25
    Learning Precise Affordances from Egocentric Videos for Robotic Manipulation
    Gen Li, Nikolaos Tsagkas, Jifei Song, Ruaridh Mon-Williams, Sethu Vijayakumar, Kun Shao, and Laura Sevilla-Lara
    In IEEE/CVF International Conference on Computer Vision, 2025
  3. ICCV’25
    Principles of Visual Tokens for Efficient Video Understanding
    Xinyue Hao, Gen Li, Shreyank N Gowda, Robert B Fisher, Jonathan Huang, Anurag Arnab, and Laura Sevilla-Lara
    In IEEE/CVF International Conference on Computer Vision, 2025
  4. IROS’25
    Resource-Efficient Affordance Grounding with Complementary Depth and Semantic Prompts
    Yizhou Huang, Fan Yang, Guoliang Zhu, Gen Li, Hao Shi, Yukun Zuo, Wenrui Chen, Zhiyong Li, and Kailun Yang
    In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025
  5. NMI
    Embodied Large Language Models Enable Robots to Complete Complex Tasks in Unpredictable Environments
    Ruaridh Mon-Williams, Gen Li, Ran Long, Wenqian Du, and Chris Lucas
    Nature Machine Intelligence, 2025

2024

  1. ECCVW’24
    Watt for what: Rethinking deep learning’s energy-performance relationship
    Shreyank N Gowda, Xinyue Hao, Gen Li, Shashank Narayana Gowda, Xiaobo Jin, and Laura Sevilla-Lara
    In European Conference on Computer Vision Workshop, 2024
  2. CVPR’24
    One-Shot Open Affordance Learning with Foundation Models
    Gen Li, Deqing Sun, Laura Sevilla-Lara, and Varun Jampani
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

  1. IJCNN’23
    Referenceless User Controllable Semantic Image Synthesis
    Jonghyun Kim, Gen Li, and Joongkyu Kim
    In International Joint Conference on Neural Networks, 2023
  2. CVPR’23
    LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding
    Gen Li, Varun Jampani, Deqing Sun, and Laura Sevilla-Lara
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2021

  1. CVPR’21
    Adaptive Prototype Learning and Allocation for Few-Shot Segmentation
    Gen Li, Varun Jampani, Laura Sevilla-Lara, Deqing Sun, Jonghyun Kim, and Joongkyu Kim
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021
  2. BMVC’21
    SuperStyleNet: Deep Image Synthesis with Superpixel Based Style Encoder
    Jonghyun Kim, Gen Li, Cheolkon Jung, and Joongkyu Kim
    In British Machine Vision Conference, 2021
  3. PR
    Weakly-supervised temporal attention 3D network for human action recognition
    Jonghyun Kim, Gen Li, Inyong Yun, Cheolkon Jung, and Joongkyu Kim
    Pattern Recognition, 2021
  4. Neurocom
    Edge and identity preserving network for face super-resolution
    Jonghyun Kim, Gen Li, Inyong Yun, Cheolkon Jung, and Joongkyu Kim
    Neurocomputing, 2021

2020

  1. Access
    Depth-Wise Asymmetric Bottleneck With Point-Wise Aggregation Decoder for Real-Time Semantic Segmentation in Urban Scenes
    Gen Li, Shenlu Jiang, Inyong Yun, Jonghyun Kim, and Joongkyu Kim
    IEEE Access, 2020

2019

  1. BMVC’19
    DABNet: Depth-wise asymmetric bottleneck for real-time semantic segmentation
    Gen Li and Joongkyu Kim
    In British Machine Vision Conference, 2019