

Visual Planning: Let's Think Only with Images

LLM seminar on the paper "Visual Planning: Let's Think Only with Images" by researchers from the University of Cambridge, UCL, and Google.

Title: Visual Planning: Let's Think Only with Images

Presenter: Anton Naumov

Abstract: Recent advancements in Large Language Models (LLMs) and their multimodal extensions (MLLMs) have substantially enhanced machine reasoning across diverse tasks. However, these models predominantly rely on pure text as the medium for both expressing and structuring reasoning, even when visual information is present. In this work, the authors argue that language may not always be the most natural or effective modality for reasoning, particularly in tasks involving spatial and geometrical information. Motivated by this, they propose a new paradigm, Visual Planning, which enables planning through purely visual representations, independent of text. In this paradigm, planning is executed via sequences of images that encode step-by-step inference in the visual domain, akin to how humans sketch or visualize future actions. They introduce a novel reinforcement learning framework, Visual Planning via Reinforcement Learning (VPRL), which uses Group Relative Policy Optimization (GRPO) to post-train large vision models, leading to substantial improvements in planning on a selection of representative visual navigation tasks: FrozenLake, Maze, and MiniBehavior. Their visual planning paradigm outperforms all other planning variants that conduct reasoning in the text-only space. Their results establish Visual Planning as a viable and promising alternative to language-based reasoning, opening new avenues for tasks that benefit from intuitive, image-based inference.
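
For attendees who have not met GRPO before, the core idea can be sketched in a few lines: several candidates are sampled for the same input, and each candidate's reward is normalized against the mean and standard deviation of its own group. The snippet below is only an illustration of that normalization step, not the authors' implementation. The function name, the `eps` constant, the use of the population standard deviation, and the reward values are all assumptions made for the example; the abstract does not describe the reward design used in VPRL.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    `rewards` holds one scalar per candidate sampled for the same input.
    `eps` is an assumed small constant that avoids division by zero when
    every candidate in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate next-step images generated
# from the same current state (values are made up for illustration).
rewards = [1.0, 0.0, 0.0, -1.0]
advantages = group_relative_advantages(rewards)
```

Candidates that score above their group's average get a positive advantage and are reinforced; those below average get a negative one. Because the baseline comes from the group itself, no separate value network is needed.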

Paper link:

Disclaimer: The presenter is not one of the paper's authors.

LLM seminar