Developing next-gen AI agents, exploring new modalities, and pioneering foundational learning
Next week, AI researchers from around the globe will converge at the 12th International Conference on Learning Representations (ICLR), set to take place May 7-11 in Vienna, Austria.
Raia Hadsell, Vice President of Research at Google DeepMind, will deliver a keynote reflecting on the last 20 years in the field, highlighting how lessons learned are shaping the future of AI for the benefit of humanity.
We’ll also offer live demonstrations showcasing how we bring our foundational research into reality, from the development of Robotics Transformers to the creation of toolkits and open-source models like Gemma.
Teams from across Google DeepMind will present more than 70 papers this year. Some research highlights:
Problem-solving agents and human-inspired approaches
Large language models (LLMs) are already revolutionizing advanced AI tools, yet their full potential remains untapped. For instance, LLM-based AI agents capable of taking effective actions could transform digital assistants into more helpful and intuitive AI tools.
AI assistants that follow natural language instructions to carry out web-based tasks on people’s behalf would be a huge timesaver. In an oral presentation, we introduce WebAgent, an LLM-driven agent that learns from self-experience to navigate and manage complex tasks on real-world websites.
To further enhance the general usefulness of LLMs, we focused on boosting their problem-solving skills. We demonstrate how we achieved this by equipping an LLM-based system with a traditionally human approach: producing and using “tools”. Separately, we present a training technique that ensures language models produce more consistently socially acceptable outputs. Our approach uses a sandbox rehearsal space that represents the values of society.
Pushing boundaries in vision and coding
Until recently, large AI models mostly focused on text and images, laying the groundwork for large-scale pattern recognition and data interpretation. Now, the field is progressing beyond these static realms to embrace the dynamics of real-world visual environments. As computing advances across the board, it is increasingly important that its underlying code is generated and optimized with maximum efficiency.
When you watch a video on a flat screen, you intuitively grasp the three-dimensional nature of the scene. Machines, however, struggle to emulate this ability without explicit supervision. We showcase our Dynamic Scene Transformer (DyST) model, which leverages real-world single-camera videos to extract 3D representations of objects in the scene and their movements. What’s more, DyST also enables the generation of novel versions of the same video, with user control over camera angles and content.
Emulating human cognitive strategies also makes for better AI code generators. When programmers write complex code, they typically “decompose” the task into simpler subtasks. With ExeDec, we introduce a novel code-generation method that uses decomposition to improve AI systems’ programming and generalization performance.
In a parallel spotlight paper, we explore the novel use of machine learning not only to generate code but also to optimize it, introducing a dataset for the robust benchmarking of code performance. Code optimization is challenging, requiring complex reasoning, and our dataset enables the exploration of a range of ML techniques. We demonstrate that the resulting learning strategies outperform human-crafted code optimizations.
Advancing foundational learning
Our research teams are tackling the big questions of AI – from exploring the essence of machine cognition to understanding how advanced AI models generalize – while also working to overcome key theoretical challenges.
For both humans and machines, causal reasoning and the ability to predict events are closely related concepts. In a spotlight presentation, we explore how reinforcement learning is affected by prediction-based training objectives, and draw parallels to changes in brain activity also linked to prediction.
When AI agents are able to generalize well to new scenarios, is it because they, like humans, have learned an underlying causal model of their world? This is a critical question in advanced AI. In an oral presentation, we reveal that such agents have indeed learned an approximate causal model of the processes that resulted in their training data, and discuss the deep implications.
Another critical question in AI is trust, which in part depends on how accurately models can estimate the uncertainty of their outputs – a crucial factor for reliable decision-making. We’ve made significant advances in uncertainty estimation within Bayesian deep learning, employing a simple and essentially cost-free method.
Finally, we explore game theory’s Nash equilibrium (NE) – a state in which no player benefits from unilaterally changing their strategy while the others maintain theirs. Beyond simple two-player games, even approximating a Nash equilibrium is computationally intractable, but in an oral presentation, we reveal new state-of-the-art approaches in negotiating deals from poker to auctions.
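To make the Nash equilibrium definition concrete, here is a minimal Python sketch that checks the condition by brute force in a tiny two-player game. It is an illustration of the definition only, not the approach from our paper: the function names and the Prisoner's Dilemma payoff values are assumptions chosen for the example, and it considers pure strategies only.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def is_pure_nash(payoff_a: Matrix, payoff_b: Matrix, row: int, col: int) -> bool:
    """Return True if (row, col) is a pure-strategy Nash equilibrium.

    payoff_a[r][c] is the row player's payoff and payoff_b[r][c] is the
    column player's payoff when the players choose actions r and c.
    """
    # The row player must not gain by switching rows while the column stays fixed.
    row_is_best = all(
        payoff_a[r][col] <= payoff_a[row][col] for r in range(len(payoff_a))
    )
    # The column player must not gain by switching columns while the row stays fixed.
    col_is_best = all(
        payoff_b[row][c] <= payoff_b[row][col] for c in range(len(payoff_b[row]))
    )
    return row_is_best and col_is_best


def pure_nash_equilibria(payoff_a: Matrix, payoff_b: Matrix) -> List[Tuple[int, int]]:
    """Enumerate every pure-strategy Nash equilibrium by checking each cell."""
    return [
        (r, c)
        for r in range(len(payoff_a))
        for c in range(len(payoff_a[0]))
        if is_pure_nash(payoff_a, payoff_b, r, c)
    ]


# Illustrative Prisoner's Dilemma payoffs (assumed values).
# Action 0 = cooperate, action 1 = defect.
PD_A: Matrix = [[-1, -3], [0, -2]]
PD_B: Matrix = [[-1, 0], [-3, -2]]

# Matching pennies: a zero-sum game with no pure-strategy equilibrium.
MP_A: Matrix = [[1, -1], [-1, 1]]
MP_B: Matrix = [[-1, 1], [1, -1]]

if __name__ == "__main__":
    print(pure_nash_equilibria(PD_A, PD_B))
    print(pure_nash_equilibria(MP_A, MP_B))
```

Enumerating every cell like this is only feasible for very small games. The number of joint strategies grows exponentially with the number of players, which is one reason approximation methods are needed in larger settings.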
Bringing together the AI community
We’re delighted to sponsor ICLR and support initiatives including Queer in AI and Women In Machine Learning. Such partnerships not only bolster research collaborations but also foster a vibrant, diverse community in AI and machine learning.
If you’re at ICLR, be sure to visit our booth and our Google Research colleagues next door. Discover our pioneering research, meet our teams hosting workshops, and engage with our experts presenting throughout the conference. We look forward to connecting with you!