This year’s CoRL was the biggest CoRL yet, with over 900 attendees, 11 workshops, and almost 200 accepted papers. While there were a lot of cool new ideas (see this great set of notes for an overview of technical content), one particular debate seemed to be front-and-center: “is training a large neural network on a very large dataset a feasible way to solve robotics?”.

Of course, some version of this question has been on researchers’ minds for a few years now1. However, in the aftermath of the unprecedented success of ChatGPT and other large-scale ‘foundation models’ on tasks that were thought to be unsolvable just a few years ago, the question was especially topical at this year’s CoRL. Developing a general-purpose robot, one that can competently and robustly execute a wide variety of tasks of interest in any home or office environment that humans can, has been perhaps the holy grail of robotics since the inception of the field. And given the recent progress of foundation models, it seems possible that scaling existing network architectures by training them on very large datasets might actually be the key to that grail.

Given how timely and significant this debate seems to be, I thought it might be useful to write a post centered around it. My main goal here is to try to present the different sides of the argument as I heard them, without bias towards any side. Almost all the content is taken directly from talks I attended or conversations I had with fellow attendees. My hope is that this serves to deepen people’s understanding around the debate, and maybe even inspire future research ideas and directions. If you read this and feel that I’ve misrepresented any of the ideas of positions below, then do write to me or post a clarifying comment below.

With all that established, let’s dive in!


I want to start by presenting the main arguments I heard in favor of scaling as a solution to robotics.

Why scaling might work

As you might imagine, there were a number of arguments against scaling as a practical solution to robotics. Interestingly, almost no one directly disputes that this approach could work in theory. Instead, most arguments fall into one of two buckets: (1) arguing that this approach is simply impractical, and (2) arguing that even if it does kind of work, it won’t really ‘solve’ robotics.

Why scaling might not work

It’s impractical

Even if it works as well as in CV/NLP, it won’t solve robotics

Okay, now that we’ve sketched out all the main points on both sides of the debate, I want to spend some time diving into a few related points. Many of these are responses to the above points on the ‘against’ side, and some of them are proposals for directions to explore to help overcome the issues raised.

We can probably deploy learning-based approaches robustly

One point that gets brought up a lot against learning-based approaches is the lack of theoretical guarantees. At the time of this writing, we know very little about neural network theory: we don’t really know why they learn well, and more importantly, we don’t have any guarantees on what values they will output in different situations. On the other hand, most classical control and planning approaches that are widely used in robotics have various theoretical guarantees built-in. These are generally quite useful when certifying that systems are safe.

However, there seemed to be general consensus amongst a number of CoRL speakers that this point is perhaps given more significance than it should. Sergey Levine pointed out that most of the guarantees from controls aren’t really that useful for a number of real-world tasks we’re interested in. As he put it: “self-driving car companies aren’t worried about controlling the car to drive in a straight line, but rather about a situation in which someone paints a sky onto the back of a truck and drives in front of the car”, thereby confusing the perception system. Moreover, Scott Kuindersma of Boston Dynamics talked about how they’re deploying RL-based controllers on their robots in production, and are able to get the confidence and guarantees they need via rigorous simulation and real-world testing. Overall, I got the sense that while people feel that guarantees are important, and encouraged researchers to keep trying to study them, they don’t think that the lack of guarantees for learning-based systems means that they cannot be deployed robustly.

What if we strive to deploy Human-in-the-Loop systems?

In one of the organized debates, Emo Todorov pointed out that existing successful ML systems, like Codex and ChatGPT, work well only because a human interacts with and sanitizes their output. Consider the case of coding with Codex: it isn’t intended to directly produce runnable, bug-free code, but rather to act as an intelligent autocomplete for programmers, thereby making the overall human-machine team more productive than either alone. In this way, these models don’t have to achieve the 99.X% performance threshold, because a human can help correct any issues during deployment. As Emo put it: “humans are forgiving, physics is not”.

Chelsea Finn responded to this by largely agreeing with Emo. She strongly agreed that all successfully-deployed and useful ML systems have humans in the loop, and so this is likely the setting that deployed robot learning systems will need to operate in as well. Of course, having a human operate in the loop with a robot isn’t as straightforward as in other domains, since having a human and robot inhabit the same space introduces potential safety hazards. However, it’s a useful setting to think about, especially if it can help address issues brought on by the 99.X% problem.

Maybe we don’t need to collect that much real world data for scaling

A number of people at the conference were thinking about creative ways to overcome the real-world data bottleneck without actually collecting more real world data. Quite a few of these people argued that fast, realistic simulators could be vital here, and there were a number of works that explored creative ways to train robot policies in simulation and then transfer them to the real world. Another set of people argued that we can leverage existing vision, language, and video data and then just ‘sprinkle in’ some robotics data. Google’s recent RT-2 model showed how taking a large model trained on internet scale vision and language data, and then just fine-tuning it on a much smaller set robotics data can produce impressive performance on robotics tasks. Perhaps through a combination of simulation and pretraining on general vision and language data, we won’t actually have to collect too much real-world robotics data to get scaling to work well for robotics tasks.

Maybe combining classical and learning-based approaches can give us the best of both worlds

As with any debate, there were quite a few people advocating the middle path. Scott Kuindersma of Boston Dynamics titled one of his talks “Let’s all just be friends: model-based control helps learning (and vice versa)”. Throughout his talk, and the subsequent debates, his strong belief that in the short to medium term, the best path towards reliable real-world systems involves combining learning with classical approaches. In her keynote speech for the conference, Andrea Thomaz talked about how such a hybrid system - using learning for perception and a few skills, and classical SLAM and path-planning for the rest - is what powers a real-world robot that’s deployed in tens of hospital systems in Texas (and growing!). Several papers explored how classical controls and planning, together with learning-based approaches can enable much more capability than any system on its own. Overall, most people seemed to argue that this ‘middle path’ is extremely promising, especially in the short to medium term, but perhaps in the long-term either pure learning or an entirely different set of approaches might be best.

What can/should we take away from all this?

If you’ve read this far, chances are that you’re interested in some set of takeaways/conclusions. Perhaps you’re thinking “this is all very interesting, but what does all this mean for what we as a community should do? What research problems should I try to tackle?”. Fortunately for you, there seemed to be a number of interesting suggestions that had some consensus on this.

We should pursue the direction of trying to just scale up learning with very large datasets

Despite the various arguments against scaling solving robotics outright, most people seem to agree that scaling in robot learning is a promising direction to be investigated. Even if it doesn’t fully solve robotics, it could lead to a significant amount of progress on a number of hard problems we’ve been stuck on for a while. Additionally, as Russ Tedrake pointed out, pursuing this direction carefully could yield useful insights about the general robotics problem, as well as current learning algorithms and why they work so well.

We should also pursue other existing directions

Even the most vocal proponents of the scaling approach were clear that they don’t think everyone should be working on this. It’s likely a bad idea for the entire robot learning community to put its eggs in the same basket, especially given all the reasons to believe scaling won’t fully solve robotics. Classical robotics techniques have gotten us quite far, and led to many successful and reliable deployments: pushing forward on them or integrating them with learning techniques might be the right way forward, especially in the short to medium terms.

We should focus more on real-world mobile manipulation and easy-to-use systems

Vincent Vanhoucke made an observation that most papers at CoRL this year were limited to tabletop manipulation settings. While there are plenty of hard tabletop problems, things generally get a lot more complicated when the robot - and consequently its camera view - moves. Vincent speculated that it’s easy for the community to fall into a local minimum where we make a lot of progress that’s specific to the tabletop setting and therefore not generalizable. A similar thing could happen if we work predominantly in simulation. Avoiding these local minima by working on real-world mobile manipulation seems like a good idea.

Separately, Sergey Levine observed that a big reason why LLM’s have seen so much excitement and adoption is because they’r extremely easy to use: especially by non-experts. One doesn’t have to know about the details of training an LLM, or perform any tough setup, to prompt and use these models for their own tasks. Most robot learning approaches are currently far from this. They often require significant knowledge of their inner workings to use, and involve very significant amouns of setup. Perhaps thinking more about how to make robot learning systems easier to use and widely applicable could help improve adoption and potentially scalability of these approaches.

We should be more forthright about things that don’t work

There seemed to be a broadly-held complaint that many robot learning approaches don’t adequately report negative results, and this leads to a lot of unnecessary repeated effort. Additionally, perhaps patterns might emerge from consistent failures of things that we expect to work but don’t actually work well, and this could yield novel insight into learning algorithms. There is currently no good incentive for researchers to report such negative results in papers, but most people seemed to be in favor of designing one.

We should try to do something totally new

There were a few people who pointed out that all current approaches - be they learning-based or classical - are unsatisfying in a number of ways. There seem to be a number of drawbacks with each of them, and it’s very conceivable that there is a completely different set of approaches that ultimately solves robotics. Given this, it seems useful to try think outside the box. After all, every one of the current approaches that’s part of the debate was only made possible because the few researchers that introduced them dared to think against the popular grain of their times.


Acknowledgements: Huge thanks to Tom Silver and Leslie Kaelbling for providing helpful comments, suggestions, and encouragement on a previous draft of this post.

  1. In fact, this was the topic of a popular debate hosted at a workshop on the first day; many of the points in this post were inspired by the conversation during that debate.