Lex Fridman surveys the state of autonomous vehicle technology as of early 2019
Lex Fridman delivers a lecture on the current state of self-driving car technology, covering industry milestones, sensor approaches, safety debates, and the human experience of autonomy.
Summary
Lex Fridman, AI researcher and lecturer at MIT, presents a comprehensive overview of where autonomous vehicle technology stands at the start of 2019. He opens by framing the mission — reducing the roughly one fatality every 23 seconds caused by car crashes worldwide — before reviewing the major milestones of 2018, including Waymo's 10 million fully autonomous miles and Tesla Autopilot's one billion semi-autonomous miles. He examines the two fatalities of 2018 (the Uber pedestrian death in Tempe and the Tesla Autopilot fatality in Mountain View) and argues that public and media responses to individual incidents create disproportionate pressure that could slow progress. A central argument of the lecture is that autonomous vehicles will not be adopted primarily because they are safer or faster, but only if they create a genuinely better human experience — a dimension Fridman argues the industry largely neglects. He closes by framing autonomous vehicles as one of the defining challenges for artificial intelligence in the 21st century.
FULL TRANSCRIPT
The Mission: Why Autonomous Vehicles Matter
Lex Fridman: Today I'd like to talk about the state of the art of autonomous vehicles — how I see the landscape, how others see the landscape, what we're all excited about, the ways to solve the problem, and what to look forward to in 2019 as we hear from different perspectives and various leaders in the industry over the next few days and weeks.
The problem, the mission, the dream — the thing we're trying to solve. For many it may be about entrepreneurial possibilities, making money, and so on. But really it's about improving access to mobility, moving people around in a world where that ability doesn't exist for everyone, whether because of age or purely because of where you live. We want to increase the efficiency of how people move about — the ability to be productive in the time we spend in traffic and transportation. One of the most hated things in terms of stress and emotion, the thing in our lives that if we could just snap our fingers and remove, is traffic. The ability to convert that into efficiency, into a productive aspect, into a positive aspect of life.
And really the most important thing — at least for me, and for many of us working in this space — is to save lives, prevent crashes that lead to injuries, prevent crashes that lead to fatalities. Here's a counter: every 23 seconds, somebody in the world dies in a car crash. It should be sobering — and it is for me — something I think about every single day. You go to bed, you wake up, you work on all the deep learning levels, all the different papers being published, everything we're trying to push forward. It's really to save lives. At the beginning and at the end, that is the main goal.
2018 in Review: Milestones and Fatalities
So with that groundwork, that idea, that base — the mission we're all working towards from different ideas and different perspectives — I'd like to review what happened in 2018.
First, Waymo has done incredible work in deploying and testing their vehicles in various domains. In October they reached the mark of 10 million miles driven autonomously, which is an incredible accomplishment. It's truly a big step for fully autonomous vehicles in terms of deployment, and it's obviously growing by the day. We'll have Drago here from Waymo to talk about their work.
Then on the L2, semi-autonomous side — the mirror side of this equation — there's another incredible number that's perhaps less talked about: the one billion mile mark reached by Tesla in the semi-autonomous driving of Autopilot. Autopilot is a system that's able to control its position in the lane, center itself in the lane, and control longitudinal movement — following a vehicle when there's one in front, and so on. But the degree of its ability to do so is the critical thing. The ability to do so for many minutes at a time, even hours at a time, especially on highway driving — that's the critical thing. And the fact that they've reached one billion — with a B — miles is an incredible accomplishment.
From the machine learning perspective, all of that is data. And all of the Autopilot models are driven with the primary sensor being a camera — that's computer vision. Now, how does computer vision work in the modern day? Especially with the second iteration of Autopilot hardware, there's a neural network — a set of neural networks — behind it. That's super exciting. That is probably the largest deployment of neural networks in the world that has a direct impact on a human life, that is able to make life-critical decisions many times a second, over and over. That's incredible. You go from the step of image classification on ImageNet, sitting there with TensorFlow, very happy, achieving 99.3% accuracy with a state-of-the-art algorithm — and then you take a step toward a world where there's a human life. Your parents driving, your grandparents driving, your children riding in this system, and there's a neural network making the decision of whether they'll live. So that one billion mile mark is an incredible accomplishment.
On the sobering side, from various perspectives, there were two fatalities in March of 2018. One on the fully autonomous side — Uber in Tempe, Arizona, hitting a pedestrian and leading to a pedestrian fatality. And on the semi-autonomous side, with Tesla Autopilot, the third fatality that Tesla Autopilot has been associated with — the one in 2018 — in Mountain View, California, when a Tesla slammed into a divider, killing the driver.
The two aspects here that are sobering and really important to think about, as we discuss the progression and proliferation of autonomous vehicles in our world, are our response as a public — from the general public to the engineers to the media — and how we think about these fatalities. There's a disproportionate amount of attention given to them. That's something engineers have to think about: the bar is much higher on every level in terms of performance. In order to design successful autonomous vehicles, those vehicles will have to take risks. And when those risks don't pan out, if the public doesn't understand the general problem we're tackling and the goal of the mission, the risks that are taken can have a significant detrimental effect on progress in the autonomous vehicle space. That's something we really have to think about — that's our role as engineers.
Question from audience: Do we know the rate of fatalities per mile of vehicle driven?
Lex Fridman: That's the crudest level at which people think about safety. There are about 80 to 100 million miles driven in manually controlled cars per fatality — so one fatality per 80 to 100 million miles, depending on which numbers you look at. In the Tesla vehicle, for example, you could just take the one billion and divide by three. Now, this is apples and oranges, and that's something we're actually working on — making sure we compare correctly. We need to compare the aspects of manual miles that are directly comparable to Autopilot miles. Autopilot is used in a modern vehicle that's much safer than the general population of manually driven vehicles. Autopilot is driven primarily on highways. The kinds of people who drive with Autopilot — all these factors need to be considered when you compare the two.
But when you just look at the raw numbers, Tesla Autopilot appears three times safer than manually driven vehicles. That's not the right way to look at it, though. And for anyone who's taken a statistics class, three fatalities is not a large enough number from which to draw any significant conclusions. Nevertheless, that doesn't stop the media — the New York Times and everybody else — from responding to a single fatality. The PR and marketing aspects of these different companies are very sensitive to that, which is of course troubling and concerning for an engineer who wants to save lives. But it's something we have to think about.
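The raw comparison above amounts to a short calculation. The sketch below is only an illustration using the figures quoted in the answer (one billion Autopilot miles, three associated fatalities, and 80 to 100 million manual miles per fatality). The variable and function names are made up for this example, and the output inherits every apples-and-oranges caveat just described.

```python
# Figures quoted in the talk; all names here are illustrative.
AUTOPILOT_MILES = 1_000_000_000
AUTOPILOT_FATALITIES = 3
MANUAL_MILES_PER_FATALITY = (80_000_000, 100_000_000)  # low and high estimates


def miles_per_fatality(miles, fatalities):
    """Crude safety metric: miles driven per recorded fatality."""
    return miles / fatalities


autopilot_rate = miles_per_fatality(AUTOPILOT_MILES, AUTOPILOT_FATALITIES)

# Ratio of the Autopilot figure to each manual-driving estimate.
ratios = tuple(autopilot_rate / manual for manual in MANUAL_MILES_PER_FATALITY)

print(f"Autopilot: about {autopilot_rate / 1e6:.0f} million miles per fatality")
print(f"Raw ratio versus manual driving: {ratios[1]:.1f}x to {ratios[0]:.1f}x")
```

With only three events in the numerator, a single additional fatality would move the Autopilot figure from about 333 million to 250 million miles per fatality, which is one way to see why the sample is too small for firm conclusions.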
Autonomous Taxi Services and Public Deployments
Continuing the 2018 review: there have been a lot of announcements — or rather actual launches — of public testing of autonomous taxi services. Companies have been delivering real people from one location to another on public roads. Now there are a lot of caveats. In many cases it's very small scale, just a few vehicles. In most cases it's very low speed, in a constrained environment, in a constrained community, and almost always with a safety driver. There are a few exceptions for demonstration purposes, but there is always an actual driver in the seat.
Some of the brilliant folks representing these companies will speak in this course. Voyage is doing it in an isolated community, doing awesome work in retirement villages in Florida. Optimus Ride here in Boston is operating in the Union Point community. Drive.ai is in Texas. May Mobility is expanding beyond Detroit, though most operations are still in Detroit. Waymo has launched its service, Waymo One, which has gotten some publicity in Phoenix, Arizona. Nuro is doing zero-occupancy deliveries of groceries autonomously. We didn't say it has to be delivering humans; it's delivering groceries autonomously. Uber has quietly, or not so quietly, resumed its autonomous vehicle taxi service testing in Pittsburgh in a very careful, constrained way. Aptiv, after acquiring Karl Iagnemma and nuTonomy, has been doing extensive large-scale taxi service testing everywhere from Las Vegas to Boston to Pittsburgh and in Singapore. Aurora, whose founder, the former head of Tesla Autopilot, spoke here last time, is doing testing in San Francisco and Pittsburgh. And Cruise, from GM (Kyle will be here to talk about that), is doing testing in San Francisco, Arizona, and Michigan.
Predictions: When Will Autonomous Vehicles Arrive?
When we talk about predictions — and I'll talk about a few people's predictions — and when you yourself think about what it means when autonomous vehicles will be here, when the Uber you call will be autonomous and not populated by a driver, the thing we have to think about is how we define autonomous, what that experience looks like, and most importantly, we have to think about scale.
We here at MIT — our group, MIT Human-Centered Autonomous Vehicles — have a fully autonomous vehicle that people can get in if they'd like, and it will give them a ride in a particular location. But that's one vehicle. It's not a service, and it only works on particular roads. It's extremely constrained. In some ways it's not much different from most of the companies we were talking about today.
Scale here — there's a magic number, and I'm not sure what it is, but for the purpose of this conversation let's say it's 10,000 — where there's a meaningful deployment, when it's truly going beyond that prototype demo mode to where everything is under control, to where it's really touching the general population in a fundamental way. Scale is everything here, and let's say it starts at 10,000. Just to give you a reference: there are 46,000 active Uber drivers in New York City. So 10,000 feels like about 25 to 30% of the Uber drivers in New York City all of a sudden becoming passengers.
So the predictions. I'm not a marketing PR person, so I don't understand why everybody feels they have to make a prediction — but they all seem to. Major automakers have made predictions of when they'll be able to deploy autonomous vehicles. Tesla made a prediction in early 2017 that it would have autonomous vehicles in 2018. In 2018 they adjusted that prediction to 2019. Nissan, Honda, and Toyota have made predictions for 2020 under certain constraints in highway and urban environments. Hyundai and Volvo have said 2021. BMW and Ford — Ford saying at scale, a large-scale deployment — have said 2021. Chrysler has said 2021, and Daimler has said the early 2020s.
So there are predictions that are extremely optimistic, perhaps driven by the instinct that a company has to declare it's at the cutting edge of innovation. And then there are many of the leading engineers behind these teams, including Karl Iagnemma and Gill Pratt from MIT, who inject a little bit of caution and grounded thinking about how difficult it is to remove the human from the loop of automation.
Karl basically gives this analogy of an elevator: the elevator is fully autonomous, but there is still a button to call for help if something happens. That's how he thinks about autonomous vehicles. Even with a greater and greater degree of automation, there's still going to have to be a human in the loop, still going to be a way to contact a human to get help. And Gill Pratt at Toyota (they're making some announcements at CES) basically says that the human in the loop is the fundamental aspect we need to approach this problem with, and that removing the human from consideration is really, really far away. Gill, who is historically and currently one of the great roboticists in the world and who defined a lot of the DARPA challenges and much of our historical progress up to this point, represents that more cautious view.
So there's really the full spectrum. We can think of it as the Elon–Rodney spectrum of optimism versus pessimism. Elon Musk is extremely bold and optimistic about his predictions. I often connect with this kind of thinking because sometimes you have to believe the impossible is possible in order to make it happen. And then there is Rodney — also one of the great roboticists, the former head of CSAIL, the AI laboratory here — who is a little bit on the pessimistic side. For Elon, a fully autonomous vehicle will be here in 2019. For Rodney, fully autonomous vehicles are really beyond 2050. But he does believe that in the 2030s, a major city will be able to allocate a significant region where manual driving is fully banned — which is the way he believes autonomous vehicles could really proliferate, when you ban manually driven vehicles in certain parts. And then in the 2040s, 2045 or beyond, the majority of US cities will ban manually driven vehicles.
The quote from Elon Musk in 2017 is that his guess is that in probably 10 years it will be very unusual for cars to be built that are not fully autonomous. We also have to think about the long tail of the fact that many people drive cars that are 10 or 20 years old. So even when every car is built as fully autonomous, it's still going to take time for that dissipation of older vehicles to happen.
The Human Experience: Why Safety Alone Won't Drive Adoption
My own view, beyond predictions — and to take a little pause into the ridiculous and the fun to explain the view — yes, that is me playing guitar in our autonomous vehicle. Now the point of this ridiculous and embarrassing video — I should've never played it — for those of you born in the 1990s, that's classic rock.
The point I'm trying to make, beyond predictions, is that autonomous vehicles will not be adopted by human beings in the near term — in the next 10 to 15 years — because they're safer. Safety is not going to be the reason you adopt. They may be safer, but not so much safer that that alone drives adoption. It's not going to be because they get you to the location faster. Everything we see with autonomy is that they're going to be slower until the majority of the fleet is autonomous. They're cautious and therefore slower and therefore more annoying in the way we actually navigate the world. We take risks, we drive assertively, we go over the speed limit all the time. That is not how autonomous vehicles today operate. So they're not going to get us there faster, and for every promise and every hope that they're going to be cheaper, there's still significant investment going into them and there is not good economics in the near term for making them obviously significantly cheaper.
What I think is that Uber and Lyft took over the taxi service because of the human experience. In the same way, autonomy will only be adopted by human beings if it creates a better human experience — if there's something about the experience that you genuinely enjoy. The videos we're putting out show that in natural language communication, the interaction with the car, the ability of the car to sense everything you're doing — from the activity of the driver to the driver's attention — and being able to transfer control back and forth in a playful but also serious and personalized way, that's really the human experience. The efficiency and richness of the human experience — that is what we also need to solve.
That's something you have to think about, because many of the people who will be speaking in this class, and many of the people working on this problem, are not focused on the human experience. It's a kind of afterthought: once we solve the autonomous vehicle problem, it'll be fun as hell to be in that car. I believe you first have to make it fun as hell to be in the car, and then solve the autonomous vehicle problem jointly.
Levels of Autonomy: Two Real Categories
In the language we're talking about here, there are several levels of autonomy defined from level zero to level five, with increasing automation. Level two is when the driver is still responsible. Levels three, four, and five involve less and less driver responsibility, with parts of the driving where liability falls on the car.
But really, as far as I'm concerned, there are only two levels: human-centered autonomy and full autonomy. Human-centered means the human is responsible. Full autonomy means the car is responsible — on the legal side, the experience side, and the algorithmic side. That means full autonomy does not allow for teleoperation. It doesn't allow for a human to step in and remotely control the vehicle, because that means the human is still in the loop. It doesn't allow for a 10-second rule where it's fully autonomous but once it starts warning you, you have 10 seconds to take over. No — it's not fully autonomous if it cannot guarantee safety in any situation. If the driver doesn't respond in 10 seconds, the car has to be able to find safe harbor. It has to be able to pull off to the side of the road without hurting anybody else. That's the fully autonomous challenge.
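One way to picture the safe-harbor requirement is as a small decision rule. This is a hypothetical sketch, not a description of any deployed system: the function name, the action labels, and the use of the 10-second window as a constant are assumptions built on the example in the talk.

```python
TAKEOVER_WINDOW_S = 10.0  # the "10-second rule" from the talk, used as an example value


def next_action(seconds_since_warning, driver_responded):
    """Pick the vehicle's next action after it has warned the driver."""
    if driver_responded:
        return "hand_over_to_driver"
    if seconds_since_warning < TAKEOVER_WINDOW_S:
        return "keep_driving_and_keep_warning"
    # No response in time: the car itself must reach a safe stop.
    return "find_safe_harbor"


print(next_action(3.0, False))
print(next_action(12.0, False))
print(next_action(12.0, True))
```

The last branch is the one that matters for the definition above: responsibility never falls through to an absent human, and a system without that branch is human-centered rather than fully autonomous.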
Deployment Pathways for Autonomous Vehicles
How do we envision these two levels of automation proliferating in society and getting deployed at mass scale — the 10,000, 10 million, and beyond?
On the fully autonomous side, there are several different possibilities for how to deploy these vehicles. One is last-mile delivery of goods and services, like groceries — zero-occupancy vehicles delivering groceries or delivering human beings at the last mile. The last mile means slow-moving transport to the destination where most of the tricky driving along the way is done manually, and then the last-mile delivery in the urban environment is done by zero-occupancy autonomous vehicles.
Trucking on the highway, possibly with platooning, where a sequence of trucks follow each other — this is considered a pretty well-defined problem of highway driving with lanes well marked, well-mapped routes throughout the United States and globally. Specific urban routes — kind of like what a lot of these companies are working on — defining a taxi service and personalized public transport. There are certain pickup locations you're allowed to go to, certain drop-off locations, and that's it. It's like taking the train, but instead of getting on with 100 other people, you're getting in the car alone or with one other person.
Closed communities — something Oliver Cameron with Voyage is working on, and Optimus Ride — defining a particular community that you have a monopoly over, where you define the constraints, define the customer base, and then just deliver the vehicles. You map the entire road, you have slow-moving transport that gets people from A to B anywhere in that community.
And then there's the world of zero-occupancy ride-sharing delivery — the Uber that comes to you autonomously with nobody in it, and then you get in and drive it. Imagine a world where we have empty vehicles driving around, delivering themselves to you.
On the semi-autonomous side, thinking about a world where teleoperation plays a really crucial role — fully autonomous under certain constraints on the highway, but a human can always step in. High autonomy on the highway, kind of like what Tesla is working towards most recently — on-ramp to off-ramp. The driver is still responsible, liability-wise, and in terms of observing the vehicle, but the autonomy is at a high enough level that much of the highway driving could be done fully autonomously. And low autonomy unrestricted travel as an advanced driver assistance system — the Tesla, the Volvo S90, Super Cruise in the Cadillacs — all these L2 systems that are able to keep you in the lane for maybe 10 to 30% of the miles you drive, taking some of the stress of driving off.
And then there are some out-there ideas. The idea of connected vehicles — vehicle-to-vehicle communication and vehicle-to-infrastructure communication — enabling us to navigate intersections efficiently without stopping, removing all traffic lights. Without traffic lights, and with communication between the infrastructure and the vehicles, you can actually optimize traffic flow to significantly increase the load through a city.
There's the boring solution of tunnels under cities — layers of tunnels. Autonomous vehicles, by the design of the tunnel, constrain the problem to such a degree that the idea of autonomy is completely transformed. A car is able to transform itself into a mini train, a mini public transit entity, for a particular period of time. You get into that tunnel, you drive at 200 miles an hour — or not necessarily drive, but are driven at 200 miles an hour — and then you get out of the tunnel.
And of course there are flying cars, personalized flying car vehicles. Rodney, as I mentioned, does believe we'll have them in 2050. There are a lot of people seriously thinking about this problem. There's a level of autonomy obviously required for a regular person without a pilot's license to be able to take off and land. Making that experience accessible to regular people means there's going to be a significant amount of autonomy involved. One of the companies really seriously working on this is Uber, with Uber Elevate — Uber Air, I think it's called. The idea is that you would meet your vehicle not on the street but on a rooftop — you take an elevator, you meet them at the roof of a building. Many of the great solutions to the world's problems have been laughed at at some point. So let's not laugh too loud at these possibilities.
Who Will Deploy 10,000 Autonomous Vehicles First?
Back to the 10,000 vehicle threshold. Out of curiosity, I did a little public poll — 3,000 people responded. I asked who will be first to deploy 10,000 fully autonomous cars operating on public roads without a safety driver. Tesla got 57% of the vote, Waymo got 21%, someone else got 14%, and 8% — the curmudgeons and the engineers — said no one in the next 50 years will do it.
And again, in 1998 when Google came along, the leaders of the space were Ask Jeeves, Infoseek, Excite, Lycos, and Yahoo — all services I've used, and probably some people in this room have used. Google disrupted that space completely. So this poll shows the current leaders, but it's wide open. That's why there are a lot of autonomous vehicle companies. Some companies are taking advantage of the hype and the fact that there's a lot of investment in the space, but some companies — like some of the speakers visiting this course — are really trying to solve this problem. They want to be the next Google, the next billion, multi-billion, next trillion-dollar company by solving it.
Currently, Tesla with the semi-autonomous vehicle approach working towards full autonomy, and Waymo starting with full autonomy and working towards achieving scale, are the leaders in the space.
The DARPA Grand Challenge and the Difficulty of Driving
Given that ranking in 2019, let's take a quick step back to 2005 with the DARPA Challenge, when the story began. The race through the desert — when Stanley from Stanford won a race that really captivated people's imagination about what's possible. A lot of people said the autonomous vehicle problem was solved in 2005. The idea was — especially because in 2004 nobody finished that race, and in 2005 four cars finished — well, we cracked it. This is it. Some critics said that urban driving is really nothing comparable to desert driving, that the desert is very simple, there are no obstacles, and it's really a mechanical engineering problem, not a software problem, not a fundamentally autonomous driving problem as it would be delivered to consumers.
And of course in 2007, DARPA put together the Urban Challenge, and several teams finished it, with CMU's Boss winning. The thought at that point was: that's it, we're done. As Ernest Rutherford, the physicist, said, physics is the only real science; the rest is just stamp collecting. That was the idea with the DARPA Grand Challenge: we solved the fundamental problem of autonomy, and the rest is just for industry to figure out the details of how to make an app and a business out of it.
The underlying belief there is that driving is an easy task — that it's solvable, that what we do as human beings is pretty formalizable and pretty easy to solve with autonomy. The other idea is that humans are bad at driving. This is a common belief. Not me, not you, but everybody else — nobody in this room, but everybody else is a terrible driver. The intuition we have about our experience of traffic leads us to believe that humans are just really bad at driving.
From the human factors and psychology side, there's been over 70 years of research showing that humans are not able to maintain vigilance when monitoring a system. When you put a human in a room with a robot and say "watch that robot," they start texting about 15 seconds in. That's the fundamental psychology. There are thousands of papers on this. People tune out, they over-trust the system, they misinterpret the system, and they lose vigilance. Those are the three underlying beliefs.
It very well could be true — but what if it is not? We have to consider that it is not.
The driving task is easy — if you think that, and you think it's formalizable and solvable by autonomous vehicles, you have to solve this problem: the subtle vehicle-to-vehicle, vehicle-to-pedestrian nonverbal communication that happens in a dramatic sense but really happens in the subtle sense millions of times every single day in Boston. Subtle nonverbal communication between vehicles — you go, no, you go. You have to solve all the crazy road conditions where in a split second you have to make a decision — in snowy, icy weather, rain, limited visibility conditions, you have 100 to 200 milliseconds to make a decision. Your algorithm, based on the perception, has to make a control decision.
And then you have to deal with the nonverbal communication with pedestrians — these unreasonable, irrational creatures, us human beings. You have to not only understand the anticipated intent of the pedestrian's movement, anticipating their trajectory, you also have to assert yourself in a game-theoretic way. As crazy as it might sound, you have to take a risk. You have to take the risk that if you don't slow down, the pedestrian will slow down. Algorithmically we're afraid to do that. The idea that a pedestrian who is moving — we anticipate their trajectory based on the simple physics of their current velocity and momentum, they're going to keep going with some probability — but the fact that by us accelerating we might make that pedestrian stop is something that we have to incorporate into algorithms, and we don't today. We don't really know how to.
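The "simple physics" baseline mentioned above can be sketched as a constant-velocity extrapolation. This is a hypothetical illustration, not any company's planner: `predict_position` and the numbers are invented, and the sketch deliberately leaves out the interaction effect the talk highlights, namely that the pedestrian may change course in response to what the car does.

```python
def predict_position(position, velocity, horizon_s):
    """Extrapolate a 2D position assuming the velocity stays constant."""
    x, y = position
    vx, vy = velocity
    return (x + vx * horizon_s, y + vy * horizon_s)


# Hypothetical pedestrian: 2 m short of the lane center (y = 0),
# walking across the lane at 1.5 m/s.
pedestrian_now = (0.0, -2.0)
pedestrian_velocity = (0.0, 1.5)

predicted = predict_position(pedestrian_now, pedestrian_velocity, horizon_s=2.0)
print(predicted)  # (0.0, 1.0): the model says the pedestrian has crossed the lane center
```

A planner built only on this prediction would always yield. The game-theoretic version would need the pedestrian's velocity to depend on the car's own action, which is the part described above as unsolved.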
So if driving is easy, we have to solve that too. And of course there are the ethical dilemmas — from the moral machine to the more serious engineering aspects — and the unintended consequences that arise from having to formalize the objective function under which a planning algorithm operates. If the objective function is to maximize reward, you can slam into the wall over and over again and that's actually the way to optimize the reward. Those are the unintended consequences of an algorithm that has to be formalized to the objective function without a human in the loop.
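The reward-maximization failure described above can be reproduced in a toy setting. The rewards and policy names below are invented for illustration and are not taken from any real planner. The sketch only shows that a literally stated objective can rank an obviously undesirable behavior highest.

```python
# Toy episode; every number here is an assumption chosen for illustration.
EPISODE_STEPS = 20
FINISH_REWARD = 10  # paid once, when the agent completes the route
BONUS_REWARD = 1    # paid on every step the agent bumps a respawning bonus target


def total_reward(policy):
    """Total reward each hypothetical policy collects in one episode."""
    if policy == "finish_route":
        return FINISH_REWARD
    if policy == "loop_on_bonus":
        return BONUS_REWARD * EPISODE_STEPS
    raise ValueError(f"unknown policy: {policy}")


policies = ["finish_route", "loop_on_bonus"]
best_policy = max(policies, key=total_reward)
print(best_policy, total_reward(best_policy))  # loop_on_bonus 20
```

Nothing in the objective says "reach the destination", so the optimizer has no reason to prefer it. A human in the loop would notice the mismatch; the formalized objective cannot.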
Humans are bad at driving — if they're bad at anything, it's having a good intuition about what's hard and what's easy. The fact that we have 540 million years' worth of data on our visual perception system means we don't understand how impressive it is to be able to perceive and understand a scene in a split second, maintain context, maintain an understanding of the visual localization tasks, anticipate the physics of the scene, and so on. Humans don't give ourselves enough credit. We're incredible.
And the ones on the right in the robotics demonstrations are actually not fully autonomous — there's still some human in the loop, just with noisy, broken communication. Humans are incredible in terms of our ability to understand the world and act in it. The popular view, grounded in psychology, that humans and automation don't mix well — over-trust, misunderstanding, loss of vigilance — that's not an obvious fact. It happens a lot in the lab. Most of the experiments are actually in the lab. You put an undergrad or grad student in a lab and say "watch this screen and wait for the dot to appear" — they'll tune out immediately. But when it's your life and you're on the road, just you in the car, it's a different experience. It's not completely obvious that vigilance will be lost, and it's not completely obvious what the psychology, the attentional mechanism, the vigilance looks like when it's just you and the robot.
MIT Research: What Drivers Actually Do with Autopilot
One of the things we did was instrument 22 Teslas and observe people over a period of two years — what they actually do when they're driving with Autopilot, driving these systems. In red, manually controlled vehicles; in cyan, vehicle-controlled Autopilot. There are a lot of details here and we have a lot of presentations on this, but really the fundamentals are: they drive 34% of miles in Autopilot, and in 26,000 moments of transfer of control, they are always vigilant. There's not a moment in this dataset where they respond too late to a critical or challenging road situation.
Now the dataset is 22 vehicles — that's 0.1% or less of the full Tesla fleet that has Autopilot. But it's still an inkling. It's not obvious that it's not possible to build a system that works together with a human being. That system essentially looks like this: some percentage of the time — 90%, maybe less, maybe more — when it can solve the problem of autonomous driving, it solves it, and when it needs human help, it asks for help. That's the trade-off, that's the balance.
Two Technical Approaches: Vision vs. LIDAR
On the fully autonomous side, all the problems have to be solved exceptionally: from mapping and localization to scene perception, to control, to planning, to being able to find safe harbor at any moment, to external HMI communication with other pedestrians and vehicles in the scene, and then teleoperation, vehicle-to-vehicle, and vehicle-to-infrastructure communication. You have to solve those perfectly if you want to solve the fully autonomous problem, including all the crazy things that happen in driving.
If you approach the shared autonomy side — the semi-autonomous approach, where you're only responsible for a large percentage but not 100% of the driving — then you have to solve the human side: the human interaction, sensing what the driver is doing, collaborating and communicating with the driver, and the personalization aspect that learns with the driver.
Beyond the human side, looking out into the world, people trying to solve the fully autonomous vehicle problem really have two approaches to consider.
One approach is vision: cameras and deep learning. Collect a huge amount of data. Cameras have the highest resolution of information available. It's rich texture information, and there's a lot of it, which is exactly what neural networks love. To cover all the crazy edge cases, camera data, visible light data, is exactly the kind of data you need to collect in huge amounts so you can generalize over the countless edge cases that happen. It's also feasible in terms of cost, interest, and scale: all the major datasets come from visible light cameras. They're cheap. And whoever designed the simulation we're all living in made it such that our roads and our world are designed for human eyes. The lane markings are visual. Most of the road textures you use to navigate are made for human eyes.
The cons are that without a ton of data, and we don't know how much, vision-based systems are not accurate. They make errors, because driving is ultimately about 99.99999% accuracy, and it's really difficult to reach that level. And they're not explainable.
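A quick back-of-the-envelope calculation shows what a figure like 99.99999% implies. The sketch below assumes, purely for illustration, that the accuracy applies to each independent decision and that the system makes a fixed number of decisions per second. Neither assumption comes from the talk, and the 30 decisions per second in the usage note is a made-up rate.

```python
from fractions import Fraction


def error_rate(accuracy_percent: Fraction) -> Fraction:
    """Per-decision error probability implied by an accuracy percentage."""
    return 1 - accuracy_percent / 100


def mean_decisions_between_errors(accuracy_percent: Fraction) -> Fraction:
    """Expected number of decisions per error, assuming independence."""
    return 1 / error_rate(accuracy_percent)


def mean_hours_between_errors(
    accuracy_percent: Fraction, decisions_per_second: int
) -> float:
    """Expected driving hours per error at an assumed decision rate."""
    seconds = mean_decisions_between_errors(accuracy_percent) / decisions_per_second
    return float(seconds / 3600)
```

Exact fractions are used so the tiny error rate is not distorted by floating-point rounding. Under these assumptions, `error_rate(Fraction("99.99999"))` is one error in ten million decisions, and at a hypothetical 30 decisions per second that works out to roughly 93 hours of driving per error.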
The second approach is LIDAR: taking a very particular, constrained set of roads, mapping the heck out of them, understanding them fully under different weather conditions, and then using the most accurate sensors available. A suite of sensors, but really LIDAR at the forefront. Being able to localize yourself effectively. The pros are that it's consistent, especially when machine learning is not involved, it's reliable, and it's explainable. If it fails, you can understand why and account for those situations. The accuracy is higher. The cons of LIDAR are that it's expensive, and most of the approaches to perceiving the world primarily with LIDAR are not deep learning-based and therefore are not learning over time. There's a reason they're not deep learning-based: you need a lot of LIDAR data, and only a tiny percentage of cars in the world are equipped with LIDAR to collect that data.
Sensor Comparison: Radar, LIDAR, Camera, and Ultrasonic
Quickly running through the sensors: radar is like the offensive line in football — they actually do all the work and never get the credit. Radar is that. It's always there to catch, to actually do the detection in terms of obstacle avoidance — the most critical, safety-critical function. It's cheap, it does extremely well, and it does well in extreme weather. But it's low resolution, so it cannot stand on its own to achieve any kind of high autonomy.
On the LIDAR side, it's expensive, but it provides extremely accurate depth information — 3D point cloud information. Its resolution is much higher than radar, though still lower than visible light, and depending on the sensor, there's 360-degree visibility built in.
On the camera side, it's cheap, everybody has one, the resolution is extremely high in terms of the amount of information transferred per frame, and the scale of the number of vehicles equipped with cameras is humongous. So it's ripe for the application of deep learning. The challenge is it's noisy, it's bad at depth estimation, and it's not good in extreme weather.
If we use a comparison plot to look at these sensors: LIDAR works in the dark and in variable lighting conditions, has pretty good resolution, has pretty good range, but it's expensive, it's large, and it doesn't provide rich textural contrast information. It's also sensitive to fog and rain. Ultrasonic sensors catch a lot of those problems — they're better at detecting proximity, they have high resolution for objects that are close, which is why they're often used for parking, but they can also be integrated into the sensor fusion package for an autonomous vehicle. They complement radar well. Radar is cheap, tiny, detects speed, and has pretty good range, but has terrible resolution — very little information being provided. Cameras provide a lot of rich information, they're cheap, they're small, their range is actually the best of all the sensors, and they work in bright conditions — but they don't work in the dark or in extreme conditions, and they don't detect speed unless you do some tricky structure-from-motion work.
Here's where sensor fusion steps in: everybody works together to build an entire picture. If you look at the suite that Tesla is using (ultrasonic, radar, and camera) and compare it to just LIDAR, the combined suite is actually comparable to LIDAR. So those are the two sides of the comparison: the costly, non-machine-learning way of LIDAR, and the cheap but data-hungry, not-yet-explainable, not-yet-reliable vision-based approach.
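The comparison of a fused camera, radar, and ultrasonic suite against LIDAR alone can be sketched as a set union over qualitative strengths. The labels below are a coarse, hypothetical shorthand for the properties mentioned in the talk, not measured specifications, and the helper names are made up for illustration.

```python
from typing import Iterable, Set

# Qualitative strengths per sensor, transcribed loosely from the talk.
# These labels are illustrative shorthand, not sensor datasheet values.
SENSOR_STRENGTHS = {
    "lidar": {"works_in_dark", "resolution", "range", "accurate_depth"},
    "camera": {"resolution", "range", "rich_texture", "cheap", "small",
               "works_in_bright_light"},
    "radar": {"range", "detects_speed", "cheap", "small",
              "works_in_extreme_weather"},
    "ultrasonic": {"close_range_detection"},
}


def combined_strengths(sensors: Iterable[str]) -> Set[str]:
    """Union of strengths across a suite: a crude stand-in for fusion."""
    result: Set[str] = set()
    for name in sensors:
        result |= SENSOR_STRENGTHS[name]
    return result


def missing_from(suite: Iterable[str], reference: Iterable[str]) -> Set[str]:
    """Strengths the reference sensors have that the suite lacks."""
    return combined_strengths(reference) - combined_strengths(suite)
```

With these labels, `missing_from(["camera", "radar", "ultrasonic"], ["lidar"])` leaves only depth accuracy and operation in the dark as gaps, while the fused suite adds speed detection, rich texture, and low cost. Real sensor fusion combines measurements, not labels, so this only illustrates the shape of the argument.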
Of course, some will say they're trying to use both — but ultimately the question is who catches, who is the fail-safe. In the semi-autonomous way, when there's a camera-based method, the human is the fail-safe. In the fully autonomous mode — what Waymo is working on — the fail-safe is LIDAR, the fail-safe is maps. You can't rely on the human. But you know this road so well that if any of the sensors are confused, you have such good maps and such accurate sensors that the fundamental problem of obstacle avoidance — which is what safety is about — can be solved. The question is what kind of experience that creates.
The Road Ahead: AI's Defining Challenge
In the meantime, as people debate, try to make money, and start companies, there's just lots of data. The Ford F-150 is still the most popular car in America. Manually driven cars are still on the road. Every company is now releasing more and more semi-autonomous technology. That's all data. And what that boils down to is two paths walking towards each other: vision versus LIDAR, L2 versus L4, semi-autonomous versus fully autonomous.
Tesla, on the semi-autonomous front, has reached one billion miles. Waymo, the leader on the autonomous front, has reached 10 million miles. The pros and cons are as I've outlined them. The vision approach, the one machine learning researchers are obviously very excited about, fundamentally relies on huge data and deep learning. Consider the neural networks running inside Tesla and their new hardware: much as Google went from GPU to TPU, Tesla is moving from the Nvidia Drive PX2, a more general GPU-based system, to its own ASIC, with a ton of neural networks running on the car. That kind of path, which others are beginning to embrace, is really interesting to think about for machine learning engineers.
And then people who are more grounded, who really value safety and reliability and come from the automotive world, are thinking: machine learning is not explainable, it's difficult to work with, it's not reliable, so we need a sensor suite that is extremely reliable. Those are the two paths.
Question from audience: There are all kinds of things you need to perceive — stop signs, traffic lights, pedestrians, and so on. Some of them, if you hit them, it's a problem. Some of them are a bag flying through the air. They all have different visual characteristics and different characteristics for all the different sensors.
Lex Fridman: LIDAR can detect solid-body objects. Camera is better at detecting things like fog or smoke. These are interesting things that might look like an object to certain sensors and not to others. But the traffic light detection problem, luckily, with cameras, is pretty much solved at this point. That's the easy part. The hard part is when you have a green light and there's a drunk, drugged, drowsy, or distracted driver (the four Ds) who hits a pedestrian trying to cross. That's the hard part.
The road ahead for us as engineers, the science, is the thing I'm super excited about. The possibility of artificial intelligence having a huge impact lies in taking the step away from these large but still toy datasets, toy problems, and toy benchmarks (ImageNet classification, COCO, all the exciting deep RL stuff we'll talk about in future weeks, the game of Go and chess and so on), and taking those algorithms and putting them in cars where they can save people's lives and directly touch and impact our entire civilization. That's actually the defining problem for artificial intelligence in the 21st century: AI that touches people in a real way. And I think cars, autonomous vehicles, are one of the big ways that happens. We get to deal with everything from the psychology, the philosophy, and the sociology of it, how we associate with it and think about it, to the robotics problem and the perception problem. It's a fascinating space to explore, and we have many guest speakers exploring it in different ways. That's really exciting to see: how these people are trying to change the world.