Sertac Karaman (MIT) on Motion Planning in a Complex World - MIT Self-Driving Cars | Lex Fridman Transcript

Polished transcript · Lex Fridman · 13 Dec 2017 · 1h 2m · @martymcfly

MIT lecture on motion planning algorithms and autonomous vehicle systems by Sertaç Karaman

Sertaç Karaman, MIT professor in the Aero-Astro department, delivers a guest lecture in Lex Fridman's MIT self-driving cars course.

Summary

Sertaç Karaman traces his career from the DARPA Urban Challenge through his current research on autonomous vehicles and motion planning algorithms. He explains the fundamental flaw he discovered in the widely-used RRT (Rapidly-exploring Random Tree) algorithm — that it fails to converge to optimal solutions — and describes his doctoral contribution, RRT*, which guarantees asymptotic optimality. He recounts MIT's experience in the 2007 DARPA Urban Challenge in detail, including a collision with Cornell's vehicle caused by a map-refresh bug, and reflects on how that academic effort seeded much of today's autonomous vehicle industry. He closes with broader observations on how autonomous vehicles, combined with electrification and ride-sharing business models, could fundamentally reshape urban transportation and reduce its cost to near zero.

Key Takeaways

  • RRT has a provable convergence flaw. The standard Rapidly-exploring Random Tree algorithm does not converge to optimal paths — early trajectories constrain the search space and trap the planner in suboptimal loops. Karaman's doctoral thesis introduced RRT*, which corrects this with minimal additional computation while guaranteeing asymptotic optimality.
  • The DARPA Urban Challenge seeded the modern autonomous vehicle industry. All six finishers were university teams. The Google self-driving car program drew heavily on the sensing architecture developed for the challenge, and MIT's "non-serious" entry later became Cruise Automation, acquired by GM for approximately one billion dollars.
  • Sensor proliferation was MIT's strategy to compensate for inexperience. The team mounted five cameras, sixteen radars, twelve planar laser scanners, one 3D laser scanner, and one GPS unit on a Land Rover LR3, requiring a 40-core server rack, an onboard generator, and a rooftop air conditioner. The Velodyne 3D laser scanner was identified as the single most critical sensor for completing the challenge.
  • The Cornell collision illustrates a fundamental inference gap in robotic planning. Cornell's vehicle became stuck because a perceived obstacle on its roof prevented map updates; when MIT's car passed close enough to trigger a refresh, Cornell's car moved into MIT's path. A human driver would have stopped, investigated, or steered wide — behaviors that remain computationally difficult for autonomous systems operating at speed in complex environments.
  • Controller compression makes high-frequency autonomous control tractable. A six-dimensional drone controller naively requires petabytes of state-action mappings. Singular value decomposition-style compression reduces this to roughly two megabytes, a saving Karaman puts at ten orders of magnitude, enabling kilohertz-rate control from a lookup table computed offline on a supercomputer in approximately five minutes.
  • The autonomous vehicle problem is not purely technical. Karaman argues that law, insurance, regulation, urban architecture, and business model innovation are equally important barriers. He suggests that combining autonomy, electrification, and ride-sharing could reduce point-to-point urban transportation costs to under one dollar, potentially transforming city design and human mobility behavior.
  • Camera-only autonomous navigation is likely achievable within a decade. Karaman expresses confidence that a vehicle navigating with cameras alone — using deep learning combined with geometric model-based techniques — will be demonstrated within roughly ten years, while dismissing near-term claims as premature.
  • Vehicle-to-vehicle communication enables qualitatively new behaviors at intersections. Fully coordinated autonomous intersection management — where vehicles pass through at high speed with minimal spacing — requires reliable inter-vehicle communication and is currently far from deployment, but simpler applications such as cooperative lane-following could appear much sooner.

Full Transcript

    Introduction and Speaker Background

    Lex Fridman: Our speaker is Sertaç Karaman. He is a professor here at MIT in the Aero-Astro department. He builds and studies autonomous vehicles that move on land and in the air — that includes ones that have 18 wheels and two wheels and everything in between, robots that move fast and aggressively, and robots that move slowly and safely. He takes both the formal optimization-based approach and the data-driven deep learning approach to robotics. He's a mentor to me and many other researchers here at MIT and beyond. And while he is one of the leading experts in the world in building autonomous vehicles, for the nerds out there, he still programs — he programs on a Kinesis keyboard and uses Emacs, which is how you know he's legit. Please give a warm welcome to Sertaç Karaman.

    Sertaç Karaman: Thanks a lot. I really had the pleasure to work with Lex for some time, and it seems like this class is something he and the TAs have put together as an amazing course. I'm really happy to be here.

    He gave me this title — "past, present, future of motion planning" — or something like that. Hopefully that's not quite exactly what you were expecting. I took a whole bunch of slides from different talks and put them together, and I'm hoping to go through as much as I can and tell you some of the interesting things happening in this domain, and touch upon motion planning at some point.

    A starting point would be to tell you a little bit about my background. It has been exactly a decade, probably to the day, since I shook hands as a graduate student and joined the DARPA Urban Challenge team. We worked through it with a number of people, some of whom are in the audience. Back when we were doing these kinds of things, it was an academic project. You can look at the DARPA Urban Challenge teams and you'll recognize they're all university teams, or at least all the finishers were. In ten years it went from an academic project to the thing that's going to change the world. I hope to give you a bit of history and then some thoughts on that as well.

    Graduate Research: Autonomous Forklift and Early Motion Planning

    Sertaç Karaman: So I started graduate school with this. We built these vehicles that I'm going to talk to you about a little bit. This was our entry to the DARPA Urban Challenge — a Land Rover LR3 that we made autonomous, that navigated through that course, and it was one of the six finishers. A number of my friends went out and did their own careers. We stayed here at MIT and built a number of other autonomous vehicles.

    Let me show you one thing that we were doing — I was the motion planning lead for this autonomous forklift. It was a forklift that you could literally take a megaphone and speak to. You could say "forklift, go to X, Y, Z" and it would go to that location. Here it's trying to go to receiving, which happens to be an area where trucks pull up with pallets so that you can pick those pallets up and put them back. It has a front camera, it looks through that camera, and it beams that camera image to a handheld tablet device made by Nokia — back in the day there was a company called Nokia that made phones and handheld devices. So you could see what it's seeing. You could circle something and the thing would scan it and take a look at it. It'll scan through the pallet, it'll pick it up.

    One thing I would like to show you is that once that's done, you can also talk to the tablet. The tablet would recognize your voice and then command the robot to do that kind of thing. This was before autonomous cars, before iPhone, before Alexa, before Siri. So I spent a couple of years doing this type of project, which really shaped my PhD thesis.

    The RRT Algorithm and Its Fundamental Flaw

    Sertaç Karaman: Later, when I started as a faculty member, I also worked on a number of things. Throughout these projects I focused mainly on motion planning. The one algorithm I was working on was called the Rapidly-exploring Random Tree — RRT. The idea is quite simple. You're starting in the middle of a space — there's an orange dot that you're starting from, you want to go to the magenta goal region, there are red obstacles, and you want to find a path that starts from the initial condition and goes to the goal. That's the very basic motion planning problem.

    It turns out this problem is computationally pretty challenging, especially as the number of dimensions increases. This example is two-dimensional, but if you increase the number of dimensions you can prove that any complete algorithm (meaning any algorithm that returns a solution if one exists and returns failure otherwise) will scale exponentially in computation time. So at some point you're going to run out of memory or time.

    The algorithm I was working on, RRT, has a simple idea: you just land a bunch of samples. Every time you put down a random sample, you connect it to the nearest node in a tree of trajectories that you're building. In this way you rapidly explore the state space to find a whole bunch of paths. Some of these paths may reach the goal, and that's the path you pick. It's just sampling the environment, trying to build this set of trajectories that don't collide with obstacles. If a trajectory hits an obstacle you just delete it and move on with other samples, and you build this kind of tree.
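
    Editor's note: the sampling loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used on the forklift or the Urban Challenge vehicle. The square workspace, the circular obstacles, the fixed step length, and the names rrt and hits_obstacle are all assumptions made for the example.

```python
import math
import random

def hits_obstacle(p, q, obstacles, steps=10):
    """True if the straight segment p->q touches any circular obstacle (cx, cy, r)."""
    for i in range(steps + 1):
        t = i / steps
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        if any(math.hypot(x - cx, y - cy) <= r for cx, cy, r in obstacles):
            return True
    return False

def rrt(start, goal, goal_radius, obstacles,
        n_samples=5000, step=0.5, size=10.0, seed=0):
    """Grow a tree from start; return a path into the goal disc, or None."""
    rng = random.Random(seed)
    nodes = [start]      # tree vertices
    parent = [None]      # parent[i] is the index of node i's parent
    for _ in range(n_samples):
        # 1. Put down a random sample in the (assumed) square workspace.
        sample = (rng.uniform(0.0, size), rng.uniform(0.0, size))
        # 2. Find the nearest node already in the tree.
        near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        d = math.dist(nodes[near], sample)
        if d == 0.0:
            continue
        # 3. Step from that node toward the sample, at most `step` far.
        s = min(1.0, step / d)
        new = (nodes[near][0] + s * (sample[0] - nodes[near][0]),
               nodes[near][1] + s * (sample[1] - nodes[near][1]))
        # 4. Discard the extension if it collides; otherwise add it to the tree.
        if hits_obstacle(nodes[near], new, obstacles):
            continue
        nodes.append(new)
        parent.append(near)
        # 5. Stop at the first node inside the goal region and walk back to the root.
        if math.dist(new, goal) <= goal_radius:
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None
```

    A call such as rrt((1.0, 1.0), (9.0, 9.0), 1.0, [(5.0, 5.0, 1.5)]) returns a list of waypoints from the start into the goal disc. Note that each new node keeps the parent it was first given, which is the property the lecture goes on to criticize.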

    It's an algorithm that's pretty widely used and it goes well beyond these simple cases. For example, in our Urban Challenge entry we were using this algorithm. You're seeing the algorithm in action: we're trying to park at a location during what DARPA called the NQE, the National Qualification Event. You can see a whole bunch of cars that our vehicle is seeing, generating this map, where red is obstacles and black is drivable region. It's going to try to park into it and then unpark. You're seeing something hairy here, which is the set of trajectories generated by the RRT algorithm. It's trying to unpark, go there, generating trajectories and picking the best one. We used this algorithm throughout the race and it worked.

    When we switched to the forklift platform, we started working on this and noticed a problem. The forklift tries to park in front of a truck, and it finds this trajectory. At some point it discovers there's an obstacle and it finds this looping trajectory, and it never gets out of that loop. You would think that since it's trying to minimize path length it would be easy to come up with something that just turns left and aligns, but it turns out that once you have that loop, even if you add more samples, you're stuck with it and you would never improve that trajectory.

    Back in the day, Professor Teller — who passed away unfortunately a couple of years ago, but he really pushed me — was telling me, "This doesn't work. Every time it just makes this loop right in front of the Army generals who are the sponsors and it just looks ridiculous. You need to fix this." Trying to find the fix for it, we realized that the algorithm actually has some fundamental flaws.

    Specifically, we were able to write down a formal proof that the RRT algorithm fails to converge to optimal solutions. You would think that if you add more samples you will get better and better trajectories, but it turns out that the first few trajectories you found constrain you — they close off the space that you want to search and you're stuck with bad trajectories. This almost always happens. Sometimes you're lucky and your bad trajectory is good enough, but most of the time it's pretty bad.

    RRT* and Asymptotic Optimality

    Sertaç Karaman: We were able to come up with another algorithm that we called RRT* (RRT-star), which does just a little bit more work but guarantees asymptotic optimality, meaning it will always converge to optimal solutions. The computational difference between the two is very small. If you were to run them side by side, RRT* would look like this: it's just looking at the paths locally and correcting them locally, just a little bit. That little bit of correction is enough to converge to the optimal trajectory.
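
    Editor's note: the "local correction" of RRT* can be sketched as two extra steps per sample: pick the cheapest parent among nearby nodes, then rewire those nearby nodes through the new node when that shortens their path. The Python below is a simplified illustration under stated assumptions, not the published algorithm verbatim. In particular it uses a fixed neighbor radius, whereas the published analysis uses a radius that shrinks as the tree grows, and the names rrt_star and segment_free are made up for the example.

```python
import math
import random

def segment_free(p, q, obstacles, steps=10):
    """True if the segment p->q stays clear of every circular obstacle (cx, cy, r)."""
    for i in range(steps + 1):
        t = i / steps
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        if any(math.hypot(x - cx, y - cy) <= r for cx, cy, r in obstacles):
            return False
    return True

def rrt_star(start, goal, goal_radius, obstacles,
             n_samples=1500, step=0.5, radius=1.0, size=10.0, seed=0):
    """Return (path, cost) of the cheapest branch ending in the goal disc."""
    rng = random.Random(seed)
    nodes, parent = [start], [None]

    def cost(i):
        # Path length from the root to node i, recomputed by walking up the tree.
        total = 0.0
        while parent[i] is not None:
            total += math.dist(nodes[i], nodes[parent[i]])
            i = parent[i]
        return total

    for _ in range(n_samples):
        sample = (rng.uniform(0.0, size), rng.uniform(0.0, size))
        near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        d = math.dist(nodes[near], sample)
        if d == 0.0:
            continue
        s = min(1.0, step / d)
        new = (nodes[near][0] + s * (sample[0] - nodes[near][0]),
               nodes[near][1] + s * (sample[1] - nodes[near][1]))
        if not segment_free(nodes[near], new, obstacles):
            continue
        # Nearby nodes that can be connected to the new point without collision.
        neighbors = [i for i in range(len(nodes))
                     if math.dist(nodes[i], new) <= radius
                     and segment_free(nodes[i], new, obstacles)]
        if near not in neighbors:
            neighbors.append(near)
        # Extra step 1: choose the parent that gives the cheapest path to `new`.
        best = min(neighbors, key=lambda i: cost(i) + math.dist(nodes[i], new))
        nodes.append(new)
        parent.append(best)
        k = len(nodes) - 1
        new_cost = cost(k)
        # Extra step 2: rewire neighbors through `new` when that is cheaper.
        for j in neighbors:
            if new_cost + math.dist(new, nodes[j]) < cost(j):
                parent[j] = k

    in_goal = [i for i in range(len(nodes)) if math.dist(nodes[i], goal) <= goal_radius]
    if not in_goal:
        return None, math.inf
    end = min(in_goal, key=cost)
    path, i = [], end
    while i is not None:
        path.append(nodes[i])
        i = parent[i]
    return path[::-1], cost(end)
```

    Unlike plain RRT, this version keeps sampling after the goal is first reached, so later samples can still shorten the returned path. Recomputing costs by walking up the tree is slow but keeps the sketch short and avoids stale cached costs after rewiring.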

    That turned out to be my doctoral thesis, back in 2011. We applied it to a number of things. Imagine a race car coming into a turn — it turns very quickly, generates these trajectories. The right thing to do is to slow down a little bit, start skimming one end of the road, then start speeding up and go as fast as possible so that you hit the other end of the road and complete the turn. These kinds of things come out naturally from the algorithm. You don't have to program them in — you just run the algorithm and this is the best trajectory it finds. It would be impossible to get something like this from RRT. We applied it to a number of other robots as well — PR2-type robots, autonomous forklifts — and got good results.

    Current Research: Agile Vehicles and High-Performance Control

    Sertaç Karaman: So that gives you a bit of an idea of my background — my graduate school experience and the PhD. Let me tell you quickly what my research group does. We do a lot of things, in a fortunate and unfortunate way, so it's hard to find the focus sometimes, admittedly. I usually tell people that we work on autonomous vehicles. The problem is quite interesting both at the vehicle level — meaning how you're going to build these autonomous vehicles individually — and also interesting at the systems level, because most autonomous vehicles are most valuable when you put them into a system where they can work together.

    Let me give you some examples. A system of autonomous vehicles would be, for example, the Kiva system scenario. Nowadays when you buy something from Amazon, the way it's packed is that books are brought by robots to a picker, and the picker puts them into the same box and sends them to you. This is done by 500 autonomous vehicles, for example. Another example is that ports around the world are working completely with autonomous vehicles and cranes. If you project a little bit forward, you can think about drone delivery systems where drones may not have enough battery and have to relay packages to one another, so you need to build a system. Or if you have autonomous cars, maybe it's best to use them in an Uber-like scenario — autonomous taxis that work together.

    On the vehicle level, we're interested in all aspects of perception and planning. The challenges are either computational complexity (it's very hard computationally) or that the system becomes very complex. We're recently motivated by really fast and agile vehicles. For example, imagine there's a drone flying and you want to catch it in flight. The police in the Netherlands have to deal with people flying UAVs around, and they somehow want to take them down. You can't shoot at them, so people train eagles and things like that. We thought it would be great to actually build these types of robots.

    Once you start doing these kinds of things, you wonder how much you can push the boundaries of very agile vehicles and systems. You can see a falcon diving for prey — a goose — at the last split second. If you look at the scene from a 20-hertz camera, this is what you would see. They are definitely much faster and do very complicated planning and maneuvering.

    In the research group we look at a number of different perception problems where you have multi-agent systems with ultra-high-rate cameras. For example, we have drones with 200-hertz cameras, and you're trying to understand the person you're tracking — their dynamics, their intentions. On the control level you're trying to pull off really complicated maneuvers like the one you've seen with the race car, but in real time at roughly a kilohertz. How can you do these types of things? We use a lot of high-performance computing. For example, the drones that we have actually have GPUs on them — they fly teraflop computers to be able to do these kinds of things.

    Here's an example of a GPU-equipped drone just passing through a window. These are controllers that we compute on supercomputers and deploy. On the perception side, for example, we're looking at things like visual odometry — you can just have a camera and look through the world from the camera and try to understand your own position. We have certain algorithms to pick the features just right so that you can do these things with just about ten features, making them computationally very efficient.

    Controller Compression: From Petabytes to Megabytes

    Sertaç Karaman: The question was: what do you mean by computing the controllers? Controllers are actually pretty complicated objects. You have a drone — suppose it has six degrees of freedom. Six-dimensional space is very large. Suppose you discretize every dimension with 200 points. 200 to the sixth power would be thousands of trillions. If you were to write one byte for every point in the state space — recording what action to take at that position and orientation — it would make 2.5 petabytes of controller. It's pretty large.

    But it would be very surprising from an information-theoretic standpoint if it really required thousands of trillions of parameters to describe. How complicated is it really? Millions maybe, but trillions? So what we do is start from very simple controllers (for example, "do nothing"), compress them the way you would compress data, and keep working on the compressed versions. The compressed version only grows to something like two megabytes. That's probably essentially what you would need rather than three terabytes.

    We use singular value decomposition-type techniques to do the compression. You may have done the same thing with images — if you compress an image with JPEG you save an order of magnitude. If you compress video you save two to three orders of magnitude, because video is three-dimensional. As you increase the dimensions there's more to compress. When you compress this way it saves ten orders of magnitude, which honestly is no surprise when you think about it. We compute them in about five minutes, which gives you a lookup table that's two megabytes. You put it on the drone so that you can quickly execute it. That lookup table essentially gives you kilohertz-rate control.

    Autonomous Intersection Management

    Sertaç Karaman: On the systems domain, let me show you maybe the most interesting — possibly the most crazy — thing off of my hard disk. Imagine you have a whole bunch of vehicles coming to an intersection. Suppose they're fully autonomous. How would you make it so that they would pass through the intersection as fast as possible?

    If you were to really utilize algorithms that do that, here is what it would look like. You would have vehicles coming in — it looks like they're getting very lucky, but really what's happening is that they're just speeding up and slowing down just enough to avoid one another. You can actually sit down and do some math and try to understand, given the dynamics — your acceleration and deceleration limits — how fast you can push these things. Maybe it doesn't immediately apply to self-driving cars, but certainly you can use it in warehouses and things like that, which would actually improve operations quite a bit.
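
    Editor's note: one very simple way to picture vehicles "speeding up and slowing down just enough" is a first-come, first-served slot schedule for the conflict zone. The sketch below is an assumption made for illustration, not the algorithm behind the simulation shown in the lecture: it only ever delays vehicles, ignores acceleration limits, and treats the intersection as a single shared zone. The names schedule_crossings and approach_speed are hypothetical.

```python
import math

def schedule_crossings(arrivals, headway):
    """Assign each vehicle a crossing time.

    arrivals: {vehicle_id: earliest time in seconds it could reach the zone}
    headway:  minimum gap in seconds between consecutive crossings
    """
    order = sorted(arrivals, key=lambda v: arrivals[v])
    slots = {}
    last = -math.inf
    for v in order:
        # Cross as early as possible, but never sooner than `headway` after the previous vehicle.
        slots[v] = max(arrivals[v], last + headway)
        last = slots[v]
    return slots

def approach_speed(distance, now, slot):
    """Constant speed that covers `distance` metres exactly by time `slot`."""
    return distance / (slot - now)
```

    For example, with arrivals of 0.0, 0.5 and 3.0 seconds and a one-second headway, the second vehicle is pushed back to 1.0 seconds and the third keeps its own arrival time. A vehicle that is 50 metres away with a slot five seconds from now would hold 10 metres per second.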

    The question is whether anyone is working on robustness aspects of distributed control. That's a very good point. We have looked at things from the theoretical perspective. It turns out that even in this case there's something like a critical density. Below the critical density things are very simple — you're going to be robust, you're going to be able to find paths and execute them. Above the critical density things are very hard. It's very fragile — if something fails the whole system can crash into one another. This is no surprise — it's the physics of many systems. It's the same thing as, say, heating a substance: there's a critical temperature above which it looks different and below which it looks like a liquid. You can use the same kind of theoretical arguments to come up with these types of results.

    We are far away from this kind of thing in practice. The main problem is partly control — we don't fully understand the control aspects — but we also don't trust our sensors. So probably more of the research needed for implementation is on the sensor side.

    Overview of Current Projects

    Sertaç Karaman: We have been doing a number of other projects on autonomous vehicles. We have an autonomous tricycle — that may sound funny, but it's actually pretty hard to test with autonomous vehicles. We currently have five of these and we're hoping to build thirty, so that we can put them in an enclosed robotic area in Taiwan where they're just driving around collecting data that we can feed into deep learning algorithms.

    We also have, in an ABB warehouse, one of these warehouse robots. It's supposed to be very easy to interact with: imagine a warehouse robot where you can just talk to it, tell it what to do, show it, or hop on it and do it yourself. I'm also a PI together with Daniela Rus on MIT's effort with Stanford and Toyota to build safer vehicles. And from golf carts we've moved into doing electric vehicles in the MIT-Singapore partnership, working on integrating a lot of electric vehicles together to make them work more nicely. We've also been looking into an autonomous wheelchair in that project.

    My group works on a number of other projects in this domain. Admittedly, my group is a bit more on the theory side as well — maybe half the group is theory-oriented, the other half is more experimental. We have quite a spectrum: we have mathematicians — people who don't have any engineering degrees, like one postdoc who is a mathematician by training — and we have undergraduates and graduate students whose undergraduate degrees are from mathematics. On the other hand, we have mechanical engineers who actually build things.

    The DARPA Urban Challenge in Detail

    Sertaç Karaman: Let me tell you a bit more about our DARPA Urban Challenge effort, so I can tell you a little more about how we implemented these motion planning algorithms.

    I'm sure many of you have heard that the DARPA Urban Challenge kind of kick-started all the autonomous vehicle work that's been going on. Let me introduce it a little bit. This was the third in a series of DARPA Grand Challenges. The idea was that you would take a street-legal vehicle, instrument it with sensors and computers, and enter this race to drive 60 miles in under six hours in urban traffic — with other vehicles driving around as well. The race was proposed back in 2006 and staged in November 2007. It was pretty hard — you would have to do a lot of different things like U-turns, gate points, you'd have to be careful with stop signs, and so on. If you won, they would give you two million dollars. Eighty-nine teams entered the race.

    MIT had a non-serious entry — I guess the team that later turned into Cruise Automation, which GM ended up buying for a billion dollars. So the non-serious one turned out to be the serious one. Our team had mainly MIT faculty, postdocs, and students — about eight full-time graduate students, roughly. I was one of them. We had a lot of support from Draper Laboratory, mainly on system integration and vehicle integration, and some vehicle engineering support from Olin College. Olin College came in and packaged the vehicle nicely after we had a first version with cables coming out everywhere.

    We took a Land Rover LR3. Land Rover was one of the sponsors, and it is also conveniently a pretty big vehicle. We put a drive-by-wire system on it. This is a drive-by-wire system made for people who are disabled, a little joystick-type device so you can actuate gas and brake. It came in very handy. We needed to put a lot of sensors on it.

    I wish this wasn't recorded, but our situation was the following: there were a lot of other teams out there who were very experienced — they had done the other Grand Challenges before. We were not as experienced. I would say our team was talented but not experienced. We had a lot of sponsors, so we had a lot of money. Our strategy turned into: if it fits on the vehicle, let's put it on the vehicle and we'll figure out a way to use it. With that mindset we ended up with five cameras, sixteen radars, twelve planar laser scanners, one 3D laser scanner, and one GPS/IMU unit.

    This was a lot of sensors. They generated a lot of data that you had to process, so we had to buy a 40-CPU, 40-gigabyte-RAM Quanta computer that would normally run on a Google server rack — essentially ten computers that we had to put in the vehicle. We used to joke that this was the fastest mobile computer on campus, both in terms of speed and compute power. This requires a lot of energy, so we put in an internal generator. That generates a lot of heat, so we put an air conditioner on top. That became our vehicle.

    One thing to note is that while the number of sensors and computers was large, the sensor suite was very similar to the other teams that finished. One important sensor was the 3D laser scanner. This is the thing that sits on top of the vehicle — it looks like a Kentucky Fried Chicken bucket — and it has sixty-four lasers that measure range. Those sixty-four lasers are stacked up on a vertical plane, and that plane rotates at 15 hertz, giving you a 3D point cloud. If you drive with it in Harvard Square, here is what the raw data looks like — colored by height. You can easily pick up a bus, a building, a person, a bunch of others. That gives you great data already.
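
    Editor's note: turning the rotating stack of range measurements into a 3D point cloud is a spherical-to-Cartesian conversion. The sketch below assumes one range reading per beam per azimuth step and a list of beam elevation angles. The data layout and the names beam_to_point and sweep_to_cloud are illustrative assumptions, not the sensor's actual interface.

```python
import math

def beam_to_point(range_m, azimuth_rad, elevation_rad):
    """Convert one range reading to (x, y, z) in the sensor frame."""
    horizontal = range_m * math.cos(elevation_rad)
    return (horizontal * math.cos(azimuth_rad),
            horizontal * math.sin(azimuth_rad),
            range_m * math.sin(elevation_rad))

def sweep_to_cloud(ranges, elevations):
    """Build a point cloud from one full revolution.

    ranges[k][b] is the range of beam b at azimuth step k, or None for no return.
    elevations[b] is the fixed vertical angle of beam b in radians.
    """
    n_azimuth = len(ranges)
    cloud = []
    for k in range(n_azimuth):
        azimuth = 2.0 * math.pi * k / n_azimuth
        for b, elevation in enumerate(elevations):
            r = ranges[k][b]
            if r is not None:
                cloud.append(beam_to_point(r, azimuth, elevation))
    return cloud
```

    With sixty-four entries in elevations and one call per revolution, this would produce the kind of height-colorable cloud described above, fifteen times per second at the rotation rate mentioned in the lecture.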

    This sensor is made by a company called Velodyne. It came pretty much just in time for the Urban Challenge. My guess is that if you didn't have this 3D point cloud it would be pretty hard to complete that challenge. There was only one team that didn't have it and still completed the course — they had a 2D laser scanner that was turning, essentially building their own Velodyne.

    We also had twelve planar laser scanners to cover the blind spots of the vehicle. The 3D scanner is on top, so you're not seeing the area nearby. We had five in a push-broom configuration looking down and seven on the skirts. With those you're seeing the curbs and vehicles that are very close. We had sixteen radars. Radars are great because they can see very far: laser scanners would see 70 meters; radars would see twice as much. The problem is they have a very narrow field of view, so we needed sixteen of them to cover 270 degrees around the vehicle. You can park somewhere and see a whole bunch of other vehicles coming through, which helps quite a bit.

    Finally, we had five cameras. We were using cameras to look at lane markings. I think we were actually the only finishing team using cameras for any purpose. The other vehicles were just working with the laser scanner. We were mainly working with the laser scanner but picking up lane markings with the cameras.

    The Software Stack

    Sertaç Karaman: The algorithmic stack gets pretty complicated. By the end of the race we probably had on the order of hundreds of thousands of lines of C code — maybe around two hundred thousand. The forklift had about half a million lines of code; this was a bit less. We had around a hundred processes running and sending messages to one another on that forty-core system.

    The software diagram was huge, so I simplified it. You have sensors, you get that data, you process it through perception algorithms, you generate a map of the environment close to the robot, and you have a three-tier stack. You have a navigator — much like your Google Maps — that computes a route to get to your next goal, which may be kilometers away, and it also gives you the next waypoint that you should hit, hopefully within your grid map. There's a motion planner that looks at the map, sees all the obstacles, sees the goal point, and finds a path to get to the goal point using RRT. Once that trajectory is computed, it's passed to a controller that actually steers the vehicle.
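
    Editor's note: the three tiers described above can be pictured as three functions that hand their output down the stack. The sketch below is a toy under loud assumptions: the navigator is a textbook shortest-path search over a hand-written road graph, the local planner is a straight-line placeholder standing in for RRT, and the controller is a clamped proportional steering law. None of this is the vehicle's actual code, and all names are hypothetical.

```python
import heapq
import math

def navigator(graph, start, goal):
    """Tier 1: shortest route over a road graph {node: {neighbor: metres}}."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, math.inf):
            continue                      # stale heap entry
        for nxt, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None
    route = [goal]
    while route[-1] != start:
        route.append(prev[route[-1]])
    return route[::-1]

def local_plan(position, waypoint, n_points=5):
    """Tier 2 placeholder: evenly spaced points on the straight line to the waypoint.

    A real planner would search the obstacle map here (RRT in the lecture).
    """
    return [(position[0] + (i / n_points) * (waypoint[0] - position[0]),
             position[1] + (i / n_points) * (waypoint[1] - position[1]))
            for i in range(1, n_points + 1)]

def steering_command(pose, target, gain=1.0, max_steer=0.5):
    """Tier 3: steer proportionally to the heading error, clamped to +/- max_steer.

    pose is (x, y, heading in radians); target is (x, y).
    """
    desired = math.atan2(target[1] - pose[1], target[0] - pose[0])
    error = (desired - pose[2] + math.pi) % (2.0 * math.pi) - math.pi
    return max(-max_steer, min(max_steer, gain * error))
```

    In use, the navigator's next route node becomes the waypoint for local_plan, and the first point of the local plan becomes the target for steering_command. Each tier runs at its own rate, with the controller the fastest.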

    The Cornell Collision

    Sertaç Karaman: It doesn't always go well. Let me show you what doesn't work.

    Here's an example of a case that we got into. What's happening is we arrive at an intersection and there's another car, Cornell's car, and they're just sitting right in the middle of the intersection and they don't seem to be moving. I think they had been sitting there for a few minutes before we even arrived. DARPA decided they should let us go, and we're probably going to overtake. It was going to be an important moment in robotics history: for the first time a robot overtakes another robot while the other robot is stuck.

    Here's how we're seeing things from inside our car. Our car is here, wants to go there. RRT generates trajectories. There's an object here — that's the car we're seeing, not all of it but a fraction of it. We were actually able to turn around it. Now we've seen the whole car, the new goal point is further away, we're regenerating trajectories — looks great.

    It turns out this car was stuck for a reason. We wrote a paper together with them. My understanding is that they think that an obstacle was perceived on top of the car. The way their algorithm was written, it generates a trajectory and asks if the trajectory is collision-free or not. The collision checker doesn't say "this part of the trajectory is in collision" — it just says yes or no. They also had a little piece where they only update the map when there's new information from the sensors. If nothing is moving, there's no need to update. So they ended up getting stuck on this perceived obstacle and not refreshing their map — until we moved right next to them. Then they refreshed and said, "Oh, I'm actually not sitting around an obstacle — that was an error." So the next time a path comes going forward, it says "this is a great path, go forward." That happens right when we're passing.

    If you look at this blob, as I play it the blob starts to move — they are going in a direction that we are going. At some point our car realizes there's no path, a collision is imminent, and there's nothing to do about it. It generates a wide circle around itself, which basically means we are headed to a crash — we're just going to slam the brakes and hope nothing too bad happens. The collision happens. DARPA pulled the Cornell car back, started us again, and we finished. They finished as well. So both teams finished.

    You can see some of the things that are a little bit hard. If you yourself were driving and arrived at an intersection with a car sitting there, you would probably stop, get out, and ask if anything was wrong. Even if you didn't do that, you would still steer wide — you probably wouldn't get as close to that car as we did. There are some problems at the inference level that we do without even thinking, and they're actually quite hard for these types of cars to do, especially if you're going fast in a complicated environment and not expecting things. You make that decision — like looking at the way a person walks on the sidewalk and deciding whether they might walk into the street — and it's actually a pretty complicated thing for a robot to do.

    Race Results and the Google Connection

    Sertaç Karaman: So the results of the race: 89 teams entered, six finished, and we were one of the finishers. CMU came first and got the two-million-dollar check. Stanford came second and got a one-million-dollar check. Virginia Tech came third and got half a million. We came fourth and didn't get anything, but we got a lot of experience and it was great to be a part of it.

    One note is that the Google car that you may have heard a lot about was essentially a spinoff from this race. If you look at the Google car you will see that the sensing package is very similar — it's very laser-scanner-oriented, has a couple of radars, and works somewhat with cameras but not so much. Essentially Google engineered what we and all the other teams built independently, and engineered it for ten years. That's the kind of thing they utilize nowadays.

    There's also the whole Tesla brand of camera-based cars and deep learning that's coming in. Ten years ago we knew about deep learning and so on, but it just didn't work. The moment somebody figured out how to do it on a GPU, it started working pretty well.

    Reflections on Collaboration and Competition

    Sertaç Karaman: The question was about collaboration. Back in the day we were really collaborative. It's very interesting that we actually wrote a paper with Cornell about our collision, just to teach the whole community why these kinds of things happen. But nowadays everybody is just doing their own thing and nothing gets shared.

    There's a quick answer and a broader answer. The quick answer is that it became important — there's a lot of money invested and people are expecting returns, and that affects the environment. That definitely drove it. I think we're still trying to work on it in academia and trying to publish papers, but a lot of people are worried about competing with these huge companies.

    The broader answer is that this became the norm. Fifty years ago, or even a century ago, the top companies of the day, with Bell Labs as an example, would form labs, publish in the top science journals, and be very open. Nowadays the big companies prefer secretive labs. Microsoft was probably the last big company to do it openly; the Googles and Apples don't do that anymore. Sometimes competition is good, though: when people don't know what the others are doing and want to compete, that makes everyone better and better.

    Simulation and Testing

    Sertaç Karaman: The question was about simulation — were your simulated environment and your development environment separate or integrated? They were very integrated. Right now I think you would do things differently.

    We had one platform where you could run the whole software stack. If you started up a simulator it would simulate all the sensor data and everything. If you didn't start a simulator, the processes would be waiting for the data to come in from a real vehicle. So you could put it on a real vehicle or run it in simulation.
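
    A minimal sketch of that arrangement, assuming a message queue sits between the sensors and the rest of the stack. The names simulator, planner, and run_in_simulation, the message format, and the 8.5 m braking threshold are all hypothetical; the actual MIT software is not described in that detail here. The point is that the planner just waits for data and does not care whether a simulator or a real vehicle produced it.

```python
import queue
import threading
from typing import List, Optional


def simulator(out: queue.Queue, steps: int) -> None:
    """Hypothetical simulator: publishes fake range readings, then a stop marker."""
    for t in range(steps):
        out.put({"t": t, "range_m": 10.0 - t})
    out.put(None)  # end-of-run marker


def planner(inbox: queue.Queue, log: List[str]) -> None:
    """Consumes sensor messages; the code is the same for simulated or real data."""
    while True:
        msg: Optional[dict] = inbox.get()  # blocks until data arrives
        if msg is None:
            break
        log.append("brake" if msg["range_m"] < 8.5 else "cruise")


def run_in_simulation(steps: int) -> List[str]:
    """Start the planner, then feed it from the simulator instead of a vehicle."""
    sensor_bus: queue.Queue = queue.Queue()
    log: List[str] = []
    worker = threading.Thread(target=planner, args=(sensor_bus, log))
    worker.start()
    simulator(sensor_bus, steps)  # on a vehicle, sensor drivers would fill the queue
    worker.join()
    return log


decisions = run_in_simulation(3)
```

    If the simulator is never started, the planner thread simply blocks on the empty queue, which mirrors the "processes would be waiting for the data" behavior described above.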

    Simulation really helped. On the day before the race, my 24/7 job was to keep simulating our algorithms — I had two computers, start simulation here, start simulation there, look at it, if one fails log it and send it out. Simulation has come a long way. Nowadays you can run things that you can show to people and it's very hard for people to tell it's not real. That wasn't the case back then.

    Testing is very important and will be very important for the future. Admittedly, we didn't do much testing. I think the race itself was the farthest we had driven without any human intervention. The race was 60 miles; before then I think we had done maybe a 20-mile stretch or something like that. We started maybe a year before and put together some of the infrastructure, but the vehicle itself came late: the race was in November, we probably got the clean vehicle in April, and we put the sensors on over the summer. Draper Laboratory helped out a lot with the testing. If we didn't have them we probably would have failed outright.

    Autonomous Vehicles and Urban Transformation

    Sertaç Karaman: Transportation is a very interesting thing; it actually defines how you live quite a bit. If you look at the cities you know today, they look the way they do thanks to one invention: the affordable car. In the last century, in the 1950s, cars were big and you would find suburbs being constructed for the first time. The reason was that cities were dirty and deemed disease-prone. Now that you had the car, you could move way out to a better lifestyle. That was the 20th-century invention.

    It also changed the cities quite a bit. For example, Boston's central artery was built in the 1950s to service the cars coming in and out of the city. In some places at the extreme, like Los Angeles, you will see the suburban sprawl; it's very different from other places. Places that didn't have the time, the resources, or simply the space to expand ran into many problems. Even in rich countries this quick expansion just doesn't work and creates, if anything, ugly environments. In some places you need to be dense and big, so you have the cars but you just have to build big buildings that you cannot even serve with cars.

    Pollution and energy consumption — a lot of it comes from cars, especially inside cities. An interesting point is that if you look at cars, they're actually pretty inefficient the way they sit currently. If you look at BMWs over the years, you would see that they get heavier and they get faster — this is very correlated. If you get faster you have to become heavier because you have to pass crash tests. A BMW that you would buy in the 1970s would weigh something like 2,500 pounds; nowadays it's like 4,000 pounds roughly. The average passenger weight it's carrying is about one twenty-fifth of the vehicle's weight. In terms of size, it's about ten times the size of the passengers it carries. In terms of parking spots — usually in the United States for every car we have two parking spots. In some places parking spots take up like half the city. On average it's about one third.
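
    As a quick check of the figures quoted above, the arithmetic can be worked through directly. Only the numbers stated in the talk are used; the variable names are illustrative.

```python
# Figures as quoted in the talk (approximate, in pounds).
vehicle_lb_1970s = 2500
vehicle_lb_today = 4000

# Growth in vehicle weight since the 1970s, in percent.
weight_increase_pct = 100 * (vehicle_lb_today - vehicle_lb_1970s) // vehicle_lb_1970s

# "One twenty-fifth of the vehicle's weight" for the average passenger load.
passenger_lb = vehicle_lb_today / 25

# Share of the moving mass that is actually passenger, in percent.
payload_pct = 100 / 25

print(weight_increase_pct, passenger_lb, payload_pct)
```

    On these numbers the car is about 60 percent heavier than its 1970s counterpart, carries roughly 160 pounds of passenger on average, and spends about 96 percent of its moving mass on the vehicle itself.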

    You might ask the question: is this the kind of environment you really want to live in? A lot of the infrastructure, a lot of the things that you see, are made for cars. For instance, we never walk on the street nowadays — streets are for cars. It wasn't like that a hundred years ago. You could walk on the streets however you wanted. Cars came in and took it over and changed the urban landscape quite a bit.

    It seems like there's an opportunity today to actually use 21st-century technologies (this could be robotics, but also online services, new business models, high-performance computing, and so on) to service the needs of people in cities. My guess is that people could be more mobile than they are. If transportation were very accessible and very easy, I think they would be more mobile, and that would generate more economic activity. For example, if transportation is more available and more affordable, it changes behavior and makes you more mobile: you'd be fine with having a class here and then, 20 minutes later, a class at Harvard. Nobody would do that nowadays, but if it was that easy to get there you would probably do it. That's the way you ultimately generate more economic activity.

    I think you can make this even better by integrating a few things. One thing you can integrate is sharing: you can make an Uber-type scenario. You can use autonomy as well. And finally, electrification. Especially if you're going a little bit slower, so you don't have to pass crash tests and things like that, you could really reduce the cost of transportation to the point where you could imagine going from anywhere to anywhere else in Boston for $0.99 with a five-minute wait time. If you want to share your ride it could be 50 cents. Adding one stop makes your transportation much cheaper, and a lot of us already do one stop: you take a subway and then take a bus. It's like taking an airplane with one stop: if you were willing to take one stop, you could pay 30 cents and go from anywhere to anywhere else in Boston.

    I think there's a good opportunity to utilize technology to bring the cost and availability of transportation to a point that really changes a lot of things.

    The Technological Landscape: Speed vs. Complexity

    Sertaç Karaman: The way I usually look at the technological landscape is that you can imagine a space of speed versus complexity. Speed is the speed of the vehicle involved, and complexity is the complexity of the environment you're dealing with. You can have high-speed, low-complexity environments like highways — they're actually easy to work with. We might actually conquer them in the next three years or something like that. Another thing would be parks or university campuses — much slower but much more complex, with people walking around. Fully autonomous driving in all conditions is probably pretty far, but there is some opportunity to do some interesting things elsewhere very quickly.

    One of the problems is that this is not just a technology problem. As you've seen, there's a lot involved in architecture: how do you actually utilize the city best? But one of the biggest problems ends up being law, insurance, and regulations. There are good and bad aspects. Sometimes the law allows you to do certain things, but then is it really a safety hazard? Is it ethical to allow a lot of people to just test stuff around? Going forward, if I could say one thing to you: this is not just a problem in technology. It's a problem in technology, society, policy, architecture, law, insurance, and business. You may need new business models. I personally think the opportunity is right out there, but we still need a bit more thinking to be able to attack this problem and really enable it so that you can do good and interesting things with it.

    Questions and Discussion

    Lex Fridman: A lot of this class is about deep learning, and in terms of autonomous vehicles, deep learning is mostly focused on the vision sensor or cameras. How far away are we from a car that safely navigates the streets of Boston without lidar and without any mapping — purely on cameras?

    Sertaç Karaman: It's a bit of a guessing game, to be honest. I am a big believer in computer vision and I do not think it's too far away. Cameras are actually pretty good sensors. The only problem with cameras is that there's a lot of data and little information, and you need to fish it out. It seems like the computers are coming to do that. I would be surprised if in ten years you can't build a car that just has a bunch of cameras and navigates with cameras, period. It would be very surprising to me. It would also be surprising if it happens next year, as some people are saying. Somewhere in between — I would think in three to five years you would be able to. I would suggest that if any of you are working with cameras, deep learning is an excellent technique. Try out model-based techniques as well — they're also coming along pretty well. Probably a solution that integrates them as best as possible would be viable. I'd be surprised otherwise.

    Audience member: On autonomous intersections, what role does communication play?

    Sertaç Karaman: If you wanted to do the crazy things I've shown you, you need to make sure everything communicates, and if not everything does, it would break pretty badly. I would actually imagine that one interesting thing to do quickly would be to have cars communicate with each other, not just at intersections but for lane following and things like that. There are a few things you may see pretty quickly with autonomous cars related to vehicle-to-vehicle communication or vehicle-to-infrastructure communication. You could put up a camera on infrastructure and people could tune into it. To be honest, the biggest problem in deploying these things on autonomous vehicles is cybersecurity.

    You could see things like sharing — you have a button, you press it, you timeshare. Or you can use autonomy technology for safety — that's a different type of sharing. Or you can have autonomous vehicles in isolated environments. With communication I think you can quickly see lane following and maybe coordination at intersections. With autonomy there are certain things we might see that don't involve communication at all.


    Polished transcript of Lex Fridman. All views are those of the original speakers.
    Published by @martymcfly