We have all seen the videos circulating on social media of drivers comfortably sleeping at the wheel of their Teslas, or eating a burger whilst driving. But whereas the autonomous systems being trialled by the likes of Google's Waymo make heavy use of lidar laser scanning, Tesla achieves similar feats without it, relying instead on clever computing. WhichEV analyses how Tesla's Artificial Intelligence system works differently from the competition.
Tesla’s Senior Director of AI, Andrej Karpathy, presented Tesla’s methods for training its AI at the Scaled ML Conference in February. He explained how the company achieved the accuracy of traditional laser-based lidar with just a handful of cameras.
However, the high quality of this system doesn’t depend on the cameras themselves. Rather, engineers have built complex processing pipelines and neural networks to make sense of the wide range and quality of inputs. Tesla’s AI team has also built a ‘pseudo-lidar’, which blurs the line between traditional computer vision and the powerful point-map world of lidar.
Traditional lidar-based systems rely on an array of invisible lasers. These send a huge number of pings out into the world, and the distance of each return is used to build a real-time 3D visualisation of the vehicle's surroundings.
Thanks to these lasers, the system is able to identify other vehicles, pedestrians, roads and buildings, enhancing safety on the road.
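The geometry behind that point map is straightforward: each laser return carries a distance and a pair of beam angles, which convert directly into a 3D point. The sketch below illustrates the idea with a simple spherical model; the function name and angle convention are our own, not Tesla's or any lidar vendor's.

```python
import math

def lidar_ping_to_point(distance_m, azimuth_deg, elevation_deg):
    """Convert one laser return (range plus beam angles) into an
    (x, y, z) point relative to the sensor, using a simple
    spherical-to-Cartesian model."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance_m * math.cos(el) * math.cos(az)  # forward
    y = distance_m * math.cos(el) * math.sin(az)  # left/right
    z = distance_m * math.sin(el)                 # up/down
    return (x, y, z)

# A full sweep repeats this for every beam at every rotation step,
# producing the real-time 3D point cloud described above.
point = lidar_ping_to_point(10.0, 45.0, 0.0)
```

A real scanner does this millions of times per second across a spinning array of beams, which is where both the rich 3D picture and the hardware cost come from.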
Tesla hasn't adopted this solution because it is prohibitively expensive, with single lidar sensors costing thousands of dollars each. Cameras, by contrast, cost only a few dollars each, thanks to their prevalence in smartphones and laptops.
The camera-based approach is also much easier to implement on the hardware side.
However, to reach the same level of safety, Tesla's system relies on a complex computer system that can translate raw camera inputs and vehicle telematics into intelligence.
Below you can see Tesla's animation of how this works at different ranges for the Tesla Model 3.
At a foundational level, the computer can identify lane markings, signs and other vehicles from the sequence of static frames that makes up the video feed.
Tesla's system goes further: it analyses not just whole images, but individual pixels within them.
“We take a pseudo-lidar approach where you basically predict the depth for every single pixel and you can cast out your pixels,” Karpathy said.
This technology can replicate much of the functionality of a traditional lidar system but requires a lot of real-time processing power for the image deconstructions to be of any use.
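The pixel-casting Karpathy describes can be illustrated with the standard pinhole camera model: once a depth is predicted for every pixel, each pixel back-projects into a 3D point. The snippet below is our own minimal sketch of that idea, not Tesla's implementation; the intrinsics (`fx`, `fy`, `cx`, `cy`) are assumed to be known from camera calibration.

```python
import numpy as np

def pixels_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a per-pixel depth map into a 3D point cloud using
    the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    `depth` is an (H, W) array of predicted depths in metres."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # One 3D point per pixel -- a lidar-like cloud, purely from vision.
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy 2x2 depth map: every pixel predicted at 5 m.
cloud = pixels_to_point_cloud(np.full((2, 2), 5.0), fx=1000, fy=1000, cx=1, cy=1)
```

In practice the hard part is not this geometry but predicting an accurate `depth` array in the first place, which is what Tesla's neural networks are trained to do.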
Vehicles must have a system that can make determinations or predictions based on an image instantaneously, as on-the-road situations can change in the blink of an eye. That is why Tesla built its own hardware for the third major version of its autonomous driving computer, which is purpose-built to run Tesla's code.
According to Karpathy, achieving the functionality of lidar was fundamental for success, as it unlocks all of the software solutions that were built to utilise inputs from traditional lidar systems.
“You basically simulate lidar input that way, but it’s purely from vision. Then you can use a lot of techniques that have been developed for lidar processing to achieve 3D object detection.”
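One such lidar-processing technique is Euclidean clustering, which groups nearby points so each cluster can be treated as a candidate object. The sketch below is a deliberately naive, illustrative version under our own assumptions; production pipelines use spatial indexes such as KD-trees and far more sophisticated 3D detectors.

```python
import numpy as np

def cluster_points(points, radius=1.0):
    """Naive Euclidean clustering: flood-fill from each unlabelled point,
    absorbing any neighbour closer than `radius`. Returns one integer
    cluster label per point."""
    points = np.asarray(points, dtype=float)
    labels = np.full(len(points), -1, dtype=int)
    current = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        # Grow a new cluster from this unlabelled seed point.
        stack = [i]
        labels[i] = current
        while stack:
            j = stack.pop()
            dists = np.linalg.norm(points - points[j], axis=1)
            for k in np.where((dists < radius) & (labels == -1))[0]:
                labels[k] = current
                stack.append(k)
        current += 1
    return labels

# Two well-separated blobs of points -> two clusters.
pts = [(0, 0, 0), (0.2, 0, 0), (10, 0, 0), (10.3, 0, 0)]
labels = cluster_points(pts)
```

Because the pseudo-lidar output is just a point cloud, techniques like this work on it unchanged, which is exactly the software reuse Karpathy is describing.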
At WhichEV, we've seen Tesla's so-called pseudo-lidar solution getting better with each software update. Karpathy showed off a range of 3D maps of the world created by the system, and they look very similar to the results coming from cutting-edge lidar solutions.
“If you give yourself lidar and how well you can do versus if you do not have lidar, but you just use vision techniques and pseudo-lidar approaches, the gap is quickly closing,” Karpathy said.