Three Problems Autonomous Vehicle Researchers are Still Trying to Solve
Autonomy is progressing at an impressive rate, but we are still at the start. There are still numerous challenges in the AV sector. This article looks at the three main problems that AV researchers are still trying to solve.
The autonomous vehicle industry is progressing at an impressive rate, but we are still at the beginning: fully autonomous vehicles running safely on all public roads are several years away. There are still numerous challenges in the autonomous vehicle sector that researchers are trying to solve. These challenges span many areas, from in-vehicle technology to wider infrastructure communication. Cyngn specifically focuses on industrial use cases, where autonomous vehicle deployments are much less complex. While fully autonomous operations on public roads are a few years away, it is important to note that industrial autonomous deployments are successfully being executed today.
Our VP of Engineering, Biao Ma, has identified three main problems that autonomous vehicle researchers are still trying to solve: occlusion, prediction, and fleet planning and control.
1. Occlusion
Just as humans require a clear line of sight to see objects, so too do sensors. Occlusion, then, occurs when an object prevents an autonomous vehicle from seeing other objects behind it.
For example, suppose you are driving and there is a cyclist across the street. Another car turns onto the road in front of you and blocks your view of the cyclist. This is occlusion, and it’s a perception problem that can occur with any combination of objects blocking other objects. In some cases, the blocking objects are moving. In other cases, they are static, like a light post, a tree, or even a building. Occlusion can also occur when moving objects block each other, which often happens in crowds.
According to a 2019 study, combating occlusion is a challenging task because “the frequency and variation of occlusion in the automotive environment is vast and can also be impacted by cultural and environmental factors.”
Like humans, most autonomous vehicle sensors cannot see through these static or moving objects. Sensors emit beams that hit objects in the environment and travel back to the source, which lets the system build a 3D image of the vehicle’s surroundings. However, most sensors are bounded by light traveling in a straight line. This means that when one object blocks a second object, AV sensors are blind to what is behind the blocking object. Not only do the beams fail to reach the target object, but occlusion can also result in lost data from the sensor.
This poses a problem in scenarios where it is crucial that autonomous vehicles have the capability to intuit the presence of an obstacle that isn’t necessarily visible.
For example, suppose the car in front of you slows to a stop to let a pedestrian pass. In most cases, you can intuit, without necessarily seeing the pedestrian ahead, why the car might have stopped. This intuition also tells you another important thing: you can’t “solve the obstacle” by whipping around the stopped car and plowing ahead. You might, after all, run the pedestrian over.
Some technology is currently being developed to give autonomous vehicles a certain level of penetration beyond the next car, but so far these technologies lack accuracy and produce minimal information.
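The line-of-sight limitation described above can be illustrated with a small geometric sketch. It assumes a flat 2D world in which obstacles are circles and a sensor beam is a straight segment; the names `segment_hits_circle` and `is_occluded` are hypothetical and not part of any Cyngn system.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (center_x, center_y, radius), an assumed obstacle model


def segment_hits_circle(a: Point, b: Point, circle: Circle) -> bool:
    """Return True if the straight segment from a to b passes through the circle."""
    cx, cy, radius = circle
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(cx - ax, cy - ay) <= radius
    # Project the circle center onto the segment, clamped to the endpoints.
    t = ((cx - ax) * dx + (cy - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    nearest_x, nearest_y = ax + t * dx, ay + t * dy
    return math.hypot(cx - nearest_x, cy - nearest_y) <= radius


def is_occluded(sensor: Point, target: Point, obstacles: Iterable[Circle]) -> bool:
    """A target is occluded if any obstacle blocks the straight beam to it."""
    return any(segment_hits_circle(sensor, target, obs) for obs in obstacles)


if __name__ == "__main__":
    sensor, cyclist = (0.0, 0.0), (20.0, 0.0)
    car_in_the_way = (10.0, 0.0, 1.5)
    car_in_next_lane = (10.0, 5.0, 1.5)
    print(is_occluded(sensor, cyclist, [car_in_the_way]))    # True
    print(is_occluded(sensor, cyclist, [car_in_next_lane]))  # False
```

Real perception stacks work with 3D point clouds and camera images rather than circles, so this only shows why a straight beam cannot report what sits behind a blocking object.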
What are researchers doing to solve the problem of occlusion?
According to Ma, there are two branches of exploration to try to push the technology forward: how AVs (1) see objects and (2) anticipate and respond without seeing a given object.
The first is about how autonomous vehicles see beyond the line of sight. Light travels in a straight line, but there are hidden objects that need to be seen to complete the entire picture. One possible solution is what’s known as collaborative perception. Think of it as a “crowd of eyes” on the street. This is essentially crowd-sourcing perception in an environment, such that vehicles, sensors, or cameras on the road can share information with each other. This gives each vehicle a more complete picture of the surrounding space than it could build on its own.
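One way to picture the “crowd of eyes” is as a merge of detection lists from several observers. The sketch below is a simplified illustration, assuming every observer already reports object positions in a shared map frame; the function name `merge_detections` and the `same_object_radius` threshold are assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def merge_detections(
    reports: Dict[str, List[Point]],
    same_object_radius: float = 1.0,  # assumed: detections closer than this are one object
) -> List[Point]:
    """Combine detections from several observers, dropping near-duplicates."""
    merged: List[Point] = []
    for source in sorted(reports):  # fixed order keeps the result deterministic
        for detection in reports[source]:
            already_seen = any(
                math.dist(detection, known) <= same_object_radius for known in merged
            )
            if not already_seen:
                merged.append(detection)
    return merged


if __name__ == "__main__":
    reports = {
        "ego_vehicle": [(10.0, 0.0)],               # sees only the car ahead
        "light_pole": [(10.2, 0.1), (20.0, 0.0)],   # also sees the hidden cyclist
    }
    print(merge_detections(reports))  # [(10.0, 0.0), (20.0, 0.0)]
```

Here the ego vehicle learns about the object at (20.0, 0.0) only because another perception unit shared it, which is the core idea of collaborative perception.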
The second branch of exploration is about training a vehicle to develop an intuition about things that it can’t see. Humans naturally anticipate and respond in these scenarios because we possess intuition. For instance, when we drive through a snowstorm, we know to slow down because, while we may not be able to see it, we can imagine that black ice on the road could send our car spinning.
We can also anticipate that other cars will be driving unpredictably due to the weather and respond accordingly. Researchers are working to provide autonomous vehicles with this same sort of intuition.
The problem, Ma says, is that “there’s always a constant trade-off of responsiveness and stability, and it is very difficult to achieve both.” By responsiveness, Ma means that vehicles should be able to respond quickly to potential danger or other objects. On the other hand, the system can’t be too fragile. If it is, it will constantly react to objects that may not be relevant, and passengers will take far longer than necessary to reach their destination. The system must balance the two, which is a challenging task.
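The trade-off Ma describes can be made concrete with a toy confirmation rule: the system reacts only after an object has been detected in several consecutive frames. This is an illustrative assumption, not a description of how any production system filters detections, and `frames_until_reaction` is a hypothetical name.

```python
from typing import List, Optional


def frames_until_reaction(detections: List[bool], confirm_frames: int) -> Optional[int]:
    """Return the index of the frame at which the system reacts, or None.

    The system reacts once an object has been detected in `confirm_frames`
    consecutive frames. A small value is responsive but reacts to noise;
    a large value is stable but slow.
    """
    streak = 0
    for index, seen in enumerate(detections):
        streak = streak + 1 if seen else 0
        if streak >= confirm_frames:
            return index
    return None


if __name__ == "__main__":
    # One spurious detection in frame 0, then a real object from frame 3 onward.
    frames = [True, False, False, True, True, True, True]
    print(frames_until_reaction(frames, confirm_frames=1))  # 0 (reacts to the noise)
    print(frames_until_reaction(frames, confirm_frames=3))  # 5 (ignores noise, reacts later)
```

With one confirmation frame the vehicle brakes for a phantom; with three it ignores the phantom but responds two frames later to the real object than it otherwise could.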
2. Prediction
The second major obstacle is prediction. Prediction is the ability to predict the trajectory (the location and speed) of a target object in the near future. Prediction allows an autonomous vehicle to look at moving objects and estimate where they will be over roughly the next three to five seconds. Typically, prediction systems will generate several possible outcomes, each of which is assigned a likelihood of actually happening.
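The idea of several candidate trajectories, each with its own likelihood, can be sketched as follows. This is a minimal illustration assuming a constant speed and a constant turn rate per hypothesis; the maneuver labels, weights, and the names `rollout` and `predict` are all made up for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Waypoint = Tuple[float, float, float]  # (time_s, x_m, y_m)


@dataclass
class Hypothesis:
    label: str
    likelihood: float
    waypoints: List[Waypoint]


def rollout(x: float, y: float, speed: float, heading: float, yaw_rate: float,
            horizon_s: float = 3.0, dt: float = 0.5) -> List[Waypoint]:
    """Project a position forward assuming constant speed and turn rate."""
    points: List[Waypoint] = []
    steps = int(round(horizon_s / dt))
    for k in range(1, steps + 1):
        heading += yaw_rate * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        points.append((k * dt, x, y))
    return points


def predict(x: float, y: float, speed: float, heading: float,
            maneuvers: List[Tuple[str, float, float]]) -> List[Hypothesis]:
    """Build one hypothesis per (label, yaw_rate, weight), normalizing the weights."""
    total = sum(weight for _, _, weight in maneuvers)
    return [
        Hypothesis(label, weight / total, rollout(x, y, speed, heading, yaw_rate))
        for label, yaw_rate, weight in maneuvers
    ]


if __name__ == "__main__":
    # Assumed weights; a real system would estimate these from data.
    maneuvers = [("straight", 0.0, 6.0), ("left", 0.3, 3.0), ("right", -0.3, 1.0)]
    for hyp in predict(0.0, 0.0, speed=10.0, heading=0.0, maneuvers=maneuvers):
        t, px, py = hyp.waypoints[-1]
        print(f"{hyp.label}: p={hyp.likelihood:.1f}, at t={t:.1f}s -> ({px:.1f}, {py:.1f})")
```

Downstream planning can then weigh each hypothesis by its likelihood instead of committing to a single guess.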
While driving, humans naturally make predictions all the time. In particular, humans know to take greater care around children because they tend to be less predictable than adults. Autonomous vehicles need to be able to take in this same information about their surroundings, run it through AI processing, and use it to inform their decision-making and risk assessment.
There are three key technical factors for prediction: (1) the semantic, (2) physical limitations, and (3) relevancy prediction.
The first element informing prediction is the semantic. When people are first learning to drive, a major part of their study is the set of symbols they’ll encounter on the road. What does a double yellow line mean? What does a blinking red light mean? Ma defines the semantic as these symbols, or the meaning bounded by the environment that you are driving in.
Ma describes this concept as “a particular section of the driving environment having a common role that is bounded by either the traffic, social convention, or a specific area of the targeted driving space.” The semantic is how we can drive on a two-directional highway and not panic when a relatively close vehicle passes by in the other direction. This is because the dotted white line is telling us that the other vehicle is in its correct lane, heading in the correct direction.
The second kind of information that informs predictions is the physical limitations of each object. For example, a little boy who is half a block away cannot reach your vehicle within the next few seconds, no matter how quickly he runs toward you. By contrast, a car heading in your direction from half a block away could very well crash into you. An autonomous driving system that’s good at making predictions will be good at differentiating between the capabilities of these two objects.
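The physical-limits idea can be written as a simple reachability check: an object only matters within the prediction horizon if its class could cover the distance in that time. The speed caps and the 50 m “half a block” distance below are rough assumptions chosen for illustration, and `can_reach_ego` is a hypothetical helper.

```python
# Assumed upper bounds on speed per object class, in meters per second.
MAX_SPEED_MPS = {
    "pedestrian": 3.0,
    "cyclist": 12.0,
    "car": 30.0,
}


def can_reach_ego(object_class: str, distance_m: float, horizon_s: float = 5.0) -> bool:
    """Return True if an object of this class could cover the distance in time."""
    return MAX_SPEED_MPS[object_class] * horizon_s >= distance_m


if __name__ == "__main__":
    half_block_m = 50.0  # assumed distance for "half a block"
    print(can_reach_ego("pedestrian", half_block_m))  # False
    print(can_reach_ego("car", half_block_m))         # True
```

A predictor can use bounds like these to discard physically impossible trajectories before spending effort on the plausible ones.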
The third category consists of three different layers: (a) relevancy prediction, (b) predictions based on trajectory, and (c) whether there’s context around the object that requires additional caution.
First, relevancy prediction is about choosing which objects around you actually matter. But how do you differentiate or predict what is relevant versus what is not? Suppose you are at an intersection, waiting to turn right so that you can get to the gas station on the upcoming street. If the car directly across from you is turning left onto the same street to reach the same gas station, you know that this car is relevant.
You therefore know you have to wait for it to turn first, so that you don’t hit it by turning onto the same road at the same time. However, if you are at this same intersection and the car directly across is turning right (instead of left), you know that this car is irrelevant.
The second layer is deciding whether the object’s motion will require you to alter your own trajectory. Is the object moving towards you in a way that could be relevant? Predictions based on trajectory use the current motion of the target object in combination with its current velocity. For instance, in the previous example, the other driver’s signal may indicate a right turn (therefore irrelevant), but what if they change their mind and start turning towards your intended lane instead? While previously irrelevant, the object moving towards you is now relevant, presenting a scenario that a system must be able to predict.
Finally, the third layer, context, considers whether there are other implications of the object, by nature of its classification, that force you to respond. Consider a soccer ball that rolls in front of your car. When this happens, we have the intelligence to anticipate that a little kid might follow. By knowing the context surrounding this object, we respond by taking greater caution when approaching the ball in the road.
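The three layers above can be read as three independent checks on each tracked object. The following sketch is one possible arrangement under simplifying assumptions: positions and velocities are relative to the ego vehicle, motion is constant-velocity, and the field `shares_target_lane` stands in for a real relevancy predictor. All names and thresholds are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class TrackedObject:
    name: str
    kind: str                 # e.g. "car", "ball"
    x: float                  # position relative to ego, meters
    y: float
    vx: float                 # velocity relative to ego, meters per second
    vy: float
    shares_target_lane: bool  # assumed output of an upstream relevancy check


CAUTION_KINDS = {"ball", "child"}  # assumed classes that imply extra caution


def closest_approach_m(obj: TrackedObject, horizon_s: float = 5.0) -> float:
    """Minimum distance to ego within the horizon, assuming constant velocity."""
    speed_sq = obj.vx ** 2 + obj.vy ** 2
    t = 0.0 if speed_sq == 0.0 else -(obj.x * obj.vx + obj.y * obj.vy) / speed_sq
    t = max(0.0, min(horizon_s, t))
    return math.hypot(obj.x + obj.vx * t, obj.y + obj.vy * t)


def assess(obj: TrackedObject, safe_gap_m: float = 3.0) -> List[str]:
    """Return which of the three layers flag this object."""
    reasons: List[str] = []
    if obj.shares_target_lane:
        reasons.append("relevancy")
    if closest_approach_m(obj) < safe_gap_m:
        reasons.append("trajectory")
    if obj.kind in CAUTION_KINDS:
        reasons.append("context")
    return reasons


if __name__ == "__main__":
    objects = [
        TrackedObject("left-turning car", "car", 20.0, 5.0, -2.0, -1.0, True),
        TrackedObject("right-turning car", "car", 20.0, 5.0, 0.0, 0.0, False),
        TrackedObject("drifting car", "car", 20.0, 0.0, -5.0, 0.0, False),
        TrackedObject("soccer ball", "ball", 15.0, 0.0, 0.0, 0.0, False),
    ]
    for obj in objects:
        print(obj.name, assess(obj))
```

An object flagged by no layer can be ignored for planning purposes, while any flag is a reason to keep tracking it closely.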
Ma explains that there are learning-based methods, components of which are known as predictors or evaluators, that allow us to execute these three layers. Predictors can be algorithm-based, optimization-based, or rule-based, and they help the system predict the relevancy, future trajectory, and context of a given object. Learning-based methods are currently working to provide predictions, along with algorithms that are based on the semantic and motion of prediction.
“Different prediction mechanisms require the upstream systems to align and require the downstream system to use the information of the predicted trajectory,” says Ma.
Researchers measure whether these algorithms are getting better at prediction by comparing what really happened to what the system expected and seeing how close the two are. While there have been significant improvements in prediction, the field is still in the early stages of advancement.
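Comparing a predicted trajectory with what actually happened is commonly summarized with two numbers: the average displacement error over all time steps and the final displacement error at the last step. The source does not name the metrics Cyngn uses, so the sketch below is only an illustration of the comparison, with `displacement_errors` as a hypothetical helper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def displacement_errors(predicted: List[Point], ground_truth: List[Point]) -> Tuple[float, float]:
    """Return (average, final) distance between predicted and recorded positions.

    Both lists must hold positions sampled at the same time steps.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and the same length")
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(errors) / len(errors), errors[-1]


if __name__ == "__main__":
    predicted = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]      # system expected a straight path
    ground_truth = [(1.0, 0.0), (2.0, 1.0), (3.0, 2.0)]   # object actually drifted sideways
    average_error, final_error = displacement_errors(predicted, ground_truth)
    print(average_error, final_error)  # 1.0 2.0
```

Tracking these numbers across recorded drives gives a simple signal of whether a new predictor is closer to reality than the previous one.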
What are researchers doing to help improve autonomous vehicle prediction?
Ma sees at least three directions in the industry: (1) reducing the need for input, (2) improving the location of the input and output, and (3) improving communication.
First, researchers are working to reduce the need for input. Many current prediction systems depend on information supplied as input: they require knowledge of the semantic, or a classification from an upstream system saying that an object is a pedestrian or a cyclist, for example, before they can predict what that object will do. Researchers are creating new computational methods that relax this requirement, so that prediction still works when such input is limited. These methods should make the vehicle more efficient in difficult prediction scenarios and help with corner cases that involve unexpected behavior from other objects on the road.
The second is to improve the confidence and granularity of prediction by optimizing the input and output of information. Researchers know that if autonomous vehicles were better at making behavior-level predictions, they would drive better. Consider a scenario where you’re trying to change lanes on a busy freeway. Here, humans know to look for a driver to wave them over. Autonomous vehicles do not understand this the same way, so researchers are trying to develop technology that will better allow AVs to interpret this kind of human-to-human communication. This will help vehicles interpret higher-level information, leading to a better understanding of what other drivers or pedestrians are trying to do.
The third direction is communication, which Ma argues is the best form of prediction. Being able to predict is efficient, but having the actual subject tell you what it is going to do is even better. By looking at the holistic view of the entire prediction stack, instead of viewing an autonomous vehicle as one subsystem, we can see that many systems can grow together. Working together allows for better communication and in turn, prediction.
3. Fleet Management
The final challenge is the coordination of a networked vehicle fleet. The first phase of autonomous vehicle development was getting individual vehicles to drive themselves; this includes developing an initial system and set of sensors. The second phase of development has been about getting systems of vehicles to coordinate and communicate with each other.
What are researchers doing to help improve AV coordination?
Algorithms are currently being proposed and developed to improve fleet coordination. According to an MIT article, these communication algorithms allow AVs to see beyond their own line of sight and improve observability and environmental awareness. Future trajectories can also be shared to improve prediction, and motion coordination algorithms can be used to “guarantee that decisions are jointly feasible.” This kind of communication and coordination will allow a fleet of autonomous vehicles to predict dynamic alterations in the surrounding environment. Ma says, “instead of each of the vehicles needing to perceive, track, and predict what they will do, algorithms, in a centralized and far more efficient way, will provide this information to everyone.”
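When vehicles share their planned trajectories, a central service can check whether the plans are jointly feasible before anyone moves. The sketch below is a deliberately simple version of that idea, assuming every plan is a list of positions sampled at the same time steps; the vehicle names, the `min_gap_m` threshold, and `find_conflicts` are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def find_conflicts(plans: Dict[str, List[Point]], min_gap_m: float = 2.0) -> List[Tuple[str, str, int]]:
    """Return (vehicle_a, vehicle_b, step) for each pair whose plans come too close.

    Only the first conflicting time step is reported for each pair.
    """
    conflicts: List[Tuple[str, str, int]] = []
    for (name_a, path_a), (name_b, path_b) in combinations(sorted(plans.items()), 2):
        for step, (pos_a, pos_b) in enumerate(zip(path_a, path_b)):
            if math.dist(pos_a, pos_b) < min_gap_m:
                conflicts.append((name_a, name_b, step))
                break
    return conflicts


if __name__ == "__main__":
    plans = {
        "tug_1": [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)],
        "tug_2": [(10.0, 5.0), (10.0, 2.0), (10.0, 0.5)],
        "tug_3": [(0.0, 20.0), (5.0, 20.0), (10.0, 20.0)],
    }
    print(find_conflicts(plans))  # [('tug_1', 'tug_2', 2)]
```

A coordinator that finds a conflict can ask one of the two vehicles to wait or replan, which replaces guesswork about the other vehicle’s intent with shared information.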
The industry is continually trying to solve these three problems: occlusion, prediction, and fleet management. Solving these challenges will advance autonomy on public streets and enable autonomous vehicle technology in more complex environments. While organizations may think it will be years before this comes to their setting, Cyngn’s industrial AV technology is already working. Cyngn’s end-to-end, fully autonomous vehicle technology is available now.
Interested in learning more about how this technology can be implemented in your own environment? You can start your autonomy journey by visiting https://www.cyngn.com.
Podcast Episode Transcript:
The transcript of this conversation has been edited for clarity.
Luke Renner: This is Advanced Autonomy. I'm Luke Renner. You know, one of my favorite things about working in this space is that we are at the very, very beginning. And there are still tons of problems in the autonomous vehicle sector that we're trying to solve. These problems span all areas of in-vehicle technology to wider infrastructure communication.
My guest today is the VP of Engineering and the Head of Autonomy at Cyngn, a self-driving industrial vehicle company that we both work for. In this conversation, he's going to give us an engineer's insider look into the three main problems that autonomy has yet to solve: occlusion (which is seeing beyond blind spots), prediction, and the coordination of a networked vehicle fleet.
Hi, Biao, welcome back.
Biao Ma: Hey, Luke. Happy to be back.
Luke Renner: So, let's dive into those. The first one you wanted to talk about today was occlusion. What is occlusion?
Biao Ma: Occlusion is when there is a third object between you and the target object you're trying to perceive. You have one object you're trying to perceive but for some reason, there is a different object in front of you, blocking the first object. We call this situation occlusion and this is a problem that has a technical impact on a series of subsystems in autonomy.
Luke Renner: So occlusion is a middle object, blocking the view of the driver or the autonomous vehicle and preventing it from seeing another object?
Biao Ma: Yes. The middle object could be a moving object. It could be a static object, could be a vehicle, a tree, or a building.
Luke Renner: And this is a problem because just like humans can't see through trees, a lot of our sensory technology also can't travel through objects.
Biao Ma: Yes. So most of the sensors are bounded by light traveling straight. Certain technologies are being developed to have a certain level of penetration but not so far.
Luke Renner: Got it. So, I know that occlusion is not only about making it easier to see, it's also about making it easier to understand when you can't see. What can you tell me about that?
Biao Ma: That's true. There are two sides to this. The second angle of that is in certain scenarios, you should have that capability to anticipate or expect that something might happen or that some object might be there. So, this ability to anticipate should be built into your system.
Luke Renner: Okay. So this is clear. So what are researchers doing to try to solve the problem of occlusion?
Biao Ma: There are two branches of exploration to try to push the technology forward.
Luke Renner: Okay.
Biao Ma: One is really about, how do you see beyond the line of sight? Basically, yes, we know, light travels straight. And there is not really a good way to change that. But some messages and systems could be explored to see beyond that. For example, you could have collaborative perception. You have fleet or crowd perception points that could contribute to better perception.
Luke Renner: So, you're talking about using the other cars on the road to see for the other cars on the road?
Biao Ma: They don't have to be cars, right? It could be a different perception unit providing that to you. Maybe in the corner of the street, which, you know, is a school area, you’d have an intelligent light pole, sending perception, either info or objects or raw data, to you, right?
So that really gives you not only one eye, but also a crowd of eyes, looking at the street.
Luke Renner: So, that's one way. You said there were two ways that researchers are trying to solve the occlusion problem. What is the other way?
Biao Ma: So the second is one going in a different direction. The first is about how do you see objects. The second is about even though you can’t see a particular object, can you be smarter to anticipate that there will be danger or objects that you need to respond to?
Luke Renner: So, the second branch is training the vehicle to develop an intuition about things that it can't see?
Biao Ma: Yep.
Luke Renner: And humans do this naturally, right? Like, around a schoolyard, we slow down. When we're driving over a mountain pass, we behave differently because a deer could jump in front of the road at any moment. And so it's about kind of giving the autonomous vehicle some of that intuition and some of that intelligence.
Biao Ma: Exactly but there's always a constant trade-off of responsiveness and stability, right? So, you can't really achieve both. You want to make sure your system has a certain level of responsiveness. Just as you described, a car could come up out of nowhere, and you need to be able to quickly respond to that. On the other hand, you cannot be fragile. You cannot have a driving system that is constantly changing because of things that may or may not be relevant to you, right?
Luke Renner: You can’t be a scaredy-cat driver?
Biao Ma: Exactly. You can’t be a scaredy-cat.
Luke Renner: Okay, so this is really interesting. Let's transition to the second major obstacle: prediction. So before we get into why predictions are so difficult, I'd like you to actually define what you mean when you say prediction.
Biao Ma: Prediction is about predicting the trajectory of a target object in the coming frames — typically three to five seconds. By trajectory, I mean the location and the speed of the perceived object.
Luke Renner: So, it's about looking at objects that are moving and guessing where they're going to be a few seconds later, is that right?
Biao Ma: Yes. But it could be one or many trajectories that get predicted.
Luke Renner: So making predictions, not only includes where the object may be headed but also a range of possibilities of where the object could head. Human drivers know that children are far less predictable than adults. So, driving by a kid usually necessitates greater care. So, in a situation like that, presumably, the AV takes in the information of its surroundings, runs it through AI processing, and uses all of this to make a decision.
So, can you give us a little bit of insight into how this is actually possible, you know? Show us under the hood.
Biao Ma: There are three key technical factors for prediction. One is really about the semantic. We can think of semantic as the meaning bounded by the environment or a segmented environment that you're driving. For example, this lane is left turn only; this lane is straight only.
So basically, you can think of a particular section of the driving environment having a common role that is bounded by either the traffic or social convention or a specific area of the targeted driving space.
Luke Renner: So, it's the context. It's the way we can drive on a two-lane highway and not freak out when a car passes us.
Biao Ma: Yeah, exactly. So you have a two-directional highway, even though the upcoming vehicle is close to you and relatively fast, the vehicle or the system shouldn't freak out. Because, you know, the semantic is saying that that lane over there is heading in the opposite direction and this is expected behavior.
Luke Renner: Alright, so that was the first one. And you said there were a couple of others. What are they?
Biao Ma: The second is that objects are, at the end of the day, bounded by physical limitations. Sometimes we say there's no Superman, right? Okay, so the kid may be highly unpredictable in terms of movement but the kid can’t fly.
Luke Renner: So the system can differentiate between a car that could go 60 miles an hour to run you over versus a child who cannot?
Biao Ma: Exactly. Exactly.
Luke Renner: Okay. And then what's the third?
Biao Ma: The third thing is under the hood and there are at least three layers in this category. The first one is relevancy prediction. So, relevancy prediction is there are many objects around me, right? So, a lot of times half these nearby objects are not important nor relevant at all. The question becomes, how do you differentiate, how do you predict that this is or is not relevant? For example, if you are taking a right turn at the intersection, and there’s a car across from you also taking a right turn, both of you are turning right but you aren’t relevant to that driver and that vehicle is not relevant to you.
So that’s the first layer. The second layer is predictions based on trajectory, using the current motion of the target object with the current velocity of the object. So, the question is will this object be important for you to alter your trajectory?
The third is really about whether there is a significant crosspoint of your trajectory that forces you to respond. Or are there other implications of the object by the nature of its classification or some implication related to the place you are traveling at?
There is a little ball in front of you. Do we have the intelligence to anticipate that maybe there is also a little boy because this is a school area?
So that is, again, this is a topic to be further explored and implemented.
Luke Renner: So, just to make sure I understand. So the first thing is, there's a lot of objects in the space. So the first filter is the object anywhere near us and relevant? The second filter is is the object moving toward us in a way that could be relevant? And the third is, is there some context around the object that might make it behave in a way that requires additional caution? Are those the three filters?
Biao Ma: Simply speaking, yes. And in terms of methods to do this, there are learning-based methods, components of which we call predictors or evaluators. Predictors can be specific algorithm-based, optimization-based, or rule-based, so finding different and better ways to do this is really an interesting area for further development.
Luke Renner: Got it. So how good are we at predictions now?
Biao Ma: A simple answer to that is there are significant advances in this area but it is not good enough yet. What I mean by that is there are mechanisms regarding the semantic layer of the high definition map. For example, learning-based methods are trying to provide prediction with such input and there are algorithms implemented based on the semantic and motion of prediction.
So, different prediction mechanisms require the upstream systems to align and require the downstream system to use the information of the predicted trajectory.
Luke Renner: Yeah, so, I'm wondering like, how are the researchers going to know that their algorithms are actually getting better at prediction?
Biao Ma: Think about, if you roll the clock back five seconds in your data, using your tool or your infrastructure. By doing that, you actually get a ground truth of the trajectory that the object will travel in the next few seconds in the future. So, this ground truth can give you a good way to evaluate how good you are at predicting that. You have one ground truth trajectory and you have one predicted trajectory, right? So, that gives you a ground truth.
Luke Renner: You compare what really happened to what your system expected and see how close those two are together.
Biao Ma: Yes.
Luke Renner: So, what are researchers doing to help their autonomous vehicles get better at making predictions?
Biao Ma: There are at least three directions that I can see in the industry. Number one is to reduce the need for input. What I mean by that is, some prediction systems require the knowledge of the semantic or require certain classification provided to it to predict what the object will do. The upstream system needs to tell the AV that this is a pedestrian, this is a motorcycle so that assumption could be reduced so prediction could be smarter in a simple way.
The second direction is about the location of the input and output. How could prediction improve, given higher confidence or higher granularity? And not only improve trajectory prediction but also get better at predicting higher-level information such as what is the vehicle trying to do.
For example, what if we knew beyond which direction the vehicle was traveling? What if we could also predict that this vehicle is going to take a left turn or is trying to change lanes, right? So more behavior-level predictions can be used to help the vehicle drive better, right? So not only the vehicle's direction but also at a higher level. That's the second.
The third is about technical direction wise, the best prediction is communication. No matter how good you are at predicting what I will do, the best way is for me to tell you what I'll do.
So, actually, if we stop thinking of prediction as one subsystem, if we look at the holistic view of the whole stack, some modules or subsystems can grow together.
Luke Renner: This is all very fascinating. So, we've covered occlusion, we've covered prediction. Let's talk about the final challenge, which is fleet planning and control. So, my question for you is what is fleet planning and control and what makes it different from regular autonomous vehicle development?
Biao Ma: The initial phase of autonomous vehicle development has really been about a challenge. Can we make a car drive itself, right? That's the whole idea behind the initial development and the initial bring-up of autonomy, its settings, and its initial set of sensors — to get a proof of concept.
The second phase is really about how to scale up.
Algorithms are being proposed and developed to consider the coordination of a fleet. Instead of each of the vehicles needing to perceive and track and predict what they will do, algorithms, in a centralized way or in a far more efficient way, will provide this to the fleet. So, that is the new opportunity for getting things done more effectively.
Luke Renner: So, last time you were here, I asked you to make predictions. And I'm going to ask you to do it again. So of the three things that we talked about, occlusion, prediction, and fleet management, which of these problems do you think will be solved first?
Biao Ma: I don't think they will be solved one after the other. Solving occlusion, prediction, and fleet management actually requires the next generation of the autonomous vehicle stack. Each of the subsystems will need to work together. So, I do think there will be stuff down the road, not to mention a new generation of designs and methods.
Luke Renner: And how long do you think it's gonna take?
Biao Ma: I'm optimistic. So, I do think in the next three to five years, there will be significant improvement coming to occlusion, prediction, large-scale decision finding, and control.
Luke Renner: Alright, Biao. I appreciate the time. Thanks so much. It was interesting.
Biao Ma: Great. Yeah. Very happy to be here.