ALPHA DRIVE | CARLA Leaderboard Case Study

CARLA Leaderboard Case Study:

A Partnership for Third-Party Evaluation

Alpha Drive and the CARLA team partnered to create the first public simulation benchmark to further research.

The CARLA open-source simulator team and Alpha Drive partnered to give researchers an open platform for fair and reproducible evaluations, simplifying comparison between different approaches.

The CARLA team created the evaluation baselines, while the Alpha Drive platform provides cloud orchestration and benchmarking capabilities that let developers compare their work against each other and against the latest research in the field.


The CARLA Autonomous Driving Leaderboard is provided FREE for public participation through the additional help of sponsors:

Third Party Data Sets

The CARLA team defined a common validation and test set of environments and scenarios for evaluation.

For the CARLA Leaderboard, the CARLA team developed various traffic situations based on the NHTSA pre-crash typology.


Agents experience multiple instances of 10 traffic scenarios when evaluated. Some of the scenarios include lane merging, lane changing, negotiations at traffic intersections, coping with pedestrians, and other elements.

A full list of the developed scenarios can be found here.

Participants are given access to a base set of environments and scenarios created by the CARLA team.

When teams are ready, they may submit their agents for evaluation using the Alpha Drive platform.

When submitting to the leaderboard, teams do not have access to the final set of environments and scenarios; to eliminate bias, they see only their final scores.

Metrics and Measurements for Autonomous Vehicle Evaluation

How do you determine driver risk when the driver is an AI?

Previously, human behavior determined risk in all driving scenarios. Now, algorithm behavior plays an increasingly pivotal role in on-road risk, and it will continue to do so as we move toward a more autonomous future.

Current standards for measuring autonomous vehicle safety rely on driver disengagements. This is neither an accurate measure, nor even possible once the safety driver is removed from a fully autonomous system.

Data has always informed risk.

The data we collect and how we collect it needs to shift.

The RAND Corporation has outlined ways to drive to safety and what would be necessary to create a framework for measuring automated vehicle safety. We aim to follow their framework.

The metrics and measurements for evaluating automated and autonomous systems are an area that still needs to be defined and refined. Multiple stakeholders have differing vantage points and data to help inform the evaluation.

Alpha Drive provides the platform to enable key stakeholders (i.e. regulators, insurers, or otherwise) to define metrics and measurements for evaluation. The platform gives the automated and autonomous system developers access to this data while in development. It creates a data feedback loop to refine the right metrics and measurements for evaluation as the technology matures and provides transparency as the technology is deployed to the market.

A Starting Point for Metrics and Measurements

The CARLA Leaderboard sets a public example for evaluation and comparison.

This is just a start; by publishing this data publicly, we hope to engage further with other stakeholders and use their data to help develop more advanced metrics and measurements.


The driving proficiency of an agent can be characterized by multiple metrics. For this leaderboard, the CARLA team selected a set of metrics that help understand different aspects of driving.


  • Driving score: $\frac{1}{N}\sum_{i}^{N}R_{i}P_{i}$ — The main metric of the leaderboard, aggregating the average route completion and the number of traffic infractions. Here $N$ is the number of routes, $R_{i}$ is the percentage of completion of the $i$-th route, and $P_{i}$ is the infraction penalty of the $i$-th route.
  • Route completion: $\frac{1}{N}\sum_{i}^{N}R_{i}$ — Percentage of route distance completed by an agent, averaged across the $N$ routes.
  • Infraction penalty: $P_{i} = \prod_{j}^{\text{ped.},\ldots,\text{stop}}(p^{j})^{\#\text{infractions}_{i}^{j}}$ — Aggregates the infractions triggered by an agent as a geometric series. Agents start with an ideal base score of $1.0$, which is multiplied by the corresponding penalty coefficient for every infraction committed.
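The aggregation above can be sketched in a few lines of Python. This is a minimal illustration of the formulas, not the leaderboard's actual evaluation code; the per-route values $R_i$ and $P_i$ below are hypothetical.

```python
def driving_score(completions, penalties):
    """Main leaderboard metric: mean of R_i * P_i across the N routes."""
    assert len(completions) == len(penalties)
    n = len(completions)
    return sum(r * p for r, p in zip(completions, penalties)) / n

def route_completion(completions):
    """Percentage of route distance completed, averaged across N routes."""
    return sum(completions) / len(completions)

# Hypothetical results for three routes:
R = [100.0, 80.0, 60.0]   # percentage of each route completed
P = [1.0, 0.5, 0.65]      # infraction penalty earned on each route

print(route_completion(R))   # average completion across routes
print(driving_score(R, P))   # completion weighted by infraction penalties
```

Note how a low penalty on a single route (here $P_2 = 0.5$) drags the driving score well below the raw route completion, which is the intended effect of the aggregate metric.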
View the Official CARLA Leaderboard Live Results


The CARLA leaderboard offers individual metrics for a series of infractions. Each has a penalty coefficient that is applied every time the infraction occurs. Ordered by severity, the infractions and their coefficients are the following.

  • Collisions with pedestrians — $0.50$
  • Collisions with other vehicles — $0.60$
  • Collisions with static elements — $0.65$
  • Running a red light — $0.70$
  • Running a stop sign — $0.80$

Besides these, there is one additional infraction which has no coefficient, and instead affects the computation of route completion $(R_{i})$.

  • Off-road driving — If an agent drives off-road, that percentage of the route will not be considered towards the computation of the route completion score.
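Using the coefficients listed above, the per-route infraction penalty $P_i$ can be sketched as a running product. The infraction counts in the example are hypothetical, and the dictionary keys are illustrative names, not identifiers from the CARLA codebase.

```python
# Penalty coefficients from the leaderboard, ordered by severity.
PENALTY = {
    "collision_pedestrian": 0.50,
    "collision_vehicle":    0.60,
    "collision_static":     0.65,
    "red_light":            0.70,
    "stop_sign":            0.80,
}

def infraction_penalty(counts):
    """Start from the ideal base score of 1.0 and multiply by the
    coefficient once for every recorded infraction of each type."""
    p = 1.0
    for kind, n in counts.items():
        p *= PENALTY[kind] ** n
    return p

# Hypothetical route: two red lights run and one vehicle collision.
print(infraction_penalty({"red_light": 2, "collision_vehicle": 1}))
# 0.70 * 0.70 * 0.60 ≈ 0.294
```

Because each infraction multiplies the score rather than subtracting from it, repeated infractions compound quickly, but the penalty never goes negative.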

Additional Events

Some events will interrupt the simulation, preventing the agent from continuing.

  • Route deviation — If an agent deviates more than $30$ meters from the assigned route.
  • Agent blocked — If an agent is blocked in traffic without taking any actions for $180$ simulation seconds.
  • Simulation timeout — If no client-server communication can be established in $60$ seconds.
  • Route timeout — This timeout is triggered if the simulation of a route takes more than $0.8 \times \text{route\_distance\_in\_meters}$ seconds.
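The interruption conditions above can be collected into a single check. The thresholds come from the text; the function and parameter names are illustrative assumptions, not part of the CARLA or Alpha Drive APIs.

```python
# Thresholds stated in the leaderboard rules.
ROUTE_DEVIATION_M = 30.0    # max deviation from the assigned route
BLOCKED_TIMEOUT_S = 180.0   # max simulation seconds without any action
SIM_TIMEOUT_S = 60.0        # max seconds without client-server communication

def route_timeout_s(route_distance_m):
    """Route timeout budget: 0.8 seconds per meter of route."""
    return 0.8 * route_distance_m

def should_interrupt(deviation_m, blocked_s, no_comms_s,
                     elapsed_s, route_distance_m):
    """Return True if any interruption event has been triggered."""
    return (deviation_m > ROUTE_DEVIATION_M
            or blocked_s >= BLOCKED_TIMEOUT_S
            or no_comms_s >= SIM_TIMEOUT_S
            or elapsed_s > route_timeout_s(route_distance_m))
```

For example, a 1000 m route gets an 800-second simulation budget, and drifting 31 m off the assigned route would end the run regardless of elapsed time.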

Let’s do more faster.