
aidkit Robustness Report

This report presents test results to support evidence-based decisions about the next steps in developing a machine learning perception model and, ultimately, to validate its safety under the tested conditions. The testing applies neurocat augmentations to the original data in order to evaluate model performance under diverse, real-world-like conditions that were underrepresented or absent in the original data distribution.

Test details
Dataset: RailSem19 (license)
neurocat augmentations: rain drops
Model: rail_markings GitHub project, based on the BiSeNetV2 algorithm (license)
Model type: semantic segmentation
Classes of interest: rail-raised, rail-track
Area of interest: lower half of image
Performance metrics and targets
Metric | Threshold
mIoU for pixels in entire image | >70%
True positive pixel accuracy for rail-raised in lower half of image | >80%
Probability that an image triggers >20% pixel misclassification on >50% of augmentations | <10%
Probability that an image triggers >20% vanishing pixels for rail-raised class in lower half of image on >25% of augmentations | <10%

REPORT INTRODUCTION

  1. Data and Augmentation Overviews
    1. Data Summary
    2. Rain (Rain Drops) Configuration
    3. Inspect Augmentations
  2. Model Performance
    1. Model results
  3. Safety Results
    1. Error Rate: Whole image
    2. Error Rate: Object/area of interest
  4. Report Conclusion

1. Data and Augmentation Overviews

This section gives a quick overview of the number of augmentation types, configurations, and frames, together with details of the original data, the model, and the augmentations applied. The configuration tables list the full parameters of the generated augmentations.

Data Summary

Augmentation types: 1
Augmentation configurations: 1
Original frames: 100
Augmented frames: 5000
Model versions: 1

Rain (Rain Drops) Configuration

50 variations, sampled uniformly

Sampled parameter | Range
Average drop size | 2.08 to 3.99
Drop density | 6489570
Fixed parameter | Value
Angle of view | 50
Distance to windshield | 100
Ego speed | 40
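The report states that the 50 rain-drop variations were "sampled uniformly" within the parameter ranges above. It does not say how. The sketch below shows one possible reading, in which each variation independently draws every sampled parameter from a uniform distribution and copies the fixed parameters unchanged. This is not aidkit's actual sampler. The parameter keys, the seed, and the reading of the extracted drop-size range as 2.08 to 3.99 are assumptions.

```python
import random

# Fixed parameters as listed in the configuration table (units are not stated in the report).
FIXED_PARAMS = {"angle_of_view": 50, "distance_to_windshield": 100, "ego_speed": 40}

# Assumption: the average drop size range is read as 2.08 to 3.99.
# Hypothetical key name; drop density would be added here in the same way.
SAMPLED_RANGES = {"average_drop_size": (2.08, 3.99)}


def sample_variations(n, sampled_ranges, fixed_params, seed=0):
    """Draw n parameter sets: each sampled parameter i.i.d. uniform in [low, high],
    fixed parameters copied unchanged into every set."""
    rng = random.Random(seed)  # seeded so the same variations can be regenerated
    variations = []
    for _ in range(n):
        params = dict(fixed_params)
        for name, (low, high) in sampled_ranges.items():
            params[name] = rng.uniform(low, high)
        variations.append(params)
    return variations


variations = sample_variations(50, SAMPLED_RANGES, FIXED_PARAMS)
print(len(variations))  # 50
```

Applying each of the 50 variations to all 100 original frames yields the 5000 augmented frames reported in the data summary. A grid over the ranges would be an equally valid reading of "sampled uniformly".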

Inspect Augmentations

2. Model Performance

This section gives an overview of model performance for assessment by machine learning developers. If more than one augmentation has been selected, model performance is shown separately for each augmentation.

Model Version: V1 (ID 5)

Evaluation metric | Adherent Rain Drops (Rain Drops)
mIoU for pixels in entire image | 92.90%
mIoU for pixels in lower half | 92.47%
Pixel accuracy | 98.92%
True positive pixel accuracy/rate, lower half, class 'rail-raised' | 92.02%
False positive pixel accuracy/rate, lower half, class 'rail-raised' | 99.55%
True positive pixel accuracy/rate, lower half, class 'rail-track' | 95.67%
False positive pixel accuracy/rate, lower half, class 'rail-track' | 99.35%

3. Safety Results

This section gives an overview of error patterns and error rates in tabular and visual form to support assessment by safety engineers and managers.

Error Rate for Adherent Rain Drops Augmentation (Rain Drops)

Error pattern | Error on at least one augmentation | Error on more than 10% of the augmentations | Error on more than 25% of the augmentations | Error on more than 50% of the augmentations
More than 1% pixel misclassification in region 'lower half' | 98.00% | 84.00% | 64.00% | 50.00%
More than 10% pixel misclassification in region 'lower half' | 28.00% | 8.00% | 5.00% | 1.00%
More than 20% pixel misclassification in region 'lower half' | 5.00% | 1.00% | 1.00% | 0.00%

The table above lists error rates for different error patterns (rows) across error prevalence classes (columns). Each cell is an empirical estimate of the probability that a randomly selected image from the input data set falls into the respective error prevalence class.

Data subset: RailSem19-512x1024 (subset-100-images)
Model: v1 (ID 5)
Augmentation: Adherent Rain Drops Augmentation (Rain Drops)
Number of frames: 100
Number of augmentations per frame: 50
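The prevalence table can be read as a two-level threshold. Suppose each frame has one error fraction per augmentation, for example the share of misclassified pixels in the lower half. An error pattern such as "more than 10% misclassification" marks each augmentation as erroneous or not. A prevalence class such as "error on more than 25% of the augmentations" then marks each frame, and the cell is the share of frames so marked. The sketch below implements this reading. The function name, the output keys, and the use of strict ">" comparisons are assumptions, not aidkit's API.

```python
def prevalence_table(error_fractions, pattern_thresholds, prevalence_thresholds):
    """
    error_fractions: one list per frame, holding that frame's per-augmentation
    error fraction (e.g. share of misclassified pixels in the lower half).
    Returns {pattern_threshold: {prevalence_label: share_of_frames}}.
    """
    n_frames = len(error_fractions)
    table = {}
    for p in pattern_thresholds:
        # Per frame: share of its augmentations whose error fraction exceeds p.
        shares = [sum(f > p for f in frame) / len(frame) for frame in error_fractions]
        # "Error on at least one augmentation" means a share greater than zero.
        row = {"at least one": sum(s > 0 for s in shares) / n_frames}
        for q in prevalence_thresholds:
            row[f"more than {q:.0%}"] = sum(s > q for s in shares) / n_frames
        table[p] = row
    return table


# Toy example: two frames with four augmentations each (made-up values).
frames = [[0.005, 0.02, 0.15, 0.30], [0.0, 0.0, 0.0, 0.012]]
print(prevalence_table(frames, [0.01, 0.10, 0.20], [0.10, 0.25, 0.50]))
```

With the report's setup, `error_fractions` would contain 100 lists of 50 values each. The pattern thresholds would be 1%, 10% and 20%, and the prevalence thresholds 10%, 25% and 50%.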

Error Rate: Object and area of interest

Error pattern | Error on at least one augmentation | Error on more than 10% of the augmentations | Error on more than 25% of the augmentations | Error on more than 50% of the augmentations
More than 1% vanishing of pixels of class 'rail-raised' in region 'lower half' | 100.00% | 100.00% | 100.00% | 97.00%
More than 10% vanishing of pixels of class 'rail-raised' in region 'lower half' | 92.00% | 67.00% | 47.00% | 19.00%
More than 20% vanishing of pixels of class 'rail-raised' in region 'lower half' | 63.00% | 27.00% | 12.00% | 3.00%

The table above lists error rates for different error patterns (rows) across error prevalence classes (columns). Each cell is an empirical estimate of the probability that a randomly selected image from the input data set falls into the respective error prevalence class.

Data subset: RailSem19-512x1024 (subset-100-images)
Model: v1 (ID 5)
Augmentation: Adherent Rain Drops Augmentation (Rain Drops)
Number of frames: 100
Number of augmentations per frame: 50
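The report does not define "vanishing of pixels". The sketch below assumes it is the share of 'rail-raised' pixels in the lower half of a reference mask that are no longer predicted as 'rail-raised' on an augmented frame. The reference can be either the ground truth or the model's prediction on the clean frame; which one aidkit uses is not stated, so the function accepts either. The class index `RAIL_RAISED` is hypothetical.

```python
import numpy as np

RAIL_RAISED = 1  # hypothetical class index; the report does not state the label IDs


def vanishing_fraction(reference, pred_augmented, cls=RAIL_RAISED):
    """
    Share of `cls` pixels in the lower half of `reference` (ground truth or
    clean-frame prediction) that are not predicted as `cls` on the augmented frame.
    Returns 0.0 when the reference contains no `cls` pixels in the lower half.
    """
    h = reference.shape[0] // 2
    ref = reference[h:] == cls        # reference rail-raised pixels in the lower half
    kept = pred_augmented[h:] == cls  # pixels still predicted as rail-raised
    n_ref = int(ref.sum())
    if n_ref == 0:
        return 0.0
    return float((ref & ~kept).sum() / n_ref)


# Toy 4x2 example: 3 reference pixels in the lower half, only 1 survives the augmentation.
reference = np.array([[1, 1], [0, 0], [1, 1], [1, 0]])
augmented = np.array([[0, 0], [0, 0], [1, 0], [0, 0]])
print(vanishing_fraction(reference, augmented) > 0.20)  # True: counts toward the ">20% vanishing" pattern
```

Per-augmentation values computed this way can be thresholded at 1%, 10% and 20%. They can then be aggregated over the 50 augmentations of each frame into the prevalence classes shown in the table above.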

4. Report Conclusion

This report has provided information for data, machine learning, and safety experts to assess the robustness of the tested model against the selected data augmentations.

Given the thresholds selected when submitting the data for this test, the following headline observations can be made about the results. These key observations should be considered alongside the conclusions your experts reach based on the full data in the report.

Metric | Threshold | Result | Outcome
mIoU for pixels in entire image | >70% | 92.90% | Pass
True positive pixel accuracy for rail-raised in lower half of image | >80% | 92.02% | Pass
Probability that an image triggers >20% pixel misclassification on >50% of augmentations | <10% | 0.00% | Pass
Probability that an image triggers >20% vanishing pixels for rail-raised class in lower half of image on >25% of augmentations | <10% | 12.00% | Fail
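The pass/fail outcomes follow from comparing each result with its threshold. The first two metrics must exceed the threshold, and the two probability metrics must stay below it. The sketch below encodes that check, using strict comparisons as written in the thresholds. The results are copied from the table above. The metric labels and the `evaluate` helper are illustrative and not part of aidkit.

```python
import operator

# (metric, comparison that must hold, threshold, observed result), as fractions.
CHECKS = [
    ("mIoU, entire image", operator.gt, 0.70, 0.9290),
    ("True positive accuracy, rail-raised, lower half", operator.gt, 0.80, 0.9202),
    ("P(>20% misclassification on >50% of augmentations)", operator.lt, 0.10, 0.0000),
    ("P(>20% rail-raised vanishing on >25% of augmentations)", operator.lt, 0.10, 0.1200),
]


def evaluate(checks):
    """Return (metric, 'Pass' or 'Fail') by applying each comparison to (result, threshold)."""
    return [
        (name, "Pass" if compare(result, threshold) else "Fail")
        for name, compare, threshold, result in checks
    ]


for name, outcome in evaluate(CHECKS):
    print(f"{outcome}: {name}")
```

Only the vanishing-pixel criterion fails: 12.00% of images lose more than 20% of their rail-raised pixels on more than a quarter of the augmentations, above the 10% limit.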

© neurocat GmbH
