neurocat Reports: Simple is better, readability counts

For decisions you can be confident in, you need testing you trust and results you can understand.

With aidkit's augmentations, designed to match reality statistically and visually, you get more data that you can trust will test your model's limits in a way that carries over to the real world.

Once testing is done, our highly customizable reporting gets to the point while still allowing you to dig deeper where you need to. Check it out below or read our blog for a walkthrough.

aidkit Robustness Report

This report presents testing results to enable evidence-based decisions about the next steps in machine learning perception model development and, in the end, validate its safety in the tested conditions. The testing applies neurocat augmentations to original data in order to test model performance in diverse real-world-like conditions that were underrepresented or absent in the original data distribution.

Test details
Dataset Zenseact Open Dataset
ODD tags day, clear, city
neurocat augmentations rain streaks
Model DeepLabV3+ with ResNet50 backbone
Model type semantic segmentation
Class of interest car
Area of interest lower third of image
Performance metrics and targets
Metric Threshold
Pixel accuracy in whole image >70%
True positive pixel accuracy for car class in lower third of image >80%
Probability image will trigger >20% pixel misclassification on >50% of augmentations <10%
Probability image will trigger >20% vanishing pixels for car class in lower third of image on >50% of augmentations <10%

REPORT INTRODUCTION

  1. Data and Augmentation Overviews
    1. Data Summary
    2. Rain (Rain Streaks) Configuration
  2. Model Performance
    1. Model results
  3. Safety Results
    1. Error Rate: Whole image
    2. Error Rate: Object/area of interest
    3. Inspect Inferences of Augmentations
  4. Report Conclusion

1. Data and Augmentation Overviews

This section gives a quick overview of the number of augmentation types, configurations, and frames, along with details on the original data, the models, and the types of augmentations applied. The configuration tables list the full parameters of the generated augmentations.

Data Summary

AUGM. TYPES 1
AUGM. CONFIGURATION 1
ORIGINAL FRAMES 110
AUGM. FRAMES 5500
MODEL VERSIONS 1

Rain (Rain Streaks) Configuration

50 variations, sampled uniformly

Sampled Parameter Range
Fall Speed 7.01 to 8.99
Particle Opacity 1.0 to 1.19
Particle Size 0.00 to 0.01
Precipitation Rate 2.54 to 249.50
Wind Speed Std 0.22 to 1.97
Fixed Parameter Value
Angle of View 50
Ego Speed 50
Height above Ground 1.5
Horizon Height 0.5
Shutter Speed 0.0025
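
As a rough illustration of what "50 variations, sampled uniformly" could look like in practice, the sketch below draws each sampled parameter independently from a uniform distribution and keeps the fixed parameters constant. It assumes the listed ranges are minimum and maximum sampling bounds (they may instead be the observed spread of the drawn values). The dictionary keys and the function name `sample_configurations` are hypothetical and are not part of aidkit.

```python
import random

# Ranges and fixed values read from the configuration tables above.
# Assumption: each range is a (minimum, maximum) pair used as sampling bounds.
# All names here are hypothetical and do not reflect aidkit's actual API.
SAMPLED_RANGES = {
    "fall_speed": (7.01, 8.99),
    "particle_opacity": (1.0, 1.19),
    "particle_size": (0.00, 0.01),
    "precipitation_rate": (2.54, 249.50),
    "wind_speed_std": (0.22, 1.97),
}
FIXED_PARAMETERS = {
    "angle_of_view": 50,
    "ego_speed": 50,
    "height_above_ground": 1.5,
    "horizon_height": 0.5,
    "shutter_speed": 0.0025,
}


def sample_configurations(n_variations, seed=0):
    """Draw n_variations parameter sets; each sampled value is uniform in its range."""
    rng = random.Random(seed)
    configs = []
    for _ in range(n_variations):
        config = dict(FIXED_PARAMETERS)
        for name, (low, high) in SAMPLED_RANGES.items():
            config[name] = rng.uniform(low, high)
        configs.append(config)
    return configs


configs = sample_configurations(50)
original_frames = 110
# Every original frame is rendered once per variation.
augmented_frames = original_frames * len(configs)
print(augmented_frames)  # 5500
```

The product of 110 original frames and 50 variations matches the 5500 augmented frames reported in the data summary.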

Inspect Augmentations

2. Model Performance

This section provides an overview of model performance for assessment by machine learning developers. If more than one augmentation has been selected, model performance is shown separately for each augmentation.

Model Version: DeepLabV3+Resnet50

Evaluation Metric Rain (rain streaks)
mIoU for pixels in entire image 37.09%
mIoU for pixels in lower third 30.33%
Pixel accuracy 72.94%
True positive pixel accuracy/rate lower third of class 'car' 56.41%
True positive pixel accuracy/rate lower third of class 'bus' 98.47%
True positive pixel accuracy/rate lower third of class 'person' 80.56%
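
To make two of the metrics above concrete, here is a minimal sketch of pixel accuracy and of a per-class true positive rate restricted to the lower third of the image. It uses tiny label masks stored as nested lists. The helper names and the way the lower third is cut (the bottom `height // 3` rows) are assumptions for illustration, not the report's exact implementation.

```python
def lower_third(mask):
    """Return the bottom third of a 2D label mask (bottom height // 3 rows)."""
    height = len(mask)
    return mask[height - height // 3:]


def _pixel_pairs(truth, pred):
    """Flatten two equally shaped masks into (truth_label, predicted_label) pairs."""
    return [(t, p) for t_row, p_row in zip(truth, pred) for t, p in zip(t_row, p_row)]


def pixel_accuracy(truth, pred):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    pairs = _pixel_pairs(truth, pred)
    return sum(t == p for t, p in pairs) / len(pairs)


def true_positive_rate(truth, pred, cls):
    """Fraction of ground-truth pixels of class cls that are also predicted as cls."""
    predicted = [p for t, p in _pixel_pairs(truth, pred) if t == cls]
    if not predicted:
        return None  # class does not occur in this region
    return sum(p == cls for p in predicted) / len(predicted)


# Toy 3x4 masks: the model misses the right half of the car.
truth = [
    ["sky", "sky", "sky", "sky"],
    ["road", "car", "car", "road"],
    ["road", "car", "car", "road"],
]
pred = [
    ["sky", "sky", "sky", "sky"],
    ["road", "car", "road", "road"],
    ["road", "car", "road", "road"],
]

print(round(pixel_accuracy(truth, pred), 4))  # 0.8333
print(true_positive_rate(lower_third(truth), lower_third(pred), "car"))  # 0.5
```

The toy example shows how a model can keep a high whole-image pixel accuracy while its true positive rate for one class in the area of interest is much lower, which is the pattern visible for class 'car' in the results above.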

3. Safety Results

This section provides overviews of the error patterns and rates in tabular and visualizer formats to aid the assessment of safety engineers and managers.

Error Rate: Whole image

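Error Pattern: More than 1% pixel misclassification
  Error on at least one augmentation: 100.00%
  Error on more than 10% of the augmentations: 100.00%
  Error on more than 25% of the augmentations: 100.00%
  Error on more than 50% of the augmentations: 100.00%
Error Pattern: More than 10% pixel misclassification
  Error on at least one augmentation: 100.00%
  Error on more than 10% of the augmentations: 99.09%
  Error on more than 25% of the augmentations: 97.27%
  Error on more than 50% of the augmentations: 90.91%
Error Pattern: More than 20% pixel misclassification
  Error on at least one augmentation: 96.36%
  Error on more than 10% of the augmentations: 92.73%
  Error on more than 25% of the augmentations: 83.64%
  Error on more than 50% of the augmentations: 70.00%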

The table above lists error rates for different error patterns across error prevalence classes. Each cell is an empirical estimate of the probability that a randomly selected image in the input data set falls into the respective error prevalence class.

Data subset Zenseact Open Dataset (subset: day, clear, city)
Model DeepLabV3+Resnet50
Augmentation Rain (Rain streaks)
Number of frames 110
Number of augmentations per frame 50
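
The error prevalence classes can be read as a two-step count: for each image, determine the share of its augmentations that trigger the error pattern, then report the share of images whose trigger share exceeds the prevalence threshold. The sketch below follows that reading on toy data. The function name, the input layout, and the use of strict "greater than" comparisons are assumptions, not a description of aidkit's internals.

```python
def prevalence_table(error_fractions,
                     error_thresholds=(0.01, 0.10, 0.20),
                     prevalence_thresholds=(0.0, 0.10, 0.25, 0.50)):
    """Estimate error prevalence classes.

    error_fractions[i][j] is the misclassified pixel fraction of image i
    under augmentation j. A prevalence threshold of 0.0 stands for
    "error on at least one augmentation".
    Returns {error_threshold: [share of images per prevalence class]}.
    """
    table = {}
    for err in error_thresholds:
        # Per image: share of augmentations that trigger the error pattern.
        trigger_shares = [
            sum(f > err for f in fractions) / len(fractions)
            for fractions in error_fractions
        ]
        # Per prevalence class: share of images exceeding that trigger share.
        table[err] = [
            sum(s > prev for s in trigger_shares) / len(trigger_shares)
            for prev in prevalence_thresholds
        ]
    return table


# Toy data: 4 images with 4 augmentations each (the report uses 110 and 50).
toy = [
    [0.005, 0.005, 0.005, 0.005],
    [0.02, 0.0, 0.0, 0.0],
    [0.15, 0.15, 0.15, 0.05],
    [0.3, 0.3, 0.3, 0.3],
]
for threshold, row in prevalence_table(toy).items():
    print(threshold, row)
# 0.01 [0.75, 0.75, 0.5, 0.5]
# 0.1 [0.5, 0.5, 0.5, 0.5]
# 0.2 [0.25, 0.25, 0.25, 0.25]
```

As in the report's table, values can only stay equal or decrease from left to right within a row, because each prevalence class is a stricter condition than the one before it.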

Error Rate: Object and area of interest

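Error Pattern: More than 1% vanishing of pixels of class 'car' in region 'lower third'
  Error on at least one augmentation: 92.73%
  Error on more than 10% of the augmentations: 92.73%
  Error on more than 25% of the augmentations: 92.73%
  Error on more than 50% of the augmentations: 90.91%
Error Pattern: More than 10% vanishing of pixels of class 'car' in region 'lower third'
  Error on at least one augmentation: 89.09%
  Error on more than 10% of the augmentations: 86.36%
  Error on more than 25% of the augmentations: 80.00%
  Error on more than 50% of the augmentations: 70.00%
Error Pattern: More than 20% vanishing of pixels of class 'car' in region 'lower third'
  Error on at least one augmentation: 86.36%
  Error on more than 10% of the augmentations: 75.45%
  Error on more than 25% of the augmentations: 70.00%
  Error on more than 50% of the augmentations: 60.91%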

The table above lists error rates for different error patterns across error prevalence classes. Each cell is an empirical estimate of the probability that a randomly selected image in the input data set falls into the respective error prevalence class.

Data subset Zenseact Open Dataset (subset: day, clear, city)
Model DeepLabV3+Resnet50
Augmentation Rain (Rain streaks)
Number of frames 110
Number of augmentations per frame 50
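
The report does not spell out how "vanishing" pixels are counted. One plausible reading, used in the sketch below purely as an assumption, is the share of pixels predicted as the class of interest on the original frame that are no longer predicted as that class on the augmented frame, counted only inside the area of interest. The function name and arguments are hypothetical.

```python
def vanishing_fraction(original_pred, augmented_pred, cls, rows):
    """Share of cls pixels in the original prediction (within the given rows)
    that are not predicted as cls on the augmented frame.

    Assumed definition for illustration; the report may count differently.
    """
    kept = vanished = 0
    for r in rows:
        for before, after in zip(original_pred[r], augmented_pred[r]):
            if before == cls:
                if after == cls:
                    kept += 1
                else:
                    vanished += 1
    total = kept + vanished
    return vanished / total if total else 0.0


# Toy 3x4 predictions; the lower third is the last row.
original_pred = [
    ["sky", "sky", "sky", "sky"],
    ["road", "road", "road", "road"],
    ["car", "car", "car", "car"],
]
augmented_pred = [
    ["sky", "sky", "sky", "sky"],
    ["road", "road", "road", "road"],
    ["car", "car", "car", "road"],
]

fraction = vanishing_fraction(original_pred, augmented_pred, "car", rows=range(2, 3))
print(fraction)         # 0.25
print(fraction > 0.20)  # True: this augmentation triggers the ">20% vanishing" pattern
```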

Inspect Inferences of Augmentations

Inferences on augmented frames that trigger the selected error pattern in the area of interest (with original frames for comparison).

4. Report Conclusion

The above report provided information for data, machine learning, and safety experts to assess the robustness of the tested model against the selected data augmentations. 

Given the thresholds selected when submitting the data for this test, the following headline observations can be made about the results. These key observations should be considered alongside the conclusions your experts reach based on the full data in the report.

Metric Threshold Result
Pixel accuracy in whole image >70% 72.94% Pass
True positive pixel accuracy for car class in lower third of image >80% 56.41% Fail
Probability image will trigger >20% pixel misclassification on >50% of augmentations <10% 70.00% Fail
Probability image will trigger >20% vanishing pixels for car class in lower third of image on >50% of augmentations <10% 60.91% Fail
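
The Pass and Fail verdicts above follow from comparing each result with its threshold in the stated direction. A minimal sketch of that comparison, using the values from the table and hypothetical shortened metric names, could look like this:

```python
import operator

# (metric, comparison, threshold in %, result in %), taken from the table above.
# The metric labels are shortened for readability; the structure is hypothetical.
CHECKS = [
    ("Pixel accuracy, whole image", operator.gt, 70.0, 72.94),
    ("True positive pixel accuracy, car, lower third", operator.gt, 80.0, 56.41),
    ("P(>20% misclassification on >50% of augmentations)", operator.lt, 10.0, 70.00),
    ("P(>20% vanishing car pixels on >50% of augmentations)", operator.lt, 10.0, 60.91),
]


def verdicts(checks):
    """Return (metric, 'Pass' or 'Fail') for each threshold check."""
    return [
        (name, "Pass" if compare(result, threshold) else "Fail")
        for name, compare, threshold, result in checks
    ]


for name, verdict in verdicts(CHECKS):
    print(f"{verdict}: {name}")
# Pass: Pixel accuracy, whole image
# Fail: True positive pixel accuracy, car, lower third
# Fail: P(>20% misclassification on >50% of augmentations)
# Fail: P(>20% vanishing car pixels on >50% of augmentations)
```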

Driving safe perception

Continue your journey with us

Don’t just manage risks. Mitigate your perception system’s chances of encountering unknown situations and enhance its resilience.

© neurocat GmbH
