neurocat Reports: Simple is better, readability counts

For decisions you can be confident in, you need testing you trust and results you can understand.

aidkit's augmentations are designed to match reality statistically and visually, so the additional data you get tests your model's limits in a way you can trust to carry over to the real world.

Once testing is done, our highly customizable reporting gets to the point while still allowing you to dig deeper where you need to. Check it out below or read our blog for a walkthrough.

aidkit Robustness Report

This report presents testing results to enable evidence-based decisions about the next steps in machine learning perception model development and, in the end, validate its safety in the tested conditions. The testing applies neurocat augmentations to original data in order to test model performance in diverse real-world-like conditions that were underrepresented or absent in the original data distribution.

Test details

Dataset	Zenseact Open Dataset
ODD tags	day, clear, city
neurocat augmentations	rain streaks
Model	DeepLabV3+ with ResNet50 backbone
Model type	semantic segmentation
Class of interest	car
Area of interest	lower third of image

Performance metrics and targets

Metric	Threshold
Pixel accuracy in whole image	>70%
True positive pixel accuracy for car class in lower third of image	>80%
Probability image will trigger >20% pixel misclassification on >50% of augmentations	<10%
Probability image will trigger >20% vanishing pixels for car class in lower third of image on >50% of augmentations	<10%

REPORT INTRODUCTION

Data and Augmentation Overviews
1. Data Summary
2. Rain (Rain Streaks) Configuration
Model Performance
1. Model results
Safety Results
1. Error Rate: Whole image
2. Error Rate: Object/area of interest
3. Inspect Inferences of Augmentations
Report Conclusion

1

Data and Augmentation Overviews

This section provides a quick overview of the number of augmentation types, configurations, and frames, along with details on the original data, the models, and the types of augmentations applied. The configuration tables provide the full parameters of the generated augmentations.

Data Summary

AUGM. TYPES	1
AUGM. CONFIGURATION
ORIGINAL FRAMES	110
AUGM. FRAMES	5500
MODEL VERSIONS	1

Rain (Rain Streaks) Configuration

50 variations, sampled uniformly

Sampled Parameter	Range
Fall Speed	7.01 to 8.99
Particle Opacity	1.0 to 1.19
Particle Size	0.00 to 0.01
Precipitation Rate	2.54 to 249.50
Wind Speed Std	0.22 to 1.97

Fixed Parameter	Value
Angle of View	50
Ego Speed	50
Height above Ground	1.5
Horizon Height	0.5
Shutter Speed	0.0025
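
As a rough illustration of what "50 variations, sampled uniformly" can mean, the sketch below draws each sampled parameter independently from a uniform distribution and combines it with the fixed parameters. It is not aidkit's API: the function and key names are hypothetical, the listed ranges are treated as sampling bounds, and reusing the same 50 variations for every original frame is an assumption.

```python
import random

# Sampled parameters, with the listed ranges treated as sampling bounds (assumption).
SAMPLED_RANGES = {
    "fall_speed": (7.01, 8.99),
    "particle_opacity": (1.0, 1.19),
    "particle_size": (0.00, 0.01),
    "precipitation_rate": (2.54, 249.50),
    "wind_speed_std": (0.22, 1.97),
}

# Fixed parameters, copied from the configuration table.
FIXED_PARAMS = {
    "angle_of_view": 50,
    "ego_speed": 50,
    "height_above_ground": 1.5,
    "horizon_height": 0.5,
    "shutter_speed": 0.0025,
}


def sample_configurations(n_variations, seed=0):
    """Draw n_variations configurations, each parameter uniform over its range."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    configs = []
    for _ in range(n_variations):
        config = dict(FIXED_PARAMS)
        for name, (low, high) in SAMPLED_RANGES.items():
            config[name] = rng.uniform(low, high)
        configs.append(config)
    return configs


configs = sample_configurations(50)
n_original_frames = 110
# Every original frame is augmented once per variation (assumption).
n_augmented_frames = n_original_frames * len(configs)
print(n_augmented_frames)  # 5500
```

Under these assumptions, 110 original frames and 50 variations give the 5500 augmented frames reported in the data summary.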

Inspect Augmentations

2

Model Performance

This section provides an overview of model performance for assessment by machine learning developers. If more than one augmentation has been selected, model performance is shown separately for each augmentation.

Model Version: DeepLabV3+Resnet50

Evaluation Metric	Rain (rain streaks)
mIoU for pixels in entire image	37.09%
mIoU for pixels in lower third	30.33%
Pixel accuracy	72.94%
True positive pixel accuracy/rate lower third of class 'car'	56.41%
True positive pixel accuracy/rate lower third of class 'bus'	98.47%
True positive pixel accuracy/rate lower third of class 'person'	80.56%
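
For readers who want to see how two of these metrics are typically computed, here is a minimal sketch using plain Python lists as label maps. The helper names are hypothetical, and two details are assumptions rather than aidkit's documented definitions: the "lower third" is taken as the bottom third of the image rows, and the true positive rate is measured against ground-truth labels.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    total = correct = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            correct += (p == t)
    return correct / total


def lower_third(label_map):
    """Bottom third of the rows of a label map (assumed region definition)."""
    height = len(label_map)
    return label_map[height - height // 3:]


def true_positive_rate(pred, truth, cls):
    """Share of ground-truth `cls` pixels that are also predicted as `cls`."""
    positives = hits = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if t == cls:
                positives += 1
                hits += (p == cls)
    return hits / positives if positives else None


# Toy 6x4 label map: two rows each of sky and road, then a car in the bottom two rows.
truth = [
    ["sky", "sky", "sky", "sky"],
    ["sky", "sky", "sky", "sky"],
    ["road", "road", "road", "road"],
    ["road", "road", "road", "road"],
    ["road", "car", "car", "road"],
    ["road", "car", "car", "road"],
]
pred = [row[:] for row in truth]
pred[0][0] = "road"  # one sky pixel misclassified
pred[4][1] = "road"  # one car pixel missed

print(pixel_accuracy(pred, truth))  # 22 of 24 pixels correct
print(true_positive_rate(lower_third(pred), lower_third(truth), "car"))  # 0.75
```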

3

Safety Results

This section provides overviews of the error patterns and rates in tabular and visualizer formats to aid the assessment of safety engineers and managers.

Error Rate: Whole image

Error Pattern	Error on at least one augmentation	Error on more than 10% of the augmentations	Error on more than 25% of the augmentations	Error on more than 50% of the augmentations
More than 1% pixel misclassification	100.00%	100.00%	100.00%	100.00%
More than 10% pixel misclassification	100.00%	99.09%	97.27%	90.91%
More than 20% pixel misclassification	96.36%	92.73%	83.64%	70.00%

The table above lists error rates for different error patterns across error prevalence classes. Each cell is an empirical estimate of the probability that a randomly selected image in the input data set falls into the respective error prevalence class.

Data subset	Zenseact Open Dataset (subset: day, clear, city)
Model	DeepLabV3+Resnet50
Augmentation	Rain (Rain streaks)
Number of frames	110
Number of augmentations per frame	50
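
One way to read the table above: for each frame, count the share of its augmentations that trigger the error pattern, then report the share of frames whose count exceeds each prevalence level. The sketch below follows that reading on toy data. The function name and input layout are hypothetical, and the exact counting rules (strict "more than" comparisons) are assumptions taken from the column headers.

```python
def error_prevalence_table(error_fractions, severity_levels, prevalence_levels):
    """Build the error-rate table from per-frame, per-augmentation error fractions.

    error_fractions[i][j] is the fraction of misclassified pixels for
    frame i under augmentation j.
    """
    n_frames = len(error_fractions)
    table = {}
    for severity in severity_levels:
        # Share of each frame's augmentations that exceed the severity level.
        shares = [
            sum(f > severity for f in frame) / len(frame)
            for frame in error_fractions
        ]
        row = {"at least one": sum(s > 0 for s in shares) / n_frames}
        for prevalence in prevalence_levels:
            row[prevalence] = sum(s > prevalence for s in shares) / n_frames
        table[severity] = row
    return table


# Toy data: 4 frames with 4 augmentations each.
toy = [
    [0.05, 0.15, 0.25, 0.30],
    [0.02, 0.03, 0.04, 0.05],
    [0.25, 0.30, 0.35, 0.40],
    [0.05, 0.05, 0.05, 0.22],
]
table = error_prevalence_table(toy, [0.01, 0.10, 0.20], [0.10, 0.25, 0.50])
print(table[0.20])
# {'at least one': 0.75, 0.1: 0.75, 0.25: 0.5, 0.5: 0.25}
```

With 110 frames, a cell value of 70.00% corresponds to 77 frames falling into that prevalence class.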

Error Rate: Object and area of interest

Error Pattern	Error on at least one augmentation	Error on more than 10% of the augmentations	Error on more than 25% of the augmentations	Error on more than 50% of the augmentations
More than 1% vanishing of pixels of class 'car' in region 'lower third'	92.73%	92.73%	92.73%	90.91%
More than 10% vanishing of pixels of class 'car' in region 'lower third'	89.09%	86.36%	80.00%	70.00%
More than 20% vanishing of pixels of class 'car' in region 'lower third'	86.36%	75.45%	70.00%	60.91%

The table above lists error rates for different error patterns across error prevalence classes. Each cell is an empirical estimate of the probability that a randomly selected image in the input data set falls into the respective error prevalence class.

Data subset	Zenseact Open Dataset (subset: day, clear, city)
Model	DeepLabV3+Resnet50
Augmentation	Rain (Rain streaks)
Number of frames	110
Number of augmentations per frame	50
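
The report does not spell out how "vanishing" pixels are counted. A plausible reading, shown here purely as an assumption, is the share of pixels labeled with the class of interest in a reference label map (for example, the inference on the original frame) that are no longer predicted as that class on the augmented frame, restricted to the area of interest. The function name and arguments are hypothetical.

```python
def vanishing_fraction(reference, augmented_pred, cls, rows):
    """Share of `cls` pixels in `reference` (within `rows`) lost on the augmented frame."""
    present = vanished = 0
    for r in rows:
        for ref_label, aug_label in zip(reference[r], augmented_pred[r]):
            if ref_label == cls:
                present += 1
                vanished += (aug_label != cls)
    return vanished / present if present else 0.0


# Toy 3x5 label maps; the lower third is the last row.
reference = [
    ["sky", "sky", "sky", "sky", "sky"],
    ["road", "road", "road", "road", "road"],
    ["car", "car", "car", "car", "road"],
]
augmented = [
    ["sky", "sky", "sky", "sky", "sky"],
    ["road", "road", "road", "road", "road"],
    ["car", "road", "road", "car", "road"],
]

fraction = vanishing_fraction(reference, augmented, "car", rows=range(2, 3))
print(fraction)         # 0.5
print(fraction > 0.20)  # True: this augmentation triggers the ">20% vanishing" pattern
```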

Inspect Inferences of Augmentations

Inferences on augmented frames that trigger the selected error pattern in the area of interest (with original frames for comparison).

4

Report Conclusion

The above report provided information for data, machine learning, and safety experts to assess the robustness of the tested model against the selected data augmentations.

Given the thresholds selected when submitting the data for this test, the following headline observations can be made about the results. These key observations should be considered alongside the conclusions your experts reach based on the full data in the report.

Metric	Threshold	Result	Pass/Fail
Pixel accuracy in whole image	>70%	72.94%	Pass
True positive pixel accuracy for car class in lower third of image	>80%	56.41%	Fail
Probability image will trigger >20% pixel misclassification on >50% of augmentations	<10%	70.00%	Fail
Probability image will trigger >20% vanishing pixels for car class in lower third of image on >50% of augmentations	<10%	60.91%	Fail
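
The pass/fail column follows directly from comparing each result with its threshold. A minimal sketch of that comparison, using the values from the table above (the data structure and function name are illustrative, not part of aidkit):

```python
import operator

# (metric, comparison, threshold in %, result in %), copied from the table above.
CHECKS = [
    ("Pixel accuracy in whole image", operator.gt, 70.0, 72.94),
    ("True positive pixel accuracy for car class in lower third of image",
     operator.gt, 80.0, 56.41),
    ("Probability image will trigger >20% pixel misclassification "
     "on >50% of augmentations", operator.lt, 10.0, 70.00),
    ("Probability image will trigger >20% vanishing pixels for car class "
     "in lower third of image on >50% of augmentations", operator.lt, 10.0, 60.91),
]


def verdict(compare, threshold, result):
    """Return 'Pass' when the result satisfies the threshold comparison."""
    return "Pass" if compare(result, threshold) else "Fail"


verdicts = [verdict(compare, threshold, result)
            for _, compare, threshold, result in CHECKS]
print(verdicts)  # ['Pass', 'Fail', 'Fail', 'Fail']
```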