For decisions you can be confident in, you need testing you trust and results you can understand.
aidkit's augmentations are designed to match reality statistically and visually, so the additional data you get tests your model's limits in a way that carries over to the real world.
Once testing is done, our highly customizable reporting gets to the point while still allowing you to dig deeper where you need to. Check it out below or read our blog for a walkthrough.
This report presents testing results to enable evidence-based decisions about the next steps in machine learning perception model development and, in the end, validate its safety in the tested conditions. The testing applies neurocat augmentations to original data in order to test model performance in diverse real-world-like conditions that were underrepresented or absent in the original data distribution.
Dataset | Zenseact Open Dataset |
ODD tags | day, clear, city |
neurocat augmentations | rain streaks |
Model | DeepLabV3+ with ResNet50 backbone |
Model type | semantic segmentation |
Class of interest | car |
Area of interest | lower third of image |
Metric | Threshold |
---|---|
Pixel accuracy in whole image | >70% |
True positive pixel accuracy for car class in lower third of image | >80% |
Probability image will trigger >20% pixel misclassification on >50% of augmentations | <10% |
Probability image will trigger >20% vanishing pixels for car class in lower third of image on >50% of augmentations | <10% |
REPORT INTRODUCTION
1
This section provides a quick overview of the number of augmentation types, configurations, and frames, along with details on the original data, the models, and the types of augmentations applied. The configuration tables list the full parameters of the generated augmentations.
AUGM. TYPES
AUGM. CONFIGURATION
ORIGINAL FRAMES
AUGM. FRAMES
MODEL VERSIONS
50 variations, sampled uniformly
Sampled Parameter | Range |
---|---|
Fall Speed | 7.01–8.99 |
Particle Opacity | 1.0–1.19 |
Particle Size | 0.00–0.01 |
Precipitation Rate | 2.54–249.50 |
Wind Speed Std | 0.22–1.97 |
Fixed Parameter | Value |
---|---|
Angle of View | 50 |
Ego Speed | 50 |
Height above Ground | 1.5 |
Horizon Height | 0.5 |
Shutter Speed | 0.0025 |
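The configuration above can be read as "draw each sampled parameter uniformly from its range and keep the fixed parameters constant". The following is a minimal sketch of that idea, not aidkit's implementation. The dictionary keys, the function name `sample_configurations`, and the seed are hypothetical, and the bounds are one reading of the ranges in the table that should be checked against the actual configuration.

```python
import random

# Assumed bounds, read from the "Sampled Parameter" table; key names are illustrative.
SAMPLED_RANGES = {
    "fall_speed": (7.01, 8.99),
    "particle_opacity": (1.0, 1.19),
    "particle_size": (0.00, 0.01),
    "precipitation_rate": (2.54, 249.50),
    "wind_speed_std": (0.22, 1.97),
}

# Values from the "Fixed Parameter" table; units are not stated in the report.
FIXED_PARAMS = {
    "angle_of_view": 50,
    "ego_speed": 50,
    "height_above_ground": 1.5,
    "horizon_height": 0.5,
    "shutter_speed": 0.0025,
}


def sample_configurations(n, seed=0):
    """Return n configurations with each sampled parameter drawn uniformly."""
    rng = random.Random(seed)  # seeded so the same variations can be regenerated
    configs = []
    for _ in range(n):
        config = dict(FIXED_PARAMS)
        for name, (low, high) in SAMPLED_RANGES.items():
            config[name] = rng.uniform(low, high)
        configs.append(config)
    return configs


# 50 variations, matching the count stated in the report.
configs = sample_configurations(50)
```

Each entry in `configs` would then parameterize one rain-streak rendering of an original frame.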
2
This section provides an overview of model performance for assessment by machine learning developers. If more than one augmentation has been selected, model performance is shown separately for each augmentation.
Model Version: DeepLabV3+Resnet50
Evaluation Metric | Rain (rain streaks) |
---|---|
mIoU for pixels in entire image | 37.09% |
mIoU for pixels in lower third | 30.33% |
Pixel accuracy | 72.94% |
True positive pixel accuracy for class 'car' in lower third of image | 56.41% |
True positive pixel accuracy for class 'bus' in lower third of image | 98.47% |
True positive pixel accuracy for class 'person' in lower third of image | 80.56% |
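For readers who want to see how metrics of this kind are typically computed, here is a small sketch on toy label masks. It assumes that "true positive pixel accuracy" for a class means the share of ground-truth pixels of that class that the model labels correctly, and that the lower third is the bottom third of image rows. The report states neither definition explicitly, and all function names are illustrative.

```python
BACKGROUND, CAR = 0, 1


def lower_third(mask):
    """Bottom third of the rows of a 2D label mask (rounded down)."""
    return mask[len(mask) - len(mask) // 3:]


def _pairs(truth, pred):
    """Flatten two equally shaped masks into (truth, prediction) label pairs."""
    return [(t, p) for tr, pr in zip(truth, pred) for t, p in zip(tr, pr)]


def pixel_accuracy(truth, pred):
    """Share of pixels whose predicted label equals the ground truth."""
    pairs = _pairs(truth, pred)
    return sum(t == p for t, p in pairs) / len(pairs)


def true_positive_rate(truth, pred, cls):
    """Share of ground-truth `cls` pixels predicted as `cls`; None if absent."""
    positives = [p for t, p in _pairs(truth, pred) if t == cls]
    if not positives:
        return None
    return sum(p == cls for p in positives) / len(positives)


def mean_iou(truth, pred, classes):
    """Mean intersection-over-union across classes present in truth or prediction."""
    pairs = _pairs(truth, pred)
    ious = []
    for cls in classes:
        inter = sum(t == cls and p == cls for t, p in pairs)
        union = sum(t == cls or p == cls for t, p in pairs)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else None


# Toy 3x4 example: the model misses part of the car near the bottom of the image.
truth = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [1, 1, 1, 1]]
pred = [[0, 0, 0, 0],
        [0, 1, 0, 0],
        [1, 1, 0, 0]]

accuracy = pixel_accuracy(truth, pred)
car_tpr_lower = true_positive_rate(lower_third(truth), lower_third(pred), CAR)
miou = mean_iou(truth, pred, [BACKGROUND, CAR])
```

The toy example shows why the report lists both kinds of metric: whole-image pixel accuracy can stay fairly high while the class-specific rate in the area of interest is much lower.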
3
This section provides overviews of the error patterns and rates in tabular and visualizer formats to aid the assessment of safety engineers and managers.
Error Pattern | Error on at least one augmentation | Error on more than 10% of the augmentations | Error on more than 25% of the augmentations | Error on more than 50% of the augmentations |
---|---|---|---|---|
More than 1% pixel misclassification | 100.00% | 100.00% | 100.00% | 100.00% |
More than 10% pixel misclassification | 100.00% | 99.09% | 97.27% | 90.91% |
More than 20% pixel misclassification | 96.36% | 92.73% | 83.64% | 70.00% |
The table above lists error rates for different error patterns across error prevalence classes. Each cell is an empirical estimate of the probability that a randomly selected image in the input data set falls into the respective error prevalence class.
Data subset | Zenseact Open Dataset (subset: day, clear, city) |
Model | DeepLabV3+Resnet50 |
Augmentation | Rain (Rain streaks) |
Number of frames | 110 |
Number of augmentations per frame | 50 |
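The prevalence table can be understood as a two-step count: for each frame, find the share of its augmentations that trigger the error pattern, then count the share of frames whose value exceeds each prevalence level. The sketch below illustrates this reading on toy data. The function name and the input layout (`error_fractions[frame][augmentation]`) are assumptions, not aidkit's interface.

```python
def prevalence_table(error_fractions, error_levels, prevalence_levels):
    """Estimate, per error level, the share of frames in each prevalence class.

    error_fractions[i][j] is the misclassified-pixel fraction of frame i
    under augmentation j. A prevalence level of 0.0 means "at least one".
    """
    n_frames = len(error_fractions)
    table = {}
    for level in error_levels:
        row = []
        for prevalence in prevalence_levels:
            hits = 0
            for frame in error_fractions:
                # Share of this frame's augmentations that trigger the pattern.
                share = sum(f > level for f in frame) / len(frame)
                if share > prevalence:
                    hits += 1
            row.append(hits / n_frames)
        table[level] = row
    return table


# Toy data: 4 frames with 4 augmentations each (the report uses 110 and 50).
toy_fractions = [
    [0.30, 0.25, 0.22, 0.05],
    [0.30, 0.02, 0.03, 0.04],
    [0.05, 0.02, 0.03, 0.04],
    [0.21, 0.22, 0.01, 0.02],
]
table = prevalence_table(
    toy_fractions,
    error_levels=[0.01, 0.10, 0.20],
    prevalence_levels=[0.0, 0.10, 0.25, 0.50],
)
```

With 110 frames, every cell is a multiple of 1/110. For example, 70.00% corresponds to 77 of 110 frames and 90.91% to 100 of 110.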
Error Pattern | Error on at least one augmentation | Error on more than 10% of the augmentations | Error on more than 25% of the augmentations | Error on more than 50% of the augmentations |
---|---|---|---|---|
More than 1% vanishing of pixels of class 'car' in region 'lower third' | 92.73% | 92.73% | 92.73% | 90.91% |
More than 10% vanishing of pixels of class 'car' in region 'lower third' | 89.09% | 86.36% | 80.00% | 70.00% |
More than 20% vanishing of pixels of class 'car' in region 'lower third' | 86.36% | 75.45% | 70.00% | 60.91% |
The table above lists error rates for different error patterns across error prevalence classes. Each cell is an empirical estimate of the probability that a randomly selected image in the input data set falls into the respective error prevalence class.
Data subset | Zenseact Open Dataset (subset: day, clear, city) |
Model | DeepLabV3+Resnet50 |
Augmentation | Rain (Rain streaks) |
Number of frames | 110 |
Number of augmentations per frame | 50 |
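The report does not define "vanishing" pixels. One plausible reading, assumed here purely for illustration, is the share of pixels predicted as the class of interest on the original frame that are no longer predicted as that class on the augmented frame, restricted to the area of interest. The function names below are hypothetical.

```python
CAR = 1


def bottom_third(mask):
    """Bottom third of the rows of a 2D label mask (rounded down)."""
    return mask[len(mask) - len(mask) // 3:]


def vanishing_fraction(pred_original, pred_augmented, cls):
    """Share of `cls` pixels in the original inference that lose the label.

    Assumed definition; returns 0.0 when the class is absent in the original.
    """
    original_count = 0
    vanished = 0
    for row_o, row_a in zip(pred_original, pred_augmented):
        for o, a in zip(row_o, row_a):
            if o == cls:
                original_count += 1
                if a != cls:
                    vanished += 1
    return vanished / original_count if original_count else 0.0


# Toy inferences: most of the car in the bottom row disappears under rain.
pred_original = [[0, 0, 0, 0],
                 [0, 1, 1, 0],
                 [1, 1, 1, 1]]
pred_augmented = [[0, 0, 0, 0],
                  [0, 1, 0, 0],
                  [1, 0, 0, 0]]

fraction = vanishing_fraction(
    bottom_third(pred_original), bottom_third(pred_augmented), CAR
)
triggers_20_percent_pattern = fraction > 0.20
```

Under this reading, a per-frame, per-augmentation value like `fraction` would feed the same prevalence counting as the pixel misclassification table.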
Inferences on augmented frames that trigger the selected error pattern in the area of interest (with original frames for comparison).
4
The above report provided information for data, machine learning, and safety experts to assess the robustness of the tested model against the selected data augmentations.
Given the thresholds selected when submitting the data for this test, the following headline observations can be made about the results. These key observations should be considered alongside the conclusions your experts reach based on the full data in the report.
Metric | Threshold | Result | Status |
---|---|---|---|
Pixel accuracy in whole image | >70% | 72.94% | Pass |
True positive pixel accuracy for car class in lower third of image | >80% | 56.41% | Fail |
Probability image will trigger >20% pixel misclassification on >50% of augmentations | <10% | 70.00% | Fail |
Probability image will trigger >20% vanishing pixels for car class in lower third of image on >50% of augmentations | <10% | 60.91% | Fail |
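The pass/fail column follows directly from comparing each result with its threshold in the stated direction. A minimal sketch using the values from the table above, with shortened metric labels and an illustrative `evaluate` helper:

```python
import operator

# (metric, comparison, threshold in %, result in %) taken from the table above.
CHECKS = [
    ("Pixel accuracy in whole image", operator.gt, 70.0, 72.94),
    ("True positive pixel accuracy, car, lower third", operator.gt, 80.0, 56.41),
    ("P(>20% misclassification on >50% of augmentations)", operator.lt, 10.0, 70.00),
    ("P(>20% vanishing car pixels, lower third, on >50% of augmentations)",
     operator.lt, 10.0, 60.91),
]


def evaluate(checks):
    """Return (metric, 'Pass' or 'Fail') for each threshold check."""
    return [
        (name, "Pass" if compare(result, threshold) else "Fail")
        for name, compare, threshold, result in checks
    ]


verdicts = evaluate(CHECKS)
for name, verdict in verdicts:
    print(f"{verdict}: {name}")
```

Note that the accuracy metrics must exceed their thresholds, while the two probabilities must stay below theirs.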
Driving safe perception
Don’t just manage risks. Reduce your perception system’s chances of encountering unknown situations and enhance its resilience.