What benchmarks effectively measure robustness of robot perception under adversarial lighting?

Robust robot perception under adversarial lighting must be measured with benchmarks that capture both algorithmic sensitivity and real-world sensor effects. Effective benchmarks combine standardized corruption suites, perturbation-based tests, and physical-world trials so that performance metrics reflect safety-critical consequences.

Benchmark metrics and corruption suites

The ImageNet-C corruption benchmark and its associated metrics, mean Corruption Error (mCE) and relative mCE, quantify how classification accuracy degrades under systematic photometric changes. The benchmark, introduced by Dan Hendrycks and Thomas G. Dietterich (Oregon State University), popularized corruption-based evaluation for computer vision by showing how controlled severity levels of brightness, contrast, fog, and glare produce reproducible degradation curves. Complementary perturbation tests such as ImageNet-P measure stability over temporal perturbation sequences using flip rates and related robustness-to-perturbation statistics, revealing brittleness that single-shot accuracy hides. For detection systems, the analogous robustness signal is the relative drop in mean Average Precision (mAP) when standard corruptions are applied to datasets such as COCO; reporting mAP under corruptions makes safety-relevant tradeoffs visible for downstream planning and control.
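To make these metrics concrete, here is a minimal sketch of how mCE and the ImageNet-P flip rate are computed. The error values are hypothetical placeholders, not measurements from any real model; only the formulas (per-corruption errors summed over severities, normalized by a baseline model, then averaged) follow the ImageNet-C/-P definitions.

```python
import numpy as np

def corruption_error(model_errs, baseline_errs):
    # CE for one corruption type: model error summed over severity
    # levels 1..5, normalized by a baseline model's summed error
    # (AlexNet in the original ImageNet-C protocol).
    return np.sum(model_errs) / np.sum(baseline_errs)

def mean_corruption_error(model, baseline):
    # mCE: average CE across corruption types (brightness, contrast, fog, ...).
    ces = [corruption_error(model[c], baseline[c]) for c in model]
    return float(np.mean(ces))

def flip_rate(preds):
    # ImageNet-P stability metric: fraction of consecutive frames in a
    # perturbation sequence where the top-1 prediction changes.
    flips = sum(a != b for a, b in zip(preds, preds[1:]))
    return flips / (len(preds) - 1)

# Hypothetical top-1 error rates (%) at severities 1..5 for two
# lighting-related corruptions.
model_errs = {"brightness": [12, 15, 20, 28, 40],
              "contrast":   [14, 18, 25, 35, 50]}
baseline_errs = {"brightness": [20, 25, 32, 42, 55],
                 "contrast":   [22, 28, 38, 50, 65]}

mce = mean_corruption_error(model_errs, baseline_errs)  # < 1.0 beats baseline
fr = flip_rate(["cat", "cat", "dog", "cat", "cat"])     # 2 flips over 4 steps
```

An mCE below 1.0 means the model degrades less under corruption than the baseline; the flip rate captures temporal instability that a single accuracy number misses.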

Physical-world lighting tests

Laboratory and field tests emulate sensor-level phenomena not captured by pixel-space corruptions: non-uniform illumination, spectral shifts from different lamp types, specular reflections, and dynamic shadows. Alexey Kurakin and Ian Goodfellow (Google Brain) demonstrated that adversarial patterns survive printing and re-photographing under varied lighting, motivating physical adversarial attacks as a distinct evaluation axis. Robustness benchmarks therefore include staged drives or object captures under controlled illuminants and varied exposure settings, plus camera-level measures such as signal-to-noise ratio and saturation frequency, because perception failures often stem from sensor clipping and auto-exposure behavior rather than model weights alone.
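The camera-level measures mentioned above can be sketched directly on raw frames. The function names, thresholds, and the synthetic frame below are illustrative assumptions for an 8-bit sensor, not part of any standard benchmark.

```python
import numpy as np

def saturation_fraction(img, low=2, high=253):
    # Fraction of pixels clipped near the dynamic-range limits.
    # For an 8-bit image, values at or near 0 / 255 usually indicate
    # under- or over-exposure clipping rather than true scene content.
    img = np.asarray(img)
    clipped = (img <= low) | (img >= high)
    return float(clipped.mean())

def snr_db(img):
    # Crude global SNR estimate in dB: mean pixel value over the
    # standard deviation of pixel values. Real pipelines estimate
    # noise from flat-field patches; this is a coarse proxy.
    img = np.asarray(img, dtype=np.float64)
    std = img.std()
    return float(20 * np.log10(img.mean() / std)) if std > 0 else float("inf")

# Synthetic example: a mid-gray frame with a blown-out highlight patch,
# standing in for specular glare from a lamp or headlight.
frame = np.full((100, 100), 128, dtype=np.uint8)
frame[:10, :10] = 255  # simulated saturated region (1% of pixels)
sat = saturation_fraction(frame)  # 0.01
```

Tracking saturation fraction and SNR alongside task metrics separates failures caused by sensor clipping or auto-exposure from failures of the model itself.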

Relevance, causes, and consequences intersect: poor robustness under lighting increases false negatives for pedestrians and misclassification of signs, with direct safety and legal impacts in urban and rural settings. Cultural and infrastructural differences, such as regions with limited street lighting or different vehicle headlamp standards, change the distribution of lighting corruptions robots face. Benchmarks that combine standardized corruption metrics, perturbation stability tests, and physical-world trials give the most actionable measure of robustness, enabling comparison across models, sensors, and deployment contexts and informing mitigations such as sensor fusion, exposure control, and training with realistic photometric augmentation.