Making the Discrete Continuous: Synthetic RAW Augmentations for Low-Light Person Detection

TL;DR: Real-world vision datasets are long-tailed, with very few labeled pedestrians in very dark scenes, so we cannot properly evaluate how a detector behaves there. We use a physics-based RAW image augmentation that matches the camera sensor's noise model to synthesize low-light samples on demand, turning scene illumination from a sparse, discrete variable into a continuous, controllable one. On the safety-critical task of pedestrian detection for autonomous driving, real and synthetic low-light data produce statistically indistinguishable results on most detection metrics, but only when the noise model is respected. This is a collaboration between the University of Glasgow, Dotphoton, and Brickroad, supported by the UKRI EPSRC Centre for Doctoral Training in Applied Photonics.

The problem: you can't evaluate what you can't sample

Modern vision models are good enough to underpin autonomous agents like self-driving cars. But they are both empowered and limited by the real data used to train and test them. Long-tailed, unbalanced datasets and out-of-distribution cases become genuine hazards when a system interacts with humans — a missed pedestrian can cause an accident.

Two object characteristics dominate detector behavior: mean illumination and size. Low light shrinks the dynamic range inside a bounding box; small objects approach the spatial resolution limit. The trouble is that instances that are both dark and small are exactly the ones that are rare in real datasets, sometimes rare enough to be out-of-distribution. In the figure above, fewer than 100 person instances in the test set have an average signal of less than 100 electrons inside the bounding box. With so few samples, any performance estimate for a narrow band of low illumination rests on a handful of instances and is statistically meaningless.
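The illumination measure used here is the average signal, in electrons, inside each person's bounding box. As a minimal sketch, assuming this means the per-pixel mean over the box, a linear sensor, and a known black level and conversion gain in electrons per digital number (DN), it could be computed and thresholded as below. The function and parameter names are hypothetical, and the sketch ignores the Bayer mosaic, which a real pipeline might handle per color channel.

```python
import numpy as np


def instance_electrons(raw_dn, box, black_level, gain_e_per_dn):
    """Mean signal, in electrons per pixel, inside box = (x0, y0, x1, y1).

    Assumes a linear sensor: electrons = (DN - black_level) * gain_e_per_dn.
    The box is half-open: rows y0..y1-1 and columns x0..x1-1.
    """
    x0, y0, x1, y1 = box
    patch = raw_dn[y0:y1, x0:x1].astype(np.float64)
    return float(((patch - black_level) * gain_e_per_dn).mean())


def count_dark_instances(levels_e, threshold_e=100.0):
    """Number of instances whose mean bounding-box signal is below threshold_e."""
    return int(np.count_nonzero(np.asarray(levels_e, dtype=np.float64) < threshold_e))
```

Applying `instance_electrons` to every labeled person and passing the results to `count_dark_instances` gives the kind of count quoted above for the 100-electron threshold.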

Real data is discrete and limited, shaped by an expensive collection process bounded by the scenes we can physically capture. Synthetic data flips this: if we own the generation pipeline, we can deliberately sample as many low-light scenes as we need to balance the distribution and evaluate continuously across illumination.

The method: a physics-based RAW light-level augmentation

The study uses AODRaw, a state-of-the-art RAW object-detection dataset and model, as the test case: 2,260 high-resolution images captured with a Sony A7M4 camera, containing 4,690 unique people in traffic scenes with rain, fog, and darkness.

RAW images are the right substrate. Unlike sRGB outputs from a camera's image signal processor, unprocessed RAW preserves a larger dynamic range, the sensor's linearity, and the independence of pixel noise. Rather than rendering scenes from scratch (which rarely models CMOS sensor noise correctly), the work applies an intensity-reduction algorithm directly to real RAW images as a light-level augmentation, then re-digitizes with a full Poisson-Gaussian noise model that matches the sensor. The result: from one bright real sample, we can generate a darker synthetic sample with the same area and a faithful noise model.
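A minimal sketch of this kind of light-level augmentation is shown below. It assumes a linear sensor with a known black level, white level, conversion gain (electrons per DN) and Gaussian read noise; these parameter names and the binomial-thinning step are our illustrative choices, not necessarily the paper's exact algorithm. Binomial thinning is used because if a pixel's photoelectron count N is Poisson with mean λ, keeping each electron independently with probability k yields a count that is exactly Poisson with mean kλ, so the darkened pixel carries the shot noise of a genuinely darker exposure. The sketch ignores the read noise already present in the bright frame, which is reasonable only when the source is well exposed.

```python
import numpy as np


def darken_raw(raw_dn, k, gain_e_per_dn, black_level, white_level, read_noise_e, rng=None):
    """Simulate a capture with k times less light (0 < k <= 1) from a bright RAW frame.

    Hypothetical parameters: black_level and white_level in DN, gain_e_per_dn in
    electrons per DN, read_noise_e as the RMS read noise in electrons.
    """
    if not 0.0 < k <= 1.0:
        raise ValueError("k must be in (0, 1]: this augmentation can only remove light")
    rng = np.random.default_rng() if rng is None else rng

    # 1. DN -> estimated photoelectron counts; negative values (read noise) clipped to 0.
    electrons = (raw_dn.astype(np.float64) - black_level) * gain_e_per_dn
    counts = np.rint(np.clip(electrons, 0.0, None)).astype(np.int64)

    # 2. Shot noise via binomial thinning: Binomial(Poisson(lam), k) ~ Poisson(k * lam).
    thinned = rng.binomial(counts, k)

    # 3. Read noise: zero-mean Gaussian noise in electrons.
    noisy_e = thinned + rng.normal(0.0, read_noise_e, size=thinned.shape)

    # 4. Re-digitize: electrons -> DN, restore the black level, round, clip to ADC range.
    dn = np.rint(noisy_e / gain_e_per_dn + black_level)
    return np.clip(dn, 0, white_level).astype(np.uint16)
```

Calling it with k = 0.02 would, for instance, turn a source instance averaging 5,000 electrons into a synthetic one averaging about 100 electrons, with shot noise appropriate to 100 electrons rather than to 5,000.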

What the evaluation reveals

Real data hides the failure mode

Evaluated on real data alone, person-detection metrics look almost constant across the available illumination range — seemingly confirming the detector is robust to lighting. But this "robustness" is an artifact of the data: the low-light bins are nearly empty, and the sparse distribution forces coarse, logarithmic bins that cannot resolve what is actually happening in the dark.
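To make the binning problem concrete, here is a minimal sketch that histograms per-instance illumination over logarithmically spaced bins and flags bins too sparse to support a metric. The function names and the default 30-instance threshold are hypothetical choices for illustration, not values from the study.

```python
import numpy as np


def log_bin_counts(levels_e, n_bins=8, lo_e=None, hi_e=None):
    """Histogram of per-instance illumination (electrons) over log-spaced bins.

    All levels and bin limits must be positive, since the edges are geometric.
    Returns (edges, counts); the last bin includes its upper edge, as in np.histogram.
    """
    levels = np.asarray(levels_e, dtype=np.float64)
    lo_e = levels.min() if lo_e is None else lo_e
    hi_e = levels.max() if hi_e is None else hi_e
    edges = np.geomspace(lo_e, hi_e, n_bins + 1)
    counts, _ = np.histogram(levels, bins=edges)
    return edges, counts


def sparse_bins(counts, min_count=30):
    """Indices of bins holding fewer than min_count instances (arbitrary threshold)."""
    return np.flatnonzero(np.asarray(counts) < min_count)
```

Widening the bins until each one clears the threshold is exactly what blurs the low-light behavior: a single wide bin averages over an illumination range in which detection performance may change sharply.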

Person detection performance on real data as a function of instance illumination. The metrics look flat — but only because there is almost no data at low light levels.

Synthetic data exposes it

Repeating the analysis on synthetic samples — uniformly spaced light-level targets across the low-light range, with a fixed number of samples per point — tells a very different story. The detector does fail under very low illumination when gain/ISO are not adjusted. No detections were made below 3.5 electrons inside the bounding box, driving average precision to zero, and mAP falls from ~30% to 10% or less between 1000 and 10 electrons.
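A sampling plan of this kind can be sketched as follows, assuming each synthetic sample is produced by darkening a brighter real instance by a factor k = target / source illumination. The function, the linear spacing of targets and the sampling-with-replacement rule are illustrative assumptions; the paper's exact protocol may differ.

```python
import numpy as np


def synthetic_targets(lo_e, hi_e, n_levels, per_level, source_levels_e, rng=None):
    """Plan (source_index, target_e, k) triples for n_levels uniformly spaced targets.

    Each target gets per_level samples, each drawn from a real source instance that is
    at least as bright as the target, so the darkening factor k = target / source <= 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    sources = np.asarray(source_levels_e, dtype=np.float64)
    plan = []
    for target in np.linspace(lo_e, hi_e, n_levels):
        # Only darkening is possible, so restrict to sources brighter than the target.
        eligible = np.flatnonzero(sources >= target)
        if eligible.size == 0:
            raise ValueError(f"no source instance is bright enough for {target} e-")
        # Sample with replacement only when there are too few eligible sources.
        picks = rng.choice(eligible, size=per_level, replace=bool(eligible.size < per_level))
        for i in picks:
            plan.append((int(i), float(target), float(target / sources[i])))
    return plan
```

Because every target level receives the same number of samples, the resulting per-level metrics have comparable uncertainty across the whole low-light range, which the real data cannot offer.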

Person detection performance on synthetic data. With dense, controllable sampling, the real failure mode at low light becomes clearly measurable.

Can the model tell real from synthetic?

The strongest validity test: does the detector perceive synthetic dark samples the same way it perceives real ones? The experiment pairs real bright instances with real dark instances of matched area, synthesizes a dark counterpart of each bright instance at the illumination of its real dark partner, and compares detection performance on real versus synthetic low-light data over three runs of 500 paired points.
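The pairing step and the run-to-run comparison could look roughly like the sketch below. Matching on log-area, the tolerance of roughly 10% in area (max_log_diff=0.1), error bars defined as mean plus or minus one sample standard deviation, and all function names are assumptions made for illustration. Overlapping error bars are an informal check rather than a formal significance test.

```python
import numpy as np


def pair_by_area(dark, bright, max_log_diff=0.1):
    """Pair each real dark instance with the bright instance closest in area.

    dark, bright: sequences of (area_px, level_e). Returns (dark_idx, bright_idx, k),
    where k = dark level / bright level is the darkening factor for the synthetic copy.
    Matching is with replacement: one bright instance may serve several dark ones.
    """
    b_log_area = np.log(np.array([a for a, _ in bright], dtype=np.float64))
    b_level = np.array([e for _, e in bright], dtype=np.float64)
    pairs = []
    for j, (area, level) in enumerate(dark):
        diff = np.abs(b_log_area - np.log(area))
        i = int(np.argmin(diff))
        # Keep the pair only if the areas match closely and the source is actually brighter.
        if diff[i] <= max_log_diff and b_level[i] > level:
            pairs.append((j, i, float(level / b_level[i])))
    return pairs


def errorbars_overlap(runs_a, runs_b):
    """True if the mean +/- sample-std intervals of two sets of per-run metrics overlap."""
    a = np.asarray(runs_a, dtype=np.float64)
    b = np.asarray(runs_b, dtype=np.float64)
    return bool(abs(a.mean() - b.mean()) <= a.std(ddof=1) + b.std(ddof=1))
```

With three runs per condition, `errorbars_overlap` would take three mAP values computed on real dark instances and three computed on their synthetic counterparts.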

Real versus synthetic low-light detection performance, comparing the noise-aware RAW augmentation against a naive intensity reduction. Overlapping error bars indicate the model cannot distinguish them.

Overlapping error bars show that real and synthetic data are perceived similarly — and the noise-aware augmentation consistently beats a naive pixel-intensity reduction, landing much closer to real-data metrics. Several metrics (mAP, AP₇₅, AP₆₀) are only statistically indistinguishable from real data when the noise model is respected. At very high IoU thresholds (AP₈₀) a gap remains, which may reflect the model distinguishing the two — or, just as plausibly, that label quality in real dark scenes is worse than in the synthetic samples, which inherit clean labels from their bright sources.
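A one-pixel calculation suggests why the naive baseline behaves so differently. Assume, as our reading of "naive pixel-intensity reduction", that it simply multiplies pixel values by a factor k, and let the bright pixel collect N ~ Poisson(λ) photoelectrons, ignoring the bright frame's read noise:

$$
\text{naive: } X = kN \;\Rightarrow\; \mathbb{E}[X] = k\lambda,\quad \operatorname{Var}[X] = k^2\lambda,\quad \mathrm{SNR}_X = \frac{k\lambda}{\sqrt{k^2\lambda}} = \sqrt{\lambda}
$$

$$
\text{darker capture: } Y \sim \operatorname{Poisson}(k\lambda) \;\Rightarrow\; \mathbb{E}[Y] = k\lambda,\quad \operatorname{Var}[Y] = k\lambda,\quad \mathrm{SNR}_Y = \sqrt{k\lambda}\ \ \text{(or } \tfrac{k\lambda}{\sqrt{k\lambda + \sigma_r^2}} \text{ with read noise } \sigma_r\text{)}
$$

For example, with λ = 10,000 electrons and k = 0.01, both versions average 100 electrons, but the naively scaled pixel keeps an SNR of 100 while a real capture at that level has an SNR of at most 10. The naive sample is an order of magnitude cleaner than the sensor could produce, which is consistent with it landing further from the real-data metrics.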

Takeaway

Synthetic data, generated from real RAW images with a faithful sensor noise model, turns a limited discrete variable (scene illumination) into a continuous, controllable one. That lets us characterize safety-critical performance — pedestrian detection in the dark — with a level of detail the real data simply cannot support, while producing samples that are, by most detection metrics, indistinguishable from reality.

Paper and authors

Making the Discrete Continuous: Synthetic RAW Augmentations for Fine-Grained Evaluation of Person Detection Performance in Low Light. Accepted as a non-archival paper at the CVPR 2026 AUTOPILOT Workshop (Autonomous Understanding Through Open-world Perception and Integrated Language Models for On-road Tasks). Available as an arXiv preprint (arXiv:2605.22455).

Valeria Pais, Malena Mendilaharzu, Daniele Faccio, Luis Oala, Christoph Clausen, and Bruno Sanguinetti.

Affiliations: University of Glasgow, Dotphoton, and Brickroad. Supported by the UKRI EPSRC Centre for Doctoral Training in Applied Photonics [EP/S022821/1].