What defenses mitigate power side-channel attacks on shared GPUs?

Shared GPUs in cloud and cluster environments can leak sensitive information through small variations in power draw. Academic work has demonstrated that cryptographic keys and attributes of machine-learning models can be extracted through such side channels, which motivates a defense-in-depth approach to protecting multi-tenant hardware.

Hardware and platform-level defenses

Hardware isolation reduces the basic opportunity for power side-channel exploitation. NVIDIA documents Multi-Instance GPU (MIG) as a way to partition a GPU into isolated instances with dedicated compute and memory resources. This spatial separation limits how closely tenants share hardware, although instances on the same card may still draw from common power delivery, so partitioning alone should not be assumed to remove power leakage. Physical partitioning, dedicated voltage regulators, and filtered power rails raise the bar for attackers who rely on fine-grained current or voltage measurements. Restricting access to on-board power and thermal telemetry in firmware removes the high-resolution monitoring that many attacks depend on. Cloud providers and HPC centers can also enforce strict scheduling so that sensitive workloads never co-reside with untrusted tenants.
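The co-residency rule in the last sentence can be sketched as a small placement check. This is only an illustration under stated assumptions: `Job`, `Gpu`, `can_place`, and `place` are hypothetical names, not part of any real scheduler, and a production system would also have to handle preemption, MIG instances, and fairness.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    name: str
    tenant: str
    sensitive: bool = False  # hypothetical flag set by the workload owner


@dataclass
class Gpu:
    gpu_id: str
    jobs: List[Job] = field(default_factory=list)


def can_place(job: Job, gpu: Gpu) -> bool:
    """Return True if placing `job` on `gpu` keeps sensitive work isolated."""
    for other in gpu.jobs:
        if other.tenant == job.tenant:
            continue  # same tenant: no cross-tenant exposure
        if job.sensitive or other.sensitive:
            return False  # a sensitive job would share the GPU with another tenant
    return True


def place(job: Job, gpus: List[Gpu]) -> Optional[Gpu]:
    """Place `job` on the first GPU that satisfies the policy, else return None."""
    for gpu in gpus:
        if can_place(job, gpu):
            gpu.jobs.append(job)
            return gpu
    return None  # caller may queue the job until an isolated GPU is free
```

With this policy, non-sensitive jobs from different tenants can still share a device, while a sensitive job only ever shares a GPU with jobs from its own tenant. The price is lower utilization whenever sensitive jobs are queued.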

Software and algorithmic defenses

At the software level, constant-time kernels and algorithmic masking reduce how strongly the power signal depends on secret data, whether the dependence arises from secret-dependent control flow or from data-dependent switching activity. Libraries used for cryptography and machine learning can adopt power-aware coding practices that avoid input-dependent branching and data-dependent memory access patterns. Introducing deliberate noise through randomized execution timing or dummy operations can obscure patterns, though care is required because poorly designed noise can be averaged away by statistical analysis over many traces. Limiting high-resolution access to performance counters and rate-limiting sensor reads narrows the information available to potential attackers.
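A minimal sketch of two of these ideas, branch-free selection and first-order Boolean masking, is shown here. The function names (`ct_select`, `mask`, `masked_xor_const`, `unmask`) are hypothetical, and the code only illustrates the logic: the Python interpreter gives no constant-time or constant-power guarantees, so a real implementation would live in a GPU kernel or other low-level code and would need to be validated against measured traces.

```python
import secrets

MASK32 = 0xFFFFFFFF  # work on 32-bit words, as a GPU register would


def ct_select(cond_bit: int, a: int, b: int) -> int:
    """Return a if cond_bit == 1 else b, without branching on cond_bit."""
    m = (-cond_bit) & MASK32  # 1 -> 0xFFFFFFFF, 0 -> 0x00000000
    return (a & m) | (b & ~m & MASK32)


def mask(secret: int) -> tuple:
    """Split a 32-bit secret into two shares whose XOR equals the secret."""
    r = secrets.randbits(32)  # fresh random mask for every call
    return r, (secret ^ r) & MASK32


def masked_xor_const(shares: tuple, k: int) -> tuple:
    """XOR a public constant into a masked value by updating one share only."""
    s0, s1 = shares
    return s0 ^ (k & MASK32), s1


def unmask(shares: tuple) -> int:
    """Recombine the shares; do this only where the secret may be exposed."""
    s0, s1 = shares
    return s0 ^ s1
```

Each share on its own is uniformly random, so intermediate values processed by the masked operation are statistically independent of the secret. This is the property that first-order masking relies on. Non-linear operations need more elaborate masked gadgets, which this sketch does not cover.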

Academic demonstrations by researchers such as Daniel Genkin (Bar-Ilan University) and Yuval Yarom (University of Adelaide) have shown practical side-channel extraction techniques that informed these mitigations and the need for coordinated vendor and operator responses. The consequences of insufficient defenses include intellectual property loss, cross-tenant data leakage, and broader trust erosion in shared-cloud models, which are especially consequential for research institutions and small enterprises that rely on multi-tenant GPU access.

A layered strategy combining vendor-level hardware controls, cloud scheduling policies, and secure software design is the most reliable approach. Ongoing collaboration among hardware vendors, cloud operators, and academic security researchers is essential to adapt defenses as attackers refine statistical and signal-processing methods.