Kernel matrices are dense n-by-n objects whose exact eigendecomposition costs O(n^3) time and O(n^2) memory, making direct methods impractical for large datasets. Randomized numerical linear algebra offers practical approximations that trade controlled accuracy for much lower resource use. The Nyström method, random Fourier features, and randomized low-rank SVD are the main approaches practitioners use to scale kernel methods while preserving predictive performance.
How scaling is achieved in practice
The Nyström method approximates the full kernel matrix by sampling m landmark columns and forming a low-rank surrogate; this reduces storage to O(n m) and computation to roughly O(n m^2 + m^3) for naive implementations. Christopher K. I. Williams at the University of Edinburgh showed how Nyström-style approximations enable Gaussian process inference on larger problems, motivating wide adoption in kernel-based modeling. Random Fourier features map inputs into an m-dimensional randomized feature space whose inner products approximate the kernel, giving linear-in-n cost: O(n m) to form the features, plus O(n m^2 + m^3) for a direct least-squares solve in that space (less with iterative solvers). Joel A. Tropp at the California Institute of Technology has developed theoretical tools for randomized matrix approximations that explain why these algorithms succeed with high probability and bound the m required for a target error.
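The Nyström idea can be sketched in a few lines. The snippet below is a minimal illustration, assuming an RBF kernel and uniform landmark sampling; the names `rbf_kernel` and `nystrom_approx` are hypothetical helpers for this sketch, not any particular library's API. Only the n-by-m matrix C and the m-by-m pseudoinverse are stored, never the full n-by-n kernel.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2), computed for all row pairs.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_approx(X, m, gamma=1.0, seed=None):
    """Rank-m Nyström surrogate K ~= C @ W_pinv @ C.T built from m
    uniformly sampled landmarks. Storage is O(n*m), not O(n^2)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)   # n x m landmark columns of K
    W = C[idx]                         # m x m block at the landmarks
    return C, np.linalg.pinv(W)

# Small demo: compare against the exact kernel (feasible only at this scale).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
C, W_pinv = nystrom_approx(X, m=100, gamma=0.5, seed=0)
K = rbf_kernel(X, X, gamma=0.5)
err = np.linalg.norm(K - C @ W_pinv @ C.T) / np.linalg.norm(K)
```

In practice one would never form the exact K for comparison; it appears here only to make the approximation error visible on a toy problem.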
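Random Fourier features admit an equally short sketch. This is an illustrative implementation for the RBF kernel only, assuming the standard Rahimi–Recht construction: frequencies drawn from N(0, 2*gamma*I) so that feature inner products approximate exp(-gamma*||x - y||^2); `rff_features` is a hypothetical name for this sketch.

```python
import numpy as np

def rff_features(X, m, gamma=1.0, seed=None):
    """Map X (n x d) to Z (n x m) such that Z @ Z.T approximates the
    RBF kernel matrix exp(-gamma * ||x - y||^2) in expectation."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, m))  # frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)                # phase shifts
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)

# Toy check: feature inner products vs. the exact kernel entries.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
Z = rff_features(X, m=2000, gamma=0.5, seed=1)
K_hat = Z @ Z.T
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * d2)
max_err = np.abs(K_hat - K).max()
```

Downstream models then fit linear parameters on Z; for ridge regression that means solving an m-by-m system, which is where the O(m^3) direct-solve cost in the text comes from.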
Accuracy, causes, and consequences
Accuracy depends on the effective rank or spectral decay of the kernel matrix: when eigenvalues fall rapidly, small m suffices and randomized methods deliver near-exact results; when decay is slow, approximations require larger m and may degrade predictions. Leverage-score sampling and adaptive landmark selection can reduce m by focusing budget on informative directions, at the cost of extra preprocessing. Bernhard Schölkopf at Max Planck Institute for Intelligent Systems emphasizes that kernel structure and data geometry determine how well low-rank approximations capture relevant variability.
The practical consequence is that randomized methods convert an infeasible cubic problem into one that scales nearly linearly in n for fixed m, enabling large-scale Gaussian processes, kernel ridge regression, and spectral clustering on millions of points. There are trade-offs: approximation error propagates into uncertainty quantification in scientific and medical applications, and the choice of m influences computational energy use and the fairness of downstream decisions. Contextual factors such as data modality, noise level, and regional availability of compute resources shape how aggressively practitioners can reduce m while maintaining trustworthy results.