Uncertainty Estimation and Generalization Bounds for Modern Deep Learning
Advances in Function-Space Variational Inference, Linearized Laplace Approximation, Deep Ensembles, and Chernoff-Based Generalization Bounds
the little light that was already shining
while these pages were slowly aligning.
Abstract
This thesis investigates how Bayesian principles can deepen our understanding of modern deep learning systems. Although neural networks achieve remarkable predictive performance, why they generalize and how well they quantify uncertainty remain only partly understood. The thesis approaches this challenge from both methodological and theoretical angles, unifying Bayesian inference, function-space modeling, and large-deviation theory under a common probabilistic perspective.
On the methodological side, the thesis introduces the Deep Variational Implicit Process (DVIP), a scalable Bayesian framework that extends implicit processes to deep architectures. Implicit processes define distributions over functions that are easy to sample from but have no explicit density. Building on them, DVIP supports expressive, non-Gaussian priors and efficient variational inference in function space. DVIP performs competitively with deep Gaussian processes at a fraction of their computational cost. Complementing this, the thesis proposes two post-hoc methods, the Variational Linearized Laplace Approximation (VaLLA) and the Fixed-Mean Gaussian Process (FMGP), which equip pretrained deterministic networks with uncertainty estimates. Both methods deliver well-calibrated predictions on large-scale tasks, bridging deterministic and Bayesian deep learning.
The theoretical contributions address one of the central open questions in modern machine learning: why do large, over-parameterized neural networks generalize so well? To answer it, the thesis develops a unified probabilistic framework that connects three mechanisms (diversity, smoothness, and stochasticity) in the language of PAC–Bayesian and large-deviation theory. The framework formalizes two effects. Ensemble diversity reduces generalization error by encouraging functional independence among predictors. Smoothness, captured through the curvature of the loss landscape, can be interpreted as enlarging the rate function that governs how the empirical loss concentrates around the expected loss. PAC–Chernoff bounds derived in this setting remain meaningful even in the interpolation regime, where models fit the training data perfectly, and they provide a quantitative, distribution-dependent explanation of double-descent behavior. Finally, the thesis analyzes stochasticity in optimization, particularly in stochastic gradient descent (SGD), as a form of implicit regularization: the noise introduced by mini-batch sampling biases learning toward flatter minima and more stable solutions. Grounded in large-deviation principles, this link between optimization dynamics and generalization offers a probabilistic account of how randomness and model structure interact in deep learning. Together, these results bring seemingly distinct explanations of generalization into a single mathematical framework, clarifying how probabilistic structure, model diversity, and stochastic training jointly shape predictive performance.
Taken together, these contributions provide both practical tools for scalable uncertainty estimation and theoretical insight into the probabilistic structure of deep learning. The thesis argues that reliable generalization and calibrated uncertainty emerge naturally when learning systems are both viewed and designed through the lens of Bayesian reasoning.
Keywords: Bayesian deep learning, Gaussian process models, variational inference, implicit processes, uncertainty quantification, PAC–Bayesian generalization bounds, large-deviation analysis, stochastic optimization, implicit regularization, probabilistic learning theory