Adaptive Pruning for Increased Robustness and Reduced Computational Overhead in Gaussian Process Accelerated Saddle Point Searches
Abstract
Geometry-aware optimal transport and active pruning enhance Gaussian process regression for efficient saddle point searches on high-dimensional energy surfaces.
Gaussian process (GP) regression provides a strategy for accelerating saddle point searches on high-dimensional energy surfaces by reducing the number of times the energy and its derivatives with respect to atomic coordinates need to be evaluated. The computational overhead of the hyperparameter optimization can, however, be large and make the approach inefficient. Failures can also occur if the search ventures too far into regions that are not represented well enough by the GP model. Here, these challenges are resolved by using geometry-aware optimal transport measures and an active pruning strategy based on a summation over Wasserstein-1 distances for each atom type in farthest-point sampling, selecting a fixed-size subset of geometrically diverse configurations to avoid the rapidly increasing cost of GP updates as more observations are made. Stability is enhanced by a permutation-invariant metric that provides a reliable trust radius for early stopping, and by a logarithmic barrier penalty on the growth of the signal variance. These physically motivated algorithmic changes prove their efficacy by reducing the mean computational time to less than half on a set of 238 challenging configurations from a previously published data set of chemical reactions. With these improvements, the GP approach is established as a robust and scalable algorithm for accelerating saddle point searches when the evaluation of the energy and atomic forces requires significant computational effort.
Community
Let’s discuss: why GPR acceleration has failed to deliver on its promise, and how we finally fix it.
Since this work sits between disciplines, the discussion is split into two perspectives.
ML Perspective
The Problem: Gaussian Process (GP) models are powerful but suffer from a well-known curse: the punishing cost of hyperparameter optimization. The need to invert a kernel matrix leads to cubic scaling
with the number of training points. In on-the-fly scientific applications where data arrives sequentially, the time spent tuning the GP quickly becomes the primary bottleneck, making the entire approach impractical.
Our Solution: We solve this by decoupling the data used for prediction from the data used for hyperparameter tuning.
- FPS for Hyperparameters: Our core innovation is to perform the expensive hyperparameter optimization on a small, fixed-size subset of the data. This subset is actively curated using Farthest Point Sampling (FPS), which ensures it remains representative even as the full dataset grows. This breaks the cubic scaling bottleneck (see the sketch after this list).
- A Principled Metric: To make FPS effective for complex physical data (molecules), we use a distance metric based on Optimal Transport (the Earth Mover's Distance). It is permutation-invariant, allowing us to select a truly geometrically diverse subset from the point-cloud-like data.
- Stabilizing Training: We introduce data-driven guardrails, including an adaptive variance barrier and Hyperparameter Oscillation Detection (HOD), to ensure the optimization process remains stable even on this compact subset.
- vs. Other Methods: Unlike methods that use inducing points, our approach uses the full data history for the final prediction, ensuring no loss of accuracy.
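As a minimal sketch of the "subset for tuning, full set for prediction" idea, the following shows greedy farthest-point sampling over a precomputed pairwise distance matrix. The function names, the subset size m, and the usage comments are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def farthest_point_sampling(dist, m, start=0):
    """Greedily pick m indices so that each new point maximizes the
    minimum distance to the points already selected.

    dist : (n, n) precomputed pairwise distance matrix (e.g., the
           permutation-invariant EMD sketched in the next post).
    m    : fixed subset size used for hyperparameter optimization.
    """
    n = dist.shape[0]
    selected = [start]
    # min_d[i] = distance from point i to its nearest selected point
    min_d = dist[start].copy()
    while len(selected) < min(m, n):
        nxt = int(np.argmax(min_d))   # farthest from the current subset
        selected.append(nxt)
        min_d = np.minimum(min_d, dist[nxt])
    return selected

# Usage (hypothetical helper names): tune hyperparameters on the small
# FPS subset only, but condition the GP on the full data history.
# subset = farthest_point_sampling(pairwise_emd, m=30)
# theta  = optimize_hyperparameters(X[subset], y[subset])  # O(m^3), not O(n^3)
# model  = gp_fit(X, y, theta)                             # full data for prediction
```

Because the hyperparameters are tuned on only m points, the cubic-cost kernel inversions during optimization scale with m rather than with the full dataset size n, while the final predictive model still uses every observation.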
The Impact: This "subset for tuning, full set for prediction" strategy makes the GP practical. We achieve a >2x speedup in wall-time over the previous state-of-the-art and drastically reduce model failure rates. It provides a general blueprint for creating efficient and robust surrogate models for expensive on-the-fly learning tasks.
Chemistry Perspective
The Problem: Finding saddle points (transition states) is crucial for understanding reaction mechanisms, but it is bottlenecked by the cost of quantum chemistry calculations. GPR-accelerated methods like GPDimer promised to reduce the number of these calculations, but they were often unstable and ultimately slower in total time-to-solution.
Our Solution: The OT-GP framework makes GPR acceleration a practical reality by fixing the source of these instabilities.
- A Chemically-Aware Metric: We introduce a permutation-invariant distance metric based on the Earth Mover's Distance (EMD). Unlike the previous D_1Dmaxlog metric, our EMD understands that identical atoms are interchangeable, so it is not fooled by low-energy symmetric events like methyl rotations or proton transfers.
- Preventing Instability: This robust representation prevents the algorithm from adding redundant data to the training set, which was a primary cause of ill-conditioned kernel matrices and the catastrophic failures (e.g., exploding molecules) seen in older methods.
- Targeted Data Acquisition: We use this EMD with Farthest Point Sampling (FPS) to intelligently select data for hyperparameter tuning, and an adaptive trust radius to guide the search, ensuring the model remains stable. (A sketch of the metric follows this list.)
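As a rough sketch of a permutation-invariant metric of this kind, assuming both structures share the same composition and atoms carry uniform weight: for each element, the Earth Mover's Distance between two equal-size point clouds reduces to an optimal linear assignment over pairwise Cartesian distances, and the per-element values are summed. The names and normalization here are illustrative, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def permutation_invariant_emd(coords_a, coords_b, types_a, types_b):
    """Permutation-invariant distance between two molecular geometries.

    coords_* : (n_atoms, 3) numpy arrays of Cartesian coordinates
    types_*  : length-n_atoms numpy arrays of element symbols, e.g. "C", "H"
    """
    total = 0.0
    for element in np.unique(types_a):
        pa = coords_a[types_a == element]
        pb = coords_b[types_b == element]
        assert len(pa) == len(pb), "structures must share composition"
        cost = cdist(pa, pb)                     # pairwise atom distances
        row, col = linear_sum_assignment(cost)   # optimal atom matching:
        total += cost[row, col].sum() / len(pa)  # EMD with uniform weights
    return total                                 # summed over atom types
```

Two geometries that differ only by relabeling identical atoms (e.g., a methyl rotation that permutes hydrogens) have near-zero distance under such a metric, so they are recognized as redundant rather than added to the training set.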
The Impact: On a benchmark of 238 reactions, OT-GP is a transformative improvement.
- Data Efficiency: We preserve the ~10x reduction in Hartree-Fock calculations compared to the standard Dimer method (median of ~28 vs. ~254).
- Wall-Time Speedup: We are >2x faster than the previous GPDimer (12.6 min vs. 28.3 min) and nearly 2x faster than the standard Dimer (12.6 min vs. 23.7 min).
- Reliability: We achieve a 96% success rate, solving cases where all previous methods failed.