Larmor: Equivariant Graph Networks for First-Principles-Quality NMR Chemical Shift Prediction

Abstract

Density functional theory (DFT) calculations with gauge-including atomic orbitals (GIAO) remain the de facto reference for predicting NMR chemical shifts in organic molecules, but their O(N⁴)–O(N⁵) scaling makes them prohibitive for the 10⁴–10⁶-molecule libraries that drive modern medicinal chemistry. We introduce Larmor, a family of E(3)-equivariant graph neural networks that predict ¹H, ¹³C, ¹⁵N, and ¹⁹F isotropic shielding constants directly from 3D molecular conformers. Trained on 1.2M curated experimental shifts from nmrshiftdb2, BMRB, a licensed CSC subset, and an internal collection (about 227k compounds in total, per the breakdown in Section 2.2), Larmor-100M (100M parameters) achieves 0.131 ppm ¹H MAE and 1.04 ppm ¹³C MAE on the held-out nmrshiftdb2 test split, surpassing scaled DFT GIAO (B3LYP/PBE0) and every ML baseline we compare against. A single inference takes 18 ms on a T4 GPU and 120 ms on CPU, fast enough for real-time spectral verification during structure elucidation.

1. Introduction

NMR spectroscopy is the single most important structural verification tool in synthetic organic chemistry. A typical drug-discovery cycle involves dozens of synthesized analogues, each requiring ¹H and ¹³C NMR confirmation that the intended product was made. Computational prediction of chemical shifts has therefore been a long-standing target - both as a structure-elucidation aid and as a virtual screening tool to rule out candidates whose predicted spectra would be inconsistent with desired motifs.

The reigning standard, DFT-based GIAO calculations at the B3LYP or PBE0 level, achieves typical ¹H MAEs of 0.18–0.25 ppm and ¹³C MAEs of 1.5–2.5 ppm after empirical scaling correction. Crucially, a single GIAO calculation on a 30-heavy-atom drug-like molecule takes 3–30 minutes per conformer at modest basis sets, scaling to hours at triple-zeta. At 3–30 minutes each, screening 100,000 candidate structures would cost roughly 5,000–50,000 compute-hours for a single conformer per molecule, which is impractical for routine use.

Recent ML approaches - SchNet+NMR (2020), CASCADE (2022), NMRgnn (2024), DetaNet (2024) - have closed much of the accuracy gap. The remaining limitations are: (i) all are trained on calculated DFT shifts rather than experimental, baking in DFT's systematic errors; (ii) most rely on invariant graph features and miss geometry-dependent through-space contributions; (iii) none condition on solvent, despite chemical shifts varying by 0.05–0.30 ppm with solvent polarity.

Larmor addresses all three. We train on a curated set of 1.2M experimental shifts from nmrshiftdb2, BMRB, a licensed CSC subset, and an internal Rasyn collection (Section 2.2); we use full E(3)-equivariant attention over geometric features; and we condition on solvent identity, pH, and reference compound via learned embeddings.

2. Method

2.1 Architecture

Larmor takes as input a 3D molecular conformer (atoms with positions in ℝ³) and predicts a per-atom isotropic shielding constant σ_iso, which is then converted to a chemical shift δ via a learned reference correction. The backbone is a stack of N E(3)-equivariant attention blocks operating jointly on scalar and equivariant features:

h^(ℓ+1)_i = h^(ℓ)_i + Σ_j∈N(i) α_ij^(ℓ) · ϕ_m(h^(ℓ)_j, ‖r_ij‖, r̂_ij)

where r_ij = r_j − r_i is the displacement vector, attention weights α_ij are computed from invariant features only (preserving equivariance), and ϕ_m is a tensor-product update that mixes scalar and L=1 vector channels via Clebsch–Gordan coefficients. Output shielding is read from the L=0 (scalar) channel and is invariant by construction to molecular rotation.
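
The invariance claim can be checked numerically on a toy version of the update. The sketch below is a deliberately simplified, scalar-only (L=0) layer: attention weights are a softmax over negative interatomic distances, and the vector (L=1) channels and the Clebsch–Gordan tensor product are left out. The function names, the distance-based weighting, and the 5 Å cutoff are illustrative assumptions, not Larmor's actual implementation.

```python
import math

def rotate_z(point, theta):
    """Rotate a 3D point about the z axis by theta radians."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def scalar_attention_layer(h, pos, cutoff=5.0):
    """Residual update h_i <- h_i + sum_j alpha_ij * h_j on scalar features.

    alpha_ij is a softmax over neighbours of -||r_ij||, which depends only
    on distances and is therefore unchanged by rotating the molecule.
    Hypothetical simplification: no L=1 channels, no tensor product.
    """
    updated = []
    for i, (h_i, r_i) in enumerate(zip(h, pos)):
        neighbours = [(j, math.dist(r_i, r_j)) for j, r_j in enumerate(pos) if j != i]
        neighbours = [(j, d) for j, d in neighbours if d < cutoff]
        if not neighbours:
            updated.append(h_i)
            continue
        weights = [math.exp(-d) for _, d in neighbours]
        norm = sum(weights)
        message = sum(w / norm * h[j] for w, (j, _) in zip(weights, neighbours))
        updated.append(h_i + message)
    return updated

# Made-up scalar features and coordinates for a four-atom toy system.
h0 = [1.0, 0.5, -0.3, 2.0]
pos = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (1.6, 1.2, 0.3), (-0.7, 0.9, -1.0)]
rotated = [rotate_z(p, 0.7) for p in pos]

before = scalar_attention_layer(h0, pos)
after = scalar_attention_layer(h0, rotated)
max_diff = max(abs(a - b) for a, b in zip(before, after))
print(f"largest change after rotation: {max_diff:.2e}")
```

The printed difference should sit at floating-point noise level, which is the behaviour the L=0 readout is designed to guarantee for any rotation.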

2.2 Training data

The training corpus combines:

nmrshiftdb2 (open): 47,235 compounds, 412,891 shifts
BMRB ALATIS (open): 71,800 compounds, 587,302 shifts
CSC small-molecule subset (licensed): 89,400 compounds, 158,200 shifts
Internal Rasyn collection: 18,300 medicinal-chemistry compounds with multi-solvent shifts (44,600 shifts)
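
As a quick consistency check, the per-source counts above can be totalled. This is plain arithmetic on the listed figures; the totals are before any cross-source deduplication, which the text does not describe.

```python
# (compounds, shifts) per source, copied from the list above.
corpus = {
    "nmrshiftdb2": (47_235, 412_891),
    "BMRB ALATIS": (71_800, 587_302),
    "CSC small-molecule subset": (89_400, 158_200),
    "Internal Rasyn collection": (18_300, 44_600),
}

total_compounds = sum(compounds for compounds, _ in corpus.values())
total_shifts = sum(shifts for _, shifts in corpus.values())

print(f"compounds: {total_compounds:,}")  # compounds: 226,735
print(f"shifts:    {total_shifts:,}")     # shifts:    1,202,993
```

The shift total matches the 1.2M figure quoted for the training set.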

All shifts are referenced to TMS for ¹H/¹³C, NH₃ for ¹⁵N, and CFCl₃ for ¹⁹F. Conformers are generated with RDKit's ETKDGv3, and predictions for the five lowest-energy conformers are averaged at inference.
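
A minimal sketch of the averaging step, assuming an unweighted mean over the five lowest-energy conformers (the text does not say whether any Boltzmann weighting is applied). The function name, energies, and shift values are made up for illustration; conformer generation itself is not shown.

```python
def average_lowest_energy(conformers, k=5):
    """Unweighted mean of per-atom predictions over the k lowest-energy conformers.

    conformers: list of (relative_energy, [per-atom predicted shifts]).
    """
    chosen = sorted(conformers, key=lambda conf: conf[0])[:k]
    n_atoms = len(chosen[0][1])
    return [sum(conf[1][atom] for conf in chosen) / len(chosen) for atom in range(n_atoms)]

# Hypothetical predictions (ppm) for two atoms across six conformers.
conformers = [
    (0.0, [7.20, 2.10]),
    (0.4, [7.30, 2.00]),
    (1.1, [7.10, 2.20]),
    (1.9, [7.25, 2.05]),
    (2.5, [7.15, 2.15]),
    (6.0, [9.00, 4.00]),  # highest energy, dropped when k=5
]

avg = average_lowest_energy(conformers)
print([round(value, 3) for value in avg])  # [7.2, 2.1]
```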

2.3 Model variants

Variant	Layers	d_model	Params	Tier
Larmor-50M	10	512	52.3M	Free / API
Larmor-100M	14	768	102.7M	Pro

3. Results

We evaluate on the standard nmrshiftdb2 held-out test split (4,712 compounds, 38,991 ¹H shifts, 19,403 ¹³C shifts) using a strict 80/10/10 scaffold-based split to prevent leakage. All ML baselines were re-trained on our identical training set for a fair comparison; DFT numbers come from B3LYP/6-31G(d,p) on the same molecules with linear-scaling correction.
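
A scaffold-based split keeps every molecule that shares a scaffold in the same partition, so close analogues of a training compound cannot appear in the test set. The sketch below shows one common greedy way to do this; it assumes scaffold keys (for example Bemis–Murcko SMILES) have already been computed, and it is not necessarily the procedure used for the split reported here.

```python
from collections import defaultdict

def scaffold_split(scaffold_of, fractions=(0.8, 0.1, 0.1)):
    """Assign whole scaffold groups to train/val/test.

    scaffold_of: dict mapping molecule id -> scaffold key (assumed precomputed).
    Groups are placed largest first into the first partition with room left;
    anything that fits nowhere goes to the last partition.
    """
    groups = defaultdict(list)
    for mol, scaffold in scaffold_of.items():
        groups[scaffold].append(mol)

    # Sort by descending size, then by key so the result is deterministic.
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    targets = [fraction * len(scaffold_of) for fraction in fractions]
    splits = ([], [], [])
    for _, mols in ordered:
        for idx in range(3):
            if len(splits[idx]) + len(mols) <= targets[idx] or idx == 2:
                splits[idx].extend(mols)
                break
    return splits

# Toy example: ten molecules over four hypothetical scaffolds.
scaffold_of = {f"mol{i}": "A" for i in range(5)}
scaffold_of.update({f"mol{i}": "B" for i in range(5, 8)})
scaffold_of.update({"mol8": "C", "mol9": "D"})

train, val, test = scaffold_split(scaffold_of)
print(len(train), len(val), len(test))  # 8 1 1
```

Because whole groups are moved together, the realised fractions only approximate 80/10/10 when scaffold groups are large.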

Method	¹H MAE (ppm)
Larmor-100M	0.131
Larmor-50M	0.142
DetaNet (2024)	0.151
NMRgnn (Aires-de-Sousa, 2024)	0.161
CASCADE (Pyzer-Knapp, 2022)	0.184
SchNet+NMR (Gerrard, 2020)	0.301
DFT B3LYP/PBE0 + GIAO	0.201

Figure 1. ¹H chemical shift mean absolute error on the nmrshiftdb2 held-out test split (lower is better). Larmor-100M sets a new state of the art at 0.131 ppm, surpassing DFT B3LYP/PBE0 + GIAO (0.201 ppm) at a fraction of the compute cost.

Method	¹³C MAE (ppm)
Larmor-100M	1.04
Larmor-50M	1.18
DetaNet (2024)	1.21
NMRgnn (Aires-de-Sousa, 2024)	1.32
CASCADE (Pyzer-Knapp, 2022)	1.43
SchNet+NMR (Gerrard, 2020)	3.41
DFT B3LYP/PBE0 + GIAO	1.91

Figure 2. ¹³C chemical shift mean absolute error on the same test split. The gap to DFT widens from about 1.5× for ¹H to about 1.8× for ¹³C: Larmor-100M reaches 1.04 ppm vs. DFT's 1.91 ppm.

Why does Larmor beat DFT?

GIAO calculations give theoretical shieldings for an isolated molecule, which are then scaled empirically to match experiment. That scaling cannot fully correct for solvent effects, dynamic averaging, or the fact that B3LYP systematically over-shields by ~0.3 ppm for sp³ carbons. Larmor learns directly from experimental shifts, including the solvent in which they were measured, and can therefore capture effects that an isolated-molecule calculation with empirical scaling does not model.

4. Speed

Wall-clock per-molecule inference time was measured on a 30-heavy-atom drug-like molecule and averaged over 1,000 trials. Times include conformer generation. DFT timings are for a single conformer at the indicated basis set on an Intel Xeon Gold 6248R workstation; ML timings are on CPU (i9-12900K) and on a single T4 GPU.

Method	Time per molecule
Larmor-50M (CPU)	70 ms
Larmor-100M (CPU)	120 ms
Larmor-100M (T4 GPU)	18 ms
DetaNet	340 ms
CASCADE	580 ms
NMRgnn	410 ms
DFT B3LYP/6-31G(d,p)	312.0 s
DFT PBE0/def2-TZVP	1840.0 s

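Figure 3. Per-molecule prediction time on a log scale, spanning about five orders of magnitude (18 ms to 1,840 s). On a T4 GPU, Larmor-100M is roughly 17,000× faster than DFT B3LYP/6-31G(d,p) and roughly 100,000× faster than DFT PBE0/def2-TZVP; on CPU the corresponding speedups are about 2,600× and 15,000×.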
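
The speedup factors follow directly from the timings reported in this section. The snippet below is just that arithmetic; the dictionary keys are shorthand labels chosen here.

```python
# Per-molecule timings from this section, converted to seconds.
timings_s = {
    "larmor_100m_gpu": 0.018,
    "larmor_100m_cpu": 0.120,
    "dft_b3lyp_631gdp": 312.0,
    "dft_pbe0_tzvp": 1840.0,
}

def speedup(slow, fast):
    """How many times faster `fast` is than `slow`."""
    return timings_s[slow] / timings_s[fast]

for dft in ("dft_b3lyp_631gdp", "dft_pbe0_tzvp"):
    for model in ("larmor_100m_gpu", "larmor_100m_cpu"):
        print(f"{dft} vs {model}: {speedup(dft, model):,.0f}x")
```

On the GPU the factors come out at about 17,000× (B3LYP) and 102,000× (PBE0); on CPU they are about 2,600× and 15,000×.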

5. Ablation Studies

We measure the contribution of each architectural component by progressively adding features to a plain invariant-GNN baseline. Each row reports ¹H MAE on the validation set after retraining for 50 epochs with identical hyperparameters except for the indicated change.

Configuration	¹H MAE (ppm)	Δ
Invariant GNN baseline (no geometry)	0.298	-
+ E(3)-equivariant attention	0.211	−0.087
+ Distance-decay edge features	0.182	−0.029
+ Solvent conditioning token	0.166	−0.016
+ Hybridization-aware atom embeddings	0.148	−0.018
+ Multi-task pretraining (¹H + ¹³C + ¹⁵N + ¹⁹F)	0.139	−0.009
+ Conformer ensemble (top-5 ETKDG)	0.131	−0.008

Single biggest gain: equivariance

The shift from invariant to E(3)-equivariant attention contributed 0.087 ppm - by far the largest single improvement. Through-space anisotropy effects (ring currents, magnetic susceptibility) are inherently geometric and cannot be recovered from invariant graph features alone.
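
The ablation table can be re-derived from its MAE column, which also shows how much of the total improvement the first step accounts for. The short labels are abbreviations chosen here; the numbers are copied from the table.

```python
# (configuration, 1H MAE in ppm), in the order the features were added.
ablation = [
    ("invariant baseline", 0.298),
    ("+ equivariant attention", 0.211),
    ("+ distance-decay edges", 0.182),
    ("+ solvent token", 0.166),
    ("+ hybridization embeddings", 0.148),
    ("+ multi-task pretraining", 0.139),
    ("+ conformer ensemble", 0.131),
]

# Change in MAE contributed by each added component.
deltas = [round(curr[1] - prev[1], 3) for prev, curr in zip(ablation, ablation[1:])]
total = round(ablation[-1][1] - ablation[0][1], 3)
equivariance_share = deltas[0] / total

print(deltas)                       # [-0.087, -0.029, -0.016, -0.018, -0.009, -0.008]
print(total)                        # -0.167
print(f"{equivariance_share:.0%}")  # 52%
```

The first step therefore accounts for roughly half of the full 0.167 ppm reduction. Note that it moves from a baseline with no geometry to a geometric, equivariant model in one go, so the table does not separate the two effects.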

6. Error Analysis

We bucketed 1,727 ¹H shifts from the test split by chemical environment and measured MAE within each bucket to identify systematic failure modes.

Chemical environment	n	MAE (ppm)
Aromatic & heteroaromatic	412	0.082
Aliphatic CH/CH₂/CH₃	689	0.094
α to carbonyl / heteroatom	245	0.118
Vinyl / alkenyl	187	0.143
Strained ring (cyclopropyl, cyclobutyl)	73	0.198
Acidic OH / NH (exchangeable)	121	0.351

Two failure modes dominate the long tail. Strained rings exhibit large shielding swings due to anomalous geometric effects on σ_aniso that are underrepresented in training data. Acidic/exchangeable protons (OH, NH) genuinely have no fixed chemical shift - their position depends on temperature, concentration, and trace water - and so a 0.35 ppm MAE is close to the experimental reproducibility floor.
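
To see how much each bucket weighs on the aggregate, the per-bucket MAEs can be combined into a count-weighted average. This is arithmetic on the figures above; the source does not state how the bucketed shifts were selected.

```python
# (environment, n, MAE in ppm) copied from the buckets above.
buckets = [
    ("aromatic / heteroaromatic", 412, 0.082),
    ("aliphatic", 689, 0.094),
    ("alpha to carbonyl / heteroatom", 245, 0.118),
    ("vinyl / alkenyl", 187, 0.143),
    ("strained ring", 73, 0.198),
    ("exchangeable OH / NH", 121, 0.351),
]

total_n = sum(n for _, n, _ in buckets)
total_abs_error = sum(n * mae for _, n, mae in buckets)
weighted_mae = total_abs_error / total_n

# Share of shifts vs. share of summed absolute error for exchangeable protons.
_, exch_n, exch_mae = buckets[-1]
exch_count_share = exch_n / total_n
exch_error_share = exch_n * exch_mae / total_abs_error

print(total_n)                    # 1727
print(f"{weighted_mae:.3f}")      # 0.122
print(f"{exch_count_share:.1%}")  # 7.0%
print(f"{exch_error_share:.1%}")  # 20.1%
```

Exchangeable protons make up about 7% of the bucketed shifts but contribute about 20% of the summed absolute error.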

7. Applications in the Rasyn Platform

Larmor powers three production features inside Rasyn:

Spectra Back-Calculator

Given a measured ¹H NMR peak list and a candidate structure, Larmor predicts the expected spectrum, runs Hungarian alignment against the observation, and emits a structural-consistency verdict in under 2 seconds.
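
The alignment step is an optimal one-to-one assignment between predicted and observed peaks. The Hungarian algorithm solves this in polynomial time; the sketch below swaps in an exhaustive search over permutations, which finds the same optimum and is only practical for short peak lists. The cost function (sum of absolute shift differences), the function name, and the peak values are illustrative assumptions.

```python
from itertools import permutations

def best_assignment(predicted, observed):
    """Minimum-cost one-to-one matching of predicted to observed peaks.

    Returns (total |delta| in ppm, tuple mapping predicted index -> observed index).
    Exhaustive O(n!) search standing in for the Hungarian algorithm;
    assumes len(predicted) <= len(observed).
    """
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(observed)), len(predicted)):
        cost = sum(abs(predicted[i] - observed[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm

# Hypothetical 1H peak lists in ppm; the observed list is deliberately unordered.
predicted = [7.26, 3.71, 1.25]
observed = [1.30, 7.31, 3.66]

cost, mapping = best_assignment(predicted, observed)
mean_deviation = cost / len(predicted)
print(mapping)                   # (1, 2, 0)
print(f"{mean_deviation:.2f}")   # 0.05
```

A production version would replace the permutation loop with a polynomial-time solver and turn the mean deviation into a verdict using a calibrated threshold, neither of which is specified in the text.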

Retrosynthesis pre-screening

Each candidate route from RetroTransformer v2 is scored by predicting the NMR of all intermediate products and flagging steps where measurement-vs-prediction disagrees beyond 3σ. Catches isomerization and rearrangement byproducts upstream of the LC-MS step.
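
A 3σ check of this kind reduces to a simple threshold on the per-peak residual. The sketch assumes peaks are already paired and that σ is a single expected prediction error in ppm; the text does not say how σ is estimated, so the value used here is a placeholder.

```python
def flag_outliers(predicted, measured, sigma, n_sigma=3.0):
    """Indices of paired peaks whose |predicted - measured| exceeds n_sigma * sigma."""
    return [
        idx
        for idx, (pred, meas) in enumerate(zip(predicted, measured))
        if abs(pred - meas) > n_sigma * sigma
    ]

# Hypothetical paired 1H shifts (ppm); sigma = 0.15 ppm is an assumed value.
predicted = [7.26, 3.71, 1.25, 5.40]
measured = [7.30, 3.65, 1.28, 6.10]

flagged = flag_outliers(predicted, measured, sigma=0.15)
print(flagged)  # [3]
```

Only the last peak, which is off by 0.70 ppm against a 0.45 ppm threshold, is flagged.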

Structure elucidation copilot

Given an unknown spectrum and a SMILES candidate set, Larmor ranks candidates by the likelihood their predicted spectrum matches observation under realistic noise. Now used by 7 medicinal chemistry teams as a sanity check before committing to a structural assignment.
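
One way to rank candidates by likelihood is to score each predicted peak list under an independent Gaussian noise model. The sketch below makes several assumptions not stated in the text: equal-length peak lists, peaks paired in sorted order, and one fixed σ of 0.15 ppm for every peak. Candidate names and shift values are made up.

```python
import math

def log_likelihood(predicted, observed, sigma=0.15):
    """Gaussian log-likelihood of observed peaks given predicted ones.

    Peaks are paired in sorted order, a simplification that requires
    both lists to have the same length.
    """
    log_norm = math.log(sigma * math.sqrt(2 * math.pi))
    return sum(
        -0.5 * ((pred - obs) / sigma) ** 2 - log_norm
        for pred, obs in zip(sorted(predicted), sorted(observed))
    )

def rank_candidates(candidates, observed, sigma=0.15):
    """Candidate names ordered from most to least likely."""
    scored = [(log_likelihood(peaks, observed, sigma), name) for name, peaks in candidates.items()]
    return [name for _, name in sorted(scored, reverse=True)]

observed = [7.31, 3.66, 1.30]
candidates = {
    "candidate_A": [7.26, 3.71, 1.25],
    "candidate_B": [7.90, 4.20, 1.10],
    "candidate_C": [6.80, 3.60, 2.05],
}

ranking = rank_candidates(candidates, observed)
print(ranking)  # ['candidate_A', 'candidate_B', 'candidate_C']
```

With a shared σ the ranking is equivalent to ordering by summed squared deviation; environment-specific σ values, such as a larger one for exchangeable protons, would change the weighting.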

8. Conclusion

Larmor closes the practical gap between DFT-quality NMR shielding prediction and the throughput requirements of modern medicinal chemistry. By training on experimental shifts with explicit solvent conditioning and using full E(3)-equivariant attention over 3D geometry, Larmor-100M achieves 0.131 ppm ¹H MAE and 1.04 ppm ¹³C MAE on nmrshiftdb2, surpassing both DFT B3LYP/PBE0 + GIAO and every ML baseline we evaluated while running roughly 10⁴× faster on a T4 GPU. The remaining error budget is dominated by strained-ring through-space anisotropy and exchangeable-proton dynamics. Both are targets for our v2 release, which is planned to go beyond the current top-5 conformer averaging and to add chemical-exchange modelling.

Larmor is exposed inside Rasyn as the spectra back-calculator, the retrosynthesis pre-screening filter, and the structure-elucidation copilot. The 50M variant is available on the free API tier; the 100M variant is included in Pro.

