Abstract
A retrosynthesis route is only useful if a chemist can actually run each step. We introduce Marcus, a family of hierarchical transformers that predict the joint distribution of solvent, catalyst, base, temperature, and yield for any reactant–product pair. Trained on 4.7M reactions from Reaxys, USPTO, the Open Reaction Database, and Rasyn's in-house optimized-batch dataset, Marcus-1B (1.0B parameters) achieves 92.1% top-3 solvent accuracy, 89.9% top-3 catalyst accuracy, 8.4 °C temperature MAE, and R² = 0.72 on yield prediction - surpassing GraphRXN-Conditions, RXN4Chem (IBM), and the Coley/Maser baselines. The hierarchical decoder factorizes the conditional distribution P(solvent, catalyst, base, T | reaction), preventing the "mode collapse to the most common condition" failure mode of flat seq2seq models. Marcus is deployed inside Rasyn's retrosynthesis pipeline and the Condition Compiler to populate executable lab protocols from skeletal route plans.
1. Introduction
Predicting the right conditions for a chemical reaction has historically been the domain of process chemists with decades of experience. Should this Suzuki coupling use Pd(PPh3)4 or Pd(dba)2/SPhos? Toluene/H2O at reflux, or 1,4-dioxane at 90 °C? K2CO3, K3PO4, or Cs2CO3? The right answer depends on substrate, scale, and constraints invisible to a SMILES string.
Coley et al. (2018) opened the field with a simple feedforward network on Morgan fingerprints achieving 41% top-1 solvent accuracy. Subsequent work - Maser's Smiles2Conditions (2021), IBM's RXN4Chem (2023), GraphRXN-Conditions (2024) - pushed top-3 accuracy past 80% but treated each condition (solvent, catalyst, base, T) as an independent multi-class classification task. This independence assumption is wrong: the choice of base depends on the catalyst, and the temperature depends on the solvent's boiling point.
Marcus models the conditions as a structured joint distribution and decodes them autoregressively in a chemically motivated order - first solvent (reflecting compatibility with reagents), then catalyst (conditional on solvent), then base/additive, then temperature, then yield. Each step attends to the reaction graph and to all previously decoded conditions, capturing the cross-condition dependencies that matter in practice.
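Written out, the decoding order described above corresponds to a chain-rule factorization. The symbols are shorthand introduced here for illustration and are not notation from the original text: $r$ is the encoded reaction, and $s$, $c$, $b$, $T$, $y$ are solvent, catalyst, base/additive, temperature, and yield.

```latex
P(s, c, b, T, y \mid r)
  = P(s \mid r)\,
    P(c \mid s, r)\,
    P(b \mid s, c, r)\,
    P(T \mid s, c, b, r)\,
    P(y \mid s, c, b, T, r)
```

By contrast, treating the conditions as independent classifiers amounts to assuming $P(s \mid r)\,P(c \mid r)\,P(b \mid r)\,P(T \mid r)$, which drops every cross-condition term.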
2. Method
2.1 Reaction-graph encoder
Each reactant and product is parsed to its molecular graph and atom-mapped via RXNMapper. The encoder operates on the union graph with edge-type embeddings distinguishing intra-molecule bonds, inter-molecule reactant–product correspondence, and edges marking the reaction core (atoms whose bond environment changes). A 12-layer GraphTransformer with rotational position encodings produces a contextualized atom embedding tensor.
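To make the three edge types concrete, here is a minimal sketch in plain Python. It assumes atoms are identified by their atom-map numbers and that bonds are already available as pairs of those numbers. The names `bond_set`, `reaction_core`, and `union_graph_edges` are hypothetical, bond-order changes are ignored for brevity, and tagging formed or broken bonds as `"reaction_core"` edges is one plausible reading of the description above, not the confirmed construction.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Bond = FrozenSet[int]          # undirected bond between two atom-map numbers
Node = Tuple[str, int]         # ("R" or "P", atom-map number)
Edge = Tuple[Node, Node, str]  # (source node, target node, edge type)


def bond_set(pairs: Iterable[Tuple[int, int]]) -> Set[Bond]:
    """Turn (a, b) pairs into order-independent bonds."""
    return {frozenset(pair) for pair in pairs}


def reaction_core(reactant_bonds: Set[Bond], product_bonds: Set[Bond]) -> Set[int]:
    """Atoms that touch a bond present on only one side of the reaction."""
    changed = reactant_bonds ^ product_bonds
    return {atom for bond in changed for atom in bond}


def union_graph_edges(reactant_bonds: Set[Bond], product_bonds: Set[Bond]) -> List[Edge]:
    """Typed edge list for the reactant/product union graph (toy version)."""
    changed = reactant_bonds ^ product_bonds
    edges: List[Edge] = []
    for side, bonds in (("R", reactant_bonds), ("P", product_bonds)):
        for bond in sorted(bonds, key=sorted):
            a, b = sorted(bond)
            kind = "reaction_core" if bond in changed else "intra_bond"
            edges.append(((side, a), (side, b), kind))
    # Atoms seen on both sides get a reactant-to-product correspondence edge.
    reactant_atoms = {atom for bond in reactant_bonds for atom in bond}
    product_atoms = {atom for bond in product_bonds for atom in bond}
    for atom in sorted(reactant_atoms & product_atoms):
        edges.append((("R", atom), ("P", atom), "correspondence"))
    return edges


# Toy substitution: bond 3-4 breaks, bond 2-3 forms, atom 4 leaves.
reactants = bond_set([(1, 2), (3, 4)])
products = bond_set([(1, 2), (2, 3)])
for edge in union_graph_edges(reactants, products):
    print(edge)
```

In a real pipeline the bond lists would come from the atom-mapped molecular graphs rather than hand-written pairs.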
2.2 Hierarchical decoder
The decoder produces conditions in a fixed chemically motivated order:
- Solvent - selected from a 142-class controlled vocabulary (DMSO, THF, MeCN, …) plus a "mixture" head for binary solvent systems
- Catalyst / pre-catalyst - 384-class vocabulary including ligand-bound complexes (Pd(PPh3)4, Pd(dba)2/SPhos, …)
- Base / additive - 89 classes
- Temperature - regression head with Gaussian likelihood
- Yield - regression head conditioned on all of the above
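For the temperature head, a Gaussian likelihood is usually trained by minimizing the negative log-likelihood of the observed temperature. The sketch below assumes the head outputs a mean and a log-variance; that parameterization and the function name `gaussian_nll` are assumptions for illustration, since the text only states that the likelihood is Gaussian.

```python
import math


def gaussian_nll(target_c: float, mean_c: float, log_var: float) -> float:
    """Negative log-likelihood of target_c under N(mean_c, exp(log_var))."""
    var = math.exp(log_var)
    return 0.5 * (math.log(2.0 * math.pi) + log_var + (target_c - mean_c) ** 2 / var)


# A confident but wrong prediction is penalized more than an uncertain one.
confident = gaussian_nll(target_c=90.0, mean_c=60.0, log_var=math.log(5.0 ** 2))
uncertain = gaussian_nll(target_c=90.0, mean_c=60.0, log_var=math.log(20.0 ** 2))
print(confident > uncertain)
```

Predicting the variance alongside the mean lets the head express low confidence on reactions whose reported temperatures vary widely.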
Each step receives the reaction encoding plus embeddings of all previously decoded conditions via cross-attention. This factorization captures couplings such as "K2CO3 doesn't dissolve well in toluene → prefer Cs2CO3 when toluene is the solvent."
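The conditioning pattern can be illustrated with a toy greedy decoder. Everything in this sketch is hypothetical: the three `*_head` functions stand in for learned decoder heads, the probabilities are invented to mirror the toluene/Cs2CO3 example, and the reaction encoding is omitted entirely.

```python
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, str]
Head = Callable[[Context], Dict[str, float]]


def solvent_head(ctx: Context) -> Dict[str, float]:
    return {"toluene": 0.55, "1,4-dioxane": 0.45}


def catalyst_head(ctx: Context) -> Dict[str, float]:
    return {"Pd(PPh3)4": 0.7, "Pd(dba)2/SPhos": 0.3}


def base_head(ctx: Context) -> Dict[str, float]:
    # The base distribution depends on the solvent decoded earlier.
    if ctx["solvent"] == "toluene":
        return {"Cs2CO3": 0.7, "K2CO3": 0.3}
    return {"K2CO3": 0.9, "Cs2CO3": 0.1}


STEPS: List[Tuple[str, Head]] = [
    ("solvent", solvent_head),
    ("catalyst", catalyst_head),
    ("base", base_head),
]


def greedy_decode(
    steps: List[Tuple[str, Head]], fixed: Optional[Context] = None
) -> Tuple[Context, float]:
    """Decode step by step; each head sees all earlier choices."""
    ctx: Context = {}
    joint = 1.0
    for name, head in steps:
        dist = head(ctx)
        # Use a user-fixed value if given, otherwise the most probable class.
        choice = (fixed or {}).get(name) or max(dist, key=dist.get)
        ctx[name] = choice
        joint *= dist[choice]
    return ctx, joint


conditions, probability = greedy_decode(STEPS)
print(conditions, round(probability, 4))
```

With these toy numbers, the marginal probability of K2CO3 is 0.55 × 0.3 + 0.45 × 0.9 = 0.57, so a base classifier that ignores the solvent would pick K2CO3 even when toluene has been chosen, whereas the conditioned head picks Cs2CO3.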
2.3 Training data
The corpus combines four sources, with deduplication and aggressive filtering for atom-mapping consistency, reasonable yields (5–95%), and exclusion of one-step deprotection trivialities:
- Reaxys-derived corpus (licensed): 3.1M reactions with full conditions and reported yield
- USPTO-MIT-Conditions (open): 1.0M patent reactions parsed by Lowe + manual review
- Open Reaction Database (ORD): 540k reactions from open-source contributions
- Rasyn in-house optimized batch dataset: 87k high-throughput screening rows from partner labs (Aragen, IFM Therapeutics, 1200 Pharma)
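A rough sketch of two of the filters mentioned above, deduplication and the 5–95% yield window, follows. The record layout (`rxn_smiles`, `yield_pct`), the function name `filter_reactions`, exact-string deduplication, and inclusive bounds are all assumptions for illustration; the atom-mapping consistency check and the deprotection exclusion are not shown.

```python
from typing import Dict, Iterable, List, Optional, Set, Union

Record = Dict[str, Union[str, Optional[float]]]


def filter_reactions(
    records: Iterable[Record], min_yield: float = 5.0, max_yield: float = 95.0
) -> List[Record]:
    """Keep the first record per reaction string whose yield is in range."""
    seen: Set[str] = set()
    kept: List[Record] = []
    for record in records:
        key = record["rxn_smiles"]
        if key in seen:
            continue  # duplicate of a reaction that was already kept
        reported = record.get("yield_pct")
        if reported is None or not (min_yield <= reported <= max_yield):
            continue  # missing or implausible yield
        seen.add(key)
        kept.append(record)
    return kept


sample = [
    {"rxn_smiles": "CCO>>CC=O", "yield_pct": 80.0},
    {"rxn_smiles": "CCO>>CC=O", "yield_pct": 75.0},   # duplicate
    {"rxn_smiles": "CCBr>>CCO", "yield_pct": 99.0},   # above the window
    {"rxn_smiles": "CCCl>>CCO", "yield_pct": None},   # no reported yield
    {"rxn_smiles": "CCI>>CCO", "yield_pct": 5.0},     # on the lower bound
]
print(len(filter_reactions(sample)))
```

A production pipeline would canonicalize the reaction SMILES before using it as a deduplication key.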
We employ a curriculum: the model is pretrained on the noisier literature corpus (Reaxys + USPTO + ORD) for 80% of training, then fine-tuned on the cleaner in-house optimized-batch data. In our ablation, this curriculum accounts for the final 0.3 pp lift in top-3 solvent accuracy (88.1% → 88.4%).
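The two-stage schedule can be expressed as a simple step-based switch. This sketch assumes "80% of training" is measured in optimizer steps, which the text does not specify; `curriculum_stage` and the stage labels are hypothetical names.

```python
def curriculum_stage(step: int, total_steps: int, pretrain_fraction: float = 0.8) -> str:
    """Return which data source a given training step should sample from."""
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    # Round to an integer boundary to avoid floating-point edge cases.
    boundary = int(round(pretrain_fraction * total_steps))
    return "literature" if step < boundary else "in_house"


print(curriculum_stage(0, 1000), curriculum_stage(800, 1000))
```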
2.4 Model variants
| Variant | Encoder layers | Decoder layers | d_model | Params | Tier |
|---|---|---|---|---|---|
| Marcus-100M | 12 | 8 | 768 | 108M | Pro |
| Marcus-1B | 24 | 16 | 2048 | 1.04B | Enterprise |
3. Results
We evaluate on the USPTO-MIT-Conditions held-out test set (50,287 reactions, no overlap with training) and additionally on a curated internal medicinal-chemistry test set of 4,210 reactions across 12 named reaction classes. All baselines were re-trained on identical training data and evaluated with identical metrics for a fair comparison.
3.1 Temperature regression
| Method | Params | MAE (°C) | % within ±20 °C |
|---|---|---|---|
| Marcus-1B (Ours) | 1.0B | 8.4 | 91.2% |
| Marcus-100M (Ours) | 108M | 11.7 | 87.4% |
| GraphRXN-Conditions (2024) | 85M | 14.2 | 79.1% |
| RXN4Chem (IBM, 2023) | 40M | 16.8 | 74.3% |
| Maser et al. - Smiles2Conditions (2021) | 12M | 19.4 | 68.0% |
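The two temperature metrics in the table can be computed as follows. This is a minimal sketch with made-up numbers; whether the ±20 °C window is inclusive at the boundary is an assumption, and the function names are illustrative.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute temperature error in °C."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def fraction_within(
    predicted: Sequence[float], observed: Sequence[float], tolerance: float = 20.0
) -> float:
    """Share of predictions whose error is at most `tolerance` °C."""
    hits = sum(abs(p - o) <= tolerance for p, o in zip(predicted, observed))
    return hits / len(observed)


predicted_c = [80.0, 100.0, 25.0, 60.0]
observed_c = [90.0, 100.0, 60.0, 65.0]
print(mean_absolute_error(predicted_c, observed_c))  # 12.5
print(fraction_within(predicted_c, observed_c))      # 0.75
```

The absolute errors in this toy example are 10, 0, 35, and 5 °C, so one large miss raises the MAE noticeably while three of four predictions still fall inside the window.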
4. Yield Prediction
Yield prediction is the hardest task in this benchmark - yield is heavily influenced by execution factors (purity of starting material, mixing, temperature control) that are invisible from a reaction SMILES alone. The achievable R² for any structure-only model is therefore intrinsically bounded; we estimate the ceiling at ~0.80 by training an oracle model on the in-house high-throughput screening data with explicit batch metadata.
Marcus-1B achieves R² = 0.72 / MAE = 8.7%, closing more than half of the gap between the strongest baseline (GraphRXN yield head, R² = 0.58) and the oracle ceiling.
| Method | Params | R² | MAE (%) |
|---|---|---|---|
| Marcus-1B (Ours) | 1.0B | 0.72 | 8.7 |
| Marcus-100M (Ours) | 108M | 0.65 | 10.2 |
| Yield-BERT (Schwaller, 2021) | 12M | 0.55 | 12.4 |
| GraphRXN (yield head) | 85M | 0.58 | 11.8 |
| Random Forest (Ahneman, 2018) baseline | - | 0.42 | 14.6 |
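For reference, the sketch below shows the standard R² definition and one way to quantify how much of the gap to the estimated ceiling is closed. It assumes the gap is measured from the strongest baseline in the table (GraphRXN yield head, R² = 0.58), which is an interpretation; the function names are illustrative.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def gap_closed(model: float, baseline: float, ceiling: float) -> float:
    """Fraction of the baseline-to-ceiling gap recovered by the model."""
    return (model - baseline) / (ceiling - baseline)


# Values taken from the table and the ~0.80 ceiling estimate.
print(round(gap_closed(model=0.72, baseline=0.58, ceiling=0.80), 3))
```

Under that reading the fraction is 0.14 / 0.22, roughly 64%.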
5. Ablation Studies
Top-3 solvent accuracy on the validation set after progressively adding components to a plain reactant-SMILES → conditions transformer baseline.
| Configuration | Top-3 solvent | Δ (pp) |
|---|---|---|
| Plain reactant→conditions transformer | 71.4% | - |
| + Reaction-graph encoder (atom-mapped) | 78.6% | +7.2 |
| + Hierarchical decoder (solvent → catalyst → base → T) | 82.1% | +3.5 |
| + Reaction-class conditioning prior | 84.7% | +2.6 |
| + Reagent–condition co-attention | 86.9% | +2.2 |
| + Multi-task yield head | 88.1% | +1.2 |
| + Curriculum: literature → optimized batch | 88.4% | +0.3 |
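The top-3 metric used throughout counts a prediction as correct when the recorded solvent appears among the model's three highest-ranked candidates. A minimal sketch, with a hypothetical function name and toy data:

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]], labels: Sequence[str], k: int = 3
) -> float:
    """Fraction of examples whose label is among the first k ranked predictions."""
    hits = sum(label in ranked[:k] for ranked, label in zip(ranked_predictions, labels))
    return hits / len(labels)


ranked = [
    ["THF", "DMSO", "MeCN", "toluene"],
    ["toluene", "THF", "DMSO", "MeCN"],
    ["DMSO", "MeCN", "THF", "toluene"],
    ["MeCN", "DMSO", "THF", "toluene"],
]
recorded = ["MeCN", "toluene", "toluene", "MeCN"]
print(top_k_accuracy(ranked, recorded, k=3))  # 0.75
print(top_k_accuracy(ranked, recorded, k=1))  # 0.5
```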
6. Applications in the Rasyn Platform
Marcus is the conditions backbone for three production features:
Retrosynthesis-to-protocol
Every step of a route generated by RetroTransformer v2 is passed through Marcus to produce executable conditions: solvent, catalyst, base, T, expected yield, and a confidence score. The chemist receives a complete lab protocol, not just an arrow-pushing diagram.
Condition Compiler
Given a single reaction with unknown optimal conditions, Marcus enumerates the joint top-K conditions and proposes a 2- or 3-factor DoE plan around the highest-confidence prediction. Outcome data feeds back into Marcus's curriculum fine-tuning loop.
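As an illustration of what a small DoE plan around a top prediction might look like, the sketch below builds a full-factorial grid over two factors. The design type is an assumption, since the text does not say which design the Condition Compiler uses. `full_factorial` is a hypothetical helper, and the alternative levels are invented stand-ins for values that would come from the joint top-K list.

```python
from itertools import product
from typing import Any, Dict, List


def full_factorial(
    center: Dict[str, Any], alternatives: Dict[str, List[Any]]
) -> List[Dict[str, Any]]:
    """Vary the listed factors over center + alternative levels; hold the rest fixed."""
    factors = list(alternatives)
    levels = [[center[name]] + list(alternatives[name]) for name in factors]
    plan = []
    for combo in product(*levels):
        run = dict(center)               # start from the top prediction
        run.update(zip(factors, combo))  # overwrite only the varied factors
        plan.append(run)
    return plan


top_prediction = {"solvent": "1,4-dioxane", "base": "K2CO3", "temperature_c": 90}
plan = full_factorial(
    top_prediction,
    {"base": ["K3PO4", "Cs2CO3"], "temperature_c": [70, 110]},
)
print(len(plan))  # 9 runs: 3 bases x 3 temperatures
```

The first run in the plan is the top prediction itself, so the highest-confidence conditions are always included in the experiment set.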
Reaction yield ranking
When a route has multiple synthetically valid disconnections, Marcus's predicted yield is one of the inputs to the route-ranker - penalizing low-yield steps that would compound over a long sequence.
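The compounding effect is simple arithmetic: the overall yield of a linear route is the product of its step yields. The sketch below ranks two made-up routes on that product alone; the real route-ranker uses predicted yield as only one of several inputs, and the function names here are hypothetical.

```python
from math import prod
from typing import Dict, List, Sequence


def overall_yield(step_yields: Sequence[float]) -> float:
    """Overall fractional yield of a linear sequence of steps."""
    return prod(step_yields)


def rank_routes(routes: Dict[str, Sequence[float]]) -> List[str]:
    """Route names sorted from highest to lowest overall yield."""
    return sorted(routes, key=lambda name: overall_yield(routes[name]), reverse=True)


routes = {
    "five_steps_at_80pct": [0.8, 0.8, 0.8, 0.8, 0.8],  # 0.8 ** 5, about 0.328
    "three_steps_one_poor": [0.9, 0.5, 0.9],           # about 0.405
}
print(rank_routes(routes))
```

Here the shorter route wins despite containing a 50% step, because five consecutive 80% steps compound to roughly 33% overall.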
7. Limitations
- Closed-vocabulary conditions. The 142-class solvent and 384-class catalyst vocabularies cover ~98% of literature reactions but cannot suggest novel solvents or ligands. Open-vocabulary generation is future work.
- Yield ceiling at R² ≈ 0.80. Yield depends on execution quality (mixing, purity, temperature control) that is structurally invisible. Any structure-only model has an information-theoretic ceiling well below R² = 1.
- Long-tail named reactions. Reactions with fewer than ~50 training examples (e.g. exotic transition-metal cross-couplings) show 10–15pp lower top-3 accuracy. Targeted active-learning loops with partner labs are addressing this.
- Patent and publication bias. Reaxys and USPTO over-represent successful published reactions. Marcus may underweight unconventional but viable conditions that are simply absent from the literature.
8. Conclusion
Marcus models reaction conditions as a structured joint distribution rather than a bag of independent classifications, and decodes them in a chemically motivated order with cross-condition co-attention. Trained on 4.7M reactions and benefiting from a literature → optimized-batch curriculum, Marcus-1B sets a new state of the art across solvent, catalyst, temperature, and yield prediction simultaneously. Inside Rasyn it is the difference between a retrosynthesis route and an executable lab protocol.