Open Polymer Prediction Challenge: Analysis of the Winning Approach
The NeurIPS Open Polymer Prediction Challenge 2025 attracted over 2,240 teams competing to predict five polymer properties from SMILES representations: glass transition temperature (Tg), thermal conductivity (Tc), density (De), fractional free volume (FFV), and radius of gyration (Rg). We analyzed the winning solution by James Day and identified several key insights that challenge current research trends while demonstrating the continued effectiveness of classical machine learning techniques.
Key Takeaways
- Property-specific models remain superior for limited data: Despite the research community's push toward general-purpose foundation models, property-specific models proved more effective when working with constrained datasets.
- Ensemble methods continue to excel: This traditional machine learning technique delivered exceptional performance, reinforcing its value in modern competitions.
- External data demands careful curation: As discussed in The challenges of molecular property datasets, integrating external data sources requires meticulous preprocessing to address inconsistencies and noise.
- General-purpose BERT outperformed domain-specific models: ModernBERT exceeded the performance of chemistry-specific models, though polyBERT embeddings were retained as valuable tabular features.
- Strategic 3D model selection: The winning solution employed Uni-Mol-2-84M as its 3D model. This choice is particularly interesting given that Praski et al. demonstrated superior performance from graph transformer models like R-MAT on molecular property prediction tasks, especially for drug-related properties. R-MAT models offer easy implementation and reduced memory requirements, making the Uni-Mol-2 choice worth examining.
Architecture Overview
The winning approach generated property-specific predictions using ensembles of ModernBERT, AutoGluon, and Uni-Mol-2 models through a multi-stage pipeline:
- Initial training on externally labeled datasets
- BERT model retraining on a pseudolabeled PI1M subset
- Extensive feature engineering for tabular models
- Post-processing adjustment for glass transition temperature predictions to compensate for distribution shift between training and leaderboard datasets
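To make the pipeline concrete, the sketch below shows one way the property-specific ensembles could combine model outputs; the member models and weights are placeholders rather than the author's exact configuration.

```python
import numpy as np
from typing import Callable, Sequence

# One ensemble per property: (prediction_function, weight) pairs,
# e.g. wrappers around the ModernBERT, AutoGluon, and Uni-Mol-2 models.
Member = tuple[Callable[[Sequence[str]], np.ndarray], float]

def ensemble_predict(members: list[Member], smiles: Sequence[str]) -> np.ndarray:
    """Weighted average of the property-specific ensemble members' predictions."""
    preds = np.stack([predict(smiles) for predict, _ in members])
    weights = np.array([weight for _, weight in members])
    return np.average(preds, axis=0, weights=weights)
```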
Data Strategy
Dataset Composition and Augmentation
Model validation relied on 5-fold cross-validation using the competition's original training data. The training data was substantially augmented with external datasets and locally executed MD simulations. The winner identified significant data quality challenges and a distribution shift between the training and leaderboard datasets.
Addressing Distribution Shift
Investigation revealed a pronounced distribution shift in glass transition temperature (Tg) between training and leaderboard datasets.
The leaderboard (LB) score refers to the competition's wMAE (weighted mean absolute error) metric. The bias coefficient is a factor multiplied by the standard deviation of the glass transition predictions; the resulting product is then added to the original Tg predictions.
To correct this systematic bias, predictions underwent post-processing: submission_df["Tg"] += (submission_df["Tg"].std() * 0.5644)
External Data Sources
The solution incorporated several external datasets. These sources presented multiple challenges: random label noise, non-linear relationships with the ground truth, constant bias factors, and out-of-distribution outliers.
Data Cleaning Methodology
Three general strategies were applied across all datasets:
Label rescaling via isotonic regression: An isotonic regression model transformed raw external labels by learning to map them to the predictions of an ensemble trained on the original training data. This approach effectively corrected constant bias factors and non-linear relationships with the ground truth. The final labels were often Optuna-tuned weighted averages of the raw and rescaled values, which helped minimize overfitting (see the sketch after this list).
Error-based filtering: The ensembles' predictions were used to identify samples exceeding an error threshold, which were discarded. Thresholds were defined as ratios of sample error to mean absolute error from ensemble testing on the host dataset, ensuring consistent threshold ranges across properties and facilitating Optuna hyperparameter search.
Sample weighting: Optuna tuned per-dataset sample weights, enabling models to discount lower-quality training examples appropriately.
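As an illustration of the label-rescaling strategy, the sketch below fits an isotonic regression from raw external labels to the predictions of an ensemble trained on the competition data, then blends raw and rescaled labels; the blend weight shown is a placeholder for the Optuna-tuned value.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def rescale_labels(raw_labels: np.ndarray,
                   ensemble_preds: np.ndarray,
                   blend_weight: float = 0.5) -> np.ndarray:
    """Map raw external labels onto the scale of the host-data ensemble.

    blend_weight is a placeholder; the winning solution tuned the raw/rescaled
    mix per dataset with Optuna.
    """
    iso = IsotonicRegression(out_of_bounds="clip")
    iso.fit(raw_labels, ensemble_preds)   # monotone mapping: raw label -> ensemble prediction
    rescaled = iso.predict(raw_labels)
    return blend_weight * raw_labels + (1.0 - blend_weight) * rescaled
```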
Dataset-specific interventions included:
RadonPy: Manual inspection identified and removed outliers, particularly thermal conductivity values exceeding 0.402 that appeared inconsistent with ensemble predictions. Optuna frequently favored this filtered version during hyperparameter tuning.
MD Simulations: Rather than applying general cleaning strategies, the solution implemented model stacking. An ensemble of 41 XGBoost models predicted simulation results, and these predictions served as supplemental features for AutoGluon. This approach allowed second-level models to learn arbitrary non-linear relationships in potentially noisy simulation data. The tabular models achieved a CV wMAE improvement of approximately 0.0005 compared to excluding simulation results entirely.
Whether to apply the general cleaning strategies and the RadonPy filter rule was itself exposed to Optuna as hyperparameters, and the optimization confirmed their value.
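A minimal sketch of the model-stacking step for the MD simulation data is shown below; the hyperparameters and column layout are assumptions, and the actual solution used 41 XGBoost models rather than one per target.

```python
import pandas as pd
from xgboost import XGBRegressor

def add_simulation_features(all_features: pd.DataFrame,
                            sim_features: pd.DataFrame,
                            sim_targets: pd.DataFrame) -> pd.DataFrame:
    """Stack XGBoost predictions of MD simulation outputs as extra tabular features.

    sim_features/sim_targets describe the simulated polymers; all_features holds
    every polymer that the second-level (AutoGluon) models will see.
    """
    augmented = all_features.copy()
    for target in sim_targets.columns:  # e.g. simulated FFV, density, Rg, 3D descriptors
        model = XGBRegressor(n_estimators=500, learning_rate=0.05)
        model.fit(sim_features, sim_targets[target])
        augmented[f"sim_pred_{target}"] = model.predict(all_features[sim_features.columns])
    return augmented
```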
Deduplication Strategy
Dataset augmentation introduced duplicate polymers identified by converting SMILES to canonical form. Optuna determined optimal sampling weights for duplicates, with lower-weighted entries removed.
To prevent validation set leakage, Tanimoto similarity scores were computed for all training-test monomer pairs. Training examples with similarity scores exceeding 0.99 to any test monomer were excluded to eliminate near-duplicates.
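Under assumptions about the data layout (plain SMILES lists, Morgan fingerprints with radius 2 and 2048 bits), the deduplication and leakage filter can be sketched with RDKit; the 0.99 threshold follows the writeup.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

_morgan = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

def canonical(smiles: str) -> str:
    """Canonical form used to detect exact duplicates across merged datasets."""
    return Chem.MolToSmiles(Chem.MolFromSmiles(smiles))

def drop_near_duplicates(train_smiles: list[str],
                         test_smiles: list[str],
                         threshold: float = 0.99) -> list[str]:
    """Drop training monomers whose Tanimoto similarity to any test monomer exceeds the threshold."""
    test_fps = [_morgan.GetFingerprint(Chem.MolFromSmiles(s)) for s in test_smiles]
    kept = []
    for s in train_smiles:
        fp = _morgan.GetFingerprint(Chem.MolFromSmiles(s))
        if max(DataStructs.BulkTanimotoSimilarity(fp, test_fps)) <= threshold:
            kept.append(s)
    return kept
```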
MD Simulation Data Generation
Molecular dynamics simulations were executed for 1,000 hypothetical polymers from PI1M through a four-stage pipeline:
- Configuration Selection: A LightGBM classification model predicted which of two geometry-optimization strategies to use for each polymer:
  - Fast but unstable: psi4's Hartree-Fock geometry optimization (~1 hour per polymer, 50% failure rate)
  - Slow and stable: b97-3c based optimization (~5 hours per polymer)
  Classification features included RDKit molecular descriptors, backbone versus sidechain characteristics, and conformers from ETKDGv3 generation with MMFF optimization (see the sketch after this list).
- RadonPy Processing:
  - Conformation search execution
  - Automatic adjustment of the degree of polymerization to maintain ~600 atoms per chain, independent of monomer size
  - Charge assignment
  - Amorphous cell generation
- Equilibrium Simulation: LAMMPS ran equilibrium simulations with settings specifically tuned for representative density predictions.
- Property Extraction: Custom logic estimated FFV, density, Rg, and all available RDKit 3D molecular descriptors.
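For reference, here is a minimal RDKit sketch of ETKDGv3 conformer generation with MMFF optimization as used for the classifier features; the conformer count is arbitrary, and polymer SMILES with "*" attachment points may need those atoms replaced before 3D embedding.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def generate_conformers(smiles: str, n_confs: int = 10):
    """Embed conformers with ETKDGv3 and relax them with the MMFF force field."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    params = AllChem.ETKDGv3()
    params.randomSeed = 42
    conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=n_confs, params=params)
    # Each entry is (not_converged_flag, energy); energies can feed the LightGBM classifier.
    results = AllChem.MMFFOptimizeMoleculeConfs(mol)
    return mol, list(conf_ids), results
```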
BERT Implementation
Model Selection
The solution achieved optimal results with ModernBERT-base, a general-purpose foundation model, rather than chemistry-specific alternatives. Both ChemBERTa and polyBERT underperformed relative to ModernBERT-base. Among general-purpose BERT variants, ModernBERT outperformed alternatives like DeBERTa. Given ModernBERT's extensive training on coding data and strong coding task performance, CodeBERT was evaluated and performed comparably. Larger models failed to deliver improvements given the limited fine-tuning data available.
Pretraining on PI1M
The solution implemented a two-stage pretraining approach:
Stage 1: An ensemble of BERT, Uni-Mol, AutoGluon, and D-MPNN models generated property predictions for 50,000 PI1M polymers.
Stage 2: BERT models were pretrained on a pairwise comparison classification task, predicting which polymer exhibited higher or lower property values in each pair. Polymer pairs with similar property values were excluded. The objective functioned as a multi-task classifier, simultaneously predicting relationships across all five properties. This additional pretraining stage consistently improved performance over third-party foundation models.
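A minimal sketch of what such a pairwise, multi-task pretraining objective might look like in PyTorch; the backbone name, pooling, and head design are assumptions rather than the author's exact architecture.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

NUM_PROPERTIES = 5  # Tg, FFV, Tc, density, Rg

class PairwisePropertyClassifier(nn.Module):
    """For each property, predict whether polymer A has a higher value than polymer B."""

    def __init__(self, backbone_name: str = "answerdotai/ModernBERT-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone_name)
        hidden = self.encoder.config.hidden_size
        self.head = nn.Linear(2 * hidden, NUM_PROPERTIES)  # one binary logit per property

    def embed(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        return out.last_hidden_state[:, 0]  # first-token pooling

    def forward(self, batch_a, batch_b):
        pair = torch.cat([self.embed(**batch_a), self.embed(**batch_b)], dim=-1)
        # Train with BCEWithLogitsLoss on "A > B" labels, excluding pairs whose
        # property values are too close to call.
        return self.head(pair)
```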
Fine-tuning Protocol
The fine-tuning process followed standard BERT practices:
- AdamW optimizer
- No frozen layers
- One-cycle learning rate schedule with linear annealing
- Automatic mixed precision
- Gradient norm clipping at 1.0
- Optuna-tuned learning rate, batch size, and epoch count
The limited training data necessitated differentiated learning rates: the backbone learning rate was set one order of magnitude lower than the regression head learning rate. The decision to leave no layers frozen surprised us given how little training data was available, but the differentiated learning rates likely mitigated overfitting and training instability.
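A minimal sketch of this optimizer setup, assuming a model that exposes `encoder` and `head` submodules; the learning rate and step count are placeholders for the Optuna-tuned values.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

def build_optimizer_and_scheduler(model: torch.nn.Module,
                                  num_training_steps: int,
                                  head_lr: float = 1e-4):
    """AdamW with the backbone LR one order of magnitude below the head LR,
    plus a one-cycle schedule with linear annealing."""
    backbone_lr = head_lr / 10
    optimizer = AdamW([
        {"params": model.encoder.parameters(), "lr": backbone_lr},
        {"params": model.head.parameters(), "lr": head_lr},
    ])
    scheduler = OneCycleLR(
        optimizer,
        max_lr=[backbone_lr, head_lr],   # one peak LR per parameter group
        total_steps=num_training_steps,
        anneal_strategy="linear",
    )
    return optimizer, scheduler

# In the training loop, clip gradient norms before stepping:
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
#   optimizer.step(); scheduler.step()
```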
Data Augmentation
Both pretraining and fine-tuning employed Chem.MolToSmiles(..., canonical=False, doRandom=True, isomericSmiles=True)
to generate 10 non-canonical SMILES per molecule, expanding training data tenfold. At inference, 50 predictions per SMILES were generated and aggregated using the median as the final prediction.
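A small sketch of this augmentation and test-time aggregation, assuming a hypothetical `predict` function for the fine-tuned model:

```python
import numpy as np
from rdkit import Chem

def randomized_smiles(smiles: str, n: int = 10) -> list[str]:
    """Generate n non-canonical SMILES strings for the same molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return [
        Chem.MolToSmiles(mol, canonical=False, doRandom=True, isomericSmiles=True)
        for _ in range(n)
    ]

# Inference-time aggregation (hypothetical `predict` returning one value per SMILES):
#   preds = [predict(s) for s in randomized_smiles(test_smiles, n=50)]
#   final = float(np.median(preds))
```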
Tabular Modeling
Framework and Feature Selection
AutoGluon served as the tabular modeling framework, with Optuna selecting optimal features for each property.
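As a rough illustration (not the author's exact configuration), an AutoGluon predictor can be fitted per property on the engineered feature table; the DataFrames and preset are assumptions.

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

def train_property_model(train_df: pd.DataFrame, label: str) -> TabularPredictor:
    """Fit one AutoGluon model on the engineered features for a single property."""
    predictor = TabularPredictor(label=label, eval_metric="mean_absolute_error")
    predictor.fit(train_df, presets="best_quality")
    return predictor

# Usage with assumed DataFrames:
#   tg_predictor = train_property_model(tg_features_df, "Tg")
#   tg_predictions = tg_predictor.predict(test_features_df)
```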
Feature Engineering
The feature set encompassed diverse molecular representations:
Molecular descriptors and fingerprints:
- All RDKit-supported 2D and graph-based molecular descriptors
- Morgan fingerprints
- Atom pair fingerprints
- Topological torsion fingerprints
- MACCS keys
Graph and structural features:
- NetworkX-based graph features
- Backbone and sidechain features
- Gasteiger charge statistics
- Element composition and bond type ratios
Model-derived features:
- Predictions from 41 XGBoost models trained on MD simulation results (FFV, density, Rg predictions, and 3D structure descriptors)
- Embeddings from polyBERT models pretrained on PI1M
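A condensed sketch of how such a feature table can be assembled with RDKit; the fingerprint radius and size are assumptions, and the graph, backbone, and model-derived features listed above would be concatenated in the same way.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors, MACCSkeys, rdFingerprintGenerator

_morgan = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

def featurize(smiles: str) -> np.ndarray:
    """Concatenate 2D descriptors, a Morgan fingerprint, and MACCS keys for one polymer."""
    mol = Chem.MolFromSmiles(smiles)
    descriptors = [fn(mol) for _, fn in Descriptors.descList]  # all RDKit 2D descriptors
    morgan = list(_morgan.GetFingerprint(mol))                 # 2048-bit circular fingerprint
    maccs = list(MACCSkeys.GenMACCSKeys(mol))                  # 167 structural keys
    # Atom pair and topological torsion fingerprints follow the same generator pattern.
    return np.array(descriptors + morgan + maccs, dtype=float)
```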
Model Comparison
Alternative frameworks, including XGBoost, LightGBM, and TabM, underwent extensive hyperparameter tuning with approximately 20× the computational budget allocated to AutoGluon. Despite this additional optimization effort, AutoGluon maintained superior performance.
3D Molecular Modeling
Uni-Mol 2 84M was selected primarily for implementation efficiency. The model required no feature engineering or custom training loops, streamlining the development process. GPU memory constraints (24GB) emerged when processing larger molecules exceeding 130 atoms, particularly affecting FFV training data. Consequently, Uni-Mol 2 84M was excluded from the FFV prediction ensemble.
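For reference, the kind of size filter implied here can be expressed in a few lines; the 130-atom limit follows the writeup, while the counting convention is an assumption.

```python
from rdkit import Chem

def fits_unimol_memory_budget(smiles: str, max_atoms: int = 130) -> bool:
    """Rough filter for the 24 GB GPU budget; counts atoms including explicit hydrogens."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    return mol.GetNumAtoms() <= max_atoms
```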
Unsuccessful Approaches
The following strategies failed to improve performance:
- Graph Neural Networks, specifically D-MPNN
- GMM-based data augmentation from public notebooks
- Chemistry-specific embedding models
Acknowledgements
We thank James Day for openly sharing the code, solution notebook, and detailed writeup that enabled this analysis.
Credits for banner image: Lone Thomasky & Bits&Bäume / Distorted Lake Trees / Licensed under CC-BY 4.0
Written by

Jetze Schuurmans
Machine Learning Engineer
Jetze is a well-rounded Machine Learning Engineer who is as comfortable solving Data Science use cases as he is productionizing them in the cloud. His expertise includes AI4Science, MLOps, and GenAI. As a researcher, he has published papers on Computer Vision, Natural Language Processing, and Machine Learning in general.