PDB-Quantifier: Reference-Based Protein Structure Quality Assessment and Prioritization via One-Class Learning


Creative Commons License

Uğurlu S. Y.

20th INTERNATIONAL CONFERENCE ON ENGINEERING & NATURAL SCIENCES, Ruse, Bulgaria, 23 - 30 May 2026, pp. 269-276, (Full Text Paper)

Abstract

Selecting reliable protein structures from the Protein Data Bank is a critical prerequisite for docking, virtual screening, molecular dynamics, and other structure-based computational workflows. However, commonly used structure selection strategies often rely on rigid thresholding of individual quality descriptors and may fail to capture the multivariate nature of structural reliability. In this study, we present PDB-Quantifier, a reference-based framework for assessing protein structures that applies one-class unsupervised machine learning to score structural conformity without requiring explicit quality labels.

A curated ADS benchmark set containing 85 protein structures was used as the reference domain. For each structure, feature vectors were constructed from experimentally reported and validation-related descriptors, including resolution, refinement statistics, atom and residue counts, ligand-related variables, and structural validation measures such as clashscore and Ramachandran outliers. Missing values were handled by mean imputation, and an Isolation Forest model was trained to learn the intrinsic distribution of the reference set. The trained model was then applied to two independent external datasets: an MTi benchmark set and a randomly sampled Protein Data Bank set.
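The imputation, training, and scoring steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn's `SimpleImputer` and `IsolationForest`, uses synthetic stand-in data with four made-up descriptor columns, and defines a hypothetical conformity score as `1 + score_samples` (one minus the Isolation Forest anomaly score). The abstract does not state the hyperparameters or the exact score definition, so `n_estimators`, `contamination`, and the `summarize` helper are illustrative choices only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(0)

# Synthetic stand-in for the reference feature matrix: 85 structures x 4
# hypothetical descriptors (e.g. resolution, R-free, clashscore, Ramachandran
# outliers). These numbers are random, not data from the study.
reference = rng.normal(loc=[2.0, 0.20, 5.0, 0.5],
                       scale=[0.3, 0.02, 2.0, 0.3],
                       size=(85, 4))
# Knock out about 5% of the entries to mimic missing descriptors.
reference[rng.random(reference.shape) < 0.05] = np.nan

# Synthetic stand-in for an external set drawn from a shifted distribution.
external = rng.normal(loc=[3.2, 0.28, 25.0, 3.0],
                      scale=[0.6, 0.04, 8.0, 1.5],
                      size=(200, 4))

# Mean imputation, fitted on the reference set only and reused for external data.
imputer = SimpleImputer(strategy="mean")
X_ref = imputer.fit_transform(reference)
X_ext = imputer.transform(external)

# One-class model trained on the reference domain alone (no quality labels).
model = IsolationForest(n_estimators=200, contamination="auto", random_state=0)
model.fit(X_ref)


def summarize(fitted_model, X):
    """Return (inlier ratio, mean conformity) for a feature matrix.

    Conformity here is an assumed definition: 1 + score_samples, which equals
    one minus the Isolation Forest anomaly score, so higher means more typical.
    """
    conformity = 1.0 + fitted_model.score_samples(X)
    inlier_ratio = float(np.mean(fitted_model.predict(X) == 1))
    return inlier_ratio, float(conformity.mean())


ref_ratio, ref_mean = summarize(model, X_ref)
ext_ratio, ext_mean = summarize(model, X_ext)
print(f"reference: inlier ratio={ref_ratio:.2f}, mean conformity={ref_mean:.2f}")
print(f"external:  inlier ratio={ext_ratio:.2f}, mean conformity={ext_mean:.2f}")
```

Fitting the imputer on the reference set alone keeps the external datasets from influencing the learned reference space, which matches the reference-based framing of the method.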

The ADS reference set showed a high inlier ratio (0.89) and a mean conformity score of 0.64, whereas the MTi and random datasets had markedly lower mean conformity scores (0.37 and 0.32, respectively), indicating substantial distributional deviation from the learned reference space. These distributional differences were further supported by Kolmogorov-Smirnov analysis. Overall, the results indicate that one-class unsupervised learning can model structural consistency and yield a practical conformity-based scoring system for protein structure prioritization. PDB-Quantifier provides a reproducible, data-driven strategy for reference-guided screening of protein structures in computational structural biology.
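The Kolmogorov-Smirnov comparison mentioned above rests on the two-sample statistic, the largest absolute gap between two empirical cumulative distribution functions. The sketch below computes that statistic with the standard library only. The function name `ks_statistic` and the example score lists are hypothetical and do not come from the study. In practice a library routine such as `scipy.stats.ks_2samp` would probably be used to obtain a p-value as well.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical CDFs of the two samples. Both are step
    functions that change only at observed values, so checking every observed
    value is enough to find the maximum gap.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up conformity scores for illustration only (not results from the paper).
reference_scores = [0.70, 0.66, 0.61, 0.68, 0.59, 0.64]
external_scores = [0.41, 0.35, 0.62, 0.30, 0.38, 0.33]
print(f"KS statistic: {ks_statistic(reference_scores, external_scores):.3f}")
```

A statistic near 0 means the two score distributions largely overlap, while a value near 1 means they are almost completely separated.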