JOURNAL OF SOLUTION CHEMISTRY, vol. 15, no. 7, pp. 1001-1035, 2025 (SCI-Expanded)
The growing emphasis on sustainable solvents in modern organic synthesis demands a deeper understanding of the structure–property relationships that guide reaction outcomes. Solvent polarity is a critical parameter, but it cannot be fully captured by conventional physical constants. The empirical ET(30) parameter is a reliable benchmark, yet its experimental determination is labor-intensive. Deep learning models have been proposed for ET(30) prediction, but they often sacrifice interpretability. To address this, Inter-Pol, an interpretable machine learning framework, has been developed. Inter-Pol integrates 1618 features from RDKit, Mordred, PyBioMed, and CDK and then applies rigorous feature selection, which eliminates 1191 fingerprint features derived from Morgan, Avalon, and MACCS fingerprints. After feature selection, Inter-Pol fits a rule-based RuleFit model. The model generates human-readable rules, such as “if LogP > threshold, then ET(30) is high (for example, 42),” that link molecular descriptors to polarity outcomes, enhancing both transparency and usability in solvent selection. These rules not only explain how and why the model produces its predictions but also provide actionable guidance for ET(30) optimization. For example, following the rules extracted from Inter-Pol, removing hydrogen bond donor groups can result in higher LogP values, reflecting increased polarity, which in turn contributes to higher ET(30) values. Thus, the interpretable rules derived from Inter-Pol serve not only as predictive components but also as a practical tool for rational solvent design and experimental planning.
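The rule-based prediction described above can be illustrated with a minimal sketch. In a RuleFit-style model, each rule is a binary condition on the descriptors, and the prediction is an intercept plus the coefficients of the rules that fire (the linear terms that RuleFit can also include are omitted here for brevity). The rules, thresholds, coefficients, and intercept below are hypothetical values chosen for illustration; they are not taken from Inter-Pol, and the names `Rule`, `RULES`, and `predict_et30` are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Descriptors = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    text: str                                 # human-readable form of the rule
    condition: Callable[[Descriptors], bool]  # True when the rule fires
    coefficient: float                        # added to the prediction when it fires


# Hypothetical rules and weights, for illustration only (not Inter-Pol's rules).
RULES: List[Rule] = [
    Rule("LogP > 1.0", lambda d: d["LogP"] > 1.0, 2.5),
    Rule("HBD >= 1 and TPSA > 20", lambda d: d["HBD"] >= 1 and d["TPSA"] > 20.0, 6.0),
]
INTERCEPT = 38.0  # hypothetical baseline ET(30) value


def predict_et30(descriptors: Descriptors) -> Tuple[float, List[str]]:
    """Return the predicted ET(30) and the text of every rule that fired."""
    fired = [rule for rule in RULES if rule.condition(descriptors)]
    prediction = INTERCEPT + sum(rule.coefficient for rule in fired)
    return prediction, [rule.text for rule in fired]


if __name__ == "__main__":
    # Hypothetical descriptor values for a solvent without hydrogen bond donors.
    value, explanation = predict_et30({"LogP": 1.8, "HBD": 0.0, "TPSA": 9.2})
    print(value, explanation)
```

Because the prediction is a sum over the rules that fired, the returned list of rule texts doubles as the explanation for that prediction, which is the property the abstract emphasizes.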
Across both the 2022 and 2023 benchmarks used in the state-of-the-art paper, the Inter-Pol framework delivers competitive predictive performance, achieving test-set R-squared values of 0.965 and 0.929, respectively. It surpasses the 2022 neural network benchmark (R-squared = 0.88) and approaches the 2023 neural network performance (R-squared = 0.952), while offering enhanced interpretability through its transparent, rule-based architecture. These results confirm Inter-Pol’s effectiveness in balancing interpretability and accuracy, offering a robust framework for solvent selection in cheminformatics and synthesis.
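For readers comparing the reported scores, the sketch below shows how a test-set R-squared value is computed, assuming the standard coefficient of determination (one minus the ratio of the residual sum of squares to the total sum of squares). The abstract does not state the exact formula used, so this is an assumption, and the function name `r_squared` and the sample values are illustrative only.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true has zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    measured = [40.0, 45.0, 50.0]   # illustrative ET(30) values, not real data
    predicted = [41.0, 45.0, 49.0]  # illustrative model outputs
    print(r_squared(measured, predicted))
```

Under this definition, a score of 1 means the predictions match the measurements exactly, and a score of 0 means the model does no better than always predicting the mean of the measured values.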