Navigating a 1E+60 Chemical Space of Peptide/Peptoid Oligomers

Check out our latest paper, "Navigating a 1E+60 Chemical Space of Peptide/Peptoid Oligomers", published in the special issue on Exploration of (Ultra)Big Chemical Spaces in Molecular Informatics (Wiley)!

In this work, we present an update to the Peptide Design Genetic Algorithm (PDGA) and demonstrate its ability to efficiently explore a peptide/peptoid oligomer space containing >10^60 possible structures. We highlight how a conceptually simple method can be repurposed into a powerful generative algorithm capable of discovering novel compounds within an ultra-large chemical space. In addition, the modular nature of the PDGA allows for seamless integration of new chemistries, scoring functions, and building blocks, avoiding the need for retraining, a common limitation in generative machine learning models.
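To make the idea of a modular genetic algorithm more concrete, here is a minimal toy sketch in Python. It is not the PDGA implementation: oligomers are plain lists of hypothetical building-block labels (`BB000` to `BB099`), the mutation and crossover operators are generic choices, and `jaccard_fitness` is a simple n-gram overlap score standing in for a real molecular fingerprint similarity. All function names are assumptions made for illustration. The point it tries to show is that the scoring function and the building-block list are ordinary arguments, so either can be swapped without retraining anything.

```python
import random
from typing import Callable, List

Oligomer = List[str]
Fitness = Callable[[Oligomer], float]

def ngram_set(seq: Oligomer, n: int = 2) -> set:
    """Single blocks plus adjacent n-grams (a toy stand-in for a fingerprint)."""
    singles = {(b,) for b in seq}
    ngrams = {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
    return singles | ngrams

def jaccard_fitness(target: Oligomer) -> Fitness:
    """Build a fitness function scoring Jaccard overlap with the target."""
    target_set = ngram_set(target)
    def fitness(candidate: Oligomer) -> float:
        cand_set = ngram_set(candidate)
        union = target_set | cand_set
        return len(target_set & cand_set) / len(union) if union else 0.0
    return fitness

def mutate(seq: Oligomer, blocks: List[str], rng: random.Random, max_len: int) -> Oligomer:
    """Substitute, insert, or delete one block; length stays in 1..max_len."""
    seq = list(seq)
    op = rng.choice(["substitute", "insert", "delete"])
    if op == "substitute":
        seq[rng.randrange(len(seq))] = rng.choice(blocks)
    elif op == "insert" and len(seq) < max_len:
        seq.insert(rng.randrange(len(seq) + 1), rng.choice(blocks))
    elif op == "delete" and len(seq) > 1:
        del seq[rng.randrange(len(seq))]
    return seq  # unchanged if the chosen operation would break the length limits

def crossover(a: Oligomer, b: Oligomer, rng: random.Random, max_len: int) -> Oligomer:
    """Join a non-empty head of `a` with a non-empty tail of `b`."""
    child = a[:rng.randrange(1, len(a) + 1)] + b[rng.randrange(len(b)):]
    return child[:max_len]

def evolve(fitness: Fitness, blocks: List[str], pop_size: int = 50,
           max_len: int = 30, generations: int = 200, seed: int = 0) -> Oligomer:
    """Evolve a population toward high fitness, keeping the top half each round."""
    rng = random.Random(seed)
    population = [[rng.choice(blocks) for _ in range(rng.randint(1, max_len))]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == 1.0:
            break
        parents = population[:pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng, max_len),
                   blocks, rng, max_len)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

# Hypothetical building-block labels and a random 8-unit target.
BLOCKS = [f"BB{i:03d}" for i in range(100)]
target = random.Random(1).sample(BLOCKS, 8)
best = evolve(jaccard_fitness(target), BLOCKS)
print(best, jaccard_fitness(target)(best))
```

Passing a different callable as `fitness`, or a longer `blocks` list, changes what the search optimizes and over which alphabet, with no other code changes. A real setup would replace the string labels with actual monomers and the toy score with a cheminformatics fingerprint comparison.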

Abstract
Herein we report a virtual library of 1E+60 members, a common estimate for the total size of the drug-like chemical space. The library is obtained from 100 commercially available peptide and peptoid building blocks assembled into linear or cyclic oligomers of up to 30 units, forming molecules within the size range of peptide drugs and potentially accessible by solid-phase synthesis. We demonstrate ligand-based virtual screening (LBVS) using the peptide design genetic algorithm (PDGA), which evolves a population of 50 members to resemble a given target molecule using molecular fingerprint similarity as fitness function. Target molecules are reached in less than 10,000 generations. Like in many journeys, the value of the chemical space journey using PDGA lies not in reaching the target but in the journey itself, here by encountering non-obvious analogs. We also show that PDGA can be used to generate median molecules and analogs of non-peptide target molecules.
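As a rough sanity check on the 1E+60 figure, the following Python sketch counts the ordered sequences that can be written with 100 building blocks at lengths 1 to 30. This is only a back-of-the-envelope estimate under simplifying assumptions: it ignores the linear versus cyclic distinction, the equivalence of rotated cyclic sequences, and any chemical constraints, so the enumeration used in the paper may differ.

```python
N_BLOCKS = 100    # number of building blocks, from the abstract
MAX_LENGTH = 30   # maximum oligomer length in units, from the abstract

def count_sequences(n_blocks: int, max_length: int) -> int:
    """Number of ordered sequences of length 1..max_length over n_blocks symbols."""
    return sum(n_blocks ** length for length in range(1, max_length + 1))

total = count_sequences(N_BLOCKS, MAX_LENGTH)
print(f"{total:.2e}")
```

The sum is dominated by the longest sequences: the 30-unit term alone contributes 100^30 = 10^60, and all shorter lengths together add only about 1% on top of that.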

Authors: Markus Orsi and Jean-Louis Reymond