Revolutionizing drug discovery: Giga-scale screening unleashes power of AI and virtual libraries

In a recent review published in the journal Nature, researchers examined breakthroughs in ligand discovery tools, their potential to reshape drug research and development, and the challenges that remain.

Computer-assisted drug discovery technologies have been in use for decades. More recently, both pharma and academia have shifted toward embracing computational tools. The transition is facilitated by the abundance of data on ligand properties and binding to therapeutic targets, the availability of three-dimensional (3D) protein structures, and the emergence of on-demand virtual libraries comprising billions of drug-like small molecules. Fully exploiting these resources requires fast computational approaches for effective giga-scale screening.

The review surveys existing data on computer-assisted approaches in drug discovery and development (DDD).

Study: Computational approaches streamlining drug discovery. Image Credit: angellodeco / Shutterstock

Virtual ligand screening (VLS) technology for identifying high-quality hits

The Protein Data Bank (PDB) contains >200,000 protein structures. High-resolution X-ray and cryo-electron microscopy structures cover >90% of protein families, and the remaining gaps are filled by homology modeling and/or AlphaFold2 predictions. The chemical spaces available for screening and synthesizing potential drug candidates grew from 10⁷ off-the-shelf molecules in 2015 to >3.0 × 10¹⁰ molecules synthesized on demand in 2022, with the potential to extend to >10¹⁵ compounds.

Compared with high-throughput screening (HTS; 10⁵ to 10⁷ compounds) and fragment-based ligand discovery (FBLD; 10³ to 10⁵), giga-scale deoxyribonucleic acid (DNA)-encoded library (DEL) screening (10¹⁰) and giga-scale VLS (10¹⁰ to 10¹⁵) start from considerably larger libraries. Hit rates are similar for HTS and giga-scale DEL screening (0.01% to 0.5%), higher for FBLD (1.0% to 5.0%), and highest for VLS (10% to 40%, expressed as the proportion of predicted hits that were confirmed experimentally).
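The hit rates quoted above translate directly into expected hit counts. The following is a minimal sketch of that arithmetic; the function name and the example compound counts are hypothetical and chosen only for illustration. Note that for VLS the rate applies to the top-ranked predictions that are actually synthesized and tested, not to the whole virtual library.

```python
from typing import Tuple

# Hit-rate ranges in percent, as quoted in the text above.
HIT_RATE_PERCENT = {
    "HTS": (0.01, 0.5),
    "DEL": (0.01, 0.5),
    "FBLD": (1.0, 5.0),
    "VLS": (10.0, 40.0),
}


def expected_hits(method: str, n_tested: int) -> Tuple[float, float]:
    """Return the (low, high) expected number of hits for n_tested compounds."""
    low, high = HIT_RATE_PERCENT[method]
    return n_tested * low / 100.0, n_tested * high / 100.0


if __name__ == "__main__":
    # Hypothetical campaign sizes, for illustration only.
    print("HTS, 1,000,000 compounds screened:", expected_hits("HTS", 1_000_000))
    print("VLS, 100 predicted hits tested:", expected_hits("VLS", 100))
```

Under these assumed campaign sizes, a million-compound HTS run and a hundred-compound VLS follow-up can yield hit counts of a comparable order, which is the practical meaning of the difference in hit rates.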

Initial hit affinity is very weak for the small fragments used in FBLD (100 to 1,000 μM), weak for HTS (1.0 to 10 μM), medium for DEL screening (0.1 to 10 μM), and medium to high for VLS (0.01 to 10 μM). To progress from hits to leads, in addition to quantitative structure-activity relationship (QSAR)-based optimization, HTS requires custom synthesis to explore structure-activity relationships, FBLD requires growing or merging of fragments, and DEL screening requires resynthesis of label-free hits.

VLS optimizes quantitative structure-activity relationships using catalog compounds and requires roughly one-tenth as many custom syntheses (0 to 50) as HTS, FBLD, and DEL screening to identify leads. Further, HTS and FBLD hits are generally not novel: HTS requires scaffold hopping or modifications, and FBLD requires rational design, to attain intellectual property (IP) novelty. In contrast, most VLS hits are novel.

HTS limitations include modest library sizes, unknown binding modes, and expensive equipment. FBLD limitations include the need for expensive nuclear magnetic resonance (NMR), surface plasmon resonance (SPR), and X-ray equipment, as well as many optimization steps. DEL screening produces many false positives and requires off-DNA resynthesis of hits. VLS requires substantial computational resources, although modular VLS approaches reduce this cost by >1,000-fold.

Virtual screening algorithms are based on protein structures, ligands, or both. Structure-based algorithms require high-resolution protein structures, whereas ligand-based ones require large ligand activity datasets. Hybrid screening requires data on both ligand activity and protein-ligand 3D complexes to generate 3D interaction fingerprints and train artificial intelligence (AI)-based models.
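As an illustration of the ligand-based idea, the sketch below ranks library compounds by Tanimoto similarity to a known active. This is a generic, widely used similarity measure offered as an example, not a method taken from the review; the fingerprints are hypothetical sets of feature indices, and the compound names are made up.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rank_by_similarity(
    query: FrozenSet[int], library: Dict[str, FrozenSet[int]]
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical fingerprints: each integer marks a present feature.
    known_active = frozenset({1, 2, 3, 5, 8})
    library = {
        "cmpd_A": frozenset({1, 2, 3, 5, 9}),
        "cmpd_B": frozenset({4, 6, 7}),
        "cmpd_C": frozenset({1, 2, 3, 5, 8}),
    }
    for name, sim in rank_by_similarity(known_active, library):
        print(f"{name}: {sim:.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets; plain sets are used here only to keep the example self-contained.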

Chemical library types and computationally driven technology to streamline drug discovery

Pharmaceutical companies maintain large in-house compound collections for screening, whereas vendor collections allow rapid (<1 week) delivery of in-stock molecules with unique chemical scaffolds that are easy to search and compatible with high-throughput screening (HTS). However, the cost of managing physical compound libraries, their slow growth, and their small size limit their applicability.

On-demand chemical spaces such as REAL enable rapid parallel synthesis of molecules from >12,000 building blocks using >180 reactions, with a synthesis success rate of >80% and delivery within 2 to 3 weeks. Examples include GalaXi by WuXi, Enamine REAL, and CHEMriya by Otava. Adding further synthons and reaction scaffolds enables high novelty and rapid polynomial growth of the virtual chemical spaces that algorithms such as V-SYNTHES can screen.
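The polynomial growth mentioned above follows from simple counting: a reaction that joins k building blocks, with N choices at each position, yields on the order of N^k products. The sketch below illustrates this under the simplifying assumption that every combination is valid; the function names and the synthon counts are hypothetical and are not actual library statistics.

```python
from math import prod
from typing import Iterable, Sequence


def reaction_space_size(synthons_per_position: Sequence[int]) -> int:
    """Products of one reaction, assuming every synthon combination is valid."""
    return prod(synthons_per_position)


def total_space_size(reactions: Iterable[Sequence[int]]) -> int:
    """Sum of product counts over several reactions (overlaps are ignored)."""
    return sum(reaction_space_size(counts) for counts in reactions)


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 compatible synthons at each position.
    print(reaction_space_size([1000, 1000]))        # two-component reaction
    print(reaction_space_size([1000, 1000, 1000]))  # three-component reaction
    # Doubling the synthons at every position of a three-component reaction
    # multiplies the space by 2**3 = 8.
    print(reaction_space_size([2000, 2000, 2000]) // reaction_space_size([1000, 1000, 1000]))
```

This is why a modest number of new synthons or reaction scaffolds can expand a virtual space far faster than a physical library could grow.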

The V-SYNTHES algorithm can effectively screen >31 billion compounds (>3.0 × 10¹⁰) from REAL space, and potentially >10¹⁵ compounds from expanded chemical spaces, by enumerating only those molecules that best fit the target pocket. Generative spaces (GDB-13, GDB-17, GDB-18, and GDBChEMBL) comprise all theoretically possible molecules. Such spaces are limited only by theoretical plausibility, estimated at 10²³ to 10⁶⁰ drug-like molecules.
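The idea of evaluating building blocks first and enumerating full molecules only for the most promising ones can be sketched with a toy example. This is not the V-SYNTHES implementation: the scoring function below is a hypothetical stand-in for docking, and it is deliberately additive so that the shortcut provably finds the best molecule, which real docking scores do not guarantee.

```python
from itertools import product
from typing import Sequence, Tuple


def toy_score(a: int, b: int = 0) -> int:
    """Hypothetical stand-in for a docking score (lower is better).

    Calling it with b=0 mimics scoring synthon `a` with a minimal cap.
    """
    return -((a * 7) % 13) - ((b * 5) % 11)


def hierarchical_screen(
    synthons_a: Sequence[int], synthons_b: Sequence[int], keep: int = 2
) -> Tuple[Tuple[int, int], int]:
    """Return the best (a, b) pair found and the number of scores computed."""
    # Step 1: score every A synthon on its own and keep the best few.
    best_a = sorted(synthons_a, key=toy_score)[:keep]
    # Step 2: enumerate full molecules only for the retained A synthons.
    candidates = list(product(best_a, synthons_b))
    best = min(candidates, key=lambda pair: toy_score(*pair))
    evaluations = len(synthons_a) + len(candidates)
    return best, evaluations


if __name__ == "__main__":
    a_blocks = list(range(1, 11))
    b_blocks = list(range(1, 11))
    best, n_evals = hierarchical_screen(a_blocks, b_blocks, keep=2)
    full = len(a_blocks) * len(b_blocks)
    print(f"best pair {best}, {n_evals} evaluations instead of {full}")
```

With ten synthons per position the saving is modest, but the same bookkeeping shows why the approach matters when each position holds thousands of synthons and the full product runs into the billions.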

Although generative spaces provide broad coverage, the synthesis success rates and reaction pathways of their compounds are unknown, so synthesizability must be estimated computationally. In generative spaces, graphs are used to generate saturated hydrocarbon structures and skeletons containing unsaturated bonds. The skeletons are then expanded by heteroatom substitution and converted into meaningful compounds.

Computationally driven drug discovery is based on easily accessible on-demand or generative virtual chemical spaces, as well as structure-based and AI-based computational tools that streamline the drug discovery process. Compared with the standard gene-to-lead discovery timeline of four to six years, computationally driven technology can identify potential drug candidates within 2 to 12 months.

Combining rapid flexible docking, deep learning, or scoring approaches with more accurate post-processing tools based on quantum mechanics and free energy perturbation (FEP) can increase the yield of high-affinity hits from giga-scale chemical spaces. Rapidly expanding low-cost cloud computing, specialized chips, and graphics processing unit (GPU) acceleration further support these computational tools.
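The combination of a fast first pass with a slower, more accurate post-processing step can be pictured as a two-stage funnel. The sketch below is a generic illustration under that assumption; the function name and both scoring functions are hypothetical placeholders, not real docking or FEP calls, and lower scores are treated as better.

```python
from typing import Callable, Iterable, List


def two_stage_screen(
    compounds: Iterable[int],
    fast_score: Callable[[int], float],
    accurate_score: Callable[[int], float],
    shortlist_size: int,
    final_size: int,
) -> List[int]:
    """Rank everything with the cheap score, then rescore only the shortlist."""
    # Stage 1: cheap score applied to the whole library.
    shortlist = sorted(compounds, key=fast_score)[:shortlist_size]
    # Stage 2: expensive score applied only to the shortlist.
    return sorted(shortlist, key=accurate_score)[:final_size]


if __name__ == "__main__":
    library = range(100)  # hypothetical compound identifiers

    def fast(c: int) -> float:       # placeholder for a quick docking score
        return abs(c - 50)

    def accurate(c: int) -> float:   # placeholder for a costly rescoring step
        return abs(c - 52.3)

    print(two_stage_screen(library, fast, accurate, shortlist_size=10, final_size=3))
```

The design choice is that the expensive function is called only `shortlist_size` times regardless of library size, so its cost stays fixed while the cheap stage scales with the library.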

Based on the review findings, the DDD ecosystem appears to be shifting from computer-aided to computer-driven discovery, in which improved potency prediction tools help deliver potent and selective leads more quickly and at lower cost. However, computational predictions still require validation through in vitro and in vivo experiments at each step of the drug discovery pipeline.

Journal reference:
  • Sadybekov, A.V., Katritch, V. Computational approaches streamlining drug discovery. Nature 616, 673–685 (2023).

Written by

Pooja Toshniwal Paharia
