GDB Databases
About
GDB-11 enumerates small organic molecules up to 11 atoms of C, N, O and F following simple chemical stability and synthetic feasibility rules.
GDB-13 enumerates small organic molecules up to 13 atoms of C, N, O, S and Cl following simple chemical stability and synthetic feasibility rules. With 977,468,314 structures, GDB-13 was the largest publicly available small organic molecule database at the time of its publication.
How to cite
To cite GDB-11, please reference:
Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physico-chemical properties, compound classes and drug discovery. Fink, T.; Reymond, J.-L. J. Chem. Inf. Model. 2007, 47, 342-353.
Virtual Exploration of the Small Molecule Chemical Universe below 160 Daltons. Fink, T.; Bruggesser, H.; Reymond, J.-L. Angew. Chem. Int. Ed. 2005, 44, 1504-1508.
To cite GDB-13, please reference:
970 Million Druglike Small Molecules for Virtual Screening in the Chemical Universe Database GDB-13. Blum, L. C.; Reymond, J.-L. J. Am. Chem. Soc. 2009, 131, 8732-8733.
To cite GDB-17, please reference:
Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. Ruddigkeit, L.; van Deursen, R.; Blum, L. C.; Reymond, J.-L. J. Chem. Inf. Model. 2012, 52, 2864-2875.
Download
The GDB databases are hosted on the open-access repository Zenodo. The databases and their subsets can be downloaded using the links below. Except for the SDF download, all molecules are stored as dearomatized, canonical SMILES, and most files are compressed as gzip (.smi.gz) or gzipped tar (.tgz) archives. Windows users can open these archives with 7-Zip. To view the structures as drawings, we suggest MarvinView, a free chemical structure viewer from ChemAxon.
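Because several of these files are multiple gigabytes, it is usually easier to stream them than to unpack them to disk first. The following is a minimal Python sketch using only the standard library. It assumes one molecule per line with the SMILES string in the first whitespace-separated column (extra columns, such as those in annotated files, are ignored); the function name `iter_smiles` is hypothetical, and the sketch is meant for the SMILES downloads, not the SDF archive.

```python
import gzip
import io
import tarfile
from typing import Iterable, Iterator


def iter_smiles(path: str) -> Iterator[str]:
    """Yield one SMILES string per non-empty line of a GDB SMILES download.

    Hypothetical helper: handles gzipped tar archives (*.tgz, *.tar.gz),
    plain gzip files (*.smi.gz) and uncompressed *.smi files. Assumes the
    SMILES string is the first whitespace-separated field on each line.
    """
    if path.endswith((".tgz", ".tar.gz")):
        with tarfile.open(path, "r:gz") as archive:
            for member in archive:
                if not member.isfile():
                    continue  # skip directories and links inside the archive
                handle = archive.extractfile(member)
                if handle is None:
                    continue
                # Decode the archived bytes as text, one line at a time.
                with io.TextIOWrapper(handle, encoding="utf-8") as text:
                    yield from _first_fields(text)
    elif path.endswith(".gz"):
        with gzip.open(path, "rt", encoding="utf-8") as text:
            yield from _first_fields(text)
    else:
        with open(path, encoding="utf-8") as text:
            yield from _first_fields(text)


def _first_fields(lines: Iterable[str]) -> Iterator[str]:
    """Return the first column of every non-blank line."""
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[0]
```

For example, `sum(1 for _ in iter_smiles("gdb13.rand1M.smi.gz"))` counts the molecules in a downloaded copy of the 1-million random selection while holding only one line in memory at a time.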
Set | Link | Size |
---|---|---|
GDB-17 | ||
GDB-17-Set (50 million) | GDB17.50000000.smi.gz | 314 MB |
Lead-like set (MW 100-350, clogP 1-3) (11 million) | GDB17.50000000LL.smi.gz | 75 MB |
Lead-like set (MW 100-350, clogP 1-3) without small rings (3-4 ring atoms) (0.8 million) | GDB17.50000000LLnoSR.smi.gz | 55 MB |
GDB-13 | ||
Entire GDB-13 (including all C/N/O/Cl/S molecules) | gdb13.tgz | 2.6 GB |
GDB-13 subsets (together, the subsets below make up the entire GDB-13 above) | ||
Graph subset (saturated hydrocarbons) | gdb13.g.tgz | 1.1 MB |
Skeleton subset (unsaturated hydrocarbons) | gdb13.sk.tgz | 14 MB |
Molecules containing only carbon and nitrogen | gdb13.cn.tgz | 443 MB |
Molecules containing only carbon and oxygen | gdb13.co.tgz | 299 MB |
Molecules containing only carbon, nitrogen and oxygen | gdb13.cno.tgz | 1.8 GB |
Molecules containing chlorine or sulfur | gdb13.cls.tgz | 189 MB |
GDB-13 subsets (for details, see Table 2 in J. Comput. Aided Mol. Des. 2011, 25, 637-647) | ||
GDB-13 Subset AB (~635 million) | AB.smi.gz | 2.4 GB |
GDB-13 Subset ABC (~441 million) | ABC.smi.gz | 1.7 GB |
GDB-13 Subset ABCD (~277 million) | ABCD.smi.gz | 1.1 GB |
GDB-13 Subset ABCDE (~140 million) | ABCDE.smi.gz | 565 MB |
GDB-13 Subset ABCDEF (~43 million) | ABCDEF.smi.gz | 171 MB |
GDB-13 Subset ABCDEFG (~13 million) | ABCDEFG.smi.gz | 50 MB |
GDB-13 Subset ABCDEFGH (~1.4 million) | ABCDEFGH.smi.gz | 6.2 MB |
GDB-13 random sample annotated with frequency and log-likelihood (see "Exploring the GDB-13 chemical space using deep generative models") | ||
GDB-13 Random Sample (1 Million) | gdb13.1M.freq.ll.smi.gz | 14.8 MB |
GDB-13s | ||
GDB-13s | GDB-13s.smi.gz | 423.0 MB |
FDB-17 | ||
FDB-17 | FDB-17-fragmentset.smi.gz | 62.2 MB |
GDB4c | ||
GDB4c (SMILES) | GDB4c.smi.gz | 6.2 MB |
GDB4c3D (SMILES) | GDB4c3D.smi.gz | 161 MB |
GDB4c3D (SDF) | GDB4c3D.sdf.tar.gz | 2 GB |
Other | ||
GDBMedChem (SMILES) | GDBMedChem.smi | 353.6 MB |
GDBChEMBL (SMILES) | GDBChEMBL.smi | 276 MB |
GDB-13 random selection (1 million) | gdb13.rand1M.smi.gz | 7.2 MB |
Fragment-like subset (Rule of three) | gdb13.frl.tgz | 1.2 GB |
Dark matter universe up to 9 heavy atoms | dmu9.tgz | 87 MB |
GDB-11 | ||
Entire GDB-11 (including all C/N/O/F molecules) | gdb11.tgz | 122 MB |
Fragrance-like subsets (for details, see Ruddigkeit et al., J. Cheminform. 2014, 6, 27) | ||
FragranceDB (SuperScent + Flavornet) | FragranceDB.smi | 56 KB |
TasteDB (SuperSweet + BitterDB) | TasteDB.smi | 44 KB |
FragranceDB.FL (Fragrance-like subset of FragranceDB) | FragranceDB.FL.smi | 32 KB |
ChEMBL.FL (Fragrance-like subset of ChEMBL) | ChEMBL.FL.smi | 452 KB |
PubChem.FL (Fragrance-like subset of PubChem) | PubChem.FL.smi | 20 MB |
ZINC.FL (Fragrance-like subset of ZINC) | ZINC.FL.smi | 1.3 MB |
GDB-13.FL (Fragrance-like subset of GDB-13) | GDB-13.FL.smi.gz | 165 MB |
Scaffold hopping: a diversity-driven fragment library extracted from GDB-17 and indexed for ReCore (BioSolveIT) | ||
3D Scaffold Hopping tool | ReCore | 165 MB |
TAGSFREE Encoding System for Combinatorial Peptide Libraries
About
TAGSFREE is a program for designing split-and-mix peptide libraries that can be decoded by amino acid analysis. The analysis is independent of peptide topology (linear, branched, cyclic) and amino acid type (natural or non-natural, including beta-amino acids).
How to cite
Any work based on the TAGSFREE method must cite the following publications:
A General Method for Designing Combinatorial Peptide Libraries Decodable by Amino Acid Analysis. Kofoed, J.; Reymond, J.-L. J. Comb. Chem. 2007, 9, 1046-1052.
Identification of protease substrates by combinatorial profiling on TentaGel beads. Kofoed, J.; Reymond, J.-L. Chem. Commun. 2007, 48, 4453-4455.