We recently combined antiSMASH with a new generic algorithm that probabilistically identifies both known and unknown types of secondary metabolite biosynthetic gene clusters. This allowed us to perform a global quantitative and comparative analysis of biosynthetic gene clusters throughout the microbial tree of life. The resulting bird’s-eye perspective on microbial secondary metabolite biosynthesis led to the discovery of major novel classes of molecules, showing that a computational approach can radically transform the way in which novel natural products are discovered. Intriguingly, a systematic computational analysis of the evolution of biosynthetic gene clusters showed that new biosynthetic complexity continuously evolves in nature through the recombination of functionally interconnected groups of genes (sub-clusters) and through high rates of insertions, deletions and duplications.
The rapid developments in the field are increasing the speed with which gene clusters are being discovered. To exploit the wealth of information being generated, it will be essential that contextual data on these gene clusters are stored in a consistent fashion. We therefore launched a community data standard, the Minimum Information about a Biosynthetic Gene cluster (MIBiG), which allows for the integration of chemical, genomic and ecological data. This standard will be foundational to mapping connections between genes, chemical structures and environments, which will shed new light on how natural product chemical diversity evolves.
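To illustrate what storing such contextual data consistently could look like in practice, the following is a minimal Python sketch. It is not the MIBiG schema: the `ClusterRecord` class, its field names, and the helper functions `missing_fields` and `compounds_by_habitat` are all hypothetical, chosen only to show how genomic, chemical and ecological annotations might sit in one record, be checked for completeness, and then be linked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClusterRecord:
    """Hypothetical record; field names are illustrative, not the MIBiG schema."""
    cluster_id: str
    accession: Optional[str] = None   # genomic: sequence accession
    start: Optional[int] = None       # genomic: locus start coordinate
    end: Optional[int] = None         # genomic: locus end coordinate
    compounds: List[str] = field(default_factory=list)  # chemical: product names
    habitat: Optional[str] = None     # ecological: isolation environment


def missing_fields(record: ClusterRecord) -> List[str]:
    """Return the required annotations that are absent or inconsistent."""
    missing = []
    if not record.accession:
        missing.append("accession")
    # The locus needs both coordinates, with start before end.
    if record.start is None or record.end is None or record.start >= record.end:
        missing.append("locus")
    if not record.compounds:
        missing.append("compounds")
    if not record.habitat:
        missing.append("habitat")
    return missing


def compounds_by_habitat(records: List[ClusterRecord]) -> Dict[str, List[str]]:
    """Group compound names by habitat, skipping incomplete records."""
    grouped: Dict[str, List[str]] = {}
    for record in records:
        if missing_fields(record):
            continue
        grouped.setdefault(record.habitat, []).extend(record.compounds)
    return grouped
```

Under these assumptions, only records that carry all three kinds of annotation contribute to the gene-to-chemistry-to-environment mapping, which is the practical reason a minimum-information requirement matters: incomplete entries cannot be linked across data types.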