P95 BIRD: a comprehensive database for the storage and manipulation of genetic information and natural products
Sunday, January 11, 2015
California Ballroom C and Santa Fe Room
Phil Rees and Chris Dejong, Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON
Translating genomes to natural products requires the manipulation and processing of large amounts of information. Decreasing sequencing costs have led to an exponential increase in the amount of data available for processing. To leverage this information, it is crucial to be able to relate genetic data to the chemical space of known natural products. BIRD is a database that stores all natural products known to date and provides an extensible framework for storing and processing genetic information. BIRD is built on Neo4j, a NoSQL graph database management system. It contains all known microbial natural products (approximately 46,000 compounds); over 500 known biosynthetic gene clusters, each associated with its corresponding natural product; other large sets of genetic sequence information, including publicly available genomes and ribosomal 16S sequences; and predictions from several software applications that we have developed to connect genomes to small molecules, including results from both PRISM and GRAPE. One distinguishing aspect of this data structure is that it is fully extensible: new information can be added and new relationships formed between data points in ways that are simply not possible with a conventional SQL database. The BIRD database is a novel data structure containing an exhaustive resource of natural products that will allow genomic data to be leveraged in the discovery of natural products.
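To illustrate the kind of extensibility a graph data model offers, the following is a minimal Python sketch of an in-memory property graph. It does not use Neo4j or its driver, and it is not BIRD's actual schema: the `PropertyGraph` class, the node labels (`Genome`, `GeneCluster`, `Compound`, `Prediction`) and the relationship types (`CONTAINS`, `PRODUCES`, `HAS_PREDICTION`) are all hypothetical. The point it shows is that attaching a new kind of data, such as a prediction tool's output, only requires a new relationship type, with no table schema to migrate.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str  # hypothetical labels, e.g. "Compound", "GeneCluster"
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: labelled nodes joined by typed, directed edges."""

    def __init__(self):
        self.nodes = {}
        # Adjacency map: source id -> relationship type -> set of target ids.
        self.edges = defaultdict(lambda: defaultdict(set))

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = Node(node_id, label, props)

    def relate(self, src, rel_type, dst):
        # No fixed schema: a previously unseen rel_type simply becomes a new key.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before relating them")
        self.edges[src][rel_type].add(dst)

    def neighbours(self, src, rel_type):
        return sorted(self.edges[src][rel_type])

    def two_hop(self, src, rel1, rel2):
        """Follow rel1 then rel2, e.g. genome -> gene cluster -> compound."""
        out = set()
        for mid in self.edges[src][rel1]:
            out |= self.edges[mid][rel2]
        return sorted(out)


# Usage with hypothetical identifiers.
g = PropertyGraph()
g.add_node("genome:1", "Genome", organism="Streptomyces sp.")
g.add_node("bgc:1", "GeneCluster")
g.add_node("np:1", "Compound", name="compound A")
g.relate("genome:1", "CONTAINS", "bgc:1")
g.relate("bgc:1", "PRODUCES", "np:1")

# Later, output from a prediction tool is attached via a brand-new relationship type.
g.add_node("pred:1", "Prediction", tool="PRISM")
g.relate("bgc:1", "HAS_PREDICTION", "pred:1")

print(g.two_hop("genome:1", "CONTAINS", "PRODUCES"))  # ['np:1']
print(g.neighbours("bgc:1", "HAS_PREDICTION"))        # ['pred:1']
```

In a real graph database the same genome-to-compound traversal would be expressed as a query over relationship patterns rather than hand-written adjacency lookups; the sketch only mirrors the underlying node-and-edge idea.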