P70A Long Insert Mate Pair Libraries for Assembling and Finishing Genomes
Monday, July 21, 2014
David A. Mead, Svetlana Jasinovica, Erin Ferguson, Brendon Keough, Amanda Krerowicz, Megan Niebauer, Michael Lodes and Scott Monsma, R&D, Lucigen Corporation, Middleton, WI
A complete genome sequence is needed to better understand the biology of an organism. Thousands of microbial genomes have been sequenced, but they are rarely finished. Next-generation DNA sequencing (NGS) instruments produce gigabases of data per run, but the short read lengths and small size of the sequenced fragments result in gaps, misassembled contigs, collapsed repeats and missing sequences, leaving these regions to be finished manually, if at all. A technology that provides long-range sequence linkage from short reads is needed for accurate, economical de novo assembly of genomes. We have developed a novel 20-50 kb mate pair library construction technology that improves sequence assembly, reducing the number of contigs and improving scaffolding, so that researchers can finish genomes at lower cost and with higher-quality data. The technology produces true mate pairs with 90% efficiency, using encrypted sequence codes to distinguish chimeras. NGS libraries were constructed from a reference E. coli strain and from three sources lacking reference genomes: Thermus aquaticus and BACs containing natural product pathways. Without mate pair libraries, the genome assemblies contained hundreds of contigs and the BAC assemblies contained dozens. Adding mate pair data allowed accurate de novo assembly and finishing of the BACs and genomes, owing to the 90% true mate pair efficiency of the libraries. High physical mate pair coverage of a genome can now be used for de novo assembly and economical finishing of genomes and BACs.
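As a rough illustration of what "physical mate pair coverage" means, the sketch below uses the common definition: the total span covered by true mate pairs divided by the genome length. It is not taken from the poster. The function name, the number of pairs and the 4.6 Mb genome size are hypothetical values chosen for illustration; only the 20 kb insert size (the low end of the 20-50 kb range) and the 90% true mate pair fraction come from the abstract.

```python
def physical_coverage(n_pairs, mean_insert_bp, genome_bp, true_pair_fraction=0.9):
    """Estimate physical coverage contributed by a mate pair library.

    Physical coverage counts the whole span between the two mates, not
    just the sequenced bases, so long inserts give high coverage from
    relatively few read pairs. Chimeric pairs are excluded by scaling
    with the fraction of true mate pairs.
    """
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    if not 0.0 <= true_pair_fraction <= 1.0:
        raise ValueError("true_pair_fraction must be between 0 and 1")
    true_pairs = n_pairs * true_pair_fraction
    return true_pairs * mean_insert_bp / genome_bp


if __name__ == "__main__":
    # Hypothetical library: 10,000 read pairs, 20 kb inserts,
    # on an assumed 4.6 Mb bacterial genome.
    cov = physical_coverage(10_000, 20_000, 4_600_000)
    print(f"physical coverage: {cov:.1f}x")
```

Under these assumed numbers, a modest count of read pairs spans the genome many times over, which is why long inserts are useful for scaffolding across repeats even when they add little sequence coverage.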