S134: Annotation & misannotation of enzyme function in functionally diverse enzyme superfamilies

Wednesday, July 27, 2011: 9:00 AM
Bayside A, 4th fl (Sheraton New Orleans)
Patricia C. Babbitt, Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA
Correct annotation of molecular function for the rapidly growing number of sequences added to public databases is essential to using them effectively in many applications. We briefly review results of our previously published study quantitating misannotation of reaction specificity in the public databases GenBank NR, UniProtKB/TrEMBL, UniProtKB/Swiss-Prot, and KEGG for 37 model families from six functionally diverse enzyme superfamilies. While the Swiss-Prot database shows consistently low levels of misannotation for these families, the other three databases show levels of misannotation that are similar to one another and alarmingly high for a large proportion of them. Reasons for these high levels of misannotation are discussed.

A new global analysis of sequence and structural relationships among homologous proteins from the Nucleophilic Attack 6-bladed β-Propeller (N6P) superfamily extends these studies on a larger scale, showing that ~500 proteins annotated in public databases as “putative strictosidine synthases” or “strictosidine synthase family proteins” are unlikely to catalyze the strictosidine synthase reaction and more likely to catalyze hydrolytic reactions instead. This study illustrates how computational analysis of structure-function relationships in enzyme superfamilies can be applied to predict function and identify misannotation for many enzymes, even when only a few members of the group have been biochemically or structurally characterized. Strictosidine synthases make natural products useful for the development of drugs targeting diseases that include cancer, malaria, and schizophrenia. Discriminating organisms that make strictosidine from those that do not therefore offers useful guidance for experimental searches to identify new variants of this compound.