Algorithmic Developments for Sequence Analysis, Structure Modeling and Functional Prediction of Proteins

Date

2006-12-20

Abstract

Sequence, structure and function, the three most important properties of proteins, are interrelated through homology relationships. In the post-genome era, abundant sequence information is available, so homology inference is of great practical importance: it allows structural and functional properties to be predicted through sequence analysis. To explore and exploit protein sequence-structure-function relationships, with homology detection and utilization as the central theme, this work develops algorithms and systems for sequence similarity search, structure modeling and functional prediction, and also carries out structure prediction and classification for specific protein families. Three algorithmic developments are described in this dissertation. First, to facilitate identification of structurally or functionally important interactions between positions in a protein family, a program has been developed that performs positional correlation analysis of multiple sequence alignments using several different methods. The program has been shown to be useful for identifying functionally important position pairs or networks of correlated positions. Second, to further increase the sensitivity of sequence similarity search methods for homology detection and structure modeling, a method has been developed that combines predicted secondary structure information with sequence profiles. Evaluation on a Pfam-based benchmark shows that this method improves structure template detection and generates alignments of better quality. Third, to systematically assess the structure modeling ability of different sequence similarity search programs, a comprehensive evaluation system has been developed.
This large-scale automatic evaluation system assesses the fold recognition ability and alignment quality of different programs from global and local perspectives, using both reference-dependent and reference-independent approaches, and thus provides an instrument for understanding the progress and limitations of the field. Two structure prediction and classification projects, carried out using manual analysis and existing tools, are also described in this dissertation. First, the structure of the C-terminal domain of gyrase A is predicted through an inferred homology relationship with the regulator of chromosome condensation (RCC1). This prediction has been validated by experimental data. Second, a hierarchical structure classification of thioredoxin-like fold proteins has been carried out, which promotes understanding of fold definitions and sequence-structure-function relationships.
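The abstract states that the positional correlation program supports several methods but does not name them. As an illustration only, the sketch below computes mutual information (MI) between two columns of a multiple sequence alignment, one widely used measure of positional correlation; it is not necessarily one of the methods implemented in the dissertation's program. The function names `column` and `mutual_information`, and the choice to treat gaps as an ordinary symbol without pseudocounts or sequence weighting, are assumptions made for this example.

```python
import math
from collections import Counter


def column(alignment, i):
    """Return the residues at position i of every sequence in the alignment."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    MI = sum over residue pairs (a, b) of p(a,b) * ln(p(a,b) / (p(a) * p(b))),
    using raw observed frequencies. Gaps count as an ordinary symbol, and
    no pseudocounts or sequence weighting are applied (simplifying assumptions).
    """
    n = len(alignment)
    col_i = column(alignment, i)
    col_j = column(alignment, j)
    p_i = Counter(col_i)                 # single-column residue counts
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))    # joint counts of residue pairs
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


if __name__ == "__main__":
    # Toy alignment: columns 0 and 2 covary perfectly; column 1 is fully conserved.
    msa = ["ALK", "ALK", "GLE", "GLE"]
    print(round(mutual_information(msa, 0, 2), 4))  # 0.6931 (= ln 2)
    print(mutual_information(msa, 0, 1))            # 0.0
```

A conserved column carries no information about any other column, so its MI is zero, while perfectly covarying columns reach the maximum allowed by their entropy. In practice, raw MI is often corrected for background signal, for example with the average product correction (APC); this sketch omits such corrections.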