Validation of a novel expressed sequence tag (EST) clustering method and development of a phylogenetic annotation pipeline for livestock gene families

Date

2009-05-15

Abstract

Predicting the functions of genes is a key step in every genome sequencing project. Sequences that carry out important functions are likely to be conserved between evolutionarily distant species and can be identified through cross-species comparisons. In the absence of completed genomes and accompanying high-quality annotations, expressed sequence tags (ESTs) from random cDNA clones are the primary tools for functional genomics. EST datasets are fragmented and redundant, so ESTs must be clustered into groups that are likely to derive from the same gene. EST clustering reduces the search space for sequence homology searches and improves the accuracy of function predictions made from EST datasets. This dissertation is a case study that describes the clustering of Bos taurus and Sus scrofa EST datasets and uses the resulting EST clusters to make computational function predictions through a comparative genomics approach. We used a novel EST clustering method, TAMUClust, to cluster bovine ESTs and compared its performance with that of the bovine EST clusters from the TIGR Gene Indices (TGI), using bovine ESTs aligned to the bovine genome assembly as a gold standard. This comparison shows that TAMUClust and TGI perform similarly. Comparisons of TAMUClust and TGI with predicted bovine gene models show that the two datasets are also similar in transcript coverage. We describe the design and implementation of an annotation pipeline for predicting functions of the Bos taurus (cattle) and Sus scrofa (pig) transcriptomes. EST datasets were clustered into gene families using Ensembl protein family clusters as a framework. Following clustering, each EST consensus sequence was assigned a predicted function by transferring the annotations of the Ensembl vertebrate protein(s) with which it grouped after sequence homology searches and phylogenetic analysis. The annotations benefit the livestock community by helping narrow the range of direct experiments needed to verify function.
