Combinatorial motif analysis in yeast gene promoters: the benefits of a biological consideration of motifs

Title: Combinatorial motif analysis in yeast gene promoters: the benefits of a biological consideration of motifs
Author: Childs, Kevin
Abstract: There are three main categories of algorithms for identifying small transcription regulatory sequences in the promoters of genes: phylogenetic comparison, expectation maximization, and combinatorial. For convenience, the combinatorial methods typically define motifs in terms of a canonical sequence and a set of sequences that have a small number of differences compared to the canonical sequence. Such motifs are referred to as (l, d)-motifs, where l is the length of the motif and d indicates how many mismatches are allowed between an instance of the motif and the canonical motif sequence. There are limits to the complexity of the patterns of motifs that can be found by combinatorial methods. For some values of l and d, there will exist many sets of random words in a cluster of gene promoters that appear to form an (l, d)-motif. For these motifs, it will be impossible to distinguish biological motifs from randomly generated motifs. A better formalization of motifs is the (l, f, d)-motif, which is derived from a biological consideration of motifs. The motivation for (l, f, d)-motifs comes from an examination of known transcription factor binding sites, where typically a few positions in the motif are invariant. It is shown that there exist (l, f, d)-motifs that can be found in the promoters of gene clusters that would not be distinguishable from random sequences if they were described as (l, d)-motifs. The inclusion of the f-value in the definition of motifs suggests that the sequence space occupied by a motif will consist of several clusters of closely related sequences. An algorithm, CM, has been developed that identifies small sets of overabundant sequences in the promoters from a cluster of genes and then combines these simple sets of sequences to form complex (l, f, d)-motif models. A dataset from a yeast gene expression experiment is analyzed with CM. Known biological motifs and novel motifs are identified by CM. The performance of CM is compared to that of a popular expectation maximization algorithm, AlignACE, and to that of a simple combinatorial motif finding program.
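
Illustration (not part of the thesis): the difference between the two motif definitions can be sketched in a few lines of Python. The abstract does not define the f-value precisely, so this sketch assumes that f is the number of invariant positions and represents them as a hypothetical set `fixed` of 0-based indices that must match the canonical sequence exactly, with up to d mismatches allowed elsewhere. The function names and the example sequences are illustrative only; this is not the CM algorithm.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def matches_ld(word: str, canonical: str, d: int) -> bool:
    """True if `word` is within d mismatches of the canonical (l, d)-motif."""
    return len(word) == len(canonical) and hamming(word, canonical) <= d


def matches_lfd(word: str, canonical: str, fixed, d: int) -> bool:
    """(l, f, d)-style check under the assumption stated above:
    positions in `fixed` (f of them) must match exactly, and at most
    d mismatches are allowed at the remaining positions."""
    if len(word) != len(canonical):
        return False
    if any(word[i] != canonical[i] for i in fixed):
        return False
    # All fixed positions agree, so every mismatch counted here is non-fixed.
    return hamming(word, canonical) <= d


def find_instances(promoter: str, canonical: str, d: int, fixed=frozenset()):
    """Slide a window of length l over a promoter and collect matching words."""
    l = len(canonical)
    return [
        (i, promoter[i:i + l])
        for i in range(len(promoter) - l + 1)
        if matches_lfd(promoter[i:i + l], canonical, fixed, d)
    ]
```

With an empty `fixed` set, `matches_lfd` reduces to the plain (l, d) test. Adding invariant positions shrinks the set of accepted words, which is the intuition behind the claim that (l, f, d)-motifs are easier to separate from random words.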
URI: http://hdl.handle.net/1969.1/1351
Date: 2005-02-17

Citation

Combinatorial motif analysis in yeast gene promoters: the benefits of a biological consideration of motifs. Available electronically from http://hdl.handle.net/1969.1/1351.

Files in this item

File: etd-tamu-2004C-2-CPSC-Childs.pdf
Size: 1.717Mb
Format: application/pdf
