One early goal after obtaining a draft or finished genome sequence is to explore the sequence for its particular set of genes and transcribed genomic sequences (Windsor and Mitchell-Olds 2006). Gene sets can be generated either entirely computationally or by a combination of computational and manual annotation. While the former is useful for generating preliminary gene sets, manual annotation, though laborious, is still necessary to ensure the accuracy and completeness of gene sets. In this regard, gene prediction programs such as EAnnot (Electronic Annotation) offer an efficient way to generate an automated gene set. EAnnot builds gene models based on mRNA, EST, and protein alignments to genomic sequence, attaches supporting evidence to the corresponding genes, identifies pseudogenes, and locates poly(A) sites and signals (Ding et al. 2004).
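The idea of grouping aligned evidence into candidate gene loci can be illustrated with a toy sketch. This is not EAnnot's actual algorithm, which is not described here. It only shows one simple assumption: alignments whose genomic coordinates overlap are merged into a single locus, and the identifiers of the supporting alignments are kept with that locus. The names `Locus` and `cluster_alignments`, the coordinates, and the evidence identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Locus:
    """A candidate locus: a genomic span plus the evidence that supports it."""
    start: int
    end: int
    evidence: list = field(default_factory=list)


def cluster_alignments(alignments):
    """Merge overlapping (start, end, evidence_id) alignments into loci.

    Coordinates are treated as closed intervals on one strand of one
    sequence (a simplifying assumption made for this illustration).
    """
    loci = []
    for start, end, evidence_id in sorted(alignments):
        if loci and start <= loci[-1].end:
            # Overlaps the current locus: extend it and record the evidence.
            loci[-1].end = max(loci[-1].end, end)
            loci[-1].evidence.append(evidence_id)
        else:
            # No overlap: open a new locus.
            loci.append(Locus(start, end, [evidence_id]))
    return loci


# Hypothetical alignments of mRNA, EST, and protein evidence.
alignments = [
    (100, 500, "mRNA_1"),
    (450, 900, "EST_7"),
    (2000, 2600, "protein_3"),
    (2500, 3000, "EST_9"),
]
loci = cluster_alignments(alignments)
for locus in loci:
    print(locus.start, locus.end, locus.evidence)
```

A real annotation pipeline would also have to handle strand, splice structure, and conflicting evidence, which this sketch deliberately ignores.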
Translation of genetic information from model organisms is extremely useful in annotating conserved regions of other genomes. In angiosperms, sequencing of the Arabidopsis (Arabidopsis thaliana), rice, papaya, sorghum, poplar, and grape genomes has provided a foundation for accelerating the prediction of gene families and the identification of novel genes in other plant taxa. Various computational tools are available for analyzing and clustering this genetic information into functional categories (gene families, transposable elements, etc.), based in part on information from such reference genomes.