Comparison Results of MGAlign with Sim4 and Spidey
The dataset consists of 936 annotated mRNA sequences found on Human
Chromosome 22, as annotated by the Chromosome 22 Gene Annotation
Group at the Sanger Institute and obtained from the World Wide Web
(Collins et al., 2003). There are a total of 5176 annotated exons
in this set of 936 mRNAs. These 936 mRNA sequences were aligned
back to the 48 Mbp Human Chromosome 22 genomic sequence
(Collins et al., 2003).
A copy of the dataset used for the comparison is provided below:
The mRNA files are numbered from 1 to 936. Files with the extension "fasta" contain the FASTA sequences of the mRNAs, with the accession number as the header. The corresponding numbered file with the extension "cds" contains the annotations as provided by Sanger. The first line of this file indicates the orientation of the mRNA relative to the genomic sequence, and subsequent lines give the positions of the exons. The start and end positions on each line are separated by a dash.
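The "cds" layout described above can be read with a few lines of code. The following is a minimal sketch in Python, not part of the original dataset tools. The function name parse_cds is hypothetical, and because the exact token used on the orientation line is not specified here, the sketch returns that line verbatim rather than interpreting it.

```python
from typing import List, Tuple


def parse_cds(text: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Parse the contents of a "cds" annotation file.

    Returns the orientation line unchanged and a list of
    (start, end) exon positions, one per subsequent line.
    """
    # Ignore blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty annotation file")

    # First line: orientation (kept verbatim; its exact form is an assumption).
    orientation = lines[0]

    # Remaining lines: "start-end", separated by a dash.
    exons = []
    for ln in lines[1:]:
        start, end = ln.split("-", 1)
        exons.append((int(start), int(end)))
    return orientation, exons


# Hypothetical file contents, for illustration only.
sample = "+\n100-250\n400-520\n"
orientation, exons = parse_cds(sample)
```

The orientation value "+" and the coordinates in the sample are invented for the example and are not taken from the dataset.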
All programs were run with their default settings. Note that the dataset
contains a total of 5176 annotated exons.
The comparisons were done on a Duron 850MHz with 512MB of RAM running Slackware Linux 9.0.
Location of small exons
By using a small word size and searching within a limited search space,
MGAlign was able to locate all but one exon (6 bp), which was too
small to be detected even with the small word size. Examples of
small exon location are presented below.
(A) Locating small internal exons. Schematic diagram of the alignment
between mRNA sequence (Sanger Institute accession no. AC000026.C22.2.mRNA)
and human chromosome 22 genomic sequence showing only exonic regions
1-3 and 8-10. The alignments generated by MGAlign, Spidey and Sim4
are shown below the genomic sequence. (B) Locating small terminal
exons. Schematic diagram of the alignment between mRNA sequence (accession no. bK440B3.1)
and human chromosome 22 genomic sequence showing all exons.
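The idea of searching for a small exon with a short word inside a limited search space can be illustrated with a short Python sketch. This is not MGAlign's actual implementation. The function locate_small_exon, its parameters, and the default word size of 4 are all assumptions made for illustration. The sketch seeds on the first word of an unaligned mRNA segment, looks for that word only within a bounded genomic interval, and keeps positions where the whole segment matches exactly.

```python
from typing import List


def locate_small_exon(segment: str, genome: str, lo: int, hi: int,
                      word_size: int = 4) -> List[int]:
    """Return 0-based genomic start positions in [lo, hi) where
    `segment` matches exactly, using a seed of length `word_size`.

    A segment shorter than the word size cannot be seeded and yields
    no hits.
    """
    if len(segment) < word_size:
        return []

    seed = segment[:word_size]
    hits = []
    # str.find restricts the search to genome[lo:hi].
    pos = genome.find(seed, lo, hi)
    while pos != -1:
        end = pos + len(segment)
        # Accept the hit only if the full segment fits and matches.
        if end <= hi and genome[pos:end] == segment:
            hits.append(pos)
        pos = genome.find(seed, pos + 1, hi)
    return hits


# Toy sequences, invented for illustration.
genome = "AAAACCGTAGGTAAAACCGTAGGT"
hits_in_window = locate_small_exon("CCGTAGGT", genome, 0, 12)
too_small = locate_small_exon("CCGTAG", genome, 0, 24, word_size=8)
```

Restricting the search to the interval between lo and hi keeps a short word from producing many spurious matches elsewhere in the genomic sequence, and the last call shows how a segment shorter than the word size goes undetected.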