Comparison Results of MGAlign with Sim4 and Spidey


The dataset consists of 936 annotated mRNA sequences from human chromosome 22, as annotated by the Chromosome 22 Gene Annotation Group at the Sanger Institute and obtained from http://www.sanger.ac.uk/HGP/Chr22 (Collins et al., 2003). This set of 936 mRNAs contains a total of 5176 annotated exons. The 936 mRNA sequences were aligned back to the 48 Mbp human chromosome 22 genomic sequence (Collins et al., 2003).

A copy of the dataset used for the comparison is provided below:

The mRNA files are numbered from 1 to 936. Files with the extension "fasta" contain the FASTA sequences of the mRNAs, with the accession number as the header. The correspondingly numbered file with the extension "cds" contains the annotations as provided by Sanger. The first line of this file indicates the orientation of the mRNA relative to the genomic sequence, and subsequent lines give the positions of the exons, one exon per line. The start and end positions on each line are separated by a dash.
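The "cds" layout described above can be read with a few lines of Python. This is only a minimal sketch: the function name parse_cds and the sample contents are hypothetical, and because the exact encoding of the orientation line is not specified here, the first line is returned unchanged as a string.

```python
from typing import List, Tuple


def parse_cds(text: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Parse the contents of one "cds" annotation file.

    Assumed layout (from the description above): the first non-empty line
    holds the orientation, and every following line holds one exon as
    "start-end". The orientation is kept as an opaque string.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty cds file")
    orientation = lines[0]
    exons = []
    for ln in lines[1:]:
        # Split on the first dash only: "100-250" -> ("100", "250").
        start, end = ln.split("-", 1)
        exons.append((int(start), int(end)))
    return orientation, exons


# Hypothetical file contents, for illustration only (not taken from the dataset).
sample = "forward\n100-250\n400-520\n"
orientation, exons = parse_cds(sample)
```

In practice the text would come from reading one of the numbered files, for example with open("1.cds").read(), assuming the files sit in the current directory under that naming.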


All programs were used at their default settings. Note that there are a total of 5176 annotated exons.

Program  Predicted Exons  Correct Exons  Wrong Exons  Missing Exons  Misaligned Exons  False Negative  True Positive  Percentage of Correct Exons  Total Time (hours)
MGAlign  5175             5134           0            1              41                0.02%           99.21%         99.19%                       3.35
Sim4     5157             5080           3            22             74                0.43%           98.51%         98.15%                       7.99
Spidey   7377             4490           2822         621            65                12.00%          60.86%         86.75%                       7.89
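The percentage columns are not defined explicitly on this page, but the reported values are consistent with the following reading: false negative = missing exons / annotated exons, true positive = correct exons / predicted exons, and percentage of correct exons = correct exons / annotated exons. The sketch below recomputes the three columns under that assumption; the function and variable names are illustrative only.

```python
TOTAL_ANNOTATED_EXONS = 5176  # total annotated exons in the dataset

# Counts copied from the table: (predicted, correct, missing).
COUNTS = {
    "MGAlign": (5175, 5134, 1),
    "Sim4": (5157, 5080, 22),
    "Spidey": (7377, 4490, 621),
}


def derive_percentages(predicted, correct, missing, total=TOTAL_ANNOTATED_EXONS):
    """Recompute the percentage columns under the assumed definitions."""
    return {
        # Assumed: share of annotated exons that the program missed entirely.
        "false_negative": round(100 * missing / total, 2),
        # Assumed: share of predicted exons that were correct.
        "true_positive": round(100 * correct / predicted, 2),
        # Assumed: share of annotated exons that were predicted correctly.
        "percent_correct": round(100 * correct / total, 2),
    }


results = {name: derive_percentages(*counts) for name, counts in COUNTS.items()}
for name, row in results.items():
    print(name, row)
```

Under these definitions the recomputed values match the table for all three programs, which supports the reading above without confirming that it is how the figures were originally produced.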

The comparisons were run on an 850 MHz AMD Duron machine with 512 MB of RAM running Slackware Linux 9.0.

Location of small exons

By using a small word size and restricting the search to a limited search space, MGAlign located all but one exon. The missed exon (6 bp) was too small to be detected even with the small word size. Examples of small exon location are presented below:

(A) Locating small internal exons. Schematic diagram of the alignment between mRNA sequence (Sanger Institute accession no. AC000026.C22.2.mRNA) and human chromosome 22 genomic sequence showing only exonic regions 1-3 and 8-10. The alignments generated by MGAlign, Spidey and Sim4 are shown below the genomic sequence. (B) Locating small terminal exons. Schematic diagram of the alignment between mRNA sequence (accession no. bK440B3.1) and human chromosome 22 genomic sequence showing all exons.


