
News

  • The SPdb team is presenting at the 1st SYMBIO2008 symposium in Singapore on 1st Aug 2008.
  • SPdb 5.1 updates released.
  • SPdb 5.0 is now available (Swiss-Prot Release 55.0; EMBL Release 93).
  • SPdb 4.0 is now available.
  • SPdb 3.2 is now available. This release employs updated filtering techniques and is manually curated. Several new features have been included.
  • SPdb 3.1 is now available. This release contains updates from EMBL.
  • SPdb 3.0 is now available. This new release integrates data from EMBL.
  • Jan 05 - SPD is now SPdb, a name that more accurately reflects the database. We will continue to develop and update the database.
  • SPD 2.1 is now available. More features of the signal peptide have been included and presented in graph format. Users can now view the hydropathy plot and amino acid residue properties.
  • SPD 2.0 is now available. This release coincides with Swiss-Prot Release 45 (25th October 2004).
  • SPD team is presenting a poster at the joint conference of the 12th International Conference on Intelligent Systems for Molecular Biology (ISMB) and 3rd European Conference on Computational Biology (ECCB) at Glasgow, Scotland, UK from 31st Jul 2004 to 4th Aug 2004.

Introduction

SPdb is a signal peptide database containing signal sequences of archaea, prokaryotes and eukaryotes. The signal-associated data is stored in a MySQL relational database and provided as DNA and protein sequences. FASTA-formatted files containing the sequences are available for download.
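The downloadable FASTA files can be read with a few lines of standard-library Python. This is a minimal sketch: SPdb's exact header layout is not described here, so each header line is kept verbatim, and the two example records are made up for illustration.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream.

    The header is the text after '>' on its line; the sequence lines that
    follow are joined without whitespace. Lines before the first header
    are ignored.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records standing in for a downloaded SPdb file.
example = io.StringIO(">entry_1 example header\nMKKLLPT\nAAAGLLL\n>entry_2\nMRVLL\n")
records = dict(read_fasta(example))
```

For a real download, pass an open file handle instead of the `StringIO` object.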

This database is currently at release 5.1 and contains 27433 entries: 2512 experimentally verified signal sequences (obtained by filtering the data, followed by manual curation to retain entries whose mature endogenous proteins were sequenced at the N-terminus) and 24921 unverified signal sequences. All sequences were derived from the Swiss-Prot protein database (release 55.0), which is part of UniProt. The nucleotide sequences were obtained from the EMBL nucleotide database (release 93).

TrEMBL entries are not included.


Citation
If you use this database or the downloaded datasets, please cite:
Choo KH, Tan TW, Ranganathan S. 2005. SPdb - a signal peptide database. BMC Bioinformatics 6:249


Versioning

SPdb is released as a major update (e.g. from 2.x to 3.x) when there is a major release of Swiss-Prot.
Between major releases, SPdb may receive intermittent add-ons or improvements, leading to minor releases numbered n.1, n.2, n.3, etc.
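A minimal sketch of how this numbering scheme could be applied when comparing two release strings. The function names are hypothetical, and version strings are assumed to have the form major.minor (e.g. "5.1").

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split a version string such as '5.1' into (major, minor); '5' -> (5, 0)."""
    major, _, minor = version.partition(".")
    return int(major), int(minor or 0)


def release_kind(old: str, new: str) -> str:
    """Classify the step from `old` to `new` as a major or minor release."""
    old_major, old_minor = parse_version(old)
    new_major, new_minor = parse_version(new)
    if new_major > old_major:
        return "major"  # tracks a major Swiss-Prot release
    if new_major == old_major and new_minor > old_minor:
        return "minor"  # intermittent add-on or improvement
    raise ValueError(f"{new} is not a later release than {old}")
```

For example, `release_kind("5.0", "5.1")` gives `"minor"` and `release_kind("4.0", "5.0")` gives `"major"`.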


Method

The preliminary entries in SPdb are extracted using the methodology proposed by Nielsen et al. Briefly, entries containing the keyword SIGNAL in the Swiss-Prot Feature Table were considered, and the following were excluded: (a) phage genes; (b) entries with uncertain or unspecified start or end positions of the signal sequence; (c) signal sequences that were not experimentally determined, i.e. annotated as POTENTIAL, PROBABLE or BY SIMILARITY (see the Swiss-Prot annotation descriptions); (d) proteins encoded by non-nuclear genes.


We have adapted this methodology by adding our own filtering rules and criteria. Since our goal is to build a repository of experimentally verified signal peptides, we aim to be inclusive while labelling every entry clearly. Thus, all entries excluded under (a) to (d) above were retained in the unfiltered (unverified) set, pending experimental verification by N-terminal sequencing.
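The criteria above, combined with the choice to label rather than discard, can be sketched as a partition into a filtered set and an unverified set. This is an illustration under stated assumptions, not SPdb's code: `SignalEntry` is a hypothetical, pre-parsed view of a Swiss-Prot entry, and the boolean flags stand in for the phage and non-nuclear checks, whose details the text does not give.

```python
from dataclasses import dataclass
from typing import List, Tuple

UNCERTAIN_QUALIFIERS = ("POTENTIAL", "PROBABLE", "BY SIMILARITY")


@dataclass
class SignalEntry:
    # Hypothetical fields; assumed to be extracted from the Swiss-Prot entry.
    accession: str
    signal_start: str   # raw FT start position, e.g. "1", "<1" or "?"
    signal_end: str     # raw FT end position, e.g. "22", ">22" or "?"
    signal_note: str    # free-text FT description, e.g. "Potential."
    is_phage: bool
    is_non_nuclear: bool


def passes_nielsen_criteria(entry: SignalEntry) -> bool:
    if entry.is_phage:                                                     # (a)
        return False
    if not (entry.signal_start.isdigit() and entry.signal_end.isdigit()):  # (b)
        return False
    note = entry.signal_note.upper()
    if any(q in note for q in UNCERTAIN_QUALIFIERS):                       # (c)
        return False
    if entry.is_non_nuclear:                                               # (d)
        return False
    return True


def partition_entries(
    entries: List[SignalEntry],
) -> Tuple[List[SignalEntry], List[SignalEntry]]:
    """Split entries into (filtered, unverified) instead of discarding failures.

    Filtered entries would still go on to manual curation.
    """
    filtered: List[SignalEntry] = []
    unverified: List[SignalEntry] = []
    for entry in entries:
        (filtered if passes_nielsen_criteria(entry) else unverified).append(entry)
    return filtered, unverified
```

Keeping failures in a second list, rather than dropping them, mirrors the "inclusive but labelled" policy: an unverified entry can later move to the filtered set once N-terminal sequencing data appear.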

Entries excluded from Swiss-Prot:
-Entries whose FT field contains the words NOT CLEAVED, IN SOME ISOFORM, IN ISOFORM LONG, or ?? IN SOME/(x% OF THE) MOLECULES
-Entries with signal sequences shorter than 11 residues
-Entries with the word HYPOTHETICAL in the DE field and no observed similarity to other existing entries
-Tat (twin-arginine translocation) signal peptides
-Mitochondrial and chloroplast transit peptides
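The exclusion rules listed above can be expressed as a single check that returns the first reason an entry fails. This is a minimal sketch with hypothetical names. The regular expression for "IN SOME/(x% OF THE) MOLECULES" is one reading of that phrase and is an assumption. The similarity search, Tat detection and transit-peptide annotation are assumed to have been computed beforehand and are passed in as flags.

```python
import re
from dataclasses import dataclass
from typing import Optional

# FT phrases that disqualify an entry; the last pattern is an assumed
# reading of "IN SOME/(x% OF THE) MOLECULES".
FT_EXCLUSION_PATTERNS = [
    re.compile(r"NOT CLEAVED"),
    re.compile(r"IN SOME ISOFORM"),
    re.compile(r"IN ISOFORM LONG"),
    re.compile(r"IN (SOME|\d+% OF THE) MOLECULES"),
]
MIN_SIGNAL_LENGTH = 11


@dataclass
class CandidateEntry:
    # Hypothetical fields; annotations are assumed to be pre-computed.
    ft_text: str               # concatenated FT lines
    de_text: str               # DE (description) line
    signal_length: int         # residues in the annotated signal sequence
    has_similar_entries: bool  # outcome of a similarity search
    is_tat: bool               # Tat signal peptide
    is_transit_peptide: bool   # mitochondrial or chloroplast transit peptide


def exclusion_reason(entry: CandidateEntry) -> Optional[str]:
    """Return why an entry is excluded, or None if it passes every rule."""
    ft = entry.ft_text.upper()
    if any(p.search(ft) for p in FT_EXCLUSION_PATTERNS):
        return "ambiguous FT annotation"
    if entry.signal_length < MIN_SIGNAL_LENGTH:
        return "signal sequence shorter than 11 residues"
    if "HYPOTHETICAL" in entry.de_text.upper() and not entry.has_similar_entries:
        return "hypothetical protein without similar entries"
    if entry.is_tat:
        return "Tat signal peptide"
    if entry.is_transit_peptide:
        return "transit peptide"
    return None
```

Returning a reason instead of a bare boolean keeps a record of why each entry was set aside, which helps when those entries are reviewed by hand.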

After the Swiss-Prot entries are parsed, nucleotide sequences from the EMBL database are integrated with the filtered Swiss-Prot sequences. Only EMBL sequences in the fungi, human, invertebrate, mouse, organelle, bacteriophage, plant, prokaryote, rodent, viral, mammals and vertebrate categories (as defined in the EMBL release notes) are considered.

Signal peptides of both type I and type II (lipoprotein) membrane proteins are present in SPdb. Type II (lipoprotein) signal peptides can be filtered out using the search tool provided.

Entries excluded from EMBL:
-Entries belonging to the following data groups: EST (Expressed Sequence Tags), GSS (Genome Survey Sequences), HTG (High-Throughput Genome sequences, i.e. unfinished DNA sequences generated by high-throughput sequencing), PAT (Patent sequences), STS (Sequence Tagged Sites), synthetic sequences, contig sequences and unclassified sequences.
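The EMBL selection step (keep the listed taxonomic categories, drop the excluded data groups, then attach the remaining records to the filtered Swiss-Prot entries) can be sketched as follows. This is a sketch under stated assumptions. The category and group labels are the plain names used in the text, not official EMBL division codes. `EmblRecord` and its `protein_ids` cross-reference field are hypothetical, and the accessions in the usage example are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

INCLUDED_CATEGORIES = {
    "fungi", "human", "invertebrate", "mouse", "organelle", "bacteriophage",
    "plant", "prokaryote", "rodent", "viral", "mammals", "vertebrate",
}
EXCLUDED_GROUPS = {
    "EST", "GSS", "HTG", "PAT", "STS", "synthetic", "contig", "unclassified",
}


@dataclass
class EmblRecord:
    accession: str
    category: str           # taxonomic category, named as in the text
    data_group: str         # e.g. "standard" or "EST" (labels assumed)
    protein_ids: List[str]  # Swiss-Prot accessions the record points to (assumed)


def keep_embl_record(record: EmblRecord) -> bool:
    return (record.category in INCLUDED_CATEGORIES
            and record.data_group not in EXCLUDED_GROUPS)


def link_to_swissprot(records: Iterable[EmblRecord],
                      filtered_accessions: Iterable[str]) -> Dict[str, List[str]]:
    """Map each filtered Swiss-Prot accession to the EMBL records kept for it."""
    links: Dict[str, List[str]] = {acc: [] for acc in filtered_accessions}
    for record in records:
        if not keep_embl_record(record):
            continue
        for acc in record.protein_ids:
            if acc in links:
                links[acc].append(record.accession)
    return links


records = [
    EmblRecord("EMB1", "human", "standard", ["SP1"]),
    EmblRecord("EMB2", "human", "EST", ["SP1"]),        # dropped: EST group
    EmblRecord("EMB3", "plant", "standard", ["SP2", "SP9"]),
]
links = link_to_swissprot(records, ["SP1", "SP2", "SP3"])
```

Here `links` maps `SP1` to `["EMB1"]`, `SP2` to `["EMB3"]` and `SP3` to an empty list. `SP9` is ignored because it is not in the filtered Swiss-Prot set.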

Entries with a complemented CDS, entries with uncertain CDS boundaries (indicated by '<' or '>' in the location range), and entries listing more than one CDS are all flagged for manual curation.
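The three triggers for manual curation can be checked directly on the CDS location strings of an entry, which use the EMBL feature-location syntax (e.g. `complement(120..560)` or `<1..450`). The sketch below is illustrative, not SPdb's code: the function name is hypothetical, and the location strings are assumed to have already been extracted from the entry's CDS features.

```python
from typing import List


def curation_flags(cds_locations: List[str]) -> List[str]:
    """Return the reasons, if any, for sending an entry to manual curation.

    `cds_locations` holds the raw location strings of the entry's CDS
    features; an empty result means no flag was raised.
    """
    flags: List[str] = []
    if len(cds_locations) > 1:
        flags.append("more than one CDS")
    for location in cds_locations:
        if "complement(" in location:
            flags.append(f"complemented CDS: {location}")
        if "<" in location or ">" in location:
            flags.append(f"uncertain boundary: {location}")
    return flags
```

For example, `curation_flags(["100..400"])` returns an empty list, while `curation_flags(["<1..300", "500..900"])` reports both the extra CDS and the uncertain start.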

The experimentally verified entries reported by Zhang & Henzel were checked manually to ensure that those available in Swiss-Prot are annotated and included in SPdb.
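The check itself was manual, but the bookkeeping behind it comes down to a set comparison. The sketch below is a hypothetical helper showing that comparison; the accession sets are assumed inputs, and the values in the test are placeholders.

```python
from typing import Iterable, Set


def missing_from_spdb(reference_accessions: Iterable[str],
                      swissprot_accessions: Set[str],
                      spdb_accessions: Set[str]) -> Set[str]:
    """Accessions from a reference list that are in Swiss-Prot but not in SPdb.

    Reference entries absent from Swiss-Prot are skipped, matching the
    "if available from Swiss-Prot" condition.
    """
    in_swissprot = set(reference_accessions) & swissprot_accessions
    return in_swissprot - spdb_accessions
```

Any accession this returns would be a candidate for closer manual inspection.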

NOTE:
Abbreviations used: FT = Feature Table; KW = Keyword; DE = Description.

For the full documentation, please refer to the user manual of Swiss-Prot.


For more information on this site/database, please contact us.

 

© Copyright 2005-2009 Department of Biochemistry, National University of Singapore. All Rights Reserved.