De Novo Sequencing

The database-dependent peptide identification strategy does not allow the identification of unexpected PTMs or unusual peptides that are not represented in protein databases. In some cases it may therefore be necessary to extract an amino acid sequence from an MS/MS spectrum in a fully database-independent manner. This is referred to as de novo peptide sequencing (Figure 3.2). Manual de novo sequencing is a tedious task that requires expertise and patience, and several computer programs have therefore been developed that automate the derivation of a peptide's amino acid sequence. All de novo sequencing tools rely exclusively on the information in the MS/MS spectrum to derive an amino acid sequence. Several tools are now available and provide reliable de novo sequencing results for high-quality spectra [22-29]. All tools, however, are limited by inherent properties of MS/MS spectra, which are often characterized by inaccurate mass measurements, missing peaks (gaps), and noise. Current software tools provide an estimate of the probability that the extracted amino acid sequence is correct and may additionally assign probabilities to sequence substrings. The best-performing tools use probabilistic approaches and include, for example, PepNovo [28], PEAKS [24], and NovoHMM [27]. PepNovo is freely available and uses a probabilistic network whose structure reflects the physicochemical characteristics of peptide fragmentation in CID. PEAKS is a commercial software package that computes the best possible peptide sequence for an MS/MS spectrum and provides confidence scores for the individual amino acids in the sequence. NovoHMM generates a hidden Markov model (HMM) that calculates emission probabilities for the suggested amino acid sequence from the observed spectrum. The last few years have seen significant improvements in MS methods and particularly in software tools for MS/MS spectrum analysis.
These developments have paved the way for the acquisition of reliable, high-quality peptide sequences in a database-independent way, which in turn makes possible the proteome analysis of organisms with unsequenced genomes.
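The core idea of de novo sequencing, reading a sequence from the mass differences between consecutive fragment peaks, can be illustrated with a minimal Python sketch. It is not the algorithm used by PepNovo, PEAKS, or NovoHMM, which score candidates probabilistically. The sketch assumes a clean, singly charged b-ion ladder with no noise peaks; the function names b_ion_ladder and read_ladder, the tolerance of 0.02 Da, and the reduced residue table are illustrative choices, not taken from any of the cited tools.

```python
# Monoisotopic residue masses in Da (standard values, subset for brevity).
# "L" stands for Leu or Ile, which have identical mass and cannot be
# distinguished from mass differences alone.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333,
}
PROTON = 1.00728  # mass of the charge-carrying proton in Da


def b_ion_ladder(peptide):
    """Theoretical singly charged b-ion m/z values (helper for the demo)."""
    total, ladder = PROTON, []
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder


def read_ladder(peaks, tol=0.02):
    """Translate consecutive peak differences into residues.

    A difference that matches no single residue within `tol` is reported
    as a gap in the form [mass], mimicking a missing peak.
    """
    sequence = []
    previous = PROTON  # virtual b0 ion: no residues, one proton
    for mz in sorted(peaks):
        delta = mz - previous
        # Pick the residue whose mass is closest to the observed difference.
        aa, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - delta))
        sequence.append(aa if abs(mass - delta) <= tol else f"[{delta:.2f}]")
        previous = mz
    return "".join(sequence)


ladder = b_ion_ladder("PEPTLDE")
print(read_ladder(ladder))                   # complete ladder
print(read_ladder(ladder[:1] + ladder[2:]))  # second peak missing
```

With the complete ladder the sketch recovers the full sequence; with one peak removed it can only report the combined mass of the two residues spanning the gap. This is the kind of ambiguity that motivates the probabilistic scoring described above.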

