Introduction

This chapter introduces the fundamentals of proteomic MS data processing. The increase in sensitivity, speed, and availability of high-performance proteomics technologies gives researchers the possibility to generate large amounts of data. To provide useful scientific insight, these data must be properly processed, although careful manual validation of each result is no longer possible at this scale. Consequently, software development, databases, and mathematical modeling play an important role in today's data interpretation. Besides a general presentation of existing tools and well-established methods, we introduce basic concepts of data processing to give readers the opportunity to learn about the tools' underlying mechanisms. We consider bottom-up proteomics (analysis of enzymatic peptides) and MS/MS only; see Chapter 3 for a discussion of PMF, and Chapters 3 and 27 for intact protein analyses.

Plant Proteomics: Technologies, Strategies, and Applications. Edited by G. K. Agrawal and R. Rakwal Copyright © 2008 John Wiley & Sons, Inc.

9.2 DATABASE SEARCHING

Basic Notions

A natural way to identify proteins is by comparing experimental MS/MS data with protein sequences found in a database. We consider that the MS/MS data are already available as mass or peak lists—that is, as a sequence of experimental peptide masses and intensities accompanied by fragment masses and intensities as follows:

Peptide #1 mass, intensity
    fragment mass, intensity
    fragment mass, intensity
Peptide #2 mass, intensity
    fragment mass, intensity
    fragment mass, intensity
Peptide #3...
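As an illustration only, such a peak list could be held in memory as follows. This is a minimal sketch: the class name `Spectrum`, its field names, and the numeric values are assumptions made for this example and do not correspond to any particular file format or software package.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Spectrum:
    """One entry of the peak list: a peptide (precursor) peak and its fragments."""
    peptide_mass: float
    peptide_intensity: float
    # Each fragment is stored as a (mass, intensity) pair.
    fragments: List[Tuple[float, float]] = field(default_factory=list)


# A hypothetical peak list with two spectra (values are made up).
peak_list = [
    Spectrum(1007.49, 2500.0, [(147.11, 310.0), (333.19, 120.0)]),
    Spectrum(842.51, 1800.0, [(175.12, 95.0)]),
]

fragment_counts = [len(s.fragments) for s in peak_list]
```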

For convenience, we use the terms mass list and spectrum interchangeably. The principle of database searching is the following: Each database protein sequence is digested theoretically (e.g., trypsin cleaves after lysine or arginine unless they are followed by proline); the experimental data are searched for these theoretical peptide masses; if a matching experimental peptide mass is found, then a theoretical fragmentation spectrum is computed from the peptide sequence and compared with the experimental fragmentation data; a scoring function determines the correlation between experimental and theoretical masses. The theoretical spectrum is the set of all possible fragment masses for a given instrument type; these masses can be computed by applying simple rules to the peptide sequences. The search ends by reporting every matched peptide with its highest score and by grouping the peptide matches by protein (see Figure 9.1). Obviously, the quality of the scoring function plays an important role.
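The steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm of any specific search engine: the function names are hypothetical, only singly charged b and y ions are generated (one possible rule set for "a given instrument type"), no missed cleavages are considered, and a plain shared-peak count stands in for a real scoring function. Residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (aa in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def by_ions(peptide):
    """Singly charged b and y fragment m/z values (assumed ion types)."""
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(RESIDUE[aa] for aa in peptide[:i]) + PROTON)           # b_i
        ions.append(sum(RESIDUE[aa] for aa in peptide[i:]) + WATER + PROTON)   # y_(n-i)
    return ions


def shared_peak_count(experimental, theoretical, tol=0.5):
    """Toy score: number of theoretical fragments found within tol of a peak."""
    return sum(any(abs(e - t) <= tol for e in experimental) for t in theoretical)


peptides = tryptic_digest("AKRPGKMR")
```

A real scoring function would also account for peak intensities and for the chance of random matches; the shared-peak count is used here only because it is the simplest possible measure of agreement.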

In practice, an MS instrument does not measure masses but mass-to-charge ratios (m/z), and depending on the instrument resolution, the charge z may be unknown. In this case the search algorithm must consider all possible charges within a certain range (typically z = 1, 2, 3, 4 for an ESI instrument), so that each experimental m/z is treated as several possible masses.
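The conversion from one m/z value to several candidate masses can be sketched as follows, assuming positive-mode protonated ions so that M = z (m/z - m_proton). The function name and the example value are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da


def candidate_masses(mz, charges=(1, 2, 3, 4)):
    """Return the neutral mass implied by each possible charge state."""
    return {z: z * (mz - PROTON) for z in charges}


masses = candidate_masses(500.0)
```

Each of the resulting masses would then be compared with the theoretical peptide masses, which multiplies the number of candidate matches by the number of charge states considered.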

Protein modifications, if present, result in peptide modifications after digestion, which change amino acid masses and hence peptide and fragment masses. To search for modified peptides, it is necessary to modify the mass computations. A fixed modification (e.g., carboxyamidomethylation of cysteines) does not really require additional work, because one mass is simply substituted for another. In contrast, variable modifications require additional computations. For instance, we may want to treat Met oxidation as a variable modification; consequently, a peptide ATMIQWMK would yield three peptide masses (0, 1, or 2 oxidations) and four theoretical fragmentation spectra (fragment masses differ depending on which methionine is oxidized).
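The ATMIQWMK example can be checked by enumerating every subset of modifiable sites. The sketch below is illustrative only: the function name is hypothetical, and the oxidation mass shift of 15.99491 Da is the standard monoisotopic value for one added oxygen.

```python
from itertools import combinations

OXIDATION = 15.99491  # monoisotopic mass shift of one oxidation (Da)


def variable_mod_forms(peptide, residue="M", delta=OXIDATION):
    """List every modified form as (modified positions, total mass shift)."""
    sites = [i for i, aa in enumerate(peptide) if aa == residue]
    forms = []
    for k in range(len(sites) + 1):
        for chosen in combinations(sites, k):
            forms.append((chosen, k * delta))
    return forms


forms = variable_mod_forms("ATMIQWMK")
distinct_shifts = {round(shift, 5) for _, shift in forms}
```

Here `forms` has four entries (one per theoretical fragmentation spectrum) while `distinct_shifts` has three (one per peptide mass). With n modifiable sites the number of forms grows as 2^n, which is why variable modifications increase search time.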

[Figure 9.1 column labels: Database sequences; Theoretical peptides; Spectra]

FIGURE 9.1. Scheme of a database search. The database sequences are digested theoretically, and each peptide is matched with 0, 1, or more spectra. Each match is given a score (number along the arrows). Most of the database sequences have no matching peptide, and many spectra are not matched. It is possible that a peptide matches several spectra (last peptide of d4) or a spectrum matches several peptides (s4). The peptide with the highest score is retained for each spectrum, and the spectrum yielding the highest score is retained for each peptide. The retained matches are grouped by database sequence and reported at the end of the search (lower left).

The last stage of the database search algorithm is the grouping of peptide identifications into protein identifications. Although straightforward in principle, this final step requires some care because distinct proteins may share some peptides. Depending on which peptides were identified, it may not be possible to distinguish one protein from another on the basis of the detected peptides [1].
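One simple way to make this ambiguity explicit is to group proteins whose sets of identified peptides are identical. The sketch below is an assumption made for illustration and not the method of reference [1]; the function name, protein names, and peptide sequences are hypothetical.

```python
from collections import defaultdict


def group_indistinguishable(protein_peptides):
    """Group proteins supported by exactly the same identified peptides."""
    groups = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)
    return [sorted(members) for members in groups.values()]


identified = {
    "P1": {"AK", "MR"},
    "P2": {"MR", "AK"},   # same evidence as P1: cannot be told apart
    "P3": {"AK", "GGK"},  # shares AK with P1/P2 but also has a unique peptide
}
protein_groups = group_indistinguishable(identified)
```

This handles only the case of identical evidence; a protein whose peptides are a strict subset of another protein's peptides would need additional treatment.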
