Predicting the C-Terminal Amino Acid of a Peptide from MS/MS Data


Proteomics investigates the protein complement of a genome. It currently relies mainly on mass spectrometry (MS), the tool of choice for investigating proteins. Two computational approaches are widely used to derive the peptide sequence behind a tandem mass spectrum (MS/MS). Database search retrieves the sequence by matching the spectrum against all entries in a sequence database, whereas de novo sequencing does not depend on a sequence database. Both approaches generally benefit from knowing which enzyme was used to generate the measured peptides. Most algorithms default to trypsin because it is so widely used. Trypsin cleaves after arginine and lysine, so the C-terminal amino acid of a tryptic peptide is narrowed down to two candidates but is not known precisely. Furthermore, 90% of protein-terminal peptides may not end with either arginine or lysine and thus do not conform to the algorithm’s assumptions. Here, an algorithm named RKDecider is presented that sorts the C-terminal amino acid into one of three groups (arginine, lysine, or other). Although around 90% accuracy was achieved while mining spectra for rules that determine the C-terminal amino acid, RKDecider itself is somewhat less accurate, at about 80%.
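To make the three-group idea concrete, here is a minimal Python sketch. It does not reproduce RKDecider's mined rules, which are not described on this page. Instead it assumes a single, much simpler rule: look for the singly charged y1 ion of lysine (about m/z 147.11) or arginine (about m/z 175.12) within the fragment tolerance. The function names and the rule itself are illustrative assumptions.

```python
# Monoisotopic masses in Da (standard values).
PROTON = 1.007276
WATER = 18.010565
RESIDUE_MASS = {"K": 128.094963, "R": 156.101111}


def y1_mz(residue):
    """m/z of the singly charged y1 ion for a given C-terminal residue."""
    return RESIDUE_MASS[residue] + WATER + PROTON


def classify_c_terminus(peaks, fragment_tolerance=0.3):
    """Return 'R', 'K', or 'other' for a list of (m/z, intensity) peaks.

    Assumed rule for illustration only: choose the residue whose y1 ion
    is matched by the most intense peak within the tolerance. This is
    NOT RKDecider's actual rule set.
    """
    best_group, best_intensity = "other", 0.0
    for residue in ("R", "K"):
        target = y1_mz(residue)
        for mz, intensity in peaks:
            within = abs(mz - target) <= fragment_tolerance
            if within and intensity > best_intensity:
                best_group, best_intensity = residue, intensity
    return best_group
```

A spectrum with no peak near either y1 ion falls into the "other" group, which corresponds to the undecided or non-tryptic case.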



How To

RKDecider is a Java-based console application and therefore requires a recent installation of the Java Virtual Machine. It accepts only two parameters:

-s the absolute or relative path to the input file containing the MS/MS spectra.
-f the desired fragment tolerance for the analysis (default: 0.3).

Example usage: java -jar <RKDecider jar file> -s measurements.mgf
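When RKDecider is called from a script, the command line can be assembled programmatically. The following Python sketch builds the argument list from the two documented parameters. The jar path is a placeholder that must point to the downloaded file, and the sketch assumes that -f is followed by its value as a separate argument; the helper names are hypothetical.

```python
import subprocess


def build_command(jar_path, spectra_path, fragment_tolerance=0.3):
    """Assemble the java call; jar_path is a placeholder for the real jar."""
    return [
        "java", "-jar", str(jar_path),
        "-s", str(spectra_path),         # input file with the MS/MS spectra
        "-f", str(fragment_tolerance),   # assumed form: flag, then value
    ]


def run_rkdecider(jar_path, spectra_path, fragment_tolerance=0.3):
    """Run the tool and raise CalledProcessError on a non-zero exit code."""
    command = build_command(jar_path, spectra_path, fragment_tolerance)
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the path to the spectra file contains spaces.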

As a result, three output MGF files are created: one containing the spectra that probably stem from peptides terminating with arginine (R), one containing the spectra that probably stem from peptides terminating with lysine (K), and one containing the spectra that are undecided or result from non-tryptic cleavage.
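The three-way split can be pictured with a short Python sketch that groups the spectra of an MGF file by the outcome of a classifier. This is an illustration, not RKDecider's implementation: the function name split_mgf and the group labels are assumptions, the classifier is supplied by the caller, and the names of the files RKDecider writes are not stated here.

```python
def split_mgf(mgf_text, classify):
    """Group MGF spectra by classify(peaks) -> 'R', 'K', or 'other'.

    Returns a dict mapping each group to a list of spectrum blocks,
    each block being the text from BEGIN IONS to END IONS.
    """
    groups = {"R": [], "K": [], "other": []}
    block, peaks, inside = [], [], False
    for line in mgf_text.splitlines():
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            block, peaks, inside = [stripped], [], True
        elif stripped == "END IONS" and inside:
            block.append(stripped)
            groups[classify(peaks)].append("\n".join(block))
            inside = False
        elif inside and stripped:
            block.append(stripped)
            # Peak lines start with a digit; headers look like KEY=value.
            if stripped[0].isdigit():
                fields = stripped.split()
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                peaks.append((float(fields[0]), intensity))
    return groups
```

Each list in the returned dictionary can then be joined with blank lines and written to its own file, which yields one MGF file per group.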