We have developed a set of programs to aid in Protein Sequence Analysis And Modelling (PSAAM). The package contains a main program (PSAAM) and several ancillary programs: INFORMAT, which calculates mean sequence profiles and the mutability moment (conservation moment) from aligned sequences; FINDSEQ, which finds sequence strings (structural motifs, binding domains, etc.) in a sequence file or a directory of sequence files; and GENBANK, which extracts sequence data from a file saved from Entrez.
The main PSAAM program includes many useful features for the analysis of structural information from sequence data: cartoons for graphical representation of secondary structures; prediction programs and prediction aids; a ribosome function, which generates coordinate sets in PDB format (including Header information) for viewing by molecular graphics packages (for examples, see also PDVWIN); and a SeqPlot function for plotting secondary structural models (helical cylinders and wheels, coils, etc.), with each residue indicated by a circle that is color-coded (or coded by line thickness) according to the current physico-chemical index. The SeqPlot function can be used to make "helical cylinder models".
Analysis of sequences is through generation of profiles. The program calculates and plots the following types of profile:
The software package PSAAM is available for download as a ZIP file:
Download PSAAM here. All necessary files are contained in psaam.zip. Read the Readme.txt file for installation details. Read lpsagree.txt for details on registration.
For further information, see the PSAAM Manual.
Files in the Brookhaven Protein Data Bank for protein structures with a resolution better than 2.5 angstroms were analyzed using a modified Kabsch and Sander program. The secondary structures were classified as surface, buried, exposed or indeterminate, using an algorithm in which the axis of the structure was determined and the vector of exposure to the aqueous phase was calculated along the axis at each atom in the polypeptide backbone. This made it possible to recognize amphipathic structures. Buried structures could be readily identified as those which showed neither amphipathic character nor exposure; structures were classified as exposed if they showed no amphipathy but a high exposure; and structures that did not fall into one of the other three classes were classified as indeterminate.
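The decision rules in the paragraph above can be sketched as a small function. This is only an illustration: the text gives no numeric cut-offs, so the three thresholds below are hypothetical, as are the function name and the idea of summarizing each structure by a single amphipathy score and a single mean exposure value. Treating amphipathic structures as the "surface" class is also a reading of the text, not a documented rule.

```python
# Hypothetical cut-offs on a 0..1 scale; the source gives no numeric values.
AMPHIPATHY_CUTOFF = 0.5   # at or above this, the structure counts as amphipathic
EXPOSURE_LOW = 0.2        # at or below this, the structure counts as not exposed
EXPOSURE_HIGH = 0.6       # at or above this, the structure counts as highly exposed


def classify_structure(amphipathy: float, mean_exposure: float) -> str:
    """Assign a secondary structure element to one of the four classes.

    amphipathy    -- assumed score for how one-sided the exposure is around the axis
    mean_exposure -- assumed average exposure to the aqueous phase along the axis
    """
    if amphipathy >= AMPHIPATHY_CUTOFF:
        return "surface"        # amphipathic: one face exposed, one face buried
    if mean_exposure <= EXPOSURE_LOW:
        return "buried"         # no amphipathic character and no exposure
    if mean_exposure >= EXPOSURE_HIGH:
        return "exposed"        # no amphipathy, but a high exposure
    return "indeterminate"      # fits none of the other three classes


if __name__ == "__main__":
    for scores in [(0.8, 0.4), (0.1, 0.05), (0.1, 0.9), (0.1, 0.4)]:
        print(scores, classify_structure(*scores))
```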
This new classification, and the new set of probability parameters, address a deficiency in the classical approach to structural prediction (in which the structural classification provided in the preamble to the data set in the .PDB files is used as the "target" for calculation of probability parameters). The classical approach fails to recognize that the partitioning of residue type to secondary structure is strongly modulated by the location in the protein (buried structures are hydrophobic, exposed residues hydrophilic), so that the probability parameters for surface and buried secondary structures of the same type are quite different.
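The text does not state how the probability parameters were computed. A common convention for such parameters (used, for example, in Chou-Fasman-style propensities) is the frequency of a residue within a structural class divided by its frequency in the whole data set, so that values above 1 mean the residue is over-represented in that class. The sketch below assumes that convention; the function name and the toy observations are made up for illustration and are not PSAAM data.

```python
from collections import Counter


def potential_factors(observations):
    """Return {(residue, structure_class): factor} from (residue, class) pairs.

    factor = (fraction of the class made up of this residue)
             / (fraction of the whole data set made up of this residue)
    This normalization is an assumption, not the documented PSAAM method.
    """
    observations = list(observations)
    total = len(observations)
    pair_counts = Counter(observations)
    residue_counts = Counter(res for res, _ in observations)
    class_counts = Counter(cls for _, cls in observations)

    factors = {}
    for (res, cls), n in pair_counts.items():
        freq_in_class = n / class_counts[cls]
        freq_overall = residue_counts[res] / total
        factors[(res, cls)] = freq_in_class / freq_overall
    return factors


if __name__ == "__main__":
    # Toy data: Leu favours the buried helix, Lys the surface helix.
    toy = (
        [("Leu", "buried helix")] * 3 + [("Leu", "surface helix")] * 1
        + [("Lys", "buried helix")] * 1 + [("Lys", "surface helix")] * 3
    )
    for key, value in sorted(potential_factors(toy).items()):
        print(key, round(value, 2))
```

Keeping surface and buried structures as separate classes is what lets the two factors for the same residue and the same structure type diverge, which is the point made in the paragraph above.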
Amino            AA  Structural potential factors
acid         counts
                     {-------Sheet--------}  {-------Helix--------}  {----Turns-----}  Hbnd  Coil
                     surf  burd  udif  expd  udif  surf  burd  expd  3-10  -Hbd  H-bd  resd  resd
Ala            1713  0.68  0.84  0.82  0.86  1.76  1.48  1.70  1.87  1.14  0.62  0.88  0.52  0.87
Arg             669  1.34  0.19  1.21  0.34  1.12  1.10  0.86  0.85  0.96  1.14  1.04  1.09  0.91
Asn             924  0.63  0.40  0.46  0.49  1.11  0.91  0.69  1.84  1.11  1.30  1.44  1.88  1.11
Asp            1047  0.43  0.30  0.64  0.54  1.18  0.92  0.39  1.98  1.27  1.44  1.19  1.23  1.36
Cys             512  0.85  1.77  1.58  0.22  1.07  0.75  1.52  0.74  1.22  0.44  0.58  1.20  1.40
Glu             910  1.00  0.22  0.50  0.62  1.20  1.55  0.59  1.04  1.61  0.82  1.00  0.62  0.79
Gln             680  1.21  0.52  1.34  1.00  0.40  1.32  0.35  1.11  1.04  0.94  0.94  0.66  0.88
Gly            1733  0.52  1.07  0.59  0.85  0.20  0.39  0.82  0.33  0.59  1.96  2.27  0.78  0.93
His             469  0.67  0.60  1.27  1.21  1.60  1.21  0.79  1.21  1.24  1.12  0.73  0.84  1.11
Ile             942  1.55  2.08  1.94  1.92  1.16  0.99  1.15  0.40  0.64  0.51  0.30  1.31  0.86
Leu            1497  1.11  1.55  1.10  0.23  1.19  1.31  2.10  0.88  0.73  0.50  0.59  0.56  0.88
Lys            1163  1.08  0.30  0.24  1.46  0.71  1.42  0.32  0.97  1.18  1.04  1.09  1.59  0.80
Met             313  1.16  1.85  1.57  3.25  1.53  1.12  1.62  2.41  0.51  0.58  0.54  1.07  0.76
Phe             694  1.22  1.42  1.52  0.81  1.18  1.12  1.51  0.81  1.04  0.79  0.67  0.73  0.79
Pro             866  0.51  0.24  0.57  0.39  0.40  0.53  0.39  0.22  1.63  1.26  1.51  0.45  1.70
Ser            1464  1.06  0.84  0.65  1.62  0.47  0.59  0.72  0.90  1.28  1.38  1.29  0.88  1.13
Thr            1201  1.48  0.89  1.05  1.22  0.68  0.76  0.93  0.94  0.64  1.14  0.74  1.12  1.18
Trp             299  0.91  1.51  1.65  1.89  1.60  1.15  0.79  0.63  1.21  1.43  0.59  0.94  0.62
Tyr             696  1.16  1.42  1.92  1.62  1.47  0.75  0.49  1.35  1.30  0.71  0.69  2.01  1.07
Val            1433  1.60  2.26  1.59  1.26  1.29  1.01  1.49  0.39  0.44  0.41  0.36  1.21  0.83
Totals               2335  1361   547   170   281  4124   569   102   956  1797  2519   343  4121
(% of 19225)         12.1   7.1   2.8   0.9   1.5  21.5   3.0   0.5   5.0   9.3  13.1   1.8  21.4
The Walsh-Crofts indices are available as part of the PSAAM package.
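As an illustration of how one column of the table could be used to generate a sequence profile, the sketch below averages a per-residue index over a sliding window. The values are copied from the "burd" column of the Helix group in the table above and keyed by one-letter code. The moving-average smoothing, the window length, and the function name are assumptions; the source does not describe how PSAAM itself computes or smooths its profiles.

```python
# Helix "burd" (buried helix) column from the table above, by one-letter code.
BURIED_HELIX = {
    "A": 1.70, "R": 0.86, "N": 0.69, "D": 0.39, "C": 1.52,
    "E": 0.59, "Q": 0.35, "G": 0.82, "H": 0.79, "I": 1.15,
    "L": 2.10, "K": 0.32, "M": 1.62, "F": 1.51, "P": 0.39,
    "S": 0.72, "T": 0.93, "W": 0.79, "Y": 0.49, "V": 1.49,
}


def profile(sequence, index, window=7):
    """Return the moving average of `index` along `sequence`.

    One value is produced per full window, so the result has
    len(sequence) - window + 1 entries. The window length is an
    assumed default, not a PSAAM setting.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [index[residue] for residue in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Made-up test sequence: a leucine-rich stretch followed by a lysine-rich one.
    for position, value in enumerate(profile("LLLLALLKKKEKKK", BURIED_HELIX, window=5)):
        print(position, round(value, 2))
```

A peak in such a profile marks a stretch whose composition resembles that of buried helices in the data set; running the same sequence against the surface helix column would give a second profile for comparison.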