Structural prediction


PSAAM

We have developed a set of programs for Protein Sequence Analysis And Modelling (PSAAM). The package contains a main program (PSAAM) and several ancillary programs: INFORMAT, which calculates mean sequence profiles and the mutability moment (conservation moment) from aligned sequences; FINDSEQ, which finds sequence strings (structural motifs, binding domains, etc.) in a sequence file or in a directory of sequence files; and GENBANK, which extracts sequence data from a file saved from Entrez.
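The kind of search FINDSEQ performs can be illustrated with a short sketch. The pattern syntax FINDSEQ accepts is not described here, so the example below assumes a regular-expression motif, and the function name and the test sequence are hypothetical.

```python
import re

def find_motif(sequence, motif):
    """Return the 1-based start position of every match, including overlapping ones."""
    # Wrapping the motif in a lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=(" + motif + "))")
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]

# Hypothetical example: the N-glycosylation sequon N-x-[S/T], where x is not Pro.
print(find_motif("MKNVSAANPTQNGT", "N[^P][ST]"))  # prints [3, 12]
```

Reporting 1-based positions follows the usual convention for numbering residues in a protein sequence.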

The main PSAAM program includes features for analysis of structural information from sequence data: cartoons for graphical representation of secondary structures; prediction programs and prediction aids; a ribosome function, which generates coordinate sets in PDB format (including Header information) for viewing by molecular graphics packages (for examples, see also PDVWIN); and a SeqPlot function for plotting secondary structural models (helical cylinders and wheels, coils, etc.). In SeqPlot, each residue is indicated by a circle, color-coded (or coded by line thickness) according to the current physico-chemical index. This function can be used to make "helical cylinder models".
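The geometry behind a helical wheel plot can be sketched in a few lines. The example assumes an ideal alpha-helix with 3.6 residues per turn (100 degrees per residue); the function name is hypothetical and this is not the SeqPlot implementation.

```python
import math

DEGREES_PER_RESIDUE = 100.0  # ideal alpha-helix: 360 degrees / 3.6 residues per turn

def helical_wheel(sequence, radius=1.0):
    """Return (residue, angle_deg, x, y) for each residue projected down the helix axis."""
    points = []
    for i, residue in enumerate(sequence):
        angle = (i * DEGREES_PER_RESIDUE) % 360.0
        theta = math.radians(angle)
        points.append((residue, angle, radius * math.cos(theta), radius * math.sin(theta)))
    return points

for residue, angle, x, y in helical_wheel("ALKE"):
    print(residue, angle, round(x, 3), round(y, 3))
```

With this step angle, residue 19 falls on the same wheel position as residue 1, so a wheel holds 18 distinct positions.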

Analysis of sequences is through generation of profiles. The program calculates and plots the following types of profile:

The Profiles Window also contains a structural cartoon showing the current secondary structural model, and allows editing of the structural model.
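The profile types calculated by PSAAM are not listed here. As a generic illustration, a sequence profile can be computed as a sliding-window average of a per-residue index. The sketch below assumes the Kyte-Doolittle hydropathy scale as the index; the function name and the window width are hypothetical and are not taken from PSAAM.

```python
# Kyte-Doolittle hydropathy values, keyed by one-letter residue code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_profile(sequence, index, window=7):
    """Average the index over each full window; returns len(sequence) - window + 1 values."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [index[aa] for aa in sequence]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

profile = window_profile("ILVKDEAGLLIV", KYTE_DOOLITTLE, window=5)
print([round(v, 2) for v in profile])
```

Each value describes the window starting at that position; plotting it against the position of the window's central residue gives the familiar profile curve.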

The software package PSAAM is available for download as a ZIP file:

Download PSAAM here. All necessary files are contained in psaam.zip. Read the Readme.txt file for installation details. Read lpsagree.txt for details on registration.

For further information, see the PSAAM Manual.

Walsh-Crofts Structural Propensity Indices

A set of indices has been generated that are probability parameters of the type calculated by Chou and Fasman, but based on a new classification of the structural database (Laura Walsh and Antony Crofts, in preparation). The new classification takes into account the fact that buried structures differ in composition from surface structures.
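A probability parameter of the Chou-Fasman type is the frequency of a residue within a structural class divided by its frequency in the whole data set. The sketch below assumes that definition; the exact normalization used for the Walsh-Crofts indices is not stated here. The function name is hypothetical, and the count of 544 alanines in surface helices is an illustrative number chosen to be consistent with the table on this page, not a value from the source. The other three numbers are taken from that table.

```python
def propensity(n_aa_in_class, n_aa_total, n_class_total, n_total):
    """Chou-Fasman-style propensity of one residue type for one structural class."""
    freq_in_class = n_aa_in_class / n_class_total  # share of the class made up by this residue
    freq_overall = n_aa_total / n_total            # share of the whole data set
    return freq_in_class / freq_overall

# 1713 Ala in total, 4124 surface-helix residues, 19225 residues overall (from the table);
# 544 Ala in surface helices is an assumed, illustrative count.
print(round(propensity(544, 1713, 4124, 19225), 2))  # prints 1.48
```

A value above 1 means the residue is over-represented in the class; a value below 1 means it is under-represented.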

Files in the Brookhaven Protein Data Bank for protein structures with a resolution better than 2.5 angstroms were analyzed using a modified Kabsch and Sander program. The secondary structures were classified as surface, buried, exposed or indeterminate, using an algorithm in which the axis of the structure was determined and the vector of exposure to the aqueous phase was calculated along the axis at each atom in the polypeptide backbone. This made it possible to recognize amphipathic structures. Buried structures could be readily identified as those that showed no amphipathic character and no exposure. Structures were classified as exposed if they showed no amphipathy but a high exposure, and as indeterminate if they did not fall into one of the other three classes.
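The four-way classification can be sketched as follows. This is a simplified stand-in, not the Walsh-Crofts algorithm: it assumes one relative exposure value per residue (0.0 to 1.0) rather than per backbone atom, treats amphipathic structures as the surface class, places residues at a fixed angular step around the axis (100 degrees, as in an ideal alpha-helix), and uses cutoff values that are purely hypothetical.

```python
import math

def classify_segment(exposures, degrees_per_residue=100.0,
                     amphipathy_cutoff=0.4, low_exposure=0.15, high_exposure=0.5):
    """Classify a segment as surface, buried, exposed or indeterminate."""
    if not exposures:
        raise ValueError("segment is empty")
    total = sum(exposures)
    mean_exposure = total / len(exposures)
    # Add up exposure vectors pointing outward from the axis at each residue's
    # angular position; a large resultant means the exposure is one-sided.
    x = sum(e * math.cos(math.radians(i * degrees_per_residue))
            for i, e in enumerate(exposures))
    y = sum(e * math.sin(math.radians(i * degrees_per_residue))
            for i, e in enumerate(exposures))
    amphipathy = math.hypot(x, y) / total if total > 0 else 0.0
    amphipathic = amphipathy >= amphipathy_cutoff
    if amphipathic and mean_exposure > low_exposure:
        return "surface"
    if not amphipathic and mean_exposure <= low_exposure:
        return "buried"
    if not amphipathic and mean_exposure >= high_exposure:
        return "exposed"
    return "indeterminate"

# One face of an 18-residue helix exposed, the opposite face shielded.
one_sided = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0]
print(classify_segment(one_sided))   # prints surface
print(classify_segment([0.0] * 18))  # prints buried
```

Normalizing the resultant by the total exposure keeps the amphipathy measure between 0 and 1, so a single cutoff can be applied to segments of different lengths.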

This new classification, and the new set of probability parameters, address a deficiency in the classical approach to structural prediction, in which the structural classification provided in the preamble to the data set in the .PDB files is used as the "target" for calculation of probability parameters. The classical approach fails to recognize that the partitioning of residue types into secondary structures is strongly modulated by location in the protein: buried structures are hydrophobic and exposed residues hydrophilic, so the probability parameters for surface and buried secondary structures of the same type are quite different.

Normalized Probability Coefficients for distribution of amino acids into secondary structures

Amino     AA  Structural potential factors
acid  counts  {-------Sheet--------}  {-------Helix--------}  {----Turns-----}  Hbnd  Coil
              surf  burd  udif  expd  udif  surf  burd  expd  3-10  -Hbd  H-bd  resd  resd
Ala     1713  0.68  0.84  0.82  0.86  1.76  1.48  1.70  1.87  1.14  0.62  0.88  0.52  0.87
Arg      669  1.34  0.19  1.21  0.34  1.12  1.10  0.86  0.85  0.96  1.14  1.04  1.09  0.91
Asn      924  0.63  0.40  0.46  0.49  1.11  0.91  0.69  1.84  1.11  1.30  1.44  1.88  1.11
Asp     1047  0.43  0.30  0.64  0.54  1.18  0.92  0.39  1.98  1.27  1.44  1.19  1.23  1.36
Cys      512  0.85  1.77  1.58  0.22  1.07  0.75  1.52  0.74  1.22  0.44  0.58  1.20  1.40
Glu      910  1.00  0.22  0.50  0.62  1.20  1.55  0.59  1.04  1.61  0.82  1.00  0.62  0.79
Gln      680  1.21  0.52  1.34  1.00  0.40  1.32  0.35  1.11  1.04  0.94  0.94  0.66  0.88
Gly     1733  0.52  1.07  0.59  0.85  0.20  0.39  0.82  0.33  0.59  1.96  2.27  0.78  0.93
His      469  0.67  0.60  1.27  1.21  1.60  1.21  0.79  1.21  1.24  1.12  0.73  0.84  1.11
Ile      942  1.55  2.08  1.94  1.92  1.16  0.99  1.15  0.40  0.64  0.51  0.30  1.31  0.86
Leu     1497  1.11  1.55  1.10  0.23  1.19  1.31  2.10  0.88  0.73  0.50  0.59  0.56  0.88
Lys     1163  1.08  0.30  0.24  1.46  0.71  1.42  0.32  0.97  1.18  1.04  1.09  1.59  0.80
Met      313  1.16  1.85  1.57  3.25  1.53  1.12  1.62  2.41  0.51  0.58  0.54  1.07  0.76
Phe      694  1.22  1.42  1.52  0.81  1.18  1.12  1.51  0.81  1.04  0.79  0.67  0.73  0.79
Pro      866  0.51  0.24  0.57  0.39  0.40  0.53  0.39  0.22  1.63  1.26  1.51  0.45  1.70
Ser     1464  1.06  0.84  0.65  1.62  0.47  0.59  0.72  0.90  1.28  1.38  1.29  0.88  1.13
Thr     1201  1.48  0.89  1.05  1.22  0.68  0.76  0.93  0.94  0.64  1.14  0.74  1.12  1.18
Trp      299  0.91  1.51  1.65  1.89  1.60  1.15  0.79  0.63  1.21  1.43  0.59  0.94  0.62
Tyr      696  1.16  1.42  1.92  1.62  1.47  0.75  0.49  1.35  1.30  0.71  0.69  2.01  1.07
Val     1433  1.60  2.26  1.59  1.26  1.29  1.01  1.49  0.39  0.44  0.41  0.36  1.21  0.83

Totals        2335  1361   547   170   281  4124   569   102   956  1797  2519   343  4121
(% of 19225)  12.1   7.1   2.8   0.9   1.5  21.5   3.0   0.5   5.0   9.3  13.1   1.8  21.4

The Walsh-Crofts indices are available as part of the PSAAM package.
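As an illustration of how the surface and buried parameters differ in practice, the sketch below averages the two helix columns over a short segment and reports which class scores higher. The values are copied from the Helix "surf" and "burd" columns of the table above, assuming the column order is as printed in the header. Averaging over a segment is one simple way to use such parameters and is not necessarily how PSAAM applies them; the function names and the test peptides are hypothetical.

```python
# (surface helix, buried helix) values from the table, keyed by one-letter code.
HELIX = {
    "A": (1.48, 1.70), "R": (1.10, 0.86), "N": (0.91, 0.69), "D": (0.92, 0.39),
    "C": (0.75, 1.52), "E": (1.55, 0.59), "Q": (1.32, 0.35), "G": (0.39, 0.82),
    "H": (1.21, 0.79), "I": (0.99, 1.15), "L": (1.31, 2.10), "K": (1.42, 0.32),
    "M": (1.12, 1.62), "F": (1.12, 1.51), "P": (0.53, 0.39), "S": (0.59, 0.72),
    "T": (0.76, 0.93), "W": (1.15, 0.79), "Y": (0.75, 0.49), "V": (1.01, 1.49),
}

def mean_helix_propensities(segment):
    """Return (mean surface-helix value, mean buried-helix value) for a segment."""
    surface = sum(HELIX[aa][0] for aa in segment) / len(segment)
    buried = sum(HELIX[aa][1] for aa in segment) / len(segment)
    return surface, buried

def favoured_helix_class(segment):
    """Name the helix class with the higher mean value (ties go to buried)."""
    surface, buried = mean_helix_propensities(segment)
    return "surface" if surface > buried else "buried"

for peptide in ("LLAVLLA", "EKKEELK"):
    surface, buried = mean_helix_propensities(peptide)
    print(peptide, round(surface, 2), round(buried, 2), favoured_helix_class(peptide))
```

The hydrophobic peptide scores higher on the buried column and the charged peptide on the surface column, which is the distinction a single undivided helix parameter cannot make.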