pSAAM


Protein Sequence Analysis and Modelling FOR WINDOWS

COPYRIGHT, 1987-1994 UNIVERSITY OF ILLINOIS
BY A.R. CROFTS

INTRODUCTION

The program pSAAM for Windows© is derived from the program AMPHI, which is a component of the SEQANAL package written by A.R. Crofts to assist in analysis of protein sequences for information about secondary and tertiary structure(15). This new program expands considerably the options available for Protein Sequence Analysis and Modelling, and provides a more user-friendly environment. The program is menu driven and mouse activated. Most functions are fairly intuitive, and a series of on screen Help menus allows most options to be used easily without reference to the manual. This document contains all the information in the Help menus, and expanded information on installation, file format, and support packages.

pSAAM contains all the functions found in AMPHI, and will replace AMPHI in future versions of the SEQANAL package.

Several new features have been added.

The program brings to the Windows© environment a self-contained package for analysis of protein sequence data. Companion programs in the pSAAM package are INFORMAT and FINDSEQ. The programs have been written to allow compatibility with the file formats used by the ENTREZ interface with GenBank, and the MACAW program for sequence alignment, both provided by NCBI (National Center for Biotechnology Information). Together with pdViewer for Windows©, the program also provides a convenient tool for examination of protein structure and physical chemistry, and for structural prediction, and an environment for building two or three dimensional models of structures derived from analysis of sequence data, prediction, and comparison with known structures.

INSTALLATION

The program reads in data from several ancillary files at run-time. The location of these files is specified in the file psaam.ini. The program expects to find psaam.ini in the ROOT directory (c:\ if you are using the C: disk). If you want to use your own setup, you should modify the entries in this file to show the revised path(s), and make sure that psaam.ini is in the appropriate ROOT directory.

For a "standard" installation, proceed as follows:

Change directory to the ROOT directory on your C: disk

Make a sub-directory called C:\PSAAM

Mount the distribution diskette in an appropriate drive.

Copy all files to C:\PSAAM

The following is an alphabetical list of necessary files, but additional items might be included in updated versions.

  • aacharge.dat
  • acol.dat
  • allprfls.usr
  • cfpostrn.pfl
  • jmmsconv.dat
  • mpredict.dat
  • profiles.e3d
  • psaam.exe
  • psaam.ico
  • psaam.ini
  • stdposaa.crd
  • vbrun300.dll
You should also copy the files:

  • findseq.exe
  • findseq.ini
  • srchstrg.lib
  • informat.exe
  • psaaman.wri
  • fndsqman.wri
  • infrmman.wri
  • readme.txt
  • sysreq.txt
  • conditns.txt
  • genbank.exe
Copy psaam.ini to your ROOT directory (C:\), and copy vbrun300.dll to your WINDOWS\SYSTEM directory (C:\WINDOWS\SYSTEM).

    Return to Program Manager, and select a group window in which you want to show the program icon.

  • Click on File in the Program Manager Window
  • Click on New
  • Click on OK
    For Windows 3.0:

  • Choose a title to show with the icon (e.g. pSAAM), and type it in.
  • Program name is C:\psaam\psaam.exe; type in the box provided.
  • Click on Icon.
  • Type "C:\psaam\psaam.ico" as name of icon file.
  • Click on View Next.
  • Click on OK, Click on OK again to return to Program Manager.
    For Windows 3.1:

    Choose a title to show with the icon (e.g. pSAAM); type it in the Description text box.

    Type program name and path (C:\psaam\psaam.exe) in Command Line box

    In Working Directory box, type in the path and name of the directory where you keep your data files.

    Click on Change Icon

    Either type in c:\psaam\psaam.ico to use icon supplied, or use the Browse function to find that file, or an alternative icon if you prefer.

    Click on OK at each window to return to Program Manager

    Program should now be installed. Click on icon to run.

    Use a similar procedure to install findseq.exe (with findseq.ini in c:\), and informat.exe.

    The library file vbrun300.dll should be in the DOS path, conventionally in the \WINDOWS\SYSTEM directory. If Windows can't find the vbrun300.dll file, move it to the directory you specified in Working Directory, or change Working Directory to c:\windows\system, or to c:\psaam.

    Manuals

    Manuals are provided for each program as psaaman.wri, fndsqman.wri, and infrmman.wri.

    You can use the Write program (which comes with Windows) to read these .wri files as on-line manuals. If you have a printer, you can print the manuals out from Write.

    OVERVIEW OF PROGRAM OPERATIONS

    The program starts in a Profiles Window. Other windows for Cartoons, File I/O, Prediction Aids, Modelling, Ribosome, and Hard Copy are accessible from the Profiles window. Before using any of the functions, the program needs a sequence of amino acids. Sequence information, together with additional information about the structure or physical chemistry of the polypeptide chain, depending on the file format chosen, must first be read into the program from a file.

    Summary of Menu Options.

    Input and output of files is handled by a File I/O window which pops up when you click on Files in the menu bar. Sequence data can also be loaded from a text file (for example, a Gene Bank file), using a Gene Bank Conversion window. If you try to use a function that needs a sequence before reading data in, the program prompts you for a file.

    Profiles are generated, stored, unsaved, etc. through options which drop-down when you click on Profiles. The parameters used in calculations are set at initial default values, but can be readily changed using the drop-down lists accessed by clicking on the labelled list boxes.

    Clicking on Imported drops down a menu allowing import and use of sequence index files.

    Cartoons of structure, helical wheels and helical cylinders can be displayed through the Cartoons menu. A Structural Cartoon which appears in the Cartoon display area provides a visual summary of the present structure. Information at the sequence level can be shown in the Seq Info area of the window, and the structure edited through an interactive edit function.

    Click on Cls to Clear the screen, and reset arrays used to store data for printer plots.

    Prediction Aids brings up a menu of prediction aids, including a structural prediction module based on the Robson-Garnier Predict program, with membrane helix option.

    Hardcopy allows access to Plotting and Modelling routines, including Profile plots, Sequence Models, and Ribosome facility.

    Redraw toggles Autoredraw function.

    End closes all windows, and exits program.

    DATA INPUT AND OUTPUT
    File I/O
    Input and output of data files is handled through a File I/O window, which pops up when you click on Files under the Input/Output Menu in the Profiles Window, or elsewhere when appropriate.

    General overview

    All file operations require two steps:

    First you select a file name by using the Disk, Directory, File and Text boxes. For Input operations, use Disk, Directory and File selection options. When you have selected the appropriate directory, Double-Click on the file name to select a file. For Output operations requiring a new file name, use Disk and Directory selection options, and type the file name in the Text box. For selection of files for Output, click on OK to allow correct assignment of output file name to output operation. The filename appears in the text box to indicate that a file has been selected.

    When a file name has been selected, you next have to choose an input or output operation by selecting from the appropriate menu.

    Information about file formats is included in an Appendix.

    Input Menu

    Model

    Input a model. Models are stored as two one-line strings: first the sequence in single-letter code, then the structure in HETC code, each ending with <CR, LF>. The program also recognizes the more extensive single-letter code from structural analysis of secondary features proposed by Walsh and Crofts (see Appendix), but translates this into HETC code. These files may optionally have a first line containing the protein name, preceded by the &-character. If such a line is present, the protein name is stored, and used in the program. If not, the file name is used as the protein name.
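The Model layout described above can be sketched in a few lines of Python. This is an illustration only, not the code used by pSAAM: the function name parse_model is hypothetical, the check that sequence and structure have equal length is an assumption, and the extended Walsh and Crofts code is not handled.

```python
def parse_model(text, file_name="unknown"):
    """Parse Model-format text into (protein_name, sequence, structure).

    Hypothetical helper: handles plain HETC structure codes only.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name = file_name  # the file name is used when no &-line is present
    if lines and lines[0].startswith("&"):
        name = lines[0][1:].strip()
        lines = lines[1:]
    if len(lines) < 2:
        raise ValueError("expected a sequence line and a structure line")
    sequence, structure = lines[0].upper(), lines[1].upper()
    # Assumption: one structure code per residue.
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    bad = set(structure) - set("HETC")
    if bad:
        raise ValueError(f"non-HETC structure codes: {sorted(bad)}")
    return name, sequence, structure
```

For example, the text "&Demo", "MKTA", "HHCC" on three lines would be read as a protein named Demo with two helical and two coil residues.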

    Sequence

    Input a sequence. The program recognizes three formats, and deals with them automatically. All formats are for protein sequences. Nucleic acid sequences are not handled by the program. Formats accepted are:

    FASTA format. This is one of the options available for output from GeneBank. The first line (starting with >) is a comment line, and is ignored by the program; subsequent lines (till the end-of-file, or another >) contain the sequence in single letter code. Only the first sequence is used.

    NBRF/PIR format. This is the format of sequence files used by the Protein Information Resource. The first line (starting with >P1;) contains a protein name, and this is used for the protein name by pSAAM. The next line is a comment line, and is ignored by the program. Subsequent lines contain the sequence in single letter code, terminated by an asterisk (*). If more than one sequence is in the file, the program ignores all but the first.

    pSAAM format. The program expects a sequence as a one-line string of amino acid single-letter codes, ending with a <CR, LF>.
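The way the three formats can be told apart from the first line is sketched below in Python. This is an illustration under stated assumptions, not the pSAAM routine: read_sequence is a hypothetical name, using the file name as the protein name for FASTA and pSAAM files is an assumption, and so is the removal of spaces and other non-letter characters from the sequence.

```python
def read_sequence(text, file_name="unknown"):
    """Return (protein_name, sequence) from FASTA, NBRF/PIR or pSAAM text."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty file")
    first = lines[0].strip()
    if first.startswith(">P1;"):
        # NBRF/PIR: name on line 1, comment on line 2, sequence ends at '*'.
        name = first[4:].strip()
        seq = "".join(lines[2:]).split("*", 1)[0]
    elif first.startswith(">"):
        # FASTA: comment line ignored; read up to the next '>' or end of file.
        name = file_name  # assumption: FASTA files take the file name
        parts = []
        for ln in lines[1:]:
            if ln.startswith(">"):
                break
            parts.append(ln)
        seq = "".join(parts)
    else:
        # pSAAM: the whole sequence is on the first line.
        name = file_name
        seq = first
    # Assumption: keep letters only, in upper case.
    seq = "".join(ch for ch in seq if ch.isalpha()).upper()
    return name, seq
```

In each case only the first sequence in the file is returned, as described above.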

    Info Format files.

    Files in this format were produced by using the programs MCF or MPREDICT which were in the old SEQANAL package (15). These are prediction programs which generate a putative secondary structural model from sequence data. The files include some additional physico-chemical data. The Model files contain the same structural information in a more compact format, and the physico-chemical information can be simply generated within the program. They have superseded the Info format. Unless you want explicit access to the physico-chemical data, use the Model format. The current version of pSAAM does not handle Info files directly, but they can be accessed by using INF2MOD.EXE to convert Info files to Model format.

    Other

    Under this sub menu are options for input of:

    Seq Index

    Input a sequence index. Sequence index files are generated by the user, and contain one item of data on each line, and one line for each residue in the sequence. The number of values MUST match the number of residues in the sequence under analysis. Otherwise the program will issue a warning message, and allow you to try again. Sequence index files can be generated from sets of aligned sequences using the program INFORMAT, which scores a mutability index (10-12) using information theory (13). INFORMAT also allows you to calculate the mean hydropathy (or a mean profile for any other index in the ALLPRFLS.USR file) at each point in the sequence. The user can experiment with other types of data, or simply use this function to label residues (for example sites of mutation, liganding residues, etc.) by generating an appropriate Seq Index file using a text editor.
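A Seq Index file written with a text editor can be checked before import with a short script. The Python sketch below is an illustration only; the name read_seq_index is hypothetical, and treating every non-blank line as one numeric value is an assumption about the file layout.

```python
def read_seq_index(text, sequence):
    """Read one value per line and check the count against the sequence."""
    values = [float(ln) for ln in text.splitlines() if ln.strip()]
    if len(values) != len(sequence):
        raise ValueError(
            f"{len(values)} values for {len(sequence)} residues"
        )
    return values
```

A file with three lines (0.5, 1.0, -2) matches a three-residue sequence; with any other sequence length the function raises an error, which corresponds to the warning issued by the program.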

    Filename

    Input a filename. This option is used to select path and name for import of a file of phi and psi values for use in Ribosome (see Ribosome under HardCopy menu).

    Output Menu

    Type a file name in text box and click OK before using these options.

    Model Filename

    Output a file in Model format, using HETC code to represent the current structural model (see above). A dialogue box prompts you for a protein name to add as the first line of the file.

    Profile Filename

    Select a file name for output of current profile. The data for the current profile are filed as two columns, each row containing a residue number and the profile value. The profile name and parameters are appended.

    Save Plot Filename

    Select file name for output of a Plot (see SeqPlot under HardCopy menu)

    PDB Filename

    Rename output file for .PDB format file (used in RIBOSOME Window).

    Once an I/O operation has been selected, the program sets about its work, and informs you of progress. On completion, the File I/O window is closed, and the user is returned to the Profiles Window.

    Input from Text

    Sequence data is available in several formats, but the program requires data in the format of a one-line string of single letter code. The program can read sequence data from a text file, and convert it to the required format through the GeneBank to Pro facility, which is activated when you click on Input from Text under the Input/Output Menu in the Profiles Window. This option is particularly useful if you have access to the Gene Bank, and can down-load files of sequence information (hence the name of the window).

    The GeneBank to Pro conversion window has a file selection facility, a file information section, a menu bar, and a text window. Text from a selected file is displayed in the text window. The mouse is then used to select a portion of the text containing the sequence, and this section is then converted to the appropriate format, and loaded into the program, filed, or both. The conversion routine selects only those characters from the text which are alphabetical (either lower or upper case), and ignores all others.
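The letter filter used in the conversion can be sketched in Python as follows. This is an illustration, not the program's own routine; the name convert_selection is hypothetical, and restricting the filter to the ASCII letters A-Z and a-z is an assumption.

```python
def convert_selection(selected_text):
    """Keep only alphabetical characters from the highlighted text."""
    return "".join(
        ch for ch in selected_text if ch.isascii() and ch.isalpha()
    )
```

Applied to a numbered, space-separated line such as "        1 mktaalsllg vvaalaaspa", the function returns the residues as one unbroken string. Any stray word inside the selection would be kept as well, which is why the selection must contain no other letter text.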

    First you have to select a file. Use the drive, directory and file selection tools, and double-click on the file name to select it. The name of the selected file will then appear in the Selected File text box. Now click on Select Text, and the text will be displayed in the large text window. If the file is longer than about 600 lines, a message box will appear to warn you that the file can only be browsed in 600-line segments. You might prefer to use a text processor to edit the file into smaller chunks if it is more than a few segments long.

    The file must contain the sequence in text, but the format doesn't matter as long as no other letter text is embedded in the sequence (numbers, spaces, tabs, punctuation, etc. will be ignored). Next use the Arrow, Pg Up, Pg Dn keys, and the Scroll Bar to locate the text containing the sequence. Use the Mouse to highlight the sequence to be converted by clicking and dragging, then click on Convert in the window Menu Bar. The converted text is then displayed in the text window for preview. The program asks if you want to load the data. If you click OK, the sequence is loaded, default values for hydropathy, charge, etc. are assigned, and the structural model is set with all residues in random coil configuration (coded C). You can exit back to the main program at this point (click on Quit), to analyze the sequence, or you can file the sequence for later use. If you click on Cancel in response to the above query, the sequence data is not loaded, but is still available for filing. First click on OutFile Name to input a file name, then click on File It. You can then continue to browse through the file for additional sequences. However, the main analysis program can only deal with one sequence at a time.

    PROFILES

    Click on Profiles to see a drop-down menu allowing selection of various options for display, decoration, saving and unsaving of profiles. When this option is selected after loading a new sequence file, the program automatically displays a Structural Cartoon, and activates the Sequence Information and Structure Editing features (see Seq Info below).

    Profile parameters

    Profiles are calculated using parameters for index type, span, smoothing, and angle, and are displayed using parameters for color and scale. These parameters are set to default values at run-time, which are indicated in the text area to the left of the window. The values also appear in the titles of the drop-down lists below the menu bar. The drop-down lists are used to change current parameter values, and are accessed by clicking on the arrow. Select an item by clicking on the list. An updated parameter list appears in the text area, and in the list title.

    Index

    The set of values used to calculate the profile. The current index is selected from the Index List. Each index contains a set of twenty values, one for each of the common amino acids, describing some property (for example hydropathy, helical propensity, residue volume, etc.) on a common scale. On selection of an index, an internal array is set with the index value for each residue in the sequence under analysis. This array is then used in subsequent calculations. The Index List is loaded at run-time from the file allprfls.usr. This file can be edited to add new indices to the list (see Appendix).

    Span

    The span over which running averages are calculated. The mean value is calculated according to index selected, and stored in an array as the sequence index value for the central residue of the span.

    Angle

    Moment calculations use the Angle value to determine vector direction. The angle represents the rotation of the next residue about the secondary structure axis.

    Smooth

    Profiles are automatically smoothed if appropriate values are set for Smooth and Smooth Span (SmSpan). The algorithm used is as follows: a running average of values in the sequence index array is calculated over the number of residues selected in SmSpan, and stored in a temporary array. This is repeated over a number of cycles selected in Smooth. The smoothed values are transferred to the Display array. To skip auto-smoothing, set Smooth to 0, or SmSpan to 1.
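The smoothing cycle can be sketched in Python as below. This is an illustration only: the function name smooth is hypothetical, and the treatment of the ends of the sequence (averaging over whatever residues fall inside a shortened window) is an assumption, since the manual does not describe it.

```python
def smooth(values, cycles, sm_span):
    """Repeat a running average of width sm_span for the given cycles."""
    if cycles == 0 or sm_span == 1:
        return list(values)  # auto-smoothing skipped
    half = sm_span // 2
    current = list(values)
    for _ in range(cycles):
        temp = []  # temporary array for this cycle
        for i in range(len(current)):
            # Assumption: the window is truncated at the sequence ends.
            window = current[max(0, i - half): i + half + 1]
            temp.append(sum(window) / len(window))
        current = temp
    return current
```

One cycle with SmSpan 3 turns the isolated peak 0, 0, 3, 0, 0 into 0, 1, 1, 1, 0; further cycles spread and flatten it more.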

    Scale

    Profiles are displayed either at a scale automatically determined by the maximal and minimal values of the profile (Auto), or on a fixed scale determined by user input (Fixed). If fixed is selected, a dialogue box appears before display of the profile to query scale values. The present scale can be selected as the default.

    Color

    The color of the line used to draw the profile is set by selection from this drop-down list.

    Hydropathy

    Calculate and display a profile using a running average algorithm, and the currently selected parameters. This option is appropriate for indices with values ranging about zero, such as the free-energy (explicit or implicit) values used in hydropathy scales. Profiles are displayed with a reference line at 0. In some cases, where indicated by authors, an additional reference line is shown to indicate a minimal value for membrane helices (see Appendix).

    Different authors recommend different spans over which to average the hydropathy, depending on the factor to be tested. For non-membrane proteins, a span of 5 or 7 is reasonable, followed by smoothing through one or two cycles with a span of 5 or 7. This procedure will indicate portions of the sequence with a hydropathy typical of buried structures. For membrane proteins, the hydropathy is generally used to predict membrane spanning helices. Since the rise per residue of the α-helix is 1.5 Å and the width of the hydrophobic phase is about 30 Å, it takes about 20 residues (30 / 1.5) to cross the hydrophobic barrier. A span of 19 or 21 for calculation of the average hydropathy is therefore often considered appropriate (1, 4, 5, 8). However, this large span tends to smear out the interfacial transition region, and this author prefers to use the span of 7, smooth 2 x 7 protocol set as a default in this program.

    The running average is calculated by summing the index values for all residues in a span, and dividing by the number in the span (as determined by the value of Span selected). The mean value is stored in an array as the sequence index value for the central residue of the span (for this reason, span values are odd). The residue count is incremented, and the process repeated for all residues in the sequence, starting at residue 1. The profile is smoothed and scaled before display, depending on parameter settings (see above). If Circles are toggled on, residue circles are also displayed.
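The running average described above can be sketched in Python as follows. This is an illustration, not the code used by pSAAM: the name hydropathy_profile is hypothetical, the index is passed in as a dictionary of residue values rather than read from allprfls.usr, and the shortened window at each end of the sequence is an assumption.

```python
def hydropathy_profile(sequence, index, span):
    """Running average of index values, stored at the central residue."""
    if span < 1 or span % 2 == 0:
        raise ValueError("span must be a positive odd number")
    # Per-residue array of index values for the sequence under analysis.
    values = [index[res] for res in sequence.upper()]
    half = span // 2
    profile = []
    for i in range(len(values)):
        # Assumption: near the ends, average only the residues present.
        window = values[max(0, i - half): i + half + 1]
        profile.append(sum(window) / len(window))
    return profile
```

With a made-up two-letter index in which A scores 1 and K scores -1, the sequence AAKAA and a span of 3 give a value of 1/3 for the three central residues.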

    Moment

    Calculate and display profile of mean moment using the currently selected index, angle, span and smooth cycles. When a hydropathy index is used, the profile calculated is of the mean hydrophobic moment (amphipathy, or amphiphilicity). Moment can be calculated using other indices, or using an imported Sequence Index. In each case, the algorithm used is that suggested by Eisenberg (5, see also 7).

    By summing the hydrophobicity vectors for successive residues, the sequence can be tested for the possible occurrence of amphipathy in different secondary structures, depending on the choice of angle. For alpha-helix, angles in the range 90-100° are appropriate; for 3₁₀-helix, use 120°; for extended chains (sheets), use 160-180°. High values calculated using 0° as the residue rotation angle show the occurrence of successive residues of similar index value. The geometrical structural information on which these suggestions are based can be found in any standard text on protein structure.
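The vector sum can be sketched in Python as below, following the usual form of the Eisenberg moment (the length of the sum of per-residue vectors, each rotated by the chosen angle from the one before). This is an illustration only; the function names are hypothetical, and the division by window length and the shortened windows at the sequence ends are assumptions about how the mean moment profile is built.

```python
import math

def hydrophobic_moment(values, angle_deg):
    """Length of the vector sum of index values rotated by angle_deg each."""
    delta = math.radians(angle_deg)
    s = sum(h * math.sin(n * delta) for n, h in enumerate(values))
    c = sum(h * math.cos(n * delta) for n, h in enumerate(values))
    return math.hypot(s, c)

def mean_moment_profile(values, span, angle_deg):
    """Mean moment per residue over a running window of width span."""
    half = span // 2
    profile = []
    for i in range(len(values)):
        # Assumption: windows are truncated at the sequence ends.
        window = values[max(0, i - half): i + half + 1]
        profile.append(hydrophobic_moment(window, angle_deg) / len(window))
    return profile
```

With an angle of 0 the vectors all point the same way, so four residues of value 1 give a moment of 4. With an angle of 180 an alternating pair (1, -1) gives a moment of 2, while two equal values cancel.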

    Profiles are displayed with an arbitrary reference line at 1. If Circles are toggled on, residue circles are displayed.

    Prediction

    This option calculates a profile using the same algorithm as for Hydropathy calculations, but assumes that index values are positive, and range about a value of 1, as in the probability scales used by Chou and Fasman (2), and in other prediction indices. The profile is displayed without residue circles. The reference line is drawn at 1.05.

    4PTurns

    Calculates and displays a composite turns profile using the 4-point turns index values of Chou & Fasman (2). In this calculation, the current program parameters for Index and Span are ignored, and the program uses the position-dependent turn indices to test the probability of turns. The routine checks the current smoothing parameters, and notifies you if they are inappropriate. The recommended values are 2 smooths with span of 3, which gives a reasonable profile. If your current smoothing span is greater than 3, the program pops up a message box to offer some options, with a default of the recommended value. See note below about Replot of 4-point turns profiles.

    Replot

    Redisplay last calculated profile, using current display parameters, or an Unsaved profile using stored parameters. Color, toggling of circles, and scale can be changed before clicking replot if desired. Clearing the screen does not reset the display array used as temporary storage for the current profile, so Cls followed by Replot will regenerate the profile on an uncluttered screen. Because the 4-point turns profile uses local parameters, Replot may not work properly if your last profile was of this type.

    Save

    Use this option to store the current profile, and the information used to generate it. Storage slots are available for 20 profiles. On selection of this option, a drop-down list of the storage slots is presented. Click on an item to use it. Used stores show the stored profile title. Selection of a used store will erase stored data, and replace it with the current profile and data. All stored data is lost on exit from the program.

    Unsave

    Recall a stored profile to the current display.

    Select a storage slot from the drop-down menu. Unsaving a profile will erase the current profile data from the display array. Display the Unsaved profile by Clicking on Replot (see above).

    Togl Circles

    The residues along the sequence are represented by circles, color-coded to show the residue type. For Hydropathy and Moment plots:

    Yellow: hydrophobic (values for hydropathy greater than 1)

    Green: polar, uncharged

    Blue: positive charge (lysine, arginine or histidine)

    Red: negative charge (glutamate or aspartate)

    When an imported index is used, color-coding is used to represent data values, with the range set through the Color Switch option under the Imported menu:

    Blue: highest range of values

    Green: intermediate values

    Yellow: lowest range of values.

    Display of Residue Circles can be toggled on/off with Togl Circles option.
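The residue-type coding used for Hydropathy and Moment plots can be written as a small decision rule. The Python sketch below is an illustration only; the name residue_circle_color is hypothetical, and testing charge before hydropathy is an assumption about the order in which the rules are applied.

```python
POSITIVE = {"K", "R", "H"}  # lysine, arginine, histidine
NEGATIVE = {"D", "E"}       # aspartate, glutamate

def residue_circle_color(residue, hydropathy):
    """Pick a circle color from residue type and its hydropathy value."""
    residue = residue.upper()
    if residue in POSITIVE:
        return "blue"
    if residue in NEGATIVE:
        return "red"
    if hydropathy > 1:
        return "yellow"  # hydrophobic
    return "green"       # polar, uncharged
```

The hydropathy value passed in is the value of the residue in whichever index is currently selected, so the yellow and green assignments can differ between indices.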

    File Profile

    This option allows you to file the data from the current profile in an ASCII (Text) format, which can be used as input to Spreadsheet or Plot programs. The routine needs a file name; if one has not been selected, the program will prompt you, and bring up the File I/O window. The data for the current profile in the Display array are filed as ASCII text in two columns, each row containing a residue number and the profile value. The profile name and parameters are appended. To file a set of profile values, use the Save and Unsave options to bring each in turn into the Display array, and select a new file name for each.
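The layout of a filed profile can be sketched in Python as below. This is an illustration, not the exact file written by pSAAM: the name format_profile is hypothetical, and the tab separator, the three decimal places, and the form of the appended name and parameter lines are all assumptions.

```python
def format_profile(values, name, params):
    """Build two-column text: residue number, then profile value."""
    lines = [f"{i}\t{v:.3f}" for i, v in enumerate(values, start=1)]
    # Profile name and parameters are appended after the data rows.
    lines.append(name)
    lines.append(", ".join(f"{k}={val}" for k, val in params.items()))
    return "\r\n".join(lines) + "\r\n"
```

Text in this form can be read by most spreadsheet programs as two numeric columns, with the trailing lines treated as labels.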

    The SEQ INFO facility

    The lower left corner of the Profiles window (Seq Info) serves two functions:

    Sequence Information

    If you click at any position on the Structural Cartoon, or on the Profile display area, the Sequence Information text box changes to indicate the residue corresponding to that point on the horizontal scale. Information given is the residue number, single letter code, and structural code (H, E, T or C for Helix, Extended Chain (Sheet), Turn, or random Coil) for the residue pointed to.

    The position of the selected residue is shown by a black box in the structural cartoon, and as a black residue circle on the last displayed profile. When the residue pointer is changed, the residue circle is filled with the current color-coding color.

    The Seq Info scroll bar can be used to change location along the sequence. Click on the arrows to increment (decrement) the residue pointer by 1, click on the bar to change by 1/10th sequence length, or use mouse to move slider bar along sequence to a new position.

    Structure or Sequence Editing

    The Structure and Sequence strings can be edited by typing into the text box under the scroll bar. Select to edit either the Structure or Sequence string by clicking either Strct or Seq Button. The active Button is green. The default setting is the Structure Editor.

    When the Strct Button is activated, the "secondary structure" of the currently displayed residue can be changed by typing in a new structural code letter (H, E, T or C, in upper or lower case). The Structural Cartoon and the internal array recording the structure are both updated.

    When the Seq Button is activated, the sequence can be "mutated" by typing in a new single-letter amino acid code. The amino acid name, charge, and index value for the residue are changed, the internal array recording the sequence is updated, and the color circle in the profile display changes to reflect the mutation. All subsequent routines use the updated sequence.

    IMPORTED

    This menu enables you to import and use sequence indices (see Seq Index under File I/O window section). The program expects a sequence index file to contain a value for each residue in the current sequence.

    The program INFORMAT.EXE supplied with the pSAAM package is useful for generating Seq Index files. Use INFORMAT to generate a file of values showing mutability for aligned sequences (12), scored using information theory, or to generate a file of values showing mean hydropathy (or other index value) for residues in aligned sequences. Alternatively, the user can generate Seq Index files for any other parameter, as long as the format follows the rules specified.

    Use Imp. Index

    Sets the values in the sequence index array to corresponding values in the imported Seq Index. After this, the program will use this set of values in calculation of Hydropathy profiles, Moment profiles, Helical Wheels, Cylinders, etc. To change to an index from the Index List, select the index as above. The imported Seq Index is stored separately, and can be recalled by clicking on Imp. Index again.

    Import New Values

    Clicking on this option opens the File I/O window to allow import of a new set of values.

    Color switch

    Use this option to change the values at which residue circles change color. Color of residue circles changes from blue to green as values change from high to intermediate; from green to yellow when values change from intermediate to low range.
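The effect of the two switch values can be sketched in Python as follows. This is an illustration only; the name imported_circle_color and its parameters are hypothetical, and whether a value exactly equal to a switch point takes the higher or the lower color is an assumption.

```python
def imported_circle_color(value, high_switch, low_switch):
    """Color a residue circle from an imported index value."""
    if value >= high_switch:
        return "blue"    # highest range of values
    if value >= low_switch:
        return "green"   # intermediate values
    return "yellow"      # lowest range of values
```

With switch values of 2.0 and 1.0, for example, a residue scoring 2.5 is drawn blue, one scoring 1.5 green, and one scoring 0.2 yellow.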

    CLS

    Clear Screen. This option clears the profile display area, and resets the profile count and temporary arrays used to store profile data for the Plot to Printer routine (see HardCopy menu).

    CARTOONS

    This option allows access to several cartoon displays of the currently predicted structure.

    Strct. Cartoon

    Click on this item to display the Structural Cartoon. Bars of different width and color are used to indicate the secondary structure along the sequence. In order of decreasing width:

    Yellow indicates Helix

    Purple indicates Turn

    Red indicates Sheet

    Blue indicates Coil

    The horizontal scale represents residue number along the sequence, and is the same scale used in Profile displays, allowing direct comparison between cartoon and profile. Information about the sequence, and editing of the Structural Cartoon, can be obtained through the Seq Info facilities (see above).
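The bars of the cartoon correspond to runs of identical codes in the HETC structure string. The Python sketch below shows one way to derive them; it is an illustration only, the name cartoon_segments is hypothetical, and the width ranks 4 to 1 simply encode the order of decreasing width given above, not actual bar widths.

```python
from itertools import groupby

# (color, width rank); rank 4 is the widest bar.
STYLE = {
    "H": ("yellow", 4),
    "T": ("purple", 3),
    "E": ("red", 2),
    "C": ("blue", 1),
}

def cartoon_segments(structure):
    """Return (first, last, code, color, rank) for each run of codes."""
    segments = []
    pos = 1  # residue numbers start at 1
    for code, run in groupby(structure.upper()):
        length = len(list(run))
        color, rank = STYLE[code]
        segments.append((pos, pos + length - 1, code, color, rank))
        pos += length
    return segments
```

For the structure string CCHHHHTTEE this yields four bars: coil for residues 1-2, helix for 3-6, turn for 7-8, and sheet for 9-10.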

    Helix Wheels

    Select this option to open the Helical Wheels window, which allows display of Helical wheels.

    The wheels show the helix viewed end-on along its axis. Each residue is represented by a spoke terminating in a residue circle. Successive residues are displaced 100° clockwise round the circle. This gives a view down the helix from the N-terminus. Residue circles are displaced outwards after the first 18 residues. If the helix has more than 36 residues (two concentric circles), the program displays the rest of the helix as another helical wheel in the next position.
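The geometry of the wheel can be sketched in Python as below. Note that 18 residues at 100 degrees each make exactly five full turns (1800 degrees), so residue 19 would fall on top of residue 1 unless it is moved outwards. This is an illustration only: the name wheel_layout is hypothetical, and the starting position (12 o'clock) and the two radii are assumptions.

```python
import math

def wheel_layout(n_residues, inner_radius=1.0, outer_radius=1.3):
    """Return (wheel number, x, y) for each residue of a helix."""
    layout = []
    for i in range(n_residues):
        wheel, k = divmod(i, 36)  # a new wheel after every 36 residues
        radius = inner_radius if k < 18 else outer_radius
        # Clockwise rotation measured from 12 o'clock, with y pointing up.
        angle = math.radians(100 * k)
        layout.append((wheel, radius * math.sin(angle), radius * math.cos(angle)))
    return layout
```

For a helix of 40 residues, the first 18 lie on the inner circle, the next 18 on the outer circle, and the last four start a second wheel.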

    The Residue Circles in the helical wheels are labelled with the single letter code for the residue, and color-coded using the current index and the same code as for residue types in the profiles (see above). The text boxes in the display show the hydrophobic moment, the angle of the moment, and the sequence span for each helix (color coded), together with the start and end residue numbers. The mean amphipathy for the residues in the helix can be obtained by division of the hydrophobic moment by the number of residues.

    The display area is divided in two, each section holding 4 helices. In each area, on selection of Display 1 (for top area) or Display 2 (for bottom), the first four helices are displayed, then a message box asks if you want to display the next set. Hit Cancel to hold the current four helices, or OK to show the next set. AutoRedraw for each area can be separately toggled on/off.

    Color-coding is according to the index currently selected in the Profiles window. If a Sequence Index has been imported, the wheels can be color-coded for that index by choice of Imported under the Index menu option in this window. The index used is indicated in a text box at the bottom of each area. This facility allows comparison of wheels color-coded by different indices.

    Print

    The helical wheel display can be sent to the Windows printer. Set AutoRedraw on before regenerating display in area(s) to be printed. If you are using a standard printer (black print on white paper) you should select the BW option for display because the light colors will not be printed. An alternative and more versatile plotting facility is provided under the Seq Plot option in the HardCopy menu.

    Helix Cylndrs

    Select this option to open the Helical Cylinders Window.

    Helical Cylinders are another 2-D representation of the helices. The helical cylinders are represented in two dimensions by cutting the cylinder along the long axis, and laying it out flat. The cylinders show circles for each residue, labelled and color coded as above. Helices over 30 residues long would go off-page. To prevent this, helices longer than 30 residues are continued in the next position. The window has room for six helices; if the model has more than six, a message box comes up to ask if you want to see the next set. Click on OK to see next, or on Cancel to hold current set. Menu options are similar to those for Helical Wheels.
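Cutting the cylinder and laying it flat amounts to plotting each residue by its position round the helix against its position along the axis. The Python sketch below is an illustration only: the name cylinder_layout is hypothetical, the 100 degree rotation and 1.5 Å rise per residue are taken from the helical wheel and hydropathy sections of this manual, and the position of the cut (at 0 degrees) is an assumption.

```python
def cylinder_layout(n_residues, per_strip=30, turn_deg=100.0, rise=1.5):
    """Return (strip number, fraction round the helix, height) per residue."""
    layout = []
    for i in range(n_residues):
        strip, k = divmod(i, per_strip)  # continue long helices in next position
        around = (turn_deg * k) % 360.0 / 360.0  # 0 to 1 across the flat strip
        along = rise * k                         # distance along the helix axis
        layout.append((strip, around, along))
    return layout
```

A helix of 31 residues fills one strip of 30 and places the last residue at the start of a second strip.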

    PREDICTION AIDS

    Click on this item to see a menu of graphical prediction aids.

    Eisenberg Diagram

    Pops up a window to draw Eisenberg Diagrams (5). These are plots of mean helical amphipathy (vertical axis) against mean hydropathy (horizontal axis), using the Eisenberg Consensus Index. The user first selects a span of residues from the sequence using the Cartoon, scroll bars, Sequence text area, etc. Values for each residue are plotted as small colored circles on the diagram. By selecting spans from different regions of the sequence, the physico-chemical environment of the residues in the span can be probed.

    Note: The algorithms are appropriate only if the structure is helical.

    The plot is divided into different areas corresponding to different environments, labelled G (Globular), M (Membrane) and S (Surface). High values on the horizontal scale indicate buried residues, either in the protein interior or in the membrane lipid, while low values indicate a polar environment. Since amphipathy is calculated for the helical repeat, high values on the vertical axis suggest amphipathic helices, and low values may indicate either non-helical structure or symmetrical helices. The diagonal line bounding the S area marks amphipathic helices at interfaces, ranging from protein-water at the left to protein-lipid at the right. The Globular area therefore represents, at the lower left, hydrophilic structures which are non-helical, and, near the diagonal S line, amphipathic helices at the protein-water interface. The Membrane area represents, at the bottom of the left section, structures buried in the protein or internal membrane helices; close to the S line, but still in the left section, more hydrophobic amphipathic helices, possibly at the lipid-protein interface; and, in the right section, isolated membrane helices.

    Using the program.

    To find the likely environment of residues in a span (assuming a helical structure):

    Click on the Cartoon to select a portion of the sequence, and/or use the scroll bar to move the residue indicator along the sequence. Two methods can be used to select a span:

    Mouse Method: Use the mouse to select a span from the Sequence or Structure text boxes. Then click on Select Span, then on Mouse Select.

    Scroll Bar Method: Use the scroll-bar to scroll to the start of the span to be selected. Click on Select Span, then on Start Residue to identify the start of a span. Scroll to the end of the span, click on Select Span, and then on End Residue to show the end.

    Click on Color to drop down a list of colors. Select a color by clicking on an item in the list. The residue values will then be plotted. Select different colors for different spans.

    Finer-Moore

    Select this item to pop up a Window to display a Finer-Moore Diagram (6, 7). In this diagram, the moment using the current index is calculated at angles from 0 to 180 degrees (vertical scale), and the profile at each angle (with values indicated by color-coding) is plotted against residue number (horizontal scale). The resulting plot provides a summary of putative amphipathic structures.

    You should select an appropriate index and span before opening the window. If a Seq Index has been imported, that can be used in the calculations by selecting the Use Imp. Index option under the Imported menu. A span of 11 works best; spans of less than 7 are not accepted.

    When the window has opened, click on Display to start the show. If the span chosen is inappropriate, a message box will tell you so.

    The time taken to generate the display depends on length of sequence and on the power of your computer, and may be excessive. If you want to save the display for later viewing, toggle the AutoRedraw function on before clicking Display.

    The value for moment is color coded, with bright colors for higher values. Successive bands up the diagram are for increasing angle from 0 to 180°, with an increment of 4° for each band. At each angle, the program calculates the moment profile using the same routines as in the Profiles window. The range of values is normalized to fill the 0 - 15 range of color values. The coding of the colors is shown at the top of the completed display.
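    The construction of the diagram can be sketched as below. This is a hedged illustration rather than the routine used in the program: the moment is assumed to be the vector sum over a window of residues, and the window is assumed to start at each residue in turn. Only the 0 to 180° range, the 4° increment, the minimum span of 7 and the 0 - 15 color range come from the description above; the function names are hypothetical.

```python
import math


def moment_profile(values, angle_deg, span=11):
    """Moment of each window of `span` index values at one repeat angle."""
    a = math.radians(angle_deg)
    out = []
    for s in range(len(values) - span + 1):
        x = sum(values[s + k] * math.cos(k * a) for k in range(span))
        y = sum(values[s + k] * math.sin(k * a) for k in range(span))
        out.append(math.hypot(x, y))
    return out


def finer_moore_grid(values, span=11, step_deg=4, levels=16):
    """Return (angles, grid); grid[band][position] is a color value 0..15."""
    if span < 7:
        raise ValueError("spans of less than 7 are not accepted")
    if len(values) < span:
        raise ValueError("sequence is shorter than the span")
    angles = list(range(0, 181, step_deg))
    rows = [moment_profile(values, a, span) for a in angles]
    lo = min(min(r) for r in rows)
    hi = max(max(r) for r in rows)
    # Normalize the whole range of moments onto the available color values.
    scale = (levels - 1) / (hi - lo) if hi > lo else 0.0
    grid = [[round((m - lo) * scale) for m in r] for r in rows]
    return angles, grid
```

    An index that alternates in sign from residue to residue gives its brightest band at 180°, which is the behavior expected for a strand-like repeat.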

    Prediction

    Profiles

    This option pops up a window allowing the user to generate a set of profiles to aid revision of current structural cartoon.

    Click Cartn button to display the Structural Cartoon, and to activate the Structure Editing facilities.

    Click on Pdct to calculate and display the profiles.

    The profiles generated are:

    Profile         Span   Smooth   SmSpan
    Surface sheet   3      2        3
    Buried sheet    3      2        3
    Surface helix   7      2        7
    Buried helix    7      2        7
    H-bond turns    3      2        3
    Amphi@100       11     1        7
    Amphi@170       5      1        3

    The first five profiles use index values from the analysis of Walsh and Crofts (see Appendix). The two amphipathy profiles use the Kyte-Doolittle index (1), and search for amphipathy at the helical (100°) and sheet (170°) repeats.
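    The way the three parameters might combine is sketched below. The meanings are assumptions, since they are not defined in this section: Span is taken to be the width of the running average over the index values, Smooth the number of additional smoothing passes, and SmSpan the width used for each smoothing pass. Windows are centered and truncated at the ends of the sequence, which is also an assumption.

```python
def running_mean(values, span):
    """Centered running mean; windows are truncated at the sequence ends."""
    half = span // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def profile(values, span, smooth, sm_span):
    """Average over `span`, then apply `smooth` passes of width `sm_span`."""
    result = running_mean(values, span)
    for _ in range(smooth):
        result = running_mean(result, sm_span)
    return result
```

    With these assumptions, the Surface helix profile would correspond to profile(values, 7, 2, 7), where values holds the index value for each residue.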

    Structure Editing works just as in the Profiles window. Click anywhere on the cartoon, or on any of the 7 profile display areas to position the sequence pointer. The pointer is set to the residue indicated by the horizontal scale bar showing residue number in sequence. The position is indicated by a vertical line in the profiles, and a black "residue" in the cartoon. The Sequence Information text box shows residue number, type and currently assigned secondary structure. The secondary structure can be "changed" by typing in a new structure code from the H, E, T, C set, and this updates the Structural Cartoon, and the internal array of structure.

    If you want to examine the profiles after pushing or minimizing the window, toggle AutoRedraw on by clicking the Rdrw button before the Pdct button. There will be a lag before the profiles are displayed, but they will be preserved when the window is obscured.

    MPredict

    CAUTION.

    This is a seductively "useful" feature, but users are hereby warned that the predicted structures resulting from use of the program may bear little resemblance to the true structure of the subunit or protein represented by the sequence. The "best" predictions are about 60-65% accurate. The predictor should be used only as a convenience to generate a starting "structure". The program pSAAM is designed to allow the user to bring experience and intelligence to the analysis of sequences, by providing information about potential secondary and tertiary structures.

    MPredict is a combination of the Robson and Garnier prediction program (18), Predict, and the Rao and Argos Membrane Helix prediction algorithm (3). The algorithm for Predict, and the probability parameters, are taken from (18), and follow the authors' suggestions exactly. The probability parameters were derived from known structures for soluble proteins, and are appropriate for prediction from sequences for such proteins, or for prediction of aqueous domains of membrane proteins. Because of a failure to distinguish between buried and surface structures (see Appendix for discussion of the Walsh and Crofts prediction parameters), the sheet prediction is biased towards hydrophobic domains, and the program tends to predict the hydrophobic spans of known membrane spanning helices as sheets. To counteract this tendency, the Membrane Helix option (if selected) identifies potential membrane spanning helices using the Rao and Argos parameters, and in these regions of the sequence, the membrane helix prediction over-rides the Predict values.

    Predict

    On first calling the module, the Membrane Helix option is off. Click on Predict, and the program proceeds with the Robson et al. prediction, stores all prediction parameters, and generates profiles showing the Helix, Sheet, Turn, and Coil predictions, and a structural model which is displayed as a structural cartoon.

    Membrane Helices

    The membrane helix prediction option (using the Rao and Argos membrane helix predictor, 3) is toggled on and off by clicking on Membrane Helix in the menu bar. The option is initially off. If selected, the membrane helix prediction takes precedence over the Robson prediction, so that all predicted membrane spanning helices are predicted as helix in the final prediction, and the display. The membrane helix prediction conforms to the suggestions of the authors, as follows:

    Rules (I), (II) and (III) (p. 203 of ref. [3]) are followed exactly, except that the peak heights have to be greater than 1.128 for selection. This value differs from the value of 1.13 suggested by the authors. The authors chose 1.13 so as to allow selection of the most weakly predicted helix of bacteriorhodopsin. However, the values shown in their profiles were rounded up from slightly smaller true values (see Table III, p. 202, ibid.), so the lower threshold is needed to select that helix.

    Rules (IV) and (V) for helical termination are not followed exactly, but the over-ride of the Predict values is terminated when the membrane helix potential is less than 1.05, and this approximates the effect. However, users should take care to examine the sequence to see if any given helix can be extended in accordance with the authors' suggestions.

    Rules (VI) and (VII) are more subjective, and no attempt is made to follow them in this program.
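    The two thresholds described above can be illustrated with a simplified sketch. It covers only the 1.128 peak height and the 1.05 termination value; the remaining conditions of Rules (I) to (III) are not reproduced here, and the functions are hypothetical, not the pSAAM routines. The membrane helix potential is assumed to be supplied as one value per residue.

```python
def membrane_helix_spans(potential, peak_min=1.128, edge_min=1.05):
    """Return inclusive (start, end) index pairs for over-ride regions.

    A region is a run of residues whose potential stays at or above
    edge_min; it is kept only if its peak is greater than peak_min.
    """
    spans = []
    i, n = 0, len(potential)
    while i < n:
        if potential[i] >= edge_min:
            j = i
            while j + 1 < n and potential[j + 1] >= edge_min:
                j += 1
            if max(potential[i:j + 1]) > peak_min:
                spans.append((i, j))
            i = j + 1
        else:
            i += 1
    return spans


def apply_override(predicted, spans):
    """Replace the Predict codes with H inside each membrane helix span."""
    out = list(predicted)
    for start, end in spans:
        for k in range(start, end + 1):
            out[k] = "H"
    return "".join(out)
```

    A run that exceeds 1.05 but never rises above 1.128 is left as predicted, which matches the peak-height requirement for selection.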

    Save Structure

    The prediction generated in the module, and displayed as the structural cartoon, is stored in a temporary array. If you click on this option, the predicted structure is copied into the global array used in the rest of the program, and can be edited in the Profiles module of the Prediction window, or in the Seq Info corner of the main Profiles window. The structure shown in the structural cartoons used for selection of spans in the various hard copy options is specified in this global array.

    Clear

    Click on this option to Clear all the display areas.

    HARDCOPY

    Click on this item to drop down a menu to provide hardcopy options.

    Seq Plots

    Click on Seq Plots to pop up the Plots window.

    This window allows the user to compose a 2-D model of the sequence, using helical cylinders, helical wheels and coils. The structures can be positioned at any point on the page; if within the border shown, they will be plotted on the Windows printer on selection of the To Printer option under the Display menu (see below).

    The Span Selector and menu bar options allow the user to select a span for a structure, set the size of the structure, select a structure, edit the display, regenerate the display, and send it to the printer.

    To select span

    Click the Cartoon Button to display the Structural Cartoon. Click on the cartoon at the position at which you want to start your structure, then on Start to select first residue; click on the cartoon again, then End to select the last residue of span. The text area under the cartoon will show the sequence and assigned structure of the selected span. (NB. The routines for selection of a Structure take no account of the assigned structure.)

    Size

    Selection of this item allows the user to set the size of the structures. The default size is 1", referred to a printer page size, assumed to be 8.5" x 11" with the printer in portrait mode. The size refers to the "circumference" of helical cylinders (represented by the horizontal width in vertical cylinders), and the radius of helical wheels. Residue circles are scaled so that they are the same size in all structures.

    Structures

    The drop-down menu provides three options:

    Cylndr

    Click on Cylndr to show a submenu of Helical Cylinder options. The helices are represented in two dimensions by cutting the cylinder along the long axis, and laying it out flat. The cylinders show circles for each residue, labelled and color coded according to the current Profiles window index. Cylinders can be drawn in vertical or horizontal configuration. The helix can be drawn with the N- or C-terminus at the start by toggling Flip Over.

    Wheel

    Click on Wheel to show a submenu of Helical Wheel options. The wheels show the helix viewed end-on along its axis. Each residue is represented by a spoke terminating in a residue circle. Successive residues are displaced 100° round the circle. Wheels can be viewed from the N- or C-end. If N-view was selected, rotation is clockwise from the starting residue at 3 o'clock. This gives a view down the helix from the N-terminus. If C-view was selected, rotation is anti-clockwise. Residue circles are displaced outwards after the first 18 residues. The Residue Circles in the helical wheels are labelled with the single letter code for the residue, and color-coded using the current index and the same code as for residue types in the Profiles window.

    Coil

    Non helical structures are plotted by positioning each residue circle using the mouse. The Residue Circles are labelled with the single letter code for the residue, and color-coded using the current index and the same code as for residue types in the Profiles window.

    Rotate

    The rotate scroll bars enable the user to rotate both cylinders and wheels. The value of the rotation offset is changed by using the scroll bar, and is displayed (in degrees) in the Rotate Text box.

    For wheels, the value shown in the rotate text box is added to the starting angle. For example, if the rotation has been set to 90°, the starting residue in the wheel will be at 6 o'clock instead of 3 o'clock.

    For cylinders, the position of the starting residue along the "circumference" is displaced by a fraction of the circumference given by [rotation value/360].
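    The effect of the rotation value on both structures can be sketched as follows. The helper names are hypothetical, and the treatment of the rotation offset in C-view (added in the clockwise sense, with residues then stepping anti-clockwise) is an assumption; the manual states only the N-view example.

```python
def wheel_angle(residue_index, rotation_deg=0.0, view="N", step_deg=100.0):
    """Clockwise angle from 3 o'clock, in degrees, for a 0-based residue."""
    direction = 1.0 if view == "N" else -1.0
    return (rotation_deg + direction * residue_index * step_deg) % 360.0


def clock_position(angle_cw_from_3):
    """Convert a clockwise angle from 3 o'clock to an hour on a clock face."""
    hour = (3 + angle_cw_from_3 / 30.0) % 12
    return 12.0 if hour == 0 else hour


def cylinder_offset(rotation_deg, circumference):
    """Displacement of the starting residue along the circumference."""
    return (rotation_deg / 360.0) * circumference
```

    With a rotation of 90, clock_position(wheel_angle(0, 90)) gives 6.0, matching the example above, and a cylinder of circumference 1.65" is displaced by a quarter of that length.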

    To compose a Plot

    Select options for Structure, size, rotation, etc., then use the mouse to move the cursor arrow to the position at which you want to plot the structure. For cylinders and wheels, click on the left mouse button to draw the structure. For coils, see below. Simple editing functions are provided under the Display menu. Make sure your drawing stays within the boundary lines.

    Vertical cylinders are drawn starting with lower left corner at selected point. Horizontal cylinders are drawn starting with top left corner at selected point.

    The state of the Flip Over toggle is shown by the color of the Rotate text box. Magenta indicates that Cylinders will be shown with N-terminal end at start. Cyan indicates that helix will be displayed with C-terminal end at start. Click on Flip Over to change setting.

    Wheels are drawn with their center at selected point. In N-view, residues rotate clockwise from 3 o'clock; in C-view, residues rotate anti-clockwise.

    Coils are drawn residue by residue. To position the first residue of the coil, use the mouse to move the arrow cursor to the start position, then press the Shift key and the left mouse button. The first residue circle will be drawn with its center at the arrow point. Subsequent residue circles are drawn interactively. While pressing the Control (Ctrl) key, keep the left mouse button depressed while you use the mouse to position the "ghost" residue circle where you want the next residue. The residue circle will be displayed when you release the mouse button. Repeat till all residues are drawn. The program tells you when this happens, in case you lose count.

    Display

    Click on this item to drop down a sub menu of display options. These include some simple editing functions:

    To Erase Last cylinder or wheel, click on Erase Last, then click once on the display area at any position to erase the structure. Click a second time at a selected position to redraw the structure. If you want to change parameters, do so AFTER the erasing click and BEFORE the redraw click.

    To Erase Last coil residue, simply click on Erase Last.

    To Screen

    Click To Screen to redraw composition on screen. This option is useful if you want to review the current composition before sending it to the printer, or if you want to regenerate the present composition after pushing the window.

    To Printer

    Click To Printer to send composition out to Windows Printer. The program assumes that the printer is SetUp in Portrait Mode. Use Printers and SetUp under the Control Panel icon to change SetUp if necessary. Make sure your composition is within the boundary lines, otherwise the printer may eject the page and start a new one when the drawing goes off scale.

    The printer type (black and white or color printer) is assumed to be as selected in the Profiles (Main) Display window under Hard Copy and Printer sub menus. If a black and white printer is selected, the coding for residue circle values is shown by line thickness; if a color printer is selected, the colors printed will be the screen colors, but on the background color of the paper selected. Yellows are not very visible in this mode. In black and white mode, for hydropathy scales, line thickness increases in the order hydrophobic, polar-uncharged, polar-charged residues. For Seq Index scales, line thickness increases in the order low, intermediate, high range. In each case, the lighter line corresponds to residues in yellow on the screen, intermediate to green, heavy to blue (or blue and red for hydropathy scales).

    To File

    An Input Box pops up to ask you for the name of a file. Enter a file name (the current path is assumed if none is supplied), then press <Enter> or click OK. Click Cancel to abort. The current composition is then written to the file.

    From File

    First select a file name by using the File I/O window. This will pop up if no file name is currently selected. If you click on this option with a file name selected, the composition stored in the file will be read, and displayed.

    Clr Screen

    Selection of Clr Screen will erase the present composition. This also resets the temporary arrays used to store information for sending plots to the printer, screen or file. To start a new composition, first click on Clear.

    Erase Last

    This function operates differently for different structures.

    Autoredraw

    Puts the picture display area into autoredraw mode, so that the display is not lost when you overwrite the window with another window.

    Constructing Simple Stand-alone Helical Models

    The Seq Plot options can be used to construct simple 3-D helical models of proteins. These are useful for didactic purposes, looking at potential interaction locations, guessing at active sites, deciding on sites for mutation, etc.

    Select a bar of suitable diameter; a lab stand bar works nicely, but bars made from balsa wood or plexiglass are better for permanent models. Measure the circumference (our lab stand bars are 1.65" in circumference). Select Vertical Cylinders as the Structure option. Use the Size option in the menu bar to select a size corresponding to the circumference of your bar, and plot out helical cylinders in the appropriate orientation. Use transparent tape to form these into true cylinders, using the bar as a former. If you are using a metal bar, slip the helix off the bar, make the next helix, etc. Helices can be strung together using corks or rubber stoppers, and strong wire. The connecting wires can be decorated using Coils to show intermediate structure.

    For prosthetic groups, it is convenient to find an appropriate structure in the Protein Data Bank, and abstract it from the file using a text processor. The structure can then be viewed using pdViewer (or other macromolecular display software), rotated to provide a good perspective, and plotted out. Xerox the structure onto stiff transparency paper, cut it out, and insert into your paper model.

    Printer

    Use BW Printer or Use Color Printer

    The default selection is Use BW Printer.

    Select Use BW Printer (for any standard black on white printer) or Use Color Printer (for printers capable of color printing) before printing out any profiles or compositions. The program can send color pictures to a color printer only if you have a suitable Windows driver installed. The selection made here applies to all other routines in the program.

    If you use a color printer, make sure that you have selected all appropriate options using Printer SetUp under Printers in the Control Panel (Main Group).

    Print Profiles

    To plot profiles to the Windows Printer, click on Printer, then on Print Profiles. Profiles in the current display will be printed.

    The program uses a rotating set of temporary arrays to store the profile data for plotting. The arrays are reset when Cls is used to Clear the Screen. Up to five profiles are stored. If more than five profiles have been generated since the last Cls, the last five are plotted.
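    The behavior of the rotating store can be illustrated with a short sketch. It is not the program's own data structure: a bounded double-ended queue is used here because it discards the oldest entry automatically once five profiles are held. The class and method names are hypothetical.

```python
from collections import deque


class ProfileStore:
    """Keeps the most recent profiles for plotting (five by default)."""

    def __init__(self, capacity=5):
        self._profiles = deque(maxlen=capacity)

    def add(self, name, values):
        # When the store is full, the oldest profile is dropped.
        self._profiles.append((name, list(values)))

    def cls(self):
        # Corresponds to clearing the screen: all stored profiles are reset.
        self._profiles.clear()

    def to_plot(self):
        return list(self._profiles)
```

    After seven profiles have been added without a Cls, only the last five remain for plotting.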

    The scale and scale parameters shown are those for the first profile generated after Cls, or the first profile printed if more than five have been displayed. The name and profile parameters are printed for each profile below the plot. Successive profiles are plotted with a different line-type or color, as follows:

    Profile   Line-type      Color
    0         Solid          Black/Gray
    1         Dash           Blue
    2         Dot            Green
    3         Dash-dot       Cyan
    4         Dash-dot-dot   Red

    The Structural Cartoon is plotted at the top of the plot to allow identification of profile features with assigned secondary structure.

    Residue circles

    If residue circles were toggled in for a profile, they are also shown on the profile plots. If the printer is black and white, the coding for residue circle values is shown by line thickness. For hydropathy scales, line thickness increases in the order hydrophobic, polar-uncharged, polar-charged residues. For imported Seq Index scales, line thickness increases in the order low, intermediate, high range. In each case, the lighter line corresponds to residues in yellow on the screen, intermediate to green, heavy to blue (or blue and red for hydropathy scales).

    Ribosome

    This option allows you to make 3-D atomic models of the structure (as shown in the Structural Cartoon) for selected spans of the sequence. The models consist of the atomic coordinates for all non-H atoms of the amino acids in the span, written in the format of the Brookhaven Protein Data Bank (16, 17). Models are filed for later viewing using a suitable Molecular Graphics package. Under Windows, pdViewer for Windows is recommended. Up to 6 models can be made in a session.

    Overview

    The program uses the structural data array (as visualized in the Structural Cartoon) to decide on the dihedral angles of back-bone atoms of the amino acids in the peptide chain, and then adds side chain atoms using default angles.

    Because of the uncertainties in secondary structural prediction, and the greater uncertainty in translation of secondary to tertiary structure, it is unwise to try to construct a Model using the entire span of a polypeptide chain. Ribosome restricts models to 100 residues, but spans of 10-30 residues are more useful. Use these Model fragments and an interactive Molecular Graphics program to construct more complete models, positioning fragments using whatever additional knowledge you have of the tertiary structure.

    Selecting a span

    Click on the Cartoon button to display the Structural Cartoon showing the presently assigned secondary structure.

    Click on the Cartoon, then on Start Res, to select a Start Residue for the span you wish to Model. Click on Cartoon, then on End Res, to select End Residue. The text panel beneath the cartoon will show the span selected, and the structural code. The Start offset text panel shows the residue number before the start residue. This is an offset to add to the numbering within the span selected, used to query dihedral angles, etc.

    Constructing a Model

    Click on Construct to construct a tertiary structural model. If you have selected Query Angles, the program allows you to change the back-bone dihedral angles, but not side chain angles. If you chose not to query angles, default values are used, and the construction of a Model proceeds automatically, unless the structure includes a turn. On encountering a turn, the program presents a menu of turn-types. Appropriate dihedral angles are supplied automatically once you select a type of turn, unless you have selected the Query Angles option.

    Query Angles

    The coordinates of the back-bone atoms of the polypeptide chain are determined by values for the dihedral angles, phi and psi. The program assigns default values for these, depending on the secondary structure. Default values are listed in the Appendix. Angles will be queried if you click on Query Angles before starting the Construction. When querying angles, the program uses the residue number within the span selected. This can be translated to residue number for the full sequence by adding the Start offset value shown in the Start offset text panel.

    To View a Model

    Click on View to run the pdVWIN Protein Viewer program. This option is available only if you have pdVWIN installed on your computer. The program finds the directory address of pdVWIN by loading it from the psaam.ini file. At run-time, the program checks the .ini file for a line similar to the following:

    %PDVWIN=c:\pdv\pdvwin.exe

    The information after the equal sign should be the path, directory, file name, etc. for the pdVWIN program. The line above will be appropriate if you have installed pdVWIN using the suggested options.

    If the program finds this item in the .ini file, it assumes that the path and address are correct, and enables the viewer option in the Ribosome Window. Otherwise, the View option in the menu bar is Gray to indicate that it is disabled. The pdVWIN program can be obtained by anonymous ftp to nemo.life.uiuc.edu, and is in the pub/pdvwin directory. See pdVWIN Manual for further details.
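    The check can be pictured with the following sketch, which reads the text of an .ini file and returns whatever follows the equal sign on the %PDVWIN line. It is an illustration only: the function name is hypothetical, and matching the key without regard to case is an assumption. As described above, no test is made that the path actually exists.

```python
def find_viewer_path(ini_text, key="%PDVWIN"):
    """Return the text after '=' on the key line, or None if absent."""
    for line in ini_text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name.strip().upper() == key and value.strip():
            return value.strip()
    return None
```

    A result of None corresponds to the case in which the View option is shown in Gray.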

    To File a Model

    Click on File to show four options.

    Make PDB Header

    This option pops up a window to allow you to generate an informational header for the PDB file of coordinate data produced by Ribosome. Ribosome generates files in the Brookhaven Protein Data Bank (PDB) format. In addition to data showing atomic coordinates for atoms, the standard PDB files also contain a header, or preamble of information about the structure in the file, including sequence, prosthetic groups and secondary structure. The Header generated in this window will contain lines starting SEQRES, which show the sequence, and will also contain lines starting HELIX, SHEET, or TURN, if your structural model contains these secondary structural elements. In addition, the program prompts you for a Chain Identifier and prosthetic groups. If you intend to construct a model containing more than one polypeptide chain, you should label each chain with a unique chain identifier consisting of a single alphanumeric character. If your model will contain prosthetic groups, you may include this information in the header by giving a non-zero number in response to the query for number of prosthetic groups. The program will then ask you for a name, Chain Identifier and residue number for each, and enter this information in lines starting with HET.

    The program will use the current sequence and structural information, or can alternatively get this information from file. The file input format is the same as for Models.

    The program produces a header containing information for the complete sequence. Thus, you need only make one header for each chain. If you are making a number of sub-structures, you should generate a header only for the first. The header information is automatically prepended to the coordinate data if a header has been generated. To clear the header information, click Clear Header.

    Clear Header

    If you have already made a PDB header (see above), it will be automatically prepended to the coordinate data set when you file it. If you don't want this to happen, you must clear the header information before filing your coordinate data. Click on Clear Header.

    File Model

    Click on File Model to file the current model to disk using an autogenerated filename. The text panel will show the file name and path used. The file is in standard Brookhaven Protein Data Bank format, but includes only ATOM, HETATM, TER and CONECT lines. If a Header has been generated (see above), that will be automatically prepended. An example is included in the Appendix. The program assigns residue numbers from the beginning of the sequence, assuming that the first residue is 1. Atom numbers are calculated assuming standard amino acid structures, and are also numbered from the beginning of the sequence. This facilitates the building of more complete models: there is no need to renumber the lines when parts of the model are included in a single .PDB file.

    The routine automatically assigns a path and file name based on a name derived from the input file. The base name is the path and file name before the extension. Before use in Ribosome, the filename itself, if longer than 7 characters, is truncated to 7 characters, and a model number is appended. Then the extension .MDL is added, and the Model is filed to the directory indicated by the path. For example, if the input file was c:\info\cytchrom.inf, the first Model to be filed would be called c:\info\cytchro1.mdl
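    The naming rule can be expressed as a short sketch. The function name is hypothetical, and the standard ntpath module is used so that Windows-style paths are handled the same way on any system; this is an illustration of the rule, not the routine in the program.

```python
import ntpath


def model_filename(input_path, model_number, max_stem=7):
    """Build the model file name from the input file name.

    The extension is dropped, the file name is truncated to max_stem
    characters, the model number is appended and .mdl is added.
    """
    directory, filename = ntpath.split(input_path)
    stem, _ext = ntpath.splitext(filename)
    return ntpath.join(directory, f"{stem[:max_stem]}{model_number}.mdl")
```

    Since no more than 6 models can be made in a session, the model number is a single digit and the resulting name never exceeds eight characters before the extension.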

    Change Filename

    To change name and/or path, click on Change Filename. This will bring up a message box showing you the current path and file name. You can retain these by clicking OK. Otherwise, the program pops up the File I/O window. Select a new directory if desired, then either select an existing file name by double-clicking it, or type a new name in the filename text box, and click OK. Then click on Filename from the menu of Input options. The new path and filename are treated within the Ribosome module in the same way as the default name, i.e. the pre-extension part is used, the file name is truncated if longer than 7 characters, and a model number and the extension .mdl are added. The same base name is used for all models until the Ribosome window is closed, or a new file name is selected.

    Constructing more complex models.

    You can use pdViewer to construct models made up from several Ribosome fragments. Read a starting fragment into Model A, Center, and turn the model Inactive, but Visible. Read the next fragment into Model B and Center. Use the Move and Rotate scroll bars to position model B with respect to model A. When a satisfactory alignment has been achieved, turn model A active. Rotate both models to check relative positions, and make any corrections necessary by Inactivating one model and moving the other. With both models Active, output a .PDB file of all coordinates. If the two fragments are not contiguous, use a text processor (make sure it is in text mode) to insert a TER line between file entries from different fragments. Residues and Atoms in the fragments produced by Ribosome are already correctly numbered, so little or no additional editing should be necessary. Save the file under an appropriate name. Now use this new file as the starting fragment for loading into model A; you can iterate towards a larger model by loading a new fragment into model B, and repeating the above.

    Many common prosthetic groups can be found by browsing through the Brookhaven Protein Data Bank, and borrowed from existing structures. If you are adding HETATM and/or CONECT lines, put these at the end of the file. You may need to renumber atoms to avoid conflict with atom numbers in your model protein.

    REDRAW

    Click on Redraw to toggle AutoRedraw of displays after the Profiles window is pushed, minimized or obscured.

    END

    Click on End to Exit the program, and close all open windows.

    APPENDIX

    Format of Files

    Information Files

    Information files from MCF or MPREDICT are in the format shown below. They can be converted to Model format files by using INF2MOD.EXE.

    93 
     1            S             -.8            0            C             -0.560
     2            K             -3.9           1            C             -0.827
     3            A              1.8           0            C             -0.383
     4            V              4.2           0            C             -0.623
     5            K             -3.9           1            E             -0.857
     6            Y             -1.3           0            E             -1.013
     7            Y             -1.3           0            E             -1.169
     8            T             -.7            0            H             -1.306
     9            L              3.8           0            H             -1.278
     10           E             -3.5          -1            H             -1.416
      .           .              .             .            .                .
      .           .              .             .            .                .
     92           E             -3.5          -1            C             -1.773
     93           S             -.8            0            C             -1.600
    

    The first line gives the number of residues in the sequence. All subsequent lines contain information about the sequence, as follows:

    col 1-3 number of the residue in the sequence

    col 15 single letter code identifying the residue

    col 29-32 hydrophobicity of the residue, generated by the source program

    col 43-45 charge of the residue

    col 57 single letter code identifying structural type

    col 71-76 average hydropathy of the sequence at this residue
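    The column layout above can be read with simple string slices. The following is a minimal sketch in Python, not part of the pSAAM package; the names InfoRecord and parse_info_lines are illustrative only. It assumes the file is plain text with no leading indentation, and it skips any placeholder rows of dots such as those shown in the listing above.

```python
from typing import Iterable, List, NamedTuple, Tuple


class InfoRecord(NamedTuple):
    """One residue line of an information file (illustrative name)."""
    index: int              # number of the residue in the sequence
    residue: str            # single letter code
    hydropathy: float       # hydrophobicity from the source program
    charge: int             # charge of the residue
    structure: str          # single letter structural type
    mean_hydropathy: float  # average hydropathy at this residue


def parse_info_lines(lines: Iterable[str]) -> Tuple[int, List[InfoRecord]]:
    """Return (residue count from line 1, list of parsed residue records)."""
    rows = [ln.rstrip("\r\n") for ln in lines if ln.strip()]
    count = int(rows[0].split()[0])
    records = []
    for row in rows[1:]:
        if row.split()[0] == ".":
            continue  # placeholder row of dots, as in the printed listing
        records.append(InfoRecord(
            # Documented as col 1-3; everything before col 15 is read here
            # so that a wider residue number would still be accepted.
            index=int(row[0:14]),
            residue=row[14],                    # col 15
            hydropathy=float(row[28:32]),       # col 29-32
            charge=int(row[42:45]),             # col 43-45
            structure=row[56],                  # col 57
            mean_hydropathy=float(row[70:76]),  # col 71-76
        ))
    return count, records
```

    The residue count returned from the first line can be compared with the number of records as a simple check that the file is complete.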

    Sequence Files

    PSAAM reads files containing sequence data in the following formats:

    FASTA format. This is one of the options available for output from GenBank. The first line (starting with >) is a comment line, and is ignored by the program; subsequent lines (till the end-of-file, or another >) contain the sequence in single letter code. Only the first sequence is used.

    NBRF/PIR format. This is the format of sequence files used by the Protein Information Resource. The first line (starting with >P1;) contains a protein name, and this is used for the protein name by pSAAM. The next line is a comment line, and is ignored by the program. Subsequent lines contain the sequence in single letter code, terminated by an asterisk (*). If more than one sequence is in the file, the program ignores all but the first.

    pSAAM format. The program expects a sequence as a one-line string of amino acid single-letter codes, ending with a <CR, LF>. The code can be in upper- or lower-case, but is translated to upper-case for use in the program.

    After reading in a sequence file, PSAAM generates values for internal arrays containing, for each residue, the amino-acid type, charge, hydropathy, average hydropathy or sequence index (using currently selected profile options), and structural type. In this last array, all residues are initially set to C (random coil), but can be updated to reflect predicted structure using the various program options described above. The display array is loaded with the sequence index data.
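    The three sequence formats can be told apart by the first non-blank line. The sketch below is an illustration in Python of the rules described in this section, not the program's own code; read_sequence and initial_structure are hypothetical names, and the handling of stray spaces inside a sequence line is an assumption.

```python
from typing import List, Tuple


def _collect(lines: List[str], stop_at_star: bool = False) -> str:
    """Join sequence lines up to the next '>' header (or '*' for NBRF/PIR)."""
    parts = []
    for ln in lines:
        if ln.startswith(">"):
            break  # a second sequence starts here; only the first is used
        if stop_at_star and "*" in ln:
            parts.append(ln.split("*", 1)[0])
            break
        parts.append(ln)
    return "".join(parts)


def read_sequence(text: str) -> Tuple[str, str]:
    """Return (protein name, upper-case sequence) from file contents."""
    lines = [ln.strip() for ln in text.splitlines()]
    start = next((i for i, ln in enumerate(lines) if ln), None)
    if start is None:
        raise ValueError("no sequence data found")
    first = lines[start]
    if first.startswith(">P1;"):
        # NBRF/PIR: name on the header line, next line is a comment.
        name = first[4:].strip()
        residues = _collect(lines[start + 2:], stop_at_star=True)
    elif first.startswith(">"):
        # FASTA: the header line is a comment and is ignored.
        name = ""
        residues = _collect(lines[start + 1:])
    else:
        # pSAAM format: a one-line string of single-letter codes.
        name = ""
        residues = first
    # Assumption: embedded spaces are not part of the sequence.
    return name, residues.replace(" ", "").upper()


def initial_structure(sequence: str) -> str:
    """All residues start as C (random coil), as described above."""
    return "C" * len(sequence)
```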

    Model files

    These files contain two lines, one for the sequence and one for the structure.

    Line 1 The sequence in single letter code as a single string terminated with <CR, LF> (carriage return and line feed characters, the DOS end-of-line).

    Line 2 The structural type for each residue, coded as H for helix, E for extended sheet, T for turn, C for random coil, and formatted as a single string with no spaces.

    Line 2 may alternatively contain the structure coded as suggested by Walsh and Crofts (see below).

    After loading the data, the sequence information is treated as in the Sequence files, and the structural type array is loaded with information from the structure line.
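    A Model file can therefore be read and written in a few lines. This is a minimal Python sketch with hypothetical function names; the check that both lines have the same length is an assumption drawn from the statement that Line 2 holds a structural type for each residue.

```python
from typing import Tuple


def read_model(text: str) -> Tuple[str, str]:
    """Return (sequence, structure) from the contents of a Model file."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("a Model file needs a sequence line and a structure line")
    sequence, structure = lines[0].upper(), lines[1].upper()
    if len(sequence) != len(structure):
        # Assumed consistency check: one structural code per residue.
        raise ValueError("sequence and structure lines differ in length")
    return sequence, structure


def write_model(sequence: str, structure: str) -> str:
    """Return Model file contents using the DOS <CR, LF> end-of-line."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lines differ in length")
    return f"{sequence}\r\n{structure}\r\n"
```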

    The ALLPRFLS.USR file

    In order to accommodate the plethora of prediction and hydropathy indices in the literature, and the requirements of individual users, the program reads in a file of indices for calculating profiles. A total of 30 user profiles can be included. The present ALLPRFLS.USR file contains an assortment of indices for hydropathy and prediction, which should serve most purposes. However, the user is free to add indices, and the information in this section should make this straightforward. Since the present file already has 30 profiles, it will be necessary to replace an existing index.

    In order to add new indices, use a text editor to modify the file ALLPRFLS.USR, and note the following rules:

    line 1: The first line in the file is the number of profiles. To add a profile, change this number by adding one. The maximal number is 30; to add a profile when the file contains 30, delete an existing profile.

    The profiles are stored in the following format, where n is 1 if this is the first profile, or otherwise the last line of the previous profile:

    line n+1: The name of the profile, as you would like to see it displayed, in standard ASCII characters, with a <CR, LF> to terminate.

    lines (n+2)-(n+21): The index value for each amino acid. The values must correspond to the order of amino acids below:

    A E Q D N L G K S V R T P I M F Y C W H

    Each value must be on a separate line, i.e. is terminated by <CR, LF>.

    line n+22: The name of the following profile.

    lines (n+23)-(n+42): Index values for the profile.

    etc.

    After changing the value in the first line, move the cursor to the line immediately below the last item in the file, and add the name of your new profile, <Enter>, then the 20 index values in the order shown above, with <Enter> to terminate each value. Check that you have added 21 new lines.
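    The layout rules above (a count on line 1, then 21 lines per profile) can be checked mechanically before the file is given to the program. The following Python sketch is an illustration only; parse_profiles and append_profile are hypothetical helpers, and numeric values are assumed to be written in a form that Python's float() accepts.

```python
from typing import Dict, List

AA_ORDER = "AEQDNLGKSVRTPIMFYCWH"  # order of the 20 values in each profile
MAX_PROFILES = 30


def parse_profiles(text: str) -> Dict[str, Dict[str, float]]:
    """Return {profile name: {amino acid: index value}}."""
    lines = text.splitlines()
    count = int(lines[0].strip())
    if count > MAX_PROFILES:
        raise ValueError("the file may hold at most 30 profiles")
    profiles = {}
    pos = 1  # 0-based position of the first profile name (line 2 of the file)
    for _ in range(count):
        name = lines[pos].strip()
        block = lines[pos + 1:pos + 21]
        if len(block) != 20:
            raise ValueError(f"profile {name!r} does not have 20 values")
        profiles[name] = dict(zip(AA_ORDER, (float(v) for v in block)))
        pos += 21
    return profiles


def append_profile(text: str, name: str, values: List[float]) -> str:
    """Return new file contents with one profile added and the count updated."""
    if len(values) != 20:
        raise ValueError("a profile needs exactly 20 index values")
    lines = text.splitlines()
    count = int(lines[0].strip())
    if count >= MAX_PROFILES:
        raise ValueError("file is full; delete an existing profile first")
    body = lines[1:1 + 21 * count]
    new_lines = [str(count + 1)] + body + [name] + [str(v) for v in values]
    return "\r\n".join(new_lines) + "\r\n"
```

    Reading the edited file back with parse_profiles is a quick way to confirm that exactly 21 lines were added for the new profile.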

    Notes on some hydropathy indices included in ALLPRFLS.USR

    The values for the Eisenberg Consensus Hydropathy (5) have been multiplied by 3 to bring them into a range similar to those of other indices scaled in free-energy (kcal/mol) units. Routines using this index are appropriately modified.

    The Rao and Argos Membrane Helix Parameter (3) is included in two formats: the original set scaled to probability, and a set scaled to a range similar to those of other hydropathy indices scaled in free-energy (kcal/mol) units.

    Indices with names in capital letters are taken from the review by Cornette et al. (9).

    The SideChain Vol index is based on the values for amino acid volume given by Creighton (19), with the value for glycine subtracted from all other values. The values were then divided by 10, and normalized about a value of 5 to provide a suitable range for the routines. Residue circles are colored so that the smallest side chains are blue (Gly, Ala, Ser, Cys), through green (Glu, Gln, Asp, Asn, Val, Thr, Pro, His), to yellow for the largest side chains (Leu, Lys, Arg, Ile, Met, Phe, Tyr, Trp).

    Membrane spanning helix prediction

    The display and plotting of profiles generated using the Eisenberg consensus (5) or Kyte-Doolittle (1) hydropathy indices has been modified to aid in the determination of potential membrane spanning regions. The displays and plots of these profiles include a dashed line set at a hydropathy value of 1.26 for Consensus or 1.25 for Kyte hydropathy. The value of 1.26 is equal to 0.42 x 3; the value of 0.42 is that suggested by Eisenberg as the mean hydrophobicity over a span of 21 necessary for membrane associated helical spans, and the multiplier of 3 is used to scale the hydropathy values to a pseudo-free energy range similar to that of the other hydropathy indices (see Eisenberg review, 5). The value of 1.25 for the Kyte hydropathy was selected empirically so that all known membrane spanning helices would have mean hydropathies (peaks in the profiles) at or above this line using a variety of protocols for span and smoothing parameters. Kyte and Doolittle (1) recommend a value of 1.6, but this fails to identify several known membrane spanning helices.
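    The comparison of a profile against the dashed line can be sketched as a sliding-window mean. The Python below is an illustration under stated assumptions, not the routine used by pSAAM: it uses a plain unweighted average over a span of 21 with no additional smoothing, and the function names are hypothetical. The index values are those of Kyte and Doolittle (1).

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy index (1).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

KYTE_THRESHOLD = 1.25           # empirical line for the Kyte hydropathy
CONSENSUS_THRESHOLD = 0.42 * 3  # Eisenberg value scaled by 3, i.e. 1.26


def window_means(sequence: str, scale: Dict[str, float], span: int = 21) -> List[float]:
    """Mean index value for every full window of `span` residues."""
    values = [scale[res] for res in sequence.upper()]
    if span < 1 or span > len(values):
        return []
    total = sum(values[:span])
    means = [total / span]
    for i in range(span, len(values)):
        total += values[i] - values[i - span]  # slide the window by one
        means.append(total / span)
    return means


def candidate_centres(sequence: str,
                      scale: Dict[str, float] = KYTE_DOOLITTLE,
                      threshold: float = KYTE_THRESHOLD,
                      span: int = 21) -> List[int]:
    """1-based residue numbers at the centre of windows at or above the line."""
    half = span // 2
    return [i + half + 1
            for i, mean in enumerate(window_means(sequence, scale, span))
            if mean >= threshold]
```

    Runs of consecutive residue numbers returned by candidate_centres mark peaks that reach the line, and so indicate potential membrane spanning regions.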

    DIHEDRAL ANGLES

    Default values used for phi and psi angles in Ribosome calculations are:

    For Helix phi -57 psi -47

    For Extended chain (sheet) phi -139 psi 135

    For Coil phi -78 psi 149

    P in helix phi -54 psi -39 (Mean of 10 values from PDB database)

    P in coil phi -120 psi 120

    For Turns

    The routine expects turns to consist of 4 residues. The two central residues of the turn segment are treated as the actual turn, and their dihedral angles are assigned values depending on turn type. The two surrounding "T" residues are assumed to be in extended chain conformation. If the sequence contains more than 4 consecutive residues in turn, the program picks off turns 4 residues at a time. If the turn contains fewer than 4 residues, it is treated as non-standard; the program uses the "middle" two residues to assign angles, and treats any others as extended chain.

    Turn type 1 phi1 -60 psi1 -30; phi2 -90 psi2 0

    Turn type 2 phi1 60 psi1 30; phi2 90 psi2 0

    Turn type 3 phi1 -60 psi1 120; phi2 80 psi2 0

    Turn type 4 phi1 60 psi1 -120; phi2 -80 psi2 0

    Turn type 5 phi1 -60 psi1 -30; phi2 -60 psi2 -30

    Turn type 6 phi1 60 psi1 30; phi2 60 psi2 30

    Turn type 7 phi1 -80 psi1 80; phi2 80 psi2 -80

    Turn type 8 phi1 80 psi1 -80; phi2 -80 psi2 80

    P in Turn (type 9) phi -83 psi 158

    P in Turn (type 10) phi -78 psi 149
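    The assignment rules above can be summarized in code. The following Python sketch uses the default angles listed in this section; it is not the Ribosome routine itself. Several points are assumptions: a single turn type is applied to every turn, the choice of the "middle" two residues in a short turn is a guess, and the special proline turn values (types 9 and 10) are not applied.

```python
from typing import List, Optional, Tuple

Angles = Tuple[int, int]  # (phi, psi) in degrees

DEFAULTS = {"H": (-57, -47), "E": (-139, 135), "C": (-78, 149)}
PROLINE = {"H": (-54, -39), "C": (-120, 120)}  # P in helix, P in coil
EXTENDED = DEFAULTS["E"]
TURNS = {  # turn type -> angles for the two central residues
    1: ((-60, -30), (-90, 0)),   2: ((60, 30), (90, 0)),
    3: ((-60, 120), (80, 0)),    4: ((60, -120), (-80, 0)),
    5: ((-60, -30), (-60, -30)), 6: ((60, 30), (60, 30)),
    7: ((-80, 80), (80, -80)),   8: ((80, -80), (-80, 80)),
}


def _fill_turn_run(angles: List[Optional[Angles]], start: int, stop: int,
                   turn_type: int) -> None:
    """Assign angles to a run of T residues, picked off 4 at a time."""
    first, second = TURNS[turn_type]
    pos = start
    while pos < stop:
        size = min(4, stop - pos)
        for k in range(pos, pos + size):
            angles[k] = EXTENDED  # flanking residues: extended chain
        # Assumption: in a short turn the "middle" pair starts here.
        mid = pos + max(0, (size - 2) // 2)
        angles[mid] = first
        if size >= 2:
            angles[mid + 1] = second
        pos += size


def assign_dihedrals(sequence: str, structure: str,
                     turn_type: int = 1) -> List[Angles]:
    """Return one (phi, psi) pair per residue for H, E, C and T codes."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure differ in length")
    angles: List[Optional[Angles]] = [None] * len(sequence)
    i = 0
    while i < len(sequence):
        code = structure[i]
        if code == "T":
            j = i
            while j < len(sequence) and structure[j] == "T":
                j += 1
            _fill_turn_run(angles, i, j, turn_type)
            i = j
            continue
        if sequence[i] == "P" and code in PROLINE:
            angles[i] = PROLINE[code]
        else:
            angles[i] = DEFAULTS[code]
        i += 1
    return angles
```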

    Example Ribosome output (.MDL) file

    ATOM    326  N   GLY    38      -0.687  -3.733  -2.986
    ATOM    327  CA  GLY    38       0.287  -4.908  -2.880
    ATOM    328  C   GLY    38      -0.411  -6.096  -2.368
    ATOM    329  O   GLY    38      -0.255  -7.189  -2.932
    ATOM    330  N   ILE    39      -1.184  -5.927  -1.312
    ATOM    331  CA  ILE    39      -1.944  -7.117  -0.722
    ATOM    332  C   ILE    39      -2.799  -7.728  -1.751
    ATOM    333  O   ILE    39      -2.797  -8.959  -1.897
    ATOM    334  CB  ILE    39      -2.759  -6.758   0.532
    ATOM    335  CG2 ILE    39      -2.682  -5.235   0.765
    ATOM    336  CB  ILE    39      -2.759  -6.758   0.532
    ATOM    337  CG1 ILE    39      -4.226  -7.154   0.308
    ATOM    338  CD1 ILE    39      -4.474  -7.783  -1.076
    ATOM    339  N   CYS    40      -3.538  -6.911  -2.478
    ATOM    340  CA  CYS    40      -4.453  -7.473  -3.567
    ATOM    341  C   CYS    40      -3.667  -8.262  -4.527
    ATOM    342  O   CYS    40      -4.070  -9.382  -4.875
    ATOM    343  CB  CYS    40      -5.276  -6.390  -4.287
    ATOM    344  SG  CYS    40      -6.267  -5.387  -3.156
    

    PREDICTION PARAMETERS FOR SECONDARY STRUCTURE

    A set of indices has been included in the ALLPRFLS.USR file which are probability parameters of the type calculated by Chou and Fasman (2), but using a new classification of the structural data base (Laura Walsh and Antony Crofts, in preparation). The new classification has taken account of the fact that buried structures are different in composition from surface structures.

    Files in the Brookhaven Protein Data Bank (16, 17) for protein structures with a resolution better than 2.5 Å were analyzed using a modified Kabsch and Sander (14) program. The secondary structures were classified as surface, buried, exposed or indeterminate, using an algorithm in which the axis of the structure was determined, and the vector of exposure to the aqueous phase calculated along the axis at each atom in the polypeptide backbone. This made it possible to recognize amphipathic structures; buried structures could be readily identified as those which showed no amphipathic character, and no exposure; structures were classified as exposed if they showed no amphipathy, but a high exposure. Structures were classified as indeterminate if they did not fall into one of the other three classes.

    This new classification, and the new set of probability parameters, addresses a deficiency in the classical approach to structural prediction (in which the structural classification provided in the preamble to the data set in the .PDB files is used as the "target" for calculation of probability parameters). The classical approach fails to recognize that the partitioning of residue type to secondary structure is strongly modulated by the location in the protein (buried structures are hydrophobic, exposed residues hydrophilic), so that the probability parameters for surface and buried secondary structures of the same type are quite different.

    The new set of probability parameters should be treated with caution until tested by review, publication and use. The present set is based on "monomeric" subunits for polymeric proteins. A revised set is underway in which an updated data set of known structures is being used, and the native form of polymeric proteins has been generated. This revised set will be available with future releases.

    Walsh and Crofts structural codes

    A = surface sheet (amphipathic in terms of fractional exposure of residues)

    D = buried sheet ( > 90% buried, not amphipathic)

    E = sheets (uncharacterized)

    F = exposed sheet (partly exposed ( < 90% buried), not amphipathic)

    H = helix (uncharacterized)

    J = surface alpha helix

    K = buried alpha helix

    L = exposed alpha helix (< 90% buried, not amphipathic)

    G = 3-10 helix (uncharacterized)

    M = surface 3-10 helix

    N = buried 3-10 helix

    O = exposed 3-10 helix

    S = non-H-bonded turns

    T = H-bonded turns

    B = H-bonded residues not falling into classes above (links)

    C = unclassified secondary structure (coils)
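    Since Line 2 of a Model file may use these codes, a lookup table is convenient for checking and summarizing a structure string. The Python sketch below is illustrative only; the dictionary repeats the list above, and structure_runs is a hypothetical helper that reports each run of identical codes with its 1-based start and end residue.

```python
from itertools import groupby
from typing import List, Tuple

WALSH_CROFTS_CODES = {
    "A": "surface sheet", "D": "buried sheet",
    "E": "sheet (uncharacterized)", "F": "exposed sheet",
    "H": "helix (uncharacterized)", "J": "surface alpha helix",
    "K": "buried alpha helix", "L": "exposed alpha helix",
    "G": "3-10 helix (uncharacterized)", "M": "surface 3-10 helix",
    "N": "buried 3-10 helix", "O": "exposed 3-10 helix",
    "S": "non-H-bonded turn", "T": "H-bonded turn",
    "B": "H-bonded link", "C": "coil (unclassified)",
}


def structure_runs(structure: str) -> List[Tuple[str, int, int]]:
    """Return (code, first residue, last residue) for each run of one code."""
    runs = []
    start = 1
    for code, group in groupby(structure.upper()):
        if code not in WALSH_CROFTS_CODES:
            raise ValueError(f"unknown structural code {code!r}")
        length = len(list(group))
        runs.append((code, start, start + length - 1))
        start += length
    return runs
```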

    REFERENCES

    1) Kyte, J. and Doolittle, R.F. (1982) J. Mol. Biol. 157, 105-132.

    2) Chou, P.Y. and Fasman, G.D. (1978) Adv. in Enzymology, 47, 45-148.

    3) Rao, J.K. and Argos, P. (1986) Biochim. Biophys. Acta, 869, 197-214.

    4) Engelman, D.M., Steitz, T.A. and Goldman, A. (1986) Ann. Rev. Biophys. Biophys. Chem. 15, 321-353.

    5) Eisenberg, D. (1984) Ann. Rev. Biochem. 53, 595-623.

    6) Finer-Moore, J. and Stroud, R.M. (1984) Proc. Natl. Acad. Sci. USA, 81, 155-159.

    7) De Loof, H., Rosseneu, M., Brasseur, R. and Ruysschaert, J.-M. (1987) Biochim. Biophys. Acta, 911, 45-52.

    8) von Heijne, G. (1981) European J. Biochem. 116, 419-422.

    9) Cornette, J.L., Cease, K.B., Margalit, H., Spouge, J.L., Berzofsky, J.A. and DeLisi, C. (1987) J. Mol. Biol. 195, 659-685.

    10) Komiya, H., Yeates, T.O., Rees, D.C., Allen, J.P. and Feher, G. (1988) Proc. Natl. Acad. Sci. USA 85, 9012-9016.

    11) Rees, D.C., DeAntonio, L. and Eisenberg, D. (1989) Science, 245, 510-513.

    12) Crofts, A.R., Yun, C.-H., Gennis, R.B. and Mahalingham, S. (1989) Proc. VIIIth. Internatl. Cong. Photosynth., in press.

    13) Shannon, C.E. and Weaver, W. (1949) Mathematical Theory of Communication. University of Illinois Press, Urbana.

    14) Kabsch, W. and Sander, C. (1983) Biopolymers 22, 2577-2637.

    15) Crofts, A.R. (1987-1992) SEQANAL Package, Copyright, University of Illinois. Available from The Biotechnology Center, University of Illinois, 901 S. Mathews, Urbana, IL 61801.

    16) Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer, E.F., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T. and Tasumi, M. (1977) "The Protein Data Bank: A Computer-based Archival File for Macromolecular Structures". J. Mol. Biol. 112, 535-542.

    17) Abola, E.E., Bernstein, F.C., Bryant, S.H., Koetzle, T.F. and Weng, J. (1987) "Protein Data Bank" in Crystallographic Databases - Information Content, Software Systems, Scientific Applications, (Allen, F.H., Bergerhoff, G. and Sievers, R., eds.), pp. 107-132. Data Commission of the International Union of Crystallography, Bonn/Cambridge/Chester.

    18) Robson, B., Douglas, G.M. and Garnier, J. (1983) in Computing in Biology (Geisow and Barret, eds.) pp. 132-177. Elsevier Biomedical Press.

    19) Creighton, T.E. (1993) Proteins. Structures and Molecular Properties, Second Edition. 507 pp. W.H. Freeman and Co., New York.