wwavePDB
wwavePDB Overview
wwavePDB identifies atoms in a PDB file within a distance range of other atoms.
wwavePDB is useful for identifying the common spatial occupancy of atoms.
Atom output may be further restricted by set properties (e.g., heterogeneity of chain identifiers, chain identifier content, heterogeneity of atom type)
and individual atom properties (e.g., atom type, atom chain, structural connectivity).
Output is atom-based and is written as PDB output to new PDB files.
wwavePDB identifies:
- features without initial target feature input
- contact interface atoms
- binding site features, including between highly divergent structures
- distributed features within a structure or between structures that are not related to contact or binding but represent a structural feature of a group of molecules
wwavePDB can be used to:
- rapidly discover features in a molecular structure or molecular interaction without pre-specifying a search feature
- deliver atom sets for creating common reference surfaces that can be used to compactly display conformational differences between structurally related molecules
- find structural relationships in highly divergent structures
- identify sets of atoms associated with particular molecular functions
- test whether a particular common spatial occupancy of atoms in molecules is unique (i.e., not present in other data) and is consistent with their binding, without pre-selecting the atoms to be considered
wwavePDB Help Output (“wwavePDB -h” output)
NAME
wwavePDB -- identifies atoms in a PDB file within a distance range of other atoms
SYNOPSIS
wwavePDB [options] (version 1.0.4)
CHARACTER OPTION_____KEYWORD OPTION_________DESCRIPTION___________________________DEFAULT____
Options for input:
-i <filename> .... --inputfn=<filename> ... input pdb filename .................. required
-n # ............. --model=# .............. MODEL # within PDB to process ....... first model
-x ............... --read_water ........... include water atoms ................. no waters
-y ............... --read_hydrogen ........ include hydrogen atoms .............. no hydrogen
Options for output:
-o <prefix> ...... --output_prefix=<str> .. output file prefix .................. none
-w <filename> .... --wwfn=<filename> ...... complete filename for "_ww" file .... stdout
-s [<filename>] .. --sets ................. output "_sets" file, opt. w/name .... no output
--setsfn=<filename> .... output "_sets" file, opt. w/name .... no output
-l ............... --log .................. output execution summary to stderr .. no log
Options for distance range:
-a # ............. --min=# ................ minimum distance between atoms ...... 0.0
-z # ............. --max=# ................ maximum distance between atoms ...... infinity
Options for "_ww" output:
-r <suboptions> .. --restrict_set=<opt> ... set restrictions .................... 3 or "any"
[1|2|3]{1} pick one: restrict set chain id content ....... 3 or "any"
1 ........... "one_chain" ........... only homogeneous sets of chain ids
2 ........... "two_or_more_chains" .. only heterogeneous sets of chain ids
3 ........... "any_chains" .......... heterogeneous and/or homogeneous sets
[a|b|c]? pick none or one: restrict set atom content ........... none
a ........... "only_atoms" .......... only ATOMs
b ........... "only_hetatms" ........ only HETATMs
c ........... "atom_and_hetatm" ..... 1+ ATOMs and also 1+ HETATMs
[g|h|i]? pick one: restrict set reference atom type .... i or "ref_any_type"
g ........... "only_ref_atom" ....... only ATOMs
h ........... "only_ref_hetatm" ..... only HETATMs
i ........... "ref_any_type" ........ ATOMs and HETATMs
[w]? pick none or one: restrict set chain id content ....... none
w ........... "includes_all_cids" ... subset of atoms w/all '-c' chain ids
[x|y|z]? pick none or one: restrict set chain id content ....... none
x ........... "no_cids" ............. no atoms with a '-c' chain id
y ........... "one_or_more_cids" .... 1+ atoms with a '-c' chain id
z ........... "only_cids" ........... only atoms with a '-c' chain id
-f <suboptions> .. --filter=<option> ...... select output filters ............... 6 or "any"
[4|5|6]{1} pick one: filter set output by atom type ...... 6 or "any"
4 ........... "output_only_atoms" ... output only ATOMs
5 ........... "output_only_hetatms" . output only HETATMs
6 ........... "output_any_atom_type" output any ATOMs and/or HETATMs
[r|t]? pick none or one: filter set output by chain id ....... none
r ........... "output_only_cid" ..... output only atoms w/a '-c' chain id
t ........... "output_no_cid" ....... output only atoms w/no '-c' chain ids
-c <chainids> .... --chainids=<chainids> .. for '-r', '-f', '--restrict_set=' ... none
-m <suboptions> .. --make_ww=<opt> ........ outputs additional "_ww" files ...... none
[1-5r]+ ....... pick one or more:
1 ........... "charged" ............. make "_ww_charged" file
2 ........... "no_mainchain" ........ make "_ww_no_mainchain" file
3 ........... "charged_non_mainchain" make "_ww_charged_non_mainchain" file
4 ........... "helices_and_beta" .... make "_ww_helices_and_beta" file
5 ........... "non_mainchain_carbon" make "_ww_non_mainchain_carbon" file
12345 ....... "all" ................. make all the above additional files
r ........... "residue" ............. make residue copies of "_ww" files
Options for "_sets" output:
-q <suboptions> .. --column_order=<opt> ... column sort list by col.# or name ... 0 or "dist"
[[0-9]+|n] ..... pick the character string "none" or one or more:
0 ........... "dist" ................ distance
1 ........... "atype" ............... atom type
2 ........... "aname" ............... atom name
3 ........... "anum" ................ atom number
4 ........... "chain" ............... chain id
5 ........... "rname" ............... residue name
6 ........... "rnum" ................ residue number
7 ........... "x" ................... x coordinate
8 ........... "y" ................... y coordinate
9 ........... "z" ................... z coordinate
n ........... "none" ................ NO SORTING
Other options:
-h ............... --help ................. print more help (Enter 'wwavePDB -h' for help.)
<NO OPTIONS> .............................. shorter option synopsis (Just enter 'wwavePDB'.)
--license .............. prints license terms for wwavePDB.
DESCRIPTION
wwavePDB identifies atoms in a PDB file within a distance range of other atoms. Atom
output may be further restricted by set properties (heterogeneity of chain identifiers,
chain identifier content, heterogeneity of atom type) and individual atom properties
(atom type, atom chain, structural connectivity). Output is atom-based and is written
as PDB output to new PDB files.
wwavePDB is a program with a unix command line interface. Input is a file in pdb format.
The structural output files, containing the atoms that satisfy the specified conditions,
are also in the PDB format, and by default are named with the suffix "_ww". "_ww" files
are subsets of the original input pdb file, and may be viewed with existing PDB display
programs. It may be useful to display the original pdb input file with overlaid output
"_ww" files that are created with different distance ranges. By default, only the "_ww"
file is output.
An informational file showing the atom sets may be output; this file is suffixed "_sets"
by default. The "_sets" file lists each atom separately with its related atoms that
satisfy the range and other specified set restriction and output requirements. The
"_sets" file is not in the PDB format. The "_sets" file is not output by default.
The 'OPTIONS' section below describes how to control an individual execution of wwavePDB;
this control is further broken down into subsections that reflect the basic functioning
of wwavePDB:
- INPUT SPECIFICATION
(specification of input atoms: PDB file, MODEL, whether water and hydrogens are read)
- DISTANCE RANGE AND REFERENCE ATOM TYPE SPECIFICATION
(specification of the range allowed between atoms to create the initial sets;
specification of the required atom type of each set’s ‘reference atom’.)
- SET RESTRICTION SPECIFICATION
(further selection of sets by the set properties: heterogeneity of chain identifiers,
chain identifier content, and heterogeneity of atom type)
- ATOM FILTERING SPECIFICATION
(selection of atoms to be output from the restricted sets: either all atoms or
by the atom properties atom type and atom chain)
- ALTERNATIVE "_WW" FILES SPECIFICATION: CHARGE AND STRUCTURAL RESTRICTIONS
(selection of additional output files with further structural restrictions: charged,
no main chain, charged non main chain, helices and beta, non main chain carbon;
and a separate option to make full residue copies of all structural files)
- INFORMATIONAL "_SETS" FILE SPECIFICATION
(optional informational output file with set information; column sort options)
- OUTPUT SPECIFICATION
(specification of output: location, file names, directory paths)
OPTIONS
SINGLE CHARACTER OPTIONS VS. KEYWORD OPTIONS
Single character options and keyword options are used on the command line to specify the
execution of the program: input, output, and the control of the data. Single character
options, and associated option values, are terse. Keyword options, and associated option
values, are descriptive.
Single character options start with a single hyphen (e.g., '-h'). If a single character
option has an associated value, a space must separate the option from the value (e.g.,
'-o run1').
Keyword options start with double hyphens (e.g., '--help'). If a keyword option has an
associated value, an '=' must immediately follow the keyword and be immediately followed
by the value; there are no included spaces (e.g., '--output_prefix=run1'). Most wwavePDB
keyword options can be shortened:
Keyword_Option_________Shortened_Keyword_Options_________
--chainids= .......... --chains ............... --cids=
--column_order= ...... --order=
--inputfn= ........... --input= ............... --in=
--make_ww= ........... --makeww= .............. --make=
--output_prefix= ..... --output= .............. --out=
--read_hydrogen= ..... --hydrogen= ............ --readh
--read_water= ........ --water= ............... --readw
--restrict_set= ...... --restrict=
--wwfn= .............. --ww=
Single character options and keyword options may be used together on the same command
line as long as they do not conflict with each other; wwavePDB will object with an
error if they conflict. All options accept terse or verbose option values (e.g., 'c'
or "atom_and_hetatm"). Options that have multiple option values (e.g., '--column_order='
('-q') or '--make_ww=' ('-m')) may have these option values be given as a list, with
either no intervening spaces (e.g., '-q 240') or separated by a non-alphanumeric
character (e.g., '--column_order=aname,chain,dist').
INPUT SPECIFICATION
Character_Option_____Keyword_Option_________Description___________________________Default____
-i <filename> .... --inputfn=<filename> ... input pdb filename .................. required
-n # ............. --model=# .............. MODEL # within PDB to process ....... first model
-x ............... --read_water ........... include water atoms ................. no waters
-y ............... --read_hydrogen ........ include hydrogen atoms .............. no hydrogen
The input PDB file must be specified with the '--inputfn=' ('-i') option. If the input
PDB file has no MODEL records, then all atoms will be processed; if MODEL records exist,
the model specified with the '--model=' ('-n') option will be processed. If no model
is specified, then only the atoms in the first model will be processed. The option
'--read_water' ('-x') specifies that water atoms are to be included in pdb atom input.
The option '--read_hydrogen' ('-y') specifies that hydrogen atoms are to be included in
pdb atom input. By default, no water or hydrogen atoms are read in as pdb atom input.
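The input rules above can be illustrated with a short Python sketch. This is not wwavePDB's own code: the names 'pdb_line' and 'read_atoms' are hypothetical, and two details are assumptions made for the illustration, namely that waters are residues named HOH and that hydrogens carry the element symbol H in columns 77-78 of the PDB record.

```python
def pdb_line(record, serial, name, resname, chain, resseq, x, y, z, element):
    """Build one fixed-column ATOM/HETATM line (helper for this demo only)."""
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}{'':4}"
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10}{element:>2}")


def read_atoms(pdb_text, model=None, read_water=False, read_hydrogen=False):
    """Return the ATOM/HETATM lines selected by the input options (sketch)."""
    atoms = []
    current_model = None   # serial of the MODEL block being read, if any
    first_model = None     # serial of the first MODEL record in the file
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current_model = int(line[6:].split()[0])
            if first_model is None:
                first_model = current_model
            continue
        if record == "ENDMDL":
            current_model = None
            continue
        if record not in ("ATOM", "HETATM"):
            continue
        # With MODEL records, keep only the requested (or the first) model.
        if current_model is not None:
            wanted = first_model if model is None else model
            if current_model != wanted:
                continue
        # Assumption: water is identified by residue name HOH (columns 18-20).
        if not read_water and line[17:20].strip() == "HOH":
            continue
        # Assumption: hydrogen is identified by element symbol (columns 77-78).
        if not read_hydrogen and line[76:78].strip().upper() == "H":
            continue
        atoms.append(line)
    return atoms


sample = "\n".join([
    "MODEL        1",
    pdb_line("ATOM", 1, "CA", "ALA", "A", 1, 0, 0, 0, "C"),
    pdb_line("ATOM", 2, "H", "ALA", "A", 1, 1, 0, 0, "H"),
    pdb_line("HETATM", 3, "O", "HOH", "A", 2, 2, 0, 0, "O"),
    "ENDMDL",
    "MODEL        2",
    pdb_line("ATOM", 1, "CA", "ALA", "A", 1, 5, 0, 0, "C"),
    "ENDMDL",
])
print(len(read_atoms(sample)))
print(len(read_atoms(sample, read_water=True, read_hydrogen=True)))
```

In this sketch the defaults mirror the documented defaults: first model only, no waters, no hydrogens.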
OUTPUT SPECIFICATION
Character_Option_____Keyword_Option_________Description___________________________Default____
-o <prefix> ...... --output_prefix=<str> .. output file prefix .................. none
-w <filename> .... --wwfn=<filename> ...... complete filename for "_ww" file .... stdout
-s [<filename>] .. --sets ................. output "_sets" file, opt. w/name .... no output
--setsfn=<filename> .... output "_sets" file, opt. w/name .... no output
-l ............... --log .................. output execution summary to stderr .. no log
The option '--output_prefix=' ('-o') specifies an output filename prefix to be used for
the output files. If the '--output_prefix=' ('-o') option is used, output filenames will
start with "user specified prefix" and end with: "_sets" or "_ww". Different suffixes
will automatically be added reflecting "alternative structure" file content (e.g.,
"_non_mainchain_carbon").
The '--setsfn=' ('-s') and '--wwfn=' ('-w') options override the '--output_prefix=' ('-o')
option and specify complete names for the "_sets" and "_ww" output files, respectively.
If neither option '--output_prefix=' ('-o') nor option '--wwfn=' ('-w') is used, the "_ww"
file will be written to stdout. ('stdout' and 'stderr' are specifications for unix file
pointers that are normally output to your screen, but may be redirected.) If the option
'--sets' ('-s') is specified without a filename, and the option '--output_prefix=' ('-o')
is not used, then the "_sets" file will be output to stdout. wwavePDB treats a request
to write both the "_sets" and "_ww" files to stdout at the same time as an error.
Errors and warnings go to stderr. stderr may be redirected to a new file by appending
' 2>filename' to your wwavePDB command line; this will overwrite the previously existing
file. stderr may be redirected to be appended to a possibly existing file, or to write
a new file if the file does not exist, by appending ' 2>>filename' to your wwavePDB command.
The option '--log' ('-l') specifies that an execution summary should be output to stderr.
A log file, containing all wwavePDB execution summaries and any existing errors and
warnings, may be made by using the '--log' ('-l') option and by appending ' 2>>filename'
to your wwavePDB command. (But change 'filename', above, to the filename of your log file.
Make sure to have a space separating your wwavePDB command arguments from the stderr
redirection.)
Output file locations specified using options '--output_prefix=' ('-o'), '--setsfn=' ('-s'),
or '--wwfn=' ('-w') may include directory paths; files do not have to be in the immediate
directory. Specified directories will be created if they do not already exist, assuming
appropriate user ownership and permissions. (So a specific directory may be specified for
program wwavePDB output by using the option '--output_prefix=<path>'; e.g., '-o N6/1w1x'
directs program wwavePDB output to be prefaced '1w1x' and placed in a directory named 'N6').
NOTE: All output files will overwrite identically named existing files.
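The naming scheme described above can be summarized with a small Python sketch. The function 'output_names' is hypothetical and only models the names, not the program. The '.txt' and '.pdb' extensions and the "_allres" suffix are taken from the file names shown in the EXAMPLES section of this help text; the alternative suffixes are the '--make_ww=' option values.

```python
# '--make_ww=' option values, in the order listed in this help text.
MAKE_WW_KINDS = [
    "charged",
    "no_mainchain",
    "charged_non_mainchain",
    "helices_and_beta",
    "non_mainchain_carbon",
]


def output_names(prefix, sets=False, make_ww=(), residue=False):
    """List the output file names expected for a given prefix (sketch)."""
    names = []
    if sets:
        names.append(f"{prefix}_sets.txt")
    # The plain "_ww" file, then one file per requested alternative.
    stems = [f"{prefix}_ww"] + [f"{prefix}_ww_{kind}" for kind in make_ww]
    for stem in stems:
        names.append(f"{stem}.pdb")
        if residue:
            # "full residue" copy of the same file
            names.append(f"{stem}_allres.pdb")
    return names


for name in output_names("1aiy/1AIY", sets=True, make_ww=MAKE_WW_KINDS, residue=True):
    print(name)
```

With every alternative and the residue copies requested, the sketch lists one "_sets" file and twelve "_ww" files.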
DISTANCE RANGE SPECIFICATION
Character_Option_____Keyword_Option_________Description___________________________Default____
-a # ............. --min=# ................ minimum distance between atoms ...... 0.0
-z # ............. --max=# ................ maximum distance between atoms ...... infinity
The keyword options '--min=' ('-a') and '--max=' ('-z') specify the required distance
range as restricted minimum and maximum distances between atoms. An associated numeric
value (either integer or real) is required for these keywords. If '--min=#' ('-a') is
not specified then no minimum is required; the default value of '0.0' will be used.
If '--max=#' ('-z') is not specified then no maximum is required; the default value of
infinity will be used. While all atom distances are allowed by default, useful output
requires specifying this distance range.
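As an illustration of the distance range, the following Python sketch collects, for each reference atom, the other atoms whose distance falls inside the range. It is a sketch under assumptions, not the program's algorithm: the name 'initial_sets' is hypothetical, both bounds are treated as inclusive, a reference atom is not counted as its own neighbor, and all pairs are compared directly.

```python
import math


def initial_sets(atoms, min_dist=0.0, max_dist=math.inf):
    """Map each reference atom index to the indices of atoms in range.

    atoms is a list of (label, (x, y, z)) tuples.
    """
    sets = {}
    for i, (_, ref_xyz) in enumerate(atoms):
        members = [
            j
            for j, (_, xyz) in enumerate(atoms)
            if j != i and min_dist <= math.dist(ref_xyz, xyz) <= max_dist
        ]
        if members:  # a reference atom with nothing in range forms no set
            sets[i] = members
    return sets


atoms = [("A", (0.0, 0.0, 0.0)), ("B", (3.0, 0.0, 0.0)), ("C", (10.0, 0.0, 0.0))]
print(initial_sets(atoms, min_dist=2.5, max_dist=5.0))
print(initial_sets(atoms))
```

The second call uses the default range of 0.0 to infinity, so every atom is in range of every other atom; this is why a useful run narrows the range.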
SET RESTRICTION SPECIFICATION
Character_Option_____Keyword_Option_________Description___________________________Default____
-r <suboptions> .. --restrict_set=<opt> ... set restrictions .................... 3 or "any"
[1|2|3]{1} pick one: restrict set chain id content ....... 3 or "any"
1 ........... "one_chain" ........... only homogeneous sets of chain ids
2 ........... "two_or_more_chains" .. only heterogeneous sets of chain ids
3 ........... "any_chains" .......... heterogeneous and/or homogeneous sets
[a|b|c]? pick none or one: restrict set atom content ........... none
a ........... "only_atoms" .......... only ATOMs
b ........... "only_hetatms" ........ only HETATMs
c ........... "atom_and_hetatm" ..... 1+ ATOMs and also 1+ HETATMs
[g|h|i]? pick one: restrict set reference atom type .... i or "ref_any_type"
g ........... "only_ref_atom" ....... only ATOMs
h ........... "only_ref_hetatm" ..... only HETATMs
i ........... "ref_any_type" ........ ATOMs and HETATMs
[w]? pick none or one: restrict set chain id content ....... none
w ........... "includes_all_cids" ... subset of atoms w/all '-c' chain ids
[x|y|z]? pick none or one: restrict set chain id content ....... none
x ........... "no_cids" ............. no atoms with a '-c' chain id
y ........... "one_or_more_cids" .... 1+ atoms with a '-c' chain id
z ........... "only_cids" ........... only atoms with a '-c' chain id
-c <chainids> .... --chainids=<chainids> .. for '-r', '-f', '--restrict_set=' ... none
Each ‘reference’ atom and its associated atoms within the specified distance range define
an initial set. These initial sets are solely based on distance. By default, i.e. if no
restrictive options other than '--min=' ('-a') and '--max=' ('-z') are used, the atoms in
these sets will be output to the "_ww" file. The keyword option '--restrict_set=' ('-r')
specifies further restriction of these initial "distance specified" sets; these initial
sets may be culled by a restriction of the reference atom type or by the set properties:
heterogeneity of chain identifiers, chain identifier content, and heterogeneity of atom type.
Restricting the Atom Type of the Reference Atom
By default, the reference atom (i.e., the atom in each set from which other atoms are measured)
may have an atom type of either 'ATOM' or 'HETATM'; this is the property defined by the
'--restrict_set=' ('-r') option value "ref_any_type" ("i"). The '--restrict_set=' ('-r')
option value "only_ref_atom" ("g") restricts reference atoms to those with the atom type 'ATOM'.
The '--restrict_set=' ('-r') option value "only_ref_hetatm" ("h") restricts reference atoms
to those with the atom type 'HETATM'. One, and only one, of these options must be selected
(if only by default).
Heterogeneity of Chain Identifiers
By default, there are no restrictions on sets based upon chain identifiers (i.e., either
all atoms in a set may have the same chain identifier or a set may have atoms with
different chain identifiers); this is the property defined by the '--restrict_set=' ('-r')
option value “any_chains” (“3”). The '--restrict_set=' ('-r') option value “one_chain”
(“1”) restricts sets to those sets that only have atoms with the same chain identifier.
The '--restrict_set=' ('-r') option value “two_or_more_chains” (“2”) restricts sets to
those sets that have at least two atoms with different chain identifiers. One, and only
one, of these options must be selected (if only by default).
Heterogeneity of Atom Type
By default, sets may have any number of atoms with either an ATOM or HETATM atom type.
Set restriction by atom type may be specified with the '--restrict_set=' ('-r') option
values: “only_atoms” (“a”) restricts sets to those sets with only ATOMs (and no HETATMs),
“only_hetatms” (“b”) restricts sets to those sets with only HETATMs (and no ATOMs),
“atom_and_hetatm” (“c”) restricts sets to those sets that have at least one ATOM and
also at least one HETATM. One or none of these options may be selected.
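The chain identifier and atom type restrictions described so far can be read as simple tests on a set. The Python sketch below expresses them that way; the function names are hypothetical and the code is an interpretation of the descriptions above, not the program's implementation.

```python
def passes_chain_heterogeneity(chains, option="any_chains"):
    """chains: chain ids of the atoms in one set."""
    distinct = set(chains)
    if option == "one_chain":            # '-r 1'
        return len(distinct) == 1
    if option == "two_or_more_chains":   # '-r 2'
        return len(distinct) >= 2
    if option == "any_chains":           # '-r 3' (default)
        return True
    raise ValueError(f"unknown option: {option}")


def passes_atom_content(types, option=None):
    """types: 'ATOM' or 'HETATM' for each atom in one set."""
    has_atom = "ATOM" in types
    has_hetatm = "HETATM" in types
    if option is None:                   # no restriction (default)
        return True
    if option == "only_atoms":           # '-r a'
        return has_atom and not has_hetatm
    if option == "only_hetatms":         # '-r b'
        return has_hetatm and not has_atom
    if option == "atom_and_hetatm":      # '-r c'
        return has_atom and has_hetatm
    raise ValueError(f"unknown option: {option}")


print(passes_chain_heterogeneity(["A", "B"], "two_or_more_chains"))
print(passes_atom_content(["ATOM", "ATOM"], "atom_and_hetatm"))
```

A set is kept only if it passes every restriction that was requested.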
Specifying a List of Chain Identifiers
The option '--chainids=' ('-c') requires as an option value a list of chain identifiers.
This list of chain identifier characters should be consecutively listed (e.g., '-c ABC').
If characters other than alphanumerics are used as chain identifiers, the chain id
characters may be enclosed in single quotes (e.g., ' ABC' for chain identifiers ' ',
'A', 'B', and 'C'.) A backslash '\' may be used to quote a single quote (e.g., '\'')
or a backslash (e.g., '\\') should such a character be used as a chain identifier.
This list of chain identifiers is used with the option '--restrict_set=' ('-r') and
the option '--filter=' ('-f').
Chain Identifier Content
By default, sets may have atoms with any chain identifier. Set restriction by chain
identifier content may be specified with specific chain identifiers listed as the
'--chainids=' ('-c') option value and with one of the following '--restrict_set=' ('-r')
option values: “no_cids” (“x”) restricts sets to those sets that contain no atoms having
chain identifiers in the chain identifier list, “one_or_more_cids” (“y”) restricts sets
to those sets that contain one or more atoms having chain identifiers in the chain
identifier list, “only_cids” (“z”) restricts sets to those sets that contain only atoms
having chain identifiers from the chain identifier list. One or none of these options
may be selected.
The '--restrict_set=' ('-r') option value “includes_all_cids” (“w”) further restricts
sets to those sets that contain a subset of atoms with all of the chain identifiers
listed with option '--chainids=' ('-c').
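The chain identifier content restrictions can be sketched in the same way. In the Python sketch below the function name 'passes_cid_content' is hypothetical, 'cids' stands for the '--chainids=' list given as a string of single-character identifiers, and the reading of "includes_all_cids" (every listed identifier occurs at least once in the set) is an interpretation of the description above.

```python
def passes_cid_content(chains, cids, option=None, includes_all=False):
    """chains: chain ids of the atoms in one set; cids: e.g. 'AB'."""
    listed = set(cids)
    in_list = [chain in listed for chain in chains]
    if option == "no_cids" and any(in_list):               # '-r x'
        return False
    if option == "one_or_more_cids" and not any(in_list):  # '-r y'
        return False
    if option == "only_cids" and not all(in_list):         # '-r z'
        return False
    # '-r w': every listed chain id must be represented in the set.
    if includes_all and not listed <= set(chains):
        return False
    return True


print(passes_cid_content(["A", "B", "C"], "AB", option="one_or_more_cids"))
print(passes_cid_content(["A", "C"], "AB", includes_all=True))
```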
ATOM FILTERING SPECIFICATION
Character_Option_____Keyword_Option_________Description___________________________Default____
-f <suboptions> .. --filter=<option> ...... select output filters ............... 6 or "any"
[4|5|6]{1} pick one: filter set output by atom type ...... 6 or "any"
4 ........... "output_only_atoms" ... output only ATOMs
5 ........... "output_only_hetatms" . output only HETATMs
6 ........... "output_any_atom_type" output any ATOMs and/or HETATMs
[r|t]? pick none or one: filter set output by chain id ....... none
r ........... "output_only_cid" ..... output only atoms w/a '-c' chain id
t ........... "output_no_cid" ....... output only atoms w/no '-c' chain ids
Options have been described above to define sets based upon distance range and to restrict
these sets by set properties. If no further options are used, the atoms in these sets will
be output to the "_ww" file. The keyword option '--filter=' ('-f') specifies which atoms of
these sets will be output to the "_ww" file by using filters based upon the atom properties
atom type and atom chain.
Filter by Atom Type
By default, all ATOMs or HETATMs of the sets satisfying the distance and the set restriction
options will be output to the "_ww" file; this is the property defined by the '--filter='
('-f') option value “output_any_atom_type” (“6”). The '--filter=' ('-f') option value
“output_only_atoms” (“4”) restricts atom output to only atoms with the atom type 'ATOM'.
The '--filter=' ('-f') option value “output_only_hetatms” (“5”) restricts atom output to
only atoms with the atom type 'HETATM'. One, and only one, of these options must be
selected (if only by default).
Filter by Chain Identifier
By default, all atoms, with any chain identifier, of the sets satisfying the distance and
the set restriction options will be output to the "_ww" file. Atom output restriction by
chain identifier content may be specified with specific chain identifiers listed as the
'--chainids=' ('-c') option value and with either of the following '--filter=' ('-f')
option values: “output_only_cid” (“r”) restricts atom output to only those atoms having
chain identifiers specified in the chain identifier list, “output_no_cid” (“t”) restricts
atom output to only those atoms NOT having chain identifiers specified in the chain
identifier list. One or none of these options may be selected.
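Unlike the set restrictions, the filters act on individual atoms. A minimal Python sketch of the two filters follows; 'filter_atoms' and the (type, chain, name) tuple layout are hypothetical and chosen only for the illustration.

```python
def filter_atoms(atoms, type_filter="output_any_atom_type", cid_filter=None, cids=""):
    """Return the atoms that survive the output filters.

    atoms is a list of (record_type, chain_id, atom_name) tuples.
    """
    listed = set(cids)
    kept = []
    for atom in atoms:
        record_type, chain, _ = atom
        if type_filter == "output_only_atoms" and record_type != "ATOM":      # '-f 4'
            continue
        if type_filter == "output_only_hetatms" and record_type != "HETATM":  # '-f 5'
            continue
        if cid_filter == "output_only_cid" and chain not in listed:           # '-f r'
            continue
        if cid_filter == "output_no_cid" and chain in listed:                 # '-f t'
            continue
        kept.append(atom)
    return kept


set_atoms = [("ATOM", "A", "CA"), ("HETATM", "A", "O"), ("ATOM", "B", "N")]
print(filter_atoms(set_atoms, type_filter="output_only_hetatms"))
print(filter_atoms(set_atoms, cid_filter="output_only_cid", cids="B"))
```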
ALTERNATIVE "_WW" FILES SPECIFICATION: CHARGE AND STRUCTURAL RESTRICTIONS
Character_Option_____Keyword_Option_________Description___________________________Default____
-m <suboptions> .. --make_ww=<opt> ........ outputs additional "_ww" files ...... none
[1-5r]+ ....... pick one or more:
1 ........... "charged" ............. make "_ww_charged" file
2 ........... "no_mainchain" ........ make "_ww_no_mainchain" file
3 ........... "charged_non_mainchain" make "_ww_charged_non_mainchain" file
4 ........... "helices_and_beta" .... make "_ww_helices_and_beta" file
5 ........... "non_mainchain_carbon" make "_ww_non_mainchain_carbon" file
12345 ....... "all" ................. make all the above additional files
r ........... "residue" ............. make residue copies of "_ww" files
Options have been described above to define sets based upon distance range, to optionally
restrict these sets by set properties, and to optionally further restrict the atoms to be
output by atom properties. A file with the suffix "_ww" will be written with these atoms
in PDB format.
Additional "_ww" files can be created that have further structural or electrostatic
restrictions to the atom output. The option '--make_ww=' ('-m') creates these additional
files with the following option values, named by atom output restriction: “charged” (“1”),
“no_mainchain” (“2”), “charged_non_mainchain” (“3”), “helices_and_beta” (“4”), and
“non_mainchain_carbon” (“5”). These additional "_ww" files will be named with suffixes of
"_ww_" appended by the option value (e.g., "_ww_charged"). Note that '--make_ww=all' may
be used to make all of these alternative files without requiring individual specification.
Using the keyword option/value pair '--make_ww=residue' or the terse option '-m r' will
create "full residue" copies of all output "_ww" files. These files will have "_allres"
further added to the filename suffix. The entire residue of every atom output to the
original version of the specific "_ww" file will be output to the "full residue" version
of these "_ww" files.
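The "full residue" copy can be pictured as follows: every residue that contributes at least one atom to a "_ww" file is written out whole. The Python sketch below shows that idea; 'expand_to_residues' is a hypothetical name, and identifying a residue by chain id, residue number, and residue name is an assumption of the sketch.

```python
def expand_to_residues(all_atoms, selected):
    """Return every input atom that belongs to a residue touched by 'selected'.

    Atoms are (chain_id, residue_number, residue_name, atom_name) tuples.
    """
    touched = {(chain, rnum, rname) for chain, rnum, rname, _ in selected}
    return [atom for atom in all_atoms if atom[:3] in touched]


all_atoms = [
    ("A", 1, "ALA", "N"),
    ("A", 1, "ALA", "CA"),
    ("A", 2, "GLY", "N"),
    ("B", 1, "ALA", "N"),
]
selected = [("A", 1, "ALA", "CA")]
print(expand_to_residues(all_atoms, selected))
```

Here a single selected atom brings in both atoms of residue ALA 1 on chain A, while the identically numbered residue on chain B is left out.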
INFORMATIONAL "_SETS" FILE SPECIFICATION
Character_Option_____Keyword_Option_________Description___________________________Default____
-q <suboptions> .. --column_order=<opt> ... column sort list by col.# or name ... 0 or "dist"
[[0-9]+|n] ..... pick the character string "none" or one or more:
0 ........... "dist" ................ distance
1 ........... "atype" ............... atom type
2 ........... "aname" ............... atom name
3 ........... "anum" ................ atom number
4 ........... "chain" ............... chain id
5 ........... "rname" ............... residue name
6 ........... "rnum" ................ residue number
7 ........... "x" ................... x coordinate
8 ........... "y" ................... y coordinate
9 ........... "z" ................... z coordinate
n ........... "none" ................ NO SORTING
The option '--sets' ('-s') creates an optional informational file, suffixed "_sets", that
lists sets of atoms that match the execution restrictions. These sets consist of one line
for each atom in that set, each atom line having 10 columns. These lines may be sorted by
the column order specified as a list with the option '--column_order=<column_list>' ('-q').
Columns in the column list may be either specified as zero-ordered column numbers (0 - 9)
or as short descriptors ("dist", "atype", "aname", "anum", "chain", "rname", "rnum", "x",
"y", or "z"). The first listed column specification in the '--column_order=' ('-q') list
will be the primary column that will be sorted. If there are atom lines in a set that have
identical values in that primary column, and a second column specification is listed in the
'--column_order=' ('-q') list, then the second column will be used as a secondary sort,
and so on for further column specifications. The column specification list may have no
delimiters (e.g., '-q 463') or it may be delimited by a non-alphanumeric character (e.g.,
'--column_order=chain,rnum,anum'). The first column, distance, is sorted upon by default;
this is equivalent to '--column_order=0'. To specify that no sorting is to be done, use
the sort descriptor "none" or the letter 'n'.
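The primary and secondary sort described above behaves like a sort on a tuple of column values. The Python sketch below parses a column list in either form and sorts the atom lines of one set accordingly. The function names and the row layout (a 10-item tuple in the column order listed above) are hypothetical, and the sketch assumes ascending order.

```python
import re

COLUMNS = ["dist", "atype", "aname", "anum", "chain", "rname", "rnum", "x", "y", "z"]


def parse_column_order(spec):
    """Turn '463' or 'chain,rnum,anum' into a list of column indices."""
    if spec in ("n", "none"):
        return []
    order = []
    for token in re.split(r"[^0-9A-Za-z]+", spec):
        if not token:
            continue
        if token.isdigit():
            order.extend(int(ch) for ch in token)  # each digit is one column
        else:
            order.append(COLUMNS.index(token))     # ValueError if unknown
    return order


def sort_set(rows, spec="0"):
    """Sort atom lines by the listed columns; later columns break ties."""
    order = parse_column_order(spec)
    if not order:
        return list(rows)
    return sorted(rows, key=lambda row: tuple(row[i] for i in order))


rows = [
    (3.2, "ATOM", "CA", 10, "B", "ALA", 5, 0.0, 0.0, 0.0),
    (1.5, "HETATM", "O", 99, "A", "HOH", 7, 0.0, 0.0, 0.0),
    (3.2, "ATOM", "N", 9, "A", "GLY", 4, 0.0, 0.0, 0.0),
]
print([row[3] for row in sort_set(rows)])
print([row[3] for row in sort_set(rows, "chain,rnum,anum")])
```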
HELP
Character_Option_____Keyword_Option_________Description___________________________Default____
-h ............... --help ................. print more help (Enter 'wwavePDB -h' for help.)
<NO OPTIONS> .............................. shorter option synopsis (Just enter 'wwavePDB'.)
USE
While an individual execution of wwavePDB is simple, the use of wwavePDB may include
running wwavePDB multiple times on an individual PDB file, running wwavePDB on several
PDB files, and also combining wwavePDB output as input for additional wwavePDB executions.
Creating multiple "_ww" files with different distance ranges may be helpful in identifying
sets of atoms associated with particular molecular functions and in identifying common
spatial occupancy. Options '--min=' ('-a') and '--max=' ('-z') set the required
distance range between atoms. For example, run wwavePDB with the '--max=' ('-z') values:
'7.0', '6.5', '6.0', '5.0', '4.0', '3.5', '3.0', and '2.5'. A maximum around 3.0 to 4.0
may be useful for identifying contact interface atoms. Distributed features within a
structure are not necessarily restricted to a local cluster of atoms. For identifying
distributed features whose common spatial occupancy is found in two different structures,
the upper limit for a distance range would be the maximum distance between any of the atoms
of the smaller of the structures being examined. Compare the results at different ranges!
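One way to organize such a series of runs is to generate one command line per maximum distance. The Python sketch below only builds and prints the commands; it uses the '-i', '-o', and '-z' options documented above, while the function name 'sweep_commands' and the output prefix pattern '<prefix>_max<value>' are hypothetical choices for the illustration.

```python
import shlex

# The '--max=' values suggested in the text above.
MAX_VALUES = ["7.0", "6.5", "6.0", "5.0", "4.0", "3.5", "3.0", "2.5"]


def sweep_commands(pdb_file, prefix, max_values=MAX_VALUES):
    """Return one wwavePDB argument list per maximum distance."""
    return [
        ["wwavePDB", "-i", pdb_file, "-o", f"{prefix}_max{value}", "-z", value]
        for value in max_values
    ]


for argv in sweep_commands("1AIY.pdb", "1AIY"):
    print(shlex.join(argv))
```

Each argument list could be passed to 'subprocess.run(argv, check=True)' to execute the run, assuming wwavePDB is installed and on the search path.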
When multiple structures are found that share a common spatial occupancy (i.e., there
exist three or more specific atoms in each structure where the distances between each pair
of specific atoms within each structure match the distances between each corresponding
pair of specific atoms in other structures), then there is the possibility of a shared
structural feature.
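The definition of common spatial occupancy given above can be checked directly by comparing pairwise distances. The Python sketch below does this for two lists of atom coordinates given in corresponding order. The function names are hypothetical, and the tolerance of 0.5 Angstroms is an arbitrary value chosen for the illustration, not one taken from wwavePDB.

```python
import math
from itertools import combinations


def pair_distances(coords):
    """Distances between every pair of points, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def same_spatial_occupancy(coords_a, coords_b, tolerance=0.5):
    """True if corresponding atom pairs are the same distance apart.

    Both lists must name corresponding atoms in the same order and
    contain at least three atoms.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 3:
        return False
    return all(
        abs(da - db) <= tolerance
        for da, db in zip(pair_distances(coords_a), pair_distances(coords_b))
    )


site_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
site_b = [(10.0, 10.0, 10.0), (13.0, 10.0, 10.0), (10.0, 14.0, 10.0)]  # site_a shifted
site_c = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 8.0, 0.0)]
print(same_spatial_occupancy(site_a, site_b))
print(same_spatial_occupancy(site_a, site_c))
```

Because only distances are compared, the test does not depend on where either structure sits in its coordinate frame.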
Existing molecular display programs can be used to visualize the shared common spatial
occupancy. Molecular display programs may have a "pair fitting" command to superpose
structures. Alternatively, the Weininger Works program 'twwistPDB' will map the atom
coordinates of one PDB file to the atom coordinates of a different PDB file given three
specific atoms from each PDB file.
The option '--restrict_set=atom_and_hetatm' ('-r c') specifies that, in each set of atoms
within a range of a selected atom, there must be at least one ATOM and at least one HETATM;
this may be useful if the initial input PDB file contains a substrate, and the resultant
"_ww" file is intended to identify contact interface atoms. Conversely, lack
of the options '--restrict_set=only_atoms' ('-r a'), '--restrict_set=only_hetatms' ('-r b'),
and '--restrict_set=atom_and_hetatm' ('-r c') specifies that selected atom sets, in the
absence of other constraints, may include any number of ATOM or HETATM atoms; this may be
useful when you have no HETATMs present in the original PDB file, and the resultant "_ww"
file is intended to contain conserved structure. Distributed features across multiple
structures may be found by comparing the "_ww" files containing conserved structure.
The option '--chainids=' ('-c') can be used with the '--restrict_set=' ('-r') option and
'--filter=' ('-f') option to further restrict output atom sets based on chain identifiers.
This may be used to identify features related to one chain or to more than one chain. This may
also be used to compare different molecules. The chain identifiers of a PDB file (or of
wwavePDB output) can be easily changed with the Weininger Works program 'chainidPDB'.
EXAMPLES
The following example generates 2 output files, '1AIY_sets.txt' and '1AIY_ww.pdb',
where all output atoms must be within 5.0 Angstroms of another atom.
wwavePDB --inputfn=1AIY.pdb --output_prefix=1AIY --max=5.0 --sets
or
wwavePDB -i 1AIY.pdb -o 1AIY -z 5.0 -s
-i 1AIY.pdb .. --inputfn=1AIY.pdb ............... input is from file '1AIY.pdb'
-o 1AIY ...... --output_prefix=1AIY ............. output will be prepended with '1AIY'
-z 5.0 ....... --max=5.0 ........................ atoms must be within 5.0 A of each other
-s ........... --sets ........................... output '_sets' file
The following example is similar to the above, except that the files will only
include sets of atoms that are a distance of 3.0 to 5.0 Angstroms apart and
where each set also has at least one ATOM and one HETATM.
wwavePDB --inputfn=1AIY.pdb --out_prefix=1AIY --min=3.0 --max=5.0 --sets \
--restrict_set=any_chains --restrict_set=atom_and_hetatm \
--filter=output_any_atom_type
or
wwavePDB -i 1AIY.pdb -o 1AIY -a 3.0 -z 5.0 -s -r 3c -f 6
as above, and:
-a 3.0 ....... --min=3.0 ........................ atoms must be further than 3.0 A apart
-r 3c ........................................... output "_ww" file, where sets have:
3 ......... --restrict_set=any_chains ........ any # of chain identifiers
c ......... --restrict_set=atom_and_hetatm ... 1+ ATOMs and 1+ HETATMs
-f 6 ............................................ output to "_ww" file:
6 ......... --filter=output_any_atom_type .... all ATOMs and HETATMs from matching sets
The following example is similar to the above, except that one "_sets" file and
12 "_ww" files will be output into a new, if not already existing, directory '1aiy':
1AIY_sets.txt,
1AIY_ww.pdb, 1AIY_ww_allres.pdb,
1AIY_ww_charged.pdb, 1AIY_ww_charged_allres.pdb,
1AIY_ww_no_mainchain.pdb, 1AIY_ww_no_mainchain_allres.pdb,
1AIY_ww_charged_non_mainchain.pdb, 1AIY_ww_charged_non_mainchain_allres.pdb,
1AIY_ww_helices_and_beta.pdb, 1AIY_ww_helices_and_beta_allres.pdb,
1AIY_ww_non_mainchain_carbon.pdb, 1AIY_ww_non_mainchain_carbon_allres.pdb.
wwavePDB --inputfn=1AIY.pdb --out_prefix=1aiy/1AIY --min=3.0 --max=5.0 --sets \
--restrict_set=any_chains --restrict_set=atom_and_hetatm \
--filter=output_any_atom_type \
--make_ww=all --make_ww=residue
or
wwavePDB -i 1AIY.pdb -o 1aiy/1AIY -a 3.0 -z 5.0 -s -r 3c -f 6 -m 12345r
as above, and:
-o 1aiy/1AIY . --out_prefix=1aiy/1AIY ........... output to dir. '1aiy' with '1AIY' prefix
-m 12345 ..... --make_ww=all .................... output additional "_ww" files:
1 ............................................ "_ww_charged"
2 ............................................ "_ww_no_mainchain"
3 ............................................ "_ww_charged_non_mainchain"
4 ............................................ "_ww_helices_and_beta", and
5 ............................................ "_ww_non_mainchain_carbon"
r ......... --make_ww=residue ................ make "full residue" copies of "_ww" files
The following example generates the file '1AIY_KL_ww.pdb' containing atoms of
the chains 'K' and 'L' in 1AIY.pdb that are within a distance of 3.3 Angstroms,
and also generates a file '1AIY_KL_sets.txt' containing the sets that fulfill
the specified restrictions as lists ordered primarily by chain identifier,
secondarily by atom name, and lastly by distance.
wwavePDB --inputfn=1AIY.pdb --output_prefix=1AIY_KL \
--restrict_set=any_chains --restrict_set=includes_all_cids \
--filter=output_any_atom_type --filter=output_only_cid --chainids=KL \
--max=3.3 --sets --column_order=chain,aname,dist
or, more concisely (using keyword abbreviations and option value lists)
wwavePDB --in=1AIY.pdb --out=1AIY_KL --restrict=any_chains,includes_all_cids \
--filter=output_any_atom_type,output_only_cid --chains=KL \
--max=3.3 --sets --order=chain,aname,dist
or, more succinctly (as above and leaving out defaults)
wwavePDB --in=1AIY.pdb --out=1AIY_KL --max=3.3 --order=chain,aname,dist \
--restrict=includes_all_cids --filter=output_only_cid --chains=KL
or, more tersely (using single character options)
wwavePDB -i 1AIY.pdb -o 1AIY_KL -r 3w -f 6r -c 'KL' -z 3.3 -s -q 420
-i 1AIY.pdb .. --inputfn=1AIY.pdb ............... input is from file '1AIY.pdb'
-o 1AIY_KL ... --output_prefix=1AIY_KL .......... output will be prepended with '1AIY_KL'
-r 3w ........................................... output "_ww" file, where sets have:
3 ......... --restrict_set=any_chains ........ any # of chain identifiers
w ......... --restrict_set=includes_all_cids . at least one atom with each '-c' chain id
-f 6r ........................................... output to "_ww" file:
6 ......... --filter=output_any_atom_type .... all ATOMs and HETATMs from matching sets
r ......... --filter=output_only_cid ......... atoms restricted to any '-c' chain ids
-c 'KL' ...... --chainids=KL .................... chains 'K', 'L' for use with '-r' and '-f'
-z 3.3 ....... --max=3.3 ........................ atoms must be within 3.3 A of each other
-s ........... --sets ........................... output '_sets' file
-q 420 ....... --column_order=chain,aname,dist .. order '_sets' file:
4 ........... chain .......................... primarily by the 4th column (chain id)
2 ........... aname .......................... secondarily by the 2nd column (atom name)
0 ........... dist ........................... and lastly by the 0th column (distance)
IMPLEMENTATION
A doubly linked list is searched to return all atoms within an exact distance range of another atom.
This linked list is ordered along the Cartesian axis that has the most widely separated extreme points.
One node is allocated for each atom. The doubly linked list is created with a single bucket sort.
Consider the following example data structure:
typedef struct point { // data structure for range searching
double x, y, z; // coordinate values
struct point *orig; // singly linked list of points in original order
struct point *sort;        // singly linked list for bucket sort collisions
struct point *less, *more; // doubly linked list for range searching
} POINT;
POINT **point_array;         // array of pointers for coordinate bucket sort
A separate singly linked list (orig) is set on reading the points.
To efficiently create the doubly linked list:
(i) Translate the coordinates of all points to be positive.
(ii) Allocate memory for a one dimensional array (point_array) of node pointers,
sized to the largest truncated dimension of the newly translated set of points.
(iii) Set the pointers of the array to NULL.
(iv) Fill the array with pointers to the POINT structs by using the truncated coordinates
of the axis (with the largest truncated dimension) as an index. Create singly linked
lists on collisions (using the pointer 'sort' in the point struct.)
(v) Order the 'sort' list at each point_array index as needed.
(vi) Read the point array to fill an ordered doubly linked list (e.g., less_x and more_x).
While some sorting (e.g., quicksort moving linked list pointers) will be required in step (v)
on the bucket sort collisions (i.e., when multiple nodes are assigned to the identical array index),
no sorting is required in step (vi) when moving between the array head pointers.
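The build described above can be sketched in C as follows. This is a minimal illustration, not the wwavePDB source: the function names 'bucket_insert' and 'build_x_list' are hypothetical, the x axis is hard-coded instead of choosing the axis with the largest extent, the translation to positive coordinates is done by subtracting the minimum x, and a simple ordered insertion stands in for the quicksort used on collisions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct point {
    double x, y, z;
    struct point *sort;         /* chain of points sharing one bucket */
    struct point *less, *more;  /* axis-ordered doubly linked list */
} POINT;

/* Insert p into one bucket chain, keeping the chain in ascending x order
 * (assumption: ordered insertion replaces the quicksort on collisions). */
static void bucket_insert(POINT **head, POINT *p)
{
    while (*head && (*head)->x <= p->x)
        head = &(*head)->sort;
    p->sort = *head;
    *head = p;
}

/* Build a doubly linked list ordered along x.
 * Returns the lowest-x node, or NULL if n == 0 or allocation fails. */
POINT *build_x_list(POINT *pts, size_t n)
{
    if (n == 0)
        return NULL;
    assert(pts != NULL);

    double min_x = pts[0].x, max_x = pts[0].x;
    for (size_t i = 1; i < n; i++) {
        if (pts[i].x < min_x) min_x = pts[i].x;
        if (pts[i].x > max_x) max_x = pts[i].x;
    }

    /* One bucket per truncated unit of the translated x range;
     * calloc leaves every head pointer cleared. */
    size_t nbuckets = (size_t)(max_x - min_x) + 1;
    POINT **point_array = calloc(nbuckets, sizeof *point_array);
    if (!point_array)
        return NULL;

    /* Index each point by its truncated, translated x coordinate. */
    for (size_t i = 0; i < n; i++)
        bucket_insert(&point_array[(size_t)(pts[i].x - min_x)], &pts[i]);

    /* Read the buckets in index order to thread less/more pointers. */
    POINT *first = NULL, *prev = NULL;
    for (size_t b = 0; b < nbuckets; b++) {
        for (POINT *p = point_array[b]; p; p = p->sort) {
            p->less = prev;
            p->more = NULL;
            if (prev) prev->more = p; else first = p;
            prev = p;
        }
    }
    free(point_array);
    return first;
}
```

Calling build_x_list(pts, n) on an array of POINT structs returns the node with the smallest x; following the 'more' pointers then visits every point in ascending x order.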
Creating a list of points having the property of being a specific distance range apart from each
other consists of running two linear searches starting at each node in an axis-ordered doubly
linked list. Each of these linear searches can stop once the maximum specified distance between
points (in all dimensions) is exceeded in the dimension of the search of the current linked list.
Searching a single doubly-linked list (ordered in any axis) will provide a complete search.
It is assumed that the axis with the furthest extreme points will produce a search with the
fewest required node distance calculations for a specific range. (While this processing time
is not necessarily shorter, it is more likely the case.)
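As an illustration of the two linear searches, the following C sketch walks outward from one reference node in both directions and stops each walk as soon as the difference along the sorted axis alone exceeds the maximum distance. It is a sketch under stated assumptions, not the wwavePDB source: the names 'in_range', 'record', and 'find_in_range' are hypothetical, the list is assumed to be ordered along x, and both range bounds are treated as inclusive.

```c
#include <assert.h>
#include <stddef.h>

typedef struct point {
    double x, y, z;
    struct point *less, *more;  /* neighbors in ascending-x order */
} POINT;

/* Exact range test on squared distances (bounds assumed inclusive). */
static int in_range(const POINT *a, const POINT *b, double min_d, double max_d)
{
    double dx = a->x - b->x, dy = a->y - b->y, dz = a->z - b->z;
    double d2 = dx * dx + dy * dy + dz * dz;
    return d2 >= min_d * min_d && d2 <= max_d * max_d;
}

/* Store a match if room remains in 'out'; always count it. */
static size_t record(const POINT **out, size_t cap, size_t found, const POINT *p)
{
    if (found < cap)
        out[found] = p;
    return found + 1;
}

/* Collect every point whose distance from 'ref' lies in [min_d, max_d].
 * Returns the number of matches (which may exceed 'cap'). */
size_t find_in_range(const POINT *ref, double min_d, double max_d,
                     const POINT **out, size_t cap)
{
    assert(ref != NULL && min_d >= 0.0 && max_d >= min_d);
    size_t found = 0;

    /* Walk toward smaller x; stop once the x gap alone exceeds max_d. */
    for (const POINT *p = ref->less; p && ref->x - p->x <= max_d; p = p->less)
        if (in_range(ref, p, min_d, max_d))
            found = record(out, cap, found, p);

    /* Walk toward larger x with the same stopping rule. */
    for (const POINT *p = ref->more; p && p->x - ref->x <= max_d; p = p->more)
        if (in_range(ref, p, min_d, max_d))
            found = record(out, cap, found, p);

    return found;
}
```

Running find_in_range once per node yields one set per atom. The early stop is safe because a point whose x difference already exceeds the maximum cannot be within the maximum in three dimensions.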
The construction of the data structure for this algorithm has a worst case scenario of having
all nodes indexing into a single point_array index; the resulting collisions degrade the
build time of the data structure from O(N) to one dependent on the sorting
algorithm used for collisions (e.g., for the quicksort used here, 'O(2N log(N))' for a
random permutation or 'O(N-squared/2)' for a worst case ordered permutation). This worst
case scenario would never be seen with input from a single normal PDB file as atoms in a
single structure cannot all share the same location. This might be possible if the input
PDB file represents atoms from multiple PDB files; but more likely only a few atoms would
share the same location.
This algorithm requires that most of the traditional calculation of a distance, the square
root of the sums of the squares of differences of the points, be performed for every node
examined in these linear searches; note that the final square root can be ignored and the
sum of the squares can be compared.
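The comparison on squared values can be sketched as a small C helper. The type 'XYZ' and the function 'in_range_sq' are hypothetical names used only for illustration, and treating both bounds as inclusive is an assumption.

```c
#include <assert.h>

typedef struct { double x, y, z; } XYZ;

/* Returns 1 when the distance between a and b lies in [min_d, max_d].
 * The bounds are squared once so that sqrt() is never called. */
int in_range_sq(const XYZ *a, const XYZ *b, double min_d, double max_d)
{
    assert(min_d >= 0.0 && max_d >= min_d);
    double dx = a->x - b->x;
    double dy = a->y - b->y;
    double dz = a->z - b->z;
    double d2 = dx * dx + dy * dy + dz * dz;   /* squared distance */
    return d2 >= min_d * min_d && d2 <= max_d * max_d;
}
```

Because both sides of each comparison are non-negative, comparing the squares gives the same answer as comparing the distances themselves.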
This algorithm has a worst case search of having to examine all points to find one single
point satisfying the range criteria; this will happen when all the points are spread out from
the reference point from which the distance range is calculated in the doubly linked list,
along one of the dimensional axes that are not the sorted axis of the doubly linked list being
searched. This is minimized by having the axis of the doubly linked list be the axis with the
most extreme points (in a single axis).
NOTES
Memory issues should not be seen with normal use of wwavePDB on modern computers for
standard pdb files. wwavePDB will fail with an error if a memory problem is encountered.
This version of wwavePDB does not handle multiple "SPLIT" PDB files.
This version of wwavePDB does not perform parallel processing.
This version of wwavePDB handles (or rather mishandles) PDB files with alternate location
indicators ("AltLoc" records) by ignoring, with a warning, any alternate atom or residue
other than the first alternate atom or residue of each alternate set.
This version of wwavePDB does not handle non-standard "Chimera" PDB files with 6 byte
atom serial numbers and 4 byte residue names.
As the defaults for the minimum and maximum ranges between atoms are zero and infinity,
respectively, not specifying both the minimum and maximum range constraints results in
a search that will return all the original atoms of the input file --not that useful
unless other constraints are made. Further requesting the "_sets" file for these range
defaults will create a file for N distance sets with a total of N^2 atoms (i.e., the
distance ranges between every atom and every other atom): a potentially large file.
'wwave', a predecessor program of wwavePDB, identifies atoms in a PDB file within a
"rough maximum distance" of other atoms. 'wwave' uses a different algorithm
(the "Collected Grid Algorithm") for finding an imperfect range of atom distances;
it uses the indexing of truncated coordinates to find both all atoms within a specific
distance and possibly also some atoms of a larger distance (the square root of 3 larger).
'wwave' has an O(N) processing time --regardless of how distributed the search range.
'wwave' scales linearly with the number of atoms. However, unlike wwavePDB, 'wwave' does
NOT identify individual sets of atoms, but instead identifies the superset of all atoms
that match the distance requirements. While 'wwave' and wwavePDB have similar interfaces,
'wwave' does NOT have the option '--restrict_set=' ('-r'). The set handling of wwavePDB
is necessary for solving certain problems. If your problem can be handled by your hardware
and wwavePDB, wwavePDB offers exact range searching, as well as set restrictions and filters
that act upon sets based on individual atoms.
LICENSE INFORMATION
wwavePDB is a software program from Arthur Weininger (www.weiningerworks.com).
wwavePDB is subject to a license; use the keyword option '--license' in order to view
the license terms. Your use of this software constitutes an agreement to the license
terms. Do not use this software if you do not agree to the license terms.
wwavePDB Tutorial
wwavePDB Tutorial Page gives examples of using wwavePDB.