Weininger Works™ - chainidPDB Overview

chainidPDB Overview

chainidPDB is a PDB filter to change one or all chain identifiers.

chainidPDB Help Output (“chainidPDB -h” output)

NAME

  chainidPDB (version 1.1.0) -- PDB filter to change chain identifiers

SYNOPSIS

  chainidPDB [options]

  CHARACTER OPTION_____KEYWORD OPTION________DESCRIPTION_____________________________DEFAULT__________
  -i <filename> .... --input=<filename>  ... input PDB file ........................ stdin
  -a <model#> ...... --model=<model#> ...... MODEL # of input PDB file to process .. first model
  -o <filename> .... --output=<filename> ... output PDB file ....................... stdout
  -f <from_spec> ... --from=<from_spec> .... existing "from" target specification .. none
     <from_spec> is either: <chain_ids>      (simple list of chain ids with no ';' or ',')
                    or:     [[c],[r],[a];]+  (triplet list of chain id, residue #, atom #)
  -t <chain_ids> ... --to=<chain_ids> ...... replacement "to" chain identifiers .... required
  -m <filename> .... --map=<filename> ...... generate file of atom changes ......... no map
  -p ............... --nopositionchange .... do not change position of PDB records . positions changed
  -x ............... --noresiduechange ..... do not change residue serial numbers .. residues  changed

  -h ............... --help ................ prints help (Enter 'chainidPDB -h' for help.)
  <NO OPTIONS> ............................. shorter option synopsis (Enter 'chainidPDB'.)
                     --license ............. prints license terms for chainidPDB.

DESCRIPTION

  chainidPDB is a PDB filter to change one or all chain identifiers.

  chainidPDB reads an existing input PDB file and writes a new output PDB file.
  Records matching a user-supplied target specification will have chain identifiers
  replaced with user-supplied chain identifiers.

  Input is expected to be in PDB format. Option '--input=' ('-i') specifies the input
  filename. If option '--input=' ('-i') is not present, input will be read from stdin.

  By default, the first MODEL (perhaps implicit if no MODEL records exist) from the
  input file is processed. Option '--model=' ('-a') may be used to specify the MODEL
  number of the input file to process.

  Option '--output=' ('-o') specifies the output PDB filename. If option '--output=' ('-o')
  is not present, output will be written to stdout.

  New "REMARK 250" lines will be added to the output PDB specifying execution details.

  By default, residues will be assigned new, unique residue serial numbers, and new chains
  will be output in a new chain order with changed atom records in new positions. By default,
  a PDB file will be output that includes just the header records and specific ATOM, HETATM,
  TER, and CONECT records for a single MODEL.

  [Use of chainidPDB can easily produce invalid chains, which may be intended if
   chainidPDB is being used to create temporary files with the chain identifiers
   representing sets of atoms. Certain molecular visualization programs require unique
   residue serial numbers within chains in order to display these residues and atoms.]

  By default, chainidPDB fixes chains that have residues with duplicate residue
  sequence numbers. Atoms will be written out sequentially by new alphanumeric chain order;
  new atom serial numbers will be assigned as required. The new chain order will consist of
  any specified chains in the following order:

     'A' - 'Z'
     '0' - '9'
     'a' - 'z'
     ' ' (space)
     < all other ASCII values from 1 to 127, inclusive >
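
  The chain order listed above can be expressed as a sort key. The following is a
  minimal Python sketch, not chainidPDB's own code; the name chain_sort_key is
  hypothetical, and ordering the "all other ASCII values" group by character code is
  an assumption.

```python
import string

# Documented order: 'A'-'Z', then '0'-'9', then 'a'-'z', then space.
_ORDERED = string.ascii_uppercase + string.digits + string.ascii_lowercase + " "

def chain_sort_key(chain_id):
    """Return a sort key that places a one-character chain id in the documented order."""
    pos = _ORDERED.find(chain_id)
    if pos >= 0:
        return (0, pos)
    # Assumption: the remaining ASCII values are ordered by character code.
    return (1, ord(chain_id))

chains = ["b", " ", "7", "Z", "#", "A", "a", "0"]
print(sorted(chains, key=chain_sort_key))
# ['A', 'Z', '0', '7', 'a', 'b', ' ', '#']
```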

  Option '--noresiduechange' ('-x') may be used to specify that input residue numbers are to
  remain unchanged; a warning will be generated when ambiguous residue numbers are output.
  When the '--nopositionchange' ('-p') option is used, all PDB file records for the entire
  MODEL being processed will be written to the output PDB file in the identical positions,
  with respect to each other, of the input PDB file.

  The option '--map=' ('-m') may be used to generate a text file listing atom changes.
  No map file is generated by default.

  Any errors and warnings will be written to stderr.

  The option '--to=' ('-t') specifies the new, replacement chain identifiers as a simple
  list of chain identifiers (e.g., "MNOP").

  If the option '--from=' ('-f') is not present, then all ATOM, HETATM, and TER records
  will be changed with each existing PDB chain identifier getting a successive user-
  supplied replacement chain identifier.

  If the option '--from=' ('-f') is present, then only ATOM, HETATM, and TER records
  having content that matches the target specification of the '--from=' ('-f') option
  value will be changed. The '--from=' ('-f') option value is either a simple list of
  chain identifiers ("Simple List Format") or it is a list of triplets that allow chain
  identifiers, residue sequence numbers, and atom serial numbers to be used to specify
  matching record targets ("Triplet List Format").
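
  The three values that a target specification matches against sit in fixed columns
  of a PDB record. The following Python sketch shows where they are read from; the
  column positions come from the PDB file format, not from this help text, and the
  name pdb_fields is hypothetical.

```python
def pdb_fields(record):
    """Return (chain id, residue sequence number, atom serial number) for an
    ATOM or HETATM record. PDB columns (1-based): atom serial number 7-11,
    chain identifier 22, residue sequence number 23-26."""
    serial = int(record[6:11])
    chain = record[21]
    resseq = int(record[22:26])
    return chain, resseq, serial

line = "ATOM    293  O   ILE A  35      52.849  15.000  -5.546  1.00 39.06           O"
print(pdb_fields(line))  # ('A', 35, 293)
```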

  Simple List Format

  The Simple List Format of the '--from=' ('-f') option value is a character string of
  chain identifiers. This character string may not contain semicolons (';') or commas (',')
  unless the semicolons and commas are preceded with a single backslash character ('\');
  semicolons and commas not quoted by a backslash will cause the '--from=' ('-f') option
  value to be interpreted as a Triplet List Format. Examples of the Simple List Format follow:

     Terse option     Keyword option     Description
    --------------   ----------------   -------------------------------------
    -f ABC           --from=ABC         chain IDs 'A','B','C'
    -f 'ABC XYZ'     --from='ABC XYZ'   chain IDs 'A','B','C',' ','X','Y','Z'
    -f "ABC AAA"     --from="ABC AAA"   chain IDs 'A','B','C',' ','A','A','A'

  Note that a single quote (') or a double quote (") may be used to delimit an option value.
  The '--from=' ('-f') option values shown above are quoted in order to specify that a space (' ')
  be used as a chain identifier.

  Note that use of the hyphen character ('-') in the Simple List Format will be interpreted as
  the value of a chain identifier. See “Triplet List Format” below for range specifications.
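
  The rule that separates the two formats can be sketched in Python as follows. This
  is an illustration of the rule as described above, not chainidPDB's parser; the
  function names are hypothetical, and dropping a lone trailing backslash is an
  assumption.

```python
def split_escaped(value):
    """Yield (char, escaped) pairs; a backslash quotes the character after it."""
    it = iter(value)
    for ch in it:
        if ch == "\\":
            nxt = next(it, None)
            if nxt is not None:  # assumption: a lone trailing backslash is dropped
                yield nxt, True
        else:
            yield ch, False

def is_triplet_format(value):
    """True if the value contains a ';' or ',' that is not quoted by a backslash."""
    return any(ch in ";," and not esc for ch, esc in split_escaped(value))

def simple_list_ids(value):
    """Return the chain ids named by a Simple List Format value."""
    return [ch for ch, _ in split_escaped(value)]

print(is_triplet_format("ABC"))      # False
print(is_triplet_format("A;B"))      # True
print(simple_list_ids("ABC XYZ"))    # ['A', 'B', 'C', ' ', 'X', 'Y', 'Z']
```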

  Triplet List Format

  The Triplet List Format of the '--from=' ('-f') option value allows records to be specified
  for chain identifier replacement by presence of chain identifier, residue sequence number,
  and atom serial number. This triplet list is defined as a semicolon-delimited list of
  triplets, where each triplet consists of three comma-delimited fields:

     chain identifier, residue sequence number, atom serial number

  Any of the '--from=' ('-f') triplet fields may optionally have a range specification
  in one of the following formats:

             Format               Examples                 Description
  -----------------------------   --------   --------------------------------------------
  <lower_limit> - <upper_limit>     A-Z       Matches any value in between and including
                                  107-200     lower limit and upper limit.

                - <upper_limit>      -Z       Matches any value lower than and including
                                     -200     upper limit.

  <lower_limit> -                   A-        Matches any value higher than and including
                                  107-        lower limit.
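
  The three range formats above can be sketched in Python as follows. The function
  names are hypothetical. Comparing chain identifiers by character code and ignoring
  negative residue numbers are assumptions; the help text does not say how
  chainidPDB handles either case.

```python
def parse_range(field):
    """Split a triplet field into (lower, upper); None marks an open end.
    A field without '-' is treated as a single value (lower == upper)."""
    if "-" in field:
        lo, hi = field.split("-", 1)
        return (lo or None, hi or None)
    return (field, field)

def in_range(value, bounds, convert=int):
    """Inclusive bounds check; use convert=int for numbers, convert=str for chain ids."""
    lo, hi = bounds
    if lo is not None and value < convert(lo):
        return False
    if hi is not None and value > convert(hi):
        return False
    return True

print(parse_range("107-200"))                    # ('107', '200')
print(parse_range("-200"))                       # (None, '200')
print(parse_range("107-"))                       # ('107', None)
print(in_range("M", parse_range("A-Z"), str))    # True
```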

  Any of the '--from=' ('-f') triplet fields may have a null value (i.e., an empty field).
  Null-valued triplet fields will match any record value of that field type. A PDB record
  has to match all non-null fields in a triplet in order for its chain identifier to be replaced.
  Comma delimiters need only be supplied as needed to specify a field index. Semicolon
  delimiters are required with the exception of the last semicolon, which is optional.

  Examples of the triplet list format follow:

     Terse option           Keyword option            Description
    --------------------   ----------------------    --------------------------------------------
    -f 'A;B;C;'            --from='A;B;C;'           chain IDs (any residue # and any atom #)
    -f 'A;B;C'             --from='A;B;C'            chain IDs (any residue # and any atom #)
    -f 'A;B;C; ;X;Y;Z'     --from='A;B;C; ;X;Y;Z'    chain IDs (any residue # and any atom #)
    -f "A;B;C; ;A;A;A"     --from="A;B;C; ;A;A;A"    chain IDs (any residue # and any atom #)
    -f "A,,;"              --from="A,,;"             chain ID 'A' (any residue # and any atom #)
    -f "A,35,;"            --from="A,35,;"           residue 35 of chain ID 'A' (with any atom #)
    -f "A,35,293;"         --from="A,35,293;"        atom 293 of residue 35 of chain ID 'A'
    -f ",35,;"             --from=",35,;"            residue 35 (any chain ID and any atom #)
    -f ",,293;"            --from=",,293;"           atom 293 (any chain ID and any residue #)
    -f "A,,;,35,;,,293;"   --from="A,,;,35,;,,293;"  three triplets with complete delimiters
    -f "A;,35;,,293"       --from="A;,35;,,293"      three triplets with necessary delimiters
    -f ",-78;,110-;"       --from=",-78;,110-;"      all residue #s except 79 through 109
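
  The delimiter rules above (optional trailing commas, optional final semicolon, null
  fields) can be sketched in Python as follows. This illustrates the documented
  syntax and is not chainidPDB's parser; the name parse_triplets is hypothetical, and
  range fields are returned as unparsed strings.

```python
def parse_triplets(spec):
    """Parse a Triplet List Format value into (chain, residue, atom) tuples.
    Null (empty) fields become None; a backslash quotes the next character."""
    triplets, fields, cur = [], [], []
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "\\" and i + 1 < len(spec):
            cur.append(spec[i + 1])        # escaped character is taken literally
            i += 2
            continue
        if ch == ",":                      # end of a field
            fields.append("".join(cur))
            cur = []
        elif ch == ";":                    # end of a field and of a triplet
            fields.append("".join(cur))
            cur = []
            triplets.append(fields)
            fields = []
        else:
            cur.append(ch)
        i += 1
    if cur or fields:                      # last triplet without a trailing ';'
        fields.append("".join(cur))
        triplets.append(fields)
    # Pad each triplet to three fields; empty fields become None.
    return [tuple((f or None) for f in (t + ["", "", ""])[:3]) for t in triplets]

print(parse_triplets("A;,35;,,293"))
# [('A', None, None), (None, '35', None), (None, None, '293')]
```

  Under these rules the fully delimited value "A,,;,35,;,,293;" parses to the same
  three triplets.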

  Notes for both Simple List Format and Triplet List Format

  The order of '--from=' ('-f') and '--to=' ('-t') target specifications is important in that:
    (i)  PDB records matching '--from=' ('-f') specifications get replacement chain
         identifiers from identically indexed '--to=' ('-t') specifications, and
    (ii) The first matching target specification will be used to change a record when
         there are multiple triplets that specify identical records to be changed;
         PDB records are only modified once.

  If there are more replacement chain identifiers required than there are specified
  '--to=' ('-t') chain identifiers, then the '--to=' ('-t') chain identifiers will be
  reused; a warning will be given (to stderr) when this happens.
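
  The index-based pairing and the reuse of '--to=' identifiers can be sketched in
  Python as follows. The name pair_from_to is hypothetical, and the wrap-around to
  the first identifier is an assumption: the help text says only that identifiers
  "will be reused".

```python
from itertools import cycle

def pair_from_to(from_specs, to_ids):
    """Pair each 'from' entry with the identically indexed 'to' chain id.
    Returns (pairs, reused); reused is True when a warning would be expected.
    Assumption: when the 'to' ids run out, reuse wraps around to the first id."""
    reused = len(from_specs) > len(to_ids)
    pairs = list(zip(from_specs, cycle(to_ids)))
    return pairs, reused

print(pair_from_to("ABCD", "1234"))
# ([('A', '1'), ('B', '2'), ('C', '3'), ('D', '4')], False)
print(pair_from_to("ABC", "XY"))
# ([('A', 'X'), ('B', 'Y'), ('C', 'X')], True)
```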

  Chain identifiers, specified in '--from=' ('-f') or '--to=' ('-t') option values, can be
  any non-NULL character. If semicolons (';'), commas (','), backslashes ('\'), or option
  value quotes (''' or '"') are used as chain identifier values in any '--from=' ('-f') or
  '--to=' ('-t') option values then these characters need to be preceded with a single
  backslash character ('\').

  As the unix shell interprets semicolons (';') as command line separators, command line option
  values containing semicolons (';') are best handled by quoting the entire option value with
  either single quotes (') or double quotes (").
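
  When a chainidPDB command line is assembled by a script, the quoting can be
  generated instead of typed by hand. The following Python sketch uses the standard
  library's shlex.join to produce a shell-safe command string from an argument list.
  It uses only options shown in this help text, the file names are placeholders, and
  the command is printed rather than run.

```python
import shlex

# Argument list for one of the documented invocations; file names are placeholders.
argv = ["chainidPDB", "-i", "infile.pdb", "-o", "temp1.pdb",
        "-f", "A;B;C;D", "-t", "1234"]

# shlex.join quotes any argument containing shell-special characters such as ';'.
print(shlex.join(argv))
# chainidPDB -i infile.pdb -o temp1.pdb -f 'A;B;C;D' -t 1234
```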

EXAMPLES

  The following will change only the ATOM, HETATM, or TER records having the
  chain identifier 'A' to similar records having the chain identifier 'B'.

     With keyword options:

        chainidPDB --input=infile.pdb --output=outfile.pdb --from=A --to=B

     With character options:

        chainidPDB -i infile.pdb -o outfile.pdb -f A -t B

  The following will change all ATOM, HETATM, or TER records to similar records
  having the chain identifier 'Z'. (No '--from=' ('-f') option was used.)

     With keyword options:

        chainidPDB --input=infile.pdb --output=outfile.pdb --to=Z

     With character options:

        chainidPDB -i infile.pdb -o outfile.pdb -t Z

  The following will change only the ATOM, HETATM, or TER records having the
  chain identifiers 'A', 'B', 'C', and 'D' to similar records having the
  chain identifiers '1', '2', '3', and '4', respectively.

     With keyword options:

        chainidPDB --input=infile.pdb --output=temp1.pdb --from="A;B;C;D" --to="1234"

     With character options:

        chainidPDB -i infile.pdb -o temp1.pdb -f "A;B;C;D" -t 1234

  The following will change only the ATOM, HETATM, or TER records having the
  chain identifiers 'A', 'B', 'C', 'D', and ' ' (space) to similar records
  having the chain identifiers '1', '2', '3', '4', and 'Z', respectively.

     With keyword options:

        chainidPDB --input=infile.pdb --output=temp1.pdb --from="A;B;C;D; ;" --to="1234Z"

     With character options:

        chainidPDB -i infile.pdb -o temp1.pdb -f "A;B;C;D; ;" -t "1234Z"

  The following will change only the ATOM, HETATM, or TER records having the
  chain identifier 'A' and the residue number 36 to similar records having the
  chain identifier 'Y'.

     With keyword options:

        chainidPDB --input=infile.pdb --output=temp1.pdb --from="A,36" --to="Y"

     With character options:

        chainidPDB -i infile.pdb -o temp1.pdb -f "A,36" -t "Y"

  The following will change only the ATOM, HETATM, or TER records having the
  residue sequence number 35 (and any chain identifier) to similar records having
  the chain identifier 'Q'.

     With keyword options:

        chainidPDB --input=infile.pdb --output=temp1.pdb --from=",35" --to="Q"

     With character options:

        chainidPDB -i infile.pdb -o temp1.pdb -f ",35" -t "Q"

  The following will change only the ATOM, HETATM, or TER records having the
  atom serial number 293 (and any chain identifier or residue number) to similar
  records having the chain identifier 'Z'.

     With keyword options:

        chainidPDB --input=infile.pdb --output=temp1.pdb --from=",,293" --to="Z"

     With character options:

        chainidPDB -i infile.pdb -o temp1.pdb -f ",,293" -t "Z"

  The following example shows a triplet list having multiple triplet specifications
  that may result in targeting identical records. The ATOM, HETATM, or TER records
  having the chain identifier 'A' are specified to be changed to similar records
  having the chain identifier 'X'. The ATOM, HETATM, or TER records having the
  residue number 35 are specified to be changed to similar records having the chain
  identifier 'Y'. The ATOM, HETATM, or TER records having the atom serial number 293
  are specified to be changed to similar records having the chain identifier 'Z'.

     With keyword options:

        chainidPDB --input=infile.pdb --output=temp1.pdb --from="A;,35;,,293" --to="XYZ"

     With character options:

        chainidPDB -i infile.pdb -o temp1.pdb -f "A;,35;,,293" -t "XYZ"

  If the following were part of the input "infile.pdb" of the above chainidPDB command:

    ATOM    287  CE2 TYR A  34      53.611  15.359   1.623  1.00 35.63           C
    ATOM    288  CZ  TYR A  34      54.658  15.283   2.521  1.00 36.11           C
    ATOM    289  OH  TYR A  34      54.406  15.463   3.861  1.00 35.18           O
    ATOM    290  N   ILE A  35      54.408  13.226  -4.218  1.00 38.28           N
    ATOM    291  CA  ILE A  35      54.395  13.169  -5.673  1.00 40.04           C
    ATOM    292  C   ILE A  35      53.691  14.397  -6.223  1.00 39.27           C
    ATOM    293  O   ILE A  35      52.849  15.000  -5.546  1.00 39.06           O
    ATOM    294  CB  ILE A  35      53.657  11.910  -6.210  1.00 41.31           C
    ATOM    295  CG1 ILE A  35      52.229  11.858  -5.666  1.00 42.67           C
    ATOM    296  CG2 ILE A  35      54.420  10.649  -5.838  1.00 42.60           C
    ATOM    297  CD1 ILE A  35      51.227  12.665  -6.474  1.00 44.23           C
    ATOM    298  N   TRP A  36      54.062  14.775  -7.440  1.00 38.06           N
    ATOM    299  CA  TRP A  36      53.444  15.900  -8.121  1.00 37.24           C
    ATOM    300  C   TRP A  36      52.385  15.275  -9.024  1.00 37.32           C

  Then the output would show that the PDB records having the chain identifier 'A' were
  changed to similar records having the chain identifier 'X', except that records with
  the residue number 35 were given the chain identifier 'Y', and the record with the
  atom serial number 293 was given the chain identifier 'Z':

    ATOM    287  CE2 TYR X  34      53.611  15.359   1.623  1.00 35.63           C
    ATOM    288  CZ  TYR X  34      54.658  15.283   2.521  1.00 36.11           C
    ATOM    289  OH  TYR X  34      54.406  15.463   3.861  1.00 35.18           O
    ATOM    290  N   ILE Y  35      54.408  13.226  -4.218  1.00 38.28           N
    ATOM    291  CA  ILE Y  35      54.395  13.169  -5.673  1.00 40.04           C
    ATOM    292  C   ILE Y  35      53.691  14.397  -6.223  1.00 39.27           C
    ATOM    293  O   ILE Z  35      52.849  15.000  -5.546  1.00 39.06           O
    ATOM    294  CB  ILE Y  35      53.657  11.910  -6.210  1.00 41.31           C
    ATOM    295  CG1 ILE Y  35      52.229  11.858  -5.666  1.00 42.67           C
    ATOM    296  CG2 ILE Y  35      54.420  10.649  -5.838  1.00 42.60           C
    ATOM    297  CD1 ILE Y  35      51.227  12.665  -6.474  1.00 44.23           C
    ATOM    298  N   TRP X  36      54.062  14.775  -7.440  1.00 38.06           N
    ATOM    299  CA  TRP X  36      53.444  15.900  -8.121  1.00 37.24           C
    ATOM    300  C   TRP X  36      52.385  15.275  -9.024  1.00 37.32           C

  The above example was used to show multiple triplet specifications targeting identical
  records. The triplet list used above ("A;,35;,,293") has a middle triplet (",35;")
  that would change records having a residue 35 in ALL chains. The triplet ("A,35;")
  would be used to only change records for residue 35 in chain A.

  The following will change only the ATOM, HETATM, or TER records having:
    chain identifier A, or
    chain identifier B, or
    both chain identifier C and a residue sequence number >= 1 and <= 49, or
    both chain identifier C and a residue sequence number >= 63, or
    chain identifier D.
  The changed records will be given the chain identifier 'E'.

  With keyword options:

    chainidPDB --input=1BBT.pdb --output=temp1.pdb --from="A;B;C,1-49;C,63-;D;" --to="E"

  With character options:

    chainidPDB -i 1BBT.pdb -o temp1.pdb -f "A;B;C,1-49;C,63-;D;" -t "E"

LICENSE INFORMATION

  chainidPDB is a software program from Arthur Weininger (weiningerworks.com).
  chainidPDB is subject to a license; use the keyword option '--license' in order to view
  the license terms. Your use of this software constitutes an agreement to the license terms.
  Do not use this software if you do not agree to the license terms.

chainidPDB Tutorial

The Picornavirus Monograph Superposition Shell Script gives examples of using chainidPDB.