4.3.1. Hydrogen Bond analysis — MDAnalysis.analysis.hbonds.hbond_analysis

Author

David Caplan, Lukas Grossar, Oliver Beckstein

Year

2010-2017

Copyright

GNU Public License v3

Given a Universe (simulation trajectory with 1 or more frames) measure all hydrogen bonds for each frame between selections 1 and 2.

The HydrogenBondAnalysis class is modeled after the VMD HBONDS plugin.

Options:
  • update_selections (True): update selections at each frame?

  • selection_1_type (“both”): selection 1 is the: “donor”, “acceptor”, “both”

  • donor-acceptor distance (Å): 3.0

  • Angle cutoff (degrees): 120.0

  • forcefield to switch between default values for different force fields

  • donors and acceptors atom types (to add additional atom names)

4.3.1.1. Output

The results are
  • the identities of donor and acceptor heavy-atoms,

  • the distance between the heavy atom acceptor atom and the hydrogen atom that is bonded to the heavy atom donor,

  • the angle donor-hydrogen-acceptor angle (180º is linear).

Hydrogen bond data are returned per frame, which is stored in HydrogenBondAnalysis.timeseries (In the following description, # indicates comments that are not part of the output.):

results = [
    [ # frame 1
       [ # hbond 1
          <donor index (0-based)>,
          <acceptor index (0-based)>, <donor string>, <acceptor string>,
          <distance>, <angle>
       ],
       [ # hbond 2
          <donor index (0-based)>,
          <acceptor index (0-based)>, <donor string>, <acceptor string>,
          <distance>, <angle>
       ],
       ....
    ],
    [ # frame 2
      [ ... ], [ ... ], ...
    ],
    ...
]

Using the HydrogenBondAnalysis.generate_table() method one can reformat the results as a flat “normalised” table that is easier to import into a database or dataframe for further processing. The table itself is a numpy.recarray.

4.3.1.2. Detection of hydrogen bonds

Hydrogen bonds are recorded based on a geometric criterion:

  1. The distance between acceptor and hydrogen is less than or equal to distance (default is 3 Å).

  2. The angle between donor-hydrogen-acceptor is greater than or equal to angle (default is 120º).

The cut-off values angle and distance can be set as keywords to HydrogenBondAnalysis.

Donor and acceptor heavy atoms are detected from atom names. The current defaults are appropriate for the CHARMM27 and GLYCAM06 force fields as defined in Table Default atom names for hydrogen bonding analysis.

Hydrogen atoms bonded to a donor are searched with one of two algorithms, selected with the detect_hydrogens keyword.

“distance”

Searches for all hydrogens (name “H*” or name “[123]H” or type “H”) in the same residue as the donor atom within a cut-off distance of 1.2 Å.

“heuristic”

Looks at the next three atoms in the list of atoms following the donor and selects any atom whose name matches (name “H*” or name “[123]H”). For

The “distance” search is more rigorous but slower and is set as the default. Until release 0.7.6, only the heuristic search was implemented.

Changed in version 0.7.6: Distance search added (see HydrogenBondAnalysis._get_bonded_hydrogens_dist()) and heuristic search improved (HydrogenBondAnalysis._get_bonded_hydrogens_list())

Default heavy atom names for CHARMM27 force field.

group

donor

acceptor

comments

main chain

N

O, OC1, OC2

OC1, OC2 from amber99sb-ildn (Gromacs)

water

OH2, OW

OH2, OW

SPC, TIP3P, TIP4P (CHARMM27,Gromacs)

ARG

NE, NH1, NH2

ASN

ND2

OD1

ASP

OD1, OD2

CYS

SG

CYH

SG

possible false positives for CYS

GLN

NE2

OE1

GLU

OE1, OE2

HIS

ND1, NE2

ND1, NE2

presence of H determines if donor

HSD

ND1

NE2

HSE

NE2

ND1

HSP

ND1, NE2

LYS

NZ

MET

SD

see e.g. [Gregoret1991]

SER

OG

OG

THR

OG1

OG1

TRP

NE1

TYR

OH

OH

Heavy atom types for GLYCAM06 force field.

element

donor

acceptor

N

N,NT,N3

N,NT

O

OH,OW

O,O2,OH,OS,OW,OY

S

SM

Donor and acceptor names for the CHARMM27 force field will also work for e.g. OPLS/AA (tested in Gromacs). Residue names in the table are for information only and are not taken into account when determining acceptors and donors. This can potentially lead to some ambiguity in the assignment of donors/acceptors for residues such as histidine or cytosine.

For more information about the naming convention in GLYCAM06 have a look at the Carbohydrate Naming Convention in Glycam.

The lists of donor and acceptor names can be extended by providing lists of atom names in the donors and acceptors keywords to HydrogenBondAnalysis. If the lists are entirely inappropriate (e.g. when analysing simulations done with a force field that uses very different atom names) then one should either use the value “other” for forcefield to set no default values, or derive a new class and set the default list oneself:

class HydrogenBondAnalysis_OtherFF(HydrogenBondAnalysis):
      DEFAULT_DONORS = {"OtherFF": tuple(set([...]))}
      DEFAULT_ACCEPTORS = {"OtherFF": tuple(set([...]))}

Then simply use the new class instead of the parent class and call it with forcefield = “OtherFF”. Please also consider to contribute the list of heavy atom names to MDAnalysis.

References

Gregoret1991

L.M. Gregoret, S.D. Rader, R.J. Fletterick, and F.E. Cohen. Hydrogen bonds involving sulfur atoms in proteins. Proteins, 9(2):99–107, 1991. 10.1002/prot.340090204.

4.3.1.3. How to perform HydrogenBondAnalysis

All protein-water hydrogen bonds can be analysed with

import MDAnalysis
import MDAnalysis.analysis.hbonds

u = MDAnalysis.Universe('topology', 'trajectory')
h = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(u, 'protein', 'resname HOH', distance=3.0, angle=120.0)
h.run()

Note

Due to the way HydrogenBondAnalysis is implemented, it is more efficient to have the second selection (selection2) be the larger group, e.g. the water when looking at water-protein H-bonds or the whole protein when looking at ligand-protein interactions.

The results are stored as the attribute HydrogenBondAnalysis.timeseries; see Output for the format and further options.

A number of convenience functions are provided to process the timeseries according to varying criteria:

count_by_time()

time series of the number of hydrogen bonds per time step

count_by_type()

data structure with the frequency of each observed hydrogen bond

timesteps_by_type()

data structure with a list of time steps for each observed hydrogen bond

For further data analysis it is convenient to process the timeseries data into a normalized table with the generate_table() method, which creates a new data structure HydrogenBondAnalysis.table that contains one row for each observation of a hydrogen bond:

h.generate_table()

This table can then be easily turned into, e.g., a pandas.DataFrame, and further analyzed:

import pandas as pd
df = pd.DataFrame.from_records(h.table)

For example, plotting a histogram of the hydrogen bond angles and lengths is as simple as

df.hist(column=["angle", "distance"])

4.3.1.4. Classes

class MDAnalysis.analysis.hbonds.hbond_analysis.HydrogenBondAnalysis(universe, selection1='protein', selection2='all', selection1_type='both', update_selection1=True, update_selection2=True, filter_first=True, distance_type='hydrogen', distance=3.0, angle=120.0, forcefield='CHARMM27', donors=None, acceptors=None, debug=False, detect_hydrogens='distance', verbose=False, pbc=False, **kwargs)[source]

Perform a hydrogen bond analysis

The analysis of the trajectory is performed with the HydrogenBondAnalysis.run() method. The result is stored in HydrogenBondAnalysis.timeseries. See run() for the format.

The default atom names are taken from the CHARMM 27 force field files, which will also work for e.g. OPLS/AA in Gromacs, and GLYCAM06.

Donors (associated hydrogens are deduced from topology)
CHARMM 27

N of the main chain, water OH2/OW, ARG NE/NH1/NH2, ASN ND2, HIS ND1/NE2, SER OG, TYR OH, CYS SG, THR OG1, GLN NE2, LYS NZ, TRP NE1

GLYCAM06

N,NT,N3,OH,OW

Acceptors
CHARMM 27

O of the main chain, water OH2/OW, ASN OD1, ASP OD1/OD2, CYH SG, GLN OE1, GLU OE1/OE2, HIS ND1/NE2, MET SD, SER OG, THR OG1, TYR OH

GLYCAM06

N,NT,O,O2,OH,OS,OW,OY,P,S,SM

amber99sb-ildn(Gromacs)

OC1, OC2 of the main chain

Changed in version 0.7.6: DEFAULT_DONORS/ACCEPTORS is now embedded in a dict to switch between default values for different force fields.

Set up calculation of hydrogen bonds between two selections in a universe.

The timeseries is accessible as the attribute HydrogenBondAnalysis.timeseries.

Some initial checks are performed. If there are no atoms selected by selection1 or selection2 or if no donor hydrogens or acceptor atoms are found then a SelectionError is raised for any selection that does not update (update_selection1 and update_selection2 keywords). For selections that are set to update, only a warning is logged because it is assumed that the selection might contain atoms at a later frame (e.g. for distance based selections).

If no hydrogen bonds are detected or if the initial check fails, look at the log output (enable with MDAnalysis.start_logging() and set verbose =True). It is likely that the default names for donors and acceptors are not suitable (especially for non-standard ligands). In this case, either change the forcefield or use customized donors and/or acceptors.

Parameters
  • universe (Universe) – Universe object

  • selection1 (str (optional)) – Selection string for first selection [‘protein’]

  • selection2 (str (optional)) – Selection string for second selection [‘all’]

  • selection1_type ({"donor", "acceptor", "both"} (optional)) – Selection 1 can be ‘donor’, ‘acceptor’ or ‘both’. Note that the value for selection1_type automatically determines how selection2 handles donors and acceptors: If selection1 contains ‘both’ then selection2 will also contain ‘both’. If selection1 is set to ‘donor’ then selection2 is ‘acceptor’ (and vice versa). [‘both’].

  • update_selection1 (bool (optional)) – Update selection 1 at each frame? Setting to False is recommended for any static selection to increase performance. [True]

  • update_selection2 (bool (optional)) – Update selection 2 at each frame? Setting to False is recommended for any static selection to increase performance. [True]

  • filter_first (bool (optional)) – Filter selection 2 first to only atoms 3 * distance away [True]

  • distance (float (optional)) – Distance cutoff for hydrogen bonds; only interactions with a H-A distance <= distance (and the appropriate D-H-A angle, see angle) are recorded. (Note: distance_type can change this to the D-A distance.) [3.0]

  • angle (float (optional)) – Angle cutoff for hydrogen bonds; an ideal H-bond has an angle of 180º. A hydrogen bond is only recorded if the D-H-A angle is >= angle. The default of 120º also finds fairly non-specific hydrogen interactions and a possibly better value is 150º. [120.0]

  • forcefield ({"CHARMM27", "GLYCAM06", "other"} (optional)) – Name of the forcefield used. Switches between different DEFAULT_DONORS and DEFAULT_ACCEPTORS values. [“CHARMM27”]

  • donors (sequence (optional)) – Extra H donor atom types (in addition to those in DEFAULT_DONORS), must be a sequence.

  • acceptors (sequence (optional)) – Extra H acceptor atom types (in addition to those in DEFAULT_ACCEPTORS), must be a sequence.

  • detect_hydrogens ({"distance", "heuristic"} (optional)) – Determine the algorithm to find hydrogens connected to donor atoms. Can be “distance” (default; finds all hydrogens in the donor’s residue within a cutoff of the donor) or “heuristic” (looks for the next few atoms in the atom list). “distance” should always give the correct answer but “heuristic” is faster, especially when the donor list is updated each for each frame. [“distance”]

  • distance_type ({"hydrogen", "heavy"} (optional)) – Measure hydrogen bond lengths between donor and acceptor heavy attoms (“heavy”) or between donor hydrogen and acceptor heavy atom (“hydrogen”). If using “heavy” then one should set the distance cutoff to a higher value such as 3.5 Å. [“hydrogen”]

  • debug (bool (optional)) – If set to True enables per-frame debug logging. This is disabled by default because it generates a very large amount of output in the log file. (Note that a logger must have been started to see the output, e.g. using MDAnalysis.start_logging().) [False]

  • verbose (bool (optional)) – Toggle progress output. (Can also be given as keyword argument to run().)

  • pbc (bool (optional)) – Whether to consider periodic boundaries in calculations [False]

Raises

SelectionError – is raised for each static selection without the required donors and/or acceptors.

Notes

In order to speed up processing, atoms are filtered by a coarse distance criterion before a detailed hydrogen bonding analysis is performed (filter_first = True). If one of your selections is e.g. the solvent then update_selection1 (or update_selection2) must also be True so that the list of candidate atoms is updated at each step: this is now the default.

If your selections will essentially remain the same for all time steps (i.e. residues are not moving farther than 3 x distance), for instance, if neither water nor large conformational changes are involved or if the optimization is disabled (filter_first = False) then you can improve performance by setting the update_selection1 and/or update_selection2 keywords to False.

Changed in version 0.7.6: New verbose keyword (and per-frame debug logging disabled by default).

New detect_hydrogens keyword to switch between two different algorithms to detect hydrogens bonded to donor. “distance” is a new, rigorous distance search within the residue of the donor atom, “heuristic” is the previous list scan (improved with an additional distance check).

New forcefield keyword to switch between different values of DEFAULT_DONORS/ACCEPTORS to accomodate different force fields. Also has an option “other” for no default values.

Changed in version 0.8: The new default for update_selection1 and update_selection2 is now True (see Issue 138). Set to False if your selections only need to be determined once (will increase performance).

Changed in version 0.9.0: New keyword distance_type to select between calculation between heavy atoms or hydrogen-acceptor. It defaults to the previous behavior (i.e. “hydrogen”).

Changed in version 0.11.0: Initial checks for selections that potentially raise SelectionError.

Changed in version 0.17.0: use 0-based indexing

Deprecated since version 0.16: The previous verbose keyword argument was replaced by debug. Note that the verbose keyword argument is now consistently used to toggle progress meters throughout the library.

timesteps

List of the times of each timestep. This can be used together with timeseries to find the specific time point of a hydrogen bond existence, or see table.

table

A normalised table of the data in HydrogenBondAnalysis.timeseries, generated by HydrogenBondAnalysis.generate_table(). It is a numpy.recarray with the following columns:

  1. “time”

  2. “donor_index”

  3. “acceptor_index”

  4. “donor_resnm”

  5. “donor_resid”

  6. “donor_atom”

  7. “acceptor_resnm”

  8. “acceptor_resid”

  9. “acceptor_atom”

  10. “distance”

  11. “angle”

It takes up more space than timeseries but it is easier to analyze and to import into databases or dataframes.

Example

For example, to create a pandas.DataFrame from h.table:

import pandas as pd
df = pd.DataFrame.from_records(h.table)

Changed in version 0.17.0: The 1-based donor and acceptor indices (donor_idx and acceptor_idx) are deprecated in favor of 0-based indices.

_get_bonded_hydrogens(atom, **kwargs)[source]

Find hydrogens bonded to atom.

This method is typically not called by a user but it is documented to facilitate understanding of the internals of HydrogenBondAnalysis.

Parameters
  • atom (groups.Atom) – heavy atom

  • **kwargs – passed through to the calculation method that was selected with the detect_hydrogens kwarg of HydrogenBondAnalysis.

Returns

hydrogen_atoms – list of hydrogens (can be a AtomGroup) or empty list [] if none were found.

Return type

AtomGroup or []

Changed in version 0.7.6: Can switch algorithm by using the detect_hydrogens keyword to the constructor. kwargs can be used to supply arguments for algorithm.

_get_bonded_hydrogens_dist(atom)[source]

Find hydrogens bonded within cutoff to atom.

Hydrogens are detected by either name (“H*”, “[123]H*”) or type (“H”); this is not fool-proof as the atom type is not always a character but the name pattern should catch most typical occurrences.

The distance from atom is calculated for all hydrogens in the residue and only those within a cutoff are kept. The cutoff depends on the heavy atom (more precisely, on its element, which is taken as the first letter of its name atom.name[0]) and is parameterized in HydrogenBondAnalysis.r_cov. If no match is found then the default of 1.5 Å is used.

Parameters

atom (groups.Atom) – heavy atom

Returns

hydrogen_atoms – list of hydrogens (can be a AtomGroup) or empty list [] if none were found.

Return type

AtomGroup or []

Notes

The performance of this implementation could be improved once the topology always contains bonded information; it currently uses the selection parser with an “around” selection.

New in version 0.7.6.

_get_bonded_hydrogens_list(atom, **kwargs)[source]

Find “bonded” hydrogens to the donor atom.

At the moment this relies on the assumption that the hydrogens are listed directly after the heavy atom in the topology. If this is not the case then this function will fail.

Hydrogens are detected by name H*, [123]H* and they have to be within a maximum distance from the heavy atom. The cutoff distance depends on the heavy atom and is parameterized in HydrogenBondAnalysis.r_cov.

Parameters
  • atom (groups.Atom) – heavy atom

  • **kwargs – ignored

Returns

hydrogen_atoms – list of hydrogens (can be a AtomGroup) or empty list [] if none were found.

Return type

AtomGroup or []

Changed in version 0.7.6: Added detection of [123]H and additional check that a selected hydrogen is bonded to the donor atom (i.e. its distance to the donor is less than the covalent radius stored in HydrogenBondAnalysis.r_cov or the default 1.5 Å).

Changed name to _get_bonded_hydrogens_list() and added kwargs so that it can be used instead of _get_bonded_hydrogens_dist().

DEFAULT_ACCEPTORS = {'CHARMM27': ('OH2', 'OC2', 'OW', 'OG', 'OH', 'ND1', 'NE2', 'OD1', 'OC1', 'OG1', 'SD', 'O', 'OD2', 'OE1', 'OE2', 'SG'), 'GLYCAM06': ('OY', 'O2', 'OH', 'OW', 'SM', 'O', 'N', 'NT', 'OS'), 'other': ()}

default atom names that are treated as hydrogen acceptors (see Default heavy atom names for CHARMM27 force field.); use the keyword acceptors to add a list of additional acceptor names.

DEFAULT_DONORS = {'CHARMM27': ('OH2', 'ND2', 'OW', 'NH2', 'NZ', 'ND1', 'OG', 'OH', 'NE2', 'NH1', 'OG1', 'NE1', 'N', 'SG', 'NE'), 'GLYCAM06': ('OW', 'OH', 'N3', 'N', 'NT'), 'other': ()}

default heavy atom names whose hydrogens are treated as donors (see Default heavy atom names for CHARMM27 force field.); use the keyword donors to add a list of additional donor names.

count_by_time()[source]

Counts the number of hydrogen bonds per timestep.

Processes HydrogenBondAnalysis._timeseries into the time series N(t) where N is the total number of observed hydrogen bonds at time t.

Returns

counts – The resulting array can be thought of as rows (time, N) where time is the time (in ps) of the time step and N is the total number of hydrogen bonds.

Return type

numpy.recarray

count_by_type()[source]

Counts the frequency of hydrogen bonds of a specific type.

Processes HydrogenBondAnalysis._timeseries and returns a numpy.recarray containing atom indices, residue names, residue numbers (for donors and acceptors) and the fraction of the total time during which the hydrogen bond was detected.

Returns

counts – Each row of the array contains data to define a unique hydrogen bond together with the frequency (fraction of the total time) that it has been observed.

Return type

numpy.recarray

Changed in version 0.17.0: The 1-based indices “donor_idx” and “acceptor_idx” are being deprecated in favor of zero-based indices.

generate_table()[source]

Generate a normalised table of the results.

The table is stored as a numpy.recarray in the attribute table.

r_cov = {'N': 1.31, 'O': 1.31, 'P': 1.58, 'S': 1.55}

A collections.defaultdict of covalent radii of common donors (used in :meth`_get_bonded_hydrogens_list` to check if a hydrogen is sufficiently close to its donor heavy atom). Values are stored for N, O, P, and S. Any other heavy atoms are assumed to have hydrogens covalently bound at a maximum distance of 1.5 Å.

run(start=None, stop=None, step=None, verbose=None, **kwargs)[source]

Analyze trajectory and produce timeseries.

Stores the hydrogen bond data per frame as HydrogenBondAnalysis.timeseries (see there for output format).

Parameters
  • start (int (optional)) – starting frame-index for analysis, None is the first one, 0. start and stop are 0-based frame indices and are used to slice the trajectory (if supported) [None]

  • stop (int (optional)) – last trajectory frame for analysis, None is the last one [None]

  • step (int (optional)) – read every step between start (included) and stop (excluded), None selects 1. [None]

  • verbose (bool (optional)) – toggle progress meter output ProgressMeter [True]

  • debug (bool (optional)) – enable detailed logging of debugging information; this can create very big log files so it is disabled (False) by default; setting debug toggles the debug status for HydrogenBondAnalysis, namely the value of HydrogenBondAnalysis.debug.

Other Parameters

remove_duplicates (bool (optional)) – duplicate hydrogen bonds are removed from output if set to the default value True; normally, this should not be changed.

See also

HydrogenBondAnalysis.generate_table()

processing the data into a different format.

Changed in version 0.7.6: Results are not returned, only stored in timeseries and duplicate hydrogen bonds are removed from output (can be suppressed with remove_duplicates = False)

Changed in version 0.11.0: Accept quiet keyword. Analysis will now proceed through frames even if no donors or acceptors were found in a particular frame.

Deprecated since version 0.16: The quiet keyword argument is deprecated in favor of the verbose one. Previous use of verbose now corresponds to the new keyword argument debug.

save_table(**kwds)

save_table is deprecated!

Saves table to a pickled file.

If table does not exist yet, generate_table() is called first.

Parameters

filename (str (optional)) – path to the filename

Example

Load with

import cPickle
table = cPickle.load(open(filename))

Deprecated since version 0.19.0: You can instead use np.save(filename, HydrogendBondAnalysis.table). save_table will be removed in release 1.0.0.

property timeseries

Time series of hydrogen bonds.

The results of the hydrogen bond analysis can be accessed as a list of list of list:

  1. timeseries[i]: data for the i-th trajectory frame (at time timesteps[i], see timesteps)

  2. timeseries[i][j]: j-th hydrogen bond that was detected at the i-th frame.

  3. donor_index, acceptor_index, donor_name_str, acceptor_name_str, distance, angle = timeseries[i][j]: structure of one hydrogen bond data item

In the following description, # indicates comments that are not part of the output:

results = [
    [ # frame 1
       [ # hbond 1
          <donor index (0-based)>, <acceptor index (0-based)>,
          <donor string>, <acceptor string>, <distance>, <angle>
       ],
       [ # hbond 2
          <donor index (0-based)>, <acceptor index (0-based)>,
          <donor string>, <acceptor string>, <distance>, <angle>
       ],
       ....
    ],
    [ # frame 2
      [ ... ], [ ... ], ...
    ],
    ...
]

The time of each step is not stored with each hydrogen bond frame but in timesteps.

Note

For instance, to find an acceptor atom in Universe.atoms by index one would use u.atoms[acceptor_index].

The timeseries is a managed attribute and it is generated from the underlying data in _timeseries every time the attribute is accessed. It is therefore costly to call and if timeseries is needed repeatedly it is recommended that you assign to a variable:

h = HydrogenBondAnalysis(u)
h.run()
timeseries = h.timeseries

See also

table

structured array of the data

Changed in version 0.16.1: timeseries has become a managed attribute and is generated from the stored _timeseries when needed. _timeseries contains the donor atom and acceptor atom specifiers as tuples (resname, resid, atomid) instead of strings.

Changed in version 0.17.0: The 1-based indices “donor_idx” and “acceptor_idx” are being removed in favor of the 0-based indices “donor_index” and “acceptor_index”.

timesteps_by_type()[source]

Frames during which each hydrogen bond existed, sorted by hydrogen bond.

Processes HydrogenBondAnalysis._timeseries and returns a numpy.recarray containing atom indices, residue names, residue numbers (for donors and acceptors) and each timestep at which the hydrogen bond was detected.

In principle, this is the same as table but sorted by hydrogen bond and with additional data for the donor_heavy_atom and angle and distance omitted.

Returns

data

Return type

numpy.recarray

Changed in version 0.17.0: The 1-based indices “donor_idx” and “acceptor_idx” are being replaced in favor of zero-based indices.