4.8.3. Water dynamics analysis — MDAnalysis.analysis.waterdynamics

Author:

Alejandro Bernardin

Year:

2014-2015

Copyright:

GNU Public License v3

New in version 0.11.0.

This module provides functions to analyze water dynamics trajectories and water interactions with other molecules. The functions in this module are: water orientational relaxation (WOR) [1], hydrogen bond lifetimes (HBL) [2], angular distribution (AD) [3], mean square displacement (MSD) [4] and survival probability (SP) [5].

For more information about this type of analysis please refer to [6] (water in a protein cavity) and [7] (water in a nanopore).

References

4.8.3.1. Example use of the analysis classes

4.8.3.1.1. HydrogenBondLifetimes

To analyse hydrogen bond lifetime, use MDAnalysis.analysis.hydrogenbonds.hbond_analysis.HydrogenBondAnalysis.lifetime().

4.8.3.1.2. WaterOrientationalRelaxation

Analyzing water orientational relaxation (WOR) WaterOrientationalRelaxation. In this case we are analyzing “how fast” water molecules are rotating/changing direction. If WOR is very stable we can assume that water molecules are rotating/changing direction very slow, on the other hand, if WOR decay very fast, we can assume that water molecules are rotating/changing direction very fast:

import MDAnalysis
from MDAnalysis.analysis.waterdynamics import WaterOrientationalRelaxation as WOR

u = MDAnalysis.Universe(pdb, trajectory)
select = "byres name OH2 and sphzone 6.0 protein and resid 42"
WOR_analysis = WOR(universe, select, 0, 1000, 20)
WOR_analysis.run()
time = 0
#now we print the data ready to plot. The first two columns are WOR_OH vs t plot,
#the second two columns are WOR_HH vs t graph and the third two columns are WOR_dip vs t graph
for WOR_OH, WOR_HH, WOR_dip in WOR_analysis.timeseries:
      print("{time} {WOR_OH} {time} {WOR_HH} {time} {WOR_dip}".format(time=time, WOR_OH=WOR_OH, WOR_HH=WOR_HH,WOR_dip=WOR_dip))
      time += 1

#now, if we want, we can plot our data
plt.figure(1,figsize=(18, 6))

#WOR OH
plt.subplot(131)
plt.xlabel('time')
plt.ylabel('WOR')
plt.title('WOR OH')
plt.plot(range(0,time),[column[0] for column in WOR_analysis.timeseries])

#WOR HH
plt.subplot(132)
plt.xlabel('time')
plt.ylabel('WOR')
plt.title('WOR HH')
plt.plot(range(0,time),[column[1] for column in WOR_analysis.timeseries])

#WOR dip
plt.subplot(133)
plt.xlabel('time')
plt.ylabel('WOR')
plt.title('WOR dip')
plt.plot(range(0,time),[column[2] for column in WOR_analysis.timeseries])

plt.show()

where t0 = 0, tf = 1000 and dtmax = 20. In this way we create 20 windows timesteps (20 values in the x axis), the first window is created with 1000 timestep average (1000/1), the second window is created with 500 timestep average(1000/2), the third window is created with 333 timestep average (1000/3) and so on.

4.8.3.1.3. AngularDistribution

Analyzing angular distribution (AD) AngularDistribution for OH vector, HH vector and dipole vector. It returns a line histogram with vector orientation preference. A straight line in the output plot means no preferential orientation in water molecules. In this case we are analyzing if water molecules have some orientational preference, in this way we can see if water molecules are under an electric field or if they are interacting with something (residue, protein, etc):

import MDAnalysis
from MDAnalysis.analysis.waterdynamics import AngularDistribution as AD

u = MDAnalysis.Universe(pdb, trajectory)
selection = "byres name OH2 and sphzone 6.0 (protein and (resid 42 or resid 26) )"
bins = 30
AD_analysis = AD(universe,selection,bins)
AD_analysis.run()
#now we print data ready to graph. The first two columns are P(cos(theta)) vs cos(theta) for OH vector ,
#the seconds two columns are P(cos(theta)) vs cos(theta) for HH vector and thirds two columns
#are P(cos(theta)) vs cos(theta) for dipole vector
for bin in range(bins):
      print("{AD_analysisOH} {AD_analysisHH} {AD_analysisDip}".format(AD_analysis.graph0=AD_analysis.graph[0][bin], AD_analysis.graph1=AD_analysis.graph[1][bin],AD_analysis.graph2=AD_analysis.graph[2][bin]))

#and if we want to graph our results
plt.figure(1,figsize=(18, 6))

#AD OH
plt.subplot(131)
plt.xlabel('cos theta')
plt.ylabel('P(cos theta)')
plt.title('PDF cos theta for OH')
plt.plot([float(column.split()[0]) for column in AD_analysis.graph[0][:-1]],[float(column.split()[1]) for column in AD_analysis.graph[0][:-1]])

#AD HH
plt.subplot(132)
plt.xlabel('cos theta')
plt.ylabel('P(cos theta)')
plt.title('PDF cos theta for HH')
plt.plot([float(column.split()[0]) for column in AD_analysis.graph[1][:-1]],[float(column.split()[1]) for column in AD_analysis.graph[1][:-1]])

#AD dip
plt.subplot(133)
plt.xlabel('cos theta')
plt.ylabel('P(cos theta)')
plt.title('PDF cos theta for dipole')
plt.plot([float(column.split()[0]) for column in AD_analysis.graph[2][:-1]],[float(column.split()[1]) for column in AD_analysis.graph[2][:-1]])

plt.show()

where P(cos(theta)) is the angular distribution or angular probabilities.

4.8.3.1.4. MeanSquareDisplacement

Analyzing mean square displacement (MSD) MeanSquareDisplacement for water molecules. In this case we are analyzing the average distance that water molecules travels inside protein in XYZ direction (cylindric zone of radius 11[nm], Zmax 4.0[nm] and Zmin -8.0[nm]). A strong rise mean a fast movement of water molecules, a weak rise mean slow movement of particles:

import MDAnalysis
from MDAnalysis.analysis.waterdynamics import MeanSquareDisplacement as MSD

u = MDAnalysis.Universe(pdb, trajectory)
select = "byres name OH2 and cyzone 11.0 4.0 -8.0 protein"
MSD_analysis = MSD(universe, select, 0, 1000, 20)
MSD_analysis.run()
#now we print data ready to graph. The graph
#represents MSD vs t
time = 0
for msd in MSD_analysis.timeseries:
      print("{time} {msd}".format(time=time, msd=msd))
      time += 1

#Plot
plt.xlabel('time')
plt.ylabel('MSD')
plt.title('MSD')
plt.plot(range(0,time),MSD_analysis.timeseries)
plt.show()

4.8.3.1.5. SurvivalProbability

Analyzing survival probability (SP) SurvivalProbability of molecules. In this case we are analyzing how long water molecules remain in a sphere of radius 12.3 centered in the geometrical center of resid 42 and 26. A slow decay of SP means a long permanence time of water molecules in the zone, on the other hand, a fast decay means a short permanence time:

import MDAnalysis
from MDAnalysis.analysis.waterdynamics import SurvivalProbability as SP
import matplotlib.pyplot as plt

universe = MDAnalysis.Universe(pdb, trajectory)
select = "byres name OH2 and sphzone 12.3 (resid 42 or resid 26) "
sp = SP(universe, select, verbose=True)
sp.run(start=0, stop=101, tau_max=20)
tau_timeseries = sp.tau_timeseries
sp_timeseries = sp.sp_timeseries

# print in console
for tau, sp in zip(tau_timeseries, sp_timeseries):
      print("{time} {sp}".format(time=tau, sp=sp))

# plot
plt.xlabel('Time')
plt.ylabel('SP')
plt.title('Survival Probability')
plt.plot(tau_timeseries, sp_timeseries)
plt.show()

One should note that the stop keyword as used in the above example has an exclusive behaviour, i.e. here the final frame used will be 100 not 101. This behaviour is aligned with AnalysisBase but currently differs from other MDAnalysis.analysis.waterdynamics classes, which all exhibit inclusive behaviour for their final frame selections.

Another example applies to the situation where you work with many different “residues”. Here we calculate the SP of a potassium ion around each lipid in a membrane and average the results. In this example, if the SP analysis were run without treating each lipid separately, potassium ions may hop from one lipid to another and still be counted as remaining in the specified region. That is, the survival probability of the potassium ion around the entire membrane will be calculated.

Note, for this example, it is advisable to use Universe(in_memory=True) to ensure that the simulation is not being reloaded into memory for each lipid:

import MDAnalysis as mda
from MDAnalysis.analysis.waterdynamics import SurvivalProbability as SP
import numpy as np

u = mda.Universe("md.gro", "md100ns.xtc", in_memory=True)
lipids = u.select_atoms('resname LIPIDS')
joined_sp_timeseries = [[] for _ in range(20)]
for lipid in lipids.residues:
    print("Lipid ID: %d" % lipid.resid)

    select = "resname POTASSIUM and around 3.5 (resid %d and name O13 O14) " % lipid.resid
    sp = SP(u, select, verbose=True)
    sp.run(tau_max=20)

    # Raw SP points for each tau:
    for sps, new_sps in zip(joined_sp_timeseries, sp.sp_timeseries_data):
        sps.extend(new_sps)

# average all SP datapoints
sp_data = [np.mean(sp) for sp in joined_sp_timeseries]

for tau, sp in zip(range(1, tau_max + 1), sp_data):
    print("{time} {sp}".format(time=tau, sp=sp))

4.8.3.2. Output

4.8.3.2.1. WaterOrientationalRelaxation

Water orientational relaxation (WOR) data is returned per window timestep, which is stored in WaterOrientationalRelaxation.timeseries:

results = [
    [ # time t0
        <WOR_OH>, <WOR_HH>, <WOR_dip>
    ],
    [ # time t1
        <WOR_OH>, <WOR_HH>, <WOR_dip>
    ],
    ...
 ]

4.8.3.2.2. AngularDistribution

Angular distribution (AD) data is returned per vector, which is stored in AngularDistribution.graph. In fact, AngularDistribution returns a histogram:

results = [
    [ # OH vector values
      # the values are order in this way: <x_axis  y_axis>
        <cos_theta0 ang_distr0>, <cos_theta1 ang_distr1>, ...
    ],
    [ # HH vector values
        <cos_theta0 ang_distr0>, <cos_theta1 ang_distr1>, ...
    ],
    [ # dip vector values
       <cos_theta0 ang_distr0>, <cos_theta1 ang_distr1>, ...
    ],
 ]

4.8.3.2.3. MeanSquareDisplacement

Mean Square Displacement (MSD) data is returned in a list, which each element represents a MSD value in its respective window timestep. Data is stored in MeanSquareDisplacement.timeseries:

results = [
     #MSD values orders by window timestep
        <MSD_t0>, <MSD_t1>, ...
 ]

4.8.3.2.4. SurvivalProbability

Survival Probability (SP) computes two lists: a list of taus (SurvivalProbability.tau_timeseries) and a list of the corresponding survival probabilities (SurvivalProbability.sp_timeseries).

results = [ tau1, tau2, …, tau_n ], [ sp_tau1, sp_tau2, …, sp_tau_n]

Additionally, a list SurvivalProbability.sp_timeseries_data, is provided which contains a list of all SPs calculated for each tau. This can be used to compute the distribution or time dependence of SP, etc.

4.8.3.3. Classes

class MDAnalysis.analysis.waterdynamics.WaterOrientationalRelaxation(universe, select, t0, tf, dtmax, nproc=1)[source]

Water orientation relaxation analysis

Function to evaluate the Water Orientational Relaxation proposed by Yu-ling Yeh and Chung-Yuan Mou [1]. WaterOrientationalRelaxation indicates “how fast” water molecules are rotating or changing direction. This is a time correlation function given by:

\[C_{\hat u}(\tau)=\langle \mathit{P}_2[\mathbf{\hat{u}}(t_0)\cdot\mathbf{\hat{u}}(t_0+\tau)]\rangle\]

where \(P_2=(3x^2-1)/2\) is the second-order Legendre polynomial and \(\hat{u}\) is a unit vector along HH, OH or dipole vector.

Parameters:
  • universe (Universe) – Universe object

  • selection (str) – Selection string for water [‘byres name OH2’].

  • t0 (int) – frame where analysis begins

  • tf (int) – frame where analysis ends

  • dtmax (int) – Maximum dt size, dtmax < tf or it will crash.

New in version 0.11.0.

Changed in version 1.0.0: Changed selection keyword to select

static lg2(x)[source]

Second Legendre polynomial

run(**kwargs)[source]

Analyze trajectory and produce timeseries

class MDAnalysis.analysis.waterdynamics.AngularDistribution(universe, select, bins=40, nproc=1, axis='z')[source]

Angular distribution function analysis

The angular distribution function (AD) is defined as the distribution probability of the cosine of the \(\theta\) angle formed by the OH vector, HH vector or dipolar vector of water molecules and a vector \(\hat n\) parallel to chosen axis (z is the default value). The cosine is define as \(\cos \theta = \hat u \cdot \hat n\), where \(\hat u\) is OH, HH or dipole vector. It creates a histogram and returns a list of lists, see Output. The AD is also know as Angular Probability (AP).

Parameters:
  • universe (Universe) – Universe object

  • select (str) – Selection string to evaluate its angular distribution [‘byres name OH2’]

  • bins (int (optional)) – Number of bins to create the histogram by means of numpy.histogram()

  • axis ({'x', 'y', 'z'} (optional)) – Axis to create angle with the vector (HH, OH or dipole) and calculate cosine theta [‘z’].

New in version 0.11.0.

Changed in version 1.0.0: Changed selection keyword to select

run(**kwargs)[source]

Function to evaluate the angular distribution of cos(theta)

class MDAnalysis.analysis.waterdynamics.MeanSquareDisplacement(universe, select, t0, tf, dtmax, nproc=1)[source]

Mean square displacement analysis

Function to evaluate the Mean Square Displacement (MSD). The MSD gives the average distance that particles travels. The MSD is given by:

\[\langle\Delta r(t)^2\rangle = 2nDt\]

where \(r(t)\) is the position of particle in time \(t\), \(\Delta r(t)\) is the displacement after time lag \(t\), \(n\) is the dimensionality, in this case \(n=3\), \(D\) is the diffusion coefficient and \(t\) is the time.

Parameters:
  • universe (Universe) – Universe object

  • select (str) – Selection string for water [‘byres name OH2’].

  • t0 (int) – frame where analysis begins

  • tf (int) – frame where analysis ends

  • dtmax (int) – Maximum dt size, dtmax < tf or it will crash.

New in version 0.11.0.

Changed in version 1.0.0: Changed selection keyword to select

run(**kwargs)[source]

Analyze trajectory and produce timeseries

class MDAnalysis.analysis.waterdynamics.SurvivalProbability(universe, select, verbose=False)[source]

Survival Probability (SP) gives the probability for a group of particles to remain in a certain region. The SP is given by:

\[P(\tau) = \langle \frac{ N(t, t + \tau )} { N(t) }\rangle\]

where \(\tau\) is the timestep, \(N(t)\) the number of particles at time \(t\), and \(N(t, t+\tau)\) is the number of particles at every frame from \(t\) to \(t + \tau\). The angular brackets represent an average over all time origins, \(t\). See MDAnalysis.lib.correlations.autocorrelation() for technical details.

Parameters:
  • universe (Universe) – Universe object

  • select (str) – Selection string; any selection is allowed. With this selection you define the region/zone where to analyze, e.g.: “resname SOL and around 5 (resid 10)”. See SP-examples.

  • verbose (Boolean, optional) – When True, prints progress and comments to the console.

Notes

Currently SurvivalProbability is the only on in MDAnalysis.analysis.waterdynamics to support an exclusive behaviour (i.e. similar to the current behaviour of AnalysisBase to the stop keyword passed to SurvivalProbability.run(). Unlike other MDAnalysis.analysis.waterdynamics final frame definitions which are inclusive.

New in version 0.11.0.

Changed in version 1.0.0: Using the MDAnalysis.lib.correlations.py to carry out the intermittency and autocorrelation calculations. Changed selection keyword to select. Removed support for the deprecated t0, tf, and dtmax keywords. These should instead be passed to SurvivalProbability.run() as the start, stop, and tau_max keywords respectively. The stop keyword as passed to SurvivalProbability.run() has now changed behaviour and will act in an exclusive manner (instead of it’s previous inclusive behaviour),

Changed in version 2.7.0: Updated docs to align with discrete autocorrelation function.

run(tau_max=20, start=None, stop=None, step=None, residues=False, intermittency=0, verbose=False)[source]

Computes and returns the Survival Probability (SP) timeseries

Parameters:
  • start (int, optional) – Zero-based index of the first frame to be analysed, Default: None (first frame).

  • stop (int, optional) – Zero-based index of the last frame to be analysed (exclusive), Default: None (last frame).

  • step (int, optional) – Jump every step-th frame. This is compatible but independant of the taus used, and it is good to consider using the step equal to tau_max to remove the overlap. Note that step and tau_max work consistently with intermittency. Default: None (use every frame).

  • tau_max (int, optional) – Survival probability is calculated for the range 1 <= tau <= tau_max.

  • residues (Boolean, optional) – If true, the analysis will be carried out on the residues (.resids) rather than on atom (.ids). A single atom is sufficient to classify the residue as within the distance.

  • intermittency (int, optional) – The maximum number of consecutive frames for which an atom can leave but be counted as present if it returns at the next frame. An intermittency of 0 is equivalent to a continuous survival probability, which does not allow for the leaving and returning of atoms. For example, for intermittency=2, any given atom may leave a region of interest for up to two consecutive frames yet be treated as being present at all frames. The default is continuous (0).

  • verbose (Boolean, optional) – Print the progress to the console.

Returns:

  • tau_timeseries (list) – tau from 1 to tau_max. Saved in the field tau_timeseries.

  • sp_timeseries (list) – survival probability for each value of tau. Saved in the field sp_timeseries.

  • sp_timeseries_data (list) – raw datapoints from which the average is taken (sp_timeseries). Time dependancy and distribution can be extracted.

Changed in version 1.0.0: To math other analysis methods, the stop keyword is now exclusive rather than inclusive.