4.1.1. Analysis building blocks — MDAnalysis.analysis.base

MDAnalysis provides building blocks for creating analysis classes. One can think of each analysis class as a “tool” that performs a specific analysis over the trajectory frames and stores the results in the tool.

Analysis classes are derived from AnalysisBase by subclassing. This inheritance provides a common workflow and API for users and makes many additional features automatically available (such as frame selections and a verbose progressbar). The important points for analysis classes are:

  1. Analysis tools are Python classes derived from AnalysisBase.

  2. When instantiating an analysis, the Universe or AtomGroup that the analysis operates on is provided together with any other parameters that are kept fixed for the specific analysis.

  3. The analysis is performed with run() method. It has a common set of arguments such as being able to select the frames the analysis is performed on. The verbose keyword argument enables additional output. A progressbar is shown by default that also shows an estimate for the remaining time until the end of the analysis.

  4. Results are always stored in the attribute AnalysisBase.results, which is an instance of Results, a kind of dictionary that allows allows item access via attributes. Each analysis class decides what and how to store in Results and needs to document it. For time series, the AnalysisBase.times contains the time stamps of the analyzed frames.

4.1.1.1. Example of using a standard analysis tool

For example, the MDAnalysis.analysis.rms.RMSD performs a root-mean-square distance analysis in the following way:

import MDAnalysis as mda
from MDAnalysisTests.datafiles import TPR, XTC

from MDAnalysis.analysis import rms

u = mda.Universe(TPR, XTC)

# (2) instantiate analysis
rmsd = rms.RMSD(u, select='name CA')

# (3) the run() method can select frames in different ways
# run on all frames (with progressbar)
rmsd.run(verbose=True)

# or start, stop, and step can be used
rmsd.run(start=2, stop=8, step=2)

# a list of frames to run the analysis on can be passed
rmsd.run(frames=[0,2,3,6,9])

# a list of booleans the same length of the trajectory can be used
rmsd.run(frames=[True, False, True, True, False, False, True, False,
                 False, True])

# (4) analyze the results, e.g., plot
t = rmsd.times
y = rmsd.results.rmsd[:, 2]   # RMSD at column index 2, see docs

import matplotlib.pyplot as plt
plt.plot(t, y)
plt.xlabel("time (ps)")
plt.ylabel("RMSD (Å)")

4.1.1.2. Writing new analysis tools

In order to write new analysis tools, derive a class from AnalysisBase and define at least the _single_frame() method, as described in AnalysisBase.

See also

The chapter Writing your own trajectory analysis in the User Guide contains a step-by-step example for writing analysis tools with AnalysisBase.

4.1.1.3. Classes

The Results and AnalysisBase classes are the essential building blocks for almost all MDAnalysis tools in the MDAnalysis.analysis module. They aim to be easily useable and extendable.

AnalysisFromFunction and the analysis_class() functions are simple wrappers that make it even easier to create fully-featured analysis tools if only the single-frame analysis function needs to be written.

class MDAnalysis.analysis.base.AnalysisBase(trajectory, verbose=False, **kwargs)[source]

Base class for defining multi-frame analysis

The class is designed as a template for creating multi-frame analyses. This class will automatically take care of setting up the trajectory reader for iterating, and it offers to show a progress meter. Computed results are stored inside the results attribute.

To define a new Analysis, AnalysisBase needs to be subclassed and _single_frame() must be defined. It is also possible to define _prepare() and _conclude() for pre- and post-processing. All results should be stored as attributes of the Results container.

Parameters
times

array of Timestep times. Only exists after calling AnalysisBase.run()

Type

numpy.ndarray

frames

array of Timestep frame indices. Only exists after calling AnalysisBase.run()

Type

numpy.ndarray

results

results of calculation are stored after call to AnalysisBase.run()

Type

Results

Example

from MDAnalysis.analysis.base import AnalysisBase

class NewAnalysis(AnalysisBase):
    def __init__(self, atomgroup, parameter, **kwargs):
        super(NewAnalysis, self).__init__(atomgroup.universe.trajectory,
                                          **kwargs)
        self._parameter = parameter
        self._ag = atomgroup

    def _prepare(self):
        # OPTIONAL
        # Called before iteration on the trajectory has begun.
        # Data structures can be set up at this time
        self.results.example_result = []

    def _single_frame(self):
        # REQUIRED
        # Called after the trajectory is moved onto each new frame.
        # store an example_result of `some_function` for a single frame
        self.results.example_result.append(some_function(self._ag,
                                                         self._parameter))

    def _conclude(self):
        # OPTIONAL
        # Called once iteration on the trajectory is finished.
        # Apply normalisation and averaging to results here.
        self.results.example_result = np.asarray(self.example_result)
        self.results.example_result /=  np.sum(self.result)

Afterwards the new analysis can be run like this

import MDAnalysis as mda
from MDAnalysisTests.datafiles import PSF, DCD

u = mda.Universe(PSF, DCD)

na = NewAnalysis(u.select_atoms('name CA'), 35)
na.run(start=10, stop=20)
print(na.results.example_result)
# results can also be accessed by key
print(na.results["example_result"])

Changed in version 1.0.0: Support for setting start, stop, and step has been removed. These should now be directly passed to AnalysisBase.run().

Changed in version 2.0.0: Added results

run(start=None, stop=None, step=None, frames=None, verbose=None)[source]

Perform the calculation

Parameters
  • start (int, optional) – start frame of analysis

  • stop (int, optional) – stop frame of analysis

  • step (int, optional) – number of frames to skip between each analysed frame

  • frames (array_like, optional) –

    array of integers or booleans to slice trajectory; frames can only be used instead of start, stop, and step. Setting both frames and at least one of start, stop, step to a non-default value will raise a ValueError.

    New in version 2.2.0.

  • verbose (bool, optional) – Turn on verbosity

Changed in version 2.2.0: Added ability to analyze arbitrary frames by passing a list of frame indices in the frames keyword argument.

class MDAnalysis.analysis.base.AnalysisFromFunction(function, trajectory=None, *args, **kwargs)[source]

Create an AnalysisBase from a function working on AtomGroups

Parameters
  • function (callable) – function to evaluate at each frame

  • trajectory (MDAnalysis.coordinates.Reader, optional) – trajectory to iterate over. If None the first AtomGroup found in args and kwargs is used as a source for the trajectory.

  • *args (list) – arguments for function

  • **kwargs (dict) – arguments for function and AnalysisBase

results.frames

simulation frames used in analysis

Type

numpy.ndarray

results.times

simulation times used in analysis

Type

numpy.ndarray

results.timeseries

Results for each frame of the wrapped function, stored after call to AnalysisFromFunction.run().

Type

numpy.ndarray

Raises

ValueError – if function has the same kwargs as AnalysisBase

Example

def rotation_matrix(mobile, ref):
    return mda.analysis.align.rotation_matrix(mobile, ref)[0]

rot = AnalysisFromFunction(rotation_matrix, trajectory,
                            mobile, ref).run()
print(rot.results.timeseries)

Changed in version 1.0.0: Support for directly passing the start, stop, and step arguments has been removed. These should instead be passed to AnalysisFromFunction.run().

Changed in version 2.0.0: Former results are now stored as results.timeseries

class MDAnalysis.analysis.base.Results(*args, **kwargs)[source]

Container object for storing results.

Results are dictionaries that provide two ways by which values can be accessed: by dictionary key results["value_key"] or by object attribute, results.value_key. Results stores all results obtained from an analysis after calling run().

The implementation is similar to the sklearn.utils.Bunch class in scikit-learn.

Raises
  • AttributeError – If an assigned attribute has the same name as a default attribute.

  • ValueError – If a key is not of type str and therefore is not able to be accessed by attribute.

Examples

>>> from MDAnalysis.analysis.base import Results
>>> results = Results(a=1, b=2)
>>> results['b']
2
>>> results.b
2
>>> results.a = 3
>>> results['a']
3
>>> results.c = [1, 2, 3, 4]
>>> results['c']
[1, 2, 3, 4]

New in version 2.0.0.

MDAnalysis.analysis.base.analysis_class(function)[source]

Transform a function operating on a single frame to an AnalysisBase class.

Parameters

function (callable) – function to evaluate at each frame

results.frames

simulation frames used in analysis

Type

numpy.ndarray

results.times

simulation times used in analysis

Type

numpy.ndarray

results.timeseries

Results for each frame of the wrapped function, stored after call to AnalysisFromFunction.run().

Type

numpy.ndarray

Raises

ValueError – if function has the same kwargs as AnalysisBase

Examples

For use in a library, we recommend the following style

def rotation_matrix(mobile, ref):
    return mda.analysis.align.rotation_matrix(mobile, ref)[0]
RotationMatrix = analysis_class(rotation_matrix)

It can also be used as a decorator

@analysis_class
def RotationMatrix(mobile, ref):
    return mda.analysis.align.rotation_matrix(mobile, ref)[0]

rot = RotationMatrix(u.trajectory, mobile, ref).run(step=2)
print(rot.results.timeseries)

Changed in version 2.0.0: Former results are now stored as results.timeseries