4.1.1. Analysis building blocks — MDAnalysis.analysis.base
MDAnalysis provides building blocks for creating analysis classes. One can think of each analysis class as a “tool” that performs a specific analysis over the trajectory frames and stores the results in the tool.
Analysis classes are derived from AnalysisBase
by subclassing. This
inheritance provides a common workflow and API for users and makes many
additional features automatically available (such as frame selections and a
verbose progressbar). The important points for analysis classes are:
Analysis tools are Python classes derived from
AnalysisBase
.When instantiating an analysis, the
Universe
orAtomGroup
that the analysis operates on is provided together with any other parameters that are kept fixed for the specific analysis.The analysis is performed with
run()
method. It has a common set of arguments such as being able to select the frames the analysis is performed on. The verbose keyword argument enables additional output. A progressbar is shown by default that also shows an estimate for the remaining time until the end of the analysis.Results are always stored in the attribute
AnalysisBase.results
, which is an instance ofResults
, a kind of dictionary that allows allows item access via attributes. Each analysis class decides what and how to store inResults
and needs to document it. For time series, theAnalysisBase.times
contains the time stamps of the analyzed frames.
4.1.1.1. Example of using a standard analysis tool
For example, the MDAnalysis.analysis.rms.RMSD
performs a
root-mean-square distance analysis in the following way:
import MDAnalysis as mda
from MDAnalysisTests.datafiles import TPR, XTC
from MDAnalysis.analysis import rms
u = mda.Universe(TPR, XTC)
# (2) instantiate analysis
rmsd = rms.RMSD(u, select='name CA')
# (3) the run() method can select frames in different ways
# run on all frames (with progressbar)
rmsd.run(verbose=True)
# or start, stop, and step can be used
rmsd.run(start=2, stop=8, step=2)
# a list of frames to run the analysis on can be passed
rmsd.run(frames=[0,2,3,6,9])
# a list of booleans the same length of the trajectory can be used
rmsd.run(frames=[True, False, True, True, False, False, True, False,
False, True])
# (4) analyze the results, e.g., plot
t = rmsd.times
y = rmsd.results.rmsd[:, 2] # RMSD at column index 2, see docs
import matplotlib.pyplot as plt
plt.plot(t, y)
plt.xlabel("time (ps)")
plt.ylabel("RMSD (Å)")
4.1.1.2. Writing new analysis tools
In order to write new analysis tools, derive a class from AnalysisBase
and define at least the _single_frame()
method, as described in
AnalysisBase
.
See also
The chapter Writing your own trajectory analysis in the User Guide
contains a step-by-step example for writing analysis tools with
AnalysisBase
.
4.1.1.3. Classes
The Results
and AnalysisBase
classes are the essential
building blocks for almost all MDAnalysis tools in the
MDAnalysis.analysis
module. They aim to be easily useable and
extendable.
AnalysisFromFunction
and the analysis_class()
functions are
simple wrappers that make it even easier to create fully-featured analysis
tools if only the single-frame analysis function needs to be written.
- class MDAnalysis.analysis.base.AnalysisBase(trajectory, verbose=False, **kwargs)[source]
Base class for defining multi-frame analysis
The class is designed as a template for creating multi-frame analyses. This class will automatically take care of setting up the trajectory reader for iterating, and it offers to show a progress meter. Computed results are stored inside the
results
attribute.To define a new Analysis,
AnalysisBase
needs to be subclassed and_single_frame()
must be defined. It is also possible to define_prepare()
and_conclude()
for pre- and post-processing. All results should be stored as attributes of theResults
container.- Parameters
trajectory (MDAnalysis.coordinates.base.ReaderBase) – A trajectory Reader
verbose (bool, optional) – Turn on more logging and debugging
- times
array of Timestep times. Only exists after calling
AnalysisBase.run()
- Type
- frames
array of Timestep frame indices. Only exists after calling
AnalysisBase.run()
- Type
- results
results of calculation are stored after call to
AnalysisBase.run()
- Type
Example
from MDAnalysis.analysis.base import AnalysisBase class NewAnalysis(AnalysisBase): def __init__(self, atomgroup, parameter, **kwargs): super(NewAnalysis, self).__init__(atomgroup.universe.trajectory, **kwargs) self._parameter = parameter self._ag = atomgroup def _prepare(self): # OPTIONAL # Called before iteration on the trajectory has begun. # Data structures can be set up at this time self.results.example_result = [] def _single_frame(self): # REQUIRED # Called after the trajectory is moved onto each new frame. # store an example_result of `some_function` for a single frame self.results.example_result.append(some_function(self._ag, self._parameter)) def _conclude(self): # OPTIONAL # Called once iteration on the trajectory is finished. # Apply normalisation and averaging to results here. self.results.example_result = np.asarray(self.example_result) self.results.example_result /= np.sum(self.result)
Afterwards the new analysis can be run like this
import MDAnalysis as mda from MDAnalysisTests.datafiles import PSF, DCD u = mda.Universe(PSF, DCD) na = NewAnalysis(u.select_atoms('name CA'), 35) na.run(start=10, stop=20) print(na.results.example_result) # results can also be accessed by key print(na.results["example_result"])
Changed in version 1.0.0: Support for setting start, stop, and step has been removed. These should now be directly passed to
AnalysisBase.run()
.Changed in version 2.0.0: Added
results
- run(start=None, stop=None, step=None, frames=None, verbose=None, *, progressbar_kwargs={})[source]
Perform the calculation
- Parameters
start (int, optional) – start frame of analysis
stop (int, optional) – stop frame of analysis
step (int, optional) – number of frames to skip between each analysed frame
frames (array_like, optional) –
array of integers or booleans to slice trajectory; frames can only be used instead of start, stop, and step. Setting both frames and at least one of start, stop, step to a non-default value will raise a
ValueError
.New in version 2.2.0.
verbose (bool, optional) – Turn on verbosity
progressbar_kwargs (dict, optional) – ProgressBar keywords with custom parameters regarding progress bar position, etc; see
MDAnalysis.lib.log.ProgressBar
for full list.
Changed in version 2.2.0: Added ability to analyze arbitrary frames by passing a list of frame indices in the frames keyword argument.
Changed in version 2.5.0: Add progressbar_kwargs parameter, allowing to modify description, position etc of tqdm progressbars
- class MDAnalysis.analysis.base.AnalysisFromFunction(function, trajectory=None, *args, **kwargs)[source]
Create an
AnalysisBase
from a function working on AtomGroups- Parameters
function (callable) – function to evaluate at each frame
trajectory (MDAnalysis.coordinates.Reader, optional) – trajectory to iterate over. If
None
the first AtomGroup found in args and kwargs is used as a source for the trajectory.*args (list) – arguments for function
**kwargs (dict) – arguments for function and
AnalysisBase
- results.frames
simulation frames used in analysis
- Type
- results.times
simulation times used in analysis
- Type
- results.timeseries
Results for each frame of the wrapped function, stored after call to
AnalysisFromFunction.run()
.- Type
- Raises
ValueError – if function has the same kwargs as
AnalysisBase
Example
def rotation_matrix(mobile, ref): return mda.analysis.align.rotation_matrix(mobile, ref)[0] rot = AnalysisFromFunction(rotation_matrix, trajectory, mobile, ref).run() print(rot.results.timeseries)
Changed in version 1.0.0: Support for directly passing the start, stop, and step arguments has been removed. These should instead be passed to
AnalysisFromFunction.run()
.Changed in version 2.0.0: Former
results
are now stored asresults.timeseries
- class MDAnalysis.analysis.base.Results(*args, **kwargs)[source]
Container object for storing results.
Results
are dictionaries that provide two ways by which values can be accessed: by dictionary keyresults["value_key"]
or by object attribute,results.value_key
.Results
stores all results obtained from an analysis after callingrun()
.The implementation is similar to the
sklearn.utils.Bunch
class in scikit-learn.- Raises
AttributeError – If an assigned attribute has the same name as a default attribute.
ValueError – If a key is not of type
str
and therefore is not able to be accessed by attribute.
Examples
>>> from MDAnalysis.analysis.base import Results >>> results = Results(a=1, b=2) >>> results['b'] 2 >>> results.b 2 >>> results.a = 3 >>> results['a'] 3 >>> results.c = [1, 2, 3, 4] >>> results['c'] [1, 2, 3, 4]
New in version 2.0.0.
- MDAnalysis.analysis.base.analysis_class(function)[source]
Transform a function operating on a single frame to an
AnalysisBase
class.- Parameters
function (callable) – function to evaluate at each frame
- results.frames
simulation frames used in analysis
- Type
- results.times
simulation times used in analysis
- Type
- results.timeseries
Results for each frame of the wrapped function, stored after call to
AnalysisFromFunction.run()
.- Type
- Raises
ValueError – if function has the same kwargs as
AnalysisBase
Examples
For use in a library, we recommend the following style
def rotation_matrix(mobile, ref): return mda.analysis.align.rotation_matrix(mobile, ref)[0] RotationMatrix = analysis_class(rotation_matrix)
It can also be used as a decorator
@analysis_class def RotationMatrix(mobile, ref): return mda.analysis.align.rotation_matrix(mobile, ref)[0] rot = RotationMatrix(u.trajectory, mobile, ref).run(step=2) print(rot.results.timeseries)
Changed in version 2.0.0: Former
results
are now stored asresults.timeseries