4.2.1. Analysis building blocks — MDAnalysis.analysis.base
MDAnalysis provides building blocks for creating analysis classes. One can think of each analysis class as a “tool” that performs a specific analysis over the trajectory frames and stores the results in the tool.
Analysis classes are derived from AnalysisBase
by subclassing. This
inheritance provides a common workflow and API for users and makes many
additional features automatically available (such as frame selections and a
verbose progressbar). The important points for analysis classes are:
Analysis tools are Python classes derived from
AnalysisBase
.When instantiating an analysis, the
Universe
orAtomGroup
that the analysis operates on is provided together with any other parameters that are kept fixed for the specific analysis.The analysis is performed with
run()
method. It has a common set of arguments such as being able to select the frames the analysis is performed on. The verbose keyword argument enables additional output. A progressbar is shown by default that also shows an estimate for the remaining time until the end of the analysis.Results are always stored in the attribute
AnalysisBase.results
, which is an instance ofResults
, a kind of dictionary that allows allows item access via attributes. Each analysis class decides what and how to store inResults
and needs to document it. For time series, theAnalysisBase.times
contains the time stamps of the analyzed frames.
4.2.1.1. Example of using a standard analysis tool
For example, the MDAnalysis.analysis.rms.RMSD
performs a
root-mean-square distance analysis in the following way:
import MDAnalysis as mda
from MDAnalysisTests.datafiles import TPR, XTC
from MDAnalysis.analysis import rms
u = mda.Universe(TPR, XTC)
# (2) instantiate analysis
rmsd = rms.RMSD(u, select='name CA')
# (3) the run() method can select frames in different ways
# run on all frames (with progressbar)
rmsd.run(verbose=True)
# or start, stop, and step can be used
rmsd.run(start=2, stop=8, step=2)
# a list of frames to run the analysis on can be passed
rmsd.run(frames=[0,2,3,6,9])
# a list of booleans the same length of the trajectory can be used
rmsd.run(frames=[True, False, True, True, False, False, True, False,
False, True])
# (4) analyze the results, e.g., plot
t = rmsd.times
y = rmsd.results.rmsd[:, 2] # RMSD at column index 2, see docs
import matplotlib.pyplot as plt
plt.plot(t, y)
plt.xlabel("time (ps)")
plt.ylabel("RMSD (Å)")
4.2.1.2. Writing new analysis tools
In order to write new analysis tools, derive a class from AnalysisBase
and define at least the _single_frame()
method, as described in
AnalysisBase
.
See also
The chapter Writing your own trajectory analysis in the User Guide
contains a step-by-step example for writing analysis tools with
AnalysisBase
.
If your analysis is operating independently on each frame, you might consider
making it parallelizable via adding a get_supported_backends()
method,
and appropriate aggregation function for each of its results. For example, if
you have your _single_frame()
method storing important values under
self.results.timeseries
, you will write:
class MyAnalysis(AnalysisBase):
_analysis_algorithm_is_parallelizable = True
@classmethod
def get_supported_backends(cls):
return ('serial', 'multiprocessing', 'dask',)
def _get_aggregator(self):
return ResultsGroup(lookup={'timeseries': ResultsGroup.ndarray_vstack})
See MDAnalysis.analysis.results
for more on aggregating results.
See also
4.2.1.3. Classes
The MDAnalysis.results.Results
and AnalysisBase
classes
are the essential building blocks for almost all MDAnalysis tools in the
MDAnalysis.analysis
module. They aim to be easily useable and
extendable.
AnalysisFromFunction
and the analysis_class()
functions are
simple wrappers that make it even easier to create fully-featured analysis
tools if only the single-frame analysis function needs to be written.
- class MDAnalysis.analysis.base.AnalysisBase(trajectory, verbose=False, **kwargs)[source]
Base class for defining multi-frame analysis
The class is designed as a template for creating multi-frame analyses. This class will automatically take care of setting up the trajectory reader for iterating, and it offers to show a progress meter. Computed results are stored inside the
results
attribute.To define a new Analysis,
AnalysisBase
needs to be subclassed and_single_frame()
must be defined. It is also possible to define_prepare()
and_conclude()
for pre- and post-processing. All results should be stored as attributes of theMDAnalysis.analysis.results.Results
container.- Parameters:
trajectory (MDAnalysis.coordinates.base.ReaderBase) – A trajectory Reader
verbose (bool, optional) – Turn on more logging and debugging
- times
array of Timestep times. Only exists after calling
AnalysisBase.run()
- Type:
- frames
array of Timestep frame indices. Only exists after calling
AnalysisBase.run()
- Type:
- results
results of calculation are stored after call to
AnalysisBase.run()
- Type:
Results
Example
from MDAnalysis.analysis.base import AnalysisBase class NewAnalysis(AnalysisBase): def __init__(self, atomgroup, parameter, **kwargs): super(NewAnalysis, self).__init__(atomgroup.universe.trajectory, **kwargs) self._parameter = parameter self._ag = atomgroup def _prepare(self): # OPTIONAL # Called before iteration on the trajectory has begun. # Data structures can be set up at this time self.results.example_result = [] def _single_frame(self): # REQUIRED # Called after the trajectory is moved onto each new frame. # store an example_result of `some_function` for a single frame self.results.example_result.append(some_function(self._ag, self._parameter)) def _conclude(self): # OPTIONAL # Called once iteration on the trajectory is finished. # Apply normalisation and averaging to results here. self.results.example_result = np.asarray(self.example_result) self.results.example_result /= np.sum(self.result)
Afterwards the new analysis can be run like this
import MDAnalysis as mda from MDAnalysisTests.datafiles import PSF, DCD u = mda.Universe(PSF, DCD) na = NewAnalysis(u.select_atoms('name CA'), 35) na.run(start=10, stop=20) print(na.results.example_result) # results can also be accessed by key print(na.results["example_result"])
Changed in version 1.0.0: Support for setting start, stop, and step has been removed. These should now be directly passed to
AnalysisBase.run()
.Changed in version 2.0.0: Added
results
Changed in version 2.8.0: Added ability to run analysis in parallel using either a built-in backend (multiprocessing or dask) or a custom backends.BackendBase instance with an implemented apply method that is used to run the computations.
- _compute(indexed_frames: ndarray, verbose: bool | None = None, *, progressbar_kwargs={}) AnalysisBase [source]
Perform the calculation on a balanced slice of frames that have been setup prior to that using _setup_computation_groups()
- Parameters:
indexed_frames (np.ndarray) – np.ndarray of (n, 2) shape, where first column is frame iteration indices and second is frame numbers
verbose (bool, optional) – Turn on verbosity
progressbar_kwargs (dict, optional) – ProgressBar keywords with custom parameters regarding progress bar position, etc; see
MDAnalysis.lib.log.ProgressBar
for full list.
Added in version 2.8.0.
- _conclude()[source]
Finalize the results you’ve gathered.
Called at the end of the
run()
method to finish everything up.Notes
Aggregation of results from individual workers happens in
self.run()
, so here you have to implement everything as if you had a non-parallel run. If you want to enable proper aggregation for parallel runs for you analysis class, implementself._get_aggregator
and checkMDAnalysis.analysis.results
for how to use it.
- _configure_backend(backend: str | BackendBase, n_workers: int, unsupported_backend: bool = False) BackendBase [source]
Matches a passed backend string value with class attributes
parallelizable
andget_supported_backends()
to check if downstream calculations can be performed.- Parameters:
backend (Union[str, BackendBase]) –
- backend to be used:
str
is matched to a builtin backend (one of “serial”, “multiprocessing” and “dask”)BackendBase
subclass is checked for the presence of anapply()
method
n_workers (int) – positive integer with number of workers (processes, in case of built-in backends) to split the work between
unsupported_backend (bool, optional) – if you want to run your custom backend on a parallelizable class that has not been tested by developers, by default
False
- Returns:
instance of a
BackendBase
class that will be used for computations- Return type:
- Raises:
ValueError – if
parallelizable
is set toFalse
but backend is notserial
ValueError – if
parallelizable
isTrue
and custom backend instance is used without specifyingunsupported_backend=True
ValueError – if your trajectory has associated parallelizable transformations but backend is not serial
ValueError – if
n_workers
was specified twice – in the run() method and durin__init__
of a custom backendValueError – if your backend object instance doesn’t have an
apply
method
Added in version 2.8.0.
- _define_run_frames(trajectory, start=None, stop=None, step=None, frames=None) slice | ndarray [source]
Defines limits for the whole run, as passed by self.run() arguments
- Parameters:
trajectory (mda.Reader) – a trajectory Reader
start (int, optional) – start frame of analysis, by default None
stop (int, optional) – stop frame of analysis, by default None
step (int, optional) – number of frames to skip between each analysed frame, by default None
frames (array_like, optional) – array of integers or booleans to slice trajectory; cannot be combined with
start
,stop
,step
; by default None
- Returns:
Appropriate slicer for the trajectory that would give correct iteraction order via trajectory[slicer]
- Return type:
Union[slice, np.ndarray]
- Raises:
ValueError – if both frames and at least one of
start
,stop
, orstep
is provided (i.e. set to notNone
value).
Added in version 2.8.0.
- _get_aggregator() ResultsGroup [source]
Returns a default aggregator that takes entire results if there is a single object, and raises ValueError otherwise
- Returns:
aggregating object
- Return type:
Added in version 2.8.0.
- _prepare()[source]
Set things up before the analysis loop begins.
Notes
self.results
is initialized already inself.__init__()
with an empty instance ofMDAnalysis.analysis.results.Results
object. You can still call your attributes as if they were usual ones,Results
just keeps track of that to be able to run a proper aggregation after a parallel run, if necessary.
- _prepare_sliced_trajectory(slicer: slice | ndarray)[source]
Prepares sliced trajectory for use in subsequent parallel computations: namely, assigns self._sliced_trajectory and its appropriate attributes, self.n_frames, self.frames and self.times.
- Parameters:
slicer (Union[slice, np.ndarray]) – appropriate slicer for the trajectory
Added in version 2.8.0.
- _setup_computation_groups(n_parts: int, start: int | None = None, stop: int | None = None, step: int | None = None, frames: slice | ndarray | None = None) list[ndarray] [source]
Splits the trajectory frames, defined by
start/stop/step
orframes
, inton_parts
even groups, preserving their indices.- Parameters:
n_parts (int) – number of parts to split the workload into
start (int, optional) – start frame
stop (int, optional) – stop frame
step (int, optional) – step size for analysis (1 means to read every frame)
frames (array_like, optional) – array of integers or booleans to slice trajectory;
frames
can only be used instead ofstart
,stop
, andstep
. Setting bothframes
and at least one ofstart
,stop
,step
to a non-default value will raise aValueError
.
- Raises:
ValueError – if both
frames
and at least one ofstart
,stop
, orframes
is provided (i.e., set to another value thanNone
)- Returns:
computation_groups – list of (n, 2) shaped np.ndarrays with frame indices and numbers
- Return type:
list[np.ndarray]
Added in version 2.8.0.
- _setup_frames(trajectory, start=None, stop=None, step=None, frames=None)[source]
Pass a Reader object and define the desired iteration pattern through the trajectory
- Parameters:
trajectory (mda.Reader) – A trajectory Reader
start (int, optional) – start frame of analysis
stop (int, optional) – stop frame of analysis
step (int, optional) – number of frames to skip between each analysed frame
frames (array_like, optional) –
array of integers or booleans to slice trajectory; cannot be combined with
start
,stop
,step
Added in version 2.2.0.
- Raises:
ValueError – if both frames and at least one of
start
,stop
, orframes
is provided (i.e., set to another value thanNone
)
Changed in version 1.0.0: Added .frames and .times arrays as attributes
Changed in version 2.2.0: Added ability to iterate through trajectory by passing a list of frame indices in the frames keyword argument
Changed in version 2.8.0: Split function into two:
_define_run_frames()
and_prepare_sliced_trajectory()
: first one defines the limits for the whole run and is executed once duringrun()
in_setup_frames()
, second one prepares sliced trajectory for each of the workers and gets executed twice: one time in_setup_frames()
for the whole trajectory, second time in_compute()
for each of the computation groups.
- _single_frame()[source]
Calculate data from a single frame of trajectory
Don’t worry about normalising, just deal with a single frame. Attributes accessible during your calculations:
self._frame_index
: index of the frame in results arrayself._ts
– Timestep instanceself._sliced_trajectory
– trajectory that you’re iterating overself.results
–MDAnalysis.analysis.results.Results
instance holding run results initialized in_prepare()
.
- classmethod get_supported_backends()[source]
Tuple with backends supported by the core library for a given class. User can pass either one of these values as
backend=...
torun()
method, or a custom object that hasapply
method (see documentation forrun()
):‘serial’: no parallelization
‘multiprocessing’: parallelization using multiprocessing.Pool
‘dask’: parallelization using dask.delayed.compute(). Requires installation of mdanalysis[dask]
If you want to add your own backend to an existing class, pass a
backends.BackendBase
subclass (see its documentation to learn how to implement it properly), and specifyunsupported_backend=True
.- Returns:
names of built-in backends that can be used in
run(backend=...)()
- Return type:
Added in version 2.8.0.
- property parallelizable
Boolean mark showing that a given class can be parallelizable with split-apply-combine procedure. Namely, if we can safely distribute
_single_frame()
to multiple workers and then combine them with a proper_conclude()
call. If set toFalse
, no backends except forserial
are supported.Note
If you want to check parallelizability of the whole class, without explicitly creating an instance of the class, see
_analysis_algorithm_is_parallelizable
. Note that you setting it to other value will break things if the algorithm behind the analysis is not trivially parallelizable.- Returns:
if a given
AnalysisBase
subclass instance is parallelizable with split-apply-combine, or not- Return type:
Added in version 2.8.0.
- run(start: int | None = None, stop: int | None = None, step: int | None = None, frames: Iterable | None = None, verbose: bool | None = None, n_workers: int | None = None, n_parts: int | None = None, backend: str | BackendBase | None = None, *, unsupported_backend: bool = False, progressbar_kwargs=None)[source]
Perform the calculation
- Parameters:
start (int, optional) – start frame of analysis
stop (int, optional) – stop frame of analysis
step (int, optional) – number of frames to skip between each analysed frame
frames (array_like, optional) –
array of integers or booleans to slice trajectory;
frames
can only be used instead ofstart
,stop
, andstep
. Setting bothframes
and at least one ofstart
,stop
,step
to a non-default value will raise aValueError
.Added in version 2.2.0.
verbose (bool, optional) – Turn on verbosity
progressbar_kwargs (dict, optional) – ProgressBar keywords with custom parameters regarding progress bar position, etc; see
MDAnalysis.lib.log.ProgressBar
for full list. Available only forbackend='serial'
backend (Union[str, BackendBase], optional) –
By default, performs calculations in a serial fashion. Otherwise, user can choose a backend:
str
is matched to a builtin backend (one ofserial
,multiprocessing
anddask
), or aMDAnalysis.analysis.results.BackendBase
subclass.Added in version 2.8.0.
n_workers (int) –
positive integer with number of workers (processes, in case of built-in backends) to split the work between
Added in version 2.8.0.
n_parts (int, optional) –
number of parts to split computations across. Can be more than number of workers.
Added in version 2.8.0.
unsupported_backend (bool, optional) –
if you want to run your custom backend on a parallelizable class that has not been tested by developers, by default False
Added in version 2.8.0.
Changed in version 2.2.0: Added ability to analyze arbitrary frames by passing a list of frame indices in the frames keyword argument.
Changed in version 2.5.0: Add progressbar_kwargs parameter, allowing to modify description, position etc of tqdm progressbars
Changed in version 2.8.0: Introduced
backend
,n_workers
,n_parts
andunsupported_backend
keywords, and refactored the method logic to support parallelizable execution.
- class MDAnalysis.analysis.base.AnalysisFromFunction(function, trajectory=None, *args, **kwargs)[source]
Create an
AnalysisBase
from a function working on AtomGroups- Parameters:
function (callable) – function to evaluate at each frame
trajectory (MDAnalysis.coordinates.Reader, optional) – trajectory to iterate over. If
None
the first AtomGroup found in args and kwargs is used as a source for the trajectory.*args (list) – arguments for function
**kwargs (dict) – arguments for function and
AnalysisBase
- results.frames
simulation frames used in analysis
- Type:
- results.times
simulation times used in analysis
- Type:
- results.timeseries
Results for each frame of the wrapped function, stored after call to
AnalysisFromFunction.run()
.- Type:
- Raises:
ValueError – if function has the same kwargs as
AnalysisBase
Example
def rotation_matrix(mobile, ref): return mda.analysis.align.rotation_matrix(mobile, ref)[0] rot = AnalysisFromFunction(rotation_matrix, trajectory, mobile, ref).run() print(rot.results.timeseries)
Changed in version 1.0.0: Support for directly passing the start, stop, and step arguments has been removed. These should instead be passed to
AnalysisFromFunction.run()
.Changed in version 2.0.0: Former
results
are now stored asresults.timeseries
Changed in version 2.8.0: Added
get_supported_backends()
, introducing ‘serial’, ‘multiprocessing’ and ‘dask’ backends.- _conclude()[source]
Finalize the results you’ve gathered.
Called at the end of the
run()
method to finish everything up.Notes
Aggregation of results from individual workers happens in
self.run()
, so here you have to implement everything as if you had a non-parallel run. If you want to enable proper aggregation for parallel runs for you analysis class, implementself._get_aggregator
and checkMDAnalysis.analysis.results
for how to use it.
- _get_aggregator()[source]
Returns a default aggregator that takes entire results if there is a single object, and raises ValueError otherwise
- Returns:
aggregating object
- Return type:
Added in version 2.8.0.
- _prepare()[source]
Set things up before the analysis loop begins.
Notes
self.results
is initialized already inself.__init__()
with an empty instance ofMDAnalysis.analysis.results.Results
object. You can still call your attributes as if they were usual ones,Results
just keeps track of that to be able to run a proper aggregation after a parallel run, if necessary.
- _single_frame()[source]
Calculate data from a single frame of trajectory
Don’t worry about normalising, just deal with a single frame. Attributes accessible during your calculations:
self._frame_index
: index of the frame in results arrayself._ts
– Timestep instanceself._sliced_trajectory
– trajectory that you’re iterating overself.results
–MDAnalysis.analysis.results.Results
instance holding run results initialized in_prepare()
.
- classmethod get_supported_backends()[source]
Tuple with backends supported by the core library for a given class. User can pass either one of these values as
backend=...
torun()
method, or a custom object that hasapply
method (see documentation forrun()
):‘serial’: no parallelization
‘multiprocessing’: parallelization using multiprocessing.Pool
‘dask’: parallelization using dask.delayed.compute(). Requires installation of mdanalysis[dask]
If you want to add your own backend to an existing class, pass a
backends.BackendBase
subclass (see its documentation to learn how to implement it properly), and specifyunsupported_backend=True
.- Returns:
names of built-in backends that can be used in
run(backend=...)()
- Return type:
Added in version 2.8.0.
- MDAnalysis.analysis.base._filter_baseanalysis_kwargs(function, kwargs)[source]
Create two dictionaries with kwargs separated for function and
AnalysisBase
- Parameters:
function (callable) – function to be called
kwargs (dict) – keyword argument dictionary
- Returns:
base_args (dict) – dictionary of AnalysisBase kwargs
kwargs (dict) – kwargs without AnalysisBase kwargs
- Raises:
ValueError – if function has the same kwargs as
AnalysisBase
- MDAnalysis.analysis.base.analysis_class(function)[source]
Transform a function operating on a single frame to an
AnalysisBase
class.- Parameters:
function (callable) – function to evaluate at each frame
- results.frames
simulation frames used in analysis
- Type:
- results.times
simulation times used in analysis
- Type:
- results.timeseries
Results for each frame of the wrapped function, stored after call to
AnalysisFromFunction.run()
.- Type:
- Raises:
ValueError – if function has the same kwargs as
AnalysisBase
Examples
For use in a library, we recommend the following style
def rotation_matrix(mobile, ref): return mda.analysis.align.rotation_matrix(mobile, ref)[0] RotationMatrix = analysis_class(rotation_matrix)
It can also be used as a decorator
@analysis_class def RotationMatrix(mobile, ref): return mda.analysis.align.rotation_matrix(mobile, ref)[0] rot = RotationMatrix(u.trajectory, mobile, ref).run(step=2) print(rot.results.timeseries)
Changed in version 2.0.0: Former
results
are now stored asresults.timeseries