4. Analysis modules
The MDAnalysis.analysis module provides a wide collection of analysis tools for
molecular dynamics trajectories. These modules build upon MDAnalysis core functionality
(trajectory I/O, selections, etc.) and are designed both for reuse in research workflows
and as examples of using the MDAnalysis API. Each module typically defines an analysis
class that follows a standard interface.
See the User Guide Analysis section for interactive examples and additional context.
4.1. Getting started with analysis
4.1.1. General usage pattern
Most analysis tools are implemented as single classes and follow this usage pattern:
1. Import the module (e.g., MDAnalysis.analysis.rms).
2. Initialize the analysis class with the required arguments.
3. Run the analysis with run().
4. Access results via the results attribute.
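from MDAnalysis.analysis import ExampleAnalysisModule  # placeholder for a real module, e.g. rms
# AnalysisClass is a placeholder for the module's analysis class, e.g. rms.RMSD
analysis_obj = ExampleAnalysisModule.AnalysisClass(universe, ...)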
analysis_obj.run(start=start_frame, stop=stop_frame, step=step)
print(analysis_obj.results)
Please see the individual module documentation for any specific caveats and also read and cite the reference papers associated with these algorithms.
4.1.2. Using parallelization for analysis tools
Added in version 2.8.0.
Many analysis tools (based on AnalysisBase) can be run in parallel using a simple
split-apply-combine scheme whereby slices of the trajectory (“split”) are analyzed in
parallel (“apply” the analysis function) and the data from the parallel executions
are “combined” at the end.
MDAnalysis supports different backends for the parallel execution such as
multiprocessing or dask (see MDAnalysis.analysis.backends).
As a special case, serial execution is handled by the default backend='serial', i.e.,
by default, none of the analysis tools run in parallel and one has to explicitly request
parallel execution. Without any additionally installed dependencies, only one parallel
backend is supported: Python multiprocessing, which is available in the Python standard
library. It processes each slice of a trajectory by running a separate process on a
different core of a multi-core CPU.
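The split-apply-combine scheme described above can be illustrated with a small toy sketch that uses only the Python standard library. It does not use MDAnalysis and is not how the library implements its backends: the function names (split_frames, analyze_slice, combine, run_split_apply_combine) are hypothetical, the "analysis" is a placeholder computation on frame indices, and a thread pool stands in for the separate processes so that the sketch runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor


def split_frames(n_frames, n_parts):
    """'Split': divide frame indices 0..n_frames-1 into n_parts contiguous slices."""
    base, extra = divmod(n_frames, n_parts)
    slices, start = [], 0
    for i in range(n_parts):
        # the first `extra` slices receive one additional frame
        stop = start + base + (1 if i < extra else 0)
        slices.append(range(start, stop))
        start = stop
    return slices


def analyze_slice(frames):
    """'Apply': compute a toy per-frame quantity (placeholder for a real analysis)."""
    return [(frame, float(frame) ** 2) for frame in frames]


def combine(partial_results):
    """'Combine': merge the per-slice results and restore frame order."""
    return sorted(item for part in partial_results for item in part)


def run_split_apply_combine(n_frames, n_workers):
    """Analyze all frames in n_workers slices and combine the partial results."""
    slices = split_frames(n_frames, n_workers)
    # Stand-in executor: a real parallel backend would use separate processes.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(analyze_slice, slices))
    return combine(partial_results)


serial = analyze_slice(range(10))
parallel = run_split_apply_combine(n_frames=10, n_workers=3)
print(parallel == serial)  # the combined result matches the serial one
```

The point of the sketch is that the combined result is identical to the serial result as long as each frame can be analyzed independently of the others, which is also why analyses that depend on previous frames are harder to parallelize in this way.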
Note
Not all analysis tools in MDAnalysis can be parallelized and others have not yet been updated to make use of the parallelization framework, which was introduced in release 2.8.0. MDAnalysis aims to have parallelization enabled for all analysis tools that support it by release 3.0.
In order to use parallelization, add backend='multiprocessing' to the arguments of the
run() method together with n_workers=N where N is the number of CPUs that you want to
use for parallelization. (You can use multiprocessing.cpu_count() to get the maximum
available number of CPUs on your machine but this may not always lead to the best
performance because of computational overheads and the fact that parallel access to a
single trajectory file is often a performance bottleneck.) As an example we show how to
run an RMSD calculation in parallel:
import multiprocessing
import MDAnalysis as mda
from MDAnalysisTests.datafiles import PSF, DCD
from MDAnalysis.analysis.rms import RMSD
from MDAnalysis.analysis.align import AverageStructure
# initialize the universe
u = mda.Universe(PSF, DCD)
# calculate average structure for reference
avg = AverageStructure(mobile=u).run()
ref = avg.results.universe
# initialize RMSD run
rmsd = RMSD(u, ref, select='backbone')
rmsd.run(backend='multiprocessing', n_workers=multiprocessing.cpu_count())
Be explicit and specify both backend and n_workers. Choosing too many
workers or using large trajectory frames may lead to an out-of-memory error.
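One way to avoid requesting too many workers is to cap the count explicitly. The helper below is a hypothetical sketch, not part of MDAnalysis: choose_n_workers and its max_workers argument are names introduced here for illustration, and the heuristic (never more workers than CPUs, frames, or a user-defined cap) is an assumption rather than a recommendation from the library.

```python
import os


def choose_n_workers(n_frames, max_workers=None):
    """Pick a worker count bounded by the CPU count, the frame count, and an optional cap."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    candidates = [available, max(n_frames, 1)]
    if max_workers is not None:
        candidates.append(max(max_workers, 1))
    return min(candidates)


# Possible use with the RMSD example (u and rmsd as defined there):
#   n = choose_n_workers(u.trajectory.n_frames, max_workers=4)
#   rmsd.run(backend='multiprocessing', n_workers=n)
print(choose_n_workers(n_frames=1000, max_workers=4))
```

A suitable cap depends on the machine and the trajectory, so it is worth timing a few values of n_workers on a short slice of the trajectory before running the full analysis.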
You can also implement your own backends; see MDAnalysis.analysis.backends.
See also
Parallel analysis for technical details
4.1.3. Additional dependencies
Some of the modules in MDAnalysis.analysis require additional Python
packages to enable full functionality. For example, MDAnalysis.analysis.encore
provides more options if scikit-learn is installed. If you installed MDAnalysis
with pip (see Installing and using MDAnalysis), these packages are not automatically
installed, although you can add the [analysis] tag to the pip command to force their
installation (for example, pip install "MDAnalysis[analysis]"). If you installed
MDAnalysis with conda, a full set of dependencies is automatically installed.
Other modules require external programs. For instance, the MDAnalysis.analysis.hole2
module requires an installation of the HOLE suite of programs. You will need to install
these external dependencies by following their installation instructions before you can
use the corresponding MDAnalysis module.
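Before running an analysis that relies on optional dependencies, it can help to check whether they are available. The following sketch uses only the Python standard library. The helper names has_module and has_program are hypothetical, and the sketch assumes that scikit-learn is imported under the name sklearn and that the HOLE executable is called hole and is on the PATH.

```python
import importlib.util
import shutil


def has_module(name):
    """Return True if the top-level Python package `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


def has_program(executable):
    """Return True if `executable` is found on the current PATH."""
    return shutil.which(executable) is not None


# Assumed names: 'sklearn' (import name of scikit-learn) and 'hole' (HOLE executable).
print("scikit-learn available:", has_module("sklearn"))
print("HOLE available:", has_program("hole"))
```

If a check reports False, install the missing package or program first. If the external program lives outside the PATH, check the documentation of the corresponding MDAnalysis module for how to point it at the executable.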
4.2. Building blocks for Analysis
The building block for the analysis modules is MDAnalysis.analysis.base.AnalysisBase.
To build your own analysis class, start by reading its documentation.
4.3. Distances and contacts
- 4.3.1. Coordinate fitting and alignment: MDAnalysis.analysis.align
- 4.3.2. Native contacts analysis: MDAnalysis.analysis.contacts
- 4.3.3. Distance analysis: MDAnalysis.analysis.distances
- 4.3.4. Simple atomic distance analysis: MDAnalysis.analysis.atomicdistances
- 4.3.5. Calculating root mean square quantities: MDAnalysis.analysis.rms
- 4.3.6. Calculating path similarity: MDAnalysis.analysis.psa
- 4.3.7. ENCORE Ensemble Similarity Calculations: MDAnalysis.analysis.encore
- 4.3.8. Bond-Angle-Torsion coordinates analysis: MDAnalysis.analysis.bat
4.4. Hydrogen bonding
Deprecated modules:
4.5. Membranes and membrane proteins
4.6. Nucleic acids
4.7. Polymers
4.8. Structure
4.8.1. Macromolecules
4.8.2. Liquids
4.9. Volumetric analysis
4.10. Dimensionality Reduction
4.11. Legacy analysis modules
The MDAnalysis.analysis.legacy module contains code that for a
range of reasons is not as well maintained and tested as the other
analysis modules. Use with care.