4. Analysis modules
The MDAnalysis.analysis module contains code to carry out specific
analysis functionality for MD trajectories. It is based on the core
functionality (i.e. trajectory I/O, selections, etc.). The analysis modules can
be used as examples of how to use MDAnalysis but also as working code for
research projects; typically all contributed code has been used by the authors
in their own work.
4.1. Getting started with analysis
See also
The User Guide: Analysis contains extensive documentation of the analysis capabilities with user-friendly examples.
4.1.1. Using the analysis classes
Most analysis tools in MDAnalysis are written as a single class. An analysis usually follows the same pattern:
- Import the desired module, since analysis modules are not imported by default.
- Initialize the analysis class instance from the previously imported module.
- Run the analysis with the run() method, optionally for specific trajectory slices.
- Access the analysis results from the results attribute.
from MDAnalysis.analysis import ExampleAnalysisModule # (e.g. RMSD)
analysis_obj = ExampleAnalysisModule.AnalysisClass(universe, ...)
analysis_obj.run(start=start_frame, stop=stop_frame, step=step)
print(analysis_obj.results)
Please see the individual module documentation for any specific caveats and also read and cite the reference papers associated with these algorithms.
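As a rough illustration of what "specific trajectory slices" means, the sketch below assumes that start, stop and step select frames the way an ordinary Python slice does. The helper name selected_frames is hypothetical and is not part of MDAnalysis.

```python
def selected_frames(n_frames, start=None, stop=None, step=None):
    """Return the frame indices a slice-like (start, stop, step) would visit.

    Assumption: frames are selected with ordinary Python slice semantics,
    so stop is exclusive and None means "use the default".
    """
    return list(range(n_frames)[start:stop:step])


# A hypothetical 10-frame trajectory
print(selected_frames(10))                            # every frame
print(selected_frames(10, start=2, stop=8, step=2))   # [2, 4, 6]
```

Under this assumption, run(start=2, stop=8, step=2) on a 10-frame trajectory would analyze three frames.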
4.1.2. Using parallelization for built-in analysis runs
Added in version 2.8.0.
AnalysisBase subclasses can run on a backend that supports parallelization
(see MDAnalysis.analysis.backends). All analysis runs use backend='serial'
by default, i.e., they do not use parallelization unless asked to, which was
the standard behavior before release 2.8.0 of MDAnalysis.
Without any additional dependencies, only one parallel backend is supported:
the built-in multiprocessing backend, which processes parts of a trajectory in
separate processes and can thereby make use of multi-core processors.
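The idea of processing parts of a trajectory separately can be sketched without MDAnalysis. The example below only illustrates a split-apply-combine scheme on a list of frame indices. The names split_frames and per_chunk_sum are hypothetical, the chunks are processed one after another rather than in separate processes, and the real backend may split the work differently.

```python
def split_frames(frames, n_workers):
    """Split a list of frame indices into up to n_workers contiguous chunks.

    Chunk sizes differ by at most one; empty chunks are dropped.
    """
    base, extra = divmod(len(frames), n_workers)
    chunks, begin = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        if size:
            chunks.append(frames[begin:begin + size])
        begin += size
    return chunks


def per_chunk_sum(chunk):
    """Stand-in for a per-frame analysis applied to one chunk."""
    return sum(chunk)


frames = list(range(10))
chunks = split_frames(frames, n_workers=3)
partial = [per_chunk_sum(c) for c in chunks]  # one item of work per worker
total = sum(partial)                          # combine the partial results

print(chunks)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(total)   # 45
```

The combined result equals what a single serial pass over all frames would give, which is the property a parallelizable analysis needs.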
Note
For now, parallelization has only been added to MDAnalysis.analysis.rms.RMSD,
but by release 3.0 it will be introduced to all subclasses that can support it.
To use this feature, add backend='multiprocessing' to your run() call and
supply a suitable n_workers (use multiprocessing.cpu_count() for the maximum
available on your machine):
import multiprocessing
import MDAnalysis as mda
from MDAnalysisTests.datafiles import PSF, DCD
from MDAnalysis.analysis.rms import RMSD
from MDAnalysis.analysis.align import AverageStructure
# initialize the universe
u = mda.Universe(PSF, DCD)
# calculate average structure for reference
avg = AverageStructure(mobile=u).run()
ref = avg.results.universe
# initialize RMSD run
rmsd = RMSD(u, ref, select='backbone')
rmsd.run(backend='multiprocessing', n_workers=multiprocessing.cpu_count())
For now, you have to be explicit and specify both backend and n_workers,
since the feature is new and there are no good defaults for it. For example,
if you specify too large an n_workers and your trajectory frames are big,
you might get an out-of-memory error when executing your run.
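Since a large n_workers can exhaust memory, one cautious option is to cap the worker count yourself. The helper below is hypothetical (it is not an MDAnalysis function). It assumes you pass a ceiling that you judge safe for your frame size, and it never returns more workers than there are CPU cores or frames.

```python
import multiprocessing


def pick_n_workers(n_frames, ceiling=4):
    """Return a worker count bounded by ceiling, CPU cores and frame count.

    ceiling is an arbitrary, user-chosen limit; lower it when individual
    frames are large and memory is tight.
    """
    if n_frames < 1:
        raise ValueError("need at least one frame")
    return max(1, min(ceiling, multiprocessing.cpu_count(), n_frames))


n_workers = pick_n_workers(n_frames=98, ceiling=4)
print(n_workers)
# The value could then be passed on, e.g.
# rmsd.run(backend='multiprocessing', n_workers=n_workers)
```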
You can also implement your own backends; see MDAnalysis.analysis.backends.
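As a loose sketch of what a custom backend might look like, the class below assumes that a backend only needs to know its n_workers and to provide an apply(func, computations) method that returns one result per computation, in order. This interface is an assumption. The class is standalone, uses threads purely for illustration, and does not subclass anything from MDAnalysis; consult MDAnalysis.analysis.backends for the actual base class and its requirements.

```python
from concurrent.futures import ThreadPoolExecutor


class ThreadBackendSketch:
    """Hypothetical backend: applies func to each computation using threads."""

    def __init__(self, n_workers):
        if n_workers < 1:
            raise ValueError("n_workers must be a positive integer")
        self.n_workers = n_workers

    def apply(self, func, computations):
        # executor.map preserves input order, so results line up
        # with the computations they came from.
        with ThreadPoolExecutor(max_workers=self.n_workers) as executor:
            return list(executor.map(func, computations))


backend = ThreadBackendSketch(n_workers=2)
results = backend.apply(lambda chunk: sum(chunk), [[0, 1, 2], [3, 4], [5]])
print(results)  # [3, 7, 5]
```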
4.1.3. Additional dependencies
Some of the modules in MDAnalysis.analysis require additional Python
packages to enable full functionality. For example, MDAnalysis.analysis.encore
provides more options if scikit-learn is installed. If you installed
MDAnalysis with pip (see Installing MDAnalysis), these packages are not
automatically installed, although you can add the [analysis] tag to the pip
command to force their installation. If you installed MDAnalysis with
conda, then a full set of dependencies is automatically installed.
Other modules require external programs. For instance, the
MDAnalysis.analysis.hole2 module requires an installation of the HOLE
suite of programs. You will need to install these external dependencies by
following their installation instructions before you can use the corresponding
MDAnalysis module.
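Before relying on optional functionality it can help to check what is available. The sketch below uses only the standard library. The import name sklearn (for scikit-learn) and the executable name hole are assumptions about a typical installation and may differ on your system.

```python
import importlib.util
import shutil


def has_python_package(import_name):
    """True if a top-level package with this import name can be found."""
    return importlib.util.find_spec(import_name) is not None


def has_program(executable):
    """True if the executable is found on the current PATH."""
    return shutil.which(executable) is not None


print("scikit-learn available:", has_python_package("sklearn"))
print("HOLE available:", has_program("hole"))
```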
4.2. Building blocks for Analysis
The building block for the analysis modules is
MDAnalysis.analysis.base.AnalysisBase.
To build your own analysis class, start by reading its documentation.
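To give a feel for how such a building block is organized, here is a toy stand-in written in plain Python. It is not the real AnalysisBase: ToyAnalysisBase, FrameSum and the use of a plain list as a "trajectory" are hypothetical, and only the hook names _prepare, _single_frame and _conclude are meant to mirror the ones described in the AnalysisBase documentation.

```python
class ToyAnalysisBase:
    """Toy stand-in: run() drives three hooks that subclasses fill in."""

    def __init__(self, trajectory):
        self._trajectory = trajectory
        self.results = {}

    def run(self, start=None, stop=None, step=None):
        self._prepare()
        for frame in self._trajectory[start:stop:step]:
            self._single_frame(frame)
        self._conclude()
        return self  # allows analysis = Analysis(...).run()

    def _prepare(self):
        pass

    def _single_frame(self, frame):
        raise NotImplementedError

    def _conclude(self):
        pass


class FrameSum(ToyAnalysisBase):
    """Hypothetical analysis: sum and mean of per-frame values."""

    def _prepare(self):
        self._values = []

    def _single_frame(self, frame):
        self._values.append(frame)

    def _conclude(self):
        self.results["total"] = sum(self._values)
        self.results["mean"] = sum(self._values) / len(self._values)


analysis = FrameSum([1.0, 2.0, 3.0, 4.0]).run(step=2)
print(analysis.results)  # {'total': 4.0, 'mean': 2.0}
```

The subclass only supplies the per-frame logic and the final reduction; the frame loop and slicing live in the base class.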
4.3. Distances and contacts
- 4.3.1. Coordinate fitting and alignment: MDAnalysis.analysis.align
- 4.3.2. Native contacts analysis: MDAnalysis.analysis.contacts
- 4.3.3. Distance analysis: MDAnalysis.analysis.distances
- 4.3.4. Simple atomic distance analysis: MDAnalysis.analysis.atomicdistances
- 4.3.5. Calculating root mean square quantities: MDAnalysis.analysis.rms
- 4.3.6. Calculating path similarity: MDAnalysis.analysis.psa
- 4.3.7. ENCORE Ensemble Similarity Calculations: MDAnalysis.analysis.encore
- 4.3.8. Bond-Angle-Torsion coordinates analysis: MDAnalysis.analysis.bat
4.4. Hydrogen bonding
Deprecated modules:
4.5. Membranes and membrane proteins
4.6. Nucleic acids
4.7. Polymers
4.8. Structure
4.8.1. Macromolecules
4.8.2. Liquids
4.9. Volumetric analysis
4.10. Dimensionality Reduction
4.11. Legacy analysis modules
The MDAnalysis.analysis.legacy module contains code that, for a
range of reasons, is not as well maintained and tested as the other
analysis modules. Use with care.