1. Overview over MDAnalysis
MDAnalysis is a Python package that provides classes to access
data in molecular dynamics trajectories. It is object oriented so it
treats atoms, groups of atoms, trajectories, etc as different
objects. Each object has a number of operations defined on itself
(also known as “methods”) and also contains values describing the
object (“attributes”). For example, a
AtomGroup
object has a
center_of_mass()
method that
returns the center of mass of the group of atoms. It also contains an
attribute called residues
that lists all the residues that belong to the group. Using methods
such as select_atoms()
(which uses CHARMM-style atom Selection commands) one
can create new objects (in this case, another
AtomGroup
).
A typical usage pattern is to iterate through a trajectory and analyze coordinates for every frame. In the following example the end-to-end distance of a protein and the radius of gyration of the backbone atoms are calculated:
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF,DCD # test trajectory
import numpy.linalg
u = mda.Universe(PSF,DCD) # always start with a Universe
nterm = u.select_atoms('segid 4AKE and name N')[0] # can access structure via segid (s4AKE) and atom name
cterm = u.select_atoms('segid 4AKE and name C')[-1] # ... takes the last atom named 'C'
bb = u.select_atoms('protein and backbone') # a selection (a AtomGroup)
for ts in u.trajectory: # iterate through all frames
r = cterm.pos - nterm.pos # end-to-end vector from atom positions
d = numpy.linalg.norm(r) # end-to-end distance
rgyr = bb.radius_of_gyration() # method of a AtomGroup; updates with each frame
print(f"frame = {ts.frame}: d = {d} Angstroem, Rgyr = {rgyr} Angstroem")
1.1. Using MDAnalysis in python
If you’ve installed MDAnalysis in the standard python modules location, load from within the interpreter:
from MDAnalysis import *
or
import MDAnalysis as mda
The idea behind MDAnalysis is to get trajectory data into NumPy
numpy.ndarray
arrays, where it can then be easily manipulated using
all the power in NumPy and SciPy.
MDAnalysis works well both in scripts and in interactive use. The developers very much recommend using MDAnalysis from within the IPython Python shell. It allows one to interactively explore the objects (using TAB-completion and online help), do analysis and immediately plot results. The examples in this manual are typically run from an interactive ipython session.
Invariably, a MDAnalysis session starts with loading data into the
Universe
class (which can be accessed
as MDAnalysis.Universe
):
import MDAnalysis as mda
universe = mda.Universe(topology, trajectory)
The topology file lists the atoms and residues (and also their connectivity). It can be a CHARMM/XPLOR/NAMD PSF file or a coordinate file such as a Protein Databank Brookhaven PDB file, a CHARMM card coordinate file (CRD), or a GROMOS/Gromacs GRO file.
The trajectory contains a list of coordinates in the order defined in the topology. It can either be a single frame (PDB, CRD, and GRO are all read) or a time series of coordinate frames such as a CHARMM/NAMD/LAMMPS DCD binary file, a Gromacs XTC/TRR trajectory, or a XYZ trajectory (possibly compressed with gzip or bzip2).
For the remainder of this introduction we are using a short example trajectory that is provided with MDAnalysis (as part of the MDAnalysis test suite). The trajectory is loaded with
>>> from MDAnalysis import Universe
>>> from MDAnalysis.tests.datafiles import PSF,DCD
>>> u = Universe(PSF, DCD)
(The >>>
signs are the Python input prompt and are not to be typed; they
just make clear in the examples what is input and what is output.)
The Universe
contains a number of important attributes,
the most important ones of which is
atoms
:
>>> print(u.atoms)
<AtomGroup with 3341 atoms>
Universe.atoms
is a
AtomGroup
and can be thought of as
list consisting of Atom
objects. The Atom
is the
elementary and fundamental object in MDAnalysis.
The MDAnalysis.Universe.trajectory
attribute gives access to the coordinates
over time:
>>> print(u.trajectory)
< DCDReader '/..../MDAnalysis/tests/data/adk_dims.dcd' with 98 frames of 3341 atoms (0 fixed) >
Finally, the MDAnalysis.Universe.select_atoms()
method generates a new
AtomGroup
according to a selection criterion
>>> calphas = u.select_atoms("name CA")
>>> print(calphas)
<AtomGroup with 214 atoms>
as described in Selection commands.
1.2. Examples
The easiest way to get started with MDAnalysis is to read this introduction and the chapters on The topology system and Selection commands, then explore the package interactively in IPython or another interactive Python interpreter.
1.2.1. Included trajectories
MDAnalysis comes with a number of real trajectories for testing. You can also use them to explore the functionality and ensure that everything is working properly:
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF,DCD, PDB,XTC
u_dims_adk = mda.Universe(PSF,DCD)
u_eq_adk = mda.Universe(PDB, XTC)
The PSF and DCD file are a closed-form-to-open-form transition of Adenylate Kinase (from [Beckstein2009]) and the PDB+XTC file are ten frames from a Gromacs simulation of AdK solvated in TIP4P water with the OPLS/AA force field.
- Beckstein2009
O. Beckstein, E.J. Denning, J.R. Perilla, and T.B. Woolf. Zipping and Unzipping of Adenylate Kinase: Atomistic Insights into the Ensemble of Open <–> Closed Transitions. J Mol Biol 394 (2009), 160–176, doi:10.1016/j.jmb.2009.09.009
1.2.2. Code snippets
The source code distribution comes with a directory examples that contains a number of code snippets that show how to use certain aspects of MDAnalysis.
For instance, there is code that shows how to
fit a trajectory to a reference structure using the QCP RMSD-alignment code in
MDAnalysis.core.qcprot
(rmsfit_qcp.py);do a block-averaging error analysis (blocks.py);
calculate a potential profile across a membrane (potential_profile.py);
do a native contact analysis using
MDAnalysis.analysis.contacts
(nativecontacts.py)get the lipid composition of the individual leaflets of a bilayer using
MDAnalysis.analysis.leaflet
(membrane-leaflets.py);define the multimeric states of a number of transmembrane peptides via clustering (multimers-analysis.py);
convert between trajectory formats (e.g. dcd2xtc.py or amber2dcd.py)
use MDAnalysis for simple model building (make_MthK_tetramer.py);
and more.