1. Overview over MDAnalysis

MDAnalysis is a Python package that provides classes to access data in molecular dynamics trajectories. It is object oriented so it treats atoms, groups of atoms, trajectories, etc as different objects. Each object has a number of operations defined on itself (also known as “methods”) and also contains values describing the object (“attributes”). For example, a AtomGroup object has a center_of_mass() method that returns the center of mass of the group of atoms. It also contains an attribute called residues that lists all the residues that belong to the group. Using methods such as select_atoms() (which uses CHARMM-style atom Selection commands) one can create new objects (in this case, another AtomGroup).

A typical usage pattern is to iterate through a trajectory and analyze coordinates for every frame. In the following example the end-to-end distance of a protein and the radius of gyration of the backbone atoms are calculated:

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF,DCD  # test trajectory
import numpy.linalg
u = mda.Universe(PSF,DCD)  # always start with a Universe
nterm = u.select_atoms('segid 4AKE and name N')[0]  # can access structure via segid (s4AKE) and atom name
cterm = u.select_atoms('segid 4AKE and name C')[-1]  # ... takes the last atom named 'C'
bb = u.select_atoms('protein and backbone')  # a selection (a AtomGroup)
for ts in u.trajectory:  # iterate through all frames
    r = cterm.position - nterm.position  # end-to-end vector from atom positions
    d = numpy.linalg.norm(r)   # end-to-end distance
    rgyr = bb.radius_of_gyration()  # method of a AtomGroup; updates with each frame
    print(f"frame = {ts.frame}: d = {d} Angstroem, Rgyr = {rgyr} Angstroem")

1.1. Using MDAnalysis in python

If you’ve installed MDAnalysis in the standard python modules location, load from within the interpreter:

from MDAnalysis import *

or

import MDAnalysis as mda

The idea behind MDAnalysis is to get trajectory data into NumPy numpy.ndarray arrays, where it can then be easily manipulated using all the power in NumPy and SciPy.

MDAnalysis works well both in scripts and in interactive use. The developers very much recommend using MDAnalysis from within the IPython Python shell. It allows one to interactively explore the objects (using TAB-completion and online help), do analysis and immediately plot results. The examples in this manual are typically run from an interactive ipython session.

Invariably, a MDAnalysis session starts with loading data into the Universe class (which can be accessed as MDAnalysis.Universe):

import MDAnalysis as mda
universe = mda.Universe(topology, trajectory)
  • The topology file lists the atoms and residues (and also their connectivity). It can be a CHARMM/XPLOR/NAMD PSF file or a coordinate file such as a Protein Databank Brookhaven PDB file, a CHARMM card coordinate file (CRD), or a GROMOS/Gromacs GRO file.

  • The trajectory contains a list of coordinates in the order defined in the topology. It can either be a single frame (PDB, CRD, and GRO are all read) or a time series of coordinate frames such as a CHARMM/NAMD/LAMMPS DCD binary file, a Gromacs XTC/TRR trajectory, or a XYZ trajectory (possibly compressed with gzip or bzip2).

For the remainder of this introduction we are using a short example trajectory that is provided with MDAnalysis (as part of the MDAnalysis test suite). The trajectory is loaded with

>>> from MDAnalysis import Universe
>>> from MDAnalysis.tests.datafiles import PSF,DCD
>>> u = Universe(PSF, DCD)

(The >>> signs are the Python input prompt and are not to be typed; they just make clear in the examples what is input and what is output.)

The Universe contains a number of important attributes, the most important ones of which is atoms:

>>> print(u.atoms)
<AtomGroup with 3341 atoms>

Universe.atoms is a AtomGroup and can be thought of as list consisting of Atom objects. The Atom is the elementary and fundamental object in MDAnalysis.

The MDAnalysis.Universe.trajectory attribute gives access to the coordinates over time:

>>> print(u.trajectory)
< DCDReader '/..../MDAnalysis/tests/data/adk_dims.dcd' with 98 frames of 3341 atoms (0 fixed) >

Finally, the MDAnalysis.Universe.select_atoms() method generates a new AtomGroup according to a selection criterion:

>>> calphas = u.select_atoms("name CA")
>>> print(calphas)
<AtomGroup with 214 atoms>

as described in Selection commands.

1.2. Examples

The easiest way to get started with MDAnalysis is to read this introduction and the chapters on The topology system and Selection commands, then explore the package interactively in IPython or another interactive Python interpreter.

1.2.1. Included trajectories

MDAnalysis comes with a number of real trajectories for testing. You can also use them to explore the functionality and ensure that everything is working properly:

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF,DCD, PDB,XTC
u_dims_adk = mda.Universe(PSF,DCD)
u_eq_adk = mda.Universe(PDB, XTC)

The PSF and DCD file are a closed-form-to-open-form transition of Adenylate Kinase (from [Beckstein2009]) and the PDB+XTC file are ten frames from a Gromacs simulation of AdK solvated in TIP4P water with the OPLS/AA force field.

[Beckstein2009]

O. Beckstein, E.J. Denning, J.R. Perilla, and T.B. Woolf. Zipping and Unzipping of Adenylate Kinase: Atomistic Insights into the Ensemble of Open <–> Closed Transitions. J Mol Biol 394 (2009), 160–176, doi:10.1016/j.jmb.2009.09.009

1.2.2. Code snippets

The source code distribution comes with a directory examples that contains a number of code snippets that show how to use certain aspects of MDAnalysis.

For instance, there is code that shows how to

and more.