5.1. Topology readers — MDAnalysis.topology
This submodule contains the topology readers. A topology file supplies the list of atoms in the system, their connectivity and possibly additional information such as B-factors, partial charges, etc. The details depend on the file format and not every topology file provides all (or even any) additional data. This data is made accessible through AtomGroup properties.
As a minimum, all topology parsers will provide atom ids, atom types, masses, resids, resnums and segids as well as assigning all atoms to residues and all residues to segments. For systems without residues and segments, this results in there being a single residue and segment to which all atoms belong. Often when data is not provided by a file, it will be guessed based on other data in the file. In the event that this happens, a UserWarning will always be issued.
The following table lists the currently supported topology formats along with the attributes they provide.
Name |
extension |
attributes |
remarks |
---|---|---|---|
CHARMM/XPLOR PSF |
psf |
resnames, names, types, charges, bonds, angles, dihedrals, impropers |
CHARMM/XPLOR/NAMD topology format;
|
CHARMM CARD 1 |
crd |
names, tempfactors, resnames, |
“CARD” coordinate output from CHARMM; deals with
either standard or EXTended format;
|
Brookhaven 1 |
pdb/ent |
names, bonds, resids, resnums, types, chainids, occupancies, tempfactors, resids, icodes, resnames, segids, |
a simplified PDB format (as used in MD simulations) is read by default |
XPDB 1 |
pdb |
As PDB except icodes |
Extended PDB format as used by e.g., NAMD
(can use 5-digit residue numbers). To use, specify
the format “XPBD” explicitly:
|
PQR 1 |
pqr |
names, charges, types, radii, resids, resnames, icodes, segids |
PDB-like but whitespace-separated files with charge
and radius information as used by, e.g., APBS.
|
PDBQT 1 |
pdbqt |
names, types, altLocs, charges, resnames, resids, icodes, occupancies, tempfactors, segids, |
file format used by AutoDock with atom types and
partial charges. Module:
|
GROMOS96 1 |
gro |
names, resids, resnames, |
GROMOS96 coordinate file (used e.g., by Gromacs)
|
Amber |
top, prmtop, parm7 |
names, charges type_indices, types, resnames, |
simple Amber format reader (only supports a subset
of flags);
|
DESRES 1 |
dms |
names, numbers, masses, charges, chainids, resids, resnames, segids, radii, |
DESRES molecular structure reader (only supports
the atom and bond records) as used by Desmond and Anton;
|
TPR 2 |
tpr |
names, types, resids, resnames, charges, bonds, masses, moltypes, molnums |
Gromacs portable run input reader (limited
experimental support for some of the more recent
versions of the file format);
|
ITP |
itp |
names, types, resids, resnames, charges, bonds, masses, segids, moltypes, chargegroups |
Gromacs include topology file;
|
MOL2 1 |
mol2 |
ids, names, types, resids, charges, bonds, resnames, |
Tripos MOL2 molecular structure format;
|
LAMMPS 1 |
data |
ids, types, masses, charges, resids, bonds, angles, dihedrals |
LAMMPS Data file parser
|
LAMMPS 1 |
lammpsdump |
id, masses |
LAMMPS ascii dump file reader
|
XYZ 1 |
xyz |
names |
XYZ File Parser. Reads only the labels from atoms
and constructs minimal topology data.
|
TXYZ 1 |
txyz, arc |
names, atomids, masses, types, bonds |
Tinker XYZ File Parser. Reads atom labels, numbers
and connectivity; masses are guessed from atoms names.
|
GAMESS 1 |
gms, log |
names, atomic charges, |
GAMESS output parser. Read only atoms of assembly
section (atom, elems and coords) and construct
topology.
|
DL_POLY 1 |
config, history |
ids, names |
DL_POLY CONFIG or HISTORY file. Reads only the
atom names. If atoms are written out of order, will
correct the order.
|
Hoomd XML |
xml |
types, charges, radii, masses bonds, angles, dihedrals |
HOOMD XML topology file. Reads atom types,
masses, and charges if possible. Also reads bonds,
angles, and dihedrals.
|
GSD 1 |
gsd |
types, charges, radii, masses bonds, angles, dihedrals |
HOOMD GSD topology file. Reads atom types,
masses, and charges if possible. Also reads bonds,
angles, and dihedrals.
|
MMTF 1 |
mmtf |
altLocs, tempfactors, charges, masses, names, bonds, occupancies, types, icodes, resnames, resids, segids, models |
Macromolecular Transmission Format (MMTF). An efficient compact format for biomolecular structures. |
FHIAIMS 1 |
in |
names |
FHI-AIMS File Parser. Reads only the labels from
atoms and constructs minimal topology data.
|
- 1(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17)
This format can also be used to provide coordinates so that it is possible to create a full
Universe
by simply providing a file of this format as the sole argument toUniverse
:u = Universe(filename)
- 2
The Gromacs TPR format contains coordinate information but parsing coordinates from a TPR file is currently not implemented in
TPRParser
.
Note
Coordinates modules with the Table of supported coordinate formats
5.1.1. Developer Notes
New in version 0.8.
Changed in version 0.16.0: The new array-based topology system completely replaced the old system that was based on a list of Atom instances.
Changed in version 2.0.0: The ParmEdParser was moved to the converters
module
Topology information consists of data that do not change over time, i.e. information that is the same for all time steps of a trajectory. This includes
identity of atoms (name, type, number, partial charge, …) and to which residue and segment they belong; atoms are identified in MDAnalysis by their
index
, an integer number starting at 0 and incremented in the order of atoms found in the topology.bonds (pairs of atoms)
angles (triplets of atoms)
dihedral angles (quadruplets of atoms) — proper and improper dihedrals should be treated separately
Topology readers are generally called “parsers” in MDAnalysis (for
historical reasons and in order to distinguish them from coordinate
“readers”). All parsers are derived from
MDAnalysis.topology.base.TopologyReaderBase
and have a
parse()
method that
returns a MDAnalysis.core.topology.Topology
instance.
5.1.1.1. atoms
The atoms appear to the user as an array of
Atom
instances. However, under the
hood this is essentially only an array of atom indices that are used
to index the various components of the topology database
Topology
. The parser needs to
initialize the Topology
with the
data read from the topology file.
See also
5.1.1.2. bonds
Bonds are represented as a tuple
of tuple
. Each tuple
contains two atom numbers, which indicate the atoms between which the
bond is formed. Only one of the two permutations is stored, typically
the one with the lower atom number first.
5.1.1.3. bondorder
Some bonds have additional information called order. When available this is stored in a dictionary of format {bondtuple:order}. This extra information is then passed to Bond initialisation in u._init_bonds()
5.1.1.4. angles
Angles are represented by a list
of tuple
. Each
tuple contains three atom numbers. The second of these numbers
represents the apex of the angle.
5.1.1.5. dihedrals
Proper dihedral angles are represented by a list
of tuple
. Each
tuple contains four atom numbers. The angle of the torsion
is defined by the angle between the planes formed by atoms 1, 2, and 3,
and 2, 3, and 4.
5.1.1.6. impropers
Improper dihedral angles are represented by a list
of tuple
. Each
tuple contains four atom numbers. The angle of the improper torsion
is again defined by the angle between the planes formed by atoms 1, 2, and 3,
and 2, 3, and 4. Improper dihedrals differ from regular dihedrals as the
four atoms need not be sequentially bonded, and are instead often all bonded
to the second atom.