5.1. Topology readers — MDAnalysis.topology

This submodule contains the topology readers. A topology file supplies the list of atoms in the system, their connectivity and possibly additional information such as B-factors, partial charges, etc. The details depend on the file format and not every topology file provides all (or even any) additional data. This data is made accessible through AtomGroup properties.

As a minimum, all topology parsers will provide atom ids, atom types, masses, resids, resnums and segids as well as assigning all atoms to residues and all residues to segments. For systems without residues and segments, this results in there being a single residue and segment to which all atoms belong. Often when data is not provided by a file, it will be guessed based on other data in the file. In the event that this happens, a UserWarning will always be issued.

The following table lists the currently supported topology formats along with the attributes they provide.

Table of Supported Topology Formats

Name

extension

attributes

remarks

CHARMM/XPLOR PSF

psf

resnames, names, types, charges, bonds, angles, dihedrals, impropers

MDAnalysis.topology.PSFParser

CHARMM CARD 1

crd

names, tempfactors, resnames,

“CARD” coordinate output from CHARMM; deals with either standard or EXTended format; MDAnalysis.topology.CRDParser

Brookhaven 1

pdb/ent

names, bonds, resids, resnums, types, chainids, occupancies, bfactors, resids, icodes, resnames, segids,

a simplified PDB format (as used in MD simulations) is read by default

XPDB 1

pdb

As PDB except icodes

Extended PDB format (can use 5-digit residue numbers). To use, specify the format “XPBD” explicitly: Universe(..., topology_format="XPDB"). Module MDAnalysis.coordinates.PDB

PQR 1

pqr

names, charges, types, radii, resids, resnames, icodes, segids

PDB-like but whitespace-separated files with charge and radius information; MDAnalysis.topology.PQRParser

PDBQT 1

pdbqt

names, types, altLocs, charges, resnames, resids, icodes, occupancies, tempfactors, segids,

file format used by AutoDock with atom types and partial charges. Module: MDAnalysis.topology.PDBQTParser

GROMOS96 1

gro

names, resids, resnames,

GROMOS96 coordinate file; MDAnalysis.topology.GROParser

AMBER

top, prmtop, parm7

names, charges type_indices, types, resnames,

simple AMBER format reader (only supports a subset of flags); MDAnalysis.topology.TOPParser

DESRES 1

dms

names, numbers, masses, charges, chainids, resids, resnames, segids, radii,

DESRES molecular sturcture reader (only supports the atom and bond records); MDAnalysis.topology.DMSParser

TPR 2

tpr

names, types, resids, resnames, charges, bonds, masses, moltypes, molnums

Gromacs portable run input reader (limited experimental support for some of the more recent versions of the file format); MDAnalysis.topology.TPRParser

MOL2 1

mol2

ids, names, types, resids, charges, bonds, resnames,

Tripos MOL2 molecular structure format; MDAnalysis.topology.MOL2Parser

LAMMPS 1

data

ids, types, masses, charges, resids, bonds, angles, dihedrals

LAMMPS Data file parser MDAnalysis.topology.LAMMPSParser

LAMMPS 1

lammpsdump

id, masses

LAMMPS ascii dump file reader MDAnalysis.topology.LAMMPSParser

XYZ 1

xyz

names

XYZ File Parser. Reads only the labels from atoms and constructs minimal topology data. MDAnalysis.topology.XYZParser

TXYZ 1

txyz, arc

names, atomids, masses, types, bonds

Tinker XYZ File Parser. Reads atom labels, numbers and connectivity; masses are guessed from atoms names. MDAnalysis.topology.TXYZParser

GAMESS 1

gms, log

names, atomic charges,

GAMESS output parser. Read only atoms of assembly section (atom, elems and coords) and construct topology. MDAnalysis.topology.GMSParser

DL_Poly 1

config, history

ids, names

DL_Poly CONFIG or HISTORY file. Reads only the atom names. If atoms are written out of order, will correct the order. MDAnalysis.topology.DLPolyParser

Hoomd XML

xml

types, charges, radii, masses bonds, angles, dihedrals

HOOMD XML topology file. Reads atom types, masses, and charges if possible. Also reads bonds, angles, and dihedrals. MDAnalysis.topology.HoomdXMLParser

GSD 1

gsd

types, charges, radii, masses bonds, angles, dihedrals

GSD topology file. Reads atom types, masses, and charges if possible. Also reads bonds, angles, and dihedrals. MDAnalysis.topology.GSDParser

Macromolecular

mmtf

altLocs,

Macromolecular Transmission Format (MMTF).

transmission

bfactors, bonds,

An efficient compact format for biomolecular

format

charges, masses, names, occupancies, types, icodes, resnames, resids, segids, models

structures.

1(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)

This format can also be used to provide coordinates so that it is possible to create a full Universe by simply providing a file of this format as the sole argument to Universe: u = Universe(filename)

2

The Gromacs TPR format contains coordinate information but parsing coordinates from a TPR file is currently not implemented in TPRParser.

5.1.1. Developer Notes

New in version 0.8.

Changed in version 0.16.0: The new array-based topology system completely replaced the old system that was based on a list of Atom instances.

Topology information consists of data that do not change over time, i.e. information that is the same for all time steps of a trajectory. This includes

  • identity of atoms (name, type, number, partial charge, …) and to which residue and segment they belong; atoms are identified in MDAnalysis by their index, an integer number starting at 0 and incremented in the order of atoms found in the topology.

  • bonds (pairs of atoms)

  • angles (triplets of atoms)

  • dihedral angles (quadruplets of atoms) — proper and improper dihedrals should be treated separately

Topology readers are generally called “parsers” in MDAnalysis (for historical reasons and in order to distinguish them from coordinate “readers”). All parsers are derived from MDAnalysis.topology.base.TopologyReaderBase and have a parse() method that returns a MDAnalysis.core.topology.Topology instance.

5.1.1.1. atoms

The atoms appear to the user as an array of Atom instances. However, under the hood this is essentially only an array of atom indices that are used to index the various components of the topology database Topology. The parser needs to initialize the Topology with the data read from the topology file.

See also

Topology system

5.1.1.2. bonds

Bonds are represented as a tuple of tuple. Each tuple contains two atom numbers, which indicate the atoms between which the bond is formed. Only one of the two permutations is stored, typically the one with the lower atom number first.

5.1.1.3. bondorder

Some bonds have additional information called order. When available this is stored in a dictionary of format {bondtuple:order}. This extra information is then passed to Bond initialisation in u._init_bonds()

5.1.1.4. angles

Angles are represented by a list of tuple. Each tuple contains three atom numbers. The second of these numbers represents the apex of the angle.

5.1.1.5. dihedrals

Proper dihedral angles are represented by a list of tuple. Each tuple contains four atom numbers. The angle of the torsion is defined by the angle between the planes formed by atoms 1, 2, and 3, and 2, 3, and 4.

5.1.1.6. impropers

Improper dihedral angles are represented by a list of tuple. Each tuple contains four atom numbers. The angle of the improper torsion is again defined by the angle between the planes formed by atoms 1, 2, and 3, and 2, 3, and 4. Improper dihedrals differ from regular dihedrals as the four atoms need not be sequentially bonded, and are instead often all bonded to the second atom.