7.3. RDKit topology parser — MDAnalysis.converters.RDKitParser

Converts an RDKit rdkit.Chem.rdchem.Mol into a MDAnalysis.core.Topology.

7.3.1. Classes

class MDAnalysis.converters.RDKitParser.RDKitParser(filename)[source]

For RDKit structures

Creates the following Attributes:
  • Atomids

  • Atomnames

  • Aromaticities

  • Elements

  • Masses

  • Bonds

  • Resids

  • Resnums

  • RSChirality

  • Segids

Guesses the following:
  • Atomtypes

Depending on RDKit’s input, the following Attributes might be present:
  • Charges

  • Resnames

  • AltLocs

  • ChainIDs

  • ICodes

  • Occupancies

  • Tempfactors

Attributes table:

RDKit attribute

MDAnalysis equivalent

atom.GetMonomerInfo().GetAltLoc()

altLocs

atom.GetIsAromatic()

aromaticities

atom.GetMonomerInfo().GetChainId()

chainIDs

atom.GetDoubleProp(‘_GasteigerCharge’) atom.GetDoubleProp(‘_TriposPartialCharge’)

charges

atom.GetSymbol()

elements

atom.GetMonomerInfo().GetInsertionCode()

icodes

atom.GetIdx()

indices

atom.GetMass()

masses

atom.GetMonomerInfo().GetName() atom.GetProp(‘_TriposAtomName’)

names

atom.GetProp(‘_CIPCode’)

chiralities

atom.GetMonomerInfo().GetOccupancy()

occupancies

atom.GetMonomerInfo().GetResidueName()

resnames

atom.GetMonomerInfo().GetResidueNumber()

resnums

atom.GetMonomerInfo().GetTempFactor()

tempfactors

atom.GetProp(‘_TriposAtomType’)

types

Raises

ValueError – If only part of the atoms have MonomerInfo available

New in version 2.0.0.

Changed in version 2.1.0: Added R/S chirality support

close()

Close the trajectory file.

convert_forces_from_native(force, inplace=True)

Conversion of forces array force from native to base units

Parameters
  • force (array_like) – Forces to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input force is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

New in version 0.7.7.

convert_forces_to_native(force, inplace=True)

Conversion of force array force from base to native units.

Parameters
  • force (array_like) – Forces to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input force is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

New in version 0.7.7.

convert_pos_from_native(x, inplace=True)

Conversion of coordinate array x from native units to base units.

Parameters
  • x (array_like) – Positions to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input x is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

Changed in version 0.7.5: Keyword inplace can be set to False so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.

convert_pos_to_native(x, inplace=True)

Conversion of coordinate array x from base units to native units.

Parameters
  • x (array_like) – Positions to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input x is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

Changed in version 0.7.5: Keyword inplace can be set to False so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.

convert_time_from_native(t, inplace=True)

Convert time t from native units to base units.

Parameters
  • t (array_like) – Time values to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input t is modified in place and also returned (although note that scalar values t are passed by value in Python and hence an in-place modification has no effect on the caller.) In-place operations improve performance because allocating new arrays is avoided.

Changed in version 0.7.5: Keyword inplace can be set to False so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.

convert_time_to_native(t, inplace=True)

Convert time t from base units to native units.

Parameters
  • t (array_like) – Time values to transform

  • inplace (bool, optional) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input t is modified in place and also returned. (Also note that scalar values t are passed by value in Python and hence an in-place modification has no effect on the caller.)

Changed in version 0.7.5: Keyword inplace can be set to False so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.

convert_velocities_from_native(v, inplace=True)

Conversion of velocities array v from native to base units

Parameters
  • v (array_like) – Velocities to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input v is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

New in version 0.7.5.

convert_velocities_to_native(v, inplace=True)

Conversion of coordinate array v from base to native units

Parameters
  • v (array_like) – Velocities to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input v is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

New in version 0.7.5.

parse(**kwargs)[source]

Parse RDKit into Topology

Return type

MDAnalysis Topology object

units = {'length': None, 'time': None, 'velocity': None}

dict with units of of time and length (and velocity, force, … for formats that support it)

7.4. RDKit molecule I/O — MDAnalysis.converters.RDKit

Read coordinates data from an RDKit rdkit.Chem.rdchem.Mol with RDKitReader into an MDAnalysis Universe. Convert it back to a rdkit.Chem.rdchem.Mol with RDKitConverter.

Example

To read an RDKit molecule and then convert the AtomGroup back to an RDKit molecule:

>>> from rdkit import Chem
>>> import MDAnalysis as mda
>>> mol = Chem.MolFromMol2File("docking_poses.mol2", removeHs=False)
>>> u = mda.Universe(mol)
>>> u
<Universe with 42 atoms>
>>> u.trajectory
<RDKitReader with 10 frames of 42 atoms>
>>> u.atoms.convert_to("RDKIT")
<rdkit.Chem.rdchem.Mol object at 0x7fcebb958148>

Warning

The RDKit converter is currently experimental and may not work as expected for all molecules. Currently the converter accurately infers the structures of approximately 99% of the ChEMBL27 dataset. Work is currently ongoing on further improving this and updates to the converter are expected in future releases of MDAnalysis. Please see Issue #3339 and the RDKitConverter benchmark for more details.

7.4.1. Classes

class MDAnalysis.converters.RDKit.RDKitReader(filename, **kwargs)[source]

Coordinate reader for RDKit.

New in version 2.0.0.

Read coordinates from an RDKit molecule. Each conformer in the original RDKit molecule will be read as a frame in the resulting universe.

Parameters

filename (rdkit.Chem.rdchem.Mol) – RDKit molecule

units = {'length': 'Angstrom', 'time': None}

dict with units of of time and length (and velocity, force, … for formats that support it)

class MDAnalysis.converters.RDKit.RDKitConverter[source]

Convert MDAnalysis AtomGroup or Universe to RDKit Mol

MDanalysis attributes are stored in each RDKit Atom of the resulting molecule in two different ways:

  • in an AtomPDBResidueInfo object available through the GetMonomerInfo() method if it’s an attribute that is typically found in a PDB file,

  • directly as an atom property available through the GetProp() methods for the others.

Supported attributes:

MDAnalysis attribute

RDKit

altLocs

atom.GetMonomerInfo().GetAltLoc()

chainIDs

atom.GetMonomerInfo().GetChainId()

icodes

atom.GetMonomerInfo().GetInsertionCode()

names

atom.GetMonomerInfo().GetName() atom.GetProp(“_MDAnalysis_name”)

occupancies

atom.GetMonomerInfo().GetOccupancy()

resnames

atom.GetMonomerInfo().GetResidueName()

resids

atom.GetMonomerInfo().GetResidueNumber()

segindices

atom.GetMonomerInfo().GetSegmentNumber()

tempfactors

atom.GetMonomerInfo().GetTempFactor()

charges

atom.GetDoubleProp(“_MDAnalysis_charge”)

indices

atom.GetIntProp(“_MDAnalysis_index”)

segids

atom.GetProp(“_MDAnalysis_segid”)

types

atom.GetProp(“_MDAnalysis_type”)

Example

To access MDAnalysis properties:

>>> import MDAnalysis as mda
>>> from MDAnalysis.tests.datafiles import PDB_full
>>> u = mda.Universe(PDB_full)
>>> mol = u.select_atoms('resname DMS').convert_to('RDKIT')
>>> mol.GetAtomWithIdx(0).GetMonomerInfo().GetResidueName()
'DMS'

To create a molecule for each frame of a trajectory:

from MDAnalysisTests.datafiles import PSF, DCD
from rdkit.Chem.Descriptors3D import Asphericity

u = mda.Universe(PSF, DCD)
elements = mda.topology.guessers.guess_types(u.atoms.names)
u.add_TopologyAttr('elements', elements)
ag = u.select_atoms("resid 1-10")

for ts in u.trajectory:
    mol = ag.convert_to("RDKIT")
    x = Asphericity(mol)

Notes

The converter requires the Elements attribute to be present in the topology, else it will fail.

It also requires the bonds attribute, although they will be automatically guessed if not present.

Hydrogens should be explicit in the topology file. If this is not the case, use the parameter NoImplicit=False when using the converter to allow implicit hydrogens and disable inferring bond orders and charges.

Since one of the main use case of the converter is converting trajectories and not just a topology, creating a new molecule from scratch for every frame would be too slow so the converter uses a caching system. The cache only stores the 2 most recent AtomGroups that were converted, and is sensitive to the arguments that were passed to the converter. The number of objects cached can be changed with the function set_converter_cache_size(). However, ag.convert_to("RDKIT") followed by ag.convert_to("RDKIT", NoImplicit=False) will not use the cache since the arguments given are different. You can pass a cache=False argument to the converter to bypass the caching system.

The _MDAnalysis_index property of the resulting molecule corresponds to the index of the specific AtomGroup that was converted, which may not always match the index property.

To get a better understanding of how the converter works under the hood, please refer to the following RDKit UGM presentation:

There are some molecules containing specific patterns that the converter cannot currently tackle correctly. See Issue #3339 for more info.

New in version 2.0.0.

Changed in version 2.2.0: Improved the accuracy of the converter. Atoms in the resulting molecule now follow the same order as in the AtomGroup. The output of atom.GetMonomerInfo().GetName() now follows the guidelines for PDB files while the original name is still available through atom.GetProp("_MDAnalysis_name")

convert(obj, cache=True, NoImplicit=True, max_iter=200, force=False)[source]

Write selection at current trajectory frame to Mol.

Parameters
  • obj (AtomGroup or Universe) –

  • cache (bool) – Use a cached copy of the molecule’s topology when available. To be used, the cached molecule and the new one have to be made from the same AtomGroup selection and with the same arguments passed to the converter

  • NoImplicit (bool) – Prevent adding hydrogens to the molecule

  • max_iter (int) – Maximum number of iterations to standardize conjugated systems. See _rebuild_conjugated_bonds()

  • force (bool) – Force the conversion when no hydrogens were detected but NoImplicit=True. Useful for inorganic molecules mostly.

units = {'length': 'Angstrom', 'time': None}

dict with units of of time and length (and velocity, force, … for formats that support it)

MDAnalysis.converters.RDKit._infer_bo_and_charges(mol)[source]

Infer bond orders and formal charges from a molecule.

Since most MD topology files don’t explicitly retain information on bond orders or charges, it has to be guessed from the topology. This is done by looping over each atom and comparing its expected valence to the current valence to get the Number of Unpaired Electrons (NUE). If an atom has a negative NUE, it needs a positive formal charge (-NUE). If two neighbouring atoms have UEs, the bond between them most likely has to be increased by the value of the smallest NUE. If after this process, an atom still has UEs, it needs a negative formal charge of -NUE.

Parameters

mol (rdkit.Chem.rdchem.RWMol) – The molecule is modified inplace and must have all hydrogens added

Notes

This algorithm is order dependant. For example, for a carboxylate group R-C(-O)-O the first oxygen read will receive a double bond and the other one will be charged. It will also affect more complex conjugated systems.

MDAnalysis.converters.RDKit._standardize_patterns(mol, max_iter=200)[source]

Standardizes functional groups

Uses _rebuild_conjugated_bonds() to standardize conjugated systems, and SMARTS reactions for other functional groups. Due to the way reactions work, we first have to split the molecule by fragments. Then, for each fragment, we apply the standardization reactions. Finally, the fragments are recombined.

Parameters
  • mol (rdkit.Chem.rdchem.RWMol) – The molecule to standardize

  • max_iter (int) – Maximum number of iterations to standardize conjugated systems

Returns

mol – The standardized molecule

Return type

rdkit.Chem.rdchem.Mol

Notes

The following functional groups are transformed in this order:

Name

Reaction

conjugated

[*-:1]-[*:2]=[*:3]-[*-:4]>>[*+0:1]=[*:2]-[*:3]=[*+0:4]

conjugated N+

[N;X3;v3:1]-[*:2]=[*:3]-[*-:4]>>[N+:1]=[*:2]-[*:3]=[*+0:4]

conjugated O-

[O:1]=[#6+0,#7+:2]-[*:3]=[*:4]-[*-:5]>>[O-:1]-[*:2]=[*:3]-[*:4]=[*+0:5]

conjug. S=O

[O-:1]-[S;D4;v4:2]-[*:3]=[*:4]-[*-:5]>>[O+0:1]=[*:2]=[*:3]-[*:4]=[*+0:5]

Cterm

[C-;X2;H0:1]=[O:2]>>[C+0:1]=[O:2]

Nterm

[N-;X2;H1;$(N-[*^3]):1]>>[N+0:1]

keto-enolate

[#6-:1]-[#6:2]=[O:3]>>[#6+0:1]=[#6:2]-[O-:3]

arginine

[C-;v3:1]-[#7+0;v3;H2:2]>>[#6+0:1]=[#7+:2]

histidine

[#6+0;H0:1]=[#6+0:2]-[#7;X3:3]-[#6-;X3:4]>>[#6:1]=[#6:2]-[#7+:3]=[#6+0:4]

sulfone

[S;D4;!v6:1]-[*-:2]>>[S;v6:1]=[*+0:2]

charged N

[#7+0;X3:1]-[*-:2]>>[#7+:1]=[*+0:2]

MDAnalysis.converters.RDKit._rebuild_conjugated_bonds(mol, max_iter=200)[source]

Rebuild conjugated bonds without negatively charged atoms at the beginning and end of the conjugated system

Depending on the order in which atoms are read during the conversion, the _infer_bo_and_charges() function might write conjugated systems with a double bond less and both edges of the system as anions instead of the usual alternating single and double bonds. This function corrects this behaviour by using an iterative procedure. The problematic molecules always follow the same pattern: anion[-*=*]n-anion instead of *=[*-*=]n*, where n is the number of successive single and double bonds. The goal of the iterative procedure is to make n as small as possible by consecutively transforming anion-*=* into *=*-anion until it reaches the smallest pattern with n=1. This last pattern is then transformed from anion-*=*-anion to *=*-*=*. Since anion-*=* is the same as *=*-anion in terms of SMARTS, we can control that we don’t transform the same triplet of atoms back and forth by adding their indices to a list. This function also handles conjugated systems with triple bonds. The molecule needs to be kekulized first to also cover systems with aromatic rings.

Parameters
  • mol (rdkit.Chem.rdchem.RWMol) – The molecule to transform, modified inplace

  • max_iter (int) – Maximum number of iterations performed by the function

Notes

The molecule is modified inplace