6.25. Reading trajectories from memory — MDAnalysis.coordinates.memory

Author

Wouter Boomsma

Year

2016

Copyright

GNU Public License v2

Maintainer

Wouter Boomsma <wb@di.ku.dk>, wouterboomsma on github

New in version 0.16.0.

The module contains a trajectory reader that operates on an array in memory, rather than reading from file. This makes it possible to operate on raw coordinates using existing MDAnalysis tools. In addition, it allows the user to make changes to the coordinates in a trajectory (e.g. through MDAnalysis.core.groups.AtomGroup.positions) without having to write the entire state to file.

6.25.1. How to use the MemoryReader

The MemoryReader can be used to either directly generate a trajectory as a numpy array or by transferring an existing trajectory to memory.

6.25.1.1. In-memory representation of arbitrary trajectories

If sufficient memory is available to hold a whole trajectory in memory then analysis can be sped up substantially by transferring the trajectory to memory.

The most straightforward use of the MemoryReader is to simply use the in_memory=True flag for the Universe class, which automatically transfers a trajectory to memory:

import MDAnalysis as mda
from MDAnalysisTests.datafiles import TPR, XTC

universe = mda.Universe(TPR, XTC, in_memory=True)

Of course, sufficient memory has to be available to hold the whole trajectory.

6.25.1.2. Switching a trajectory to an in-memory representation

The decision to transfer the trajectory to memory can be made at any time with the transfer_to_memory() method of a Universe:

universe = mda.Universe(TPR, XTC)
universe.transfer_to_memory()

This operation may take a while (with verbose=True a progress bar is displayed) but then subsequent operations on the trajectory directly operate on the in-memory array and will be very fast.

6.25.1.3. Constructing a Reader from a numpy array

The MemoryReader provides great flexibility because it becomes possible to create a Universe directly from a numpy array.

A simple example consists of a new universe created from the array extracted from a DCD timeseries():

import MDAnalysis as mda
from MDAnalysisTests.datafiles import DCD, PSF
from MDAnalysis.coordinates.memory import MemoryReader

universe = mda.Universe(PSF, DCD)

coordinates = universe.trajectory.timeseries(universe.atoms)
universe2 = mda.Universe(PSF, coordinates, format=MemoryReader, order='afc')

Creating an in-memory trajectory with AnalysisFromFunction()

The timeseries() is currently only implemented for the DCDReader. However, the MDAnalysis.analysis.base.AnalysisFromFunction() can provide the same functionality for any supported trajectory format:

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB, XTC

from MDAnalysis.coordinates.memory import MemoryReader
from MDAnalysis.analysis.base import AnalysisFromFunction

u = mda.Universe(PDB, XTC)

coordinates = AnalysisFromFunction(lambda ag: ag.positions.copy(),
                                   u.atoms).run().results['timeseries']
u2 = mda.Universe(PDB, coordinates, format=MemoryReader)

6.25.1.4. Creating an in-memory trajectory of a sub-system

Creating a trajectory for just a selection of an existing trajectory requires the transfer of the appropriate coordinates as well as creation of a topology of the sub-system. For the latter one can use the Merge() function, for the former the load_new() method of a Universe together with the MemoryReader. In the following, an in-memory trajectory of only the protein is created:

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB, XTC

from MDAnalysis.coordinates.memory import MemoryReader
from MDAnalysis.analysis.base import AnalysisFromFunction

u = mda.Universe(PDB, XTC)
protein = u.select_atoms("protein")

coordinates = AnalysisFromFunction(lambda ag: ag.positions.copy(),
                                   protein).run().results['timeseries']
u2 = mda.Merge(protein)            # create the protein-only Universe
u2.load_new(coordinates, format=MemoryReader)

The protein coordinates are extracted into coordinates and then the in-memory trajectory is loaded from these coordinates. In principle, this could have all be done in one line:

u2 = mda.Merge(protein).load_new(
         AnalysisFromFunction(lambda ag: ag.positions.copy(),
                              protein).run().results['timeseries'],
         format=MemoryReader)

The new Universe u2 can be used to, for instance, write out a new trajectory or perform fast analysis on the sub-system.

6.25.2. Classes

class MDAnalysis.coordinates.memory.Timestep(n_atoms, **kwargs)[source]

Timestep data for one frame

Methods

ts = Timestep(n_atoms)

create a timestep object with space for n_atoms

Changed in version 0.11.0: Added from_timestep() and from_coordinates() constructor methods. Timestep init now only accepts integer creation. n_atoms now a read only property. frame now 0-based instead of 1-based. Attributes status and step removed.

Changed in version 2.0.0: Timestep now can be (un)pickled. Weakref for Reader will be dropped. Timestep now stores in to numpy array memory in ‘C’ order rather than ‘F’ (Fortran).

Create a Timestep, representing a frame of a trajectory

Parameters
  • n_atoms (int) – The total number of atoms this Timestep describes

  • positions (bool, optional) – Whether this Timestep has position information [True]

  • velocities (bool (optional)) – Whether this Timestep has velocity information [False]

  • forces (bool (optional)) – Whether this Timestep has force information [False]

  • reader (Reader (optional)) – A weak reference to the owning Reader. Used for when attributes require trajectory manipulation (e.g. dt)

  • dt (float (optional)) – The time difference between frames (ps). If time is set, then dt will be ignored.

  • time_offset (float (optional)) – The starting time from which to calculate time (in ps)

Changed in version 0.11.0: Added keywords for positions, velocities and forces. Can add and remove position/velocity/force information by using the has_* attribute.

copy()[source]

Make an independent (“deep”) copy of the whole Timestep.

copy_slice(sel)[source]

Make a new Timestep containing a subset of the original Timestep.

Parameters

sel (array_like or slice) – The underlying position, velocity, and force arrays are sliced using a list, slice, or any array-like.

Returns

A Timestep object of the same type containing all header information and all atom information relevant to the selection.

Return type

Timestep

Note

The selection must be a 0 based slice or array of the atom indices in this Timestep

Example

Using a Python slice object:

new_ts = ts.copy_slice(slice(start, stop, step))

Using a list of indices:

new_ts = ts.copy_slice([0, 2, 10, 20, 23])

New in version 0.8.

Changed in version 0.11.0: Reworked to follow new Timestep API. Now will strictly only copy official attributes of the Timestep.

property dimensions

View of unitcell dimensions (A, B, C, alpha, beta, gamma)

lengths a, b, c are in the MDAnalysis length unit (Å), and angles are in degrees.

property dt

The time difference in ps between timesteps

Note

This defaults to 1.0 ps in the absence of time data

New in version 0.11.0.

property forces

A record of the forces of all atoms in this Timestep

Setting this attribute will add forces to the Timestep if they weren’t originally present.

Returns

forces – force data of shape (n_atoms, 3) for all atoms

Return type

numpy.ndarray with dtype numpy.float32

Raises

MDAnalysis.exceptions.NoDataError – if the Timestep has no force data

New in version 0.11.0.

classmethod from_coordinates(positions=None, velocities=None, forces=None, **kwargs)[source]

Create an instance of this Timestep, from coordinate data

Can pass position, velocity and force data to form a Timestep.

New in version 0.11.0.

classmethod from_timestep(other, **kwargs)[source]

Create a copy of another Timestep, in the format of this Timestep

New in version 0.11.0.

property has_forces

A boolean of whether this Timestep has force data

This can be changed to True or False to allocate space for or remove the data.

New in version 0.11.0.

property has_positions

A boolean of whether this Timestep has position data

This can be changed to True or False to allocate space for or remove the data.

New in version 0.11.0.

property has_velocities

A boolean of whether this Timestep has velocity data

This can be changed to True or False to allocate space for or remove the data.

New in version 0.11.0.

property n_atoms

A read only view of the number of atoms this Timestep has

Changed in version 0.11.0: Changed to read only property

property positions

A record of the positions of all atoms in this Timestep

Setting this attribute will add positions to the Timestep if they weren’t originally present.

Returns

positions – position data of shape (n_atoms, 3) for all atoms

Return type

numpy.ndarray with dtype numpy.float32

Raises

MDAnalysis.exceptions.NoDataError – if the Timestep has no position data

Changed in version 0.11.0: Now can raise NoDataError when no position data present

property time

The time in ps of this timestep

This is calculated as:

time = ts.data['time_offset'] + ts.time

Or, if the trajectory doesn’t provide time information:

time = ts.data['time_offset'] + ts.frame * ts.dt

New in version 0.11.0.

property triclinic_dimensions

The unitcell dimensions represented as triclinic vectors

Returns

A (3, 3) numpy.ndarray of unit cell vectors

Return type

numpy.ndarray

Examples

The unitcell for a given system can be queried as either three vectors lengths followed by their respective angle, or as three triclinic vectors.

>>> ts.dimensions
array([ 13.,  14.,  15.,  90.,  90.,  90.], dtype=float32)
>>> ts.triclinic_dimensions
array([[ 13.,   0.,   0.],
       [  0.,  14.,   0.],
       [  0.,   0.,  15.]], dtype=float32)

Setting the attribute also works:

>>> ts.triclinic_dimensions = [[15, 0, 0], [5, 15, 0], [5, 5, 15]]
>>> ts.dimensions
array([ 15.        ,  15.81138802,  16.58312416,  67.58049774,
        72.45159912,  71.56504822], dtype=float32)

New in version 0.11.0.

property velocities

A record of the velocities of all atoms in this Timestep

Setting this attribute will add velocities to the Timestep if they weren’t originally present.

Returns

velocities – velocity data of shape (n_atoms, 3) for all atoms

Return type

numpy.ndarray with dtype numpy.float32

Raises

MDAnalysis.exceptions.NoDataError – if the Timestep has no velocity data

New in version 0.11.0.

property volume

volume of the unitcell

class MDAnalysis.coordinates.memory.MemoryReader(coordinate_array, order='fac', dimensions=None, dt=1, filename=None, velocities=None, forces=None, **kwargs)[source]

MemoryReader works with trajectories represented as numpy arrays.

A trajectory reader interface to a numpy array of the coordinates. For compatibility with the timeseries interface, support is provided for specifying the order of columns through the order keyword.

New in version 0.16.0.

Changed in version 1.0.0: Support for the deprecated format keyword for MemoryReader.timeseries() has now been removed.

Parameters
  • coordinate_array (numpy.ndarray) – The underlying array of coordinates. The MemoryReader now necessarily requires a np.ndarray

  • order ({"afc", "acf", "caf", "fac", "fca", "cfa"} (optional)) – the order/shape of the return data array, corresponding to (a)tom, (f)rame, (c)oordinates all six combinations of ‘a’, ‘f’, ‘c’ are allowed ie “fac” - return array where the shape is (frame, number of atoms, coordinates).

  • dimensions ([A, B, C, alpha, beta, gamma] (optional)) – unitcell dimensions (A, B, C, alpha, beta, gamma) lengths A, B, C are in the MDAnalysis length unit (Å), and angles are in degrees. An array of dimensions can be given, which must then be shape (nframes, 6)

  • dt (float (optional)) – The time difference between frames (ps). If time is set, then dt will be ignored.

  • filename (string (optional)) – The name of the file from which this instance is created. Set to None when created from an array

  • velocities (numpy.ndarray (optional)) – Atom velocities. Must match shape of coordinate_array. Will share order with coordinates.

  • forces (numpy.ndarray (optional)) – Atom forces. Must match shape of coordinate_array Will share order with coordinates

Raises

TypeError if the coordinate array passed is not a np.ndarray

Note

At the moment, only a fixed dimension is supported, i.e., the same unit cell for all frames in coordinate_array. See issue #1041.

Changed in version 0.19.0: The input to the MemoryReader now must be a np.ndarray Added optional velocities and forces

OtherWriter(filename, **kwargs)

Returns a writer appropriate for filename.

Sets the default keywords start, step and dt (if available). n_atoms is always set from Reader.n_atoms.

See also

Reader.Writer()

Writer(filename, **kwargs)

A trajectory writer with the same properties as this trajectory.

add_auxiliary(auxname, auxdata, format=None, **kwargs)

Add auxiliary data to be read alongside trajectory.

Auxiliary data may be any data timeseries from the trajectory additional to that read in by the trajectory reader. auxdata can be an AuxReader instance, or the data itself as e.g. a filename; in the latter case an appropriate AuxReader is guessed from the data/file format. An appropriate format may also be directly provided as a key word argument.

On adding, the AuxReader is initially matched to the current timestep of the trajectory, and will be updated when the trajectory timestep changes (through a call to next() or jumping timesteps with trajectory[i]).

The representative value(s) of the auxiliary data for each timestep (as calculated by the AuxReader) are stored in the current timestep in the ts.aux namespace under auxname; e.g. to add additional pull force data stored in pull-force.xvg:

u = MDAnalysis.Universe(PDB, XTC)
u.trajectory.add_auxiliary('pull', 'pull-force.xvg')

The representative value for the current timestep may then be accessed as u.trajectory.ts.aux.pull or u.trajectory.ts.aux['pull'].

Note

Auxiliary data is assumed to be time-ordered, with no duplicates. See the Auxiliary API.

add_transformations(*transformations)[source]

Add all transformations to be applied to the trajectory.

This function take as list of transformations as an argument. These transformations are functions that will be called by the Reader and given a Timestep object as argument, which will be transformed and returned to the Reader. The transformations can be part of the transformations module, or created by the user, and are stored as a list transformations. This list can only be modified once, and further calls of this function will raise an exception.

u = MDAnalysis.Universe(topology, coordinates)
workflow = [some_transform, another_transform, this_transform]
u.trajectory.add_transformations(*workflow)
Parameters

transform_list (list) – list of all the transformations that will be applied to the coordinates

property aux_list

Lists the names of added auxiliary data.

check_slice_indices(start, stop, step)

Check frame indices are valid and clip to fit trajectory.

The usage follows standard Python conventions for range() but see the warning below.

Parameters
  • start (int or None) – Starting frame index (inclusive). None corresponds to the default of 0, i.e., the initial frame.

  • stop (int or None) – Last frame index (exclusive). None corresponds to the default of n_frames, i.e., it includes the last frame of the trajectory.

  • step (int or None) – step size of the slice, None corresponds to the default of 1, i.e, include every frame in the range start, stop.

Returns

start, stop, step – Integers representing the slice

Return type

tuple (int, int, int)

Warning

The returned values start, stop and step give the expected result when passed in range() but gives unexpected behavior when passed in a slice when stop=None and step=-1

This can be a problem for downstream processing of the output from this method. For example, slicing of trajectories is implemented by passing the values returned by check_slice_indices() to range()

range(start, stop, step)

and using them as the indices to randomly seek to. On the other hand, in MDAnalysis.analysis.base.AnalysisBase the values returned by check_slice_indices() are used to splice the trajectory by creating a slice instance

slice(start, stop, step)

This creates a discrepancy because these two lines are not equivalent:

range(10, -1, -1)             # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
range(10)[slice(10, -1, -1)]  # []
close()

Close the trajectory file.

convert_forces_from_native(force, inplace=True)

Conversion of forces array force from native to base units

Parameters
  • force (array_like) – Forces to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input force is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

New in version 0.7.7.

convert_forces_to_native(force, inplace=True)

Conversion of force array force from base to native units.

Parameters
  • force (array_like) – Forces to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input force is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

New in version 0.7.7.

convert_pos_from_native(x, inplace=True)

Conversion of coordinate array x from native units to base units.

Parameters
  • x (array_like) – Positions to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input x is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

Changed in version 0.7.5: Keyword inplace can be set to False so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.

convert_pos_to_native(x, inplace=True)

Conversion of coordinate array x from base units to native units.

Parameters
  • x (array_like) – Positions to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input x is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

Changed in version 0.7.5: Keyword inplace can be set to False so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.

convert_time_from_native(t, inplace=True)

Convert time t from native units to base units.

Parameters
  • t (array_like) – Time values to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input t is modified in place and also returned (although note that scalar values t are passed by value in Python and hence an in-place modification has no effect on the caller.) In-place operations improve performance because allocating new arrays is avoided.

Changed in version 0.7.5: Keyword inplace can be set to False so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.

convert_time_to_native(t, inplace=True)

Convert time t from base units to native units.

Parameters
  • t (array_like) – Time values to transform

  • inplace (bool, optional) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input t is modified in place and also returned. (Also note that scalar values t are passed by value in Python and hence an in-place modification has no effect on the caller.)

Changed in version 0.7.5: Keyword inplace can be set to False so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.

convert_velocities_from_native(v, inplace=True)

Conversion of velocities array v from native to base units

Parameters
  • v (array_like) – Velocities to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input v is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

New in version 0.7.5.

convert_velocities_to_native(v, inplace=True)

Conversion of coordinate array v from base to native units

Parameters
  • v (array_like) – Velocities to transform

  • inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data

Note

By default, the input v is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.

New in version 0.7.5.

copy()[source]

Return a copy of this Memory Reader

property dt

Time between two trajectory frames in picoseconds.

property frame

Frame number of the current time step.

This is a simple short cut to Timestep.frame.

get_array()[source]

Return underlying array.

get_aux_attribute(auxname, attrname)

Get the value of attrname from the auxiliary auxname

Parameters
  • auxname (str) – Name of the auxiliary to get value for

  • attrname (str) – Name of gettable attribute in the auxiliary reader

get_aux_descriptions(auxnames=None)

Get descriptions to allow reloading the specified auxiliaries.

If no auxnames are provided, defaults to the full list of added auxiliaries.

Passing the resultant description to add_auxiliary() will allow recreation of the auxiliary. e.g., to duplicate all auxiliaries into a second trajectory:

descriptions = trajectory_1.get_aux_descriptions()
for aux in descriptions:
    trajectory_2.add_auxiliary(**aux)
Returns

List of dictionaries of the args/kwargs describing each auxiliary.

Return type

list

iter_as_aux(auxname)

Iterate through timesteps for which there is at least one assigned step from the auxiliary auxname within the cutoff specified in auxname.

iter_auxiliary(auxname, start=None, stop=None, step=None, selected=None)

Iterate through the auxiliary auxname independently of the trajectory.

Will iterate over the specified steps of the auxiliary (defaults to all steps). Allows to access all values in an auxiliary, including those out of the time range of the trajectory, without having to also iterate through the trajectory.

After interation, the auxiliary will be repositioned at the current step.

Parameters
  • auxname (str) – Name of the auxiliary to iterate over.

  • (start (optional) – Options for iterating over a slice of the auxiliary.

  • stop (optional) – Options for iterating over a slice of the auxiliary.

  • step) (optional) – Options for iterating over a slice of the auxiliary.

  • selected (lst | ndarray, optional) – List of steps to iterate over.

Yields

AuxStep object

See also

iter_as_aux()

next()

Forward one step to next frame.

next_as_aux(auxname)

Move to the next timestep for which there is at least one step from the auxiliary auxname within the cutoff specified in auxname.

This allows progression through the trajectory without encountering NaN representative values (unless these are specifically part of the auxiliary data).

If the auxiliary cutoff is not set, where auxiliary steps are less frequent (auxiliary.dt > trajectory.dt), this allows progression at the auxiliary pace (rounded to nearest timestep); while if the auxiliary steps are more frequent, this will work the same as calling next().

See the Auxiliary API.

See also

iter_as_aux()

static parse_n_atoms(filename, order='fac', **kwargs)[source]

Deduce number of atoms in a given array of coordinates

Parameters
  • filename (numpy.ndarray) – data which will be used later in MemoryReader

  • order ({"afc", "acf", "caf", "fac", "fca", "cfa"} (optional)) – the order/shape of the return data array, corresponding to (a)tom, (f)rame, (c)oordinates all six combinations of ‘a’, ‘f’, ‘c’ are allowed ie “fac” - return array where the shape is (frame, number of atoms, coordinates).

Returns

n_atoms – number of atoms in system

Return type

int

remove_auxiliary(auxname)

Clear data and close the AuxReader for the auxiliary auxname.

See also

add_auxiliary()

rename_aux(auxname, new)

Change the name of the auxiliary auxname to new.

Provided there is not already an auxiliary named new, the auxiliary name will be changed in ts.aux namespace, the trajectory’s list of added auxiliaries, and in the auxiliary reader itself.

Parameters
  • auxname (str) – Name of the auxiliary to rename

  • new (str) – New name to try set

Raises

ValueError – If the name new is already in use by an existing auxiliary.

rewind()

Position at beginning of trajectory

set_array(coordinate_array, order='fac')[source]

Set underlying array in desired column order.

Parameters
  • coordinate_array (ndarray object) – The underlying array of coordinates

  • order ({"afc", "acf", "caf", "fac", "fca", "cfa"} (optional)) – the order/shape of the return data array, corresponding to (a)tom, (f)rame, (c)oordinates all six combinations of ‘a’, ‘f’, ‘c’ are allowed ie “fac” - return array where the shape is (frame, number of atoms, coordinates).

set_aux_attribute(auxname, attrname, new)

Set the value of attrname in the auxiliary auxname.

Parameters
  • auxname (str) – Name of the auxiliary to alter

  • attrname (str) – Name of settable attribute in the auxiliary reader

  • new – New value to try set attrname to

property time

Time of the current frame in MDAnalysis time units (typically ps).

This is either read straight from the Timestep, or calculated as time = Timestep.frame * Timestep.dt

timeseries(asel=None, start=0, stop=- 1, step=1, order='afc')[source]

Return a subset of coordinate data for an AtomGroup in desired column order. If no selection is given, it will return a view of the underlying array, while a copy is returned otherwise.

Parameters
  • asel (AtomGroup (optional)) – Atom selection. Defaults to None, in which case the full set of coordinate data is returned. Note that in this case, a view of the underlying numpy array is returned, while a copy of the data is returned whenever asel is different from None.

  • start (int (optional)) –

  • stop (int (optional)) –

  • step (int (optional)) – range of trajectory to access, start and stop are inclusive

  • order ({"afc", "acf", "caf", "fac", "fca", "cfa"} (optional)) – the order/shape of the return data array, corresponding to (a)tom, (f)rame, (c)oordinates all six combinations of ‘a’, ‘f’, ‘c’ are allowed ie “fac” - return array where the shape is (frame, number of atoms, coordinates).

Changed in version 1.0.0: Deprecated format keyword has been removed. Use order instead.

property totaltime

Total length of the trajectory

The time is calculated as (n_frames - 1) * dt, i.e., we assume that the first frame no time as elapsed. Thus, a trajectory with two frames will be considered to have a length of a single time step dt and a “trajectory” with a single frame will be reported as length 0.

property transformations

Returns the list of transformations