6.26. Reading trajectories from memory — MDAnalysis.coordinates.memory
- Author
Wouter Boomsma
- Year
2016
- Copyright
GNU Public License v2
- Maintainer
Wouter Boomsma <wb@di.ku.dk>, wouterboomsma on github
New in version 0.16.0.
The module contains a trajectory reader that operates on an array in
memory, rather than reading from file. This makes it possible to
operate on raw coordinates using existing MDAnalysis tools. In
addition, it allows the user to make changes to the coordinates in a
trajectory (e.g. through
MDAnalysis.core.groups.AtomGroup.positions
) without having
to write the entire state to file.
6.26.1. How to use the MemoryReader
The MemoryReader
can be used to either directly generate a
trajectory as a numpy array or by transferring an existing trajectory
to memory.
6.26.1.1. In-memory representation of arbitrary trajectories
If sufficient memory is available to hold a whole trajectory in memory then analysis can be sped up substantially by transferring the trajectory to memory.
The most straightforward use of the MemoryReader
is to simply
use the in_memory=True
flag for the
Universe
class, which
automatically transfers a trajectory to memory:
import MDAnalysis as mda
from MDAnalysisTests.datafiles import TPR, XTC
universe = mda.Universe(TPR, XTC, in_memory=True)
Of course, sufficient memory has to be available to hold the whole trajectory.
6.26.1.2. Switching a trajectory to an in-memory representation
The decision to transfer the trajectory to memory can be made at any
time with the
transfer_to_memory()
method
of a Universe
:
universe = mda.Universe(TPR, XTC)
universe.transfer_to_memory()
This operation may take a while (with verbose=True a progress bar is displayed) but then subsequent operations on the trajectory directly operate on the in-memory array and will be very fast.
6.26.1.3. Constructing a Reader from a numpy array
The MemoryReader
provides great flexibility because it
becomes possible to create a Universe
directly
from a numpy array.
A simple example consists of a new universe created from the array
extracted from a DCD
timeseries()
:
import MDAnalysis as mda
from MDAnalysisTests.datafiles import DCD, PSF
from MDAnalysis.coordinates.memory import MemoryReader
universe = mda.Universe(PSF, DCD)
coordinates = universe.trajectory.timeseries(universe.atoms)
universe2 = mda.Universe(PSF, coordinates, format=MemoryReader, order='afc')
Creating an in-memory trajectory with
AnalysisFromFunction()
The timeseries()
is
currently only implemented for the
DCDReader
. However, the
MDAnalysis.analysis.base.AnalysisFromFunction()
can provide the
same functionality for any supported trajectory format:
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB, XTC
from MDAnalysis.coordinates.memory import MemoryReader
from MDAnalysis.analysis.base import AnalysisFromFunction
u = mda.Universe(PDB, XTC)
coordinates = AnalysisFromFunction(lambda ag: ag.positions.copy(),
u.atoms).run().results['timeseries']
u2 = mda.Universe(PDB, coordinates, format=MemoryReader)
6.26.1.4. Creating an in-memory trajectory of a sub-system
Creating a trajectory for just a selection of an existing trajectory
requires the transfer of the appropriate coordinates as well as
creation of a topology of the sub-system. For the latter one can use
the Merge()
function, for the former
the load_new()
method of a
Universe
together with the
MemoryReader
. In the following, an in-memory trajectory of
only the protein is created:
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB, XTC
from MDAnalysis.coordinates.memory import MemoryReader
from MDAnalysis.analysis.base import AnalysisFromFunction
u = mda.Universe(PDB, XTC)
protein = u.select_atoms("protein")
coordinates = AnalysisFromFunction(lambda ag: ag.positions.copy(),
protein).run().results['timeseries']
u2 = mda.Merge(protein) # create the protein-only Universe
u2.load_new(coordinates, format=MemoryReader)
The protein coordinates are extracted into coordinates
and then
the in-memory trajectory is loaded from these coordinates. In
principle, this could have all be done in one line:
u2 = mda.Merge(protein).load_new(
AnalysisFromFunction(lambda ag: ag.positions.copy(),
protein).run().results['timeseries'],
format=MemoryReader)
The new Universe
u2
can be used
to, for instance, write out a new trajectory or perform fast analysis
on the sub-system.
6.26.2. Classes
- class MDAnalysis.coordinates.memory.MemoryReader(coordinate_array, order='fac', dimensions=None, dt=1, filename=None, velocities=None, forces=None, **kwargs)[source]
MemoryReader works with trajectories represented as numpy arrays.
A trajectory reader interface to a numpy array of the coordinates. For compatibility with the timeseries interface, support is provided for specifying the order of columns through the order keyword.
New in version 0.16.0.
Changed in version 1.0.0: Support for the deprecated format keyword for
MemoryReader.timeseries()
has now been removed.- Parameters
coordinate_array (numpy.ndarray) – The underlying array of coordinates. The MemoryReader now necessarily requires a np.ndarray
order ({"afc", "acf", "caf", "fac", "fca", "cfa"} (optional)) – the order/shape of the return data array, corresponding to (a)tom, (f)rame, (c)oordinates all six combinations of ‘a’, ‘f’, ‘c’ are allowed ie “fac” - return array where the shape is (frame, number of atoms, coordinates).
dimensions ([A, B, C, alpha, beta, gamma] (optional)) – unitcell dimensions (A, B, C, alpha, beta, gamma) lengths A, B, C are in the MDAnalysis length unit (Å), and angles are in degrees. An array of dimensions can be given, which must then be shape (nframes, 6)
dt (float (optional)) – The time difference between frames (ps). If
time
is set, then dt will be ignored.filename (string (optional)) – The name of the file from which this instance is created. Set to
None
when created from an arrayvelocities (numpy.ndarray (optional)) – Atom velocities. Must match shape of coordinate_array. Will share order with coordinates.
forces (numpy.ndarray (optional)) – Atom forces. Must match shape of coordinate_array Will share order with coordinates
- Raises
TypeError if the coordinate array passed is not a np.ndarray –
Note
At the moment, only a fixed dimension is supported, i.e., the same unit cell for all frames in coordinate_array. See issue #1041.
Changed in version 0.19.0: The input to the MemoryReader now must be a np.ndarray Added optional velocities and forces
Changed in version 2.2.0: Input kwargs are now stored under the
_kwargs
attribute, and are passed on class creation incopy()
.- OtherWriter(filename, **kwargs)
Returns a writer appropriate for filename.
Sets the default keywords start, step and dt (if available). n_atoms is always set from
Reader.n_atoms
.See also
Reader.Writer()
- Writer(filename, **kwargs)
A trajectory writer with the same properties as this trajectory.
- add_auxiliary(aux_spec: Optional[Union[str, Dict[str, str]]] = None, auxdata: Optional[Union[str, AuxReader]] = None, format: Optional[str] = None, **kwargs) None
Add auxiliary data to be read alongside trajectory.
Auxiliary data may be any data timeseries from the trajectory additional to that read in by the trajectory reader. auxdata can be an
AuxReader
instance, or the data itself as e.g. a filename; in the latter case an appropriateAuxReader
is guessed from the data/file format. An appropriate format may also be directly provided as a key word argument.On adding, the AuxReader is initially matched to the current timestep of the trajectory, and will be updated when the trajectory timestep changes (through a call to
next()
or jumping timesteps withtrajectory[i]
).The representative value(s) of the auxiliary data for each timestep (as calculated by the
AuxReader
) are stored in the current timestep in thets.aux
namespace under aux_spec; e.g. to add additional pull force data stored in pull-force.xvg:u = MDAnalysis.Universe(PDB, XTC) u.trajectory.add_auxiliary('pull', 'pull-force.xvg')
The representative value for the current timestep may then be accessed as
u.trajectory.ts.aux.pull
oru.trajectory.ts.aux['pull']
.The following applies to energy readers like the
EDRReader
.All data that is present in the (energy) file can be added by omitting aux_spec like so:
u.trajectory.add_auxiliary(auxdata="ener.edr")
aux_spec is expected to be a dictionary that maps the desired attribute name in the
ts.aux
namespace to the precise data to be added as identified by adata_selector
:term_dict = {"temp": "Temperature", "epot": "Potential"} u.trajectory.add_auxiliary(term_dict, "ener.edr")
Adding this data can be useful, for example, to filter trajectory frames based on non-coordinate data like the potential energy of each time step. Trajectory slicing allows working on a subset of frames:
selected_frames = np.array([ts.frame for ts in u.trajectory if ts.aux.epot < some_threshold]) subset = u.trajectory[selected_frames]
See also
Note
Auxiliary data is assumed to be time-ordered, with no duplicates. See the Auxiliary API.
- add_transformations(*transformations)[source]
Add all transformations to be applied to the trajectory.
This function take as list of transformations as an argument. These transformations are functions that will be called by the Reader and given a
Timestep
object as argument, which will be transformed and returned to the Reader. The transformations can be part of thetransformations
module, or created by the user, and are stored as a list transformations. This list can only be modified once, and further calls of this function will raise an exception.u = MDAnalysis.Universe(topology, coordinates) workflow = [some_transform, another_transform, this_transform] u.trajectory.add_transformations(*workflow)
- Parameters
transform_list (list) – list of all the transformations that will be applied to the coordinates
See also
- property aux_list
Lists the names of added auxiliary data.
- check_slice_indices(start, stop, step)
Check frame indices are valid and clip to fit trajectory.
The usage follows standard Python conventions for
range()
but see the warning below.- Parameters
start (int or None) – Starting frame index (inclusive).
None
corresponds to the default of 0, i.e., the initial frame.stop (int or None) – Last frame index (exclusive).
None
corresponds to the default of n_frames, i.e., it includes the last frame of the trajectory.step (int or None) – step size of the slice,
None
corresponds to the default of 1, i.e, include every frame in the range start, stop.
- Returns
start, stop, step – Integers representing the slice
- Return type
Warning
The returned values start, stop and step give the expected result when passed in
range()
but gives unexpected behavior when passed in aslice
whenstop=None
andstep=-1
This can be a problem for downstream processing of the output from this method. For example, slicing of trajectories is implemented by passing the values returned by
check_slice_indices()
torange()
range(start, stop, step)
and using them as the indices to randomly seek to. On the other hand, in
MDAnalysis.analysis.base.AnalysisBase
the values returned bycheck_slice_indices()
are used to splice the trajectory by creating aslice
instanceslice(start, stop, step)
This creates a discrepancy because these two lines are not equivalent:
range(10, -1, -1) # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0] range(10)[slice(10, -1, -1)] # []
- close()
Close the trajectory file.
- convert_forces_from_native(force, inplace=True)
Conversion of forces array force from native to base units
- Parameters
force (array_like) – Forces to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input force is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
New in version 0.7.7.
- convert_forces_to_native(force, inplace=True)
Conversion of force array force from base to native units.
- Parameters
force (array_like) – Forces to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input force is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
New in version 0.7.7.
- convert_pos_from_native(x, inplace=True)
Conversion of coordinate array x from native units to base units.
- Parameters
x (array_like) – Positions to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input x is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
Changed in version 0.7.5: Keyword inplace can be set to
False
so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.
- convert_pos_to_native(x, inplace=True)
Conversion of coordinate array x from base units to native units.
- Parameters
x (array_like) – Positions to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input x is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
Changed in version 0.7.5: Keyword inplace can be set to
False
so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.
- convert_time_from_native(t, inplace=True)
Convert time t from native units to base units.
- Parameters
t (array_like) – Time values to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input t is modified in place and also returned (although note that scalar values t are passed by value in Python and hence an in-place modification has no effect on the caller.) In-place operations improve performance because allocating new arrays is avoided.
Changed in version 0.7.5: Keyword inplace can be set to
False
so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.
- convert_time_to_native(t, inplace=True)
Convert time t from base units to native units.
- Parameters
t (array_like) – Time values to transform
inplace (bool, optional) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input t is modified in place and also returned. (Also note that scalar values t are passed by value in Python and hence an in-place modification has no effect on the caller.)
Changed in version 0.7.5: Keyword inplace can be set to
False
so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.
- convert_velocities_from_native(v, inplace=True)
Conversion of velocities array v from native to base units
- Parameters
v (array_like) – Velocities to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input v is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
New in version 0.7.5.
- convert_velocities_to_native(v, inplace=True)
Conversion of coordinate array v from base to native units
- Parameters
v (array_like) – Velocities to transform
inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input v is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
New in version 0.7.5.
- property frame: int
Frame number of the current time step.
This is a simple short cut to
Timestep.frame
.
- get_aux_attribute(auxname, attrname)
Get the value of attrname from the auxiliary auxname
- Parameters
See also
- get_aux_descriptions(auxnames=None)
Get descriptions to allow reloading the specified auxiliaries.
If no auxnames are provided, defaults to the full list of added auxiliaries.
Passing the resultant description to
add_auxiliary()
will allow recreation of the auxiliary. e.g., to duplicate all auxiliaries into a second trajectory:descriptions = trajectory_1.get_aux_descriptions() for aux in descriptions: trajectory_2.add_auxiliary(**aux)
- Returns
List of dictionaries of the args/kwargs describing each auxiliary.
- Return type
- iter_as_aux(auxname)
Iterate through timesteps for which there is at least one assigned step from the auxiliary auxname within the cutoff specified in auxname.
See also
- iter_auxiliary(auxname, start=None, stop=None, step=None, selected=None)
Iterate through the auxiliary auxname independently of the trajectory.
Will iterate over the specified steps of the auxiliary (defaults to all steps). Allows to access all values in an auxiliary, including those out of the time range of the trajectory, without having to also iterate through the trajectory.
After interation, the auxiliary will be repositioned at the current step.
- Parameters
auxname (str) – Name of the auxiliary to iterate over.
(start (optional) – Options for iterating over a slice of the auxiliary.
stop (optional) – Options for iterating over a slice of the auxiliary.
step) (optional) – Options for iterating over a slice of the auxiliary.
selected (lst | ndarray, optional) – List of steps to iterate over.
- Yields
AuxStep
object
See also
- next_as_aux(auxname)
Move to the next timestep for which there is at least one step from the auxiliary auxname within the cutoff specified in auxname.
This allows progression through the trajectory without encountering
NaN
representative values (unless these are specifically part of the auxiliary data).If the auxiliary cutoff is not set, where auxiliary steps are less frequent (
auxiliary.dt > trajectory.dt
), this allows progression at the auxiliary pace (rounded to nearest timestep); while if the auxiliary steps are more frequent, this will work the same as callingnext()
.See the Auxiliary API.
See also
- static parse_n_atoms(filename, order='fac', **kwargs)[source]
Deduce number of atoms in a given array of coordinates
- Parameters
filename (numpy.ndarray) – data which will be used later in MemoryReader
order ({"afc", "acf", "caf", "fac", "fca", "cfa"} (optional)) – the order/shape of the return data array, corresponding to (a)tom, (f)rame, (c)oordinates all six combinations of ‘a’, ‘f’, ‘c’ are allowed ie “fac” - return array where the shape is (frame, number of atoms, coordinates).
- Returns
n_atoms – number of atoms in system
- Return type
- rename_aux(auxname, new)
Change the name of the auxiliary auxname to new.
Provided there is not already an auxiliary named new, the auxiliary name will be changed in ts.aux namespace, the trajectory’s list of added auxiliaries, and in the auxiliary reader itself.
- Parameters
- Raises
ValueError – If the name new is already in use by an existing auxiliary.
- set_array(coordinate_array, order='fac')[source]
Set underlying array in desired column order.
- Parameters
coordinate_array (
ndarray
object) – The underlying array of coordinatesorder ({"afc", "acf", "caf", "fac", "fca", "cfa"} (optional)) – the order/shape of the return data array, corresponding to (a)tom, (f)rame, (c)oordinates all six combinations of ‘a’, ‘f’, ‘c’ are allowed ie “fac” - return array where the shape is (frame, number of atoms, coordinates).
- set_aux_attribute(auxname, attrname, new)
Set the value of attrname in the auxiliary auxname.
- Parameters
See also
- property time
Time of the current frame in MDAnalysis time units (typically ps).
This is either read straight from the Timestep, or calculated as time =
Timestep.frame
*Timestep.dt
- timeseries(asel=None, start=0, stop=- 1, step=1, order='afc')[source]
Return a subset of coordinate data for an AtomGroup in desired column order. If no selection is given, it will return a view of the underlying array, while a copy is returned otherwise.
- Parameters
asel (AtomGroup (optional)) – Atom selection. Defaults to
None
, in which case the full set of coordinate data is returned. Note that in this case, a view of the underlying numpy array is returned, while a copy of the data is returned whenever asel is different fromNone
.start (int (optional)) – the start trajectory frame
stop (int (optional)) –
the end trajectory frame
Deprecated since version 2.4.0: Note that stop is currently inclusive but will be changed in favour of being exclusive in version 3.0.
step (int (optional)) – the number of trajectory frames to skip
order ({"afc", "acf", "caf", "fac", "fca", "cfa"} (optional)) – the order/shape of the return data array, corresponding to (a)tom, (f)rame, (c)oordinates all six combinations of ‘a’, ‘f’, ‘c’ are allowed ie “fac” - return array where the shape is (frame, number of atoms, coordinates).
Changed in version 1.0.0: Deprecated format keyword has been removed. Use order instead.
Changed in version 2.4.0: ValueError now raised instead of NoDataError for empty input AtomGroup
- property totaltime: float
Total length of the trajectory
The time is calculated as
(n_frames - 1) * dt
, i.e., we assume that the first frame no time as elapsed. Thus, a trajectory with two frames will be considered to have a length of a single time step dt and a “trajectory” with a single frame will be reported as length 0.
- property transformations
Returns the list of transformations
- units = {'length': None, 'time': None, 'velocity': None}
dict with units of of time and length (and velocity, force, … for formats that support it)