6.9. H5MD trajectories — MDAnalysis.coordinates.H5MD¶

The H5MD trajectory file format is based upon the general, high-performance HDF5 file format. HDF5 files are self-documenting and can be accessed with the h5py library. HDF5 can make use of parallel file system features through the MPI-IO interface of the HDF5 library to improve parallel reads and writes.

The HDF5 library and h5py must be installed; otherwise, H5MD files cannot be read by MDAnalysis. If h5py is not installed, a RuntimeError is raised.

6.9.1. Units¶

H5MD files are very flexible and can store data in a wide range of physical units. The H5MDReader will attempt to match the units in order to convert all data to the standard MDAnalysis units (see MDAnalysis.units).

Units are read from the attributes of the position, velocity, force, and time datasets provided by the H5MD file. The unit string is translated from H5MD notation to MDAnalysis notation. If MDAnalysis does not recognize the provided unit (likely because that unit string is not defined in MDAnalysis.units), a RuntimeError is raised. If no units are provided, MDAnalysis stores a value of None for each unit. If the H5MD file does not contain units and convert_units=True, MDAnalysis raises a ValueError. To load a universe from an H5MD file with no units, set convert_units=False.
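The unit handling rules above can be sketched in plain Python. This is only an illustration of the described behavior, not the reader's code: the helper name translate_unit and the small lookup table are assumptions made for this example, and the real table in MDAnalysis is larger.

```python
# Illustrative subset only; the name and contents of this table are
# assumptions for the example, not the table used by MDAnalysis.
H5MD_TO_MDA_UNITS = {
    "Angstrom": "Angstrom",
    "nm": "nm",
    "ps": "ps",
    "fs": "fs",
    "nm ps-1": "nm/ps",
    "kJ mol-1 nm-1": "kJ/(mol*nm)",
}


def translate_unit(h5md_unit, convert_units=True):
    """Map an H5MD unit string to an MDAnalysis-style unit string.

    Hypothetical helper mirroring the rules in the text:
    unknown unit -> RuntimeError; missing unit with conversion -> ValueError.
    """
    if h5md_unit is None:
        # No unit attribute in the file: only acceptable without conversion.
        if convert_units:
            raise ValueError(
                "file has no units; load it with convert_units=False"
            )
        return None
    try:
        return H5MD_TO_MDA_UNITS[h5md_unit]
    except KeyError:
        raise RuntimeError(f"unit {h5md_unit!r} is not recognized") from None


print(translate_unit("nm ps-1"))
print(translate_unit(None, convert_units=False))
```

With the real reader, the equivalent of the last call is passing convert_units=False to Universe when the file carries no unit attributes.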

6.9.2. Example: Loading an H5MD simulation¶

To load an H5MD simulation from an H5MD trajectory data file (using the H5MDReader), pass the topology and trajectory files to Universe:

import MDAnalysis as mda
u = mda.Universe("topology.tpr", "trajectory.h5md")


It is also possible to pass an open h5py.File file stream into the reader:

import h5py
import MDAnalysis as mda
with h5py.File("trajectory.h5md", 'r') as f:
    u = mda.Universe("topology.tpr", f)


Note

Directly using an h5py.File does not work yet. See issue #2884.

6.9.3. Example: Opening an H5MD file in parallel¶

The parallel features of HDF5 can be accessed through h5py (see parallel h5py docs for more detail) by using the mpi4py Python package with a parallel build of HDF5. To load an H5MD simulation with parallel HDF5, pass driver and comm arguments to Universe:

import MDAnalysis as mda
from mpi4py import MPI
u = mda.Universe("topology.tpr", "trajectory.h5md",
                 driver="mpio", comm=MPI.COMM_WORLD)


Note

h5py must be built with parallel features enabled on top of a parallel HDF5 build, and HDF5 and mpi4py must be built with a working MPI implementation. See instructions below.
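Once every MPI rank has opened the same trajectory, a common next step is to give each rank its own share of the frames. The following is a minimal sketch of one possible scheme, a contiguous block decomposition; frames_for_rank is a hypothetical helper written for this example and is not part of MDAnalysis or mpi4py.

```python
def frames_for_rank(n_frames, rank, size):
    """Return the contiguous range of frame indices assigned to one rank.

    Hypothetical helper: frames are split into `size` blocks whose lengths
    differ by at most one, with the leftover frames going to the lowest ranks.
    """
    base, extra = divmod(n_frames, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


# In an MPI program the rank and size would come from the communicator,
# e.g. comm.Get_rank() and comm.Get_size(); fixed values are used here.
for rank in range(4):
    block = frames_for_rank(10, rank, 4)
    print(rank, list(block))
```

Each rank could then iterate over its own slice of the trajectory, for example u.trajectory[block.start:block.stop].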

6.9.3.1. Building parallel h5py and HDF5 on Linux¶

Building a working parallel HDF5/h5py/mpi4py environment can be challenging and is often specific to your local computing resources, e.g., the supercomputer that you’re running on typically already has its preferred MPI installation. As a starting point we provide instructions that worked in a specific, fairly generic environment.

These instructions successfully built parallel HDF5/h5py with OpenMPI 4.0.4, HDF5 1.10.6, h5py 2.9.0, and mpi4py 3.0.3 on Ubuntu 16.0.6. You may have to play around with different combinations of versions of h5py/HDF5 to get a working parallel build.

1. Build MPI from sources

2. Build HDF5 from sources with parallel settings enabled:

./configure --enable-parallel --enable-shared
make
make install

3. Install mpi4py, making sure to point mpicc to where you’ve installed your MPI implementation:

env MPICC=/path/to/mpicc pip install mpi4py

4. Build h5py from sources, making sure to enable mpi and to point to your parallel build of HDF5:

export HDF5_PATH=path-to-parallel-hdf5
python setup.py clean --all
python setup.py configure -r --hdf5-version=X.Y.Z --mpi --hdf5=$HDF5_PATH
export gcc=gcc
CC=mpicc HDF5_DIR=$HDF5_PATH python setup.py build
python setup.py install
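After the build, it can help to confirm that the h5py being imported was really compiled with MPI support. A small check, assuming only that h5py exposes its build configuration through h5py.get_config(); the wrapper function h5py_has_mpi is a name chosen for this example.

```python
def h5py_has_mpi():
    """Return True if h5py is importable and was built with MPI support."""
    try:
        import h5py
    except ImportError:
        # h5py is not installed at all, so no parallel support either.
        return False
    return bool(h5py.get_config().mpi)


print("parallel h5py available:", h5py_has_mpi())
```

If this reports False even though a parallel build was installed, a likely cause is that a different, serial h5py is found first on the Python path.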


If you have questions or want to share how you managed to build parallel hdf5/h5py/mpi4py please let everyone know on the MDAnalysis forums.

6.9.4. Classes¶

class MDAnalysis.coordinates.H5MD.Timestep(n_atoms, **kwargs)[source]

H5MD Timestep

Create a Timestep, representing a frame of a trajectory

Parameters:
n_atoms (int) – The total number of atoms this Timestep describes
positions (bool, optional) – Whether this Timestep has position information [True]
velocities (bool (optional)) – Whether this Timestep has velocity information [False]
forces (bool (optional)) – Whether this Timestep has force information [False]
reader (Reader (optional)) – A weak reference to the owning Reader. Used for when attributes require trajectory manipulation (e.g. dt)
dt (float (optional)) – The time difference between frames (ps). If time is set, then dt will be ignored.
time_offset (float (optional)) – The starting time from which to calculate time (in ps)

Changed in version 0.11.0: Added keywords for positions, velocities and forces. Can add and remove position/velocity/force information by using the has_* attribute.

positions

coordinates of the atoms as a numpy.ndarray of shape (n_atoms, 3)

velocities

velocities of the atoms as a numpy.ndarray of shape (n_atoms, 3); only available if the trajectory contains velocities or if the velocities = True keyword has been supplied.

forces

forces of the atoms as a numpy.ndarray of shape (n_atoms, 3); only available if the trajectory contains forces or if the forces = True keyword has been supplied.

dimensions

unitcell dimensions (A, B, C, alpha, beta, gamma)

lengths A, B, C are in the MDAnalysis length unit (Å), and angles are in degrees.

Setting dimensions will populate the underlying native format description (triclinic box vectors). If edges is a matrix, the box is of triclinic shape with the edge vectors given by the rows of the matrix.
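The relation between a matrix of edge vectors and the (A, B, C, alpha, beta, gamma) representation can be sketched as follows. This is a standalone illustration using the usual crystallographic convention (alpha between B and C, beta between A and C, gamma between A and B); box_from_edges is a hypothetical helper, not the routine MDAnalysis uses internally.

```python
import math


def box_from_edges(edges):
    """Convert three edge vectors (rows) to (A, B, C, alpha, beta, gamma).

    Lengths keep the units of the input; angles are returned in degrees.
    """
    a, b, c = edges

    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    def angle(u, v):
        cos = sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))
        # Clamp to [-1, 1] to guard against rounding just outside the domain.
        return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

    return (norm(a), norm(b), norm(c),
            angle(b, c), angle(a, c), angle(a, b))


cubic = [[10.0, 0.0, 0.0],
         [0.0, 10.0, 0.0],
         [0.0, 0.0, 10.0]]
print(box_from_edges(cubic))
```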

class MDAnalysis.coordinates.H5MD.H5MDReader(filename, convert_units=True, driver=None, comm=None, **kwargs)[source]

See h5md documentation for a detailed overview of the H5MD file format.

The reader attempts to convert units in the trajectory file to the standard MDAnalysis units (MDAnalysis.units) if convert_units is set to True.

Additional data in the observables group of the H5MD file are loaded into the Timestep.data dictionary.

Only 3D-periodic boxes or no periodicity are supported; for no periodicity, Timestep.dimensions will return None.

Although H5MD can store varying numbers of particles per time step as produced by, e.g., GCMC simulations, MDAnalysis can currently only process a fixed number of particles per step. If the number of particles changes a ValueError is raised.

The H5MDReader reads .h5md files with the following HDF5 hierarchy:

Notation:
(name) is an HDF5 group that the reader recognizes
{name} is an HDF5 group with arbitrary name
[variable] is an HDF5 dataset
<dtype> is dataset datatype
+-- is an attribute of a group or dataset

H5MD root
 \-- (h5md)
    +-- version <int>
    \-- author
        +-- name <str>, author's name
        +-- email <str>, optional email address
    \-- creator
        +-- name <str>, file that created .h5md file
        +-- version
 \-- (particles)
    \-- {group1}
        \-- (box)
            +-- dimension : <int>, number of spatial dimensions
            +-- boundary : <str>, boundary conditions of unit cell
            \-- (edges)
                \-- [step] <int>, gives frame
                \-- [value] <float>, gives box dimensions
                    +-- unit <str>
        \-- (position)
            \-- [step] <int>, gives frame
            \-- [time] <float>, gives time
                +-- unit <str>
            \-- [value] <float>, gives numpy array of positions
                                 with shape (n_atoms, 3)
                +-- unit <str>
        \-- (velocity)
            \-- [step] <int>, gives frame
            \-- [time] <float>, gives time
                +-- unit <str>
            \-- [value] <float>, gives numpy array of velocities
                                 with shape (n_atoms, 3)
                +-- unit <str>
        \-- (force)
            \-- [step] <int>, gives frame
            \-- [time] <float>, gives time
                +-- unit <str>
            \-- [value] <float>, gives numpy array of forces
                                 with shape (n_atoms, 3)
                +-- unit <str>
 \-- (observables)
    \-- (lambda)
        \-- [step] <int>, gives frame
        \-- [time] <float>, gives time
        \-- [value] <float>
    \-- (step)
        \-- [step] <int>, gives frame
        \-- [time] <float>, gives time
        \-- [value] <int>, gives integration step
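To make the hierarchy concrete, the sketch below checks which of the position, velocity, and force groups exist under a particles group. It is written against any nested mapping that supports indexing and the in operator, which is how h5py groups behave, so it runs here on plain dictionaries; available_data and the group name "trajectory" are assumptions for this example.

```python
def available_data(root, group_name):
    """Report which per-atom groups exist under particles/<group_name>.

    `root` may be any nested mapping, e.g. an open h5py.File or a dict.
    """
    particles = root["particles"][group_name]
    return {name: name in particles
            for name in ("position", "velocity", "force")}


# Stand-in for an H5MD file that stores positions and velocities only.
fake_file = {
    "h5md": {},
    "particles": {
        "trajectory": {
            "box": {"edges": {}},
            "position": {"step": [], "time": [], "value": []},
            "velocity": {"step": [], "time": [], "value": []},
        }
    },
}
print(available_data(fake_file, "trajectory"))
```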


Note

If the driver and comm arguments were used to open the HDF5 file (specifically, driver="mpio") then the _reopen() method does not close and open the file like most readers because the information about the MPI communicator would be lost; instead it rewinds the trajectory back to the first timestep.

New in version 2.0.0.

Parameters:
filename (str or h5py.File) – trajectory filename or open h5py file
convert_units (bool (optional)) – convert units to MDAnalysis units
driver (str (optional)) – H5PY file driver used to open H5MD file
comm (MPI.Comm (optional)) – MPI communicator used to open H5MD file. Must be passed with the 'mpio' file driver
**kwargs (dict) – General reader arguments.

Raises:
RuntimeError – when H5PY is not installed
RuntimeError – when a unit is not recognized by MDAnalysis
ValueError – when n_atoms changes values between timesteps
ValueError – when convert_units=True but the H5MD file contains no units
ValueError – when dimension of unitcell is not 3
ValueError – when an MPI communicator object is passed to the reader but driver != 'mpio'
NoDataError – when the H5MD file has no 'position', 'velocity', or 'force' group

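The coupling between the driver and comm arguments can be illustrated with a short check. This is a sketch of the documented rule only, under the assumption that the two arguments are validated together; check_parallel_args is a hypothetical name and the error message is invented for the example.

```python
def check_parallel_args(driver=None, comm=None):
    """Reject an MPI communicator that is not paired with driver='mpio'."""
    if comm is not None and driver != "mpio":
        raise ValueError(
            "an MPI communicator requires driver='mpio', "
            f"got driver={driver!r}"
        )
    return driver, comm


# Any object stands in for MPI.COMM_WORLD in this standalone example.
fake_comm = object()
print(check_parallel_args(driver="mpio", comm=fake_comm)[0])
print(check_parallel_args())
```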
_reopen()[source]

reopen trajectory

Note

If the driver and comm arguments were used to open the HDF5 file (specifically, driver="mpio") then this method does not close and open the file like most readers because the information about the MPI communicator would be lost; instead it rewinds the trajectory back to the first timestep.

close()[source]

has_forces

True if ‘force’ group is in trajectory.

has_positions

True if ‘position’ group is in trajectory.

has_velocities

True if ‘velocity’ group is in trajectory.

n_frames

number of frames in trajectory

open_trajectory()[source]

opens the trajectory file using h5py library

class MDAnalysis.coordinates.H5MD.H5PYPicklable(name, mode=None, driver=None, libver=None, userblock_size=None, swmr=False, rdcc_nslots=None, rdcc_nbytes=None, rdcc_w0=None, track_order=None, **kwds)[source]

H5PY file object (read-only) that can be pickled.

This class provides a file-like object (as returned by h5py.File) that, unlike standard Python file objects, can be pickled. Only read mode is supported.

When the file is pickled, the filename, mode, driver, and comm of the underlying h5py.File are saved. On unpickling, the file is reopened using the saved filename, mode, and driver. This means that for a successful unpickle, the original file still has to be accessible under its filename.

Parameters:
filename (str or file-like) – a filename given as a text or byte string.
driver (str (optional)) – H5PY file driver used to open H5MD file

Example

from MDAnalysis.coordinates.H5MD import H5PYPicklable
f = H5PYPicklable('filename', 'r')
print(f['particles/trajectory/position/value'][0])
f.close()


can also be used as context manager:

with H5PYPicklable('filename', 'r') as f:
    print(f['particles/trajectory/position/value'][0])


Note

Pickling of an h5py.File opened with driver="mpio" and an MPI communicator is currently not supported.
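The save-the-filename-and-reopen idea can be shown without h5py. The class below is a minimal sketch of that general pattern on an ordinary binary file, using Python's __getstate__ and __setstate__ hooks; ReopeningFile and roundtrip are hypothetical names, and this is not the implementation of H5PYPicklable.

```python
import pickle


class ReopeningFile:
    """Read-only file wrapper that pickles as its filename and reopens."""

    def __init__(self, filename):
        self.filename = filename
        self._handle = open(filename, "rb")

    def read(self):
        return self._handle.read()

    def close(self):
        self._handle.close()

    def __getstate__(self):
        # Open handles cannot be pickled, so only the filename is stored.
        return {"filename": self.filename}

    def __setstate__(self, state):
        # Reopen by name; the file must still exist at that path.
        self.__init__(state["filename"])


def roundtrip(obj):
    """Pickle and unpickle an object, returning the new copy."""
    return pickle.loads(pickle.dumps(obj))
```

Calling roundtrip on an open ReopeningFile yields a second, independent handle on the same file, which is why the original file has to remain reachable under its filename.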

New in version 2.0.0.

Create a new file object.

See the h5py user guide for a detailed explanation of the options.

name
Name of the file on disk, or file-like object. Note: for files created with the ‘core’ driver, HDF5 still requires this be non-empty.
mode
r – Readonly, file must exist
r+ – Read/write, file must exist
w – Create file, truncate if exists
w- or x – Create file, fail if exists
a – Read/write if exists, create otherwise (default)
driver
Name of the driver to use. Legal values are None (default, recommended), ‘core’, ‘sec2’, ‘stdio’, ‘mpio’.
libver
Library version bounds. Supported values: ‘earliest’, ‘v108’, ‘v110’, and ‘latest’. The ‘v108’ and ‘v110’ options can only be specified with the HDF5 1.10.2 library or later.
userblock_size
Desired size of user block. Only allowed when creating a new file (mode w, w- or x).
swmr
Open the file in SWMR read mode. Only used when mode = ‘r’.
rdcc_nbytes
Total size of the raw data chunk cache in bytes. The default size is 1024**2 (1 MB) per dataset.
rdcc_w0
The chunk preemption policy for all datasets. This must be between 0 and 1 inclusive and indicates the weighting according to which chunks which have been fully read or written are penalized when determining which chunks to flush from cache. A value of 0 means fully read or written chunks are treated no differently than other chunks (the preemption is strictly LRU) while a value of 1 means fully read or written chunks are always preempted before other chunks. If your application only reads or writes data once, this can be safely set to 1. Otherwise, this should be set lower depending on how often you re-read or re-write the same data. The default value is 0.75.
rdcc_nslots
The number of chunk slots in the raw data chunk cache for this file. Increasing this value reduces the number of cache collisions, but slightly increases the memory used. Due to the hashing strategy, this value should ideally be a prime number. As a rule of thumb, this value should be at least 10 times the number of chunks that can fit in rdcc_nbytes bytes. For maximum performance, this value should be set approximately 100 times that number of chunks. The default value is 521.
track_order
Track dataset/group/attribute creation order under root group if True. If None use global default h5py.get_config().track_order.