6.30. Serialization of Coordinate Readers
Parallel analysis requires that coordinate readers can be serialized. This document outlines how the different coordinate readers in MDAnalysis are serialized and what developers should do to make a new reader serializable.
To make sure every trajectory reader can be serialized, we implement picklable I/O classes (see Currently implemented picklable IO Formats). When a file is pickled, the filename and other necessary attributes of the open file handle are saved. On unpickling, the file is reopened by filename. This means that for a successful unpickle, the original file must still be accessible under its filename. To retain the current frame of the trajectory, _read_frame(previous frame) is called during unpickling.
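The mechanism described above can be sketched with the standard library alone. This is a minimal illustration, not the MDAnalysis implementation: ToyReader, its one-frame-per-line text format, and roundtrip_demo() are hypothetical names introduced only for this example.

```python
import os
import pickle
import tempfile


class ToyReader:
    """Hypothetical reader: one frame per line of a text file."""

    def __init__(self, filename):
        self.filename = filename
        self._file = open(filename, "r")  # open handles cannot be pickled
        self.frame = -1
        self.data = None
        self._read_frame(0)

    def _read_frame(self, frame):
        # Random access: rewind, skip to the requested line, read it.
        self._file.seek(0)
        for _ in range(frame):
            self._file.readline()
        self.data = self._file.readline().strip()
        self.frame = frame
        return self.data

    def close(self):
        self._file.close()

    def __getstate__(self):
        # Save only what is needed to reopen the file and restore the frame.
        return {"filename": self.filename, "frame": self.frame}

    def __setstate__(self, state):
        self.filename = state["filename"]
        # The original file must still exist under this name.
        self._file = open(self.filename, "r")
        self._read_frame(state["frame"])


def roundtrip_demo():
    """Pickle a reader positioned at frame 2 and inspect the copy."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("frame0\nframe1\nframe2\n")
    try:
        reader = ToyReader(tmp.name)
        reader._read_frame(2)
        clone = pickle.loads(pickle.dumps(reader))
        result = (clone.frame, clone.data)
        reader.close()
        clone.close()
        return result
    finally:
        os.remove(tmp.name)
```

If the temporary file were deleted before pickle.loads() ran, the open() call in __setstate__ would fail, which is the accessibility requirement stated above.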
6.30.1. How to serialize a new reader
6.30.1.1. File Access
If the new reader uses util.anyopen() (e.g. MDAnalysis.coordinates.PDB.PDBReader), the file handle can be pickled without modification.
If the new reader uses I/O classes from another package (e.g. MDAnalysis.coordinates.GSD.GSDReader) that cannot be pickled natively, create a new picklable class that inherits from the file class in that package (e.g. MDAnalysis.coordinates.GSD.GSDPicklable) and add __getstate__() and __setstate__() methods (or __reduce__() if needed; consult the Python pickle documentation) to allow the file handle to be serialized.
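A minimal sketch of this subclassing pattern follows. ThirdPartyFile is a hypothetical stand-in for a file class from an external package, and ThirdPartyPicklable is the assumed picklable subclass; neither name exists in MDAnalysis, and a real external class may need different constructor arguments.

```python
import os
import pickle
import tempfile


class ThirdPartyFile:
    """Stand-in for a file class from an external package (hypothetical)."""

    def __init__(self, name, mode="rb"):
        self.name = name
        self.mode = mode
        self._handle = open(name, mode)  # open handle: not picklable

    def read(self, size=-1):
        return self._handle.read(size)

    def close(self):
        self._handle.close()


class ThirdPartyPicklable(ThirdPartyFile):
    """Picklable subclass: store constructor arguments, reopen on unpickle."""

    def __getstate__(self):
        return self.name, self.mode

    def __setstate__(self, state):
        name, mode = state
        self.__init__(name, mode)  # reopens the file by filename


def demo():
    """Return (whether the plain class pickles, content read by the copy)."""
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"trajectory bytes")
    plain = ThirdPartyFile(tmp.name)
    picklable = ThirdPartyPicklable(tmp.name)
    try:
        try:
            pickle.dumps(plain)
            plain_ok = True
        except TypeError:
            plain_ok = False  # the embedded file handle cannot be pickled
        clone = pickle.loads(pickle.dumps(picklable))
        content = clone.read()
        clone.close()
        return plain_ok, content
    finally:
        plain.close()
        picklable.close()
        os.remove(tmp.name)
```

In this sketch only the name and mode are stored, so the unpickled copy starts reading from the beginning of the file.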
6.30.1.2. To seek or not to seek
Some I/O classes support seek() and tell() functions, which allow the file to be pickled with an offset. This is normally not needed for MDAnalysis readers with random access. If errors occur during testing, however, find a way to make the offset work: check whether the I/O class supports frame indexing, or whether the file handle inside the I/O class supports an offset.
For example, in MDAnalysis.coordinates.TRZ.TRZReader, _read_frame() is implemented by _seek()ing the file to the previous frame and then calling _read_next_timestep(), so the file offset is crucial for this machinery to work.
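As an illustration of pickling a file together with its offset, the sketch below subclasses io.FileIO from the standard library. OffsetFile and demo() are hypothetical names, and the use of __reduce_ex__() here is one possible approach under these assumptions, not a description of how any MDAnalysis class is written.

```python
import io
import os
import pickle
import tempfile


class OffsetFile(io.FileIO):
    """Hypothetical read-only file that pickles its name and offset."""

    def __init__(self, name):
        super().__init__(name, "r")

    def __reduce_ex__(self, protocol):
        # Rebuild with OffsetFile(name), then hand the offset to __setstate__.
        return (self.__class__, (self.name,), {"offset": self.tell()})

    def __setstate__(self, state):
        # The file was reopened at position 0; move to the saved offset.
        self.seek(state["offset"])


def demo():
    """Read 4 bytes, pickle, and continue reading from the copy."""
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"0123456789")
    original = OffsetFile(tmp.name)
    try:
        original.read(4)  # advance the offset to byte 4
        clone = pickle.loads(pickle.dumps(original))
        result = (clone.tell(), clone.read(2))
        clone.close()
        return result
    finally:
        original.close()
        os.remove(tmp.name)
```

A reader that reads the next frame relative to the current file position depends on exactly this behaviour: without the seek() in __setstate__(), the copy would resume from the start of the file.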
6.30.1.3. Miscellaneous
If pickling still fails because of unpicklable attributes, try to find a way to pickle those attributes, or write custom __getstate__() and __setstate__() methods for the reader.
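A common way to write such methods is to drop the unpicklable attribute when saving state and to recreate it when restoring. The sketch below assumes a hypothetical LockedReader whose only unpicklable attribute is a threading.Lock; the filename is a placeholder that is never opened.

```python
import pickle
import threading


class LockedReader:
    """Hypothetical reader holding an attribute that cannot be pickled."""

    def __init__(self, filename):
        self.filename = filename
        self.frame = 0
        self._lock = threading.Lock()  # locks are not picklable

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]  # drop the unpicklable attribute
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()  # recreate it on unpickle


reader = LockedReader("traj.xyz")  # placeholder name, file is not opened
reader.frame = 5
clone = pickle.loads(pickle.dumps(reader))
```

The copy keeps the picklable attributes (filename and frame) and receives its own fresh lock.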
If the new reader is written in Cython, read the lib.formats.libmdaxdr and lib.formats.libdcd files as references.
6.30.2. Tests
6.30.2.1. _SingleFrameReader Test
If the new reader is a single-frame reader, the basic test should normally inherit from _SingleFrameReader, where the picklability is tested.
6.30.2.2. BaseReaderTest and MultiframeReaderTest
If the test for the new reader uses BaseReaderTest or MultiframeReaderTest, the following are already tested: whether the current timestep information is saved (the former), whether its relative position is maintained, i.e. next() reads the right next timestep, and whether its last timestep can be pickled.
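The three checks can be sketched as a single helper. This is not the code of the MDAnalysis test suite: ListReader is a hypothetical in-memory reader used only to exercise the checks, and check_pickle_roundtrip() is an assumed helper name.

```python
import pickle


class ListReader:
    """Hypothetical in-memory reader used only to exercise the checks."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.frame = 0

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, index):
        # range() normalizes negative indices and rejects invalid ones.
        self.frame = range(len(self.frames))[index]
        return self.frames[self.frame]

    def __next__(self):
        if self.frame + 1 >= len(self.frames):
            raise StopIteration
        self.frame += 1
        return self.frames[self.frame]

    @property
    def ts(self):
        return self.frames[self.frame]


def check_pickle_roundtrip(reader):
    """Run the three pickling checks; the reader needs at least 3 frames."""
    # 1. The current timestep information is saved.
    reader[1]
    clone = pickle.loads(pickle.dumps(reader))
    assert clone.frame == reader.frame
    assert clone.ts == reader.ts

    # 2. The relative position is maintained: next() reads the same frame.
    assert next(clone) == next(reader)

    # 3. The last timestep can be pickled.
    reader[-1]
    clone = pickle.loads(pickle.dumps(reader))
    assert clone.frame == len(reader) - 1
    assert clone.ts == reader.ts
    return True
```

Calling check_pickle_roundtrip(ListReader(["a", "b", "c"])) runs all three checks on the toy reader.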
6.30.2.3. File handler Test
If the new reader accesses the file with util.anyopen(), add the necessary tests for the reader inside parallelism/test_multiprocessing.py.
If the new reader accesses the file with a new picklable I/O class, add the necessary tests for the I/O class inside utils/test_pickleio.py and for the reader inside parallelism/test_multiprocessing.py.
6.30.3. Currently implemented picklable IO Formats
MDAnalysis.lib.picklable_file_io.FileIOPicklable
MDAnalysis.lib.picklable_file_io.BufferIOPicklable
MDAnalysis.lib.picklable_file_io.TextIOPicklable
MDAnalysis.lib.picklable_file_io.BZ2Picklable
MDAnalysis.lib.picklable_file_io.GzipPicklable
MDAnalysis.coordinates.GSD.GSDPicklable
MDAnalysis.coordinates.TRJ.NCDFPicklable
MDAnalysis.coordinates.chemfiles.ChemfilesPicklable
MDAnalysis.coordinates.H5MD.H5PYPicklable