6.31. Serialization of Coordinate Readers

Parallel analysis requires that coordinate readers can be serialized. This document explains how the different coordinate readers are serialized in MDAnalysis and what developers need to do to make a new reader serializable.

To make sure every trajectory reader can be serialized, we implement picklable I/O classes (see Currently implemented picklable IO Formats). When a file is pickled, its filename and the other necessary attributes of the open file handle are saved. On unpickling, the file is reopened by filename, so the original file must still be accessible under the same filename for unpickling to succeed. To retain the current frame of the trajectory, _read_frame() is called with the previous frame index during unpickling.
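The round trip described above can be sketched with a toy reader. This is a minimal illustration using only the standard library, not MDAnalysis code: ToyReader, its one-line-per-frame file layout, and toy_roundtrip() are hypothetical names introduced here, and the sketch assumes that the filename and the frame index are all the state a reader needs.

```python
import os
import pickle
import tempfile


class ToyReader:
    """Hypothetical reader with one text line per frame (not an MDAnalysis class)."""

    def __init__(self, filename):
        self.filename = filename
        self._file = open(filename, "r")
        self._read_frame(0)

    def _read_frame(self, frame):
        # Rewind, skip the preceding frames, then read the requested one.
        self._file.seek(0)
        for _ in range(frame):
            self._file.readline()
        self.data = self._file.readline().strip()
        self.frame = frame
        return self.data

    def __getstate__(self):
        # Save only what is needed to reopen the file and find the frame again.
        return {"filename": self.filename, "frame": self.frame}

    def __setstate__(self, state):
        self.filename = state["filename"]
        self._file = open(self.filename, "r")  # fails if the file has moved
        self._read_frame(state["frame"])       # restore the current frame

    def close(self):
        self._file.close()


def toy_roundtrip():
    """Pickle a reader positioned on frame 2 and report where the copy ends up."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.txt")
        with open(path, "w") as fh:
            fh.write("a\nb\nc\n")
        reader = ToyReader(path)
        reader._read_frame(2)
        clone = pickle.loads(pickle.dumps(reader))
        result = (clone.frame, clone.data)
        reader.close()
        clone.close()
        return result
```

Calling toy_roundtrip() returns the frame index and the frame content seen by the unpickled copy, which should match the state of the original reader at the time it was pickled.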

6.31.1. How to serialize a new reader

6.31.1.1. File Access

If the new reader uses util.anyopen() (e.g. MDAnalysis.coordinates.PDB.PDBReader), the file handle can be pickled without modification. If the new reader uses I/O classes from another package (e.g. MDAnalysis.coordinates.GSD.GSDReader) that cannot be pickled natively, create a new picklable class that inherits from the file class in that package (e.g. MDAnalysis.coordinates.GSD.GSDPicklable) and add __getstate__() and __setstate__() methods (or __reduce__() if needed; consult the Python pickle documentation) so that the file handle can be serialized.
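As an illustration of the subclassing approach, the sketch below makes a file class picklable by saving its filename, mode, and offset. It uses io.FileIO from the standard library as a stand-in for a third-party file class, because io.FileIO refuses to be pickled by default. PicklableFileSketch and file_roundtrip() are hypothetical names, and overriding __reduce_ex__() is one possible choice here, not necessarily what a given MDAnalysis class does.

```python
import io
import os
import pickle
import tempfile


class PicklableFileSketch(io.FileIO):
    """Hypothetical read-only file that pickles as (filename, mode, offset)."""

    def __init__(self, name, mode="r"):
        self._mode = mode
        super().__init__(name, mode)

    def __reduce_ex__(self, protocol):
        if self._mode != "r":
            raise RuntimeError("this sketch only pickles files opened for reading")
        # On unpickling: call cls(name, mode), then pass the dict to __setstate__.
        return (self.__class__, (self.name, self._mode), {"offset": self.tell()})

    def __setstate__(self, state):
        # The file was just reopened by filename; move back to the saved offset.
        self.seek(state["offset"])


def file_roundtrip():
    """Pickle a half-read file and return the copy's offset and next line."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "frames.txt")
        with open(path, "wb") as fh:
            fh.write(b"frame0\nframe1\nframe2\n")
        original = PicklableFileSketch(path)
        original.readline()  # advance past the first line
        clone = pickle.loads(pickle.dumps(original))
        result = (clone.tell(), clone.readline())
        original.close()
        clone.close()
        return result
```

The copy is an independent handle on the same file, so reading from it does not move the original. Restricting pickling to read mode avoids two processes writing to the same file.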

6.31.1.2. To seek or not to seek

Some I/O classes support seek() and tell(), which allow the file to be pickled together with its offset. Because MDAnalysis readers normally have random access, this is usually not needed. If errors occur during testing, however, find a way to make the offset work: check whether the I/O class supports frame indexing, or whether the file handle inside the I/O class supports offsets.

For example, in MDAnalysis.coordinates.TRZ.TRZReader, _read_frame() is implemented by _seek()ing the file to its previous frame and then calling _read_next_timestep(), so the offset of the file is crucial for this machinery to work.
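Why the offset matters for such a reader can be shown with a small sketch. It is not the TRZ implementation: RelativeSeekReader, its fixed-size record layout, and offset_demo() are hypothetical, and the sketch only assumes a reader whose _read_frame() seeks relative to the current file position.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("<i")  # hypothetical frame layout: just the frame index


class RelativeSeekReader:
    """Hypothetical reader whose _read_frame() seeks from the current offset."""

    def __init__(self, filename, frame=-1, offset=0):
        self._file = open(filename, "rb")
        self._file.seek(offset)
        self.frame = frame  # index of the most recently read frame

    def _read_next_timestep(self):
        # Reads whichever record starts at the current file offset.
        (value,) = RECORD.unpack(self._file.read(RECORD.size))
        self.frame += 1
        return value

    def _read_frame(self, frame):
        # Correct only if the file offset agrees with self.frame.
        skip = frame - (self.frame + 1)
        self._file.seek(skip * RECORD.size, os.SEEK_CUR)
        self.frame = frame - 1
        return self._read_next_timestep()

    def state(self):
        return self.frame, self._file.tell()

    def close(self):
        self._file.close()


def offset_demo():
    """Rebuild a reader with and without its saved offset, then read frame 2."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "traj.bin")
        with open(path, "wb") as fh:
            for i in range(4):
                fh.write(RECORD.pack(i))
        reader = RelativeSeekReader(path)
        reader._read_frame(0)
        frame, offset = reader.state()
        good = RelativeSeekReader(path, frame=frame, offset=offset)
        bad = RelativeSeekReader(path, frame=frame, offset=0)  # offset lost
        result = (good._read_frame(2), bad._read_frame(2))
        for r in (reader, good, bad):
            r.close()
        return result
```

In this sketch the reader rebuilt with its saved offset reads frame 2 as requested, while the one reopened at offset 0 lands one record short and silently returns frame 1.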

6.31.1.3. Miscellaneous

If pickling still fails because of unpicklable attributes, either find a way to make those attributes picklable or write custom __getstate__() and __setstate__() methods for the reader.
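One common way to write such methods is to drop the offending attribute when pickling and rebuild it when unpickling. The sketch below uses a threading.Lock as an example of an attribute that pickle rejects; ReaderWithLock is a hypothetical class and does not correspond to any MDAnalysis reader.

```python
import pickle
import threading


class ReaderWithLock:
    """Hypothetical reader-like class holding an unpicklable attribute."""

    def __init__(self, filename):
        self.filename = filename
        self.frame = 0
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]  # leave out what pickle cannot handle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()  # rebuild it on the receiving side


def lock_roundtrip():
    """Return the picklable state that survives and whether the lock was rebuilt."""
    reader = ReaderWithLock("traj.xyz")
    reader.frame = 3
    clone = pickle.loads(pickle.dumps(reader))
    return clone.filename, clone.frame, clone._lock is not reader._lock
```

This only works when the dropped attribute can be reconstructed from the remaining state. If it carries information of its own, that information has to be saved in a picklable form instead.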

If the new reader is written in Cython, use the lib.formats.libmdaxdr and lib.formats.libdcd files as references.

6.31.2. Tests

6.31.2.1. _SingleFrameReader Test

If the new reader is a single-frame reader, its basic test should normally inherit from _SingleFrameReader, where picklability is tested.

6.31.2.2. BaseReaderTest and MultiframeReaderTest

If the test for the new reader uses BaseReaderTest or MultiframeReaderTest, the following are already tested: whether the current timestep information is saved (BaseReaderTest), whether the relative position is maintained, i.e. next() reads the correct next timestep, and whether the last timestep can be pickled.
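The three checks listed above can be sketched as a single helper. This is an illustration, not the code of BaseReaderTest or MultiframeReaderTest: check_pickle_roundtrip() and the in-memory ListReader stand-in are hypothetical, and the attribute names frame, n_frames, and seek_frame() are assumptions made for this sketch.

```python
import pickle


def check_pickle_roundtrip(reader):
    """Hypothetical helper mirroring the three checks described above."""
    # 1. The current timestep information survives the round trip.
    clone = pickle.loads(pickle.dumps(reader))
    assert clone.frame == reader.frame
    # 2. The relative position is kept: next() yields the same following frame.
    assert next(clone) == next(reader)
    # 3. A reader positioned on its last frame can still be pickled.
    reader.seek_frame(reader.n_frames - 1)
    last = pickle.loads(pickle.dumps(reader))
    assert last.frame == reader.n_frames - 1
    return True


class ListReader:
    """Stand-in reader over an in-memory list, used only to run the helper."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.n_frames = len(self.frames)
        self.frame = 0

    def seek_frame(self, index):
        self.frame = index
        return self.frames[index]

    def __next__(self):
        if self.frame + 1 >= self.n_frames:
            raise StopIteration
        self.frame += 1
        return self.frames[self.frame]
```

For example, check_pickle_roundtrip(ListReader(["a", "b", "c"])) runs all three checks against the stand-in. A real reader test would compare timestep contents such as positions rather than plain list items.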

6.31.2.3. File handler Test

If the new reader accesses the file with util.anyopen(), add the necessary tests for the reader to parallelism/test_multiprocessing.py.

If the new reader accesses the file with a new picklable I/O class, add the necessary tests for the I/O class to utils/test_pickleio.py and the tests for the reader to parallelism/test_multiprocessing.py.

6.31.3. Currently implemented picklable IO Formats