4.2.5. Calculating path similarity — MDAnalysis.analysis.psa¶
| Author: | Sean Seyler | 
|---|---|
| Year: | 2015 | 
| Copyright: | GNU Public License v3 | 
New in version 0.10.0.
The module contains code to calculate the geometric similarity of trajectories using path metrics such as the Hausdorff or Fréchet distances [Seyler2015]. The path metrics are functions of two paths and return a nonnegative number, i.e., a distance. Two paths are identical if their distance is zero, and large distances indicate dissimilarity. Each path metric is a function of the individual points (e.g., coordinate snapshots) that comprise each path and, loosely speaking, identify the two points, one per path of a pair of paths, where the paths deviate the most. The distance between these points of maximal deviation is measured by the root mean square deviation (RMSD), i.e., to compute structural similarity.
One typically computes the pairwise similarity for an ensemble of paths to produce a symmetric distance matrix, which can be clustered to, at a glance, identify patterns in the trajectory data. To properly analyze a path ensemble, one must select a suitable reference structure to which all paths (each conformer in each path) will be universally aligned using the rotations determined by the best-fit rmsds. Distances between paths and their structures are then computed directly with no further alignment. This pre-processing step is necessary to preserve the metric properties of the Hausdorff and Fréchet metrics; using the best-fit rmsd on a pairwise basis does not generally preserve the triangle inequality.
Note
The PSAnalysisTutorial outlines a typical application of PSA to a set of trajectories, including doing proper alignment, performing distance comparisons, and generating heat map-dendrogram plots from hierarchical clustering.
References
| [Seyler2015] | (1, 2, 3, 4) Seyler SL, Kumar A, Thorpe MF, Beckstein O (2015) Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways. PLoS Comput Biol 11(10): e1004568. doi: 10.1371/journal.pcbi.1004568 | 
4.2.5.1. Helper functions and variables¶
The following convenience functions are used by other functions in this module.
- 
MDAnalysis.analysis.psa.sqnorm(v, axis=None)[source]¶
- Compute the sum of squares of elements along specified axes. - Parameters: - v (numpy.ndarray) – coordinates
- axes (None / int / tuple (optional)) – Axes or axes along which a sum is performed. The default
(axes = None) performs a sum over all the dimensions of the input array. The value of axes may be negative, in which case it counts from the last axis to the zeroth axis.
 - Returns: - the sum of the squares of the elements of v along axes - Return type: 
- 
MDAnalysis.analysis.psa.get_msd_matrix(P, Q, axis=None)[source]¶
- Generate the matrix of pairwise mean-squared deviations between paths. - The MSDs between all pairs of points in P and Q are calculated, each pair having a point from P and a point from Q. - P (Q) is a - numpy.ndarrayof \(N_p\) (\(N_q\)) time steps, \(N\) atoms, and \(3N\) coordinates (e.g.,- MDAnalysis.core.groups.AtomGroup.positions). The pairwise MSD matrix has dimensions \(N_p\) by \(N_q\).- Parameters: - P (numpy.ndarray) – the points in the first path
- Q (numpy.ndarray) – the points in the second path
 - Returns: - msd_matrix – matrix of pairwise MSDs between points in P and points in Q - Return type: - Notes - We calculate the MSD matrix \[M_{ij} = ||p_i - q_j||^2\]- where \(p_i \in P\) and \(q_j \in Q\). 
- 
MDAnalysis.analysis.psa.get_coord_axes(path)[source]¶
- Return the number of atoms and the axes corresponding to atoms and coordinates for a given path. - The path is assumed to be a - numpy.ndarraywhere the 0th axis corresponds to a frame (a snapshot of coordinates). The \(3N\) (Cartesian) coordinates are assumed to be either:- all in the 1st axis, starting with the x,y,z coordinates of the first atom, followed by the x,*y*,*z* coordinates of the 2nd, etc.
- in the 1st and 2nd axis, where the 1st axis indexes the atom number and the 2nd axis contains the x,*y*,*z* coordinates of each atom.
 - Parameters: - path (numpy.ndarray) – representing a path - Returns: - the number of atoms and the axes containing coordinates - Return type: - (int, (int, ..)) 
4.2.5.2. Classes, methods, and functions¶
- 
MDAnalysis.analysis.psa.get_path_metric_func(name)[source]¶
- Selects a path metric function by name. - Parameters: - name (str) – name of path metric - Returns: - path_metric – The path metric function specified by name (if found). - Return type: - function 
- 
MDAnalysis.analysis.psa.hausdorff(P, Q)[source]¶
- Calculate the symmetric Hausdorff distance between two paths. - The metric used is RMSD, as opposed to the more conventional L2 (Euclidean) norm, because this is convenient for i.e., comparing protein configurations. - P (Q) is a - numpy.ndarrayof \(N_p\) (\(N_q\)) time steps, \(N\) atoms, and \(3N\) coordinates (e.g.,- MDAnalysis.core.groups.AtomGroup.positions). P (Q) has either shape \(N_p \times N \times 3\) (\(N_q \times N \times 3\)), or \(N_p \times (3N)\) (\(N_q \times (3N)\)) in flattened form.- Note that reversing the path does not change the Hausdorff distance. - Parameters: - P (numpy.ndarray) – the points in the first path
- Q (numpy.ndarray) – the points in the second path
 - Returns: - the Hausdorff distance between paths P and Q - Return type: - Example - Calculate the Hausdorff distance between two halves of a trajectory: - >>> from MDAnalysis.tests.datafiles import PSF, DCD >>> u = Universe(PSF,DCD) >>> mid = len(u.trajectory)/2 >>> ca = u.select_atoms('name CA') >>> P = numpy.array([ ... ca.positions for _ in u.trajectory[:mid:] ... ]) # first half of trajectory >>> Q = numpy.array([ ... ca.positions for _ in u.trajectory[mid::] ... ]) # second half of trajectory >>> hausdorff(P,Q) 4.7786639840135905 >>> hausdorff(P,Q[::-1]) # hausdorff distance w/ reversed 2nd trajectory 4.7786639840135905 - Notes - scipy.spatial.distance.directed_hausdorff()is an optimized implementation of the early break algorithm of [Taha2015]; the latter code is used here to calculate the symmetric Hausdorff distance with an RMSD metric- References - [Taha2015] - A. A. Taha and A. Hanbury. An efficient algorithm for calculating the exact Hausdorff distance. IEEE Transactions On Pattern Analysis And Machine Intelligence, 37:2153-63, 2015. 
- 
MDAnalysis.analysis.psa.hausdorff_wavg(P, Q)[source]¶
- Calculate the weighted average Hausdorff distance between two paths. - P (Q) is a - numpy.ndarrayof \(N_p\) (\(N_q\)) time steps, \(N\) atoms, and \(3N\) coordinates (e.g.,- MDAnalysis.core.groups.AtomGroup.positions). P (Q) has either shape \(N_p \times N \times 3\) (\(N_q \times N \times 3\)), or \(N_p \times (3N)\) (\(N_q \times (3N)\)) in flattened form. The nearest neighbor distances for P (to Q) and those of Q (to P) are averaged individually to get the average nearest neighbor distance for P and likewise for Q. These averages are then summed and divided by 2 to get a measure that gives equal weight to P and Q.- Parameters: - P (numpy.ndarray) – the points in the first path
- Q (numpy.ndarray) – the points in the second path
 - Returns: - the weighted average Hausdorff distance between paths P and Q - Return type: - Example - >>> from MDAnalysis import Universe >>> from MDAnalysis.tests.datafiles import PSF, DCD >>> u = Universe(PSF,DCD) >>> mid = len(u.trajectory)/2 >>> ca = u.select_atoms('name CA') >>> P = numpy.array([ ... ca.positions for _ in u.trajectory[:mid:] ... ]) # first half of trajectory >>> Q = numpy.array([ ... ca.positions for _ in u.trajectory[mid::] ... ]) # second half of trajectory >>> hausdorff_wavg(P,Q) 2.5669644353703447 >>> hausdorff_wavg(P,Q[::-1]) # weighted avg hausdorff dist w/ Q reversed 2.5669644353703447 - Notes - The weighted average Hausdorff distance is not a true metric (it does not obey the triangle inequality); see [Seyler2015] for further details. 
- 
MDAnalysis.analysis.psa.hausdorff_avg(P, Q)[source]¶
- Calculate the average Hausdorff distance between two paths. - P (Q) is a - numpy.ndarrayof \(N_p\) (\(N_q\)) time steps, \(N\) atoms, and \(3N\) coordinates (e.g.,- MDAnalysis.core.groups.AtomGroup.positions). P (Q) has either shape \(N_p \times N \times 3\) (\(N_q \times N \times 3\)), or \(N_p \times (3N)\) (\(N_q \times (3N)\)) in flattened form. The nearest neighbor distances for P (to Q) and those of Q (to P) are all averaged together to get a mean nearest neighbor distance. This measure biases the average toward the path that has more snapshots, whereas weighted average Hausdorff gives equal weight to both paths.- Parameters: - P (numpy.ndarray) – the points in the first path
- Q (numpy.ndarray) – the points in the second path
 - Returns: - the average Hausdorff distance between paths P and Q - Return type: - Example - >>> from MDAnalysis.tests.datafiles import PSF, DCD >>> u = Universe(PSF,DCD) >>> mid = len(u.trajectory)/2 >>> ca = u.select_atoms('name CA') >>> P = numpy.array([ ... ca.positions for _ in u.trajectory[:mid:] ... ]) # first half of trajectory >>> Q = numpy.array([ ... ca.positions for _ in u.trajectory[mid::] ... ]) # second half of trajectory >>> hausdorff_avg(P,Q) 2.5669646575869005 >>> hausdorff_avg(P,Q[::-1]) # hausdorff distance w/ reversed 2nd trajectory 2.5669646575869005 - Notes - The average Hausdorff distance is not a true metric (it does not obey the triangle inequality); see [Seyler2015] for further details. 
- 
MDAnalysis.analysis.psa.hausdorff_neighbors(P, Q)[source]¶
- Find the Hausdorff neighbors of two paths. - P (Q) is a - numpy.ndarrayof \(N_p\) (\(N_q\)) time steps, \(N\) atoms, and \(3N\) coordinates (e.g.,- MDAnalysis.core.groups.AtomGroup.positions). P (Q) has either shape \(N_p \times N \times 3\) (\(N_q \times N \times 3\)), or \(N_p \times (3N)\) (\(N_q \times (3N)\)) in flattened form.- Parameters: - P (numpy.ndarray) – the points in the first path
- Q (numpy.ndarray) – the points in the second path
 - Returns: - dictionary of two pairs of numpy arrays, the first pair (key “frames”) containing the indices of (Hausdorff) nearest neighbors for P and Q, respectively, the second (key “distances”) containing (corresponding) nearest neighbor distances for P and Q, respectively - Return type: - Notes - Hausdorff neighbors are those points on the two paths that are separated by the Hausdorff distance. They are the farthest nearest neighbors and are maximally different in the sense of the Hausdorff distance [Seyler2015].
- scipy.spatial.distance.directed_hausdorff()can also provide the hausdorff neighbors.
 
- 
MDAnalysis.analysis.psa.discrete_frechet(P, Q)[source]¶
- Calculate the discrete Fréchet distance between two paths. - P (Q) is a - numpy.ndarrayof \(N_p\) (\(N_q\)) time steps, \(N\) atoms, and \(3N\) coordinates (e.g.,- MDAnalysis.core.groups.AtomGroup.positions). P (Q) has either shape \(N_p \times N \times 3\) (\(N_q \times N \times 3\)), or :\(N_p \times (3N)\) (\(N_q \times (3N)\)) in flattened form.- Parameters: - P (numpy.ndarray) – the points in the first path
- Q (numpy.ndarray) – the points in the second path
 - Returns: - the discrete Fréchet distance between paths P and Q - Return type: - Example - Calculate the discrete Fréchet distance between two halves of a trajectory. - >>> u = Universe(PSF,DCD) >>> mid = len(u.trajectory)/2 >>> ca = u.select_atoms('name CA') >>> P = np.array([ ... ca.positions for _ in u.trajectory[:mid:] ... ]) # first half of trajectory >>> Q = np.array([ ... ca.positions for _ in u.trajectory[mid::] ... ]) # second half of trajectory >>> discrete_frechet(P,Q) 4.7786639840135905 >>> discrete_frechet(P,Q[::-1]) # frechet distance w/ 2nd trj reversed 2nd 6.8429011177113832 - Note that reversing the direction increased the Fréchet distance: it is sensitive to the direction of the path. - Notes - The discrete Fréchet metric is an approximation to the continuous Fréchet metric [Frechet1906] [Alt1995]. The calculation of the continuous Fréchet distance is implemented with the dynamic programming algorithm of [EiterMannila1994] [EiterMannila1997]. - References - [Frechet1906] - M. Fréchet. Sur quelques points du calcul fonctionnel. Rend. Circ. Mat. Palermo, 22(1):1–72, Dec. 1906. - [Alt1995] - H. Alt and M. Godau. Computing the Fréchet distance between two polygonal curves. Int J Comput Geometry & Applications, 5(01n02):75–91, 1995. doi: 10.1142/S0218195995000064 - [EiterMannila1994] - T. Eiter and H. Mannila. Computing discrete Fréchet distance. Technical Report CD-TR 94/64, Christian Doppler Laboratory for Expert Systems, Technische Universität Wien, Wien, 1994. - [EiterMannila1997] - T. Eiter and H. Mannila. Distance measures for point sets and their computation. Acta Informatica, 34:109–133, 1997. doi: 10.1007/s002360050075. 
- 
MDAnalysis.analysis.psa.dist_mat_to_vec(N, i, j)[source]¶
- Convert distance matrix indices (in the upper triangle) to the index of the corresponding distance vector. - This is a convenience function to locate distance matrix elements (and the pair generating it) in the corresponding distance vector. The row index j should be greater than i+1, corresponding to the upper triangle of the distance matrix. - Parameters: - Returns: - index (of the matrix element) in the corresponding distance vector - Return type: 
- 
class MDAnalysis.analysis.psa.Path(universe, reference, select='name CA', path_select='all', ref_frame=0)[source]¶
- Represent a path based on a - Universe.- Pre-process a - Universeobject: (1) fit the trajectory to a reference structure, (2) convert fitted time series to a- numpy.ndarrayrepresentation of- Path.path.- The analysis is performed with - PSAnalysis.run()and stores the result in the- numpy.ndarraydistance matrix- PSAnalysis.D.- PSAnalysis.run()also generates a fitted trajectory and path from alignment of the original trajectories to a reference structure.- New in version 0.9.1. - Setting up trajectory alignment and fitted path generation. - Parameters: - universe (Universe) – MDAnalysis.Universeobject containing a trajectory
- reference (Universe) – reference structure (uses ref_frame from the trajectory)
- select (str or dict or tuple (optional)) – The selection to operate on for rms fitting; can be one of: - any valid selection string for
select_atoms()that produces identical selections in mobile and reference; or
- a dictionary {'mobile':sel1, 'reference':sel2}(theMDAnalysis.analysis.align.fasta2select()function returns such a dictionary based on a ClustalW or STAMP sequence alignment); or
- a tuple (sel1, sel2)
 When using 2. or 3. with sel1 and sel2 then these selections can also each be a list of selection strings (to generate an AtomGroup with defined atom order as described under Ordered selections). 
- any valid selection string for
- ref_frame (int) – frame index to select the coordinate frame from select.trajectory
- path_select (selection_string) – atom selection composing coordinates of (fitted) path; if Nonethen path_select is set to select [None]
 - 
u_original¶
- MDAnalysis.Universeobject with a trajectory
 - 
u_reference¶
- MDAnalysis.Universeobject containing a reference structure
 - 
select¶
- string, selection for - select_atoms()to select frame from- Path.u_reference
 - 
path_select¶
- string, selection for - select_atoms()to select atoms to compose- Path.path
 - 
ref_frame¶
- int, frame index to select frame from - Path.u_reference
 - 
u_fitted¶
- MDAnalysis.Universeobject with the fitted trajectory
 - 
path¶
- numpy.ndarrayobject representation of the fitted trajectory
 - 
fit_to_reference(filename=None, prefix='', postfix='_fit', rmsdfile=None, targetdir='.', weights=None, tol_mass=0.1)[source]¶
- Align each trajectory frame to the reference structure - Parameters: - filename (str (optional)) – file name for the RMS-fitted trajectory or pdb; defaults to the
original trajectory filename (from Path.u_original) with prefix prepended
- prefix (str (optional)) – prefix for auto-generating the new output filename
- rmsdfile (str (optional)) – file name for writing the RMSD time series [None]
- weights ({“mass”, None} or array_like (optional)) – choose weights. With"mass"uses masses as weights; withNoneweigh each atom equally. If a float array of the same length as the selected AtomGroup is provided, use each element of the array_like as a weight for the corresponding atom in the AtomGroup.
- tol_mass (float (optional)) – Reject match if the atomic masses for matched atoms differ by more than tol_mass [0.1]
 - Returns: - MDAnalysis.Universeobject containing a fitted trajectory- Return type: - Notes - Uses - MDAnalysis.analysis.align.AlignTrajfor the fitting.- Deprecated since version 0.16.1: Instead of - mass_weighted=Trueuse new- weights='mass'; refactored to fit with AnalysisBase API- Changed in version 0.17.0: Deprecated keyword mass_weighted was removed. 
- filename (str (optional)) – file name for the RMS-fitted trajectory or pdb; defaults to the
original trajectory filename (from 
 - 
get_num_atoms()[source]¶
- Return the number of atoms used to construct the - Path.- Must run - Path.to_path()prior to calling this method.- Returns: - the number of atoms in the - Path- Return type: - int 
 - 
run(align=False, filename=None, postfix='_fit', rmsdfile=None, targetdir='.', weights=None, tol_mass=0.1, flat=False)[source]¶
- Generate a path from a trajectory and reference structure. - As part of the path generation, the trajectory can be superimposed (“aligned”) to a reference structure if specified. - This is a convenience method to generate a fitted trajectory from an inputted universe ( - Path.u_original) and reference structure (- Path.u_reference).- Path.fit_to_reference()and- Path.to_path()are used consecutively to generate a new universe (- Path.u_fitted) containing the fitted trajectory along with the corresponding- Path.pathrepresented as an- numpy.ndarray. The method returns a tuple of the topology name and new trajectory name, which can be fed directly into an- MDAnalysis.Universeobject after unpacking the tuple using the- *operator, as in- MDAnalysis.Universe(*(top_name, newtraj_name)).- Parameters: - align (bool (optional)) – Align trajectory to atom selection Path.selectofPath.u_reference. IfTrue, a universe containing an aligned trajectory is produced withPath.fit_to_reference()[False]
- filename (str (optional)) – filename for the RMS-fitted trajectory or pdb; defaults to the
original trajectory filename (from Path.u_original) with prefix prepended
- postfix (str (optional)) – prefix for auto-generating the new output filename
- rmsdfile (str (optional)) – file name for writing the RMSD time series [None]
- weights ({“mass”, None} or array_like (optional)) – choose weights. With"mass"uses masses as weights; withNoneweigh each atom equally. If a float array of the same length as the selected AtomGroup is provided, use each element of the array_like as a weight for the corresponding atom in the AtomGroup.
- tol_mass (float (optional)) – Reject match if the atomic masses for matched atoms differ by more than tol_mass [0.1]
- flat (bool (optional)) – represent Path.pathwith 2D (\(N_p\times 3N\))numpy.ndarray; ifFalsethenPath.pathis a 3D (\(N_p\times N\times 3\))numpy.ndarray[False]
 - Returns: - topology_trajectory – A tuple of the topology name and new trajectory name. - Return type: - Deprecated since version 0.16.1: Instead of - mass_weighted=Trueuse new- weights='mass'; refactored to fit with AnalysisBase API- Changed in version 0.17.0: Deprecated keyword mass_weighted was removed. 
- align (bool (optional)) – Align trajectory to atom selection 
 - 
to_path(fitted=False, select=None, flat=False)[source]¶
- Generates a coordinate time series from the fitted universe trajectory. - Given a selection of N atoms from select, the atomic positions for each frame in the fitted universe ( - Path.u_fitted) trajectory (with \(N_p\) total frames) are appended sequentially to form a 3D or 2D (if flat is- True)- numpy.ndarrayrepresentation of the fitted trajectory (with dimensions \(N_p\times N\times 3\) or \(N_p\times 3N\), respectively).- Parameters: - fitted (bool (optional)) – construct a Path.pathfrom thePath.u_fittedtrajectory; ifFalsethenPath.pathis generated with the trajectory fromPath.u_original[False]
- select (str (optional)) – the selection for constructing the coordinates of each frame in
Path.path; ifNonethenPath.path_selectis used, else it is overridden by select [None]
- flat (bool (optional)) – represent Path.pathas a 2D (\(N_p\times 3N\))numpy.ndarray; ifFalsethenPath.pathis a 3D (\(N_p\times N\times 3\))numpy.ndarray[False]
 - Returns: - representing a time series of atomic positions of an - MDAnalysis.core.groups.AtomGroupselection from- Path.u_fitted.trajectory- Return type: 
- fitted (bool (optional)) – construct a 
 
- universe (Universe) – 
- 
class MDAnalysis.analysis.psa.PSAPair(npaths, i, j)[source]¶
- Generate nearest neighbor and Hausdorff pair information between a pair of paths from an all-pairs comparison generated by - PSA.- The nearest neighbors for each path of a pair of paths is generated by - PSAPair.compute_nearest_neighbors()and stores the result in a dictionary (- nearest_neighbors): each path has a- numpy.ndarrayof the frames of its nearest neighbors, and a- numpy.ndarrayof its nearest neighbor distances- PSAnalysis.D. For example, nearest_neighbors[‘frames’] is a pair of- numpy.ndarray, the first being the frames of the nearest neighbors of the first path, i, the second being those of the second path, j.- The Hausdorff pair for the pair of paths is found by calling - find_hausdorff_pair()(locates the nearest neighbor pair having the largest overall distance separating them), which stores the result in a dictionary (- hausdorff_pair) containing the frames (indices) of the pair along with the corresponding (Hausdorff) distance. hausdorff_pair[‘frame’] contains a pair of frames in the first path, i, and the second path, j, respectively, that correspond to the Hausdorff distance between them.- New in version 0.11. - Set up a - PSAPairfor a pair of paths that are part of a- PSAcomparison of npaths total paths.- Each unique pair of paths compared using - PSAis related by their nearest neighbors (and corresponding distances) and the Hausdorff pair and distance.- PSAPairis a convenience class for calculating and encapsulating nearest neighbor and Hausdorff pair information for one pair of paths.- Given npaths, - PSAperforms and all-pairs comparison among all paths for a total of :math:` ext{npaths}*( ext{npaths}-1)/2` unique comparisons. If distances between paths are computed, the all-pairs comparison can be summarized in a symmetric distance matrix whose upper triangle can be mapped to a corresponding distance vector form in a one-to-one manner. A particular comparison of a pair of paths in a given instance of- PSAPairis thus unique identified by the row and column indices in the distance matrix representation (whether or not distances are actually computed), or a single ID (index) in the corresponding distance vector.- Parameters: - 
matrix_id¶
- (int, int), (row, column) indices of the location of this - PSAPairin the corresponding pairwise distance matrix
 - 
pair_id¶
- int, ID of this - PSAPair(the pair_id:math:^text{th} comparison) in the distance vector corresponding to the pairwise distance matrix
 
- 
- 
class MDAnalysis.analysis.psa.PSAnalysis(universes, reference=None, select='name CA', ref_frame=0, path_select=None, labels=None, targetdir='.')[source]¶
- Perform Path Similarity Analysis (PSA) on a set of trajectories. - The analysis is performed with - PSAnalysis.run()and stores the result in the- numpy.ndarraydistance matrix- PSAnalysis.D.- PSAnalysis.run()also generates a fitted trajectory and path from alignment of the original trajectories to a reference structure.- New in version 0.8. - Changed in version 1.0.0: - save_result()method has been removed. You can use- np.save()on- PSAnalysis.Dinstead.- Setting up Path Similarity Analysis. - The mutual similarity between all unique pairs of trajectories are computed using a selected path metric. - Parameters: - universes (list) – a list of universes (MDAnalysis.Universeobject), each containing a trajectory
- reference (Universe) – reference coordinates; MDAnalysis.Universeobject; ifNonethe first time step of the first item in universes is used [None]
- select (str or dict or tuple) – The selection to operate on; can be one of: - any valid selection string for
select_atoms()that produces identical selections in mobile and reference; or
- a dictionary {'mobile':sel1, 'reference':sel2}(theMDAnalysis.analysis.align.fasta2select()function returns such a dictionary based on a ClustalW or STAMP sequence alignment); or
- a tuple (sel1, sel2)
 When using 2. or 3. with sel1 and sel2 then these selections can also each be a list of selection strings (to generate an AtomGroup with defined atom order as described under Ordered selections). 
- any valid selection string for
- tol_mass (float) – Reject match if the atomic masses for matched atoms differ by more than tol_mass [0.1]
- ref_frame (int) – frame index to select frame from reference [0]
- path_select (str) – atom selection composing coordinates of (fitted) path; if Nonethen path_select is set to select [None]
- targetdir (str) – output files are saved there; if Nonethen “./psadata” is created and used [.]
- labels (list) – list of strings, names of trajectories to be analyzed
(MDAnalysis.Universe); ifNone, defaults to trajectory names [None]
 - 
universes¶
- list of - MDAnalysis.Universeobjects containing trajectories
 - 
u_reference¶
- MDAnalysis.Universeobject containing a reference structure
 - 
select¶
- string, selection for - select_atoms()to select frame from- PSAnalysis.u_reference
 - 
path_select¶
- string, selection for - select_atoms()to select atoms to compose- Path.path
 - 
ref_frame¶
- int, frame index to select frame from - Path.u_reference
 - 
paths¶
- list of - numpy.ndarrayobjects representing the set/ensemble of fitted trajectories
 - 
D¶
- numpy.ndarraywhichs store calculated distance matrix
 - 
cluster(dist_mat=None, method='ward', count_sort=False, distance_sort=False, no_plot=False, no_labels=True, color_threshold=4)[source]¶
- Cluster trajectories and optionally plot the dendrogram. - This method is used by - PSAnalysis.plot()to generate a heatmap- dendrogram combination plot. By default, the distance matrix,- PSAnalysis.D, is assumed to exist, converted to distance-vector form, and inputted to- cluster.hierarchy.linkage()to generate a clustering. For convenience in plotting arbitrary distance matrices, one can also be specify dist_mat, which will be checked for proper distance matrix form by- spatial.distance.squareform()- Parameters: - dist_mat (numpy.ndarray) – user-specified distance matrix to be clustered [None]
- method (str) – name of linkage criterion for clustering ['ward']
- no_plot (bool) – if True, do not render the dendrogram [False]
- no_labels (bool) – if Truethen do not label dendrogram [True]
- color_threshold (float) – For brevity, let t be the color_threshold. Colors all the
descendent links below a cluster node k the same color if k is
the first node below the cut threshold t. All links connecting
nodes with distances greater than or equal to the threshold are
colored blue. If t is less than or equal to zero, all nodes are
colored blue. If color_threshold is None or ‘default’,
corresponding with MATLAB(TM) behavior, the threshold is set to
0.7*max(Z[:,2]). [4]]
 - Returns: - Z – output from scipy.cluster.hierarchy.linkage(); list of indices representing the row-wise order of the objects after clustering
- dgram – output from scipy.cluster.hierarchy.dendrogram()
 
- dist_mat (numpy.ndarray) – user-specified distance matrix to be clustered [
 - 
generate_paths(align=False, filename=None, infix='', weights=None, tol_mass=False, ref_frame=None, flat=False, save=True, store=False)[source]¶
- Generate paths, aligning each to reference structure if necessary. - Parameters: - align (bool) – Align trajectories to atom selection PSAnalysis.selectofPSAnalysis.u_reference[False]
- filename (str) – strings representing base filename for fitted trajectories and
paths [None]
- infix (str) – additional tag string that is inserted into the output filename of the fitted trajectory files [‘’]
- weights ({“mass”, None} or array_like (optional)) – choose weights. With"mass"uses masses as weights; withNoneweigh each atom equally. If a float array of the same length as the selected AtomGroup is provided, use each element of the array_like as a weight for the corresponding atom in the AtomGroup [None]
- tol_mass (float) – Reject match if the atomic masses for matched atoms differ by more
than tol_mass [False]
- ref_frame (int) – frame index to select frame from reference [None]
- flat (bool) – represent Path.pathas a 2D (\(N_p\times 3N\))numpy.ndarray; ifFalsethenPath.pathis a 3D (\(N_p\times N\times 3\))numpy.ndarray[False]
- save (bool) – if True, pickle list of names for fitted trajectories [True]
- store (bool) – if Truethen writes each path (numpy.ndarray) inPSAnalysis.pathsto compressed npz (numpy) files [False]
 - The fitted trajectories are written to new files in the “/trj_fit” subdirectory in - PSAnalysis.targetdirnamed “filename(trajectory)XXX*infix*_psa”, where “XXX” is a number between 000 and 999; the extension of each file is the same as its original. Optionally, the trajectories can also be saved in numpy compressed npz format in the “/paths” subdirectory in- PSAnalysis.targetdirfor persistence and can be accessed as the attribute- PSAnalysis.paths.- Deprecated since version 0.16.1: Instead of - mass_weighted=Trueuse new- weights='mass'; refactored to fit with AnalysisBase API- Changed in version 0.17.0: Deprecated keyword mass_weighted was removed. - Changed in version 1.0.0: Defaults for the store and filename keywords have been changed from True and fitted to False and None respectively. These now match the docstring documented defaults. 
- align (bool) – Align trajectories to atom selection 
 - 
get_num_atoms()[source]¶
- Return the number of atoms used to construct the - Pathinstances in- PSA.- Returns: - the number of atoms in any path - Return type: - int - Note - Must run - PSAnalysis.generate_paths()prior to calling this method.
 - 
get_num_paths()[source]¶
- Return the number of paths in - PSA.- Note - Must run - PSAnalysis.generate_paths()prior to calling this method.- Returns: - the number of paths in - PSA- Return type: - int 
 - 
get_pairwise_distances(vectorform=False, checks=False)[source]¶
- Return the distance matrix (or vector) of pairwise path distances. - Note - Must run - PSAnalysis.run()prior to calling this method.- Parameters: - vectorform (bool) – if True, return the distance vector instead [False]
- checks (bool) – if True, check thatPSAnalysis.Dis a proper distance matrix [False]
 - Returns: - representation of the distance matrix (or vector) - Return type: 
- vectorform (bool) – if 
 - 
get_paths()[source]¶
- Return the paths in - PSA.- Note - Must run - PSAnalysis.generate_paths()prior to calling this method.- Returns: - list of - numpy.ndarrayrepresentations of paths in- PSA- Return type: - list 
 - 
hausdorff_pairs¶
- The Hausdorff pair for each (unique) pairs of paths. - This attribute contains a list of Hausdorff pair information (in distance vector order), where each element is a dictionary containing the pair of frames and the (Hausdorff) distance between a pair of paths. See - PSAnalysis.psa_pairs()and- PSAPair.hausdorff_pairfor more information about accessing Hausdorff pair data.- Note - Must run - PSAnalysis.run_pairs_analysis()with- hausdorff_pairs=Trueprior to calling this method.
 - 
load()[source]¶
- Load fitted paths specified by ‘psa_path-names.pkl’ in - PSAnalysis.targetdir.- All filenames are determined by - PSAnalysis.- See also 
 - 
nearest_neighbors¶
- The nearest neighbors for each (unique) pair of paths. - This attribute contains a list of nearest neighbor information (in distance vector order), where each element is a dictionary containing the nearest neighbor frames and distances between a pair of paths. See - PSAnalysis.psa_pairs()and- PSAPair.nearest_neighborsfor more information about accessing nearest neighbor data.- Note - Must run - PSAnalysis.run_pairs_analysis()with- neighbors=Trueprior to calling this method.
 - 
plot(filename=None, linkage='ward', count_sort=False, distance_sort=False, figsize=4.5, labelsize=12)[source]¶
- Plot a clustered distance matrix. - Usese method linkage and plots the corresponding dendrogram. Rows (and columns) are identified using the list of strings specified by - PSAnalysis.labels.- If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, …). All other keyword arguments are passed on to - matplotlib.pyplot.matshow().- Parameters: - filename (str) – save figure to filename [None]
- linkage (str) – name of linkage criterion for clustering ['ward']
- count_sort (bool) – see scipy.cluster.hierarchy.dendrogram()[False]
- distance_sort (bool) – see scipy.cluster.hierarchy.dendrogram()[False]
- figsize (float) – set the vertical size of plot in inches [4.5]
- labelsize (float) – set the font size for colorbar labels; font size for path labels on
dendrogram default to 3 points smaller [12]
 - Returns: 
- filename (str) – save figure to filename [
 - 
plot_annotated_heatmap(filename=None, linkage='ward', count_sort=False, distance_sort=False, figsize=8, annot_size=6.5)[source]¶
- Plot a clustered distance matrix. - Uses method linkage and plots annotated distances in the matrix. Rows (and columns) are identified using the list of strings specified by - PSAnalysis.labels.- If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, …). All other keyword arguments are passed on to - matplotlib.pyplot.imshow().- Parameters: - filename (str) – save figure to filename [None]
- linkage (str) – name of linkage criterion for clustering ['ward']
- count_sort (bool) – see scipy.cluster.hierarchy.dendrogram()[False]
- distance_sort (bool) – see scipy.cluster.hierarchy.dendrogram()[False]
- figsize (float) – set the vertical size of plot in inches [4.5]
- annot_size (float) – font size of annotation labels on heat map [6.5]
 - Returns: - Note - This function requires the seaborn package, which can be installed with pip install seaborn or conda install seaborn. - Changed in version 1.0.0: - tick1On,- tick2On,- label1Onand- label2Onchanged to- tick1line,- tick2line,- label1and- label2due to upstream deprecation (see #2493)
- filename (str) – save figure to filename [
 - 
plot_nearest_neighbors(filename=None, idx=0, labels=('Path 1', 'Path 2'), figsize=4.5, multiplot=False, aspect_ratio=1.75, labelsize=12)[source]¶
- Plot nearest neighbor distances as a function of normalized frame number. - The frame number is mapped to the interval [0, 1]. - If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, …). All other keyword arguments are passed on to - matplotlib.pyplot.imshow().- Parameters: - filename (str) – save figure to filename [None]
- idx (int) – index of path (pair) comparison to plot [0]
- labels ((str, str)) – pair of names to label nearest neighbor distance
curves [('Path 1', 'Path 2')]
- figsize (float) – set the vertical size of plot in inches [4.5]
- multiplot (bool) – set to Trueto enable plotting multiple nearest neighbor distances on the same figure [False]
- aspect_ratio (float) – set the ratio of width to height of the plot [1.75]
- labelsize (float) – set the font size for colorbar labels; font size for path labels on
dendrogram default to 3 points smaller [12]
 - Returns: - ax - Return type: - axes - Note - This function requires the seaborn package, which can be installed with pip install seaborn or conda install seaborn. 
- filename (str) – save figure to filename [
 - 
psa_pairs¶
- The list of - PSAPairinstances for each pair of paths.- psa_pairsis a list of all- PSAPairobjects (in distance vector order). The elements of a- PSAPairare pairs of paths that have been compared using- PSAnalysis.run_pairs_analysis(). Each- PSAPaircontains nearest neighbor and Hausdorff pair information specific to a pair of paths. The nearest neighbor frames and distances for a- PSAPaircan be accessed in the nearest neighbor dictionary using the keys ‘frames’ and ‘distances’, respectively. E.g.,- PSAPair.nearest_neighbors['distances']returns a pair of- numpy.ndarraycorresponding to the nearest neighbor distances for each path. Similarly, Hausdorff pair information can be accessed using- PSAPair.hausdorff_pairwith the keys ‘frames’ and ‘distance’.- Note - Must run - PSAnalysis.run_pairs_analysis()prior to calling this method.
 - 
run(**kwargs)[source]¶
- Perform path similarity analysis on the trajectories to compute the distance matrix. - A number of parameters can be changed from the defaults. The result is stored as the array - PSAnalysis.D.- Parameters: - metric (str or callable) – selection string specifying the path metric to measure pairwise
distances among PSAnalysis.pathsor a callable with the same call signature ashausdorff()['hausdorff']
- start (int) – start and stop frame index with step size: analyze
trajectory[start:stop:step][None]
- stop (int) –
- step (int) –
- versionchanged: (.) – 1.0.0: store and filename have been removed.
 
- metric (str or callable) – selection string specifying the path metric to measure pairwise
distances among 
 - 
run_pairs_analysis(**kwargs)[source]¶
- Perform PSA Hausdorff (nearest neighbor) pairs analysis on all unique pairs of paths in - PSAnalysis.paths.- Partial results can be stored in separate lists, where each list is indexed according to distance vector convention (i.e., element (i,j) in distance matrix representation corresponds to element \(s=N*i+j-(i+1)*(i+2)\) in distance vector representation, which is the \(s^ ext{th}\) comparison). For each unique pair of paths, the nearest neighbors for that pair can be stored in - NNand the Hausdorff pair in- HP.- PPstores the full information of Hausdorff pairs analysis that is available for each pair of path, including nearest neighbors lists and the Hausdorff pairs.- The pairwise distances are stored as the array - PSAnalysis.D.- Parameters: - start (int) – start and stop frame index with step size: analyze
trajectory[start:stop:step][None]
- stop (int) –
- step (int) –
- neighbors (bool) – if True, then stores dictionary of nearest neighbor frames/distances inPSAnalysis.NN[False]
- hausdorff_pairs (bool) – if True, then stores dictionary of Hausdorff pair frames/distances inPSAnalysis.HP[False]
 
- start (int) – start and stop frame index with step size: analyze
 - 
save_paths(filename=None)[source]¶
- Save fitted - PSAnalysis.pathsto numpy compressed npz files.- The data are saved with - numpy.savez_compressed()in the directory specified by- PSAnalysis.targetdir.- Parameters: - filename (str) – specifies filename [ - None]- Returns: - filename - Return type: - str - See also 
 
- universes (list) – a list of universes (