4.5.2. Updated nucleic acid analysis — MDAnalysis.analysis.nucleicacids
- Author:
Alia Lescoulie
- Year:
2022-2023
- copyright:
LGPLv2.1
The module provides classes for analyzing nucleic acid structures. This is an updated, higher-performance version of the previous nucleic acid tools. For applications see [1][2].
References
4.5.2.1. Distances
- class MDAnalysis.analysis.nucleicacids.NucPairDist(selection1: List[AtomGroup], selection2: List[AtomGroup], **kwargs)[source]
Atom pair distance calculation base class.
Takes two lists of AtomGroup and computes the distances between them over a trajectory. Used as a superclass for the other nucleic acid distance classes. The distance is measured between atoms sharing an index in the two lists of AtomGroup.
- Parameters:
- results.pair_distances
2D array of pair distances. The first dimension is simulation time; the second dimension contains the pair distances for each entry pair in selection1 and selection2.
New in version 2.4.0.
Deprecated since version 2.7.0: results.pair_distances will be removed in version 3.0.0; use results.distances instead.
- Type:
- results.distances
Distances are stored in a 2d numpy array with axis 0 (first index) indexing the trajectory frame and axis 1 (second index) selecting the Residue pair.
New in version 2.7.0.
- Type:
- times
Simulation times for analysis.
- Type:
- Raises:
ValueError – If the selections given are not the same length
ValueError – If an AtomGroup in one of the strands is not a valid nucleic acid
ValueError – If a given residue pair from the provided strands returns an empty AtomGroup when selecting the atom pairs used in the distance calculations
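The index-matched pairing described above can be illustrated without MDAnalysis. The following is a minimal sketch in plain Python, not the library's implementation: the function name pair_distances and all coordinates are hypothetical, and each selection entry is reduced to a single point per frame. It returns the frame-major layout documented for results.distances.

```python
import math


def pair_distances(frames_sel1, frames_sel2):
    """Return distances[frame][pair] for index-matched points.

    frames_sel1 and frames_sel2 hold, for every frame, one (x, y, z)
    position per entry of the first and second selection list.
    """
    if len(frames_sel1) != len(frames_sel2):
        raise ValueError("Both inputs need the same number of frames")
    distances = []
    for pos1, pos2 in zip(frames_sel1, frames_sel2):
        if len(pos1) != len(pos2):
            raise ValueError("Selections must be the same length")
        # Entry i of the first list is paired with entry i of the second.
        distances.append([math.dist(a, b) for a, b in zip(pos1, pos2)])
    return distances


# Two frames, two pairs (hypothetical coordinates).
sel1 = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
sel2 = [[(3.0, 0.0, 0.0), (1.0, 4.0, 0.0)],
        [(0.0, 0.0, 5.0), (1.0, 0.0, 2.0)]]

d = pair_distances(sel1, sel2)
print(d)  # [[3.0, 4.0], [5.0, 2.0]]
```

Here d[1][0] is the distance of the first pair in the second frame, which mirrors indexing results.distances by frame first and pair second.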
Version Info
Changed in version 2.5.0: The ability to access results by passing selection indices to results has been removed. Please use results.pair_distances instead. results.times was deprecated and has been removed. Please use the class attribute times instead.
Changed in version 2.7.0: Added static method select_strand_atoms as a helper for selecting atom pairs for distance analysis.
- run(start=None, stop=None, step=None, frames=None, verbose=None, *, progressbar_kwargs={})
Perform the calculation
- Parameters:
start (int, optional) – start frame of analysis
stop (int, optional) – stop frame of analysis
step (int, optional) – number of frames to skip between each analysed frame
frames (array_like, optional) –
array of integers or booleans to slice trajectory; frames can only be used instead of start, stop, and step. Setting both frames and at least one of start, stop, step to a non-default value will raise a ValueError.
New in version 2.2.0.
verbose (bool, optional) – Turn on verbosity
progressbar_kwargs (dict, optional) – ProgressBar keywords with custom parameters regarding progress bar position, etc; see
MDAnalysis.lib.log.ProgressBar
for full list.
Changed in version 2.2.0: Added ability to analyze arbitrary frames by passing a list of frame indices in the frames keyword argument.
Changed in version 2.5.0: Added the progressbar_kwargs parameter, which allows modifying the description, position, etc. of tqdm progress bars.
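The interplay between start, stop, step and frames documented for run() can be sketched in plain Python. This is only an illustration of the documented rules, not the code MDAnalysis uses internally; the helper name select_frames and its n_frames argument are hypothetical.

```python
def select_frames(n_frames, start=None, stop=None, step=None, frames=None):
    """Return the frame indices an analysis would visit (illustrative only)."""
    if frames is not None:
        # frames replaces start/stop/step; mixing them is an error.
        if any(arg is not None for arg in (start, stop, step)):
            raise ValueError("frames cannot be combined with start/stop/step")
        frames = list(frames)
        if frames and all(isinstance(f, bool) for f in frames):
            # Boolean mask: keep the indices flagged True.
            return [i for i, keep in enumerate(frames) if keep]
        return [int(f) for f in frames]
    # slice() treats None as "use the default", like the run() signature.
    return list(range(n_frames)[slice(start, stop, step)])


print(select_frames(10, start=2, stop=8, step=3))          # [2, 5]
print(select_frames(5, frames=[0, 3, 4]))                  # [0, 3, 4]
print(select_frames(4, frames=[True, False, True, False])) # [0, 2]
```

Passing both frames and, for example, step=2 raises ValueError in this sketch, matching the behaviour described for the frames parameter.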
- static select_strand_atoms(strand1: ResidueGroup, strand2: ResidueGroup, a1_name: str, a2_name: str, g_name: str = 'G', a_name: str = 'A', u_name: str = 'U', t_name: str = 'T', c_name: str = 'C') -> Tuple[List[AtomGroup], List[AtomGroup]] [source]
A helper method for nucleic acid pair distance analyses. Used for selecting specific atoms from two strands of nucleic acids.
- Parameters:
strand1 (ResidueGroup) – The first nucleic acid strand
strand2 (ResidueGroup) – The second nucleic acid strand
a1_name (str) – The selection for the purine base of the strand pair
a2_name (str) – The selection for the pyrimidine base of the strand pair
g_name (str (optional)) – Name of Guanine in topology, by default assigned to G
a_name (str (optional)) – Name of Adenine in topology, by default assigned to A
u_name (str (optional)) – Name of Uracil in topology, by default assigned to U
t_name (str (optional)) – Name of Thymine in topology, by default assigned to T
c_name (str (optional)) – Name of Cytosine in topology, by default assigned to C
- Returns:
A tuple containing two lists of AtomGroup objects corresponding to the provided selections from each strand.
- Return type:
Tuple[List[AtomGroup], List[AtomGroup]]
- Raises:
New in version 2.7.0.
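The role of a1_name and a2_name can be sketched in plain Python. This is a simplified illustration, not the MDAnalysis implementation: residues are modelled as hypothetical dictionaries with a resname and a set of atom names, residue names are compared as whole strings, and the helper name pick_pair_atoms is invented for this example. The sketch assumes each pair consists of one purine and one pyrimidine, so the purine receives a1_name and its partner a2_name.

```python
def pick_pair_atoms(strand1, strand2, a1_name, a2_name,
                    g_name="G", a_name="A", u_name="U",
                    t_name="T", c_name="C"):
    """Pick one atom name per residue: a1_name on purines, a2_name on pyrimidines."""
    purines = {g_name, a_name}
    pyrimidines = {u_name, t_name, c_name}
    sel1, sel2 = [], []
    for res1, res2 in zip(strand1, strand2):
        if res1["resname"] in purines:
            name1, name2 = a1_name, a2_name
        elif res1["resname"] in pyrimidines:
            # Pyrimidine on strand 1: swap so the partner gets a1_name.
            name1, name2 = a2_name, a1_name
        else:
            raise ValueError(f"{res1['resname']} is not a valid nucleic acid")
        for res, name in ((res1, name1), (res2, name2)):
            if name not in res["atoms"]:
                # Corresponds to an empty selection for this residue.
                raise ValueError(f"no atom {name} in residue {res['resname']}")
        sel1.append((res1["resname"], name1))
        sel2.append((res2["resname"], name2))
    return sel1, sel2


# Hypothetical two-pair duplex: G-C followed by C-G.
strand1 = [{"resname": "G", "atoms": {"N1", "C2", "O6"}},
           {"resname": "C", "atoms": {"N3", "O2", "N4"}}]
strand2 = [{"resname": "C", "atoms": {"N3", "O2", "N4"}},
           {"resname": "G", "atoms": {"N1", "C2", "O6"}}]

sel1, sel2 = pick_pair_atoms(strand1, strand2, "N1", "N3")
print(sel1)  # [('G', 'N1'), ('C', 'N3')]
print(sel2)  # [('C', 'N3'), ('G', 'N1')]
```

The g_name to c_name arguments exist so that topologies with different residue naming can still be classified into purines and pyrimidines.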
- class MDAnalysis.analysis.nucleicacids.WatsonCrickDist(strand1: List[Residue] | ResidueGroup, strand2: List[Residue] | ResidueGroup, n1_name: str = 'N1', n3_name: str = 'N3', g_name: str = 'G', a_name: str = 'A', u_name: str = 'U', t_name: str = 'T', c_name: str = 'C', **kwargs)[source]
Watson-Crick base pair distance for selected residues over a trajectory.
Takes two ResidueGroup objects or two lists of Residue and calculates the distance between the nitrogen atoms in the Watson-Crick hydrogen bond over the trajectory. Bases are matched either by their index in the two ResidueGroup objects provided as arguments, or by their index in the provided lists of Residue objects, depending on which is provided.
Note
Support for lists of Residue is slated for deprecation and will raise a warning when used. It still works, but ResidueGroup is preferred.
- Parameters:
strand1 (ResidueGroup or List[Residue]) –
First list of bases
Deprecated since version 2.7.0: Using a list of Residue will be removed in 3.0.0. Use a ResidueGroup.
strand2 (ResidueGroup or List[Residue]) –
Second list of bases
Deprecated since version 2.7.0: Using a list of Residue will be removed in 3.0.0. Use a ResidueGroup.
.n1_name (str (optional)) – Name of Nitrogen 1 of nucleic acids, by default assigned to “N1”
n3_name (str (optional)) – Name of Nitrogen 3 of nucleic acids, by default assigned to “N3”
g_name (str (optional)) – Name of Guanine in topology, by default assigned to “G”
a_name (str (optional)) – Name of Adenine in topology, by default assigned to “A”
u_name (str (optional)) – Name of Uracil in topology, by default assigned to “U”
t_name (str (optional)) – Name of Thymine in topology, by default assigned to “T”
c_name (str (optional)) – Name of Cytosine in topology, by default assigned to “C”
**kwargs (dict) – Keyword arguments for AnalysisBase
- results.distances
Distances are stored in a 2d numpy array with axis 0 (first index) indexing the trajectory frame and axis 1 (second index) selecting the Residue pair.
New in version 2.7.0.
- Type:
- results.pair_distances
2D array of pair distances. The first dimension is simulation time; the second dimension contains the pair distances for each entry pair in selection1 and selection2.
New in version 2.4.0.
Deprecated since version 2.7.0: results.pair_distances will be removed in version 3.0.0; use results.distances instead.
- Type:
- times
Simulation times for analysis.
- Type:
- Raises:
TypeError – If the provided list of Residue contains non-Residue elements
Deprecated since version 2.7.0: Starting with version 3.0.0, this exception will no longer be raised because only ResidueGroup will be allowed.
ValueError – If strand1 and strand2 are not the same length
ValueError – If an AtomGroup in one of the strands is not a valid nucleic acid
ValueError – If a given residue pair from the provided strands returns an empty AtomGroup when selecting the atom pairs used in the distance calculations
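The input checks listed above (a warning for list input, TypeError for non-Residue elements, ValueError for unequal strand lengths) can be sketched as follows. This is a hedged illustration in plain Python, not the library's code: StubResidue and StubResidueGroup are hypothetical stand-ins for Residue and ResidueGroup, the helper names are invented, and the warning category is an assumption.

```python
import warnings


class StubResidue:
    """Hypothetical stand-in for a Residue."""
    def __init__(self, resname):
        self.resname = resname


class StubResidueGroup:
    """Hypothetical stand-in for a ResidueGroup."""
    def __init__(self, residues):
        self.residues = list(residues)

    def __len__(self):
        return len(self.residues)


def normalize_strand(strand):
    """Accept a group directly; convert a list of residues with a warning."""
    if isinstance(strand, StubResidueGroup):
        return strand
    if not isinstance(strand, list) or not all(
            isinstance(r, StubResidue) for r in strand):
        raise TypeError("strand must be a group or a list of residues")
    # Assumed category; the documentation only says a warning is raised.
    warnings.warn("lists of residues are deprecated; pass a group instead",
                  DeprecationWarning, stacklevel=2)
    return StubResidueGroup(strand)


def check_strands(strand1, strand2):
    """Normalize both strands and require equal length."""
    s1, s2 = normalize_strand(strand1), normalize_strand(strand2)
    if len(s1) != len(s2):
        raise ValueError("strand1 and strand2 are not the same length")
    return s1, s2


# Preferred input: two groups of equal length, no warning emitted.
g1 = StubResidueGroup([StubResidue("G"), StubResidue("A")])
g2 = StubResidueGroup([StubResidue("C"), StubResidue("U")])
s1, s2 = check_strands(g1, g2)
```

Normalizing list input into a group first keeps the remaining code path identical for both input types, which is one way to support a deprecated input form during a transition period.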
Version Info
Changed in version 2.5.0: Accessing results by passing strand indices to results was deprecated and has been removed. Please use results.pair_distances instead. results.times was deprecated and has been removed. Please use the class attribute times instead.
Changed in version 2.7.0: strand1 and strand2 now also accept a ResidueGroup as input. The previous input type, List[Residue], is still supported, but it is deprecated and will be removed in release 3.0.0.
- run(start=None, stop=None, step=None, frames=None, verbose=None, *, progressbar_kwargs={})
Perform the calculation
- Parameters:
start (int, optional) – start frame of analysis
stop (int, optional) – stop frame of analysis
step (int, optional) – number of frames to skip between each analysed frame
frames (array_like, optional) –
array of integers or booleans to slice trajectory; frames can only be used instead of start, stop, and step. Setting both frames and at least one of start, stop, step to a non-default value will raise a ValueError.
New in version 2.2.0.
verbose (bool, optional) – Turn on verbosity
progressbar_kwargs (dict, optional) – ProgressBar keywords with custom parameters regarding progress bar position, etc; see
MDAnalysis.lib.log.ProgressBar
for full list.
Changed in version 2.2.0: Added ability to analyze arbitrary frames by passing a list of frame indices in the frames keyword argument.
Changed in version 2.5.0: Added the progressbar_kwargs parameter, which allows modifying the description, position, etc. of tqdm progress bars.
- class MDAnalysis.analysis.nucleicacids.MinorPairDist(strand1: ResidueGroup, strand2: ResidueGroup, o2_name: str = 'O2', c2_name: str = 'C2', g_name: str = 'G', a_name: str = 'A', u_name: str = 'U', t_name: str = 'T', c_name: str = 'C', **kwargs)[source]
Minor-Pair base pair distance for selected residues over a trajectory.
Takes two ResidueGroup objects and calculates the Minor-groove hydrogen bond length between the nitrogen and oxygen atoms over the trajectory. Bases are matched by their index in the two ResidueGroup objects provided as arguments.
- Parameters:
strand1 (ResidueGroup) – First list of bases
strand2 (ResidueGroup) – Second list of bases
o2_name (str (optional)) – Name of Oxygen 2 of nucleic acids; by default assigned to “O2”
c2_name (str (optional)) – Name of Carbon 2 of nucleic acids; by default assigned to “C2”
g_name (str (optional)) – Name of Guanine in topology; by default assigned to “G”
a_name (str (optional)) – Name of Adenine in topology; by default assigned to “A”
u_name (str (optional)) – Name of Uracil in topology; by default assigned to “U”
t_name (str (optional)) – Name of Thymine in topology; by default assigned to “T”
c_name (str (optional)) – Name of Cytosine in topology; by default assigned to “C”
**kwargs – Keyword arguments for AnalysisBase
- results.distances
Distances are stored in a 2d numpy array with axis 0 (first index) indexing the trajectory frame and axis 1 (second index) selecting the Residue pair.
- Type:
- times
Simulation times for analysis.
- Type:
- Raises:
ValueError – If the selections given are not the same length
ValueError – If a Residue in one of the strands is not a valid nucleic acid
ValueError – If a given residue pair from the provided strands returns an empty AtomGroup when selecting the atom pairs used in the distance calculations
New in version 2.7.0.
- run(start=None, stop=None, step=None, frames=None, verbose=None, *, progressbar_kwargs={})
Perform the calculation
- Parameters:
start (int, optional) – start frame of analysis
stop (int, optional) – stop frame of analysis
step (int, optional) – number of frames to skip between each analysed frame
frames (array_like, optional) –
array of integers or booleans to slice trajectory; frames can only be used instead of start, stop, and step. Setting both frames and at least one of start, stop, step to a non-default value will raise a ValueError.
New in version 2.2.0.
verbose (bool, optional) – Turn on verbosity
progressbar_kwargs (dict, optional) – ProgressBar keywords with custom parameters regarding progress bar position, etc; see
MDAnalysis.lib.log.ProgressBar
for full list.
Changed in version 2.2.0: Added ability to analyze arbitrary frames by passing a list of frame indices in the frames keyword argument.
Changed in version 2.5.0: Added the progressbar_kwargs parameter, which allows modifying the description, position, etc. of tqdm progress bars.
- class MDAnalysis.analysis.nucleicacids.MajorPairDist(strand1: ResidueGroup, strand2: ResidueGroup, n4_name: str = 'N4', o6_name: str = 'O6', g_name: str = 'G', a_name: str = 'A', u_name: str = 'U', t_name: str = 'T', c_name: str = 'C', **kwargs)[source]
Major-Pair base pair distance for selected residues over a trajectory.
Takes two ResidueGroup objects and calculates the Major-groove hydrogen bond length between the nitrogen and oxygen atoms over the trajectory. Bases are matched by their index in the two ResidueGroup objects provided as arguments.
- Parameters:
strand1 (ResidueGroup) – First list of bases
strand2 (ResidueGroup) – Second list of bases
o6_name (str (optional)) – Name of Oxygen 6 of nucleic acids; by default assigned to “O6”
n4_name (str (optional)) – Name of Nitrogen 4 of nucleic acids; by default assigned to “N4”
g_name (str (optional)) – Name of Guanine in topology; by default assigned to “G”
a_name (str (optional)) – Name of Adenine in topology; by default assigned to “A”
u_name (str (optional)) – Name of Uracil in topology; by default assigned to “U”
t_name (str (optional)) – Name of Thymine in topology; by default assigned to “T”
c_name (str (optional)) – Name of Cytosine in topology; by default assigned to “C”
**kwargs – Keyword arguments for AnalysisBase
- results.distances
Distances are stored in a 2d numpy array with axis 0 (first index) indexing the trajectory frame and axis 1 (second index) selecting the Residue pair.
- Type:
- times
Simulation times for analysis.
- Type:
- Raises:
ValueError – If a Residue in one of the strands is not a valid nucleic acid
ValueError – If a given residue pair from the provided strands returns an empty AtomGroup when selecting the atom pairs used in the distance calculations
ValueError – If the selections given are not the same length
New in version 2.7.0.
- run(start=None, stop=None, step=None, frames=None, verbose=None, *, progressbar_kwargs={})
Perform the calculation
- Parameters:
start (int, optional) – start frame of analysis
stop (int, optional) – stop frame of analysis
step (int, optional) – number of frames to skip between each analysed frame
frames (array_like, optional) –
array of integers or booleans to slice trajectory; frames can only be used instead of start, stop, and step. Setting both frames and at least one of start, stop, step to a non-default value will raise a ValueError.
New in version 2.2.0.
verbose (bool, optional) – Turn on verbosity
progressbar_kwargs (dict, optional) – ProgressBar keywords with custom parameters regarding progress bar position, etc; see
MDAnalysis.lib.log.ProgressBar
for full list.
Changed in version 2.2.0: Added ability to analyze arbitrary frames by passing a list of frame indices in the frames keyword argument.
Changed in version 2.5.0: Added the progressbar_kwargs parameter, which allows modifying the description, position, etc. of tqdm progress bars.
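Once one of the distance classes in this module has been run, the frame-by-pair layout of results.distances lends itself to simple per-pair summaries. The sketch below uses plain nested lists in place of the NumPy array. The helper summarize_pairs, the example values and the 3.5 cutoff are all hypothetical and chosen only to show the indexing.

```python
from statistics import mean


def summarize_pairs(distances, cutoff):
    """Per-pair mean distance and fraction of frames at or below cutoff.

    distances is indexed as distances[frame][pair].
    """
    # Transpose to [pair][frame] so each entry is one pair's time series.
    per_pair = list(zip(*distances))
    return [
        {
            "mean": mean(series),
            "fraction_within": sum(d <= cutoff for d in series) / len(series),
        }
        for series in per_pair
    ]


# Four frames, two pairs (made-up values).
distances = [[2.9, 3.1],
             [3.0, 4.5],
             [3.1, 5.0],
             [3.0, 3.4]]

summary = summarize_pairs(distances, cutoff=3.5)
for i, stats in enumerate(summary):
    print(i, round(stats["mean"], 2), stats["fraction_within"])
```

With a real NumPy array the same summaries would normally be computed along axis 0; the list version is used here only to keep the example free of dependencies.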