3. Selection commands
Once you have the Universe()
object, you can
select atoms (using a syntax very similar to CHARMM’s atom selection
syntax):
>>> kalp = universe.select_atoms("segid KALP")
The select_atoms()
method of a
AtomGroup
or a
Universe
returns a
AtomGroup
, so you can use all the methods
defined for AtomGroups on them. Selections always return an
AtomGroup
with atoms sorted according to their
index in the topology (this is to ensure that there are not any duplicates,
which can happen with complicated selections).
One can group subselections using parentheses:
>>> universe.select_atoms("segid DMPC and not (name H* or type OW)")
<AtomGroup with 3420 atoms>
Almost all the basic CHARMM selections work.
It is also possible to export selections for external software packages with the help of Selection exporters.
Note
By default, atoms are sorted by index in the output AtomGroup.
For example, the below code will return the first, second, and
sixth atom in ag
:
>>> ag = u.select_atoms("name N")
>>> ag2 = ag[[5, 1, 0]]
>>> ag3 = ag2.select_atoms("name N")
>>> np.all(ag3.ix == ag2.ix)
False
You can turn off sorting behavior with the sorted
keyword:
>>> ag = u.select_atoms("name N")
>>> ag2 = ag[[5, 1, 0]]
>>> ag3 = ag2.select_atoms("name N", sorted=False)
>>> np.all(ag3.ix == ag2.ix)
True
For further details on ordered selections, see Ordered selections.
3.1. Selection Keywords
The following describes all selection keywords currently understood by the selection parser. The following applies to all selections:
Keywords are case sensitive.
Atoms are automatically sequentially ordered in a resulting selection (see notes below on Ordered selections for how to circumvent this if necessary).
Selections are parsed left to right and parentheses can be used for grouping.
You can use the singular name of any topology attribute as a selection keyword. Defined topology attributes are listed in the User Guide. Alternatively, you can define a
TopologyAttr
yourself, providing that the attributedtype
is one ofint
,float
,str
(orobject
), orbool
. However, the topology must contain this attribute information for the selection to work.Selections of attributes that are integers or floats can use the syntax “myTopologyAttr 0 - 2”, “myTopologyAttr 0:2”, or “myTopologyAttr 0 to 2”, to select a range with both ends inclusive. Whitespace and negative numbers are allowed.
“myTopologyAttr 0” can be used to select all atoms matching the value; however, this can be tricky with floats because of precision differences and we recommend using a range like above when possible.
Boolean selections default to True, so “myTopologyAttr” and “myTopologyAttr True” both give all atoms with
myTopologyAttr == True
.
See also
Regular expression patterns
FLOAT_PATTERN
for matching floats;
INT_PATTERN
for matching integers;
and RANGE_PATTERN
for matching
selection ranges.
3.1.1. Simple selections
This is a non-exhaustive list of the available selection keywords. As noted
in the dot point above, keywords will be automatically generated for any
suitable TopologyAttr
. A list of
Defined topology attributes is available in the User Guide.
- protein, backbone, nucleic, nucleicbackbone
selects all atoms that belong to a standard set of residues; a protein is identfied by a hard-coded set of residue names so it may not work for esoteric residues.
- water
selects all atoms that belong to a set of water residues; water is defined with a set of common water abbreviations present in topology files and may not work with certain water residue names. Currently the following water resnames are supported: 3 letter resnames:
H2O
,HOH
,OH2
,HHO
,OHH
,TIP
,T3P
,T4P
,T5P
,SOL
,WAT
. 4 letter resnames:TIP2
,TIP3
,TIP4
.- segid seg-name
select by segid (as given in the topology), e.g.
segid 4AKE
orsegid DMPC
- resid residue-number-range
resid can take a single residue number or a range of numbers. A range consists of two numbers separated by a colon (inclusive) such as
resid 1:5
. A residue number (“resid”) is taken directly from the topology.- resnum resnum-number-range
resnum is the canonical residue number; typically it is set to the residue id in the original PDB structure.
- resname residue-name
select by residue name, e.g.
resname LYS
- name atom-name
select by atom name (as given in the topology). Often, this is force field dependent. Example:
name CA
(for Cα atoms) orname OW
(for SPC water oxygen)- type atom-type
select by atom type; this is either a string or a number and depends on the force field; it is read from the topology file (e.g. the CHARMM PSF file contains numeric atom types). It has non-sensical values when a PDB or GRO file is used as a topology.
- atom seg-name residue-number atom-name
a selector for a single atom consisting of segid resid atomname, e.g.
DMPC 1 C2
selects the C2 carbon of the first residue of the DMPC segment- altLoc alternative-location
a selection for atoms where alternative locations are available, which is often the case with high-resolution crystal structures e.g.
resid 4 and resname ALA and altLoc B
selects only the atoms of ALA-4 that have an altLoc B record.- chainID chain-name
a selection for atoms where chainIDs have been defined.
- element element-name
a selection for atoms where elements have been defined. e.g.
element H C
- moltype molecule-type
select by molecule type, e.g.
moltype Protein_A
. At the moment, only the TPR format defines the molecule type.- smarts SMARTS-query
select atoms using Daylight’s SMARTS queries, e.g.
smarts [#7;R]
to find nitrogen atoms in rings. Requires RDKit. All matches are combined as a single unique match. The smarts selection accepts two sets of key word arguments from select_atoms(): therdkit_kwargs
are passed internally to RDKitConverter.convert() and thesmarts_kwargs
are passed to RDKit’s GetSubstructMatches. By default, the useChirality kwarg inrdkit_kwargs
is set to true and maxMatches insmarts_kwargs
ismax(1000, 10 * n_atoms)
, wheren_atoms
is eitherlen(AtomGroup)
orlen(Universe.atoms)
, whichever is applicable. Note that the number of matches can occasionally exceed the default value of maxMatches, causing too few atoms to be returned. If this occurs, a warning will be issued. The problem can be fixed by increasing the value of maxMatches. This behavior may be updated in the future- chiral R | S
select a particular stereocenter. e.g.
name C and chirality S
to select only S-chiral carbon atoms. OnlyR
andS
will be possible options but other values will not raise an error.- formalcharge formal-charge
select atoms based on their formal charge, e.g.
name O and formalcharge -1
to select all oxygens with a negative 1 formal charge.
3.1.2. Pattern matching
The pattern matching notation described below is used to specify
patterns for matching strings (based on fnmatch
):
?
Is a pattern that will match any single character. For example,
resname T?R
selects residues named “TYR” and “THR”.*
Is a pattern that will match multiple characters. For example,
GL*
selects all strings that start with “GL” such as “GLU”, “GLY”, “GLX29”, “GLN”.[seq]
Would match any character in seq. For example, “resname GL[NY]” selects all residues named “GLN” or “GLY” but would not select “GLU”.
[!seq]
Would match any character not in seq. For example, “resname GL[!NY]” would match residues named “GLU” but would not match “GLN” or “GLY”.
3.1.3. Boolean
- not
all atoms not in the selection, e.g.
not protein
selects all atoms that aren’t part of a protein- and, or
combine two selections according to the rules of boolean algebra, e.g.
protein and not (resname ALA or resname LYS)
selects all atoms that belong to a protein, but are not in a lysine or alanine residue
3.1.4. Geometric
- around distance selection
selects all atoms a certain cutoff away from another selection, e.g.
around 3.5 protein
selects all atoms not belonging to protein that are within 3.5 Angstroms from the protein- sphlayer innerRadius externalRadius selection
selects all atoms within a spherical layer centered in the center of geometry (COG) of a given selection, e.g.,
sphlayer 2.4 6.0 ( protein and ( resid 130 or resid 80 ) )
selects the center of geometry of protein, resid 130, resid 80 and creates a spherical layer of inner radius 2.4 and external radius 6.0 around the COG.- sphzone externalRadius selection
selects all atoms within a spherical zone centered in the center of geometry (COG) of a given selection, e.g.
sphzone 6.0 ( protein and ( resid 130 or resid 80 ) )
selects the center of geometry of protein, resid 130, resid 80 and creates a sphere of radius 6.0 around the COG.- isolayer inner radius outer radius selection
Similar to sphlayer, but will find layer around all referenced atoms. For example, if the atom types for a polymer backbone were chosen, then an isolayer parallel to the backbone will be generated. As another example, if a set of ions were chosen as a reference to isolate the second hydration layer, then they will all be included in the same group. However, in the instance that a molecule is in the second hydration layer of one ion and the first hydration layer of another, those atoms will not be included.
- cylayer innerRadius externalRadius zMax zMin selection
selects all atoms within a cylindric layer centered in the center of geometry (COG) of a given selection, e.g.
cylayer 5 10 10 -8 protein
selects the center of geometry of protein, and creates a cylindrical layer of inner radius 5, external radius 10 centered on the COG. In z, the cylinder extends from 10 above the COG to 8 below. Positive values for zMin, or negative ones for zMax, are allowed.- cyzone externalRadius zMax zMin selection
selects all atoms within a cylindric zone centered in the center of geometry (COG) of a given selection, e.g.
cyzone 15 4 -8 protein and resid 42
selects the center of geometry of protein and resid 42, and creates a cylinder of external radius 15 centered on the COG. In z, the cylinder extends from 4 above the COG to 8 below. Positive values for zMin, or negative ones for zMax, are allowed.Changed in version 0.10.0: keywords cyzone and cylayer now take zMax and zMin to be relative to the COG of selection, instead of absolute z-values in the box.
- point x y z distance
selects all atoms within a cutoff of a point in space, make sure coordinate is separated by spaces, e.g.
point 5.0 5.0 5.0 3.5
selects all atoms within 3.5 Angstroms of the coordinate (5.0, 5.0, 5.0)- prop [abs] property operator value
selects atoms based on position, using property x, y, or z coordinate. Supports the abs keyword (for absolute value) and the following operators: <, >, <=, >=, ==, !=. For example,
prop z >= 5.0
selects all atoms with z coordinate greater than 5.0;prop abs z <= 5.0
selects all atoms within -5.0 <= z <= 5.0.
Note
By default periodicity is taken into account with geometric
selections, i.e. selections will find atoms that are in different
periodic images.
To control this behaviour, use the boolean "periodic"
keyword
argument of select_atoms()
.
3.1.5. Similarity and connectivity
- same subkeyword as selection
selects all atoms that have the same subkeyword value as any atom in selection. Allowed subkeyword values are the atom properties:
name, type, resname, resid, segid, mass, charge, radius, bfactor, resnum
, the groups an atom belong to:residue, segment, fragment
, and the atom coordinatesx, y, z
.- byres selection
selects all atoms that are in the same segment and residue as selection, e.g. specify the subselection after the byres keyword.
byres
is a shortcut tosame residue as
- bonded selection
selects all atoms that are bonded to selection eg:
select name H and bonded name O
selects only hydrogens bonded to oxygens
3.1.6. Index
- bynum index-range
selects all atoms within a range of (1-based) inclusive indices, e.g.
bynum 1
selects the first atom in the universe;bynum 5:10
selects atoms 5 through 10 inclusive. All atoms in theMDAnalysis.Universe
are consecutively numbered, and the index runs from 1 up to the total number of atoms.- id index-range
selects all atoms in a range of (1-based) inclusive indices, e.g.
id 1
selects all the atoms with id 1;id 5:7
selects all atoms with ids 5, all atoms with ids 6 and all atoms with ids 7.- index index-range
selects all atoms within a range of (0-based) inclusive indices, e.g.
index 0
selects the first atom in the universe;index 5:10
selects atoms 6 through 11 inclusive. All atoms in theMDAnalysis.Universe
are consecutively numbered, and the index runs from 0 up to the total number of atoms - 1.
Note
Conventionally, id
corresponds to the serial number in the PDB format. In contrast
to bynum
, the id
topology attribute is not necessarily continuous, ordered, or
unique, and can be arbitrarily assigned by the user.
3.1.7. Preexisting selections and modifiers
- group group-name
selects the atoms in the
AtomGroup
passed to the function as an argument named group-name. Only the atoms common to group-name and the instanceselect_atoms()
was called from will be considered, unlessgroup
is preceded by theglobal
keyword. group-name will be included in the parsing just by comparison of atom indices. This means that it is up to the user to make sure the group-name group was defined in an appropriateUniverse
.- global selection
by default, when issuing
select_atoms()
from anAtomGroup
, selections and subselections are returned intersected with the atoms of that instance. Prefixing a selection term withglobal
causes its selection to be returned in its entirety. As an example, theglobal
keyword allows forlipids.select_atoms("around 10 global protein")
— wherelipids
is a group that does not contain any proteins. Wereglobal
absent, the result would be an empty selection since theprotein
subselection would itself be empty. When issuingselect_atoms()
from aUniverse
,global
is ignored.
Changed in version 1.0.0: The fullgroup
selection has now been removed. Please use the equivalent
global group
selection.
3.2. Dynamic selections
By default select_atoms()
returns an
AtomGroup
, in which the list of atoms is
constant across trajectory frame changes. If
select_atoms()
is invoked with named
argument updating
set to True
, an
UpdatingAtomGroup
instance will be returned
instead. It behaves just like an AtomGroup
object, with the difference that the selection expressions are re-evaluated
every time the trajectory frame changes (this happens lazily, only when the
UpdatingAtomGroup
object is accessed so that
there is no redundant updating going on):
# A dynamic selection of corner atoms:
>>> ag_updating = universe.select_atoms("prop x < 5 and prop y < 5 and prop z < 5", updating=True)
>>> ag_updating
<UpdatingAtomGroup with 9 atoms>
>>> universe.trajectory.next()
>>> ag_updating
<UpdatingAtomGroup with 14 atoms>
Using the group
selection keyword for
preexisting-selections, one can
make updating selections depend on
AtomGroup
, or even other
UpdatingAtomGroup
, instances.
Likewise, making an updating selection from an already updating group will
cause later updates to also reflect the updating of the base group:
>>> chained_ag_updating = ag_updating.select_atoms("resid 1:1000", updating=True)
>>> chained_ag_updating
<UpdatingAtomGroup with 3 atoms>
>>> universe.trajectory.next()
>>> chained_ag_updating
<UpdatingAtomGroup with 7 atoms>
Finally, a non-updating selection or a slicing/addition operation made on an
UpdatingAtomGroup
will return a static
AtomGroup
, which will no longer update
across frames:
>>> static_ag = ag_updating.select_atoms("resid 1:1000")
>>> static_ag
<UpdatingAtomGroup with 3 atoms>
>>> universe.trajectory.next()
>>> static_ag
<UpdatingAtomGroup with 3 atoms>
3.3. Ordered selections
select_atoms()
sorts the atoms
in the AtomGroup
by atom index before
returning them (this is to eliminate possible duplicates in the
selection). If the ordering of atoms is crucial (for instance when
describing angles or dihedrals) or if duplicate atoms are required
then one has to concatenate multiple AtomGroups, which does not sort
them.
The most straightforward way to concatentate two AtomGroups is by using the
+
operator:
>>> ordered = u.select_atoms("segid DMPC and resid 3 and name P") + u.select_atoms("segid DMPC and resid 2 and name P")
>>> print(list(ordered))
[< Atom 570: name 'P' of type '180' of resid 'DMPC', 3 and 'DMPC'>,
< Atom 452: name 'P' of type '180' of resid 'DMPC', 2 and 'DMPC'>]
A shortcut is to provide two or more selections to
select_atoms()
, which then
does the concatenation automatically:
>>> print(list(universe.select_atoms("segid DMPC and resid 3 and name P", "segid DMPC and resid 2 and name P")))
[< Atom 570: name 'P' of type '180' of resid 'DMPC', 3 and 'DMPC'>,
< Atom 452: name 'P' of type '180' of resid 'DMPC', 2 and 'DMPC'>]
Just for comparison to show that a single selection string does not work as one might expect:
# WRONG!
>>> print(list(universe.select_atoms("segid DMPC and (resid 3 or resid 2) and name P")))
[< Atom 452: name 'P' of type '180' of resid 'DMPC', 2 and 'DMPC'>,
< Atom 570: name 'P' of type '180' of resid 'DMPC', 3 and 'DMPC'>]