9.2. PDB Fetchers — MDAnalysis.fetch.pdb
This suite of functions downloads structure files from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB).
9.2.1. Variables
- MDAnalysis.fetch.pdb.DEFAULT_CACHE_NAME_DOWNLOADER = 'MDAnalysis_pdbs'
Name of the pooch cache directory, pooch.os_cache(DEFAULT_CACHE_NAME_DOWNLOADER); see pooch.os_cache() for further details.
Added in version 2.11.0.
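To illustrate where this directory ends up on a given machine, the sketch below resolves the location with `pooch.os_cache()`. It assumes `pooch` is installed. The helper name `default_pdb_cache` is hypothetical, and the constant's value is copied from the entry above rather than imported from MDAnalysis.

```python
from pathlib import Path

# Value copied from the variable entry above; hard-coded so this sketch
# does not require MDAnalysis to be importable.
DEFAULT_CACHE_NAME_DOWNLOADER = "MDAnalysis_pdbs"


def default_pdb_cache(project=DEFAULT_CACHE_NAME_DOWNLOADER):
    """Return the OS-specific cache directory pooch selects for `project`.

    Hypothetical helper: it only wraps pooch.os_cache() and does not
    create the directory.
    """
    import pooch  # imported lazily so the module loads without pooch

    return Path(pooch.os_cache(project))


if __name__ == "__main__":
    # The exact location depends on the operating system and user account.
    print(default_pdb_cache())
```

The printed path differs between platforms, so it is best treated as something to look up at run time rather than hard-code.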
9.2.2. Functions
- MDAnalysis.fetch.pdb.from_PDB(pdb_ids, cache_path=None, progressbar=False, file_format='cif.gz')
Download one or more PDB files from the RCSB Protein Data Bank and cache them locally.
Given one or more PDB IDs, downloads the corresponding structure files in the requested format and stores them in a local cache directory. If a file is already cached on disk, from_PDB skips the download and uses the cached version instead.
Returns the path(s) to the downloaded file(s) as Path object(s).
- Parameters:
pdb_ids (str or sequence of str) – A single PDB ID as a string, or a sequence of PDB IDs to fetch.
cache_path (str or pathlib.Path) – Directory where downloaded file(s) will be cached. The default None argument uses the pooch default cache with project name DEFAULT_CACHE_NAME_DOWNLOADER.
file_format (str) – The file extension/format to download (e.g., “cif”, “pdb”). Default is “cif.gz”. See the Notes section below for a list of all supported file formats.
progressbar (bool) – If True, display a progress bar during file downloads. Default is False.
- Returns:
The path(s) to the downloaded file(s). Returns a single Path if a single PDB ID is given, or a list of Path if multiple PDB IDs are provided.
- Return type:
pathlib.Path or list of pathlib.Path
- Raises:
ValueError – For an invalid file format. Supported file formats are under Notes.
requests.exceptions.HTTPError – If an invalid PDB code is specified.
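The two documented exceptions can be handled together. The following is a minimal sketch and not part of MDAnalysis: the helper name `try_fetch` and its `fetch` argument (an injection point that lets the logic run without network access) are hypothetical. It relies on `requests.exceptions.HTTPError` deriving from `OSError`, so `requests` does not need to be imported directly.

```python
def try_fetch(pdb_id, file_format="cif.gz", fetch=None):
    """Return (path, None) on success or (None, message) on a documented error.

    `fetch` is a hypothetical hook for testing; when omitted, the real
    mda.fetch.from_PDB is used.
    """
    if fetch is None:
        import MDAnalysis as mda  # lazy import: only needed for real downloads

        fetch = mda.fetch.from_PDB
    try:
        return fetch(pdb_id, file_format=file_format), None
    except ValueError as err:
        # Documented as raised for an invalid file format.
        return None, f"unsupported file format {file_format!r}: {err}"
    except OSError as err:
        # requests.exceptions.HTTPError subclasses OSError (through
        # RequestException). This clause also catches other I/O failures,
        # such as connection errors.
        return None, f"could not download {pdb_id!r}: {err}"


if __name__ == "__main__":
    path, problem = try_fetch("1AKE", file_format="pdb.gz")
    print(path if problem is None else problem)
```

Returning a message instead of re-raising is only one possible design. Code that must distinguish a bad ID from a network outage would need to inspect the exception more closely.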
Notes
This function uses the RCSB File Download Services for directly downloading structure files via https.
The RCSB currently provides data in the 'cif', 'cif.gz', 'bcif', 'bcif.gz', 'xml', 'xml.gz', 'pdb', 'pdb.gz', 'pdb1', and 'pdb1.gz' file formats, all of which can therefore be downloaded. Not all of these formats can currently be read with MDAnalysis.
Caching, controlled by the cache_path parameter, is handled internally by pooch. The default cache name is taken from DEFAULT_CACHE_NAME_DOWNLOADER. To clear the cache (and thereby force re-fetching), delete the cache folder specified by cache_path.
Examples
Download a single PDB file:
>>> import MDAnalysis as mda
>>> mda.fetch.from_PDB("1AKE", file_format="cif")
'./MDAnalysis_pdbs/1AKE.cif'
Download multiple PDB files with a progress bar:
>>> mda.fetch.from_PDB(["1AKE", "4BWZ"], progressbar=True, file_format="pdb.gz")
['./MDAnalysis_pdbs/1AKE.pdb.gz', './MDAnalysis_pdbs/4BWZ.pdb.gz']
Download a single PDB file and convert it to a universe:
>>> mda.Universe(mda.fetch.from_PDB("1AKE", file_format="pdb.gz"))
<Universe with 3816 atoms>
Download multiple PDB files and convert each of them into a universe:
>>> [mda.Universe(pdb) for pdb in mda.fetch.from_PDB(["1AKE", "4BWZ"], progressbar=True, file_format="pdb.gz")]
[<Universe with 3816 atoms>, <Universe with 2824 atoms>]
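Clear the cache to force a fresh download. As the Notes state, this means deleting the cache folder. The sketch below uses only the standard library, and the helper name `clear_pdb_cache` is hypothetical. It expects the same directory that was passed as cache_path; for the default cache, that is the directory returned by pooch.os_cache(DEFAULT_CACHE_NAME_DOWNLOADER).

```python
import shutil
from pathlib import Path


def clear_pdb_cache(cache_path):
    """Delete the cache folder so the next from_PDB call downloads again.

    Hypothetical helper. Returns True if a directory was removed and
    False if there was nothing to delete.
    """
    cache_dir = Path(cache_path)
    if not cache_dir.is_dir():
        return False
    # Removes the folder and every cached structure file inside it.
    shutil.rmtree(cache_dir)
    return True


if __name__ == "__main__":
    # "./MDAnalysis_pdbs" mirrors the paths shown in the examples above.
    print(clear_pdb_cache("./MDAnalysis_pdbs"))
```

Because the whole folder is removed, every previously cached structure is downloaded again on its next request, not just a single entry.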
Added in version 2.11.0.