9.2. PDB Fetchers — MDAnalysis.fetch.pdb

This suite of functions downloads structure files from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB).

9.2.1. Variables

MDAnalysis.fetch.pdb.DEFAULT_CACHE_NAME_DOWNLOADER = 'MDAnalysis_pdbs'

Name of the pooch cache directory pooch.os_cache(DEFAULT_CACHE_NAME_DOWNLOADER); see pooch.os_cache() for further details.

Added in version 2.11.0.

9.2.2. Functions

MDAnalysis.fetch.pdb.from_PDB(pdb_ids, cache_path=None, progressbar=False, file_format='cif.gz')

Download one or more PDB files from the RCSB Protein Data Bank and cache them locally.

Given one or more PDB IDs, downloads the corresponding structure files in the requested file format and stores them in a local cache directory. If a file is already cached on disk, from_PDB skips the download and uses the cached version instead.

Returns the path(s) to the downloaded file(s) as pathlib.Path object(s).
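The cache-or-download behavior described above can be sketched in plain Python. This is not the MDAnalysis implementation, which delegates caching to pooch; it is a minimal illustration assuming a hypothetical fetch_cached helper and an offline fake_download function that stands in for the HTTPS request, so it runs without network access.

```python
import tempfile
from pathlib import Path


def fetch_cached(pdb_id, file_format, cache_dir, download):
    """Hypothetical helper: return the cached file, downloading only on a miss.

    `download` is a stand-in callable (pdb_id, file_format) -> bytes that
    replaces the real HTTPS request in this sketch.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{pdb_id}.{file_format}"
    if not target.exists():
        # Cache miss: fetch the data once and store it on disk.
        target.write_bytes(download(pdb_id, file_format))
    # Cache hit, or a freshly written file: return the local path.
    return target


calls = []


def fake_download(pdb_id, file_format):
    # Offline stand-in for the network request; records every call it receives.
    calls.append((pdb_id, file_format))
    return b"data_" + pdb_id.encode()


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_cached("1AKE", "cif", tmp, fake_download)
    second = fetch_cached("1AKE", "cif", tmp, fake_download)
    print(first == second, len(calls))  # True 1
```

The second call finds the file on disk and never invokes the downloader, which is why only one call is recorded.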

Parameters:
  • pdb_ids (str or sequence of str) – A single PDB ID as a string, or a sequence of PDB IDs to fetch.

  • cache_path (str or pathlib.Path) – Directory where downloaded file(s) will be cached. The default, None, uses the pooch default cache directory with the project name DEFAULT_CACHE_NAME_DOWNLOADER.

  • file_format (str) – The file extension/format to download (e.g., “cif”, “pdb”). Default is “cif.gz”. See the Notes section below for a list of all supported file formats.

  • progressbar (bool) – If True, display a progress bar during file downloads. Default is False.

Returns:

The path(s) to the downloaded file(s). Returns a single Path if a single PDB ID is given, or a list of Path objects if multiple PDB IDs are provided.

Return type:

Path or list of Path
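The return shape depends on how pdb_ids is passed. A minimal sketch of producing that shape, assuming the distinction is made on whether pdb_ids is a string; resolve_paths is a hypothetical helper that only builds paths and downloads nothing.

```python
from pathlib import Path


def resolve_paths(pdb_ids, cache_dir, file_format="cif.gz"):
    """Hypothetical helper mirroring the documented return shape.

    A single ID given as a string yields one Path; a sequence of IDs yields
    a list of Path objects in the same order.
    """
    # A str is itself a sequence of characters, so test for it first.
    single = isinstance(pdb_ids, str)
    ids = [pdb_ids] if single else list(pdb_ids)
    paths = [Path(cache_dir) / f"{pdb_id}.{file_format}" for pdb_id in ids]
    return paths[0] if single else paths


print(resolve_paths("1AKE", "MDAnalysis_pdbs"))
print(resolve_paths(["1AKE", "4BWZ"], "MDAnalysis_pdbs", "pdb.gz"))
```

Checking for str before treating the input as a sequence matters: without it, "1AKE" would be iterated as four one-letter IDs.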

Raises:

Notes

This function uses the RCSB File Download Services for directly downloading structure files via https.

The RCSB currently provides data in the 'cif', 'cif.gz', 'bcif', 'bcif.gz', 'xml', 'xml.gz', 'pdb', 'pdb.gz', 'pdb1', and 'pdb1.gz' file formats, all of which can be downloaded with this function. Not all of these formats can currently be read by MDAnalysis.
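To illustrate how a file_format value maps onto a download, the sketch below checks it against the formats listed above and builds a URL. The URL pattern https://files.rcsb.org/download/<ID>.<format> is an assumption about the RCSB File Download Services, build_download_url is a hypothetical helper rather than part of MDAnalysis, and raising ValueError is this sketch's own choice.

```python
# Formats listed in the Notes above.
SUPPORTED_FORMATS = (
    "cif", "cif.gz", "bcif", "bcif.gz", "xml", "xml.gz",
    "pdb", "pdb.gz", "pdb1", "pdb1.gz",
)

# Assumed URL pattern for RCSB file downloads; verify against the RCSB docs.
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.{file_format}"


def build_download_url(pdb_id, file_format="cif.gz"):
    """Hypothetical helper: validate file_format, then format the URL."""
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported file_format {file_format!r}; "
            f"choose one of {', '.join(SUPPORTED_FORMATS)}"
        )
    return RCSB_DOWNLOAD_URL.format(pdb_id=pdb_id, file_format=file_format)


print(build_download_url("1AKE"))  # https://files.rcsb.org/download/1AKE.cif.gz
```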

Caching, controlled by the cache_path parameter, is handled internally by pooch. The default cache name is taken from DEFAULT_CACHE_NAME_DOWNLOADER. To clear the cache (and thereby force files to be re-fetched), delete the cache folder specified by cache_path.
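Clearing the cache amounts to removing that directory. A minimal sketch using shutil.rmtree, demonstrated on a temporary directory rather than the real cache; clear_cache is a hypothetical helper. For the default location, the directory to pass would be pooch.os_cache(DEFAULT_CACHE_NAME_DOWNLOADER). Deletion is irreversible, so double-check the path before running it on real data.

```python
import shutil
import tempfile
from pathlib import Path


def clear_cache(cache_path):
    """Hypothetical helper: delete the whole cache directory.

    Returns True if a directory was removed, False if there was nothing to delete.
    """
    cache_path = Path(cache_path)
    if not cache_path.is_dir():
        return False
    shutil.rmtree(cache_path)
    return True


# Demonstrate on a throwaway directory standing in for the real cache.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "MDAnalysis_pdbs"
    cache.mkdir()
    (cache / "1AKE.cif.gz").write_bytes(b"placeholder")
    print(clear_cache(cache), cache.exists())  # True False
    print(clear_cache(cache))  # False
```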

Examples

Download a single PDB file:

>>> import MDAnalysis as mda
>>> mda.fetch.from_PDB("1AKE", file_format="cif")
'./MDAnalysis_pdbs/1AKE.cif'

Download multiple PDB files with a progress bar:

>>> mda.fetch.from_PDB(["1AKE", "4BWZ"], progressbar=True, file_format="pdb.gz")
['./MDAnalysis_pdbs/1AKE.pdb.gz', './MDAnalysis_pdbs/4BWZ.pdb.gz']

Download a single PDB file and load it into a Universe:

>>> mda.Universe(mda.fetch.from_PDB("1AKE", file_format="pdb.gz"))
<Universe with 3816 atoms>

Download multiple PDB files and load each of them into a Universe:

>>> [mda.Universe(pdb) for pdb in mda.fetch.from_PDB(["1AKE", "4BWZ"], progressbar=True, file_format="pdb.gz")]
[<Universe with 3816 atoms>, <Universe with 2824 atoms>]

Added in version 2.11.0.