.contaminants
- proteopy.download.contaminants(source='frankenfield2022', path=None, force=False, verbose=False)
Download a contaminant FASTA file from a supported source.
Fetches a protein contaminant database in FASTA format and writes it to disk. Two sources are supported:
"frankenfield2022": Universal contaminant library for DDA and DIA proteomics from Frankenfield et al. [1]. Headers are reformatted to the standard db|accession|id notation, and Cont_ prefixes are stripped from accession numbers. See database description.
"gpm_crap": The GPM common Repository of Adventitious Proteins (cRAP) [2], a community-curated list of common laboratory contaminants. See database description.
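The header normalization for the "frankenfield2022" source can be sketched as follows. This is a minimal illustration, not proteopy's implementation: the helper name reformat_header, the assumed input layout (a pipe-delimited identifier whose accession field may carry a Cont_ prefix, followed by an optional free-text description), and the optional "-N" isoform suffix are all assumptions. The core accession check uses UniProt's published accession-number pattern.

```python
import re

# UniProt's published accession pattern (6- and 10-character forms).
# Assumption: an optional "-N" isoform suffix is also accepted here.
UNIPROT_ACC = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})(?:-\d+)?$"
)


def reformat_header(header: str, prefix: str = "Cont_") -> str:
    """Hypothetical helper: normalize one header to '>db|accession|id ...'."""
    text = header[1:] if header.startswith(">") else header
    # Split off the free-text description after the first space.
    ident, _, description = text.partition(" ")
    fields = ident.split("|")
    if len(fields) != 3:
        raise ValueError(
            f"Expected 3 pipe-separated fields, got {len(fields)}: {header!r}"
        )
    db, accession, entry_id = fields
    # Strip the contaminant prefix, e.g. "Cont_P02768" -> "P02768".
    accession = accession.removeprefix(prefix)
    if not UNIPROT_ACC.match(accession):
        raise ValueError(f"Invalid UniProt accession {accession!r} in {header!r}")
    out = f">{db}|{accession}|{entry_id}"
    return f"{out} {description}" if description else out
```

For example, reformat_header(">sp|Cont_P02768|ALBU_HUMAN Serum albumin") returns ">sp|P02768|ALBU_HUMAN Serum albumin", while a header with only two pipe-separated fields raises ValueError.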
- Parameters:
source (str, optional) – Identifier of the contaminant database to download. Supported values: "frankenfield2022", "gpm_crap".
path (str | Path | None, optional) – Destination path for the downloaded FASTA file. When None, the file is written to the current working directory under a default file name, with an MD5 digest appended to the stem for reproducible identification.
force (bool, optional) – If True, overwrite an existing file at the resolved destination path.
verbose (bool, optional) – Print the download URL, formatting status, and final save path to stdout.
- Returns:
Absolute path to the written FASTA file.
- Return type:
Path
- Raises:
ValueError – If source is not one of the supported source keys, or if a FASTA header from the "frankenfield2022" source does not contain exactly three pipe-separated fields or carries an invalid UniProt accession number.
FileExistsError – If a file already exists at the resolved destination and force is False.
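The path and force semantics documented above (a default file name with an MD5 digest in the stem, an absolute return path, and FileExistsError unless force is True) can be sketched with the standard library. The helper name save_fasta, the default stem "contaminants", the "_" separator, hashing the downloaded bytes, and creating missing parent directories are all assumptions for illustration; proteopy's actual default file name may differ.

```python
import hashlib
from pathlib import Path


def save_fasta(content: bytes, path=None, stem="contaminants", force=False) -> Path:
    """Hypothetical helper mirroring the documented path/force behavior."""
    if path is None:
        # Assumption: the MD5 digest is computed over the downloaded bytes
        # and appended to a hypothetical default stem with "_".
        digest = hashlib.md5(content).hexdigest()
        dest = Path.cwd() / f"{stem}_{digest}.fasta"
    else:
        dest = Path(path)
    # The documented return value is an absolute path.
    dest = dest.resolve()
    if dest.exists() and not force:
        raise FileExistsError(f"{dest} already exists; pass force=True to overwrite")
    # Assumption: missing parent directories are created.
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(content)
    return dest
```

Under this assumption the digest depends only on the file contents, so downloading an unchanged database again resolves to the same default name, and the existing-file check then requires force=True to rewrite it.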
Examples
Download the Frankenfield 2022 library to the default path:
>>> import proteopy as pr
>>> path = pr.download.contaminants()
Download the GPM cRAP database to the default path:
>>> path = pr.download.contaminants(source="gpm_crap")
Save to a specific location:
>>> path = pr.download.contaminants(
...     source="frankenfield2022",
...     path="my_project/contaminants.fasta",
... )
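Because the return value is a plain Path to a FASTA file, downstream steps can read it with ordinary Python. The sketch below collects accession numbers from headers in db|accession|id form, for example to build a set of contaminant identifiers. The helper name fasta_accessions is hypothetical, and it assumes that header layout; headers with a different structure (which this page does not rule out for "gpm_crap") are simply skipped.

```python
from pathlib import Path


def fasta_accessions(path) -> set[str]:
    """Hypothetical helper: collect the accession field of '>db|accession|id' headers."""
    accessions = set()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # sequence line
            parts = line[1:].split(maxsplit=1)
            if not parts:
                continue  # empty header
            fields = parts[0].split("|")
            if len(fields) == 3:
                accessions.add(fields[1])
    return accessions
```

Typical use would be accessions = fasta_accessions(path) directly after one of the calls above.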
References