.contaminants

proteopy.download.contaminants(source='frankenfield2022', path=None, force=False, verbose=False)

Download a contaminant FASTA file from a supported source.

Fetches a protein contaminant database in FASTA format and writes it to disk. Two sources are supported:

  • "frankenfield2022": Universal contaminant library for DDA and DIA proteomics from Frankenfield et al. [1]. Headers are reformatted to the standard db|accession|id notation, and Cont_ prefixes are stripped from accession numbers. See database description.

  • "gpm_crap": The GPM common Repository of Adventitious Proteins (cRAP) [2], a community-curated list of common laboratory contaminants. See database description.
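The header reformatting described for "frankenfield2022" can be pictured with the minimal sketch below. It is not the library's implementation: the helper name normalize_header and the assumed input shape (">db|Cont_ACCESSION|id description") are hypothetical, and the accession check uses the regular expression published by UniProt for its accession numbers.

```python
import re

# UniProt's published accession pattern (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def normalize_header(header: str) -> str:
    """Hypothetical helper: rewrite one FASTA header as '>db|accession|id ...'.

    Assumes the raw header looks like '>sp|Cont_P00761|TRYP_PIG description'.
    """
    body = header.lstrip(">")
    # The identifier is everything before the first space; the rest is free text.
    identifier, _, description = body.partition(" ")
    fields = identifier.split("|")
    if len(fields) != 3:
        raise ValueError(f"expected 3 pipe-separated fields, got {len(fields)}: {header!r}")
    db, accession, entry_id = fields
    # Strip the contaminant marker from the accession, if present.
    if accession.startswith("Cont_"):
        accession = accession[len("Cont_"):]
    if not UNIPROT_ACCESSION.fullmatch(accession):
        raise ValueError(f"invalid UniProt accession {accession!r} in {header!r}")
    rebuilt = ">" + "|".join((db, accession, entry_id))
    return f"{rebuilt} {description}" if description else rebuilt
```

Both failure modes in this sketch raise ValueError, mirroring the conditions listed under Raises.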

Parameters:
  • source (str, optional) – Identifier of the contaminant database to download. Supported values: "frankenfield2022", "gpm_crap".

  • path (str | Path | None, optional) – Destination path for the downloaded FASTA file. When None, the file is written to the current working directory under a default name, with an MD5 digest appended to the stem for reproducible identification.

  • force (bool, optional) – If True, overwrite an existing file at the resolved destination path.

  • verbose (bool, optional) – If True, print the download URL, formatting status, and final save path to stdout.
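To illustrate how a default file name with an MD5 digest in the stem might be built when path is None, here is a rough sketch. The helper name default_destination, the "contaminants" stem, the underscore separator, and the choice to hash the file contents are all assumptions for illustration; the actual naming scheme used by proteopy may differ.

```python
import hashlib
from pathlib import Path


def default_destination(content: bytes, stem: str = "contaminants") -> Path:
    """Hypothetical helper: build '<cwd>/<stem>_<md5>.fasta' from file contents."""
    # Assumption: the digest is computed over the bytes that will be written,
    # so identical database contents always map to the same file name.
    digest = hashlib.md5(content).hexdigest()
    return Path.cwd() / f"{stem}_{digest}.fasta"
```

Tying the name to a digest means two downloads with identical contents resolve to the same path, while a changed upstream database yields a new file name.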

Returns:

Absolute path to the written FASTA file.

Return type:

Path

Raises:
  • ValueError – If source is not one of the supported source keys, or if a FASTA header from the "frankenfield2022" source does not contain exactly three pipe-separated fields or carries an invalid UniProt accession number.

  • FileExistsError – If a file already exists at the resolved destination and force is False.
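Because an existing destination raises FileExistsError unless force is True, callers that want to reuse a previous download can catch the exception. The sketch below shows one possible pattern; fetch_or_reuse is a hypothetical wrapper, not part of proteopy, and it takes the download function as an argument so that it only relies on the documented path and force parameters.

```python
from pathlib import Path
from typing import Callable, Union


def fetch_or_reuse(
    download: Callable[..., Path],
    path: Union[str, Path],
    **kwargs,
) -> Path:
    """Hypothetical wrapper: download to `path`, or reuse the file already there."""
    try:
        # force=False keeps an existing file untouched and raises instead.
        return download(path=path, force=False, **kwargs)
    except FileExistsError:
        # An explicit path was given, so the existing file's location is known.
        return Path(path).resolve()
```

With proteopy imported as pr, this could be called as fetch_or_reuse(pr.download.contaminants, "my_project/contaminants.fasta", source="gpm_crap"). A ValueError from an unsupported source is deliberately left to propagate.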

Examples

Download the Frankenfield 2022 library to the default path:

>>> import proteopy as pr
>>> path = pr.download.contaminants()

Download the GPM cRAP database to the default path:

>>> path = pr.download.contaminants(source="gpm_crap")

Save to a specific location:

>>> path = pr.download.contaminants(
...     source="frankenfield2022",
...     path="my_project/contaminants.fasta",
... )

References