.diann

proteopy.read.diann(diann_output_path, aggr_level, version='1.0.0', **kwargs)

Read a DIA-NN report into an AnnData object.

Parameters:
  • diann_output_path (str | Path) – Path to the DIA-NN output file. TSV for version "1.0.0"; parquet for version "1.9.1".

  • aggr_level (str) –

    Peptide aggregation level. Accepted values (case-insensitive regex match):

    • "Precursor.Id" — one entry per precursor, i.e. per (modified sequence, charge) pair; intensities are not summed across precursors.

    • "Modified.Sequence" — sum precursor quantities per modified peptide sequence.

    • "Stripped.Sequence" — sum precursor quantities per unmodified peptide sequence.

  • version (str, optional) – DIA-NN version string used to select the parsing handler. Floor-matched against supported versions: the handler for the highest supported version that does not exceed the given one is used.

  • **kwargs

    Additional keyword arguments forwarded to the version-specific handler. Common options:

    v1.0.0 handler (_read_diann_v1):

    • precursor_pval_max (float) — maximum Q.Value.

    • gene_pval_max (float) — maximum Protein.Q.Value.

    • global_precursor_pval_max (float) — maximum Global.Q.Value.

    • show_input_stats (bool) — print Q-value distributions and proteotypicity fractions before and after filtering.

    • run_parser (callable | None) — function applied to each Run value to transform sample identifiers.

    • fill_na (float | int | None) — value used to replace NaN entries in the intensity matrix.

    v1.9.1 handler (_read_diann_v1_9_1):

    • max_precursor_q (float | None) — maximum Q.Value.

    • max_protein_q (float | None) — maximum Protein.Q.Value.

    • max_global_precursor_q (float | None) — maximum Global.Q.Value.

    • normalized (bool) — use Precursor.Normalised instead of Precursor.Quantity as the intensity column.

    • run_parser (callable | None) — function applied to each Run value to transform sample identifiers.

    • fill_na (float | int | None) — value used to replace NaN entries in the intensity matrix.

    • zero_to_na (bool) — replace zeros with np.nan before returning. Mutually exclusive with fill_na.

    • verbose (bool) — print row counts at each filtering step.
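
The floor-matching of version can be pictured as picking the highest supported version that does not exceed the requested one. The sketch below is an assumption about that behaviour, not proteopy's actual code: SUPPORTED_VERSIONS, parse_version, and floor_match are hypothetical names, and the supported list simply mirrors the two handlers documented here.

```python
# Illustrative sketch only: names and logic are assumptions, not proteopy internals.
SUPPORTED_VERSIONS = ("1.0.0", "1.9.1")  # the two handlers documented on this page


def parse_version(version: str) -> tuple:
    """Turn "1.9.1" into (1, 9, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def floor_match(version: str, supported=SUPPORTED_VERSIONS) -> str:
    """Return the highest supported version that is <= the requested one."""
    requested = parse_version(version)
    candidates = [v for v in supported if parse_version(v) <= requested]
    if not candidates:
        raise ValueError(
            f"Version {version!r} is below the minimum supported version "
            f"{min(supported, key=parse_version)!r}."
        )
    return max(candidates, key=parse_version)


print(floor_match("1.8.2"))  # 1.0.0
print(floor_match("1.9.1"))  # 1.9.1
print(floor_match("2.0"))    # 1.9.1
```

Under this reading, a report from any release between 1.0.0 and 1.9.1 would be parsed with the v1.0.0 handler, so the keyword arguments listed for that handler would apply.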

Returns:

AnnData with shape (n_samples, n_peptides). Observations (.obs) contain sample_id; variables (.var) contain peptide_id and protein_id.

Return type:

ad.AnnData
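
To picture how a long-format report maps onto the returned (n_samples, n_peptides) matrix, the following dependency-free sketch sums precursor quantities per run and per aggregation key, as described for aggr_level="Stripped.Sequence". The toy rows and the helper aggregate are invented for illustration. Filling missing sample/peptide combinations with NaN is an assumption, and the real reader returns an AnnData object rather than nested lists.

```python
from collections import defaultdict

# Toy long-format rows (invented values), one per precursor and run.
rows = [
    {"Run": "s1", "Precursor.Id": "PEPTIDEK2", "Stripped.Sequence": "PEPTIDEK", "Precursor.Quantity": 100.0},
    {"Run": "s1", "Precursor.Id": "PEPTIDEK3", "Stripped.Sequence": "PEPTIDEK", "Precursor.Quantity": 50.0},
    {"Run": "s1", "Precursor.Id": "AAAK2", "Stripped.Sequence": "AAAK", "Precursor.Quantity": 20.0},
    {"Run": "s2", "Precursor.Id": "PEPTIDEK2", "Stripped.Sequence": "PEPTIDEK", "Precursor.Quantity": 80.0},
]


def aggregate(rows, level):
    """Sum quantities per (run, level) and return samples, peptides, matrix."""
    sums = defaultdict(float)
    for row in rows:
        sums[(row["Run"], row[level])] += row["Precursor.Quantity"]
    samples = sorted({run for run, _ in sums})
    peptides = sorted({pep for _, pep in sums})
    # Assumption: combinations absent from the report become NaN.
    matrix = [
        [sums.get((s, p), float("nan")) for p in peptides]
        for s in samples
    ]
    return samples, peptides, matrix


samples, peptides, matrix = aggregate(rows, "Stripped.Sequence")
print(samples)   # ['s1', 's2']
print(peptides)  # ['AAAK', 'PEPTIDEK']
print(matrix)    # [[20.0, 150.0], [nan, 80.0]]
```

Calling aggregate(rows, "Precursor.Id") on the same rows keeps the two PEPTIDEK charge states as separate columns, which is the difference between the precursor and sequence levels.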

Raises:
  • ValueError – If version is below the minimum supported version.

  • ValueError – If aggr_level does not match any recognised pattern.

  • ValueError – If required columns are absent from the input file (v1.0.0).

  • ValueError – If no rows remain after Q-value and proteotypicity filtering.

  • NotImplementedError – If a protein-level aggr_level is requested for DIA-NN >= 1.9.1.
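
The ValueError raised when no rows survive filtering can be pictured with a small sketch. The helper filter_by_q and the toy rows are hypothetical, and the inclusive comparison (Q.Value <= threshold) is an assumption. The source only states that the thresholds are maxima, and the proteotypicity filter is left out here.

```python
def filter_by_q(rows, max_precursor_q=None):
    """Keep rows whose Q.Value is at most the threshold; None disables the filter."""
    if max_precursor_q is None:
        return list(rows)
    # Assumption: the threshold is inclusive.
    kept = [row for row in rows if row["Q.Value"] <= max_precursor_q]
    if not kept:
        raise ValueError("No rows remain after Q-value filtering.")
    return kept


# Invented example rows.
rows = [
    {"Precursor.Id": "PEPTIDEK2", "Q.Value": 0.002},
    {"Precursor.Id": "AAAK2", "Q.Value": 0.03},
]
print(len(filter_by_q(rows, max_precursor_q=0.01)))  # 1
```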

Examples

Read a DIA-NN v1.0.0 TSV report at stripped-sequence level:

>>> import proteopy as pr
>>> adata = pr.read.diann(
...     "report.tsv",
...     aggr_level="Stripped.Sequence",
...     version="1.0.0",
...     precursor_pval_max=0.01,
...     gene_pval_max=0.01,
...     global_precursor_pval_max=0.01,
... )

Read a DIA-NN v1.9.1 parquet report at precursor level with a custom run-name parser:

>>> import proteopy as pr
>>> adata = pr.read.diann(
...     "report.parquet",
...     aggr_level="Precursor.Id",
...     version="1.9.1",
...     max_precursor_q=0.01,
...     run_parser=lambda s: s.split("/")[-1].split(".")[0],
...     verbose=True,
... )
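
The lambda above keeps the text between the last "/" and the first "." of each Run value. If run names may contain Windows-style separators or extra dots, a named function can be easier to test than a lambda. The helper below, parse_run_name, is a hypothetical example and not part of proteopy; it uses only the standard library.

```python
from pathlib import PureWindowsPath


def parse_run_name(run: str) -> str:
    """Return the file name of a run path without its final extension."""
    # PureWindowsPath accepts both "\\" and "/" as separators,
    # so it handles paths written on either platform.
    return PureWindowsPath(run).stem


print(parse_run_name("D:\\data\\sample_01.raw"))      # sample_01
print(parse_run_name("/mnt/data/sample_02.v2.mzML"))  # sample_02.v2
print(parse_run_name("sample_03"))                    # sample_03
```

Unlike the lambda, this version strips only the final extension, so "sample_02.v2.mzML" becomes "sample_02.v2" rather than "sample_02". It could then be passed as run_parser=parse_run_name.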