`.diann`

proteopy.read.diann(diann_output_path, aggr_level, version='1.0.0', **kwargs)

Read a DIA-NN report into an AnnData object.

Parameters:

diann_output_path (str | Path) – Path to the DIA-NN output file. TSV for version "1.0.0"; parquet for version "1.9.1".
aggr_level (str) –
Peptide aggregation level. Accepted values (case-insensitive regex match):
- "Precursor.Id" — one entry per precursor (a modified sequence plus its charge state); intensities are not summed across precursors.
- "Modified.Sequence" — sum precursor quantities per modified peptide sequence.
- "Stripped.Sequence" — sum precursor quantities per unmodified peptide sequence.
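version (str, optional) – DIA-NN version string used to select the parsing handler. Defaults to "1.0.0". The value is floor-matched against the supported versions, i.e. resolved to the highest supported version that does not exceed it.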
**kwargs –
Additional keyword arguments forwarded to the version-specific handler. Common options:

v1.0.0 handler (_read_diann_v1):
- precursor_pval_max (float) — maximum Q.Value.
- gene_pval_max (float) — maximum Protein.Q.Value.
- global_precursor_pval_max (float) — maximum Global.Q.Value.
- show_input_stats (bool) — print Q-value distributions and proteotypicity fractions before and after filtering.
- run_parser (callable | None) — function applied to each Run value to transform sample identifiers.
- fill_na (float | int | None) — value used to replace NaN entries in the intensity matrix.
v1.9.1 handler (_read_diann_v1_9_1):
- max_precursor_q (float | None) — maximum Q.Value.
- max_protein_q (float | None) — maximum Protein.Q.Value.
- max_global_precursor_q (float | None) — maximum Global.Q.Value.
- normalized (bool) — use Precursor.Normalised instead of Precursor.Quantity as the intensity column.
- run_parser (callable | None) — function applied to each Run value to transform sample identifiers.
- fill_na (float | int | None) — value used to replace NaN entries in the intensity matrix.
- zero_to_na (bool) — replace zeros with np.nan before returning. Mutually exclusive with fill_na.
- verbose (bool) — print row counts at each filtering step.
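
The floor-matching of version described above can be pictured with a small standalone sketch. It is not the library's code: the helper names parse_version and floor_match_version are hypothetical, and the only supported versions assumed are the two named on this page.

```python
# Hypothetical illustration of floor-matching; not proteopy's implementation.
SUPPORTED_VERSIONS = ("1.0.0", "1.9.1")  # the two handlers named on this page


def parse_version(version):
    """Turn "1.9.1" into (1, 9, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def floor_match_version(requested, supported=SUPPORTED_VERSIONS):
    """Return the highest supported version that does not exceed `requested`."""
    wanted = parse_version(requested)
    candidates = [v for v in supported if parse_version(v) <= wanted]
    if not candidates:
        # Mirrors the documented ValueError for versions below the minimum.
        raise ValueError(f"version {requested!r} is below the minimum supported version")
    return max(candidates, key=parse_version)


selected = floor_match_version("1.8.1")  # falls back to the "1.0.0" handler
```

Under this reading, any version from "1.0.0" up to but not including "1.9.1" would select the v1.0.0 handler, and "1.9.1" or later would select the v1.9.1 handler.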

Returns:

AnnData with shape (n_samples, n_peptides). Observations (.obs) contain sample_id; variables (.var) contain peptide_id, protein_id.

Return type:

ad.AnnData

Raises:

ValueError – If version is below the minimum supported version.
ValueError – If aggr_level does not match any recognised pattern.
ValueError – If required columns are absent from the input file (v1.0.0).
ValueError – If no rows remain after Q-value and proteotypicity filtering.
NotImplementedError – If a protein-level aggr_level is requested for DIA-NN >= 1.9.1.
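
As a rough picture of the Q-value filtering behind the "no rows remain" error, the sketch below filters plain dictionaries instead of a real report. The function filter_by_q is hypothetical, and the inclusive comparison (<=) is an assumption; the page does not state whether the thresholds are inclusive.

```python
# Hypothetical sketch of threshold filtering; not proteopy's implementation.
def filter_by_q(rows, max_precursor_q=None, max_protein_q=None,
                max_global_precursor_q=None):
    """Keep rows whose Q-values are within every threshold that is set."""
    limits = {
        "Q.Value": max_precursor_q,
        "Protein.Q.Value": max_protein_q,
        "Global.Q.Value": max_global_precursor_q,
    }
    kept = [
        row for row in rows
        # A limit of None disables that filter; "<=" is an assumption.
        if all(limit is None or row[column] <= limit
               for column, limit in limits.items())
    ]
    if not kept:
        raise ValueError("No rows remain after Q-value filtering.")
    return kept


# Made-up report rows for illustration only.
report = [
    {"Precursor.Id": "PEPTIDEK2", "Q.Value": 0.004,
     "Protein.Q.Value": 0.008, "Global.Q.Value": 0.002},
    {"Precursor.Id": "ANOTHERK2", "Q.Value": 0.030,
     "Protein.Q.Value": 0.005, "Global.Q.Value": 0.020},
]
kept = filter_by_q(report, max_precursor_q=0.01)  # only the first row passes
```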

Examples

Read a DIA-NN v1.0.0 TSV report at stripped-sequence level:

>>> import proteopy as pr
>>> adata = pr.read.diann(
...     "report.tsv",
...     aggr_level="Stripped.Sequence",
...     version="1.0.0",
...     precursor_pval_max=0.01,
...     gene_pval_max=0.01,
...     global_precursor_pval_max=0.01,
... )
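
To make the effect of aggr_level concrete, here is a minimal standalone sketch of summing precursor quantities per run and per peptide key. It uses made-up rows and plain Python rather than the reader itself; the aggregate helper is hypothetical and is not how proteopy is implemented.

```python
from collections import defaultdict

# Made-up long-format rows using DIA-NN-style column names.
rows = [
    {"Run": "s1", "Stripped.Sequence": "PEPTIDEK",
     "Precursor.Id": "PEPTIDEK2", "Precursor.Quantity": 100.0},
    {"Run": "s1", "Stripped.Sequence": "PEPTIDEK",
     "Precursor.Id": "PEPTIDEK3", "Precursor.Quantity": 50.0},
    {"Run": "s2", "Stripped.Sequence": "PEPTIDEK",
     "Precursor.Id": "PEPTIDEK2", "Precursor.Quantity": 80.0},
    {"Run": "s1", "Stripped.Sequence": "ANOTHERK",
     "Precursor.Id": "ANOTHERK2", "Precursor.Quantity": 10.0},
]


def aggregate(rows, level):
    """Sum Precursor.Quantity per (run, level) and build a samples x peptides matrix."""
    sums = defaultdict(float)
    for row in rows:
        sums[(row["Run"], row[level])] += row["Precursor.Quantity"]
    samples = sorted({run for run, _ in sums})
    peptides = sorted({peptide for _, peptide in sums})
    # Combinations that were never observed are left as NaN.
    matrix = [[sums.get((s, p), float("nan")) for p in peptides] for s in samples]
    return samples, peptides, matrix


samples, peptides, matrix = aggregate(rows, "Stripped.Sequence")
```

With level set to "Stripped.Sequence" the two charge states of PEPTIDEK in run s1 collapse into one value; with "Precursor.Id" they would stay as separate columns.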

Read a DIA-NN v1.9.1 parquet report at precursor level with a custom run-name parser:

>>> import proteopy as pr
>>> adata = pr.read.diann(
...     "report.parquet",
...     aggr_level="Precursor.Id",
...     version="1.9.1",
...     max_precursor_q=0.01,
...     run_parser=lambda s: s.split("/")[-1].split(".")[0],
...     verbose=True,
... )
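
The run_parser used above can be checked on its own before passing it to the reader. The sketch below applies the same logic to a few made-up Run values; the exact form of Run strings in a given report is an assumption here.

```python
def strip_path_and_extension(run):
    """Same logic as the lambda above: drop directories, then cut at the first dot."""
    return run.split("/")[-1].split(".")[0]


# Made-up Run values for illustration only.
runs = ["/data/raw/sample_A.raw", "sample_B.d", "batch1/sample_C.v2.mzML"]
parsed = [strip_path_and_extension(run) for run in runs]
```

Note that this parser cuts at the first dot, so "sample_C.v2.mzML" becomes "sample_C", and it only splits on forward slashes. If run names contain dots or Windows-style paths, a different parser may be needed.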
