.long
- proteopy.read.long(intensities, level=None, *, sample_annotation=None, var_annotation=None, column_map=None, sep=None, fill_na=None, zero_to_na=False, sort_obs_by_annotation=False, verbose=False)
Read long-format peptide or protein tabular data into an AnnData container.

The intensities table must be in long format with one row per (sample, feature) measurement. Required columns differ by level:

- Peptide level: sample_id, intensity, and peptide_id must be present. protein_id may come from the intensities table or from var_annotation; see below.
- Protein level: sample_id, intensity, and protein_id must all be present.

At peptide level, protein_id is resolved in two steps. If the intensities table already contains protein_id, it is used directly. Otherwise, var_annotation must be supplied and contain both peptide_id and protein_id.

sample_annotation, when supplied, must contain a sample_id column and is merged into adata.obs.

var_annotation, when supplied, must contain a peptide_id column (peptide level) or a protein_id column (protein level) and is merged into adata.var.

Column names that differ from the defaults above can be mapped to the canonical names via column_map.

- Parameters:
- intensities : str | Path | pd.DataFrame
Long-form intensities data. Accepts a file path (str or Path) or a pandas.DataFrame.
- level : {"peptide", "protein"}, default None
Select whether to process peptide- or protein-level inputs. This argument is required.
- sample_annotation : str | Path | pd.DataFrame, optional
Optional obs annotations. Accepts a file path or DataFrame.
- var_annotation : str | Path | pd.DataFrame, optional
Optional var annotations. Accepts a file path or DataFrame. Interpreted as peptide annotations when level="peptide" and as protein annotations when level="protein".
- column_map : dict, optional
Optional mapping that specifies custom column names for the expected keys: peptide_id, protein_id, sample_id, intensity. Keys are the canonical names and values are the column names used in the input, as in Example 4.
- sep : str, optional
Delimiter passed to pandas.read_csv. If None (the default), the separator is auto-detected from the file extension. Ignored when the input is a DataFrame.
- fill_na : float, optional
Optional replacement value for missing intensity entries.
- zero_to_na : bool, optional
If True, zeros in the AnnData X matrix are replaced with np.nan.
- sort_obs_by_annotation : bool, default False
When True, reorder observations to match the order of samples in the annotation (if supplied) or the original intensity table.
- verbose : bool, optional
If True, print status messages.
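The sep parameter above is described as being auto-detected from the file extension when left as None. The sketch below shows one way such a lookup could work. The helper name guess_sep and the suffix-to-delimiter table are assumptions made for illustration; this page does not list which extensions proteopy actually recognises.

```python
from pathlib import Path

# Assumed mapping, for illustration only. The extensions that
# proteopy recognises are not documented on this page.
SEP_BY_SUFFIX = {".csv": ",", ".tsv": "\t"}


def guess_sep(path, sep=None):
    """Return an explicit sep if given, else infer one from the suffix."""
    if sep is not None:
        # An explicit delimiter always wins over inference.
        return sep
    suffix = Path(path).suffix.lower()
    try:
        return SEP_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(
            f"cannot infer separator from {suffix!r}; pass sep explicitly"
        ) from None
```

When a file has an unusual extension, passing sep explicitly avoids relying on any inference at all.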
- Returns:
AnnData
Structured representation of the long-form intensities, ready for downstream analysis.
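The required-column rules described above can be summarised as a small check. This is a dependency-free sketch written from the prose on this page, not proteopy's own validation code; the names REQUIRED and missing_columns are hypothetical.

```python
# Illustrative only: mirrors the documented rules, not proteopy's source.
REQUIRED = {
    "peptide": {"sample_id", "intensity", "peptide_id"},
    "protein": {"sample_id", "intensity", "protein_id"},
}


def missing_columns(columns, level, var_annotation_columns=None):
    """Return a sorted list of required columns that cannot be found."""
    if level not in REQUIRED:
        raise ValueError("level must be 'peptide' or 'protein'")
    columns = set(columns)
    missing = REQUIRED[level] - columns
    # At peptide level, protein_id may instead come from var_annotation,
    # which must then carry both peptide_id and protein_id.
    if level == "peptide" and "protein_id" not in columns:
        ann = set(var_annotation_columns or ())
        if not {"peptide_id", "protein_id"} <= ann:
            missing.add("protein_id")
    return sorted(missing)
```

For instance, a peptide-level table without protein_id passes the check only when the var_annotation columns include both peptide_id and protein_id.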
Example 1: Minimal peptide-level read with protein_id in the intensities DataFrame.

>>> import pandas as pd
>>> import proteopy as pr
>>> intensities = pd.DataFrame({
...     "sample_id": [
...         "S1", "S1", "S2", "S2",
...     ],
...     "peptide_id": [
...         "PEP1", "PEP2", "PEP1", "PEP2",
...     ],
...     "protein_id": [
...         "PROT1", "PROT1", "PROT1", "PROT1",
...     ],
...     "intensity": [
...         12450.0, 8730.0, 15320.0, 6890.0,
...     ],
... })
>>> adata = pr.read.long(
...     intensities, level="peptide",
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'peptide_id', 'protein_id'
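As a rough picture of what the 2 × 2 shape in Example 1 means, the same long table can be pivoted with plain pandas into one row per sample and one column per peptide. This only illustrates the long-to-wide idea; it is not taken from proteopy's source, and how read.long builds its matrix internally is not shown on this page.

```python
import pandas as pd

# Same measurements as Example 1 (protein_id omitted for brevity).
intensities = pd.DataFrame({
    "sample_id": ["S1", "S1", "S2", "S2"],
    "peptide_id": ["PEP1", "PEP2", "PEP1", "PEP2"],
    "intensity": [12450.0, 8730.0, 15320.0, 6890.0],
})

# Long -> wide: samples become rows, peptides become columns.
wide = intensities.pivot(
    index="sample_id", columns="peptide_id", values="intensity",
)
```

Here wide has two rows (S1, S2) and two columns (PEP1, PEP2), matching the observation and variable counts reported for adata.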
Example 2: Peptide-level read with protein_id supplied via var_annotation instead of the intensities DataFrame.

>>> intensities = pd.DataFrame({
...     "sample_id": [
...         "S1", "S1", "S2", "S2",
...     ],
...     "peptide_id": [
...         "PEP1", "PEP2", "PEP1", "PEP2",
...     ],
...     "intensity": [
...         12450.0, 8730.0, 15320.0, 6890.0,
...     ],
... })
>>> var_ann = pd.DataFrame({
...     "peptide_id": ["PEP1", "PEP2"],
...     "protein_id": ["PROT1", "PROT1"],
... })
>>> adata = pr.read.long(
...     intensities,
...     level="peptide",
...     var_annotation=var_ann,
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'peptide_id', 'protein_id'
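The two-step protein_id resolution used in Example 2 can be pictured with a plain pandas merge. This is a sketch of the documented behaviour under the assumption that each peptide maps to a single protein; it is not proteopy's implementation, and the variable name resolved is hypothetical.

```python
import pandas as pd

intensities = pd.DataFrame({
    "sample_id": ["S1", "S1", "S2", "S2"],
    "peptide_id": ["PEP1", "PEP2", "PEP1", "PEP2"],
    "intensity": [12450.0, 8730.0, 15320.0, 6890.0],
})
var_ann = pd.DataFrame({
    "peptide_id": ["PEP1", "PEP2"],
    "protein_id": ["PROT1", "PROT1"],
})

if "protein_id" in intensities.columns:
    # Step 1: the intensities table already carries protein_id.
    resolved = intensities
else:
    # Step 2: look protein_id up in the annotation by peptide_id.
    # validate= makes pandas raise if a peptide appears twice in var_ann.
    resolved = intensities.merge(
        var_ann[["peptide_id", "protein_id"]],
        on="peptide_id",
        how="left",
        validate="many_to_one",
    )
```

After the merge, every measurement row carries the protein_id of its peptide, and the row count is unchanged.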
Example 3: Minimal protein-level read from a CSV file.

>>> import tempfile
>>> from pathlib import Path
>>> csv_text = (
...     "sample_id,protein_id,intensity\n"
...     "S1,PROT1,12450.0\n"
...     "S1,PROT2,8730.0\n"
...     "S2,PROT1,15320.0\n"
...     "S2,PROT2,6890.0\n"
... )
>>> tmp = tempfile.NamedTemporaryFile(
...     suffix=".csv", delete=False, mode="w",
... )
>>> _ = tmp.write(csv_text)
>>> tmp.close()
>>> adata = pr.read.long(
...     Path(tmp.name), level="protein",
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'protein_id'
>>> Path(tmp.name).unlink()
Example 4: Protein-level read with non-standard column names remapped via column_map.

>>> intensities = pd.DataFrame({
...     "run": ["S1", "S1", "S2", "S2"],
...     "prot": [
...         "PROT1", "PROT2", "PROT1", "PROT2",
...     ],
...     "quant": [
...         12450.0, 8730.0, 15320.0, 6890.0,
...     ],
... })
>>> adata = pr.read.long(
...     intensities,
...     level="protein",
...     column_map={
...         "sample_id": "run",
...         "protein_id": "prot",
...         "intensity": "quant",
...     },
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'protein_id'
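Note the direction of the mapping in Example 4: keys are the canonical names and values are the names found in the input. Renaming input columns therefore needs the inverted dictionary. The dependency-free sketch below shows that inversion on plain row dictionaries; the names rename and canonical_rows are hypothetical, and this is not proteopy's own code.

```python
# column_map as passed in Example 4: canonical name -> name in the input.
column_map = {
    "sample_id": "run",
    "protein_id": "prot",
    "intensity": "quant",
}

# Invert it to get: name in the input -> canonical name.
rename = {custom: canonical for canonical, custom in column_map.items()}

rows = [
    {"run": "S1", "prot": "PROT1", "quant": 12450.0},
    {"run": "S2", "prot": "PROT2", "quant": 6890.0},
]

# Apply the renaming; keys without a mapping are left untouched.
canonical_rows = [
    {rename.get(key, key): value for key, value in row.items()}
    for row in rows
]
```

With a pandas DataFrame, the same inverted dictionary could be passed to DataFrame.rename(columns=rename).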