Transform Commands¶
Transform and process data within the QPX ecosystem.
Overview¶
The transform command group provides tools for processing and transforming QPX data into various downstream formats. These commands enable gene annotation and protein-level quantification from feature data.
Available Commands¶
- gene-map - Map genes from FASTA
- quantify - Protein quantification via mokume (DirectLFQ, MaxLFQ, iBAQ, TopN, etc.)
gene-map¶
Add gene information from a FASTA database to a QPX parquet file.
Description¶
Enriches protein identifications in QPX PSM or feature files with gene-level metadata extracted from FASTA database headers.
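To see which gene names a FASTA file can supply before running gene-map, the headers can be previewed from the shell. The sketch below assumes UniProt-style headers, where the accession is the second `|`-separated field and the gene name follows a `GN=` token. `list_fasta_genes` is a hypothetical helper, not part of qpxc, and this page does not state which header fields gene-map actually reads.

```shell
# Hypothetical helper: print "accession<TAB>gene" for every FASTA header.
# Assumes UniProt-style headers such as ">sp|P04637|P53_HUMAN ... GN=TP53 ...".
# Headers without a GN= token are reported with "-" as the gene.
list_fasta_genes() {
  awk -F'|' '/^>/ {
    gene = "-"
    if (match($0, /GN=[^ ]+/)) {
      # Skip the 3-character "GN=" prefix of the matched token
      gene = substr($0, RSTART + 3, RLENGTH - 3)
    }
    print $2 "\t" gene
  }' "$1"
}

# Example: list_fasta_genes proteins.fasta | head
```

Headers that print `-` carry no `GN=` token, so they offer no gene name in this format.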
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `--parquet-path` | FILE | Yes | - | QPX PSM or feature parquet file path |
| `--fasta` | FILE | Yes | - | FASTA database file path |
| `--output-folder` | DIRECTORY | Yes | - | Output directory for generated files |
| `--species` | TEXT | No | human | Species name for gene mapping |
| `--verbose` | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Annotate a PSM parquet file using the default species (human):
qpxc transform gene-map \
--parquet-path ./output/psm.parquet \
--fasta proteins.fasta \
--output-folder ./output
With Species Parameter¶
qpxc transform gene-map \
--parquet-path ./output/feature.parquet \
--fasta tests/examples/fasta/Homo-sapiens.fasta \
--output-folder ./output \
--species human
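When both a PSM file and a feature file need annotation, the same command can be repeated in a loop. The following is a minimal sketch using only the options listed in the parameter table. `gene_map_all` and the `QPXC` override variable are hypothetical conveniences introduced here, not part of qpxc.

```shell
# Hypothetical helper: run gene-map once per parquet file.
# Usage: gene_map_all FASTA OUTPUT_FOLDER PARQUET...
# Set QPXC=echo to print the commands instead of running them.
gene_map_all() {
  fasta=$1
  out_dir=$2
  shift 2
  for parquet in "$@"; do
    "${QPXC:-qpxc}" transform gene-map \
      --parquet-path "$parquet" \
      --fasta "$fasta" \
      --output-folder "$out_dir" || return 1   # stop at the first failure
  done
}

# Example:
# gene_map_all proteins.fasta ./output ./output/psm.parquet ./output/feature.parquet
```

Running with `QPXC=echo` first is a cheap way to confirm the paths before any files are written.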
Output Files¶
- Output: Enhanced parquet file(s) with gene information
- Format: Parquet file in output folder
- Added Fields: Gene names and metadata from FASTA headers
Best Practices¶
- Use species-specific FASTA files for accurate gene annotation
- Enable verbose mode for debugging
quantify¶
Compute protein-level quantification from QPX feature data using mokume.
Description¶
Reads a QPX feature.parquet file, extracts peptide-level intensities, and computes protein-level quantification using the selected method. Supported methods:
- directlfq: DirectLFQ intensity traces (default)
- maxlfq: MaxLFQ delayed normalization
- topn: Average of N most intense peptides
- top3: Average of 3 most intense peptides
- ibaq: Intensity-Based Absolute Quantification (requires --fasta)
- sum: Sum of all peptide intensities
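To make the topn and top3 definitions concrete, the sketch below averages the N most intense peptides per protein on a toy tab-separated table. It illustrates the idea only and is not mokume's implementation. The input layout (protein, peptide, intensity) and the `topn_mean` name are assumptions. How mokume treats proteins with fewer than N peptides is not stated here; this sketch averages whatever is available.

```shell
# Hypothetical helper: mean of the N most intense peptides per protein.
# Usage: topn_mean N < table.tsv
# Input lines: protein<TAB>peptide<TAB>intensity (plain decimal intensities).
topn_mean() {
  # Group rows by protein, most intense peptide first within each protein
  sort -t "$(printf '\t')" -k1,1 -k3,3nr |
    awk -F'\t' -v n="$1" '
      # New protein: emit the previous mean and reset the accumulators
      $1 != prev {
        if (prev != "") print prev "\t" (sum / cnt)
        prev = $1; sum = 0; cnt = 0
      }
      # Accumulate only the first n (most intense) rows of each protein
      cnt < n { sum += $3; cnt++ }
      END { if (prev != "") print prev "\t" (sum / cnt) }'
}

# Example: printf 'P1\ta\t10\nP1\tb\t50\nP1\tc\t30\nP1\td\t40\n' | topn_mean 3
```

With `N` set to 3 this corresponds to the top3 description, and the least intense peptide of a four-peptide protein is ignored.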
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `--feature-path` | FILE | Yes | - | QPX feature.parquet file path |
| `--method` | TEXT | No | directlfq | Quantification method (directlfq, maxlfq, topn, top3, ibaq, sum) |
| `--fasta` | FILE | No | - | FASTA database (required for ibaq method) |
| `--enzyme` | TEXT | No | Trypsin | Enzyme for iBAQ digestion (default: Trypsin) |
| `--topn-n` | INTEGER | No | 3 | N for TopN method (default: 3) |
| `--threads` | INTEGER | No | -1 | Parallel threads for MaxLFQ (-1 = all cores) |
| `--output` | PATH | Yes | - | Output file path (.parquet or .tsv) |
| `--normalize` | FLAG | No | - | Normalize quantification values |
| `--verbose` | FLAG | No | - | Enable verbose logging |
Supported Methods¶
| Method | Description | Extra Requirements |
|---|---|---|
| directlfq | DirectLFQ intensity traces (default) | `pip install mokume[directlfq]` |
| maxlfq | MaxLFQ delayed normalization | -- |
| topn | Average of N most intense peptides | `--topn-n` to set N |
| top3 | Average of 3 most intense peptides | -- |
| ibaq | Intensity-Based Absolute Quantification | `--fasta` required |
| sum | Sum of all peptide intensities | -- |
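The requirements in this table can be checked in a wrapper script before a long run starts. The sketch below encodes only what the table states: the six method names, and that ibaq needs a FASTA file. `check_quantify_args` is a hypothetical helper, and qpxc performs its own validation regardless.

```shell
# Hypothetical pre-flight check for a wrapper script.
# Usage: check_quantify_args METHOD [FASTA]
# Returns 0 if the combination matches the table above, 1 otherwise.
check_quantify_args() {
  method=$1
  fasta=${2:-}
  case "$method" in
    directlfq|maxlfq|topn|top3|sum)
      ;;                                  # no extra input files needed
    ibaq)
      if [ -z "$fasta" ]; then
        echo "ibaq requires --fasta" >&2
        return 1
      fi
      ;;
    *)
      echo "unknown method: $method" >&2
      return 1
      ;;
  esac
}

# Example: check_quantify_args ibaq proteome.fasta && echo "ok"
```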
Usage Examples¶
DirectLFQ (default)¶
qpxc transform quantify \
--feature-path ./qpx_output/feature.parquet \
--method directlfq \
-o proteins_directlfq.parquet
iBAQ (requires FASTA)¶
qpxc transform quantify \
--feature-path ./qpx_output/feature.parquet \
--method ibaq --fasta proteome.fasta \
-o proteins_ibaq.tsv
MaxLFQ with 8 threads¶
qpxc transform quantify \
--feature-path ./qpx_output/feature.parquet \
--method maxlfq --threads 8 \
-o proteins_maxlfq.parquet
TopN with normalization¶
qpxc transform quantify \
--feature-path ./qpx_output/feature.parquet \
--method topn --topn-n 5 --normalize \
-o proteins_top5.parquet
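To compare methods on the same input, the command can be run once per method with a distinct output file each time. This is a minimal sketch using only options shown in the examples above. `quantify_each` and the `QPXC` override are hypothetical, and ibaq is best left out of the method list because it also needs `--fasta`.

```shell
# Hypothetical helper: quantify one feature file with several methods.
# Usage: quantify_each FEATURE_PARQUET OUTPUT_DIR METHOD...
# Set QPXC=echo to print the commands instead of running them.
quantify_each() {
  feature=$1
  out_dir=$2
  shift 2
  for method in "$@"; do
    # One parquet file per method, named after the method
    "${QPXC:-qpxc}" transform quantify \
      --feature-path "$feature" \
      --method "$method" \
      -o "${out_dir}/proteins_${method}.parquet" || return 1
  done
}

# Example:
# quantify_each ./qpx_output/feature.parquet ./quant directlfq maxlfq top3 sum
```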
Output Files¶
- Parquet: `.parquet` files with protein-level quantification
- TSV: `.tsv` files (tab-separated); the format is determined by the output file extension
- Content: Protein accessions, sample IDs, and quantified intensities
Common Issues¶
Issue: mokume is not installed
- Solution: Install with `pip install mokume`
Issue: DirectLFQ is not installed
- Solution: Install with `pip install mokume[directlfq]`
Issue: --fasta option is required for the ibaq method
- Solution: Provide a FASTA database file with `--fasta`
Best Practices¶
- Ensure QPX feature.parquet contains valid `anchor_protein`, `intensities`, and `run_file_name` fields
- Decoy entries (`is_decoy=true`) and zero-intensity rows are automatically filtered
- Use `--normalize` for cross-sample normalization
- Use `--threads` to control parallelism for MaxLFQ
Related Commands¶
- Convert Commands - Convert raw data to QPX format
- Visualization Commands - Visualize transformed data
- Statistics Commands - Analyze transformed data