Transform Commands

Transform and process data within the QPX ecosystem.

Overview

The transform command group provides tools for processing and transforming QPX data into various downstream formats. These commands enable gene annotation and protein-level quantification from feature data.

Available Commands

  • gene-map - Map genes from FASTA
  • quantify - Protein quantification via mokume (DirectLFQ, MaxLFQ, iBAQ, TopN, etc.)

gene-map

Map gene information from a FASTA database onto a QPX parquet file.

Description

Enriches protein identifications in QPX PSM or feature files with gene-level metadata extracted from FASTA database headers. 
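As a rough illustration of what "gene-level metadata extracted from FASTA database headers" can mean, the sketch below pulls the gene name out of UniProt-style headers. It assumes the header carries a GN= field and a db|ACCESSION|ENTRY layout; the function name parse_fasta_genes is hypothetical and this is not necessarily how gene-map is implemented internally.

```python
import re

# Assumption: UniProt-style headers, where the gene name follows "GN=".
GN_PATTERN = re.compile(r"\bGN=(\S+)")


def parse_fasta_genes(lines):
    """Return {accession: gene_name} for every header that carries a GN= field.

    Hypothetical helper for illustration; not part of the qpxc API.
    """
    genes = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        if not header:
            continue  # bare ">" with no content
        parts = header.split("|")
        # UniProt layout: db|ACCESSION|ENTRY_NAME description...
        # Fallback: use the first whitespace-delimited token as the accession.
        accession = parts[1] if len(parts) >= 3 else header.split()[0]
        match = GN_PATTERN.search(header)
        if match:
            genes[accession] = match.group(1)
    return genes


example = [
    ">sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens OX=9606 GN=TP53 PE=1 SV=4",
    "MEEPQSDPSV",
    # Made-up entry without a GN= field, to show that it is skipped.
    ">sp|P00000|FAKE_HUMAN Hypothetical entry OS=Homo sapiens",
    "MKT",
]
print(parse_fasta_genes(example))  # {'P04637': 'TP53'}
```

Headers without a gene field are simply left unmapped here, which is one reason a species-specific FASTA file with complete headers gives better annotation coverage.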

Parameters

Parameter Type Required Default Description
--parquet-path FILE Yes - QPX PSM or feature parquet file path
--fasta FILE Yes - FASTA database file path
--output-folder DIRECTORY Yes - Output directory for generated files
--species TEXT No human Species name for gene mapping
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example

Map gene information to a parquet file:

qpxc transform gene-map \
    --parquet-path ./output/psm.parquet \
    --fasta proteins.fasta \
    --output-folder ./output \
    --species human

With Species Parameter

qpxc transform gene-map \
    --parquet-path ./output/feature.parquet \
    --fasta tests/examples/fasta/Homo-sapiens.fasta \
    --output-folder ./output \
    --species human

Output Files

  • Output: Enhanced parquet file(s) with gene information
  • Format: Parquet file in output folder
  • Added Fields: Gene names and metadata from FASTA headers

Best Practices

  • Use species-specific FASTA files for accurate gene annotation
  • Enable verbose mode for debugging

quantify

Compute protein-level quantification from QPX feature data using mokume.

Description

Reads a QPX feature.parquet file, extracts peptide-level intensities, and computes protein-level quantification using the selected method.

Supported methods:

  • directlfq - DirectLFQ intensity traces (default)
  • maxlfq - MaxLFQ delayed normalization
  • topn - Average of N most intense peptides
  • top3 - Average of 3 most intense peptides
  • ibaq - Intensity-Based Absolute Quantification (requires --fasta)
  • sum - Sum of all peptide intensities
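The simpler methods can be sketched in a few lines to show what "protein-level quantification" computes. The example below covers only sum, topn, and top3 on plain (protein, sample, intensity) tuples; the function name and input layout are assumptions made for illustration, and mokume's actual implementation (including DirectLFQ, MaxLFQ, and iBAQ) is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def quantify(peptides, method="top3", n=3):
    """Aggregate peptide intensities per (protein, sample).

    peptides: iterable of (protein, sample, intensity) tuples.
    Illustrative sketch only; not the mokume API.
    """
    grouped = defaultdict(list)
    for protein, sample, intensity in peptides:
        grouped[(protein, sample)].append(intensity)

    result = {}
    for key, values in grouped.items():
        if method == "sum":
            result[key] = sum(values)
        elif method in ("topn", "top3"):
            k = 3 if method == "top3" else n
            # Average the k most intense peptides (fewer if the protein has fewer).
            top = sorted(values, reverse=True)[:k]
            result[key] = mean(top)
        else:
            raise ValueError(f"method not covered by this sketch: {method}")
    return result


rows = [
    ("P1", "S1", 100.0), ("P1", "S1", 300.0),
    ("P1", "S1", 200.0), ("P1", "S1", 40.0),
    ("P2", "S1", 50.0),
]
print(quantify(rows, "top3"))  # {('P1', 'S1'): 200.0, ('P2', 'S1'): 50.0}
print(quantify(rows, "sum"))   # {('P1', 'S1'): 640.0, ('P2', 'S1'): 50.0}
```

Note how top3 ignores the weakest peptide of P1 while sum includes it, which is the practical difference between the two when low-intensity peptides are noisy.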

Parameters

Parameter Type Required Default Description
--feature-path FILE Yes - QPX feature.parquet file path
--method TEXT No directlfq Quantification method (directlfq, maxlfq, topn, top3, ibaq, sum)
--fasta FILE No - FASTA database (required for ibaq method)
--enzyme TEXT No Trypsin Enzyme for iBAQ digestion
--topn-n INTEGER No 3 N for TopN method
--threads INTEGER No -1 Parallel threads for MaxLFQ (-1 = all cores)
--output, -o PATH Yes - Output file path (.parquet or .tsv)
--normalize FLAG No - Normalize quantification values
--verbose FLAG No - Enable verbose logging

Supported Methods

Method Description Extra Requirements
directlfq DirectLFQ intensity traces (default) pip install mokume[directlfq]
maxlfq MaxLFQ delayed normalization --
topn Average of N most intense peptides --topn-n to set N
top3 Average of 3 most intense peptides --
ibaq Intensity-Based Absolute Quantification --fasta required
sum Sum of all peptide intensities --

Usage Examples

DirectLFQ (default)

qpxc transform quantify \
    --feature-path ./qpx_output/feature.parquet \
    --method directlfq \
    -o proteins_directlfq.parquet

iBAQ (requires FASTA)

qpxc transform quantify \
    --feature-path ./qpx_output/feature.parquet \
    --method ibaq --fasta proteome.fasta \
    -o proteins_ibaq.tsv

MaxLFQ with 8 threads

qpxc transform quantify \
    --feature-path ./qpx_output/feature.parquet \
    --method maxlfq --threads 8 \
    -o proteins_maxlfq.parquet

TopN with normalization

qpxc transform quantify \
    --feature-path ./qpx_output/feature.parquet \
    --method topn --topn-n 5 --normalize \
    -o proteins_top5.parquet

Output Files

  • Parquet: protein-level quantification, written when the output path ends in .parquet
  • TSV: tab-separated protein-level quantification, written when the output path ends in .tsv
  • Content: Protein accessions, sample IDs, and quantified intensities

Common Issues

Issue: mokume is not installed

  • Solution: Install with pip install mokume

Issue: DirectLFQ is not installed

  • Solution: Install with pip install mokume[directlfq]

Issue: --fasta option is required for the ibaq method

  • Solution: Provide a FASTA database file with --fasta
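Some of these failures can be caught before the command runs. The following is a minimal pre-flight sketch that assembles the argument list from the options documented above and rejects an ibaq call without a FASTA file or an output path with an unsupported extension. The function build_quantify_command is a hypothetical wrapper, not part of qpxc.

```python
from pathlib import Path

METHODS = {"directlfq", "maxlfq", "topn", "top3", "ibaq", "sum"}
OUTPUT_SUFFIXES = {".parquet", ".tsv"}


def build_quantify_command(feature_path, output, method="directlfq",
                           fasta=None, topn_n=None, threads=None,
                           normalize=False):
    """Return the argv list for `qpxc transform quantify` (hypothetical helper)."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    if method == "ibaq" and fasta is None:
        raise ValueError("--fasta is required for the ibaq method")
    if Path(output).suffix not in OUTPUT_SUFFIXES:
        raise ValueError("output must end in .parquet or .tsv")

    cmd = ["qpxc", "transform", "quantify",
           "--feature-path", str(feature_path),
           "--method", method]
    if fasta is not None:
        cmd += ["--fasta", str(fasta)]
    if method == "topn" and topn_n is not None:
        cmd += ["--topn-n", str(topn_n)]
    if method == "maxlfq" and threads is not None:
        cmd += ["--threads", str(threads)]
    if normalize:
        cmd.append("--normalize")
    cmd += ["--output", str(output)]
    return cmd


cmd = build_quantify_command("feature.parquet", "proteins_ibaq.tsv",
                             method="ibaq", fasta="proteome.fasta")
print(" ".join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) once qpxc and mokume are installed; the sketch itself does not execute anything.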

Best Practices

  • Ensure the QPX feature.parquet file contains valid anchor_protein, intensities, and run_file_name fields
  • Decoy entries (is_decoy=true) and zero-intensity rows are automatically filtered
  • Use --normalize for cross-sample normalization
  • Use --threads to control parallelism for MaxLFQ
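The automatic filtering mentioned above can be pictured as follows. This sketch works on plain dictionaries and, as a simplifying assumption, flattens the intensities field into a single numeric intensity per row; the real column layout in feature.parquet may be nested, and the helper name is hypothetical.

```python
def filter_quantifiable(rows):
    """Drop decoy rows and rows with missing or zero intensity.

    Assumes each row is a dict with a boolean 'is_decoy' and a single
    numeric 'intensity' (a simplification of the 'intensities' field).
    """
    kept = []
    for row in rows:
        if row.get("is_decoy"):
            continue  # decoy hit, never quantified
        if not row.get("intensity"):
            continue  # None or 0.0 carries no quantitative signal
        kept.append(row)
    return kept


rows = [
    {"anchor_protein": "P1", "run_file_name": "run1", "intensity": 1200.0, "is_decoy": False},
    {"anchor_protein": "P2", "run_file_name": "run1", "intensity": 0.0, "is_decoy": False},
    {"anchor_protein": "DECOY_P3", "run_file_name": "run1", "intensity": 800.0, "is_decoy": True},
]
print([r["anchor_protein"] for r in filter_quantifiable(rows)])  # ['P1']
```

Running a check like this on a sample of the input is a quick way to see how many rows will actually contribute to quantification.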