
CLI Reference

The qpx command-line tool provides a comprehensive set of commands for converting, transforming, querying, inspecting, and validating mass spectrometry proteomics data.

Overview

The qpx CLI is organized into five main command groups:

Convert Commands

Convert various mass spectrometry data formats to the QPX standard format:

  • QuantMS Conversion: Convert QuantMS mzTab format to QPX data files
  • DIA-NN Conversion: Convert DIA-NN reports to feature and protein group formats
  • MaxQuant Conversion: Convert MaxQuant PSM, feature, and protein group data
  • FragPipe Conversion: Convert FragPipe PSM data to QPX format
  • mzIdentML Conversion: Convert mzIdentML format PSM data
  • SDRF Conversion: Convert SDRF metadata to QPX sample/run format

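As a concrete illustration of what these converters do, here is a minimal sketch of normalizing a search-engine output row into a QPX-like record. The column names on both sides are hypothetical, chosen for illustration rather than taken from the actual QPX schema:

```python
import csv
import io

# Hypothetical MaxQuant-style evidence rows (tab-separated, as in evidence.txt);
# the column names are illustrative, not the exact QPX schema.
evidence_txt = (
    "Sequence\tProteins\tCharge\tIntensity\n"
    "PEPTIDER\tP12345\t2\t150000\n"
    "ELVISK\tQ67890\t3\t98000\n"
)

def to_qpx_like_record(row):
    """Normalize one evidence row into a QPX-like feature record."""
    return {
        "peptidoform": row["Sequence"],
        "anchor_protein": row["Proteins"],
        "charge": int(row["Charge"]),
        "intensity": float(row["Intensity"]),
    }

reader = csv.DictReader(io.StringIO(evidence_txt), delimiter="\t")
records = [to_qpx_like_record(r) for r in reader]
print(records[0]["anchor_protein"])  # P12345
```

The real converters also write Parquet and attach metadata; this sketch only shows the column-renaming and type-coercion step at the core of any such conversion.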

Transform Commands

Transform and process data within the QPX ecosystem:

  • Gene Mapping: Map gene information from FASTA to protein data
  • Protein Quantification: Compute protein-level quantification via mokume (DirectLFQ, MaxLFQ, iBAQ, TopN, etc.)

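To make the quantification methods concrete, here is a minimal sketch of two of them, iBAQ (summed peptide intensity divided by the count of theoretically observable peptides) and TopN (mean of the N most intense peptides). The accessions, intensities, and observable-peptide counts are toy values; the real input would be the feature table produced by the convert step:

```python
# Peptide intensities grouped by protein accession (toy data).
peptide_intensities = {
    "P12345": [4.0e5, 2.0e5, 1.0e5],
    "Q67890": [9.0e4, 3.0e4],
}

# iBAQ denominator: theoretically observable peptides per protein
# (made-up counts for illustration).
observable_peptides = {"P12345": 10, "Q67890": 5}

def ibaq(intensities, n_observable):
    """iBAQ: summed peptide intensity divided by observable-peptide count."""
    return sum(intensities) / n_observable

def top_n(intensities, n=3):
    """TopN: mean of the N most intense peptides."""
    top = sorted(intensities, reverse=True)[:n]
    return sum(top) / len(top)

protein_quant = {
    acc: {"ibaq": ibaq(vals, observable_peptides[acc]), "top3": top_n(vals)}
    for acc, vals in peptide_intensities.items()
}
print(protein_quant["P12345"]["ibaq"])  # 70000.0
```

DirectLFQ and MaxLFQ additionally normalize across runs, which is why they are delegated to mokume rather than being simple per-protein aggregates like the two shown here.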

Query Commands

Query QPX datasets using SQL or structured commands:

  • Execute SQL queries on QPX Parquet files
  • Filter and aggregate data
  • Join multiple datasets
  • Export query results
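The SQL dialect runs over QPX Parquet files; the sketch below uses an in-memory sqlite3 table purely as a stand-in to show the shape of a typical aggregation query (the `feature` table and `anchor_protein` column mirror the examples elsewhere in this page):

```python
import sqlite3

# In-memory stand-in for the QPX feature table; qpx itself queries Parquet.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE feature (anchor_protein TEXT, intensity REAL)")
con.executemany(
    "INSERT INTO feature VALUES (?, ?)",
    [("P12345", 4.0e5), ("P12345", 2.0e5), ("Q67890", 9.0e4)],
)

# Mean intensity per protein, the same query shape used with `qpxc query sql`.
rows = con.execute(
    "SELECT anchor_protein, AVG(intensity) FROM feature "
    "GROUP BY anchor_protein ORDER BY anchor_protein"
).fetchall()
print(rows)  # [('P12345', 300000.0), ('Q67890', 90000.0)]
```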

Info Commands

Inspect QPX datasets and metadata:

  • Display dataset schema and statistics
  • Show sample and run information
  • List available columns and data types
  • Validate data integrity
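The kind of per-column summary these commands report can be sketched in a few lines. The records and column names below are toy data standing in for a QPX table, which in practice would come from `qpx.open_dataset(...)`:

```python
# Toy rows standing in for a QPX table.
records = [
    {"peptidoform": "PEPTIDER", "charge": 2, "intensity": 150000.0},
    {"peptidoform": "ELVISK", "charge": 3, "intensity": None},
]

def summarize(rows):
    """Per-column type name, null count, and row count."""
    summary = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        non_null = [v for v in values if v is not None]
        summary[col] = {
            "dtype": type(non_null[0]).__name__ if non_null else "unknown",
            "nulls": values.count(None),
            "rows": len(values),
        }
    return summary

print(summarize(records)["intensity"])  # {'dtype': 'float', 'nulls': 1, 'rows': 2}
```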

Validate Commands

Validate QPX data against schemas:

  • Check data conformance to QPX specifications
  • Validate required fields and data types
  • Verify referential integrity
  • Generate validation reports
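A minimal sketch of the required-field and type checks at the heart of validation is shown below. The field names and rules are illustrative, not the actual QPX specification; the `{severity, message}` issue shape loosely mirrors the `issue.severity`/`issue.message` attributes exposed by the Python API:

```python
# Illustrative rules: required field name -> expected Python type.
REQUIRED = {"peptidoform": str, "charge": int, "intensity": float}

def validate_record(record):
    """Return a list of {severity, message} issues for one record."""
    issues = []
    for field, expected in REQUIRED.items():
        if field not in record or record[field] is None:
            issues.append({
                "severity": "ERROR",
                "message": f"missing required field '{field}'",
            })
        elif not isinstance(record[field], expected):
            issues.append({
                "severity": "ERROR",
                "message": f"'{field}' should be {expected.__name__}, "
                           f"got {type(record[field]).__name__}",
            })
    return issues

good = {"peptidoform": "PEPTIDER", "charge": 2, "intensity": 150000.0}
bad = {"peptidoform": "ELVISK", "charge": "2"}  # wrong type, missing intensity
assert validate_record(good) == []
print(len(validate_record(bad)))  # 2
```

Real validation also covers referential integrity across tables (e.g. that every feature references a known run), which needs the full dataset rather than a single record.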

Python API

Core data operations are available through the Python API:

import qpx

# Load a dataset
with qpx.open_dataset("path/to/dataset") as ds:
    # Access data views
    psm_df = ds.psm.to_df()
    feature_df = ds.feature.to_df()

    # Filter and query
    targets = ds.psm.targets_only().to_df()
    count = ds.feature.count()

    # Validate against canonical schema
    results = ds.validate()
    for name, result in results.items():
        print(f"{name}: {result.summary}")
        for issue in result.issues:
            print(f"  [{issue.severity}] {issue.message}")

    # Run SQL queries
    df = ds.sql("SELECT anchor_protein, COUNT(*) AS n FROM feature GROUP BY 1")

Quick Start

Installation

pip install qpx

Basic Usage

View all available commands:

qpxc --help

View help for a specific command group:

qpxc convert --help

View detailed help for a specific command:

qpxc convert diann --help

Common Options

Most commands support the following common options:

  • --verbose: Enable verbose logging for debugging
  • --output-folder: Specify the output directory
  • --output-prefix: Specify the output file prefix

Example Data Processing Workflow

A typical data processing workflow:

# 1. Convert raw data from MaxQuant
qpxc convert maxquant \
    --evidence-file evidence.txt \
    --msms-file msms.txt \
    --output-folder ./output

# 2. Protein quantification (DirectLFQ)
qpxc transform quantify \
    --feature-path ./output/feature.parquet \
    --method directlfq \
    -o ./output/proteins.parquet

# 3. Query the data
qpxc query sql \
    --dataset-path ./output \
    --sql "SELECT anchor_protein, AVG(intensity) FROM feature GROUP BY anchor_protein" \
    --output results.csv

# 4. Validate the dataset
qpxc validate \
    --dataset-path ./output

# 5. Inspect dataset information
qpxc info --dataset-path ./output
qpxc info schema --dataset-path ./output --structure feature

For further analysis, use the Python API:

import qpx

# Load and inspect the dataset
with qpx.open_dataset("./output") as ds:
    # Validate the dataset
    results = ds.validate()
    for name, result in results.items():
        print(f"{name}: {result.summary}")

    # Query with SQL
    top_proteins = ds.sql(
        "SELECT anchor_protein, COUNT(*) AS n "
        "FROM feature GROUP BY 1 ORDER BY n DESC LIMIT 10"
    )
    print(top_proteins)

    # Access data views as DataFrames
    feature_df = ds.feature.to_df()
    print(f"Features: {len(feature_df)} rows")

Getting Help