CLI Reference
The qpx command-line tool, invoked as qpxc in the examples on this page, provides commands for converting, transforming, querying, and validating mass spectrometry proteomics data.
Overview
The qpx CLI is organized into five main command groups:
Convert Commands
Convert various mass spectrometry data formats to the QPX standard format:
- QuantMS Conversion: Convert QuantMS mzTab format to QPX data files
- DIA-NN Conversion: Convert DIA-NN reports to feature and protein group formats
- MaxQuant Conversion: Convert MaxQuant PSM, feature, and protein group data
- FragPipe Conversion: Convert FragPipe PSM data to QPX format
- mzIdentML Conversion: Convert mzIdentML format PSM data
- SDRF Conversion: Convert SDRF metadata to QPX sample/run format
Transform Commands
Transform and process data within the QPX ecosystem:
- Gene Mapping: Map gene information from FASTA to protein data
- Protein Quantification: Compute protein-level quantification via mokume (DirectLFQ, MaxLFQ, iBAQ, TopN, etc.)
Query Commands (coming soon)
Query QPX datasets using SQL or structured commands:
- Execute SQL queries on QPX Parquet files
- Filter and aggregate data
- Join multiple datasets
- Export query results
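Because the query commands are not yet available, the kind of aggregation they are meant to support can be previewed with any SQL engine. The sketch below runs the per-protein average from the workflow example further down this page against an in-memory SQLite table. SQLite stands in for a Parquet-backed engine, the table and column names are taken from that example, the rows are invented toy values, and the ORDER BY clause is added only to make the result order deterministic.

```python
import sqlite3

# Toy rows invented for illustration: (protein_accession, intensity)
ROWS = [
    ("P12345", 100.0),
    ("P12345", 300.0),
    ("Q67890", 50.0),
]

# Same aggregation as the workflow example, plus ORDER BY for a stable order.
QUERY = """
    SELECT protein_accession, AVG(intensity)
    FROM psm
    GROUP BY protein_accession
    ORDER BY protein_accession
"""


def average_intensity(rows):
    """Load rows into an in-memory table named psm and run the aggregate query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE psm (protein_accession TEXT, intensity REAL)")
        conn.executemany("INSERT INTO psm VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for accession, mean_intensity in average_intensity(ROWS):
        print(accession, mean_intensity)
```

The function returns one (accession, mean intensity) tuple per protein, which is the shape of result the planned export step would write out.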
Info Commands (coming soon)
Inspect QPX datasets and metadata:
- Display dataset schema and statistics
- Show sample and run information
- List available columns and data types
- Validate data integrity
Validate Commands (coming soon)
Validate QPX data against schemas:
- Check data conformance to QPX specifications
- Validate required fields and data types
- Verify referential integrity
- Generate validation reports
Python API
Visualization, statistics, and project management functionality are available through the Python API:
```python
import qpx

# Load a dataset
dataset = qpx.Dataset("path/to/dataset")

# Generate visualizations
dataset.psm.plot.distribution()
dataset.feature.plot.intensity_heatmap()

# Compute statistics
stats = dataset.psm.stats()
summary = dataset.get_summary()

# Project management
project = qpx.Project.from_pride("PXD000001")
project.attach_files(["psm.parquet", "feature.parquet"])
```
See the API documentation for more details.
Quick Start
Installation
Basic Usage
View all available commands:
View help for a specific command group:
View detailed help for a specific command:
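The three help levels above can be sketched as follows. This is a minimal illustration that assumes the executable is named qpxc and reuses the convert group and maxquant command from the workflow example further down this page. The help_command helper is hypothetical: it only assembles the argument list and is not part of the tool.

```python
import shlex

# Assumption: the executable is called "qpxc", as in the workflow example on this page.
QPXC = "qpxc"


def help_command(*path: str) -> list[str]:
    """Build the argv that requests help for a command group or command.

    help_command()                      -> top-level help
    help_command("convert")             -> help for a command group
    help_command("convert", "maxquant") -> help for one command
    """
    return [QPXC, *path, "--help"]


if __name__ == "__main__":
    for path in [(), ("convert",), ("convert", "maxquant")]:
        # shlex.join renders the argv as a shell line that can be copied as-is.
        print(shlex.join(help_command(*path)))
```

Each printed line can be pasted into a shell, or the list can be passed to subprocess.run to execute it directly.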
Common Options
Most commands support the following common options:
- --verbose: Enable verbose logging for debugging
- --output-folder: Specify the output directory
- --output-prefix: Specify the output file prefix
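As a rough sketch of how these options combine, the hypothetical helper below appends them to an existing command line. It assumes that --verbose is a value-less flag and that --output-folder and --output-prefix each take one value; neither assumption is confirmed here, so check a command's --help output before relying on it.

```python
from typing import Optional


def with_common_options(
    argv: list[str],
    output_folder: Optional[str] = None,
    output_prefix: Optional[str] = None,
    verbose: bool = False,
) -> list[str]:
    """Return a copy of argv with the common options appended.

    Assumptions (not confirmed by the reference): --verbose takes no value,
    the other two options take exactly one value each.
    """
    result = list(argv)  # copy so the caller's list is not modified
    if output_folder is not None:
        result += ["--output-folder", output_folder]
    if output_prefix is not None:
        result += ["--output-prefix", output_prefix]
    if verbose:
        result.append("--verbose")
    return result


if __name__ == "__main__":
    base = ["qpxc", "convert", "maxquant", "--evidence-file", "evidence.txt"]
    print(with_common_options(base, output_folder="./output", verbose=True))
```

Keeping the options in one helper means a script that calls several commands sets the output location and logging level in a single place.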
Example Data Processing Workflow
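A typical data processing workflow is shown below. Steps 3 to 5 use the query, validate, and info command groups, which are listed above as coming soon: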
```bash
# 1. Convert raw data from MaxQuant
qpxc convert maxquant \
    --evidence-file evidence.txt \
    --msms-file msms.txt \
    --output-folder ./output

# 2. Protein quantification (DirectLFQ)
qpxc transform quantify \
    --feature-path ./output/feature.parquet \
    --method directlfq \
    -o ./output/proteins.parquet

# 3. Query the data
qpxc query \
    --dataset ./output \
    --sql "SELECT protein_accession, AVG(intensity) FROM psm GROUP BY protein_accession" \
    --output results.tsv

# 4. Validate the dataset
qpxc validate \
    --dataset ./output \
    --report validation_report.txt

# 5. Inspect dataset information
qpxc info \
    --dataset ./output \
    --show-schema \
    --show-stats
```
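Steps 1 and 2 above can also be chained from a script so that a failed conversion stops the run before quantification starts. The sketch below is one way to do that with subprocess. It reuses the file names and flags from the example and leaves out steps 3 to 5 because those command groups are marked as coming soon. The run_pipeline helper and its dry_run parameter are illustrative names, not part of qpx.

```python
import shlex
import subprocess

# Commands copied from steps 1 and 2 of the workflow example.
PIPELINE = [
    ["qpxc", "convert", "maxquant",
     "--evidence-file", "evidence.txt",
     "--msms-file", "msms.txt",
     "--output-folder", "./output"],
    ["qpxc", "transform", "quantify",
     "--feature-path", "./output/feature.parquet",
     "--method", "directlfq",
     "-o", "./output/proteins.parquet"],
]


def run_pipeline(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Process each command in order and return the shell-quoted command lines.

    With dry_run=True nothing is executed. With dry_run=False,
    check=True raises CalledProcessError on the first non-zero exit,
    so later steps never run on incomplete output.
    """
    rendered = []
    for argv in commands:
        rendered.append(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return rendered


if __name__ == "__main__":
    # Dry run by default: print what would be executed.
    for line in run_pipeline(PIPELINE, dry_run=True):
        print(line)
```

Passing argument lists rather than a single shell string avoids quoting problems with paths that contain spaces.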
For visualization and statistical analysis, use the Python API:
```python
import qpx

# Load the dataset
dataset = qpx.Dataset("./output")

# Generate visualizations
dataset.psm.plot.distribution(save_path="./plots/psm_distribution.svg")
dataset.feature.plot.intensity_distribution(save_path="./plots/intensity_dist.svg")

# Generate statistics
stats = dataset.psm.stats()
print(stats.summary())

# Save statistical report
stats.save_report("./stats/report.txt")
```
Getting Help
- Each command provides detailed help information via the --help parameter
- See Format Specification for output file formats and detailed schema information
- Check the Troubleshooting Guide for common issues
- Visit the GitHub Repository to report issues