CLI Reference¶
The qpx command-line tool, invoked as `qpxc`, provides a comprehensive set of commands for converting, transforming, querying, and validating mass spectrometry proteomics data.
Overview¶
The qpx CLI is organized into five main command groups:
Convert Commands¶
Convert various mass spectrometry data formats to the QPX standard format:
- QuantMS Conversion: Convert QuantMS mzTab format to QPX data files
- DIA-NN Conversion: Convert DIA-NN reports to feature and protein group formats
- MaxQuant Conversion: Convert MaxQuant PSM, feature, and protein group data
- FragPipe Conversion: Convert FragPipe PSM data to QPX format
- mzIdentML Conversion: Convert mzIdentML format PSM data
- SDRF Conversion: Convert SDRF metadata to QPX sample/run format
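At their core, these converters rename and restructure tool-specific columns into the QPX schema. The sketch below shows that shape for a MaxQuant `evidence.txt` fragment; the column mapping here is illustrative (the real converter's target schema and type handling may differ, and values are left as strings for brevity):

```python
import csv
import io

# A one-row stand-in for a MaxQuant evidence.txt (tab-separated)
evidence_txt = (
    "Sequence\tProteins\tCharge\tIntensity\tRaw file\n"
    "PEPTIDER\tP12345\t2\t1500000\trun01\n"
)

# Illustrative mapping from MaxQuant columns to QPX-style field names
COLUMN_MAP = {
    "Sequence": "sequence",
    "Proteins": "anchor_protein",
    "Charge": "charge",
    "Intensity": "intensity",
    "Raw file": "run",
}

def convert_evidence(text):
    """Rename known MaxQuant columns to QPX-style fields, dropping the rest."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {COLUMN_MAP[k]: v for k, v in row.items() if k in COLUMN_MAP}
        for row in reader
    ]

records = convert_evidence(evidence_txt)
print(records[0]["anchor_protein"])  # P12345
```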
Transform Commands¶
Transform and process data within the QPX ecosystem:
- Gene Mapping: Map gene information from FASTA to protein data
- Protein Quantification: Compute protein-level quantification via mokume (DirectLFQ, MaxLFQ, iBAQ, TopN, etc.)
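Of these methods, TopN is the simplest to picture: protein abundance is estimated from the N most intense peptides. A minimal self-contained sketch of that idea (not the mokume implementation):

```python
from statistics import mean

def topn_quant(peptide_intensities, n=3):
    """Estimate protein abundance as the mean of the n most intense peptides (TopN)."""
    top = sorted(peptide_intensities, reverse=True)[:n]
    return mean(top)

# Peptide intensities observed for one protein
protein_a = [1.2e6, 8.0e5, 3.1e6, 4.0e5]
print(topn_quant(protein_a))  # 1700000.0, the mean of the three highest intensities
```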
Query Commands¶
Query QPX datasets using SQL or structured commands:
- Execute SQL queries on QPX Parquet files
- Filter and aggregate data
- Join multiple datasets
- Export query results
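The kind of aggregation these commands express is plain SQL over tabular data. As a stand-in sketch, the same query shape can be run with Python's built-in `sqlite3` on an in-memory table (QPX itself stores Parquet files, which this example does not read):

```python
import sqlite3

# Stand-in rows for a QPX feature table
rows = [
    ("P12345", 1.0e6),
    ("P12345", 2.0e6),
    ("Q67890", 5.0e5),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE feature (anchor_protein TEXT, intensity REAL)")
con.executemany("INSERT INTO feature VALUES (?, ?)", rows)

# The same group-and-aggregate shape a `qpxc query sql` call would express
results = list(con.execute(
    "SELECT anchor_protein, COUNT(*), AVG(intensity) "
    "FROM feature GROUP BY anchor_protein ORDER BY anchor_protein"
))
for protein, n, avg in results:
    print(protein, n, avg)
```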
Info Commands¶
Inspect QPX datasets and metadata:
- Display dataset schema and statistics
- Show sample and run information
- List available columns and data types
- Validate data integrity
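Listing columns and data types amounts to walking the records and collecting the types seen per column. A toy sketch of that idea over plain dicts, rather than a real QPX dataset:

```python
def describe(records):
    """Infer column names and the Python type names seen in each column."""
    schema = {}
    for row in records:
        for col, val in row.items():
            schema.setdefault(col, set()).add(type(val).__name__)
    return {col: sorted(types) for col, types in schema.items()}

feature_rows = [
    {"anchor_protein": "P12345", "intensity": 1.5e6},
    {"anchor_protein": "Q67890", "intensity": 5.0e5},
]
print(describe(feature_rows))  # {'anchor_protein': ['str'], 'intensity': ['float']}
```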
Validate Commands¶
Validate QPX data against schemas:
- Check data conformance to QPX specifications
- Validate required fields and data types
- Verify referential integrity
- Generate validation reports
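A validation report of this kind boils down to severity-tagged issues plus a summary, the same shape the Python API's `ds.validate()` exposes. The classes and checks below are a simplified illustration, not qpx's own validator:

```python
from dataclasses import dataclass, field

@dataclass
class Issue:
    severity: str
    message: str

@dataclass
class ValidationResult:
    issues: list = field(default_factory=list)

    @property
    def summary(self):
        return "passed" if not self.issues else f"{len(self.issues)} issue(s)"

# Hypothetical required fields and types for a feature row
REQUIRED = {"anchor_protein": str, "intensity": float}

def validate_rows(rows):
    """Check required fields and types, collecting severity-tagged issues."""
    result = ValidationResult()
    for i, row in enumerate(rows):
        for col, typ in REQUIRED.items():
            if col not in row:
                result.issues.append(Issue("ERROR", f"row {i}: missing required field {col!r}"))
            elif not isinstance(row[col], typ):
                result.issues.append(Issue("WARNING", f"row {i}: {col} should be {typ.__name__}"))
    return result

res = validate_rows([
    {"anchor_protein": "P12345", "intensity": 1.0e6},
    {"anchor_protein": "Q67890"},  # missing intensity
])
print(res.summary)  # 1 issue(s)
```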
Python API¶
Core data operations are available through the Python API:
```python
import qpx

# Load a dataset
with qpx.open_dataset("path/to/dataset") as ds:
    # Access data views
    psm_df = ds.psm.to_df()
    feature_df = ds.feature.to_df()

    # Filter and query
    targets = ds.psm.targets_only().to_df()
    count = ds.feature.count()

    # Validate against canonical schema
    results = ds.validate()
    for name, result in results.items():
        print(f"{name}: {result.summary}")
        for issue in result.issues:
            print(f"  [{issue.severity}] {issue.message}")

    # Run SQL queries
    df = ds.sql("SELECT anchor_protein, COUNT(*) AS n FROM feature GROUP BY 1")
```
Quick Start¶
Installation¶
Basic Usage¶
View all available commands: `qpxc --help`
View help for a specific command group: `qpxc <group> --help` (e.g. `qpxc convert --help`)
View detailed help for a specific command: `qpxc <group> <command> --help`
Common Options¶
Most commands support the following common options:
- `--verbose`: Enable verbose logging for debugging
- `--output-folder`: Specify the output directory
- `--output-prefix`: Specify the output file prefix
Example Data Processing Workflow¶
A typical data processing workflow:
```bash
# 1. Convert raw data from MaxQuant
qpxc convert maxquant \
    --evidence-file evidence.txt \
    --msms-file msms.txt \
    --output-folder ./output

# 2. Protein quantification (DirectLFQ)
qpxc transform quantify \
    --feature-path ./output/feature.parquet \
    --method directlfq \
    -o ./output/proteins.parquet

# 3. Query the data
qpxc query sql \
    --dataset-path ./output \
    --sql "SELECT anchor_protein, AVG(intensity) FROM feature GROUP BY anchor_protein" \
    --output results.csv

# 4. Validate the dataset
qpxc validate \
    --dataset-path ./output

# 5. Inspect dataset information
qpxc info --dataset-path ./output
qpxc info schema --dataset-path ./output --structure feature
```
For further analysis, use the Python API:
```python
import qpx

# Load and inspect the dataset
with qpx.open_dataset("./output") as ds:
    # Validate the dataset
    results = ds.validate()
    for name, result in results.items():
        print(f"{name}: {result.summary}")

    # Query with SQL
    top_proteins = ds.sql(
        "SELECT anchor_protein, COUNT(*) AS n "
        "FROM feature GROUP BY 1 ORDER BY n DESC LIMIT 10"
    )
    print(top_proteins)

    # Access data views as DataFrames
    feature_df = ds.feature.to_df()
    print(f"Features: {len(feature_df)} rows")
```
Getting Help¶
- Each command provides detailed help information via the `--help` flag
- See the Format Specification for output file formats and detailed schema information
- Check the Troubleshooting Guide for common issues
- Visit the GitHub Repository to report issues