Convert Commands

Convert various mass spectrometry data formats to the QPX standard format.

Overview

The convert command group provides converters for the outputs of several proteomics tools, standardizing their data formats for downstream analysis. All commands write Parquet output files that follow the QPX specification.

Available Commands

  • quantms - Convert QuantMS mzTab output to QPX format
  • diann - Convert DIA-NN report to QPX format
  • maxquant - Convert MaxQuant output to QPX format
  • fragpipe - Convert FragPipe output to QPX format
  • mzidentml - Convert mzIdentML files to QPX PSM format
  • sdrf - Convert SDRF to sample and run parquet files

quantms

Convert QuantMS mzTab output to QPX format.

Description

Reads a QuantMS-produced mzTab file (with optional MSstats quantification) and writes QPX Parquet files for the requested data structures. 

Parameters

Parameter Type Required Default Description
--mztab-path FILE Yes - Input mzTab file path
--sdrf-file FILE Yes - SDRF metadata file path
--msstats-file FILE No - MSstats input file path (required for feature and pg)
--output-folder DIRECTORY Yes - Output directory for generated QPX files
--output-prefix TEXT No quantms Prefix for output file names
--structures TEXT No psm,feature,pg Comma-separated list of structures to produce (psm, feature, pg). Default: all.
--database-path FILE No - DuckDB database file path (reuse existing or create new)
--project-accession TEXT No - PRIDE / ProteomeXchange accession (e.g. PXD020192)
--enrich-pride FLAG No - Fetch project metadata from PRIDE API after conversion
--compression TEXT No zstd Parquet compression codec.
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example

Convert QuantMS data with default settings:

# Convert everything (PSM + feature + protein groups)
qpxc convert quantms \
    --mztab-path data.mzTab \
    --sdrf-file metadata.sdrf.tsv \
    --msstats-file msstats_in.csv \
    --output-folder ./qpx_output

# Convert only PSMs
qpxc convert quantms \
    --mztab-path data.mzTab \
    --sdrf-file metadata.sdrf.tsv \
    --output-folder ./qpx_output \
    --structures psm

PSM Data Only

qpxc convert quantms \
    --mztab-path tests/examples/quantms/dda-lfq-small/PXD007683-LFQ.sdrf_openms_design_openms.mzTab \
    --sdrf-file tests/examples/quantms/dda-lfq-small/PXD007683-LFQ.sdrf.tsv \
    --output-folder ./output \
    --structures psm \
    --output-prefix quantms_psm

Feature Data with MSstats

qpxc convert quantms \
    --mztab-path tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_openms.mzTab.gz \
    --msstats-file tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_msstats_in.csv.gz \
    --sdrf-file tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf.tsv \
    --output-folder ./output \
    --structures feature \
    --verbose

All Structures

qpxc convert quantms \
    --mztab-path tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_openms.mzTab.gz \
    --msstats-file tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_msstats_in.csv.gz \
    --sdrf-file tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf.tsv \
    --output-folder ./output \
    --structures psm,feature,pg \
    --verbose

TMT Data

qpxc convert quantms \
    --mztab-path tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_openms.mzTab.gz \
    --msstats-file tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_msstats_in.csv.gz \
    --sdrf-file tests/examples/quantms/dda-plex-full/PXD007683-TMT.sdrf.tsv \
    --output-folder ./output \
    --structures pg

Output Files

Depending on the --structures parameter:

  • PSM: {output-prefix}-{uuid}.psm.parquet
  • Feature: {output-prefix}-{uuid}.feature.parquet
  • Protein Group: {output-prefix}-{uuid}.pg.parquet

All files are in Parquet format and conform to their respective QPX specifications.
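
Because the {uuid} part of each file name is not known before the run, scripts that consume the output typically locate files by prefix and structure suffix. The following is a minimal sketch that assumes only the naming pattern above; the function name list_qpx_outputs is hypothetical and not part of qpxc.

```shell
# Print the output files for one structure (psm, feature, or pg).
# $1: output folder, $2: output prefix, $3: structure name.
# Hypothetical helper: it relies only on the {output-prefix}-{uuid}.{structure}.parquet pattern.
list_qpx_outputs() {
    out_dir=$1
    prefix=$2
    structure=$3
    for path in "$out_dir"/"$prefix"-*."$structure".parquet; do
        # An unmatched glob stays literal, so keep only paths that exist.
        [ -e "$path" ] && printf '%s\n' "$path"
    done
    return 0
}
```

For example, list_qpx_outputs ./qpx_output quantms psm prints the PSM file written with the default prefix, and prints nothing if that structure was not requested.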

Best Practices

  • Use --structures to control which output files are generated
  • Provide --msstats-file when converting feature or pg structures
  • Reuse database files with --database-path when processing the same mzTab multiple times
  • Enable verbose mode for large datasets to monitor progress
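
The MSstats requirement can be checked in a wrapper script before qpxc is started, so a long run does not fail late. This is a sketch under the assumption that the structure names are exactly psm, feature, and pg as listed in the parameter table; check_quantms_inputs is a hypothetical helper, not a qpxc command.

```shell
# Return 0 if the requested structures can be produced with the given inputs.
# $1: comma-separated structures, $2: MSstats file path (may be empty).
check_quantms_inputs() {
    structures=$1
    msstats_file=$2
    # Wrap the list in commas so each name can be matched as a whole token.
    case ",$structures," in
        *,feature,* | *,pg,*)
            if [ -z "$msstats_file" ]; then
                echo "feature and pg need --msstats-file" >&2
                return 1
            fi
            ;;
    esac
    return 0
}
```

A wrapper could call check_quantms_inputs "$STRUCTURES" "$MSSTATS" and stop with an error message before invoking qpxc convert quantms.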

diann

Convert DIA-NN report files to QPX format.

Description

Reads a DIA-NN report.tsv file and converts feature-level quantification data into QPX Parquet format. When --pg-matrix-path is provided, also produces protein group output. 

Parameters

Parameter Type Required Default Description
--report-path FILE Yes - DIA-NN report file path
--sdrf-file FILE Yes - SDRF metadata file path
--mzml-info-folder DIRECTORY No - Folder containing mzML info files (optional; scan/mz fields left empty if omitted)
--qvalue-threshold FLOAT No 0.05 Q-value threshold for filtering
--output-folder DIRECTORY Yes - Output directory for generated QPX files
--output-prefix TEXT No - Prefix for output file names
--pg-matrix-path FILE No - DIA-NN protein quantities matrix file (enables PG conversion)
--protein-file FILE No - Protein file for filtering
--partitions TEXT No - Field(s) for splitting output files (comma-separated)
--duckdb-max-memory TEXT No - Maximum memory for DuckDB engine (e.g., '4GB')
--duckdb-threads INTEGER No - Number of threads for DuckDB engine
--batch-size INTEGER No 100 Number of files to process simultaneously
--standardized-intensities FLAG No - Calculate standardized intensity metrics for PG output
--project-accession TEXT No - PRIDE / ProteomeXchange accession (e.g. PXD020192)
--enrich-pride FLAG No - Fetch project metadata from PRIDE API after conversion
--compression TEXT No zstd Parquet compression codec.
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example - Feature Data

Convert a DIA-NN report with default settings:

# Feature conversion
qpxc convert diann \
    --report-path report.tsv \
    --sdrf-file data.sdrf.tsv \
    --mzml-info-folder ./mzml_info \
    --output-folder ./qpx_output

# Feature + protein groups
qpxc convert diann \
    --report-path report.tsv \
    --sdrf-file data.sdrf.tsv \
    --mzml-info-folder ./mzml_info \
    --pg-matrix-path report.pg_matrix.tsv \
    --output-folder ./qpx_output \
    --standardized-intensities

Advanced Example with Partitioning

Convert with file partitioning based on run_file_name:

qpxc convert diann \
    --report-path tests/examples/diann/full/diann_report.tsv.gz \
    --qvalue-threshold 0.01 \
    --mzml-info-folder tests/examples/diann/full/mzml \
    --sdrf-file tests/examples/diann/full/PXD036609.sdrf.tsv \
    --output-folder ./output \
    --partitions run_file_name \
    --duckdb-max-memory 8GB \
    --duckdb-threads 4 \
    --verbose

Protein Groups from PG Matrix

Convert DIA-NN protein groups using the pg_matrix file:

qpxc convert diann \
    --report-path tests/examples/diann/full/diann_report.tsv.gz \
    --pg-matrix-path tests/examples/diann/full/diann_report.pg_matrix.tsv \
    --sdrf-file tests/examples/diann/full/PXD036609.sdrf.tsv \
    --output-folder ./output \
    --duckdb-max-memory 16GB \
    --duckdb-threads 8 \
    --verbose

Output Files

Depending on the inputs provided:

  • Feature: {output-prefix}-{uuid}.feature.parquet
  • Protein Group: {output-prefix}-{uuid}.pg.parquet (requires --pg-matrix-path)

Common Issues

Issue: Out of memory errors with large files

  • Solution: Increase --duckdb-max-memory parameter (e.g., 8GB, 16GB)

Issue: Slow processing

  • Solution: Increase --duckdb-threads to utilize more CPU cores

Issue: Missing mzML info files

  • Solution: Ensure all mzML info TSV files are in the specified folder with correct naming
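
The two DuckDB settings can be derived from the machine rather than hard-coded. The sketch below is an illustration only: duckdb_flags is a hypothetical helper, the 8 GB default is an arbitrary example value, and using every online CPU is an assumption that may not suit shared machines.

```shell
# Print DuckDB tuning flags for `qpxc convert diann`.
# $1: memory budget in GB (defaults to 8, an arbitrary example value).
duckdb_flags() {
    memory_gb=${1:-8}
    # getconf reports the number of online CPUs; fall back to 1 if it is unavailable.
    threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    printf '%s\n' "--duckdb-max-memory ${memory_gb}GB --duckdb-threads ${threads}"
}
```

The output is meant to be appended unquoted to the command line, for example $(duckdb_flags 16), so that the shell splits it into separate arguments.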

Best Practices

  • Use a Q-value threshold of 0.05 or lower for high-confidence results
  • Enable partitioning for large datasets to reduce memory usage
  • Use verbose mode during initial testing to diagnose issues
  • Ensure the SDRF file matches the sample names in the DIA-NN report
  • For protein groups, ensure the report and pg_matrix files come from the same DIA-NN run

maxquant

Convert MaxQuant output to QPX format.

Description

Reads MaxQuant result files (msms.txt, evidence.txt, proteinGroups.txt) and writes corresponding QPX Parquet files. 

Parameters

Parameter Type Required Default Description
--msms-file FILE No - MaxQuant msms.txt file (for PSM conversion)
--evidence-file FILE No - MaxQuant evidence.txt file (for feature conversion)
--protein-groups-file FILE No - MaxQuant proteinGroups.txt file (for PG conversion)
--sdrf-file FILE No - SDRF metadata file (required for feature and PG)
--output-folder DIRECTORY Yes - Output directory for generated QPX files
--output-prefix TEXT No - Prefix for output file names
--structures TEXT No - Comma-separated list of structures to produce (psm, feature, pg). Default: all available.
--protein-file FILE No - Protein list file for filtering feature output
--batch-size INTEGER No 100000 Processing batch size
--n-workers INTEGER No - Number of parallel workers
--memory-limit FLOAT No - Memory limit in GB
--spectral-data FLAG No - Include spectral data fields in PSM output
--standardized-intensities FLAG No - Calculate standardized intensity metrics for PG output
--project-accession TEXT No - PRIDE / ProteomeXchange accession (e.g. PXD020192)
--enrich-pride FLAG No - Fetch project metadata from PRIDE API after conversion
--compression TEXT No zstd Parquet compression codec.
--verbose FLAG No - Enable verbose logging
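
The table ties each structure to one input file, and feature and PG additionally need the SDRF file. A wrapper script can use that mapping to decide which structures are possible for a given MaxQuant folder. This is a sketch, assuming the files carry their standard uncompressed names; maxquant_structures is a hypothetical helper and not part of qpxc.

```shell
# Print the comma-separated structures that can be produced from a MaxQuant folder.
# $1: folder holding msms.txt, evidence.txt, proteinGroups.txt; $2: SDRF path (may be empty).
maxquant_structures() {
    txt_dir=$1
    sdrf_file=$2
    structures=""
    [ -f "$txt_dir/msms.txt" ] && structures="psm"
    # Feature and PG conversion both require the SDRF file.
    if [ -n "$sdrf_file" ]; then
        [ -f "$txt_dir/evidence.txt" ] && structures="${structures:+$structures,}feature"
        [ -f "$txt_dir/proteinGroups.txt" ] && structures="${structures:+$structures,}pg"
    fi
    printf '%s\n' "$structures"
}
```

The result can be passed directly as the value of --structures.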

Usage Examples

Basic Example

Convert MaxQuant data with default settings:

# Convert everything
qpxc convert maxquant \
    --msms-file msms.txt \
    --evidence-file evidence.txt \
    --protein-groups-file proteinGroups.txt \
    --sdrf-file metadata.sdrf.tsv \
    --output-folder ./qpx_output

# Convert PSMs only
qpxc convert maxquant \
    --msms-file msms.txt \
    --output-folder ./qpx_output \
    --structures psm

PSM Data Only

qpxc convert maxquant \
    --msms-file tests/examples/maxquant/maxquant_simple/msms.txt \
    --output-folder ./output \
    --structures psm \
    --spectral-data \
    --output-prefix maxquant_psm

Feature Data with Protein Groups

qpxc convert maxquant \
    --evidence-file tests/examples/maxquant/maxquant_full/evidence.txt.gz \
    --protein-groups-file tests/examples/maxquant/maxquant_full/proteinGroups.txt \
    --sdrf-file tests/examples/maxquant/maxquant_full/PXD001819.sdrf.tsv \
    --output-folder ./output \
    --structures feature \
    --batch-size 500000 \
    --verbose

All Structures

qpxc convert maxquant \
    --msms-file tests/examples/maxquant/maxquant_full/msms.txt.gz \
    --evidence-file tests/examples/maxquant/maxquant_full/evidence.txt.gz \
    --protein-groups-file tests/examples/maxquant/maxquant_full/proteinGroups.txt \
    --sdrf-file tests/examples/maxquant/maxquant_full/PXD001819.sdrf.tsv \
    --output-folder ./output \
    --structures psm,feature,pg \
    --batch-size 500000 \
    --verbose

Output Files

Depending on the --structures parameter:

  • PSM: {output-prefix}-{uuid}.psm.parquet
  • Feature: {output-prefix}-{uuid}.feature.parquet
  • Protein Group: {output-prefix}-{uuid}.pg.parquet

Common Issues

Issue: Memory errors with compressed evidence files

  • Solution: Reduce --batch-size or increase available RAM

Issue: Missing Q-value information

  • Solution: Provide --protein-groups-file for accurate Q-value mapping

Best Practices

  • Use --structures to control which output files are generated
  • Always provide --protein-groups-file when available for better data quality
  • Ensure SDRF sample names match MaxQuant experiment names
  • Use compressed files (.gz) to save disk space
  • Adjust --batch-size based on available memory
  • Use --spectral-data flag if downstream analysis requires spectral information

fragpipe

Convert FragPipe output to QPX format.

Description

Reads FragPipe result files and converts them into QPX Parquet format. Supports psm.tsv, combined_ion.tsv, combined_peptide.tsv, and combined_protein.tsv. 

Parameters

Parameter Type Required Default Description
--psm-file FILE No - FragPipe psm.tsv file
--ion-file FILE No - FragPipe combined_ion.tsv file (for feature conversion)
--peptide-file FILE No - FragPipe combined_peptide.tsv file (for feature conversion)
--pg-file FILE No - FragPipe combined_protein.tsv file (for PG conversion)
--sdrf-file FILE No - SDRF metadata file (for sample/run conversion)
--output-folder DIRECTORY Yes - Output directory for generated QPX files
--output-prefix TEXT No - Prefix for output file names
--batch-size INTEGER No 1000000 Processing batch size
--project-accession TEXT No - PRIDE / ProteomeXchange accession (e.g. PXD020192)
--enrich-pride FLAG No - Fetch project metadata from PRIDE API after conversion
--compression TEXT No zstd Parquet compression codec.
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example

Convert FragPipe PSM data with default settings:

# Convert PSMs only
qpxc convert fragpipe \
    --psm-file psm.tsv \
    --output-folder ./qpx_output

# Convert features + protein groups
qpxc convert fragpipe \
    --ion-file combined_ion.tsv \
    --pg-file combined_protein.tsv \
    --sdrf-file metadata.sdrf.tsv \
    --output-folder ./qpx_output

With Custom Settings

qpxc convert fragpipe \
    --psm-file /path/to/psm.tsv \
    --output-folder ./output \
    --batch-size 500000 \
    --output-prefix fragpipe_psm

Output Files

  • Output: {output-prefix}-{uuid}.psm.parquet
  • Format: Parquet file containing PSM data
  • Schema: Conforms to QPX PSM specification

mzidentml

Convert mzIdentML (.mzid) files to QPX PSM parquet format.

Description

Supports both standard mzIdentML (1.1/1.2) and mzIdentML 1.3 with cross-linking extensions (inter-peptide, looplinks, noncovalent). Produces a full QPX dataset including PSM, pepmap, provenance, ontology, and dataset metadata. 

Parameters

Parameter Type Required Default Description
--mzid-path FILE Yes - Input mzIdentML (.mzid) file path
--output-folder DIRECTORY Yes - Output directory for generated QPX files
--output-prefix TEXT No mzidentml Prefix for output file names
--mgf-path FILE No - Optional MGF file for spectra attachment
--include-spectra FLAG No - Attach mz_array and intensity_array from MGF to PSM records
--project-accession TEXT No - PRIDE / ProteomeXchange accession (e.g. PXD054720)
--enrich-pride FLAG No - Fetch project metadata from PRIDE API after conversion
--compression TEXT No zstd Parquet compression codec.
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example

Convert an mzIdentML file with default settings:

# Convert a standard mzIdentML file
qpxc convert mzidentml \
    --mzid-path results.mzid \
    --output-folder ./qpx_output

# Convert an XL-MS mzIdentML 1.3 file
qpxc convert mzidentml \
    --mzid-path crosslinks.mzid \
    --output-folder ./qpx_output \
    --output-prefix xl_experiment

# Convert with spectra from MGF
qpxc convert mzidentml \
    --mzid-path results.mzid \
    --mgf-path spectra.mgf \
    --include-spectra \
    --output-folder ./qpx_output

# Convert with project accession
qpxc convert mzidentml \
    --mzid-path results.mzid \
    --output-folder ./qpx_output \
    --project-accession PXD054720

With Spectral Data from Single mzML

qpxc convert mzidentml \
    --mzid-path /path/to/results.mzid \
    --mzml-file /path/to/spectra.mzML \
    --output-folder ./output \
    --spectral-data \
    --output-prefix psm_with_spectra

With Spectral Data from Multiple mzML Files

When your mzIdentML references multiple mzML files, use the --mzml-folder option:

qpxc convert mzidentml \
    --mzid-path /path/to/results.mzid.gz \
    --mzml-folder /path/to/mzml_files/ \
    --output-folder ./output \
    --spectral-data \
    --output-prefix psm_multi_mzml

The converter automatically matches PSMs to mzML files based on the run_file_name field in the mzIdentML. File matching is case-insensitive and supports both .mzML and .mzML.gz extensions.
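
When no spectra are attached, it can help to reproduce the matching rule by hand. The sketch below mimics the rule as described (case-insensitive, .mzML or .mzML.gz) and is not the converter's own code; it assumes the run name is given without an extension, and find_mzml_for_run is a hypothetical helper.

```shell
# Print the path of the mzML file that matches a run name, or return 1 if none does.
# $1: run name without extension (assumption), $2: folder containing mzML files.
find_mzml_for_run() {
    run_name=$1
    folder=$2
    wanted=$(printf '%s' "$run_name" | tr '[:upper:]' '[:lower:]')
    for path in "$folder"/*; do
        [ -f "$path" ] || continue
        # Compare lower-cased base names so the match ignores case.
        base=$(basename "$path" | tr '[:upper:]' '[:lower:]')
        case "$base" in
            "$wanted.mzml" | "$wanted.mzml.gz")
                printf '%s\n' "$path"
                return 0
                ;;
        esac
    done
    return 1
}
```

Running it for each run name referenced in the mzIdentML file shows which runs have no matching file in the folder.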

Supported Native ID Formats

The converter supports multiple native ID formats for scan number extraction:

Format        Vendor/Source   Example
scan=XXX      Thermo          controllerType=0 controllerNumber=1 scan=12345
cycle=XXX     Waters/Agilent  sample=1 period=1 cycle=1055 experiment=4
index=XXX     Generic         index=500
spectrum=XXX  Various         spectrum=999
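
To check whether a native ID from your own files fits one of these formats, the extraction can be approximated in a few lines of shell. This is a sketch, not the converter's implementation: it assumes the ID is a space-separated list of key=value tokens and that the keys are tried in the order of the table. The name extract_scan_number is hypothetical.

```shell
# Print the number taken from a native ID string, or return 1 if no known key is present.
# Keys are tried in table order (scan, cycle, index, spectrum); this order is an assumption.
extract_scan_number() {
    native_id=$1
    for key in scan cycle index spectrum; do
        # Put each key=value token on its own line, then keep the digits after "<key>=".
        value=$(printf '%s\n' "$native_id" | tr ' ' '\n' |
            sed -n "s/^${key}=\([0-9][0-9]*\)$/\1/p" | head -n 1)
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}
```

With the examples from the table, the function prints 12345, 1055, 500, and 999 respectively; an ID without any of the four keys produces no output and a non-zero exit status.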

Output Files

  • Output: {output-prefix}-{uuid}.psm.parquet
  • Format: Parquet file containing PSM-level data
  • Schema: Conforms to QPX PSM specification

Supported mzIdentML Features

  • Compressed files: Supports both .mzid and .mzid.gz formats
  • Modifications: Full support for UNIMOD and custom modifications
  • Scores: Extracts all CV-term scores with higher_better flag annotation
  • Decoy detection: Automatic detection via isDecoy attribute
  • Multi-file support: Handles mzIdentML referencing multiple spectra files

Best Practices

  • Use --mzml-folder when mzIdentML references multiple mzML files
  • Ensure mzML file names match those referenced in mzIdentML (case-insensitive)
  • Use compressed .mzid.gz files to save disk space
  • Enable --spectral-data only when spectral arrays are needed for downstream analysis

Common Issues

Issue: No spectra attached from mzML folder

  • Solution: Verify mzML file names match run_file_name in mzIdentML

Issue: zlib errors when reading mzML.gz files

  • Solution: Decompress mzML.gz files or re-download if corrupted

Issue: Scan numbers not extracted correctly

  • Solution: Check if your native ID format is supported; the converter auto-detects common formats

sdrf

Convert SDRF metadata files to QPX sample and run parquet format.

Description

Reads a Sample and Data Relationship Format (SDRF) file and produces the QPX sample and run data structures as Parquet files. 

Parameters

Parameter Type Required Default Description
--sdrf-file FILE Yes - SDRF metadata file path
--output-folder DIRECTORY Yes - Output directory for generated QPX files
--output-prefix TEXT No sdrf Prefix for output file names
--compression TEXT No zstd Parquet compression codec.
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example

Convert SDRF metadata with default settings:

qpxc convert sdrf \
    --sdrf-file metadata.sdrf.tsv \
    --output-folder ./qpx_output

Output Files

  • Sample: {output-prefix}-{uuid}.sample.parquet
  • Run: {output-prefix}-{uuid}.run.parquet
  • Format: Parquet files containing sample and run metadata
  • Schema: Conforms to QPX sample and run specifications

Best Practices

  • Ensure SDRF file follows the PRIDE SDRF specifications
  • Use verbose mode to diagnose parsing issues
  • The converter automatically maps SDRF characteristics to QPX ontology terms