
Convert Commands

Convert various mass spectrometry data formats to the QPX standard format.

Overview

The convert command group provides converters for multiple proteomics software outputs, enabling standardization of data formats for downstream analysis. All commands generate parquet-format output files following the QPX specification.

Available Commands


diann

Convert DIA-NN report files to QPX feature format.

Description

This command takes a DIA-NN report file and converts it to the QPX parquet format. The conversion includes feature data and can optionally split the output into multiple files based on specified fields.

Parameters

Parameter Type Required Default Description
--report-path FILE Yes - DIA-NN report file path
--qvalue-threshold FLOAT Yes 0.05 Q-value threshold for filtering
--mzml-info-folder DIRECTORY Yes - mzML info file folder
--sdrf-path FILE Yes - SDRF file path for metadata
--output-folder DIRECTORY Yes - Output directory for generated files
--protein-file FILE No - Protein file with specific requirements
--output-prefix TEXT No - Prefix for output files
--partitions TEXT No - Field(s) for splitting files (comma-separated)
--duckdb-max-memory TEXT No - Maximum memory for DuckDB (e.g., '4GB')
--duckdb-threads INTEGER No - Number of threads for DuckDB
--batch-size INTEGER No 100 Number of files to process simultaneously
--verbose FLAG No - Enable verbose logging
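Before choosing a value for --qvalue-threshold, it can help to preview how many precursors a given cutoff keeps. The sketch below is a minimal Python example, assuming a tab-separated (optionally gzip-compressed) DIA-NN 1.8.x report with a `Q.Value` column. Which q-value columns qpxc actually filters on is not stated on this page, so treat the counts as a rough preview rather than the converter's exact behaviour. The function name `count_passing` is hypothetical.

```python
import csv
import gzip


def count_passing(report_path: str, threshold: float = 0.05,
                  column: str = "Q.Value") -> tuple[int, int]:
    """Return (rows with q-value <= threshold, total rows) for a DIA-NN report TSV."""
    # DIA-NN reports are often shipped gzip-compressed (e.g. diann_report.tsv.gz).
    opener = gzip.open if report_path.endswith(".gz") else open
    passing = total = 0
    with opener(report_path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            total += 1
            if float(row[column]) <= threshold:
                passing += 1
    return passing, total
```

For example, comparing `count_passing("report.tsv", 0.01)` with `count_passing("report.tsv", 0.05)` shows how much data a stricter threshold removes.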

Usage Examples

Basic Example

Convert a DIA-NN report with default settings:

qpxc convert diann \
    --report-path report.tsv \
    --qvalue-threshold 0.05 \
    --mzml-info-folder ./mzml_info \
    --sdrf-path data.sdrf.tsv \
    --output-folder ./output

Test Data Example

Convert the small DIA-NN example dataset from the repository's test suite:

qpxc convert diann \
    --report-path tests/examples/diann/small/diann_report.tsv \
    --qvalue-threshold 0.05 \
    --mzml-info-folder tests/examples/diann/small/mzml \
    --sdrf-path tests/examples/diann/small/PXD019909-DIA.sdrf.tsv \
    --output-folder ./output

Advanced Example with Partitioning

Convert with file partitioning based on reference_file_name:

qpxc convert diann \
    --report-path tests/examples/diann/full/diann_report.tsv.gz \
    --qvalue-threshold 0.01 \
    --mzml-info-folder tests/examples/diann/full/mzml \
    --sdrf-path tests/examples/diann/full/PXD036609.sdrf.tsv \
    --output-folder ./output \
    --partitions reference_file_name \
    --duckdb-max-memory 8GB \
    --duckdb-threads 4 \
    --verbose

Output Files

  • Output: {output-prefix}-{uuid}.feature.parquet
  • Format: Parquet file containing feature-level quantification data
  • Schema: Conforms to QPX feature specification
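Because every converter names its output `{output-prefix}-{uuid}.<type>.parquet` with a random UUID, downstream scripts cannot hard-code the file name. A minimal Python sketch for locating the generated files, assuming the UUID is written in the standard 8-4-4-4-12 hexadecimal form. `find_outputs` is a hypothetical helper, not part of qpxc, and partitioned runs may produce names this pattern does not cover.

```python
import re
from pathlib import Path

# Canonical 8-4-4-4-12 hexadecimal UUID (assumed format of the {uuid} part).
UUID_RE = r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"


def find_outputs(folder: str, prefix: str, kind: str) -> list[Path]:
    """Return files in `folder` named '{prefix}-{uuid}.{kind}.parquet' (kind: 'feature', 'pg', 'psm')."""
    pattern = re.compile(rf"^{re.escape(prefix)}-{UUID_RE}\.{re.escape(kind)}\.parquet$")
    return sorted(p for p in Path(folder).iterdir() if p.is_file() and pattern.match(p.name))
```

For instance, `find_outputs("./output", "protein_groups", "pg")` would list the file written by the diann-pg high-performance example.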

Common Issues

Issue: Out of memory errors with large files

  • Solution: Increase --duckdb-max-memory parameter (e.g., 8GB, 16GB)

Issue: Slow processing

  • Solution: Increase --duckdb-threads to utilize more CPU cores

Issue: Missing mzML info files

  • Solution: Ensure all mzML info TSV files are in the specified folder with correct naming

Best Practices

  • Use a Q-value threshold of 0.05 or lower for high-confidence results
  • Enable partitioning for large datasets to improve memory usage
  • Use verbose mode during initial testing to diagnose issues
  • Ensure the SDRF file correctly matches the sample names in the DIA-NN report
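The SDRF check can be run before a long conversion. The sketch below compares run names from the SDRF `comment[data file]` column (extension removed) with the `Run` column of the DIA-NN report. It assumes both files are tab-separated, optionally gzip-compressed, and that DIA-NN's `Run` values are raw-file names without extension. The helper names are hypothetical; this is a pre-flight check, not qpxc's own validation.

```python
import csv
import gzip
from pathlib import PurePath


def _open_tsv(path: str):
    # Accept plain or gzip-compressed TSV files.
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, newline="")


def sdrf_runs(sdrf_path: str) -> set[str]:
    """File stems from the SDRF 'comment[data file]' column (e.g. 'run1.raw' -> 'run1')."""
    with _open_tsv(sdrf_path) as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Column headers differ in capitalisation between SDRF files, so compare case-insensitively.
        column = next((c for c in reader.fieldnames or []
                       if c.strip().lower() == "comment[data file]"), None)
        if column is None:
            raise KeyError(f"{sdrf_path} has no 'comment[data file]' column")
        return {PurePath(row[column]).stem for row in reader if row[column]}


def report_runs(report_path: str) -> set[str]:
    """Distinct values of the 'Run' column of a DIA-NN report."""
    with _open_tsv(report_path) as handle:
        return {row["Run"] for row in csv.DictReader(handle, delimiter="\t")}


def compare_runs(sdrf_path: str, report_path: str) -> tuple[set[str], set[str]]:
    """Return (report runs missing from the SDRF, SDRF runs missing from the report)."""
    sdrf, report = sdrf_runs(sdrf_path), report_runs(report_path)
    return report - sdrf, sdrf - report
```

`compare_runs("data.sdrf.tsv", "report.tsv")` returns two sets; both should be empty when the files agree.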

diann-pg

Convert DIA-NN report files to QPX protein group format.

Description

This command takes a DIA-NN report file together with its protein group matrix (pg_matrix) and converts them to the QPX protein group format, written as parquet.

Parameters

Parameter Type Required Default Description
--report-path FILE Yes - DIA-NN report file path
--pg-matrix-path FILE Yes - DIA-NN protein quantities table file path
--sdrf-path FILE Yes - SDRF file path for metadata
--output-folder DIRECTORY Yes - Output directory for generated files
--output-prefix TEXT No - Prefix for output files
--duckdb-max-memory TEXT No - Maximum memory for DuckDB (e.g., '4GB')
--duckdb-threads INTEGER No - Number of threads for DuckDB
--batch-size INTEGER No 100 Number of files to process simultaneously
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example

Convert DIA-NN protein groups with default settings:

qpxc convert diann-pg \
    --report-path report.tsv \
    --pg-matrix-path report.pg_matrix.tsv \
    --sdrf-path data.sdrf.tsv \
    --output-folder ./output

High-Performance Example

qpxc convert diann-pg \
    --report-path tests/examples/diann/full/diann_report.tsv.gz \
    --pg-matrix-path tests/examples/diann/full/diann_report.pg_matrix.tsv \
    --sdrf-path tests/examples/diann/full/PXD036609.sdrf.tsv \
    --output-folder ./output \
    --duckdb-max-memory 16GB \
    --duckdb-threads 8 \
    --output-prefix protein_groups \
    --verbose

Output Files

  • Output: {output-prefix}-{uuid}.pg.parquet
  • Format: Parquet file containing protein group quantification data
  • Schema: Conforms to QPX protein group specification

Best Practices

  • Ensure both report and pg_matrix files are from the same DIA-NN run
  • Use adequate memory allocation for large datasets
  • Validate SDRF metadata matches the sample columns in the matrix file
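To check the last two points, the run columns of the pg_matrix can be read straight from its header. A minimal sketch, assuming the DIA-NN 1.8.x layout in which a fixed set of annotation columns is followed by one column per run, headed by the full raw-file path. `matrix_runs` and `ANNOTATION_COLUMNS` are hypothetical names, and the column set may need adjusting for other DIA-NN versions.

```python
import csv
import ntpath
from pathlib import PurePath

# Annotation columns preceding the per-run quantity columns in a DIA-NN 1.8.x
# pg_matrix.tsv (an assumption; adjust for other DIA-NN versions).
ANNOTATION_COLUMNS = {
    "Protein.Group", "Protein.Ids", "Protein.Names", "Genes", "First.Protein.Description",
}


def matrix_runs(pg_matrix_path: str) -> set[str]:
    """Run names derived from the per-run column headers of a DIA-NN pg_matrix.tsv."""
    with open(pg_matrix_path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    # Run headers are full file paths; ntpath.basename splits on both backslashes
    # and forward slashes, so Windows and POSIX paths are handled alike.
    return {
        PurePath(ntpath.basename(column)).stem
        for column in header
        if column not in ANNOTATION_COLUMNS
    }
```

Compare the result with the distinct `Run` values of report.tsv and with the file names listed in the SDRF; any difference suggests the files come from different DIA-NN runs or that the metadata is incomplete.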

maxquant-psm

Convert MaxQuant PSM data from msms.txt to QPX parquet format.

Description

This command takes a MaxQuant msms.txt file and converts it to the QPX parquet format for PSM data.

Parameters

Parameter Type Required Default Description
--msms-file FILE Yes - MaxQuant msms.txt file
--output-folder DIRECTORY Yes - Output folder
--batch-size INTEGER No 100000 Batch size
--output-prefix TEXT No - Output file prefix
--spectral-data FLAG No - Include spectral data fields
--n-workers INTEGER No - Number of parallel workers
--memory-limit FLOAT No - Memory limit in GB
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example

Convert MaxQuant PSM data using 8 parallel workers and a 16 GB memory limit:

qpxc convert maxquant-psm \
    --msms-file msms.txt \
    --output-folder ./output \
    --n-workers 8 \
    --memory-limit 16

With Spectral Data

qpxc convert maxquant-psm \
    --msms-file tests/examples/maxquant/maxquant_simple/msms.txt \
    --output-folder ./output \
    --spectral-data \
    --batch-size 500000 \
    --output-prefix psm_with_spectra \
    --verbose

Output Files

  • Output: {output-prefix}-{uuid}.psm.parquet
  • Format: Parquet file containing PSM-level data
  • Schema: Conforms to QPX PSM specification

Best Practices

  • Adjust --batch-size based on available memory
  • Use --spectral-data flag if downstream analysis requires spectral information
  • Ensure sufficient disk space for large msms.txt files
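One way to pick a starting --batch-size is to measure the average row size of msms.txt and divide a memory budget by it. The sketch below does that, assuming --batch-size counts rows. The `overhead` factor of 5 (how much more memory a parsed row needs than its on-disk text) is a hypothetical placeholder, not a measured qpxc figure, so treat the result as an order-of-magnitude guess. `estimate_batch_size` is a hypothetical helper.

```python
import itertools


def estimate_batch_size(path: str, memory_budget_gb: float,
                        overhead: float = 5.0, sample_rows: int = 10_000) -> int:
    """Rough rows-per-batch that fit in `memory_budget_gb` for a tab-separated MaxQuant table.

    `overhead` is a hypothetical in-memory/on-disk size ratio; measure it on
    your own data if precision matters.
    """
    with open(path, "rb") as handle:
        handle.readline()  # skip the header line
        sample = list(itertools.islice(handle, sample_rows))
    if not sample:
        raise ValueError(f"{path} has no data rows")
    bytes_per_row = sum(len(line) for line in sample) / len(sample)
    budget_bytes = memory_budget_gb * 1024**3
    return max(1, int(budget_bytes / (bytes_per_row * overhead)))
```

For instance, `estimate_batch_size("msms.txt", 4)` suggests a row count for a budget of roughly 4 GB; lower it if the conversion still runs out of memory.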

maxquant-feature

Convert MaxQuant feature data from evidence.txt to QPX parquet format.

Description

This command takes a MaxQuant evidence.txt file and converts it to the QPX parquet format for feature data, using metadata from an SDRF file.

Parameters

Parameter Type Required Default Description
--evidence-file FILE Yes - MaxQuant evidence.txt file
--sdrf-file FILE Yes - SDRF metadata file
--output-folder DIRECTORY Yes - Output folder
--protein-file FILE No - Protein list file for filtering
--protein-groups-file FILE Yes - MaxQuant proteinGroups.txt file
--batch-size INTEGER No 100000 Batch size
--output-prefix TEXT No - Output file prefix
--n-workers INTEGER No - Number of parallel workers
--memory-limit FLOAT No - Memory limit in GB
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example

Convert MaxQuant feature data using 8 parallel workers and a 16 GB memory limit:

qpxc convert maxquant-feature \
    --evidence-file evidence.txt \
    --sdrf-file data.sdrf.tsv \
    --protein-groups-file proteinGroups.txt \
    --output-folder ./output \
    --n-workers 8 \
    --memory-limit 16

With Protein Groups Q-value Mapping

qpxc convert maxquant-feature \
    --evidence-file tests/examples/maxquant/maxquant_full/evidence.txt.gz \
    --sdrf-file tests/examples/maxquant/maxquant_full/PXD001819.sdrf.tsv \
    --protein-groups-file tests/examples/maxquant/maxquant_full/proteinGroups.txt \
    --output-folder ./output \
    --batch-size 500000 \
    --verbose

Output Files

  • Output: {output-prefix}-{uuid}.feature.parquet
  • Format: Parquet file containing feature-level quantification
  • Schema: Conforms to QPX feature specification

Common Issues

Issue: Memory errors with compressed evidence files

  • Solution: Reduce --batch-size or increase available RAM

Issue: Missing Q-value information

  • Solution: Provide --protein-groups-file for accurate Q-value mapping

Best Practices

  • Always provide --protein-groups-file when available for better data quality
  • Ensure SDRF sample names match MaxQuant experiment names
  • Use compressed evidence files (.gz) to save disk space
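A quick pre-flight check for the SDRF bullet is to list the raw files that appear in evidence.txt but have no matching entry in the SDRF `comment[data file]` column. The sketch compares raw-file names rather than experiment names, assuming that MaxQuant's `Raw file` column holds names without extension and that the SDRF lists files with an extension. The function names are hypothetical.

```python
import csv
import gzip
from pathlib import PurePath


def _open_tsv(path: str):
    # MaxQuant tables are often shipped gzip-compressed (e.g. evidence.txt.gz).
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, newline="")


def evidence_raw_files(evidence_path: str) -> dict[str, str]:
    """Map each 'Raw file' in evidence.txt to its 'Experiment' value ('' if that column is absent)."""
    with _open_tsv(evidence_path) as handle:
        return {row["Raw file"]: row.get("Experiment") or ""
                for row in csv.DictReader(handle, delimiter="\t")}


def unmatched_raw_files(evidence_path: str, sdrf_path: str) -> dict[str, str]:
    """Raw files (with experiment) present in evidence.txt but not in the SDRF."""
    with _open_tsv(sdrf_path) as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        column = next((c for c in reader.fieldnames or []
                       if c.strip().lower() == "comment[data file]"), None)
        if column is None:
            raise KeyError(f"{sdrf_path} has no 'comment[data file]' column")
        sdrf_stems = {PurePath(row[column]).stem for row in reader if row[column]}
    return {raw: exp for raw, exp in evidence_raw_files(evidence_path).items()
            if raw not in sdrf_stems}
```

An empty dict from `unmatched_raw_files("evidence.txt.gz", "data.sdrf.tsv")` means every raw file is covered; otherwise the returned experiment names help locate the missing SDRF rows.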

maxquant-pg

Convert MaxQuant protein groups from proteinGroups.txt to QPX format.

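Description

This command takes a MaxQuant proteinGroups.txt file and converts it to the QPX protein group parquet format, using the matching evidence.txt file and metadata from an SDRF file.

Parameters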

Parameter Type Required Default Description
--protein-groups-file FILE Yes - MaxQuant proteinGroups.txt file
--sdrf-file FILE Yes - SDRF metadata file
--evidence-file FILE Yes - MaxQuant evidence.txt file
--output-folder DIRECTORY Yes - Output folder
--batch-size INTEGER No 10000 Batch size
--output-prefix TEXT No - Output file prefix
--n-workers INTEGER No - Number of parallel workers
--memory-limit FLOAT No - Memory limit in GB
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example

Convert MaxQuant protein groups using 8 parallel workers and a 16 GB memory limit:

qpxc convert maxquant-pg \
    --protein-groups-file proteinGroups.txt \
    --sdrf-file data.sdrf.tsv \
    --evidence-file evidence.txt \
    --output-folder ./output \
    --n-workers 8 \
    --memory-limit 16

Output Files

  • Output: {output-prefix}-{uuid}.pg.parquet
  • Format: Parquet file containing protein group data
  • Schema: Conforms to QPX protein group specification

fragpipe

Convert FragPipe PSM data to QPX parquet format.

Description

Transforms FragPipe PSM results from psm.tsv format into the standardized QPX parquet format for downstream analysis and integration.

Parameters

Parameter Type Required Default Description
--msms-file FILE Yes - FragPipe psm.tsv file from which peptide information is extracted
-o, --output-folder DIRECTORY Yes - Folder where the parquet file will be generated
-b, --batch-size INTEGER No 1000000 Read batch size
--output-prefix TEXT No - Prefix for the output parquet file name

Usage Examples

Basic Example

Convert FragPipe PSM data with the default batch size and a custom output prefix:

qpxc convert fragpipe \
    --msms-file psm.tsv \
    --output-folder ./output \
    --batch-size 1000000 \
    --output-prefix fragpipe_psm

With Custom Settings

qpxc convert fragpipe \
    --msms-file /path/to/psm.tsv \
    --output-folder ./output \
    --batch-size 500000 \
    --output-prefix fragpipe_psm

Output Files

  • Output: {output-prefix}-{uuid}.psm.parquet
  • Format: Parquet file containing PSM data
  • Schema: Conforms to QPX PSM specification

quantms-psm

Convert mzTab PSM data to QPX parquet format.

Description

Converts PSM data from mzTab format to the QPX standardized parquet format. Can work with existing DuckDB indexes or create new ones from mzTab files.

Parameters

Parameter Type Required Default Description
--mztab-path FILE No - Input mzTab file path (required when creating a new database)
--database-path FILE No - DuckDB database file path (opened if it exists; otherwise created when --mztab-path is provided)
--output-folder DIRECTORY Yes - Output directory for generated files
--output-prefix TEXT No psm Prefix for output files (final name will be {prefix}-{uuid}.psm.parquet)
--spectral-data FLAG No - Include spectral data fields
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example

Convert PSM data with default settings:

qpxc convert quantms-psm \
    --mztab-path /path/to/data.mzTab \
    --output-folder ./output \
    --verbose

Use Existing Database

qpxc convert quantms-psm \
    --database-path ./existing_database.duckdb \
    --output-folder ./output \
    --spectral-data

Output Files

  • Output: {output-prefix}-{uuid}.psm.parquet
  • Format: Parquet file containing PSM data
  • Schema: Conforms to QPX PSM specification

Best Practices

  • Reuse database files when processing multiple outputs from the same mzTab
  • Use --spectral-data flag when spectral information is needed for downstream analysis
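The reuse pattern follows from the --database-path description: pass --mztab-path only on the first run, when the database file does not yet exist, and the database path alone afterwards. A minimal Python sketch that builds the argument list accordingly; `quantms_psm_command` is a hypothetical wrapper that relies only on the flags documented above.

```python
from pathlib import Path
from typing import Optional


def quantms_psm_command(output_folder: str, database_path: str,
                        mztab_path: Optional[str] = None,
                        spectral_data: bool = False) -> list[str]:
    """Build `qpxc convert quantms-psm` arguments that create the DuckDB database once and reuse it."""
    cmd = ["qpxc", "convert", "quantms-psm",
           "--database-path", database_path,
           "--output-folder", output_folder]
    if not Path(database_path).exists():
        # First run: the database is created from the mzTab file, so the mzTab path is needed.
        if mztab_path is None:
            raise ValueError("mztab_path is required when the database does not exist yet")
        cmd += ["--mztab-path", mztab_path]
    if spectral_data:
        cmd.append("--spectral-data")
    return cmd
```

Passing the result to `subprocess.run(..., check=True)` runs the conversion, and later calls with the same database path skip re-indexing the mzTab. The --mztab-path and --database-path options of quantms-feature and quantms-pg carry the same description, so a similar wrapper would apply there.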

quantms-feature

Convert mzTab feature data to QPX parquet format.

Description

Converts feature-level quantification data from mzTab format to the QPX standardized format, including MSstats quantification data.

Parameters

Parameter Type Required Default Description
--mztab-path FILE No - Input mzTab file path (required when creating a new database)
--database-path FILE No - DuckDB database file path (opened if it exists; otherwise created when --mztab-path is provided)
--output-folder DIRECTORY Yes - Output directory for generated files
--output-prefix TEXT No feature Prefix for output files (final name will be {prefix}-{uuid}.feature.parquet)
--sdrf-file FILE Yes - SDRF file path
--msstats-file FILE Yes - MSstats input file path
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example

Convert feature data with default settings:

qpxc convert quantms-feature \
    --mztab-path /path/to/data.mzTab \
    --sdrf-file /path/to/metadata.sdrf.tsv \
    --msstats-file /path/to/msstats_in.csv \
    --output-folder ./output

Output Files

  • Output: {output-prefix}-{uuid}.feature.parquet
  • Format: Parquet file containing feature quantification
  • Schema: Conforms to QPX feature specification

quantms-pg

Convert mzTab protein group data to QPX parquet format.

Description

This command combines protein group definitions from mzTab with complete quantification data from msstats_in.csv. Supports both TMT and LFQ data, with optional TopN and iBAQ intensity calculations.
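For orientation, the sketch below implements the textbook definitions of the two intensities: TopN as the mean of the N most intense peptides of a protein (N being what --topn sets), and iBAQ as the summed peptide intensity divided by the number of theoretically observable peptides, here fully cleaved tryptic peptides of 6 to 30 residues. These are common conventions, not a description of qpxc's implementation, whose enzyme rules, length bounds, and handling of proteins with fewer than N peptides may differ. All function names are hypothetical.

```python
import re
from statistics import mean


def topn_intensity(peptide_intensities: list[float], n: int = 3) -> float:
    """Mean of the n most intense peptides (all of them if fewer than n are available)."""
    if n < 1 or not peptide_intensities:
        raise ValueError("need n >= 1 and at least one peptide intensity")
    return mean(sorted(peptide_intensities, reverse=True)[:n])


def tryptic_peptides(sequence: str) -> list[str]:
    """In-silico trypsin digest: cleave after K or R unless followed by P, no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def ibaq_intensity(peptide_intensities: list[float], protein_sequence: str,
                   min_len: int = 6, max_len: int = 30) -> float:
    """Summed intensity divided by the number of theoretically observable tryptic peptides."""
    observable = sum(min_len <= len(p) <= max_len
                     for p in tryptic_peptides(protein_sequence))
    if observable == 0:
        raise ValueError("protein has no observable tryptic peptides")
    return sum(peptide_intensities) / observable
```

Dividing by the number of observable peptides is what makes iBAQ values roughly comparable between proteins of different lengths, whereas TopN depends only on the strongest few peptides.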

Parameters

Parameter Type Required Default Description
--mztab-path FILE No - Input mzTab file path (required when creating a new database)
--database-path FILE No - DuckDB database file path (opened if it exists; otherwise created when --mztab-path is provided)
--msstats-file FILE Yes - Input msstats_in.csv file path for quantification
--sdrf-file FILE Yes - SDRF file path
--output-folder DIRECTORY Yes - Output directory for generated files
--output-prefix TEXT No pg Prefix for output files (final name will be {prefix}-{uuid}.pg.parquet)
--compute-topn FLAG No - Whether to compute TopN intensity
--compute-ibaq / --no-compute-ibaq FLAG No - Whether to compute iBAQ intensity
--topn INTEGER No 3 Number of peptides to use for TopN intensity
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example

Convert protein groups and compute TopN (N = 3) and iBAQ intensities:

qpxc convert quantms-pg \
    --mztab-path /path/to/data.mzTab \
    --msstats-file /path/to/msstats_in.csv \
    --sdrf-file /path/to/metadata.sdrf.tsv \
    --output-folder ./output \
    --compute-topn \
    --compute-ibaq \
    --topn 3

LFQ Data with All Intensities

qpxc convert quantms-pg \
    --mztab-path tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_openms.mzTab.gz \
    --msstats-file tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_msstats_in.csv.gz \
    --sdrf-file tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf.tsv \
    --output-folder ./output \
    --compute-topn \
    --compute-ibaq \
    --topn 3

TMT Data (Skip iBAQ)

qpxc convert quantms-pg \
    --mztab-path tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_openms.mzTab.gz \
    --msstats-file tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_msstats_in.csv.gz \
    --sdrf-file tests/examples/quantms/dda-plex-full/PXD007683-TMT.sdrf.tsv \
    --output-folder ./output \
    --no-compute-ibaq

Output Files

  • Output: {output-prefix}-{uuid}.pg.parquet
  • Format: Parquet file containing protein group quantification
  • Schema: Conforms to QPX protein group specification

Best Practices

  • Use --no-compute-ibaq for TMT/iTRAQ labeled data
  • Adjust --topn value based on dataset characteristics (typically 3-5)
  • Enable verbose mode for large datasets to monitor progress

idxml

Convert a single OpenMS idXML file to QPX PSM format.

Description

Converts PSM data from OpenMS idXML format to the QPX standardized parquet format. Can optionally attach spectral information from corresponding mzML files.

Parameters

Parameter Type Required Default Description
--idxml-file TEXT Yes - OpenMS idXML file containing identifications
--output-folder TEXT Yes - Folder where the parquet file will be generated
--mzml-file TEXT No - Optional mzML file used to attach spectra by scan
--output-prefix-file TEXT No - Prefix for the output parquet file name
--spectral-data FLAG No - Include spectral data fields

Usage Examples

Basic Example

Convert a single idXML file:

qpxc convert idxml \
    --idxml-file /path/to/data.idXML \
    --output-folder ./output

With Spectral Data

qpxc convert idxml \
    --idxml-file tests/examples/idxml/SF_200217_pPeptideLibrary_pool1_HCDnlETcaD_OT_rep2_consensus_fdr_pep_luciphor.idXML \
    --mzml-file tests/examples/idxml/SF_200217_pPeptideLibrary_pool1_HCDnlETcaD_OT_rep1.mzML \
    --output-folder ./output \
    --spectral-data \
    --output-prefix-file idxml_psm_with_spectra

Output Files

  • Output: {output-prefix-file}-{uuid}.psm.parquet
  • Format: Parquet file containing PSM data
  • Schema: Conforms to QPX PSM specification

idxml-batch

Convert multiple OpenMS idXML files to a single merged PSM parquet file.

Description

Batch converts multiple idXML files and merges them into a single QPX PSM parquet file. Supports both folder-based and file-list-based input, with flexible mzML matching strategies.

Parameters

Parameter Type Required Default Description
--idxml-folder DIRECTORY No - Folder containing idXML files to convert
--idxml-files TEXT No - Comma-separated list of idXML file paths
--output-folder TEXT Yes - Folder where the merged parquet file will be generated
--output-prefix-file TEXT No merged-psm Prefix for the output parquet file name
--mzml-folder DIRECTORY No - Optional folder containing mzML files used to attach spectra by scan
--mzml-files TEXT No - Comma-separated list of mzML file paths
--verbose FLAG No - Enable verbose logging

Usage Examples

Basic Example

Convert multiple idXML files:

qpxc convert idxml-batch \
    --idxml-folder ./idxml_files \
    --output-folder ./output \
    --output-prefix-file batch_psm

File List with Index Matching

qpxc convert idxml-batch \
    --idxml-files file1.idXML,file2.idXML,file3.idXML \
    --mzml-files file1.mzML,file2.mzML,file3.mzML \
    --output-folder ./output \
    --verbose

Folder with Basename Matching

qpxc convert idxml-batch \
    --idxml-folder ./idxml_files \
    --mzml-folder ./mzml_files \
    --output-folder ./output \
    --output-prefix-file merged_with_spectra \
    --verbose

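Matching Strategies

The command supports three mzML matching strategies:

  1. Folder-Folder (--idxml-folder with --mzml-folder): matches files by basename (filename without extension)
  2. List-List (--idxml-files with --mzml-files): matches files by their position in the two lists (index-based), so both lists must be in the same order
  3. Folder-List (a folder for one file type and a comma-separated list for the other): matches files by basename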
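The two pairing rules can be sketched in a few lines of Python: basename matching (used by the Folder-Folder and Folder-List strategies) and index matching (used by List-List). This illustrates the strategies as described, not qpxc's code; in particular, whether qpxc skips or rejects idXML files without a partner is not specified here, and the function names are hypothetical.

```python
from pathlib import Path


def match_by_basename(idxml_files: list[str], mzml_files: list[str]) -> dict[str, str]:
    """Pair each idXML with the mzML sharing its stem; idXML files without a partner are left out."""
    mzml_by_stem = {Path(m).stem: m for m in mzml_files}
    return {i: mzml_by_stem[Path(i).stem]
            for i in idxml_files if Path(i).stem in mzml_by_stem}


def match_by_index(idxml_files: list[str], mzml_files: list[str]) -> dict[str, str]:
    """Pair the n-th idXML with the n-th mzML; both lists must have the same length."""
    if len(idxml_files) != len(mzml_files):
        raise ValueError(f"{len(idxml_files)} idXML files but {len(mzml_files)} mzML files")
    return dict(zip(idxml_files, mzml_files))
```

Printing the pairs from `match_by_index` before a List-List run is a cheap way to confirm the file order, and comparing the size of the `match_by_basename` result with the number of idXML files reveals naming mismatches.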

Output Files

  • Output: {output-prefix-file}-{uuid}.psm.parquet
  • Format: Single merged parquet file containing PSM data from all inputs
  • Schema: Conforms to QPX PSM specification

Best Practices

  • Use verbose mode to monitor matching and conversion progress
  • Ensure consistent naming when using basename matching
  • Verify file order when using index-based matching
  • Check that the temporary directory has sufficient space for large batches