Convert Commands¶
Convert various mass spectrometry data formats to the QPX standard format.
Overview¶
The convert command group provides converters for the outputs of several proteomics tools, standardizing their data formats for downstream analysis. All commands write Parquet output files that follow the QPX specification.
Available Commands¶
- quantms - Convert QuantMS mzTab output to QPX format
- diann - Convert DIA-NN report to QPX format
- maxquant - Convert MaxQuant output to QPX format
- fragpipe - Convert FragPipe output to QPX format
- mzidentml - Convert mzIdentML files to QPX PSM format
- sdrf - Convert SDRF to sample and run parquet files
quantms¶
Convert QuantMS mzTab output to QPX format.
Description¶
Reads a QuantMS-produced mzTab file (with optional MSstats quantification) and writes QPX Parquet files for the requested data structures.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --mztab-path | FILE | Yes | - | Input mzTab file path |
| --sdrf-file | FILE | Yes | - | SDRF metadata file path |
| --msstats-file | FILE | No | - | MSstats input file path (required for feature and pg) |
| --output-folder | DIRECTORY | Yes | - | Output directory for generated QPX files |
| --output-prefix | TEXT | No | quantms | Prefix for output file names |
| --structures | TEXT | No | psm,feature,pg | Comma-separated list of structures to produce (psm, feature, pg). Default: all. |
| --database-path | FILE | No | - | DuckDB database file path (reuse existing or create new) |
| --project-accession | TEXT | No | - | PRIDE / ProteomeXchange accession (e.g. PXD020192) |
| --enrich-pride | FLAG | No | - | Fetch project metadata from PRIDE API after conversion |
| --compression | TEXT | No | zstd | Parquet compression codec. |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert QuantMS data with default settings:
# Convert everything (PSM + feature + protein groups)
qpxc convert quantms \
--mztab-path data.mzTab \
--sdrf-file metadata.sdrf.tsv \
--msstats-file msstats_in.csv \
--output-folder ./qpx_output
# Convert only PSMs
qpxc convert quantms \
--mztab-path data.mzTab \
--sdrf-file metadata.sdrf.tsv \
--output-folder ./qpx_output \
--structures psm
PSM Data Only¶
qpxc convert quantms \
--mztab-path tests/examples/quantms/dda-lfq-small/PXD007683-LFQ.sdrf_openms_design_openms.mzTab \
--sdrf-file tests/examples/quantms/dda-lfq-small/PXD007683-LFQ.sdrf.tsv \
--output-folder ./output \
--structures psm \
--output-prefix quantms_psm
Feature Data with MSstats¶
qpxc convert quantms \
--mztab-path tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_openms.mzTab.gz \
--msstats-file tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_msstats_in.csv.gz \
--sdrf-file tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf.tsv \
--output-folder ./output \
--structures feature \
--verbose
All Structures¶
qpxc convert quantms \
--mztab-path tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_openms.mzTab.gz \
--msstats-file tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_msstats_in.csv.gz \
--sdrf-file tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf.tsv \
--output-folder ./output \
--structures psm,feature,pg \
--verbose
TMT Data¶
qpxc convert quantms \
--mztab-path tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_openms.mzTab.gz \
--msstats-file tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_msstats_in.csv.gz \
--sdrf-file tests/examples/quantms/dda-plex-full/PXD007683-TMT.sdrf.tsv \
--output-folder ./output \
--structures pg
Output Files¶
Depending on the --structures parameter:
- PSM: {output-prefix}-{uuid}.psm.parquet
- Feature: {output-prefix}-{uuid}.feature.parquet
- Protein Group: {output-prefix}-{uuid}.pg.parquet
All files are in Parquet format and conform to their respective QPX specifications.
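The naming pattern above can be used to pick out converter outputs in a folder. The Python sketch below is not part of qpxc: `parse_output_name` is a hypothetical helper, and it assumes the `{uuid}` part is a canonical hyphenated UUID (8-4-4-4-12 hexadecimal digits), which the pattern above does not confirm.

```python
import re

# Assumed layout: "{output-prefix}-{uuid}.{structure}.parquet".
# The UUID shape (8-4-4-4-12 hex digits) is an assumption for illustration.
OUTPUT_NAME = re.compile(
    r"^(?P<prefix>.+)-(?P<uuid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
    r"\.(?P<structure>psm|feature|pg)\.parquet$"
)


def parse_output_name(name):
    """Split an output file name into prefix, uuid, and structure.

    Returns a dict with those three keys, or None when the name does not
    follow the assumed pattern.
    """
    match = OUTPUT_NAME.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    example = "quantms-123e4567-e89b-12d3-a456-426614174000.psm.parquet"
    print(parse_output_name(example))
```

Because the prefix group is matched greedily and then backtracked, prefixes that themselves contain hyphens (for example `my-run`) are still separated from the UUID correctly.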
Best Practices¶
- Use --structures to control which output files are generated
- Provide --msstats-file when converting feature or pg structures
- Reuse database files with --database-path when processing the same mzTab multiple times
- Enable verbose mode for large datasets to monitor progress
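The rule that feature and pg need an MSstats file can be checked before the converter is launched. The following is a minimal Python sketch of such a pre-flight check; `build_quantms_args` is a hypothetical wrapper, not a qpxc API, and it only assembles flags that appear in the parameter table above.

```python
VALID_STRUCTURES = ("psm", "feature", "pg")


def build_quantms_args(mztab, sdrf, output_folder,
                       structures=VALID_STRUCTURES, msstats=None):
    """Assemble an argument list for `qpxc convert quantms`.

    Raises ValueError when feature or pg is requested without an MSstats
    file, or when an unknown structure name is passed.
    """
    unknown = [s for s in structures if s not in VALID_STRUCTURES]
    if unknown:
        raise ValueError(f"unknown structures: {unknown}")
    if {"feature", "pg"} & set(structures) and msstats is None:
        raise ValueError("--msstats-file is required for feature and pg")

    args = [
        "qpxc", "convert", "quantms",
        "--mztab-path", mztab,
        "--sdrf-file", sdrf,
        "--output-folder", output_folder,
        "--structures", ",".join(structures),
    ]
    if msstats is not None:
        args += ["--msstats-file", msstats]
    return args


if __name__ == "__main__":
    # PSM-only conversion does not need the MSstats file.
    print(build_quantms_args("data.mzTab", "metadata.sdrf.tsv",
                             "./qpx_output", structures=("psm",)))
```

The returned list can be handed to `subprocess.run(args, check=True)`; keeping it as a list avoids shell quoting issues with paths that contain spaces.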
diann¶
Convert DIA-NN report files to QPX format.
Description¶
Reads a DIA-NN report.tsv file and converts feature-level quantification data into QPX Parquet format. When --pg-matrix-path is provided, also produces protein group output.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --report-path | FILE | Yes | - | DIA-NN report file path |
| --sdrf-file | FILE | Yes | - | SDRF metadata file path |
| --mzml-info-folder | DIRECTORY | No | - | Folder containing mzML info files (optional; scan/mz fields left empty if omitted) |
| --qvalue-threshold | FLOAT | No | 0.05 | Q-value threshold for filtering |
| --output-folder | DIRECTORY | Yes | - | Output directory for generated QPX files |
| --output-prefix | TEXT | No | - | Prefix for output file names |
| --pg-matrix-path | FILE | No | - | DIA-NN protein quantities matrix file (enables PG conversion) |
| --protein-file | FILE | No | - | Protein file for filtering |
| --partitions | TEXT | No | - | Field(s) for splitting output files (comma-separated) |
| --duckdb-max-memory | TEXT | No | - | Maximum memory for DuckDB engine (e.g., '4GB') |
| --duckdb-threads | INTEGER | No | - | Number of threads for DuckDB engine |
| --batch-size | INTEGER | No | 100 | Number of files to process simultaneously |
| --standardized-intensities | FLAG | No | - | Calculate standardized intensity metrics for PG output |
| --project-accession | TEXT | No | - | PRIDE / ProteomeXchange accession (e.g. PXD020192) |
| --enrich-pride | FLAG | No | - | Fetch project metadata from PRIDE API after conversion |
| --compression | TEXT | No | zstd | Parquet compression codec. |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example - Feature Data¶
Convert a DIA-NN report with default settings:
# Feature conversion
qpxc convert diann \
--report-path report.tsv \
--sdrf-file data.sdrf.tsv \
--mzml-info-folder ./mzml_info \
--output-folder ./qpx_output
# Feature + protein groups
qpxc convert diann \
--report-path report.tsv \
--sdrf-file data.sdrf.tsv \
--mzml-info-folder ./mzml_info \
--pg-matrix-path report.pg_matrix.tsv \
--output-folder ./qpx_output \
--standardized-intensities
Advanced Example with Partitioning¶
Convert with file partitioning based on run_file_name:
qpxc convert diann \
--report-path tests/examples/diann/full/diann_report.tsv.gz \
--qvalue-threshold 0.01 \
--mzml-info-folder tests/examples/diann/full/mzml \
--sdrf-file tests/examples/diann/full/PXD036609.sdrf.tsv \
--output-folder ./output \
--partitions run_file_name \
--duckdb-max-memory 8GB \
--duckdb-threads 4 \
--verbose
Protein Groups from PG Matrix¶
Convert DIA-NN protein groups using the pg_matrix file:
qpxc convert diann \
--report-path tests/examples/diann/full/diann_report.tsv.gz \
--pg-matrix-path tests/examples/diann/full/diann_report.pg_matrix.tsv \
--sdrf-file tests/examples/diann/full/PXD036609.sdrf.tsv \
--output-folder ./output \
--duckdb-max-memory 16GB \
--duckdb-threads 8 \
--verbose
Output Files¶
Depending on the inputs provided:
- Feature: {output-prefix}-{uuid}.feature.parquet
- Protein Group: {output-prefix}-{uuid}.pg.parquet (requires --pg-matrix-path)
Common Issues¶
Issue: Out of memory errors with large files
- Solution: Increase the --duckdb-max-memory parameter (e.g., 8GB, 16GB)
Issue: Slow processing
- Solution: Increase --duckdb-threads to utilize more CPU cores
Issue: Missing mzML info files
- Solution: Ensure all mzML info TSV files are in the specified folder with correct naming
Best Practices¶
- Use Q-value threshold of 0.05 or lower for high-confidence results
- Enable partitioning for large datasets to improve memory usage
- Use verbose mode during initial testing to diagnose issues
- Ensure SDRF file correctly matches sample names in DIA-NN report
- For protein groups, ensure both report and pg_matrix files are from the same DIA-NN run
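One way to check that the SDRF matches the DIA-NN report before converting is to compare run names from both files. The Python sketch below is only an illustration and not something qpxc provides. It assumes the report has a `Run` column, the SDRF has a `comment[data file]` column, and a run name equals the data file name without its last extension; verify these assumptions against your own files.

```python
import csv
from pathlib import PurePath


def run_names_from_sdrf(lines):
    """Collect data file stems from the SDRF 'comment[data file]' column."""
    reader = csv.reader(lines, delimiter="\t")
    header = [column.strip().lower() for column in next(reader)]
    index = header.index("comment[data file]")
    return {PurePath(row[index]).stem
            for row in reader if len(row) > index and row[index]}


def run_names_from_report(lines):
    """Collect distinct values of the report's 'Run' column."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["Run"] for row in reader if row.get("Run")}


def unmatched_runs(sdrf_lines, report_lines):
    """Return run names present in only one of the two files."""
    sdrf = run_names_from_sdrf(sdrf_lines)
    report = run_names_from_report(report_lines)
    return {
        "only_in_sdrf": sorted(sdrf - report),
        "only_in_report": sorted(report - sdrf),
    }


if __name__ == "__main__":
    sdrf = ["source name\tcomment[data file]", "s1\trunA.raw", "s2\trunB.raw"]
    report = ["File.Name\tRun", "/d/runA.raw\trunA", "/d/runC.raw\trunC"]
    print(unmatched_runs(sdrf, report))
```

Both functions accept any iterable of text lines, so an open file handle (for example from `open(path, newline="")` or `gzip.open(path, "rt")`) can be passed directly.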
maxquant¶
Convert MaxQuant output to QPX format.
Description¶
Reads MaxQuant result files (msms.txt, evidence.txt, proteinGroups.txt) and writes corresponding QPX Parquet files.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --msms-file | FILE | No | - | MaxQuant msms.txt file (for PSM conversion) |
| --evidence-file | FILE | No | - | MaxQuant evidence.txt file (for feature conversion) |
| --protein-groups-file | FILE | No | - | MaxQuant proteinGroups.txt file (for PG conversion) |
| --sdrf-file | FILE | No | - | SDRF metadata file (required for feature and PG) |
| --output-folder | DIRECTORY | Yes | - | Output directory for generated QPX files |
| --output-prefix | TEXT | No | - | Prefix for output file names |
| --structures | TEXT | No | - | Comma-separated list of structures to produce (psm, feature, pg). Default: all available. |
| --protein-file | FILE | No | - | Protein list file for filtering feature output |
| --batch-size | INTEGER | No | 100000 | Processing batch size |
| --n-workers | INTEGER | No | - | Number of parallel workers |
| --memory-limit | FLOAT | No | - | Memory limit in GB |
| --spectral-data | FLAG | No | - | Include spectral data fields in PSM output |
| --standardized-intensities | FLAG | No | - | Calculate standardized intensity metrics for PG output |
| --project-accession | TEXT | No | - | PRIDE / ProteomeXchange accession (e.g. PXD020192) |
| --enrich-pride | FLAG | No | - | Fetch project metadata from PRIDE API after conversion |
| --compression | TEXT | No | zstd | Parquet compression codec. |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert MaxQuant data with default settings:
# Convert everything
qpxc convert maxquant \
--msms-file msms.txt \
--evidence-file evidence.txt \
--protein-groups-file proteinGroups.txt \
--sdrf-file metadata.sdrf.tsv \
--output-folder ./qpx_output
# Convert PSMs only
qpxc convert maxquant \
--msms-file msms.txt \
--output-folder ./qpx_output \
--structures psm
PSM Data Only¶
qpxc convert maxquant \
--msms-file tests/examples/maxquant/maxquant_simple/msms.txt \
--output-folder ./output \
--structures psm \
--spectral-data \
--output-prefix maxquant_psm
Feature Data with Protein Groups¶
qpxc convert maxquant \
--evidence-file tests/examples/maxquant/maxquant_full/evidence.txt.gz \
--protein-groups-file tests/examples/maxquant/maxquant_full/proteinGroups.txt \
--sdrf-file tests/examples/maxquant/maxquant_full/PXD001819.sdrf.tsv \
--output-folder ./output \
--structures feature \
--batch-size 500000 \
--verbose
All Structures¶
qpxc convert maxquant \
--msms-file tests/examples/maxquant/maxquant_full/msms.txt.gz \
--evidence-file tests/examples/maxquant/maxquant_full/evidence.txt.gz \
--protein-groups-file tests/examples/maxquant/maxquant_full/proteinGroups.txt \
--sdrf-file tests/examples/maxquant/maxquant_full/PXD001819.sdrf.tsv \
--output-folder ./output \
--structures psm,feature,pg \
--batch-size 500000 \
--verbose
Output Files¶
Depending on the --structures parameter:
- PSM: {output-prefix}-{uuid}.psm.parquet
- Feature: {output-prefix}-{uuid}.feature.parquet
- Protein Group: {output-prefix}-{uuid}.pg.parquet
Common Issues¶
Issue: Memory errors with compressed evidence files
- Solution: Reduce --batch-size or increase available RAM
Issue: Missing Q-value information
- Solution: Provide --protein-groups-file for accurate Q-value mapping
Best Practices¶
- Use --structures to control which output files are generated
- Always provide --protein-groups-file when available for better data quality
- Ensure SDRF sample names match MaxQuant experiment names
- Use compressed files (.gz) to save disk space
- Adjust --batch-size based on available memory
- Use the --spectral-data flag if downstream analysis requires spectral information
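The parameter table ties each structure to an input file (msms.txt for psm, evidence.txt for feature, proteinGroups.txt for pg) and states that the SDRF is required for feature and pg. A small Python sketch of that mapping is shown below as a planning aid. `plan_maxquant_structures` is hypothetical, and the converter's own resolution of "all available" may differ in detail.

```python
def plan_maxquant_structures(msms=None, evidence=None,
                             protein_groups=None, sdrf=None):
    """Work out which structures the supplied MaxQuant inputs can support.

    Returns (structures, problems): the structure names that look
    convertible, and human-readable notes about inputs that were supplied
    but cannot be used without an SDRF file.
    """
    structures, problems = [], []

    if msms:
        structures.append("psm")  # PSMs come from msms.txt alone

    if evidence:
        if sdrf:
            structures.append("feature")
        else:
            problems.append("feature: --evidence-file given without --sdrf-file")

    if protein_groups:
        if sdrf:
            structures.append("pg")
        else:
            problems.append("pg: --protein-groups-file given without --sdrf-file")

    return structures, problems


if __name__ == "__main__":
    structures, problems = plan_maxquant_structures(
        msms="msms.txt", evidence="evidence.txt", sdrf="metadata.sdrf.tsv"
    )
    print(",".join(structures), problems)
```

The joined `structures` list has the same comma-separated form that --structures expects.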
fragpipe¶
Convert FragPipe output to QPX format.
Description¶
Reads FragPipe result files and converts them into QPX Parquet format. Supports psm.tsv, combined_ion.tsv, combined_peptide.tsv, and combined_protein.tsv.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --psm-file | FILE | No | - | FragPipe psm.tsv file |
| --ion-file | FILE | No | - | FragPipe combined_ion.tsv file (for feature conversion) |
| --peptide-file | FILE | No | - | FragPipe combined_peptide.tsv file (for feature conversion) |
| --pg-file | FILE | No | - | FragPipe combined_protein.tsv file (for PG conversion) |
| --sdrf-file | FILE | No | - | SDRF metadata file (for sample/run conversion) |
| --output-folder | DIRECTORY | Yes | - | Output directory for generated QPX files |
| --output-prefix | TEXT | No | - | Prefix for output file names |
| --batch-size | INTEGER | No | 1000000 | Processing batch size |
| --project-accession | TEXT | No | - | PRIDE / ProteomeXchange accession (e.g. PXD020192) |
| --enrich-pride | FLAG | No | - | Fetch project metadata from PRIDE API after conversion |
| --compression | TEXT | No | zstd | Parquet compression codec. |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert FragPipe PSM data with default settings:
# Convert PSMs only
qpxc convert fragpipe \
--psm-file psm.tsv \
--output-folder ./qpx_output
# Convert features + protein groups
qpxc convert fragpipe \
--ion-file combined_ion.tsv \
--pg-file combined_protein.tsv \
--sdrf-file metadata.sdrf.tsv \
--output-folder ./qpx_output
With Custom Settings¶
qpxc convert fragpipe \
--psm-file /path/to/psm.tsv \
--output-folder ./output \
--batch-size 500000 \
--output-prefix fragpipe_psm
Output Files¶
- Output: {output-prefix}-{uuid}.psm.parquet
- Format: Parquet file containing PSM data
- Schema: Conforms to QPX PSM specification
mzidentml¶
Convert mzIdentML (.mzid) files to QPX PSM parquet format.
Description¶
Supports both standard mzIdentML (1.1/1.2) and mzIdentML 1.3 with cross-linking extensions (inter-peptide, looplinks, noncovalent). Produces a full QPX dataset including PSM, pepmap, provenance, ontology, and dataset metadata.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --mzid-path | FILE | Yes | - | Input mzIdentML (.mzid) file path |
| --output-folder | DIRECTORY | Yes | - | Output directory for generated QPX files |
| --output-prefix | TEXT | No | mzidentml | Prefix for output file names |
| --mgf-path | FILE | No | - | Optional MGF file for spectra attachment |
| --include-spectra | FLAG | No | - | Attach mz_array and intensity_array from MGF to PSM records |
| --project-accession | TEXT | No | - | PRIDE / ProteomeXchange accession (e.g. PXD054720) |
| --enrich-pride | FLAG | No | - | Fetch project metadata from PRIDE API after conversion |
| --compression | TEXT | No | zstd | Parquet compression codec. |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert an mzIdentML file with default settings:
# Convert a standard mzIdentML file
qpxc convert mzidentml \
--mzid-path results.mzid \
--output-folder ./qpx_output
# Convert an XL-MS mzIdentML 1.3 file
qpxc convert mzidentml \
--mzid-path crosslinks.mzid \
--output-folder ./qpx_output \
--output-prefix xl_experiment
# Convert with spectra from MGF
qpxc convert mzidentml \
--mzid-path results.mzid \
--mgf-path spectra.mgf \
--include-spectra \
--output-folder ./qpx_output
# Convert with project accession
qpxc convert mzidentml \
--mzid-path results.mzid \
--output-folder ./qpx_output \
--project-accession PXD054720
With Spectral Data from Single mzML¶
qpxc convert mzidentml \
--mzid-file /path/to/results.mzid \
--mzml-file /path/to/spectra.mzML \
--output-folder ./output \
--spectral-data \
--output-prefix psm_with_spectra
With Spectral Data from Multiple mzML Files¶
When your mzIdentML references multiple mzML files, use the --mzml-folder option:
qpxc convert mzidentml \
--mzid-file /path/to/results.mzid.gz \
--mzml-folder /path/to/mzml_files/ \
--output-folder ./output \
--spectral-data \
--output-prefix psm_multi_mzml
The converter automatically matches PSMs to mzML files based on the run_file_name field in the mzIdentML. File matching is case-insensitive and supports both .mzML and .mzML.gz extensions.
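The matching rule described above (case-insensitive, accepting both .mzML and .mzML.gz) can be illustrated with a short Python sketch. This is not the converter's code: `index_mzml_files` and `find_mzml` are hypothetical helpers, and the sketch assumes run_file_name is the file name without any extension.

```python
MZML_SUFFIXES = (".mzml.gz", ".mzml")  # longest suffix first


def index_mzml_files(filenames):
    """Map lower-cased run names to the original mzML file names."""
    index = {}
    for name in filenames:
        lowered = name.lower()
        for suffix in MZML_SUFFIXES:
            if lowered.endswith(suffix):
                index[lowered[: -len(suffix)]] = name
                break  # do not also strip the shorter suffix
    return index


def find_mzml(run_file_name, index):
    """Look up a run name case-insensitively; None when nothing matches."""
    return index.get(run_file_name.lower())


if __name__ == "__main__":
    files = ["Sample_01.mzML", "sample_02.MZML.GZ", "readme.txt"]
    index = index_mzml_files(files)
    print(find_mzml("SAMPLE_01", index), find_mzml("Sample_02", index))
```

Running a check like this over the folder contents is a quick way to see which run names would be left without spectra.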
Supported Native ID Formats¶
The converter supports multiple native ID formats for scan number extraction:
| Format | Vendor/Source | Example |
|---|---|---|
| scan=XXX | Thermo | controllerType=0 controllerNumber=1 scan=12345 |
| cycle=XXX | Waters/Agilent | sample=1 period=1 cycle=1055 experiment=4 |
| index=XXX | Generic | index=500 |
| spectrum=XXX | Various | spectrum=999 |
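To make the table concrete, the Python sketch below extracts a number from each of the four native ID forms using regular expressions. It is an illustration rather than the converter's parser; `extract_scan_number` is a hypothetical name, and the order in which keys are tried is an assumption.

```python
import re

# Keys from the table above; the priority order is assumed.
NATIVE_ID_KEYS = ("scan", "cycle", "index", "spectrum")


def extract_scan_number(native_id):
    """Return the integer after the first recognized key, or None."""
    for key in NATIVE_ID_KEYS:
        # Require the key to start the string or follow whitespace, so that
        # a key embedded in a longer token is not matched by accident.
        match = re.search(rf"(?:^|\s){key}=(\d+)", native_id)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    for native_id in (
        "controllerType=0 controllerNumber=1 scan=12345",
        "sample=1 period=1 cycle=1055 experiment=4",
        "index=500",
        "spectrum=999",
    ):
        print(native_id, "->", extract_scan_number(native_id))
```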
Output Files¶
- Output: {output-prefix}-{uuid}.psm.parquet
- Format: Parquet file containing PSM-level data
- Schema: Conforms to QPX PSM specification
Supported mzIdentML Features¶
- Compressed files: Supports both .mzid and .mzid.gz formats
- Modifications: Full support for UNIMOD and custom modifications
- Scores: Extracts all CV-term scores with higher_better flag annotation
- Decoy detection: Automatic detection via the isDecoy attribute
- Multi-file support: Handles mzIdentML referencing multiple spectra files
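Two of the features above, transparent handling of .mzid.gz and decoy detection via isDecoy, can be sketched with the Python standard library. The code below is an independent illustration, not the converter's implementation. It relies on the mzIdentML schema placing isDecoy on PeptideEvidence elements, and both helper names are hypothetical.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def open_mzid(path):
    """Open a .mzid or .mzid.gz file as a binary stream."""
    if str(path).lower().endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def count_decoys(stream):
    """Stream through the XML and count (target, decoy) PeptideEvidence."""
    target = decoy = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Drop the "{namespace}" part so any mzIdentML version matches.
        if elem.tag.rsplit("}", 1)[-1] == "PeptideEvidence":
            # isDecoy is an XML boolean, so accept both "true" and "1".
            if elem.get("isDecoy", "false").lower() in ("true", "1"):
                decoy += 1
            else:
                target += 1
            elem.clear()  # free memory for large files
    return target, decoy


SAMPLE = b"""<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <PeptideEvidence id="pe1" isDecoy="false"/>
  <PeptideEvidence id="pe2" isDecoy="true"/>
</MzIdentML>"""

if __name__ == "__main__":
    print(count_decoys(io.BytesIO(SAMPLE)))
```

For a real file, `with open_mzid(path) as stream: count_decoys(stream)` gives a quick sanity check of the target/decoy balance before conversion.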
Best Practices¶
- Use --mzml-folder when mzIdentML references multiple mzML files
- Ensure mzML file names match those referenced in mzIdentML (case-insensitive)
- Use compressed .mzid.gz files to save disk space
- Enable --spectral-data only when spectral arrays are needed for downstream analysis
Common Issues¶
Issue: No spectra attached from mzML folder
- Solution: Verify mzML file names match run_file_name in mzIdentML
Issue: zlib errors when reading mzML.gz files
- Solution: Decompress mzML.gz files or re-download if corrupted
Issue: Scan numbers not extracted correctly
- Solution: Check if your native ID format is supported; the converter auto-detects common formats
sdrf¶
Convert SDRF metadata files to QPX sample and run parquet format.
Description¶
Reads a Sample and Data Relationship Format (SDRF) file and produces the QPX sample and run data structures as Parquet files.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --sdrf-file | FILE | Yes | - | SDRF metadata file path |
| --output-folder | DIRECTORY | Yes | - | Output directory for generated QPX files |
| --output-prefix | TEXT | No | sdrf | Prefix for output file names |
| --compression | TEXT | No | zstd | Parquet compression codec. |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert SDRF metadata with default settings:
qpxc convert sdrf \
--sdrf-file metadata.sdrf.tsv \
--output-folder ./qpx_output
Output Files¶
- Sample: {output-prefix}-{uuid}.sample.parquet
- Run: {output-prefix}-{uuid}.run.parquet
- Format: Parquet files containing sample and run metadata
- Schema: Conforms to QPX sample and run specifications
Best Practices¶
- Ensure SDRF file follows the PRIDE SDRF specifications
- Use verbose mode to diagnose parsing issues
- The converter automatically maps SDRF characteristics to QPX ontology terms
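Before converting, it can help to see which characteristics an SDRF file actually declares. The Python sketch below lists them from the header row. It assumes the usual tab-separated SDRF layout with `characteristics[...]` column names. It does not reproduce the converter's ontology mapping, and `characteristic_names` is a hypothetical helper.

```python
import re

# SDRF characteristic columns look like "characteristics[organism]".
CHARACTERISTIC = re.compile(r"^characteristics\[(?P<name>[^\]]+)\]$", re.IGNORECASE)


def characteristic_names(header_line):
    """Return the characteristic names found in an SDRF header line."""
    names = []
    for column in header_line.rstrip("\r\n").split("\t"):
        match = CHARACTERISTIC.match(column.strip())
        if match:
            names.append(match.group("name").strip().lower())
    return names


if __name__ == "__main__":
    header = (
        "source name\tcharacteristics[organism]\t"
        "characteristics[organism part]\tassay name\tcomment[data file]"
    )
    print(characteristic_names(header))
```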
Related Commands¶
- Transform Commands - Further process converted data
- Visualization Commands - Create plots from converted data
- Statistics Commands - Generate statistics from converted data