Convert Commands¶
Convert various mass spectrometry data formats to the QPX standard format.
Overview¶
The convert command group provides converters for multiple proteomics software outputs, enabling standardization of data formats for downstream analysis. All commands generate parquet-format output files following the QPX specification.
Available Commands¶
- diann - Convert DIA-NN report to feature format
- diann-pg - Convert DIA-NN report to protein group format
- maxquant-psm - Convert MaxQuant PSM data
- maxquant-feature - Convert MaxQuant feature data
- maxquant-pg - Convert MaxQuant protein groups
- fragpipe - Convert FragPipe PSM data
- quantms-psm - Convert mzTab to PSM format
- quantms-feature - Convert mzTab to feature format
- quantms-pg - Convert mzTab to protein group format
- idxml - Convert single idXML file to PSM format
- idxml-batch - Convert multiple idXML files to merged PSM format
diann¶
Convert DIA-NN report files to QPX feature format.
Description¶
This command takes a DIA-NN report file and converts it to the QPX parquet format. The conversion includes feature data and can optionally split the output into multiple files based on specified fields.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --report-path | FILE | Yes | - | DIA-NN report file path |
| --qvalue-threshold | FLOAT | Yes | 0.05 | Q-value threshold for filtering |
| --mzml-info-folder | DIRECTORY | Yes | - | mzML info file folder |
| --sdrf-path | FILE | Yes | - | SDRF file path for metadata |
| --output-folder | DIRECTORY | Yes | - | Output directory for generated files |
| --protein-file | FILE | No | - | Protein file with specific requirements |
| --output-prefix | TEXT | No | - | Prefix for output files |
| --partitions | TEXT | No | - | Field(s) for splitting files (comma-separated) |
| --duckdb-max-memory | TEXT | No | - | Maximum memory for DuckDB (e.g., '4GB') |
| --duckdb-threads | INTEGER | No | - | Number of threads for DuckDB |
| --batch-size | INTEGER | No | 100 | Number of files to process simultaneously |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert a DIA-NN report with default settings:
qpxc convert diann \
--report-path report.tsv \
--qvalue-threshold 0.05 \
--mzml-info-folder ./mzml_info \
--sdrf-path data.sdrf.tsv \
--output-folder ./output
Advanced Example¶
Convert the small example dataset under tests/examples, without partitioning:
qpxc convert diann \
--report-path tests/examples/diann/small/diann_report.tsv \
--qvalue-threshold 0.05 \
--mzml-info-folder tests/examples/diann/small/mzml \
--sdrf-path tests/examples/diann/small/PXD019909-DIA.sdrf.tsv \
--output-folder ./output
Advanced Example with Partitioning¶
Convert with file partitioning based on reference_file_name:
qpxc convert diann \
--report-path tests/examples/diann/full/diann_report.tsv.gz \
--qvalue-threshold 0.01 \
--mzml-info-folder tests/examples/diann/full/mzml \
--sdrf-path tests/examples/diann/full/PXD036609.sdrf.tsv \
--output-folder ./output \
--partitions reference_file_name \
--duckdb-max-memory 8GB \
--duckdb-threads 4 \
--verbose
Output Files¶
- Output: {output-prefix}-{uuid}.feature.parquet
- Format: Parquet file containing feature-level quantification data
- Schema: Conforms to QPX feature specification
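Because the file name contains a generated UUID, a downstream script cannot know it in advance. The sketch below shows one way to look up the newest matching file. `find_qpx_output` is a hypothetical helper, not part of qpxc, and it assumes file names without newlines since it parses `ls` output.

```shell
# Hypothetical helper: print the most recently modified output file that
# matches {prefix}-{uuid}.{kind}.parquet inside a folder.
# Usage: find_qpx_output FOLDER PREFIX KIND   (KIND: feature, pg, or psm)
find_qpx_output() {
    folder=$1
    prefix=$2
    kind=$3
    # ls -t sorts newest first; head keeps only the first match.
    # If nothing matches, ls fails quietly and nothing is printed.
    ls -t "$folder"/"$prefix"-*."$kind".parquet 2>/dev/null | head -n 1
}
```

For example, `find_qpx_output ./output protein_groups pg` would print the newest protein group file written with `--output-prefix protein_groups`.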
Common Issues¶
Issue: Out of memory errors with large files
- Solution: Increase the --duckdb-max-memory parameter (e.g., 8GB, 16GB)
Issue: Slow processing
- Solution: Increase --duckdb-threads to utilize more CPU cores
Issue: Missing mzML info files
- Solution: Ensure all mzML info TSV files are in the specified folder with correct naming
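A quick pre-flight check can catch the missing-file case before a long run starts. The sketch below is an illustration only: `check_diann_inputs` is a hypothetical helper, and because this page does not spell out the expected mzML info file names, it only verifies that the folder contains at least one `.tsv` file.

```shell
# Hypothetical pre-flight check for the diann inputs.
# Prints one line per problem and returns non-zero if any were found.
# Usage: check_diann_inputs REPORT MZML_INFO_FOLDER SDRF
check_diann_inputs() {
    report=$1
    mzml_info=$2
    sdrf=$3
    status=0
    if [ ! -f "$report" ]; then
        echo "missing report: $report"
        status=1
    fi
    if [ ! -f "$sdrf" ]; then
        echo "missing SDRF: $sdrf"
        status=1
    fi
    # Count TSV files in the folder; their naming is not validated here.
    n=0
    for f in "$mzml_info"/*.tsv; do
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    if [ "$n" -eq 0 ]; then
        echo "no .tsv files found in: $mzml_info"
        status=1
    fi
    return "$status"
}
```

A typical use is `check_diann_inputs report.tsv ./mzml_info data.sdrf.tsv && qpxc convert diann ...`, so the conversion only starts when all three inputs are present.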
Best Practices¶
- Use a Q-value threshold of 0.05 or lower for high-confidence results
- Enable partitioning for large datasets to improve memory usage
- Use verbose mode during initial testing to diagnose issues
- Ensure the SDRF file correctly matches the sample names in the DIA-NN report
diann-pg¶
Convert DIA-NN report files to QPX protein group format.
Description¶
This command takes a DIA-NN report file and converts it to the QPX protein group format in parquet format.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --report-path | FILE | Yes | - | DIA-NN report file path |
| --pg-matrix-path | FILE | Yes | - | DIA-NN protein quantities table file path |
| --sdrf-path | FILE | Yes | - | SDRF file path for metadata |
| --output-folder | DIRECTORY | Yes | - | Output directory for generated files |
| --output-prefix | TEXT | No | - | Prefix for output files |
| --duckdb-max-memory | TEXT | No | - | Maximum memory for DuckDB (e.g., '4GB') |
| --duckdb-threads | INTEGER | No | - | Number of threads for DuckDB |
| --batch-size | INTEGER | No | 100 | Number of files to process simultaneously |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert DIA-NN protein groups with default settings:
qpxc convert diann-pg \
--report-path report.tsv \
--pg-matrix-path report.pg_matrix.tsv \
--sdrf-path data.sdrf.tsv \
--output-folder ./output
High-Performance Example¶
qpxc convert diann-pg \
--report-path tests/examples/diann/full/diann_report.tsv.gz \
--pg-matrix-path tests/examples/diann/full/diann_report.pg_matrix.tsv \
--sdrf-path tests/examples/diann/full/PXD036609.sdrf.tsv \
--output-folder ./output \
--duckdb-max-memory 16GB \
--duckdb-threads 8 \
--output-prefix protein_groups \
--verbose
Output Files¶
- Output: {output-prefix}-{uuid}.pg.parquet
- Format: Parquet file containing protein group quantification data
- Schema: Conforms to QPX protein group specification
Best Practices¶
- Ensure both report and pg_matrix files are from the same DIA-NN run
- Use adequate memory allocation for large datasets
- Validate SDRF metadata matches the sample columns in the matrix file
maxquant-psm¶
Convert MaxQuant PSM data from msms.txt to QPX parquet format.
Description¶
This command takes a MaxQuant msms.txt file and converts it to the QPX parquet format for PSM data.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --msms-file | FILE | Yes | - | MaxQuant msms.txt file |
| --output-folder | DIRECTORY | Yes | - | Output folder |
| --batch-size | INTEGER | No | 100000 | Batch size |
| --output-prefix | TEXT | No | - | Output file prefix |
| --spectral-data | FLAG | No | - | Include spectral data fields |
| --n-workers | INTEGER | No | - | Number of parallel workers |
| --memory-limit | FLOAT | No | - | Memory limit in GB |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert MaxQuant PSM data using 8 parallel workers and a 16 GB memory limit:
qpxc convert maxquant-psm \
--msms-file msms.txt \
--output-folder ./output \
--n-workers 8 \
--memory-limit 16
With Spectral Data¶
qpxc convert maxquant-psm \
--msms-file tests/examples/maxquant/maxquant_simple/msms.txt \
--output-folder ./output \
--spectral-data \
--batch-size 500000 \
--output-prefix psm_with_spectra \
--verbose
Output Files¶
- Output: {output-prefix}-{uuid}.psm.parquet
- Format: Parquet file containing PSM-level data
- Schema: Conforms to QPX PSM specification
Best Practices¶
- Adjust --batch-size based on available memory
- Use the --spectral-data flag if downstream analysis requires spectral information
- Ensure sufficient disk space for large msms.txt files
maxquant-feature¶
Convert MaxQuant feature data from evidence.txt to QPX parquet format.
Description¶
This command takes a MaxQuant evidence.txt file and converts it to the QPX parquet format for feature data, using metadata from an SDRF file.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --evidence-file | FILE | Yes | - | MaxQuant evidence.txt file |
| --sdrf-file | FILE | Yes | - | SDRF metadata file |
| --output-folder | DIRECTORY | Yes | - | Output folder |
| --protein-file | FILE | No | - | Protein list file for filtering |
| --protein-groups-file | FILE | Yes | - | MaxQuant proteinGroups.txt file |
| --batch-size | INTEGER | No | 100000 | Batch size |
| --output-prefix | TEXT | No | - | Output file prefix |
| --n-workers | INTEGER | No | - | Number of parallel workers |
| --memory-limit | FLOAT | No | - | Memory limit in GB |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert MaxQuant feature data using 8 parallel workers and a 16 GB memory limit:
qpxc convert maxquant-feature \
--evidence-file evidence.txt \
--sdrf-file data.sdrf.tsv \
--protein-groups-file proteinGroups.txt \
--output-folder ./output \
--n-workers 8 \
--memory-limit 16
With Protein Groups Q-value Mapping¶
qpxc convert maxquant-feature \
--evidence-file tests/examples/maxquant/maxquant_full/evidence.txt.gz \
--sdrf-file tests/examples/maxquant/maxquant_full/PXD001819.sdrf.tsv \
--protein-groups-file tests/examples/maxquant/maxquant_full/proteinGroups.txt \
--output-folder ./output \
--batch-size 500000 \
--verbose
Output Files¶
- Output: {output-prefix}-{uuid}.feature.parquet
- Format: Parquet file containing feature-level quantification
- Schema: Conforms to QPX feature specification
Common Issues¶
Issue: Memory errors with compressed evidence files
- Solution: Reduce --batch-size or increase available RAM
Issue: Missing Q-value information
- Solution: Provide --protein-groups-file for accurate Q-value mapping
Best Practices¶
- Always provide --protein-groups-file when available for better data quality
- Ensure SDRF sample names match MaxQuant experiment names
- Use compressed evidence files (.gz) to save disk space
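The last point can be done with standard tools. This is a minimal sketch using `gzip`. `gzip_keep` is a hypothetical helper name, and it relies on the converter accepting `.gz` input as the evidence.txt.gz example above suggests.

```shell
# Compress a file to FILE.gz while keeping the original.
# gzip -c writes the compressed stream to stdout, so the input file
# is left untouched until you decide to delete it.
gzip_keep() {
    gzip -c "$1" > "$1.gz"
}
```

After `gzip_keep evidence.txt`, pass `evidence.txt.gz` to `--evidence-file` and remove the uncompressed copy once the conversion has been verified.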
maxquant-pg¶
Convert MaxQuant protein groups from proteinGroups.txt to QPX format.
Description¶
This command takes a MaxQuant proteinGroups.txt file and converts it to the QPX protein group format in parquet format, using metadata from an SDRF file and the matching evidence.txt file.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --protein-groups-file | FILE | Yes | - | MaxQuant proteinGroups.txt file |
| --sdrf-file | FILE | Yes | - | SDRF metadata file |
| --evidence-file | FILE | Yes | - | MaxQuant evidence.txt file |
| --output-folder | DIRECTORY | Yes | - | Output folder |
| --batch-size | INTEGER | No | 10000 | Batch size |
| --output-prefix | TEXT | No | - | Output file prefix |
| --n-workers | INTEGER | No | - | Number of parallel workers |
| --memory-limit | FLOAT | No | - | Memory limit in GB |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert MaxQuant protein groups using 8 parallel workers and a 16 GB memory limit:
qpxc convert maxquant-pg \
--protein-groups-file proteinGroups.txt \
--sdrf-file data.sdrf.tsv \
--evidence-file evidence.txt \
--output-folder ./output \
--n-workers 8 \
--memory-limit 16
Output Files¶
- Output: {output-prefix}-{uuid}.pg.parquet
- Format: Parquet file containing protein group data
- Schema: Conforms to QPX protein group specification
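The three MaxQuant converters share the same input files, so they can be chained over one results folder. The wrapper below is a sketch under assumptions: `convert_maxquant_folder` is a hypothetical name, and it expects msms.txt, evidence.txt, and proteinGroups.txt to sit together in a single folder. Only the required options from the parameter tables are passed.

```shell
# Hypothetical wrapper: run the three MaxQuant converters on one folder.
# Assumes TXT_DIR holds msms.txt, evidence.txt, and proteinGroups.txt.
# The chain stops at the first converter that fails.
# Usage: convert_maxquant_folder TXT_DIR SDRF_FILE OUTPUT_FOLDER
convert_maxquant_folder() {
    txt_dir=$1
    sdrf=$2
    out=$3
    qpxc convert maxquant-psm \
        --msms-file "$txt_dir/msms.txt" \
        --output-folder "$out" &&
    qpxc convert maxquant-feature \
        --evidence-file "$txt_dir/evidence.txt" \
        --sdrf-file "$sdrf" \
        --protein-groups-file "$txt_dir/proteinGroups.txt" \
        --output-folder "$out" &&
    qpxc convert maxquant-pg \
        --protein-groups-file "$txt_dir/proteinGroups.txt" \
        --sdrf-file "$sdrf" \
        --evidence-file "$txt_dir/evidence.txt" \
        --output-folder "$out"
}
```

Chaining with `&&` keeps a failed PSM conversion from being followed by feature and protein group runs on the same inputs.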
fragpipe¶
Convert FragPipe PSM data to QPX parquet format.
Description¶
Transforms FragPipe PSM results from psm.tsv format into the standardized QPX parquet format for downstream analysis and integration.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --msms-file | FILE | Yes | - | FragPipe psm.tsv file, used to extract the peptide information |
| -o, --output-folder | DIRECTORY | Yes | - | Folder where the parquet file will be generated |
| -b, --batch-size | INTEGER | No | 1000000 | Read batch size |
| --output-prefix | TEXT | No | - | Prefix of the parquet file, used to generate the file name |
Usage Examples¶
Basic Example¶
Convert FragPipe PSM data with the default batch size and a custom output prefix:
qpxc convert fragpipe \
--msms-file psm.tsv \
--output-folder ./output \
--batch-size 1000000 \
--output-prefix fragpipe_psm
With Custom Settings¶
qpxc convert fragpipe \
--msms-file /path/to/psm.tsv \
--output-folder ./output \
--batch-size 500000 \
--output-prefix fragpipe_psm
Output Files¶
- Output: {output-prefix}-{uuid}.psm.parquet
- Format: Parquet file containing PSM data
- Schema: Conforms to QPX PSM specification
quantms-psm¶
Convert mzTab PSM data to QPX parquet format.
Description¶
Converts PSM data from mzTab format to the QPX standardized parquet format. Can work with existing DuckDB indexes or create new ones from mzTab files.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --mztab-path | FILE | No | - | Input mzTab file path (required if creating a new indexer) |
| --database-path | FILE | No | - | DuckDB database file path (if exists, will be opened; if not, will be created if mztab-path is provided) |
| --output-folder | DIRECTORY | Yes | - | Output directory for generated files |
| --output-prefix | TEXT | No | psm | Prefix for output files (final name will be {prefix}-{uuid}.psm.parquet) |
| --spectral-data | FLAG | No | - | Spectral data fields (optional) |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert PSM data with default settings:
qpxc convert quantms-psm \
--mztab-path /path/to/data.mzTab \
--output-folder ./output \
--verbose
Use Existing Database¶
qpxc convert quantms-psm \
--database-path ./existing_database.duckdb \
--output-folder ./output \
--spectral-data
Output Files¶
- Output: {output-prefix}-{uuid}.psm.parquet
- Format: Parquet file containing PSM data
- Schema: Conforms to QPX PSM specification
Best Practices¶
- Reuse database files when processing multiple outputs from the same mzTab
- Use the --spectral-data flag when spectral information is needed for downstream analysis
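Database reuse follows from how --mztab-path and --database-path interact in the parameter table: an existing database is opened, and a missing one is created when an mzTab file is supplied. The wrapper below sketches that decision. `run_quantms_psm` is a hypothetical name, and passing both options together to create the database at a chosen path is inferred from the table description rather than shown in the examples.

```shell
# Hypothetical wrapper: reuse the DuckDB index if it already exists,
# otherwise build it from the mzTab file on this run.
# Usage: run_quantms_psm DATABASE_PATH MZTAB_PATH OUTPUT_FOLDER
run_quantms_psm() {
    db=$1
    mztab=$2
    out=$3
    if [ -f "$db" ]; then
        # Index already built: the mzTab file is not needed again.
        qpxc convert quantms-psm \
            --database-path "$db" \
            --output-folder "$out"
    elif [ -f "$mztab" ]; then
        # First run: create the index at DATABASE_PATH from the mzTab.
        qpxc convert quantms-psm \
            --mztab-path "$mztab" \
            --database-path "$db" \
            --output-folder "$out"
    else
        echo "neither database nor mzTab file found" >&2
        return 1
    fi
}
```

Calling the wrapper twice with the same arguments indexes the mzTab once and reads from the database on the second call.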
quantms-feature¶
Convert mzTab feature data to QPX parquet format.
Description¶
Converts feature-level quantification data from mzTab format to the QPX standardized format, including MSstats quantification data.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --mztab-path | FILE | No | - | Input mzTab file path (required if creating a new indexer) |
| --database-path | FILE | No | - | DuckDB database file path (if exists, will be opened; if not, will be created if mztab-path is provided) |
| --output-folder | DIRECTORY | Yes | - | Output directory for generated files |
| --output-prefix | TEXT | No | feature | Prefix for output files (final name will be {prefix}-{uuid}.feature.parquet) |
| --sdrf-file | FILE | Yes | - | SDRF file path |
| --msstats-file | FILE | Yes | - | MSstats input file path |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert feature data with default settings:
qpxc convert quantms-feature \
--mztab-path /path/to/data.mzTab \
--sdrf-file /path/to/metadata.sdrf.tsv \
--msstats-file /path/to/msstats_in.csv \
--output-folder ./output
Output Files¶
- Output: {output-prefix}-{uuid}.feature.parquet
- Format: Parquet file containing feature quantification
- Schema: Conforms to QPX feature specification
quantms-pg¶
Convert mzTab protein group data to QPX parquet format.
Description¶
This command combines protein group definitions from mzTab with complete quantification data from msstats_in.csv. Supports both TMT and LFQ data, with optional TopN and iBAQ intensity calculations.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --mztab-path | FILE | No | - | Input mzTab file path (required if creating a new indexer) |
| --database-path | FILE | No | - | DuckDB database file path (if exists, will be opened; if not, will be created if mztab-path is provided) |
| --msstats-file | FILE | Yes | - | Input msstats_in.csv file path for quantification |
| --sdrf-file | FILE | Yes | - | SDRF file path |
| --output-folder | DIRECTORY | Yes | - | Output directory for generated files |
| --output-prefix | TEXT | No | pg | Prefix for output files (final name will be {prefix}-{uuid}.pg.parquet) |
| --compute-topn | FLAG | No | - | Whether to compute TopN intensity |
| --compute-ibaq | FLAG | No | - | Whether to compute iBAQ intensity |
| --topn | INTEGER | No | 3 | Number of peptides to use for TopN intensity |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert protein groups with TopN (top 3 peptides) and iBAQ intensities enabled:
qpxc convert quantms-pg \
--mztab-path /path/to/data.mzTab \
--msstats-file /path/to/msstats_in.csv \
--sdrf-file /path/to/metadata.sdrf.tsv \
--output-folder ./output \
--compute-topn \
--compute-ibaq \
--topn 3
LFQ Data with All Intensities¶
qpxc convert quantms-pg \
--mztab-path tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_openms.mzTab.gz \
--msstats-file tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf_openms_design_msstats_in.csv.gz \
--sdrf-file tests/examples/quantms/dda-lfq-full/PXD007683-LFQ.sdrf.tsv \
--output-folder ./output \
--compute-topn \
--compute-ibaq \
--topn 3
TMT Data (Skip iBAQ)¶
qpxc convert quantms-pg \
--mztab-path tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_openms.mzTab.gz \
--msstats-file tests/examples/quantms/dda-plex-full/PXD007683TMT.sdrf_openms_design_msstats_in.csv.gz \
--sdrf-file tests/examples/quantms/dda-plex-full/PXD007683-TMT.sdrf.tsv \
--output-folder ./output \
--no-compute-ibaq
Output Files¶
- Output: {output-prefix}-{uuid}.pg.parquet
- Format: Parquet file containing protein group quantification
- Schema: Conforms to QPX protein group specification
Best Practices¶
- Use --no-compute-ibaq for TMT/iTRAQ labeled data
- Adjust the --topn value based on dataset characteristics (typically 3-5)
- Enable verbose mode for large datasets to monitor progress
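In a pipeline that handles both labeled and label-free runs, the first recommendation can be encoded once. The helper below is a sketch: `ibaq_flag` is a hypothetical name, and the label strings it recognizes (`lfq`, `tmt...`, `itraq...`) are assumptions for this example, not values read from an SDRF file.

```shell
# Hypothetical helper: choose the iBAQ option from the labeling type.
# Usage: ibaq_flag LABEL   (LABEL: lfq, tmt*, or itraq*; case-insensitive)
ibaq_flag() {
    label=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $label in
        tmt*|itraq*) echo "--no-compute-ibaq" ;;
        lfq)         echo "--compute-ibaq" ;;
        *)
            echo "unknown labeling type: $1" >&2
            return 1
            ;;
    esac
}

# Example use (not executed here):
#   qpxc convert quantms-pg ... $(ibaq_flag tmt)
```

Unknown labels are rejected rather than silently defaulting, so a typo does not produce iBAQ values for labeled data.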
idxml¶
Convert a single OpenMS idXML file to QPX PSM format.
Description¶
Converts PSM data from OpenMS idXML format to the QPX standardized parquet format. Can optionally attach spectral information from corresponding mzML files.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --idxml-file | TEXT | Yes | - | idXML file containing identifications |
| --output-folder | TEXT | Yes | - | Folder where the parquet file will be generated |
| --mzml-file | TEXT | No | - | Optional mzML to attach spectra by scan |
| --output-prefix-file | TEXT | No | - | Prefix of the parquet file, used to generate the file name |
| --spectral-data | FLAG | No | - | Spectral data fields (optional) |
Usage Examples¶
Basic Example¶
Convert a single idXML file:
qpxc convert idxml \
--idxml-file /path/to/data.idXML \
--output-folder ./output
With Spectral Data¶
qpxc convert idxml \
--idxml-file tests/examples/idxml/SF_200217_pPeptideLibrary_pool1_HCDnlETcaD_OT_rep2_consensus_fdr_pep_luciphor.idXML \
--mzml-file tests/examples/idxml/SF_200217_pPeptideLibrary_pool1_HCDnlETcaD_OT_rep1.mzML \
--output-folder ./output \
--spectral-data \
--output-prefix-file idxml_psm_with_spectra
Output Files¶
- Output: {output-prefix-file}-{uuid}.psm.parquet
- Format: Parquet file containing PSM data
- Schema: Conforms to QPX PSM specification
idxml-batch¶
Convert multiple OpenMS idXML files to a single merged PSM parquet file.
Description¶
Batch converts multiple idXML files and merges them into a single QPX PSM parquet file. Supports both folder-based and file-list-based input, with flexible mzML matching strategies.
Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --idxml-folder | DIRECTORY | No | - | Folder containing idXML files to convert |
| --idxml-files | TEXT | No | - | Comma-separated list of idXML file paths |
| --output-folder | TEXT | Yes | - | Folder where the merged parquet file will be generated |
| --output-prefix-file | TEXT | No | merged-psm | Prefix of the parquet file, used to generate the file name |
| --mzml-folder | DIRECTORY | No | - | Optional folder containing mzML files to attach spectra by scan |
| --mzml-files | TEXT | No | - | Comma-separated list of mzML file paths |
| --verbose | FLAG | No | - | Enable verbose logging |
Usage Examples¶
Basic Example¶
Convert multiple idXML files:
qpxc convert idxml-batch \
--idxml-folder ./idxml_files \
--output-folder ./output \
--output-prefix-file batch_psm
File List with Index Matching¶
qpxc convert idxml-batch \
--idxml-files file1.idXML,file2.idXML,file3.idXML \
--mzml-files file1.mzML,file2.mzML,file3.mzML \
--output-folder ./output \
--verbose
Folder with Basename Matching¶
qpxc convert idxml-batch \
--idxml-folder ./idxml_files \
--mzml-folder ./mzml_files \
--output-folder ./output \
--output-prefix-file merged_with_spectra \
--verbose
Matching Strategies¶
The command supports three mzML matching strategies:
- Folder-Folder: Matches files by basename (filename without extension)
- List-List: Matches files by position in the list (index-based)
- Folder-List: Matches folder files by basename with list files
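To preview how files would be paired before running a batch, the first two strategies can be imitated in a few lines of shell. This is an illustration of the idea only, not the converter's own matching code. `pair_by_basename` and `pair_by_index` are hypothetical helpers, and details such as case sensitivity and the exact `.idXML` and `.mzML` extensions are assumptions.

```shell
# Folder-Folder: pair each idXML with the mzML that shares its basename.
# Prints "idXML<TAB>mzML", or NONE when no mzML has the same basename.
pair_by_basename() {
    idxml_dir=$1
    mzml_dir=$2
    for idxml in "$idxml_dir"/*.idXML; do
        [ -f "$idxml" ] || continue
        base=$(basename "$idxml" .idXML)
        if [ -f "$mzml_dir/$base.mzML" ]; then
            printf '%s\t%s\n' "$idxml" "$mzml_dir/$base.mzML"
        else
            printf '%s\t%s\n' "$idxml" "NONE"
        fi
    done
}

# List-List: pair the Nth idXML with the Nth mzML of two comma-separated
# lists. The body runs in a subshell so the IFS change stays local.
pair_by_index() (
    idxml_list=$1
    mzml_list=$2
    n=1
    IFS=,
    for idxml in $idxml_list; do
        # The trailing comma guarantees cut always sees a delimiter.
        mzml=$(printf '%s,\n' "$mzml_list" | cut -d, -f"$n")
        [ -n "$mzml" ] || mzml="NONE"
        printf '%s\t%s\n' "$idxml" "$mzml"
        n=$((n + 1))
    done
)
```

A line ending in `NONE` marks an idXML file that would have no spectra attached, which is the situation to look for when checking naming consistency or list order.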
Output Files¶
- Output: {output-prefix-file}-{uuid}.psm.parquet
- Format: Single merged parquet file containing PSM data from all inputs
- Schema: Conforms to QPX PSM specification
Best Practices¶
- Use verbose mode to monitor matching and conversion progress
- Ensure consistent naming when using basename matching
- Verify file order when using index-based matching
- Check temporary directory has sufficient space for large batches
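The free-space check in the last point can be scripted. The sketch below reads the "Available" column of POSIX `df` output. `free_kib` is a hypothetical helper, and which directory the converter actually uses for temporary files is not stated on this page, so `$TMPDIR` with a `/tmp` fallback is an assumption.

```shell
# Print the free space, in KiB, of the filesystem that holds a directory.
# df -P forces the portable one-line-per-filesystem layout and -k reports
# 1024-byte blocks; field 4 of the second line is the available space.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example use (assumed temp location, not confirmed by this page):
#   free_kib "${TMPDIR:-/tmp}"
```

Comparing the result with the combined size of the input files gives a rough lower bound on whether a large batch is likely to fit.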
Related Commands¶
- Transform Commands - Further process converted data
- Visualization Commands - Create plots from converted data
- Statistics Commands - Generate statistics from converted data