# Troubleshooting
Common issues and solutions when using qpx.
## Installation Issues
### ModuleNotFoundError: No module named 'qpx'
Problem: Python cannot find the qpx package after installation.
Solutions:
- Ensure you're using the correct Python environment:
- Reinstall in the active environment:
- If using conda, ensure the environment is activated:
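The three checks above can be run from a shell; the PyPI package name `qpx` and the conda environment name `qpx-env` are assumptions, so substitute your own:

```shell
# Confirm which interpreter and pip the shell resolves to
which python
python -m pip --version

# Reinstall into the active environment (package name assumed to be "qpx")
python -m pip install --force-reinstall qpx

# With conda, activate the environment before installing or importing
conda activate qpx-env
python -c "import qpx; print(qpx.__file__)"
```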
### Python version incompatibility
Problem: Installation fails with Python version errors.
Solution: qpx requires Python 3.10 or higher. Check your version:
If needed, install a compatible Python version:
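For example (the conda environment name and the Python minor version are illustrative):

```shell
# Check the interpreter version
python --version

# Install a compatible interpreter, e.g. with conda
conda create -n qpx-env python=3.11
conda activate qpx-env
```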
### Missing dependencies
Problem: Import errors for packages like venn, pyopenms, or anndata.
Solution: Install the missing package:
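For example, the packages named above are published on PyPI and can be installed together:

```shell
python -m pip install venn pyopenms anndata
```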
## Conversion Issues
### File not found errors
Problem: FileNotFoundError when running convert commands.
Solutions:
- Use absolute paths:
- Verify the file exists:
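A quick shell check (the `convert diann` invocation is illustrative; consult `qpxc --help` for the real subcommand):

```shell
# Confirm the file exists and resolve its absolute path
ls -l ./report.tsv
realpath ./report.tsv

# Then pass the absolute path to the converter
qpxc convert diann --input /full/path/to/report.tsv
```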
### Memory errors with large files
Problem: MemoryError or system becomes unresponsive with large datasets.
Solutions:
- Process files in batches if supported by the command
- Increase available memory or use a machine with more RAM
- For DIA-NN reports, use the `--qvalue-threshold` option to filter data
- For MaxQuant, use the `--batch-size` and `--memory-limit` options
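Putting those flags together (the flag names come from the text above; the subcommand names and values are illustrative):

```shell
# DIA-NN: drop low-confidence identifications during conversion
qpxc convert diann --input report.tsv --qvalue-threshold 0.01

# MaxQuant: process in bounded batches under a memory cap
qpxc convert maxquant --input evidence.txt --batch-size 100000 --memory-limit 8GB
```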
### Invalid file format errors
Problem: ValueError or parsing errors when reading input files.
Solutions:
- Verify that the file format matches what the converter expects
- Check the file for corruption
- Ensure the file encoding is UTF-8
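Standard Unix tools cover the last two checks (file names are illustrative):

```shell
# Inspect what the file actually contains (type and encoding)
file report.tsv

# Re-encode to UTF-8 if needed; iconv fails loudly on invalid bytes
iconv -f ISO-8859-1 -t UTF-8 report.tsv > report.utf8.tsv

# Compare a checksum against the original download to rule out corruption
sha256sum report.tsv
```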
## Output Issues
### Empty output files
Problem: Parquet files are created but contain no data.
Solutions:
- Check whether the input data passes the quality filters (q-value, PEP thresholds)
- Verify that column names match the expected format for the software
- Use the `--verbose` flag to see processing details
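For example (subcommand and file name are illustrative):

```shell
qpxc convert diann --input report.tsv --verbose
```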
### Missing columns in output
Problem: Expected columns are not present in the output Parquet file.
Solutions:
- Check whether the input file contains the required source columns
- For spectral data, ensure the `--spectral-data` flag is used
- Review the Format Specification for required vs. optional fields
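For example (subcommand and file name are illustrative):

```shell
qpxc convert maxquant --input msms.txt --spectral-data
```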
## Validation Issues
### Validating a dataset
Use the validate command to check a dataset against the canonical schemas:
```shell
# Validate all structures in a dataset
qpxc validate --dataset-path ./PXD014414

# Validate a specific structure
qpxc validate --dataset-path ./PXD014414 --structure feature

# Validate a single Parquet file
qpxc validate --file ./data.feature.parquet
```
### Common validation errors
- **Missing required column**: A required column is absent from the Parquet file. Check the schema reference for required fields.
- **Type mismatch**: A column has a different Arrow type than the schema expects. This usually means the data was written with an older version of qpx or a different tool.
- **Null values in non-nullable columns**: Required columns should not contain null values. Check your input data and conversion pipeline.
- **Duplicate primary key**: Rows with identical primary key values exist. This may indicate duplicate entries in the source data.
### Programmatic validation
You can also validate from Python:
```python
import qpx

with qpx.open("./PXD014414") as ds:
    results = ds.validate()
    for name, result in results.items():
        print(result.summary)
        for issue in result.issues:
            print(f"  [{issue.severity}] {issue.message}")
```
## SDRF Issues
### Sample name mismatches
Problem: Samples in data files don't match SDRF sample names.
Solutions:
- Ensure the `source name` column in the SDRF matches the data file names (without extension)
- Check for whitespace or case-sensitivity issues:

```python
import pandas as pd

sdrf = pd.read_csv('experiment.sdrf.tsv', sep='\t')
print(sdrf['source name'].unique())
```
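A minimal sketch of a helper for normalizing names on both sides before comparing them; the trailing `.raw` handling is an assumption about your raw-file naming:

```python
def normalize(name: str) -> str:
    # Trim surrounding whitespace, fold case, and drop a trailing .raw extension
    name = name.strip().lower()
    if name.endswith(".raw"):
        name = name[: -len(".raw")]
    return name

print(normalize("  Sample_01.RAW"))  # sample_01
```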
### Missing factor values
Problem: Factor values are not extracted from SDRF.
Solution: Ensure factor columns follow the format `factor value[factor_name]`:

```
source name    factor value[disease]    factor value[organism part]
sample1        healthy                  liver
sample2        cancer                   liver
```
### Converting SDRF to QPX metadata
Use the dedicated SDRF converter to produce sample.parquet and run.parquet:
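A sketch of the invocation (the `convert sdrf` subcommand and flag names are assumptions; check `qpxc --help` for the exact interface):

```shell
qpxc convert sdrf --input experiment.sdrf.tsv --output-dir ./PXD014414
```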
## Query Issues
### SQL query errors
Problem: SQL queries fail with column or table not found errors.
Solutions:
- Check the available structures in the dataset
- Check the schema for a specific structure
- Make sure table names in your SQL match the QPX structure names: `psm`, `feature`, `pg`, `mz`, `sample`, `run`, `dataset`, `ontology`, `provenance`
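Sketches of both checks (the `qpxc query` shape is an assumption; the DuckDB line assumes one Parquet file per structure):

```shell
# Peek at a structure to confirm it exists and see its columns
qpxc query --dataset-path ./PXD014414 "SELECT * FROM psm LIMIT 5"

# Or inspect a structure's schema directly with the DuckDB CLI
duckdb -c "DESCRIBE SELECT * FROM './PXD014414/psm.parquet'"
```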
### Memory issues with large queries
Solutions:
- Use `--duckdb-memory` to increase DuckDB's memory limit
- Use `--limit` or SQL `LIMIT` to restrict results
- Export large results directly to Parquet
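Putting those together (flag values and the COPY target are illustrative; `COPY ... TO ... (FORMAT PARQUET)` is standard DuckDB SQL):

```shell
# Give DuckDB a larger memory budget and keep the result set bounded
qpxc query --dataset-path ./PXD014414 --duckdb-memory 8GB "SELECT * FROM psm LIMIT 1000"

# Export a large result straight to Parquet instead of printing it
qpxc query --dataset-path ./PXD014414 "COPY (SELECT * FROM psm) TO 'psm_export.parquet' (FORMAT PARQUET)"
```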
## Performance Issues
### Slow processing
Solutions:
- Use SSD storage for input/output files
- Increase available RAM
- For large datasets, consider processing samples in parallel
- Use compressed input files (.gz) to reduce I/O
### High memory usage
Solutions:
- Close other applications to free memory
- Process smaller batches of data
- Use streaming/chunked processing where available
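Where no built-in batching exists, a plain-Python sketch of chunked reading (tab-separated input assumed) keeps memory bounded:

```python
import csv
from itertools import islice

def iter_batches(path, batch_size=50_000):
    """Yield (header, rows) chunks so the whole file never sits in memory."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)
        while True:
            rows = list(islice(reader, batch_size))
            if not rows:
                break
            yield header, rows
```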
## Getting More Help
If your issue isn't listed here:
- Search existing issues: GitHub Issues
- Enable verbose logging: add `--verbose` to any command for detailed output
- Create a new issue and include:
  - qpx version (`qpxc --version`)
  - Python version (`python --version`)
  - Operating system
  - The complete error message
  - A minimal reproducible example
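The version details can be collected in one go:

```shell
qpxc --version
python --version
uname -a   # operating system details
```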