Skip to content

Query Commands

Query and inspect QPX datasets using SQL, filters, or quick previews.

Overview

The query command group provides tools for querying QPX datasets. You can run arbitrary SQL queries, filter data structures by conditions, or quickly preview the first rows of any structure.

Available Commands

  • sql - Run arbitrary SQL queries against a dataset
  • filter - Filter a data structure by a SQL condition
  • head - Show the first N rows of a structure

sql

Run an arbitrary SQL query against a QPX dataset.

Description

Opens the dataset, registers all (or selected) data structures as DuckDB tables, then executes the provided SQL. Table names match the QPX structure names: psm, feature, pg, mz, sample, run, etc. 

Parameters

ParameterTypeRequiredDefaultDescription
--dataset-path DIRECTORY Yes - Path to a QPX dataset directory
--sql TEXT Yes - SQL query to execute against the dataset
--output FILE No - Output file path (CSV). If not specified, results go to stdout.
--output-format CHOICE No csv Output format for results
--structures TEXT No all found Comma-separated list of structures to load (default: all found)
--duckdb-memory TEXT No 16GB DuckDB memory limit (e.g., '16GB')
--duckdb-threads INTEGER No 4 DuckDB thread count
--verbose FLAG No - Enable verbose logging

Usage Examples

Run SQL queries against QPX datasets:

# Count features per run
qpxc query sql \
    --dataset-path ./PXD014414 \
    --sql "SELECT run_file_name, COUNT(*) AS n FROM feature GROUP BY 1"

# Join feature with run, export to CSV
qpxc query sql \
    --dataset-path ./PXD014414 \
    --sql "SELECT f.sequence, r.run_file_name FROM feature f JOIN run r USING (run_file_name)" \
    --output results.csv

# Query specific structures only
qpxc query sql \
    --dataset-path ./PXD014414 \
    --sql "SELECT COUNT(*) FROM psm" \
    --structures psm

filter

Filter a QPX data structure by a SQL condition.

Description

Opens the dataset, applies the filter condition to the specified structure, and returns the matching rows. 

Parameters

ParameterTypeRequiredDefaultDescription
--dataset-path DIRECTORY Yes - Path to a QPX dataset directory
--structure CHOICE Yes - QPX data structure to filter (e.g., feature, psm, pg)
--condition TEXT Yes - SQL WHERE condition (e.g., 'charge > 2 AND global_qvalue < 0.01')
--columns TEXT No all Comma-separated list of columns to include in output (default: all)
--limit INTEGER No - Maximum number of rows to return
--output FILE No - Output file path. If not specified, results go to stdout.
--output-format CHOICE No csv Output format for results
--verbose FLAG No - Enable verbose logging

Usage Examples

Filter data structures by conditions:

# Filter features by charge and q-value
qpxc query filter \
    --dataset-path ./PXD014414 \
    --structure feature \
    --condition "charge > 2 AND global_qvalue < 0.01"

# Filter PSMs, select specific columns
qpxc query filter \
    --dataset-path ./PXD014414 \
    --structure psm \
    --condition "is_decoy = false" \
    --columns "sequence,charge,score" \
    --limit 100

# Export filtered protein groups to parquet
qpxc query filter \
    --dataset-path ./PXD014414 \
    --structure pg \
    --condition "global_qvalue < 0.01" \
    --output filtered_pg.parquet \
    --output-format parquet

Show the first N rows of a QPX data structure.

Description

A quick way to peek at the contents of any QPX data structure within a dataset. 

Parameters

ParameterTypeRequiredDefaultDescription
--dataset-path DIRECTORY Yes - Path to a QPX dataset directory
--structure CHOICE Yes - QPX data structure to inspect (e.g., feature, psm, pg)
-n INTEGER No 10 Number of rows to display
--columns TEXT No all Comma-separated list of columns to include (default: all)
--output-format CHOICE No table Output format for results
--verbose FLAG No - Enable verbose logging

Usage Examples

Quick preview of data structure contents:

# Show first 10 rows of features
qpxc query head \
    --dataset-path ./PXD014414 \
    --structure feature

# Show 5 PSM rows with specific columns
qpxc query head \
    --dataset-path ./PXD014414 \
    --structure psm \
    -n 5 \
    --columns "sequence,charge,score"

Best Practices

  • Use --duckdb-threads and --duckdb-memory to control resource usage for large datasets
  • Use query head to preview data before running complex SQL queries
  • Use query filter for simple conditions; use query sql for joins and aggregations
  • Export large results to Parquet format with --output-format parquet to preserve types