Query Commands¶

Query and inspect QPX datasets using SQL, filters, or quick previews.

Overview¶

The query command group provides tools for querying QPX datasets. You can run arbitrary SQL queries, filter data structures by conditions, or quickly preview the first rows of any structure.

Available Commands¶

sql - Run arbitrary SQL queries against a dataset
filter - Filter a data structure by a SQL condition
head - Show the first N rows of a structure

sql¶

Run an arbitrary SQL query against a QPX dataset.

Description¶

Opens the dataset, registers all (or selected) data structures as DuckDB tables, then executes the provided SQL. Table names match the QPX structure names: psm, feature, pg, mz, sample, run, etc.

Parameters¶

Parameter	Type	Required	Default	Description
`--dataset-path`	DIRECTORY	Yes	-	Path to a QPX dataset directory
`--sql`	TEXT	Yes	-	SQL query to execute against the dataset
`--output`	FILE	No	-	Output file path (CSV). If not specified, results go to stdout.
`--output-format`	CHOICE	No	`csv`	Output format for results
`--structures`	TEXT	No	all found	Comma-separated list of structures to load (default: all found)
`--duckdb-memory`	TEXT	No	`16GB`	DuckDB memory limit (e.g., '16GB')
`--duckdb-threads`	INTEGER	No	4	DuckDB thread count
`--verbose`	FLAG	No	-	Enable verbose logging

Usage Examples¶

Run SQL queries against QPX datasets:

# Count features per run
qpxc query sql \
    --dataset-path ./PXD014414 \
    --sql "SELECT run_file_name, COUNT(*) AS n FROM feature GROUP BY 1"

# Join feature with run, export to CSV
qpxc query sql \
    --dataset-path ./PXD014414 \
    --sql "SELECT f.sequence, r.run_file_name FROM feature f JOIN run r USING (run_file_name)" \
    --output results.csv

# Query specific structures only
qpxc query sql \
    --dataset-path ./PXD014414 \
    --sql "SELECT COUNT(*) FROM psm" \
    --structures psm

filter¶

Filter a QPX data structure by a SQL condition.

Description¶

Opens the dataset, applies the filter condition to the specified structure, and returns the matching rows.

Parameters¶

Parameter	Type	Required	Default	Description
`--dataset-path`	DIRECTORY	Yes	-	Path to a QPX dataset directory
`--structure`	CHOICE	Yes	-	QPX data structure to filter (e.g., feature, psm, pg)
`--condition`	TEXT	Yes	-	SQL WHERE condition (e.g., 'charge > 2 AND global_qvalue < 0.01')
`--columns`	TEXT	No	all	Comma-separated list of columns to include in output (default: all)
`--limit`	INTEGER	No	-	Maximum number of rows to return
`--output`	FILE	No	-	Output file path. If not specified, results go to stdout.
`--output-format`	CHOICE	No	`csv`	Output format for results
`--verbose`	FLAG	No	-	Enable verbose logging

Usage Examples¶

Filter data structures by conditions:

# Filter features by charge and q-value
qpxc query filter \
    --dataset-path ./PXD014414 \
    --structure feature \
    --condition "charge > 2 AND global_qvalue < 0.01"

# Filter PSMs, select specific columns
qpxc query filter \
    --dataset-path ./PXD014414 \
    --structure psm \
    --condition "is_decoy = false" \
    --columns "sequence,charge,score" \
    --limit 100

# Export filtered protein groups to parquet
qpxc query filter \
    --dataset-path ./PXD014414 \
    --structure pg \
    --condition "global_qvalue < 0.01" \
    --output filtered_pg.parquet \
    --output-format parquet

head¶

Show the first N rows of a QPX data structure.

Description¶

A quick way to peek at the contents of any QPX data structure within a dataset.

Parameters¶

Parameter	Type	Required	Default	Description
`--dataset-path`	DIRECTORY	Yes	-	Path to a QPX dataset directory
`--structure`	CHOICE	Yes	-	QPX data structure to inspect (e.g., feature, psm, pg)
`-n`	INTEGER	No	10	Number of rows to display
`--columns`	TEXT	No	all	Comma-separated list of columns to include (default: all)
`--output-format`	CHOICE	No	`table`	Output format for results
`--verbose`	FLAG	No	-	Enable verbose logging

Usage Examples¶

Quick preview of data structure contents:

# Show first 10 rows of features
qpxc query head \
    --dataset-path ./PXD014414 \
    --structure feature

# Show 5 PSM rows with specific columns
qpxc query head \
    --dataset-path ./PXD014414 \
    --structure psm \
    -n 5 \
    --columns "sequence,charge,score"

Best Practices¶

Use --duckdb-threads and --duckdb-memory to control resource usage for large datasets
Use query head to preview data before running complex SQL queries
Use query filter for simple conditions; use query sql for joins and aggregations
Export large results to Parquet format with --output-format parquet to preserve types