Query Commands
Query and inspect QPX datasets using SQL, filters, or quick previews.
Overview
The query command group provides tools for querying QPX datasets. You can run arbitrary SQL queries, filter data structures by conditions, or quickly preview the first rows of any structure.
Available Commands
- sql - Run arbitrary SQL queries against a dataset
- filter - Filter a data structure by a SQL condition
- head - Show the first N rows of a structure
sql
Run an arbitrary SQL query against a QPX dataset.
Description
Opens the dataset, registers all (or selected) data structures as DuckDB tables, then executes the provided SQL. Table names match the QPX structure names: psm, feature, pg, mz, sample, run, etc.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `--dataset-path` | DIRECTORY | Yes | - | Path to a QPX dataset directory |
| `--sql` | TEXT | Yes | - | SQL query to execute against the dataset |
| `--output` | FILE | No | - | Output file path (CSV). If not specified, results go to stdout. |
| `--output-format` | CHOICE | No | `csv` | Output format for results |
| `--structures` | TEXT | No | all found | Comma-separated list of structures to load (default: all found) |
| `--duckdb-memory` | TEXT | No | `16GB` | DuckDB memory limit (e.g., '16GB') |
| `--duckdb-threads` | INTEGER | No | 4 | DuckDB thread count |
| `--verbose` | FLAG | No | - | Enable verbose logging |
Usage Examples
Run SQL queries against QPX datasets:
```shell
# Count features per run
qpxc query sql \
  --dataset-path ./PXD014414 \
  --sql "SELECT run_file_name, COUNT(*) AS n FROM feature GROUP BY 1"

# Join feature with run, export to CSV
qpxc query sql \
  --dataset-path ./PXD014414 \
  --sql "SELECT f.sequence, r.run_file_name FROM feature f JOIN run r USING (run_file_name)" \
  --output results.csv

# Query specific structures only
qpxc query sql \
  --dataset-path ./PXD014414 \
  --sql "SELECT COUNT(*) FROM psm" \
  --structures psm
```
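Longer queries are awkward to quote on one line. One possible approach, sketched here, is to keep the SQL in a quoted heredoc and pass the variable to `--sql`. The dataset path and the base query reuse the examples above; the `ORDER BY` clause, the output file name `features_per_run.csv`, and the `command -v` guard are illustrative additions so the snippet stays harmless when `qpxc` is not on the `PATH`.

```shell
# Hold a multi-line query in a variable. The quoted 'SQL' delimiter
# stops the shell from expanding $ or backticks inside the query.
QUERY=$(cat <<'SQL'
SELECT run_file_name, COUNT(*) AS n
FROM feature
GROUP BY 1
ORDER BY n DESC
SQL
)

# Run only when qpxc is installed; otherwise just show the query.
if command -v qpxc >/dev/null 2>&1; then
  qpxc query sql \
    --dataset-path ./PXD014414 \
    --sql "$QUERY" \
    --output features_per_run.csv
else
  printf 'qpxc not found; query would be:\n%s\n' "$QUERY"
fi
```

Quoting `"$QUERY"` when passing it keeps the whole statement, newlines included, as a single argument.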
filter
Filter a QPX data structure by a SQL condition.
Description
Opens the dataset, applies the filter condition to the specified structure, and returns the matching rows.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `--dataset-path` | DIRECTORY | Yes | - | Path to a QPX dataset directory |
| `--structure` | CHOICE | Yes | - | QPX data structure to filter (e.g., feature, psm, pg) |
| `--condition` | TEXT | Yes | - | SQL WHERE condition (e.g., 'charge > 2 AND global_qvalue < 0.01') |
| `--columns` | TEXT | No | all | Comma-separated list of columns to include in output (default: all) |
| `--limit` | INTEGER | No | - | Maximum number of rows to return |
| `--output` | FILE | No | - | Output file path. If not specified, results go to stdout. |
| `--output-format` | CHOICE | No | `csv` | Output format for results |
| `--verbose` | FLAG | No | - | Enable verbose logging |
Usage Examples
Filter data structures by conditions:
```shell
# Filter features by charge and q-value
qpxc query filter \
  --dataset-path ./PXD014414 \
  --structure feature \
  --condition "charge > 2 AND global_qvalue < 0.01"

# Filter PSMs, select specific columns
qpxc query filter \
  --dataset-path ./PXD014414 \
  --structure psm \
  --condition "is_decoy = false" \
  --columns "sequence,charge,score" \
  --limit 100

# Export filtered protein groups to parquet
qpxc query filter \
  --dataset-path ./PXD014414 \
  --structure pg \
  --condition "global_qvalue < 0.01" \
  --output filtered_pg.parquet \
  --output-format parquet
```
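Going by the parameter descriptions, a `filter` call reads roughly like a `SELECT <columns> FROM <structure> WHERE <condition> LIMIT <n>` statement given to `query sql`. That correspondence is an assumption inferred from this page, not a description of how `qpxc` is implemented. The helper `build_filter_sql` below is hypothetical and only assembles the string, which can be useful when a simple filter has to grow into a join or aggregation.

```shell
# build_filter_sql STRUCTURE CONDITION [COLUMNS] [LIMIT]
# Hypothetical helper: prints the SELECT statement that a filter call
# would roughly correspond to. It does not call qpxc or DuckDB.
build_filter_sql() {
  structure=$1
  condition=$2
  columns=${3:-*}   # default: all columns
  limit=${4:-}      # default: no row limit

  sql="SELECT $columns FROM $structure WHERE $condition"
  if [ -n "$limit" ]; then
    sql="$sql LIMIT $limit"
  fi
  printf '%s\n' "$sql"
}

# Mirrors the two filter examples for psm and feature.
build_filter_sql psm "is_decoy = false" "sequence,charge,score" 100
build_filter_sql feature "charge > 2 AND global_qvalue < 0.01"
```

The printed statement could then be passed to `qpxc query sql --sql` and extended from there.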
head
Show the first N rows of a QPX data structure.
Description
A quick way to peek at the contents of any QPX data structure within a dataset.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `--dataset-path` | DIRECTORY | Yes | - | Path to a QPX dataset directory |
| `--structure` | CHOICE | Yes | - | QPX data structure to inspect (e.g., feature, psm, pg) |
| `-n` | INTEGER | No | 10 | Number of rows to display |
| `--columns` | TEXT | No | all | Comma-separated list of columns to include (default: all) |
| `--output-format` | CHOICE | No | `table` | Output format for results |
| `--verbose` | FLAG | No | - | Enable verbose logging |
Usage Examples
Quick preview of data structure contents:
```shell
# Show first 10 rows of features
qpxc query head \
  --dataset-path ./PXD014414 \
  --structure feature

# Show 5 PSM rows with specific columns
qpxc query head \
  --dataset-path ./PXD014414 \
  --structure psm \
  -n 5 \
  --columns "sequence,charge,score"
```
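When exploring an unfamiliar dataset, it can help to preview several structures in one pass. The loop below is a sketch: the structure names are taken from the examples on this page, and a given dataset may not contain all of them. The `preview_cmd` helper and the `command -v` guard are illustrative additions, not part of `qpxc`.

```shell
DATASET=./PXD014414

# Hypothetical helper: prints the head command for one structure.
preview_cmd() {
  printf 'qpxc query head --dataset-path %s --structure %s -n 5\n' "$DATASET" "$1"
}

for structure in psm feature pg; do
  if command -v qpxc >/dev/null 2>&1; then
    # A structure may be absent from a dataset; report it and keep going.
    qpxc query head --dataset-path "$DATASET" --structure "$structure" -n 5 \
      || echo "skipped $structure" >&2
  else
    # Dry run: show what would be executed.
    preview_cmd "$structure"
  fi
done
```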
Best Practices
- Use `--duckdb-threads` and `--duckdb-memory` to control resource usage for large datasets
- Use `query head` to preview data before running complex SQL queries
- Use `query filter` for simple conditions; use `query sql` for joins and aggregations
- Export large results to Parquet format with `--output-format parquet` to preserve types
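As a sketch of the first recommendation, the snippet below derives a thread count from the machine and sets an explicit memory limit before calling `query sql`. The `8GB` value and the use of `getconf _NPROCESSORS_ONLN` to detect CPUs are assumptions chosen for illustration; suitable values depend on the dataset and the host.

```shell
# Detect the CPU count where possible; fall back to 4, the documented
# default for --duckdb-threads, if detection fails or is non-numeric.
THREADS=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
case "$THREADS" in
  ''|*[!0-9]*) THREADS=4 ;;
esac

MEMORY=8GB   # illustrative value; the documented default is 16GB

if command -v qpxc >/dev/null 2>&1; then
  qpxc query sql \
    --dataset-path ./PXD014414 \
    --sql "SELECT run_file_name, COUNT(*) AS n FROM feature GROUP BY 1" \
    --duckdb-threads "$THREADS" \
    --duckdb-memory "$MEMORY"
else
  echo "qpxc not found; would use $THREADS threads and $MEMORY of memory"
fi
```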