Validate Command
Validate QPX datasets and individual structures against their canonical schemas.
Overview
The validate command checks QPX datasets and individual Parquet files against their canonical schemas. It verifies column presence, type matching, null values in required columns, and primary key uniqueness.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `--dataset-path` | DIRECTORY | No | - | Path to a QPX dataset directory |
| `--file` | FILE | No | - | Path to a single QPX Parquet file to validate |
| `--structure` | CHOICE | No | - | Structure(s) to validate (repeatable). Default: all. |
| `--verbose` | FLAG | No | - | Enable verbose logging |
Description
Checks column presence, type matching, null values in required columns, and primary key uniqueness.
Usage Examples
Validate QPX datasets and files:
```bash
# Validate an entire dataset
qpxc validate --dataset-path ./PXD014414

# Validate specific structures
qpxc validate --dataset-path ./PXD014414 --structure feature
qpxc validate --dataset-path ./PXD014414 --structure feature --structure pg

# Validate a single Parquet file
qpxc validate --file ./data.feature.parquet
```
Validation Checks
The validator performs the following checks on each structure:
| Check | Description |
|---|---|
| Column presence | All required columns defined in the canonical schema must exist |
| Type matching | Column Arrow types must match the schema definition |
| Null values | Non-nullable columns must not contain null values |
| Primary key uniqueness | Primary key columns must have unique values |
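The four checks can be illustrated with a small pure-Python sketch. This is not QPX's implementation: the `Field` class and `check_table` function are hypothetical names, the table is modeled as a plain dict of column lists, Python types stand in for Arrow types, and every schema field is treated as required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical schema entry; not part of the QPX API."""
    name: str
    type: type
    nullable: bool = True
    primary_key: bool = False


def check_table(columns, schema):
    """Return a list of issue messages for a dict of column name -> values."""
    issues = []
    for field in schema:
        # Column presence
        if field.name not in columns:
            issues.append(f"missing column: {field.name}")
            continue
        values = columns[field.name]
        # Type matching (nulls are left to the null check below)
        if any(v is not None and not isinstance(v, field.type) for v in values):
            issues.append(f"type mismatch: {field.name}")
        # Null values in non-nullable columns
        if not field.nullable and any(v is None for v in values):
            issues.append(f"null in non-nullable column: {field.name}")
    # Primary key uniqueness: compare row tuples over the key columns present
    key_names = [f.name for f in schema if f.primary_key and f.name in columns]
    if key_names:
        rows = list(zip(*(columns[n] for n in key_names)))
        if len(set(rows)) != len(rows):
            issues.append(f"duplicate primary key: {', '.join(key_names)}")
    return issues


# Illustrative schema and data (assumed, not taken from a real QPX structure)
schema = [
    Field("id", int, nullable=False, primary_key=True),
    Field("sequence", str, nullable=False),
    Field("score", float),
]
table = {"id": [1, 2, 2], "sequence": ["PEPTIDE", None, "AAK"]}
issues = check_table(table, schema)
for message in issues:
    print(message)
```

Collecting every issue instead of stopping at the first one mirrors the way the validator reports a list of issues per structure.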
Programmatic Validation
You can also validate from Python:
```python
import qpx

with qpx.open_dataset("./PXD014414") as ds:
    results = ds.validate()
    for name, result in results.items():
        print(result.summary)
        for issue in result.issues:
            print(f"  [{issue.severity}] {issue.message}")
```
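For reporting, the per-structure results can be flattened into a single list. The sketch below relies only on the attributes used in the example above (`issues`, `severity`, `message`). The `collect_issues` helper is hypothetical, and the severity strings `"error"` and `"warning"` in the stand-in data are assumed values rather than confirmed QPX constants. Stand-in objects are used so the sketch runs without `qpx` installed.

```python
from types import SimpleNamespace


def collect_issues(results, severity=None):
    """Flatten {structure name: result} into (name, severity, message) tuples.

    Hypothetical helper: uses only result.issues, issue.severity and
    issue.message. If severity is given, only matching issues are kept.
    """
    collected = []
    for name, result in results.items():
        for issue in result.issues:
            if severity is None or issue.severity == severity:
                collected.append((name, issue.severity, issue.message))
    return collected


# Stand-in for the mapping returned by ds.validate(); values are made up.
fake_results = {
    "feature": SimpleNamespace(issues=[
        SimpleNamespace(severity="error", message="missing column: sequence"),
        SimpleNamespace(severity="warning", message="unexpected extra column"),
    ]),
    "pg": SimpleNamespace(issues=[]),
}

errors = collect_issues(fake_results, severity="error")
for name, sev, message in errors:
    print(f"{name}: [{sev}] {message}")
```

With a real dataset, the mapping returned by `ds.validate()` would be passed in place of `fake_results`, provided its severity values compare equal to the filter you supply.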
Exit Codes
| Code | Meaning |
|---|---|
| 0 | All validated structures are valid |
| 1 | One or more structures have validation errors |
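The exit codes make the command usable from scripts and CI jobs. The following is a minimal sketch of one way to do that from Python: `build_validate_command` and `is_valid` are hypothetical helper names, only the flags documented above are used, and any nonzero exit code is treated as a failure. A stand-in process is used for the demonstration so the sketch runs without `qpxc` installed.

```python
import subprocess
import sys


def build_validate_command(dataset_path, structures=()):
    """Build the argv for `qpxc validate` from the documented flags."""
    command = ["qpxc", "validate", "--dataset-path", str(dataset_path)]
    for structure in structures:
        # --structure is repeatable, so emit it once per structure
        command += ["--structure", structure]
    return command


def is_valid(command):
    """Run a command and report whether it exited with code 0."""
    completed = subprocess.run(command)
    return completed.returncode == 0


command = build_validate_command("./PXD014414", ["feature", "pg"])
print(command)

# Stand-in process that exits with code 1, mimicking a failed validation.
# For real use, pass `command` to is_valid() with qpxc available on PATH.
stand_in = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(is_valid(stand_in))
```

Passing the arguments as a list avoids shell quoting problems with dataset paths that contain spaces.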