Validate Command

Validate QPX datasets and individual structures against their canonical schemas.

Overview

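The validate command checks a whole QPX dataset directory, or a single Parquet file, against the canonical schema of each structure. For every structure it verifies that the required columns are present, that column types match the schema, that required columns contain no null values, and that primary key values are unique.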

Parameters

Parameter Type Required Default Description
--dataset-path DIRECTORY No - Path to a QPX dataset directory
--file FILE No - Path to a single QPX Parquet file to validate
--structure CHOICE No all Structure(s) to validate; repeat the flag to select several
--verbose FLAG No - Enable verbose logging

Usage Examples

Validate QPX datasets and files:

# Validate an entire dataset
qpxc validate --dataset-path ./PXD014414

# Validate specific structures
qpxc validate --dataset-path ./PXD014414 --structure feature
qpxc validate --dataset-path ./PXD014414 --structure feature --structure pg

# Validate a single Parquet file
qpxc validate --file ./data.feature.parquet

Validation Checks

The validator performs the following checks on each structure:

Check Description
Column presence All required columns defined in the canonical schema must exist
Type matching Column Arrow types must match the schema definition
Null values Non-nullable columns must not contain null values
Primary key uniqueness Primary key columns must have unique values
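
The four checks can be pictured with a small standalone sketch. This is not the QPX validator: it uses plain Python lists in place of Arrow columns and type-name strings in place of Arrow types, and the names `ColumnSpec`, `check_table`, `feature_id`, and `charge` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class ColumnSpec:
    """Hypothetical schema entry: a type name standing in for an Arrow type."""
    type_name: str
    nullable: bool = True


def check_table(
    schema: Dict[str, ColumnSpec],
    primary_key: Tuple[str, ...],
    types: Dict[str, str],
    columns: Dict[str, List[Any]],
) -> List[str]:
    """Return a list of human-readable issues; an empty list means valid."""
    issues: List[str] = []
    for name, spec in schema.items():
        # Check 1: column presence
        if name not in columns:
            issues.append(f"missing column: {name}")
            continue
        # Check 2: type matching
        if types.get(name) != spec.type_name:
            issues.append(
                f"type mismatch in {name}: expected {spec.type_name}, "
                f"found {types.get(name)}"
            )
        # Check 3: no nulls in non-nullable columns
        if not spec.nullable and any(v is None for v in columns[name]):
            issues.append(f"null values in non-nullable column: {name}")
    # Check 4: primary key uniqueness (only if all key columns exist)
    if all(k in columns for k in primary_key):
        keys = list(zip(*(columns[k] for k in primary_key)))
        if len(set(keys)) != len(keys):
            issues.append(
                f"duplicate primary key values: {', '.join(primary_key)}"
            )
    return issues


# Toy data that trips three of the four checks.
schema = {
    "feature_id": ColumnSpec("string", nullable=False),
    "charge": ColumnSpec("int32"),
}
types = {"feature_id": "string", "charge": "int64"}
columns = {"feature_id": ["a", "a", None], "charge": [2, 3, 2]}

issues = check_table(schema, ("feature_id",), types, columns)
for line in issues:
    print(line)
```

Returning a list of issues rather than stopping at the first failure mirrors how the validator reports all problems found in a structure in one pass.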

Programmatic Validation

You can also validate from Python:

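import qpx

with qpx.open_dataset("./PXD014414") as ds:
    # validate() returns a mapping of structure name to validation result
    results = ds.validate()
    for name, result in results.items():
        print(result.summary)
        # Each issue carries a severity and a human-readable message
        for issue in result.issues:
            print(f"  [{issue.severity}] {issue.message}")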
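
When a dataset has many structures, a per-severity tally can be easier to scan than the full issue list. The sketch below relies only on the `issues` and `severity` attributes shown above; the helper name `count_issues_by_severity` is hypothetical, and the `SimpleNamespace` objects are stand-ins so the example runs without a dataset. The severity values `"error"` and `"warning"` are placeholders, not confirmed QPX values.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Any, Mapping


def count_issues_by_severity(results: Mapping[str, Any]) -> Counter:
    """Tally issues across all structures, keyed by issue.severity."""
    counts: Counter = Counter()
    for result in results.values():
        for issue in result.issues:
            counts[issue.severity] += 1
    return counts


# Stand-in results; with a real dataset, pass the mapping from ds.validate().
fake_results = {
    "feature": SimpleNamespace(
        issues=[
            SimpleNamespace(severity="error", message="placeholder"),
            SimpleNamespace(severity="warning", message="placeholder"),
        ]
    ),
    "pg": SimpleNamespace(issues=[]),
}

severity_counts = count_issues_by_severity(fake_results)
print(dict(severity_counts))
```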

Exit Codes

Code Meaning
0 All validated structures are valid
1 One or more structures have validation errors
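
Because the exit code distinguishes a clean dataset from one with errors, the command can gate a pipeline step. The following is a minimal sketch, assuming `qpxc` is on the PATH; the helper names are hypothetical, and only the flags and exit codes documented above are used. Any other exit code is treated as unexpected, since this page does not define it.

```python
import subprocess
from typing import List, Sequence


def build_validate_command(
    dataset_path: str, structures: Sequence[str] = ()
) -> List[str]:
    """Build the argument list for `qpxc validate`."""
    cmd = ["qpxc", "validate", "--dataset-path", dataset_path]
    for name in structures:
        # --structure is repeatable: pass the flag once per structure
        cmd += ["--structure", name]
    return cmd


def describe_exit_code(code: int) -> str:
    """Map the documented exit codes to a short message."""
    if code == 0:
        return "all validated structures are valid"
    if code == 1:
        return "one or more structures have validation errors"
    return f"unexpected exit code {code}"  # not documented on this page


def run_validate(dataset_path: str, structures: Sequence[str] = ()) -> int:
    """Run the command and return its exit code (requires qpxc installed)."""
    completed = subprocess.run(build_validate_command(dataset_path, structures))
    return completed.returncode
```

A caller would then do something like `code = run_validate("./PXD014414", ["feature"])` and stop the pipeline when `code != 0`.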