Scores & CV Terms

QPX provides two structured fields for attaching metadata to records beyond the core columns: additional_scores for numeric quality metrics and cv_params for controlled vocabulary annotations. Both are stored as arrays of structs, making them extensible without schema changes.

Additional scores

The additional_scores field captures search engine scores, quality metrics, and other numeric values associated with a PSM, feature, peptide, or protein group. Each entry is self-describing: it carries its name, value, and an indication of whether higher values are better.

Struct definition

additional_scores: array[struct{
    score_name:    string,       -- Score identifier in snake_case (e.g. "comet_xcorr")
    score_value:   float,        -- Numeric value
    higher_better: bool, null    -- true = higher is better; false = lower is better; null = unknown
}]

The higher_better field makes the data self-describing so that downstream consumers can interpret scores without looking up external ontology definitions.
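As a minimal sketch of how a downstream consumer might rely on higher_better, the snippet below compares two entries of the same score without consulting any ontology. The Score type and the is_better helper are hypothetical names introduced for illustration, and raising an error when the direction is null is an assumption rather than QPX-defined behaviour.

```python
from typing import Optional, TypedDict


class Score(TypedDict):
    # Mirrors the three fields of the additional_scores struct.
    score_name: str
    score_value: float
    higher_better: Optional[bool]  # None stands in for null (direction unknown)


def is_better(a: Score, b: Score) -> bool:
    """Return True if entry `a` is strictly better than entry `b`.

    Hypothetical helper: raises ValueError when the two entries are
    different scores or when the direction is unknown.
    """
    if a["score_name"] != b["score_name"]:
        raise ValueError("cannot compare different scores")
    direction = a["higher_better"]
    if direction is None:
        raise ValueError("direction unknown for " + a["score_name"])
    if direction:
        return a["score_value"] > b["score_value"]
    return a["score_value"] < b["score_value"]
```

Because the direction travels with each entry, the same function handles comet_xcorr (higher is better) and global_qvalue (lower is better).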

snake_case naming convention

All score names in QPX use snake_case -- no colons, dots, spaces, or mixed case. This ensures score names are valid identifiers in SQL, Python, R, and any query language without quoting. The Ontology Mapping maps each snake_case name to its proper ontology term and accession.
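A writer could check names against this convention before emitting them. The following is a sketch only: the exact pattern (lowercase letters and digits, single underscores as separators, leading letter) is an assumption, since the text above states the rule in prose rather than as a regular expression, and is_valid_score_name is a hypothetical helper.

```python
import re

# Assumed pattern: starts with a lowercase letter, then lowercase letters or
# digits, with single underscores between segments. No colons, dots, spaces,
# or uppercase characters can match.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_valid_score_name(name: str) -> bool:
    """Return True if `name` matches the assumed snake_case pattern in full."""
    return SNAKE_CASE.fullmatch(name) is not None
```

Under this pattern, comet_xcorr and msgf_spec_evalue are accepted, while names such as "Comet:xcorr" or "global qvalue" are rejected.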

Example

[
  {"score_name": "comet_xcorr",    "score_value": 3.42,   "higher_better": true},
  {"score_name": "global_qvalue",  "score_value": 0.0012, "higher_better": false},
  {"score_name": "rank",           "score_value": 1.0,    "higher_better": false}
]

Recommended score names

The following score names are commonly used across QPX views. All names are snake_case. For formal ontology names and accessions, see the Ontology Mapping.

Score name	Typical view(s)	Direction	Description
`posterior_error_probability`	PSM	lower is better	Posterior error probability: the probability that the PSM is incorrect. Ranges 0.0–1.0. This is a top-level field in the PSM view, not stored in `additional_scores`.
`global_qvalue`	PSM, Feature	lower is better	Global q-value at the experiment level
`pg_global_qvalue`	PSM, Feature	lower is better	Protein group global q-value used to filter at the protein group level
`rank`	PSM	lower is better	Rank of the peptide in the search engine results (1 = best)
`comet_xcorr`	PSM	higher is better	Cross-correlation score from the Comet search engine
`comet_deltacn`	PSM	higher is better	Delta CN score from Comet
`comet_expect`	PSM	lower is better	Expectation value from Comet
`msgf_raw_score`	PSM	higher is better	Raw score from MS-GF+
`msgf_spec_evalue`	PSM	lower is better	Spectral E-value from MS-GF+
`sage_hyperscore`	PSM	higher is better	Hyperscore from Sage search engine
`diann_qvalue`	Feature	lower is better	Run-level q-value from DIA-NN
`diann_global_qvalue`	Feature	lower is better	Global q-value from DIA-NN
`diann_cscore`	Feature	higher is better	Confidence score from DIA-NN
`consensus_support`	PSM	higher is better	Number of search engines supporting the identification
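With stable score names, filtering records becomes a lookup by name. The sketch below keeps records whose global_qvalue is at or below a cutoff. The record layout (a dict with an additional_scores list), the passes_qvalue helper, the 0.01 default threshold, and the choice to reject records that lack the score are all assumptions made for illustration.

```python
def passes_qvalue(record: dict, name: str = "global_qvalue",
                  threshold: float = 0.01) -> bool:
    """Return True if the record carries score `name` with value <= threshold.

    Assumption: a record without the named score is rejected.
    """
    for entry in record.get("additional_scores") or []:
        if entry["score_name"] == name:
            return entry["score_value"] <= threshold
    return False


# Hypothetical records shaped like rows with an additional_scores column.
records = [
    {"id": "a", "additional_scores": [
        {"score_name": "global_qvalue", "score_value": 0.0012, "higher_better": False}]},
    {"id": "b", "additional_scores": [
        {"score_name": "global_qvalue", "score_value": 0.2, "higher_better": False}]},
    {"id": "c", "additional_scores": None},
]

# Keep only the ids of records that pass the cutoff.
kept = [r["id"] for r in records if passes_qvalue(r)]
```

The same helper works for diann_qvalue or pg_global_qvalue by passing a different name.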

Naming convention for new scores

When adding tool-specific scores, use snake_case with the tool name as prefix: {tool}_{score}. For example, comet_xcorr, diann_qvalue, sage_hyperscore. Register the mapping to the proper ontology term in ontology.parquet (see Ontology Mapping).
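One way to apply the {tool}_{score} pattern consistently is a small helper that builds the name from the tool and score labels. This is a sketch under stated assumptions: make_score_name is a hypothetical function, and the normalisation rules in it are illustrative choices that happen to reproduce prefixes such as diann and msgf. They are not rules defined by QPX, and the score part may still need manual adjustment.

```python
import re


def make_score_name(tool: str, score: str) -> str:
    """Build a {tool}_{score} name in snake_case.

    Illustrative normalisation (an assumption, not a QPX rule):
    - tool:  lowercase, drop every character that is not a-z or 0-9
    - score: lowercase, turn each run of other characters into one underscore
    """
    tool_part = re.sub(r"[^a-z0-9]", "", tool.lower())
    score_part = re.sub(r"[^a-z0-9]+", "_", score.lower()).strip("_")
    if not tool_part or not score_part:
        raise ValueError("tool and score must each contain a letter or digit")
    return f"{tool_part}_{score_part}"
```

For instance, the labels "MS-GF+" and "raw score" combine into msgf_raw_score. The resulting name would still need to be registered in ontology.parquet as described above.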

Protein-level additional scores

At the protein group level, additional_scores follows the same struct definition. The score_value is a float that applies to the entire protein group entry. For per-protein scores within a group, the values array index corresponds to the pg_accessions array index.

Controlled vocabulary terms

The cv_params field stores key-value annotations drawn from controlled vocabularies. Unlike additional_scores (which are always numeric), CV parameters can carry string values or be value-less (presence-only).

Struct definition

cv_params: array[struct{
    cv_name:  string,        -- Term name from a controlled vocabulary
    cv_value: string, null   -- Term value; null if the term is a flag (presence-only)
}]

Example

[
  {"cv_name": "ms level",          "cv_value": "2"},
  {"cv_name": "deconvoluted data", "cv_value": null},
  {"cv_name": "prot:FDR threshold", "cv_value": "0.01"},
  {"cv_name": "number of unmatched peaks", "cv_value": "3"}
]

Note

The cv_name is always required. The cv_value is optional -- a null value indicates a boolean/flag-style term where mere presence is meaningful (e.g., "deconvoluted data").
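A reader of cv_params typically needs to treat flag terms and valued terms differently. The sketch below separates the two, using None for null. The split_cv_params helper is a hypothetical name, and collapsing valued terms into a dict assumes each cv_name appears at most once per record, which the text above does not guarantee.

```python
from typing import Dict, List, Optional, Set, Tuple


def split_cv_params(
    cv_params: List[Dict[str, Optional[str]]],
) -> Tuple[Set[str], Dict[str, str]]:
    """Split cv_params into presence-only flags and name -> value pairs."""
    flags: Set[str] = set()
    values: Dict[str, str] = {}
    for param in cv_params:
        name = param["cv_name"]
        value = param["cv_value"]
        if value is None:
            flags.add(name)        # flag-style term: presence is the information
        else:
            values[name] = value   # assumption: a repeated name keeps the last value
    return flags, values


# The same entries as in the example above.
cv_params = [
    {"cv_name": "ms level", "cv_value": "2"},
    {"cv_name": "deconvoluted data", "cv_value": None},
    {"cv_name": "prot:FDR threshold", "cv_value": "0.01"},
    {"cv_name": "number of unmatched peaks", "cv_value": "3"},
]
flags, values = split_cv_params(cv_params)
```

Values stay as strings, matching the struct definition. Any conversion, such as int(values["ms level"]), is left to the caller.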

Where these fields are used

View	`additional_scores`	`cv_params`
PSM (`psm_file`)	Yes	Yes
Feature (`feature_file`)	Yes	Yes
API Views	Yes (as `best_id_score`)	--
Protein Group (`pg_file`)	Yes	--
MZ (`mz_file`)	--	Yes

Warning

Do not store search engine scores in cv_params. Numeric scores belong in additional_scores, which provides the higher_better flag and enforces float typing. Reserve cv_params for non-numeric annotations and metadata.
