Scores & CV Terms
QPX provides two structured fields for attaching metadata to records beyond the core columns: additional_scores for numeric quality metrics and cv_params for controlled vocabulary annotations. Both are stored as arrays of structs, making them extensible without schema changes.
Additional scores
The additional_scores field captures search engine scores, quality metrics, and other numeric values associated with a PSM, feature, peptide, or protein group. Each entry is self-describing: it carries its name, value, and an indication of whether higher values are better.
Struct definition
additional_scores: array[struct{
score_name: string, -- Score identifier in snake_case (e.g. "comet_xcorr")
score_value: float, -- Numeric value
higher_better: bool, null -- true = higher is better; false = lower is better; null = unknown
}]
The higher_better field makes the data self-describing so that downstream consumers can interpret scores without looking up external ontology definitions.
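As a minimal sketch of how a consumer might use higher_better, the helper below compares two values of the same score without knowing which tool produced them. The function name is_improvement is hypothetical and not part of QPX; returning None for an unknown direction is an assumption about how a consumer would want to handle that case.

```python
from typing import Optional


def is_improvement(
    candidate: float, reference: float, higher_better: Optional[bool]
) -> Optional[bool]:
    """Return True if `candidate` is strictly better than `reference`.

    Returns None when the direction is unknown (higher_better is null),
    leaving the decision to the caller.
    """
    if higher_better is None:
        return None
    if higher_better:
        return candidate > reference
    return candidate < reference


# A higher comet_xcorr is better, so 3.42 improves on 2.10.
xcorr_improved = is_improvement(3.42, 2.10, higher_better=True)

# A lower global_qvalue is better, so 0.0012 improves on 0.01.
qvalue_improved = is_improvement(0.0012, 0.01, higher_better=False)
```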
snake_case naming convention
All score names in QPX use snake_case -- no colons, dots, spaces, or mixed case. This ensures score names are valid identifiers in SQL, Python, R, and any query language without quoting. The Ontology Mapping maps each snake_case name to its proper ontology term and accession.
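One way to check a name against this convention is a regular expression. The sketch below is an illustration rather than an official QPX validator: the pattern is an assumption that is slightly stricter than the rule above (it also rejects a leading digit and doubled underscores), and the name is_valid_score_name is hypothetical.

```python
import re

# Assumed pattern: starts with a lowercase letter, followed by lowercase
# letters or digits, with single underscores separating the parts.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_valid_score_name(name: str) -> bool:
    """Return True if `name` matches the assumed snake_case pattern."""
    return SNAKE_CASE.fullmatch(name) is not None


candidates = ["comet_xcorr", "global_qvalue", "Comet:xcorr", "msgf raw score"]
invalid = [name for name in candidates if not is_valid_score_name(name)]
```

Here invalid collects the names containing a colon, a space, or mixed case, which are the cases the convention rules out.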
Example
[
{"score_name": "comet_xcorr", "score_value": 3.42, "higher_better": true},
{"score_name": "global_qvalue", "score_value": 0.0012, "higher_better": false},
{"score_name": "rank", "score_value": 1.0, "higher_better": false}
]
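Once a record is loaded, the array is often easier to work with as a name-to-value mapping. The sketch below assumes the array has already been read into a list of Python dicts (for example from JSON); score_lookup is a hypothetical helper, and the 0.01 cutoff is purely illustrative.

```python
additional_scores = [
    {"score_name": "comet_xcorr", "score_value": 3.42, "higher_better": True},
    {"score_name": "global_qvalue", "score_value": 0.0012, "higher_better": False},
    {"score_name": "rank", "score_value": 1.0, "higher_better": False},
]


def score_lookup(entries):
    """Map each score_name to its score_value for direct access by name."""
    return {entry["score_name"]: entry["score_value"] for entry in entries}


scores = score_lookup(additional_scores)

# Illustrative filter: keep the record only if its global q-value is at or
# below 0.01. A record without a global_qvalue entry is treated as failing
# the filter, which is an assumption made for this sketch.
passes_fdr = scores.get("global_qvalue", float("inf")) <= 0.01
```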
Recommended score names
The following score names are commonly used across QPX views. All names are snake_case. For formal ontology names and accessions, see the Ontology Mapping.
| Score name | Typical view(s) | Direction | Description |
|---|---|---|---|
| posterior_error_probability | PSM | lower is better | Posterior error probability: the probability that the PSM is incorrect. Ranges 0.0–1.0. This is a top-level field in the PSM view, not stored in additional_scores. |
| global_qvalue | PSM, Feature | lower is better | Global q-value at the experiment level |
| pg_global_qvalue | PSM, Feature | lower is better | Protein group global q-value used to filter at the protein group level |
| rank | PSM | lower is better | Rank of the peptide in the search engine results (1 = best) |
| comet_xcorr | PSM | higher is better | Cross-correlation score from the Comet search engine |
| comet_deltacn | PSM | higher is better | Delta CN score from Comet |
| comet_expect | PSM | lower is better | Expectation value from Comet |
| msgf_raw_score | PSM | higher is better | Raw score from MS-GF+ |
| msgf_spec_evalue | PSM | lower is better | Spectral E-value from MS-GF+ |
| sage_hyperscore | PSM | higher is better | Hyperscore from the Sage search engine |
| diann_qvalue | Feature | lower is better | Run-level q-value from DIA-NN |
| diann_global_qvalue | Feature | lower is better | Global q-value from DIA-NN |
| diann_cscore | Feature | higher is better | Confidence score from DIA-NN |
| consensus_support | PSM | higher is better | Number of search engines supporting the identification |
Naming convention for new scores
When adding tool-specific scores, use snake_case with the tool name as prefix: {tool}_{score}. For example, comet_xcorr, diann_qvalue, sage_hyperscore. Register the mapping to the proper ontology term in ontology.parquet (see Ontology Mapping).
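The {tool}_{score} pattern can be sketched as a small helper. The normalization rules below (lowercase, drop punctuation, turn whitespace into underscores) are an assumption chosen because they reproduce the names listed above; QPX does not define such a function, and make_score_name is a hypothetical name.

```python
import re


def _normalize(part: str) -> str:
    """Lowercase, drop punctuation, and join words with single underscores."""
    part = part.strip().lower()
    part = re.sub(r"[^a-z0-9\s_]", "", part)  # drop '-', '+', ':', '.', etc.
    return re.sub(r"[\s_]+", "_", part).strip("_")


def make_score_name(tool: str, score: str) -> str:
    """Build a snake_case score name with the tool name as prefix."""
    tool_part, score_part = _normalize(tool), _normalize(score)
    if not tool_part or not score_part:
        raise ValueError("tool and score must each contain a letter or digit")
    return f"{tool_part}_{score_part}"


names = [
    make_score_name("Comet", "XCorr"),
    make_score_name("DIA-NN", "q-value"),
    make_score_name("MS-GF+", "Spec E-value"),
]
```

A generated name still needs its ontology mapping registered by hand; the helper only covers the spelling.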
Protein-level additional scores
At the protein group level, additional_scores follows the same struct definition. The score_value is a float that applies to the entire protein group entry. For per-protein scores within a group, the values array index corresponds to the pg_accessions array index.
Controlled vocabulary terms
The cv_params field stores key-value annotations drawn from controlled vocabularies. Unlike additional_scores (which are always numeric), CV parameters can carry string values or be value-less (presence-only).
Struct definition
cv_params: array[struct{
cv_name: string, -- Term name from a controlled vocabulary
cv_value: string, null -- Term value; null if the term is a flag (presence-only)
}]
Example
[
{"cv_name": "ms level", "cv_value": "2"},
{"cv_name": "deconvoluted data", "cv_value": null},
{"cv_name": "prot:FDR threshold", "cv_value": "0.01"},
{"cv_name": "number of unmatched peaks", "cv_value": "3"}
]
Note
The cv_name is always required. The cv_value is optional -- a null value indicates a boolean/flag-style term where mere presence is meaningful (e.g., "deconvoluted data").
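Because a null cv_value means "present" rather than "missing", consumers may want to separate flags from valued terms before use. The sketch below assumes the array has been loaded as a list of Python dicts; split_cv_params is a hypothetical helper, not part of QPX.

```python
cv_params = [
    {"cv_name": "ms level", "cv_value": "2"},
    {"cv_name": "deconvoluted data", "cv_value": None},
    {"cv_name": "prot:FDR threshold", "cv_value": "0.01"},
    {"cv_name": "number of unmatched peaks", "cv_value": "3"},
]


def split_cv_params(params):
    """Separate presence-only flags from terms that carry a value."""
    flags = {p["cv_name"] for p in params if p.get("cv_value") is None}
    values = {
        p["cv_name"]: p["cv_value"]
        for p in params
        if p.get("cv_value") is not None
    }
    return flags, values


flags, values = split_cv_params(cv_params)

# cv_value is always a string, so numeric terms are parsed by the consumer.
is_deconvoluted = "deconvoluted data" in flags
ms_level = int(values["ms level"])
```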
Where these fields are used
| View | additional_scores | cv_params |
|---|---|---|
| PSM (psm_file) | Yes | Yes |
| Feature (feature_file) | Yes | Yes |
| API Views | Yes (as best_id_score) | -- |
| Protein Group (pg_file) | Yes | -- |
| MZ (mz_file) | -- | Yes |
Warning
Do not store search engine scores in cv_params. Numeric scores belong in additional_scores, which provides the higher_better flag and enforces float typing. Reserve cv_params for annotations and metadata that are not scores, such as flags and string-valued terms.
Further reading
- Modifications -- localization scores use a similar score struct pattern
- Intensities -- additional processed values for features and protein groups
- QPX Format Overview -- full list of views and concepts