Feature View
The feature view captures quantified peptide information at the MS run level. Each row represents a peptide feature -- a quantified peptidoform in a specific run file -- including its intensity across labels and protein group mappings.
Use Cases
- Quantified peptide information: Stores peptide intensities linked to sample metadata, enabling downstream quantitative analysis and integration with SDRF annotations.
- Peptide-level statistics: Enables algorithms that operate at the peptide level (e.g., peptide-to-protein rollup, normalization, missing value imputation).
- Integration with sample metadata: Each feature carries label information, connecting quantification data to experimental design described in the SDRF.
Schema
Core Identification Fields
These fields are shared with the PSM view and describe the peptide identification associated with the feature.
| Field | Description | Type | Required |
|---|---|---|---|
| sequence | Unmodified peptide amino acid sequence | string | yes |
| peptidoform | Peptide sequence with modifications in ProForma notation | string | yes |
| modifications | Structured list of modifications with name, accession, position, and localization scores | array[struct], null | no |
| charge | Charge of the quantified analyte | int16 | yes |
| posterior_error_probability | Posterior error probability (PEP) for the peptide match | float64, null | no |
| is_decoy | Whether the peptide is a decoy match (true) or a target match (false) | bool | yes |
| calculated_mz | Theoretical peptide mass-to-charge ratio based on identified sequence and modifications | float32 | yes |
| observed_mz | Experimental observed peptide mass-to-charge ratio | float32 | yes |
| mass_error_ppm | Mass error in ppm: 1e6 × (observed_mz − calculated_mz) / calculated_mz | float32, null | no |
| missed_cleavages | Number of missed enzymatic cleavages | int16, null | no |
| rt | Precursor retention time (in seconds) | float32, null | no |
| rt_start | Start of the retention time window for the feature | float32, null | no |
| rt_stop | End of the retention time window for the feature | float32, null | no |
| predicted_rt | Predicted retention time of the peptide (in seconds) | float32, null | no |
| ion_mobility | Ion mobility value for the precursor ion | float32, null | no |
| ion_mobility_start | Start ion mobility value for the precursor ion | float32, null | no |
| ion_mobility_stop | Stop ion mobility value for the precursor ion | float32, null | no |
| additional_scores | List of score structures with name, value, and direction indicator | array[struct], null | no |
| cv_params | Optional list of controlled vocabulary parameters for additional metadata | array[struct], null | no |
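The mass_error_ppm definition above can be checked directly from calculated_mz and observed_mz. The following is a minimal Python sketch; the function name is hypothetical and not part of QPX, and returning None for a zero calculated_mz is an assumption made here because the field is nullable.

```python
from typing import Optional


def mass_error_ppm(observed_mz: float, calculated_mz: float) -> Optional[float]:
    """Return 1e6 * (observed_mz - calculated_mz) / calculated_mz, or None if undefined."""
    if calculated_mz == 0:
        # Assumption: an undefined error is stored as null rather than raising.
        return None
    return 1e6 * (observed_mz - calculated_mz) / calculated_mz


# m/z values taken from the TMT example on this page.
error = mass_error_ppm(observed_mz=424.2350, calculated_mz=424.2345)
print(f"{error:.2f} ppm")
```

A positive value means the observed m/z is higher than the theoretical one.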
Quantification Fields
| Field | Description | Type | Required |
|---|---|---|---|
| intensities | Primary intensity-based abundance of the feature across labels | array[struct], null | no |
| additional_intensities | Pre-computed intensity values from the upstream tool (e.g., normalized, LFQ, iBAQ) as named key-value pairs per label | array[struct], null | no |
| run_file_name | The run file name that contains the feature | string | yes |
Intensity structure
For details on the intensities and additional_intensities data structures, including examples for LFQ and TMT experiments, see Intensities.
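As an illustration of how the two structures relate, the sketch below merges intensities and additional_intensities into one mapping per label. It assumes the sub-field names used in the examples on this page (label, intensity, intensity_name, intensity_value); the helper name is hypothetical.

```python
from typing import Any, Dict


def intensities_by_label(feature: Dict[str, Any]) -> Dict[str, Dict[str, float]]:
    """Merge intensities and additional_intensities into one mapping per label."""
    merged: Dict[str, Dict[str, float]] = {}
    # Both fields are nullable, so treat None as an empty list.
    for entry in feature.get("intensities") or []:
        merged.setdefault(entry["label"], {})["intensity"] = entry["intensity"]
    for entry in feature.get("additional_intensities") or []:
        per_label = merged.setdefault(entry["label"], {})
        for item in entry["intensities"]:
            # Note: an additional intensity named "intensity" would overwrite the primary value.
            per_label[item["intensity_name"]] = item["intensity_value"]
    return merged


# Subset of the TMT example feature from this page.
feature = {
    "intensities": [
        {"label": "TMT126", "intensity": 15234.5},
        {"label": "TMT127N", "intensity": 18456.7},
    ],
    "additional_intensities": [
        {
            "label": "TMT126",
            "intensities": [
                {"intensity_name": "normalize_intensity", "intensity_value": 0.1523},
                {"intensity_name": "ibaq", "intensity_value": 4567.8},
            ],
        }
    ],
}
merged = intensities_by_label(feature)
print(merged["TMT126"])
```

Labels that appear only in intensities keep a single "intensity" entry, so downstream code can iterate over labels without checking both fields.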
Protein Group Fields
| Field | Description | Type | Required |
|---|---|---|---|
| pg_accessions | Protein group accessions of all proteins that the peptide maps to | array[string], null | no |
| anchor_protein | One protein accession that represents the protein group | string | yes |
| unique | Whether the peptide maps uniquely to a single protein group | bool, null | no |
| pg_positions | Peptide start and end positions within each protein in the protein group | array[struct], null | no |
| pg_global_qvalue | Global q-value of the protein group at the experiment level | float64, null | optional |
| gg_accessions | Gene accessions associated with the protein group | array[string], null | optional |
| gg_names | Gene names associated with the protein group | array[string], null | optional |
Each entry in pg_positions contains:
| Sub-field | Description | Type |
|---|---|---|
| protein_accession | Protein accession within the protein group | string |
| start | 1-based start position of the peptide in the protein sequence | int |
| end | 1-based end position of the peptide in the protein sequence (inclusive) | int |
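Because start and end are both 1-based and end is inclusive, the span length is end - start + 1. The sketch below uses this as a sanity check against the unmodified sequence; the function name is hypothetical, and the check assumes a linear peptide whose span equals the length of sequence.

```python
from typing import Any, Dict


def position_matches_sequence(sequence: str, position: Dict[str, Any]) -> bool:
    """Check that a pg_positions entry spans exactly len(sequence) residues."""
    # Both coordinates are 1-based and the end is inclusive, hence the +1.
    span = position["end"] - position["start"] + 1
    return span == len(sequence)


# Values taken from the two example features on this page.
tmt_ok = position_matches_sequence(
    "AADLLTSFLGHK", {"protein_accession": "P04217", "start": 25, "end": 36}
)
lfq_ok = position_matches_sequence(
    "VLHPLEGAVVIIFK", {"protein_accession": "P68871", "start": 2, "end": 15}
)
print(tmt_ok, lfq_ok)
```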
Gene and protein inference data
Gene accessions, gene names, and unique peptide indicators are optionally included in the feature file for convenience. Protein-level scores are stored in the Protein Group View. For the complete protein-level perspective with aggregated intensities and peptide counts, join on pg_accessions + run_file_name with the PG view.
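The join described above can be sketched in plain Python as follows. This is an illustration only: the helper names are hypothetical, the peptide_count field and its value stand in for whatever columns the PG view actually provides, and the key assumes that both views list the accessions of a group in the same order.

```python
from typing import Any, Dict, List, Tuple

JoinKey = Tuple[Tuple[str, ...], str]


def join_key(row: Dict[str, Any]) -> JoinKey:
    """Build a hashable (pg_accessions, run_file_name) key for a row."""
    # Assumption: accession order is identical in the feature and PG views.
    return (tuple(row.get("pg_accessions") or ()), row["run_file_name"])


def attach_protein_groups(
    features: List[Dict[str, Any]], protein_groups: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Left-join feature rows to PG rows; unmatched features get pg=None."""
    index = {join_key(pg): pg for pg in protein_groups}
    return [{**f, "pg": index.get(join_key(f))} for f in features]


features = [
    {
        "peptidoform": "AADLLTSFLGHK",
        "pg_accessions": ["P04217", "P04217-2"],
        "run_file_name": "20200101_TMT_fraction01",
    }
]
protein_groups = [
    {
        "pg_accessions": ["P04217", "P04217-2"],
        "run_file_name": "20200101_TMT_fraction01",
        "peptide_count": 7,  # hypothetical PG-view column and value
    }
]
joined = attach_protein_groups(features, protein_groups)
print(joined[0]["pg"]["peptide_count"])
```

For large files the same key would more typically be used in a dataframe or SQL join; the dictionary index here only illustrates the composite key.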
Optional vs nullable
pg_global_qvalue is optional — the column may be absent from the file entirely if the search engine does not provide a protein group q-value. When present, individual values may be null.
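Readers therefore need to handle two cases: an absent column and a present column with null values. A minimal sketch, assuming the table has already been loaded as a mapping from column name to a list of values; the helper name is hypothetical, and treating an absent column as all-null is a design choice made here, not a QPX rule.

```python
from typing import Any, Dict, List


def column_or_nulls(columns: Dict[str, List[Any]], name: str) -> List[Any]:
    """Return an optional column, substituting nulls when the column is absent."""
    if name in columns:
        # Present: individual values may still be None.
        return columns[name]
    # Absent: behave as an all-null column with the same number of rows.
    n_rows = len(next(iter(columns.values()), []))
    return [None] * n_rows


without_qvalue = {"anchor_protein": ["P04217", "P68871"]}
with_qvalue = {"anchor_protein": ["P04217", "P68871"], "pg_global_qvalue": [0.001, None]}

print(column_or_nulls(without_qvalue, "pg_global_qvalue"))
print(column_or_nulls(with_qvalue, "pg_global_qvalue"))
```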
Spectra Reference
| Field | Description | Type | Required |
|---|---|---|---|
| id_run_file_name | The run file containing the best PSM that identified the feature (may differ from run_file_name) | string, null | no |
| scan | Scan identifier of the best PSM that identified the feature, as an array of integer components (e.g., [43920] for single-scan instruments, [10, 1, 345] for Waters function/process/scan) | array[int32] | yes |
Shared Fields
Several fields in the feature view use structures shared across other QPX views:
- For details on the modifications field structure, see Modifications.
- For details on intensities and additional_intensities, see Intensities.
- For details on additional_scores and score semantics, see Scores.
- For details on cv_params usage and recommended terms, see Scores & CV Terms.
Example
Feature with TMT Intensities
{
  "sequence": "AADLLTSFLGHK",
  "peptidoform": "AADLLTSFLGHK",
  "modifications": null,
  "charge": 3,
  "is_decoy": false,
  "calculated_mz": 424.2345,
  "observed_mz": 424.2350,
  "rt": 2345.67,
  "rt_start": 2340.12,
  "rt_stop": 2351.23,
  "predicted_rt": 2348.00,
  "ion_mobility": null,
  "ion_mobility_start": null,
  "ion_mobility_stop": null,
  "run_file_name": "20200101_TMT_fraction01",
  "scan": [15234],
  "id_run_file_name": "20200101_TMT_fraction01",
  "pg_accessions": ["P04217", "P04217-2"],
  "anchor_protein": "P04217",
  "pg_positions": [
    {"protein_accession": "P04217", "start": 25, "end": 36},
    {"protein_accession": "P04217-2", "start": 25, "end": 36}
  ],
  "intensities": [
    {"label": "TMT126", "intensity": 15234.5},
    {"label": "TMT127N", "intensity": 18456.7},
    {"label": "TMT127C", "intensity": 12890.3},
    {"label": "TMT128N", "intensity": 21045.8}
  ],
  "additional_intensities": [
    {
      "label": "TMT126",
      "intensities": [
        {"intensity_name": "normalize_intensity", "intensity_value": 0.1523},
        {"intensity_name": "ibaq", "intensity_value": 4567.8}
      ]
    },
    {
      "label": "TMT127N",
      "intensities": [
        {"intensity_name": "normalize_intensity", "intensity_value": 0.1846},
        {"intensity_name": "ibaq", "intensity_value": 5432.1}
      ]
    }
  ],
  "additional_scores": [
    {
      "score_name": "global_qvalue",
      "score_value": 0.0012,
      "higher_better": false
    }
  ],
  "cv_params": null
}
Feature with LFQ Intensity
{
  "sequence": "VLHPLEGAVVIIFK",
  "peptidoform": "[UniMod:1]-VLHPLEGAVVIIFK",
  "modifications": [
    {
      "name": "Acetyl",
      "accession": "UniMod:1",
      "positions": [
        {"position": 0, "amino_acid": null, "scores": null}
      ]
    }
  ],
  "charge": 2,
  "is_decoy": false,
  "calculated_mz": 782.4721,
  "observed_mz": 782.4725,
  "rt": 3567.89,
  "run_file_name": "20200101_LFQ_rep1",
  "scan": [28901],
  "anchor_protein": "P68871",
  "pg_accessions": ["P68871"],
  "pg_positions": [
    {"protein_accession": "P68871", "start": 2, "end": 15}
  ],
  "intensities": [
    {"label": "LFQ", "intensity": 98765.4}
  ],
  "additional_intensities": null
}
File Metadata
Feature Parquet files store file-level metadata as key-value pairs in the Parquet footer. The following metadata fields are defined:
| Field | Description |
|---|---|
| qpx_version | Version of the QPX format used to generate the file |
| software_provider | Name and version of the software that generated the data |
| scan_format | Format of scan identifiers: scan, index, or nativeId |
| creator | Name of the tool or person who created the file |
| file_type | Type of the file (value: feature_file) |
| creation_date | Date when the file was created |
| compression_format | Compression algorithm used: zstd (default), snappy, gzip, lzo, or none |
Reading file metadata in Python
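One way to read these key-value pairs is with pyarrow, whose `pyarrow.parquet.read_metadata` returns the footer metadata as a mapping from bytes to bytes. The sketch below is not an official QPX helper: the function names and the file path in the comment are hypothetical, and it assumes the values are UTF-8 text.

```python
from typing import Dict, Mapping, Optional


def decode_footer_metadata(raw: Optional[Mapping[bytes, bytes]]) -> Dict[str, str]:
    """Decode Parquet key-value metadata (bytes -> bytes) into str -> str."""
    if not raw:
        return {}
    # Assumption: keys and values are UTF-8 encoded text.
    return {k.decode("utf-8"): v.decode("utf-8") for k, v in raw.items()}


def read_feature_metadata(path: str) -> Dict[str, str]:
    """Read the footer key-value metadata of a feature Parquet file."""
    # Imported lazily so the decoding helper above works without pyarrow installed.
    import pyarrow.parquet as pq

    return decode_footer_metadata(pq.read_metadata(path).metadata)


# Usage with a real file (hypothetical path):
#   meta = read_feature_metadata("example.feature.parquet")
#   print(meta.get("qpx_version"), meta.get("scan_format"))

# Decoding demonstrated on illustrative raw values:
raw = {b"file_type": b"feature_file", b"compression_format": b"zstd"}
meta = decode_footer_metadata(raw)
print(meta["file_type"])
```

Files written through pyarrow may also carry an ARROW:schema entry in the same mapping; it can be ignored when only the fields listed above are of interest.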
Notes
Works for both DDA and DIA
Unlike the PSM view (which is DDA-specific), the feature view supports both DDA and DIA workflows. For DIA experiments, the feature view is the primary peptide-level output.
- Relationship to PSM view: A feature aggregates one or more PSMs into a single quantified peptide entry. The scan and id_run_file_name fields link back to the best PSM that identified the feature. In DDA-LFQ workflows, both views may exist; in DIA workflows, only the feature view is typically generated.
- Relationship to protein group view: The feature view contains protein group mappings (pg_accessions, anchor_protein) and peptide positions (pg_positions) that connect each peptide to its inferred protein groups. Gene annotations, unique peptide indicators, and protein-level scores are stored in the Protein Group View. The PG view provides the complete protein-level perspective with aggregated intensities and peptide counts.
- Intensities vs additional_intensities: Use intensities for raw/primary measurements across experimental labels (TMT/iTRAQ tags or LFQ). Use additional_intensities for values pre-computed by the upstream tool (e.g., normalized intensities, LFQ, iBAQ). QPX reads these from the tool's output; it does not compute them. This separation keeps experimental design aspects distinct from data processing aspects.
- Protein group accessions: The pg_accessions field should contain all proteins within a protein group. The anchor_protein is the representative protein selected by the search engine or inference algorithm to represent the group. The pg_positions field maps the peptide's start and end coordinates within each protein in the group.
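The link from a feature back to its best PSM, described in the first note, can be expressed as a lookup key built from id_run_file_name and scan. In the sketch below the function name is hypothetical, and falling back to run_file_name when id_run_file_name is null or absent is an assumption, not something the format states.

```python
from typing import Any, Dict, Tuple


def best_psm_key(feature: Dict[str, Any]) -> Tuple[str, Tuple[int, ...]]:
    """Build a hashable (run file, scan components) key for the best PSM."""
    # Assumption: a null or missing id_run_file_name means the PSM is in the same run.
    run = feature.get("id_run_file_name") or feature["run_file_name"]
    # scan is an array of integer components; a tuple makes it usable as a dict key.
    return (run, tuple(feature["scan"]))


tmt_feature = {
    "run_file_name": "20200101_TMT_fraction01",
    "id_run_file_name": "20200101_TMT_fraction01",
    "scan": [15234],
}
lfq_feature = {"run_file_name": "20200101_LFQ_rep1", "scan": [28901]}

tmt_key = best_psm_key(tmt_feature)
lfq_key = best_psm_key(lfq_feature)
print(tmt_key, lfq_key)
```

Keeping the scan as a tuple preserves multi-component identifiers such as the Waters function/process/scan form.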