Feature View

The feature view captures quantified peptide information at the MS run level. Each row represents a peptide feature (a quantified peptidoform in a specific run file), including its intensity across labels and its protein group mappings.

Use Cases

  • Quantified peptide information: Stores peptide intensities linked to sample metadata, enabling downstream quantitative analysis and integration with SDRF annotations.
  • Peptide-level statistics: Enables algorithms that operate at the peptide level (e.g., peptide-to-protein rollup, normalization, missing value imputation).
  • Integration with sample metadata: Each feature carries label information, connecting quantification data to experimental design described in the SDRF.

Schema

Core Identification Fields

These fields are shared with the PSM view and describe the peptide identification associated with the feature.

Field | Description | Type | Required
sequence | Unmodified peptide amino acid sequence | string | yes
peptidoform | Peptide sequence with modifications in ProForma notation | string | yes
modifications | Structured list of modifications with name, accession, position, and localization scores | array[struct], null | no
charge | Charge of the quantified analyte | int16 | yes
posterior_error_probability | Posterior error probability (PEP) for the peptide match | float64, null | no
is_decoy | Whether the peptide is a decoy match (true) or a target match (false) | bool | yes
calculated_mz | Theoretical peptide mass-to-charge ratio based on identified sequence and modifications | float32 | yes
observed_mz | Experimental observed peptide mass-to-charge ratio | float32 | yes
mass_error_ppm | Mass error in ppm: 1e6 × (observed_mz − calculated_mz) / calculated_mz | float32, null | no
missed_cleavages | Number of missed enzymatic cleavages | int16, null | no
rt | Precursor retention time (in seconds) | float32, null | no
rt_start | Start of the retention time window for the feature | float32, null | no
rt_stop | End of the retention time window for the feature | float32, null | no
predicted_rt | Predicted retention time of the peptide (in seconds) | float32, null | no
ion_mobility | Ion mobility value for the precursor ion | float32, null | no
ion_mobility_start | Start ion mobility value for the precursor ion | float32, null | no
ion_mobility_stop | Stop ion mobility value for the precursor ion | float32, null | no
additional_scores | List of score structures with name, value, and direction indicator | array[struct], null | no
cv_params | Optional list of controlled vocabulary parameters for additional metadata | array[struct], null | no
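
The mass_error_ppm formula in the table can be checked with a few lines of Python. This is a minimal sketch: the function name is an illustrative choice, not part of QPX, and the input values are the calculated_mz and observed_mz from the TMT example on this page.

```python
def mass_error_ppm(observed_mz: float, calculated_mz: float) -> float:
    """Return the mass error in parts per million (ppm)."""
    if calculated_mz <= 0:
        raise ValueError("calculated_mz must be positive")
    # Same expression as the table: 1e6 * (observed - calculated) / calculated
    return 1e6 * (observed_mz - calculated_mz) / calculated_mz


# Values taken from the TMT example on this page.
error = mass_error_ppm(observed_mz=424.2350, calculated_mz=424.2345)
print(f"{error:.2f} ppm")
```

A positive result means the observed m/z is higher than the theoretical one.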

Quantification Fields

Field | Description | Type | Required
intensities | Primary intensity-based abundance of the feature across labels | array[struct], null | no
additional_intensities | Pre-computed intensity values from the upstream tool (e.g., normalized, LFQ, iBAQ) as named key-value pairs per label | array[struct], null | no
run_file_name | The run file name that contains the feature | string | yes

Intensity structure

For details on the intensities and additional_intensities data structures, including examples for LFQ and TMT experiments, see Intensities.

Protein Group Fields

Field | Description | Type | Required
pg_accessions | Protein group accessions of all proteins that the peptide maps to | array[string], null | no
anchor_protein | One protein accession that represents the protein group | string | yes
unique | Whether the peptide maps uniquely to a single protein group | bool, null | no
pg_positions | Peptide start and end positions within each protein in the protein group | array[struct], null | no
pg_global_qvalue | Global q-value of the protein group at the experiment level | float64, null | optional
gg_accessions | Gene accessions associated with the protein group | array[string], null | optional
gg_names | Gene names associated with the protein group | array[string], null | optional

Each entry in pg_positions contains:

Sub-field | Description | Type
protein_accession | Protein accession within the protein group | string
start | 1-based start position of the peptide in the protein sequence | int
end | 1-based end position of the peptide in the protein sequence (inclusive) | int
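
Because start and end are 1-based and inclusive, they do not map directly onto 0-based, end-exclusive slicing. The sketch below shows the conversion in Python; the helper name and the protein sequence are hypothetical and serve only to illustrate the arithmetic.

```python
def peptide_from_positions(protein_sequence: str, start: int, end: int) -> str:
    """Slice a peptide out of a protein using 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(protein_sequence):
        raise ValueError("positions fall outside the protein sequence")
    # Python slices are 0-based and end-exclusive, so only start is shifted.
    return protein_sequence[start - 1:end]


# Hypothetical protein sequence, used only to illustrate the arithmetic.
protein = "MKTAYIAKQR"
peptide = peptide_from_positions(protein, start=3, end=6)
print(peptide)  # TAYI
```

The peptide length is always end - start + 1, which is a cheap consistency check against the sequence field.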

Gene and protein inference data

Gene accessions, gene names, and unique peptide indicators are optionally included in the feature file for convenience. Protein-level scores are stored in the Protein Group View. For the complete protein-level perspective with aggregated intensities and peptide counts, join on pg_accessions + run_file_name with the PG view.
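
The join described above could be sketched in plain Python as follows. This assumes the PG view exposes pg_accessions and run_file_name columns, as the join key implies; peptide_count is a hypothetical PG-view column, and sorting the accessions assumes their order carries no meaning.

```python
def pg_join_key(row):
    """Build a hashable join key from pg_accessions and run_file_name."""
    # Lists are not hashable; sorting is an assumption that accession order
    # carries no meaning when matching protein groups.
    accessions = tuple(sorted(row.get("pg_accessions") or []))
    return (accessions, row["run_file_name"])


def join_features_to_pg(features, pg_rows):
    """Attach the matching PG row (or None) to each feature row."""
    pg_index = {pg_join_key(row): row for row in pg_rows}
    return [
        {**feature, "pg": pg_index.get(pg_join_key(feature))}
        for feature in features
    ]


features = [{
    "peptidoform": "AADLLTSFLGHK",
    "pg_accessions": ["P04217", "P04217-2"],
    "run_file_name": "20200101_TMT_fraction01",
}]
# peptide_count is a hypothetical PG-view column used only for illustration.
pg_rows = [{
    "pg_accessions": ["P04217-2", "P04217"],
    "run_file_name": "20200101_TMT_fraction01",
    "peptide_count": 7,
}]
joined = join_features_to_pg(features, pg_rows)
print(joined[0]["pg"]["peptide_count"])
```

Features whose key has no counterpart in the PG rows keep a pg value of None, which mirrors a left join.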

Optional vs nullable

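Fields marked optional in the Required column (pg_global_qvalue, gg_accessions, gg_names) may be absent from the file entirely. For example, the pg_global_qvalue column is omitted if the search engine does not provide a protein group q-value. When an optional column is present, individual values may still be null.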
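
Readers therefore need to distinguish three states for such a field: column absent, value null, and value present. A minimal sketch, assuming the column names have already been read from the file schema and a row is available as a dict (both are stand-ins, not a QPX API):

```python
def field_state(columns, row, name):
    """Classify an optional, nullable field as 'absent', 'null', or 'value'."""
    if name not in columns:
        return "absent"  # the column does not exist in this file
    if row.get(name) is None:
        return "null"    # the column exists, but this row has no value
    return "value"


# Hypothetical schema and rows for illustration.
columns = {"sequence", "anchor_protein", "pg_global_qvalue"}
row_with_value = {"sequence": "AADLLTSFLGHK", "anchor_protein": "P04217", "pg_global_qvalue": 0.001}
row_with_null = {"sequence": "AADLLTSFLGHK", "anchor_protein": "P04217", "pg_global_qvalue": None}

print(field_state(columns, row_with_value, "pg_global_qvalue"))  # value
print(field_state(columns, row_with_null, "pg_global_qvalue"))   # null
print(field_state(columns, row_with_value, "gg_names"))          # absent
```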

Spectra Reference

Field | Description | Type | Required
id_run_file_name | The run file containing the best PSM that identified the feature (may differ from run_file_name) | string, null | no
scan | Scan identifier of the best PSM that identified the feature, as an array of integer components (e.g., [43920] for single-scan instruments, [10, 1, 345] for Waters function/process/scan) | array[int32] | yes

Shared Fields

Several fields in the feature view use structures shared across other QPX views:

  • For details on the modifications field structure, see Modifications.
  • For details on intensities and additional_intensities, see Intensities.
  • For details on additional_scores and score semantics, see Scores.
  • For details on cv_params usage and recommended terms, see Scores & CV Terms.

Example

Feature with TMT Intensities

{
  "sequence": "AADLLTSFLGHK",
  "peptidoform": "AADLLTSFLGHK",
  "modifications": null,
  "charge": 3,
  "is_decoy": false,
  "calculated_mz": 424.2345,
  "observed_mz": 424.2350,
  "rt": 2345.67,
  "rt_start": 2340.12,
  "rt_stop": 2351.23,
  "predicted_rt": 2348.00,
  "ion_mobility": null,
  "ion_mobility_start": null,
  "ion_mobility_stop": null,
  "run_file_name": "20200101_TMT_fraction01",
  "scan": [15234],
  "id_run_file_name": "20200101_TMT_fraction01",
  "pg_accessions": ["P04217", "P04217-2"],
  "anchor_protein": "P04217",
  "pg_positions": [
    {"protein_accession": "P04217", "start": 25, "end": 36},
    {"protein_accession": "P04217-2", "start": 25, "end": 36}
  ],
  "intensities": [
    {"label": "TMT126", "intensity": 15234.5},
    {"label": "TMT127N", "intensity": 18456.7},
    {"label": "TMT127C", "intensity": 12890.3},
    {"label": "TMT128N", "intensity": 21045.8}
  ],
  "additional_intensities": [
    {
      "label": "TMT126",
      "intensities": [
        {"intensity_name": "normalize_intensity", "intensity_value": 0.1523},
        {"intensity_name": "ibaq", "intensity_value": 4567.8}
      ]
    },
    {
      "label": "TMT127N",
      "intensities": [
        {"intensity_name": "normalize_intensity", "intensity_value": 0.1846},
        {"intensity_name": "ibaq", "intensity_value": 5432.1}
      ]
    }
  ],
  "additional_scores": [
    {
      "score_name": "global_qvalue",
      "score_value": 0.0012,
      "higher_better": false
    }
  ],
  "cv_params": null
}

Feature with LFQ Intensity

{
  "sequence": "VLHPLEGAVVIIFK",
  "peptidoform": "[UniMod:1]-VLHPLEGAVVIIFK",
  "modifications": [
    {
      "name": "Acetyl",
      "accession": "UniMod:1",
      "positions": [
        {"position": 0, "amino_acid": null, "scores": null}
      ]
    }
  ],
  "charge": 2,
  "is_decoy": false,
  "calculated_mz": 782.4721,
  "observed_mz": 782.4725,
  "rt": 3567.89,
  "run_file_name": "20200101_LFQ_rep1",
  "scan": [28901],
  "anchor_protein": "P68871",
  "pg_accessions": ["P68871"],
  "pg_positions": [
    {"protein_accession": "P68871", "start": 2, "end": 15}
  ],
  "intensities": [
    {"label": "LFQ", "intensity": 98765.4}
  ],
  "additional_intensities": null
}
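
For downstream analysis it is often convenient to expand the nested intensities array into one row per label, which can then be matched against sample metadata. The following is a sketch over a plain dict shaped like the examples above; the function name and the choice of output columns are illustrative assumptions.

```python
def flatten_intensities(feature):
    """Expand a feature's intensities into one row per label."""
    rows = []
    # intensities is nullable, so treat None as an empty list.
    for entry in feature.get("intensities") or []:
        rows.append({
            "run_file_name": feature["run_file_name"],
            "peptidoform": feature["peptidoform"],
            "label": entry["label"],
            "intensity": entry["intensity"],
        })
    return rows


# Trimmed copy of the TMT example on this page.
feature = {
    "peptidoform": "AADLLTSFLGHK",
    "run_file_name": "20200101_TMT_fraction01",
    "intensities": [
        {"label": "TMT126", "intensity": 15234.5},
        {"label": "TMT127N", "intensity": 18456.7},
    ],
}
for row in flatten_intensities(feature):
    print(row["run_file_name"], row["label"], row["intensity"])
```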

File Metadata

Feature Parquet files store file-level metadata as key-value pairs in the Parquet footer. The following metadata fields are defined:

Field | Description
qpx_version | Version of the QPX format used to generate the file
software_provider | Name and version of the software that generated the data
scan_format | Format of scan identifiers: scan, index, or nativeId
creator | Name of the tool or person who created the file
file_type | Type of the file (value: feature_file)
creation_date | Date when the file was created
compression_format | Compression algorithm used: zstd (default), snappy, gzip, lzo, or none

Reading file metadata in Python

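import pyarrow.parquet as pq

# Open the Parquet file; the key-value metadata lives in its footer.
parquet_file = pq.ParquetFile("experiment.feature.parquet")

# metadata maps bytes keys to bytes values and is None when nothing was written.
metadata = parquet_file.schema_arrow.metadata or {}
for key, value in metadata.items():
    print(f"{key.decode()}: {value.decode()}")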
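
Once the raw metadata has been read, it can be decoded and checked against the fields listed above. This sketch works on any dict of bytes keys and values, such as the one returned by the snippet above; the helper name is hypothetical, and the sample values (including the version string) are made up for illustration.

```python
DEFINED_KEYS = (
    "qpx_version", "software_provider", "scan_format", "creator",
    "file_type", "creation_date", "compression_format",
)


def decode_feature_metadata(raw):
    """Decode bytes metadata and report which defined keys are missing."""
    decoded = {key.decode(): value.decode() for key, value in (raw or {}).items()}
    missing = [key for key in DEFINED_KEYS if key not in decoded]
    return decoded, missing


# Made-up metadata standing in for a real Parquet footer.
raw = {b"qpx_version": b"1.0", b"file_type": b"feature_file", b"scan_format": b"scan"}
decoded, missing = decode_feature_metadata(raw)
print(decoded["file_type"] == "feature_file")
print(missing)
```

Reporting missing keys instead of raising keeps the helper usable on files written by tools that populate only part of the metadata.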

Notes

Works for both DDA and DIA

Unlike the PSM view (which is DDA-specific), the feature view supports both DDA and DIA workflows. For DIA experiments, the feature view is the primary peptide-level output.

  • Relationship to PSM view: A feature aggregates one or more PSMs into a single quantified peptide entry. The scan and id_run_file_name fields link back to the best PSM that identified the feature. In DDA-LFQ workflows, both views may exist; in DIA workflows, only the feature view is typically generated.
  • Relationship to protein group view: The feature view contains protein group mappings (pg_accessions, anchor_protein) and peptide positions (pg_positions) that connect each peptide to its inferred protein groups. Gene annotations (gg_accessions, gg_names) and the unique flag may also be included in the feature file for convenience, but protein-level scores are stored in the Protein Group View, which provides the complete protein-level perspective with aggregated intensities and peptide counts.
  • Intensities vs additional_intensities: Use intensities for raw/primary measurements across experimental labels (TMT/iTRAQ tags or LFQ). Use additional_intensities for values pre-computed by the upstream tool (e.g., normalized intensities, LFQ, iBAQ). QPX reads these from the tool's output — it does not compute them. This separation keeps experimental design aspects distinct from data processing aspects.
  • Protein group accessions: The pg_accessions field should contain all proteins within a protein group. The anchor_protein is the representative protein selected by the search engine or inference algorithm to represent the group. The pg_positions field maps the peptide's start and end coordinates within each protein in the group.
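
To illustrate the split between intensities and additional_intensities, the sketch below looks up one named, pre-computed value for a given label. It assumes the nested layout shown in the TMT example on this page (label, then a list of intensity_name and intensity_value pairs); the function name is hypothetical.

```python
def named_intensity(feature, label, intensity_name):
    """Return the pre-computed value for a label, or None if not present."""
    # additional_intensities is nullable, so treat None as an empty list.
    for entry in feature.get("additional_intensities") or []:
        if entry["label"] != label:
            continue
        for item in entry["intensities"]:
            if item["intensity_name"] == intensity_name:
                return item["intensity_value"]
    return None


# Trimmed copy of the TMT example on this page.
feature = {
    "additional_intensities": [
        {
            "label": "TMT126",
            "intensities": [
                {"intensity_name": "normalize_intensity", "intensity_value": 0.1523},
                {"intensity_name": "ibaq", "intensity_value": 4567.8},
            ],
        },
    ],
}
print(named_intensity(feature, "TMT126", "ibaq"))
print(named_intensity(feature, "TMT127N", "ibaq"))
```

The second call prints None because the trimmed example carries no entry for that label.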