Skip to content

Project Management Commands

Manage project metadata and file attachments for QPX projects.

Overview

The project command group provides tools for creating and managing project-level metadata, including integration with PRIDE Archive, SDRF file handling, and file attachment management.

Available Commands

  • create - Create project.json from PRIDE accession
  • attach - Attach files to project metadata

create

Generate a project file from a PRIDE project accession and SDRF metadata.

Description

Creates a comprehensive project.json file by combining metadata from the PRIDE Archive with sample information from an SDRF file. This command automatically fetches project details, publication information, and experimental metadata from PRIDE.

Parameters

ParameterTypeRequiredDefaultDescription
--project-accession TEXT Yes - PRIDE project accession
--sdrf-file FILE Yes - SDRF file path for metadata extraction
--output-folder DIRECTORY Yes - Output directory for generated files
--software-name TEXT No - Software name used to generate the data
--software-version TEXT No - Software version used to generate the data
--delete-existing FLAG No - Delete existing files in the output folder

Usage Examples

Basic Example

Create project metadata with full parameters:

qpxc project create \
    --project-accession PXD007683 \
    --sdrf-file ./metadata.sdrf.tsv \
    --output-folder ./project_metadata \
    --software-name MaxQuant \
    --software-version 2.0.3.0 \
    --delete-existing

With Software Information

qpxc project create \
    --project-accession PXD016999 \
    --sdrf-file tests/examples/AE/PXD016999-first-instrument.sdrf.tsv \
    --output-folder ./project_metadata \
    --software-name MaxQuant \
    --software-version 2.0.3.0 \
    --delete-existing

Complete Workflow Example

# Create project metadata
qpxc project create \
    --project-accession PXD033169 \
    --sdrf-file ./PXD033169.sdrf.tsv \
    --output-folder ./project \
    --software-name OpenMS \
    --software-version 2.8.0

# The SDRF file is automatically attached to the project
echo "Project file created: ./project/project.json"

Output Files

The command generates the following files in the output folder:

  1. project.json: Main project metadata file
  2. {project_accession}.sdrf.tsv: Copy of the SDRF file attached to the project

project.json Structure

The generated project.json file contains:

{
  "accession": "PXD001234",
  "title": "Project title from PRIDE",
  "description": "Project description from PRIDE",
  "organism": ["Homo sapiens"],
  "instrument": ["Q Exactive HF"],
  "quantification_method": "label free",
  "publication": {
    "title": "Publication title",
    "doi": "10.1234/journal.1234567",
    "pubmed_id": "12345678"
  },
  "samples": [
    {
      "sample_id": "Sample_001",
      "condition": "Control",
      "biological_replicate": "1"
    }
  ],
  "software": {
    "name": "MaxQuant",
    "version": "2.0.3.0"
  },
  "qpx_version": "1.0.0",
  "files": [
    {
      "name": "PXD001234.sdrf.tsv",
      "type": "sdrf",
      "checksum": "abc123..."
    }
  ]
}

Metadata Sources

The command integrates metadata from multiple sources:

Source Information Retrieved
PRIDE Archive Project title, description, organism, publication info, instrument details
SDRF File Sample metadata, experimental design, conditions, replicates
Command Parameters Software information, qpx version

Common Issues

Issue: Project not found in PRIDE

  • Solution: Verify the project accession is correct and the project is public in PRIDE

Issue: Network timeout when fetching PRIDE metadata

  • Solution: Check internet connection or try again later

Issue: SDRF sample names don't match data files

  • Solution: Ensure SDRF file is correctly formatted and sample names match your data

Best Practices

  • Run this command at the beginning of data processing to establish provenance
  • Include software name and version for reproducibility
  • Verify SDRF file format before running (use PRIDE SDRF validator)
  • Keep the project.json file with your processed data
  • Use --delete-existing flag carefully to avoid accidental data loss

Validation

After creating the project file, validate it:

# Check the project file was created
ls -lh ./project_metadata/project.json

# View the project metadata
cat ./project_metadata/project.json | python -m json.tool

# Verify SDRF was attached
ls -lh ./project_metadata/*.sdrf.tsv

attach

Attach additional files to an existing project metadata file.

Description

Adds references to data files in the project.json metadata. This command is useful for tracking all files associated with a project.

Parameters

Parameter Type Required Default Description
--project-file Path Yes - Existing project.json file path
--attach-file Path Yes - File to attach to the project
--category Choice Yes - File category: sdrf-file, psm-file, feature-file, absolute-file, differential-file
--is-folder Flag No False Indicates if the file is a folder
--partitions String No - Fields used for splitting files, separated by comma
--replace-existing Flag No False Whether to delete old files

Usage Examples

Attach PSM File

qpxc project attach \
    --project-file ./project/project.json \
    --attach-file ./output/psm-abc123.psm.parquet \
    --category psm-file

Attach Multiple Files

# Attach PSM file
qpxc project attach \
    --project-file ./project/project.json \
    --attach-file ./output/psm.parquet \
    --category psm-file

# Attach feature file
qpxc project attach \
    --project-file ./project/project.json \
    --attach-file ./output/feature.parquet \
    --category feature-file

# Attach absolute expression file
qpxc project attach \
    --project-file ./project/project.json \
    --attach-file ./output/ae.parquet \
    --category absolute-file

Complete Project Assembly

#!/bin/bash

PROJECT_FILE="./project/project.json"

# Create project metadata
qpxc project create \
    --project-accession PXD001234 \
    --sdrf-file ./metadata.sdrf.tsv \
    --output-folder ./project \
    --software-name MaxQuant \
    --software-version 2.0.3.0

# Attach all processed files
for file in ./output/*.parquet; do
    # Determine file category from filename
    if [[ $file == *"psm"* ]]; then
        category="psm-file"
    elif [[ $file == *"feature"* ]]; then
        category="feature-file"
    elif [[ $file == *"absolute"* ]]; then
        category="absolute-file"
    elif [[ $file == *"differential"* ]]; then
        category="differential-file"
    else
        continue
    fi

    qpxc project attach \
        --project-file "$PROJECT_FILE" \
        --attach-file "$file" \
        --category "$category"
done

echo "All files attached to project"

File Categories

Supported file category values:

Category Description
psm-file Peptide-spectrum match data
feature-file Feature-level quantification
absolute-file Absolute expression data
differential-file Differential expression results
sdrf-file Sample and data relationship format

Output

The command updates the project.json file by adding a file entry to the appropriate section based on the category.

Best Practices

  • Attach files immediately after creating them to maintain accurate file tracking
  • Use correct file categories for proper organization
  • Keep project.json file backed up as it tracks all project data

Project Metadata Best Practices

Complete Project Setup Workflow

#!/bin/bash

# Define variables
PROJECT_ID="PXD001234"
SDRF_FILE="./metadata/${PROJECT_ID}.sdrf.tsv"
OUTPUT_DIR="./processed"
PROJECT_DIR="./project"

# Step 1: Create project metadata
echo "Creating project metadata..."
qpxc project create \
    --project-accession "$PROJECT_ID" \
    --sdrf-file "$SDRF_FILE" \
    --output-folder "$PROJECT_DIR" \
    --software-name MaxQuant \
    --software-version 2.0.3.0

# Step 2: Process data (example with MaxQuant)
echo "Processing data..."
qpxc convert maxquant-psm \
    --msms-file ./raw/msms.txt \
    --output-folder "$OUTPUT_DIR"

qpxc convert maxquant-feature \
    --evidence-file ./raw/evidence.txt \
    --sdrf-file "$SDRF_FILE" \
    --output-folder "$OUTPUT_DIR"

qpxc convert maxquant-pg \
    --protein-groups-file ./raw/proteinGroups.txt \
    --sdrf-file "$SDRF_FILE" \
    --output-folder "$OUTPUT_DIR"

# Step 3: Attach all processed files
echo "Attaching processed files to project..."
for file in "$OUTPUT_DIR"/*.parquet; do
    filename=$(basename "$file")

    # Determine category
    if [[ $filename == *"psm"* ]]; then
        category="psm-file"
    elif [[ $filename == *"feature"* ]]; then
        category="feature-file"
    fi

    qpxc project attach \
        --project-file "$PROJECT_DIR/project.json" \
        --attach-file "$file" \
        --category "$category"
done

echo "Project setup complete!"
echo "Project metadata: $PROJECT_DIR/project.json"

Version Control Integration

Track project metadata with git:

# Initialize git repository for project
cd ./project
git init
git add project.json *.sdrf.tsv
git commit -m "Initial project metadata for $PROJECT_ID"

# After attaching files
git add project.json
git commit -m "Attached processed data files"

# Tag releases
git tag -a v1.0 -m "Initial data release"

Data Sharing

Prepare project for sharing:

#!/bin/bash

PROJECT_DIR="./project"
ARCHIVE_NAME="project_data_$(date +%Y%m%d).tar.gz"

# Create archive with project metadata and data
tar -czf "$ARCHIVE_NAME" \
    "$PROJECT_DIR/project.json" \
    "$PROJECT_DIR"/*.sdrf.tsv \
    ./output/*.parquet

echo "Project archive created: $ARCHIVE_NAME"
echo "SHA256: $(sha256sum $ARCHIVE_NAME)"