This page was composed with the aid of generative artificial intelligence; it is partially curated.

[CURATION NEEDED]

Contributing Guide

How to contribute to ZaroPGx development.

Getting Started

Prerequisites

  • Git

  • Docker and Docker Compose

  • Python 3.12+

  • Basic understanding of pharmacogenomics

  • Familiarity with FastAPI and SQLAlchemy

Development Setup

  1. Fork the repository on GitHub

  2. Clone your fork:

    git clone https://github.com/your-username/ZaroPGx.git
    cd ZaroPGx
    
  3. Set up development environment:

    cp .env.local .env
    docker compose up -d --build
    
  4. Verify setup:

    curl http://localhost:8765/health
    

Contribution Types

Bug Reports

Before reporting a bug:

  • Check existing issues

  • Search closed issues

  • Test with latest version

  • Gather relevant information

Bug report template:

## Bug Description
Brief description of the bug

## Steps to Reproduce
1. Step one
2. Step two
3. Step three

## Expected Behavior
What should happen

## Actual Behavior
What actually happens

## Environment
- OS: [e.g., Ubuntu 20.04]
- Docker version: [e.g., 20.10.7]
- ZaroPGx version: [e.g., 1.0.0]

## Additional Context
Screenshots, logs, or other relevant information

Feature Requests

Before requesting a feature:

  • Check existing feature requests

  • Consider if it fits the project scope

  • Think about implementation complexity

  • Consider backward compatibility

Feature request template:

## Feature Description
Brief description of the feature

## Use Case
Why is this feature needed?

## Proposed Solution
How should this feature work?

## Alternatives Considered
What other approaches were considered?

## Additional Context
Mockups, examples, or references

Code Contributions

Types of contributions:

  • Bug fixes

  • New features

  • Documentation improvements

  • Performance optimizations

  • Test coverage improvements

Development Workflow

1. Create Feature Branch

# Update main branch
git checkout main
git pull origin main

# Create feature branch
git checkout -b feature/your-feature-name

Branch naming conventions:

  • feature/description: New features

  • bugfix/description: Bug fixes

  • docs/description: Documentation

  • refactor/description: Code refactoring

  • test/description: Test improvements

2. Make Changes

Code style guidelines:

  • Follow PEP 8 for Python code

  • Use type hints

  • Write docstrings for functions

  • Use meaningful variable names

  • Keep functions small and focused

Example code style:

from typing import List, Optional
from sqlalchemy.orm import Session

def process_genomic_data(
    file_path: str,
    db: Session,
    sample_identifier: Optional[str] = None
) -> dict:
    """
    Process genomic data file and return analysis results.
    
    Args:
        file_path: Path to the genomic data file
        db: Database session
        sample_identifier: Optional sample identifier
        
    Returns:
        Dictionary containing analysis results
        
    Raises:
        ValueError: If file format is not supported
    """
    # Implementation here
    pass

3. Write Tests

Test requirements:

  • Write tests for new functionality

  • Maintain test coverage above 80%

  • Use descriptive test names

  • Test both success and failure cases

Example test:

import pytest
from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_upload_genomic_data_success():
    """Test successful genomic data upload."""
    with open("test_data/sample.vcf", "rb") as f:
        response = client.post(
            "/upload/genomic-data",
            files={"file": f},
            data={"sample_identifier": "test_sample"}
        )
    
    assert response.status_code == 200
    assert "job_id" in response.json()
    assert response.json()["status"] == "uploaded"

def test_upload_genomic_data_invalid_file():
    """Test upload with invalid file format."""
    with open("test_data/invalid.txt", "rb") as f:
        response = client.post(
            "/upload/genomic-data",
            files={"file": f}
        )
    
    assert response.status_code == 400
    assert "error" in response.json()

4. Update Documentation

Documentation requirements:

  • Update relevant documentation

  • Add docstrings for new functions

  • Update API documentation

  • Add examples for new features

Documentation types:

  • Code comments and docstrings

  • API documentation

  • User guides

  • Developer guides

  • README updates

5. Run Tests and Linting

# Run all tests
pytest

# Run with coverage
pytest --cov=app --cov-report=html

# Format code
black app/
isort app/

# Lint code
flake8 app/
mypy app/

6. Commit Changes

# Stage changes
git add .

# Commit with descriptive message
git commit -m "feat: add support for CRAM file processing

- Add CRAM file detection and validation
- Implement GATK preprocessing for CRAM files
- Add tests for CRAM processing workflow
- Update documentation with CRAM support"

Commit message format:

type(scope): description

Longer description if needed

- Bullet point 1
- Bullet point 2

Closes #123

Commit types:

  • feat: New feature

  • fix: Bug fix

  • docs: Documentation

  • style: Code style changes

  • refactor: Code refactoring

  • test: Test additions/changes

  • chore: Maintenance tasks

7. Push and Create Pull Request

# Push branch
git push origin feature/your-feature-name

# Create pull request on GitHub

Pull request template:

## Description
Brief description of changes

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update

## Testing
- [ ] Tests pass locally
- [ ] New tests added
- [ ] All tests pass in CI

## Checklist
- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] No breaking changes

Code Review Process

Review Criteria

Code quality:

  • Follows coding standards

  • Has appropriate tests

  • Handles errors gracefully

  • Is well-documented

Functionality:

  • Solves the intended problem

  • Doesn’t break existing functionality

  • Is performant

  • Is secure

Documentation:

  • Code is self-documenting

  • Docstrings are present

  • README is updated if needed

  • API docs are updated

Review Process

  1. Automated checks must pass

  2. At least one reviewer must approve

  3. All conversations must be resolved

  4. CI/CD pipeline must pass

  5. Maintainer approval for significant changes

Responding to Reviews

Be responsive:

  • Address feedback promptly

  • Ask questions if unclear

  • Explain your reasoning

  • Be open to suggestions

Common responses:

  • “Done” - Simple fix applied

  • “Good catch, fixed” - Bug found and fixed

  • “I disagree because…” - Explain reasoning

  • “Can you clarify…” - Ask for more details

Testing Guidelines

Test Types

Unit tests:

  • Test individual functions

  • Mock external dependencies

  • Test edge cases

  • Aim for high coverage

Integration tests:

  • Test service interactions

  • Use test database

  • Test API endpoints

  • Test error handling

End-to-end tests:

  • Test complete workflows

  • Use real data (anonymized)

  • Test user scenarios

  • Validate outputs

Test Data

Test data requirements:

  • Use anonymized data only

  • Include edge cases

  • Cover different file formats

  • Include invalid data

Test data organization:

tests/
├── fixtures/           # Test fixtures
├── data/              # Test data files
│   ├── valid/         # Valid test files
│   ├── invalid/       # Invalid test files
│   └── edge_cases/    # Edge case files
└── conftest.py        # Test configuration

Documentation Guidelines

Code Documentation

Docstring format:

def process_file(file_path: str, options: dict) -> dict:
    """
    Process a genomic file and return analysis results.
    
    This function handles file validation, preprocessing,
    and analysis orchestration.
    
    Args:
        file_path: Path to the genomic file to process
        options: Dictionary of processing options
        
    Returns:
        Dictionary containing analysis results with keys:
        - status: Processing status
        - results: Analysis results
        - metadata: File metadata
        
    Raises:
        FileNotFoundError: If file doesn't exist
        ValueError: If file format is not supported
        
    Example:
        >>> results = process_file("sample.vcf", {"reference": "hg38"})
        >>> print(results["status"])
        "completed"
    """

API Documentation

Endpoint documentation:

@router.post("/upload/genomic-data", response_model=UploadResponse)
async def upload_genomic_data(
    files: List[UploadFile] = File(...),
    sample_identifier: Optional[str] = Form(None),
    db: Session = Depends(get_db)
):
    """
    Upload genomic data files for pharmacogenomic analysis.
    
    This endpoint accepts various genomic file formats (VCF, BAM, CRAM, SAM, FASTQ)
    and initiates the Nextflow-based processing pipeline.
    
    Args:
        files: List of genomic data files to upload
        sample_identifier: Optional patient/sample identifier
        db: Database session dependency
        
    Returns:
        UploadResponse containing job information
        
    Raises:
        HTTPException: If upload fails or file format is invalid
        
    Example:
        ```bash
        curl -X POST \\
          -F "file=@sample.vcf" \\
          -F "sample_identifier=patient_001" \\
          http://localhost:8765/upload/genomic-data
        ```
    """

Release Process

Version Numbering

Semantic versioning:

  • MAJOR.MINOR.PATCH

  • MAJOR: Breaking changes

  • MINOR: New features (backward compatible)

  • PATCH: Bug fixes (backward compatible)

Examples:

  • 1.0.0: Initial release

  • 1.1.0: New features added

  • 1.1.1: Bug fixes

  • 2.0.0: Breaking changes

Release Checklist

Before release:

  • All tests pass

  • Documentation updated

  • Version numbers updated

  • CHANGELOG updated

  • Release notes prepared

Release process:

  1. Update version in pyproject.toml

  2. Update CHANGELOG.md

  3. Create release branch

  4. Tag release

  5. Create GitHub release

  6. Update documentation

Community Guidelines

Code of Conduct

Be respectful:

  • Use welcoming language

  • Be respectful of differing viewpoints

  • Accept constructive criticism

  • Focus on what’s best for the community

Be collaborative:

  • Help others learn

  • Share knowledge

  • Be patient with newcomers

  • Work together constructively

Getting Help

Resources:

  • GitHub Discussions

  • Issue tracker

  • Documentation

  • Code comments

Asking questions:

  • Search existing issues first

  • Provide context and examples

  • Be specific about the problem

  • Include relevant logs or error messages

Recognition

Contributors

Contributor recognition:

  • Listed in CONTRIBUTORS.md

  • Mentioned in release notes

  • GitHub contributor status

  • Community recognition

Types of contributions:

  • Code contributions

  • Documentation improvements

  • Bug reports

  • Feature requests

  • Community support

Next Steps