--- title: Supported File Formats --- # Supported File Formats [NEEDS CURATION] ZaroPGx supports multiple genomic data formats with automatic conversion and processing. ## Variant Call Format (VCF) VCF is the standard format for storing genetic variant information. ZaroPGx can process VCF 4.x files directly without preprocessing. ### Processing Path ``` VCF → Header Analysis → PyPGx → PharmCAT → Reports ``` ## Binary Alignment Map (BAM) BAM files contain aligned sequencing reads and are commonly used for variant calling and analysis. ### Processing Path ``` BAM → HLA Typing → PyPGx → PharmCAT → Reports ``` ## Compressed BAM (CRAM) CRAM is a compressed version of BAM that uses reference-based compression for smaller file sizes. ### Processing Path ``` CRAM → GATK (BAM conversion) → HLA Typing → PyPGx → PharmCAT → Reports ``` ## Sequence Alignment Map (SAM) SAM is the text-based format for aligned sequences, often used as an intermediate format. ### Processing Path ``` SAM → GATK (BAM conversion) → HLA Typing → PyPGx → PharmCAT → Reports ``` ## FASTQ Format FASTQ files contain raw sequencing reads with quality scores and are the starting point for most genomic analyses. ### Processing Path ``` FASTQ → HLA Typing → GATK (BAM generation) → PyPGx → PharmCAT → Reports ``` ## Reference Genome Support - **GRCh38/hg38** (Recommended) - **GRCh37/hg19** (Legacy) ## File Size Considerations ### Typical File Sizes | Format | Whole Genome | Exome | Targeted Panel | |--------|--------------|-------|----------------| | **VCF** | 1-5 GB | 50-200 MB | 1-10 MB | | **BAM** | 50-100 GB | 2-5 GB | 50-500 MB | | **CRAM** | 15-30 GB | 500 MB-1 GB | 10-100 MB | | **FASTQ** | 100-200 GB | 5-10 GB | 100 MB-1 GB | ## Next Steps - **Learn about usage**: {doc}`usage` - **Understand reports**: {doc}`reports` - **Configure processing**: {doc}`../advanced-configuration` - **Troubleshoot issues**: {doc}`troubleshooting`