Supported File Formats [NEEDS CURATION]
ZaroPGx supports multiple genomic data formats with automatic conversion and processing.
Variant Call Format (VCF)
VCF is the standard format for storing genetic variant information. ZaroPGx can process VCF 4.x files directly without preprocessing.
Processing Path
VCF → Header Analysis → PyPGx → PharmCAT → Reports
Binary Alignment Map (BAM)
BAM files contain aligned sequencing reads and are commonly used for variant calling and analysis.
Processing Path
BAM → HLA Typing → PyPGx → PharmCAT → Reports
Compressed BAM (CRAM)
CRAM is a compressed version of BAM that uses reference-based compression for smaller file sizes.
Processing Path
CRAM → GATK (BAM conversion) → HLA Typing → PyPGx → PharmCAT → Reports
Sequence Alignment Map (SAM)
SAM is the text-based format for aligned sequences, often used as an intermediate format.
Processing Path
SAM → GATK (BAM conversion) → HLA Typing → PyPGx → PharmCAT → Reports
FASTQ Format
FASTQ files contain raw sequencing reads with quality scores and are the starting point for most genomic analyses.
Processing Path
FASTQ → HLA Typing → GATK (BAM generation) → PyPGx → PharmCAT → Reports
Reference Genome Support
GRCh38/hg38 (Recommended)
GRCh37/hg19 (Legacy)
File Size Considerations
Typical File Sizes
Format |
Whole Genome |
Exome |
Targeted Panel |
|---|---|---|---|
VCF |
1-5 GB |
50-200 MB |
1-10 MB |
BAM |
50-100 GB |
2-5 GB |
50-500 MB |
CRAM |
15-30 GB |
500 MB-1 GB |
10-100 MB |
FASTQ |
100-200 GB |
5-10 GB |
100 MB-1 GB |
Next Steps
Learn about usage: User Guide
Understand reports: Understanding Reports
Configure processing: Advanced Configuration
Troubleshoot issues: Troubleshooting Guide