This page was composed with the aid of generative artificial intelligence; it is partially curated.

Supported File Formats [NEEDS CURATION]

ZaroPGx supports multiple genomic data formats with automatic conversion and processing.

Variant Call Format (VCF)

VCF is the standard format for storing genetic variant information. ZaroPGx can process VCF 4.x files directly without preprocessing.

Processing Path

VCF → Header Analysis → PyPGx → PharmCAT → Reports

Binary Alignment Map (BAM)

BAM files contain aligned sequencing reads and are commonly used for variant calling and analysis.

Processing Path

BAM → HLA Typing → PyPGx → PharmCAT → Reports

Compressed BAM (CRAM)

CRAM is a compressed version of BAM that uses reference-based compression for smaller file sizes.

Processing Path

CRAM → GATK (BAM conversion) → HLA Typing → PyPGx → PharmCAT → Reports

Sequence Alignment Map (SAM)

SAM is the text-based format for aligned sequences, often used as an intermediate format.

Processing Path

SAM → GATK (BAM conversion) → HLA Typing → PyPGx → PharmCAT → Reports

FASTQ Format

FASTQ files contain raw sequencing reads with quality scores and are the starting point for most genomic analyses.

Processing Path

FASTQ → HLA Typing → GATK (BAM generation) → PyPGx → PharmCAT → Reports

Reference Genome Support

  • GRCh38/hg38 (Recommended)

  • GRCh37/hg19 (Legacy)

File Size Considerations

Typical File Sizes

Format

Whole Genome

Exome

Targeted Panel

VCF

1-5 GB

50-200 MB

1-10 MB

BAM

50-100 GB

2-5 GB

50-500 MB

CRAM

15-30 GB

500 MB-1 GB

10-100 MB

FASTQ

100-200 GB

5-10 GB

100 MB-1 GB

Next Steps