Chapter 8 Human Pangenomes

8.1 Draft Human Pangenome Reference

8.1.1 Samples

47 phased, diploid genome assemblies (the project aims for 350)

  • 29 lymphoblastoid cell lines
    • “limiting selection to those lines classified as karyotypically normal and with low passage (to avoid artefacts from cell culture)”
  • 18 sequenced by other efforts (some supplemented with additional data)
    • Aimed for “genetic and biogeographic diversity”

Population_Code  Description                              Super_Population_Code

ASW              Americans of African Ancestry in SW USA  AFR
ACB              African Caribbean in Barbados            AFR
PUR              Puerto Rican from Puerto Rico            AMR
CLM              Colombian from Medellín, Colombia        AMR
PEL              Peruvian from Lima, Peru                 AMR

MSL              Mende in Sierra Leone                    AFR
GWD              Gambian in Western Division              AFR
YRI              Yoruba in Ibadan, Nigeria                AFR
ESN              Esan in Nigeria                          AFR
MKK              Maasai in Kinyawa, Kenya                 AFR

PJL              Punjabi in Lahore, Pakistan              SAS

CHS              Southern Han Chinese                     EAS
KHV              Kinh in Ho Chi Minh City, Vietnam        EAS

Super-Populations

AFR, African
AMR, Admixed American
EAS, East Asian
EUR, European
SAS, South Asian
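
The population table works as a lookup from population code to super-population code. The sketch below, in Python, shows one way to hold that mapping and count the listed populations per super-population. The names `POPULATIONS` and `count_by_super_population` are illustrative, not part of any project tooling. Only the rows listed in these notes are included, so EUR has no entries, and the counts are of populations, not of sequenced samples.

```python
from collections import Counter

# Population code -> super-population code, copied from the table in these notes.
POPULATIONS = {
    "ASW": "AFR", "ACB": "AFR",
    "PUR": "AMR", "CLM": "AMR", "PEL": "AMR",
    "MSL": "AFR", "GWD": "AFR", "YRI": "AFR", "ESN": "AFR", "MKK": "AFR",
    "PJL": "SAS",
    "CHS": "EAS", "KHV": "EAS",
}


def count_by_super_population(populations):
    """Count how many listed populations fall under each super-population."""
    return Counter(populations.values())


counts = count_by_super_population(POPULATIONS)
for code in ("AFR", "AMR", "EAS", "EUR", "SAS"):
    # Counter returns 0 for a super-population with no listed populations.
    print(code, counts[code])
```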

8.1.2 Strategy

Sequencing

PacBio HiFi
Oxford Nanopore
Bionano optical maps
High-coverage Hi-C
Illumina short-read sequencing
High-coverage Illumina sequencing data for both parents

Assembly

Trio-HiFiasm

Graphs

Minigraph
* Fast pangenome graph builder based on the minimap2 aligner
* Captures only structural variation (>= 50 nt)
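
To make the 50 nt cutoff concrete, here is a minimal Python sketch that labels a variant as structural or not from its REF and ALT alleles. It does not describe how minigraph decides what to include; it only illustrates the size threshold. The VCF-style allele strings, the `variant_size` rule, and all names are assumptions made for this example.

```python
SV_MIN_LENGTH = 50  # size cutoff from the notes: structural variation is >= 50 nt


def variant_size(ref: str, alt: str) -> int:
    """Size of the change: length difference for indels, allele length otherwise."""
    if len(ref) != len(alt):
        return abs(len(ref) - len(alt))
    return len(ref)


def is_structural(ref: str, alt: str) -> bool:
    """True when the change meets the structural-variation size cutoff."""
    return variant_size(ref, alt) >= SV_MIN_LENGTH


print(is_structural("A", "G"))             # single-base substitution -> False
print(is_structural("A", "A" + "T" * 49))  # 49-nt insertion -> False
print(is_structural("A" + "C" * 50, "A"))  # 50-nt deletion -> True
```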

Minigraph-Cactus (MC)
* Refines minigraph output to include SNPs and other small variants
* Minigraph was modified to write chains of minimizers
* Cactus was modified to read minigraph output

PanGenome Graph Builder (PGGB)
* All pairwise genome assembly alignments -> graph
* Uses graph normalization to make sure that chromosome paths are linear
* Allows for cyclic graph structures that capture structural variation.
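
An all-versus-all design means the number of alignments grows quadratically with the number of assemblies. The Python sketch below enumerates every unordered pair for 94 haplotype assemblies (47 diploid samples). It illustrates the counting only and does not describe PGGB's internals, which may not compute every pair exhaustively. The haplotype names and `pairwise_jobs` are hypothetical.

```python
from itertools import combinations
from math import comb


def pairwise_jobs(assemblies):
    """Return every unordered pair of assemblies (one alignment job per pair)."""
    return list(combinations(assemblies, 2))


# Hypothetical haplotype names: 47 diploid samples -> 94 haplotype assemblies.
haplotypes = [f"sample{i:02d}.hap{h}" for i in range(1, 48) for h in (1, 2)]
jobs = pairwise_jobs(haplotypes)

# n assemblies give n * (n - 1) / 2 unordered pairs.
print(len(haplotypes), len(jobs))  # 94 4371
```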

8.1.3 Results

More Genetic Variation Captured

The pangenome captures more polymorphic sequences

  • 119 Mb of euchromatic polymorphic sequences
    • 90 Mb from structural variation
  • 1,115 gene duplications
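
The two sequence totals above imply that about three quarters of the added polymorphic sequence comes from structural variation. A short Python check of that arithmetic, using only the figures in these notes (the variable names are illustrative):

```python
POLYMORPHIC_MB = 119  # euchromatic polymorphic sequence, from the notes
SV_MB = 90            # portion attributed to structural variation

sv_share = SV_MB / POLYMORPHIC_MB
remaining_mb = POLYMORPHIC_MB - SV_MB

print(f"{sv_share:.1%} from structural variation, {remaining_mb} Mb remaining")
# 75.6% from structural variation, 29 Mb remaining
```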

We can align more (short) reads to the pangenome

We can call more variants more accurately

Aligning short reads to the pangenome

  • lowered small-variant discovery errors by 34%
  • increased structural variant calls per haplotype by 104% (enough to type the “vast majority” of structural variant alleles per sample)
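
Percent changes like these are easier to compare as multipliers on the baseline workflow. The Python sketch below converts the two figures: a 34% reduction leaves 0.66 of the baseline errors, and a 104% increase gives 2.04 times the baseline calls, slightly more than double. The helper `apply_relative_change` is an illustrative name, and the baseline of 1.0 is a unit placeholder, not a measured value.

```python
def apply_relative_change(baseline: float, percent_change: float) -> float:
    """Scale a baseline by a signed percent change (e.g. -34 or +104)."""
    return baseline * (1 + percent_change / 100)


# Baseline of 1.0 is a unit placeholder, so the results are plain multipliers.
error_factor = apply_relative_change(1.0, -34)
sv_factor = apply_relative_change(1.0, 104)

print(f"errors x{error_factor:.2f}, SV calls x{sv_factor:.2f}")
# errors x0.66, SV calls x2.04
```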

We can call variants across a broad set of populations

Variation in Complex, Medically Relevant Regions

HLA region (helps the immune system distinguish between self and invader)

HLA genes
HLA haplotypes

Rh region (involved in Rh blood type)

Rh genes
Rh haplotypes

8.2 Complex Variation

8.2.1 Samples

65 phased, diploid human genomes

Closed 92% of the gaps remaining in previous assemblies

Telomere-to-telomere (T2T) status reached for 39% of chromosomes

8.2.2 Results

26,115 structural variants (SVs) per sample

1,852 complex SVs resolved

MHC region