Chapter 12 Pangenomics in the Cloud
12.1 Constructing Pangenomes in the Cloud
12.1.1 toil-vg
https://github.com/vgteam/toil-vg
- “a Toil-based framework for running common vg pipelines at scale”
- Supports vg construct, index, map, and call pipelines
  - Giraffe mapping algorithm can be used in the map pipeline
12.1.2 minigraph
- Does not support cluster/cloud computation
- Parallelization can be done manually by building a separate graph for each chromosome
- These graphs can be combined using vg combine
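The manual per-chromosome approach above can be sketched as a small Python script that only assembles the command lines (it does not run them). This is a minimal sketch under several assumptions: the directory layout (`ref/chr1.fa`, `sampleA/chr1.fa`, and so on), the chromosome names, and the helper function names are hypothetical; the `minigraph -cxggs` invocation follows the form shown in the minigraph README; and the per-chromosome graphs are assumed to be in a format that vg combine accepts, so a conversion step may be needed in practice.

```python
def minigraph_command(chrom, reference_dir, assembly_dirs, threads=4):
    """Build the argv and output path for one per-chromosome minigraph run.

    Assumes (hypothetically) that each directory holds one FASTA per chromosome.
    """
    ref = f"{reference_dir}/{chrom}.fa"
    assemblies = [f"{d}/{chrom}.fa" for d in assembly_dirs]
    out = f"{chrom}.gfa"
    # -cxggs: incremental graph generation, as in the minigraph README
    argv = ["minigraph", "-cxggs", f"-t{threads}", ref, *assemblies]
    return argv, out


def combine_command(graph_paths):
    """Build the argv that merges the per-chromosome graphs with vg combine."""
    return ["vg", "combine", *graph_paths]


if __name__ == "__main__":
    chroms = ["chr1", "chr2", "chr3"]  # hypothetical chromosome names
    jobs = [minigraph_command(c, "ref", ["sampleA", "sampleB"]) for c in chroms]

    # Each of these jobs is independent, so they could be submitted
    # to separate cluster nodes or cloud instances.
    for argv, out in jobs:
        print(" ".join(argv), ">", out)

    # Final step: merge the per-chromosome graphs into one graph.
    print(" ".join(combine_command([out for _, out in jobs])), "> combined.vg")
```

Because the script only prints commands, each line can be handed to whatever scheduler is available, with the vg combine step run once all per-chromosome jobs have finished.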
12.1.4 pggb
https://github.com/nf-core/pangenome
- Nextflow pipeline
- Can be run on cluster or in the cloud
- Most significantly, it parallelizes the all-vs-all alignments
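To see why the all-vs-all alignment step benefits most from parallelization, the following conceptual sketch enumerates every genome pair and deals the pairs out to a fixed number of workers. It is an illustration only, not the pipeline's actual implementation: the genome names, the function names, and the round-robin chunking scheme are assumptions made for this example.

```python
from itertools import combinations


def all_vs_all_pairs(genomes):
    """Return every unordered pair of genomes: N * (N - 1) / 2 pairs for N genomes."""
    return list(combinations(genomes, 2))


def split_into_chunks(items, n_chunks):
    """Distribute items round-robin into n_chunks lists of near-equal size."""
    chunks = [[] for _ in range(n_chunks)]
    for i, item in enumerate(items):
        chunks[i % n_chunks].append(item)
    return chunks


if __name__ == "__main__":
    genomes = ["g1", "g2", "g3", "g4", "g5"]  # hypothetical genome names
    pairs = all_vs_all_pairs(genomes)
    print(f"{len(genomes)} genomes -> {len(pairs)} pairwise alignments")

    # Each chunk could be aligned by a separate worker (node or instance).
    for worker, chunk in enumerate(split_into_chunks(pairs, 3)):
        print(f"worker {worker}: {chunk}")
```

The number of pairs grows quadratically with the number of genomes, so this step is typically where spreading work across nodes pays off the most.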
12.2 NIH Cloud Lab and NIGMS Sandbox
Cloud Lab: https://cloud.nih.gov/resources/cloudlab/
- Provides National Institutes of Health (NIH) staff, affiliated researchers (e.g., NIH-funded researchers), and students with up to 90 days of access to a cloud account and $500 of credits to explore cloud capabilities for research and to access bioinformatics tutorials and datasets
- Cloud Lab currently offers Amazon Web Services, Google Cloud Platform, and Microsoft Azure accounts
NIGMS Sandbox: https://github.com/NIGMS/NIGMS-Sandbox
- A cloud-based learning platform that teaches students, researchers, and clinicians how to harness cloud technology for life sciences applications and research.
- Hosted on GitHub
- Currently has 12 learning modules that are publicly available for self-learning and for use in a classroom setting
- Each module represents a unique use case or scientific workflow and is delivered through interactive step-by-step tutorials, quizzes, and visualizations

Existing NIGMS Sandbox modules

NIGMS Sandbox article
12.2.1 NM-INBRE Pangenomics Module
https://github.com/ncgr/NIGMS-Sandbox-Pangenomics-Module
- Intro to pangenomics
- Graph building with pggb
- Read mapping and variant calling with vg
- Runs on Google Cloud Platform (GCP) Vertex AI Workbench
- Free to use for anyone