• Pangenomics Workshop NCGR
  • 1 License and Copyright
  • 2 Prerequisites
    • 2.1 NCGR Workshop Server
    • 2.2 Software
    • 2.3 Running Singularity
    • 2.4 Data
  • 3 Agenda
  • 4 Linux
    • 4.1 Connecting to the Linux server
    • 4.2 What are Linux, bash, and Ghostwheel?
      • A little shell… aka the $ prompt is the command line interface
      • Directory Structure
      • 4.2.1 Find the shell on the system you’ll use to log in to the NCGR server
      • 4.2.2 Log on to Ghostwheel server
      • 4.2.3 Now that I logged on, where am I?
    • 4.3 Linux basics: Part I
      • 4.3.1 Understanding Directories
      • 4.3.2 Listing options
      • 4.3.3 Navigation
      • 4.3.4 Files: creating with touch command
      • 4.3.5 History command
      • 4.3.6 Files: creating by redirecting standard out
      • 4.3.7 File name completion with tab
      • 4.3.8 Files: moving files from one filename to another
      • 4.3.9 Files: copying files from one filename to another
      • 4.3.10 Files: securely copying files between your laptop and Ghostwheel
      • 4.3.11 Files and directories: removing files is deleting files
      • 4.3.12 Tool box: How to abort a command/process
    • 4.4 Linux basics: Part II
      • 4.4.1 Files: Symbolic links and the soft link (-s)
      • 4.4.2 Understanding a fasta file format
      • 4.4.3 Understanding fastq (fq) file format
      • 4.4.4 Using grep (global regular expression print) to extract metrics
      • 4.4.5 Working with compressed files
      • 4.4.6 Start ^ and end $ symbols
      • 4.4.7 Files: parsing and creating data-subsets
      • 4.4.8 Files: parsing and creating data-subsets
      • 4.4.9 Files: parsing and creating data-subsets
      • 4.4.10 Files: parsing and creating data-subsets
      • 4.4.11 Revisiting table1 and previous awk command
      • 4.4.12 Files: Stream EDitor (sed)
      • 4.4.13 The Bash “for” Loop
      • 4.4.14 Help with command syntax
      • 4.4.15 Exercises
    • 4.5 Linux basics: Part III
      • Using the Screen Command
      • Exercise
      • 4.5.1 More Exercises
  • 5 Introduction to Pangenomics
    • 5.1 What is a “pangenome”?
      • 5.1.1 Open vs. Closed Genomes
      • 5.1.2 Then vs. Now
      • 5.1.3 “Pangenome” Today
      • 5.1.4 The Benefit of Pangenomes
      • 5.1.5 What are pangenomes good for?
    • 5.2 Computational Pangenomics
      • 5.2.1 Pangenome Representations
      • 5.2.2 Variation Graphs
      • 5.2.3 Types of Variation Graphs
      • 5.2.4 Mapping Reads to Variation Graphs
    • 5.3 Pangenome Data Sets
      • 5.3.1 Data/Yeast Genomes:
      • 5.3.2 Yeast Assemblies
      • 5.3.3 SK1 Illumina Reads
      • 5.3.4 CUP1 Gene
      • 5.3.5 We Changed the Names
  • 6 VG Toolkit
    • 6.1 Variation Graph (VG) Toolkit
    • 6.2 Reference Graphs
      • 6.2.1 Pipeline
      • 6.2.2 Preparing the Input
      • 6.2.3 Set up directories
      • 6.2.4 Variant Call (VCF) Format
      • 6.2.5 Variation Graph (VG) Format
      • 6.2.6 Graphical Fragment Assembly (GFA) Format
      • 6.2.7 Yeast Data
      • 6.2.8 Construct Graph
      • 6.2.9 Viewing with Bandage
      • 6.2.10 Converting to GFA
    • 6.3 Graph Indexing
      • 6.3.1 Pipeline
      • 6.3.2 VG Index Formats
      • 6.3.3 Indexing
    • 6.4 Read Mapping
      • 6.4.1 Pipeline
      • 6.4.2 Graph Alignment/Map (GAM) Format
      • 6.4.3 Read Mapping
      • 6.4.4 Bringing Alignments Back to Single Genomes
      • Exercise:
      • 6.4.5 Preparing the BAM for IGV (or other genome viewer)
    • 6.5 Calling Graph Supported Variants
      • 6.5.1 Pipeline
      • 6.5.2 Packed-Graph Format
      • 6.5.3 Pack (pileup support) Format
      • 6.5.4 Snarls Format
      • 6.5.5 Calling Graph-Supported Variants
    • 6.6 Calling Novel Variants
      • 6.6.1 Pipeline
      • 6.6.2 Calling Variants
    • 6.7 Calling Variants from Graph Structure
      • 6.7.1 Pipeline
    • 6.8 Calling Variants Already in the Graph
    • 6.9 Pros and Cons of Reference Graphs
    • 6.10 Additional vg Commands that are Useful
      • 6.10.1 vg combine
      • 6.10.2 Modifying and Simplifying Graphs
    • 6.11 Drawing Graphs
  • 7 Visualizing Pangenome Graphs
    • 7.1 Set up Directories
    • 7.2 Graphical Fragment Assembly (GFA) format
    • 7.3 Bandage
      • Group Exercise
      • Group Exercise
      • VG Graph Exercise
    • 7.4 BLAST the graph manually
      • VG Graph Exercise
  • 8 Human Pangenomes
    • 8.1 Draft Human Pangenome Reference
      • 8.1.1 Samples
      • 8.1.2 Strategy
      • 8.1.3 Results
    • 8.2 Complex Variation
      • 8.2.1 Samples
      • 8.2.2 Results
      • 8.2.3 Data Availability
  • 9 Minigraph
    • 9.1 Minigraph Functions
    • 9.2 Minigraph Overview
    • 9.3 Minimizers and Minimap2
      • Minimizers
      • Minimap2
    • 9.4 Pipeline
    • 9.5 Yeast Assemblies
    • 9.6 Prepare the Input
    • 9.7 Graphical Fragment Assembly (GFA) format
    • 9.8 reference Graphical Fragment Assembly (rGFA)
    • 9.9 Build rGFA Graphs
    • 9.10 Reference Graph
      • Reference Graph Bandage Visualization
    • 9.11 YPRP Graphs
      • YPRP Graph Statistics
      • YPRP Graph in Bandage
    • 9.12 Structures in the graph
      • Insertions and Diverged Regions
      • Group Exercise
      • Inversions
      • Inversions in the GFA
    • 9.13 Questions
    • 9.14 Graph to Fasta
      • FASTA questions
    • 9.15 GAF format (read alignments)
    • 9.16 Read Mapping
    • 9.17 Read Mapping Stats
    • 9.18 Structural Variant Calling
      • Structural Variant Stats
    • 9.19 CUP1
      • Visualize the CUP1 region
      • CUP1 Paths in Y12
      • CUP1 Paths in all yeast genomes
    • 9.20 Minigraph Pros and Cons
    • 9.21 Minigraph Exercises
      • Start with another reference
      • Another Yeast Dataset
      • Human GFA
      • Convert to VG and call variants
  • 10 Minigraph-Cactus
    • 10.1 Cactus
    • 10.2 Cactus Graphs
    • 10.3 Cactus Algorithm
    • 10.4 Minigraph-Cactus
      • 10.4.1 Pipeline
      • 10.4.2 Set up Directories
    • 10.5 Building the Graph
      • 10.5.1 Preparing the Input (exercise)
      • 10.5.2 Singularity
      • 10.5.3 Cactus
    • 10.6 Read Mapping and Variant Calling
      • 10.6.1 Read Mapping
      • 10.6.2 Calling Graph-Supported Variants
      • 10.6.3 Calling Novel Variants
      • 10.6.4 Exercise
    • 10.7 BLAST the graph manually
    • 10.8 Viewing with Bandage
      • 10.8.1 Exercise
    • 10.9 Pros and Cons of minigraph-cactus
  • 11 Reference-Free Graphs with pggb
    • 11.1 pggb
    • 11.2 PanGenome Graph Builder
    • 11.3 Reference-Free Graphs
    • 11.4 pggb Algorithm
    • 11.5 Pipeline
      • 11.5.1 Set up Directories
    • 11.6 Yeast Data
    • 11.7 Prepare the Input
    • 11.8 Running pggb on Chromosome VIII
    • 11.9 Viewing with Bandage
    • 11.10 Exercises
    • 11.11 Running pggb on all Chromosomes
      • 11.11.1 Exercise
    • 11.12 Pros and Cons
  • 12 Pangenomics in the Cloud
    • 12.1 Constructing Pangenomes in the Cloud
      • 12.1.1 toil-vg
      • 12.1.2 minigraph
      • 12.1.3 minigraph-cactus
      • 12.1.4 pggb
    • 12.2 NIH Cloud Lab and NIGMS Sandbox
      • 12.2.1 NM-INBRE Pangenomics Module
  • 13 Wrap Up
    • 13.1 Acknowledgements
    • 13.2 License and Copyright
    • 13.3 Survey
    • 13.4 Questions
    • 13.5 Server access and acknowledgements
    • 13.6 Bookdown document
    • 13.7 Zoom recordings

Pangenomics Workshop NCGR

Chapter 1 License and Copyright

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/

© 2023-2025 National Center for Genome Resources

If you use this document, please let us know so that we can include you in our annual reports (thank you!). Email Ethan at eprice@ncgr.org and cc inbre@ncgr.org.

This document is available at https://inbre.ncgr.org/pangenomics-workshop