Pangenomics Workshop NCGR
1
License and Copyright
2
Prerequisites
2.1
NCGR Workshop Server
2.2
Software
2.3
Running Singularity
2.4
Data
3
Agenda
4
Linux
4.1
Connecting to the Linux server
4.2
What are Linux, bash, and Ghostwheel?
A little shell… aka the $ prompt is the command line interface
Directory Structure
4.2.1
Find the shell on the system you’ll use to log in to the NCGR server
4.2.2
Log on to Ghostwheel server
4.2.3
Now that I logged on, where am I?
4.3
Linux basics: Part I
4.3.1
Understanding Directories
4.3.2
Listing options
4.3.3
Navigation
4.3.4
Files: creating with touch command
4.3.5
History command
4.3.6
Files: creating by redirecting standard out
4.3.7
File name completion with tab
4.3.8
Files: moving files from one filename to another
4.3.9
Files: copying files from one filename to another
4.3.10
Files: securely copying files between your laptop and Ghostwheel
4.3.11
Files and directories: removing files is deleting files
4.3.12
Tool box: How to abort a command/process
4.4
Linux basics: Part II
4.4.1
Files: Symbolic links and the soft link (-s)
4.4.2
Understanding a fasta file format
4.4.3
Understanding fastq (fq) file format
4.4.4
Using grep (global regular expression print) to extract metrics
4.4.5
Working with compressed files
4.4.6
Start ^ and end $ symbols
4.4.7
Files: parsing and creating data-subsets
4.4.8
Files: parsing and creating data-subsets
4.4.9
Files: parsing and creating data-subsets
4.4.10
Files: parsing and creating data-subsets
4.4.11
Revisiting table1 and previous awk command
4.4.12
Files: Stream EDitor (sed)
4.4.13
The Bash “for” Loop
4.4.14
Help with command syntax
4.4.15
Exercises
4.5
Linux basics: Part III
Using the Screen Command
Exercise
4.5.1
More Exercises
5
Introduction to Pangenomics
5.1
What is a “pangenome”?
5.1.1
Open vs. Closed Genomes
5.1.2
Then vs. Now
5.1.3
“Pangenome” Today
5.1.4
The Benefit of Pangenomes
5.1.5
What are pangenomes good for?
5.2
Computational Pangenomics
5.2.1
Pangenome Representations
5.2.2
Variation Graphs
5.2.3
Types of Variation Graphs
5.2.4
Mapping Reads to Variation Graphs
5.3
Pangenome Data Sets
5.3.1
Data/Yeast Genomes:
5.3.2
Yeast Assemblies
5.3.3
SK1 Illumina Reads
5.3.4
CUP1 Gene
5.3.5
We Changed the Names
6
VG Toolkit
6.1
Variation Graph (VG) Toolkit
6.2
Reference Graphs
6.2.1
Pipeline
6.2.2
Preparing the Input
6.2.3
Set up directories
6.2.4
Variant Call (VCF) Format
6.2.5
Variation Graph (VG) Format
6.2.6
Graphical Fragment Assembly (GFA) Format
6.2.7
Yeast Data
6.2.8
Construct Graph
6.2.9
Viewing with Bandage
6.2.10
Converting to GFA
6.3
Graph Indexing
6.3.1
Pipeline
6.3.2
VG Index Formats
6.3.3
Indexing
6.4
Read Mapping
6.4.1
Pipeline
6.4.2
Graph Alignment/Map (GAM) Format
6.4.3
Read Mapping
6.4.4
Bringing Alignments Back to Single Genomes
Exercise:
6.4.5
Preparing the BAM for IGV (or other genome viewer)
6.5
Calling Graph Supported Variants
6.5.1
Pipeline
6.5.2
Packed-Graph Format
6.5.3
Pack (pileup support) Format
6.5.4
Snarls Format
6.5.5
Calling Graph-Supported Variants
6.6
Calling Novel Variants
6.6.1
Pipeline
6.6.2
Calling Variants
6.7
Calling Variants from Graph Structure
6.7.1
Pipeline
6.8
Calling Variants Already in the Graph
6.9
Pros and Cons of Reference Graphs
6.10
Additional vg Commands that are Useful
6.10.1
vg combine
6.10.2
Modifying and Simplifying Graphs
6.11
Drawing Graphs
7
Visualizing Pangenome Graphs
7.1
Set up Directories
7.2
Graphical Fragment Assembly (GFA) format
7.3
Bandage
Group exercise:
Group Exercise
VG Graph Exercise
7.4
BLAST the graph manually
VG Graph Exercise
8
Human Pangenomes
8.1
Draft Human Pangenome Reference
8.1.1
Samples
8.1.2
Strategy
8.1.3
Results
8.2
Complex Variation
8.2.1
Samples
8.2.2
Results
8.2.3
Data Availability
9
Minigraph Cactus
9.1
Minigraph Functions
9.2
Minigraph Overview
9.3
Minimizers and Minimap2
Minimizers
Minimap2
9.4
Pipeline
9.5
Yeast Assemblies
9.6
Prepare the Input
9.7
Graphical Fragment Assembly (GFA) format
9.8
reference Graphical Fragment Assembly (rGFA)
9.9
Build rGFA Graphs
9.10
Reference Graph
Reference Graph Bandage Visualization
9.11
YPRP Graphs
YPRP Graph Statistics
YPRP Graph in Bandage
9.12
Structures in the graph
Insertions and Diverged Regions
Group Exercise
Inversions
Inversions in the GFA
9.13
Questions
9.14
Graph to Fasta
FASTA questions
9.15
GAF format (read alignments)
9.16
Read Mapping
9.17
Read Mapping Stats
9.18
Structural Variant Calling
Structural Variant Stats
9.19
CUP1
Visualize the CUP1 region
CUP1 Paths in Y12
CUP1 Paths in all yeast genomes
9.20
Minigraph Pros and Cons
9.21
Minigraph Exercises
Start with another reference
Another Yeast Dataset
Human GFA
Convert to VG and call variants
10
Minigraph-Cactus
10.1
Cactus
10.2
Cactus Graphs
10.3
Cactus Algorithm
10.4
Minigraph-Cactus
10.4.1
Pipeline
10.4.2
Set up Directories
10.5
Building the Graph
10.5.1
Preparing the Input (exercise)
10.5.2
Singularity
10.5.3
Cactus
10.6
Read Mapping and Variant Calling
10.6.1
Read Mapping
10.6.2
Calling Graph-Supported Variants
10.6.3
Calling Novel Variants
10.6.4
Exercise
10.7
BLAST the graph manually
10.8
Viewing with Bandage
10.8.1
Exercise
10.9
Pros and Cons PGGB
11
Reference-Free Graphs with pggb
11.1
pggb
11.2
PanGenome Graph Builder
11.3
Reference-Free Graphs
11.4
pggb Algorithm
11.5
Pipeline
11.5.1
Set up Directories
11.6
Yeast Data
11.7
Prepare the Input
11.8
Running pggb on Chromosome VIII
11.9
Viewing with Bandage
11.10
Exercises
11.11
Running pggb on all Chromosomes
11.11.1
Exercise
11.12
How do you handle larger graphs? Nextflow
11.13
Pros and Cons
12
NIGMS Sandbox
13
Wrap Up
13.1
Acknowledgements
13.2
License and Copyright
13.3
Survey
13.4
Questions
13.5
Server access and acknowledgements
13.6
Bookdown document
13.7
Zoom recordings
© National Center for Genome Resources
Published with bookdown
Pangenomics Workshop NCGR
Chapter 12
NIGMS Sandbox