Chapter 7 Visualizing Pangenome Graphs

Large compressed de Bruijn graphs can look like “hairballs” when visualized, e.g.: https://academic.oup.com/bioinformatics/article/30/24/3476/2422268

7.1 Set up Directories

Make sure you’re working in a screen session
Make a directory

mkdir ~/viz

Navigate to the directory

cd ~/viz

Link to the data

ln -s /home/data/pangenomics-2503/yprp/ .

7.2 Graphical Fragment Assembly (GFA) format

Originally developed to represent assembly graphs during genome assembly
Now also widely used to represent pangenome graphs
More on this later…
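Here is a minimal sketch of a GFA1 file to make the format concrete. The graph is a made-up toy, not course data: one SNP bubble where segment s1 is followed by either s2 or s3 and then s4, plus a path (P line) for each of two hypothetical samples. S lines hold segment sequences, L lines hold links between oriented segments, and P lines list the segments each sequence walks through. The awk commands count the record types and spell out each path's sequence.

```shell
# Toy GFA1 graph (hypothetical, for illustration only): s1 -> (s2 | s3) -> s4
# Columns are TAB-separated; "0M" in L lines means the linked segments do not overlap.
toy=$(mktemp -d)/toy.gfa
printf 'H\tVN:Z:1.0\n' > "$toy"
printf 'S\ts1\tACGT\nS\ts2\tA\nS\ts3\tG\nS\ts4\tTTCA\n' >> "$toy"
printf 'L\ts1\t+\ts2\t+\t0M\nL\ts1\t+\ts3\t+\t0M\n' >> "$toy"
printf 'L\ts2\t+\ts4\t+\t0M\nL\ts3\t+\ts4\t+\t0M\n' >> "$toy"
printf 'P\tsampleA\ts1+,s2+,s4+\t*\nP\tsampleB\ts1+,s3+,s4+\t*\n' >> "$toy"

# Count records by type (first column): H = header, S = segment, L = link, P = path
counts=$(awk -F'\t' '{ n[$1]++ } END { for (t in n) print t, n[t] }' "$toy" | LC_ALL=C sort)
echo "$counts"

# Rebuild each path's sequence by concatenating its segments in order.
# This sketch only handles forward (+) steps; a '-' step would need reverse-complementing.
paths=$(awk -F'\t' '
  $1 == "S" { seq[$2] = $3 }
  $1 == "P" {
    k = split($3, steps, ",")
    s = ""
    for (i = 1; i <= k; i++) s = s seq[substr(steps[i], 1, length(steps[i]) - 1)]
    print $2, s
  }' "$toy")
echo "$paths"
```

Running it prints H 1, L 4, P 2 and S 4, then sampleA ACGTATTCA and sampleB ACGTGTTCA. The two samples share every segment except the one inside the bubble.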

7.3 Bandage

Bandage: https://rrwick.github.io/Bandage/

BLAST integration
- Can build a local BLAST database of the graph
- Can do a web BLAST search with sequences from nodes

More details on making CSV labels: https://github.com/rrwick/Bandage/wiki/CSV-labels
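The wiki page above describes the CSV format that Bandage expects. As we understand it, the first row is a header, the first column holds node names, and the remaining columns are shown as labels; check the wiki for details. Graphs made with minigraph are in rGFA format. In rGFA, each segment carries three tags: SN (source sequence name), SO (offset on that sequence) and SR (rank, where 0 means the segment comes from the reference). The sketch below assumes those tags are present and turns them into a label CSV. The three-segment input is a made-up stand-in, and the column names are our own choice.

```shell
# Hypothetical three-segment rGFA stand-in; point gfa at your real graph instead,
# e.g. gfa=yprp/example/S288C.SK1.minigraph.gfa
dir=$(mktemp -d)
gfa=$dir/toy.rgfa
printf 'S\ts1\tACGT\tLN:i:4\tSN:Z:chrA\tSO:i:0\tSR:i:0\n' > "$gfa"
printf 'S\ts2\tA\tLN:i:1\tSN:Z:chrA\tSO:i:4\tSR:i:0\n' >> "$gfa"
printf 'S\ts3\tGG\tLN:i:2\tSN:Z:chrB\tSO:i:10\tSR:i:1\n' >> "$gfa"

csv=$dir/labels.csv
# One row per segment: name, then the SN/SO/SR tag values (NA if a tag is missing).
# Length comes from the sequence column, so this assumes sequences are stored inline (not "*").
awk -F'\t' '
  BEGIN { OFS = ","; print "Name", "Source", "Offset", "Rank", "Length" }
  $1 == "S" {
    sn = so = sr = "NA"
    for (i = 4; i <= NF; i++) {
      if ($i ~ /^SN:Z:/) sn = substr($i, 6)
      else if ($i ~ /^SO:i:/) so = substr($i, 6)
      else if ($i ~ /^SR:i:/) sr = substr($i, 6)
    }
    print $2, sn, so, sr, length($3)
  }' "$gfa" > "$csv"
cat "$csv"
```

To label the real graph, set gfa to the minigraph GFA and csv to a file in ~/viz. Then copy the CSV to your computer alongside the graph and load it in Bandage. The Rank column is a quick way to tell reference (S288C) segments from segments added by the other assembly.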

Group Exercise

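Copy the following example graph from the inbre server to your own computer, using a GUI file-transfer tool or the scp command below. Run scp on your computer (not on the server), and replace jm with your own username.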

scp -P 2503 jm@inbre.ncgr.org:~/viz/yprp/example/S288C.SK1.minigraph.gfa .

Open Bandage and load the graph
Spend some time exploring the graph
If you were able to install BLAST, use Bandage's BLAST search to find CUP1 and YHR054C. You can download the gene sequences from the links, or use the FASTA file on the server that contains both gene sequences (/home/data/pangenomics-2503/yprp/CUP1/cup1.yhr054c.fa).

How many copies of each gene are in the graph?
What does the graph structure around them look like?
Take a screenshot of the region containing CUP1, with the gene colored

Group Exercise

Find an interesting structure in Bandage
Get its node ID(s)
Run a web BLAST search with its sequence
Share it with the group
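You need the node's sequence for the web BLAST step. Bandage can copy it from the GUI. On the server, a sketch like the one below pulls a single segment out of a GFA file as FASTA. It assumes that the node ID shown in Bandage matches the segment name in the file; if Bandage shows a trailing + or - for strand, drop it. The toy graph and the node ID s3 are hypothetical placeholders.

```shell
# Hypothetical toy graph; set gfa=yprp/example/S288C.SK1.minigraph.gfa for real use.
dir=$(mktemp -d)
gfa=$dir/toy.gfa
printf 'S\ts1\tACGT\nS\ts2\tA\nS\ts3\tGATTACAGATTACA\n' > "$gfa"

node=s3   # hypothetical node ID: replace with the one you noted in Bandage
out=$dir/$node.fa
# Print the matching segment as a FASTA record; fold wraps the sequence at 60 bases per line.
awk -F'\t' -v id="$node" '$1 == "S" && $2 == id { print ">" $2; print $3 }' "$gfa" | fold -w 60 > "$out"
cat "$out"
```

Paste the printed record into the NCBI web BLAST query box.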

VG Graph Exercise

Load the VG graphs you just made (S288C.unchop.gfa and S288C.gfa) in Bandage
BLAST the CUP1 and YHR054C genes against the graph, and view the BLAST hits.

7.4 BLAST the graph manually

You can also BLAST the graph on the Linux command line by first converting it to FASTA format. Double-check that you are still in the ~/viz/ directory; if not, cd into it.

Create a FASTA file containing the graph's segment (node) sequences

gfatools gfa2fa yprp/example/S288C.SK1.minigraph.gfa > S288C.SK1.fa
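A quick sanity check after the conversion can catch a truncated or incomplete output file. If gfa2fa writes one FASTA record per segment (our understanding of its default behavior), the GFA and the FASTA should agree on the number of segments/records and on the total number of bases. check_gfa2fa is a hypothetical helper written for this sketch, shown here on made-up toy files.

```shell
# Hypothetical helper: compare segment count and total bases in a GFA ($1)
# with record count and total bases in a FASTA ($2). Returns 0 if they match.
check_gfa2fa() {
  gfa_stats=$(awk -F'\t' '$1 == "S" { n++; b += length($3) } END { print n + 0, b + 0 }' "$1")
  fa_stats=$(awk '/^>/ { n++; next } { b += length($0) } END { print n + 0, b + 0 }' "$2")
  echo "GFA segments, bases: $gfa_stats | FASTA records, bases: $fa_stats"
  [ "$gfa_stats" = "$fa_stats" ]
}

# Demo on toy files (not course data): two segments, 6 bases in total.
dir=$(mktemp -d)
printf 'H\tVN:Z:1.0\nS\ts1\tACGT\nS\ts2\tGG\n' > "$dir/toy.gfa"
printf '>s1\nACGT\n>s2\nGG\n' > "$dir/toy.fa"
check_gfa2fa "$dir/toy.gfa" "$dir/toy.fa" && echo "conversion looks complete"
```

On the course files, the call would be: check_gfa2fa yprp/example/S288C.SK1.minigraph.gfa S288C.SK1.fa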

Build a BLAST database from the FASTA file

makeblastdb -in S288C.SK1.fa -input_type fasta -dbtype nucl

-in S288C.SK1.fa: the file to build the database from
-input_type fasta: the input file is in FASTA format
-dbtype nucl: the sequences in the input file are nucleotide (DNA)

Query the database for CUP1 and YHR054C

blastn -outfmt 6 -db S288C.SK1.fa -query yprp/CUP1/cup1.yhr054c.fa > genesXpangenomefasta.txt
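With -outfmt 6, each line of genesXpangenomefasta.txt is one hit, with 12 tab-separated columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. Here sseqid is the graph node the hit landed on. The sketch below counts how many distinct nodes each query gene hits at 95% identity or more; the threshold is an arbitrary choice for illustration. The table rows are dummy values invented to show the format, and geneA/geneB are placeholders for the IDs in cup1.yhr054c.fa. One gene copy can be split across several adjacent nodes, so node counts are a starting point for the "how many copies?" question, not the answer.

```shell
# Dummy -outfmt 6 rows (made up, for illustration only). To use your real results,
# set hits=genesXpangenomefasta.txt instead.
dir=$(mktemp -d)
hits=$dir/dummy_hits.tsv
printf 'geneA\ts10\t100.00\t600\t0\t0\t1\t600\t1\t600\t0.0\t1108\n' > "$hits"
printf 'geneA\ts25\t99.50\t600\t3\t0\t1\t600\t1\t600\t0.0\t1092\n' >> "$hits"
printf 'geneA\ts25\t88.00\t200\t24\t0\t1\t200\t50\t250\t1e-50\t200\n' >> "$hits"
printf 'geneB\ts40\t100.00\t900\t0\t0\t1\t900\t1\t900\t0.0\t1663\n' >> "$hits"

# Column 1 = query (qseqid), column 2 = graph node (sseqid), column 3 = percent identity (pident).
# Count distinct nodes per query among hits with pident >= 95.
summary=$(awk -F'\t' '
  $3 >= 95 && !(($1, $2) in seen) { seen[$1, $2] = 1; n[$1]++ }
  END { for (q in n) print q, n[q] }' "$hits" | LC_ALL=C sort)
echo "$summary"
```

With the dummy rows this prints geneA 2 and geneB 1. The 88% hit on s25 is filtered out, and the second geneA hit on s25 is not counted twice.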

VG Graph Exercise

Repeat the steps above to BLAST the CUP1 and YHR054C genes against the VG graphs (S288C.unchop.gfa and S288C.gfa).