Chapter 7 Visualizing Pangenome Graphs

Compressed de Bruijn graphs “hairballs”: https://academic.oup.com/bioinformatics/article/30/24/3476/2422268

7.1 Set up Directories

  1. Make sure you’re working in a screen

  2. Make directory

mkdir ~/viz
  3. Navigate to the directory
cd ~/viz
  4. Link to data
ln -s /home/data/pangenomics-2503/yprp/ .
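
For step 1, one quick way to confirm that the shell is running inside a screen session is to check the STY environment variable, which GNU screen sets for the shells it starts. This is a minimal sketch that assumes GNU screen (not tmux); the helper name in_screen is made up for illustration.

```shell
# Hypothetical helper: succeeds only when the shell was started by GNU screen,
# which exports STY (the session name) to the shells it launches.
in_screen() {
    [ -n "${STY:-}" ]
}

if in_screen; then
    echo "inside screen session: $STY"
else
    echo "not inside a screen session"
fi
```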

7.2 Graphical Fragment Assembly (GFA) format

  • Originally developed for representing genomes during assembly
  • Now used for pangenomics
  • More on this later…
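
As a rough illustration of the format: a GFA file is tab-delimited text in which the first field of each line is a record type, for example H for the header, S for a segment (a node and its sequence), and L for a link (an edge) between two segments. The sketch below tallies the record types in a file with awk. The function name gfa_record_counts is an assumption made for this example, not part of any tool used in this chapter.

```shell
# Hypothetical helper: count how many lines of each record type a GFA file has.
# Prints one "TYPE COUNT" pair per line, sorted by record type.
gfa_record_counts() {
    awk -F'\t' 'NF > 0 { n[$1]++ } END { for (t in n) print t, n[t] }' "$1" | sort
}
```

Once the data are linked as in section 7.1, this could be run as gfa_record_counts yprp/example/S288C.SK1.minigraph.gfa to get the number of segments and links in the example graph.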

7.3 Bandage

  • BLAST integration
    • Can build a local BLAST database of the graph
    • Can do a web BLAST search with sequences from nodes
  • More details on making CSV labels: https://github.com/rrwick/Bandage/wiki/CSV-labels

Group exercise:

  1. Copy the following example graph from inbre to your computer using a GUI or the scp command below.
scp -P 2503 jm@inbre.ncgr.org:~/viz/yprp/example/S288C.SK1.minigraph.gfa .
  2. Open Bandage and load the graph
  3. Spend some time exploring the graph
  4. If you were able to install BLAST, find CUP1 and YHR054C via BLAST. You can download them from the links, or use the FASTA file on the server that contains both gene sequences (/home/data/pangenomics-2503/yprp/CUP1/cup1.yhr054c.fa)
  • How many copies are in the graph?
  • What does the structure that contains them look like?
  • Take a screenshot of the region that CUP1 is in with the gene colored

Group Exercise

  1. Find an interesting structure in Bandage
  2. Get its node ID(s)
  3. Do a web BLAST search
  4. Share it with the group

VG Graph Exercise

  1. Take a look at the VG graphs you just made (S288C.unchop.gfa and S288C.gfa) in Bandage
  2. BLAST the CUP1 and YHR054C genes against the graph, and view the BLAST hits.

7.4 BLAST the graph manually

You can also BLAST the graph on the Linux command line by first converting the graph to FASTA format. Double-check that you are still in the ~/viz/ directory; if not, cd into it.

Create a FASTA file containing the graph sequence

gfatools gfa2fa yprp/example/S288C.SK1.minigraph.gfa > S288C.SK1.fa
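
To see what this conversion amounts to, the sketch below writes each segment (S line) of a GFA file as a FASTA record, using the segment name as the header. It is an illustration only, not the gfatools implementation, and its output may differ from gfa2fa in details such as line wrapping. The function name gfa_segments_to_fasta is made up for this example.

```shell
# Hypothetical helper: emit every GFA segment as a FASTA record.
# On an S line, field 2 is the segment name and field 3 is its sequence.
gfa_segments_to_fasta() {
    awk -F'\t' '$1 == "S" { print ">" $2; print $3 }' "$1"
}
```

Because each FASTA record is named after a segment, the subject IDs reported by BLAST should correspond to node names in the graph.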

Build a BLAST database for the FASTA

makeblastdb -in S288C.SK1.fa -input_type fasta -dbtype nucl
  • -in S288C.SK1.fa
    • the file to build a database for
  • -input_type fasta
    • the input file is a FASTA
  • -dbtype nucl
    • the type of sequence in the input file is DNA

Query the database for CUP1 and YHR054C

blastn -outfmt 6 -db S288C.SK1.fa -query yprp/CUP1/cup1.yhr054c.fa > genesXpangenomefasta.txt
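
With -outfmt 6, blastn writes tab-separated text with, by default, twelve columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. As a minimal sketch of how to summarize such a file, the function below counts hit lines per query gene. The function name blast_hits_per_query is an assumption made for this example.

```shell
# Hypothetical helper: count BLAST hits per query in default -outfmt 6 output.
# Column 1 is the query ID (qseqid); prints "QUERY COUNT" sorted by query.
blast_hits_per_query() {
    awk -F'\t' 'NF > 0 { n[$1]++ } END { for (q in n) print q, n[q] }' "$1" | sort
}
```

For the output above, this could be run as blast_hits_per_query genesXpangenomefasta.txt. The number of hit lines is only a starting point for counting gene copies, since one copy can produce several partial hits.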

VG Graph Exercise

BLAST the CUP1 and YHR054C genes against the VG graphs (S288C.unchop.gfa and S288C.gfa).