Chapter 8 Metagenome Assembly
Let’s assemble the Oxford Nanopore (ONT) reads from the red alder nodules. We will use the reads from which red alder sequences have been removed, so that only the microbes are assembled. We’ll use the metagenomics mode of the Flye assembler (metaFlye).
Make a directory to store the Red Alder reference genome
mkdir -p /home/username/minion/fasta
Copy or create a soft link to the Red Alder genome
ln -s /home/data/metagenomics-2310/red-alder-reads/red-alder-genome.fasta \
/home/username/minion/fasta
Export the metaFlye environment to your path
export PATH=/sw7/compbio/flye/Flye/bin:$PATH
Make a directory to store your output files, something like:
mkdir -p /home/username/minion/flye
Assemble the metagenome with Flye. This took ~17 minutes.
flye --nano-raw /home/username/minion/fastq/B15_unmap_sort.fastq \
-i 1 --meta -t 8 -g 45m -o /home/username/minion/flye
Deactivate PYCOQC_ENV
deactivate
Assess the Quality of the Root Nodule Metagenome Assemblies with CheckM
Activate CheckM environment (CHECKM_ENV)
source /sw7/compbio/environments/CHECKM_ENV/bin/activate
Export CheckM dependencies to your path
export PATH=/sw7/compbio/pplacer/pplacer-Linux-v1.1.alpha19:$PATH
export PATH=/sw7/compbio/hmmer/hmmer-3.3/bin:$PATH
export PATH=/sw7/compbio/prodigal/Prodigal-v.2.6.3:$PATH
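After exporting these paths, it can be worth confirming that the shell actually finds each dependency before starting a long run. The following is a minimal sketch: the helper name check_tools is hypothetical, and the executable names passed to it (checkm, pplacer, hmmsearch, prodigal) are assumed to be the binaries provided by the directories above.

```shell
# check_tools: report whether each named executable is on PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names for CheckM and its dependencies.
check_tools checkm pplacer hmmsearch prodigal || echo "Fix PATH before running CheckM."
```

If any tool is reported as missing, re-check the corresponding export line for typos before continuing.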
Make a directory to store the CheckM results
mkdir -p /home/username/minion/checkm
CheckM at the genus level
checkm taxonomy_wf genus Frankia /home/username/minion/flye \
/home/username/minion/checkm -x fasta -t 8 -f \
/home/username/minion/checkm/B15_tax_results.txt
Plot the genome completeness, contamination and strain heterogeneity stats.
checkm marker_plot -x fasta --image_type png --dpi 600 \
--font_size 14 --height 5 \
/home/username/minion/checkm \
/home/username/minion/flye \
/home/username/minion/checkm
Let’s look at the assembly metrics generated by CheckM
less /home/username/minion/checkm/storage/bin_stats.analyze.tsv
Download the marker_plot to your computer
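Alongside the CheckM metrics, basic contiguity statistics can be computed directly from an assembly FASTA file. The sketch below is a hypothetical helper (fasta_stats is not part of Flye or CheckM) that reports the number of contigs, the total length and the N50 using awk and sort. It is demonstrated on a small made-up FASTA file; in this tutorial it could be pointed at /home/username/minion/flye/assembly.fasta instead.

```shell
# fasta_stats: print contig count, total length and N50 for a FASTA file.
# Step 1 (first awk): emit one length per contig, summing wrapped lines.
# Step 2 (sort): order the lengths from largest to smallest.
# Step 3 (second awk): N50 is the contig length at which the running sum
#                      first reaches half of the total assembly length.
fasta_stats() {
    awk '/^>/ { if (seen) print len; seen = 1; len = 0; next }
         { len += length($0) }
         END { if (seen) print len }' "$1" |
    sort -rn |
    awk '{ lens[NR] = $1; total += $1 }
         END {
             run = 0; n50 = 0
             for (i = 1; i <= NR; i++) {
                 run += lens[i]
                 if (run * 2 >= total) { n50 = lens[i]; break }
             }
             printf "contigs=%d total_bp=%d N50=%d\n", NR, total, n50
         }'
}

# Made-up demo assembly with contigs of 8, 6, 4 and 2 bp
# (the first contig is wrapped over two lines).
demo=$(mktemp)
printf '>c1\nACGT\nACGT\n>c2\nACGTAC\n>c3\nACGT\n>c4\nAC\n' > "$demo"
fasta_stats "$demo"
```

For the demo file this prints contigs=4 total_bp=20 N50=6: the two largest contigs (8 + 6 bp) are the first to cover at least half of the 20 bp total.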
Taxonomic Classification of Assembled Metagenome Contigs Using Centrifuge
In this portion of the tutorial we will use the pre-built indexes that contain complete RefSeq genomes for archaea, bacteria, viruses and fungi.
• Classify reads with Centrifuge.
• command flags:
-x Path to the indexes
– Index location /home/metag/centrifuge/complete_genomes/Arc_Bac_Vir_Hum_Eupath_v2
-1 Read 1 fastq file
-2 Read 2 fastq file
-t Print wall-clock time taken by search phase. This is optional.
-p Number of alignment threads to launch
--met-file Send metrics to the file at the given path
# Run Centrifuge on the assembled contigs.
centrifuge -x /home/metag/centrifuge/complete_genomes/Arc_Bac_Vir_Hum_Eupath_v2 -f \
-U /home/username/minion/flye/assembly.fasta \
-t -p 16 \
--met-file /home/username/minion/centrifuge/B15_sample_assem_met.txt \
--met-stderr \
-S /home/username/minion/centrifuge/B15_assem.centrifuge.out \
--report-file /home/username/minion/centrifuge/B15_assem_report.tsv
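Centrifuge is run in this tutorial with -f, which tells it to expect FASTA input, so the raw reads are converted from FASTQ to FASTA with awk before they are classified. That conversion relies on every FASTQ record occupying exactly four lines. The sketch below applies the same idea to a tiny made-up FASTQ file so the result can be inspected; the function name fastq_to_fasta and the demo reads are illustrative assumptions.

```shell
# fastq_to_fasta: convert a 4-line-per-record FASTQ file to FASTA on stdout.
# Line 1 of each record is the header ("@id ..."), line 2 is the sequence;
# lines 3 ("+") and 4 (qualities) are skipped.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { header = substr($0, 2) }   # drop the leading "@"
         NR % 4 == 2 { print ">" header; print $0 }' "$1"
}

# Made-up two-read FASTQ for demonstration only.
demo_fq=$(mktemp)
printf '@read1 demo\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > "$demo_fq"
fastq_to_fasta "$demo_fq"
```

A FASTQ file whose sequences are wrapped over several lines would not be converted correctly by this approach, so it is worth checking that the line count of the input is a multiple of four.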
# Convert the FASTQ reads to FASTA.
awk 'NR%4==1{a=substr($0,2);}NR%4==2{print ">"a"\n"$0}' \
/home/username/minion/fastq/B15_unmap_sort.fastq \
> /home/username/minion/fasta/B15_unmap_sort.fasta
# Run Centrifuge.
centrifuge -x /home/metag/centrifuge/complete_genomes/Arc_Bac_Vir_Hum_Eupath_v2 -f \
-U /home/username/minion/fasta/B15_unmap_sort.fasta \
-t -p 4 \
--met-file /home/username/minion/centrifuge/B15_reads_met.txt \
--met-stderr \
-S /home/username/minion/centrifuge/B15_reads.centrifuge.out \
--report-file /home/username/minion/centrifuge/B15_reads_report.tsv
Convert the Centrifuge Output to a Kraken Style Report
• Create a Kraken style report from the Centrifuge output using the centrifuge-kreport command. This command used about 17 GB of memory and took about 1 minute per file to complete during testing.
• command flags:
-x Path to the indexes.
# Use a for loop to convert the centrifuge output files to
# kraken style reports.
for cent in /home/username/minion/centrifuge/*.out; do \
centrifuge-kreport -x /home/metag/centrifuge/complete_genomes/Arc_Bac_Vir_Hum_Eupath_v2 \
"$cent" > "${cent}.krn"; done
Visualize the Kraken Reports with Pavian
• First, download your *.krn file(s) to your computer by copying the files into your www directory or using secure copy (scp) with unix, linux, or git bash terminals.
• Install Pavian on your computer by following the instructions below.
• This assumes that you have installed R on your computer.
• Install Pavian if you haven’t done so.
# Copy the kraken style reports to your computer.
scp -oPort=44111 jsena@gateway.training.ncgr.org:/home/jsena/minion/centrifuge/*.krn .
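Before loading the reports into Pavian, a quick command-line check can confirm that a report contains the taxa of interest. Kraken-style reports are tab-separated with six columns: percent of reads in the clade, reads in the clade, reads assigned directly to the taxon, rank code, NCBI taxonomy ID, and an indented scientific name. The sketch below uses a hypothetical helper, krn_find, on a made-up three-line report; the counts and taxonomy IDs are placeholders, not results from this dataset.

```shell
# krn_find: print rank code, clade read count and name for every taxon
# whose name contains the given pattern (case-insensitive).
krn_find() {
    awk -F '\t' -v pat="$2" '
        {
            name = $6
            sub(/^ +/, "", name)              # strip the indentation
            if (index(tolower(name), tolower(pat)) > 0)
                printf "%s\t%s\t%s\n", $4, $2, name
        }' "$1"
}

# Made-up report for demonstration only; taxonomy IDs are placeholders.
demo_krn=$(mktemp)
printf '50.00\t50\t0\tD\t111\t  Bacteria\n'            >  "$demo_krn"
printf '30.00\t30\t5\tG\t222\t    Frankia\n'           >> "$demo_krn"
printf '10.00\t10\t10\tS\t333\t      Frankia alni\n'   >> "$demo_krn"
krn_find "$demo_krn" frankia
```

With the files produced by the for loop above, a call such as krn_find B15_reads.centrifuge.out.krn Frankia would list the Frankia entries in the read-level report.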
# Run R by opening a terminal
R
# Or click on your R application from R-studio or R-console.
if (!require(remotes)) { install.packages("remotes") }
## Loading required package: remotes
remotes::install_github("fbreitwieser/pavian")
## Skipping install of 'pavian' from a github remote, the SHA1 (81d784d8) has not changed since last install. Use 'force = TRUE' to force installation
• Start Pavian in your browser from R
pavian::runApp(port=5000)
• A web browser should pop up that looks like Figure 1. If not, Pavian can be accessed by entering this url into your browser. http://127.0.0.1:5000
• Load data into Pavian by clicking on the “Upload sample set” tab on the left hand side of the screen. See Figure 2. A new window should appear. Select your data files from the new window and click “Open”.
• Click on the “Results Overview” tab on the left side of the screen. You should see a page that looks like Figure 3.
• Click on the “Sample” tab on the left side of the screen. You should see a page that looks like Figure 4. You will notice an interactive Sankey chart that displays different classified taxa. You can click and drag nodes to reorder them if you want. You can select different samples by clicking on the down arrow next to the “Select sample” header.
• From the “Sample” page, click on the “Table” tab. You should see an interactive, filterable and searchable table that summarizes the results of Centrifuge. For example, type “Frankia” into the “Search:” box.
Figure 1: Pavian Browser Interface
Figure 2: Load Data into Pavian
Figure 3: Result Overview Page
Figure 4: Sample Page
Figure 5: Sample Page Table View

The tree should pop up. The cladogram just shows the groupings. If you choose real, it will change the branch lengths to the actual branch lengths.

What do you notice about the tree?
Possible answers:
The branch lengths are really small indicating that everything is closely related.
Alpha and Omicron are the most closely related, at least according to this tree.
conda activate raxml-ng