Chapter 8 PacBio Shotgun Assembly and Binning

8.1 Data

We’ll use one of the PacBio shotgun metagenome samples obtained from European beech (Fagus sylvatica L.) deadwood. See the “Additional Data” chapter for more information about the study and the additional samples and sequence data types available for practice.

The data for SRR28211699 have already been downloaded (/home/data/metagenomics/pacbio_shotgun/SRR28211699.fastq.gz). The sample was collected from the deadwood of a tree that died between 20 and 41 years before collection. It contains ~5.4 Gb of sequence data.
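
To check the read count and total bases of a FASTQ file yourself, a small helper along these lines can be used. This is a minimal sketch: fastq_stats is a hypothetical function name (not part of any installed tool), and it assumes standard four-line FASTQ records with unwrapped sequence lines.

```shell
# fastq_stats FILE
# Print "reads<TAB>bases" for a gzipped FASTQ file.
# Assumption: every record is exactly 4 lines, so line 2 of each
# group of 4 is the sequence.
fastq_stats() {
  gzip -dc "$1" | awk '
    NR % 4 == 2 { reads++; bases += length($0) }
    END { printf "%d\t%.0f\n", reads, bases }'
}
```

It can then be called with the path to the sample, for example: fastq_stats /home/data/metagenomics/pacbio_shotgun/SRR28211699.fastq.gz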

8.2 Assembly

Open a screen session. We can reconnect to the assembly screen we made earlier.

screen -dr assembly

Set up a workspace.

mkdir -p ~/deadwood-assembly/SRR28211699
cd ~/deadwood-assembly/SRR28211699

Activate the environment.

conda activate flye

Assemble the sample with metaFlye.

flye --pacbio-hifi /home/data/metagenomics/pacbio_shotgun/SRR28211699.fastq.gz \
  --iterations 1 --meta --threads 20 -o .

Compress the assembly. This replaces assembly.fasta with a file called assembly.fasta.gz.

gzip assembly.fasta

Check the number of sequences.

zgrep -c '>' assembly.fasta.gz
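
Beyond the sequence count, the total and longest contig lengths give a quick sense of the assembly. The helper below is a minimal sketch; assembly_stats is a hypothetical name introduced here for illustration, and the function only assumes a gzipped FASTA file as input.

```shell
# assembly_stats FILE
# Print "contigs<TAB>total_bp<TAB>longest_bp" for a gzipped FASTA file.
assembly_stats() {
  gzip -dc "$1" | awk '
    /^>/ {                       # header line: close out the previous contig
      if (len > max) max = len
      n++; len = 0; next
    }
    { len += length($0); total += length($0) }   # sequence line
    END {
      if (len > max) max = len   # account for the last contig
      printf "%d\t%.0f\t%.0f\n", n, total, max
    }'
}
```

For this chapter it would be run as: assembly_stats assembly.fasta.gz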

At this point, you could check the assembly with CheckM as we did in the previous chapter. Instead, let’s go ahead with the binning and then run CheckM on the bins.

8.3 Binning

We will use MetaBAT2 (Metagenome Binning based on Abundance and Tetranucleotide frequency) to bin our assembly sequences by organism. MetaBAT2 uses tetranucleotide frequencies and abundance (coverage) to group contigs that likely come from the same organism. It is installed in the flye environment, which should already be activated.

The paper:
https://pmc.ncbi.nlm.nih.gov/articles/PMC6662567/

Best binning practices:
https://bitbucket.org/berkeleylab/metabat/wiki/Best%20Binning%20Practices

Make a directory for the binning output.

mkdir metabat2_bins

Bin the assembly contigs.

Parameters:
--inFile (-i)    Assembly contigs (gzipped fasta)
--outFile (-o)    Basename and path for output bins
--minContig (-m)    Minimum contig size for binning (default: 2500; must be >=1500)
--verbose (-v)    Gives more output information

runMetaBat.sh --minContig 1500 \
  assembly.fasta.gz readsxassembly.sort.bam

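Get the first 3 columns (sequence name, length, and coverage) from Flye’s assembly_info.txt file to use as a depth file, then run MetaBAT2 with it. The --cvExt option tells MetaBAT2 that the coverage file comes from a third-party tool and has no variance columns.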

cut -f 1-3 assembly_info.txt > depth_file.txt
metabat2 -i assembly.fasta.gz -o metabat2_bins/bin --cvExt -a depth_file.txt --minContig 1500 --verbose
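
If the binning step complains about the depth file, a quick structural check can help narrow things down. The sketch below only verifies that every line has exactly three tab-separated columns; check_depth_file is a hypothetical helper name, and the check says nothing about whether the coverage values themselves are sensible.

```shell
# check_depth_file FILE
# Report the number of lines and how many do not have exactly
# 3 tab-separated columns. Returns non-zero if any line is malformed.
check_depth_file() {
  awk -F '\t' '
    NF != 3 { bad++ }
    END { printf "lines=%d bad=%d\n", NR, bad; exit (bad > 0) }' "$1"
}
```

Here it would be run as: check_depth_file depth_file.txt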

8.4 Quality Check

Now try running CheckM on the bins.
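
Before the quality check, it can help to confirm which bins were written and how many contigs each holds. This is a minimal sketch, assuming the bins are FASTA files with a .fa extension in the output directory; bin_summary is a hypothetical helper name.

```shell
# bin_summary DIR
# Print "file<TAB>contigs" for each *.fa bin found in DIR.
bin_summary() {
  for f in "$1"/*.fa; do
    [ -e "$f" ] || continue      # skip if the glob matched nothing
    printf '%s\t%d\n' "$(basename "$f")" "$(grep -c '^>' "$f")"
  done
}
```

For this chapter it would be run as: bin_summary metabat2_bins. Because these bins end in .fa, CheckM’s -x option may be needed if the command from the previous chapter relied on a different file extension.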