Chapter 8 PacBio Shotgun Assembly and Binning
8.1 Data
We’ll use one of the PacBio shotgun metagenome samples obtained from European beech (Fagus sylvatica L.) deadwood. The “Additional Data” chapter has more information about the study, along with additional samples and sequence data types that are available for practice.
The data for SRR28211699 has been downloaded (/home/data/metagenomics/pacbio_shotgun/SRR28211699.fastq.gz). It is a sample collected from deadwood from a tree that died between 20 and 41 years before collection. It has ~5.4Gb of data.
8.2 Assembly
Open a screen session. We can simply reconnect to the assembly screen we made earlier.
Set up a workspace.
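A minimal sketch of this step, assuming a working directory under your home folder. The directory name `pacbio_assembly` and the `WORKDIR` variable are placeholders rather than names from this course, so substitute whatever location you prefer.

```shell
# Hypothetical workspace location; change it to suit your own layout.
WORKDIR="${WORKDIR:-$HOME/pacbio_assembly}"

# -p creates parent directories as needed and does not fail if the directory exists.
mkdir -p "$WORKDIR"
cd "$WORKDIR"
```

Because the flye command in this section writes its output to `.`, the results land in whichever directory you are in when you run it.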
Activate the environment.
Assemble the sample with metaFlye.
flye --pacbio-hifi /home/data/metagenomics/pacbio_shotgun/SRR28211699.fastq.gz \
--iterations 1 --meta --threads 20 -o .
Compress the output. This will create a file called assembly.fasta.gz.
Check the number of sequences.
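The compression and counting steps could be sketched as follows, assuming Flye wrote `assembly.fasta` into the current directory. The `ASSEMBLY` variable and the `count_fasta_records` helper are illustrative names, not part of Flye.

```shell
# Sketch only: assumes assembly.fasta is in the current directory.
ASSEMBLY="${ASSEMBLY:-assembly.fasta}"   # hypothetical variable name

# Compress in place; gzip replaces assembly.fasta with assembly.fasta.gz.
if [ -f "$ASSEMBLY" ]; then
    gzip "$ASSEMBLY"
fi

# Count sequences: every FASTA record starts with a ">" header line.
count_fasta_records() {
    gzip -cd "$1" | grep -c '^>'
}

# Report the count only if the compressed assembly is present.
if [ -f "$ASSEMBLY.gz" ]; then
    count_fasta_records "$ASSEMBLY.gz"
fi
```

Anchoring the pattern with `^` counts only header lines, so a `>` appearing elsewhere in a line would not inflate the total.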
At this point, you could check the assembly with CheckM as we did in the previous chapter. Instead, let’s go ahead and do the binning first, then run CheckM on the bins.
8.3 Binning
We will use MetaBAT2 (Metagenome Binning based on Abundance and Tetranucleotide frequency) to bin our assembly sequences by organism. It is in the flye environment, which should already be activated. MetaBAT2 uses tetranucleotide frequencies and abundance (coverage) to group sequences that come from the same organism.
The paper:
https://pmc.ncbi.nlm.nih.gov/articles/PMC6662567/
Best binning practices:
https://bitbucket.org/berkeleylab/metabat/wiki/Best%20Binning%20Practices
Make a directory for the binning output.
Bin the assembly contigs.
Parameters
--inFile Assembly contigs (gzipped fasta)
--outFile Basename and path for output bins
--minContig Minimum contig size for binning (default: 2500; must be >=1500)
--verbose Gives more output information
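Put together, the directory creation and binning steps might look like the sketch below. The directory name `bins`, the basename `bin`, and the input name `assembly.fasta.gz` in the current directory are assumptions. The four options are the ones listed above, with the minimum contig size left at its default.

```shell
# Hypothetical names: output directory "bins" and bin basename "bin".
BIN_DIR="${BIN_DIR:-bins}"
ASSEMBLY_GZ="${ASSEMBLY_GZ:-assembly.fasta.gz}"

mkdir -p "$BIN_DIR"

# Run only if metabat2 is on the PATH and the compressed assembly exists.
if command -v metabat2 >/dev/null 2>&1 && [ -f "$ASSEMBLY_GZ" ]; then
    metabat2 \
        --inFile "$ASSEMBLY_GZ" \
        --outFile "$BIN_DIR/bin" \
        --minContig 2500 \
        --verbose
fi
```

With this basename, MetaBAT2 should write one FASTA file per bin inside `bins/`, each name starting with `bin`.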
Get the first 3 columns from the assembly_info.txt file.
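One way to do this, assuming `assembly_info.txt` is the tab-separated summary that Flye writes next to the assembly (its first three columns should be sequence name, length, and coverage). The helper name `first_three_columns` and the output file name `assembly_info_3col.txt` are placeholders.

```shell
# Print the first three tab-separated columns of a file.
first_three_columns() {
    cut -f 1-3 "$1"
}

# Hypothetical output name; run only if Flye's summary file is present.
if [ -f assembly_info.txt ]; then
    first_three_columns assembly_info.txt > assembly_info_3col.txt
    head -n 5 assembly_info_3col.txt   # quick look at the result
fi
```

`cut` splits on tabs by default, so no delimiter option is needed for a tab-separated file.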