Chapter 7 Nanopore Read QC
7.1 Quality Control
- “PycoQC computes metrics and generates interactive QC plots for Oxford Nanopore technologies sequencing data” (https://a-slide.github.io/pycoQC/)

- What do we need in order to run a quality control check?
  - Sequencing summary file
    - Automatically produced by the MinION basecaller.
  - PycoQC package + dependencies
    - Already downloaded for you on Ghostwheel.
  - A line of code to produce the .html file.
  - A line of code to secure copy the file to your computer.
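To give a feel for the kind of metrics a QC report summarizes, here is a minimal Python sketch that reads a sequencing summary table and computes total bases, read N50, and an average quality score. It is not how PycoQC itself is implemented. The column names `sequence_length_template` and `mean_qscore_template` are assumptions about the summary file layout, the three reads are invented toy data, and taking a plain average of per-read Q-scores is a simplification.

```python
import csv
import io

# Toy stand-in for a tab-separated sequencing summary file.
# The values are invented for illustration; the column names are an assumption.
TOY_SUMMARY = (
    "read_id\tsequence_length_template\tmean_qscore_template\n"
    "read_a\t1000\t9.5\n"
    "read_b\t3000\t11.0\n"
    "read_c\t6000\t12.5\n"
)


def read_lengths_and_qscores(handle):
    """Return (lengths, qscores) parsed from a tab-separated summary table."""
    reader = csv.DictReader(handle, delimiter="\t")
    lengths, qscores = [], []
    for row in reader:
        lengths.append(int(row["sequence_length_template"]))
        qscores.append(float(row["mean_qscore_template"]))
    return lengths, qscores


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # only reached when there are no reads


lengths, qscores = read_lengths_and_qscores(io.StringIO(TOY_SUMMARY))
total_bases = sum(lengths)
read_n50 = n50(lengths)
mean_q = sum(qscores) / len(qscores)

print("reads:", len(lengths))
print("total bases:", total_bases)
print("read N50:", read_n50)
print("mean Q-score:", round(mean_q, 2))
```

To try this on a real file, replace `io.StringIO(TOY_SUMMARY)` with an opened file handle, after checking that your summary file actually uses these column names.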
7.2 PycoQC package & code
- Where is it? How to activate it?
  - Log in to Ghostwheel.
  - Stay in your home directory (check with pwd).
  - Type the following:
7.3 Secure Copy
- Which terminal do you copy from? What is the command?
  - Open a new Terminal window, but do not connect to the Linux server.
  - Type:
7.4 Open your URL
Find your file on your desktop.
Double-click to open it, or right-click to choose a browser.
7.4.1 Normalization
With normalization we are trying to get the correct relative gene expression abundances between cells.
Gene expression between cells is based on count data.
What does a count in a count matrix represent? One molecule of mRNA that was successfully:
- captured
- reverse transcribed
- sequenced
The most common normalization protocol is:
- Count depth scaling, also known as counts per million (CPM)
  - It assumes that all cells in the dataset initially contained an equal number of mRNA molecules.
  - It assumes that count depth differences arise only from sampling.
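Count depth scaling can be sketched in a few lines of Python. This is an illustration only: the function name `counts_per_million` and the toy count matrix are made up for this example, and real analyses would normally use a dedicated single-cell package rather than nested lists.

```python
def counts_per_million(counts):
    """Scale each cell (row) so that its counts sum to one million.

    `counts` is a list of cells, each a list of per-gene counts.
    Cells with a count depth of zero are returned as all zeros.
    """
    normalized = []
    for cell in counts:
        depth = sum(cell)  # count depth: total counts in this cell
        if depth == 0:
            normalized.append([0.0 for _ in cell])
        else:
            normalized.append([c * 1_000_000 / depth for c in cell])
    return normalized


# Toy data (invented): two cells, three genes.
# Cell 2 has twice the count depth of cell 1 but the same relative expression.
toy_counts = [
    [10, 30, 60],   # cell 1, count depth 100
    [20, 60, 120],  # cell 2, count depth 200
]

cpm = counts_per_million(toy_counts)
for i, cell in enumerate(cpm, start=1):
    print("cell", i, cell)
```

After scaling, both toy cells have identical values, which is exactly the assumption stated above: the difference in count depth is treated as a sampling effect, not as biology.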
Normalization complete
- But wait!
- We still have unwanted variability in the data.
- What kind of unwanted variability?
- What is the solution? Data Correction.
7.4.2 Data correction and integration
Covariates
- Biological: cell-cycle effects
- Technical: batch, dropout
Which Covariates to Consider?
- Depends on downstream analysis
- Corrections for biological and technical covariates should be considered separately
- Corrections are used for different purposes
- Each approach to correction presents unique challenges
What are the Correction methods?
- Regressing out biological effects
- Regressing out technical effects
- Batch effects and data integration
- Expression recovery
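As a rough illustration of what "regressing out" an effect means, the sketch below fits a simple least-squares line of one gene's expression against a single covariate and keeps the residuals. This is a generic linear regression, not the method of any particular tool. The function name `regress_out`, the covariate (which could stand for a cell-cycle score or count depth), and all numbers are hypothetical.

```python
def regress_out(values, covariate):
    """Return residuals of `values` after a least-squares fit on `covariate`.

    Fits values ~ intercept + slope * covariate and subtracts the fitted
    line, so the residuals carry no linear trend with the covariate.
    """
    n = len(values)
    mean_x = sum(covariate) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    if sxx == 0:
        # Constant covariate: nothing to regress on, just center the values.
        return [y - mean_y for y in values]
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]


# Toy data (invented): one gene measured in four cells.
covariate = [1.0, 2.0, 3.0, 4.0]    # hypothetical per-cell covariate score
expression = [2.1, 3.9, 6.2, 7.8]   # hypothetical normalized expression

residuals = regress_out(expression, covariate)
print([round(r, 3) for r in residuals])
```

The residuals are what remains of the gene's expression once the linear effect of the covariate is removed. Whether removing that effect is desirable depends on the downstream analysis, as noted above.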