Picard
Mark Duplicates
MarkDuplicates examines the aligned reads in SAM/BAM files, identifies duplicates, and reports the duplication level of each sample.
Figure 1 : horizontal bar plot of the different types of paired reads (as percentages)
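The duplication levels come from the metrics file that MarkDuplicates writes alongside the marked BAM. As a minimal sketch, assuming the usual layout of that file (a `## METRICS CLASS` line, a tab-separated header, one row per library, then a blank line), the table can be read like this. The function name and the shortened example text are hypothetical, and real files contain more columns.

```python
import csv
import io


def parse_duplication_metrics(text):
    """Return one dict per library from the text of a MarkDuplicates metrics file."""
    lines = text.splitlines()
    # The metrics table starts on the line after the "## METRICS CLASS" marker.
    start = next(
        i for i, line in enumerate(lines) if line.startswith("## METRICS CLASS")
    ) + 1
    table = []
    for line in lines[start:]:
        if not line.strip():  # a blank line ends the table
            break
        table.append(line)
    # All values are returned as strings; convert them as needed.
    return list(csv.DictReader(io.StringIO("\n".join(table)), delimiter="\t"))


# Hypothetical, shortened metrics text used only for illustration.
example_text = (
    "## METRICS CLASS\tpicard.sam.DuplicationMetrics\n"
    "LIBRARY\tUNMAPPED_READS\tPERCENT_DUPLICATION\n"
    "lib1\t80\t0.2\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Double\n"
    "BIN\tVALUE\n"
    "1.0\t1.0\n"
)
rows = parse_duplication_metrics(example_text)
```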
source : nf-core SAREK MultiQC
Unique Pairs refers to read pairs that were not marked as duplicates. Duplicate Pairs Optical refers to duplicate pairs caused by the Illumina sequencer incorrectly identifying one cluster as two. Duplicate Pairs refers to duplicate pairs that are not optical duplicates. These can arise from high expression of a gene, from PCR amplification, from clustering (the same cluster occupying two wells during cluster generation), or from sister duplicates, which appear when complementary strands of the same original molecule form separate clusters. Duplicate Unpaired refers to duplicate reads that have no mapped mate. Unmapped corresponds to reads that did not align, and Unique Unpaired refers to reads with no mapped mate that were not marked as duplicates.
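As a rough sketch of how these categories relate to the counts Picard reports, the snippet below splits the metrics columns into the six categories and computes the overall duplication fraction. The column names are those of Picard's DuplicationMetrics. The example numbers are made up, and counting each pair as two reads is an assumption of this sketch that may not match exactly how MultiQC draws the plot.

```python
def duplication_breakdown(m):
    """Percentage of reads in each category, counting every pair as two reads."""
    pair_dups = m["READ_PAIR_DUPLICATES"]
    optical = m["READ_PAIR_OPTICAL_DUPLICATES"]
    counts = {
        "Unique Pairs": 2 * (m["READ_PAIRS_EXAMINED"] - pair_dups),
        "Unique Unpaired": m["UNPAIRED_READS_EXAMINED"] - m["UNPAIRED_READ_DUPLICATES"],
        "Duplicate Pairs Optical": 2 * optical,
        "Duplicate Pairs Nonoptical": 2 * (pair_dups - optical),
        "Duplicate Unpaired": m["UNPAIRED_READ_DUPLICATES"],
        "Unmapped": m["UNMAPPED_READS"],
    }
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


def percent_duplication(m):
    """Fraction of examined (mapped) reads that were marked as duplicates."""
    duplicates = m["UNPAIRED_READ_DUPLICATES"] + 2 * m["READ_PAIR_DUPLICATES"]
    examined = m["UNPAIRED_READS_EXAMINED"] + 2 * m["READ_PAIRS_EXAMINED"]
    return duplicates / examined if examined else 0.0


# Hypothetical counts for one library, used only for illustration.
example = {
    "READ_PAIRS_EXAMINED": 1000,
    "READ_PAIR_DUPLICATES": 200,
    "READ_PAIR_OPTICAL_DUPLICATES": 50,
    "UNPAIRED_READS_EXAMINED": 100,
    "UNPAIRED_READ_DUPLICATES": 20,
    "UNMAPPED_READS": 80,
}
breakdown = duplication_breakdown(example)
fraction = percent_duplication(example)  # (20 + 400) / (100 + 2000) = 0.2
```

Unmapped reads appear in the breakdown but not in the duplication fraction, since only mapped reads are examined for duplicates.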
Figure 2 : duplicate causes in Illumina sequencing.
source : Illumina description
Figure 3 : horizontal bar plot of the different types of paired reads (as percentages) for multiple samples.
source : nf-core RNAseq MultiQC
Duplication rates this high are common in RNAseq data, but they are still substantial and worth noting. They may reflect the high expression of certain genes or problems related to Illumina sequencing.
For data other than RNAseq :
Figure 4 : conclusions to draw from the different plot results.
Source : nf-core eager
Fewer duplicates are expected with DNAseq.