FastQC

Per base sequence content

Analysis by base along each sequence

Figure 1: Heatmap of the samples indicating the nucleotide composition.

Source: Babraham Training Courses

Click on a sample to see the base composition along its reads:

Figure 2: Percentages of A, T, C, and G at each position in the reads of one sample.

Source: Babraham Training Courses

In a random library, the percentage of each of the four DNA bases is expected to stay about the same from one read position to the next (little or no difference along the read).

Therefore, the lines on the graph should run parallel to each other. The relative amount of each base should reflect the overall base composition of your genome, but in any case the bases should not be strongly imbalanced relative to one another.
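As an illustration of what the plot shows, the per-position percentages can be computed directly from the reads. This is a minimal sketch, not FastQC's own code: the function name `per_base_content` and the example reads are hypothetical, and excluding any base other than A, C, G, or T (such as N) from the denominator is an assumption made here.

```python
from collections import Counter

BASES = "ACGT"

def per_base_content(reads):
    """Return one dict per read position mapping A/C/G/T to a percentage.

    Assumption: characters other than A, C, G, T (e.g. N) are ignored
    when computing the percentages.
    """
    max_len = max((len(r) for r in reads), default=0)
    content = []
    for pos in range(max_len):
        # Count the bases observed at this position in reads long enough to reach it.
        counts = Counter(r[pos].upper() for r in reads if len(r) > pos)
        total = sum(counts[b] for b in BASES)
        if total == 0:
            content.append({b: 0.0 for b in BASES})
        else:
            content.append({b: 100.0 * counts[b] / total for b in BASES})
    return content

# Hypothetical toy input: four reads of length 4.
reads = ["ACGT", "ACGA", "TCGT", "GCGT"]
content = per_base_content(reads)
for pos, pct in enumerate(content, start=1):
    print(pos, pct)
```

Plotting one line per base from `content` gives the kind of graph shown in Figure 2; with real data, the four lines should stay roughly flat and parallel.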

If there are strong biases that change from one position to another, this usually indicates an overrepresented sequence that is contaminating your library. If the bias is consistent across all positions, it indicates either:

  • that the original library was biased,
  • or that there is a systematic problem during the sequencing of the library.

Warning

This module issues a warning if the difference between A and T, or G and C, is greater than 10% at any position.

Failure

This module fails if the difference between A and T, or G and C, is greater than 20% at any position.
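The warning and failure rules can be sketched as a small check that compares A with T and G with C at each position. This is a minimal sketch under stated assumptions: the function name `module_status`, the status strings, and the input layout (one dict of percentages per position) are hypothetical and are not FastQC's actual interface.

```python
WARN_THRESHOLD = 10.0  # percentage points, from the warning rule
FAIL_THRESHOLD = 20.0  # percentage points, from the failure rule

def module_status(content):
    """Classify per-position base percentages as 'pass', 'warn', or 'fail'.

    `content` is a list with one dict per position, each mapping
    'A', 'C', 'G', 'T' to a percentage (hypothetical input layout).
    """
    worst = 0.0
    for pct in content:
        # Largest A/T or G/C difference seen at any position so far.
        worst = max(worst, abs(pct["A"] - pct["T"]), abs(pct["G"] - pct["C"]))
    if worst > FAIL_THRESHOLD:
        return "fail"
    if worst > WARN_THRESHOLD:
        return "warn"
    return "pass"

# Hypothetical examples: a balanced position, a 12-point A/T gap, a 30-point A/T gap.
balanced = [{"A": 25.0, "C": 25.0, "G": 25.0, "T": 25.0}]
skewed = balanced + [{"A": 32.0, "C": 24.0, "G": 24.0, "T": 20.0}]
biased = balanced + [{"A": 50.0, "C": 15.0, "G": 15.0, "T": 20.0}]

print(module_status(balanced), module_status(skewed), module_status(biased))
```

A single bad position is enough to change the status, which is why a short biased stretch at the start of the reads can flag an otherwise clean library.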

Common reasons for warnings

Example of good data

Figure 3: example of a good result.

Good data:

  • Smooth over the read length: the lines run parallel with each other.
  • Organism dependent: the relative level of each base reflects the GC content of the genome.

Example of bad data

Figure 4: example of a bad result.

Bad data: Sequence position bias.

Figure 5: schematic of expected results.

Source: eager pipeline nf-core
