Filtered Reads
All reads pass through the fastp filter.
Here is a result you might expect for SAREK data:
Caption: Passed Filter is the number of reads that passed all of the filters. The first is a quality filter: a base counts as qualified when its Phred quality score is greater than or equal to 15, and a read is discarded when too many of its bases fall below that threshold (more than 40% with fastp's defaults). The second is a length filter based on the minimum or maximum accepted read length; it is set to 0 here but can be modified. The third is a complexity filter: if a read has too many identical nucleotides side by side, its complexity is too low. The default threshold is 30, i.e. at least 30% complexity is required to pass the filter. Low Quality is the number of reads that failed the quality filter. Too Many N is the number of reads containing too many Ns (an N is the placeholder the sequencer writes when it cannot call a base with confidence). Too Short is the number of reads below the minimum length. Too Long is the number of reads above the maximum length.
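To make the complexity threshold concrete, here is a minimal Python sketch of the measure as fastp documents it: the percentage of bases that differ from the base that follows them. The function names are hypothetical, and dividing by the number of adjacent pairs (read length minus one) is an assumption; this is an illustration, not fastp's actual implementation.

```python
def complexity_percent(read: str) -> float:
    """Percentage of bases that differ from the base immediately after them."""
    if len(read) < 2:
        # Assumption: a read too short to compare neighbours has zero complexity.
        return 0.0
    changes = sum(1 for a, b in zip(read, read[1:]) if a != b)
    # Assumption: normalise by the number of adjacent pairs (length - 1).
    return 100.0 * changes / (len(read) - 1)


def passes_complexity(read: str, threshold: float = 30.0) -> bool:
    """True if the read reaches the complexity threshold (30 is the default cited above)."""
    return complexity_percent(read) >= threshold


if __name__ == "__main__":
    # Made-up reads, for illustration only.
    for read in ("AAAAAAAA", "AAAACCCC", "ACGTACGT"):
        print(read, round(complexity_percent(read), 1), passes_complexity(read))
```

A homopolymer such as AAAAAAAA scores 0% and is rejected, while a read that changes base at every position scores 100%.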
Figure 1 shows a good result: the vast majority of reads pass the filter. For well-preserved DNA, we would expect at least 80% of reads to pass. A few reads are still rejected for low quality, too many Ns, or insufficient length, which is normal in Illumina sequencing. The absence of excessively long reads is also expected with Illumina sequencing, where the maximum read length is fixed by the number of sequencing cycles.
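The 80% guideline can be checked directly from the filtering counts. The sketch below assumes the counts are available as a dictionary shaped like the `filtering_result` section of fastp's JSON report; the key names are given to the best of our knowledge and should be verified against your own report. The numbers are invented for illustration, and the function name is hypothetical.

```python
def pass_filter_percent(filtering_result: dict) -> float:
    """Percentage of reads that passed, out of all reads counted in the dictionary."""
    total = sum(filtering_result.values())
    if total == 0:
        raise ValueError("no reads counted in filtering_result")
    return 100.0 * filtering_result["passed_filter_reads"] / total


if __name__ == "__main__":
    # Invented counts; key names are assumed to follow fastp's JSON report.
    example = {
        "passed_filter_reads": 950,
        "low_quality_reads": 30,
        "too_many_N_reads": 5,
        "too_short_reads": 15,
        "too_long_reads": 0,
    }
    percent = pass_filter_percent(example)
    verdict = "OK" if percent >= 80.0 else "below the 80% guideline"
    print(f"{percent:.1f}% of reads passed the filter: {verdict}")
```

Summing every category rather than hard-coding the keys keeps the total correct if the report contains additional failure categories.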