This function uses the mapping start position of each read together with its UMI to decide whether the read is a PCR duplicate. Reads that share both the start position and the UMI are treated as duplicates, and only one entry is kept for each such group. For paired-end data (MAPCap/RAMPAGE), only one end (R1) is kept after filtering unless `keepPairs` is set to TRUE.
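The duplicate criterion described above can be illustrated with a small base R sketch. This is not the package's implementation: the toy table and its column names (`chrom`, `start`, `umi`) are assumptions made for illustration only, and the real function works on BAM files rather than data frames.

```r
# Toy table of mapped reads; the column names are hypothetical
# and are not taken from icetea's internals.
reads <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr1", "chr2"),
  start = c(100L, 100L, 100L, 250L, 100L),
  umi   = c("ACGT", "ACGT", "TTAG", "ACGT", "ACGT"),
  stringsAsFactors = FALSE
)

# A read is flagged as a duplicate when an earlier row has the same
# mapping start position (chrom + start) and the same UMI.
is_dup <- duplicated(reads[, c("chrom", "start", "umi")])

# Keep the first read of every (position, UMI) group.
dedup <- reads[!is_dup, ]
print(dedup)
```

In this toy input only the second row is dropped: it shares both position and UMI with the first row, while the third row is kept because its UMI differs even though its position is the same.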

filterDuplicates(CSobject, outdir, ncores = 1, keepPairs = FALSE)

# S4 method for CapSet
filterDuplicates(CSobject, outdir, ncores = 1,
  keepPairs = FALSE)

Arguments

CSobject

an object of class CapSet

outdir

character. output directory for filtered BAM files

ncores

integer. Number of cores to use

keepPairs

logical. Whether to keep both reads of each pair in paired-end data. (Note: the two reads of a pair are treated as independent reads during duplicate removal.)

Value

modified CapSet object with filtering information. Filtered BAM files are saved in `outdir`.

Examples

# before running this
# 1. Create a CapSet object
# 2. de-multiplex the fastqs
# 3. map them

# load a previously saved CapSet object
cs <- exampleCSobject()
#> Checking de-multiplexed R1 reads
#> Checking de-multiplexed R2 reads
#> Checking mapped file
#> Checking de-duplicated file
# filter duplicate reads from mapped BAM files
dir.create("filtered_bam")
#> Warning: 'filtered_bam' already exists
cs <- filterDuplicates(cs, outdir = "filtered_bam")
#> Removing PCR duplicates : /data/akhtar/bhardwaj/Gitrepo/icetea/inst/extdata/bam/embryo1.bam
#> Removing PCR duplicates : /data/akhtar/bhardwaj/Gitrepo/icetea/inst/extdata/bam/embryo2.bam
#> Removing PCR duplicates : /data/akhtar/bhardwaj/Gitrepo/icetea/inst/extdata/bam/embryo3.bam
#> Removing PCR duplicates : /data/akhtar/bhardwaj/Gitrepo/icetea/inst/extdata/bam/embryo4.bam