summarizeOverlaps {GenomicRanges} | R Documentation |
summarizeOverlaps
extends findOverlaps
by providing
options to resolve reads that overlap multiple features. Each read
is counted a maximum of once.
## S4 method for signature 'GRanges,GappedAlignments' summarizeOverlaps( features, reads, mode, ignore.strand=FALSE, ...) ## S4 method for signature 'GRangesList,GappedAlignments' summarizeOverlaps( features, reads, mode, ignore.strand=FALSE, ...) ## S4 method for signature 'GRanges,GappedAlignmentPairs' summarizeOverlaps( features, reads, mode, ignore.strand=FALSE, ...) ## S4 method for signature 'GRangesList,GappedAlignmentPairs' summarizeOverlaps( features, reads, mode, ignore.strand=FALSE, ...)
features |
A GRanges or a GRangesList containing genomic
regions of interest. This will commonly be a |
reads |
A GappedAlignments (single-end), GappedAlignmentPairs (paired-end), or BamFileList or BamViews objects containing the reads to be mapped to the genomic regions of interest. The BamFileList and BamViews methods
can handle single or paired-end reads by appropriately specifying the
|
mode |
Character name of a function that defines the counting method to be used.
Modes include ‘Union’, ‘IntersectionStrict’, or ‘IntersectionNotEmpty’ and
are designed after the counting modes in the HTSeq package by Simon Anders
(see references). All methods are gap-aware and count a read a maximum of
once. A user can provide their own count function as the
|
ignore.strand |
A logical value indicating if strand should be considered when matching. |
... |
Additional arguments for other methods. If using multiple cores, you can pass arguments in here to be used by mclapply to indicate the number of cores to use etc. |
A ‘feature’ can be any portion of a genomic region such as a gene,
transcript, exon etc. When the features
argument is a
GRanges the rows define the features. The result
will be the same length as the GRanges. When
features
is a GRangesList the highest list-level
defines the features and the result will be the same length as the
GRangesList.
Each count ‘mode’ attempts to assign a read that overlapps multiple features to a single feature. If there are ranges that should be considered together (e.g., exons by transcript or cds regions by gene) the GRangesList would be appropriate. If there is no grouping in the data then a GRanges would be appropriate.
Paired-end reads should be provided in a GappedAlignmentPairs container. Paired-end reads are counted the same as single-end reads with gaps.
The BamFileList and BamViews
methods have an additional argument, singleEnd
, to indicate if
the bam files contain single or paired-end reads.
See ?summarizeOverlaps,GRanges,BamFileList-method
for
details.
A SummarizedExperiment object. The assays
slot holds
the counts, rowData
holds the features
, colData
will either be NULL
or hold any metadata that was present in the
reads
.
Valerie Obenchain <vobencha@fhcrc.org>
HTSeq : http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html
htseq-count : http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
DESeq
, DEXSeq
and edgeR
packages
BamFileList and BamViews classes
GappedAlignments and GappedAlignmentPairs classes
reads <- GappedAlignments( names = c("a","b","c","d","e","f","g"), seqnames = Rle(c(rep(c("chr1", "chr2"), 3), "chr1")), pos = as.integer(c(1400, 2700, 3400, 7100, 4000, 3100, 5200)), cigar = c("500M", "100M", "300M", "500M", "300M", "50M200N50M", "50M150N50M"), strand = strand(rep("+", 7))) ## --------------------------------------------------------------------- ## 'features' as GRanges ## features <- GRanges( seqnames = c(rep("chr1", 7), rep("chr2", 4)), strand = "+", ranges = IRanges(c(1000, 3000, 3600, 4000, 4000, 5000, 5400, 2000, 3000, 7000, 7500), width = c(500, 500, 300, 500, 900, 500, 500, 900, 500, 600, 300)), group_id = c("A", "B", "C", "C", "D", "D", "E", "F", "G", "H", "H")) ## Each row is a feature the read can 'hit'. rowsAsFeatures <- data.frame( union = assays(summarizeOverlaps(features, reads))$counts, intStrict = assays(summarizeOverlaps(features, reads, mode="IntersectionStrict"))$counts, intNotEmpty = assays(summarizeOverlaps(features, reads, mode="IntersectionNotEmpty"))$counts, countOverlaps = countOverlaps(features, reads)) ## Results from countOverlaps() are included to highlight how ## summarizeOverlaps() counts a read only once. For these 7 ## reads, 'Union' is the most conservative counting mode, followed ## by 'Intersectionstrict' and then 'IntersectionNotEmpty'. colSums(rowsAsFeatures) ## --------------------------------------------------------------------- ## 'features' as GRangesList ## ## Each highest list-level is a feature the read can 'hit'. lst <- split(features, mcols(features)[["group_id"]]) listAsFeatures <- data.frame( union = assays(summarizeOverlaps(lst, reads))$counts, intStrict = assays(summarizeOverlaps(lst, reads, mode="IntersectionStrict"))$counts, intNotEmpty = assays(summarizeOverlaps(lst, reads, mode="IntersectionNotEmpty"))$counts, countOverlaps = countOverlaps(lst, reads)) ## --------------------------------------------------------------------- ## 'reads' as BamFileList ## ## Examples of using summarizeOverlaps with Bam files are found ## at ?BamFile or ?'summarizeOverlaps,GRanges,BamFileList-method' ## Count BAM files and prepare output for DESeq or edgeR analysis library(Rsamtools) fls <- list.files(system.file("extdata",package="GenomicRanges"), recursive=TRUE, pattern="*bam$", full=TRUE) names(fls) <- basename(fls) bf <- BamFileList(fls, index=character()) features <- GRanges( seqnames = c(rep("chr2L", 4), rep("chr2R", 5), rep("chr3L", 2)), ranges = IRanges(c(1000, 3000, 4000, 7000, 2000, 3000, 3600, 4000, 7500, 5000, 5400), width=c(rep(500, 3), 600, 900, 500, 300, 900, 300, 500, 500)), "-", group_id=c(rep("A", 4), rep("B", 5), rep("C", 2))) se <- summarizeOverlaps(features, bf) library(DESeq) deseq <- newCountDataSet(assays(se)$counts, rownames(colData(se))) library(edgeR) edger <- DGEList(assays(se)$counts, group=rownames(colData(se))) ## --------------------------------------------------------------------- ## paired-end reads ## ## When paired-end reads are counted in a Bam file, the 'singleEnd' ## argument must be set to 'FALSE'. See the examples on the ## ?BamFile man page. ## Paired-end can also be counted in a GappedAlignmentPairs object. ## The readGappedAlignmentPairs() function outputs a GappedAliginmentPairs. library("TxDb.Dmelanogaster.UCSC.dm3.ensGene") exbygene <- exonsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, "gene") fl <- system.file("extdata", "untreated3_chr4.bam", package="pasillaBamSubset") reads <- readGappedAlignmentPairs(fl) res <- summarizeOverlaps(exbygene, reads, "Union") stopifnot(length(assays(res)$counts) == length(exbygene))