findOverlaps-methods {GenomicRanges} | R Documentation |
Finds interval overlaps between a GRanges, GRangesList, GappedAlignments, GAlignmentsList, or GappedAlignmentPairs object, and another object containing ranges.
## S4 method for signature 'GenomicRanges,GenomicRanges' findOverlaps(query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", "start", "end", "within", "equal"), select = c("all", "first", "last", "arbitrary"), ignore.strand = FALSE) ## S4 method for signature 'GenomicRanges,GenomicRanges' countOverlaps(query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", "start", "end", "within", "equal"), ignore.strand = FALSE) ## S4 method for signature 'GenomicRanges,GenomicRanges' overlapsAny(query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", "start", "end", "within", "equal"), ignore.strand = FALSE) ## S4 method for signature 'GenomicRanges,GenomicRanges' subsetByOverlaps(query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", "start", "end", "within", "equal"), ignore.strand = FALSE)
query, subject |
A GRanges, GRangesList, GappedAlignments,
GAlignmentsList, or GappedAlignmentPairs object.
RangesList and RangedData are
also accepted for one of |
maxgap, minoverlap, type, select |
See |
ignore.strand |
When set to |
When the query and the subject are GRanges or
GRangesList objects, findOverlaps
uses the triplet
(sequence name, range, strand) to determine which features (see
paragraph below for the definition of feature) from the query
overlap which features in the subject
, where a strand value
of "*"
is treated as occurring on both the "+"
and
"-"
strand.
An overlap is recorded when a feature in the query
and a feature
in the subject
have the same sequence name, have a compatible
pairing of strands (e.g. "+"
/"+"
, "-"
/"-"
,
"*"
/"+"
, "*"
/"-"
, etc.), and satisfy the
interval overlap requirements. Strand is taken as "*"
for RangedData
and RangesList
.
In the context of findOverlaps
, a feature is a collection of
ranges that are treated as a single entity. For GRanges objects,
a feature is a single range; while for GRangesList objects,
a feature is a list element containing a set of ranges. In the results,
the features are referred to by number, which run from 1 to
length(query)
/length(subject)
.
When the query or the subject (or both) is a GappedAlignments
object, it is first turned into a GRangesList object (with
as( , "GRangesList")
) and then the rules described previously
apply. GAlignmentsList objects are coerced to GappedAlignments
then to a GRangesList. Feature indices are mapped back to the
original GAlignmentsList list elements.
When the query is a GappedAlignmentPairs object, it is first
turned into a GRangesList object (with as( , "GRangesList")
)
and then the rules described previously apply.
For findOverlaps
either a Hits object when
select = "all"
or an integer vector otherwise.
For countOverlaps
an integer vector containing the tabulated
query overlap hits.
For overlapsAny
a logical vector of length equal to the number of
ranges in query
indicating those that overlap any of the ranges
in subject
.
For subsetByOverlaps
an object of the same class as query
containing the subset that overlapped at least one entity in subject
.
For RangedData
and RangesList
, with the exception of
subsetByOverlaps
, the results align to the unlisted
form of the object. This turns out to be fairly convenient for
RangedData
(not so much for RangesList
, but something
has to give).
P. Aboyoun, S. Falcon, M. Lawrence, N. Gopalakrishnan and H. Pages
## --------------------------------------------------------------------- ## WITH GRanges AND/OR GRangesList OBJECTS ## --------------------------------------------------------------------- ## GRanges object: gr <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges(1:10, width = 10:1, names = head(letters,10)), strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)), score = 1:10, GC = seq(1, 0, length=10)) gr ## GRangesList object: gr1 <- GRanges(seqnames = "chr2", ranges = IRanges(4:3, 6), strand = "+", score = 5:4, GC = 0.45) gr2 <- GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(c(7,13), width = 3), strand = c("+", "-"), score = 3:4, GC = c(0.3, 0.5)) gr3 <- GRanges(seqnames = c("chr1", "chr2"), ranges = IRanges(c(1, 4), c(3, 9)), strand = c("-", "-"), score = c(6L, 2L), GC = c(0.4, 0.1)) grl <- GRangesList("gr1" = gr1, "gr2" = gr2, "gr3" = gr3) ## Overlapping two GRanges objects: table(!is.na(findOverlaps(gr, gr1, select="arbitrary"))) countOverlaps(gr, gr1) findOverlaps(gr, gr1) subsetByOverlaps(gr, gr1) countOverlaps(gr, gr1, type = "start") findOverlaps(gr, gr1, type = "start") subsetByOverlaps(gr, gr1, type = "start") findOverlaps(gr, gr1, select = "first") findOverlaps(gr, gr1, select = "last") findOverlaps(gr1, gr) findOverlaps(gr1, gr, type = "start") findOverlaps(gr1, gr, type = "within") findOverlaps(gr1, gr, type = "equal") ## Overlapping a GRanges and a GRangesList object: table(!is.na(findOverlaps(grl, gr, select="first"))) countOverlaps(grl, gr) findOverlaps(grl, gr) subsetByOverlaps(grl, gr) countOverlaps(grl, gr, type = "start") findOverlaps(grl, gr, type = "start") subsetByOverlaps(grl, gr, type = "start") findOverlaps(grl, gr, select = "first") ## Overlapping two GRangesList objects: countOverlaps(grl, rev(grl)) findOverlaps(grl, rev(grl)) subsetByOverlaps(grl, rev(grl)) ## --------------------------------------------------------------------- ## WITH GappedAlignments AND/OR GAlignmentsList OBJECTS ## --------------------------------------------------------------------- library(Rsamtools) # because file ex1.bam is in this package ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools") galn <- readGappedAlignments(ex1_file) subject <- granges(galn)[1] ## Note the absence of query no. 9 (i.e. 'galn[9]') in this result: as.matrix(findOverlaps(galn, subject)) ## This is because, by default, findOverlaps()/countOverlaps() are ## strand specific: galn[8:10] countOverlaps(galn[8:10], subject) countOverlaps(galn[8:10], subject, ignore.strand=TRUE) ## Count alignments in 'galn' that DO overlap with 'subject' vs those ## that do NOT: table(overlapsAny(galn, subject)) ## Extract those that DO: subsetByOverlaps(galn, subject) ## GAlignmentsList galist <- GAlignmentsList(galn[8:10], galn[3000:3002]) gr <- GRanges(c("seq1", "seq1", "seq2"), IRanges(c(15, 18, 1233), width=1), strand=c("-", "+", "+")) countOverlaps(galist, gr) countOverlaps(galist, gr, ignore.strand=TRUE) findOverlaps(galist, gr) findOverlaps(galist, gr, ignore.strand=TRUE)