trimLRPatterns {Biostrings}    R Documentation

Trim Flanking Patterns from Sequences

Description

The trimLRPatterns function trims left and/or right flanking patterns from sequences.

Usage

  trimLRPatterns(Lpattern = "", Rpattern = "", subject,
                 max.Lmismatch = 0, max.Rmismatch = 0,
                 with.Lindels = FALSE, with.Rindels = FALSE,
                 Lfixed = TRUE, Rfixed = TRUE, ranges = FALSE)

Arguments

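Lpattern The left pattern. The default, "", leaves the left end of subject untouched.
Rpattern The right pattern. The default, "", leaves the right end of subject untouched.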
subject An XString object, XStringSet object, or character vector containing the target sequence(s).
max.Lmismatch Either an integer vector of length nLp = nchar(Lpattern) or a single numeric value in (0, 1).

For an integer vector, the element max.Lmismatch[i] is the maximum number of mismatching letters allowed when aligning substring(Lpattern, nLp - i + 1, nLp) (the last i letters of the left pattern) with substring(subject, 1, i) (the first i letters of the subject). A negative value at position i prevents trimming at that overlap length. If length(max.Lmismatch) < nLp, max.Lmismatch is padded with enough -1's at the beginning to bring it up to length nLp. For example, the default max.Lmismatch = 0 becomes c(rep(-1, nLp - 1), 0), so only a perfect match of the full left pattern is trimmed.

A single numeric value in (0, 1) is a constant maximum mismatch rate applied to each of the nLp alignments.

If non-zero, an inexact matching algorithm is used (see the matchPattern function for more information).
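The padding rule above determines which overlap lengths can be trimmed at all. The base-R sketch below expands a max.Lmismatch (or max.Rmismatch) value into one limit per overlap length 1..nLp. The helper expandMaxMismatch is hypothetical and not part of Biostrings; in particular, converting a rate r into as.integer(r * i) for overlap length i is an assumption that mirrors the maxMismatches vector built in the Examples section.

```r
## Hypothetical helper (not part of Biostrings): expand a mismatch argument
## into one limit per overlap length i = 1..nP, following the rules above.
expandMaxMismatch <- function(max.mismatch, nP) {
  if (length(max.mismatch) == 1L && max.mismatch > 0 && max.mismatch < 1) {
    ## Assumption: a rate r becomes as.integer(r * i) for overlap length i,
    ## as in as.integer(0.2 * 1:9) from the Examples section.
    return(as.integer(max.mismatch * seq_len(nP)))
  }
  max.mismatch <- as.integer(max.mismatch)
  if (length(max.mismatch) < nP) {
    ## Pad on the left with -1: trimming is disallowed at the short overlaps.
    max.mismatch <- c(rep.int(-1L, nP - length(max.mismatch)), max.mismatch)
  }
  max.mismatch
}

nLp <- nchar("TTCTGCTTG")            # 9
expandMaxMismatch(0, nLp)            # default: eight -1's, then 0
expandMaxMismatch(rep(0, 9), nLp)    # perfect matches at every overlap length
expandMaxMismatch(0.2, nLp)          # 0 0 0 0 1 1 1 1 1
```

Under these rules a single 0 allows only the full-length alignment, which is why passing rep(0, 9) is needed to trim partial flanks.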

max.Rmismatch Either an integer vector of length nRp = nchar(Rpattern) or a single numeric value in (0, 1).

For an integer vector, the element max.Rmismatch[i] is the maximum number of mismatching letters allowed when aligning substring(Rpattern, 1, i) (the first i letters of the right pattern) with substring(subject, nS - i + 1, nS) (the last i letters of the subject), where nS = nchar(subject). A negative value at position i prevents trimming at that overlap length. If length(max.Rmismatch) < nRp, max.Rmismatch is padded with enough -1's at the beginning to bring it up to length nRp.

A single numeric value in (0, 1) is a constant maximum mismatch rate applied to each of the nRp alignments.

If non-zero, an inexact matching algorithm is used (see the matchPattern function for more information).
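To make the right-hand alignments concrete, the sketch below compares substring(Rpattern, 1, i) with substring(subject, nS - i + 1, nS) for each overlap length i and returns the new end position of the subject. It is a hypothetical base-R illustration, not the Biostrings implementation: it counts mismatches only (no indels, letters compared literally), expects one limit per overlap length, and assumes that longer overlaps are tried first, with the first acceptable one winning.

```r
## Hypothetical sketch, NOT the Biostrings implementation.
countMismatches <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

trimEndSketch <- function(Rpattern, subject, max.Rmismatch) {
  nRp <- nchar(Rpattern)
  nS <- nchar(subject)
  ## Assumes one limit per overlap length i = 1..nRp (already expanded).
  stopifnot(length(max.Rmismatch) == nRp)
  ## Assumption: try the longest overlap first; the first acceptable one wins.
  for (i in rev(seq_len(min(nRp, nS)))) {
    if (max.Rmismatch[i] < 0) next                    # negative limit: never trim here
    patStart <- substring(Rpattern, 1, i)             # first i letters of Rpattern
    subjEnd  <- substring(subject, nS - i + 1, nS)    # last i letters of subject
    if (countMismatches(patStart, subjEnd) <= max.Rmismatch[i])
      return(nS - i)                                  # new end position of the subject
  }
  nS                                                  # no acceptable overlap: keep all
}

s <- "TTCTGCTTGACGTGATCGGA"
trimEndSketch("GATCGGAAG", s, rep(0, 9))           # 13: partial flank "GATCGGA" trimmed
trimEndSketch("GATCGGAAG", s, c(rep(-1, 8), 0))    # 20: nothing trimmed
substring(s, 1, trimEndSketch("GATCGGAAG", s, rep(0, 9)))   # "TTCTGCTTGACGT"
```

The second call uses the expanded form of the default max.Rmismatch = 0; because the subject ends with only 7 of the 9 pattern letters, no trimming occurs.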

with.Lindels If TRUE, then indels are allowed in the left pattern. In that case, max.Lmismatch is interpreted as the maximum "edit distance" allowed in the left pattern.

See the with.indels argument of the matchPattern function for more information.
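The difference between a mismatch count and an edit distance matters when a flank has lost or gained a letter. The base-R illustration below uses a hypothetical read whose left flank is missing one letter. utils::adist() (a generalized Levenshtein distance) is only a convenient stand-in here; Biostrings uses its own indel-aware matching code.

```r
## Illustration only: mismatch count versus edit distance for a left flank
## that has lost one letter.
Lpattern <- "TTCTGCTTG"
read <- "TTCTGTTGACGTGATCGGA"   # hypothetical read: the flank's 6th letter "C" is missing

## Mismatches only: compare the 9 pattern letters with the first 9 read letters.
prefix9 <- substring(read, 1, nchar(Lpattern))
nMismatch <- sum(strsplit(Lpattern, "")[[1]] != strsplit(prefix9, "")[[1]])
nMismatch   # 3

## With indels: one deletion turns the pattern into the first 8 read letters.
nEdit <- utils::adist(Lpattern, substring(read, 1, 8))[1, 1]
nEdit       # 1
```

Roughly speaking, a limit of 1 would tolerate this flank under the edit-distance interpretation, but not when only mismatches are counted.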

with.Rindels Same as with.Lindels but for the right pattern.
Lfixed, Rfixed Whether letters in the left or right pattern are matched literally. If TRUE (the default), an IUPAC extended letter only matches the same letter; if FALSE, IUPAC extended letters are interpreted as ambiguities (see ?`lowlevel-matching` for the details).
ranges If TRUE, then return the ranges to use to trim subject. If FALSE, then return the trimmed subject.
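The Lfixed and Rfixed arguments switch between literal and ambiguity-aware comparison. The base-R sketch below shows the effect on a mismatch count. The helper countMismatchesIUPAC and the map iupacExcerpt are hypothetical: the map is a hand-written excerpt of the IUPAC codes (Biostrings provides a complete table as IUPAC_CODE_MAP), and only ambiguity on the pattern side is handled.

```r
## Hypothetical illustration of literal (fixed = TRUE) versus
## ambiguity-aware (fixed = FALSE) letter comparison, pattern side only.
iupacExcerpt <- c(A = "A", C = "C", G = "G", T = "T",
                  R = "AG", Y = "CT", N = "ACGT")

countMismatchesIUPAC <- function(pattern, s, fixed = TRUE) {
  p <- strsplit(pattern, "")[[1]]
  x <- strsplit(s, "")[[1]]
  stopifnot(length(p) == length(x))
  if (fixed) {
    ok <- p == x                                   # letters compared literally
  } else {
    ## a pattern letter matches any base it stands for
    ok <- mapply(function(a, b) grepl(b, iupacExcerpt[[a]], fixed = TRUE), p, x)
  }
  sum(!ok)
}

countMismatchesIUPAC("TTCTGNTTG", "TTCTGCTTG", fixed = TRUE)    # 1
countMismatchesIUPAC("TTCTGNTTG", "TTCTGCTTG", fixed = FALSE)   # 0
```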

Value

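A new XString object, XStringSet object, or character vector (the same type as subject) with each flanking pattern removed when it matches within the specified mismatch or edit-distance limits. If ranges = TRUE, the ranges to use to trim subject are returned instead.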
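With ranges = TRUE, the result describes which part of each sequence to keep rather than the trimmed sequences themselves. The base-R sketch below applies such ranges to a character vector with substring(). The start and end values are hypothetical: they were worked out by hand for perfect-overlap trimming of the two sequences in subjectSet from the Examples section, not captured from Biostrings. A sequence trimmed away entirely is represented by a zero-width range whose end is start - 1, the convention used by IRanges.

```r
## Hypothetical ranges, worked out by hand (not Biostrings output).
subjectSet <- c("TGCTTGACGGCAGATCGG", "TTCTGCTTGGATCGGAAG")
rngStart <- c(7L, 10L)
rngEnd   <- c(12L, 9L)                 # second range has zero width: end = start - 1
rngWidth <- rngEnd - rngStart + 1L     # 6 0

## Applying the ranges yields the trimmed sequences.
substring(subjectSet, rngStart, rngEnd)   # "ACGGCA" ""
```

Returning ranges keeps the original sequences intact, so the same coordinates can be reused for other per-sequence data.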

Author(s)

P. Aboyoun

See Also

matchPattern, matchLRPatterns, lowlevel-matching, XString-class, XStringSet-class

Examples

  library(Biostrings)

  Lpattern <- "TTCTGCTTG"
  Rpattern <- "GATCGGAAG"
  subject <- DNAString("TTCTGCTTGACGTGATCGGA")
  subjectSet <- DNAStringSet(c("TGCTTGACGGCAGATCGG", "TTCTGCTTGGATCGGAAG"))

  ## Only trim perfect, full-length matches of the flanking patterns (the default)
  trimLRPatterns(Lpattern = Lpattern, subject = subject)
  trimLRPatterns(Rpattern = Rpattern, subject = subject)
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet)

  ## Also allow perfect matches of partial overlaps with the flanking patterns
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
                 max.Lmismatch = rep(0, 9), max.Rmismatch = rep(0, 9))

  ## Allow for mismatches on the flanks
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subject,
                 max.Lmismatch = 0.2, max.Rmismatch = 0.2)
  maxMismatches <- as.integer(0.2 * 1:9)
  maxMismatches
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
                 max.Lmismatch = maxMismatches, max.Rmismatch = maxMismatches)

  ## Return ranges that can be passed to other functions
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
                 max.Lmismatch = rep(0, 9), max.Rmismatch = rep(0, 9),
                 ranges = TRUE)
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subject,
                 max.Lmismatch = 0.2, max.Rmismatch = 0.2, ranges = TRUE)

[Package Biostrings version 2.14.12 Index]