Issue
I'm trying to do some fuzzy matching on a string of DNA reads. I'd like to allow for up to 1 substitution error while at the same time allowing a particular basepair to be one of two options (A or G in this case).
I've started with the following:
>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "ATTAGATACCCTGGTAGTCA")
['ATTAGATACCCTGGTAGTCA']
matches as expected because I'm matching against the exact string
>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "GTTAGATACCCTGGTAGTCA")
['GTTAGATACCCTGGTAGTCA']
matches as expected because I'm matching against the exact string except the first base pair has been switched from an A to a G (allowed)
>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "GTTAGATACCCTGGTAGTCx")
['GTTAGATACCCTGGTAGTCx']
matches as expected because a single substitution occurs (C->x)
>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "xTTAGATACCCTGGTAGTCx")
[]
does not match (as expected) because there are two substitutions
>>> regex.findall("([A|G]TTAGATACCCTGGTAGTCC){0<=s<=1}", "xTTAGATACCCTGGTAGTCA")
[]
should have matched, since the first basepair error (x instead of A or G) should have been counted as a substitution.
Solution
You have two substitutions in your last example: the first basepair has been substituted with an x, while the last has been changed from a C to an A. You only allow one substitution, so there's no match.
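To see where the count comes from, it can help to compare each read against the pattern position by position. The following is a minimal sketch, not how the regex module works internally: `ALLOWED` and `count_substitutions` are hypothetical names introduced here for illustration, and it assumes reads of the same length as the pattern (substitutions only, no insertions or deletions).

```python
# Allowed bases per position: the first position accepts A or G,
# every other position accepts exactly one base from the pattern.
TARGET = "TTAGATACCCTGGTAGTCC"
ALLOWED = [{"A", "G"}] + [{base} for base in TARGET]


def count_substitutions(read):
    """Return how many positions of `read` fall outside ALLOWED."""
    if len(read) != len(ALLOWED):
        raise ValueError("read must be the same length as the pattern")
    return sum(base not in allowed for base, allowed in zip(read, ALLOWED))


reads = [
    "ATTAGATACCCTGGTAGTCA",
    "GTTAGATACCCTGGTAGTCA",
    "GTTAGATACCCTGGTAGTCx",
    "xTTAGATACCCTGGTAGTCx",
    "xTTAGATACCCTGGTAGTCA",
]

for read in reads:
    n = count_substitutions(read)
    # Mirror the {0<=s<=1} constraint: at most one substitution is allowed.
    print(read, n, "match" if n <= 1 else "no match")
```

The first three reads each come out at one substitution, because the pattern ends in C while the reads end in A or x, so even the first example is not an exact match. The last two reads come out at two. Separately, `|` is a literal character inside a character class, so `[A|G]` also accepts `|`; `[AG]` is the class that accepts only A or G.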
Answered By - Henry Keiter