bioinformatics - Generating a mutation frequency on a DNA Strand using Python -
i input dna sequence , make sort of generator yields sequences have frequency of mutations. instance, have dna strand "atgtcgtcacacaccgcagatccgtgtttgac", , want create mutations t->a frequency of 5%. how go creating this? know creating random mutations can done code this:
import random def mutate(string, mutation, threshold): dna = list(string) index, char in enumerate(dna): if char in mutation: if random.random() < threshold: dna[index] = mutation[char] return ''.join(dna)
but not sure how make fixed mutation frequency. know how that? thanks.
edit:
so should formatting if i'm using byte array, because i'm getting error:
import random dna = "atgtcgtacgtttgacgtagag" def mutate(dna, mutation, threshold): dna = bytearray(dna) #if don't want modify original index in range(len(dna)): if dna[index] in mutation , random.random() < threshold: dna[index] = mutation[char] return dna
mutate(dna, {"a": "t"}, 0.05) print("my dna now:", dna)
error: "typeerror: string argument without encoding"
edit 2:
import random mydna = bytearray("atgtcgtcacacaccgcagatccgtgtttgac") def mutate(dna, mutation, threshold): dna = mydna # if don't want modify original index in range(len(dna)): if dna[index] in mutation , random.random() < threshold: dna[index] = mutation[char] return dna mutate(dna, {"a": "t"}, 0.05) print("my dna now:", dna)
yields error
you asked me function prints possible mutations, here is. number of outputs grows exponentially input data length, function prints possibilities , not store them somehow (that consume memory). created recursive function, function should not used large input, add non-recursive function should work without problems or limits.
def print_all_possibilities(dna, mutations, index = 0, print = print): if index < 0: return #invalid value index while index < len(dna): if chr(dna[index]) in mutations: print_all_possibilities(dna, mutations, index + 1) dnacopy = bytearray(dna) dnacopy[index] = ord(mutations[chr(dna[index])]) print_all_possibilities(dnacopy, mutations, index + 1) return index += 1 print(dna.decode("ascii")) # testing print_all_possibilities(bytearray(b"aaaatttt"), {"a": "t"})
this works me on python 3, can explain code if want.
note: function requires bytearray given in function test.
explanation:
function searches place in dna mutation can happen, starts @ index, begins 0 , goes end. that's why while-loop, increases index every time loop executed, (it's normal iteration loop). if function finds place mutation can happen (if chr(dna[index]) in mutations:
), copies dna , lets second 1 mutate (dnacopy[index] = ord(mutations[chr(dna[index])])
, note bytearray array of numeric values, use chr , ord time change between string , int). after function called again more possible mutations, functions again possible mutations in both possible dna's, skip point have scanned, begin @ index + 1
. after order print passed called functions print_all_possibilities, don't have anymore , quit executioning return
. if don't find mutations anymore print our possible dna, because don't call function again, no 1 else it.
may sound complicated, more or less elegant solution. also, understand recursion have understand recursion, don't bother if don't understand now. if try out on sheet of paper: take easy dna string "ttattatta" possible mutation "a" -> "t" (so have 8 possible mutations) , this: go through string left right , if find position, sequence can mutate (here "a"'s), write string down again, time let string mutate @ given position, second string different original. in original , copy, mark how far came (maybe put "|" after letter let mutate) , repeat procedure copy new original. if don't find possible mutation, underline string (this equivalent printing it). @ end should have 8 different strings underlined. hope can understand it.
edit: here non-recursive function:
def print_all_possibilities(dna, mutations, printings = -1, print = print): mut_possible = [] index in range(len(dna)): if chr(dna[index]) in mutations: mut_possible.append(index) if printings < 0: printings = 1 << len(mut_possible) number in range(min(printings, 1 << len(mut_possible)): dnacopy = bytearray(dna) # don't change original counter = 0 while number: if number & (1 << counter): index = mut_possible[counter] dnacopy[index] = ord(mutations[chr(dna[index])]) number &= ~(1 << counter) counter += 1 print(dnacopy.decode("ascii")) # testing print_all_possibilities(bytearray(b"aaaatttt"), {"a": "t"})
this function comes additional parameter, can control number of maximum outputs, e.g.
print_all_possibilities(bytearray(b"aaaatttt"), {"a": "t"}, 5)
will print 5 results.
explanation:
if dna has x possible positions can mutate, have 2 ^ x possible mutations, because @ every place dna can mutate or not. function finds positions dna can mutate , stores them in mut_possible
(that's code of for-loop). mut_possible
contains positions dna can mutate , have 2 ^ len(mut_possible)
(len(mut_possible)
number of elements in mut_possible) possible mutations. wrote 1 << len(mut_possible)
, it's same, faster. if printings
negative number function decide print possibilities , set printings number of possibilities. if printings positive, lower number of possibilities, function print printings
mutations, because min(printings, 1 << len(mut_possible))
return smaller number, printings
. else, function print out possibilities. have number
go through range(...)
, loop, prints 1 mutation every time, execute desired number of times. also, number increase 1 every time. (e.g., range(4) similar! [0, 1, 2, 3]). next use number create mutation. understand step have understand binary number. if our number 10, it's in binary 1010. these numbers tell @ places have modify out code of dna (dnacopy
). first bit 0, don't modify first position mutation can happen, next bit 1, modify position, after there 0 , on... "read" bits use variable counter
. number & (1 << counter)
return non-zero value if counter
th bit set, if bit set modify our dna @ counter
th position mutation can happen. written in mut_possible, our desired position mut_possible[counter]
. after mutated our dna @ position set bit 0 show modified position. done number &= ~(1 << counter)
. after increase counter @ other bits. while-loop continue execute if number not 0, if number has @ least 1 bit set (if have modify @ least 1 position of dna). after modified our dnacopy while-loop finished , print our result.
i hope these explanations help. see new python, take time let sink in , contact me if have further questions.
Comments
Post a Comment