bioinformatics - Generating a mutation frequency on a DNA Strand using Python -


i input dna sequence , make sort of generator yields sequences have frequency of mutations. instance, have dna strand "atgtcgtcacacaccgcagatccgtgtttgac", , want create mutations t->a frequency of 5%. how go creating this? know creating random mutations can done code this:

import random def mutate(string, mutation, threshold):     dna = list(string)     index, char in enumerate(dna):         if char in mutation:             if random.random() < threshold:                 dna[index] = mutation[char]      return ''.join(dna) 

but not sure how make fixed mutation frequency. know how that? thanks.

edit:

so should formatting if i'm using byte array, because i'm getting error:

import random  dna = "atgtcgtacgtttgacgtagag"  def mutate(dna, mutation, threshold):     dna = bytearray(dna) #if don't want modify original     index in range(len(dna)):         if dna[index] in mutation , random.random() < threshold:                 dna[index] = mutation[char]      return dna 

mutate(dna, {"a": "t"}, 0.05) print("my dna now:", dna)

error: "typeerror: string argument without encoding"

edit 2:

import random  mydna = bytearray("atgtcgtcacacaccgcagatccgtgtttgac")  def mutate(dna, mutation, threshold):     dna = mydna # if don't want modify original     index in range(len(dna)):         if dna[index] in mutation , random.random() < threshold:                 dna[index] = mutation[char]      return dna  mutate(dna, {"a": "t"}, 0.05) print("my dna now:", dna) 

yields error

you asked me function prints possible mutations, here is. number of outputs grows exponentially input data length, function prints possibilities , not store them somehow (that consume memory). created recursive function, function should not used large input, add non-recursive function should work without problems or limits.

def print_all_possibilities(dna, mutations, index = 0, print = print):     if index < 0: return #invalid value index      while index < len(dna):         if chr(dna[index]) in mutations:             print_all_possibilities(dna, mutations, index + 1)             dnacopy = bytearray(dna)             dnacopy[index] = ord(mutations[chr(dna[index])])             print_all_possibilities(dnacopy, mutations, index + 1)              return         index += 1     print(dna.decode("ascii"))  # testing print_all_possibilities(bytearray(b"aaaatttt"), {"a": "t"}) 

this works me on python 3, can explain code if want.
note: function requires bytearray given in function test.

explanation:
function searches place in dna mutation can happen, starts @ index, begins 0 , goes end. that's why while-loop, increases index every time loop executed, (it's normal iteration loop). if function finds place mutation can happen (if chr(dna[index]) in mutations:), copies dna , lets second 1 mutate (dnacopy[index] = ord(mutations[chr(dna[index])]), note bytearray array of numeric values, use chr , ord time change between string , int). after function called again more possible mutations, functions again possible mutations in both possible dna's, skip point have scanned, begin @ index + 1. after order print passed called functions print_all_possibilities, don't have anymore , quit executioning return. if don't find mutations anymore print our possible dna, because don't call function again, no 1 else it.
may sound complicated, more or less elegant solution. also, understand recursion have understand recursion, don't bother if don't understand now. if try out on sheet of paper: take easy dna string "ttattatta" possible mutation "a" -> "t" (so have 8 possible mutations) , this: go through string left right , if find position, sequence can mutate (here "a"'s), write string down again, time let string mutate @ given position, second string different original. in original , copy, mark how far came (maybe put "|" after letter let mutate) , repeat procedure copy new original. if don't find possible mutation, underline string (this equivalent printing it). @ end should have 8 different strings underlined. hope can understand it.

edit: here non-recursive function:

def print_all_possibilities(dna, mutations, printings = -1, print = print):     mut_possible = []     index in range(len(dna)):         if chr(dna[index]) in mutations: mut_possible.append(index)      if printings < 0: printings = 1 << len(mut_possible)     number in range(min(printings, 1 << len(mut_possible)):         dnacopy = bytearray(dna) # don't change original         counter = 0         while number:             if number & (1 << counter):                 index = mut_possible[counter]                 dnacopy[index] = ord(mutations[chr(dna[index])])                 number &= ~(1 << counter)             counter += 1          print(dnacopy.decode("ascii"))  # testing print_all_possibilities(bytearray(b"aaaatttt"), {"a": "t"}) 

this function comes additional parameter, can control number of maximum outputs, e.g.

print_all_possibilities(bytearray(b"aaaatttt"), {"a": "t"}, 5) 

will print 5 results.

explanation:
if dna has x possible positions can mutate, have 2 ^ x possible mutations, because @ every place dna can mutate or not. function finds positions dna can mutate , stores them in mut_possible (that's code of for-loop). mut_possible contains positions dna can mutate , have 2 ^ len(mut_possible) (len(mut_possible) number of elements in mut_possible) possible mutations. wrote 1 << len(mut_possible), it's same, faster. if printings negative number function decide print possibilities , set printings number of possibilities. if printings positive, lower number of possibilities, function print printings mutations, because min(printings, 1 << len(mut_possible)) return smaller number, printings. else, function print out possibilities. have number go through range(...) , loop, prints 1 mutation every time, execute desired number of times. also, number increase 1 every time. (e.g., range(4) similar! [0, 1, 2, 3]). next use number create mutation. understand step have understand binary number. if our number 10, it's in binary 1010. these numbers tell @ places have modify out code of dna (dnacopy). first bit 0, don't modify first position mutation can happen, next bit 1, modify position, after there 0 , on... "read" bits use variable counter. number & (1 << counter) return non-zero value if counterth bit set, if bit set modify our dna @ counterth position mutation can happen. written in mut_possible, our desired position mut_possible[counter]. after mutated our dna @ position set bit 0 show modified position. done number &= ~(1 << counter). after increase counter @ other bits. while-loop continue execute if number not 0, if number has @ least 1 bit set (if have modify @ least 1 position of dna). after modified our dnacopy while-loop finished , print our result.

i hope these explanations help. see new python, take time let sink in , contact me if have further questions.


Comments

Popular posts from this blog

commonjs - How to write a typescript definition file for a node module that exports a function? -

openid - Okta: Failed to get authorization code through API call -

thorough guide for profiling racket code -