sequencing - Biopython SeqIO processing NNNNN in *.ab1 files
Thanks for the help. I apologize in advance if there is a function built into Biopython that handles this; I read the whole manual and couldn't find anything.
Goal: read in a raw sequencing file (*.ab1) and process it using sequence.seq.translate(11). However, I get the error "Bio.Data.CodonTable.TranslationError: Codon 'NNN' is invalid".
My solution: I added an additional table to CodonTable and commented out the ambiguous checker in Bio.Data.CodonTable (I had to, to make it work).
register_ncbi_table(
    name='Bacteria Sequencing Table', alt_name=None, id=24,
    table={
        # Unambiguous codons
        'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L',
        'TCT': 'S', 'TCC': 'S', 'TCA': 'S', 'TCG': 'S',
        'TAT': 'Y', 'TAC': 'Y', 'TGT': 'C', 'TGC': 'C', 'TGG': 'W',
        'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L',
        'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
        'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
        'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
        'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M',
        'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',
        'AAT': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',
        'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
        'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V',
        'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
        'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
        'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G',
        # Codons with N in the third position
        'AAN': 'X', 'TAN': 'X', 'GAN': 'X', 'CAN': 'X',
        'ATN': 'X', 'TTN': 'X', 'GTN': 'X', 'CTN': 'X',
        'ACN': 'X', 'TCN': 'X', 'GCN': 'X', 'CCN': 'X',
        'AGN': 'X', 'TGN': 'X', 'GGN': 'X', 'CGN': 'X',
        # Codons with N in the second position
        'ANA': 'X', 'TNA': 'X', 'GNA': 'X', 'CNA': 'X',
        'ANT': 'X', 'TNT': 'X', 'GNT': 'X', 'CNT': 'X',
        'ANC': 'X', 'TNC': 'X', 'GNC': 'X', 'CNC': 'X',
        'ANG': 'X', 'TNG': 'X', 'GNG': 'X', 'CNG': 'X',
        # Codons with N in the first position
        'NAA': 'X', 'NTA': 'X', 'NGA': 'X', 'NCA': 'X',
        'NAT': 'X', 'NTT': 'X', 'NGT': 'X', 'NCT': 'X',
        'NAC': 'X', 'NTC': 'X', 'NGC': 'X', 'NCC': 'X',
        'NAG': 'X', 'NTG': 'X', 'NGG': 'X', 'NCG': 'X',
        # Codons with two or three N
        'ANN': 'X', 'TNN': 'X', 'GNN': 'X', 'CNN': 'X',
        'NAN': 'X', 'NTN': 'X', 'NGN': 'X', 'NCN': 'X',
        'NNA': 'X', 'NNT': 'X', 'NNG': 'X', 'NNC': 'X',
        'NNN': 'X'},
    stop_codons=['TAA', 'TAG', 'TGA'],
    start_codons=['TTG', 'CTG', 'ATT', 'ATC', 'ATA', 'ATG', 'GTG'])
Ambiguous checker:
for n in ambiguous_generic_by_id:
    assert ambiguous_rna_by_id[n].forward_table["GUU"] == "V"
    assert ambiguous_rna_by_id[n].forward_table["GUN"] == "V"
    if n != 23:
        # For table 23, UUN = F, L or stop.
        assert ambiguous_rna_by_id[n].forward_table["UUN"] == "X"  # F or L
    # R = A or G, so URR = UAA or UGA / TRA = TAA or TGA = stop codons
    if "UAA" in unambiguous_rna_by_id[n].stop_codons and \
       "UGA" in unambiguous_rna_by_id[n].stop_codons:
        try:
            print(ambiguous_dna_by_id[n].forward_table["TRA"])
            assert False, "Should be a stop only"
        except KeyError:
            pass
        assert "URA" in ambiguous_generic_by_id[n].stop_codons
        assert "URA" in ambiguous_rna_by_id[n].stop_codons
        assert "TRA" in ambiguous_generic_by_id[n].stop_codons
        assert "TRA" in ambiguous_dna_by_id[n].stop_codons
del n
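For readers wondering what those assertions are checking, here is a rough, standard-library-only sketch of how an ambiguous codon can be resolved: expand each ambiguity letter into the bases it may stand for, translate every possibility, and keep the amino acid only if all possibilities agree. This is not Biopython's implementation; the names `CODON_TABLE`, `IUPAC` and `translate_ambiguous_codon` are made up for illustration, `IUPAC` covers only a few ambiguity letters, and stops are returned as "*" rather than being kept in a separate stop-codon list.

```python
from itertools import product

# Bacterial table 11 uses the same amino acid assignments as the standard
# code; codons are enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustrative subset of the IUPAC ambiguity codes (an assumption, not complete).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}


def translate_ambiguous_codon(codon):
    """Return the amino acid if every expansion agrees, otherwise 'X'."""
    options = [IUPAC[base] for base in codon.upper()]
    amino_acids = {CODON_TABLE["".join(p)] for p in product(*options)}
    return amino_acids.pop() if len(amino_acids) == 1 else "X"


for codon in ("GTN", "TTN", "TRA", "NNN"):
    print(codon, translate_ambiguous_codon(codon))
```

Under these assumptions GTN resolves to V (all four expansions are valine), TTN is X (F or L), TRA is a stop, and NNN is X, which mirrors what the checker asserts for the DNA tables.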
Question 1: I would prefer not to edit the root CodonTable.py file. Any suggestions on how to avoid that?
Question 2: I don't want to comment out the ambiguous checker. Can someone help me write an exception so that the ambiguous checker ignores the new codon table?
When you load an ABI file, Biopython sets the Seq alphabet to IUPACUnambiguousDNA(). A first approach is to set the alphabet to SingleLetterAlphabet():
# Bio.Alphabet was removed in Biopython 1.78, so this needs an older release.
from Bio import SeqIO
from Bio.Alphabet import SingleLetterAlphabet

for rec in SeqIO.parse("prots.ab1", "abi", alphabet=SingleLetterAlphabet()):
    print(rec.seq.translate(11))
Now the seq translates, with "X" in place of the "N" codons.
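If you would rather not depend on Biopython's tables at all, the fallback behaviour of the custom table in the question can also be approximated in a few lines of plain Python. This is only a sketch under stated assumptions: `translate_with_unknowns` and `CODON_TABLE` are hypothetical names, the sequence is passed in as a plain string (for example `str(rec.seq)`), and, like the custom table, any codon containing a non-ACGT letter becomes "X", which is stricter than Biopython's own ambiguous tables.

```python
from itertools import product

# Amino acid assignments shared by the standard code and bacterial table 11,
# with codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_with_unknowns(dna, unknown="X"):
    """Translate codon by codon; any codon not in the table becomes `unknown`."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], unknown) for i in range(0, usable, 3)
    )


print(translate_with_unknowns("ATGNNNGCNTAA"))  # prints MXX*
```

Stops are written as "*" here rather than ending the translation, so a read full of N calls still yields a string of the expected length.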