python - Use numpy to translate a huge array of 2-byte strings to corresponding 1-byte strings according to a fixed mapping


I have a set of 12 distinct 2-byte strings that I need to map to a set of 12 corresponding 1-byte strings according to the following translation dictionary:

translation_dict = {'ac': '2', 'ag': '3', 'at': '4',
                    'ca': '5', 'cg': '6', 'ct': '7',
                    'ga': '8', 'gc': '9', 'gt': 'a',
                    'ta': 'b', 'tc': 'c', 'tg': 'd'}

I need a method for translating a huge numpy.char.array of 2-byte strings to the corresponding 1-byte strings according to this mapping, as shown in the following example:

>>> input_array = numpy.char.array(['ca', 'ca', 'gc', 'tc', 'at', 'gt', 'ag', 'ct'])
>>> output_array = some_method(input_array)
>>> output_array
chararray(['5', '5', '9', 'c', '4', 'a', '3', '7'], dtype='S1')

I want to know whether there is a fast numpy.char.array method for translating huge arrays of 2-byte strings. I am aware that I can use numpy.vectorize with a function that explicitly looks up the 1-byte dictionary value for each 2-byte key, but this is relatively slow. I can't figure out how to use numpy.chararray.translate, although it seems to work only for a 1-byte:1-byte mapping in any event.
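For reference, the numpy.vectorize baseline mentioned above might look roughly like the sketch below. The helper name lookup_one is hypothetical, and the example assumes Python 3 with a plain ndarray of str values rather than a numpy.char.array. It is intended only as a correctness baseline to compare faster approaches against.

```python
import numpy as np

translation_dict = {'ac': '2', 'ag': '3', 'at': '4',
                    'ca': '5', 'cg': '6', 'ct': '7',
                    'ga': '8', 'gc': '9', 'gt': 'a',
                    'ta': 'b', 'tc': 'c', 'tg': 'd'}

def lookup_one(key):
    # Hypothetical helper: one dictionary lookup per 2-byte key
    return translation_dict[key]

# np.vectorize calls lookup_one once per element at Python level,
# which is why this approach is slow on huge arrays.
translate_slow = np.vectorize(lookup_one)

# Assumption: a plain ndarray of str is used here instead of numpy.char.array
input_array = np.array(['ca', 'ca', 'gc', 'tc', 'at', 'gt', 'ag', 'ct'])
baseline = translate_slow(input_array)
```

Because every element goes through a Python function call, the cost grows linearly with a large constant factor, which matches the "relatively slow" behaviour described in the question.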

For such search operations, NumPy has np.searchsorted, so allow me to suggest an approach based on it -

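import numpy as np

def search_dic(dic, search_keys):
    # Extract the keys and values as arrays (list() is needed on Python 3,
    # where dict.keys() and dict.values() return views, not lists)
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))

    # Use searchsorted to locate each search key among the sorted keys
    sidx = np.argsort(k)
    idx = np.searchsorted(k, search_keys, sorter=sidx)

    # Index into the values to extract the corresponding translations
    return np.take(v, sidx[idx])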

Sample run -

In [46]: translation_dict = {'ac': '2', 'ag': '3', 'at': '4',
    ...:                     'ca': '5', 'cg': '6', 'ct': '7',
    ...:                     'ga': '8', 'gc': '9', 'gt': 'a',
    ...:                     'ta': 'b', 'tc': 'c', 'tg': 'd'}

In [47]: s = np.char.array(['ca', 'ca', 'gc', 'tc', 'at', 'gt', 'ag', 'ct'])

In [48]: search_dic(translation_dict, s)
Out[48]:
array(['5', '5', '9', 'c', '4', 'a', '3', '7'],
      dtype='|S1')
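One caveat with a searchsorted lookup is that a key missing from the dictionary does not raise an error by itself: it either maps silently to a neighbouring key or indexes out of bounds. The sketch below is a hedged variant that checks for this. The name search_dic_checked is hypothetical, and it assumes Python 3 with str keys and a str input array (for bytes data, the dictionary keys would need to be bytes as well).

```python
import numpy as np

def search_dic_checked(dic, search_keys):
    # Hypothetical variant of a searchsorted lookup that rejects unknown keys
    keys = np.array(list(dic.keys()))
    values = np.array(list(dic.values()))
    search_keys = np.asarray(search_keys)

    # Sort the keys once, then locate every search key by binary search
    sidx = np.argsort(keys)
    sorted_keys = keys[sidx]
    idx = np.searchsorted(sorted_keys, search_keys)

    # Keys larger than every dictionary key land at len(sorted_keys); clip
    # them so the comparison below can flag them instead of failing
    idx = np.clip(idx, 0, len(sorted_keys) - 1)
    found = sorted_keys[idx] == search_keys
    if not found.all():
        missing = np.unique(search_keys[~found]).tolist()
        raise KeyError("keys not in mapping: %r" % (missing,))

    # Map positions in the sorted keys back to the original value order
    return values[sidx[idx]]

translation_dict = {'ac': '2', 'ag': '3', 'at': '4',
                    'ca': '5', 'cg': '6', 'ct': '7',
                    'ga': '8', 'gc': '9', 'gt': 'a',
                    'ta': 'b', 'tc': 'c', 'tg': 'd'}

s = np.array(['ca', 'ca', 'gc', 'tc', 'at', 'gt', 'ag', 'ct'])
checked = search_dic_checked(translation_dict, s)
```

The extra comparison costs one more pass over the input, which is usually small next to the binary search itself, so it may be worth keeping when the input is not guaranteed to contain only the 12 known keys.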
