python - Use numpy to translate a huge array of 2-byte strings to corresponding 1-byte strings according to a fixed mapping
I have a set of 12 distinct 2-byte strings that map to a set of 12 corresponding 1-byte strings according to the following translation dictionary:
translation_dict = {'ac': '2', 'ag': '3', 'at': '4', 'ca': '5', 'cg': '6', 'ct': '7', 'ga': '8', 'gc': '9', 'gt': 'a', 'ta': 'b', 'tc': 'c', 'tg': 'd'}
I need a method for translating a huge numpy.char.array of 2-byte strings to the corresponding 1-byte strings under this mapping, as shown in the following example:
>>> input_array = numpy.char.array(['ca', 'ca', 'gc', 'tc', 'at', 'gt', 'ag', 'ct'])
>>> output_array = some_method(input_array)
>>> output_array
chararray(['5', '5', '9', 'c', '4', 'a', '3', '7'], dtype='S1')
I want to know if there is a fast numpy.char.array method for translating huge arrays of 2-byte strings. I am aware that I can use numpy.vectorize with a function that explicitly looks up the 1-byte dictionary value for each 2-byte key, but this is relatively slow. I can't figure out how to use numpy.chararray.translate, although it seems that it only works for a 1-byte:1-byte mapping in any event.
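For reference, the numpy.vectorize baseline described above could look roughly like the following sketch. The name `lookup` and the use of a plain `np.array` input (instead of numpy.char.array) are assumptions made for illustration only. It calls `dict.get` once per element in Python, which is why it tends to be slow on huge arrays.

```python
import numpy as np

translation_dict = {'ac': '2', 'ag': '3', 'at': '4',
                    'ca': '5', 'cg': '6', 'ct': '7',
                    'ga': '8', 'gc': '9', 'gt': 'a',
                    'ta': 'b', 'tc': 'c', 'tg': 'd'}

# Wrap the dictionary lookup so it is applied element by element.
# This is still a Python-level loop under the hood, not a compiled one.
lookup = np.vectorize(translation_dict.get)

# Plain ndarray of 2-character strings (an assumption; the question uses
# numpy.char.array).
input_array = np.array(['ca', 'ca', 'gc', 'tc', 'at', 'gt', 'ag', 'ct'])

# Every key here is present in the dictionary, so each lookup returns a
# 1-character string.
output_array = lookup(input_array)
```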
For such search operations, NumPy has np.searchsorted, so allow me to suggest an approach based on it -
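import numpy as np

def search_dic(dic, search_keys):
    # Extract the keys and values as arrays (list() is needed on Python 3,
    # where dict views are not sequences)
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))

    # Use searchsorted to locate each search key among the sorted keys.
    # This assumes every search key is present in dic.
    sidx = np.argsort(k)
    idx = np.searchsorted(k, search_keys, sorter=sidx)

    # Map back to the original key order and extract the corresponding values
    return np.take(v, sidx[idx])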
Sample run -
In [46]: translation_dict = {'ac': '2', 'ag': '3', 'at': '4',
    ...:                     'ca': '5', 'cg': '6', 'ct': '7',
    ...:                     'ga': '8', 'gc': '9', 'gt': 'a',
    ...:                     'ta': 'b', 'tc': 'c', 'tg': 'd'}

In [47]: s = np.char.array(['ca', 'ca', 'gc', 'tc', 'at', 'gt', 'ag', 'ct'])

In [48]: search_dic(translation_dict, s)
Out[48]: array(['5', '5', '9', 'c', '4', 'a', '3', '7'], dtype='|S1')
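As a possible alternative for very large inputs, and not part of the searchsorted approach above, each 2-byte string can be reinterpreted as a single 16-bit integer and used as an index into a precomputed 65536-entry table. The helper names `build_lut` and `translate` are hypothetical, and the sketch assumes every key is exactly 2 ASCII bytes and every value is 1 ASCII byte. Keys that are not in the dictionary come out as empty byte strings rather than raising an error.

```python
import numpy as np

translation_dict = {'ac': '2', 'ag': '3', 'at': '4',
                    'ca': '5', 'cg': '6', 'ct': '7',
                    'ga': '8', 'gc': '9', 'gt': 'a',
                    'ta': 'b', 'tc': 'c', 'tg': 'd'}

def build_lut(dic):
    # One slot per possible 16-bit value; 0 marks "no translation".
    lut = np.zeros(65536, dtype=np.uint8)
    for key, value in dic.items():
        # Interpret the 2 key bytes as a native-endian uint16 index.
        code = np.frombuffer(key.encode('ascii'), dtype=np.uint16)[0]
        lut[code] = ord(value)
    return lut

def translate(lut, keys):
    # Ensure a contiguous array of 2-byte strings, then view each item
    # as one uint16 (same native byte order as in build_lut).
    keys = np.ascontiguousarray(keys, dtype='S2')
    codes = keys.view(np.uint16)
    # Fancy-index the table and view the resulting bytes as 1-byte strings.
    return lut[codes].view('S1')

lut = build_lut(translation_dict)
s = np.array(['ca', 'ca', 'gc', 'tc', 'at', 'gt', 'ag', 'ct'])
result = translate(lut, s)
```

The table is built once with a short Python loop over the 12 keys; the per-element work is then a single integer-indexing step, with no sorting or searching involved.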