dataframe - Create internal python loop based on index and groupings for all combinations -


i have script looks @ rows , columns headers belonging group (reg_id) , sums values. code runs on matrix (small subset) follows:

outputs

my code runs calculating sum ids based on rows , columns belonging each internal group (reg_id). example row , column ids belong reg_id 1 summed total flows between region 1 , region 1 (internal flows) calculated , on each region. wish extend code calculating (summing) flows between regions example region 1 region 2, 3, 4 ,5.... figure need include loop within existing while loop appreciate figure out should , how construct it. code runs on internal flow sum (1-1, 2-2, 3-3 etc) follows:

global index index = 1 x = index while index < len(idgroups):     ward_list = idgroups[index] #select list of ward ids each region list of lists     df6 = mergedcsv.loc[ward_list] #select rows values in list     dfcols = mergedcsv.loc[ward_list, :] #select columns values in list     ward_liststr = map(str, ward_list) #convert ward_list strings can used select columns, won't work integers.     ward_listint = map(int, ward_list)     #dfrowscols = mergedcsv.loc[ward_list, ward_listint]     df7 = df6.loc[:, ward_liststr]     print df7     regflowsum = df7.values.sum() #sum values in dataframe     intflow = [regflowsum]     print intflow     dfintflow = pd.dataframe(intflow)     dfintflow.reset_index(level=0, inplace=true)     dfintflow.columns = ["regid", "regflowsum"]     dfflows.set_value(index, 'regid', index)     dfflows.set_value(index, 'regid2', index)     dfflows.set_value(index, 'regflow', regflowsum)     mergedcsv.set_value(ward_list, 'totregflows', regflowsum)     index += 1 #increment index number print dfflows new_df = pd.merge(pairlist, dfflows,  how='left', left_on=['origid','destid'], right_on = ['regid', 'regid2']) print new_df #useful checking dataframe merges regionflows = r"c:\temp\allni\regionflows.csv" header = ["wardid","label","reg_id","total","totregflows"] mergedcsv.to_csv(regionflows, columns = header, index=false) regregflows = r"c:\temp\allni\reg_regflows.csv" headerreg = ["reg_id_orig", "reg_id_dest", "flow"]  pairlistcsv = r"c:\temp\allni\pairlist_regions.csv" new_df.to_csv(pairlistcsv) 

the output follows:

idgroups dataframe: (see image 1 - second part of image 1)

df7 , intflows each region reg_id:(third part of image 1 - on right)

ddflows dataframe:(fourth part of image 2)

and final output new_df:(fifth part of image 2)

i wish populate sums possible combinations of flows between regions not internal.

i figure need add loop while loop. possibly add enumerate function like:

while index < len(idgroups):     #add line(s) calculate flows between regions     index, item in enumerate(idgroups):         ward_list = idgroups[index]         print ward_list         df6 = mergedcsv.loc[ward_list] #select rows values in list         dfcols = mergedcsv.loc[ward_list, :] #select columns values in list         ward_liststr = map(str, ward_list) #convert ward_list strings can used select columns, won't work integers.         ward_listint = map(int, ward_list)         #dfrowscols = mergedcsv.loc[ward_list, ward_listint]         df7 = df6.loc[:, ward_liststr]         print df7         regflowsum = df7.values.sum() #sum values in dataframe         intflow = [regflowsum]         print intflow         dfintflow = pd.dataframe(intflow)         dfintflow.reset_index(level=0, inplace=true)         dfintflow.columns = ["regid", "regflowsum"]         dfflows.set_value(index, 'regid', index)         dfflows.set_value(index, 'regid2', index)         dfflows.set_value(index, 'regflow', regflowsum)         mergedcsv.set_value(ward_list, 'totregflows', regflowsum)         index += 1 #increment index number 

i'm unsure how integrate item struggling extend code combinations. advice appreciated.

update based on flows function:

    w=pysal.rook_from_shapefile("c:/temp/allni/niw01_sort.shp",idvariable='label') simil = pysal.open("c:/temp/allni/simni.csv") similarity = np.array(simil) db = pysal.open('c:\temp\sqlite\matrixcsv2.csv', 'r') dbf = pysal.open(r'c:\temp\allni\niw01_sortc.dbf', 'r') ids = np.array((dbf.by_col['label'])) commuters = np.array((dbf.by_col['total'],dbf.by_col['idno'])) commutersint = commuters.astype(int) comm = commutersint[0] floor = int(min_com_ct + 100) solution = pysal.region.maxp(w=w,z=similarity,floor=floor,floor_variable=comm) regions = solution.regions #print regions writecsv = r"c:\temp\allni\reg_output.csv" csv = open(writecsv,'w') csv.write('"label","reg_id"\n') in range(len(regions)):         lines in regions[i]:             csv.write('"' + lines + '","' + str(i+1) + '"\n') csv.close() flows = r"c:\temp\sqlite\matrixcsv2.csv" regs = r"c:\temp\allni\reg_output.csv" wardflows = pd.read_csv(flows) regoutput = pd.read_csv(regs) merged = pd.merge(wardflows, regoutput) #duplicate reg_id column index used later merged['reg_id2'] = merged['reg_id'] merged.to_csv("c:\temp\allni\merged.csv", index=false) mergedcsv = pd.read_csv("c:\temp\allni\merged.csv",index_col='wardid_1') #index dataframe using wardid_1 column flabellist = pd.read_csv("c:\temp\allni\merged.csv", usecols = ["wardid", "reg_id"]) #create list of flabel values  reg_id = "reg_id" ward_flows = "regintflows" flds = [reg_id, ward_flows] #create list of fields use in search  dict_ref = {} # create dictionary each reg_id list of corresponding flabel fields   #group dataframe reg_id column idgroups = flabellist.groupby('reg_id')['wardid'].apply(lambda x: x.tolist()) print idgroups  idgrp_df = pd.dataframe(idgroups)  csvcols = mergedcsv.columns  #create list of column names pass index select columns columnlist = list(mergedcsv.columns.values)  mergedcsvgroup = mergedcsv.groupby('reg_id').sum() mergedcsvgroup.describe() idlist = idgroups[2] df4 = pd.dataframe() df5 = pd.dataframe() col_ids = idlist #ward id no  regiddf = idgroups.index.get_values() print regiddf #total number of region ids #print regiddf #create pairlist combinations region ids #combinations replacement allows repeated items #pairs = list(itertools.combinations_with_replacement(regiddf, 2)) pairs = list(itertools.product(regiddf, repeat=2)) #print len(pairs)  #create new dataframe pairlists , summed data pairlist = pd.dataframe(pairs,columns=['origid','destid']) print pairlist.tail() header_pairlist = ["origid","destid","flow"] header_intflow = ["regid", "regid2", "regflow"] dfflows = pd.dataframe(columns=header_intflow)  print mergedcsv.index print mergedcsv.dtypes #mergedcsv = mergedcsv.select_dtypes(include=['int64']) #print mergedcsv.columns #mergedcsv.rename(columns = lambda x: int(x), inplace=true)  def flows():     pass  #def flows(mergedcsv, region_a, region_b): def flows(mergedcsv, ward_lista, ward_listb):     """return sum of cells in row/column intersections     of ward_lista , ward_listb."""      mergedcsv = mergedcsv.loc[:, mergedcsv.dtypes == 'int64']     regionflows = mergedcsv.loc[ward_lista, ward_listb]     regionflowsum = regionflows.values.sum()       #grid = [ax, bx, regflowsuma, regflowsumb]     gridoutput = [ax, bx, regionflowsum]     print gridoutput      return regflowsuma     return regflowsumb  #print mergedcsv.index  #mergedcsv.columns = mergedcsv.columns.str.strip()  ax, group_a in enumerate(idgroups):     ward_lista = map(int, group_a)     print ward_lista       bx, group_b in enumerate(idgroups[ax:], start=ax):         ward_listb = map(int, group_b)         #print ward_listb          flow_ab = flows(mergedcsv, ward_lista, ward_listb)             #flow_ab = flows(mergedcsv, group_a, group_b) 

this results in keyerror: 'none of [[189, 197, 198, 201]] in [columns]'

i have tried using ward_lista = map(str, group_a) , map(int, group_a) list objects not found in dataframe.loc. columns mixed datatypes columns containing labels should sliced of type int64. have tried many solutions around datatypes no avail. suggestions?

i can't speak computations you're doing, seems want arrange combinations of groups. question whether directed or undirected- is, need compute flows(a,b) , flows(b,a), or one?

if one, this:

for i,ward_list in enumerate(idgroups):     j,ward_list2 in enumerate(idgroups[i:],start=i): 

this iterate on i,j pairs like:

0,0 0,1 0,2 ... 0,n 1,1 1,2 ... 1,n 2,2 ... 2,n 

which serve in undirected case.

if need compute both flows(a,b) , flows(b,a), can push code function called flows, , call reversed args, shown. ;-)

update

let's define function called flows:

def flows():     pass 

now, parameters?

well, looking @ code, gets data dataframe. , want 2 different wards, let's start those. result seems sum of resulting grid.

def flows(df, ward_a, ward_b):     """return sum of cells in row/column intersections     of ward_a , ward_b."""      return 0 

now i'm going copy lines of code:

    ward_list = idgroups[index]     print ward_list     df6 = mergedcsv.loc[ward_list] #select rows values in list     dfcols = mergedcsv.loc[ward_list, :] #select columns values in list     ward_liststr = map(str, ward_list) #convert ward_list strings can used select columns, won't work integers.     ward_listint = map(int, ward_list)     #dfrowscols = mergedcsv.loc[ward_list, ward_listint]     df7 = df6.loc[:, ward_liststr]     print df7     regflowsum = df7.values.sum() #sum values in dataframe     intflow = [regflowsum]     print intflow 

i think of flow function right here. let's look.

  1. the ward_list either ward_a or ward_b parameters.

  2. i'm not sure df6 is, because sort of recompute in df7. need clarified.

  3. regflowsum our desired output, think.

rewriting function:

def flows(df, ward_a, ward_b):     """return sum of cells in row/column intersections     of ward_a , ward_b."""      print "computing flows from:"     print "    ", ward_a     print ""     print "flows into:"     print "    ", ward_b      # filter rows ward_a, cols ward_b:     grid = df.loc[ward_a, ward_b]      print "grid:"     print grid      flowsum = grid.values.sum()      print "flows:", flowsum      return flowsum 

now, have assumed ward_a , ward_b values in correct format. we'll have str-ify them or whatever outside function. let's that:

for ax, group_a in enumerate(idgroups):     ward_a = map(str, group_a)      bx, group_b in enumerate(idgroups[ax:], start=ax):         ward_b = map(str, group_b)          flow_ab = flows(mergedcsv, ward_a, ward_b)          if ax != bx:             flow_ba = flows(mergedcsv, ward_b, ward_a)         else:             flow_ba = flow_ab          # what? 

at point, have 2 numbers. equal when wards same (internal flow?). @ point original code stops being helpful because deals internal flows, , not a->b flows, don't know do. values in variables, ...


Comments

Popular posts from this blog

inversion of control - Autofac named registration constructor injection -

verilog - Systemverilog dynamic casting issues -

ios - Change Storyboard View using Seague -