python - Finding rows in a Pandas DataFrame with columns that violate a one-to-one mapping


I have a DataFrame that looks something like this:

| index | col_1 | col_2 |
|-------|-------|-------|
| 0     | a     | 11    |
| 1     | b     | 12    |
| 2     | b     | 12    |
| 3     | c     | 13    |
| 4     | c     | 13    |
| 5     | c     | 14    |

where col_1 and col_2 may not always be one-to-one because of corrupt data.

How can I use pandas to determine which rows have col_1 and col_2 entries that violate this one-to-one relationship?

In this case it would be the last three rows, since c can map to either 13 or 14.

You could use transform, counting the number of unique objects in each group. First take the subset of just these columns, and then group by a single column:

In [11]: g = df[['col_1', 'col_2']].groupby('col_1')

In [12]: counts = g.transform(lambda x: len(x.unique()))

In [13]: counts
Out[13]:
   col_2
0      1
1      1
2      1
3      2
4      2
5      2
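For anyone who wants to run this step outside an interactive session, here is a minimal sketch. The sample data is retyped from the question's table, with `a` assumed for col_1 in the first row, and it uses the built-in `"nunique"` aggregation in place of the lambda, which should give the same counts here.

```python
import pandas as pd

# Sample data retyped from the question ('a' in row 0 is an assumption).
df = pd.DataFrame({
    "col_1": ["a", "b", "b", "c", "c", "c"],
    "col_2": [11, 12, 12, 13, 13, 14],
})

# Count the distinct col_2 values inside each col_1 group and
# broadcast that count back onto every row of the group.
counts = df.groupby("col_1")["col_2"].transform("nunique")

print(counts.tolist())  # [1, 1, 1, 2, 2, 2]
```

Selecting the single column before calling `transform` returns a Series rather than a one-column DataFrame, which is a little easier to use as a mask later.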

Then check that the count is 1 in every remaining column; rows where it is not violate the one-to-one mapping:

In [14]: (counts == 1).all(axis=1)
Out[14]:
0     True
1     True
2     True
3    False
4    False
5    False
dtype: bool
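To actually pull out the offending rows, the mask can be inverted and used for boolean indexing. The sketch below is one possible way to do that, not part of the original answer. It also checks the reverse direction (one col_2 value mapping to several col_1 values), since a strict one-to-one relationship has to hold both ways. The helper name `violates_one_to_one` is made up for illustration, and the sample data again assumes `a` in the first row.

```python
import pandas as pd

# Sample data retyped from the question ('a' in row 0 is an assumption).
df = pd.DataFrame({
    "col_1": ["a", "b", "b", "c", "c", "c"],
    "col_2": [11, 12, 12, 13, 13, 14],
})


def violates_one_to_one(frame, left, right):
    """Hypothetical helper: boolean Series marking rows that break the mapping."""
    # A `left` value that is paired with more than one distinct `right` value.
    left_to_right = frame.groupby(left)[right].transform("nunique") > 1
    # A `right` value that is paired with more than one distinct `left` value.
    right_to_left = frame.groupby(right)[left].transform("nunique") > 1
    return left_to_right | right_to_left


bad_rows = df[violates_one_to_one(df, "col_1", "col_2")]
print(bad_rows.index.tolist())  # [3, 4, 5]
```

If only the col_1 to col_2 direction matters, dropping the `right_to_left` term gives the same behaviour as the check above.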
