python - Finding rows in a Pandas DataFrame with columns that violate a one-to-one mapping
I have a DataFrame that looks something like this:
| index | col_1 | col_2 |
| 0     | a     | 11    |
| 1     | b     | 12    |
| 2     | b     | 12    |
| 3     | c     | 13    |
| 4     | c     | 13    |
| 5     | c     | 14    |
where col_1 and col_2 should map one-to-one, but may not because of corrupt data.
How can I use Pandas to find the rows whose col_1 and col_2 entries violate the one-to-one relationship?

In this case that would be the last three rows, since c maps to either 13 or 14.
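To make the example reproducible, the frame from the table can be built directly. This is a minimal sketch; it assumes row 0 holds "a" in col_1, following the pattern of the other rows.

```python
import pandas as pd

# Example data from the question: col_1 should map one-to-one onto col_2,
# but "c" appears with both 13 and 14.
df = pd.DataFrame(
    {
        "col_1": ["a", "b", "b", "c", "c", "c"],
        "col_2": [11, 12, 12, 13, 13, 14],
    }
)
print(df)
```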
You could use transform, counting the number of unique values in each group. First take the subset of just these two columns, then groupby a single column:
    In [11]: g = df[['col_1', 'col_2']].groupby('col_1')

    In [12]: counts = g.transform(lambda x: len(x.unique()))

    In [13]: counts
    Out[13]:
       col_2
    0      1
    1      1
    2      1
    3      2
    4      2
    5      2
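The same per-row counts can be computed a little more directly by selecting the single column after grouping and passing the built-in "nunique" reduction to transform. This is a sketch assuming a reasonably recent pandas, where SeriesGroupBy.transform accepts the name of a groupby method as a string.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col_1": ["a", "b", "b", "c", "c", "c"],
        "col_2": [11, 12, 12, 13, 13, 14],
    }
)

# For each row, count how many distinct col_2 values share that row's col_1 value.
# transform broadcasts each group's result back onto the original row index.
counts = df.groupby("col_1")["col_2"].transform("nunique")
print(counts.tolist())  # [1, 1, 1, 2, 2, 2]
```

Because the result is a Series aligned with df, it can be used as a boolean mask directly, for example df[counts > 1].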
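Then a row is consistent with the mapping if its count is 1 in every remaining column (only col_2 here, but the same check works if you grouped more columns):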
    In [14]: (counts == 1).all(axis=1)
    Out[14]:
    0     True
    1     True
    2     True
    3    False
    4    False
    5    False
    dtype: bool
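The mask above is True for rows that are fine, so the violating rows are its negation, e.g. df[~(counts == 1).all(axis=1)]. Also note that a strict one-to-one mapping has to hold in both directions: counting col_2 values per col_1 catches one key mapping to several values, but not several keys sharing one value. A sketch that checks both directions is below; the helper name one_to_one_violations is hypothetical, not a pandas API.

```python
import pandas as pd


def one_to_one_violations(df: pd.DataFrame, left: str, right: str) -> pd.DataFrame:
    """Return the rows whose (left, right) pair breaks a one-to-one mapping.

    Hypothetical helper. A row is flagged if its `left` value appears with more
    than one distinct `right` value, or its `right` value appears with more than
    one distinct `left` value.
    """
    left_to_many = df.groupby(left)[right].transform("nunique") > 1
    right_to_many = df.groupby(right)[left].transform("nunique") > 1
    return df[left_to_many | right_to_many]


df = pd.DataFrame(
    {
        "col_1": ["a", "b", "b", "c", "c", "c"],
        "col_2": [11, 12, 12, 13, 13, 14],
    }
)
print(one_to_one_violations(df, "col_1", "col_2"))  # rows 3, 4 and 5

# Reverse-direction case: "a" and "b" both map to 11, which the
# single-direction count per col_1 would not flag.
df2 = pd.DataFrame({"col_1": ["a", "b", "c"], "col_2": [11, 11, 13]})
print(one_to_one_violations(df2, "col_1", "col_2"))  # rows 0 and 1
```

If only the col_1 to col_2 direction matters (each col_1 must determine a single col_2), dropping the right_to_many term gives the same result as the answer's check.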