pandas - Handling duplicate rows in Python
I have a data frame df with, let's say, 5 columns: a, b, c, d, e.
   a  b  c  d  e
   1  6  x  8  3
   2  3  y  2  3
   3  5  d  1  1
   3  4  g  3  4
   5  3  z  3  1
This is what I want to do: for rows with the same value in column a, I want to drop the duplicates, but the value of column b should be summed across those rows, and for the rest of the columns I want to keep the first value.
The final data frame should be:
   a  b  c  d  e
   1  6  x  8  3
   2  3  y  2  3
   3  9  d  1  1
   5  3  z  3  1
How do I do this?
I'd assign to column 'b' the result of grouping on 'a' and summing; then you can drop the duplicates:
In [171]: df['b'] = df.groupby('a')['b'].transform('sum')
          df
Out[171]:
   a  b  c  d  e
0  1  6  x  8  3
1  2  3  y  2  3
2  3  9  d  1  1
3  3  9  g  3  4
4  5  3  z  3  1

In [172]: df.drop_duplicates('a')
Out[172]:
   a  b  c  d  e
0  1  6  x  8  3
1  2  3  y  2  3
2  3  9  d  1  1
4  5  3  z  3  1
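For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the example frame from the question, applies the transform-then-drop approach without overwriting df, and also shows a single groupby/agg call as one possible alternative. The variable names `summed` and `agg` are made up for illustration, and the alternative is not part of the approach above.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    "a": [1, 2, 3, 3, 5],
    "b": [6, 3, 5, 4, 3],
    "c": ["x", "y", "d", "g", "z"],
    "d": [8, 2, 1, 3, 3],
    "e": [3, 3, 1, 4, 1],
})

# Transform-then-drop: replace b with its per-group sum, then keep
# the first row of each group. assign() leaves the original df untouched.
summed = (
    df.assign(b=df.groupby("a")["b"].transform("sum"))
      .drop_duplicates("a")
)

# Possible alternative (an assumption, not from the answer above):
# aggregate each column explicitly in one pass.
agg = df.groupby("a", as_index=False).agg(
    {"b": "sum", "c": "first", "d": "first", "e": "first"}
)

print(summed)
print(agg)
```

One difference to keep in mind: the "first" aggregation takes the first non-null value in each group, while drop_duplicates keeps the whole first row as is, so the two can disagree when the data contains missing values. The agg version also returns a fresh 0..n-1 index, whereas drop_duplicates keeps the original row labels.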