pandas - Handling duplicate rows in Python

I have a data frame df with, let's say, 5 columns: a, b, c, d, e.

    a   b   c    d    e
    1   6   x    8    3
    2   3   y    2    3
    3   5   d    1    1
    3   4   g    3    4
    5   3   z    3    1

This is what I want to do: for rows with the same value in column a, I want to drop the duplicates. The value of column b should be summed across those rows, and for the rest of the columns I want to keep the first value.

The final data frame should be:

    a   b   c    d    e
    1   6   x    8    3
    2   3   y    2    3
    3   9   d    1    1
    5   3   z    3    1

How can I do this?

I'd assign to column 'b' the result of grouping on 'a' and summing; then you can drop the duplicates, which keeps the first row for each value of 'a':

    In [171]:
    df['b'] = df.groupby('a')['b'].transform('sum')
    df

    Out[171]:
       a  b  c  d  e
    0  1  6  x  8  3
    1  2  3  y  2  3
    2  3  9  d  1  1
    3  3  9  g  3  4
    4  5  3  z  3  1

    In [172]:
    df.drop_duplicates('a')

    Out[172]:
       a  b  c  d  e
    0  1  6  x  8  3
    1  2  3  y  2  3
    2  3  9  d  1  1
    4  5  3  z  3  1
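
As a possible variation on the same idea, the sum and the "keep the first value" rule can be expressed in a single `groupby(...).agg(...)` call, which leaves the original `df` unmodified. This is only a sketch: the data frame is rebuilt from the sample data in the question, and the names `agg_spec` and `result` are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question (columns a..e).
df = pd.DataFrame({
    'a': [1, 2, 3, 3, 5],
    'b': [6, 3, 5, 4, 3],
    'c': ['x', 'y', 'd', 'g', 'z'],
    'd': [8, 2, 1, 3, 3],
    'e': [3, 3, 1, 4, 1],
})

# Keep the first value of every non-key column, except 'b', which is summed.
# (agg_spec is a hypothetical helper name used only for this example.)
agg_spec = {col: 'first' for col in df.columns if col != 'a'}
agg_spec['b'] = 'sum'

# as_index=False keeps 'a' as a regular column instead of the index.
result = df.groupby('a', as_index=False).agg(agg_spec)
print(result)
```

Note that `groupby` sorts the result by the key 'a' by default and produces a fresh 0-based index, whereas `drop_duplicates` keeps the original row order and index labels. Also, 'first' in `agg` takes the first non-null value in each group, which matches `drop_duplicates` here only because the sample data has no missing values.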
