python - pandas: concatenate dataframes, forward-fill and multiindex on column data -


i have 2 csv files same column names, different values.

the first column index (time) , 1 of data columns unique identifier (id)

the index (time) different each csv file.

i have read data 2 dataframes using read_csv, giving me following:

        +-------+------+-------+         | id    | size | price | +-------+-------+------+-------+ | time  |       |      |       | +-------+-------+------+-------+ | t0    | id1   | 10   | 110   | | t2    | id1   | 12   | 109   | | t6    | id1   | 20   | 108   | +-------+-------+------+-------+          +-------+------+-------+         | id    | size | price | +-------+-------+------+-------+ | time  |       |      |       | +-------+-------+------+-------+ | t1    | id2   |  9   |  97   | | t3    | id2   | 15   |  94   | | t5    | id2   | 13   | 100   | +-------+-------+------+-------+ 

i create single large dataframe entries both, , use ffill forward fill values previous time-step.

i able achieve using combination of concat, sort , ffill.

however, requires renaming columns of 1 of dataframes first, there aren't name clashes

df2.columns = [ 'id', 'id2_size', 'id2_price' ] df = pd.concat([df1, df2]).sort().ffill() 

this results in following dataframe:

        +------+------+-------+----------+-----------+         | id   | size | price | id2_size | id2_price | +-------+------+------+-------+----------+-----------+ | time  |      |      |       |          |           | +-------+------+------+-------+----------+-----------+ | t0    | id1  | 10   | 110   |     nan  |     nan   | | t1    | id2  | 10   | 110   |      9   |      97   | | t2    | id1  | 12   | 109   |      9   |      97   | | t3    | id2  | 12   | 109   |     15   |      94   | | t5    | id2  | 12   | 109   |     13   |     100   | | t6    | id1  | 20   | 108   |     13   |     100   | +-------+------+------+-------+----------+-----------+ 

my current method klunky in have rename columns of 1 of dataframes.

i believe better way represent data use multiindex 2nd dimension's value coming id column.

the resulting dataframe this:

        +--------------+--------------+         | id1          | id2          |         +------+-------+------+-------+         | size | price | size | price | +-------+------+-------+------+-------+ | time  |      |       |      |       | +-------+------+-------+------+-------+ | t0    | 10   | 110   | nan  | nan   | | t1    | 10   | 110   |  9   |  97   | | t2    | 12   | 109   |  9   |  97   | | t3    | 12   | 109   | 15   |  94   | | t5    | 12   | 109   | 13   | 100   | | t6    | 20   | 108   | 13   | 100   | +-------+------+-------+------+-------+ 

is possible?
if so, steps required go 2 dataframes read csv, final merged multiindexed dataframe?

here's one-liner ask, although it's bit convoluted in terms of stacking/unstacking:

df1.append(df2).set_index(['time','id']).sort().stack().unstack(level=[1,2]).ffill()  id    id1        id2            size price size price time                       t0     10   110  nan   nan t1     10   110    9    97 t2     12   109    9    97 t3     12   109   15    94 t5     12   109   13   100 t6     20   108   13   100 

fwiw, default approach have been following, little more straightforward (less stacking/unstacking) , give same basic results, different column organization:

df1.append(df2).set_index(['time','id']).sort().unstack().ffill()       size     price      id    id1 id2   id1  id2 time                     t0     10 nan   110  nan t1     10   9   110   97 t2     12   9   109   97 t3     12  15   109   94 t5     12  13   109  100 t6     20  13   108  100 

and along lines, add swaplevel , sort columns reorganized in first approach:

df1.append(df2).set_index(['time','id']).sort().unstack().ffill().swaplevel(0,1,axis=1).sort(axis=1) 

Comments

Popular posts from this blog

apache - PHP Soap issue while content length is larger -

asynchronous - Python asyncio task got bad yield -

javascript - Complete OpenIDConnect auth when requesting via Ajax -