Friday, July 22, 2022

[FIXED] In pandas, how to reindex(fill 0) in level 2 in multiindex

July 22, 2022 dataframe, missing-data, multi-index, pandas, reindex No comments

Issue

I have a dataframe pivot table with 2 level of index: month and rating. The rating should be 1,2,3 (not to be confused with the columns 1,2,3). I found that for some months, the rating could be missing. E.g, (Population and 2021-10) only has rating 1,2. I need every month to have ratings 1,2,3. So I need to fill 0 for the missing rating index.

        tbl = pd.pivot_table(self.df, values=['ID'], index=['month', 'risk'],
                             columns=["Factor"], aggfunc='count', fill_value=0)
        tbl = tbl.droplevel(None, axis=1).rename_axis(None, axis=1).rename_axis(index={'month': None,
                                                                                       'Risk': 'Client Risk Rating'})

# show Low for rating 1, Moderate for rating 2, Potential High for rating 3
    rating = {1: 'Low',
              2: 'Moderate',
              3: 'Potential High'
              }
    pop = {'N': 'Refreshed Clients', 'Y': 'Population'}
        tbl.rename(index={**rating,**pop}, inplace=True)
        tbl = tbl.applymap(lambda x: x.replace(',', '')).astype(np.int64)
        tbl = tbl.div(tbl.sum(axis=1), axis=0)
        # client risk rating may be missing (e.g., only 1,2). 
        # To draw, need to fill the missing client risk rating with 0 
        print("before",tbl)
        tbl=tbl.reindex(pd.MultiIndex.from_product(tbl.index.levels), fill_value=0)
        print("after pd.MultiIndex.from_product",tbl)

I have used pd.MultiIndex.from_product. It does not work when all data is missing one index. For example, population has Moderate, 2021-03 and 2021-04 have Low and Moderate. After pd.MultiIndex.from_product, population has Low and Moderate, but all are missing High. My question is to have every month with risk 1,2,3. It seems the index values are from data.

Solution

You can use pd.MultiIndex.from_product to create a full index:

>>> df
                             1         2         3
(Population)       1  0.436954  0.897747  0.387058
                   2  0.464940  0.611953  0.133941
2021-08(Refreshed) 1  0.496111  0.282798  0.048384
                   2  0.163582  0.213310  0.504647
                   3  0.008980  0.651175  0.400103

>>> df.reindex(pd.MultiIndex.from_product(df.index.levels), fill_value=0)
                             1         2         3
(Population)       1  0.436954  0.897747  0.387058
                   2  0.464940  0.611953  0.133941
                   3  0.000000  0.000000  0.000000  # New record
2021-08(Refreshed) 1  0.496111  0.282798  0.048384
                   2  0.163582  0.213310  0.504647
                   3  0.008980  0.651175  0.400103

Update

I wonder df=df.reindex([1,2,3],level='rating',fill_value=0) doesn't work because the new index values [1,2,3] cannot fill the missing values for the previous rating index. By using the from_product, it creates the product of two index.

In fact it works. I mean it has an effect but not the one you expect. The method reindex the level not the values. Let me show you:

# It seems there is not effect because you don't see 3 and 4 as expected?
>>> df.reindex([1, 2, 3, 4], level='ratings')
                                   0         1         2
                   ratings                              
(Population)       1        0.536154  0.671380  0.839362
                   2        0.729484  0.512379  0.440018
2021-08(Refreshed) 1        0.279990  0.295757  0.405536
                   2        0.864217  0.798092  0.144219
                   3        0.214566  0.407581  0.736905

# But yes something happens
>>> df.reindex([1, 2, 3, 4], level='ratings').index.levels
FrozenList([['(Population)', '2021-08(Refreshed)'], [1, 2, 3, 4]])
                              The level has been reindexed ---^

# It's different from values
>>> df.reindex([1, 2, 3, 4], level='ratings').index.get_level_values('ratings')
Int64Index([1, 2, 1, 2, 3], dtype='int64', name='ratings')

Answered By - Corralien

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Friday, July 22, 2022

[FIXED] In pandas, how to reindex(fill 0) in level 2 in multiindex

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels