Issue
I have a dataframe pivot table with 2 level of index: month and rating. The rating should be 1,2,3 (not to be confused with the columns 1,2,3). I found that for some months, the rating could be missing. E.g, (Population and 2021-10) only has rating 1,2. I need every month to have ratings 1,2,3. So I need to fill 0 for the missing rating index.
tbl = pd.pivot_table(self.df, values=['ID'], index=['month', 'risk'],
columns=["Factor"], aggfunc='count', fill_value=0)
tbl = tbl.droplevel(None, axis=1).rename_axis(None, axis=1).rename_axis(index={'month': None,
'Risk': 'Client Risk Rating'})
# show Low for rating 1, Moderate for rating 2, Potential High for rating 3
rating = {1: 'Low',
2: 'Moderate',
3: 'Potential High'
}
pop = {'N': 'Refreshed Clients', 'Y': 'Population'}
tbl.rename(index={**rating,**pop}, inplace=True)
tbl = tbl.applymap(lambda x: x.replace(',', '')).astype(np.int64)
tbl = tbl.div(tbl.sum(axis=1), axis=0)
# client risk rating may be missing (e.g., only 1,2).
# To draw, need to fill the missing client risk rating with 0
print("before",tbl)
tbl=tbl.reindex(pd.MultiIndex.from_product(tbl.index.levels), fill_value=0)
print("after pd.MultiIndex.from_product",tbl)
I have used pd.MultiIndex.from_product. It does not work when all data is missing one index. For example, population has Moderate, 2021-03 and 2021-04 have Low and Moderate. After pd.MultiIndex.from_product, population has Low and Moderate, but all are missing High. My question is to have every month with risk 1,2,3. It seems the index values are from data.
Solution
You can use pd.MultiIndex.from_product
to create a full index:
>>> df
1 2 3
(Population) 1 0.436954 0.897747 0.387058
2 0.464940 0.611953 0.133941
2021-08(Refreshed) 1 0.496111 0.282798 0.048384
2 0.163582 0.213310 0.504647
3 0.008980 0.651175 0.400103
>>> df.reindex(pd.MultiIndex.from_product(df.index.levels), fill_value=0)
1 2 3
(Population) 1 0.436954 0.897747 0.387058
2 0.464940 0.611953 0.133941
3 0.000000 0.000000 0.000000 # New record
2021-08(Refreshed) 1 0.496111 0.282798 0.048384
2 0.163582 0.213310 0.504647
3 0.008980 0.651175 0.400103
Update
I wonder
df=df.reindex([1,2,3],level='rating',fill_value=0)
doesn't work because the new index values [1,2,3] cannot fill the missing values for the previous rating index. By using the from_product, it creates the product of two index.
In fact it works. I mean it has an effect but not the one you expect. The method reindex the level not the values. Let me show you:
# It seems there is not effect because you don't see 3 and 4 as expected?
>>> df.reindex([1, 2, 3, 4], level='ratings')
0 1 2
ratings
(Population) 1 0.536154 0.671380 0.839362
2 0.729484 0.512379 0.440018
2021-08(Refreshed) 1 0.279990 0.295757 0.405536
2 0.864217 0.798092 0.144219
3 0.214566 0.407581 0.736905
# But yes something happens
>>> df.reindex([1, 2, 3, 4], level='ratings').index.levels
FrozenList([['(Population)', '2021-08(Refreshed)'], [1, 2, 3, 4]])
The level has been reindexed ---^
# It's different from values
>>> df.reindex([1, 2, 3, 4], level='ratings').index.get_level_values('ratings')
Int64Index([1, 2, 1, 2, 3], dtype='int64', name='ratings')
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.