Issue
Dataframe:
product1 | product2 | product3 | product4 | product5 |
---|---|---|---|---|
straws | orange | melon | chair | bread |
melon | milk | book | coffee | cake |
bread | melon | coffe | chair | book |
CountProduct1 | CountProduct2 | CountProduct3 | CountProduct4 | CountProduct5 |
---|---|---|---|---|
1 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 1 | 1 |
2 | 3 | 2 | 2 | 2 |
RatioProduct1 | RatioProduct2 | RatioProduct3 | RatioProduct4 | RatioProduct5 |
---|---|---|---|---|
0.28 | 0.54 | 0.33 | 0.35 | 0.11 |
0.67 | 0.25 | 0.13 | 0.11 | 0.59 |
2.5 | 1.69 | 1.9 | 2.5 | 1.52 |
I want to create five other columns that carry each item's initial ratio, i.e. the ratio from the item's first appearance, through the rest of the dataframe.
Output:
InitialRatio1 | InitialRatio2 | InitialRatio3 | InitialRatio4 | InitialRatio5 |
---|---|---|---|---|
0.28 | 0.54 | 0.33 | 0.35 | 0.11 |
0.33 | 0.25 | 0.13 | 0.31 | 0.59 |
0.11 | 0.33 | 0.31 | 0.35 | 0.13 |
Solution
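Check your data again: product3 contains coffe while product4 contains coffee, which looks like a typo. I changed coffe to coffee in the code below. With that fix, the 0.31 in your expected output should not appear, because the first ratio recorded for coffee is 0.11.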
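A spelling mismatch like coffe/coffee can be spotted before any ratios are copied. The snippet below is only a sketch using the standard-library difflib module; the 0.85 similarity threshold and the names `names` and `suspicious` are my own assumptions, not part of the original answer.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Unique product names as they appear in the question (typo included).
names = sorted({'straws', 'orange', 'melon', 'chair', 'bread',
                'milk', 'book', 'coffee', 'cake', 'coffe'})

# Flag pairs of different names that are almost identical.
# The 0.85 threshold is an arbitrary choice for this illustration.
suspicious = [
    (a, b)
    for a, b in combinations(names, 2)
    if SequenceMatcher(None, a, b).ratio() >= 0.85
]

print(suspicious)
```

Any pair reported this way still has to be checked by hand, since two genuinely different products can have similar names.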
import pandas as pd
pd.set_option('display.max_rows', None) # show all rows when printing
pd.set_option('display.max_columns', None) # show all columns when printing
df = pd.DataFrame(
    {
        'product1': ['straws', 'melon', 'bread'],
        'product2': ['orange', 'milk', 'melon'],
        'product3': ['melon', 'book', 'coffee'],
        'product4': ['chair', 'coffee', 'chair'],
        'product5': ['bread', 'cake', 'book'],
        'time': [1, 2, 3],
        'Count1': [1, 2, 2],
        'Count2': [1, 1, 3],
        'Count3': [1, 1, 2],
        'Count4': [1, 1, 2],
        'Count5': [1, 1, 2],
        'ratio1': [0.28, 0.67, 2.5],
        'ratio2': [0.54, 0.25, 1.69],
        'ratio3': [0.33, 0.13, 1.9],
        'ratio4': [0.35, 0.11, 2.5],
        'ratio5': [0.11, 0.59, 1.52],
    })
print(df)
product = df[['product1', 'product2', 'product3', 'product4', 'product5']].stack().reset_index()
count = df[['Count1', 'Count2', 'Count3', 'Count4', 'Count5']].stack().reset_index()
ratio = df[['ratio1', 'ratio2', 'ratio3', 'ratio4', 'ratio5']].stack().reset_index()
print(ratio)
arr = pd.unique(product[0])
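# positions in arr of the products that occur more than once
aaa = [i for i in range(len(arr)) if (product[0] == arr[i]).sum() > 1]
for i in aaa:
    # row labels of every occurrence of this product
    prod_ind = product[product[0] == arr[i]].index
    # ratio at the first occurrence
    val_ratio = ratio.loc[prod_ind[0], 0]
    # copy that ratio to all occurrences
    ratio.loc[prod_ind, 0] = val_ratio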
print(ratio.pivot_table(index='level_0', columns='level_1', values=[0]))
Output:
level_1 ratio1 ratio2 ratio3 ratio4 ratio5
level_0
0 0.28 0.54 0.33 0.35 0.11
1 0.33 0.25 0.13 0.11 0.59
2 0.11 0.33 0.11 0.35 0.13
To work with the data, each group of columns is first flattened into a single column with stack().reset_index(). arr is the list of unique products. The list aaa then holds the positions in arr of the products that occur more than once.
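To see what the flattening step produces, here is a small sketch on a hypothetical two-row frame (`demo` and `long_form` are illustrative names, not from the answer). Because the product and ratio columns are stacked in the same order, a row label in the stacked product frame points at the matching row in the stacked ratio frame.

```python
import pandas as pd

# Small hypothetical frame with two of the product columns.
demo = pd.DataFrame({
    'product1': ['straws', 'melon'],
    'product2': ['orange', 'milk'],
})

# stack() moves the column labels into an inner index level;
# reset_index() then turns both index levels into ordinary columns.
long_form = demo.stack().reset_index()

# Columns: 'level_0' (original row), 'level_1' (original column), 0 (value).
print(long_form)
```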
prod_ind = product[product[0] == arr[i]].index
Inside the loop, this gets the row indexes of every occurrence of a product that appears more than once.
val_ratio = ratio.loc[prod_ind[0], 0]
This reads the ratio at the product's first occurrence.
ratio.loc[prod_ind, 0] = val_ratio
This writes that value to every occurrence of the product. The values are accessed with explicit loc indexing, where the row labels come first inside the square brackets and the column label comes second.
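The read-then-write pattern with loc can be tried in isolation. This is a minimal sketch on made-up data shaped like the stacked ratio frame; `ratio_demo`, `rows` and `first_value` are names I introduced for the illustration.

```python
import pandas as pd

# Hypothetical long-form ratios, shaped like the stacked `ratio` frame.
ratio_demo = pd.DataFrame({
    'level_0': [0, 0, 1],
    'level_1': ['ratio1', 'ratio2', 'ratio1'],
    0: [0.28, 0.54, 0.67],
})

rows = [0, 2]                              # row labels that share one product
first_value = ratio_demo.loc[rows[0], 0]   # scalar read: row label 0, column label 0
ratio_demo.loc[rows, 0] = first_value      # write it to every listed row

print(ratio_demo[0].tolist())  # [0.28, 0.54, 0.28]
```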
pivot_table turns the result back into the original wide layout. To insert the processed data into the original dataframe, use the following:
table = ratio.pivot_table(index='level_0', columns='level_1', values=[0])
df[['ratio1', 'ratio2', 'ratio3', 'ratio4', 'ratio5']] = table
print(df)
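If the goal is to keep the original ratio columns and add separate InitialRatio columns, as in the question, one possible alternative is a groupby with transform('first'). This is a different technique from the loop above, shown only as a sketch; it assumes the coffe typo is already fixed and that product and ratio columns are numbered 1 to 5 in matching order. The names `product_cols`, `ratio_cols` and `initial_cols` are mine.

```python
import pandas as pd

df = pd.DataFrame({
    'product1': ['straws', 'melon', 'bread'],
    'product2': ['orange', 'milk', 'melon'],
    'product3': ['melon', 'book', 'coffee'],
    'product4': ['chair', 'coffee', 'chair'],
    'product5': ['bread', 'cake', 'book'],
    'ratio1': [0.28, 0.67, 2.5],
    'ratio2': [0.54, 0.25, 1.69],
    'ratio3': [0.33, 0.13, 1.9],
    'ratio4': [0.35, 0.11, 2.5],
    'ratio5': [0.11, 0.59, 1.52],
})

n = 5
product_cols = [f'product{i}' for i in range(1, n + 1)]
ratio_cols = [f'ratio{i}' for i in range(1, n + 1)]
initial_cols = [f'InitialRatio{i}' for i in range(1, n + 1)]

# Flatten row by row so position k in both Series refers to the same cell.
products = pd.Series(df[product_cols].to_numpy().ravel())
ratios = pd.Series(df[ratio_cols].to_numpy().ravel())

# For every cell, take the ratio from the first occurrence of its product.
initial = ratios.groupby(products).transform('first')

# Reshape back to the wide layout and attach as new columns.
initial_df = pd.DataFrame(
    initial.to_numpy().reshape(len(df), n),
    index=df.index,
    columns=initial_cols,
)
df = df.join(initial_df)

print(df[initial_cols])
```

Unlike the assignment above, this leaves ratio1 to ratio5 untouched, so the current and initial ratios can be compared side by side.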
Answered By - inquirer