Tuesday, December 12, 2023

[FIXED] Get answer using a formula in python with pandas

December 12, 2023 dataframe, loops, pandas, python, python-3.x No comments

Issue

I have some data like below, about a 100k rows with different batches.

batch    b       c
 A      10       2     
 A      20       4     
 A      30       6     
 B      5        1     
 B      10       2   
 B      15       3   
 B      20       4    
 .       .       .     
 .       .       . 
 .       .       .

What I am trying to do is to is, using a formula, I want only 1 line of data frame per group with the help of a summation formula.

The formula is same throughout all batches as below

           (b1xc1)+(b2xc2)+(b3xc3)   10x2 + 20x4 + 30x6    20 + 80 + 180     280
 batch A = _______________________ = __________________ = _______________ = _____ = 23.33
               c1 + c2 + c3             2 + 4 + 6              12            12

So the new_df will be:

batch   new_value
 A        23.3 
 B        15.0
 .         .
 .         .
 .         .

Please Note: there are zero values in col C, so might be possible divide by zero may occur.

With my approach I created a new column bxc and then summing up values by group and dividing by sum of c for that group, but I think it is not much effective way. Is there an alternate way to achieve this? Thank you :)

Solution

To make it easier to follow I am recreating sample data as follows:

import pandas as pd

data = {'batch': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
        'b': [10, 20, 30, 5, 10, 15, 20],
        'c': [2, 4, 6, 1, 2, 3, 4]}

df = pd.DataFrame(data)

Then I would recommend using a custom function to calculate the value and to avoid division by zero:

def calculate_new_value(group):
    numerator = (group['b'] * group['c']).sum()
    denominator = group['c'].sum()
    
    if denominator == 0:
        return 0
    
    return round(numerator / denominator, 1)

Applying this function with apply after groupby and setting your index with reset_index to new_value results in:

print(df.groupby('batch').apply(calculate_new_value).reset_index(name='new_value'))

  batch  new_value
0     A  23.3
1     B  15.0

Answered By - sm3sher

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Tuesday, December 12, 2023

[FIXED] Get answer using a formula in python with pandas

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels