Wednesday, September 28, 2022

[FIXED] Write a function to perform calculations on multiple columns in a Pandas dataframe

September 28, 2022 numpy, pandas, python No comments

Issue

I have the following dataframe (the real one has a lot more columns and rows, so just using this as an example):

 {'sample': {0: 'orange', 1: 'orange', 2: 'banana', 3: 'banana'},
 'sample id': {0: 1, 1: 1, 2: 5, 3: 5},
 'replicate': {0: 1, 1: 2, 2: 1, 3: 2},
 'taste': {0: 1.2, 1: 4.6, 2: 35.4, 3: 0.005},
 'smell': {0: 20.0, 1: 23.0, 2: 2.1, 3: 5.3},
 'shape': {0: 0.004, 1: 0.2, 2: 0.12, 3: 11.0},
 'volume': {0: 23, 1: 23, 2: 23, 3: 23},
 'weight': {0: 12.0, 1: 1.3, 2: 2.4, 3: 3.2}}

I'd like to write a function to perform calculations on the dataframe, for specific columns. The calculation is in the code below. As I'd only want to apply the code to specific columns, I've set up a list of columns, and as there is a pre-defined 'factor' we need to take into account in the calculation, I set this up too:

cols = ['taste', 'smell', 'shape']
factor = 72

def multiply_columns(row):
    return ((row[cols] / row['volume']) * (factor * row['volume'] / row['weight']) / 1000)

Then, I apply the function to the dataframe, and I want to overwrite the original column values with the new ones, so I do this:

for cols in df.columns:
    df[cols] = df[cols].apply(multiply_columns)

But I get the following error:

~\AppData\Local\Temp/ipykernel_8544/3939806184.py in multiply_columns(row)
      3 
      4 def multiply_columns(row):
----> 5     return ((row[cols] / row['volume']) * (factor * row['volume'] / row['weight']) / 1000)
      6 
      7 

TypeError: string indices must be integers

But the values I'm using in the calculation aren't strings:

sample        object
sample id      int64
replicate      int64
taste        float64
smell        float64
shape        float64
volume         int64
weight       float64
dtype: object

The desired output would be:

{'sample': {0: 'orange', 1: 'orange', 2: 'banana', 3: 'banana'},
 'sample id': {0: 1, 1: 1, 2: 5, 3: 5},
 'replicate': {0: 1, 1: 2, 2: 1, 3: 2},
 'taste': {0: 0.0074, 1: 0.028366667, 2: 0.2183, 3: 3.08333e-05},
 'smell': {0: 0.123333333, 1: 0.141833333, 2: 0.01295, 3: 0.032683333},
 'shape': {0: 2.46667e-05, 1: 0.001233333, 2: 0.00074, 3: 0.067833333},
 'volume': {0: 23, 1: 23, 2: 23, 3: 23},
 'weight': {0: 12.0, 1: 1.3, 2: 2.4, 3: 3.2}}

Can anyone kindly show me the errors of my ways

Solution

This has a few issues.

If you wanted to index elements in row, the index you're using is a string (the column name) rather than an integer (like an index). To get an index for the column names you're interested in, you could use this:

cols = ['taste', 'smell', 'shape']
cols_idx = [df.columns.get_loc(col) for col in cols]

However, if I understand your question, you could perform this operation on columns directly with the understanding that the operation will be performed on each row. See a test case that worked for me:

import pandas as pd

df = pd.DataFrame({'sample': {0: 'orange', 1: 'orange', 2: 'banana', 3: 'banana'},
 'sample id': {0: 1, 1: 1, 2: 5, 3: 5},
 'replicate': {0: 1, 1: 2, 2: 1, 3: 2},
 'taste': {0: 1.2, 1: 4.6, 2: 35.4, 3: 0.005},
 'smell': {0: 20.0, 1: 23.0, 2: 2.1, 3: 5.3},
 'shape': {0: 0.004, 1: 0.2, 2: 0.12, 3: 11.0},
 'volume': {0: 23, 1: 23, 2: 23, 3: 23},
 'weight': {0: 12.0, 1: 1.3, 2: 2.4, 3: 3.2}})

cols = ['taste', 'smell', 'shape']

factor = 72

for col in cols:
    df[col] = ((df[col] / df['volume']) * (factor * df['volume'] / df['weight']) / 1000)

Note that your line

for cols in df.columns:

indicated you should run this operation on every column (cols became the index and was no longer your list).

Answered By - Meredith F

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Wednesday, September 28, 2022

[FIXED] Write a function to perform calculations on multiple columns in a Pandas dataframe

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels