Issue
I have two DataFrames.
df1 holds the collected data, where every cell is a (float, int) tuple:
Index 0 1 2 ...
0 (float,int) (float,int) (float,int) (float,int)
1 (float,int) (float,int) (float,int) (float,int)
... (float,int) (float,int) (float,int) (float,int)
df2 is an empty DataFrame built like this:
df2 = pd.DataFrame(index=df1.index, columns = np.arange(min, max, step).tolist())
Index float0 float1 float2 ...
0
1
...
My problem: for every entry in df1, I need to compare its float to the column names of df2, find the column that is closest in value, and add the entry's int to whatever value is already stored in that column of df2 (in the same row).
So far I have:
for j in range(len(df1)): # for every row
    for i in range(len(df1.columns)): # for every column
        # the following line is only pseudocode, which I can't figure out how to phrase
        y = df2 column to which df1[i][j][0] is closest in value
        df2[y][j] = df2[y][j] + df1[i][j][1]
So for example if:
df1 =
Index 0 1 2 ...
0 (.2,3) (.4,5) (.4,4) (.6,2)
1 (.5,2) (.8,8) (.8,5) (.2,9)
... (.4,3) (.2,7) (.3,4) (.7,1)
df2 =
Index .24 .47 .79 ...
0
1
...
df2(filled) =
Index .24 .47 .79 ...
0 3 5+4 2
1 9 2 8+5
...
Solution
You could try to iterate over each cell in df1, unpack the tuple to get the float and int values, find the closest column in df2, and then update df2 accordingly. Here's how you can do it:
import pandas as pd
import numpy as np

# Assuming df1 is already defined
# Define df2 with the given structure
# df2 = pd.DataFrame(index=df1.index, columns=np.arange(min_value, max_value, step).tolist())

def find_closest_column(value, columns):
    """Find the column name in df2 that is closest to the given value."""
    return min(columns, key=lambda x: abs(x - value))

# Initialize df2 with zeros (or any default value you prefer)
df2 = df2.fillna(0)

for row_index in df1.index:
    for col_index in df1.columns:
        float_val, int_val = df1.at[row_index, col_index]  # Unpack the tuple from df1
        closest_col = find_closest_column(float_val, df2.columns.astype(float))
        df2.at[row_index, closest_col] += int_val  # Accumulate the int values in df2

# df2 now contains the accumulated values
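This script modifies df2 so that, for each tuple in df1, the int part of the tuple is added to the df2 column whose name is closest to the tuple's float, in the same row. If two columns are equally close, min() returns the first one it encounters, so the leftmost column wins.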
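For larger frames, the nearest-column search for a whole row can be done with NumPy in one step instead of calling min() per cell. The following is only a sketch under some assumptions: the helper name accumulate_nearest and the sample values are made up for illustration, and it builds a new result frame rather than filling an existing df2.

```python
import numpy as np
import pandas as pd

def accumulate_nearest(df1, columns):
    """Sum the int of every (float, int) cell into the nearest column.

    Hypothetical helper: returns a new DataFrame with df1's index and
    the given float column labels.
    """
    cols = np.asarray(columns, dtype=float)
    out = np.zeros((len(df1), len(cols)), dtype=int)
    for r, row in enumerate(df1.itertuples(index=False, name=None)):
        floats = np.array([cell[0] for cell in row], dtype=float)
        ints = np.array([cell[1] for cell in row], dtype=int)
        # Position of the closest column label for each float in this row
        nearest = np.abs(floats[:, None] - cols[None, :]).argmin(axis=1)
        # np.add.at accumulates correctly even when positions repeat
        np.add.at(out[r], nearest, ints)
    return pd.DataFrame(out, index=df1.index, columns=cols)

# Sample data chosen for illustration only
df1 = pd.DataFrame([
    [(0.2, 3), (0.4, 5), (0.4, 4), (0.8, 2)],
    [(0.5, 2), (0.8, 8), (0.8, 5), (0.2, 9)],
])
result = accumulate_nearest(df1, [0.24, 0.47, 0.79])
print(result)
```

With these sample values, row 0 ends up as 3, 9, 2 and row 1 as 9, 2, 13. Like min(), argmin picks the first column when two are equally close.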
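Note: The column labels of df2 must actually be floats, because df2.at[row_index, closest_col] looks the column up by the float returned from find_closest_column. If the labels are stored as strings, convert them on the frame itself with df2.columns = df2.columns.astype(float) before the loop; converting them only in the call to find_closest_column is not enough. Since df2 is built with index=df1.index, the row labels of the two frames already match.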
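As a small illustration of that conversion, here is a sketch that assumes the labels arrived as strings (for example after reading the frame from a file). The frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose column labels are strings, not floats
df2 = pd.DataFrame(0, index=[0, 1], columns=["0.24", "0.47", "0.79"])

# Convert the labels on the frame itself so float lookups succeed
df2.columns = df2.columns.astype(float)

# Nearest column to 0.5, then accumulate an int into it
closest = min(df2.columns, key=lambda c: abs(c - 0.5))
df2.at[0, closest] += 2
print(df2)
```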
Answered By - Kadir