Saturday, November 25, 2023

[FIXED] python: sort a value into the correct dataframe column based on the column's numeric name

November 25, 2023 numpy, pandas, python

Issue

I have two DataFrames.

df1 holds collected data in this form:

Index       0            1            2           ...
0       (float,int)  (float,int)  (float,int)  (float,int)
1       (float,int)  (float,int)  (float,int)  (float,int)
...     (float,int)  (float,int)  (float,int)  (float,int)

df2 is an empty DataFrame built like this:

df2 = pd.DataFrame(index=df1.index, columns = np.arange(min, max, step).tolist())

Index       float0      float1       float2           ...
0
1
...

My problem: for every entry in df1, I need to compare its float to the column names of df2, find the column whose name is closest to that float, and add the entry's int to whatever value is already in that cell of df2.

So far I have:

for j in range(len(df1)): # for every row
    for i in range(len(df1.columns)): # for every column
        # the following line is only pseudo code which I can't figure out how to phrase
        y = df2 column to which df1[i][j][0] is closest in value
        df2[y][j] = df2[y][j] + df1[i][j][1]

For example, if:

df1 =

Index      0       1       2      ...
0       (.2,3)  (.4,5)  (.4,4)  (.6,2)
1       (.5,2)  (.8,8)  (.8,5)  (.2,9)
...     (.4,3)  (.2,7)  (.3,4)  (.7,1)

df2 =

Index     .24     .47     .79     ...
0
1
...

df2(filled) =

Index     .24     .47     .79     ...
0          3     5+4+2     0
1          9       2      8+5
...
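A minimal sketch that rebuilds the visible part of this example so it can be run directly. Only rows 0 and 1 and the four shown columns of df1 are included; anything elided with "..." is left out. It also checks which df2 column each float is nearest to. The variable names `labels` and `nearest` are illustrative only.

```python
import numpy as np
import pandas as pd

# Visible rows of the example df1 (the "..." rows/columns are omitted);
# each cell is a (float, int) tuple
df1 = pd.DataFrame([
    [(0.2, 3), (0.4, 5), (0.4, 4), (0.6, 2)],
    [(0.5, 2), (0.8, 8), (0.8, 5), (0.2, 9)],
])

# Empty df2 with the example's numeric column labels
df2 = pd.DataFrame(index=df1.index, columns=[0.24, 0.47, 0.79])

# For every float in df1, pick the df2 label with the smallest absolute distance
labels = df2.columns.to_numpy(dtype=float)
nearest = pd.DataFrame(
    [[labels[np.abs(labels - f).argmin()] for f, _ in row]
     for row in df1.itertuples(index=False)],
    index=df1.index,
    columns=df1.columns,
)
print(nearest)
```

Under a strict nearest-value rule, 0.6 lands in the 0.47 column: it is 0.13 away from 0.47 but 0.19 away from 0.79.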

Solution

You could try to iterate over each cell in df1, unpack the tuple to get the float and int values, find the closest column in df2, and then update df2 accordingly. Here's how you can do it:

import pandas as pd
import numpy as np

# Assuming df1 is already defined
# Define df2 with the given structure
# df2 = pd.DataFrame(index=df1.index, columns=np.arange(min_value, max_value, step).tolist())

def find_closest_column(value, columns):
    """Return the label in `columns` whose numeric value is closest to `value`."""
    numeric = np.asarray(columns, dtype=float)  # works for float or numeric-string labels
    return columns[np.abs(numeric - value).argmin()]  # original label, usable with df2.at

# Replace the empty (NaN) cells with 0 so values can be accumulated;
# astype(float) first keeps any preexisting numeric values
df2 = df2.astype(float).fillna(0)

for row_index in df1.index:
    for col_index in df1.columns:
        float_val, int_val = df1.at[row_index, col_index]  # Unpack the (float, int) tuple from df1
        closest_col = find_closest_column(float_val, df2.columns)
        df2.at[row_index, closest_col] += int_val  # Accumulate the int values in df2

# df2 now contains the accumulated values

For each tuple in df1, the script finds the df2 column whose label is closest to the tuple's float and adds the tuple's int to that cell of df2.

Note: the column labels of df2 should be floats, as they are when df2 is built with np.arange(...).tolist(). This solution also assumes that df1 and df2 share the same index.
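For a large df1 the Python-level double loop can be slow. One possible alternative, sketched here as a swapped-in NumPy technique rather than part of the original answer, is to split all tuples into a float array and an int array, find every nearest column at once with broadcasting and argmin, and accumulate with np.add.at, which adds correctly even when several entries in a row map to the same column. It assumes every df1 cell is a (float, int) tuple, df2's labels are numeric, and df1 and df2 have the same row order; the helper name fill_nearest is hypothetical.

```python
import numpy as np
import pandas as pd

def fill_nearest(df1, df2):
    """Hypothetical helper: return a copy of df2 with each int from df1
    added to the column whose label is nearest the paired float.
    Assumes df1 and df2 list their rows in the same order."""
    # Split every (float, int) tuple into two 2-D arrays shaped like df1
    cells = df1.to_numpy()
    floats = np.array([[c[0] for c in row] for row in cells], dtype=float)
    ints = np.array([[c[1] for c in row] for row in cells])

    labels = df2.columns.to_numpy(dtype=float)
    # Distances have shape (rows, df1 columns, df2 columns); argmin picks the nearest label
    nearest = np.abs(floats[:, :, None] - labels[None, None, :]).argmin(axis=2)

    # Start from df2's existing values, with missing cells counted as 0
    out = df2.astype(float).fillna(0).to_numpy(copy=True)
    rows = np.repeat(np.arange(len(df1)), df1.shape[1])
    # np.add.at accumulates repeated (row, column) pairs instead of overwriting them
    np.add.at(out, (rows, nearest.ravel()), ints.ravel())
    return pd.DataFrame(out, index=df2.index, columns=df2.columns)

# Usage with the visible part of the question's example
df1 = pd.DataFrame([
    [(0.2, 3), (0.4, 5), (0.4, 4), (0.6, 2)],
    [(0.5, 2), (0.8, 8), (0.8, 5), (0.2, 9)],
])
df2 = pd.DataFrame(index=df1.index, columns=[0.24, 0.47, 0.79])
result = fill_nearest(df1, df2)
print(result)
```

The result is float-typed because df2 starts out as empty (NaN) cells; call `.astype(int)` on it if integer output is needed.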

Answered By - Kadir

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
