Saturday, March 5, 2022

[FIXED] Collapse certain columns horizontally


Issue

I have:

import pandas as pd

haves = pd.DataFrame({'Product': ['R123', 'R234'],
                      'Price': [1.18, 0.23],
                      'CS_Medium': [1, 0],
                      'CS_Small': [0, 1],
                      'SC_A': [1, 0],
                      'SC_B': [0, 1],
                      'SC_C': [0, 0]})
print(haves)

and a list of column-name prefixes, like so:

list_of_starts_with = ["CS_", "SC_"]

I would like to arrive here:

wants = pd.DataFrame({'Product':['R123','R234'],
                        'Price':[1.18,0.23],
                        'CS':['Medium', 'Small'],
                        'SC':['A', 'B'],})

print(wants)

I am aware of wide_to_long but don't think it is applicable here?

Solution

Given the list of prefixes (and assuming a prefix is enough to identify the columns that belong together), the changes can be made in bulk:

def preprocess_column_names(list_of_starts_with, column_names):
    "Returns a list of tuples (merged_column_name, options, columns)"
    columns_to_transform = []
    for starts_with in list_of_starts_with:
        len_of_start = len(starts_with)
        columns = [col for col in column_names if col.startswith(starts_with)]
        options = [col[len_of_start:] for col in columns]
        merged_column_name = starts_with[:-1]  # Assuming that the last char is not needed
        columns_to_transform.append((merged_column_name, options, columns))
    return columns_to_transform


def merge_columns(df, merged_column_name, options, columns):
    # Note: adds merged_column_name to df in place before dropping the originals
    for col, option in zip(columns, options):
        df.loc[df[col] == 1, merged_column_name] = option
    return df.drop(columns=columns)


def merge_all(df, columns_to_transform):
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    for merged_column_name, options, columns in columns_to_transform:
        df = merge_columns(df, merged_column_name, options, columns)
    return df

And to run:

columns_to_transform = preprocess_column_names(list_of_starts_with, haves.columns)
wants = merge_all(haves, columns_to_transform)

As long as your column names hold no surprises (such as Index_ appearing in list_of_starts_with), the above code should solve the problem with reasonable performance.
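
For comparison, here is a minimal vectorised sketch of the same idea, not part of the original answer. It assumes every prefixed column holds only 0/1 flags and that at most one flag per prefix is set in each row. The helper name collapse_prefixed_columns is hypothetical; idxmax picks the first flagged column per row, and rows with no flag set are left as NaN.

```python
import pandas as pd


def collapse_prefixed_columns(df, prefixes):
    """Return a new frame with each group of 0/1 columns collapsed into one label column.

    Hypothetical helper: assumes at most one flag per prefix is set in each row.
    """
    result = df.copy()  # leave the caller's frame untouched
    for prefix in prefixes:
        cols = [c for c in result.columns if c.startswith(prefix)]
        if not cols:
            continue  # nothing to collapse for this prefix
        flags = result[cols].eq(1)
        # idxmax returns, per row, the name of the first column holding True
        labels = flags.idxmax(axis=1).str[len(prefix):]
        # rows with no flag set would otherwise get the first column's label
        labels = labels.where(flags.any(axis=1))
        result = result.drop(columns=cols)
        # assumes the prefix ends in an underscore separator
        result[prefix.rstrip("_")] = labels
    return result


haves = pd.DataFrame({'Product': ['R123', 'R234'],
                      'Price': [1.18, 0.23],
                      'CS_Medium': [1, 0],
                      'CS_Small': [0, 1],
                      'SC_A': [1, 0],
                      'SC_B': [0, 1],
                      'SC_C': [0, 0]})

wants = collapse_prefixed_columns(haves, ["CS_", "SC_"])
print(wants)
```

If a row could have several flags set for one prefix, this sketch silently keeps the first one, so it may be worth validating the input before relying on it.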

Answered By - nonDucor

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
