Issue
Each row's places_within_catchment column holds a list of place_ids.
Based on the example below: in the first row's places_within_catchment ([2, 3]), place_id 3 has the highest sales, so remove the rows where place_id = 2.
Then go straight to the third row (row 2 has already been deleted). Its catchment is [1, 2, 5] and place_id 1 has the highest sales, so remove the rows where place_id = 2 and place_id = 5.
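The rule for a single row can be sketched in plain Python. This is an illustrative sketch, not the asker's code: `sales` is a dict mapping place_id to avg_sales copied from the example data, and `places_to_remove` is a hypothetical helper name. It assumes ids that were already removed are ignored when picking the top seller.

```python
# Sales per place_id, copied from the example data in this question.
sales = {1: 500.4, 2: 200.4, 3: 600.25, 4: 200.93, 5: 60.1}


def places_to_remove(catchment, sales, already_removed=frozenset()):
    """Hypothetical helper: return the catchment ids to drop.

    The top seller among the catchment ids that are not already removed
    is kept; every other id in the catchment is returned for removal.
    """
    candidates = set(catchment) - set(already_removed)
    if not candidates:
        return set()
    best = max(candidates, key=sales.__getitem__)
    return set(catchment) - {best}


# Row 1: catchment [2, 3]; place 3 sells more, so place 2 goes.
removed = places_to_remove([2, 3], sales)
print(removed)  # {2}

# Row 2 is skipped because it was removed; row 3 has catchment [1, 2, 5].
removed |= places_to_remove([1, 2, 5], sales, removed)
print(removed)  # {2, 5}
```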
How can I do this in a Python function?
I have tried:
result_df = pd.DataFrame(columns=new_df_copy.columns)
for index, row in new_df_copy.iterrows():
    max_avg_sales = 0
    max_id = 0
    # Find the id with the highest avg_sales within places_within_catchment
    for place_id in row['places_within_catchment']:
        if place_id in new_df_copy['id'].values:
            avg_sales = new_df_copy.loc[new_df_copy['id'] == place_id, 'avg_sales'].values[0]
            if avg_sales > max_avg_sales:
                max_avg_sales = avg_sales
                max_id = place_id
    # Append the row with the maximum avg_sales to the result DataFrame
    result_df = pd.concat([result_df, new_df_copy.loc[new_df_copy['id'] == max_id]], ignore_index=True)
# Display the result DataFrame
print(result_df)
This is the code to reproduce the example DataFrame:
import pandas as pd

# Data for the new DataFrame
data = {
    'place_id': list(range(1, 6)),
    'avg_sales': [500.4, 200.4, 600.25, 200.93, 60.1],
    'places_within_catchment': [[2, 3], [1, 3, 4, 5], [1, 2, 5], [1], [1, 3]]
}
# Create the DataFrame
new_df = pd.DataFrame(data)
# Display the result
print(new_df)
[Image: more on the expected output and its logic]
Solution
If I've understood you correctly, you can try:
import pandas as pd

data = {
    "place_id": list(range(1, 6)),
    "avg_sales": [500.4, 200.4, 600.25, 200.93, 60.1],
    "places_within_catchment": [[2, 3], [1, 3, 4, 5], [1, 2, 5], [1], [1, 3]],
}
df = pd.DataFrame(data)
# Sets make "catchment minus already removed ids" a simple difference
df["places_within_catchment"] = df["places_within_catchment"].apply(set)
df = df.set_index("place_id")

removed = set()
for idx, p in zip(df.index, df.places_within_catchment):
    # Skip rows that an earlier catchment already removed
    if idx in removed:
        continue
    # Only compare catchment members that are still present
    to_check = p - removed
    if len(to_check) == 0:
        continue
    # Keep the best seller in this catchment, mark the rest as removed
    removed |= p - {df.loc[df.index.isin(to_check), "avg_sales"].idxmax()}

final_df = df.loc[~df.index.isin(removed)]
print(final_df.reset_index())
Prints:
place_id avg_sales places_within_catchment
0 1 500.40 {2, 3}
1 3 600.25 {1, 2, 5}
2 4 200.93 {1}
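Since the question asks for a Python function, the same loop can be wrapped in one. This is a sketch assuming the column names from the example; the function name `filter_catchments` is hypothetical. Unlike the snippet above, it leaves the input frame unchanged and keeps the catchment column as lists, converting each one to a set only inside the loop.

```python
import pandas as pd


def filter_catchments(df):
    """Hypothetical wrapper around the loop above.

    Walks the rows in their current order; for each row not yet removed,
    the catchment member with the highest avg_sales (ignoring ids that are
    already removed) is kept and the rest of that catchment is removed.
    """
    work = df.set_index("place_id")  # returns a new frame; df is untouched
    removed = set()
    for idx, catchment in zip(work.index, work["places_within_catchment"]):
        if idx in removed:
            continue
        catchment = set(catchment)
        to_check = catchment - removed
        if not to_check:
            continue
        best = work.loc[work.index.isin(to_check), "avg_sales"].idxmax()
        removed |= catchment - {best}
    return work.loc[~work.index.isin(removed)].reset_index()


data = {
    "place_id": list(range(1, 6)),
    "avg_sales": [500.4, 200.4, 600.25, 200.93, 60.1],
    "places_within_catchment": [[2, 3], [1, 3, 4, 5], [1, 2, 5], [1], [1, 3]],
}
result = filter_catchments(pd.DataFrame(data))
print(result)
```

For the example data this keeps place_ids 1, 3 and 4, matching the output above.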
Answered By - Andrej Kesely