Issue
Example of the dataframe:

import pandas as pd
import numpy as np

cup = {'Description': ['strawberry cupcake', 'blueberry cupcake', 'strawberry cookie',
                       'grape organic cookie', 'blueberry organic cookie', 'lemon organic cupcake'],
       'Days_Sold': [3, 4, 1, 2, 2, 1]}
cake = pd.DataFrame(data=cup)
cake
I calculated the most common words in the dataframe (with stop words already removed):

from collections import Counter
Counter(" ".join(cake['Description']).split()).most_common()
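On this example data, that returns:

[('cupcake', 3), ('cookie', 3), ('organic', 3), ('strawberry', 2), ('blueberry', 2), ('grape', 1), ('lemon', 1)]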
I put this into a new dataframe and reset the index:

count = pd.DataFrame(Counter(" ".join(cake['Description']).split()).most_common())
count.columns = ['Words', 'Values']
count.index = np.arange(1, len(count) + 1)
count.head()
The Values column is in the 'count' dataframe and Days_Sold is in the 'cake' dataframe. What I would like to do now: for each common word in the 'count' dataframe (like cupcake), find every row in the 'cake' dataframe whose description contains that word and total how long those products took to sell, going through every common word in 'count' until it's done. For cupcake the answer should come out to 8 (3 + 4 + 1).
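For a single word, the figure I'm after can be checked directly, e.g.:

cake.loc[cake['Description'].str.contains('cupcake'), 'Days_Sold'].sum()   # returns 8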
My actual dataframe is over 3,000 rows (and not exactly about cakes), the descriptions are longer, and I need over 40 common words, adjustable to my needs. That's why I can't type in each word by hand. I believe this requires a nested for loop, but I am stuck on it:
for day in cake:
    for top in count:
        top = count.Words
        day = cake.loc[cake['CleanDescr'] == count, ['Days_Sold']]
The error says: 'int' object is not iterable
Thank you!
Update:
Thank you so much to everyone helping me on this large project. I am posting my solution to #3, adjusted from the answer by Mark Moretto.
# Split and explode Description
df = cake.iloc[:, 0].str.lower().str.split(r"\W+").explode().reset_index()
df
# Merge counts onto the exploded words ('count' keeps its words in the 'Words' column)
df_freq = pd.merge(df, count, left_on="Description", right_on="Words")
df_freq
# Left join cake DataFrame onto df_freq by index values.
df_freq = (pd.merge(df_freq, cake, left_on="index", right_index=True)
.loc[:, ["Description_x", "Values", "Days_Sold"]]
.rename(columns={"Description_x": "Description"})
)
df_freq
# Group by word and take the mean of the numeric fields
df_metrics = df_freq.groupby("Description").mean().round(4)
df_metrics
df_metrics.sort_values(by='Values', ascending=False).head(5)
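To get my original target instead (total Days_Sold per common word, e.g. 8 for cupcake), the same df_freq can be aggregated with sum() rather than mean():

df_freq.groupby("Description")[["Days_Sold"]].sum()   # cupcake -> 8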
Solution
Another way, though I didn't remove any stop words or filter by frequency.
# <...your starter code for dataframe creation...>
# Split and explode Description
df = cake.iloc[:, 0].str.lower().str.split(r"\W+").explode().reset_index()
# Get count of words
df_counts = (df.groupby("Description")
.size()
.reset_index()
.rename(columns={0: "word_count"})
)
# Merge counts to main DataFrame
df_freq = pd.merge(df, df_counts, on="Description")
# Left join cake DataFrame onto df_freq by index values.
df_freq = (pd.merge(df_freq, cake, left_on="index", right_index=True)
.loc[:, ["Description_x", "word_count", "Days_Sold"]]
.rename(columns={"Description_x": "Description"})
)
# Group by Description and return max result for value fields
df_metrics = df_freq.groupby("Description").max()
print(df_metrics)
Output:
             word_count  Days_Sold
Description
blueberry             2          4
cookie                3          2
cupcake               3          4
grape                 1          2
lemon                 1          1
organic               3          2
strawberry            2          3
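If you only need the N most frequent words (the question mentions 40+, adjustable), one option is to sort and slice the result:

top_n = 40   # adjustable to your needs
df_metrics.sort_values("word_count", ascending=False).head(top_n)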
Answered By - Mark Moretto