Issue
I have a DataFrame idf, shown below:
      feature_name  idf_weights
2488    kralendijk    11.221923
3059         night            0
1383         ebebf            0
I have another DataFrame df:
                  message  Number of Words in each message
0  night kralendijk ebebf                                3
For each message in df, I want to look up the idf weight of every word in idf and store, in a new column, the number of words whose idf weight is greater than 0.
The output should look like this:
                  message  Number of Words in each message  Number of words with idf_score>0
0  night kralendijk ebebf                                3                                 1
Here is what I've tried so far, but it gives the total number of words instead of the number of words with idf_weight > 0:
words_weights = dict(idf[['feature_name', 'idf_weights']].values)
df['> zero'] = df['message'].apply(lambda x: count([words_weights.get(word, 11.221923) for word in x.split()]))
Output:
                  message  Number of Words in each message  Number of words with idf_score>0
0  night kralendijk ebebf                                3                                 3
Thank you.
Solution
Try using a list comprehension:
# set up a dictionary for easy feature->weight indexing
d = idf.set_index('feature_name')['idf_weights'].to_dict()
# {'kralendijk': 11.221923, 'night': 0.0, 'ebebf': 0.0}
df['> zero'] = [sum(d.get(w, 0)>0 for w in x.split()) for x in df['message']]
## OR, slightly faster alternative
# df['> zero'] = [sum(1 for w in x.split() if d.get(w, 0)>0) for x in df['message']]
output:
                  message  Number of Words in each message  > zero
0  night kralendijk ebebf                                3       1
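For reference, here is a minimal self-contained sketch of the same approach that can be run as-is. The two DataFrames are rebuilt by hand from the values shown in the question, and the helper name count_positive is hypothetical, introduced only for illustration. The attempt in the question appears to build a list with one entry per word and then count its elements, which would always give the total word count; here only the words whose weight is greater than 0 are summed, and unknown words default to 0.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumption: same values and index)
idf = pd.DataFrame(
    {
        "feature_name": ["kralendijk", "night", "ebebf"],
        "idf_weights": [11.221923, 0.0, 0.0],
    },
    index=[2488, 3059, 1383],
)
df = pd.DataFrame(
    {
        "message": ["night kralendijk ebebf"],
        "Number of Words in each message": [3],
    }
)

# Dictionary for fast word -> weight lookup
weights = idf.set_index("feature_name")["idf_weights"].to_dict()


def count_positive(message, weights):
    """Count the words in `message` whose idf weight is > 0.

    Hypothetical helper; words missing from `weights` count as weight 0.
    """
    return sum(weights.get(word, 0) > 0 for word in message.split())


df["> zero"] = [count_positive(m, weights) for m in df["message"]]
print(df)
```

Only "kralendijk" has a positive weight in the sample data, so the new column holds 1 for the single message.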
Answered By - mozway