Issue
I want to calculate the average value of the column 'score2' for each 'show_id', and insert this new value in a new field called 'avg_score'.
this below is my json structure:
{'show_id': 's1026',
'type': 'TV Show',
'title': 'BoJack Horseman',
'country': 'United States',
'IMDB_Rating': 8.7,
'No_of_Votes': 113345,
'tweet': [{'_id': '60dae6af60d63d1fa250cb25',
'text': '@iitstrasha the end of the f***ing world, that 70s show, BoJack Horseman',
'hashtags': '[]',
'score1': "{'neg': 0, 'neu': 1, 'pos': 0, 'compound': 0}",
'score2': 2.0,},
{'_id': '60dae6b060d63d1fa251d422',
'text': "@longbothom BoJack Horseman, peaky blinders, The Crown, Orange is the new black, sherlock, Finding 'Ohana",
'hashtags': '[]',
'score1': "{'neg': 0, 'neu': 1, 'pos': 0, 'compound': 0}",
'score2': 0.0},
{'_id': '60dae6b360d63d1fa258134a',
'text': 'merlin-is-dead: I’ve been rewatching BoJack Horseman for the first time since it’s finale. I felt like drawing a Di',
'hashtags': '[]',
'score1': "{'neg': 0, 'neu': 0.8150000000000001, 'pos': 0.185, 'compound': 0.3612}",
'score2': 0.185}]}
And this is my desired json structure
{'show_id': 's1026',
'type': 'TV Show',
'title': 'BoJack Horseman',
'country': 'United States',
'IMDB_Rating': 8.7,
'No_of_Votes': 113345,
'avg_score' : MEAN OF ALL SCORE2,
'tweet': [{'_id': '60dae6af60d63d1fa250cb25',
'text': '@iitstrasha the end of the f***ing world, that 70s show, BoJack Horseman',
'hashtags': '[]',
'score1': "{'neg': 0, 'neu': 1, 'pos': 0, 'compound': 0}",
'score2': 2.0},
{'_id': '60dae6b060d63d1fa251d422',
'text': "@longbothom BoJack Horseman, peaky blinders, The Crown, Orange is the new black, sherlock, Finding 'Ohana",
'hashtags': '[]',
'score1': "{'neg': 0, 'neu': 1, 'pos': 0, 'compound': 0}",
'score2': 0.0},
{'_id': '60dae6b360d63d1fa258134a',
'text': 'merlin-is-dead: I’ve been rewatching BoJack Horseman for the first time since it’s finale. I felt like drawing a Di',
'hashtags': '[]',
'score1': "{'neg': 0, 'neu': 0.8150000000000001, 'pos': 0.185, 'compound': 0.3612}",
'score2': 0.185}]}
For each title of series or film there are as many rows as there are tweets. Thanks to all who reply and have a good evening.
i am not familiar with json language, for this reason i would create, non nested field called 'avg_score'. Whit this method i can lead back to pandas dataframe and work on it.
i tried to use this query but it seems not working:
results = database.aggregate([
{ '$group': {'_id': '$title', 'vote': { '$sum': "$tweet.score2" }} }
])
[result for result in results]
Solution
You have to use $addFields
pipeline stafe instead and perform the required calculation.
results = database.aggregate([
{
"$addFields": {
"avg_score": {
"$avg": {
"$map": {
"input": "$tweet",
"in": "$$this.score2"
}
},
}
}
},
])
Mongo Playground Sample Execution
Answered By - hhharsha36
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.