Wednesday, December 1, 2021

[FIXED] Sort chat conversation in Pandas

Labels: group-by, pandas, sorting

Issue

I have a dataset of chat conversations that looks like this (where message_id is an index for all of the messages in the database):

| message_id | to_user | from_user | message      |
|------------|---------|-----------|--------------|
| 123        | al      | sal       | hi           |
| 871        | al      | sal       | hey          |
| 989        | al      | bob       | me too       |
| 900        | sal     | al        | hello        |
| 107        | bob     | al        | i'm bob      |
| 242        | sal     | al        | how are you? |
| 101        | al      | bob       | hi, i'm al   |
| 898        | sal     | al        | i'm good     |

What I want to do is sort this table so that it reflects the conversations between each pair of people. It should first group all of the messages exchanged between a to_user and each from_user they chatted with, and then sort each conversation by message_id so that it reflects the back-and-forth exchange:

| message_id | to_user | from_user | message      |
|------------|---------|-----------|--------------|
| 101        | al      | bob       | hi, i'm al   |
| 107        | bob     | al        | i'm bob      |
| 989        | al      | bob       | me too       |
| 123        | al      | sal       | hi           |
| 242        | sal     | al        | how are you? |
| 871        | al      | sal       | hey          |
| 898        | sal     | al        | i'm good     |
| 900        | sal     | al        | hello        |

How would I accomplish this in Pandas?

Solution

We can use np.sort to sort the values within each row, which gives us columns that identify the participants regardless of direction. We can then sort by conversation first and message_id second with DataFrame.sort_values:

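# Sort each (to_user, from_user) pair within its row so direction no longer matters
df[['person_a', 'person_b']] = np.sort(df[['to_user', 'from_user']])
# Group by conversation first, then order each conversation by message_id
df = df.sort_values(['person_a', 'person_b', 'message_id'], ignore_index=True)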

   message_id to_user from_user       message person_a person_b
0         101      al       bob    hi, i'm al       al      bob
1         107     bob        al       i'm bob       al      bob
2         989      al       bob        me too       al      bob
3         123      al       sal            hi       al      sal
4         242     sal        al  how are you?       al      sal
5         871      al       sal           hey       al      sal
6         898     sal        al      i'm good       al      sal
7         900     sal        al         hello       al      sal
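
To illustrate why the row-wise sort helps, here is a minimal sketch on a plain NumPy array. The three pairs are taken from the data above; the variable names `pairs` and `keys` are only for illustration. By default np.sort works along the last axis, so each row is sorted independently and both directions of a conversation end up with the same key:

```python
import numpy as np

# Each row is a (to_user, from_user) pair taken from the question's data.
pairs = np.array([
    ['al', 'bob'],
    ['bob', 'al'],
    ['sal', 'al'],
])

# np.sort sorts along the last axis by default, i.e. within each row,
# so ('al', 'bob') and ('bob', 'al') collapse to the same key.
keys = np.sort(pairs)
print(keys.tolist())
# [['al', 'bob'], ['al', 'bob'], ['al', 'sal']]
```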

We can drop these additional columns after we're done with them:

df[['person_a', 'person_b']] = np.sort(df[['to_user', 'from_user']])
df = df.sort_values(
    ['person_a', 'person_b', 'message_id'], ignore_index=True
).drop(columns=['person_a', 'person_b'])

df:

   message_id to_user from_user       message
0         101      al       bob    hi, i'm al
1         107     bob        al       i'm bob
2         989      al       bob        me too
3         123      al       sal            hi
4         242     sal        al  how are you?
5         871      al       sal           hey
6         898     sal        al      i'm good
7         900     sal        al         hello

Setup and imports (edited to match output):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'message_id': [123, 871, 989, 900, 107, 242, 101, 898],
    'to_user': ['al', 'al', 'al', 'sal', 'bob', 'sal', 'al', 'sal'],
    'from_user': ['sal', 'sal', 'bob', 'al', 'al', 'al', 'bob', 'al'],
    'message': ['hi', 'hey', 'me too', 'hello', "i'm bob", 'how are you?',
                "hi, i'm al", "i'm good"]
})
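
For convenience, the steps can be put together into one self-contained script. This is only a sketch under a few assumptions: the function name `sort_conversations` is hypothetical (it is not part of pandas or of the original answer), and it uses `DataFrame.assign` so the input frame is left unmodified, which is a design choice rather than something the answer requires:

```python
import numpy as np
import pandas as pd


def sort_conversations(df):
    """Hypothetical helper: group messages by participant pair, then by message_id."""
    # Sort each (to_user, from_user) pair within its row so direction is ignored.
    pairs = np.sort(df[['to_user', 'from_user']].to_numpy(), axis=1)
    # Attach the direction-independent keys as temporary columns on a copy.
    keyed = df.assign(person_a=pairs[:, 0], person_b=pairs[:, 1])
    # Order by conversation first, then by message_id, and drop the helper columns.
    return (
        keyed.sort_values(['person_a', 'person_b', 'message_id'], ignore_index=True)
        .drop(columns=['person_a', 'person_b'])
    )


df = pd.DataFrame({
    'message_id': [123, 871, 989, 900, 107, 242, 101, 898],
    'to_user': ['al', 'al', 'al', 'sal', 'bob', 'sal', 'al', 'sal'],
    'from_user': ['sal', 'sal', 'bob', 'al', 'al', 'al', 'bob', 'al'],
    'message': ['hi', 'hey', 'me too', 'hello', "i'm bob", 'how are you?',
                "hi, i'm al", "i'm good"]
})

result = sort_conversations(df)
print(result['message_id'].tolist())
# [101, 107, 989, 123, 242, 871, 898, 900]
```

Because the helper works on a copy, `df` keeps its original row order and columns, and only `result` is sorted.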

Answered By - Henry Ecker

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
