Wednesday, July 20, 2022

[FIXED] How to clean survey data in pandas

July 20, 2022 data-cleaning, dataframe, numpy, pandas, python No comments

Issue

Input:

Output:

here's the data:

d = {'Morning': ["Didn't answer", "Didn't answer", "Didn't answer", 'Morning', "Didn't answer"], 'Afternoon': ["Didn't answer", 'Afternoon', "Didn't answer", 'Afternoon', "Didn't answer"], 'Night': ["Didn't answer", 'Night', "Didn't answer", 'Night', 'Night'], 'Sporadic': ["Didn't answer", "Didn't answer", 'Sporadic', "Didn't answer", "Didn't answer"], 'Constant': ["Didn't answer", "Didn't answer", "Didn't answer", 'Constant', "Didn't answer"]}

         Morning      Afternoon          Night       Sporadic       Constant
0  Didn't answer  Didn't answer  Didn't answer  Didn't answer  Didn't answer
1  Didn't answer      Afternoon          Night  Didn't answer  Didn't answer
2  Didn't answer  Didn't answer  Didn't answer       Sporadic  Didn't answer
3        Morning      Afternoon          Night  Didn't answer       Constant
4  Didn't answer  Didn't answer          Night  Didn't answer  Didn't answer

I want the output to be:

d = {"Time of the day": ["Didn't answer", "['Afternoon', 'Night']", "Sporadic", "['Morning', 'Afternoon', 'Night', 'Constant']", "Night"]}

                                 Time of the day
0                                  Didn't answer
1                         ['Afternoon', 'Night']
2                                       Sporadic
3  ['Morning', 'Afternoon', 'Night', 'Constant']
4                                          Night

so if there's no answer in every column in a row, the value in the new data frame would be "Didn't answer" and if there's at least one answer like "night", the value in the new data frame would be "night" and if there are multiple answers like " Morning", "Night" the value in the new data frame would be a list of the answers

Solution

You can use:

df["ToD"] = (df.replace("Didn't answer", np.nan).stack().groupby(level=0)
               .apply(lambda x: [i for i in x] if len(x) > 1 else x.iloc[0])
               .reindex(df.index, fill_value="Didn't answer"))

Output:

>>> df["ToD"]
0                            Didn't answer
1                       [Afternoon, Night]
2                                 Sporadic
3    [Morning, Afternoon, Night, Constant]
4                                    Night
Name: ToD, dtype: object

Answered By - Corralien

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Wednesday, July 20, 2022

[FIXED] How to clean survey data in pandas

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels