Tuesday, November 23, 2021

[FIXED] Barplot from unorganised data - dataframe creation?

Labels: matplotlib, pandas, python, python-2.7, seaborn

Issue

From the table below, I need to create 4 different barplots, one for each of the 4 places, using the values in columns TST1, TST2, TST3, TST4 and TST5.

Each barplot should have 8 ticks, for NOT_DONE, INCOMP, UNTESTED, 30, 35, 40, 45 and 50, in that order if possible. The height at each tick is the number of times that value appears for the given place. (Each place is one of 4 options: L1, L2, L3, L4.)

However:

Only the right-most value in each row is to be considered: if no value is found in TST5, the program should check TST4, and so on, until it finds a value. If no value is found in any of these 5 columns, nothing is counted for that row. If a value is found, it does not matter what is to the left of it.

My thought process would be to create a new dataframe column holding the values I need (the right-most value for each row) alongside the corresponding place. I am new to all this and unsure how to do it, so any pointers on which direction to take would be greatly appreciated.
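The right-most-value rule described above can be sketched in plain Python, without pandas, to make the intended behaviour concrete. This is only an illustration: the helper name `rightmost_value`, the dict-per-row layout and the three sample rows are assumptions, not part of the original data loading.

```python
# Hypothetical helper (not from the original post): pick the right-most
# non-empty value among the TST columns of a single row.
TST_COLUMNS = ['TST1', 'TST2', 'TST3', 'TST4', 'TST5']


def rightmost_value(row):
    """Return the last non-empty TST value in `row` (a dict), or None."""
    for column in reversed(TST_COLUMNS):
        value = row.get(column, '')
        if value != '':
            return value
    return None


# Three sample rows; missing keys stand for empty cells (an assumption).
rows = [
    {'PLACE': 'L1', 'TST2': 'NOT_DONE', 'TST5': '50'},
    {'PLACE': 'L2', 'TST1': 'UNTESTED', 'TST4': 'INCOMP'},
    {'PLACE': 'L4'},
]

# Pair each place with its right-most value; None means "nothing to count".
reduced = [(row['PLACE'], rightmost_value(row)) for row in rows]
```

Rows that reduce to None would simply be skipped when counting values per place.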

I am required to use Python 2.7, and I am using seaborn for the plotting.

+-------+----------+----------+----------+--------+----------+
| PLACE | TST1     | TST2     | TST3     | TST4   | TST5     |
+-------+----------+----------+----------+--------+----------+
| L1    |          | NOT_DONE |          |        | 50       |
+-------+----------+----------+----------+--------+----------+
| L1    |          |          | 35       |        |          |
+-------+----------+----------+----------+--------+----------+
| L4    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          |          | INCOMP   |        |          |
+-------+----------+----------+----------+--------+----------+
| L2    | UNTESTED |          |          | INCOMP |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L4    |          | 30       |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          | INCOMP   | 40       |        |          |
+-------+----------+----------+----------+--------+----------+
| L4    |          |          |          |        | UNTESTED |
+-------+----------+----------+----------+--------+----------+
| L1    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          | INCOMP   |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L2    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L2    |          | 50       |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          |          | UNTESTED | 35     | NOT_DONE |
+-------+----------+----------+----------+--------+----------+
| L1    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L2    |          | 40       |          | INCOMP |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L1    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L4    |          | NOT_DONE |          | 30     | NOT_DONE |
+-------+----------+----------+----------+--------+----------+

Solution

Regarding the Python 2.7 requirement: the code below was tested on Python 2.7.18 and pandas 0.24.2 (it also works fine in Python 3).

Propagate the right-most values (ignoring PLACE) using ffill along columns, then take the last column, which now holds the right-most non-empty value of each row:

df['TST'] = df.drop(columns='PLACE').ffill(axis='columns').iloc[:, -1]
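Note that ffill only fills missing values (NaN), not empty strings. The post does not show how the table is loaded, so the sketch below assumes the cells arrive as strings with '' for blanks and converts them to NaN first. The four sample rows and the replace step are illustrative assumptions.

```python
import numpy as np
import pandas as pd

tst_cols = ['TST1', 'TST2', 'TST3', 'TST4', 'TST5']

# A few rows in the shape of the question's table; '' marks an empty cell.
# The column order is passed explicitly because ffill depends on it.
df = pd.DataFrame({
    'PLACE': ['L1', 'L1', 'L4', 'L2'],
    'TST1': ['', '', '', 'UNTESTED'],
    'TST2': ['NOT_DONE', '', '', ''],
    'TST3': ['', '35', '', ''],
    'TST4': ['', '', '', 'INCOMP'],
    'TST5': ['50', '', '', ''],
}, columns=['PLACE'] + tst_cols)

# Assumption: blanks are empty strings, so turn them into NaN for ffill.
df[tst_cols] = df[tst_cols].replace('', np.nan)

# Fill each row left to right; the last column then holds the
# right-most non-empty value (or NaN if the whole row was empty).
df['TST'] = df.drop(columns='PLACE').ffill(axis='columns').iloc[:, -1]
```

Rows whose TST value is still NaN are dropped automatically by value_counts in the next step.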

Group by PLACE and get their value_counts:

data = df.groupby('PLACE')['TST'].value_counts().reset_index(name='COUNT')

#   PLACE       TST  COUNT
# 0    L1        35      1
# 1    L1        50      1
# 2    L2    INCOMP      2
# 3    L2        50      1
# 4    L3    INCOMP      2
# 5    L3        40      1
# 6    L3  NOT_DONE      1
# 7    L4        30      1
# 8    L4  NOT_DONE      1
# 9    L4  UNTESTED      1

Then pass this data into catplot (use the order param to set your preferred tick order):

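import seaborn as sns

# Non-numeric statuses first, then the remaining values in sorted order
incompletes = ['NOT_DONE', 'INCOMP', 'UNTESTED']
ticks = incompletes + sorted(t for t in data.TST.unique() if t not in incompletes)

# One bar chart per PLACE, two charts per row
g = sns.catplot(x='TST', y='COUNT', col='PLACE', col_wrap=2,
                data=data, order=ticks, kind='bar')
g.set_xticklabels(rotation=90)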
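The question asks for eight ticks, including 45, which never occurs in the sample table, so a tick list derived from the data alone would omit it. One possible way around this, sketched below, is to write the tick list out by hand and reindex the counts against it so absent values show up as 0. The sample rows, the `counts` name and the choice of string values are assumptions for illustration.

```python
import pandas as pd

# Assumed input: one row per test result, already reduced to the
# right-most value (stored as strings here).
df = pd.DataFrame({
    'PLACE': ['L1', 'L1', 'L2', 'L2', 'L2'],
    'TST': ['50', '35', 'INCOMP', 'INCOMP', '50'],
}, columns=['PLACE', 'TST'])

# All eight ticks in the order requested in the question.
ticks = ['NOT_DONE', 'INCOMP', 'UNTESTED', '30', '35', '40', '45', '50']

# Count values per place, pivot the values into columns, then force the
# full tick list so that values that never occur get a count of 0.
counts = (df.groupby('PLACE')['TST']
            .value_counts()
            .unstack(fill_value=0)
            .reindex(columns=ticks, fill_value=0))
```

The same hand-written `ticks` list could also be passed as the order argument of catplot, which should leave an empty slot for values with no data.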

Versions:

>>> sys.version
2.7.18 (default, Mar 15 2021, 14:29:03) \n[GCC 10.2.0]
>>> pandas.__version__
0.24.2
>>> matplotlib.__version__
2.2.5
>>> seaborn.__version__
0.9.1

Answered By - tdy

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
