Issue
From the table below, I need to create 4 different barplots, corresponding to the 4 different places. The values come from the columns TST1, TST2, TST3, TST4, TST5.
Each barplot should have 8 ticks for NOT_DONE, INCOMP, UNTESTED, 30, 35, 40, 45, 50, in that order if possible. The ticks will correspond to the number of times each "value" appears for that given place. (The places are one of 4 options: L1, L2, L3, L4.)
However:
Only the right-most value in each row is to be considered: if no value is found in TST5, then the program should check TST4, and so on until it finds a value. If no value is found in any of these 5 columns, then nothing is counted for that row. If a value is found, it does not matter what is to the left of it.
My thought process would be to create a new dataframe column holding the values I need (the right-most value of each row) alongside their corresponding place. I am new to all this and unsure how to do it, so any help on which direction to go would be greatly appreciated.
I am required to use Python 2.7, and I am using seaborn for the plotting.
+-------+----------+----------+----------+--------+----------+
| PLACE | TST1     | TST2     | TST3     | TST4   | TST5     |
+-------+----------+----------+----------+--------+----------+
| L1    |          | NOT_DONE |          |        | 50       |
+-------+----------+----------+----------+--------+----------+
| L1    |          |          | 35       |        |          |
+-------+----------+----------+----------+--------+----------+
| L4    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          |          | INCOMP   |        |          |
+-------+----------+----------+----------+--------+----------+
| L2    | UNTESTED |          |          | INCOMP |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L4    |          | 30       |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          | INCOMP   | 40       |        |          |
+-------+----------+----------+----------+--------+----------+
| L4    |          |          |          |        | UNTESTED |
+-------+----------+----------+----------+--------+----------+
| L1    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          | INCOMP   |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L2    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L2    |          | 50       |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          |          | UNTESTED | 35     | NOT_DONE |
+-------+----------+----------+----------+--------+----------+
| L1    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L2    |          | 40       |          | INCOMP |          |
+-------+----------+----------+----------+--------+----------+
| L3    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L1    |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| L4    |          | NOT_DONE |          | 30     | NOT_DONE |
+-------+----------+----------+----------+--------+----------+
Solution
Tested on python 2.7.18 and pandas 0.24.2 (though it works fine in python 3):
Propagate the right-most values (ignoring PLACE) using ffill along columns:
    df['TST'] = df.drop(columns='PLACE').ffill(axis='columns').iloc[:, -1]
Group by PLACE and get their value_counts:
    data = df.groupby('PLACE')['TST'].value_counts().reset_index(name='COUNT')
    #   PLACE       TST  COUNT
    # 0    L1        35      1
    # 1    L1        50      1
    # 2    L2    INCOMP      2
    # 3    L2        50      1
    # 4    L3    INCOMP      2
    # 5    L3        40      1
    # 6    L3  NOT_DONE      1
    # 7    L4        30      1
    # 8    L4  NOT_DONE      1
    # 9    L4  UNTESTED      1
Then pass this data into catplot (use the order param to set your preferred tick order):
    import seaborn as sns
    incompletes = ['NOT_DONE', 'INCOMP', 'UNTESTED']
    # the three labels sort after the numeric strings, so slice them off the end
    ticks = incompletes + sorted(data.TST.unique())[:-len(incompletes)]
    g = sns.catplot(x='TST', y='COUNT', col='PLACE', col_wrap=2, data=data, order=ticks, kind='bar')
    g.set_xticklabels(rotation=90)
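The steps above rely on two things the sample data happens to satisfy: blank cells are NaN (ffill only skips NaN), and every tick you want actually occurs in the data (45 never does, so ticks built from the observed values cannot include it). Below is a minimal sketch of one way to handle both, assuming the blanks arrive as empty strings and the values are stored as strings; the six rows are copied from the question's table, and the names rows, tests, places, full_index and full are only illustrative.

```python
import pandas as pd

cols = ['PLACE', 'TST1', 'TST2', 'TST3', 'TST4', 'TST5']
# Six rows from the question's table; blanks are empty strings here (an assumption).
rows = [
    ['L1', '', 'NOT_DONE', '', '', '50'],
    ['L1', '', '', '35', '', ''],
    ['L1', '', '', '', '', ''],
    ['L2', 'UNTESTED', '', '', 'INCOMP', ''],
    ['L2', '', '40', '', 'INCOMP', ''],
    ['L4', '', 'NOT_DONE', '', '30', 'NOT_DONE'],
]
df = pd.DataFrame(rows, columns=cols)

# ffill only skips NaN, so turn empty strings into NaN first.
tests = df.drop(columns='PLACE')
tests = tests.where(tests != '')
df['TST'] = tests.ffill(axis='columns').iloc[:, -1]

# Same counting step as in the answer; rows with no value at all are dropped.
data = df.groupby('PLACE')['TST'].value_counts().reset_index(name='COUNT')

# Explicit tick order taken from the question, so 45 keeps its slot.
ticks = ['NOT_DONE', 'INCOMP', 'UNTESTED', '30', '35', '40', '45', '50']
places = sorted(df['PLACE'].unique())

# Build every (PLACE, TST) pair and give the missing pairs a count of 0.
full_index = pd.MultiIndex.from_product([places, ticks], names=['PLACE', 'TST'])
full = (data.set_index(['PLACE', 'TST'])['COUNT']
            .reindex(full_index, fill_value=0)
            .reset_index())

print(full[full['COUNT'] > 0])
```

Passing full to catplot with order=ticks should then give all 8 ticks in every panel; the plotting call is left out here so the sketch runs without seaborn installed.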
Versions:
>>> sys.version
2.7.18 (default, Mar 15 2021, 14:29:03) \n[GCC 10.2.0]
>>> pandas.__version__
0.24.2
>>> matplotlib.__version__
2.2.5
>>> seaborn.__version__
0.9.1
Answered By - tdy