Issue
I am trying to build a bar chart with the bars shown in descending order.
In my code, the NumPy array is the result of using SelectKBest() to select the best features in a machine learning problem depending on their variance.
import numpy as np
import matplotlib.pyplot as plt
flist = ['int_rate', 'installment', 'log_annual_inc','dti', 'fico', 'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths','pub_rec']
fimportance = np.array([250.14120228,23.95686725,10.71979245,13.38566487,219.41737141,
8.19261323,27.69341779,64.96469182,218.77495366,22.7037686 ]) # this is the numpy.ndarray after running SelectKBest()
print(fimportance) # this gives me 'int_rate', 'fico', 'inq_last_6mths', 'revol_util' as the 4 most important features, since the values map to flist by position, e.g. 250 relates to 'int_rate' and 218 relates to 'inq_last_6mths'.
[250.14120228 23.95686725 10.71979245 13.38566487 219.41737141
8.19261323 27.69341779 64.96469182 218.77495366 22.7037686 ]
So I want to show these values on my bar chart in descending order, with int_rate on top.
fimportance_sorted = np.sort(fimportance)[::-1]  # np.sort is ascending, so reverse it for descending order
fimportance_sorted
array([250.14120228, 219.41737141, 218.77495366, 64.96469182,
27.69341779, 23.95686725, 22.7037686 , 13.38566487,
10.71979245, 8.19261323])
# this bar chart is not right because here the values and indices are messed up.
plt.barh(flist, fimportance_sorted)
plt.show()
Next I have tried this.
plt.barh([x for x in range(len(fimportance))], fimportance)
I understand I need to map these indices to the flist values somehow and then sort them, maybe by creating an array and then mapping my list labels onto it instead of the indices. This is where I am stuck.
for i, v in enumerate(fimportance):
    arr = np.array([i, v])
    .....
Thank you for your help with this problem.
Solution
"the values and indices are messed up"
That's because you sorted fimportance with np.sort, but the order of labels in flist remained unchanged, so the labels no longer correspond to the values in fimportance_sorted.
You can use numpy.argsort to get the indices that would put fimportance into sorted order and then index both flist and fimportance with these indices:
>>> import numpy as np
>>> flist = ['int_rate', 'installment', 'log_annual_inc','dti', 'fico', 'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths','pub_rec']
>>> fimportance = np.array([250.14120228,23.95686725,10.71979245,13.38566487,219.41737141,
... 8.19261323,27.69341779,64.96469182,218.77495366,22.7037686 ])
>>> idx = np.argsort(fimportance)
>>> idx
array([5, 2, 3, 9, 1, 6, 7, 8, 4, 0])
>>> flist[idx]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: only integer scalar arrays can be converted to a scalar index
>>> np.array(flist)[idx]
array(['days_with_cr_line', 'log_annual_inc', 'dti', 'pub_rec',
'installment', 'revol_bal', 'revol_util', 'inq_last_6mths', 'fico',
'int_rate'], dtype='<U17')
>>> fimportance[idx]
array([ 8.19261323, 10.71979245, 13.38566487, 22.7037686 ,
23.95686725, 27.69341779, 64.96469182, 218.77495366,
219.41737141, 250.14120228])
idx is the order in which you need to put elements of fimportance to sort it. The order of flist must match the order of fimportance, so index both with idx.
As a result, elements of np.array(flist)[idx] correspond to elements of fimportance[idx].
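To tie this back to the chart, here is a minimal sketch of the whole thing as a script. It assumes the same flist and fimportance as in the question. The Agg backend, the axis label and the output file name are my own choices for illustration, not something from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

flist = ['int_rate', 'installment', 'log_annual_inc', 'dti', 'fico',
         'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths', 'pub_rec']
fimportance = np.array([250.14120228, 23.95686725, 10.71979245, 13.38566487, 219.41737141,
                        8.19261323, 27.69341779, 64.96469182, 218.77495366, 22.7037686])

# Indices that would sort the scores in ascending order.
idx = np.argsort(fimportance)

# Apply the same permutation to labels and scores so they stay paired.
labels_sorted = np.array(flist)[idx]
scores_sorted = fimportance[idx]

# barh draws the first entry at the bottom of the axis, so ascending
# order puts the largest score (int_rate) at the top of the chart.
fig, ax = plt.subplots()
ax.barh(labels_sorted, scores_sorted)
ax.set_xlabel("score")  # hypothetical axis label
fig.tight_layout()
fig.savefig("feature_importance.png")  # hypothetical file name; use plt.show() interactively
```

If you would rather have the largest bar at the bottom, reverse the permutation with idx[::-1] before indexing both arrays.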
Answered By - ForceBru