Issue
I have the following pandas DataFrame:
A B
0 0.0 114422.0
1 99997.0 174382.0
2 0.0 24863.0
3 0.0 91559.0
4 0.0 94248.0
5 0.0 66020.0
6 0.0 61543.0
7 0.0 69267.0
8 0.0 6253.0
9 0.0 93002.0
10 0.0 13891.0
11 0.0 49261.0
12 0.0 20050.0
13 0.0 24710.0
14 0.0 10034.0
15 0.0 24508.0
16 0.0 18249.0
17 0.0 50646.0
18 0.0 150033.0
19 0.0 68424.0
20 0.0 125526.0
21 0.0 110526.0
22 40000.0 217450.0
23 0.0 75543.0
24 145000.0 305310.0
25 12000.0 98583.0
26 0.0 262202.0
27 0.0 277680.0
28 0.0 101420.0
29 0.0 109480.0
30 0.0 65230.0
which I tried to normalize (column-wise) with scikit-learn's RobustScaler:
import pandas as pd
from sklearn.preprocessing import RobustScaler
array_scaled = RobustScaler().fit_transform(df)
df_scaled = pd.DataFrame(array_scaled, columns=df.columns)
However, in the resulting df_scaled, the first column has not been scaled (or changed) at all:
A B
0 0.0 0.515555
1 99997.0 1.310653
2 0.0 -0.672042
3 0.0 0.212380
4 0.0 0.248037
5 0.0 -0.126280
6 0.0 -0.185647
7 0.0 -0.083223
8 0.0 -0.918819
9 0.0 0.231515
10 0.0 -0.817536
11 0.0 -0.348512
12 0.0 -0.735864
13 0.0 -0.674070
14 0.0 -0.868681
15 0.0 -0.676749
16 0.0 -0.759746
17 0.0 -0.330146
18 0.0 0.987774
19 0.0 -0.094401
20 0.0 0.662799
21 0.0 0.463892
22 40000.0 1.881756
23 0.0 0.000000
24 145000.0 3.046823
25 12000.0 0.305522
26 0.0 2.475190
27 0.0 2.680435
28 0.0 0.343142
29 0.0 0.450021
30 0.0 -0.136755
I do not understand this. I expected column A to be centered on its median and scaled by its interquartile range too, as column B was. What is the explanation here?
Solution
The middle 50% of the values in A are all zero, so both the median and the IQR of that column are zero. Subtracting a median of zero changes nothing, and scikit-learn does not divide by a zero IQR (it treats a zero scale as 1), so column A comes out of the scaler unchanged.
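To check this yourself, here is a minimal sketch that rebuilds column A from the question (the array name `a` is just an illustration), computes the median and quartiles by hand, and then looks at what a fitted RobustScaler stores in its `center_` and `scale_` attributes. It assumes a reasonably recent scikit-learn and NumPy.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Column A from the question: 31 values, only four of them non-zero.
a = np.zeros(31)
a[[1, 22, 24, 25]] = [99997.0, 40000.0, 145000.0, 12000.0]

# Median and quartiles computed by hand.
median = np.median(a)
q1, q3 = np.percentile(a, [25, 75])
iqr = q3 - q1
print("median:", median, "Q1:", q1, "Q3:", q3, "IQR:", iqr)

# What the fitted scaler actually stores for this column.
scaler = RobustScaler().fit(a.reshape(-1, 1))
print("center_:", scaler.center_, "scale_:", scaler.scale_)

# Applying the scaler: subtract center_, then divide by scale_.
scaled = scaler.transform(a.reshape(-1, 1)).ravel()
print("unchanged:", np.array_equal(scaled, a))
```

The median, both quartiles and the IQR all come out as zero, and the scaler ends up with a center of 0 and a scale of 1, which is why the transformed column is identical to the input.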
Answered By - Michael Hodel