Issue
I have a data set of over 100,000 rows and 10 columns. These 10 columns should hold numeric values, but about 1% of their contents are alphabetic or alphanumeric.
How do I use a for loop, or any faster method or function, to replace every alphabetic and alphanumeric cell with the mean of its column, or with some other numeric value?
For example, with columns a, b, c and d:
a b c d
1 2 5 f5
5 e5 9 6
tg 56 8 r5
q2 4 75 g
The dataset above is just an example.
I am open to any solution you may have.
Solution
You can use pd.to_numeric (see the pandas documentation for details), which converts a column to a numeric dtype. If you add the keyword argument errors="coerce", values that cannot be converted, such as the ones containing alphanumeric characters, are replaced with NaN. You can then replace these NaNs with the mean of the column using DataFrame.fillna.
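As a minimal sketch of those two steps on a single column: the Series below is hypothetical, built from column a of the question's example, and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical column mirroring column "a" from the question.
s = pd.Series(["1", "5", "tg", "q2"])

# Strings that cannot be parsed as numbers become NaN.
converted = pd.to_numeric(s, errors="coerce")

# mean() skips NaN, so the fill value is (1 + 5) / 2 = 3.0.
filled = converted.fillna(converted.mean())
print(filled.tolist())  # [1.0, 5.0, 3.0, 3.0]
```

Note that the mean is computed only from the values that parsed successfully, so the bad cells do not influence the replacement value.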
pd.to_numeric only accepts one-dimensional input such as a Series, so you would have to call it on each column separately. To convert the entire DataFrame in one step, pass it to DataFrame.apply, which calls it on every column:
df = df.apply(pd.to_numeric, errors = "coerce")
Full example:
import pandas as pd
# Non-numeric cells become NaN, then each NaN is filled with its column's mean.
df = df.apply(pd.to_numeric, errors="coerce")
df = df.fillna(df.mean())
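To try this end to end, here is a self-contained sketch that rebuilds the question's example table. It assumes the mixed columns arrive as strings, as they typically would when read from a file; the DataFrame construction is illustrative and not part of the original answer.

```python
import pandas as pd

# The example table from the question, with every cell stored as a string.
df = pd.DataFrame({
    "a": ["1", "5", "tg", "q2"],
    "b": ["2", "e5", "56", "4"],
    "c": ["5", "9", "8", "75"],
    "d": ["f5", "6", "r5", "g"],
})

# Convert every column; cells that are not valid numbers become NaN.
df = df.apply(pd.to_numeric, errors="coerce")

# df.mean() returns one mean per column (NaN skipped), and fillna
# matches those means to the columns by label.
df = df.fillna(df.mean())

print(df)
```

Both steps are vectorized, so no explicit for loop over the 100,000 rows is needed. If a column contained no valid numbers at all, its mean would itself be NaN and the cells would stay unfilled, so that case would need a separate default.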
Answered By - Alfred Rodenboog