Wednesday, July 13, 2022

[FIXED] How to replace None only with empty string using pandas?


Issue

The code below generates a DataFrame, df:

import pandas as pd
from datetime import datetime as dt
import numpy as np

dates = [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None]
strings1 = ['A', 'B',None, 'C']
strings2 = [None, 'B','C', 'C']
strings3 = ['A', 'B','C', None]
vals = [1.,2.,np.nan, 4.]
df = pd.DataFrame(dict(zip(['A','B','C','D','E'],
                           [strings1, dates, strings2, strings3, vals])))



+---+------+---------------------+------+------+-----+
|   |  A   |          B          |  C   |  D   |  E  |
+---+------+---------------------+------+------+-----+
| 0 | A    | 2014-01-02 02:00:00 | None | A    | 1   |
| 1 | B    | 2014-01-02 03:00:00 | B    | B    | 2   |
| 2 | None | 2014-01-02 04:00:00 | C    | C    | NaN |
| 3 | C    | NaT                 | C    | None | 4   |
+---+------+---------------------+------+------+-----+

I would like to replace every None (the real Python None, not the string 'None') in it with '' (an empty string).

The expected df is:

+---+---+---------------------+---+---+-----+
|   | A |          B          | C | D |  E  |
+---+---+---------------------+---+---+-----+
| 0 | A | 2014-01-02 02:00:00 |   | A | 1   |
| 1 | B | 2014-01-02 03:00:00 | B | B | 2   |
| 2 |   | 2014-01-02 04:00:00 | C | C | NaN |
| 3 | C | NaT                 | C |   | 4   |
+---+---+---------------------+---+---+-----+

What I tried is:

df = df.replace([None], [''], regex=True)

But I got

+---+---+---------------------+---+------+---+
|   | A |          B          | C |  D   | E |
+---+---+---------------------+---+------+---+
| 0 | A | 1388628000000000000 |   | A    | 1 |
| 1 | B | 1388631600000000000 | B | B    | 2 |
| 2 |   | 1388635200000000000 | C | C    |   |
| 3 | C |                     | C |      | 4 |
+---+---+---------------------+---+------+---+

All the dates become big numbers, and even NaT and NaN are replaced, which I don't want.

How can I achieve this correctly and efficiently?

Solution

It looks like None is being promoted to NaN, so you cannot use replace as usual. The following works:

In [126]:
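# Boolean frame: True exactly where the cell is the Python object None.
# (applymap is deprecated since pandas 2.1; DataFrame.map is its replacement.)
mask = df.applymap(lambda x: x is None)
# Keep only the columns that contain at least one None.
cols = df.columns[mask.any()]
for col in cols:
    df.loc[mask[col], col] = ''
df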

Out[126]:
   A                   B  C  D   E
0  A 2014-01-02 02:00:00     A   1
1  B 2014-01-02 03:00:00  B  B   2
2    2014-01-02 04:00:00  C  C NaN
3  C                 NaT  C      4

So we build a boolean mask of the None values with applymap, then iterate over only the columns that contain a None and use the mask to set those cells to ''. Because the test is x is None, NaN and NaT never match and are left untouched.
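For reference, the same idea can be sketched as a small self-contained function. This is only an illustration under stated assumptions, not part of the original answer: the helper name replace_none_with_empty is hypothetical, the string columns are built with an explicit object dtype so that None is stored as-is regardless of pandas version, and the function returns a copy instead of modifying df in place.

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd


def replace_none_with_empty(frame):
    """Return a copy of frame with Python None replaced by '' (hypothetical helper)."""
    out = frame.copy()
    for col in out.columns:
        # A literal None can only be stored in an object-dtype column;
        # datetime and float columns already hold NaT / NaN instead.
        if out[col].dtype != object:
            continue
        # Identity test, so NaN and NaT never match.
        is_none = out[col].map(lambda x: x is None).astype(bool)
        out.loc[is_none, col] = ""
    return out


# Same data as in the question; dtype=object is an assumption made here so
# that None is kept as None in the string columns.
df = pd.DataFrame({
    "A": pd.Series(["A", "B", None, "C"], dtype=object),
    "B": [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None],
    "C": pd.Series([None, "B", "C", "C"], dtype=object),
    "D": pd.Series(["A", "B", "C", None], dtype=object),
    "E": [1.0, 2.0, np.nan, 4.0],
})

result = replace_none_with_empty(df)
print(result)
```

Skipping non-object columns avoids touching the datetime and float columns at all, which is why B keeps its datetime dtype and E keeps its NaN.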

Answered By - EdChum

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
