Issue
I wanted to split data values wherever \n was present in a column. This is a snippet of my original data column (note that I cast this column to the object datatype):
0 881567905
1 881046000
2 881046025
3 882935053
4 881006805
5 882130610
6 882036810
7 882428300
8 882428400
9 884343355\n183055900
I split the data on '\n' and then, using the np.where() function, returned a list wherever the split produced multiple elements. While that worked to a degree, it also created seemingly random NaN values:
0 881567905
1 881046000
2 881046025
3 NaN
4 NaN
5 NaN
6 NaN
7 882428300
8 882428400
9 [884343355, 183055900]
As you can see, there is no obvious difference in length, datatype, or structure between the values that came through unchanged and the ones that were replaced with NaN. The code I used to split and replace was:
file_no = df['file_no'].str.split("\n")
df['file_no'] = np.where(file_no.str.len()==1,file_no.str[0],file_no)
I used the same code on other, very similarly structured columns and it did not create these NaN values. I also restarted my environment in case I had messed up an earlier step, but the only code before this was:
import numpy as np
import pandas as pd

path = r'Z:\clients.xlsx'
df = pd.read_excel(path, sheet_name="Master List", header=0, engine="openpyxl")
df = df.rename(columns={'Our File #': 'file_no', 'ID #': 'ID'})
df = df.astype({'file_no': 'object'})
df = df[df.file_no.notnull()]
Does anyone have any idea why these values are being replaced with NaN?
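A minimal reproduction of the symptom, under an assumption the question does not confirm: some `file_no` cells were read from Excel as integers and others as text. The sample values below are hypothetical stand-ins taken from the listing above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: cells 0 and 3 are text, cells 1 and 2 are ints
# (as Excel might return them if they were stored as numbers).
df = pd.DataFrame({"file_no": ["881567905", 882935053, 881006805,
                               "884343355\n183055900"]})
# Casting to object does NOT turn the ints into strings
df = df.astype({"file_no": "object"})

# .str methods return NaN for every element that is not a string
file_no = df["file_no"].str.split("\n")
print(file_no.tolist())
# [['881567905'], nan, nan, ['884343355', '183055900']]

# The np.where step from the question then carries those NaNs through
result = np.where(file_no.str.len() == 1, file_no.str[0], file_no)
print(result)
```

If this assumption holds, the NaNs are not random: they mark exactly the cells that were numbers rather than strings, which would also explain why other columns were unaffected.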
Solution
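I think you're being caught by the fact that pd.Series.str.split treats the pattern as a regular expression by default (`regex=None`) when it is longer than a single character. Note also that casting a column to object does not convert numbers to strings: any cell Excel stored as a number stays an int, and `.str` methods return NaN for non-string elements. Converting with `astype(str)` first avoids that.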
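A quick check of how the pattern length interacts with the `regex` keyword (which exists on `Series.str.split` from pandas 1.4 onward), using a hypothetical one-cell Series that contains a real newline character:

```python
import pandas as pd

s = pd.Series(["884343355\n183055900"])  # contains a real line break

# "\n" is one character, so by default it is treated as a literal string
print(s.str.split("\n").tolist())                 # [['884343355', '183055900']]

# "\\n" is two characters (backslash, n); with regex=None it is compiled
# as a regular expression, and the regex \n matches a real line break
print(s.str.split("\\n").tolist())                # [['884343355', '183055900']]

# With regex=False, "\\n" only matches a literal backslash followed by n
print(s.str.split("\\n", regex=False).tolist())   # [['884343355\n183055900']]
```

So if the cells hold real line breaks, the two-character pattern with `regex=False` leaves them unsplit, and the single-character `"\n"` is the one to use.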
# Force conversion to strings
df.file_no = df.file_no.astype(str)
# Split on the literal two characters `\\n` (backslash + n) with regex=False; if the cells contain real line breaks, you may just need `\n`
df.file_no = df.file_no.str.split("\\n", regex=False)
# Your np.where was fine, but can be simplified with `pd.Series.mask`
df.file_no = df.file_no.mask(df.file_no.str.len().eq(1), df.file_no.str[0])
print(df)
Output:
file_no
0 881567905
1 881046000
2 881046025
3 882935053
4 881006805
5 882130610
6 882036810
7 882428300
8 882428400
9 [884343355, 183055900]
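The steps above can be wrapped into one reusable helper. This is a sketch, not the answerer's code: `split_multi_values` is a hypothetical name, the sample Series is made up, and it assumes pandas 1.4+ (for `regex=False`) and cells that contain real line breaks.

```python
import pandas as pd

def split_multi_values(s: pd.Series, sep: str = "\n") -> pd.Series:
    """Split each cell on `sep`; keep single values as scalars, multiples as lists."""
    # astype(str) makes every cell a string, so numeric cells no longer
    # become NaN (note: a missing value would become the string "nan")
    parts = s.astype(str).str.split(sep, regex=False)
    # Where the split produced one piece, unwrap it; otherwise keep the list
    return parts.mask(parts.str.len().eq(1), parts.str[0])

s = pd.Series(["881567905", 882935053, "884343355\n183055900"], dtype="object")
print(split_multi_values(s).tolist())
# ['881567905', '882935053', ['884343355', '183055900']]
```

`mask` returns a Series that keeps the original index, whereas `np.where` returns a bare NumPy array; both pick the same values here. Because of the `astype(str)` step, numeric cells come back as strings.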
Answered By - BeRT2me