Issue
import pandas as pd
import numpy as np
d = [{'Cabin': 'F G13'},{'Cabin': 'A32 A45'},{'Cabin': 'F23 F36'},{'Cabin': 'B24'},{'Cabin': np.nan}]
df = pd.DataFrame(d)
def deck_list(row):
    if row['Cabin']!=row['Cabin']:
        cabinid = 'NONE'
    else:
        cabinsubstr = row['Cabin'].split(' ')
        for i in cabinsubstr:
            if i.find('F ') != -1:
                cabinid = i[0][0]
                break
            if i.find('F ') == 0:
                cabinid = i[1][0]
                break
    return cabinid
df['Deck_ID'] = df.apply(deck_list, axis=1)
Am I missing something? I've written code like this plenty of times and have never gotten this error, but maybe it's something really stupid.
Solution
Another way to write this, using vectorized string methods, is:
import pandas as pd
import numpy as np
nan = np.nan
df = pd.DataFrame([{'Cabin': 'F G13'},
                   {'Cabin': 'A32 A45'},
                   {'Cabin': 'F23 F36'},
                   {'Cabin': 'B24'},
                   {'Cabin': nan}])
cabin_parts = df['Cabin'].str.split(' ', expand=True)
conditions = [pd.isnull(df['Cabin']),
              df['Cabin'].str.startswith('F').astype(bool),
              ~df['Cabin'].str.contains('F').astype(bool)]
choices = [None,
           cabin_parts[1].str[0],
           cabin_parts[0].str[0]]
df['Deck_ID'] = np.select(conditions, choices)
which yields
     Cabin Deck_ID
0    F G13       G
1  A32 A45       A
2  F23 F36       F
3      B24       B
4      NaN    None
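For comparison, the row-wise approach from the question can also be made to work. The question does not quote the error, so the diagnosis here is an assumption: after split(' ') no token contains a space, so find('F ') always returns -1, neither branch runs, and cabinid is never assigned before the return. A minimal sketch that assigns a value on every path is shown below; the name deck_id and the fallback for a single token starting with 'F' are hypothetical choices, not part of the original question or answer.

```python
import math


def deck_id(cabin):
    """Return the deck letter for one Cabin value, or None when it is missing."""
    # Missing cabins arrive as NaN (a float) or None; both mean "no deck".
    if cabin is None or (isinstance(cabin, float) and math.isnan(cabin)):
        return None
    tokens = cabin.split()
    if not tokens:
        return None
    # Mirror the vectorized rule: when the value starts with 'F' and a second
    # token exists, the deck letter comes from the second token.
    if tokens[0].startswith('F') and len(tokens) > 1:
        return tokens[1][0]
    # Assumption: otherwise the first token's leading letter is the deck.
    return tokens[0][0]


cabins = ['F G13', 'A32 A45', 'F23 F36', 'B24', float('nan')]
decks = [deck_id(c) for c in cabins]
print(decks)
```

With the DataFrame above, this could be applied as df['Deck_ID'] = df['Cabin'].apply(deck_id), which works on one column instead of passing whole rows with axis=1.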
Alternatively, if I understand the Cabin --> Deck_ID naming pattern correctly, perhaps
df['Deck_ID'] = df['Cabin'].str.extract(r'(\D\d*)?\s*(\D\d+)', expand=True)[1].str[0]
would suffice, since
In [86]: df['Cabin'].str.extract(r'(\D\d*)?\s*(\D\d+)', expand=True)
Out[86]:
     0    1
0    F  G13
1  A32  A45
2  F23  F36
3  NaN  B24
4  NaN  NaN
The regex pattern (\D\d*)?\s*(\D\d+) has the following meaning:
(\D\d*)?    first capturing group: 0-or-1 (nondigit followed by 0-or-more digits)
\s*         0-or-more whitespace
(\D\d+)     second capturing group: (nondigit followed by 1-or-more digits)
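To see how the two groups behave on individual strings, the same pattern can be tried with Python's built-in re module. This is only an illustrative sketch: the helper name split_cabin is hypothetical, and it uses re.search on the assumption that it matches the way str.extract takes groups from the first match in each string.

```python
import re

# Same pattern as in the str.extract call above.
PATTERN = re.compile(r'(\D\d*)?\s*(\D\d+)')


def split_cabin(cabin):
    """Return (group 1, group 2) for one Cabin string, or (None, None) if nothing matches."""
    match = PATTERN.search(cabin)
    return match.groups() if match else (None, None)


for cabin in ['F G13', 'A32 A45', 'F23 F36', 'B24']:
    first, second = split_cabin(cabin)
    # The deck letter is the first character of the second group.
    print(repr(cabin), first, second, second[0])
```

For 'B24' the optional first group is skipped during backtracking, so it comes back as None (NaN in pandas) and the whole value lands in the second group, which is why column 1 is the one used for Deck_ID.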
Answered By - unutbu