Issue
I have a dataframe like this:
import pandas as pd

data = {
    'name': ['101 blueberry 2023', '102 big cat 2023', '103 small white dog 2023'],
    'number': [116, 118, 119],
}
df = pd.DataFrame(data)
df
output:
                       name  number
0        101 blueberry 2023     116
1          102 big cat 2023     118
2  103 small white dog 2023     119
I would like to change the first and last numbers in the name column: the first number in name should become the value from the number column, and the last number in name should become '2024'. The result would look like this:
                       name  number
0        116 blueberry 2024     116
1          118 big cat 2024     118
2  119 small white dog 2024     119
I have tried splitting name into a list so that I could change the first and last elements of the list.
df['name_pieces'] = df['name'].str.split(' ')
df
output:
                       name  number                     name_pieces
0        101 blueberry 2023     116          [101, blueberry, 2023]
1          102 big cat 2023     118           [102, big, cat, 2023]
2  103 small white dog 2023     119  [103, small, white, dog, 2023]
I can access the first item of the lists using .str, but I cannot change the item.
df['name_pieces'].str[0]
output:
0 101
1 102
2 103
But trying to assign to the first value of each list gives an error:
df['name_pieces'].str[0] = df['number']
output:
TypeError: 'StringMethods' object does not support item assignment
How can I replace the first and last value of name inside this dataframe?
Solution
Don't bother with the lists. You can just extract the middle part of each string, the part you want to keep, and concatenate the new first and last parts around it.
df.assign(name=
    df['number'].astype(str)
    + df['name'].str.extract(r'( .* )', expand=False)
    + '2024'
)
                       name  number
0        116 blueberry 2024     116
1          118 big cat 2024     118
2  119 small white dog 2024     119
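This regex captures the longest part of the string that starts and ends with a space, i.e. everything from the first space through the last space, with both spaces included. That is why no extra separators are needed when concatenating.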
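To see what the pattern actually captures, it may help to look at the intermediate result on its own. This is only a small sketch using the sample data from the question; the variable names middle and new_name are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['101 blueberry 2023', '102 big cat 2023', '103 small white dog 2023'],
    'number': [116, 118, 119],
})

# `middle` is a hypothetical name for the captured group.
# The match starts at the first space; the greedy `.*` then runs as far as it
# can while still leaving a final space to match, so it ends at the last space.
middle = df['name'].str.extract(r'( .* )', expand=False)
print(middle.tolist())

# The captured text keeps its leading and trailing space, so plain string
# concatenation is enough to rebuild the full name.
new_name = df['number'].astype(str) + middle + '2024'
print(new_name.tolist())
```

Note that expand=False makes str.extract return a Series rather than a one-column DataFrame, which is what allows the + concatenation with the other Series.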
Here's a variation if you'd rather think about name primarily:
df.assign(name=
    df['name'].str.extract(r'( .* )', expand=False)
    .radd(df['number'].astype(str))
    .add('2024')
)
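If you would still rather go through lists as in the question, one possible sketch is below. It is not the approach recommended above, and swap_ends is a hypothetical helper written only for this illustration. The idea is that the .str accessor is read-only, so instead of assigning into the lists you build each new string in plain Python.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['101 blueberry 2023', '102 big cat 2023', '103 small white dog 2023'],
    'number': [116, 118, 119],
})

def swap_ends(name, number, year='2024'):
    """Hypothetical helper: replace the first and last space-separated pieces."""
    pieces = name.split(' ')
    pieces[0] = str(number)   # first piece becomes the value from `number`
    pieces[-1] = year         # last piece becomes the new year
    return ' '.join(pieces)

# Pair each name with its number row by row and rebuild the column.
df['name'] = [swap_ends(n, num) for n, num in zip(df['name'], df['number'])]
print(df)
```

This assumes every name has at least two space-separated pieces. It loops in Python rather than using vectorized string methods, so it will likely be slower on large frames.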
Answered By - wjandrea