Saturday, January 13, 2024

[FIXED] Change first and last elements of strings or lists inside a dataframe

Labels: dataframe, list, pandas, python

Issue

I have a dataframe like this:

import pandas as pd

data = {
    'name': ['101 blueberry 2023', '102 big cat 2023', '103 small white dog 2023'],
    'number': [116, 118, 119]}
df = pd.DataFrame(data)
df

output:

                       name  number          
0        101 blueberry 2023     116         
1          102 big cat 2023     118           
2  103 small white dog 2023     119

I would like to change the first and last numbers in the name column: replace the first number with the value from the number column, and the last number with '2024'. The result should look like this:

                       name  number   
0        116 blueberry 2024     116          
1          118 big cat 2024     118          
2  119 small white dog 2024     119

I have tried splitting name into a list and changing the first and last elements of the list.

df['name_pieces'] = df['name'].str.split(' ')
df

output:

                       name  number                     name_pieces
0        101 blueberry 2023     116          [101, blueberry, 2023]
1          102 big cat 2023     118           [102, big, cat, 2023]
2  103 small white dog 2023     119  [103, small, white, dog, 2023]

I can access the first item of the lists using .str, but I cannot change the item.

df['name_pieces'].str[0]

output:

0    101
1    102
2    103

But trying to assign to the first element of each list raises an error:

df['name_pieces'].str[0] = df['number']

output:

TypeError: 'StringMethods' object does not support item assignment

How can I replace the first and last value of name inside this dataframe?

Solution

Don't bother with the lists. You can just extract the part of the strings you want and join the other parts.

df.assign(name=
    df['number'].astype(str)
    + df['name'].str.extract(r'( .* )', expand=False)
    + '2024'
)

                       name  number
0        116 blueberry 2024     116
1          118 big cat 2024     118
2  119 small white dog 2024     119

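This regex captures the longest part of the string surrounded by spaces, i.e., the part between the first space and the last space. The captured group keeps both of those spaces, so the number and '2024' can be concatenated directly without adding separators.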
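To see what the pattern captures on its own, here is a small sketch using the sample names from the question. The variable names `names`, `middle` and `short` are illustrative only, and the two-token name is a made-up edge case that does not appear in the original post.

```python
import pandas as pd

names = pd.Series(['101 blueberry 2023', '102 big cat 2023', '103 small white dog 2023'])

# Greedy .* stretches from the first space to the last one; the captured
# group includes both of those spaces.
middle = names.str.extract(r'( .* )', expand=False)
print(middle.tolist())

# Hypothetical edge case: with only one space in the string there is nothing
# between a first and a last space, so the pattern does not match and the
# extracted value is missing.
short = pd.Series(['101 2023']).str.extract(r'( .* )', expand=False)
print(short.isna().tolist())
```

A missing value should propagate through the string concatenation, so this approach assumes every name has at least one word between the two numbers.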

Here's a variation if you'd rather start from name. Here .radd() prepends its argument (it computes other + self) and .add() appends:

df.assign(name=
    df['name'].str.extract(r'( .* )', expand=False)
    .radd(df['number'].astype(str))
    .add('2024')
)
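For comparison, the list-based idea from the question can also be made to work by editing each list in plain Python and joining it back into a string. This is a sketch and not part of the original answer; the helper `swap_ends` and its `year` parameter are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['101 blueberry 2023', '102 big cat 2023', '103 small white dog 2023'],
    'number': [116, 118, 119],
})

def swap_ends(name, number, year='2024'):
    """Replace the first token with `number` and the last token with `year`."""
    pieces = name.split(' ')
    pieces[0] = str(number)
    pieces[-1] = year
    return ' '.join(pieces)

# Build the new column row by row; assign() returns a new frame and leaves df untouched.
result = df.assign(
    name=[swap_ends(name, number) for name, number in zip(df['name'], df['number'])]
)
print(result)
```

This loops in Python, so it will likely be slower than the string methods above on large frames, but it does not depend on the regex finding a match.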

Answered By - wjandrea

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
