Friday, January 28, 2022

[FIXED] separating a string that contains decimals and words and creating columns from the unique values in that string using Pandas/Python

Tags: dataframe, pandas, python-3.x

Issue

I have this string:

'Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00'

The first thing I need to do is separate the values into the following form:

Storage:9.22 Checkoff:6.90 In_Elevation:0.00 Out_Charge:0.00 Freight_other:0.00

I will be looping through multiple rows with similar values, so whenever I see a name for the first time (that is, it is unique), I need to create a new column and assign the value I found for that specific row. In the end, it should look something like this:

------------------------------------------------------------------
| Storage | Checkoff | In_Elevation | Out_Charge | Freight_other |
------------------------------------------------------------------
|  9.22   |  6.90    |    0.00      |   0.00     |    0.00       |
------------------------------------------------------------------

I've been trying a couple of examples to at least start separating the string, but they do not give me what I really need:

This is one:

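word = ""
value = ""

# word and value are never reset, so they accumulate across every row
for i in range(len(df['Original'])):
    for j in df['Original'][i]:
        if j.isalpha():
            word = word + j
        elif j.isdecimal():
            value = value + j
        elif j.isascii():
            # ':' and '.' land here and are dropped
            #print(j)
            pass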

But this is the result:

StorageCheckoffInElevationOutChargeFreightotherStorageCheckoffMiscellaneousChargesPremiumFreightStorageCheckoffOptionPremiumsforMinimumPriceContractsFITRUCKDiscountsFORAILCarryCostStorageCheckoffFreightInElevationOutChargeFreightotherStorageCheckoffFreightWeighingChgsFORAILCheckoffInElevationOutChargeFreightotherStorageCheckoffFreightMiscellaneousChargesStorageCheckoffInElevationOutChargeFreightotherInElevationOutChargeDiscountsFreightother
922690000000000061014372018602158602167642563191927552232584968331307341840509672628262873068122661185213241367192248181900000000074061234124424074596189800000000000000016635000

For the columns added to the DataFrame, I'm using this code snippet:

cols = [i for i in new[0].unique()]
df1 = pd.DataFrame( index=range(len(cols)), columns=cols)
df1

This might work, but I still need to separate the string correctly. None of the methods I have used really gives me the desired output. When I use regex, the program separates the words from the values, but then there is no way to map which value corresponds to which word.

As always, any hint or suggestion will be greatly appreciated.

Solution

Use Series.str.extractall with two capturing groups to get the word and the numeric value (allowing for parentheses, which indicate negative values); the two are separated by a colon. Then pivot the resulting DataFrame into the appropriate format. Because the extraction pairs each label with its value, the labels can even occur in a different order in different strings, as in the sample I created below.
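To see how two capturing groups keep each label together with its value, here is a minimal sketch on the single string from the question, using Python's built-in re module instead of pandas. The pattern is a slightly stricter variant of the one in the answer: it assumes labels consist of letters only, which holds for the sample string but is not stated in the question.

```python
import re

text = 'Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00'

# Group 1: the label (letters only; an assumption about the data).
# Group 2: the amount, optionally wrapped in parentheses.
pairs = re.findall(r'([A-Za-z]+):(\(?[0-9.]+\)?)', text)

# Each match is a (label, value) tuple, so the mapping is never lost.
lookup = dict(pairs)
print(pairs)
print(lookup['Checkoff'])
```

With two groups in the pattern, re.findall returns one tuple per match, which is the same pairing that extractall stores as columns 0 and 1.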

Sample data

import pandas as pd
s = pd.Series(['Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00',
               'Checkoff:6.97Storage:19.22InElevation:0.00OutCharge:10.00Freightother:56.55',
               'Checkoff:(2.00)Storage:19.22InElevation:0.00OutCharge:10.00Freightother:56.55'])

Code

df = s.str.extractall(r'(.*?):([\(\)0-9.]+)').reset_index()
df = df.pivot(index='level_0', columns=0, values=1).rename_axis(index=None, columns=None)

print(df)
#  Checkoff Freightother InElevation OutCharge Storage
#0     6.90         0.00        0.00      0.00    9.22
#1     6.97        56.55        0.00     10.00   19.22
#2   (2.00)        56.55        0.00     10.00   19.22
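Note that the extracted values are still strings, including the parenthesized '(2.00)'. If numeric columns are needed, one possible follow-up is sketched below. The helper to_number is hypothetical (it is not part of the original answer), and it assumes that parentheses always mean a negative amount. The sample Series is a shortened version of the one above.

```python
import pandas as pd

s = pd.Series(['Storage:9.22Checkoff:6.90',
               'Checkoff:(2.00)Storage:19.22'])

# Same extraction and pivot as in the answer.
df = s.str.extractall(r'(.*?):([\(\)0-9.]+)').reset_index()
df = df.pivot(index='level_0', columns=0, values=1).rename_axis(index=None, columns=None)

def to_number(text):
    """Hypothetical helper: '9.22' -> 9.22, '(2.00)' -> -2.0, missing -> NaN."""
    if pd.isna(text):
        # A label absent from a row shows up as a missing value after the pivot.
        return float('nan')
    if text.startswith('(') and text.endswith(')'):
        return -float(text[1:-1])
    return float(text)

# Series.map applies the helper element-wise; apply runs it per column.
numeric = df.apply(lambda col: col.map(to_number))
print(numeric)
```

Converting after the pivot keeps the extraction step unchanged, so the regex stays as simple as in the answer.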

Answered By - ALollz

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
