Issue
I have this string:
'Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00'
The first thing I need to do is separate the values into the following form:
Storage:9.22 Checkoff:6.90 In_Elevation:0.00 Out_Charge:0.00 Freight_other:0.00
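If only this first step is needed, one option is to use Python's `re` module to insert a space wherever a value ends and the next label begins. This is a minimal sketch that assumes every value ends in a digit or `)` and every label starts with a letter. It does not add the underscores shown above, because a name like `Freightother` has no case boundary to split on.

```python
import re

raw = 'Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00'

# Zero-width match: insert a space where the previous character ends a value
# (a digit or ')') and the next character starts a label (a letter).
spaced = re.sub(r'(?<=[0-9)])(?=[A-Za-z])', ' ', raw)
print(spaced)
# Storage:9.22 Checkoff:6.90 InElevation:0.00 OutCharge:0.00 Freightother:0.00
```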
I will be looping through multiple rows with similar values, so as soon as I see a new (unique) name, I need to create a new column and assign the value I found for that specific row. At the end it should look something like this:
------------------------------------------------------------------
| Storage | Checkoff | In_Elevation | Out_Charge | Freight_other |
------------------------------------------------------------------
| 9.22    | 6.90     | 0.00         | 0.00       | 0.00          |
------------------------------------------------------------------
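To get "a new column as soon as a new name appears" without creating the columns by hand, one approach is to collect one dict per row and let `pandas.DataFrame` build the columns from the keys. The rows below are hypothetical, already-parsed values, used only to show the mechanism.

```python
import pandas as pd

# Hypothetical already-parsed rows: one dict per input row, label -> value.
rows = [
    {'Storage': 9.22, 'Checkoff': 6.90, 'InElevation': 0.00},
    {'Checkoff': 6.97, 'OutCharge': 10.00},
]

# pandas creates one column per distinct key across all dicts and fills
# labels that a row does not have with NaN.
df = pd.DataFrame(rows)
print(df)
```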
I've tried a couple of approaches, at least to start separating the string, but they don't give me what I need.
This is one:
word = ""
value = ""
for i in range(0, len(df['Original'])):
    for j in df['Original'][i]:
        if j.isalpha():
            word = word + j    # collect letters
        elif j.isdecimal():
            value = value + j  # collect digits
        elif j.isascii():
            # anything else (':' and '.') is skipped
            # print(j)
            pass
But this is the result:
StorageCheckoffInElevationOutChargeFreightotherStorageCheckoffMiscellaneousChargesPremiumFreightStorageCheckoffOptionPremiumsforMinimumPriceContractsFITRUCKDiscountsFORAILCarryCostStorageCheckoffFreightInElevationOutChargeFreightotherStorageCheckoffFreightWeighingChgsFORAILCheckoffInElevationOutChargeFreightotherStorageCheckoffFreightMiscellaneousChargesStorageCheckoffInElevationOutChargeFreightotherInElevationOutChargeDiscountsFreightother
922690000000000061014372018602158602167642563191927552232584968331307341840509672628262873068122661185213241367192248181900000000074061234124424074596189800000000000000016635000
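The loop produces this because `word` and `value` are never reset or paired, so every letter from every row ends up in one string and every digit in another. Also, `str.isdecimal()` is `False` for `'.'`, so the decimal points fall into the last branch and are dropped (`9.22` becomes `922`). A minimal sketch of the same character-by-character idea that keeps each label paired with its value could look like this; `parse_pairs` is a hypothetical helper name.

```python
def parse_pairs(text):
    """Walk the string once, pairing each label with the value after its colon."""
    pairs = {}
    label, value = '', ''
    reading_value = False
    for ch in text:
        if ch == ':':
            # A colon ends the label and starts its value.
            reading_value = True
        elif reading_value and ch.isalpha():
            # A letter after a value starts the next label: store the finished pair.
            pairs[label] = value
            label, value = ch, ''
            reading_value = False
        elif reading_value:
            # Any non-letter after the colon (digits, '.', parentheses) is part of the value.
            value += ch
        else:
            label += ch
    if label:
        pairs[label] = value
    return pairs

print(parse_pairs('Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00'))
# {'Storage': '9.22', 'Checkoff': '6.90', 'InElevation': '0.00', 'OutCharge': '0.00', 'Freightother': '0.00'}
```

Calling it once per row, e.g. `[parse_pairs(x) for x in df['Original']]`, gives a list of dicts that can be passed straight to `pd.DataFrame`.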
And to add the columns to the dataframe, I'm using this snippet:
cols = [i for i in new[0].unique()]
df1 = pd.DataFrame(index=range(len(cols)), columns=cols)
df1
This might work, but I still need to separate the string correctly. None of the methods I have used gives me the desired output. When I use regex, the program separates words from values, but then there is no way to map which value corresponds to which word.
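The pairing does not have to be lost with regex: when a pattern has two capturing groups, `re.findall` returns one `(group1, group2)` tuple per match. This sketch assumes labels consist of letters only and values of digits, `.` and parentheses.

```python
import re

raw = 'Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00'

# Group 1: a run of letters (the label); group 2: the value after the colon.
pairs = re.findall(r'([A-Za-z]+):([\d.()]+)', raw)
print(pairs)
# [('Storage', '9.22'), ('Checkoff', '6.90'), ('InElevation', '0.00'), ('OutCharge', '0.00'), ('Freightother', '0.00')]

# Each tuple keeps a label next to its own value, so a dict per row is one call away.
row = dict(pairs)
```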
As always, any hint or suggestion will be greatly appreciated.
Solution
Use Series.str.extractall with capturing groups to get the word and the numeric value (allowing for parentheses to indicate negative values), which are separated by a colon. Then, pivot this DataFrame into the appropriate format. Since the extraction pairs labels with values, the labels can even occur in a different order in separate strings, like in the sample I created below.
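To see what the pivot works on, it can help to look at the intermediate result. `extractall` returns one row per regex match, indexed by (original row, match number), with the unnamed groups as columns `0` (label) and `1` (value); `reset_index()` then turns the unnamed first index level into a `level_0` column. A small sketch on a shortened, hypothetical two-row Series:

```python
import pandas as pd

s = pd.Series(['Storage:9.22Checkoff:6.90', 'Checkoff:(2.00)Storage:19.22'])

# The lazy (.*?) takes the text up to the colon as the label; the second group
# takes the digits, '.' and parentheses that follow as the value.
matches = s.str.extractall(r'(.*?):([\(\)0-9.]+)')
print(matches)
```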
Sample data
import pandas as pd

s = pd.Series(['Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00',
               'Checkoff:6.97Storage:19.22InElevation:0.00OutCharge:10.00Freightother:56.55',
               'Checkoff:(2.00)Storage:19.22InElevation:0.00OutCharge:10.00Freightother:56.55'])
Code
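# One row per label:value match; reset_index exposes the original row number as 'level_0'
df = s.str.extractall(r'(.*?):([\(\)0-9.]+)').reset_index()
# Long -> wide: one row per original string, one column per label
df = df.pivot(index='level_0', columns=0, values=1).rename_axis(index=None, columns=None)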
print(df)
# Checkoff Freightother InElevation OutCharge Storage
#0 6.90 0.00 0.00 0.00 9.22
#1 6.97 56.55 0.00 10.00 19.22
#2 (2.00) 56.55 0.00 10.00 19.22
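The pivoted values are still strings, and the negative value stays as the text `(2.00)`. If numbers are needed, one option (not part of the answer above) is to rewrite the accounting-style parentheses as a leading minus sign with a regex replace and then cast to `float`. The frame below is a small hypothetical stand-in shaped like the result above.

```python
import pandas as pd

# Hypothetical frame shaped like the pivoted result (values are strings).
df = pd.DataFrame({'Checkoff': ['6.90', '(2.00)'], 'Storage': ['9.22', '19.22']})

# '(2.00)' -> '-2.00' via a regex backreference, then convert every column to float.
numeric = df.replace(r'^\((.*)\)$', r'-\1', regex=True).astype(float)
print(numeric)
```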
Answered By - ALollz