Issue
I have the following working code. I need to add a percentage column to monitor changes, but I don't know much about how to do that in pandas. I need ideas on which part needs to be modified.
import pandas as pd

dl = []
with open('sampledata.txt') as f:
    for line in f:
        parts = line.split()
        if not parts:
            continue  # skip blank lines (e.g. a trailing newline)
        # Cleaning data here: conversions to int/float etc.
        # A missing "($...)" amount in the 4th field is filled with 0.
        if not parts[3].startswith('($'):
            parts.insert(3, '0')
        # Re-join a multi-word last column such as "ASDS SSSS SDSD".
        if len(parts) > 5:
            temp = ' '.join(parts[4:])
            parts = parts[:4] + [temp]
        parts[1] = int(parts[1])
        parts[2] = float(parts[2].replace(',', ''))
        parts[3] = float(parts[3].strip('($)'))
        dl.append(parts)

headers = ['col1', 'col2', 'col3', 'col4', 'col5']
df = pd.DataFrame(dl, columns=headers)
df = df.groupby(['col1', 'col5']).sum().reset_index()
df = df.sort_values('col2', ascending=False)
df['col4'] = '($' + df['col4'].astype(str) + ')'
df = df[headers]
print(df)
sampledata.txt #-- Sample Data Source file
alpha 1 54,00.01 ABC DSW2S
bravo 3 500,000.00 ACDEF
charlie 1 27,722.29 ($250.45) DGAS-CAS
delta 2 11 ($10) SWSDSASS-CCSSW
echo 5 143,299.00 ($101) ACS34S1
lima 6 45.00181 ($38.9) FGF5GGD-DDD
falcon 3 0.1234 DSS2SFS3
echo 8 145,300 ($125.01) ACS34S1
charlie 10 252,336,733.383 ($492.06) DGAS-CAS
romeo 12 980 ASDS SSSS SDSD
falcon 5 9.19 DSS2SFS3
Current Output: #-- working result
col1 col2 col3 col4 col5
4 echo 13 2.885990e+05 ($226.01) ACS34S1
7 romeo 12 9.800000e+02 ($0.0) ASDS SSSS SDSD
2 charlie 11 2.523645e+08 ($742.51) DGAS-CAS
5 falcon 8 9.313400e+00 ($0.0) DSS2SFS3
6 lima 6 4.500181e+01 ($38.9) FGF5GGD-DDD
1 bravo 3 5.000000e+05 ($0.0) ACDEF
3 delta 2 1.100000e+01 ($10.0) SWSDSASS-CCSSW
0 alpha 1 5.400010e+03 ($0.0) ABC DSW2S
Improved Output: #-- with Additional Column for %
col1 col2 col3 col4 col5 col6
4 echo 13 2.885990e+05 ($226.01) ACS34S1 60% #-- (5 + 8) = 13
7 romeo 12 9.800000e+02 ($0.0) ASDS SSSS SDSD 0%
2 charlie 11 2.523645e+08 ($742.51) DGAS-CAS 900% #-- (1 + 10) = 11
5 falcon 8 9.313400e+00 ($0.0) DSS2SFS3 66.67% #-- (3 + 5) = 8
6 lima 6 4.500181e+01 ($38.9) FGF5GGD-DDD 0%
1 bravo 3 5.000000e+05 ($0.0) ACDEF 0%
3 delta 2 1.100000e+01 ($10.0) SWSDSASS-CCSSW 0%
0 alpha 1 5.400010e+03 ($0.0) ABC DSW2S 0%
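Judging from the annotated rows, col6 appears to be the change in col2 between the first and last occurrence of a name, relative to the first: echo goes from 5 to 8, so 100 × (8 − 5) / 5 = 60%, and names that occur only once get 0%. The sketch below only illustrates that arithmetic in plain Python; the helper names percent_change and format_percent are hypothetical, and returning 0 for a zero first value is an assumption.

```python
def percent_change(first, last):
    """Percentage change from the first to the last value, rounded to 2 places.

    Returns 0.0 when first is 0 to avoid dividing by zero (an assumption).
    """
    if first == 0:
        return 0.0
    return round(100.0 * (last - first) / first, 2)


def format_percent(value):
    """Render 60.0 as '60%' and 66.67 as '66.67%'."""
    if value == int(value):
        value = int(value)
    return f"{value}%"


# (first col2, last col2) per name, read off sampledata.txt;
# lima occurs once, so first and last are the same value.
examples = {"echo": (5, 8), "charlie": (1, 10), "falcon": (3, 5), "lima": (6, 6)}
for name, (first, last) in examples.items():
    print(name, format_percent(percent_change(first, last)))
```

This reproduces the 60%, 900%, 66.67% and 0% values in the expected output above.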
Solution
You can add the following lines right after your code. The function compute_percentage() reads the list variable dl, so for each col1 value it compares the first and last col2 entries in file order.
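def compute_percentage(row):
    # col2 values for this col1 name, in file order
    vl = [float(parts[1]) for parts in dl if parts[0] == row['col1']]
    # Change from the first to the last occurrence in percent (0 if the first value is 0)
    i = round(100. * (vl[-1] - vl[0]) / vl[0] if vl[0] != 0 else 0, 2)
    # Drop the trailing ".0" for whole numbers (60.0 -> 60)
    if float(int(i)) == i:
        i = int(i)
    return str(i) + '%'

df['col6'] = df.apply(compute_percentage, axis=1)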
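As an alternative sketch (not the answerer's method), the first and last col2 of each group can be taken with GroupBy.first() and GroupBy.last() before the rows are summed, which avoids rescanning dl once per output row and matches on both grouping keys (col1 and col5) instead of col1 alone. The rows below are a subset of sampledata.txt, already parsed and hard-coded so the example runs on its own; the fmt helper is hypothetical, and treating a zero first value as 0% is an assumption.

```python
import pandas as pd

# Parsed rows from sampledata.txt (subset), in the same layout as dl in the question.
dl = [
    ['echo', 5, 143299.0, 101.0, 'ACS34S1'],
    ['falcon', 3, 0.1234, 0.0, 'DSS2SFS3'],
    ['echo', 8, 145300.0, 125.01, 'ACS34S1'],
    ['falcon', 5, 9.19, 0.0, 'DSS2SFS3'],
    ['lima', 6, 45.00181, 38.9, 'FGF5GGD-DDD'],
]
headers = ['col1', 'col2', 'col3', 'col4', 'col5']
df = pd.DataFrame(dl, columns=headers)

grouped = df.groupby(['col1', 'col5'])
out = grouped[['col2', 'col3', 'col4']].sum()

# First and last col2 of each group; rows keep their file order inside a group.
first = grouped['col2'].first()
last = grouped['col2'].last()

# Percentage change first -> last, replaced by 0 where the first value is 0.
pct = (100 * (last - first) / first).where(first != 0, 0.0).round(2)


def fmt(v):
    # "60%" for whole numbers, "66.67%" otherwise (same style as the answer's output).
    return f"{int(v)}%" if v == int(v) else f"{v}%"


out['col6'] = pct.map(fmt)
out = out.reset_index().sort_values('col2', ascending=False)
out = out[headers + ['col6']]
print(out)
```

The `($...)` formatting of col4 from the question can be applied to `out` in the same way afterwards.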
Output:
col1 col2 col3 col4 col5 col6
4 echo 13 2.885990e+05 ($226.01) ACS34S1 60%
7 romeo 12 9.800000e+02 ($0.0) ASDS SSSS SDSD 0%
2 charlie 11 2.523645e+08 ($742.51) DGAS-CAS 900%
5 falcon 8 9.313400e+00 ($0.0) DSS2SFS3 66.67%
6 lima 6 4.500181e+01 ($38.9) FGF5GGD-DDD 0%
1 bravo 3 5.000000e+05 ($0.0) ACDEF 0%
3 delta 2 1.100000e+01 ($10.0) SWSDSASS-CCSSW 0%
0 alpha 1 5.400010e+03 ($0.0) ABC DSW2S 0%
Answered By - Eric Marchand