Issue
I have a fairly complex situation: I need to append rows that hold one aggregate, the sum of population, computed over different columns. See the example below.
Please note that every column has many rows: "year" runs over range(2019, 2040) and "con" contains multiple countries.
import pandas as pd
d = { 'year': [2019,2020,2019,2020,2019,2020], 'age': [10,10,20,20,30,30], 'con': ['UK','UK','UK','US','US','US'],'population': [1,2,300,400,1000,2000]}
df = pd.DataFrame(data=d)
df
year age con population
2019 10 UK 1
2020 10 UK 2
2019 20 UK 300
2020 20 US 400
2019 30 US 1000
2020 30 US 2000
Required output:
year age con population
2019 10 UK 1
2020 10 UK 2
2019 20 UK 300
2020 20 US 400
2019 30 US 1000
2020 30 US 2000
2019 10-20 UK child 301 #addition of row 1 + row 3
2020 10-20 UK child 402 #addition of 1+2
2019 20-30 UK teen 1000+ age30 population
I am looking for a loop-based function that I can apply over the "con" column.
This is my attempt, which FAILED:
variable_list = ['UK', 'US']
ranges = [[0,10], [10,20], [20,30]]
categories = ["Child", "teen", "work"]
year = [x for x in range(2019,2022)]
q = df#df.loc[(df["Kategorie 1"].str.strip()==BASE)]
q["age2"] = pd.to_numeric(q["age"])
sums_years = {}
for variable in variable_list:
    c = 0
    u = q.loc[q["cat2"]==variable]
    for r in ranges:
        cat = "Germany: " + categories[c]
        for year in date:
            group = str(r[0])+'-'+str(r[1])
            n = variable + "_" + group
            if n not in sums_years:
                sums_years[n] = {}
            s = u.loc[(u['year']==year) & (u["age"]>=r[0]) & (u["age"]<=r[1]), 'population'].sum()
I also tried it for a single condition:
df_uk = df[df.con=='UK'].reset_index(drop=True)
div =['child','teen','working']
c = [div[i] for i in range(len(df_uk))] #list to get element from div
y = [i+2018 for i in range(1,len(df_uk)+1)] #list of 2019,2020,2021
x = [[[0,10], [10,20], [20,30]] for i in range(1,len(df_uk)+1)]
d={'year':y, 'age':x, 'con':c, 'population': (df_uk['value'] + #adds_something).values}
df_new = pd.DataFrame(data=d)
df = pd.concat([df, df_new], ignore_index=True)
Sorry if it's a mess. I asked around but got no help. I am sure there is an easier and better way to write this loop. Please help! Is there a better way to melt the DataFrame and do all the calculations, or to restructure the DataFrame?
Solution
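import numpy as np
import pandas as pd

d = { 'year': [2019,2020,2021,2020,2019,2021],
      'age': [10,20,30,10,20,30],
      'con': ['UK','UK','UK','US','US','US'],
      'population': [1,2,300,400,1000,2000]}
df = pd.DataFrame(data=d)
df2 = df.copy()
# Bucket each age into a category; between() is inclusive at both ends.
criteria = [df2['age'].between(0, 10),
            df2['age'].between(11, 20),
            df2['age'].between(21, 30)]
values = ['child', 'teen', 'work']
# A string default keeps the same dtype as the string choices
# (a numeric default such as 0 is rejected by recent NumPy versions).
df2['con'] = df2['con']+'_'+np.select(criteria, values, default='other')
# Running total of population per country over ascending age.
# The grouped result is ordered by (con, age); that matches the row
# order of df here, so the positional assignment lines up.
df2['population'] = df.groupby(['con', 'age'])['population'].sum()\
                      .groupby(level=0).cumsum()\
                      .reset_index()['population']
final = pd.concat([df, df2])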
Answered By - tzinie
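For comparison, here is a minimal sketch of a different approach that is not part of the answer above. It sums the population per year and country inside each age range and appends those sums as new rows. It uses the data, ranges and category names from the question, and it assumes both ends of each range are inclusive (as in the question's own attempt), so an age of 10 counts in both 0-10 and 10-20. The "UK teen" style of label and the names `parts`, `summary` and `final` are illustrative choices only.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    'year': [2019, 2020, 2019, 2020, 2019, 2020],
    'age': [10, 10, 20, 20, 30, 30],
    'con': ['UK', 'UK', 'UK', 'US', 'US', 'US'],
    'population': [1, 2, 300, 400, 1000, 2000],
})

# Ranges and labels from the question's attempt. Assumption: both ends
# are inclusive, so neighbouring ranges overlap at their boundary.
ranges = [(0, 10), (10, 20), (20, 30)]
categories = ['child', 'teen', 'work']

parts = []
for (lo, hi), cat in zip(ranges, categories):
    # Keep only the rows whose age falls inside this range.
    in_band = df[df['age'].between(lo, hi)]
    # Sum population per (year, country) within the range.
    sums = in_band.groupby(['year', 'con'], as_index=False)['population'].sum()
    sums['age'] = f'{lo}-{hi}'            # e.g. "10-20"
    sums['con'] = sums['con'] + ' ' + cat  # e.g. "UK teen"
    parts.append(sums)

# Stack the per-range sums, restore the original column order,
# and append them below the original rows.
summary = pd.concat(parts, ignore_index=True)[df.columns]
final = pd.concat([df, summary], ignore_index=True)
print(final)
```

With this data the 2019 UK row for the 10-20 range sums to 301 (1 + 300), which matches the first aggregate row in the question. Because the grouping is done by column values and not by row position, it does not depend on how the input rows are ordered.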