Sunday, October 10, 2021

[FIXED] How to apply for loop to append rows based on multiple calculation?

October 10, 2021 dataframe, for-loop, function, pandas, python No comments

Issue

I have a very complex situation to append the rows with one agg. function of sum of population based on different columns. Find below:

Please make sure I have multiple rows in all column such as "year" in range (2019,2040) & "con" have mulitple countries.

import pandas as pd
d = { 'year': [2019,2020,2019,2020,2019,2020], 'age': [10,10,20,20,30,30], 'con': ['UK','UK','UK','US','US','US'],'population': [1,2,300,400,1000,2000]}
df = pd.DataFrame(data=d)
df

 year   age con population
 2019   10  UK      1
 2020   10  UK      2
 2021   20  UK      300
 2019   20  US      400
 2020   30  US      1000
 2021   30  US      2000

output required:

year    age   con     population
2019    10     UK          1
2020    10     UK          2
2019    10     UK          300
2020    20     US          400
2019    20     US          1000
2020    20     US          2000
2019    10-20  UK child    301         #addition of row 1 + row 3  
2020    10-20  UK child    402         #addition of 1+2
2019    20-30  UK teen     1000+ age30 population

I am looking for a loop function so I apply on con col

I am trying, FAILED!!!

variable_list = ['UK', 'US']
ranges = [[0,10], [10,20], [20,30]]


categories = ["Child", "teen", "work"]

year = [x for x in range(2019,2022)]

q = df#df.loc[(df["Kategorie 1"].str.strip()==BASE)]
q["age2"] = pd.to_numeric(q["age"])

sums_years = {}


                   
for variable in variable_list:
  c = 0
  u = q.loc[q["cat2"]==variable]  
  for r in ranges:
    cat = "Germany: " + categories[c]
    for year in date:
      group = str(r[0])+'-'+str(r[1])
      n = variable + "_" + group
      if n not in sums_years:
        sums_years[n] = {}

      s = u.loc[(u['year']==year) & (u["age"]>=r[0]) & (u["age"]<=r[1]), 'population'].sum()
     ```

and also like for one condition

df_uk = df[df.con=='UK'].reset_index(drop=True)
div =['child','teen','working']
c = [div[i] for i in range(len(df_uk))] #list to get element from div
y = [i+2018 for i in range(1,len(df_uk)+1)] #list of 2019,2020,2021
x = [[[0,10], [10,20], [20,30]] for i in range(1,len(df_uk)+1)]

d={'year':y, 'age':x, 'con':c, 'population': (df_uk['value'] + #adds_something).values}

df_new = pd.DataFrame(data=d)

df = pd.concat([df, df_new], ignore_index=True)

sorry if its a mess.. I asked people but no help... I am sure there can be easy and better loop function. Please Help!!!! Is there any better way to melt the dataframe and do all calcuation.. or to restructure the dataframe.

Solution

d = { 'year': [2019,2020,2021,2020,2019,2021], 
      'age': [10,20,30,10,20,30], 
      'con': ['UK','UK','UK','US','US','US'],
      'population': [1,2,300,400,1000,2000]}
df = pd.DataFrame(data=d)
df2 = df.copy()

criteria = [df2['age'].between(0, 10), 
            df2['age'].between(11, 20), 
            df2['age'].between(21, 30)]

values = ['child', 'teen', 'work']

df2['con'] = df2['con']+'_'+np.select(criteria, values, 0)
df2['population'] = df.groupby(['con', 'age']).sum()\
                      .groupby(level=0).cumsum()\
                      .reset_index()['population']

final = pd.concat([df, df2])

Answered By - tzinie

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Sunday, October 10, 2021

[FIXED] How to apply for loop to append rows based on multiple calculation?

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels