Issue
i have one question: I have a big dataframe with over 1000 columns.
For example as following the heards of the columns: 2019 Material Cost, 2019 Labor Cost, 2019 Overhead Cost, 2020 Material Cost, 2020 Labor Cost, 2020 Overhead Cost, ...2035
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2019 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2019 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2020 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2020 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2021 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2021 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})
I want to sort all headers into the following:
2019 Material Cost, 2020 Material cost, 2021 Material Cost,...,2019 Labor Cost, 2020 Labor Cost, 2021 Labor Cost, ... ,2019 Overhead Cost, 2020 Overhead Cost,2021 Overhead Cost
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Material cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Material cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Overhead cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Overhead cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Labor cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Labor cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})
So i want to have one cost category and sort the years of the category ascending in a following order, then the next category.
Any help here? Thanks in advance
Solution
Create two lists, one with the costs and one with the years. Using these lists you can create another list containing all column names (in order).
costs = list(df.columns.str[5:].unique())
years = list(range(2019, 2036))
columns = [str(year) + ' ' + cost for year in years for cost in costs]
df = df.reindex(columns=columns)
For example:
df = pd.DataFrame(np.random.random((10, 10)), columns = ['1 a', '2 a', '3 a', '4 a', '5 a', '1 b', '2 b', '3 b', '4 b', '5 b'])
costs = ['a', 'b']
years = [1, 2, 3, 4, 5]
columns = [str(year) + ' ' + cost for year in years for cost in costs]
df.reindex(columns=columns).columns
Returns
Index(['1 a', '1 b', '2 a', '2 b', '3 a', '3 b', '4 a', '4 b', '5 a', '5 b'], dtype='object')
Answered By - T C Molenaar
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.