Issue
I have a DataFrame with 20000 rows and 1600 columns. Each row represents an observed object and each column is a date. Example:
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.array([[1, 2, 3, 4, 5], [6, np.nan, np.nan, np.nan, 10], [np.nan, np.nan, 14, 13, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]),
                   columns=['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04', '2016-01-05'],
                   index=[1, 2, 3, 4, 5])
I want to get a new DataFrame that includes the elements of .describe() plus a few more: the first value, the last value, and the density (number of observations / number of dates since the first observation).
I've written this:
df = pd.DataFrame()
for i in df2.index:
    df[i] = df2.T[i].describe()
But it is very slow, so I am looking for a faster solution and for help with the other columns.
The expected result is:
   count  mean       std  min  max  first_v  last_v  density
1      5     3  1.581139    1    5        1       5      1
2      2     8  2.828427    6   10        6      10      0.4
3      3    14  1.000000   13   15       14      15      1
4      5    18  1.581139   16   20       16      20      1
5      5    23  1.581139   21   25       21      25      1
Solution
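Instead of your loop, just use:
df = df2.T.describe()
This computes the statistics for every object in one call, with the objects as columns. Use df2.T.describe().T if you want one row per object, as in the expected table.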
Answered By - bitflip
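The describe() call covers count, mean, std, min and max, but not the first_v, last_v and density columns the question also asks for. The following is a minimal sketch of one way to build them without a Python loop, assuming the columns are already in chronological order and that density means the number of observations divided by the number of dates from the first observation (inclusive) to the last column. The names values, observed, first_pos, last_pos and stats are illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    [[1, 2, 3, 4, 5],
     [6, np.nan, np.nan, np.nan, 10],
     [np.nan, np.nan, 14, 13, 15],
     [16, 17, 18, 19, 20],
     [21, 22, 23, 24, 25]],
    columns=['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04', '2016-01-05'],
    index=[1, 2, 3, 4, 5],
)

values = df2.to_numpy(dtype=float)
observed = ~np.isnan(values)          # True where a value was recorded
n_rows, n_cols = values.shape
rows = np.arange(n_rows)

# argmax on a boolean array returns the position of the first True,
# so this is the column position of the first observation in each row.
first_pos = observed.argmax(axis=1)
# Same trick on the reversed columns gives the last observation.
last_pos = n_cols - 1 - observed[:, ::-1].argmax(axis=1)

count = observed.sum(axis=1)

stats = pd.DataFrame({
    'count': count,
    'mean': df2.mean(axis=1),         # row-wise reductions skip NaN by default
    'std': df2.std(axis=1),
    'min': df2.min(axis=1),
    'max': df2.max(axis=1),
    'first_v': values[rows, first_pos],
    'last_v': values[rows, last_pos],
    # Assumption: the denominator counts dates from the first observation
    # (inclusive) through the last column.
    'density': count / (n_cols - first_pos),
}, index=df2.index)

print(stats)
```

For a row with no observations at all, argmax returns 0, so first_v and last_v come out as NaN and density as 0. The quartile rows that describe() also produces are left out here because the expected table does not list them.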