Issue
I want to look at differences across columns (or rows) in a large dataframe. For rows I used the diff() method, but I cannot find the equivalent for columns.
This is my current workaround:
df.transpose().diff().transpose()
Is there a more efficient alternative?
Solution
Pandas DataFrames are excellent for manipulating table-like data whose columns have different dtypes.
If subtracting across both columns and rows makes sense, then all the values are the same kind of quantity. That may be an indication that you should be using a NumPy array instead of a Pandas DataFrame.
In any case, you can use arr = df.values to extract a NumPy array of the underlying data from the DataFrame. If all the columns share the same dtype, the NumPy array has that dtype. (When the columns have different dtypes, df.values is upcast to a common dtype, which is object when the columns mix numeric and non-numeric data.)
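To check which dtype df.values will give you before differencing, something like the following sketch may help. The three small frames and their names are made up for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical example frames, only to illustrate the dtype of .values
same = pd.DataFrame({"A": [1, 2], "B": [3, 4]})                # all integer columns
mixed_numeric = pd.DataFrame({"A": [1, 2], "B": [0.5, 1.5]})   # integer + float
mixed = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})           # integer + strings

same_dtype = same.values.dtype              # an integer dtype
numeric_dtype = mixed_numeric.values.dtype  # upcast to float64
mixed_dtype = mixed.values.dtype            # falls back to object

print(same_dtype, numeric_dtype, mixed_dtype)
```

An object array is a sign that np.diff is probably not what you want, since the columns do not hold the same kind of quantity.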
Then you can compute the differences along rows or columns using np.diff(arr, axis=...):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))
#    A  B   C   D
# 0  0  1   2   3
# 1  4  5   6   7
# 2  8  9  10  11
np.diff(df.values, axis=0)  # difference of the rows
# array([[4, 4, 4, 4],
#        [4, 4, 4, 4]])
np.diff(df.values, axis=1)  # difference of the columns
# array([[1, 1, 1],
#        [1, 1, 1],
#        [1, 1, 1]])
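If you need the result back as a labelled DataFrame, one possible sketch is below. The choice to label each difference by its right-hand column is an assumption made here, not something the answer specifies. For comparison, DataFrame.diff also accepts an axis argument, so df.diff(axis=1) differences across columns without transposing; it keeps the original shape and fills the first column with NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))

# Column-wise differences on the raw array; the result has one column fewer.
col_diff = pd.DataFrame(
    np.diff(df.to_numpy(), axis=1),
    index=df.index,
    columns=df.columns[1:],  # assumption: label each difference by its right-hand column
)

# pandas-only alternative: same shape as df, first column is all NaN.
pandas_diff = df.diff(axis=1)

print(col_diff)
print(pandas_diff)
```

The NumPy route avoids the NaN column and keeps the integer dtype, while df.diff(axis=1) preserves the original labels and shape.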
Answered By - unutbu