Issue
I am trying to filter the columns in a pandas dataframe based on whether they are of type date or not. I can figure out which ones are, but then would have to parse that output or manually select columns. I want to select date columns automatically. Here's what I have so far as an example - I'd want to only select the 'date_col' column in this case.
import pandas as pd
df = pd.DataFrame([['Feb-2017', 1, 2],
['Mar-2017', 1, 2],
['Apr-2017', 1, 2],
['May-2017', 1, 2]],
columns=['date_str', 'col1', 'col2'])
df['date_col'] = pd.to_datetime(df['date_str'])
df.dtypes
Out:
date_str object
col1 int64
col2 int64
date_col datetime64[ns]
dtype: object
Solution
Pandas has a cool function called select_dtypes
, which can take either exclude or include (or both) as parameters. It filters the dataframe based on dtypes. So in this case, you would want to include columns of dtype np.datetime64
. To filter by integers, you would use [np.int64, np.int32, np.int16, np.int]
, for float: [np.float32, np.float64, np.float16, np.float]
, to filter by numerical columns only: [np.number]
.
df.select_dtypes(include=[np.datetime64])
Out:
date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01
In:
df.select_dtypes(include=[np.number])
Out:
col1 col2
0 1 2
1 1 2
2 1 2
3 1 2
Answered By - Charlie Haley
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.