Issue
I have a pandas DataFrame in Python (3.6) with numeric and categorical attributes. I want to pull a list of the numeric columns for use in other parts of my code. What is the most efficient way to do this?
This seems to be the standard answer:
num_cols = df.select_dtypes([np.number]).columns.tolist()
But I'm worried that select_dtypes() can be slow, and this seems to add a middle step that I'm hoping isn't necessary (subsetting the data before pulling back the column names of just the numeric attributes).
Any ideas on a more efficient way of doing this? (I know there is a private method, _get_numeric_data(), that could also be used, but I wasn't able to find out how it works, and I don't love using a private method as a long-term solution.)
Solution
df.select_dtypes is for selecting data: it makes a copy of your data, which you essentially discard by then selecting only the column names. This is inefficient. Just use something like:
df.columns[[np.issubdtype(dt, np.number) for dt in df.dtypes]]
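One caveat worth sketching: if the categorical attributes are stored with the pandas category dtype rather than as plain object columns, np.issubdtype may raise a TypeError, because it only understands NumPy dtypes. A variant that should cope with that case uses pandas.api.types.is_numeric_dtype and still only inspects df.dtypes, so no data is copied. The frame and its column names below are hypothetical, made up purely for illustration. Note also that is_numeric_dtype treats boolean columns as numeric, which np.number does not, so the two approaches can differ on bool columns.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical example frame (column names are illustrative only).
df = pd.DataFrame({
    "age": [25, 32, 47],
    "income": [50000.0, 64000.5, 81000.0],
    "city": ["Oslo", "Lima", "Pune"],               # object dtype
    "segment": pd.Categorical(["a", "b", "a"]),     # pandas category dtype
})

# Walk the dtypes only; no column data is selected or copied.
num_cols = [col for col, dt in df.dtypes.items() if is_numeric_dtype(dt)]

print(num_cols)
```

This returns a plain list, like the .tolist() version in the question. If an Index is preferred, the same boolean test can be used to index df.columns instead.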
Answered By - juanpa.arrivillaga