Issue
I'm concatenating two dataframes column-wise, so that one dataframe is placed next to the other. But first I applied a transformation to the initial dataframe:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
real_data = pd.DataFrame(scaler.fit_transform(df[real_columns]), columns = real_columns)
Then I one-hot encoded the categorical columns and concatenated:
categorial_data = pd.get_dummies(df[categor_columns], prefix_sep= '__')
train = pd.concat([real_data, categorial_data], axis=1, ignore_index=True)
I don't know why, but the number of rows increased:
print(df.shape, real_data.shape, categorial_data.shape, train.shape)
(1700645, 23) (1700645, 16) (1700645, 130) (1703915, 146)
What happened, and how do I fix the problem?
As you can see, the number of columns in train equals the sum of the columns in real_data and categorial_data.
Solution
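The problem is most likely an index mismatch. pd.concat with axis=1 aligns rows by index label, not by position, and keeps the union of the labels from both frames. real_data is built from the NumPy array returned by scaler.fit_transform, so it gets a fresh default index 0..n-1, while categorial_data keeps the index of df. If the index of df is no longer 0..n-1 (for example because rows were dropped or filtered earlier, which is an assumption about steps not shown here), every label that appears in only one of the frames becomes an extra row filled with NaN. So calling df = df.reset_index(drop=True) before both transformations will solve your problem; passing index=df.index when building real_data works as well. Note also that ignore_index=True combined with axis=1 only replaces the column names with 0..145; it does not touch the row index.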
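To make the index alignment concrete, here is a minimal sketch on a toy frame. The data, the column names x and c, and the index values are all made up for illustration, and the scaling is done with plain NumPy as a stand-in for MinMaxScaler (both hand back an array without an index), so treat it as a reproduction under those assumptions rather than the asker's actual pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame whose index is NOT 0..n-1, as happens after filtering rows.
df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0], "c": ["a", "b", "a", "b"]},
    index=[0, 1, 5, 7],
)

# Stand-in for MinMaxScaler().fit_transform: the result is a bare NumPy array.
values = df[["x"]].to_numpy()
scaled = (values - values.min(axis=0)) / (values.max(axis=0) - values.min(axis=0))

# get_dummies keeps the original index of df: [0, 1, 5, 7].
categorial_data = pd.get_dummies(df[["c"]], prefix_sep="__")

# Problem: a fresh default index 0..3 is aligned against [0, 1, 5, 7].
# The result holds the union of labels (0, 1, 2, 3, 5, 7), i.e. 6 rows.
real_bad = pd.DataFrame(scaled, columns=["x"])
bad = pd.concat([real_bad, categorial_data], axis=1)

# Fix: reuse the original index so both frames carry the same labels.
real_ok = pd.DataFrame(scaled, columns=["x"], index=df.index)
good = pd.concat([real_ok, categorial_data], axis=1)

print(bad.shape, good.shape)  # (6, 3) (4, 3)
```

Resetting the index of df once, before both transformations, should give the same row count; reusing df.index instead keeps the original labels, which can be handy if the rows need to be traced back later.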
Answered By - saket ram