Issue
I tried to insert data into specific locations in a DataFrame in two ways.
Option 1 uses a fixed column label and a variable index label.
Option 2 uses a fixed index label and a variable column label.
Option 1 runs without any warning, but Option 2 raises this warning:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
Option 1: no warning
import pandas as pd
df = pd.DataFrame()
for col in range(200):
    # fixed column 'A', new row label on every iteration
    df.loc[str(col), 'A'] = str(col)
Option 2: warning
df1 = pd.DataFrame()
for col in range(200):
    # fixed row 'A', new column label on every iteration
    df1.loc['A', str(col)] = str(col)
Solution
Pandas is not designed for adding or inserting data repeatedly in a loop. Each insertion creates an expensive intermediate object.
Instead, loop over a low-level object (a list or dictionary) and construct the DataFrame once at the end.
First code (Option 1, fixed column 'A'):
df = pd.DataFrame({'A': {str(i): i for i in range(200)}})
Second code (Option 2, fixed row 'A'):
df1 = pd.DataFrame.from_dict({'A': {str(i): i for i in range(200)}}, orient='index')
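If the values are computed inside a loop and cannot be written as a single comprehension, the same idea can be sketched as below. This is only an illustration: the names rows, cols, pieces and df2 are made up for the example, and str(i) stands in for whatever the loop actually computes. The last part tries the pd.concat(axis=1) route that the warning message itself suggests.

```python
import pandas as pd

# Option 1 shape: one column 'A', one row per label.
rows = {}
for i in range(200):
    rows[str(i)] = str(i)  # stand-in for a value computed in the loop
df = pd.DataFrame({'A': rows})  # built once, after the loop

# Option 2 shape: one row 'A', one column per label.
cols = {}
for i in range(200):
    cols[str(i)] = str(i)
df1 = pd.DataFrame.from_dict({'A': cols}, orient='index')

# Alternative for the Option 2 shape: collect one Series per column,
# then join all of them in a single pd.concat(axis=1) call.
pieces = [pd.Series({'A': str(i)}, name=str(i)) for i in range(200)]
df2 = pd.concat(pieces, axis=1)

print(df.shape, df1.shape, df2.shape)
```

In all three cases the DataFrame is created in one step, so no column is inserted into an existing frame one at a time.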
Answered By - mozway