Issue
Suppose I want to build a pandas dataframe with a multi-level index.
I start by defining the expected columns of the dataframe:
import pandas as pd
df = pd.DataFrame(columns=["val",])
Then I build some entries and their indexing:
for j in range(1,5):
    tuples = [(str(j), i) for i in range(10)]
    vals = [0,1,2,3,j,j,4,4,1,1]
At each iteration of the for loop I would like to update the dataframe with the new values. The method _append does not seem to support specifying an index, and I've read that the .loc method is much more efficient.
So I was trying something like:
for i2, el in enumerate(tuples):
    df.loc[el] = vals[i2]  # el is a tuple
But this is not working as I expected. If I try to execute the command with a single multi-index key and a single value, such as:
df.loc[('1', 3)] = 4
I get a dataframe that looks like:
   val    3
1  NaN  4.0
whereas I was expecting something like:
     val
1 3  4.0
How do I specify the value for a MultiIndex entry in a pandas dataframe?
Solution
The parentheses in df.loc[('1', 3)] don't make it a MultiIndex. In fact it's equivalent to df.loc['1', 3], meaning row '1', column 3.
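The equivalence comes from Python itself rather than from pandas. A minimal sketch, using a hypothetical KeyRecorder class (not part of pandas) that simply returns whatever key the square brackets hand to it:

```python
# KeyRecorder is a made-up helper for illustration only.
class KeyRecorder:
    def __getitem__(self, key):
        # Return the key exactly as Python passed it in.
        return key

rec = KeyRecorder()
with_parens = rec[('1', 3)]     # explicit tuple inside the brackets
without_parens = rec['1', 3]    # comma-separated keys, no parentheses

# Both spellings deliver the same tuple to __getitem__.
print(with_parens == without_parens)
```

Since .loc receives the same tuple either way, it treats a bare tuple as (row, column). Passing the column explicitly as a second key is what lets the tuple be read as a single row label.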
You would need to use:
df.loc[('1', 3), 'val'] = 4
But the index cannot be changed into a MultiIndex dynamically; you must define the MultiIndex from the beginning:
df = pd.DataFrame(columns=["val",],
                  index=pd.MultiIndex(levels=[[], []], codes=[[], []]))
df.loc[('1', 3), 'val'] = 4
Output:
     val
1 3    4
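As an alternative to assigning row by row (this is not part of the answer above, just a sketch under the assumption that all values are known inside the loop), each iteration can build a small dataframe with its own MultiIndex, and the pieces can be concatenated once at the end. The names frames and index are illustrative:

```python
import pandas as pd

frames = []  # one small dataframe per iteration
for j in range(1, 5):
    tuples = [(str(j), i) for i in range(10)]
    vals = [0, 1, 2, 3, j, j, 4, 4, 1, 1]
    # Build the two-level index for this batch of rows.
    index = pd.MultiIndex.from_tuples(tuples)
    frames.append(pd.DataFrame({"val": vals}, index=index))

# Concatenate once instead of growing the dataframe row by row.
df = pd.concat(frames)

# A full (level 0, level 1) key plus the column name selects one cell.
print(df.loc[('3', 5), 'val'])
```

This avoids repeated enlargement of the dataframe, which tends to be slow for many rows.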
Answered By - mozway