Issue
I want to drop all columns whose name starts by 'var' and whose content is 'None'. Sample of my dataframe:
id var1 var2 newvar1 var3 var4 newvar2
1 x y dt None f None
Dataframe that I want:
id var1 var2 newvar1 var4 newvar2
1 x y dt f None
I want to do this for several files and I do not know how many 'var' I have in all of them. My dataframe has only one row. Here is the code that I tried:
for i in range(1,300):
df.drop(df.loc[df['var'+str(i)] == 'None' ].index, inplace=True)
Error obtained:
KeyError: 'var208'
I also tried:
df.drop(df.loc[df['var'+str(i) for i in range(1,300)] == 'None'].index, inplace=True)
SyntaxError: invalid syntax
Could anyone help me improve my code?
Solution
Your error occurs because you have no column with that name.
You can use df.columns to get a list of available columns, check if the name .startswith("var")
and use df[col].isnull().all()
to check if all values are None.
import pandas as pd
df = pd.DataFrame(columns=["id", "var1", "var2", "newvar1", "var3", "var4", "newvar2"],
data=[[1, "x", "y", "dt", None, "f", None]])
df.drop([col for col in df.columns if col.startswith("var") and df[col].isnull().all()], axis=1, inplace=True)
Answered By - bitflip
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.