Issue
my input:
mydata = (pd.DataFrame({'xyz':['Label','NG-ICV','NG-ICV','NG-ICV','ICV','ICV','ICV','NG-ICV','NG-ICV','Label',],
'foo':['Label','Label','Label','Label','Label','Label','Label','Label','Label','Label'],
'bar':['Label','Label','Label','Label','NG-ICV','NG-ICV','ICV','ICV','NG-ICV','NG-ICV']}))
My necessary code to calculate ratio between two patterns and finding in which column is a biggest ratio:
patNG = "NG-ICV"
pat = "ICV"
x=mydata.apply(lambda x: x.str.contains(patNG, regex=True).sum(), axis=0)
y=mydata.apply(lambda x: x.str.contains(pat, regex=True).sum(), axis=0)
z=x/y
z=z.dropna()
mask=z.index.values.tolist()
Now I want select row with index, within two condition column name from mask
and select only rows where patNG
and pat
is True(where row value equal to 'ICV' or 'NG-ICV'). So I expect for example for column xyz
like this:
xyz
4 NG-ICV
5 NG-ICCV
6 ICV
7 ICV
8 NG-ICV
9 NG-ICV
My code:
mydata.loc[ :,[mask[0] in i for i in mydata.columns] ]
And here I am stuck, how add one more condition where values of row also equal to my two patterns('pat' and 'patNG')?
P.S. I must to use regex and pattern, because my data is not one label such ICV its long string with other labels.
Solution
Following your logic, you should keep the initial detection of pattern before you sum. Then apply a mask on the detected columns to select your rows:
X = mydata.apply(lambda x: x.str.contains(patNG, regex=True), axis=0)
Y = mydata.apply(lambda x: x.str.contains(pat, regex=True), axis=0)
x = X.sum()
y = Y.sum()
z = (x/y).dropna()
mask = z.index.values.tolist()
mydata.loc[(X|Y)[mask].all(1)]
output:
xyz foo bar
4 ICV Label NG-ICV
5 ICV Label NG-ICV
6 ICV Label ICV
7 NG-ICV Label ICV
8 NG-ICV Label NG-ICV
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.