Issue
The data has many columns but the ones in question are as follows:
MR Version
GB1 Package
GB5 Package
GB9 3.5
GB5 3.3
GB1 Package
GB9 1.5
GB359 9.1
GB1 Package
GB99 5.5
...
MR (model) names repeat, and Package in the Version column also repeats.
I need to:
- first access all rows with Version == Package,
- then take their MR model name, for instance GB5,
- then find all other rows with the same MR model name, and
- finally check whether those other rows (with the same MR model name) have a Version value different from Package (!= Package). Models that do I need to classify as good, and models that do not I need to classify as bad.
For instance, in the example data above, MR model GB5 has both Package and non-Package cells, hence this model is good, and model GB1 has only Package values in the Version column, hence it is bad.
MRs that have only numeric values in the Version column, such as GB9, we do not care about in this task.
Usually those entries are next to each other, and there are usually two of every model, so I wrote the loop below, which solves the problem by selecting every two rows from the dataframe. I have now discovered that in some cases these entries are not next to each other, so I need a better solution, which escapes me. Any help is greatly appreciated. Thank you all. In my code below MR is replaced by Author, but that does not matter.
good_aut = []
bad_aut = []
for i, g in merged_df.groupby(merged_df.index // 2):  # takes every two rows
    if g.iloc[0]['Version'] == 'Package':  # if row 1 is a package citation
        if g.iloc[0]['Author'] == g.iloc[1]['Author']:  # check if row 1 and 2 authors match
            if g.iloc[1]['Version'] != 'Package':  # finally check if row 2 citation is not package, hence it is GAP citation
                print(g)
                good_aut.append(g.iloc[0]['Author'])  # if all conditions are met we add this author to the good list, once for every occurrence
            else:
                bad_aut.append(g.iloc[0]['Author'])
        else:
            bad_aut.append(g.iloc[0]['Author'])
Solution
It is not clear. Do you expect Package to be present in addition to other values?
if yes
You can groupby MR and check if Package is present together with other values:
def good_or_bad(s):
    s = set(s)
    if 'Package' in s and len(s.difference(['Package'])) > 0:
        return 'good'
    return 'bad'

df.groupby('MR')['Version'].apply(good_or_bad)
output:
MR
GB1 bad
GB359 bad
GB5 good
GB9 bad
GB99 bad
Name: Version, dtype: object
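To reproduce the output above, here is a minimal self-contained sketch. The DataFrame is built only from the rows visible in the question (the elided rows are left out), and storing Version as strings is an assumption about the real data.

```python
import pandas as pd

# Sample rows copied from the question; Version kept as strings (assumption).
df = pd.DataFrame({
    'MR': ['GB1', 'GB5', 'GB9', 'GB5', 'GB1', 'GB9', 'GB359', 'GB1', 'GB99'],
    'Version': ['Package', 'Package', '3.5', '3.3', 'Package',
                '1.5', '9.1', 'Package', '5.5'],
})

def good_or_bad(s):
    # Unique Version values seen for one MR
    s = set(s)
    # 'good' only if Package appears together with at least one other value
    if 'Package' in s and len(s.difference(['Package'])) > 0:
        return 'good'
    return 'bad'

result = df.groupby('MR')['Version'].apply(good_or_bad)
print(result)
```

Because the grouping is by MR value rather than by row position, it does not matter whether rows of the same model are adjacent.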
if no
You can groupby MR and check if values other than Package are present:
(df.groupby('MR')['Version']
   .apply(lambda s: len(set(s).difference(['Package'])) > 0)
   .map({True: 'good', False: 'bad'})
)
output:
MR
GB1 bad
GB359 good
GB5 good
GB9 good
GB99 good
Name: Version, dtype: object
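As a possible alternative for this "if no" case, the same check can be sketched without a Python-level lambda by grouping a boolean mask. This is a variant, not part of the original answer; the sample DataFrame is again built from the rows shown in the question, with Version stored as strings (assumption).

```python
import pandas as pd

# Sample rows copied from the question; Version kept as strings (assumption).
df = pd.DataFrame({
    'MR': ['GB1', 'GB5', 'GB9', 'GB5', 'GB1', 'GB9', 'GB359', 'GB1', 'GB99'],
    'Version': ['Package', 'Package', '3.5', '3.3', 'Package',
                '1.5', '9.1', 'Package', '5.5'],
})

# True for every row whose Version is something other than 'Package'
not_package = df['Version'].ne('Package')

# For each MR: does at least one row have a non-Package Version?
result = (not_package.groupby(df['MR'])
                     .any()
                     .map({True: 'good', False: 'bad'}))
print(result)
```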
I want all three possibilities
def good_or_bad(s):
    s = set(s)
    if len(s.difference(['Package'])) > 0:
        if 'Package' in s:
            return 'good'
        return 'other'
    return 'bad'

df.groupby('MR')['Version'].apply(good_or_bad)
output:
MR
GB1 bad
GB359 other
GB5 good
GB9 other
GB99 other
Name: Version, dtype: object
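If lists similar to good_aut and bad_aut from the question are needed, the labels can be turned back into lists, or mapped onto the original rows. The following is a sketch under the same assumptions as before (sample rows from the question, Version as strings); the names labels, good_mr, bad_mr and the label column are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question; Version kept as strings (assumption).
df = pd.DataFrame({
    'MR': ['GB1', 'GB5', 'GB9', 'GB5', 'GB1', 'GB9', 'GB359', 'GB1', 'GB99'],
    'Version': ['Package', 'Package', '3.5', '3.3', 'Package',
                '1.5', '9.1', 'Package', '5.5'],
})

def good_or_bad(s):
    s = set(s)
    if len(s.difference(['Package'])) > 0:
        if 'Package' in s:
            return 'good'   # Package plus at least one other value
        return 'other'      # no Package at all
    return 'bad'            # only Package

labels = df.groupby('MR')['Version'].apply(good_or_bad)

# One entry per model (not one per occurrence, unlike the loop in the question)
good_mr = labels[labels == 'good'].index.tolist()
bad_mr = labels[labels == 'bad'].index.tolist()

# Attach the label to every original row (hypothetical 'label' column)
df['label'] = df['MR'].map(labels)
print(good_mr, bad_mr)
```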
Answered By - mozway