Issue
In python, I have a dataset like this below, where column1
and column2
are objects and not strings:
data = {'id': ['first_value', 'first_value', 'second_value', 'third_value'],
'column1': [a0, a0 a1, a2, a87],
'column2': [a0=4, a0=2;a1=8;a7=9, a2=0, a33=9]
}
And I want to keep on 'column2
' the values that:
a. are also in 'column1'
.
b. the values that are bigger than zero after the "=" sign on 'column2'
(so, for example, 'a2=0'
will be removed from the result because is zero (...=0)
, while a1=8
will stay because it's 8 (...=8)
)
Therefore in this case my result would be:
data = {'id': ['first_value', 'first_value'],
'column1': ['aO', 'a0;a1'],
'column2': ['a0=4', 'a0=2;a1=8']
}
I have tried many times but I just cannot find how to :(
Any help is appreciated!
Solution
You can try apply
on rows
def filter_column(row):
keys = row['column1'].split(' ')
dicts = dict([kv.split('=') for kv in row['column2'].split(';')])
lists = [f'{k}={v}' for k, v in dicts.items()
if k in keys and int(v) > 0]
return ';'.join(lists)
df['column3'] = df.apply(filter_column, axis=1)
print(df)
id column1 column2 column3
0 first_value a0 a0=4 a0=4
1 first_value a0 a1 a0=2;a1=8;a7=9 a0=2;a1=8
2 second_value a2 a2=0
3 third_value a87 a33=9
------------
print(df[df['column3'].ne('')])
id column1 column2 column3
0 first_value a0 a0=4 a0=4
1 first_value a0 a1 a0=2;a1=8;a7=9 a0=2;a1=8
Answered By - Ynjxsjmh
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.