Issue
I have three DataFrames related to customer orders, plus some preprocessing steps for them. One of them, the gramaz df, defines a set of units: whenever an order contains one of these units, the unit should be deleted. For example, if an order contains pepper with a unit, it should change like this:
pepper kg --> pepper
However, sometimes an order has a number in front of the unit:
pepper 100 kg --> pepper
So I also want to remove the number that comes immediately before a unit. Orders contain other numbers too, and those must stay; only a number directly before a unit should be deleted. Here are the DataFrames and the output I want:
import pandas as pd

gramaz = pd.DataFrame({'unit': ['cc', 'lit', 'gr', 'kg']})
gramaz
   unit
0    cc
1   lit
2    gr
3    kg
order = pd.DataFrame({'order':['100 glass','8 sodium 2 gr', 'br kk cc','mgk 100 lit','red pepper', 'black pepper', 'a 10 kg' ]})
order
           order
0      100 glass
1  8 sodium 2 gr
2       br kk cc
3    mgk 100 lit
4     red pepper
5   black pepper
6        a 10 kg
The desired output:
          order
0     100 glass
1      8 sodium
2         br kk
3           mgk
4    red pepper
5  black pepper
6             a
I tried using ChatGPT and managed to remove the units, but I don't know how to remove the numbers.
My DataFrames have on the order of 10k rows. Thanks!
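To pin the requirement down, here is a minimal plain-Python sketch of the rule described above, applied to one order string at a time. It assumes orders are whitespace-separated tokens and that a quantity is an integer or a simple decimal such as 1.5; the names UNITS, QUANTITY and strip_unit are hypothetical, chosen for illustration only.

```python
import re

# Units taken from the gramaz DataFrame in the question
UNITS = {"cc", "lit", "gr", "kg"}

# A quantity: an integer such as 100, or a simple decimal such as 1.5
QUANTITY = re.compile(r"\d+(?:\.\d+)?")


def strip_unit(order: str, units=UNITS) -> str:
    """Remove every unit token and a quantity token directly in front of it."""
    kept = []
    for token in order.split():
        if token in units:
            # Drop the previous token only if it is a number right before the unit
            if kept and QUANTITY.fullmatch(kept[-1]):
                kept.pop()
            continue  # the unit itself is never kept
        kept.append(token)
    return " ".join(kept)


orders = ["100 glass", "8 sodium 2 gr", "br kk cc", "mgk 100 lit",
          "red pepper", "black pepper", "a 10 kg"]
print([strip_unit(o) for o in orders])
# ['100 glass', '8 sodium', 'br kk', 'mgk', 'red pepper', 'black pepper', 'a']
```

Because it splits on whitespace, this version also normalizes runs of spaces. On a DataFrame it could be applied with order["order"].map(strip_unit).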
Solution
You can use a regular expression to replace the values, e.g.:
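# optional number (e.g. 100 or 1.5) followed by one of the units as a whole word;
# the whitespace around the match is consumed as well
pat = r"\s*(?:\d+\.?\d*\s*)?\b(?:" + "|".join(gramaz["unit"]) + r")\b\s*"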
order["new_order"] = order["order"].str.replace(pat, "", regex=True)
print(order)
Prints:
           order     new_order
0      100 glass     100 glass
1  8 sodium 2 gr      8 sodium
2       br kk cc         br kk
3    mgk 100 lit           mgk
4     red pepper    red pepper
5   black pepper  black pepper
6        a 10 kg             a
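The pattern above also consumes the whitespace on both sides of each match. That is harmless when the unit ends the order, as in every sample row, but if a unit could appear in the middle of an order (say a hypothetical row "pepper 100 kg salt"), the two neighboring words would be glued together as "peppersalt". The following is a sketch of one possible variant, not part of the original answer: it replaces each match with a single space, then collapses and trims whitespace, and it passes the units through re.escape in case any of them contain regex metacharacters. Like the original, it needs a word boundary before the unit, so a quantity written without a space (e.g. 100kg) is left untouched.

```python
import re

import pandas as pd

gramaz = pd.DataFrame({"unit": ["cc", "lit", "gr", "kg"]})
order = pd.DataFrame({"order": ["100 glass", "8 sodium 2 gr", "br kk cc",
                                "mgk 100 lit", "red pepper", "black pepper",
                                "a 10 kg", "pepper 100 kg salt"]})  # last row is hypothetical

# Escape each unit so characters like '.' are matched literally
units = "|".join(map(re.escape, gramaz["unit"]))
pat = (
    r"(?:\b\d+(?:\.\d+)?\s*)?"  # optional integer or decimal quantity right before the unit
    r"\b(?:" + units + r")\b"   # one of the units, as a whole word
)

order["new_order"] = (
    order["order"]
    .str.replace(pat, " ", regex=True)     # replace each match with a single space
    .str.replace(r"\s+", " ", regex=True)  # collapse the runs of spaces this can leave
    .str.strip()                           # drop leading/trailing spaces
)
print(order)
```

Note that the whitespace-collapsing step also normalizes any repeated spaces already present in other orders.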
Answered By - Andrej Kesely